How to safely use personal data
Publishing, re-using and sharing data between researchers or organizations increases the need to consider anonymity and privacy. Legal rules and ethical codes on anonymity and privacy must be respected. Such rules and guidelines may be imposed by the government, a funding agency or the research context (research council or university), or may derive from pledges made to research subjects.
In economic, financial or business statistics, the data-collecting organization may suppress individual results so that the business intelligence of individual companies is not inadvertently revealed. Database owners determine in advance the minimum number of cases (k) that a category must contain before its results are published. Categories with fewer cases are suppressed, and access is given only to aggregated results (k-anonymity).
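The threshold rule described above can be illustrated with a minimal Python sketch. The function name `suppress_small_cells`, the field name `sector` and the sample records are hypothetical and chosen only for illustration. Real disclosure control usually involves further checks that are not shown here.

```python
from collections import Counter


def suppress_small_cells(records, category_key, k):
    """Count cases per category and suppress categories with fewer than k cases.

    Suppressed categories are reported as None instead of a count.
    """
    counts = Counter(record[category_key] for record in records)
    return {
        category: (count if count >= k else None)
        for category, count in counts.items()
    }


# Hypothetical example data: one record per company.
records = [
    {"sector": "retail"},
    {"sector": "retail"},
    {"sector": "retail"},
    {"sector": "mining"},
]

# With k = 3, "retail" (3 cases) is published and "mining" (1 case) is suppressed.
published = suppress_small_cells(records, "sector", k=3)
```

Here `published` keeps the count for `retail` and holds `None` for `mining`, so the single mining company cannot be singled out from the published table.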
Data anonymization techniques include:
- removal of all personally identifiable information fields,
- aggregation (publishing results only at an aggregated level, as with k-anonymity),
- encryption (replacing sensitive data with encrypted data),
- masking (changing data values). Examples are:
  - substitution (replacing the contents of a database column with data from a predefined list of fictitious but similar values, so that it cannot be traced to the original subject),
  - shuffling (similar to substitution, except that the anonymized data is derived from the column itself),
  - number and date variance (for numeric and date columns: each value in a column is modified by some random percentage of its real value, altering the data enough that it cannot be traced back), and
  - nulling out data (removing sensitive data by replacing it with null values, zeros or random numbers).
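The four masking techniques listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a vetted anonymization tool: all function names and sample values are hypothetical, each column is treated as a plain list, and the variance example covers numeric columns only.

```python
import random


def substitute(values, replacements, rng):
    """Substitution: replace each value with one drawn from a predefined list."""
    return [rng.choice(replacements) for _ in values]


def shuffle_column(values, rng):
    """Shuffling: reorder the column's own values so they no longer match their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def add_variance(values, max_fraction, rng):
    """Number variance: change each value by a random fraction within +/- max_fraction."""
    return [v * (1 + rng.uniform(-max_fraction, max_fraction)) for v in values]


def null_out(values, placeholder=None):
    """Nulling out: replace every value with a placeholder (None by default)."""
    return [placeholder for _ in values]


# Hypothetical example columns.
rng = random.Random()
names = ["Alice", "Bob", "Carol"]
salaries = [42000.0, 51000.0, 38000.0]

masked_names = substitute(names, ["Person A", "Person B", "Person C"], rng)
masked_order = shuffle_column(salaries, rng)
masked_salaries = add_variance(salaries, 0.10, rng)
masked_blank = null_out(names)
```

Note that shuffling keeps the column's overall distribution intact, whereas variance changes the individual values. Which one is appropriate depends on what the published data must still support.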
Data handling should conform to rules of conduct and respect the privacy of research subjects, and the devices on which we work with the data should also be protected from unauthorized access. Safety protocols should cover access to network and WiFi connections, when these are in use.
As researchers we collect personal data. When we do this, we need to ensure that the privacy of respondents is guaranteed, that access to personal data is controlled and that individuals cannot be identified in published results. This requires a set of measures in designing a survey. The increasing popularity of data sharing strengthens the need to consider anonymizing the data.
While writing your dissertation you may (re)use existing data or create new data. These data sets may contain sensitive data. Data is frequently classified by sensitivity as public, internal, confidential or secret. This classification is important because it determines how you store, share and publish your data.