LibGuides: Research Data Management: Personal data and anonymization

Personal data

Many researchers process (collect/study/store/share) personal data when conducting their research. Examples of research methods and the personal data they might involve:

Type of study and object	Personal data
Interview: What is your opinion on how the municipality of Rotterdam is dealing with greening the city?	Contact details, recording and opinions
Survey on how to respond to bullies at work	Demographic information and behaviour
Behavioural lab: Researching the importance of attention on decision-making processes with the help of a virtual reality shop	Contact details, demographic information, eye-tracking, motion-tracking and behaviour
Analysing Twitter responses to politicians in America	User name (real name?), avatar (is it a photo?) and opinions

Data is personal data as long as it is possible to link specific information to a participant. This link can arise through contact details, recordings, a combination of data items (job title and employer, or neighbourhood and date of birth) or data which sets the individual apart from all others (a rare disease).
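To illustrate how combined data items can single out a participant, the following minimal sketch counts how many records share each combination of indirect identifiers. The function name `small_groups`, the field names and the threshold are assumptions made for this illustration and do not come from the guide. The underlying idea is borrowed from k-anonymity checks. A combination that occurs only once points to one identifiable person.

```python
from collections import Counter


def small_groups(records, indirect_identifiers, threshold=2):
    """Return value combinations shared by fewer than `threshold` records.

    `indirect_identifiers` is a list of field names (hypothetical here)
    that could identify a person when combined.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up example data: job title and employer together may identify someone.
records = [
    {"job_title": "Nurse", "employer": "Hospital A", "opinion": "positive"},
    {"job_title": "Nurse", "employer": "Hospital A", "opinion": "negative"},
    {"job_title": "Director", "employer": "Hospital A", "opinion": "neutral"},
]

risky = small_groups(records, ["job_title", "employer"])
print(risky)  # {('Director', 'Hospital A'): 1}
```

In this made-up data set there is only one director at Hospital A, so that record remains personal data even without a name or contact details.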

Depending on the sensitivity of the personal data, you need to take appropriate measures to reduce the risks for participants. Personal data becomes more sensitive if it concerns vulnerable persons (e.g., children) or covers certain topics (e.g., health). One way to reduce the risks, which should always be considered, is to apply anonymisation (and pseudonymisation) techniques.

If you are a student, contact your supervisor for help.

More information on what personal data is can be found in the anonymisation glossary on the EUR website.

Anonymisation and pseudonymisation

Applying anonymising and pseudonymising techniques is a measure you must take to protect the privacy of research respondents.

Research subjects are believed to respond more truthfully when they know that their identity is not recorded. This is especially relevant when questions deal with sensitive issues, social taboos or illegal behaviour. Since it is in the interest of the researcher to ‘get to the truth’, providing anonymity when collecting primary data also serves the researcher's own interest.

Which anonymisation or pseudonymisation technique you can apply depends on the type of research you are doing and the phase of research you are in (i.e., recruiting, collecting and analysing, publishing and sharing, or archiving).

Definitions

Anonymisation

The process by which personal data is irreversibly altered in such a way that an individual can no longer be identified, directly or indirectly. Once data is truly anonymous, the General Data Protection Regulation (GDPR) does not apply (recital 26 GDPR).

Be aware that whether something is anonymous can change over time, with more data and new technology becoming available. Hence, rather than “anonymity” or “anonymous data”, it is good to talk about data to which anonymisation techniques have been applied.

Pseudonymisation

Pseudonymisation of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the research participant to be directly identified. It is different from anonymisation as, in many cases, it still allows identification using indirect identifiers.

The pseudonymisation key, i.e., the file (or other form) in which the direct identifiers are linked to the pseudonym, needs to be stored separately from the pseudonymised data.
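As a minimal sketch of the idea above, the following example replaces direct identifiers with a random pseudonym and returns the pseudonymisation key as a separate object. The function name `pseudonymise`, the `P-` prefix and the field names are assumptions for illustration only and are not a prescribed EUR method.

```python
import secrets


def pseudonymise(records, direct_identifiers):
    """Split records into pseudonymised data and a separate key table.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym to the direct identifiers that were removed.
    """
    key_table = {}
    pseudonymised = []
    for record in records:
        # Random pseudonym, so it cannot be derived from the identifiers.
        pseudonym = "P-" + secrets.token_hex(4)
        while pseudonym in key_table:  # extremely unlikely collision
            pseudonym = "P-" + secrets.token_hex(4)
        key_table[pseudonym] = {f: record[f] for f in direct_identifiers}
        remaining = {k: v for k, v in record.items() if k not in direct_identifiers}
        pseudonymised.append({"pseudonym": pseudonym, **remaining})
    return pseudonymised, key_table


# Made-up example data with two direct identifiers.
participants = [
    {"name": "A. Jansen", "email": "a.jansen@example.org", "age_group": "30-39"},
    {"name": "B. de Vries", "email": "b.devries@example.org", "age_group": "40-49"},
]

data, key = pseudonymise(participants, ["name", "email"])
```

In practice, `key` would be written to a different, access-restricted location than `data`. The fields that remain (here `age_group`) may still act as indirect identifiers, so the result is pseudonymised rather than anonymous.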

More information on anonymisation and pseudonymisation can be found on the EUR website.

Informing participants

Another very important measure is to be transparent with your research participants about which personal data you plan to process (collect/study/store/share), for what purpose, and at what point you will anonymise the data.

Additionally, it must be made clear to prospective research participants that they are free to decide whether or not to take part in the research, and whether any data collected from and about them is included in analysis. In most cases, this is secured through obtaining informed consent.

For more information on informed consent forms, check out the Erasmus informed consent templates on the EUR website.

Data classification
While writing your dissertation you may (re)use existing data or create new data. These data sets may contain sensitive data. Data is frequently classified as public, internal, confidential or secret. The data classification is important as it determines how you store, share and publish your data.

Contact

Email the Information skills team
