Pseudonymity: An Answer to Assessment Privacy Concerns

Introduction:

Data security is of paramount importance today: vast amounts of sensitive information are stored and transferred online, necessitating measures to protect privacy, comply with regulations, and maintain trust. In 2024 alone, there were over 5,000 publicly disclosed data breaches, exposing more than 20 billion records. Pseudonymity plays a crucial role in enhancing data security by replacing personal identifiers with pseudonyms, thereby minimizing the risk of direct identification.

Pseudonymity is a way of storing electronic data in which names, or other information that identifies a person, are stored separately from the data about them. For example, an assessment result could be associated with a numeric ID representing the person who took the assessment rather than with the person’s name. When data is pseudonymized, a separate index that matches each numeric ID to a name is maintained and stored apart from the data itself.
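The separation described above can be sketched in a few lines of Python. This is a minimal illustration only: the record fields and the random-ID scheme are assumptions for the example, not part of any particular platform.

```python
import secrets

# Hypothetical assessment results; names and scores are invented.
results = [
    {"name": "Jane Doe", "score": 82},
    {"name": "John Smith", "score": 67},
]

pseudonymized = []   # safe to share with graders or processors
identity_index = {}  # the separately held index: ID -> name

for record in results:
    pseudo_id = secrets.token_hex(4)  # random ID, not derived from the name
    identity_index[pseudo_id] = record["name"]
    pseudonymized.append({"id": pseudo_id, "score": record["score"]})

# Re-identification is possible only with the separately held index:
# identity_index[some_id] returns the matching name.
```

Because the IDs are random rather than derived from the names, the pseudonymized records alone reveal nothing about who took the assessment; keeping `identity_index` in a separate, more tightly controlled store is what makes the scheme effective.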

A key benefit of pseudonymity is that if there is a data breach or other leakage, the data leaked may not include information that identifies the people involved. Pseudonymity can reduce the risks involved with processing personal data and often strikes a good balance between allowing data to be used and protecting people’s privacy.

This article explains what pseudonymous data is, explores the benefits to the assessment community of pseudonymity in the processing of assessment results and other personal data used in testing and examinations, and describes how TCS iON utilizes pseudonymous data in the provision of its assessment platform.

Identified, pseudonymous, and anonymous data:

Let’s start by defining some commonly used terms:

• Personal data (also called personal information) is often defined as being any data or information related to an identified or identifiable natural person.

• Anonymous data is data that does not relate to an identified or identifiable person, either because identifying information was never captured or because the data has been anonymized or de-identified with the intent that it cannot be associated with any person again. To be anonymous, data must be unidentifiable.

• Pseudonymous data is data associated with a particular person, or persons, where additional information is needed to identify the specific people. Often this is created by replacing someone’s name with a system generated ID or reference number, where the key to associate the ID to a person is held separately. The separately held information also needs to be kept secure to prevent it from being used to identify individuals.

The table below shows three examples. A common example of personal data in the assessment context is a list of names of people and their scores in an assessment.

The left column shows the full personal data — the name and score achieved.

The middle column shows anonymous data — essentially just a list of scores.

The right-hand column shows pseudonymous data — the names of people have been replaced with IDs.

Figure 1: Examples of Personal, Anonymous and Pseudonymous data
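The three forms compared in Figure 1 can also be illustrated with a small Python sketch; the names, scores, and ID format below are invented purely for illustration:

```python
# Hypothetical example data mirroring Figure 1: full personal data.
personal = [("Jane Doe", 82), ("John Smith", 67), ("Ann Lee", 74)]

# Anonymous: identity is discarded entirely and cannot be restored.
anonymous = [score for _, score in personal]

# Pseudonymous: names replaced with IDs; the key is stored separately.
key = {f"C{i:04d}": name for i, (name, _) in enumerate(personal, start=1)}
pseudonymous = [(pid, score) for pid, (_, score) in zip(key, personal)]
```

The crucial difference is that `anonymous` can never be linked back to people, whereas `pseudonymous` can be, but only by whoever holds `key`.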

In most assessment use cases, it is important to be able to associate data with the people who generated it. The data cannot be anonymous, because you need to know who passed or failed a test in order to take the appropriate action (e.g., issue a certificate or notify the candidate of a failure).

However, making data pseudonymous is a useful measure with assessment data. It still allows data to be associated with people when needed, but identity is masked for other processing. For many tasks, pseudonymous data is sufficient, and personal data from which an individual can be identified, or is readily identifiable, is not needed to achieve the organization’s purposes.

A well-established example is the manual grading of essays. It’s common practice to mask the test taker’s name from graders so they will not be influenced by any knowledge of the test taker, but, of course, the system requires the identity of the test taker to be able to assign the score in the master records.
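A minimal sketch of this blind-grading flow, with hypothetical script IDs, an invented roster, and a placeholder grading function standing in for real marking logic:

```python
# Scripts are keyed by pseudonymous IDs; the roster mapping IDs to
# names is held by the system only, never shown to graders.
scripts = {"T-001": "essay text A", "T-002": "essay text B"}
roster = {"T-001": "Jane Doe", "T-002": "John Smith"}

def grade(essay_text):
    # Placeholder grading logic for illustration only.
    return min(100, len(essay_text))

# Graders work on the pseudonymous scripts...
graded = {tid: grade(text) for tid, text in scripts.items()}

# ...and only the master system re-attaches identities afterwards.
master_records = {roster[tid]: score for tid, score in graded.items()}
```

Graders never see `roster`, so the score cannot be influenced by who wrote the script, yet the system can still record results against real names.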

The European GDPR law advocates pseudonymization and says:

“The application of pseudonymization to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations.”

How pseudonymous data increases security:

When considering the security of assessment data, it is helpful to identify risks and threats to data and potential countermeasures. A common approach to analyzing security risk is to consider the impact and probability of each potential risk to the confidentiality, integrity, and availability of data and put in place measures to reduce the impact and probability.

In general, pseudonymization reduces the impact or consequences of many risks, for example:

• Impact of data breach: A security risk that concerns all organizations is data being leaked or otherwise exposed in a breach, for example on the Internet. If the data is pseudonymous, the loss of confidentiality is much smaller than if fully identified data is leaked: unless the breach also exposes the index of IDs to names, it will be difficult or impossible to identify the real people behind the leaked data, so the breach is much less of a concern for the individuals involved.

• Insider risk: To run their operations, most organizations need to give employees some level of access to personal data. Insider risk arises when such a trusted individual breaks that trust and accesses or uses the data inappropriately, for example for nefarious or otherwise prohibited purposes. Giving employees who need data access only pseudonymous data reduces both this risk and its impact.

• Use of processors: It’s common in assessment programs to use several different service providers for delivery, scoring, and analysis of assessments. In privacy parlance, these companies are commonly referred to as “processors.” The risks of security problems are obviously increased the more organizations have access to and process data. However, many processors and sub-processors do not need to know the identity of test-takers and can perform the processing required to deliver their services using pseudonymous data. In such a case, there is a much lower impact if they have a security failure.

Data that is pseudonymous, where the index or keys to the pseudonymized data are held separately, is in general much more secure than identified data.

How pseudonymous data reduces legal and compliance risk:

There are significant benefits to pseudonymity under many privacy laws. These laws vary by geography so here is an overview for some countries and territories.

Europe (including the UK)

Under the GDPR, pseudonymized personal data is still personal data and therefore processing needs to comply with GDPR. The same is true under UK data protection law after Brexit. Although individuals working with the data may not know the identity of the test takers, the testing organization is still able to link the individual records back to individual data subjects.

However, there are significant benefits to pseudonymization under the GDPR and UK data protection law:

1. Security Measures: Under these laws, organizations are required to implement technical and organizational security measures appropriate to the risk, considering the personal data and processing involved. Pseudonymization is recognized as a strong measure to secure data, and where it is in place, the burden of other measures may be reduced.

2. Breach Notification: These laws impose strict notification requirements in the event of a data breach and can also result in fines following breaches. A breach of genuinely pseudonymous data involves much less risk than a breach of identified data; typically it would not be necessary to notify data subjects of a pseudonymous data breach, and it might not even be required to notify the supervisory authority. (However, this depends on a risk analysis and needs to be evaluated case by case.) A breach of pseudonymous data is also much less likely to result in a fine.

3. Data Protection by Design: These laws encourage the use of pseudonymization as part of the recommended “data protection by design.” Regulators often give some credit for implementing pseudonymization. For example, if you can achieve a purpose using pseudonymous data rather than identified data, you are likely expected to do so, not least to conform with data minimization expectations.

4. International Transfers: The European Union and the UK have strict rules regarding the transfer of personal data to other geographies which are beyond the scope of this article. The key point here is that when considering the possible impact on individuals of the processing of their personal data outside of the EU or UK, pseudonymization may be an appropriate safeguard to allow processing/transfer to continue.

5. Processing Purposes: If you have some data collected for one purpose and want to use it for a secondary purpose, then if you can do the secondary purpose with pseudonymous data, you may be able to proceed without going back to the test taker. There are other factors that need to be considered in determining compatibility of purposes of processing data, so you should review carefully with a privacy expert beforehand.

USA

The US has a patchwork of federal and state privacy laws, the former being largely sector-specific (at least when it comes to the commercial sector) and the latter still relatively few and existing alongside state data breach notification laws.


All fifty US states, Washington D.C., and Puerto Rico now have their own breach notification laws. Although these laws differ in their specific details, all create incentives for organizations to follow good security practices. A common theme is that the definition of personal information requires identified data, typically a name in combination with another data element such as an email address or a social security number, and only breaches of such identifiable personal information trigger reporting requirements. Because pseudonymous data cannot identify a person without the separately held key, a leak of pseudonymous data alone would generally not constitute a breach triggering a reporting obligation under state breach notification laws.

Other Jurisdictions

An ever-increasing number of other countries are enacting privacy laws, often taking inspiration from the European GDPR. Examples of such laws that include provisions on pseudonymous data are shown below:

Figure: Privacy laws and their provisions on pseudonymous data
How TCS iON uses pseudonymous data:

TCS iON Digital Assessment is the world’s largest digital assessment platform, conducting effortless, paperless, and errorless assessments. TCS iON has serviced more than 80% of the high-stakes exams in India and is currently expanding globally. To date, the platform has served more than 425 million candidates and 590+ customers, offers 560+ unique subject question papers, and has delivered 5,480+ exams.

The numbers below provide an overview of the reach and usage of the digital assessment platform:

Figure: TCS iON Digital Assessment in numbers

Pseudonymity is at the core of the TCS iON Digital Assessment Platform. The platform does not use or need learners’ personal identities. Following the principles of privacy by design and data minimization, TCS iON therefore requires that customers using its platform pass a nameless user ID. The platform then delivers an assessment to that unknown learner and passes back the results. TCS iON has no knowledge of the learner’s identity; only the customer can map the user ID back to the individual.

For example, in the diagram below, the learner is called Jane Doe, but the ID “1234567” is generated for her, and TCS iON knows only that ID, not her name, address, or date of birth. It delivers the assessment and passes back the result.
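This ID-only hand-off can be sketched as follows. The class and method names are hypothetical, invented for this example; only the flow itself, where the platform sees an ID and never a name, follows the description above.

```python
class Customer:
    """Holds the only mapping from learner names to IDs."""
    def __init__(self):
        self._index = {}  # learner name -> pseudonymous user ID

    def register(self, name, learner_id):
        self._index[name] = learner_id
        return learner_id  # only the ID ever leaves this system

class AssessmentPlatform:
    """Knows learners only by their pseudonymous ID."""
    def deliver(self, learner_id):
        score = 82  # stand-in for actually running the assessment
        return {"id": learner_id, "score": score}

customer = Customer()
platform = AssessmentPlatform()

learner_id = customer.register("Jane Doe", "1234567")
result = platform.deliver(learner_id)  # the platform never sees the name
# Only the customer, via its index, can map result["id"] back to Jane Doe.
```

Because `AssessmentPlatform` holds no name-to-ID index, a breach on the platform side exposes only pseudonymous results.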

Figure: The process of pseudonymity

Once the assessment is completed, the answer scripts are sent for marking, during which all identifiable learner details are masked. This masking process applies the principle of pseudonymity: learners’ actual identities are replaced with unique codes. Examiners are provided only with these pseudonyms, ensuring that they cannot associate a learner’s identity with their answer script.

This approach adds a layer of privacy protection, safeguarding learners’ personal information from unintended exposure. Pseudonymity, in this context, not only prevents bias but also enhances data security, ensuring that grading remains objective and confidential. By dissociating learners’ real identities from their work, it mitigates the risk of discrimination and fosters a fairer and more secure evaluation system.

The advantage of using pseudonymity for TCS iON customers is that they can use TCS iON as an assessment platform with much less concern about the privacy of their learners than with a platform that has the identities of learners. This reduces both security and compliance risk and is a good example of how privacy by design benefits all stakeholders—TCS iON, its customers, and learners.

Conclusion:

This article has introduced the concept of pseudonymization and explained how it is a useful measure both for reducing security risk and for aiding geography-specific compliance. It has also described how the pseudonymization technique is used by the TCS iON Digital Assessment Platform.

Although the education space is witnessing rapid digitalization and virtualization, industry players must still tackle several challenges. The lack of an adequate IT security policy and cyber security management can not only put crucial data at risk but also bring an institute’s core operations to a standstill. With TCS iON’s pseudonymization technique, educational institutes gain the benefits of cost saving, cyber resilience, risk mitigation, and remediation.