Redaction, Deidentification, and Anonymization, New Data
September 22, 2020 | 7 minutes read
Protecting Identifying Data
Nearly every company handles an enormous amount of data about its consumers in order to do business. A variety of privacy legislation also requires companies to maintain and respect their consumers’ privacy, including regulations on how to handle personally identifiable information, or PII.
Two laws that have been enacted are the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). While these regulations may appear localized in name, their reach extends well beyond their borders. If your business wants to serve consumers in the EU or California, you must generally comply regardless of where you are located.
So, what exactly is meant by PII? Personal data, personal information, or PII can be any information relating to a person’s identity. ‘PII’ is a widely used abbreviation in the United States, but it has no single definition. The initials stand in for four common variations: the first word can be “personal” or “personally,” and the second “identifiable” or “identifying.” These forms have different legal definitions depending on the state or jurisdiction in which they are used. Other data privacy laws are more complete. The GDPR uses the term personal data, and its scope is much broader.
The National Institute of Standards and Technology (NIST) offers a standard definition of what is considered personal data. According to its interpretation, PII includes “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.”
A person’s IP address is an example of how confusion can arise. In the United States, an IP address is not classified as PII on its own but is considered a form of linked PII data. In other jurisdictions, including the EU, an individual’s IP address is regarded as part of their personal data.
PII can cover any number of data types, primarily those that are unique to a single individual or that can be linked with other data to identify one. This can include names, social security numbers, addresses, phone numbers, credit card data, and even IP addresses in some cases. Any data that can be used to track down the identity of a single individual can be classified as PII.
It is the responsibility of the organization holding this type of data, be it a small business, a large enterprise, or a government entity, to protect the information from abuse and from breaches of its data systems. This can be done through redaction, anonymization, or de-identification of the data. Once data has been used, or has reached the end of its lifespan, it must be destroyed.
What is Anonymization?
As a business that handles a variety of data, it is good to understand the range of options available to protect data. This understanding helps to make an informed decision about the best solution to undertake. Anonymization is one such choice.
When a company is looking to sanitize its data to protect it, anonymization is one method of data removal. Anonymizing a data set, such as the health records of several patients gathered for study purposes, means removing all PII from the data set before use or distribution to protect the privacy of the individuals.
Data anonymization has been defined as a “process by which personal data is irreversibly altered so that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” Using this method, the remaining data can be shared with or distributed to other agencies for their use.
When it comes to medical or health records, information removed from the data set can be a name, address, birth date, phone number, or other data that can be used to single out and identify a particular patient.
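As a minimal sketch of the removal step described above, the following Python drops direct identifiers from a list of patient records before release. The field names, the `DIRECT_IDENTIFIERS` set, and the sample record are hypothetical and made up for illustration; a real data set would need its own reviewed list of identifying fields.

```python
# Hypothetical field names; a real data set defines its own identifier list.
DIRECT_IDENTIFIERS = {"name", "address", "birth_date", "phone"}


def strip_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Fictitious sample data, for illustration only.
patients = [
    {
        "name": "Jane Doe",
        "address": "12 Elm St",
        "birth_date": "1980-04-02",
        "phone": "555-0100",
        "diagnosis": "asthma",
        "visits": 3,
    },
]

released = [strip_identifiers(patient) for patient in patients]
print(released)
```

Removing these fields alone does not make the data anonymous if the remaining fields can still single someone out, so this is only the first step.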
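Anonymized data does carry some risk. With two or more data sets, there is always a possibility that information can be de-anonymized by matching similar records across them. Generalization and perturbation are two popular anonymization methods: generalization replaces precise values with broader categories, such as an age range in place of an exact age, while perturbation slightly alters values, for example by adding random noise. The process of creating two data sets in which the data is obscured on its own but, when the sets are later combined, can identify a single individual is called pseudonymization.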
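To make generalization and perturbation concrete, here is a small Python sketch under stated assumptions. The helper names, the bucket width, the number of postal-code digits kept, and the noise range are all illustrative choices, not values prescribed by any regulation, and the record is fictitious.

```python
import random


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Generalization: keep the leading digits of a postal code, mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def perturb(value, spread, rng):
    """Perturbation: add uniform random noise in [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)


rng = random.Random(0)  # seeded only so the sketch is repeatable
record = {"age": 37, "zip": "90210", "weight_kg": 82.0}  # fictitious record

safe = {
    "age": generalize_age(record["age"]),
    "zip": generalize_zip(record["zip"]),
    "weight_kg": round(perturb(record["weight_kg"], 2.0, rng), 1),
}
print(safe)
```

Wider buckets and larger noise give more privacy but less useful data, which is the trade-off a privacy professional would need to weigh for each field.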
How does Deidentification Work?
Deidentification of data sets is the process by which personal information is removed from the data, but the remaining data is left intact. Since the residual data is left unchanged, it is an easy target for re-identification. Deidentification is a weaker form of anonymization and, unless the information is kept in-house and not distributed, it is not a guarantee of privacy for personal data.
Anonymization is a complicated process and needs someone familiar with data and encryption to handle it properly. As you can see, merging two data sets that have only been de-identified could reveal to whom the information belongs.
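The merging risk can be sketched in a few lines of Python. In this illustration, a de-identified data set with names removed is joined to a second data set that still has names, using the fields both share. All records, field names, and the choice of `zip`, `birth_date`, and `sex` as the shared fields are fictitious assumptions made for the example.

```python
# De-identified records: names removed, remaining fields left intact.
health = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1972-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# A second, separately released data set that still carries names.
public = [
    {"name": "A. Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "B. Sample", "zip": "02140", "birth_date": "1990-03-09", "sex": "M"},
]

SHARED_FIELDS = ("zip", "birth_date", "sex")


def link(deidentified, identified, keys=SHARED_FIELDS):
    """Return (name, diagnosis) pairs where the shared fields match exactly one person."""
    index = {}
    for row in identified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(health, public)
print(reidentified)
```

Neither data set reveals a diagnosis for a named person on its own; the exposure only appears once the two are combined.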
Redaction Basics
Redaction is the process of removing sensitive data entirely from a data set or document so that it can be shared, distributed, or posted. The remaining data should, by design, not reveal the identity or personal information of anyone in the data set.
Often when one looks at a redacted document, parts are blocked out with black boxes covering the data that should be obscured. It has been a costly mistake for companies to cover information with black boxes, thinking this was the same as redaction, only to find out later that the underlying data or metadata still contained the information and that it had been exposed publicly. This type of mistake can cost valuable business and consumer trust.
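The difference between covering data and removing it can be illustrated with a simplified Python sketch. It assumes a document modeled as a plain dictionary with a `body` and a `metadata` field, and it uses two illustrative regular expressions for US-style social security and phone numbers; the document contents are fictitious, and real file formats such as PDF or video need format-specific tooling.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
PATTERNS = (SSN, PHONE)


def redact(text, patterns=PATTERNS, marker="[REDACTED]"):
    """Replace every match with a marker so the original value is gone."""
    for pattern in patterns:
        text = pattern.sub(marker, text)
    return text


def flatten(doc):
    """Join the body and all metadata values into one string for checking."""
    return doc["body"] + " " + " ".join(doc["metadata"].values())


def leaked(original, released, patterns=PATTERNS):
    """List sensitive values from the original that survive in the release."""
    found = {m for p in patterns for m in p.findall(flatten(original))}
    return sorted(v for v in found if v in flatten(released))


# Fictitious document; the metadata repeats a value from the body.
doc = {
    "body": "SSN 123-45-6789, call 555-867-5309.",
    "metadata": {"comment": "SSN 123-45-6789 verified"},
}

# Mistake: only the visible body is cleaned, metadata is left untouched.
body_only = {"body": redact(doc["body"]), "metadata": doc["metadata"]}

# Full redaction: body and every metadata value are cleaned.
full = {
    "body": redact(doc["body"]),
    "metadata": {k: redact(v) for k, v in doc["metadata"].items()},
}

print(leaked(doc, body_only))  # the SSN survives in the metadata
print(leaked(doc, full))       # nothing survives
```

A check like `leaked` is a useful last step before release, since it tests the output itself rather than trusting that the redaction step ran everywhere.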
What does all this mean? The term redact is defined as “to edit or prepare for publishing.” The idea itself may seem simple but can be incredibly difficult to carry out. How the redaction is done, and by whom, before the company distributes the data can also have a security impact.
Using a redaction application like the one offered by CaseGuard in house and handling the redaction within your own business reduces such risk. A professional provider of redaction services that understands and follows redaction to meet legislative requirements is also an option. CaseGuard also offers redaction services that exceed industry standards. CaseGuard can discreetly handle your data professionally, perform redaction services, and return the redacted files.
Redaction does take skill. It is not just a matter of hitting the delete button. To redact a document, video, or audio data set properly, the person or company doing the redaction should have a full understanding of privacy law and its impact on the work being done. Every data set is unique, and knowing which fields are significant for the redaction process takes an understanding of not only the law but also how the remaining data could be recombined to reconstruct what was removed.
If redaction is not handled properly and specific fields or image frames are missed, the consequences can be severe. Law enforcement agencies, for example, redact body cam video before its release. If a person being arrested is later found not guilty and frames within the video were missed, someone can determine the person’s identity. This information can then be mishandled or even used in a discriminatory manner, such as costing the person their employment or other benefits.
Which Method is Best?
Setting de-identification aside, the two better methods are redaction and anonymization. It is best to have a privacy professional work with your company to determine which is the better fit and what data needs to be removed. A privacy professional can help a business understand its data: where it comes in, how it is used, who handles it, and, in the end, how it is destroyed. Setting up a company process through your privacy professional is likely the best option for all.
With the various levels of privacy required by the many different laws, it can become a struggle to satisfy all of them. The best advice is to have your privacy professional identify the strictest requirement that applies to each type of data covered by legislation. If your company is always doing more than required, there is less opportunity for data breaches, loss of data, and, in the end, loss of consumer trust.