Article 17 of the General Data Protection Regulation (GDPR) stipulates the Right to Erasure, often referred to as the Right to be Forgotten. While the regulation specifies some requirements as to what controllers must do with data requested to be “erased”, it does not expressly define what the term erasure means. The California Consumer Privacy Act (CCPA) has a similar provision.
This article describes automatic data discovery and erasure methods in the IRI static data masking products that find and delete personally identifiable information (PII) in different data sources. All IRI ‘shield’ products are front-ended in the same Eclipse IDE called IRI Workbench.
In fact, it is through Workbench that data collectors and processors subject to GDPR can:
- define (classify) PII specific to an individual, group, or any PII attributes in general
- search and locate that PII in disparate sources to validate and fix it, helping comply with Article 16 (the Right to Rectification)
- extract and provide (deliver) that data in various formats, helping comply with Article 20 (Data Portability)
- delete or remove PII ad hoc from individual files or tables, or en masse across multiple sources through a multi-source wizard applying the erasure function to a data class; or,
- otherwise mask or de-identify PII with other obfuscation and anonymization functions like encryption, pseudonymization, character/blackout redaction, hashing, blurring, etc.
There are three different product avenues for complying with the regulation in IRI Workbench:
- databases and flat files through IRI FieldShield
- unstructured text, document, and image files through IRI DarkShield
- Excel spreadsheets through IRI CellShield EE [1]
PII Erasure in Structured Data Sources
Two of the many data masking methods in FieldShield can be used to comply with an erasure request for PII stored in relational databases and sequential (flat) files. Some semi-structured instances of MongoDB, JSON, and mainframe files may require DarkShield instead.
The first option replaces the personally identifying information (PII) with an empty string. The second option deletes (removes) the entire record.
To replace a PII value, create an assignment rule, which can then be applied to any field the controller deems to be PII. In the New Field wizard, select the Assignment Function and enter "" (an empty string) in the expression field. When this rule is used in a SortCL script, the field is assigned as <fieldName>="".
Filter for the requester’s information using an Include statement where the search item is a unique identifier.
This example script replaces six fields with an empty string where the IDNUMBER equals 12345678.
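The logic of that job can also be sketched outside of SortCL. The following is a minimal Python illustration, not IRI syntax: the function name `blank_pii_fields`, the sample columns, and the sample values are all assumptions made for the example. It shows the two steps described above, selecting the rows whose unique identifier matches and assigning an empty string to each PII field.

```python
import csv
import io


def blank_pii_fields(rows, id_field, id_value, pii_fields):
    """Return copies of rows with pii_fields emptied where id_field == id_value."""
    result = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if row.get(id_field) == id_value:
            for name in pii_fields:
                if name in row:
                    row[name] = ""  # the empty-string assignment
        result.append(row)
    return result


# Hypothetical sample data; real sources would be files or tables.
SAMPLE = (
    "IDNUMBER,NAME,EMAIL,CITY\n"
    "12345678,Ann Lee,ann@example.com,Leeds\n"
    "87654321,Bo Diaz,bo@example.com,Lima\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
erased = blank_pii_fields(rows, "IDNUMBER", "12345678", ["NAME", "EMAIL", "CITY"])
```

The identifier column is left intact here so the row can still be audited; whether the identifier itself counts as PII is a decision for the controller.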
Once scripts have been created for all of the sources containing PII, the job can be reused by simply changing the identifier being searched for. For example, the IDNUMBER was changed below:
The procedure for deleting a record will depend on the source type of the data. To delete a record in a database, use the Delete statement with a unique identifier field. Filter for the requester’s information using an Include statement as above where the search item is a unique identifier.
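For readers less familiar with how such a database deletion behaves, here is a minimal sketch using Python's built-in `sqlite3` module rather than FieldShield. The table name `customers`, the column names, and the helper `delete_subject` are hypothetical. The identifier value is passed as a bound parameter so that it is never interpolated into the SQL text.

```python
import sqlite3


def delete_subject(conn, table, id_column, id_value):
    """Delete all rows for one data subject and return how many were removed."""
    # Table and column names cannot be bound as parameters, so validate them.
    if not (table.isidentifier() and id_column.isidentifier()):
        raise ValueError("unexpected table or column name")
    cur = conn.execute(f"DELETE FROM {table} WHERE {id_column} = ?", (id_value,))
    conn.commit()
    return cur.rowcount


# Hypothetical in-memory table standing in for a production source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (IDNUMBER TEXT PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("12345678", "Ann Lee"), ("87654321", "Bo Diaz")],
)
deleted = delete_subject(conn, "customers", "IDNUMBER", "12345678")
remaining = conn.execute("SELECT IDNUMBER FROM customers").fetchall()
```

Returning the affected row count gives a simple record that the erasure request was acted on for that source.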
To delete from a flat file, use an Omit statement instead, and overwrite the file with the remaining source records; this removes the requester’s information from that source.
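The omit-and-overwrite approach can be illustrated with a short Python sketch. This is not SortCL; the helper name `omit_records` and the CSV layout are assumptions. The file is rewritten through a temporary file in the same directory, so the original is only replaced once the filtered copy is complete.

```python
import csv
import os
import tempfile


def omit_records(path, id_field, id_value):
    """Rewrite a CSV file without rows where id_field == id_value.

    Returns the number of rows omitted.
    """
    omitted = 0
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as tmp:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(tmp, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[id_field] == id_value:
                omitted += 1  # skip (omit) the requester's record
                continue
            writer.writerow(row)
    os.replace(tmp.name, path)  # overwrite the source with the filtered copy
    return omitted


# Hypothetical sample file created in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "customers.csv")
    with open(path, "w", newline="") as f:
        f.write("IDNUMBER,NAME\n12345678,Ann Lee\n87654321,Bo Diaz\n")
    omitted = omit_records(path, "IDNUMBER", "12345678")
    with open(path, newline="") as f:
        remaining = f.read()
```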
PII Erasure in Unstructured Files
DarkShield provides a custom Deletion Function which can replace PII with an empty string, much like FieldShield’s assignment function. It’s simply a matter of creating the Deletion Function as a data rule from within the Dark Data Discovery Wizard while defining the search criteria, or from the separate New Data Rule Wizard.
This function will then operate automatically on all instances of the specified PII for the requester wishing to be forgotten, as defined for him/her/them in the data class or data class group.
To find and erase PII belonging to the requester, it is necessary to create a Data Class composed of a set file lookup matcher containing the PII of the requester:
The Data Class can then be associated with the Deletion Function through a Search Matcher:
The search methods that can be associated with a given data class include:
- RegEx pattern matching
- String matches to values in a lookup (set) file
- Named-entity recognition (NER) models
- Bounding boxes drawn around fixed image areas
- Facial detection and recognition
However, in the context of right-to-erasure requests, NER would not be applicable, and the bounding box method would only work if that person’s information is in a fixed position within an image.
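The first two search methods, lookup values and RegEx patterns, combined with a deletion function, can be sketched in plain Python as follows. This is only an illustration of the idea and not the DarkShield API: `build_matcher`, `erase_matches`, and the sample values are hypothetical names and data.

```python
import re


def build_matcher(lookup_values, patterns=()):
    """Compile one regex matching any literal lookup value or extra pattern."""
    # Longer literals first, so a longer value wins over a shorter one
    # that happens to be its prefix.
    literals = sorted(
        (re.escape(v) for v in lookup_values if v), key=len, reverse=True
    )
    alternatives = literals + list(patterns)
    if not alternatives:
        raise ValueError("no lookup values or patterns supplied")
    return re.compile("|".join(f"(?:{a})" for a in alternatives), re.IGNORECASE)


def erase_matches(text, matcher):
    """Return the text with matches removed, plus (start, end, value) findings."""
    found = [(m.start(), m.end(), m.group(0)) for m in matcher.finditer(text)]
    return matcher.sub("", text), found


# Hypothetical set-file contents for one requester, plus one regex pattern.
subject_values = ["Ann Lee", "ann.lee@example.com"]
matcher = build_matcher(subject_values, patterns=[r"\b123-45-6789\b"])
text = "Contact Ann Lee (ann.lee@example.com), SSN 123-45-6789."
erased, found = erase_matches(text, matcher)
```

Keeping the list of findings alongside the erased text mirrors the search-then-remediate split: the findings can be reviewed before, or logged after, the deletion is applied.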
The search results can then be obtained by performing the Search and Remediate Job with the generated .search configuration file:
Note that for PDFs the whitespace for where the original PII was located will be retained. For images, a black box redaction will be applied instead.
PII Erasure in Excel Sheets
The CellShield Enterprise Edition (EE) data masking tool for Excel spreadsheets uses the same Dark Data Discovery wizard in IRI Workbench to classify and find specific PII items, along with the first three search methods listed for DarkShield above. At the end of the search process, a report on all the PII found in all the .xls and .xlsx files is also created in an Excel Interchange Format (.eif) file.
The EIF file is an Excel-compatible sheet listing the folder, file, sheet, and cell locations of every item of PII discovered LAN-wide. This helps you comply with GDPR Articles 16 and 20, and stages compliance with Article 17 via ad hoc or bulk erasure through a redaction function using space characters. Either can be invoked in a selector dialog tied to the same report sheet above:
That dialog can also launch — or you can launch separately — the Intra-Cell Search (and masking) dialog, to specify deletion through redaction. For example:
The intra-cell feature identifies the locations of the PII anywhere in a sheet and can erase it using a space-character to mask each string automatically:
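To make the intra-cell idea concrete, the sketch below treats a sheet as a simple list of rows in Python. It is not the CellShield interface; `mask_in_cells`, `cell_ref`, and the sample sheet are assumptions for illustration. Each occurrence of the search string is overwritten with the same number of space characters, and the A1-style locations of the affected cells are returned.

```python
def cell_ref(row_index, col_index):
    """Convert zero-based row/column indexes to an A1-style reference."""
    letters = ""
    n = col_index + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row_index + 1}"


def mask_in_cells(sheet, needle):
    """Overwrite needle with spaces wherever it appears inside a cell.

    sheet is a list of rows (lists of cell values) and is modified in place.
    Returns the A1-style references of the cells that were changed.
    """
    if not needle:
        raise ValueError("search string must not be empty")
    mask = " " * len(needle)
    hits = []
    for r, row in enumerate(sheet):
        for c, value in enumerate(row):
            if isinstance(value, str) and needle in value:
                row[c] = value.replace(needle, mask)
                hits.append(cell_ref(r, c))
    return hits


# Hypothetical sheet contents.
sheet = [
    ["ID", "Notes"],
    ["1", "Call Ann Lee today"],
    ["2", "Ann Lee"],
    ["3", "n/a"],
]
hits = mask_in_cells(sheet, "Ann Lee")
```

Replacing with spaces rather than an empty string preserves the original cell length, which matches the space-character redaction described above.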
Alternatively, now that all the rows with that person are found, you could choose to erase everything else in, or any other personally identifying parts of, those same rows as needed.
If you have questions about how you can use IRI data masking software to comply with the Right to Erasure, or other provisions of the GDPR, CCPA and similar data privacy laws, please email firstname.lastname@example.org with details about your use case, and arrange a live demo with us here.
[1] All three of these shield products are available in a single IRI Voracity data management platform subscription as well as individually.