IRI has introduced another data discovery feature for personally identifiable information (PII) held in enterprise data sources. Beyond pattern- and fuzzy-match searches already supported, the new feature finds values held in a lookup or ‘set’ file (e.g., a list of names). The feature is now supported in the IRI FieldShield data masking product for databases and files, the IRI CellShield Enterprise Edition (EE) data masking product for Microsoft Excel spreadsheets, and the IRI Voracity platform for data lifecycle management.
Specifically, the new search capability is built into the wizards for database profiling, flat-file profiling, and dark data discovery in the IRI Workbench GUI (built on Eclipse), which supports FieldShield and all other IRI software. The same string-search feature was also added to CellShield EE for masking data in Excel spreadsheets.
Value Searches in DBs & Files
To use the new lookup feature in the database or flat-file profiling wizards, find the Column (DB profiler) or Field (file profiler wizard) Selection page, and select the check box for Expression Search. On the next page, select Values File from the Search Type list. Then browse to find and select the set file with the values to be searched. Complete the rest of the fields on the page, and click Finish.
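Conceptually, a values-file search is a set-membership test on each field value. The following is only a minimal Python sketch of that idea, not IRI's implementation. The function names, the sample data, and the one-value-per-line set file layout are all assumptions made for illustration.

```python
import csv
import io


def load_set_file(text):
    """Return the non-empty values from a set file (assumed one value per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_matches(rows, field, values):
    """Return (row_number, value) pairs whose field value appears in the set."""
    return [
        (row_number, row[field])
        for row_number, row in enumerate(rows, start=1)
        if row[field] in values
    ]


# Hypothetical set file contents and flat-file data, inlined so the sketch runs as-is.
set_text = "Alice\nBob\n"
flat_file = io.StringIO("id,first_name\n1,Alice\n2,Carol\n3,Bob\n")

rows = list(csv.DictReader(flat_file))
hits = find_matches(rows, "first_name", load_set_file(set_text))
print(hits)  # [(1, 'Alice'), (3, 'Bob')]
```

Loading the values into a Python set keeps each lookup at roughly constant time, which matters when the set file holds many thousands of names.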
Value Searches in Documents
The Dark Data Discovery wizard can also search Microsoft Office, PDF, and other unstructured documents on a local computer or LAN for values listed in a tab-delimited ASCII file. The wizard extracts any values it finds and buckets them in a delimited flat file, or in an Excel Interchange File (EIF) for use in CellShield EE.
Value Searches in Excel
Alternatively, CellShield EE’s new Set File-based Remediation feature can find any values in an Excel 2010 or 2013 spreadsheet that also exist in a set file, so you can mask those values via encryption, redaction, or pseudonymization. You upload the set file, choose the preferred protection function, and click “Remediate.” A popup lets you know when the operation is done.
Pseudonymization, for example, is a good way to de-identify names or other proper nouns while preserving realism in the target. Pseudonyms can be reversible or not. To pseudonymize names in a worksheet:
- In your worksheet, click the Import Set File icon in the CellShield ribbon to open the Set File Search utility.
- Browse to the set file with the names you are looking for, and load it. You will get an error if there’s a problem with the file; it must be a list of ASCII values delimited by a space or carriage return. Click OK to continue.
- For the Remediation Type here, we’re choosing Pseudonymization, though we could also choose redaction (full/partial cell), or encryption (AES 128, FPE AES 256, etc.), instead.
- Click on Find Matches, and the Menu gives a count of the matches found, highlighted in red. Click OK to continue.
- Tick Recoverable to save a restore set, or Non-Recoverable to prevent data restoration.
- For Recoverable, a Restore file is automatically created in the “CX-Pseudo_Restore” folder on the local drive.
- The original Set File is scrambled to randomly create pseudonym (substitute) values, which will also get saved into a recovery file for optional restoration later.
- Click Remediate to pseudonymize the names in the sheet. You can see the scrambled names are now in place.
- You can restore the original names by using the recovery file. Click the restore tab in the set file module. Navigate to that file in the CX-Pseudo_Restore folder, and click the Restore button.
This feature is also offered in Bulk Remediate mode. Using your .eif file, you can protect all the set file items (lookup values) found in all the discovered sheets at once, using the set file remediation function you choose.
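The recoverable pseudonymization steps above can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not CellShield's actual mechanism. The function names are hypothetical, pseudonyms are drawn by shuffling the set file values, and the in-memory restore record stands in for the recovery file.

```python
import random


def build_pseudonym_map(set_values, seed=None):
    """Map each set file value to a value from a shuffled copy of the same list."""
    rng = random.Random(seed)
    shuffled = list(set_values)
    rng.shuffle(shuffled)
    return dict(zip(set_values, shuffled))


def remediate(cells, mapping):
    """Replace matching cells; return the masked cells plus a restore record."""
    masked = []
    restore = {}  # cell index -> original value (stand-in for a recovery file)
    for index, cell in enumerate(cells):
        if cell in mapping:
            restore[index] = cell
            masked.append(mapping[cell])
        else:
            masked.append(cell)
    return masked, restore


def restore_cells(masked, restore):
    """Put the original values back using the restore record."""
    return [restore.get(index, cell) for index, cell in enumerate(masked)]


# Hypothetical set file values and worksheet column.
names = ["Alice", "Bob", "Carol"]
cells = ["Alice", "Total", "Bob", "Carol", "Alice"]

mapping = build_pseudonym_map(names, seed=7)
masked, restore = remediate(cells, mapping)
recovered = restore_cells(masked, restore)
```

A plain shuffle can leave a value mapped to itself, so a real tool would need to handle that case. Discarding the restore record corresponds to the Non-Recoverable option, since the originals cannot then be rebuilt from the masked cells alone.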