Data Pseudonymization

 


Challenges


While masking data, or producing useful test data, you need output values that look real, but do not reveal personally identifiable information (PII). This is particularly true with the names of people, places, and things.

Encryption, scrambling, redaction, hashing, and many other data obfuscation functions protect data at risk, but do not provide the level of realism certain recipients require. You need an easier way to change the individualizing characteristics of data by substituting a different, but realistic, output value. This is also referred to as data shuffling.

You must also ensure that the real name cannot be readily discovered through reversal or guesswork. And if you want to provide replacement names, or pseudonyms, for people in production or test data environments, the replacements need to remain consistent for referential integrity, and the values need to stay updated as original names come and go.

Solutions


If you work with PII in tables or flat files, use IRI FieldShield -- or the SortCL program in the IRI CoSort product or IRI Voracity platform -- to replace that data with safe, but realistic replacement output stored in DB tables or external data sets called set files. If you need to do the same with ranges in Excel, use IRI CellShield, or IRI DarkShield. They support:

Recoverable Pseudonymization
Specify a lookup set where real and fake names are either pre-associated, or automatically associated at random. Use the restore set to recover the original names.
Unrecoverable Pseudonymization
Randomly select substitute names for the original value from a set file containing real or fake names. This way the original name value has no automatic basis for restoration.
Consistent, Self-Updating Pseudonymization
Choose a hash set rule or palette item in IRI Workbench to keep pseudonyms consistent and up to date while preserving uniqueness and referential integrity.
Deterministic Pseudonymization
Replace an original value with a substitute value that need not be unique, and fabricate associated PII, using this rule in IRI FieldShield or DarkShield. The function creates a unique composite key which allows for natural determinism.
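
As a rough illustration of the recoverable and unrecoverable options above, the following plain Python sketch shows the difference in principle. It is not IRI's implementation and not SortCL syntax. The sample names, the helper functions, and the in-memory dictionaries are all assumptions made for the example; in the IRI products the equivalent data lives in set files or DB tables.

```python
import random

# Hypothetical sample data; real jobs would read these from set files.
REAL_NAMES = ["Alice Jones", "Bob Smith", "Carol White"]
FAKE_NAMES = ["Dana Reed", "Evan Cole", "Fay Hart", "Gus Lane"]


def build_lookup(originals, substitutes, seed=None):
    """Associate each distinct original with a distinct substitute at random."""
    distinct = sorted(set(originals))
    if len(distinct) > len(substitutes):
        raise ValueError("not enough substitute names")
    rng = random.Random(seed)
    chosen = rng.sample(substitutes, len(distinct))
    return dict(zip(distinct, chosen))


def recoverable(name, lookup):
    """Replace a name using the stored lookup (the mapping is kept)."""
    return lookup[name]


def restore(pseudonym, lookup):
    """Invert the lookup to act as a restore set."""
    restore_set = {fake: real for real, fake in lookup.items()}
    return restore_set[pseudonym]


def unrecoverable(name, substitutes, rng=random):
    """Pick any substitute at random; no mapping is stored, so nothing can be restored."""
    return rng.choice(substitutes)


lookup = build_lookup(REAL_NAMES, FAKE_NAMES, seed=7)
masked = [recoverable(n, lookup) for n in REAL_NAMES]
restored = [restore(m, lookup) for m in masked]
```

In this sketch the recoverable path round-trips because the lookup is one-to-one, while the unrecoverable path may give the same original a different substitute on every call.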

Specify the pseudonym method used in your output fields in simple 4GL job scripts, or use the pseudonymization dialog in the masking rules for FieldShield and DarkShield in IRI Workbench (the same Eclipse™ IDE), or in CellShield, which also supports pseudonymous lookup replacements of values in Excel.

Pseudonymization is only one method you can use to shuffle the contents and thereby de-identify information in a record. You can also combine pseudonyms with other field-level data security functions.
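
To make the idea of combining functions concrete, here is a minimal Python sketch that applies a different protection to each field of one record: a pseudonym for the name, partial redaction for an ID number, and a salted hash for an email address. The field names, the lookup table, the salt, and the helper functions are hypothetical and only illustrate the concept; they are not FieldShield rules.

```python
import hashlib

# Hypothetical lookup set pairing real names with pseudonyms.
PSEUDONYMS = {"Alice Jones": "Dana Reed", "Bob Smith": "Evan Cole"}


def redact_ssn(ssn):
    """Mask every digit except the last four, keeping separators in place."""
    total = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen > total - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of the value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_record(record, salt="demo-salt"):
    """Apply one protection per field: pseudonym, redaction, hash."""
    return {
        "name": PSEUDONYMS[record["name"]],
        "ssn": redact_ssn(record["ssn"]),
        "email": hash_value(record["email"], salt),
    }


masked = mask_record(
    {"name": "Alice Jones", "ssn": "123-45-6789", "email": "alice@example.com"}
)
```

The name stays readable and realistic, while the other two fields are protected by functions better suited to their format.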


Need Test Names?

In addition to pseudonymizing and otherwise masking production data, there is a standalone solution for producing safe, but realistic first and last names of either gender (or other nouns). IRI RowGen uses the same metadata as FieldShield (via CoSort SortCL) to create and format pseudonyms for use as test data values (or in formatted test data targets).

RowGen is especially helpful for providing anonymous, but real-looking, test data when production data is unavailable or insufficient. RowGen builds structurally and referentially correct test data into database, file, and report targets. RowGen is also included in Voracity.
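
The general idea of building test names from sets, without touching production data, can be sketched in a few lines of Python. This is only an illustration of random selection from name lists, not how RowGen works internally; the lists and the `generate_names` function are assumptions for the example.

```python
import random

# Hypothetical name sets; a real job would draw from much larger set files.
FIRST_NAMES = ["Dana", "Evan", "Fay", "Gus", "Ida"]
LAST_NAMES = ["Reed", "Cole", "Hart", "Lane", "Moss"]


def generate_names(count, seed=None):
    """Combine randomly chosen first and last names into full test names."""
    rng = random.Random(seed)
    return [
        f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        for _ in range(count)
    ]


test_names = generate_names(5, seed=42)
```

Passing a seed makes the output repeatable, which can help when test runs need to be compared.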

Frequently Asked Questions (FAQs)

1. What is data pseudonymization?
Data pseudonymization is a data masking technique that replaces personally identifiable information (PII) such as names or places with realistic substitutes that retain data utility but prevent identification of the original subject.
2. How does pseudonymization differ from encryption or redaction?
Unlike encryption or redaction, pseudonymization replaces the original value with a consistent, human-readable substitute that looks real. It protects identity without rendering the data unreadable or unusable.
3. What are the main types of pseudonymization supported by IRI?
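IRI supports four types of pseudonymization:
Recoverable, using lookup sets to restore original values if needed
Unrecoverable, where values are replaced without a way to reverse them
Self-updating/consistent, which maintains referential integrity across datasets over time
Deterministic, which creates a unique composite key so that substitute values and fabricated associated PII follow from the original value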
4. How can pseudonymization help with data privacy compliance?
Pseudonymization reduces the risk of re-identification, which helps organizations work toward compliance with GDPR, HIPAA, and other privacy regulations by protecting identifiers while preserving data usefulness for testing or analytics.
5. Can I apply pseudonymization to Excel or unstructured data?
Yes. IRI CellShield supports pseudonymization in Excel ranges, while IRI DarkShield supports pseudonymization in unstructured sources like PDFs, Word documents, and other free-form text files.
6. How do I ensure referential integrity when pseudonymizing data?
Use IRI's consistent pseudonymization methods that map original values to stable replacements using hash sets or palette rules. This ensures that the same input always yields the same pseudonym across records.
7. What is a set file in IRI pseudonymization?
A set file in IRI is a list of predefined values, such as names or places, used to substitute real data during pseudonymization. You can use static sets or allow random associations depending on the method chosen.
8. Can I generate realistic pseudonyms for testing without using production data?
Yes. IRI RowGen can generate synthetic but realistic first and last names (or other nouns) for testing purposes using the same metadata as FieldShield. This eliminates dependency on live data.
9. What industries benefit most from pseudonymization?
Industries that handle sensitive data, such as healthcare, finance, education, and government, can benefit from pseudonymization, especially when sharing data for development, QA, or analytics.
10. Can I combine pseudonymization with other masking methods?
Yes. Pseudonymization can be layered with encryption, redaction, or other field-level data protection techniques within the same job script or GUI wizard, enhancing security while preserving utility.
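
The consistency described in question 6 can be illustrated with a short Python sketch in which a keyed hash of the original value selects the substitute from a name set, so the same input yields the same pseudonym in every table. This is a generic technique shown under assumptions, not IRI's hash set implementation; the name set, the key, and the `consistent_pseudonym` function are hypothetical.

```python
import hashlib
import hmac

# Hypothetical substitute set; a real set file would be far larger.
NAME_SET = ["Dana Reed", "Evan Cole", "Fay Hart", "Gus Lane", "Ida Moss"]


def consistent_pseudonym(original, names=NAME_SET, key=b"demo-key"):
    """Use a keyed hash of the original value to pick a stable substitute."""
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(names)
    return names[index]


customers = [{"id": 1, "name": "Alice Jones"}, {"id": 2, "name": "Bob Smith"}]
orders = [
    {"order": 100, "customer": "Alice Jones"},
    {"order": 101, "customer": "Bob Smith"},
    {"order": 102, "customer": "Alice Jones"},
]

# The same original name maps to the same pseudonym in both tables.
masked_customers = [{**c, "name": consistent_pseudonym(c["name"])} for c in customers]
masked_orders = [{**o, "customer": consistent_pseudonym(o["customer"])} for o in orders]
```

Note that this simple sketch does not guarantee uniqueness: two different originals can land on the same substitute, especially with a small name set. A scheme that must preserve uniqueness needs additional handling.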