
Fabricating PII
What is the Identity Fabrication Rule?
The identity fabrication rule is a deterministic but non-reversible data masking rule that can help you create realistic test PII for a whole record from just one original value. The new function, configurable from IRI Workbench, creates meaningful and unique new values by pseudonymizing an initial value (e.g., a name), and synthesizing other values alongside it.
The mock data is derived from one value in your input data plus a seed value. The rule can be applied in either IRI FieldShield or IRI DarkShield data masking jobs.
For example, let’s say you want to mask the name Tom in every database and file where “Tom” appears, replacing it not with some random value but with another realistic-looking name. This is where the Identity Fabrication rule shines.
By passing a name, seed value, and expected return type, you can synthesize a new mock identity for “Tom”. That is, beyond just the name replacement for “Tom”, the rule creates additional, common PII for “Tom” as well, which can be used to mask other parts of Tom’s Personally Identifiable Information (PII).
With just that one name, the fabrication function will create fake, but realistic PII associated with “Tom”. So it will not only replace every instance of Tom with Carl everywhere, but also create a fake Last Name, Phone Number, SSN, and more, just using “Tom” as the input source.
Note that you do not need to use any or all of the new mock information. It will be available if you want to use it, but it is not required until you, as a user, ask for it.
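To make the idea concrete, here is a minimal Python sketch of deterministic, non-reversible fabrication. It is not IRI's implementation: the function name `fabricate`, the tiny lookup lists, the "ssn" type label, and the SHA-256 hashing scheme are all assumptions chosen for illustration. The "fname" and "lname" labels mirror the return types shown later in this article. With lists this small, different inputs can collide on the same output, so the sketch shows determinism and one-way derivation rather than uniqueness.

```python
import hashlib

# Hypothetical lookup lists; a real job would draw on far larger sets.
FIRST_NAMES = ["Carl", "Brian", "Serena", "Mary", "Chris"]
LAST_NAMES = ["Lane", "Points", "Moore", "Hale", "Reyes"]


def fabricate(return_type: str, value: str, seed: str) -> str:
    """Derive a fake value of the requested type from one source value and a seed."""
    # Hash seed, type, and source value together. The hash is one-way, so the
    # original value cannot be recovered from the fabricated output.
    material = f"{seed}|{return_type}|{value}".encode("utf-8")
    n = int.from_bytes(hashlib.sha256(material).digest(), "big")

    if return_type == "fname":
        return FIRST_NAMES[n % len(FIRST_NAMES)]
    if return_type == "lname":
        return LAST_NAMES[n % len(LAST_NAMES)]
    if return_type == "ssn":
        # Format only (xxx-xx-xxxx); no attempt to follow real SSN issuance rules.
        digits = f"{n % 10**9:09d}"
        return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
    raise ValueError(f"unsupported return type: {return_type}")


# One source value ("Tom") yields a whole mock identity, one type at a time.
identity = {t: fabricate(t, "Tom", "seedValue") for t in ("fname", "lname", "ssn")}
print(identity)
```

Because each return type is derived independently from the same input and seed, a caller can request only the pieces it needs, which matches the point above that the extra mock values are optional.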
How is the Identity Fabrication Rule helpful?
Using the example and information below, we can use this single rule to cover most of our important PII. Let’s take this fake RDB table as an example:
ID Num (PK) | First Name | Last Name | SSN | State | City |
1 | Tom | Baker | 111-11-1111 | Texas | New York |
2 | Mike | | | | |
3 | | | | | |
The table above shows a few cases where this rule is useful. Any of these columns can serve as the deterministic lookup value; for example, the values in First Name could be used to mask or generate the rest of the data.
A better choice in this situation is the ID Num column, which is populated for every user even when the other columns are not. For every record, I will use the same rule and simply ask for different return values.
Since I do not want to mask column 1, and it is a Primary Key (PK), which guarantees a unique value for every record, I will use this column as my input data.
Inside the job specifications – either FieldShield SortCL /FIELD statements or DarkShield configuration (.dsc) files – the rules would appear as follows:
For column 2 (First Name):
rule_name("fname", ${fieldname}, "seedValue"), where ${fieldname} == 1 (for this specific column and record).
For column 3 (Last Name):
rule_name("lname", ${fieldname}, "seedValue"), where ${fieldname} == 1 (for this specific column and record).
And so forth for the remaining columns of record 1. For records 2 and 3, the function does the same thing, except that ${fieldname} is passed as 2 and 3, respectively.
Below would be the output in a database table, for example:
ID Num (PK) | First Name | Last Name | SSN | State | City |
1 | Brian | Lane | 829-01-8932 | Illinois | Houston |
2 | Serena | Points | 374-57-8257 | Pennsylvania | San Diego |
3 | Mary | Moore | 192-83-7420 | Georgia | San Jose |
As you can see, even when source values were not present, additional test data gets created, based simply on the PK provided in the database.
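The row-by-row behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the FieldShield or DarkShield engine: `fabricate`, `mask_row`, the small value pools, and the "state" and "city" type labels are hypothetical, and the fabricated values will not match the sample output table. The key point is that only the primary key and the seed feed the function, so empty source columns still receive test data.

```python
import hashlib

# Hypothetical pools keyed by return type; real lookup sets would be much larger.
POOLS = {
    "fname": ["Brian", "Serena", "Mary", "Carl"],
    "lname": ["Lane", "Points", "Moore", "Hale"],
    "state": ["Illinois", "Pennsylvania", "Georgia", "Ohio"],
    "city": ["Houston", "San Diego", "San Jose", "Denver"],
}
COLUMNS = ("fname", "lname", "state", "city")


def fabricate(return_type: str, value: str, seed: str) -> str:
    """Pick a pool entry deterministically from (seed, return type, source value)."""
    material = f"{seed}|{return_type}|{value}".encode("utf-8")
    n = int.from_bytes(hashlib.sha256(material).digest(), "big")
    pool = POOLS[return_type]
    return pool[n % len(pool)]


def mask_row(row: dict, seed: str) -> dict:
    """Keep the primary key; fabricate every other column from the key alone."""
    key = str(row["id"])
    return {"id": row["id"], **{col: fabricate(col, key, seed) for col in COLUMNS}}


# Source rows mirroring the example table: records 2 and 3 are mostly empty.
rows = [
    {"id": 1, "fname": "Tom", "lname": "Baker", "state": "Texas", "city": "New York"},
    {"id": 2, "fname": "Mike", "lname": None, "state": None, "city": None},
    {"id": 3, "fname": None, "lname": None, "state": None, "city": None},
]

masked = [mask_row(r, "seedValue") for r in rows]
for r in masked:
    print(r)
```

Running the same job again with the same seed reproduces the same masked rows, which is what keeps related tables consistent with one another.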
The same concepts apply to any source, including files, NoSQL DBs, etc. Suppose a file contains the free-floating text “The patient in question was Robert Milnn.” We can use this rule to replace both the First Name and the Last Name just by asking for a different return type each time. Our new masked text would then become “The patient in question was Benny Suiin.”
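As a rough illustration of the free-text case, the Python sketch below swaps a known full name inside a sentence. It assumes the name has already been located in the text (finding PII is a separate step that this sketch does not attempt), and `fabricate`, `mask_name_in_text`, and the name lists are hypothetical helpers rather than DarkShield functions.

```python
import hashlib

# Hypothetical name pools used only for this illustration.
FIRST_NAMES = ["Benny", "Carl", "Serena", "Mary"]
LAST_NAMES = ["Suiin", "Lane", "Points", "Moore"]


def fabricate(return_type: str, value: str, seed: str) -> str:
    """Deterministically choose a replacement first or last name."""
    pool = FIRST_NAMES if return_type == "fname" else LAST_NAMES
    material = f"{seed}|{return_type}|{value}".encode("utf-8")
    n = int.from_bytes(hashlib.sha256(material).digest(), "big")
    return pool[n % len(pool)]


def mask_name_in_text(text: str, first: str, last: str, seed: str) -> str:
    """Replace an already-identified full name with a fabricated one."""
    fake_first = fabricate("fname", first, seed)  # same rule, "fname" return type
    fake_last = fabricate("lname", last, seed)    # same rule, "lname" return type
    return text.replace(f"{first} {last}", f"{fake_first} {fake_last}")


original = "The patient in question was Robert Milnn."
masked_text = mask_name_in_text(original, "Robert", "Milnn", "seedValue")
print(masked_text)
```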
How to use the Identity Fabrication Rule?
This rule will be located inside the New Data Rule wizard in IRI Workbench, and called from the Data Class and Rules Library (iriLibrary.dcrlib), where all other rules reside. The exact location is under Masking -> Deterministic -> Non-reversible -> Identity Fabrication:
After selecting the “Identity Fabrication” rule, a new window will appear:
In this dialog, you would add a field name if you know the name of the field that you want to reference. If you do not know it, leave the setting as is, and the current field’s value will be used instead. You can also specify any ASCII value to act as a seed value.
The seed value maintains consistency, or determinism, between the data that is passed and the expected output. For example, if I pass in the word Tom with no seed value, I might get Carl. The next time, Tom might be replaced with Trisha instead.
If I specify the same seed each time, however, Tom would be Chris (for example) each time I run the job. Deterministic values that work from job to job support referential integrity in, and “golden copies” of, test data targets.
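The effect of the seed can be sketched in a few lines of Python. This is an assumption-laden illustration, not the product's logic: `fabricate_name` and the name list are hypothetical, and the sketch models "no seed" by drawing a fresh random salt on every call, which is one plausible way the output could vary from run to run.

```python
import hashlib
import secrets
from typing import Optional

# Hypothetical replacement names.
NAMES = ["Carl", "Trisha", "Chris", "Brian", "Serena", "Mary"]


def fabricate_name(value: str, seed: Optional[str] = None) -> str:
    """Return a replacement name; repeatable only when a seed is supplied."""
    if seed is None:
        # Assumption: with no seed, use a throwaway random salt, so the
        # same input may map to a different name on each call.
        seed = secrets.token_hex(16)
    material = f"{seed}|{value}".encode("utf-8")
    n = int.from_bytes(hashlib.sha256(material).digest(), "big")
    return NAMES[n % len(NAMES)]


# Unseeded: results for "Tom" can differ between calls.
print([fabricate_name("Tom") for _ in range(3)])

# Seeded: "Tom" maps to the same name on every call and every run.
print([fabricate_name("Tom", "seedValue") for _ in range(3)])
```

Keeping the seed fixed across jobs is what lets the same source value land on the same replacement in every target, which is the basis for the referential integrity mentioned above.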
The selected type option determines the TYPE of value that is returned. In the image below, for example, the rule will return a value that looks like a formatted Social Security Number (xxx-xx-xxxx).
All the options are shown here, and selecting one of these options allows for a different return type:
After making your selection and clicking “Finish”, the rule is ready to be used in a FieldShield or DarkShield job.
If you would like a walkthrough on how to run any of these jobs, please see the links below: