Pseudonym Hash Set (File) Creation Wizard
A Pseudonym Hash Set File Creation Wizard is now available in IRI Workbench. It complements the newly created wizard for generating Pseudonym Hash Replacement Rules and is based on the same concept discussed in a previous article.
This wizard creates a two-column pseudonym replacement set (“pseudo set”) file whose lookup list contains hash values, making the file compatible with the Pseudonym Hash Replacement Rule for IRI FieldShield.
Normally, a pseudonym replacement set file consists of two columns: one containing a lookup list and another containing a list of replacement values. The purpose of a lookup value is to match against values in a source, such as another file or entries in a database.
Two-Column Pseudonym Replacement Set File
If a match is found in the lookup list, the adjacent column’s value is used as the replacement for the original source value. Because the same source value always maps to the same replacement, the pseudonym replacement is consistent.
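The lookup-and-replace behavior described above can be sketched in a few lines of Python. This is only an illustration of the concept, not FieldShield's implementation: the function name, the sample names, and the choice to pass unmatched values through unchanged are all assumptions made for the example.

```python
def pseudonymize(values, pairs):
    """Replace each value found in the lookup column with its paired pseudonym."""
    table = dict(pairs)  # lookup value -> replacement value
    # Assumption for this sketch: values with no match are left unchanged.
    return [table.get(value, value) for value in values]


# Hypothetical two-column set: (lookup value, replacement value)
pairs = [("Alice", "Morgan"), ("Bob", "Taylor")]
source = ["Bob", "Alice", "Bob", "Carol"]

result = pseudonymize(source, pairs)
print(result)
```

Both occurrences of "Bob" receive the same replacement, which is what makes the substitution consistent across a source.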
The Pseudonym Hash Set Creation Wizard creates a specially formatted pseudonym replacement set file in which the first column, containing the lookup list’s values, is in a hashed format. This is necessary for the operations used in a Pseudonym Hash Replacement Rule.
Using the Pseudonym Hash Set Creation Wizard is very similar to creating ordinary pseudonym replacement set files. The only major difference is that a hashing function must be applied. Please note that the selected hashing function must match the hashing function that will be used in the Pseudonym Hash Replacement Rule.
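To see why the two hashing functions must match, consider the following minimal sketch using Python's standard `hashlib` module. It assumes UTF-8 encoding and hexadecimal digests, and uses SHA-256 as a stand-in for "SHA2"; how FieldShield actually encodes and formats its hashes is not stated here, so treat these details and all function names as hypothetical.

```python
import hashlib


def hash_value(value, algorithm):
    # Assumption: UTF-8 input and hex output; the real tool may differ.
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


def build_hashed_pairs(pairs, algorithm):
    """Store the hash of each lookup value instead of the value itself."""
    return {hash_value(lookup, algorithm): repl for lookup, repl in pairs}


def find_replacement(value, hashed_pairs, algorithm):
    """Hash the incoming source value and look the digest up in the set."""
    return hashed_pairs.get(hash_value(value, algorithm))


pairs = [("Alice", "Morgan"), ("Bob", "Taylor")]
hashed = build_hashed_pairs(pairs, "sha256")

matched = find_replacement("Alice", hashed, "sha256")  # same function: match
mismatched = find_replacement("Alice", hashed, "md5")  # different function: no match
print(matched, mismatched)
```

When the rule hashes source values with a different function than the one used to build the set file, the digests never coincide and no replacement is found.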
Steps
1. To access the Pseudo Hash Set Wizard, expand the IRI RowGen icon on the Workbench toolbar and select the New Set File option from the dropdown menu.
2. When the Set File Creation wizard opens, select Pseudo Hash Set and click Next.
3. On the next page, select the project location, project name, and any extra options for the set file creation job, then click Next. By default, these extra options are checked so that the job script is saved and run when the wizard completes.
4. When the job script is run, a two-column pseudonym set will be generated.
The notable difference between this pseudonym set file and the set files generated by the other wizard is that its lookup column contains hash values.
You therefore need to select a hashing function to apply to the lookup column. Currently, Workbench supports three hashing methods: MD5, SHA1, and SHA2.
5. After selecting the hashing function to apply to the lookup column, select one or more sources by clicking the Add button.
6. A new dialog will open where you can select your source and metadata. If metadata for the source does not exist yet, the Discover button will allow you to create metadata for that source ad hoc.
7. If you clicked the Discover button in the previous step, a new window opens in which you can create new metadata. On the first page, provide a unique name for the metadata file along with the location where it will be stored after creation. Once finished, click Next.
8. On the next page, specify the source file and the format expected in the source. Once these parameters have been set, click Next to move forward.
9. On the next page, review and edit which field(s) are used to create the metadata file. If satisfied, click Finish.
10. The lookup source and its metadata have now been created. To set the source, check the checkbox in the Fields area and click OK.
11. Next, choose which source(s) to use for the replacement values. If you use the scramble option, the same source that was used for the lookup list will be scrambled and used for the replacement list.
Alternatively, you can choose to add a new source for the replacement values by clicking Add. Repeat steps 6 through 10.
12. Lastly, either provide a default value or indicate that an empty string should be used in the event that there are more lookup values than replacement values.
13. If all the necessary fields have been filled out, click Finish.
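The choices made in the steps above (hashing function, scramble option or separate replacement source, and default value) can be approximated in Python as follows. This is a rough sketch of the idea, not the job script the wizard generates: the function names are hypothetical, SHA-256 stands in for "SHA2", and the tab-delimited output is an assumed layout.

```python
import hashlib
import random


def build_pseudo_hash_set(lookups, replacements=None, algorithm="sha256",
                          default="", seed=None):
    """Return (hashed lookup, replacement) rows for a two-column set file."""
    if replacements is None:
        # Scramble option: reuse the lookup values, shuffled, as replacements.
        replacements = list(lookups)
        random.Random(seed).shuffle(replacements)
    rows = []
    for i, value in enumerate(lookups):
        digest = hashlib.new(algorithm, value.encode("utf-8")).hexdigest()
        # Fall back to the default when the replacements run out.
        replacement = replacements[i] if i < len(replacements) else default
        rows.append((digest, replacement))
    return rows


def to_set_file_text(rows):
    # Assumption: one tab-delimited pair per line.
    return "".join(f"{digest}\t{replacement}\n" for digest, replacement in rows)


# Separate replacement source with fewer entries than the lookup list
rows = build_pseudo_hash_set(["Alice", "Bob", "Carol"], ["Morgan", "Taylor"],
                             algorithm="md5", default="UNKNOWN")

# Scramble option: replacements are the lookup values in shuffled order
scrambled = build_pseudo_hash_set(["Alice", "Bob", "Carol"], seed=1)

print(to_set_file_text(rows))
```

Because the third lookup value has no corresponding replacement, its row receives the default value, mirroring the choice made in step 12.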
Results
If the default options at the beginning of the wizard were left as is, the newly created job script should create the two-column pseudonym set file, with hash values stored in the lookup column of the set file.
Generated scripts that will create the pseudo hash set file
A pseudo hash set file with a lookup column containing hashed values and a column of replacement values
After you create a pseudonym hash replacement set file, you can use it with a Pseudonym Hash Replacement Rule to pseudonymize fields in a table or file. If you have any questions or need help implementing this concept, please email fieldshield@iri.com.