Data Scrambling: Format-Preserving Scramble (FPS)

by Adam Lewis

This article demonstrates the use of the “format preserving scramble” (FPS) function available in IRI Voracity data masking and test data synthesis tools.¹ FPS randomly obfuscates numeric and alphabetic data while preserving the format and width of the original value, including the case of each letter.

Punctuation or other characters, such as – or *, are left unaltered so that data formats such as those found in Social Security numbers and phone numbers are retained. This type of masking is useful for Personally Identifiable Information (PII) when both security and realism are needed.
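
To make the behavior described above concrete, here is a minimal Python sketch of the idea. It is not IRI's implementation; the function name scramble_fp_sketch and the choice to treat only ASCII letters and digits as scramble targets are assumptions made for illustration.

```python
import random
import string


def scramble_fp_sketch(value, rng=random):
    """Replace each ASCII letter or digit with a random one of the same class."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Punctuation, spaces, and any other character pass through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for sample in ("123-45-6789", "(321) 555-0199", "Jane O'Neil"):
        print(sample, "->", scramble_fp_sketch(sample))
```

Each output keeps the length, letter case, and punctuation positions of its input, so a scrambled Social Security number still looks like one.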

FPS is meant to be a more secure form of masking than format-preserving encryption (FPE) because, unlike FPE, it is not reversible.

Unlike encryption, FPS is not deterministic either, and thus replacement values will not consistently correspond with their original plaintext values (as they would with FPE). For this reason, do not use FPS if preserving referential integrity matters.
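
The referential integrity point can be illustrated with a small Python sketch. Both functions are hypothetical stand-ins: random_scramble mimics a non-deterministic scramble, while keyed_pseudonym is a simple keyed, deterministic mapping used only to show consistency. It is not FPE (it cannot be decrypted), and the key value is a placeholder.

```python
import hashlib
import hmac
import itertools
import random
import string


def random_scramble(value, rng=random):
    """Non-deterministic: fresh random digits on every call."""
    return "".join(rng.choice(string.digits) if c in string.digits else c
                   for c in value)


def keyed_pseudonym(value, key=b"demo-key"):
    """Deterministic: the same input and key always yield the same digits.

    Illustration only; byte % 10 is slightly biased and nothing is reversible.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    stream = itertools.cycle(digest)
    return "".join(str(next(stream) % 10) if c in string.digits else c
                   for c in value)


if __name__ == "__main__":
    customer_id = "4417-1234-5678-9113"
    # The same key value masked independently in two tables:
    print("random:", random_scramble(customer_id), random_scramble(customer_id))
    print("keyed: ", keyed_pseudonym(customer_id), keyed_pseudonym(customer_id))
```

The two random results will almost certainly differ, so a join on the masked column would no longer match; the two keyed results are identical.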

FPS Rule Wizard

You can find and open the FPS rule from the Random Replacement section of Masking Rules.²

See this article on the Data Class and Rule Library to understand how masking rules can be applied en masse to PII or other sensitive data you can define and discover/classify.

Once you highlight this rule and click Next >, you will see this wizard page for configuring your FPS masking rule:

When creating an FPS rule, you can either choose to allow FPS to generate values based on each incoming value, or provide a default value that will dictate the format of the output. The ability to provide a default value format allows you to use the FPS rule with IRI RowGen, too; see the second example at the end of this article.

If you decide to provide a default format type for the rule, there are seven options:

1. Manually provide a value (word, number, phrase).
2. YYYY-MM-DD
3. MM-DD-YYYY
4. DD-MM-YYYY
5. HH:MM
6. HH:MM:SS
7. HH:MM:SS.nnnnnnnnn

In the example above, selecting option 2 from the menu placed a literal value inside the FPS rule to provide the format for the data that will be generated.

Note that the date/time formats will output realistic and valid date values (unlike FPE). However, they are not aligned with SortCL date and time data types, as FPS is for data declared as ASCII or NUMERIC only.
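
A purely character-by-character scramble could turn a date into something like month 73, so valid output requires picking a real calendar day or time of day and then rendering it in the chosen layout. The Python sketch below shows one way that could work. The mapping of the wizard labels to strftime patterns, the function names, and the 1950 to 2030 date range are all assumptions for illustration, not the product's behavior.

```python
import random
from datetime import date, datetime, timedelta

# Wizard labels mapped to strftime patterns (an assumed mapping).
DATE_PATTERNS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM-DD-YYYY": "%m-%d-%Y",
    "DD-MM-YYYY": "%d-%m-%Y",
}


def random_date_string(label, rng=random,
                       start=date(1950, 1, 1), end=date(2030, 12, 31)):
    """Pick a real calendar day in [start, end] and render it in the given layout."""
    day = start + timedelta(days=rng.randrange((end - start).days + 1))
    return day.strftime(DATE_PATTERNS[label])


def random_time_string(label, rng=random):
    """Build a valid time of day for one of the three time layouts."""
    hh, mm, ss = rng.randrange(24), rng.randrange(60), rng.randrange(60)
    if label == "HH:MM":
        return f"{hh:02d}:{mm:02d}"
    if label == "HH:MM:SS":
        return f"{hh:02d}:{mm:02d}:{ss:02d}"
    if label == "HH:MM:SS.nnnnnnnnn":
        return f"{hh:02d}:{mm:02d}:{ss:02d}.{rng.randrange(10**9):09d}"
    raise ValueError(f"unknown time label: {label}")


if __name__ == "__main__":
    for label, pattern in DATE_PATTERNS.items():
        text = random_date_string(label)
        datetime.strptime(text, pattern)  # raises ValueError if not a real date
        print(label, "->", text)
    for label in ("HH:MM", "HH:MM:SS", "HH:MM:SS.nnnnnnnnn"):
        print(label, "->", random_time_string(label))
```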

FPS Example in a FieldShield (Data Masking) Job Script

The syntax for the FPS masking rule in a field statement is scramble_fp( {$FIELD} ). Like fp_encrypt functions, FPS can be applied to fields typed as ASCII or NUMERIC.

The example below shows the FPS rule used to anonymize several columns of incoming data from a database table while preserving the data’s original format. Orange lines in this FieldShield mapping diagram show which fields are being masked:

This FieldShield job is serialized in this SortCL-compatible script:

In the script above, the scramble_fp() rule takes in the incoming field name to determine the expected format for the scrambled output.
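
As a rough analogy in Python (not SortCL, and not the job script itself), masking only selected columns while passing the rest through might look like the sketch below. The column names, sample rows, and helper names are hypothetical.

```python
import random
import string

_POOLS = (string.digits, string.ascii_uppercase, string.ascii_lowercase)


def scramble_fp_sketch(value, rng=random):
    """Swap each ASCII letter or digit for a random one of the same class."""
    out = []
    for ch in value:
        pool = next((p for p in _POOLS if ch in p), None)
        out.append(rng.choice(pool) if pool else ch)
    return "".join(out)


def mask_rows(rows, fields_to_mask, rng=random):
    """Return new row dicts with only the named fields scrambled."""
    return [
        {name: scramble_fp_sketch(value, rng) if name in fields_to_mask else value
         for name, value in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    # Made-up sample rows; the incoming value of each masked field sets its format.
    rows = [
        {"id": "1001", "name": "Ana Ruiz", "ssn": "123-45-6789", "state": "FL"},
        {"id": "1002", "name": "Li Wei", "ssn": "987-65-4321", "state": "NY"},
    ]
    for masked in mask_rows(rows, {"name", "ssn"}):
        print(masked)
```

Fields left out of the mask set (id and state here) reach the output untouched, which mirrors applying the rule to some target fields but not others.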

Original data inside database table:

Note that FPS was applied to most, but not all, of the target fields mapped to the output. The masked data is shown below as specified, in an Excel spreadsheet:

Note that the SortCL-compatible FieldShield job script shown above can also be expanded to accommodate ETL, cleansing, reporting and complex business logic.

FPS Example in a RowGen (Data Synthesis) Job Script

It is also possible to apply the FPS function to randomly generated or selected data (which RowGen synthesizes in the input phase of its jobs). In this example, the scramble function defined in the output phase also reformats the raw alpha-digit input into desired formats:

The sample job script shown above produces values based on the format of a literal default value passed to the scramble_fp("some_value") rule as a parameter. Here, the scramble_fp(fieldname) statements scramble and transform incoming raw values into the parenthetically defined formats in the script.
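
The idea of a literal value dictating the output format can be sketched in Python as follows. This is an illustration only, not RowGen syntax; the column names, the literal templates, and the helper functions are hypothetical.

```python
import random
import string

_POOLS = (string.digits, string.ascii_uppercase, string.ascii_lowercase)


def from_template(template, rng=random):
    """Produce one synthetic value shaped like the literal template."""
    out = []
    for ch in template:
        pool = next((p for p in _POOLS if ch in p), None)
        out.append(rng.choice(pool) if pool else ch)
    return "".join(out)


# Hypothetical columns, each with a literal value that only supplies a format.
TEMPLATES = {
    "ssn": "000-00-0000",
    "phone": "(000) 000-0000",
    "plate": "AAA 0000",
}


def synthesize(row_count, rng=random):
    """Generate rows of test data with no real input values involved."""
    return [{col: from_template(t, rng) for col, t in TEMPLATES.items()}
            for _ in range(row_count)]


if __name__ == "__main__":
    for row in synthesize(3):
        print(row)
```

Because no original data is read, the literal serves only as a shape: digits become random digits, letters become random letters of the same case, and separators stay in place.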

Scrambled Results:

Some of the data generated in this job was used as input in the prior FieldShield job sample, too. If you are interested in the use of FPS or other data masking or generation functions from IRI, please email voracity@iri.com.

¹ In this case, the IRI FieldShield and DarkShield data masking tools, and the RowGen test data synthesis and subsetting tools in the IRI Voracity platform’s Data Protector Suite.
² Because the purpose of the FPS rule is to replace original values with randomly generated values that follow a specific format, it is categorized as a Masking Rule and further categorized as a type of random replacement.
