It is a common mistake to use the terms data masking and data encryption interchangeably. While field-level encryption is considered one of many possible “data masking” functions, data masking and data encryption are technically distinct processes.
Data masking in this context refers to the redaction or obfuscation of sensitive data by replacing it with other data, typically characters that still meet the format requirements of the systems that test or otherwise work with the masked results. Masking ensures that vital parts of personally identifiable information (PII), like the first 5 digits of a social security number, are obscured or otherwise de-identified. Under this definition, string-masked data is not recoverable.
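As a rough illustration of string masking by character replacement, the sketch below covers all but the last four digits of a social security number with X's while leaving the separators in place, so the result still fits the original format. The function name `mask_ssn` and its parameters are hypothetical and chosen only for this example; they do not represent any particular product's API.

```python
def mask_ssn(ssn: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace every digit except the last `visible` ones with `mask_char`.

    Non-digit characters (such as dashes) are kept so the masked value
    still matches the original layout.
    """
    total_digits = sum(ch.isdigit() for ch in ssn)
    to_mask = max(total_digits - visible, 0)  # number of leading digits to cover

    masked = []
    for ch in ssn:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_ssn("123-45-6789"))  # XXX-XX-6789
```

Because the covered digits are simply overwritten and no key or lookup table is kept, nothing in the output allows the original value to be restored.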
Data encryption transforms data into scrambled, typically unreadable ciphertext using mathematical algorithms and a key. Restoring the original data requires the corresponding decryption algorithm and the correct key. The goal of encryption is therefore to be reversible, but only by those who hold that key.
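To make the reversibility point concrete, here is a minimal sketch of an encrypt/decrypt round trip using only the Python standard library. It is a toy stream cipher (an HMAC-SHA256 keystream XORed with the data) written purely for illustration: it has no integrity protection, the function names are hypothetical, and it is not how any specific product encrypts fields. Real systems should rely on a vetted cryptographic library instead.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key, a nonce, and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return a random 16-byte nonce followed by the XOR-scrambled data."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Split off the nonce, regenerate the keystream, and undo the XOR."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)           # illustrative 256-bit secret key
token = encrypt(key, b"123-45-6789")    # unreadable without the key
print(decrypt(key, token).decode())     # 123-45-6789
```

Unlike the masked value, the ciphertext still carries the full original data; whoever holds the key can restore it, and anyone without the key cannot.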
When would you choose to use data masking vs data encryption?
Data masking is often performed to create test data, to support medical research, and to prevent unauthorized recipients from seeing or re-identifying the original content.
Application developers and those prototyping or benchmarking database and data warehouse (DB/DW) operations commonly request production data for testing. Because that data can be sensitive and passes through multiple hands, it is at great risk of theft or misuse. Masking under the above definition will redact (cover or strip) the PII elements in the data set; e.g., names, addresses, SSNs, etc.
Common industry terms such as anonymization and de-identification also refer to processes that irreversibly sever the identifying information from the data set. They prevent future identification of the original data, even by the people conducting the research or testing. For example, one cannot discern or re-identify a social security number that is presented with its first 5 digits covered by X’s.
For information on string masking (redaction) through character replacement, see: www.iri.com/solutions/data-masking/masking/overview
Data encryption is often used to protect data that is transferred between computers or networks so that it can later be restored. Such data, whether in transit or at rest, can be vulnerable to a breach. Converting it into unreadable ciphertext (or into format-preserving ciphertext, which keeps the original layout but is still hard to crack) protects it even if it is intercepted or stolen. The only way to gain access to the data is to unlock it with a key or password that only authorized parties hold.
For more information on column (field) encryption, see: www.iri.com/solutions/data-masking/encryption
It is easy to think of data masking and data encryption as the same thing, since both are data-centric means of protecting sensitive data. However, their inherent procedures and purposes differentiate them: masking as defined here is irreversible, while encryption is designed to be reversed by those who hold the key.
IRI FieldShield software protects PII in files and databases with a wide array of protection functions, including data masking and encryption. FieldShield is the primary data masking product in the IRI Data Protector suite, and is included in the IRI Voracity platform for data discovery, integration, migration, governance, and analytics.