Masking PANs in Credit Card Images

by Adam Lewis

Abstract: Previously the DarkShield API for Files could perform black-box redaction of primary account numbers (PANs, also known as CCNs) on credit cards only within defined areas in the image ¹. The accuracy of credit card number detection, and thus masking, however, has now been enhanced thanks to new OCR-A support in DarkShield. These addresses use cases where PAN locations vary among images, and when masked or synthetic PANs are needed in testing.

DarkShield and Optical Character Recognition

IRI DarkShield is a solution built to provide data protection in semi-structured and unstructured data files. By searching for and identifying sensitive information in these types of files, DarkShield is able to provide precise masking control over files using various masking operations.

It is important to note that unstructured data does not have to mean text in text files. Text in images and text in images embedded in documents are all viable targets for DarkShield. To scan and detect sensitive data in images the DarkShield API uses a technology called optical character recognition (OCR).

How OCR Works

OCR works by processing a scanned image and analyzing the areas where there are black and white pixels in order to identify characters. Normally a process called thresholding occurs ahead of time to preprocess the image into a black and white image.

Each character is segmented into its own individual images. Then during the recognition phase, individual characters and words are identified based on a score assigned to them.

OCR and Credit Card Font

Normally OCR relies on machine learning to recognize characters from groups of contours and shapes detected in an image. Machine learning is a powerful tool that provides an accurate and flexible method for character recognition.

That said, machine learning is also limited by the trained models it uses. A broad but easy-to-understand example would be that a model trained on detecting cars cannot be used to detect a motorcycle.

Similarly, the Tesseract OCR models trained for detecting characters in images have challenges recognizing the characters of a credit card because they may be in a special font. This special credit card font is referred to as OCR-A font. To deal with special fonts, models need to be trained with large data sets to learn how to recognize characters in special fonts.

A common alternative for recognizing characters in an exotic font is template matching. Template matching can be useful in certain situations, like in recognizing PANs, and is available as an alternative in the DarkShield API for detecting these characters in credit cards.

What is Template Matching?

Template matching is a technique used to find matches or close matches in an image using a template image as the reference. In the context of OCR, template matching is used to help recognize optically processed characters in images.

This technique can be a very simple but effective method used for matching on handwriting or characters of a particular font.

Template matching requires a template image containing characters in the target font. Using the template image as the base comparator, OCR will process the actual image to be parsed by using a sliding technique.

Reference image for OCR-A font used in template matching.

This process of sliding a template image from left to right, up to down, one pixel at a time, calculates how well a match has been discovered at each location. The index of the best match is recorded as the recognized character.

Sliding across an image and matching based on the template image

To learn more about OCR template matching, follow this link.

Configuring the DarkShield-Files API Call

To begin with, calls must be made to the DarkShield-Files API to create a search context, mask context, file search context, and file mask context. These contexts tell the API how it will search for PII, what masking operations should be applied on found PII, and special configurations to be used based on file types.

Search and mask context

In the search context shown above, a credit card matcher named “CcnMatcher” is defined and uses a regular expression pattern matcher to identify PANs. A mask context defines rules to indicate the masking function that will be used, and rule matchers to match rules with search matchers.

In the setup above, a format-preserving encryption (FPE) rule called “FpeRule” is created and a rule matcher that pairs the “CcnMatcher” with the “FpeRule” is created and called “FpeRuleMatcher”.

File search context and file mask context

File search contexts and file mask contexts allow for file-type specific configurations to be passed as part of the context. In the setup above, there is a configuration for image files that specify that template matching will be used in conjunction with OCR.

Results of Search and Masking

Original credit card image

Credit card image with redacted numbers

From the comparison of the before and after images we can see the PANs have been identified and masked using black-box redaction. To view the full source code, see the credit card demo hosted in IRI’s GitHub repository. Contact darkshield@iri.com if you like more information.

Frequently Asked Questions (FAQs)

1. What is PAN masking and why is it important?

PAN data masking is the process of protecting Primary Account Numbers (PANs) or credit card account numbers in images (or other data sources) by redacting or encrypting them. It is important because it prevents exposure of sensitive payment information in production, test, or shared environments, reducing the risk of fraud and regulatory non-compliance.

2. How does DarkShield detect and mask credit card numbers in images?

IRI DarkShield uses Optical Character Recognition (OCR) technology to locate and recognize credit card numbers in images. Once detected, it applies the selected masking function, such as black-box redaction or format-preserving encryption, to replace or obscure the numbers.

3. What is OCR-A font and why does it matter for credit card masking?

OCR-A is a standardized font commonly used for printing credit card numbers. Recognizing characters in OCR-A font can be challenging for standard OCR models, so DarkShield includes specialized OCR-A support to improve detection accuracy for PANs in card images.

4. How does template matching improve OCR accuracy for PANs?

Template matching compares each character in the image to a reference template of OCR-A characters, sliding pixel by pixel to find the closest match. This method is effective for detecting numbers printed in consistent fonts, improving accuracy when standard machine learning models struggle.

5. Can DarkShield generate synthetic credit card numbers instead of redacting them?

Yes. Instead of simply black-boxing numbers, DarkShield can use format-preserving encryption or generation rules to produce realistic, non-sensitive PANs that maintain the original format, which is ideal for testing environments.

6. How do you configure DarkShield to use OCR and template matching together?

You configure a file search context and file mask context in the DarkShield REST API or DarkShield-Files RPC API. Within the configuration, specify that template matching should be applied alongside OCR to ensure higher accuracy when scanning credit card images.

7. What masking methods can be applied to PANs?

Supported methods for PANs in images include black-box redaction, format-preserving encryption (FPE), pseudonymization, and replacement with synthetic numbers. The method you choose should depend on whether you need to preserve format and test data usability.

8. Can DarkShield handle multiple credit cards in a single image?

Yes. DarkShield scans the entire image and detects every occurrence of a credit card number, applying the chosen masking function to each detected PAN automatically.

9. How are search matchers and masking rules paired in DarkShield?

A credit card matcher (e.g., CcnMatcher) is typically defined using a regex pattern to locate PANs, and is usually associated with IRI’s included data class validator for credit cards. A masking rule, such as FPE, is created and paired with the matcher through a rule matcher in the IRI Data Class and Rules Library. This ensures all detected PANs are masked consistently.

10. Can DarkShield mask other financial data like account or routing numbers?

Yes. You can also use DarkShield to find and mask account and routing numbers in checks or other financial documents by specifying the coordinates or search matchers for those fields.

11. How does format-preserving encryption benefit test environments?

Format-preserving encryption replaces real PANs with secure, reversible values that retain the same length and structure. This allows systems and workflows to function normally while ensuring sensitive data is protected.

12. What are the prerequisites to use OCR-based PAN masking?

You need to have DarkShield installed and running with access to the DarkShield-Files API. You also need to configure a data class for PANs, a search matcher using regex or template matching, and a masking rule to apply to detected values.

13. How accurate is OCR-based credit card detection?

Accuracy depends on image quality, lighting, and font clarity. With OCR-A font support and template matching, DarkShield significantly improves recognition rates for PANs, even when card numbers appear in different positions across images.

14. Can I preview results before applying masking to production data?

Yes. You can run search-only jobs to verify which PANs are detected and confirm accuracy before enabling masking. This allows fine-tuning of search matchers and template matching settings.

15. How do I integrate PAN masking into an automated workflow?

You can schedule DarkShield jobs in the IRI Workbench task scheduler. Or, you can call the DarkShield API or command task (CLI) to automate search-and-mask operations on incoming images from within a program or other scheduled workflow. The GUI can also be used to export API or CLI specifications to make it easier to integrate these calls from external programs. Running these jobs automatically can help ensure continuous protection of sensitive payment data.

Note that the DarkShield-Files API can also mask account and routing numbers in checks by supplying an API call with the coordinates where these numbers are located (the location of these numbers are always the same place). This can be seen in the following article where the DarkShield-Files API demonstrates the ability to mask credit card numbers and checking account and routing numbers by replacing the sensitive data in images with newly generated realistic data. Alternatively, the data can be simply redacted with a black box.

Voracity Software Support for Cloud File Stores

Synthesize Realistic DB Test Data with Set Files