{"id":14170,"date":"2020-12-22T18:40:23","date_gmt":"2020-12-22T23:40:23","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=14170"},"modified":"2025-04-21T08:23:54","modified_gmt":"2025-04-21T12:23:54","slug":"masking-pdfs-and-images","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/","title":{"rendered":"Masking PII in PDF &#038; Image Files"},"content":{"rendered":"<p><a href=\"https:\/\/www.iri.com\/products\/darkshield\"><span style=\"font-weight: 400;\">IRI DarkShield<\/span><\/a><span style=\"font-weight: 400;\"> can search for, and mask, personally identifiable information (PII) and other sensitive data in many different file types, documents and databases on-premise or in the cloud. Among the file types supported are those in PDF and image formats, which are the focus of this article. Note that DarkShield can now also find and redact signatures in these sources <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/finding-and-redacting-signatures-in-darkshield\/\">as well<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As with other document types, DarkShield supports sensitive data discovery and protection in PDFs and images using masking functions specified as rules during <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/iri-data-classification\/\">data classification<\/a> in IRI Workbench. 
These <\/span><span style=\"font-weight: 400;\">data masking <\/span><a href=\"https:\/\/www.iri.com\/solutions\/data-masking\/static-data-masking\"><span style=\"font-weight: 400;\">functions<\/span><\/a><span style=\"font-weight: 400;\"> include, but are not limited to, hashing, encryption, pseudonymisation, and black-box redaction.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This article provides standalone examples of PDF masking and image masking via DarkShield Jobs, created through wizards in the IRI Workbench <a href=\"https:\/\/www.iri.com\/products\/workbench\/darkshield-gui\">GUI for DarkShield<\/a>. For a more general discussion of using the GUI for all kinds of files, see <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/finding-and-masking-pii-in-files-with-the-darkshield-files-wizard\/\">this article<\/a>.<\/span><\/p>\n<h5><b>Challenges with PDFs and Images<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Searching and masking can at times present file-type-specific challenges. This is the result of files varying in their format, how data is stored in files, and the necessary techniques used to access and manipulate data in these files. Broadly speaking, PDF and image files can represent an outsized challenge for both search and masking operations.<\/span><\/p>\n<h5><b>Search Challenges<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Technical limitations around PDF and image files can hamper the ability of software to find and extract PII from them. More specifically, both the quality and format of the files can determine whether complications will arise during the search process.\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">PII in images is translated to text using OCR (optical character recognition). 
The OCR model IRI ships with DarkShield can be used as is or fine-tuned (trained) to perform at a higher level of accuracy.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Low-quality images present a significant challenge. Specifically, images with low resolution, images with multi-colored backgrounds, and text that cannot be easily read (e.g., in an unusual font or in handwriting) may cause the OCR engine to stumble.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Searching for PII in PDFs also has its own set of challenges. You may need to extract the PII either from within form fields (fairly straightforward), or from free-form text based on X,Y coordinate specifications. Unfortunately, there are several challenges with the latter.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first problem is that because characters are positioned by X,Y coordinates, spaces and tabs are not guaranteed to be present. Moreover, null values can appear in different locations to stand in for \u201cspaces\u201d within a line, or for line breaks that mark the end of a line or sentence.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Either way, sentences can end up broken in the wrong places because of these missing or placeholder characters. That can disrupt content interpretation and thus the accuracy of Named Entity Recognition in searches.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another problem with PDF character reading is the possibility of slight differences in Y elevation of characters on the same line. This commonly occurs when there are subscripts and superscripts. 
This is a serious problem when extracting text that is above an underline.\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Consider this PDF from which text must be extracted for identification and\u00a0<\/span><span style=\"font-weight: 400;\">masking to occur:\u00a0\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-17253 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/sample-patient-admission-form-1-300x248.png\" alt=\"\" width=\"435\" height=\"360\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/sample-patient-admission-form-1-300x248.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/sample-patient-admission-form-1-768x635.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/sample-patient-admission-form-1.png 800w\" sizes=\"(max-width: 435px) 100vw, 435px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Below is the PDF processor\u2019s extracted text that has difficulty reading words hovering above lines.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17254\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/PDF-processor-extracted-text-300x157.png\" alt=\"\" width=\"602\" height=\"315\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/PDF-processor-extracted-text-300x157.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/PDF-processor-extracted-text-768x401.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/PDF-processor-extracted-text.png 936w\" sizes=\"(max-width: 602px) 100vw, 602px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Yet another problem is that PDFs can have images embedded within. 
That loops back to the problems with images.<\/span><\/p>\n<h5><b>Recommendations<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Some of the issues mentioned above can be alleviated by providing images with decent resolution quality and monochrome backgrounds (as much as possible). IRI also recommends modifying your DarkShield File Configurations as needed for a more tailored processing of these file types.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">File configurations for images that may help include:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Pass in bounding boxes to target hard-to-recognize text like signatures<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Pass in parameters for the OCR engine<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Provide a different OCR engine more fine-tuned for the task at hand.<\/span><\/li>\n<\/ul>\n<h5><b>Masking Challenges<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">The challenges of masking PDFs and images mainly arise from how those files are formatted. First is the issue of replacing text based on X,Y coordinates. 
Unlike most file types, PDFs and images do not automatically shift characters to the right to accommodate new text entered on a line; that has to be done manually in an editor like Adobe Acrobat.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, consider this \u2018before\u2019 clip of text in a PDF:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17256\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/before-text-in-pdf.png\" alt=\"\" width=\"320\" height=\"88\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Here is the same clip \u2018after\u2019 a <\/span><b>hash<\/b><span style=\"font-weight: 400;\"> function was used to replace the name \u201cGerald\u201d with a hash value:\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17257\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/hash-value.png\" alt=\"\" width=\"343\" height=\"90\" \/><\/p>\n<p><span style=\"font-weight: 400;\">This demonstrates how replacement values that are longer than the original PII values can cause text to overlap. The rendered width of a word is determined by its number of characters, the font, and the differing widths of individual characters. It can thus be a challenge to find a suitable replacement value even if it has the same number of characters.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In image files, this issue can manifest differently. 
Consider this \u2018before\u2019 clip of text in a JPG file:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17258\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/before-clip-in-jpg-300x56.png\" alt=\"\" width=\"488\" height=\"91\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/before-clip-in-jpg-300x56.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/before-clip-in-jpg.png 556w\" sizes=\"(max-width: 488px) 100vw, 488px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In this \u2018after\u2019 clip, AES256 encryption was applied to the name to produce ciphertext value:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17259\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/after-clip-in-jpg-300x83.png\" alt=\"\" width=\"502\" height=\"139\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/after-clip-in-jpg-300x83.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/after-clip-in-jpg.png 568w\" sizes=\"(max-width: 502px) 100vw, 502px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The image process attempts to \u201cfit\u201d generated text into the X,Y coordinates by shrinking the text. 
As you can see from the example above, at some point the text can become too small to read.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17260\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/check-xmple-300x137.png\" alt=\"\" width=\"481\" height=\"220\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/check-xmple-300x137.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/check-xmple.png 728w\" sizes=\"(max-width: 481px) 100vw, 481px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Substituting text in images poses an additional challenge: a new background must also be drawn behind the replacement text. This can lead to visible differences in color gradient between the background of the original image and the background of the snippet generated with the replacement value.<\/span><\/p>\n<h5><b>Recommendations<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Because of the complexity and challenges of masking PII in PDFs and images, there is no one-size-fits-all solution. That said, some general guidelines should be followed:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Use the appropriate masking rules when working with PDFs and images. When trying to resolve word overlap issues, use masking functions that will not return values longer than the original text. <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">Certain masking functions, like format-preserving encryption, length-preserving pseudonymization, and character redaction, produce replacement values with the same character count as the original. This can keep overlap to a minimum or prevent it altogether.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Alternatively, character removal in PDFs and the default black-box redaction function for images always fit within the original space. 
There are also PDF and image file masking settings in DarkShield that you can configure, e.g.,\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">the font type of replacement text<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">whether to copy and reuse original background color when inserting replacement text<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Whether to have PDF text replacement attempt to perform character shifting to prevent overlap (which does not always produce desired results).<\/span><\/li>\n<\/ul>\n<h5><b>Masking PII with the DarkShield Wizard<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Below I demonstrate the use of the <\/span><i><span style=\"font-weight: 400;\">New File Search\/Masking Job<\/span><\/i><span style=\"font-weight: 400;\">\u2026 wizard in the I<\/span><span style=\"font-weight: 400;\">RI Workbench GUI for DarkShield<\/span><span style=\"font-weight: 400;\"> to build a DarkShield job to search and mask PII in PDF and image files.\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17261\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/Wizard-in-IRI-workbench-gui-for-DS-300x100.png\" alt=\"\" width=\"588\" height=\"196\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Wizard-in-IRI-workbench-gui-for-DS-300x100.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Wizard-in-IRI-workbench-gui-for-DS-768x257.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Wizard-in-IRI-workbench-gui-for-DS.png 933w\" sizes=\"(max-width: 588px) 100vw, 588px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">To open the wizard,\u00a0 select the DarkShield menu dropdown from the top toolbar\u2019s charcoal shield icon and select the <\/span><i><span style=\"font-weight: 
New Files Search">
400;\">New Files Search\/Masking Job<\/span><\/i><span style=\"font-weight: 400;\">\u2026 wizard. This brings up the first page where you can name your new job:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17262\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/DS-dropdown-menu-300x274.png\" alt=\"\" width=\"408\" height=\"373\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/DS-dropdown-menu-300x274.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/DS-dropdown-menu.png 535w\" sizes=\"(max-width: 408px) 100vw, 408px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Here you also specify the file name for the DarkShield .dsc job file and the folder where that job file will be placed once the wizard is completed. The same subfolder is where the metadata file will be placed in an IRI project after a DarkShield Search Job has been run.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Click <\/span><i><span style=\"font-weight: 400;\">Next<\/span><\/i><span style=\"font-weight: 400;\"> to move to the Search Report Options page.<br \/>\n<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17263\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/search-report-options-300x244.png\" alt=\"\" width=\"397\" height=\"323\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/search-report-options-300x244.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/search-report-options.png 533w\" sizes=\"(max-width: 397px) 100vw, 397px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">This page lets you customize a flat-file search log by selecting metadata attributes of the files in which PII was discovered.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Click <\/span><i><span style=\"font-weight: 400;\">Next<\/span><\/i><span style=\"font-weight: 400;\"> when 
finished to move to the Data Class and Masking Rule Selection page, where you select your data classes, which determine the PII you are trying to find, and specify how any PII found should be masked.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17265\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/Click-Next-when-finished-300x274.png\" alt=\"\" width=\"394\" height=\"360\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Click-Next-when-finished-300x274.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Click-Next-when-finished.png 534w\" sizes=\"(max-width: 394px) 100vw, 394px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In the <\/span><i><span style=\"font-weight: 400;\">Data Class and Masking Rule Selection<\/span><\/i><span style=\"font-weight: 400;\"> dialog, you will define the contents of your project\u2019s <\/span><a href=\"https:\/\/www.iri.com\/blog\/data-protection\/iri-data-classification\/\"><i><span style=\"font-weight: 400;\">IRI Data Class and Rule Library<\/span><\/i><\/a><span style=\"font-weight: 400;\">. This library contains Data Classes and\/or Data Class Groups, and the data masking functions\/rules you assign to them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can choose which Data Classes and Groups from the library to use by selecting or deselecting them in the <\/span><i><span style=\"font-weight: 400;\">Active<\/span><\/i><span style=\"font-weight: 400;\"> column. 
In this example, I am using all of the default Data Classes provided when an IRI Project is created.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17266\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/all-default-Data-Classes-300x274.png\" alt=\"\" width=\"415\" height=\"379\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/all-default-Data-Classes-300x274.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/all-default-Data-Classes.png 536w\" sizes=\"(max-width: 415px) 100vw, 415px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In the Masking Rules tab, we can see which masking functions are available. These rules dictate how PII found using Data Classes will be masked. It is also possible to add or remove Masking Rules from this tab.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Click <\/span><i><span style=\"font-weight: 400;\">Next<\/span><\/i><span style=\"font-weight: 400;\"> when finished to move on to the page where you assign these Masking Rules to specific Data Classes.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17268\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/Assign-these-Masking-Rules-to-specific-Data-Classes-1-300x272.png\" alt=\"\" width=\"426\" height=\"387\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Assign-these-Masking-Rules-to-specific-Data-Classes-1-300x272.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Assign-these-Masking-Rules-to-specific-Data-Classes-1.png 531w\" sizes=\"(max-width: 426px) 100vw, 426px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">On the Assign Masking Rules to Data Classes wizard page, each Data Class or Data Class Group must be assigned a <\/span><a href=\"https:\/\/www.iri.com\/solutions\/data-masking\/static-data-masking\"><span style=\"font-weight: 400;\">data masking 
function<\/span><\/a><span style=\"font-weight: 400;\"> to specify how you will protect that type of PII wherever it is found. If you do not wish to mask a particular PII type, click <\/span><i><span style=\"font-weight: 400;\">Back <\/span><\/i><span style=\"font-weight: 400;\">and deselect the Active checkbox associated with that Data Class or Group; then return here to finish assigning Masking Rules to Data Classes.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once done, click <\/span><i><span style=\"font-weight: 400;\">Next &gt;<\/span><\/i><span style=\"font-weight: 400;\"> to begin specifying the location(s) of the files to search and mask:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17270\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/specifying-the-location-of-the-files-1-300x274.png\" alt=\"\" width=\"408\" height=\"373\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/specifying-the-location-of-the-files-1-300x274.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/specifying-the-location-of-the-files-1.png 534w\" sizes=\"(max-width: 408px) 100vw, 408px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">On this page, you add, edit, or remove data sources that DarkShield will scan. If you click <\/span><i><span style=\"font-weight: 400;\">Add<\/span><\/i><span style=\"font-weight: 400;\">\u2026, a sub-wizard opens so you can specify the file storage type and a connection registry.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A Connection Registry is a reusable configuration for connecting to a data silo. 
To create a new Connection Registry, first select the desired file storage type, then click <\/span><i><span style=\"font-weight: 400;\">New.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">My example below demonstrates accessing files in the local (PC) file system, but DarkShield supports other (cloud) file sources (listed above) in Workbench. The <\/span><a href=\"https:\/\/www.iri.com\/blog\/data-protection\/darkshield-files-rpc-api\/\"><span style=\"font-weight: 400;\">DarkShield-Files API<\/span><\/a><span style=\"font-weight: 400;\"> can support files that reside in other storage silos, plus streaming sources, using custom code.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After selecting or creating a new data connection, the connection registry information is displayed on the Data Sources page. The source item (URI) shows the root directory from which the search will start. You can add more sources to the same search process here.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17271\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/data-target-300x274.png\" alt=\"\" width=\"404\" height=\"369\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/data-target-300x274.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/data-target.png 534w\" sizes=\"(max-width: 404px) 100vw, 404px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">When finished, click <\/span><i><span style=\"font-weight: 400;\">Next &gt; <\/span><\/i><span style=\"font-weight: 400;\">to open the Filter Selection page, but skip it for PDFs and image files, since it only applies to narrowing the search scope of flat or semi-structured (CSV, Excel, JSON or XML) files using metadata filters per <\/span><a href=\"https:\/\/www.iri.com\/blog\/data-protection\/finding-and-masking-pii-in-files-with-the-darkshield-files-wizard\/\"><span 
style=\"font-weight: 400;\">this article<\/span><\/a><span style=\"font-weight: 400;\">. So click <\/span><i><span style=\"font-weight: 400;\">Next &gt;<\/span><\/i><span style=\"font-weight: 400;\"> from there to move onto targeting.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the Data Targets page, you will provide the destination for your masked files. The steps to add a data target are the same as for a data source, except no file type selection is requested (since it will be in the same format as the source).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At this point, you can click <\/span><i><span style=\"font-weight: 400;\">Finish<\/span><\/i><span style=\"font-weight: 400;\"> to produce a .dsc file, or <\/span><i><span style=\"font-weight: 400;\">Next &gt;<\/span><\/i><span style=\"font-weight: 400;\"> to move on to the File Search\/Mask Configurations page. There you can further define job attributes applicable only to certain file types, like PDFs or image formats; see the <\/span><b>Optional Search\/Mask Configurations <\/b><span style=\"font-weight: 400;\">that follow.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These attributes can be stored for reuse in a configuration registry. 
You can select from an existing DarkShield File Configuration option registry entry, or create a new one.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17272\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/DS-file-configuration-300x246.png\" alt=\"\" width=\"441\" height=\"361\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/DS-file-configuration-300x246.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/DS-file-configuration.png 540w\" sizes=\"(max-width: 441px) 100vw, 441px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">If you opt to create a <\/span><i><span style=\"font-weight: 400;\">New \u2026<\/span><\/i> <span style=\"font-weight: 400;\">entry, the file configuration option selection page will appear:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17273\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/optional-file-search-300x262.png\" alt=\"\" width=\"458\" height=\"400\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/optional-file-search-300x262.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/optional-file-search.png 501w\" sizes=\"(max-width: 458px) 100vw, 458px\" \/><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17274\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/pdf-config-page-291x300.png\" alt=\"\" width=\"446\" height=\"459\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/pdf-config-page-291x300.png 291w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/pdf-config-page.png 507w\" sizes=\"(max-width: 446px) 100vw, 446px\" \/><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17275\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-296x300.png\" alt=\"\" width=\"449\" 
height=\"455\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-296x300.png 296w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-70x70.png 70w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page.png 513w\" sizes=\"(max-width: 449px) 100vw, 449px\" \/><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17276\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-2-294x300.png\" alt=\"\" width=\"450\" height=\"459\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-2-294x300.png 294w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-2-70x70.png 70w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-config-page-2.png 513w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">On this page, select the types of file configuration options to specify, and enter a name for the DarkShield File Configuration registry entry. Once that\u2019s done, you can finally c<\/span><span style=\"font-weight: 400;\">lick <\/span><i><span style=\"font-weight: 400;\">Finish<\/span><\/i><span style=\"font-weight: 400;\"> to produce the .dsc file that is used by the DarkShield API.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the demonstration below, I will run a DarkShield Search and Mask Job on PDF and image files and show before and after examples. 
To run your DarkShield Search and Mask Job, right-click on the .dsc file and select<\/span><i><span style=\"font-weight: 400;\"> Run As &gt; IRI Search and Masking Job<\/span><\/i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17277\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/right-click-on-the-dsc-file-300x291.png\" alt=\"\" width=\"589\" height=\"572\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/right-click-on-the-dsc-file-300x291.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/right-click-on-the-dsc-file.png 674w\" sizes=\"(max-width: 589px) 100vw, 589px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">After running the job, PII found in the search phase gets immediately masked, and written into files with the same name within the data silo (target location) previously specified in the wizard.<\/span><span style=\"font-weight: 400;\"> \u00a0<\/span><\/p>\n<h5><b>Examples of DarkShield Search and Masking\u00a0<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Per my configuration options above, you will see that DarkShield applied a same-length pseudonym to the first name, format-preserving encryption to the phone number, and redacted the SSN where those data class values were found in both types of files below:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">PDF before DarkShield:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17278\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/PDF-before-DarkShield-300x286.png\" alt=\"\" width=\"550\" height=\"524\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/PDF-before-DarkShield-300x286.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/PDF-before-DarkShield.png 556w\" sizes=\"(max-width: 550px) 100vw, 550px\" \/><\/p>\n<p><span style=\"font-weight: 
400;\">PDF after DarkShield:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17279\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/pdf-after-darkshield-300x283.png\" alt=\"\" width=\"583\" height=\"550\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/pdf-after-darkshield-300x283.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/pdf-after-darkshield.png 742w\" sizes=\"(max-width: 583px) 100vw, 583px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Image before DarkShield:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17281\" style=\"text-align: start;\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/image-before-DS-300x135.png\" alt=\"\" width=\"591\" height=\"266\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-before-DS-300x135.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-before-DS-768x345.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-before-DS.png 780w\" sizes=\"(max-width: 591px) 100vw, 591px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Image after DarkShield:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17282\" src=\"\/blog\/wp-content\/uploads\/2020\/12\/image-after-DS-300x141.png\" alt=\"\" width=\"617\" height=\"290\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-after-DS-300x141.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-after-DS-768x362.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/image-after-DS.png 779w\" sizes=\"(max-width: 617px) 100vw, 617px\" \/><\/p>\n<p style=\"text-align: center;\"><strong>Check out our YouTube video here!<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Masking PII in PDF Files Using IRI DarkShield\" 
width=\"1140\" height=\"641\" src=\"https:\/\/www.youtube.com\/embed\/FZtZ7dV5U24?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Contact <\/span><a href=\"mailto:darkshield@iri.com\"><span style=\"font-weight: 400;\">darkshield@iri.com<\/span><\/a><span style=\"font-weight: 400;\"> or your <\/span><a href=\"https:\/\/www.iri.com\/partners\/resellers\"><span style=\"font-weight: 400;\">IRI representative<\/span><\/a><span style=\"font-weight: 400;\"> if you have any questions or need assistance using DarkShield.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>IRI DarkShield can search for, and mask, personally identifiable information (PII) and other sensitive data in many different file types, documents and databases on-premise or in the cloud. Among the file types supported are those in PDF and image formats, which are the focus of this article. 
Note that DarkShield can now also find and<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\" title=\"Masking PII in PDF &#038; Image Files\">Read More<\/a><\/div>\n","protected":false},"author":133,"featured_media":17284,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8,91,29],"tags":[1386,211,1493,1388,850,1748,615,1749,1492,1750],"class_list":["post-14170","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","category-iri-workbench","category-test-data","tag-darkshield","tag-data-privacy-laws","tag-image-masking","tag-iri-darkshield","tag-iri-workbench","tag-masking-pii-in-pdf-files","tag-pdf","tag-pdf-data-masking","tag-pdf-masking","tag-sensitive-data-protection"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Masking PII in PDF &amp; Image Files - IRI<\/title>\n<meta name=\"description\" content=\"Learn how to find and mask PII or other sensitive data in PDFs and image files to protect data privacy and help you comply with privacy laws.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Masking PII in PDF &amp; Image Files\" \/>\n<meta property=\"og:description\" content=\"Learn how to find and mask PII or other sensitive data in PDFs and image files to 
protect data privacy and help you comply with privacy laws.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-22T23:40:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-21T12:23:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png\" \/>\n\t<meta property=\"og:image:width\" content=\"768\" \/>\n\t<meta property=\"og:image:height\" content=\"368\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Edward Alvey and Adam Lewis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Edward Alvey and Adam Lewis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\"},\"author\":{\"name\":\"Edward Alvey and Adam Lewis\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e\"},\"headline\":\"Masking PII in PDF &#038; Image Files\",\"datePublished\":\"2020-12-22T23:40:23+00:00\",\"dateModified\":\"2025-04-21T12:23:54+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\"},\"wordCount\":2067,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png\",\"keywords\":[\"DarkShield\",\"data privacy laws\",\"image masking\",\"IRI DarkShield\",\"IRI Workbench\",\"masking PII in PDF files\",\"pdf\",\"PDF data masking\",\"PDF masking\",\"sensitive data protection\"],\"articleSection\":[\"Data Masking\/Protection\",\"IRI Workbench\",\"Test Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\",\"name\":\"Masking PII in PDF & Image Files - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png\",\"datePublished\":\"2020-12-22T23:40:23+00:00\",\"dateModified\":\"2025-04-21T12:23:54+00:00\",\"description\":\"Learn how to find and mask PII or other sensitive data in PDFs and image files to protect data privacy and help you comply with privacy laws.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png\",\"width\":768,\"height\":368},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Masking PII in PDF &#038; Image Files\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},[{\"@type\":[\"Person\"],\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e\",\"name\":\"Edward Alvey\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"inLanguage\":\"en_US\",\"url\":\"\",\"caption\":\"Edward Alvey\"}},{\"@type\":[\"Person\"],\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e\",\"name\":\"Adam Lewis\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"inLanguage\":\"en_US\",\"url\":\"\",\"caption\":\"Adam Lewis\"}}]]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Masking PII in PDF & Image Files - IRI","description":"Learn how to find and mask PII or other sensitive data in PDFs and image files to protect data privacy and help you comply with privacy laws.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/","og_locale":"en_US","og_type":"article","og_title":"Masking PII in PDF & Image Files","og_description":"Learn how to find and mask PII or other sensitive data in PDFs and image files to protect data privacy and help you comply with privacy laws.","og_url":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/","og_site_name":"IRI","article_published_time":"2020-12-22T23:40:23+00:00","article_modified_time":"2025-04-21T12:23:54+00:00","og_image":[{"width":768,"height":368,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","type":"image\/png"}],"author":"Edward Alvey and Adam Lewis","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Edward Alvey and Adam Lewis","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/"},"author":{"name":"Edward Alvey and Adam Lewis","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e"},"headline":"Masking PII in PDF &#038; Image Files","datePublished":"2020-12-22T23:40:23+00:00","dateModified":"2025-04-21T12:23:54+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/"},"wordCount":2067,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","keywords":["DarkShield","data privacy laws","image masking","IRI DarkShield","IRI Workbench","masking PII in PDF files","pdf","PDF data masking","PDF masking","sensitive data protection"],"articleSection":["Data Masking\/Protection","IRI Workbench","Test Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/","url":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/","name":"Masking PII in PDF & Image Files - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","datePublished":"2020-12-22T23:40:23+00:00","dateModified":"2025-04-21T12:23:54+00:00","description":"Learn how to find and mask PII or other sensitive data in PDFs and image files to protect data privacy and help you comply with privacy laws.","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","width":768,"height":368},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pdfs-and-images\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Masking PII in PDF &#038; Image Files"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},[{"@type":["Person"],"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e","name":"Edward Alvey","image":{"@type":"ImageObject","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","inLanguage":"en_US","url":"","caption":"Edward Alvey"}},{"@type":["Person"],"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/14207244600c66de0b4659b2817d710e","name":"Adam Lewis","image":{"@type":"ImageObject","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","inLanguage":"en_US","url":"","caption":"Adam 
Lewis"}}]]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/12\/Masking-PII-in-PDF-Image-Files-DS-featured-image.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/14170"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/133"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=14170"}],"version-history":[{"count":26,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/14170\/revisions"}],"predecessor-version":[{"id":18374,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/14170\/revisions\/18374"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/17284"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=14170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=14170"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=14170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}