{"id":15590,"date":"2022-02-04T15:02:17","date_gmt":"2022-02-04T20:02:17","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=15590"},"modified":"2024-11-07T06:37:13","modified_gmt":"2024-11-07T11:37:13","slug":"generating-test-data-in-pdf-and-images","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/","title":{"rendered":"Generating Test Data in PDF and Image Files"},"content":{"rendered":"<p><a href=\"https:\/\/www.iri.com\/products\/darkshield\">IRI DarkShield<\/a> has always had the ability to search through and mask Personally Identifiable Information (PII) and other sensitive data in unstructured sources like PDF documents and image files. Now DarkShield can also work with <a href=\"https:\/\/www.iri.com\/products\/rowgen\">IRI RowGen<\/a> to generate and insert <a href=\"https:\/\/www.iri.com\/solutions\/test-data\">test data<\/a> in those formats, too, on-premise or in the cloud.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15621 alignright\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-document-lifecycle.png\" alt=\"\" width=\"218\" height=\"219\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-document-lifecycle.png 370w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-document-lifecycle-150x150.png 150w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-document-lifecycle-300x300.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-document-lifecycle-70x70.png 70w\" sizes=\"(max-width: 218px) 100vw, 218px\" \/>This capability enables developers and testers of applications that process or otherwise manage these sources of data to work with realistic, but safe\/artificial samples in DevOps, etc. 
It does not enable fraudulent document or image creation, because the test values differ materially in background (appearance) and nature (randomization), per the samples shown.<\/p>\n<p>This article explains the new functionality in the <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/darkshield-files-rpc-api\">DarkShield-Files API<\/a> that enables this capability, and provides some examples of creating test data in PDFs and images.<\/p>\n<h6><b>Generating Test Data in PDFs<\/b><\/h6>\n<p>In PDFs, it has always been possible to replace values with a consistent or random pseudonym from a <a href=\"https:\/\/www.iri.com\/blog\/test-data\/all-about-iri-set-files-a-primer\/\">set file<\/a> (dictionary lookup file) of test values. Now, however, form fields in PDFs can be populated from scratch with set-file data values, too.<\/p>\n<p>Multi-column set files can be used as well, and any references between values within a row will be kept. This means that if a <a href=\"https:\/\/www.iri.com\/blog\/data-transformation2\/drawing-values-from-set-files\/\">multi-column set file<\/a> with related values such as city, state, and zip code is used, those relationships are maintained in a realistic fashion.<\/p>\n<p>A form field is referenced by the field\u2019s name, which can be viewed through an application like Adobe Acrobat Reader, or listed with a new utility application called <i>pdflist<\/i> provided in the <i>bin <\/i>folder of a <i>plankton <\/i>distribution starting with version 1.4.0.<\/p>\n<p>Form fields can be filled out in a similar way to user-specified bounding boxes in images. In a PDF, the field name defines where to put the data, much as the coordinates of a bounding box do in an image.<\/p>\n<p>The field can either have its value searched and replaced based on masking rules, or be filled with a random value pulled from a set file. 
If specifying fields to be filled with different columns from the same set file, relationships are preserved (i.e. a set file with zip codes, cities, and states in separate columns will select the correct city and state that goes with a zip code).<\/p>\n<p>I have set up a file mask context with several configuration options corresponding to the fields I want to generate test data for, and what type of data I want to draw from for each field.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15602 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-file-mask-context-pdf-generation.png\" alt=\"\" width=\"650\" height=\"328\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-file-mask-context-pdf-generation.png 989w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-file-mask-context-pdf-generation-300x151.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-file-mask-context-pdf-generation-768x387.png 768w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/p>\n<p><i>setReplacement <\/i>\u00a0specifies a URL containing a tab-delimited file called a \u2018set\u2019 file. <i>setReplacementColumns<\/i> specifies the column to take from each set file (starting from index 0). The default behavior is to take the first column if this option is not set.<\/p>\n<p><i>setReplacementFields<\/i> specifies the name of each field in a PDF form to produce data from the file specified in <i>setReplacement <\/i>at the same index. 
<i>onTextOverflow<\/i> being set to <i>replace<\/i> allows for text in a PDF to be replaced with text of longer length, as can be the case in pseudonymization.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15684 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-original-1.png\" alt=\"\" width=\"833\" height=\"885\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-original-1.png 833w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-original-1-282x300.png 282w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-original-1-768x816.png 768w\" sizes=\"(max-width: 833px) 100vw, 833px\" \/><i>Original PDF form<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15685 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-gen-1.png\" alt=\"\" width=\"829\" height=\"879\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-gen-1.png 829w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-gen-1-283x300.png 283w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-td-form-gen-1-768x814.png 768w\" sizes=\"(max-width: 829px) 100vw, 829px\" \/><i>Resulting PDF returned from the DarkShield API<\/i><\/p>\n<p>Employee name and address, employee phone number, employee social security number, address, routing number, and account number have been freshly generated. 
Note that the form fields were populated from scratch, and the fixed address was replaced with another address from a set file (shown highlighted).<\/p>\n<h6><b>Generating Test Data in Images<\/b><\/h6>\n<p>As for images, text can be generated either in place of existing text that was found through OCR and deemed sensitive by the configured search matchers, or directly into a user-specified bounding box region. This new functionality for generating test images can also attempt to match the background color, based on the most common RGB values in the bounding box region.<\/p>\n<p>Background color is specified as a file configuration option in a file mask context, which also includes other new options such as <i>UseOCR<\/i>, <i>setReplacement<\/i>, <i>setReplacementColumns<\/i>, and <i>maskingMethod<\/i>. <i>UseOCR<\/i> can be set to false to greatly improve performance when only user-specified bounding boxes are used. On the other hand, <i>if <\/i>using OCR, <i>maskingMethod<\/i> specifies what should be done with text that was found and matched as sensitive.<\/p>\n<p>Setting it to <i>replacement<\/i> allows text to be replaced based on the masking rule associated with the search matcher. The replacement text is inserted into the original text area as generated text. If <i>replacement<\/i> is NOT specified, the default (and previously the only) behavior is to redact the original text with a black box.<\/p>\n<p><i>setReplacement <\/i>specifies the set files to be used for each user-specified bounding box to generate text in an image. <i>setReplacementColumns<\/i> specifies the column of the set file to pull from, and should be in the same order as the <i>setReplacement<\/i> and <i>boundingBoxes<\/i> parameters.<\/p>\n<p>To get the coordinates to use for a user-specified bounding box, it is currently easiest to use the bounding box search matcher in IRI Workbench. 
This allows a user to select an image, view it, and draw the bounding box.<\/p>\n<p>The coordinates are then output as the details of the search matcher. These coordinates can be copied and used to specify bounding boxes to the API through the file mask configuration. Multiple bounding boxes can be specified, and there is no restriction on the number of bounding boxes that can be specified.<\/p>\n<h6><b>Test Data in Check Images<\/b><\/h6>\n<p>The demo, available on <a href=\"https:\/\/github.com\/TeamIRI\/darkshield-api-demos\/tree\/master\/pdf-image\/check\">GitHub<\/a>, demonstrates generating check images, replacing the information in the top left corner (name, address, city, state, and zip) with test data. The routing number and account number have also been generated on top of the base image.<\/p>\n<p>Additional bounding box redactions could have been specified for other areas of the check, like the payee, bank logo, and account holder\u2019s signature.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15679 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-samplecheck-1.gif\" alt=\"\" width=\"583\" height=\"253\" \/><i>Original check image #1<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15680 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-0-samplecheck-2.gif\" alt=\"\" width=\"583\" height=\"253\" \/><i>Test data image #1 &#8211; MICR font used for test numbers<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15607 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-check2.jpg\" alt=\"\" width=\"649\" height=\"296\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-check2.jpg 728w, 
https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-check2-300x137.jpg 300w\" sizes=\"(max-width: 649px) 100vw, 649px\" \/><i>Original check image #2<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15609 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-2-check2.jpg\" alt=\"\" width=\"649\" height=\"296\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-2-check2.jpg 728w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-2-check2-300x137.jpg 300w\" sizes=\"(max-width: 649px) 100vw, 649px\" \/><i>Test data image #2 &#8211; Different backgrounds result in bounding box color differences, too.<\/i><\/p>\n<h6><b>Test Data in a Driver\u2019s License\u00a0<\/b><\/h6>\n<p>In this example, I defined some bounding boxes to redact the two locations where the face is shown in the image. In addition, I set the masking method to replacement, which will replace text that is found in an image and matched with a search matcher.<\/p>\n<p>The search matcher matches some common names and cities that I have defined in a set file. The masking rule is to pseudonymize the original value with a <a href=\"https:\/\/www.iri.com\/blog\/data-transformation2\/drawing-values-from-set-files\/\">replacement value from the same two-column set file<\/a>.<\/p>\n<p>Names and cities are consistently pseudonymized to an alternate value based on the mappings in the set file. 
In this case Harrisburg was replaced with Pittsburgh because it was another city in PA.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15611 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-dl-150-dpi.jpg\" alt=\"\" width=\"506\" height=\"319\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-dl-150-dpi.jpg 506w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-dl-150-dpi-300x189.jpg 300w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><i>Original image<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15610 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-0-dl-150-dpi.jpg\" alt=\"\" width=\"506\" height=\"319\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-0-dl-150-dpi.jpg 506w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-0-dl-150-dpi-300x189.jpg 300w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><i>Resulting image from DarkShield Files API with selected new data and redacted photos<\/i><\/p>\n<h6><b>Test Data in a Credit Card Image<\/b><b><br \/>\n<\/b><\/h6>\n<p>In this example, the numbers in an image of a credit card are replaced with synthetic numbers using an OCR-A font that credit card numbers often utilize.<\/p>\n<p>These numbers are taken from a set file that can be produced with IRI RowGen using the <i>ccn_gen<\/i> function. That function takes optional arguments of a specific credit card type (or all major types) and a separator between each group of numbers.<\/p>\n<p>See the RowGen script below that I used to generate the set file of synthetic credit card numbers. 
It is generating VISA credit card numbers with a space in between each group of numbers:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15613 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-visa-card-numbers-script.png\" alt=\"\" width=\"593\" height=\"226\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-visa-card-numbers-script.png 593w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-visa-card-numbers-script-300x114.png 300w\" sizes=\"(max-width: 593px) 100vw, 593px\" \/><\/p>\n<p>Here are the original and synthetic credit card images, with no change to the name or expiration date:<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15616 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-cc-sample.png\" alt=\"\" width=\"300\" height=\"189\" \/><i>Original image<br \/>\n<\/i><br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15615 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-gen-cc-sample.png\" alt=\"\" width=\"300\" height=\"189\" \/><i>Synthesized image<\/i><\/p>\n<p>Note that invalid card numbers can be created as well, and it is possible in glue code to imprint a watermark to remind viewers as to the artificial, test nature of the image.<\/p>\n<p><b>Mass Test Image Creation<\/b><\/p>\n<p>Generating test images in bulk uses a combination of IRI DarkShield, IRI RowGen, and calls to the DarkShield API through glue code. 
RowGen can synthesize set files to use for generating test data in images and PDFs with DarkShield.<\/p>\n<p>I used RowGen to merge existing data from set files that ship with IRI Workbench and some names that I had extracted from the most common baby names in the United States to create a single, multi-column set file with first name, address, city, state, zip, and phone number.<\/p>\n<p>Since the data I had obtained for common last names was in all upper case, I also cleansed it by using the <i>toproper<\/i> function, which puts a name into \u2018proper case\u2019 to make it more realistic for insertion into an image. This involves making the first letter of a name uppercase and the rest of the letters lowercase, except for certain cases like \u2018McDonald\u2019.<\/p>\n<p>Here is the script I created in the <a href=\"https:\/\/www.iri.com\/products\/workbench\/rowgen-gui\">IRI Workbench GUI for RowGen<\/a> to generate that test file with 1000 rows and 7 columns.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15612 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-test-data-script.png\" alt=\"\" width=\"574\" height=\"654\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-test-data-script.png 574w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-test-data-script-263x300.png 263w\" sizes=\"(max-width: 574px) 100vw, 574px\" \/><\/p>\n<p>I also generated set files for account number and routing number with RowGen by simply generating a field with the digit data type and a realistic size (number of digits).<\/p>\n<p>These set files are referenced in the DarkShield API by specifying configuration options in a file mask context. 
The set file URL can be either a local file URL or an Internet URL.<\/p>\n<p>Here is an example of Python glue code used to set up contexts to the DarkShield API to use in generating test data for check images. OCR is being disabled as a configuration option in the file search context to greatly improve speed, since the data being dropped into the image is at a consistent location.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15618 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-image-text-replacement-contexts-setup-screenshot.png\" alt=\"\" width=\"600\" height=\"575\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-image-text-replacement-contexts-setup-screenshot.png 926w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-image-text-replacement-contexts-setup-screenshot-300x288.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-image-text-replacement-contexts-setup-screenshot-768x736.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p>Also shown in this image is the ability to specify a custom font with the <i>customFont <\/i>image masking configuration option. The default font for replacing text in images is Times New Roman if no custom font is specified.<\/p>\n<p>A custom font can also be loaded from a file by specifying the path to the file in the <i>customFontFile <\/i>image masking config option. Multiple custom fonts and font files may be specified in a single context.<\/p>\n<p>That was a snippet of setting up variables for file search and mask contexts in Python to send to the DarkShield API. The following image displays Python glue code to synthesize multiple check images in bulk from existing images.<\/p>\n<p>Ten copies of each type of check are generated with different synthetic (test) values in each, and the glue code could be easily modified to generate many more copies. 
The image below just shows a subset of the 20 test checks generated in the code above:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-15681 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\" alt=\"\" width=\"1024\" height=\"576\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1-300x169.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1-768x432.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<p>The files are output to the local file system, but glue code offers the flexibility to output to any type of destination. The same file contexts are used for both types of check images since the relative locations of the data in the image are similar.<\/p>\n<figure id=\"attachment_15617\" class=\"thumbnail wp-caption aligncenter\" style=\"width: 610px\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15617\" src=\"\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-simple-image-generation.png\" alt=\"\" width=\"600\" height=\"588\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-simple-image-generation.png 811w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-simple-image-generation-300x294.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-simple-image-generation-768x753.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-simple-image-generation-70x70.png 70w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><figcaption class=\"caption wp-caption-text\">Python glue code example to set up contexts shown in the previous image, send two different check types ten times to the 
DarkShield API for synthetic data generation, and tear down the contexts when the program terminates.<\/figcaption><\/figure>\n<p>This example provides a template for generating test images in bulk, and can be modified for specific images and scenarios.<\/p>\n<p>If you have any questions or comments about generating safe test data for <i>unstructured <\/i>data sources, contact <a href=\"mailto:darkshield@iri.com\">darkshield@iri.com<\/a>. IRI also offers the RowGen product for generating <i>structured <\/i>test data in flat and EDI files, Excel sheets, ASN.1-compatible CDRs, and structurally and referentially correct RDB schema. Contact <a href=\"mailto:rowgen@iri.com\">rowgen@iri.com<\/a> with any questions about generating or managing test data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>IRI DarkShield has always had the ability to search through and mask Personally Identifiable Information (PII) and other sensitive data in unstructured sources like PDF documents and image files. Now DarkShield can also work with IRI RowGen to generate and insert test data in those formats, too, on-premise or in the cloud. 
This capability enables<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\" title=\"Generating Test Data in PDF and Image Files\">Read More<\/a><\/div>\n","protected":false},"author":119,"featured_media":15681,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8,34,29],"tags":[1494,14,1493,1388,526,1624,1492,1666,1667,88,1662,1663,1664,1665],"class_list":["post-15590","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","category-business","category-test-data","tag-darkshield-api","tag-data-masking","tag-image-masking","tag-iri-darkshield","tag-iri-rowgen","tag-ocr","tag-pdf-masking","tag-synthetic-images","tag-synthetic-pdfs","tag-test-data-2","tag-test-data-images","tag-test-data-pdf","tag-test-images","tag-test-pdf-documents"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Generating Test Data in PDF and Image Files - IRI<\/title>\n<meta name=\"description\" content=\"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Generating Test Data in PDF and Image Files\" \/>\n<meta property=\"og:description\" 
content=\"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2022-02-04T20:02:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-07T11:37:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"576\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Devon Kozenieski\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Devon Kozenieski\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\"},\"author\":{\"name\":\"Devon Kozenieski\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/de972c035aaeecfc40a3ae2ea5ff7ba1\"},\"headline\":\"Generating Test Data in PDF and Image Files\",\"datePublished\":\"2022-02-04T20:02:17+00:00\",\"dateModified\":\"2024-11-07T11:37:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\"},\"wordCount\":1957,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\",\"keywords\":[\"Darkshield API\",\"data masking\",\"image masking\",\"IRI DarkShield\",\"IRI RowGen\",\"OCR\",\"PDF masking\",\"synthetic images\",\"synthetic PDFs\",\"test data\",\"test data images\",\"test data pdf\",\"test images\",\"test PDF documents\"],\"articleSection\":[\"Data Masking\/Protection\",\"IRI Business\",\"Test Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\",\"url\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\",\"name\":\"Generating Test Data in PDF and Image Files - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\",\"datePublished\":\"2022-02-04T20:02:17+00:00\",\"dateModified\":\"2024-11-07T11:37:13+00:00\",\"description\":\"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png\",\"width\":1024,\"height\":576},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Generating Test Data in PDF and Image Files\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/de972c035aaeecfc40a3ae2ea5ff7ba1\",\"name\":\"Devon Kozenieski\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e4c421588c1a85dd9a76146fe15528f7?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e4c421588c1a85dd9a76146fe15528f7?s=96&d=blank&r=g\",\"caption\":\"Devon Kozenieski\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/devonk\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Generating Test Data in PDF and Image Files - IRI","description":"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/","og_locale":"en_US","og_type":"article","og_title":"Generating Test Data in PDF and Image Files","og_description":"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.","og_url":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/","og_site_name":"IRI","article_published_time":"2022-02-04T20:02:17+00:00","article_modified_time":"2024-11-07T11:37:13+00:00","og_image":[{"width":1024,"height":576,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","type":"image\/png"}],"author":"Devon Kozenieski","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Devon Kozenieski","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/"},"author":{"name":"Devon Kozenieski","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/de972c035aaeecfc40a3ae2ea5ff7ba1"},"headline":"Generating Test Data in PDF and Image Files","datePublished":"2022-02-04T20:02:17+00:00","dateModified":"2024-11-07T11:37:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/"},"wordCount":1957,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","keywords":["Darkshield API","data masking","image masking","IRI DarkShield","IRI RowGen","OCR","PDF masking","synthetic images","synthetic PDFs","test data","test data images","test data pdf","test images","test PDF documents"],"articleSection":["Data Masking\/Protection","IRI Business","Test Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/","url":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/","name":"Generating Test Data in PDF and Image Files - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","datePublished":"2022-02-04T20:02:17+00:00","dateModified":"2024-11-07T11:37:13+00:00","description":"Learn how to generate and insert test data values into PDF documents and image files for safe document management and application testing.","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","width":1024,"height":576},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/test-data\/generating-test-data-in-pdf-and-images\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Generating Test Data in PDF and Image Files"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/de972c035aaeecfc40a3ae2ea5ff7ba1","name":"Devon Kozenieski","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e4c421588c1a85dd9a76146fe15528f7?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e4c421588c1a85dd9a76146fe15528f7?s=96&d=blank&r=g","caption":"Devon 
Kozenieski"},"url":"https:\/\/www.iri.com\/blog\/author\/devonk\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2022\/02\/test-data-pdf-images-checks-1024x576-1.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15590"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/119"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=15590"}],"version-history":[{"count":25,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15590\/revisions"}],"predecessor-version":[{"id":18090,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15590\/revisions\/18090"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/15681"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=15590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=15590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=15590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}