{"id":13427,"date":"2020-01-09T15:24:16","date_gmt":"2020-01-09T20:24:16","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=13427"},"modified":"2026-02-23T17:18:01","modified_gmt":"2026-02-23T22:18:01","slug":"masking-pii-xml-json","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/","title":{"rendered":"Finding and Masking PII in XML and JSON Files Using Filters"},"content":{"rendered":"<p><em><strong>Editor\u2019s Note:\u00a0<\/strong>The content of this article has been superseded as of DarkShield Version 5. Please refer to <strong><a href=\"https:\/\/www.iri.com\/blog\/data-protection\/finding-and-masking-pii-in-files-with-the-darkshield-files-wizard\/\">this article instead<\/a> <\/strong>for the current methodology using data classification and location matchers for XML and JSON files. Note that in addition to the GUI approach described in that article, DarkShield also provides an <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/darkshield-files-rpc-api\/\">API for files<\/a> to integrate search\/mask operations into your application(s).<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Personally Identifiable Information (PII), such as names, Social Security numbers, and home addresses, is stored in multiple sources and silos, including semi-structured files in JSON and XML format. 
These formats use keys (JSON) or element names (XML) to identify data elements, and those identifiers can now be used to find and mask PII values in <\/span><a href=\"https:\/\/www.iri.com\/products\/darkshield\"><span style=\"font-weight: 400;\">IRI DarkShield<\/span><\/a><span style=\"font-weight: 400;\"> software.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note that <a href=\"https:\/\/www.iri.com\/products\/fieldshield\">IRI FieldShield<\/a> could already find and mask PII in <a href=\"https:\/\/www.iri.com\/solutions\/data-and-database-migration\/file-conversion\">structured<\/a> (flat) JSON and XML formats. <em>DarkShield<\/em>, however, handles <a href=\"https:\/\/www.iri.com\/solutions\/big-data\/unstructured-data\">more complex<\/a>, semi-structured documents, and a new method, path filters, can save more\u00a0time in the search process.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Specifically, this article discusses how to locate and remediate PII in semi-structured files by these key or element names. This method can be used <em>alone or in conjunction with<\/em> the other \u201csearch matchers\u201d supported in DarkShield, which include pattern matchers, value lookups, and Named Entity Recognition (NER) models. Path filters can provide a faster and more reliable way of finding PII in semi-structured files.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By way of example, suppose we have an XML file containing a list of invoices with <\/span><i><span style=\"font-weight: 400;\">Forename <\/span><\/i><span style=\"font-weight: 400;\">and <\/span><i><span style=\"font-weight: 400;\">Surname <\/span><\/i><span style=\"font-weight: 400;\">elements, as well as other PII hidden within free-flowing text elements that could be used to expose the customer\u2019s identity. 
We want to mask this PII wherever it appears within the <\/span><i><span style=\"font-weight: 400;\">Invoices<\/span><\/i><span style=\"font-weight: 400;\"> element, but retain the customer information elsewhere in the document.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">DarkShield supports this use case with <\/span><i><span style=\"font-weight: 400;\">Filters<\/span><\/i><span style=\"font-weight: 400;\">, which are file-type-specific objects attached to Search Matchers. For XML files, DarkShield supports <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/XPath\"><i><span style=\"font-weight: 400;\">XPath<\/span><\/i><\/a><span style=\"font-weight: 400;\">, a query language that navigates through the elements and attributes of an XML document and selects the nodes matching a given path. For JSON files, DarkShield can use <\/span><a href=\"https:\/\/goessner.net\/articles\/JsonPath\/\"><span style=\"font-weight: 400;\">JSON Paths<\/span><\/a><span style=\"font-weight: 400;\"> to filter on keys.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s look at an XML file containing PII in <\/span><a href=\"https:\/\/www.iri.com\/products\/workbench\"><span style=\"font-weight: 400;\">IRI Workbench<\/span><\/a><span style=\"font-weight: 400;\">, the Eclipse\u2122-based graphical IDE for DarkShield and other IRI products:<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13430 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1-1024x666.png\" alt=\"\" width=\"637\" height=\"414\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1-1024x666.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1-300x195.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1-768x500.png 768w, 
https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking1.png 1090w\" sizes=\"(max-width: 637px) 100vw, 637px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The left side shows the file in standard XML format, with customer information inside. On the right, the same data is displayed as a more readable outline. We can see that the Forename \u201cCarl\u201d, the Surname \u201cGustav\u201d, an address, and a telephone number are all exposed PII in this file.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this example, we will mask only people\u2019s names. Similar techniques can be used to mask other PII within XML documents.<\/span><\/p>\n<h4><strong>Forename and Surname Matcher<\/strong><\/h4>\n<p><span style=\"font-weight: 400;\">To search and mask this file, open the <\/span><i><span style=\"font-weight: 400;\">New Dark Data Discovery Job <\/span><\/i><span style=\"font-weight: 400;\">Wizard from the<\/span><i><span style=\"font-weight: 400;\"> Data Discovery<\/span><\/i><span style=\"font-weight: 400;\"> dropdown menu, then select the source of this file, the target folder for remediated results, and the metadata information to accompany the results. 
If you are unfamiliar with this process, please refer to <\/span><a href=\"https:\/\/www.iri.com\/blog\/migration\/data-migration\/unstructured-data-data-restructuring-wizard\/\"><span style=\"font-weight: 400;\">this<\/span><\/a><span style=\"font-weight: 400;\"> blog article for assistance.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To begin, we want to create a new Search Matcher to match on Forenames and Surnames found under the <\/span><i><span style=\"font-weight: 400;\">Invoices <\/span><\/i><span style=\"font-weight: 400;\">element.<span id='easy-footnote-1-13427' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#easy-footnote-bottom-1-13427' title='&lt;\/span&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;Previously, we had to create a Data Class with a RegEx pattern to match on text that follows the given elements, and attach it to the Search Matcher. Using a regular expression to match on the XML structure is more error-prone however, and does not allow us to exclude arbitrarily nested Forenames and Surnames found under other elements like &lt;\/span&gt;&lt;i&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;Customers&lt;\/span&gt;&lt;\/i&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;. 
Filters do not share these shortcomings and allow greater flexibility, since Data Classes no longer need to be tailored to a particular semi-structured file layout and can thus be reused in other file formats.&lt;\/span&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;'><sup>1<\/sup><\/a><\/span><\/span><span style=\"font-weight: 400;\">\u00a0Once we open the Search Matcher Details Dialog, we can start by adding a new XPath Filter by selecting <\/span><i><span style=\"font-weight: 400;\">Add <\/span><\/i><span style=\"font-weight: 400;\">under the <\/span><i><span style=\"font-weight: 400;\">Filters <\/span><\/i><span style=\"font-weight: 400;\">field:<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking2.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13431 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking2.png\" alt=\"\" width=\"525\" height=\"450\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking2.png 525w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking2-300x257.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking2-350x300.png 350w\" sizes=\"(max-width: 525px) 100vw, 525px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Set the Filter type to <\/span><i><span style=\"font-weight: 400;\">XML <\/span><\/i><span style=\"font-weight: 400;\">and enter the XPath query into the text box. This particular filter uses <\/span><i><span style=\"font-weight: 400;\">Recursive Descent <\/span><\/i><span style=\"font-weight: 400;\">(\u201c\/\/\u201d) to locate data without having to specify absolute paths.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Each \u201c\/\/\u201d stands in for an arbitrary sequence of nested elements; in our case, the filter matches the Forename element at \u201cCustomers\/Invoices\/Invoice\/InvoiceAddress\/Forename\u201d. 
In other words, \u201c\/\/Invoices\/\/Forename\u201d selects every Forename element nested at any depth within an Invoices element of the file.<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking3.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13432 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking3.png\" alt=\"\" width=\"526\" height=\"393\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking3.png 603w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking3-300x224.png 300w\" sizes=\"(max-width: 526px) 100vw, 526px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Every Search Matcher requires a Data Class to match portions of the text. We can create a new Data Class from the Search Matcher Details dialog by clicking <\/span><i><span style=\"font-weight: 400;\">Create <\/span><\/i><span style=\"font-weight: 400;\">in the <\/span><i><span style=\"font-weight: 400;\">Data Class Name <\/span><\/i><span style=\"font-weight: 400;\">field, or by <\/span><a href=\"https:\/\/www.iri.com\/blog\/iri\/iri-workbench\/data-class-validation-workbench\/\"><span style=\"font-weight: 400;\">selecting an existing Data Class<\/span><\/a><span style=\"font-weight: 400;\"> from our preferences with <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In our example, we will create a new Data Class that uses a RegEx pattern to match all characters. 
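<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a rough illustration of how a filter and a match-all Data Class divide the work, the sketch below uses Python\u2019s standard <code>xml.etree.ElementTree<\/code> module, whose limited XPath subset supports the same <code>\/\/<\/code> descendant steps. It is not DarkShield\u2019s implementation: the sample tree, the <code>FILTERS<\/code> list, and the helper functions are hypothetical, and ElementTree needs each leading <code>\/\/<\/code> written as <code>.\/\/<\/code> when searching from an element.<\/span><\/p>\n<pre><code class=\"language-python\">import re\nimport xml.etree.ElementTree as ET\n\n\ndef build_sample():\n    # Hypothetical tree: a customer name, repeated inside an invoice address.\n    root = ET.Element('Customers')\n    customer = ET.SubElement(root, 'Customer')\n    ET.SubElement(customer, 'Forename').text = 'Carl'\n    ET.SubElement(customer, 'Surname').text = 'Gustav'\n    invoices = ET.SubElement(customer, 'Invoices')\n    address = ET.SubElement(ET.SubElement(invoices, 'Invoice'), 'InvoiceAddress')\n    ET.SubElement(address, 'Forename').text = 'Carl'\n    ET.SubElement(address, 'Surname').text = 'Gustav'\n    return root\n\n\n# An element is selected if it matches at least one filter path.\nFILTERS = ['.\/\/Invoices\/\/Forename', '.\/\/Invoices\/\/Surname']\n\n# Stand-in for a Data Class whose RegEx pattern matches all characters.\nMATCH_ALL = re.compile('.+', re.DOTALL)\n\n\ndef find_matches(root):\n    # Returns (tag, matched text, start offset, end offset) for each match.\n    matches, seen = [], set()\n    for path in FILTERS:\n        for elem in root.findall(path):\n            if id(elem) in seen:\n                continue  # already selected by an earlier filter\n            seen.add(id(elem))\n            m = MATCH_ALL.fullmatch(elem.text or '')\n            if m:\n                matches.append((elem.tag, m.group(), m.start(), m.end()))\n    return matches\n\n\nprint(find_matches(build_sample()))\n# prints [('Forename', 'Carl', 0, 4), ('Surname', 'Gustav', 0, 6)]<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">Both filters skip the customer-level Forename and Surname elements; a broader path such as <code>.\/\/Forename<\/code> would select those as well.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">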
A match-all pattern is useful when a filtered element contains only the data to be masked (a name, in this case) and does not need to be searched further.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note that on file types other than XML, this Search Matcher would match all of the content, so make sure that only the XML file type is selected in the Source URI Dialog.<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking4.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13433 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking4.png\" alt=\"\" width=\"525\" height=\"522\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking4.png 525w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking4-150x150.png 150w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking4-300x298.png 300w\" sizes=\"(max-width: 525px) 100vw, 525px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">We also need to create a new Data Rule to mask our data. In the <\/span><i><span style=\"font-weight: 400;\">Data Rule<\/span><\/i><span style=\"font-weight: 400;\"> field, click <\/span><i><span style=\"font-weight: 400;\">Create<\/span><\/i><span style=\"font-weight: 400;\"> to open the <\/span><i><span style=\"font-weight: 400;\">Data Rule Wizard.\u00a0<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">This wizard provides a list of masking functions that can be applied to your search results. 
Identifying the kind of data to be masked will help you decide which masking rule is most suitable.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this case, we are searching for forenames and surnames, so the ideal rule would return ciphertext that still looks like a name while remaining a consistent, unique replacement that preserves referential integrity.<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking5.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13434 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking5.png\" alt=\"\" width=\"524\" height=\"397\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking5.png 766w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking5-300x227.png 300w\" sizes=\"(max-width: 524px) 100vw, 524px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Here we will use an alphanumeric Format-Preserving Encryption (FPE) function, which replaces the matched value with characters of the same kind. Letters and numbers are swapped for other letters and numbers in the same positions. 
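<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To make these format guarantees concrete, here is a deliberately simplified, hypothetical substitution in Python. It is <em>not<\/em> a real FPE cipher (such as NIST FF1) and is not secure; it only shows letters and digits being replaced in place by characters of the same kind, with the same input and key always producing the same output.<\/span><\/p>\n<pre><code class=\"language-python\">import hashlib\nimport hmac\nimport string\n\n# A character is only ever replaced by another member of its own class.\nCLASSES = (string.ascii_uppercase, string.ascii_lowercase, string.digits)\n\n\ndef toy_format_preserving(value, key):\n    # Toy only, NOT real FPE and NOT secure. The keystream depends only on the\n    # key and the value's length, so equal inputs give equal outputs, and each\n    # position is shifted within its class, so distinct inputs of the same\n    # length give distinct outputs.\n    stream = hmac.new(key, str(len(value)).encode(), hashlib.sha256).digest()\n    out = []\n    for i, ch in enumerate(value):\n        shift = stream[i % len(stream)]\n        for alphabet in CLASSES:\n            if ch in alphabet:\n                out.append(alphabet[(alphabet.index(ch) + shift) % len(alphabet)])\n                break\n        else:\n            out.append(ch)  # spaces, punctuation, etc. are kept as-is\n    return ''.join(out)\n\n\nkey = b'hypothetical-demo-key'\nmasked = toy_format_preserving('Carl Gustav', key)\nprint(masked)  # same length, capitalization pattern, and space as the input\nprint(masked == toy_format_preserving('Carl Gustav', key))  # prints True<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">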
The original length, capitalization, and non-alphanumeric characters are also retained in this anonymization scheme.\u00a0<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking6.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13435 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking6.png\" alt=\"\" width=\"524\" height=\"483\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking6.png 536w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking6-300x276.png 300w\" sizes=\"(max-width: 524px) 100vw, 524px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">To match Surnames within the Invoices element, we\u2019ll add another XPath filter to the list. A Search Matcher can have multiple filters, and an element is found by the Search Matcher as long as it matches at least one filter in the list. The screenshot above shows the final state of the Search Matcher we created for invoice names.<\/span><\/p>\n<h4><strong>Names in Free-Flowing Text<\/strong><\/h4>\n<p><span style=\"font-weight: 400;\">So far we have described how to match the entire content of the filtered elements, but we would also like our Search Matchers to search intelligently through free-flowing text embedded within certain elements. 
To do this, we will use a Named Entity Recognition (NER) matcher, which finds names from natural-language (contextual) clues in sentences<\/span><span style=\"font-weight: 400;\">.<span id='easy-footnote-2-13427' class='easy-footnote-margin-adjust'><\/span><span class='easy-footnote'><a href='https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#easy-footnote-bottom-2-13427' title='&lt;\/span&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;Details on NER matchers, along with how to train your own NER models, will be discussed in a future blog article.&lt;\/span&gt;&lt;span style=&quot;font-weight: 400;&quot;&gt;'><sup>2<\/sup><\/a><\/span><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since not all values in this XML file contain sentences, we want a Search Matcher that filters only the elements containing free-flowing text, and then uses an additional NER matcher to find the portions of that filtered, unstructured text that contain PII.<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking7.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13436 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking7.png\" alt=\"\" width=\"524\" height=\"483\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking7.png 536w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking7-300x276.png 300w\" sizes=\"(max-width: 524px) 100vw, 524px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">To that end, we can add another Search Matcher with its own set of filters, Data Classes, and Data Rules to find and mask PII. 
The screenshot above shows this second Search Matcher, which uses a Data Class loaded with an NER model.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The XML will be filtered for arbitrarily nested <\/span><i><span style=\"font-weight: 400;\">Text <\/span><\/i><span style=\"font-weight: 400;\">elements, which contain free-flowing sentences. We can also reuse the Data Rule from the previous Search Matcher.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After finishing the wizard, a <\/span><i><span style=\"font-weight: 400;\">.search <\/span><\/i><span style=\"font-weight: 400;\">file is generated. Right-click it and select <\/span><i><span style=\"font-weight: 400;\">Run As -&gt; IRI Search and Remediate Job <\/span><\/i><span style=\"font-weight: 400;\">to find matches and mask them with the FPE rule we defined.<\/span><\/p>\n<p><a href=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-13437 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking8.png\" alt=\"\" width=\"549\" height=\"268\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking8.png 549w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking8-300x146.png 300w\" sizes=\"(max-width: 549px) 100vw, 549px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Shown above are the search results, including the XML location of each match (its XPath and character offsets). Note that \u201cCarl Gustav\u201d was matched only within <\/span><i><span style=\"font-weight: 400;\">Invoices<\/span><\/i><span style=\"font-weight: 400;\">, not within the <\/span><i><span style=\"font-weight: 400;\">Customer <\/span><\/i><span style=\"font-weight: 400;\">element. 
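<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The same scoping applies to JSON files, where JSONPath expressions use <code>..<\/code> for recursive descent in place of XPath\u2019s <code>\/\/<\/code>. The Python sketch below assumes no JSONPath library: the <code>find_key<\/code> and <code>descend<\/code> helpers and the sample data are hypothetical, and only mimic how a filter such as <code>$..Invoices..Forename<\/code> would select the invoice name while skipping the customer-level one.<\/span><\/p>\n<pre><code class=\"language-python\">def find_key(node, key):\n    # Yield every value stored under key at any depth below node,\n    # roughly what the JSONPath step ..key selects.\n    if isinstance(node, dict):\n        for name, value in node.items():\n            if name == key:\n                yield value\n            yield from find_key(value, key)\n    elif isinstance(node, list):\n        for item in node:\n            yield from find_key(item, key)\n\n\ndef descend(doc, keys):\n    # One recursive-descent step per key: ['Invoices', 'Forename']\n    # corresponds to the hypothetical filter $..Invoices..Forename\n    nodes = [doc]\n    for key in keys:\n        nodes = [found for node in nodes for found in find_key(node, key)]\n    return nodes\n\n\n# Hypothetical parsed JSON (e.g. the result of json.load)\ncustomers = {\n    'Customers': [\n        {\n            'Forename': 'Carl',\n            'Surname': 'Gustav',\n            'Invoices': [\n                {'InvoiceAddress': {'Forename': 'Carl', 'Surname': 'Gustav'}},\n            ],\n        },\n    ],\n}\n\nprint(descend(customers, ['Invoices', 'Forename']))  # prints ['Carl']\nprint(descend(customers, ['Forename']))  # prints ['Carl', 'Carl']<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">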
The search results also show that \u201cCharles Habsburg\u201d was found in the <\/span><i><span style=\"font-weight: 400;\">Text <\/span><\/i><span style=\"font-weight: 400;\">element using NER.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13438 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9-1024x365.png\" alt=\"\" width=\"881\" height=\"314\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9-1024x365.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9-300x107.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9-768x274.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png 1200w\" sizes=\"(max-width: 881px) 100vw, 881px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">A snippet of the remediated results on the left, compared with the original data on the right.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-13439 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10-1024x195.png\" alt=\"\" width=\"882\" height=\"168\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10-1024x195.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10-300x57.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10-768x146.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking10.png 1346w\" sizes=\"(max-width: 882px) 100vw, 882px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">In the first screenshot above, the original XML file on the right side displays the name \u201cCarl Gustav\u201d. 
On the left side, the remediated file shows the name masked with Format-Preserving Encryption. Note that the ciphertext is the same in both cases, preserving referential integrity.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the second screenshot, the NER model locates the name \u201cCharles Habsburg\u201d. This kind of model works best on documents or transcripts, because it uses Natural Language Processing (NLP) to find names from the context of sentences.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you need help using DarkShield to find and mask PII in semi-structured or unstructured text sources in XML, JSON, or any other file, document, or image format, just ask your <\/span><a href=\"https:\/\/www.iri.com\/partners\/resellers\"><span style=\"font-weight: 400;\">local<\/span><\/a><span style=\"font-weight: 400;\"> IRI representative.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Editor\u2019s Note:\u00a0The content of this article has been superseded as of DarkShield Version 5. Please refer to this article instead for the current methodology using data classification and location matchers for XML and JSON files. 
Note that in addition to the GUI approach described in that article, DarkShield also provides an API for files to<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\" title=\"Finding and Masking PII in XML and JSON Files Using Filters\">Read More<\/a><\/div>\n","protected":false},"author":126,"featured_media":13438,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8,91,2255],"tags":[1386,1304,14,1462,1388,850,1104,1459,1463,1461,149,1306,1457,1432,1458,550,1460],"class_list":["post-13427","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","category-iri-workbench","category-archived-articles","tag-darkshield","tag-data-class","tag-data-masking","tag-finding-pii-in-json","tag-iri-darkshield","tag-iri-workbench","tag-json","tag-json-data-masking","tag-masking-pii-in-json-files","tag-masking-pii-in-xml","tag-pii","tag-pii-masking","tag-semi-structured","tag-unstructured-data-masking","tag-unstructured-files","tag-xml","tag-xml-data-masking"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Finding and Masking PII in XML and JSON Files Using Filters - IRI<\/title>\n<meta name=\"description\" content=\"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. 
are stored in multiple sources and silos, including semi\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Finding and Masking PII in XML and JSON Files Using Filters\" \/>\n<meta property=\"og:description\" content=\"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. are stored in multiple sources and silos, including semi\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2020-01-09T20:24:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-23T22:18:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"428\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Cody Cremeans\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Cody Cremeans\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\"},\"author\":{\"name\":\"Cody Cremeans\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/a2a7f6972de861e2605adf3ad662dbc9\"},\"headline\":\"Finding and Masking PII in XML and JSON Files Using Filters\",\"datePublished\":\"2020-01-09T20:24:16+00:00\",\"dateModified\":\"2026-02-23T22:18:01+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\"},\"wordCount\":1502,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\",\"keywords\":[\"DarkShield\",\"data class\",\"data masking\",\"finding PII in JSON\",\"IRI DarkShield\",\"IRI Workbench\",\"JSON\",\"JSON data masking\",\"masking PII in JSON files\",\"masking PII in XML\",\"PII\",\"pii masking\",\"semi-structured\",\"unstructured data masking\",\"unstructured files\",\"xml\",\"XML data masking\"],\"articleSection\":[\"Data Masking\/Protection\",\"IRI Workbench\",\"Archived Articles\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\",\"name\":\"Finding and Masking PII in XML and JSON Files Using 
Filters - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\",\"datePublished\":\"2020-01-09T20:24:16+00:00\",\"dateModified\":\"2026-02-23T22:18:01+00:00\",\"description\":\"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. are stored in multiple sources and silos, including semi\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png\",\"width\":1200,\"height\":428},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Finding and Masking PII in XML and JSON Files Using Filters\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/a2a7f6972de861e2605adf3ad662dbc9\",\"name\":\"Cody Cremeans\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/576914b298da75553752650a8087f764?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/576914b298da75553752650a8087f764?s=96&d=blank&r=g\",\"caption\":\"Cody Cremeans\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/codyc\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Finding and Masking PII in XML and JSON Files Using Filters - IRI","description":"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. 
are stored in multiple sources and silos, including semi","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/","og_locale":"en_US","og_type":"article","og_title":"Finding and Masking PII in XML and JSON Files Using Filters","og_description":"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. are stored in multiple sources and silos, including semi","og_url":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/","og_site_name":"IRI","article_published_time":"2020-01-09T20:24:16+00:00","article_modified_time":"2026-02-23T22:18:01+00:00","og_image":[{"width":1200,"height":428,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","type":"image\/png"}],"author":"Cody Cremeans","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Cody Cremeans","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/"},"author":{"name":"Cody Cremeans","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/a2a7f6972de861e2605adf3ad662dbc9"},"headline":"Finding and Masking PII in XML and JSON Files Using Filters","datePublished":"2020-01-09T20:24:16+00:00","dateModified":"2026-02-23T22:18:01+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/"},"wordCount":1502,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","keywords":["DarkShield","data class","data masking","finding PII in JSON","IRI DarkShield","IRI Workbench","JSON","JSON data masking","masking PII in JSON files","masking PII in XML","PII","pii masking","semi-structured","unstructured data masking","unstructured files","xml","XML data masking"],"articleSection":["Data Masking\/Protection","IRI Workbench","Archived Articles"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/","url":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/","name":"Finding and Masking PII in XML and JSON Files Using Filters - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","datePublished":"2020-01-09T20:24:16+00:00","dateModified":"2026-02-23T22:18:01+00:00","description":"Personally Identifiable Information (PII) like names, Social Security numbers, home addresses, etc. are stored in multiple sources and silos, including semi","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","width":1200,"height":428},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-protection\/masking-pii-xml-json\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Finding and Masking PII in XML and JSON Files Using Filters"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/a2a7f6972de861e2605adf3ad662dbc9","name":"Cody Cremeans","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/576914b298da75553752650a8087f764?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/576914b298da75553752650a8087f764?s=96&d=blank&r=g","caption":"Cody 
Cremeans"},"url":"https:\/\/www.iri.com\/blog\/author\/codyc\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2020\/01\/xml-json-masking9.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/13427"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/126"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=13427"}],"version-history":[{"count":14,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/13427\/revisions"}],"predecessor-version":[{"id":18994,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/13427\/revisions\/18994"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/13438"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=13427"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=13427"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=13427"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}