{"id":16711,"date":"2023-12-19T12:39:53","date_gmt":"2023-12-19T17:39:53","guid":{"rendered":"https:\/\/www.iri.com\/blog\/?p=16711"},"modified":"2025-06-18T14:55:54","modified_gmt":"2025-06-18T18:55:54","slug":"data-matchers","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/","title":{"rendered":"Finding PII Using Data Matchers"},"content":{"rendered":"<p><i><span style=\"font-weight: 400;\">As we\u2019ve learned from <\/span><\/i><a href=\"https:\/\/www.iri.com\/blog\/data-protection\/iri-data-classification\/\"><i><span style=\"font-weight: 400;\">this article<\/span><\/i><\/a><i><span style=\"font-weight: 400;\"> on Data Classification in IRI Workbench (as of <a href=\"https:\/\/www.iri.com\/products\/darkshield\">DarkShield V5<\/a>), the types of PII, or classes of sensitive data, you define should be associated with one or more Search Matchers used during data discovery to find those values accurately. This article covers the use of Data Matchers, which examine the contents of data itself at search time.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">Currently, Search Matchers can be divided into two sub-categories: <\/span><b>Location <\/b><span style=\"font-weight: 400;\">Matchers and <\/span><b>Data <\/b><span style=\"font-weight: 400;\">Matchers. <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/location-matchers\/\">Location Matchers<\/a> apply strictly to structured and semi-structured data and use the <\/span><i><span style=\"font-weight: 400;\">structure <\/span><\/i><span style=\"font-weight: 400;\">(metadata) <\/span><span style=\"font-weight: 400;\">of data sources to locate and classify data. 
Data matchers, on the other hand, directly inspect the <\/span><i><span style=\"font-weight: 400;\">content <\/span><\/i><span style=\"font-weight: 400;\">of data to determine if values match the specified search attributes of the data class.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Unlike Location Matchers, Data Matchers can be used for matching against structured, semi-structured, and unstructured data. Data Matchers are very useful when PII can be found in free-floating text. This includes but is not limited to, free text, Word documents, PDFs, images, and PowerPoint sources, either in standalone files or embedded in database collections.<\/span><\/p>\n<h5><b>The Different Types of Data Matchers<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">IRI Workbench supports six types of Data Matchers:\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Data Pattern Matcher\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Set File Matcher<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Fuzzy Matcher (DarkShield Only)<\/span><\/li>\n<li>NER Matchers\n<ul>\n<li aria-level=\"1\"><span style=\"font-weight: 400;\">OpenNLP Matcher (DarkShield Only)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">PyTorch Matcher (DarkShield Only)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">TensorFlow Matcher (DarkShield Only)<\/span><\/li>\n<\/ul>\n<\/li>\n<li>Signature Detection<\/li>\n<li>Audio<\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">This article covers the first four; additional articles on signatures and audio are pending.<\/span><\/p>\n<h5><b>Data Pattern Matcher<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">The Data Pattern Matcher is one of IRI\u2019s most commonly used Data Matchers. 
Using a Java Regular Expression (RegEx) pattern, this matcher looks for a string matching particular formatting attributes. For example, a RegEx pattern for emails will look for the special character <\/span><i><span style=\"font-weight: 400;\">@ <\/span><\/i><span style=\"font-weight: 400;\">between words and check for a dot ( . ) followed by more letters (and possibly more dots) meant to represent an email domain.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the IRI data class rules library (.dcrlib) form editor\u2019s Data Matchers wizard page a Data Pattern Matcher can take three parameters. The first parameter is the RegEx pattern used to perform matching. A user can decide to either create their own pattern or use a pattern from the list of default patterns.\u00a0<\/span><\/p>\n<p style=\"text-align: left;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16717 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-300x300.png\" alt=\"\" width=\"464\" height=\"464\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-300x300.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-150x150.png 150w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-768x768.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-1536x1536.png 1536w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/data-matcher-description-70x70.png 70w\" sizes=\"(max-width: 464px) 100vw, 464px\" \/><br \/>\n<span style=\"font-weight: 400;\">By default IRI ships several different patterns with its <\/span><a href=\"https:\/\/www.iri.com\/products\/workbench\"><span style=\"font-weight: 400;\">Workbench<\/span><\/a><span style=\"font-weight: 400;\">\u00a0 and we frequently update that list. 
To view and\/or select from the list of preloaded patterns click the button <\/span><i><span style=\"font-weight: 400;\">Browse\u2026<\/span><\/i><span style=\"font-weight: 400;\"> next to the field called Pattern.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A dialog page will open, allowing the user to choose from a list of regex library files sorted by locality.<\/span><\/p>\n<p style=\"text-align: left;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16718 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/regex_loc-300x149.png\" alt=\"\" width=\"632\" height=\"314\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/regex_loc-300x149.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/regex_loc.png 740w\" sizes=\"(max-width: 632px) 100vw, 632px\" \/><br \/>\n<span style=\"font-weight: 400;\">Once a selection is made click OK to continue to the Common Patterns wizard page that will display the current list of patterns in the Workbench pattern library from the previously selected library file.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16719\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/pattern-library-291x300.png\" alt=\"\" width=\"468\" height=\"482\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/pattern-library-291x300.png 291w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/pattern-library.png 598w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0On the Pattern Library wizard page, we can select, add, edit, remove, import, or export regex patterns. 
After finding a pattern of interest, select the pattern and click <\/span><i><span style=\"font-weight: 400;\">OK <\/span><\/i><span style=\"font-weight: 400;\">to return to the initial Data Matchers page with the selected regex pattern loaded.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another option, aside from using a stored pattern, is to create and add a new pattern. To do so, click the <\/span><i><span style=\"font-weight: 400;\">Create\u2026<\/span><\/i><span style=\"font-weight: 400;\"> button next to the Pattern field. This opens the Pattern Editor wizard page, which is used to create a new pattern or edit previously created patterns.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16720\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard-300x290.png\" alt=\"\" width=\"486\" height=\"470\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard-300x290.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard-1024x988.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard-768x741.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard-1536x1482.png 1536w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Pattern-Editor-wizard.png 1061w\" sizes=\"(max-width: 486px) 100vw, 486px\" \/><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">In the <\/span><i><span style=\"font-weight: 400;\">Regex Pattern<\/span><\/i><span style=\"font-weight: 400;\"> field, we can provide a regex pattern that will be used to match on data. 
To verify that a pattern will match correctly to specific data adhering to specific formats we can use the <\/span><i><span style=\"font-weight: 400;\">Test Sample <\/span><\/i><span style=\"font-weight: 400;\">field to provide some test data. From the example above you can see that the regex pattern provided will match on emails.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once satisfied with the regex pattern provided click OK to return to the initial Data Matchers page with a pattern added.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16721\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/initial-data-matchers-300x295.png\" alt=\"\" width=\"405\" height=\"398\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/initial-data-matchers-300x295.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/initial-data-matchers-70x70.png 70w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/initial-data-matchers.png 547w\" sizes=\"(max-width: 405px) 100vw, 405px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Moving on, the second parameter, <\/span><i><span style=\"font-weight: 400;\">Validator Script,<\/span><\/i><span style=\"font-weight: 400;\"> is an optional parameter where the user can choose to upload a validator script used to validate the match. For example, a match may be found on a phone number, credit card, or SSN based on a pattern but without some way to validate if it is a real number, you may match on false positives.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Currently, only JavaScript-based validator scripts are supported. 
See <\/span><a href=\"https:\/\/www.iri.com\/blog\/iri\/iri-workbench\/data-class-validator-workbench\/\"><span style=\"font-weight: 400;\">this article<\/span><\/a><span style=\"font-weight: 400;\"> for more details.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Lastly, the third parameter is another optional parameter called <\/span><i><span style=\"font-weight: 400;\">Groups<\/span><\/i><span style=\"font-weight: 400;\">. Groups allow matches on named groups of a RegEx pattern match. For example, a pattern that matches \u201c<\/span><a href=\"mailto:exampleemail@gmail.com\"><span style=\"font-weight: 400;\">exampleemail@gmail.com<\/span><\/a><span style=\"font-weight: 400;\">\u201d may have a group that is used to find the domain (which in this case would return \u201cgmail\u201d).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can add or remove groups in the Groups field of the Data Pattern wizard page. Using groups you could do something like preserve the domain name and mask only the preceding username.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<h5><b>Set File Matcher<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">A Set File Matcher, also commonly known as a dictionary lookup, uses a text file containing a list of strings to perform matches against. Each entry must be separated by a new line. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The file can have multiple columns (which must be tab-separated), but only the first column will be used in the context of data matching with the set file data matcher. This type of matcher is easy to use and is also very flexible in its purposes. 
See <\/span><a href=\"https:\/\/www.iri.com\/blog\/test-data\/all-about-iri-set-files-a-primer\/\"><span style=\"font-weight: 400;\">this article<\/span><\/a><span style=\"font-weight: 400;\"> for more details about set files.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16722\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/set-files-data-matchers-181x300.png\" alt=\"\" width=\"216\" height=\"358\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/set-files-data-matchers-181x300.png 181w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/set-files-data-matchers.png 373w\" sizes=\"(max-width: 216px) 100vw, 216px\" \/><\/p>\n<p style=\"text-align: center;\"><i><span style=\"font-weight: 400;\">Set File containing addresses<\/span><\/i><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 400;\">From the IRI data class rules library (.dcrlib) form editor\u2019s Data Matchers wizard page, a Set File Matcher can take four parameters. The first parameter is the path to the set file used for the lookup. 
By clicking <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> the user can select a file on the local file system to provide a path to said set file.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16723\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/names-first-set-300x294.png\" alt=\"\" width=\"423\" height=\"415\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/names-first-set-300x294.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/names-first-set-70x70.png 70w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/names-first-set.png 509w\" sizes=\"(max-width: 423px) 100vw, 423px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">As a side note, IRI Workbench ships with a decent amount of set files and its repertoire increases frequently. IRI also maintains sets of last names and gender-specific first names popular in more than 40 countries.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16724\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/iri-set-files-184x300.png\" alt=\"\" width=\"252\" height=\"411\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/iri-set-files-184x300.png 184w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/iri-set-files.png 267w\" sizes=\"(max-width: 252px) 100vw, 252px\" \/><\/p>\n<p style=\"text-align: center;\"><i><span style=\"font-weight: 400;\">Some set files shipped with IRI Workbench<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">The second parameter provides the option to either match only on the whole word, or to allow matching on the parts of a word. 
To match only on whole words, leave the <\/span><i><span style=\"font-weight: 400;\">Match on Whole Word<\/span><\/i><span style=\"font-weight: 400;\"> field checked; uncheck it to allow matches on parts of a word. By default, this field is checked.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The third parameter controls whether matching is case sensitive or case insensitive. This is useful when words may not follow normal capitalization conventions. For example, John Smith may be present in text as either John Smith or JOHN SMITH.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By default, the <\/span><i><span style=\"font-weight: 400;\">Case Insensitive <\/span><\/i><span style=\"font-weight: 400;\">field is unchecked. To allow case-insensitive matching, check the field.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The fourth parameter, set in the <\/span><i><span style=\"font-weight: 400;\">Exclusion<\/span><\/i><span style=\"font-weight: 400;\"> field, provides the option to match only words that are not in the set file. By default, this option is set to false.<\/span><\/p>\n<h5><b>Fuzzy Matcher<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">A Fuzzy Matcher is similar to a set file lookup in that it performs matching against a list of provided words. 
Fuzzy Matchers differ from Set File Matchers in that they are not looking for exact matches but close <\/span><i><span style=\"font-weight: 400;\">approximation <\/span><\/i><span style=\"font-weight: 400;\">matches using various search algorithms; e.g., John Addams and John Adams would be a match.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the IRI Library form editor\u2019s Data Matchers wizard page, a Fuzzy Matcher can take up to five parameters:\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">set file URL<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">maximum distance<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">fuzzy search method<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">fuzzy search algorithm<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">minimum similarity score\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">A set file URL can be either a local file or internet URL, such as a file in a GitHub repository.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are two types of fuzzy matching methods supported by DarkShield: <\/span><b>score <\/b><span style=\"font-weight: 400;\">(measures the similarity between the source value and set file value) and <\/span><b>distance <\/b><span style=\"font-weight: 400;\">(a difference calculation)<\/span><b>. <\/b><span style=\"font-weight: 400;\">Some fuzzy matching algorithms, by the nature of the algorithm, only support one of the two methods and will use the single method that is supported even if the other method is specified.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">DarkShield fuzzy matchers support many different types of fuzzy search algorithms. 
Different types of algorithms have distinct strengths and weaknesses, as briefly enumerated in the graphic below. For more information on each algorithm, see <\/span><a href=\"https:\/\/github.com\/tdebatty\/java-string-similarity\"><span style=\"font-weight: 400;\">this project<\/span><\/a><span style=\"font-weight: 400;\"> in GitHub.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-16728 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-matching-282x300.png\" alt=\"\" width=\"518\" height=\"551\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-matching-282x300.png 282w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-matching.png 756w\" sizes=\"(max-width: 518px) 100vw, 518px\" \/><\/p>\n<p style=\"text-align: center;\"><i><span style=\"font-weight: 400;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Comparison of Fuzzy Matching Algorithms<\/span><\/i><i><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">Any match with a distance <\/span><i><span style=\"font-weight: 400;\">less <\/span><\/i><span style=\"font-weight: 400;\">than or equal to the distance specified (if the distance is the search method being used) will be considered a match. 
Any match with a score (which is only calculated if the score is the search method being used and the algorithm supports similarity scoring) <\/span><i><span style=\"font-weight: 400;\">greater <\/span><\/i><span style=\"font-weight: 400;\">than or equal to the score specified will be considered a match.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The following fuzzy matching algorithms support similarity scores:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Normalized Levenshtein<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Jaro-Winkler<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Cosine Similarity<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Sorensen Dice Coefficient<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Ratcliff-Obershelp Pattern Recognition<\/span><\/li>\n<\/ul>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16729\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-darkshield-only-294x300.png\" alt=\"\" width=\"461\" height=\"470\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-darkshield-only-294x300.png 294w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-darkshield-only-70x70.png 70w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/fuzzy-darkshield-only.png 511w\" sizes=\"(max-width: 461px) 100vw, 461px\" \/><\/p>\n<h5><b>NER Matchers<\/b><\/h5>\n<p><em><strong>Named Entity Recognition via Natural Language Processing<\/strong><\/em><\/p>\n<p>Please refer to <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/named-entity-recognition-ner-in-iri-darkshield\/\">this article on training Named Entity Recognition models<\/a> as a complement to this next 
section.<\/p>\n<h5><b>OpenNLP Matcher<\/b><\/h5>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16725\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/OpenNLP-matcher_2-300x212.png\" alt=\"\" width=\"559\" height=\"395\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/OpenNLP-matcher_2-300x212.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/OpenNLP-matcher_2.png 724w\" sizes=\"(max-width: 559px) 100vw, 559px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">OpenNLP Matchers use the <\/span><a href=\"https:\/\/opennlp.apache.org\/\"><span style=\"font-weight: 400;\">Apache OpenNLP<\/span><\/a><span style=\"font-weight: 400;\"> library. Apache\u2019s OpenNLP library is a <\/span><span style=\"font-weight: 400;\">machine learning-based toolkit for the natural language processing (NLP) of text. The OpenNLP models DarkShield leverages are called Named Entity Recognition (NER) models.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">NER models perform classification based on the context of words in sentences; that is, they use sentence grammar (natural language) to find entities like people\u2019s names, locations, or organizations. OpenNLP NER models are lightweight and fast, but the tradeoff can be lower search accuracy.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the IRI Library form editor\u2019s Data Matchers wizard page, an OpenNLP Matcher can take three parameters. All three parameters are optional; if none are provided, default values are passed at job execution.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first parameter is the model URL. This is the URL to the NER model used for classification. If nothing is passed as a model URL parameter, an English NER model will be used by default. 
To provide a URL, either type inside the text box of the Model URL field, or click <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> to select a model file, which populates the form with the model file URL.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The second parameter is a sentence detector URL. The sentence detector is used in conjunction with NER tasks to split strings of text into individual sentences to be processed.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This parameter is also optional; if none is provided, an English sentence detector is used by default.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To provide a URL, either type inside the text box of the Sentence Detector field or click <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> to select a sentence detector from the file system. This fills the form field with the file URL for the sentence detector binary file.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The third parameter is the tokenizer URL. The tokenizer splits a sentence into smaller, meaningful parts (tokens). This parameter is also optional; if none is provided, the tokenizer provided with the model will be used instead.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To provide a URL, either type inside the text box of the Tokenizer field or click <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> to select a tokenizer from the file system. This fills the form field with the file URL for the tokenizer binary file.<\/span><\/p>\n<h5><b>PyTorch and TensorFlow Matchers<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">PyTorch and TensorFlow Matchers use machine-learning-based NLP models to perform NER classification on text. 
PyTorch and TensorFlow models use different underlying frameworks, but both model types are accessible on the Hugging Face cloud model repository.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hugging Face is a community-driven platform that hosts repositories of open-source models.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Currently, PyTorch and TensorFlow are DarkShield-only Search Matcher types.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Compared to OpenNLP, these models are a heavier download and take a longer time to perform classification. In exchange, these models are far more accurate than OpenNLP models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the IRI Library form editor\u2019s Data Matchers wizard page, both the <\/span><span style=\"font-weight: 400;\">PyTorch and TensorFlow<\/span><span style=\"font-weight: 400;\"> Matchers can take up to four parameters. The parameters for the <\/span><span style=\"font-weight: 400;\">PyTorch and TensorFlow<\/span><span style=\"font-weight: 400;\"> Matchers are exactly the same. As such, the same concepts will apply when creating either matcher.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first parameter is the model URL. As with the OpenNLP Matcher, if nothing is passed as a model URL parameter, an English NER model will be used by default.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To provide a URL, either type inside the text box of the Model URL field or click <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> to select the directory containing the model from the file system.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The second parameter is the tokenizer URL. The tokenizer splits a sentence into smaller, meaningful parts (tokens). 
This is another optional parameter, if none is provided, then the tokenizer provided with the model will be used instead.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To provide a URL either type inside the text box of the Tokenizer field or click <\/span><i><span style=\"font-weight: 400;\">Browse<\/span><\/i><span style=\"font-weight: 400;\"> to select from the file system the directory containing the tokenizer that will be used.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The third parameter is the list of entity labels that may be used during classification. By default, all entity labels available to a model will be used if none are passed as a parameter in the wizard page. Entity labels dictate what groupings will be used during the classification process and may vary depending on the model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, there may be a model that accepts four labels:<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">PER (names)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">LOC (places or addresses)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">ORG (organizations)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">MISC (everything else)<\/span><\/li>\n<\/ul>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16726\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/PyTorch-NER-matcher-params-300x207.png\" alt=\"\" width=\"553\" height=\"382\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/PyTorch-NER-matcher-params-300x207.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/PyTorch-NER-matcher-params.png 768w\" sizes=\"(max-width: 553px) 100vw, 553px\" \/><\/p>\n<p><span 
style=\"font-weight: 400;\">Thus if you only wanted to find names and organizations, the list of entity labels would include PER and ORG. As previously mentioned, the entity labels that are allowed vary from model to model.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To identify the entity labels accepted by a model, check out the <\/span><i><span style=\"font-weight: 400;\">id2label<\/span><\/i><span style=\"font-weight: 400;\"> list residing inside the <\/span><i><span style=\"font-weight: 400;\">config.json<\/span><\/i><span style=\"font-weight: 400;\"> file inside that model\u2019s directory.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To add a label to the list of labels that will be passed as a parameter click the <\/span><i><span style=\"font-weight: 400;\">Add <\/span><\/i><span style=\"font-weight: 400;\">button on the Entity Labels form field. Click <\/span><i><span style=\"font-weight: 400;\">Remove <\/span><\/i><span style=\"font-weight: 400;\">to remove a label from the list.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The fourth parameter is the aggregation strategy that will be used by the model. This is a way to\u00a0 fuse (or not) tokens based on the model prediction. A model can either use a strategy of none, simple, first, average, or max. 
For more information on aggregation strategy, see <\/span><a href=\"https:\/\/huggingface.co\/transformers\/v4.8.2\/main_classes\/pipelines.html\"><span style=\"font-weight: 400;\">this documentation<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16727\" src=\"\/blog\/wp-content\/uploads\/2023\/12\/named-entity-recognition-300x212.png\" alt=\"\" width=\"536\" height=\"379\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/named-entity-recognition-300x212.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/named-entity-recognition.png 724w\" sizes=\"(max-width: 536px) 100vw, 536px\" \/><\/p>\n<h5><b>In Closing<\/b><\/h5>\n<p><span style=\"font-weight: 400;\">Whether the data you need to find are in free-floating text, images, or documents, these Data Matchers deliver the freedom to match on data itself, along with the needed flexibility in the search-matching process. If you have any questions or need help with these concepts, please email <\/span><a href=\"mailto:info@iri.com\"><span style=\"font-weight: 400;\">info@iri.com<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we\u2019ve learned from this article on Data Classification in IRI Workbench (as of DarkShield V5), the types of PII, or classes of sensitive data, you define should be associated with one or more Search Matchers used during data discovery to find those values accurately. 
This article covers the use of Data Matchers, which examine<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\" title=\"Finding PII Using Data Matchers\">Read More<\/a><\/div>\n","protected":false},"author":152,"featured_media":16712,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[1386,1081,280,14,1716,1437,211,13,984,1718,1719,1734,1737],"class_list":["post-16711","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","tag-darkshield","tag-data-classification","tag-data-discovery","tag-data-masking","tag-data-matchers","tag-data-matching","tag-data-privacy-laws","tag-data-protection-2","tag-find-and-mask-sensitive-data","tag-finding-pii","tag-masking-pii","tag-pii-discovery","tag-sensitive-data-discovery"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Finding PII Using Data Matchers - IRI<\/title>\n<meta name=\"description\" content=\"Improve data discovery with IRI DarkShield Data Matchers to find and mask PII and other sensitive data in unstructured documents and images.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Finding PII Using Data Matchers\" \/>\n<meta property=\"og:description\" content=\"Improve data discovery with IRI DarkShield Data Matchers to find 
and mask PII and other sensitive data in unstructured documents and images.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-19T17:39:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-18T18:55:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png\" \/>\n\t<meta property=\"og:image:width\" content=\"768\" \/>\n\t<meta property=\"og:image:height\" content=\"368\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Adam Lewis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adam Lewis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\"},\"author\":{\"name\":\"Adam Lewis\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/37c0e5beab094bd61cc521902df2876e\"},\"headline\":\"Finding PII Using Data 
Matchers\",\"datePublished\":\"2023-12-19T17:39:53+00:00\",\"dateModified\":\"2025-06-18T18:55:54+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\"},\"wordCount\":2378,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png\",\"keywords\":[\"DarkShield\",\"data classification\",\"data discovery\",\"data masking\",\"data matchers\",\"data matching\",\"data privacy laws\",\"data protection\",\"find and mask sensitive data\",\"finding pii\",\"masking pii\",\"PII discovery\",\"sensitive data discovery\"],\"articleSection\":[\"Data Masking\/Protection\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\",\"name\":\"Finding PII Using Data Matchers - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png\",\"datePublished\":\"2023-12-19T17:39:53+00:00\",\"dateModified\":\"2025-06-18T18:55:54+00:00\",\"description\":\"Improve data discovery with IRI DarkShield Data Matchers to find and mask PII and other sensitive data in unstructured documents and 
images.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png\",\"width\":768,\"height\":368},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Finding PII Using Data Matchers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/37c0e5beab094bd61cc521902df2876e\",\"name\":\"Adam Lewis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/087667d0c75d33bb6fab6e734bd89333?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/087667d0c75d33bb6fab6e734bd89333?s=96&d=blank&r=g\",\"caption\":\"Adam Lewis\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/adaml\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Finding PII Using Data Matchers - IRI","description":"Improve data discovery with IRI DarkShield Data Matchers to find and mask PII and other sensitive data in unstructured documents and images.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/","og_locale":"en_US","og_type":"article","og_title":"Finding PII Using Data Matchers","og_description":"Improve data discovery with IRI DarkShield Data Matchers to find and mask PII and other sensitive data in unstructured documents and images.","og_url":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/","og_site_name":"IRI","article_published_time":"2023-12-19T17:39:53+00:00","article_modified_time":"2025-06-18T18:55:54+00:00","og_image":[{"width":768,"height":368,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","type":"image\/png"}],"author":"Adam Lewis","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Adam Lewis","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/"},"author":{"name":"Adam Lewis","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/37c0e5beab094bd61cc521902df2876e"},"headline":"Finding PII Using Data Matchers","datePublished":"2023-12-19T17:39:53+00:00","dateModified":"2025-06-18T18:55:54+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/"},"wordCount":2378,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","keywords":["DarkShield","data classification","data discovery","data masking","data matchers","data matching","data privacy laws","data protection","find and mask sensitive data","finding pii","masking pii","PII discovery","sensitive data discovery"],"articleSection":["Data Masking\/Protection"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/","url":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/","name":"Finding PII Using Data Matchers - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","datePublished":"2023-12-19T17:39:53+00:00","dateModified":"2025-06-18T18:55:54+00:00","description":"Improve data discovery with IRI DarkShield Data Matchers to find and mask PII and other sensitive data in unstructured documents and images.","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","width":768,"height":368},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-protection\/data-matchers\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Finding PII Using Data Matchers"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/37c0e5beab094bd61cc521902df2876e","name":"Adam Lewis","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/087667d0c75d33bb6fab6e734bd89333?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/087667d0c75d33bb6fab6e734bd89333?s=96&d=blank&r=g","caption":"Adam 
Lewis"},"url":"https:\/\/www.iri.com\/blog\/author\/adaml\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/12\/Data-Matcher-Blog-Image.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16711"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/152"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=16711"}],"version-history":[{"count":19,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16711\/revisions"}],"predecessor-version":[{"id":18443,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16711\/revisions\/18443"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/16712"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=16711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=16711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=16711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}