DarkShield Data Discovery

 


Whether DarkShield can find the sensitive values in your sources depends on the search matcher or matchers you associated with your data classes (see Data Classification). You can choose from:

Location Matchers (faster, for structured or semi-structured sources)

  1. Column name or pattern-match to a column name
  2. List file value (from a list of locations, like a list of column names)
  3. Range (matches within a specified range of indexed locations) 
  4. Excel cells (and worksheets)
  5. CSV (or other delimited file) header row names
  6. JSON or XML path

and/or

Data Matchers (which scan the contents of each item for a match to one of the following):

  1. RegEx pattern with or without computational validation in JavaScript
  2. Dictionary (set file) value lookup (exact match)
  3. Fuzzy lookup value (a close match)
  4. OpenNLP (NER) model
  5. PyTorch (NER) model
  6. TensorFlow (NER) model
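
As an illustration of the first data matcher type, the sketch below pairs a RegEx pattern with a computational validator for credit card numbers. It is written in Python for brevity, whereas DarkShield validators are written in JavaScript, and it is not DarkShield code: the pattern, the function names, and the choice of the Luhn checksum as the validator are all assumptions made for this example.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens. Not taken from any DarkShield data class.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches that also pass the validator."""
    return [
        match.group()
        for match in CARD_PATTERN.finditer(text)
        if luhn_valid(match.group())
    ]
```

The validator step matters because a digit pattern alone also matches order numbers, phone numbers, and other long digit strings; the checksum discards most of those false positives.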

When you run a search-only or search-and-mask job in the IRI Workbench GUI for DarkShield, the search engine scans the file and database silos specified in your connection profiles, using the matcher or matchers you chose above to find that data.
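
To show why a location matcher can be faster than a data matcher, here is a minimal Python sketch of header-row matching on a delimited file. It reads only the first row and never inspects the data values. The data class names, the patterns, and the `classify_columns` function are hypothetical and do not reflect DarkShield's actual configuration or API.

```python
import csv
import io
import re

# Hypothetical data classes mapped to header-name patterns.
HEADER_PATTERNS = {
    "EMAIL": re.compile(r"e[-_ ]?mail", re.IGNORECASE),
    "SSN": re.compile(r"ssn|social[-_ ]?security", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
}


def classify_columns(csv_text: str) -> dict[int, str]:
    """Map column indexes to data classes using only the header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    matches = {}
    for index, name in enumerate(header):
        for data_class, pattern in HEADER_PATTERNS.items():
            if pattern.search(name):
                matches[index] = data_class
                break  # first matching data class wins
    return matches
```

Under these assumptions, a file with the header `id,Email_Address,home_phone,notes` would have columns 1 and 2 flagged without any of its rows being scanned.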


Specifying a RegEx pattern & validator search for credit card numbers


Semi-supervised machine learning for Named Entity Recognition

Information about the data found during your searches is recorded in a JSON annotation log, which DarkShield can use at search time or subsequently to mask the same data with the function you assigned to each data class. Search results can also be directed to a delimited text log containing metadata associated with (file search) results. DarkShield can optionally produce a layout for that file in SortCL data definition file (DDF) format, which facilitates combined CoSort transformation and reporting on those logs.

Data from these logs is also used in an HTML5-compatible dashboard that DarkShield can display to help you locate PII by data class and ranked locations. The log data can also be exported to external SIEM/SOC and log analytic platforms like Splunk for additional query and visual insight, or for actions like a Splunk Adaptive Response or Phantom Playbook that triggers an email or a DarkShield masking job.

Frequently Asked Questions (FAQs)

1. What is PII discovery in IRI DarkShield?
PII discovery in DarkShield is the process of scanning files, databases, and cloud repositories to locate sensitive information using search methods tied to defined data classes. This step is essential before masking because it ensures that all relevant data is identified and ready for remediation.
2. How does DarkShield search for sensitive data?
DarkShield uses two types of matchers: location matchers and data matchers. Location matchers identify sensitive data by column names, file headers, JSON or XML paths, or Excel cell locations. Data matchers scan the actual content using RegEx patterns, dictionary lookups, fuzzy matches, or machine learning-based AI models. Those models cover signature detection, RDB data classification, Named Entity Recognition (NER) for NLP, and handwritten PII.
3. What search methods does DarkShield support for structured data?
For structured or semi-structured data, you can use column name pattern matching (e.g., in RDBs and Excel), header row names in delimited files like CSV, JSON/XML paths, and range-based searches. These methods are faster because they avoid scanning every individual value and focus on matching locations.
4. How does DarkShield handle unstructured data discovery?
DarkShield uses content-based data matchers such as RegEx, dictionary value lookups, fuzzy matching, and NER models (OpenNLP, TensorFlow, or PyTorch) to locate PII in unstructured text documents, logs, and other sources where location-based matching is not possible.
5. Can DarkShield use machine learning to improve data discovery?
Yes. DarkShield supports semi-supervised machine learning for Named Entity Recognition (NER), allowing it to detect names, addresses, and other context-sensitive data elements even when they do not follow a fixed pattern.
6. How does DarkShield record and report search results?
When a search job runs, DarkShield produces a JSON annotation log that records every matched value and its location. These logs can be used immediately or later to apply consistent masking functions. Search results can also be exported to a delimited text log for reporting and auditing, and to web-ready dashboard charts.
7. How does DarkShield support compliance reporting with its discovery logs?
The logs generated during discovery can be displayed in an HTML5 dashboard that ranks PII by data class and location. They can also be exported to SIEM and SOC platforms like Splunk ES and Datadog for additional analysis, visualization, or automated action, such as triggering a pre-defined DarkShield data masking job.
8. Can discovery and masking be run separately in DarkShield?
Yes. You can run discovery-only jobs to identify where sensitive data resides and review results before masking. Alternatively, you can run search and mask simultaneously to speed up remediation if you are confident in your data class and matcher definitions.
9. How do search results integrate with other IRI tools?
DarkShield can output results in a delimited file with attendant metadata in SortCL data definition file (DDF) format, enabling integration with IRI CoSort for textual ETL and reporting on the discovered data. This creates a seamless workflow from discovery to security analytics, data integration, and remediation.
10. How does discovery help ensure consistency in masking operations?
PII discovery results are tied to the data classes you define and the masking functions you select. This ensures that the same value is always masked in the same way across all sources, preserving referential integrity and supporting enterprise-wide compliance. See the related data masking job pages in this section for more information.
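
The consistency idea in the last answer can be sketched generically. The Python example below uses keyed hashing (HMAC-SHA256) so that the same input value always produces the same masked token for a given data class and key. This is a stand-in technique chosen for illustration, not one of DarkShield's masking functions; the `pseudonymize` function, its parameters, and the token format are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, data_class: str, key: bytes) -> str:
    """Return a deterministic token for value within a data class.

    The same (value, data_class, key) always yields the same token,
    so joins across masked sources still line up.
    """
    message = f"{data_class}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Truncated for readability in this sketch; a shorter token
    # raises the chance of collisions.
    return f"{data_class}_{digest[:12]}"
```

Because the token depends only on the value, the data class, and the key, a customer name masked in a database table and in a log file would receive the same replacement, which is what preserves referential integrity in this sketch.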