
Schema Pattern Search
The Schema Pattern Search in IRI Workbench (WB) can be used to retrieve data in an entire schema matching specific patterns. The search process compares the patterns to all the data in every column of the selected data types in every table in every schema selected. This processes is helpful in complying with PCI storage retention requirements to find all locations of credit card numbers, or with the GDPR right to be forgotten by finding every database record associated with the requestor’s name.
Unlike the Database Profile wizard in IRI Workbench which can search one table at a time, or the Flat-File Profile wizard which searches only through a selected file, the Schema Pattern Search wizard searches entire schemas for patterns. If only selected tables or structured files need to be searched, use the Data Class Library editor for faster results.
Note that results of the Schema Pattern Search wizard described also be automatically associated to your data classes. In addition, DB/schema-wide data class search and data class masking wizards are also available in IRI Workbench for FieldShield users.
This example searches a single Oracle schema using a credit card regular expression pattern. To use the wizard, select the schema or schemas to search. Select to include matching on names if the patterns are in reference to column names. Select to scan tables that were not scanned previously. This is helpful if a scan did not finish or if new tables have been added since a previous scan. Click Add to add the tableSearchResults from the previous scan to exclude in this scan. Next, select the depth of matching. The choices are full scan to scan all rows and output the data matching. If the matched data is not important, only the columns where results are found, select to not include the data results. If the scan should only match on the column names, select to not scan the data.
On the next page, select the schema or schemas to search.
Select the data types of the columns that contain the data in question. If the suspect data is only in text type fields, selecting only those types of columns will reduce the amount of data that the underlying search engine has to process.
Enter any patterns to exclude items. The patterns should follow this format: <Schema>.<Table> or <Schema>.<Table>.<Column>.
Enter the pattern by browsing for existing patterns or creating a new one. Multiple patterns can be added. The AND and OR operators are available to fine tune the patterns to use.
Clicking Finish will start the search/report process. Depending on how much data is being scanned, the wizard may run for a significant amount of time. Because the process may run for a long time, a file called tableSearchResults is created and records every table that has been fully processed. In case of a failure during the search, this file will show the last table that was successfully searched.
When the search reaches the end of a column, if any matches were found, a file named columnSearchResults is appended with the name of the column. At the same time, the schema_pattern_search file (or custom named file on the setup page) is appended with the results of the search and contains both the column name and the data matched.
The data is recorded in a tab-delimited file and can by used by other processes as needed.
The columnSearchResults.csr file can be used by the Schema Pattern Search to Data Class Association wizard in WB that takes the column results and assigns data classes to them. This can speed the process of data protection during other IRI jobs.
1 COMMENT
[…] through data class libraries. Update Q2’18: IRI has also introduced a schema-wide pattern search wizard to find PII matching RegEx or literal values in multiple tables at […]