
New Data Discovery Charts in IRI Workbench
We’re excited to introduce graphical dashboard displays from the structured data discovery wizards in IRI Workbench. It’s much easier to glean insights from statistical analyses of data sources and PII scans using visual charts than by looking at pure text results.
The first data discovery dashboard in Workbench was introduced last year for IRI DarkShield. Its Dark Data Discovery wizard produces both text reports of PII in unstructured sources and this HTML5 display of PII search results by file type, including how many files of each type were masked:
The newest raft of dashboards covers structured data discovery results from these IRI Workbench wizards:
- DB Profiling
- Flat-File Profiling
- Schema Pattern Search
- Schema Data Class Search
- Directory Data Class Search
Each dashboard uses Chart.js to create graphs that provide visual frames of reference for the data, so you no longer have to export that data to a spreadsheet or BI tool to get visuals. Any external system like that would have to import, restructure, and build a display from the wizard’s text output, all from scratch. Now it’s all automated in, and launched from, IRI Workbench.
In addition to providing a more graphical version of what’s in the text output, the charts actually extend the capabilities of the discovery wizards. If the chart option is selected, a page containing the charts will automatically display upon completion of the wizard.
Data Profiling Dashboards
In the case of the DB Profiling wizard, that page will display the following charts:
- Recurring Column Table – displays titles of the most common columns throughout the dataset. When you hover the mouse pointer over the right column (number of occurrences), a box display shows the list of tables where the column was observed.
- Data Type chart – shows how the data is distributed across different data types
- Min and max values – area charts indicate trends via minimum and maximum values
- Count measurements – these bar charts can also be used to identify trends or anomalies
- Length measurements – area charts show trends in length helping to identify consistency amongst a dataset
- Null values per field – charts showing how much of each field of the dataset is null
- Regex & Value match counts – any results display here; otherwise, the section shows “Not performed”
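The counting behind a Recurring Column Table like the one above can be sketched in a few lines of TypeScript. This is only an illustration: the `TableProfile` shape and the `recurringColumns` function are hypothetical names, not the Workbench's actual data format or code.

```typescript
// Hypothetical input: one entry per profiled table (not the actual Workbench format).
interface TableProfile {
  table: string;
  columns: string[];
}

interface RecurringColumn {
  column: string;
  occurrences: number; // value shown in the right-hand column
  tables: string[];    // what the hover box would list
}

// Count how many tables each column name appears in, most common first.
function recurringColumns(profiles: TableProfile[]): RecurringColumn[] {
  const tablesByColumn: Record<string, string[]> = {};
  for (const profile of profiles) {
    for (const column of profile.columns) {
      if (!Object.prototype.hasOwnProperty.call(tablesByColumn, column)) {
        tablesByColumn[column] = [];
      }
      const tables = tablesByColumn[column];
      // Count each table once, even if a column name is listed twice for it.
      if (tables.indexOf(profile.table) === -1) {
        tables.push(profile.table);
      }
    }
  }
  return Object.keys(tablesByColumn)
    .map((column) => ({
      column,
      occurrences: tablesByColumn[column].length,
      tables: tablesByColumn[column],
    }))
    // Ties are broken alphabetically so the ordering is stable.
    .sort((a, b) => b.occurrences - a.occurrences || a.column.localeCompare(b.column));
}
```

Keeping the table list next to each count means the hover box needs no second lookup.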
For the Flat-File Profiling wizard, that dashboard contains the following:
- Recurring Column Table – displays titles of the most common columns throughout the dataset. When you hover the mouse pointer over the right column (number of occurrences), a box display shows the list of tables where the column was observed.
- Data Type chart – a doughnut chart visualizing how data is represented across different types
- Min and max values – displayed side by side in a table, these indicate trends via minimum and maximum values
- Count measurements – these horizontal bar charts can also be used to identify trends or anomalies
- Length measurements – bar charts show trends in length helping to identify consistency amongst a dataset
- Null values per field – charts showing how much of each field of the dataset is null
- Regex & Value match counts – any results display here; otherwise, the section shows “Not performed”
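Most of the measurements listed for both profiling dashboards (counts, nulls, lengths, and min and max values) can be gathered in one pass over a field's values. The TypeScript sketch below assumes values arrive as strings or nulls and compares them as plain strings. Both are assumptions for illustration, and `profileField` is a hypothetical name, not part of any IRI API.

```typescript
interface FieldProfile {
  count: number;          // number of non-null values
  nullCount: number;
  minLength: number;      // 0 when the field has no non-null values
  maxLength: number;
  minValue: string | null;
  maxValue: string | null;
}

// Single pass over one field's values; null marks a missing value.
function profileField(values: Array<string | null>): FieldProfile {
  const profile: FieldProfile = {
    count: 0,
    nullCount: 0,
    minLength: 0,
    maxLength: 0,
    minValue: null,
    maxValue: null,
  };
  for (const value of values) {
    if (value === null) {
      profile.nullCount += 1;
      continue;
    }
    const first = profile.count === 0;
    profile.count += 1;
    if (first || value.length < profile.minLength) profile.minLength = value.length;
    if (first || value.length > profile.maxLength) profile.maxLength = value.length;
    // Assumption: plain string comparison; a real profiler would be type-aware.
    if (profile.minValue === null || value < profile.minValue) profile.minValue = value;
    if (profile.maxValue === null || value > profile.maxValue) profile.maxValue = value;
  }
  return profile;
}
```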
Data Class and Pattern Search Dashboards
For the Schema Pattern Search, that page consists of two simple charts:
- Top Sources Table – displays the schema, table, and data pattern found through the search. When you hover over the schema column, a box shows the profile name, and when you hover over the table column, the corresponding column is shown. The table is scrollable, allowing you to see rankings of large datasets.
- Top Patterns – represents the data patterns by number of occurrences, allowing you to understand what portion of the data matches the selected pattern. Within this chart, you can click on a pattern in the legend, and the selected section of the chart will be hidden.
The Schema Data Class Search consists of similar charts to the Schema Pattern Search:
- Top Sources Table – displays the schema, table, and data class found through the search. When you hover over the schema column, a box shows the profile name, and when you hover over the data class column, the corresponding column is shown. The table is scrollable, allowing you to see rankings of large datasets.
- Top Data Classes – represents the data classes by number of occurrences, allowing you to understand what portion of the data matches the selected class. Within this chart, you can click on a data class in the legend, and the selected section of the chart will be hidden.
Finally, the Directory Data Class Search also consists of two charts:
- Top Sources Table – displays the data class and corresponding file found through the search. When you hover over the file column, a box shows the path, and when you hover over the data class column, the corresponding column is shown. The table is scrollable, allowing you to see rankings of large datasets.
- Top Data Classes – represents the data classes by number of occurrences, allowing you to understand what portion of the data matches the selected class. Within this chart, you can click on a data class in the legend, and the selected section of the chart will be hidden.
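To make a Top Patterns or Top Data Classes chart more concrete, the TypeScript sketch below totals hits per pattern and arranges them into the `type` / `data.labels` / `data.datasets` shape that a Chart.js doughnut chart expects. The `PatternHit` shape and `topPatternsConfig` function are hypothetical, and the sketch only builds the configuration object. In a page it would be passed to the Chart.js `Chart` constructor along with a canvas. Chart.js doughnut charts toggle a slice when its legend entry is clicked by default, so the sketch needs no extra click handler.

```typescript
// Hypothetical search hit: one row per schema/table/pattern match.
interface PatternHit {
  schema: string;
  table: string;
  pattern: string;
  count: number;
}

// Minimal subset of a Chart.js doughnut configuration.
interface DoughnutConfig {
  type: "doughnut";
  data: { labels: string[]; datasets: { data: number[] }[] };
}

function topPatternsConfig(hits: PatternHit[]): DoughnutConfig {
  // Sum the hit counts for each pattern across all sources.
  const totals: Record<string, number> = {};
  for (const hit of hits) {
    const current = Object.prototype.hasOwnProperty.call(totals, hit.pattern)
      ? totals[hit.pattern]
      : 0;
    totals[hit.pattern] = current + hit.count;
  }
  // Largest slice first; ties are broken alphabetically.
  const labels = Object.keys(totals).sort(
    (a, b) => totals[b] - totals[a] || a.localeCompare(b)
  );
  return {
    type: "doughnut",
    data: { labels, datasets: [{ data: labels.map((label) => totals[label]) }] },
  };
}
```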
Interactive Charts
Here is a more detailed view of the charts available in some of the dashboards above.
Profile charts
Most of the charts feature hover functionality, where a user can learn more about the data while pointing the mouse at a dataset. This makes it easier to see individual details like the number of occurrences of a searched value, or data type, in the tables you selected in the wizard to profile:
Beneath the title of each section is a small About link. Clicking it displays a description of the data shown in that section:
Before
After
The chart’s algorithms parse and process the results from the wizard, and use Chart.js to create a visual representation for each item of information. However, if an item was not selected within the wizard, or contains a null data set, its chart will not be created, and the other charts will simply resize to use the space available:
Row with 3 items:
Row with 2 items:
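The skip-and-resize behavior can be approximated as a filter followed by an even split of the row width. The TypeScript below is a sketch under that assumption. The `ChartItem` fields and the `layoutRow` function are hypothetical, and the actual dashboard may size its charts differently.

```typescript
interface ChartItem {
  title: string;
  selected: boolean;       // hypothetical flag: was this item chosen in the wizard?
  values: number[] | null; // null or empty means there is nothing to plot
}

interface PlacedChart {
  title: string;
  widthPercent: number;
}

// Drop items that cannot be drawn, then share the row evenly among the rest.
function layoutRow(items: ChartItem[]): PlacedChart[] {
  const visible = items.filter(
    (item) => item.selected && item.values !== null && item.values.length > 0
  );
  return visible.map((item) => ({
    title: item.title,
    widthPercent: 100 / visible.length,
  }));
}
```

With three drawable items each chart gets a third of the row. If one of them has a null data set, the remaining two get half each.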
A way to double check if an item is null is to scroll to the Null Values per Field section. There is a small doughnut chart for each field showing what percentage of the data is null, or not null.
The charts are automatically colored to help make the result stand out: if the chart is green, then 50% or more of the field’s data is not null. If it’s red, 50% or more of the data is null:
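The coloring rule can be written as a small function. This TypeScript sketch uses hypothetical names (`FieldNullStats`, `nullSlice`) and makes one assumption the text leaves open: a field that is exactly 50% null is drawn green.

```typescript
// Hypothetical per-field counts; the wizard's real output format is not shown here.
interface FieldNullStats {
  field: string;
  nullCount: number;
  totalCount: number;
}

interface NullSlice {
  field: string;
  nullPercent: number;
  notNullPercent: number;
  color: "green" | "red";
}

function nullSlice(stats: FieldNullStats): NullSlice {
  // A field with no rows has nothing to measure; this sketch reports it as 0% null.
  const nullPercent =
    stats.totalCount === 0 ? 0 : (stats.nullCount / stats.totalCount) * 100;
  const notNullPercent = 100 - nullPercent;
  // Assumption: an exact 50/50 split counts as green.
  const color = notNullPercent >= 50 ? "green" : "red";
  return { field: stats.field, nullPercent, notNullPercent, color };
}
```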
Again, hovering will allow you to get a more detailed look at the raw data:
Data Class and Pattern Charts
For the doughnut charts seen in the Schema Pattern Search, Schema Data Class Search, and Directory Data Class Search dashboards, you can click on an element within the legend, and that section will be hidden. This lets you customize the chart to show filtered results.
Hovering over sections of the chart will also reveal the number of hits associated with the data class or pattern:
These dashboards also include a table of the top sources. Each row contains the source and the class or pattern found in that source.
The Schema Data Class Search and Schema Pattern Search tables have three columns containing the schema, table, and class or pattern, while the Directory Data Class Search table contains only the source file and data class.
Hovering over the entries will reveal more information about the location of the class or pattern.
You can learn more about IRI Workbench here. If you have any questions about, or need help using, these new data discovery reports, please contact your IRI representative.