Profiling data, finding matching patterns or related values, and rapidly identifying the locations and lineage attributes of disparate data sources are all ways to reveal the content of data and how it was created, deleted, or modified. Most tools available for these purposes are expensive and designed for a specific data source (e.g., one database).
After data is discovered and transformed, application audit trails need comprehensive information about the target layouts and job runs. The details must be readily available and secure. Logs should also track sensitive data protections and enable user accountability, job replication, parameter modification, and issue analysis.
For example, data processing forensics may expose whether a record count or value range changes beyond an established threshold; this could indicate data loss or fraud. With student and healthcare data, a further issue is mitigating re-identification (re-ID) risk, which first requires measuring that risk. Few data management software platforms or fit-for-purpose applications support any of these capabilities.
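The threshold check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RunStats` structure, the `check_run` function, and the 5% drift limit are hypothetical and do not come from any IRI product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunStats:
    """Summary figures from one job run (hypothetical structure)."""
    record_count: int
    min_value: float
    max_value: float


def check_run(baseline: RunStats, current: RunStats,
              max_count_drift: float = 0.05) -> List[str]:
    """Return warnings when the current run drifts beyond the baseline."""
    warnings = []
    if baseline.record_count > 0:
        # Relative change in record count compared with the baseline run.
        drift = abs(current.record_count - baseline.record_count) / baseline.record_count
        if drift > max_count_drift:
            warnings.append(f"record count changed by {drift:.1%}")
    # Any value outside the previously observed range is flagged.
    if current.min_value < baseline.min_value or current.max_value > baseline.max_value:
        warnings.append("value range exceeds baseline range")
    return warnings


# Illustrative figures only; the 5% limit is an assumed threshold.
baseline = RunStats(record_count=10_000, min_value=0.0, max_value=500.0)
current = RunStats(record_count=9_200, min_value=0.0, max_value=750.0)
for warning in check_run(baseline, current):
    print(warning)
```

In practice the baseline would come from earlier audit or statistical logs rather than hard-coded values, and the acceptable drift would be set per data source.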
Finally, in the database firewall context, you need a way to log your protection policy settings and all traffic and activity into a custom, query-ready audit trail that is secure and can be recovered after deletion.
Searching & Profiling Data Sources
Using state-of-the-art database, file, and dark data discovery tools in the free IRI Workbench GUI for all IRI software, you can find the location of exact (and fuzzy-matching) data patterns, and automatically discover source-specific metadata that reveals file authorship and other attributes. For example, as you locate PII values within databases, flat files, spreadsheets, text documents, and other repositories, you can also automatically display the location, ownership, security, and other properties of those files.
Automated data classification for databases and files takes this one step further. The classification wizard allows you to define data classes and groups to which you can apply global data transformation and masking rules.
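To make the idea of pattern search by data class concrete, here is a minimal Python sketch. It is not how IRI Workbench implements discovery; the class names, the regular expressions, and the `find_matches` function are assumptions for illustration, and the sketch covers exact pattern matching only, not fuzzy matching.

```python
import re
from typing import List, Tuple

# Hypothetical data classes: a class name mapped to a search pattern.
DATA_CLASSES = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_matches(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, class name, matched value) for every hit."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for class_name, pattern in DATA_CLASSES.items():
            for match in pattern.finditer(line):
                hits.append((line_no, class_name, match.group()))
    return hits


sample = "Contact: jane.doe@example.com\nSSN on file: 123-45-6789\nNo PII here."
for hit in find_matches(sample):
    print(hit)
```

Once each hit carries a class name, a single masking or transformation rule can be keyed to the class rather than to individual columns or files, which is the point of defining classes up front.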
Auditing Data Masking Jobs and Scoring Re-Identification Risk
The job scripts, statistical reports, and audit logs in the IRI Voracity data management platform and its constituent IRI CoSort (SortCL) data transformation and IRI FieldShield data masking programs contain your data layout specifications, query syntax, and manipulation details. An onboard re-ID risk determination wizard measures the statistical likelihood that a masked data set can still be traced to an individual based on remaining quasi-identifiers in the data set.
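One common way to estimate this kind of risk is to group records by their quasi-identifier values and look at the group sizes, as in k-anonymity. The sketch below takes that approach; it is an assumption about how such a score could be computed, not a description of the IRI wizard's method, and the `reid_risk` function and sample fields are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def reid_risk(records: List[Dict[str, str]], quasi_ids: Sequence[str]) -> Dict[str, float]:
    """Score risk from equivalence-class sizes over the quasi-identifiers."""
    if not records:
        return {"k": 0, "max_risk": 0.0, "avg_risk": 0.0}
    # Group records that share the same quasi-identifier values.
    class_sizes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    smallest = min(class_sizes.values())
    # Each record's risk is 1 / (size of its class); averaged over all
    # records this reduces to (number of classes) / (number of records).
    return {
        "k": smallest,
        "max_risk": 1 / smallest,
        "avg_risk": len(class_sizes) / len(records),
    }


# Illustrative masked records; the field names are made up for this example.
masked = [
    {"zip": "329**", "birth_year": "1980", "sex": "F"},
    {"zip": "329**", "birth_year": "1980", "sex": "F"},
    {"zip": "329**", "birth_year": "1975", "sex": "M"},
    {"zip": "328**", "birth_year": "1975", "sex": "M"},
]
print(reid_risk(masked, ["zip", "birth_year", "sex"]))
```

Here two records are unique on their quasi-identifiers, so the smallest class has size 1 and the worst-case risk is 1.0; further generalizing or suppressing values would be needed to raise `k`.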
The XML audit log from IRI jobs (like FieldShield data masking) provides details for each input, inrec (virtual), and output definition -- including which field attributes and modification functions were specified. Phase-specific record counts -- including the number of records accepted, rejected, and processed -- plus job tuning details, are available in the statistical logs.
The entire job script, along with user, runtime, and environment variable information, is also recorded in the audit trail. It is easy to query and report on the logs using your preferred XML parsing tool or SortCL (through supplied data definition files for the logs). For example, you can query on file and field names, run dates, and job duration. You can quickly examine specific jobs without having to manually review a giant audit trail.
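As a rough illustration of querying an XML audit trail with a general-purpose parser, the Python sketch below filters jobs by field name and by duration. The element and attribute names in `SAMPLE_LOG` are invented for this example and do not reflect the actual IRI audit log schema; consult the product documentation or the supplied data definition files for the real layout.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# HYPOTHETICAL log layout, invented for illustration only.
SAMPLE_LOG = """
<auditlog>
  <job name="mask_customers" user="alice" start="2023-04-01T10:00:00" seconds="42">
    <field name="ssn" function="encrypt"/>
    <field name="email" function="redact"/>
  </job>
  <job name="sort_orders" user="bob" start="2023-04-02T09:30:00" seconds="310">
    <field name="order_id" function="none"/>
  </job>
</auditlog>
"""


def jobs_touching_field(xml_text: str, field_name: str) -> List[Tuple[str, str]]:
    """Return (job name, user) for every job that references the field."""
    root = ET.fromstring(xml_text.strip())
    return [
        (job.get("name"), job.get("user"))
        for job in root.iter("job")
        if any(f.get("name") == field_name for f in job.iter("field"))
    ]


def jobs_longer_than(xml_text: str, seconds: int) -> List[str]:
    """Return names of jobs whose recorded duration exceeds the limit."""
    root = ET.fromstring(xml_text.strip())
    return [
        job.get("name")
        for job in root.iter("job")
        if int(job.get("seconds", "0")) > seconds
    ]


print(jobs_touching_field(SAMPLE_LOG, "ssn"))
print(jobs_longer_than(SAMPLE_LOG, 60))
```

The same filtering idea applies to run dates or file names: select the job entries of interest first, then read only their details instead of scanning the whole trail.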
Data and Metadata Lineage
Free data and metadata lineage capabilities are also available in the IRI Workbench, through the use of search tools and hubs like EGit for sharing and securing master data and metadata in the cloud. Graphical data lineage and metadata impact analysis for IRI Voracity ETL platform users are available via AnalytiX DS Mapping Manager or Data Advantage Group's MetaCenter.
Auditing Database Access & Activity
Database administrators and their managers can examine the forensic trail of log-in and SQL execution attempts -- along with before/after data states or query results shown -- either in real time or through audit log searches in the IRI Chakra Max database activity monitoring and database audit and protection (DAM/DAP) software.
The Chakra Max audit log uses an encrypted, multi-threaded columnar database to robustly record and rapidly reveal results in online searches or custom reports. It works in active/passive (failover) modes, and can be restored in the event of deletion.