Profiling data, finding matching patterns or related values, and rapidly identifying the locations and lineage attributes of disparate data sources are all ways to reveal the content of data and how it was created, deleted, or modified. Most tools available for these purposes are expensive and designed for a specific data source (e.g., one database).
After data is discovered and transformed, application audit trails need comprehensive information about the target layouts and job runs. The details must be readily available and secure. Logs should also track sensitive data protections and enable user accountability, job replication, parameter modification, and issue analysis.
For example, data processing forensics may reveal whether a record count or value range has changed beyond an established threshold, which could indicate data loss or fraud. With student and healthcare data, a further concern is mitigating re-identification (re-ID) risk, which first requires measuring that risk. Few data management software platforms or fit-for-purpose applications support any of these capabilities.
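To make the threshold idea concrete, here is a minimal Python sketch of a check that compares one job run against a baseline. The function name `check_run`, the 5% default tolerance, and the sample figures are assumptions for illustration only; they are not taken from any IRI product.

```python
def check_run(baseline_count, current_count, values, allowed_range, tolerance=0.05):
    """Return a list of alert strings for a single job run.

    tolerance is the largest acceptable relative change in record count
    (0.05 = 5%); it is an illustrative default, not a recommended value.
    """
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")

    alerts = []

    # Relative change in record count versus the established baseline.
    drift = abs(current_count - baseline_count) / baseline_count
    if drift > tolerance:
        alerts.append(f"record count changed by {drift:.1%}")

    # Values that fall outside the expected range.
    low, high = allowed_range
    outliers = [v for v in values if not low <= v <= high]
    if outliers:
        alerts.append(f"{len(outliers)} value(s) outside [{low}, {high}]")

    return alerts


if __name__ == "__main__":
    for alert in check_run(1000, 900, [10, 20, 500], (0, 100)):
        print(alert)
```

A run that triggers either alert is only a signal to investigate; whether it reflects data loss, fraud, or a legitimate change still has to be determined from the audit trail.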
Finally, in the database firewall context, you need a way to log your protection policy settings and all traffic and activity into a custom, query-ready audit trail that is secure and recoverable after deletion.
Using state-of-the-art database, file, and dark data discovery tools in the free IRI Workbench GUI for all IRI software, you can find the location of exact (and fuzzy-matching) data patterns, and automatically discover source-specific metadata that reveals file authorship and other attributes. For example, as you locate PII values within databases, flat files, spreadsheets, text documents, and other repositories, you can also automatically display the location, ownership, security, and other properties of those files.
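As a rough illustration of the difference between exact pattern searches and fuzzy matching, the Python sketch below scans lines of text with a regular expression and with a similarity ratio from the standard library. It is not how IRI Workbench performs discovery; the function names, the US Social Security number pattern, and the 0.8 similarity threshold are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

# Assumed pattern: a US Social Security number written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_pattern(lines, pattern=SSN_PATTERN):
    """Return (line_number, column, matched_text) for each exact pattern hit."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


def find_fuzzy(lines, target, threshold=0.8):
    """Return (line_number, word, score) for words that resemble target."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            score = SequenceMatcher(None, word.lower(), target.lower()).ratio()
            if score >= threshold:
                hits.append((line_no, word, round(score, 2)))
    return hits


if __name__ == "__main__":
    sample = ["name: Jon Smith", "ssn 123-45-6789 on file"]
    print(find_pattern(sample))
    print(find_fuzzy(sample, "John"))
```

The exact search reports where a value with a known format sits, while the fuzzy search catches near-spellings of a known value that a strict pattern would miss.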
Automated data classification for databases and files takes this one step further. This wizard allows you to define data classes and groups to which you can apply global data transformation and masking rules. Built-in re-ID risk determination measures the statistical likelihood that a masked data set can still be traced to an individual based on the quasi-identifiers remaining in the data set.
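One common way to reason about re-ID risk is to group records by their quasi-identifier values and look at how small the resulting groups are, in the spirit of k-anonymity. The Python sketch below follows that approach; it is an assumption-laden illustration, not the risk model used by the IRI software, and the function name `reid_risk` and the sample columns are hypothetical.

```python
from collections import Counter


def reid_risk(records, quasi_identifiers):
    """Estimate re-identification risk from equivalence-class sizes.

    Each record's risk is taken as 1 / (number of records sharing its
    quasi-identifier values). This is a simple k-anonymity-style measure.
    """
    if not records:
        raise ValueError("records must not be empty")

    # One key per record, built from its quasi-identifier values.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[key] for key in keys]

    return {
        "k": min(class_sizes.values()),        # smallest group size
        "max_risk": max(per_record),           # worst-case record
        "avg_risk": sum(per_record) / len(per_record),
        "unique_records": sum(1 for key in keys if class_sizes[key] == 1),
    }


if __name__ == "__main__":
    masked = [
        {"zip": "329", "age_band": "30-39", "name": "XXXX"},
        {"zip": "329", "age_band": "30-39", "name": "XXXX"},
        {"zip": "328", "age_band": "40-49", "name": "XXXX"},
    ]
    print(reid_risk(masked, ["zip", "age_band"]))
```

In this sample the third record is the only one in its group, so it remains fully distinguishable on the two quasi-identifiers even though the name is masked.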
The job scripts, statistical reports, and audit logs in the IRI Voracity data management platform and its constituent IRI CoSort (SortCL) data transformation and IRI FieldShield data masking programs contain your data layout specifications, query syntax, and manipulation details.
The XML audit log from IRI jobs provides details for each input, inrec (virtual), and output definition -- including which field attributes and protection techniques were specified. Phase-specific record counts -- including the number of records accepted, rejected, and processed -- plus job tuning details, are available in the statistical logs.
The entire job script, along with user, runtime, and environment variable information, is also recorded in the audit trail. It is easy to query and report on the logs using your preferred XML parsing tool or SortCL (through supplied data definition files for the logs). For example, you can query on file and field names, run dates, and job duration. You can quickly examine specific jobs without having to manually review a giant audit trail.
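The kind of query described above can be sketched with Python's standard XML parser. Note that the XML layout in this example is invented purely for illustration: the element and attribute names (`audit`, `job`, `output`, `field`, `script`, `user`, `start`, `seconds`) are hypothetical and do not represent the actual IRI audit log schema, which should be taken from the product documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical log layout, made up for this example only.
SAMPLE_LOG = """<audit>
  <job script="mask_customers" user="alice" start="2024-03-01T10:00:00" seconds="12.5">
    <output file="customers_masked.csv">
      <field name="ssn"/>
      <field name="email"/>
    </output>
  </job>
  <job script="sort_orders" user="bob" start="2024-03-02T09:30:00" seconds="95.0">
    <output file="orders_sorted.dat">
      <field name="order_id"/>
    </output>
  </job>
</audit>"""


def find_jobs(xml_text, field_name=None, min_seconds=0.0):
    """Return (script, user, start) for jobs matching the given filters."""
    root = ET.fromstring(xml_text)
    matches = []
    for job in root.iter("job"):
        # Filter on job duration.
        if float(job.get("seconds", "0")) < min_seconds:
            continue
        # Filter on a field name appearing anywhere in the job entry.
        fields = {field.get("name") for field in job.iter("field")}
        if field_name is not None and field_name not in fields:
            continue
        matches.append((job.get("script"), job.get("user"), job.get("start")))
    return matches


if __name__ == "__main__":
    print(find_jobs(SAMPLE_LOG, field_name="ssn"))
    print(find_jobs(SAMPLE_LOG, min_seconds=60))
```

Filtering by field name or duration in this way narrows a long audit trail to the handful of job entries worth reading in full.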
Free data and metadata lineage capabilities are also available in the IRI Workbench, through the use of search tools and hubs like EGit for sharing and securing master data and metadata in the cloud. Graphical data lineage and metadata impact analysis for IRI Voracity platform users is available via AnalytiX DS Mapping Manager and Data Advantage Group's MetaCenter.
Database administrators and their managers can examine the forensic trail of log-in and SQL execution attempts in real time or through audit log searches in IRI Chakra Max database activity monitoring and database audit and protection (DAM/DAP) software.