The creation and management of safe, intelligent test data remains a vexing part of QA and development cycles. Manual and off-the-shelf approaches have proven time-consuming and costly, and their incomplete solutions have resulted in inadequate testing and missed SLA/delivery dates. Falling back on the latest, still-unmasked production data is simply unsafe.
By leveraging the long-proven RowGen test data generation capabilities and the graphical facilities of Eclipse (IRI Workbench) -- both of which are also part of the IRI Voracity data management platform -- you can address multiple complex test data requirements. These requirements may involve more customized, virtual test sets and the need to provision them quickly and easily to DevOps testers.
One of the lesser-known but inherent benefits of Voracity, as a data integration and governance platform, is its ability to combine static and streaming ETL with simultaneous masking, data synthesis, data transformation, and custom formatting. These features -- also available standalone in IRI FieldShield, RowGen, and CoSort, respectively -- are how Voracity enables both ad hoc and automated capture, manipulation, and provisioning of both virtual (ad hoc) and persistent test sets: test sets that reflect production data precisely without compromising its confidentiality or affecting any live systems.
First consider the business rules driving your need for an ad hoc solution. IRI provides advice on considering them in this series of test data management articles, and several facilities to help you discover the data you have to work with in sources like these; i.e., in files, databases, and dark data documents.
Your test targets may need a combination of data masking and synthesis like this. Or you may want to mask and thus produce realistic test data while:
- subsetting it from a database environment like this, or replicating it like this
- integrating SQL and file sources like this, or previewing ETL jobs like this
- streaming it from a broker, like this.
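To make the idea of combining masking with synthesis concrete, here is a minimal, generic Python sketch. It is not FieldShield or RowGen code and does not use any IRI API; the key, field names (`customer_id`, `plan`, `balance`), and helper functions are all hypothetical. It assumes masking means deterministic pseudonymization with a keyed hash, so the same source value always maps to the same masked value, and that synthesis means filling out the set with seeded random rows.

```python
import hashlib
import hmac
import random
from typing import Dict, List

# Hypothetical demo key; a real job would load this from a secrets store.
SECRET_KEY = b"demo-key-not-for-production"


def mask_value(value: str, length: int = 12) -> str:
    """Deterministically pseudonymize a value with HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def synthesize_rows(count: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate purely synthetic rows with a seeded, repeatable RNG."""
    rng = random.Random(seed)
    plans = ["basic", "plus", "premium"]
    return [
        {
            "customer_id": mask_value(f"synthetic-{i}"),
            "plan": rng.choice(plans),
            "balance": round(rng.uniform(0, 5000), 2),
        }
        for i in range(count)
    ]


def build_test_set(
    production_rows: List[Dict[str, object]], extra_synthetic: int
) -> List[Dict[str, object]]:
    """Mask the identifying column of real rows, then append synthetic rows."""
    masked = [
        {**row, "customer_id": mask_value(str(row["customer_id"]))}
        for row in production_rows
    ]
    return masked + synthesize_rows(extra_synthetic)
```

Because the masking is deterministic, a masked `customer_id` stays consistent across tables, which preserves join relationships in the test set. A truncated keyed hash is only one possible masking choice; format-preserving encryption or lookup-based pseudonyms may suit other fields better.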
Once techniques are decided upon, you can also choose how to design the job(s), modify and/or share them, and how and where to run them. Voracity supports multiple job design and runtime methods; see the IRI Workbench section on this page. And for every generation process, multiple differently formatted persistent and virtual targets can be defined and provided simultaneously. Such efficiency and flexibility are especially valuable to DevOps teams who need to work in parallel.
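The notion of feeding several differently formatted targets from one generation pass can be illustrated with a small, generic Python sketch. This is not how Voracity defines targets; the function name `render_targets` is hypothetical, and in-memory strings stand in here for what would be persistent files or virtual (piped) outputs in a real job.

```python
import csv
import io
import json
from typing import Dict, List


def render_targets(rows: List[Dict[str, object]]) -> Dict[str, str]:
    """Render one generated row set into two differently formatted targets."""
    if not rows:
        return {"csv": "", "json": "[]"}

    # CSV target: header taken from the first row's keys.
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(
        csv_buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)

    # JSON target: the same rows serialized as an array of objects.
    return {"csv": csv_buffer.getvalue(), "json": json.dumps(rows)}
```

Each tester or pipeline can then consume the format it needs without the data being generated a second time.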
Unlike other virtual TDM solutions, with IRI you do not need to clone databases, set up a virtual TDM appliance, or do anything that complex (or expensive). Test data engineers can serve up as many persistent or virtual copies as they need, and immediately populate their testers' repositories as the test data is generated. However, if you do want a fully masked or synthetic database clone, IRI FieldShield and RowGen jobs can be run as scripts called simultaneously from Actifio, Commvault, and Windocks (virtualized container image) operations!
IRI subsetting, masking, and synthesis jobs for structured data are also supported in the Value Labs Test Data Hub web application, which produces data sets on demand into file, DB, and API targets. For TDM involving semi-structured data (e.g., HL7, JSON, and XML) and unstructured text or files (e.g., PDF, MS Office, and image data), you can make application or web services calls to the DarkShield API, which supports the same masking functions and an extended set of search methods to find and de-identify production data for test targets.
Finally, the governance of test data can be just as important as the governance of your production data. In addition to the inherent data security governance in Voracity's many static data masking functions, multiple data quality features allow you to validate and stabilize the collections. Workflow diagrams and automated batch file generation support graphical design of independent and dependent work chains. And multiple data and metadata lineage options are supported so that you can track the changes to source data and your test data projects.