Test Data Virtualization

 

What is Test Data Virtualization?

Test data virtualization combines test data generation -- whether from data masking, subsetting or synthesis -- with efficient test data delivery. The operative word is efficiency, since most test data provisioning today still involves creating many physical copies of test tables or files rather than fewer, more consistent, up-to-date golden copies of test data that can be easily accessed and reset.

Manual, database-centric, and off-the-shelf approaches to test data virtualization have proven time-consuming and costly, resulting in inadequate testing or missed SLA and delivery dates. And testing with the latest, still-unmasked production data as a fail-safe is simply unsafe.

The ultimate goal is to provide dynamic, more or less on-demand (self-service) test data for database and software application development, systems testing, and outsourcing. The creation and management of virtual test environments containing safe, intelligent test data remains a vexing part of QA and development cycles.

 

Solutions

By leveraging the long-proven synthetic test data generation and subsetting capabilities of the IRI RowGen tool -- or the FieldShield and DarkShield data masking tools also in the IRI Voracity data management platform -- you can satisfy multiple test data management requirements. You can also meet many of your test data provisioning requirements through virtual test environments, without the costs or complexity associated with other test data virtualization solutions.

One of the inherent benefits of the IRI Voracity data management platform is its combination of robust data integration, test data generation, and data replication capabilities. Together they allow you to create and provision customized, virtual test data solutions quickly and easily for DevOps.

Voracity can combine static and streaming ETL, or real-time incremental database replication, with data masking, subsetting, synthesis, data transformation, and custom formatting. Without impacting live systems or being limited to a particular database or cloud platform, Voracity users can exploit and automate the capture, manipulation, and provisioning of both ad hoc (virtual) and persistent test sets that: reflect production data characteristics, preserve data and referential integrity enterprise-wide (not just database-wide), anonymize PII, and do not get stale.
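
To illustrate what preserving referential integrity across sources can mean in practice, here is a minimal Python sketch of deterministic pseudonymization. It is not IRI code and does not represent how Voracity or FieldShield work internally; the key, the `pseudonymize` helper, and the sample records are all hypothetical. The idea is simply that a keyed hash maps the same input to the same token every time, so masked keys still join across tables or files.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would live in a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str, prefix: str = "CUST") -> str:
    """Map an identifier to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


# Two hypothetical sources that share the same customer identifier.
customers = [{"customer_id": "A1001", "name": "Jane Roe"}]
orders = [{"order_id": 1, "customer_id": "A1001", "total": 42.50}]

# Mask each source independently; the shared key keeps the join column consistent.
masked_customers = [
    {**row, "customer_id": pseudonymize(row["customer_id"]), "name": "REDACTED"}
    for row in customers
]
masked_orders = [
    {**row, "customer_id": pseudonymize(row["customer_id"])} for row in orders
]
```

Because both sources are masked with the same function and key, the masked order still points at the masked customer, even though neither record carries the original identifier.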

 

Suggestions

First, consider the business rules driving your need for an ad hoc solution. IRI provides advice on considering them in its series of test data management articles, and several facilities to help you discover the data you have to work with in files, databases, and dark data documents.

Next, think about what kind of test data you need based on who needs it, and how and where it will be used. You may need to be creative; some test targets benefit from a combination of data masking and synthesis. Or you may want to mask and thus produce realistic test data while:

  • subsetting it from a database environment, or replicating it
  • integrating SQL and file sources, or previewing ETL jobs
  • feeding it to a DevOps (CI/CD) pipeline for test automation
  • refreshing a virtual test database in real time
  • streaming it from an IoT data broker
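
As a rough illustration of the first option above, the following Python sketch builds a referentially intact subset using the standard `sqlite3` module. It is a generic example, not an IRI job script; the `customers` and `orders` tables, their columns, and the `build_subset` helper are assumptions made up for the demonstration.

```python
import sqlite3


def load_demo_source(conn: sqlite3.Connection) -> None:
    """Create hypothetical parent/child tables plus empty subset targets."""
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders VALUES
            (10, 1, 9.99), (11, 2, 5.00), (12, 3, 7.25), (13, 1, 3.10);
        CREATE TABLE customers_subset (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders_subset (
            id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL
        );
        """
    )


def build_subset(conn: sqlite3.Connection, max_customers: int) -> None:
    """Copy a slice of parent rows, then only the child rows that reference them."""
    conn.execute(
        "INSERT INTO customers_subset "
        "SELECT id, name FROM customers ORDER BY id LIMIT ?",
        (max_customers,),
    )
    conn.execute(
        "INSERT INTO orders_subset "
        "SELECT id, customer_id, total FROM orders "
        "WHERE customer_id IN (SELECT id FROM customers_subset)"
    )


conn = sqlite3.connect(":memory:")
load_demo_source(conn)
build_subset(conn, max_customers=2)

# Count child rows whose parent is missing from the subset; this should be zero.
orphans = conn.execute(
    "SELECT COUNT(*) FROM orders_subset "
    "WHERE customer_id NOT IN (SELECT id FROM customers_subset)"
).fetchone()[0]
```

Selecting the parent rows first and then filtering the child rows by them is what keeps the smaller test set free of orphaned foreign keys.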

Consider also that every Voracity test data generation process allows you to define multiple, differently formatted persistent and virtual targets simultaneously. Such efficiency and flexibility are especially valuable to DevOps teams who need to work in parallel.
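
The general idea of feeding several differently formatted targets from one generation pass can be sketched in Python as follows. This is only an analogy under stated assumptions, not Voracity job syntax: the `generate_rows` and `write_targets` helpers, the field names, and the in-memory targets are hypothetical stand-ins for real files, tables, or pipes.

```python
import csv
import io
import json
import random


def generate_rows(count: int, seed: int = 0):
    """Yield hypothetical synthetic records; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    for i in range(1, count + 1):
        yield {
            "id": i,
            "region": rng.choice(["east", "west"]),
            "amount": round(rng.uniform(1, 500), 2),
        }


def write_targets(rows, csv_out, jsonl_out) -> int:
    """Write each generated row to a CSV target and a JSON Lines target in one pass."""
    writer = csv.DictWriter(csv_out, fieldnames=["id", "region", "amount"])
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        jsonl_out.write(json.dumps(row) + "\n")
        written += 1
    return written


# In-memory targets stand in for a persistent file and a virtual (piped) target.
csv_target = io.StringIO()
jsonl_target = io.StringIO()
row_count = write_targets(generate_rows(5, seed=42), csv_target, jsonl_target)
```

Generating once and fanning out to each target means every consumer sees the same records, which is the property parallel teams usually care about.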

Once techniques and targets are decided, you can also choose how to design, modify, and share the job(s), and how and where to run them. Voracity supports multiple job design and runtime methods; see the IRI Workbench section on this page.

Further Advantages

Unlike other virtual Test Data Management (TDM) solutions, with IRI you do not need to clone databases, set up a virtual TDM appliance, or do anything that complex (or expensive). Test data engineers can serve up as many persistent or virtual copies as they need, and immediately populate their testers' repositories as the test data is generated. However, if you do want a fully masked or synthetic database clone, IRI FieldShield and RowGen jobs can be run from scripts called simultaneously from Actifio, Commvault, and Windocks (virtualized container image) operations!

IRI subsetting, masking, and synthesis jobs for structured data are also supported in Cigniti and Value Lab TDM portals, which help you produce and govern on-demand test data sets for file, DB and API targets. For TDM involving semi-structured (e.g., HL7, JSON, and XML) and unstructured text or file (e.g., PDF, MS Office, and image data) sources, you can use IRI DarkShield to mask them, or replace real values in them with test data generated by IRI RowGen.

Finally, the governance of test data can be just as important as the governance of your production data. In addition to the inherent data security governance in Voracity's many static data masking functions, multiple data quality features allow you to validate and stabilize your test data collections, virtual or otherwise.

Workflow diagrams and automated batch file generation support graphical design of independent and dependent work chains. And multiple data and metadata lineage options are supported so that you can track the changes to source data and your test data projects.

Frequently Asked Questions (FAQs)

1. What is test data virtualization?
Test data virtualization is the practice of generating and delivering secure, realistic test data on-demand without physically copying production datasets. It combines data masking, subsetting, and synthesis to create lightweight virtual environments for software testing.
2. How does test data virtualization differ from traditional test data provisioning?
Traditional methods often require full copies of production data, which are slow to create, expensive to store, and risky to use. Virtualization avoids these issues by delivering secure, fit-for-purpose subsets or synthetic datasets that are faster to provision and safer to use.
3. What are the benefits of using IRI for test data virtualization?
IRI’s RowGen, FieldShield, and DarkShield tools—or the Voracity platform which includes them all—enable fast, secure test data provisioning. They support dynamic test sets, multiple formats, DevOps automation, and scalable delivery without needing costly virtualization appliances.
4. How can IRI Voracity generate virtual test data across different environments?
Voracity integrates ETL, masking, subsetting, and synthetic data generation to produce test data in real time or batch. It supports databases, files, streams, and cloud platforms, allowing flexible test data delivery across local, hybrid, and CI/CD environments.
5. Can I use test data virtualization with DevOps pipelines?
Yes. IRI tools integrate directly with DevOps platforms like Jenkins, GitLab, Azure DevOps, and AWS CodePipeline. You can automate test data provisioning at any stage of the CI/CD lifecycle to keep test environments up to date and compliant.
6. What types of data sources are supported for test data virtualization?
IRI supports structured data (from relational databases and flat files), semi-structured data (like JSON, HL7, and XML), and unstructured files (like PDFs, DOCX, and images). Test data can be masked, synthesized, or streamed from these sources as needed.
7. How does IRI ensure data privacy and compliance during virtualization?
IRI applies static data masking techniques that anonymize sensitive data before it's provisioned. Tools like FieldShield and DarkShield support compliance with data privacy laws such as HIPAA, GDPR, and PCI DSS by removing or protecting PII at rest.
8. What options exist for delivering test data in virtual formats?
IRI lets you create ad hoc views, virtual databases, federated schemas, or on-demand file-based test sets. You can also define multiple target formats in a single job, which helps DevOps teams working in parallel on different platforms.
9. Can I blend masked and synthetic test data in a virtual test set?
Yes. You can combine data masking and generation techniques to create test data that mirrors real-world conditions without exposing sensitive information. This is useful for complex use cases that require high realism and compliance.
10. How do I track and manage virtual test data over time?
IRI Voracity supports data and metadata lineage, job versioning, and visual workflows that help you govern test data generation and usage. This ensures consistent quality, traceability, and compliance throughout development cycles.
11. What makes IRI’s approach more efficient than other test data virtualization tools?
IRI avoids expensive and complex TDM appliances. Instead, it uses script-based jobs, Eclipse-based wizards, and DevOps integrations to virtualize and provision test data quickly—without full database clones or persistent infrastructure.
12. Can test data be streamed for real-time testing?
Yes. Voracity supports real-time data provisioning using incremental replication and streaming integration with IoT brokers, allowing you to test how applications respond to real-world, continuous data flows.