Database Subsetting




Database application developers often rely on data in production tables for testing. But that approach has several drawbacks, including:

  1. confidentiality of data in those tables
  2. cost of migrating, masking, refreshing, and/or storing replicated databases for testing
  3. redundancy of production data, which means wasted space and insufficient testing coverage
  4. need for only small slices of data for specific test cases

Smaller, masked subsets of massive files are sometimes also needed to test applications rapidly with anonymized data. Most data masking tools cannot support the volume and variety of flat files involved.


In addition to the powerful database parsing, generation, and population capabilities that IRI RowGen provides for synthesizing structurally and referentially correct test data, you can now also produce (and mask) referentially intact database subsets from standard relational sources, as well as from complex and/or very large database (VLDB) sources.

IRI Workbench, the Eclipse IDE for the IRI Voracity test data management platform and for its component IRI Data Protector Suite test data security tools (IRI FieldShield for data masking and/or RowGen for test data generation), includes a proven, powerful Database Subsetting wizard for relational databases. This ergonomic test data subsetting utility lets you rapidly create custom-sized, manageable, and referentially correct subsets whose contents are determined by your master (parent) table, and apply consistent data masking and/or mapping rules to all the subset (child) tables at the same time.
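
The general idea behind a referentially intact, masked subset can be sketched in a few lines. This is not the Workbench wizard's implementation; it is a minimal Python/sqlite3 illustration under stated assumptions: a hypothetical customers (parent) and orders (child) schema, a "first N parents by key" selection rule, and HMAC-based pseudonymization with a made-up key so that the same customer ID masks to the same token in both tables and the masked keys still join.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a hypothetical secret key for deterministic masking (illustration only).
SECRET = b"hypothetical-demo-key"


def pseudonymize(value: str) -> str:
    """Map a key to a stable token: equal inputs give equal outputs, so masked
    parent and child keys still join after masking."""
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "K" + digest[:12]


def build_subset(src: sqlite3.Connection, parent_limit: int) -> sqlite3.Connection:
    """Copy `parent_limit` parent rows plus only the child rows that reference
    them into a new database, masking the shared key the same way in both."""
    dst = sqlite3.connect(":memory:")
    dst.executescript(
        """
        CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id TEXT REFERENCES customers(id),
                             amount REAL);
        """
    )
    # 1. The master (parent) table drives the subset size.
    parents = src.execute(
        "SELECT id FROM customers ORDER BY id LIMIT ?", (parent_limit,)
    ).fetchall()
    parent_ids = [row[0] for row in parents]
    if not parent_ids:
        return dst
    dst.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(pseudonymize(pid), "MASKED") for pid in parent_ids],
    )
    # 2. Child rows are kept only if their foreign key points at a kept parent,
    #    so the subset contains no orphaned rows.
    marks = ",".join("?" * len(parent_ids))
    children = src.execute(
        f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({marks})",
        parent_ids,
    ).fetchall()
    dst.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(oid, pseudonymize(cid), amt) for oid, cid, amt in children],
    )
    dst.commit()
    return dst


def make_demo_source() -> sqlite3.Connection:
    """Build a tiny hypothetical source database for demonstration."""
    src = sqlite3.connect(":memory:")
    src.executescript(
        """
        CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL);
        INSERT INTO customers VALUES ('A1', 'Ann'), ('B2', 'Bob'), ('C3', 'Cy');
        INSERT INTO orders VALUES (1, 'A1', 10.0), (2, 'B2', 20.0),
                                  (3, 'C3', 30.0), (4, 'A1', 5.0);
        """
    )
    return src


if __name__ == "__main__":
    subset = build_subset(make_demo_source(), parent_limit=2)
    print(subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 3
```

Deterministic masking is the key design choice here: a random replacement per table would break the parent/child joins that the subset was built to preserve.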

It's also possible to selectively subset and mask data from individual tables on an ad hoc basis for testing, using the same IRI metadata framework. To do that, write (or use a wizard to automatically create) a single-source FieldShield data masking job script containing the table details. Then either add SQL SELECT syntax directly to the input section, or create and use custom /INCOLLECT row-count filtering and/or qualitative /INCLUDE or /OMIT statements to define the size and content of each subset, respectively.
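
The logic of combining a row-count cap with include and omit conditions can be illustrated outside of a job script. The Python below is a hypothetical analogue, not SortCL or FieldShield syntax: the function names, the dictionary row shape, and the SSN masking rule are all assumptions. It also assumes the cap applies to rows read before filtering; how /INCOLLECT actually interacts with /INCLUDE and /OMIT should be confirmed in the IRI documentation.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def subset(
    rows: Iterable[Row],
    max_in: Optional[int] = None,
    include: Optional[Predicate] = None,
    omit: Optional[Predicate] = None,
    mask: Optional[Callable[[Row], Row]] = None,
) -> Iterator[Row]:
    """Hypothetical analogue of a row-count cap plus include/omit filters.

    Reads at most `max_in` input rows (assumed to apply before filtering),
    keeps a row only if `include` (when given) accepts it and `omit` (when
    given) does not match it, then applies `mask` to each kept row.
    """
    source = iter(rows) if max_in is None else islice(rows, max_in)
    for row in source:
        if include is not None and not include(row):
            continue
        if omit is not None and omit(row):
            continue
        yield mask(row) if mask is not None else row


def mask_ssn(row: Row) -> Row:
    """Return a copy of the row with all but the last four SSN digits hidden."""
    ssn = str(row["ssn"])
    return {**row, "ssn": "***-**-" + ssn[-4:]}


if __name__ == "__main__":
    rows = [{"id": i, "state": s, "ssn": f"123-45-{i:04d}"}
            for i, s in enumerate(["NY", "CA", "NY", "TX", "NY"], start=1)]
    kept = subset(rows, max_in=4,
                  include=lambda r: r["state"] == "NY",
                  omit=lambda r: r["id"] == 3,
                  mask=mask_ssn)
    print(list(kept))
```

Here the cap controls the size of the subset and the predicates control its content, mirroring the size/content split described above.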

You can provision the DB subsets in multiple ways for test data users working on-premises or in the cloud. These database subset management options include new persistent or virtual (federated) test schemas, flat-file targets, and DevOps pipelines per this example.

In addition to database subsets, you can also create test file subsets. Use RowGen to create synthetic, highly realistic test files in any format and size. Use FieldShield to extract (and mask) test data subsets from structured (flat) files in fixed-position or delimited format, using its built-in selection and filtering functionality. Because it is built on the same big data engine that powers Voracity (the IRI CoSort SortCL program), FieldShield handles structured files at any volume, too.
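
Subsetting a fixed-position file comes down to reading each record, slicing fields by byte offset, filtering, and masking in place without changing field widths. The sketch below is a generic Python illustration, not FieldShield: the record layout (name in columns 0 to 10, state in 10 to 12, SSN in 12 to 23), the state filter, and the output cap are hypothetical. It streams one record at a time, so memory use stays flat regardless of file size.

```python
import io
from typing import TextIO

# Hypothetical fixed-position layout (0-based offsets, end exclusive):
# name 0-10, state 10-12, ssn 12-23.
STATE = slice(10, 12)
SSN_START, SSN_END = 12, 23


def mask_ssn(ssn: str) -> str:
    """Hide all but the last four characters, preserving the field width."""
    return "*" * (len(ssn) - 4) + ssn[-4:]


def subset_fixed(src: TextIO, dst: TextIO, state: str, max_out: int) -> int:
    """Stream records from `src`, write up to `max_out` masked records whose
    state field equals `state` to `dst`, and return how many were written.

    Only one record is held in memory at a time.
    """
    if max_out <= 0:
        return 0
    written = 0
    for raw in src:
        line = raw.rstrip("\n")
        if line[STATE] != state:
            continue
        # Splice the masked SSN back in so later fields keep their positions.
        masked = line[:SSN_START] + mask_ssn(line[SSN_START:SSN_END]) + line[SSN_END:]
        dst.write(masked + "\n")
        written += 1
        if written >= max_out:
            break
    return written


if __name__ == "__main__":
    records = [("Ann Lee", "NY", "123-45-6789"), ("Bob Ray", "CA", "987-65-4321"),
               ("Cy Tam", "NY", "555-12-3456"), ("Di Oz", "NY", "111-22-3333")]
    src = io.StringIO("".join(f"{n:<10}{s}{ssn}\n" for n, s, ssn in records))
    dst = io.StringIO()
    print(subset_fixed(src, dst, "NY", max_out=2))  # 2
    print(dst.getvalue(), end="")
```

Preserving the field width when masking matters for fixed-position data: a shorter or longer replacement would shift every field after it and corrupt the record layout.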

Subsetting strategies like these not only minimize the risk of PII exposure and privacy law violations, but also dramatically lower the costs of database and application testing infrastructures; some say by as much as $50,000 per database. Learn how to automatically set up and create test data subsetting jobs in Workbench here.
