The value of good test data to DBAs is well known:
“Testing of database-intensive applications has unique challenges that stem from hidden dependencies, subtle differences in data semantics, target database schemes, and implicit business rules. These challenges become even more difficult when the application involves integrated and heterogeneous databases or confidential data. Proper test data that simulate real-world data problems are critical to achieving reasonable quality benchmarks for functional input-validation, load, performance, and stress testing.” – Ali Raza & Stephen Clyde, abstract from Creating Datasets for Testing Relational Databases
Testing database operations, prototyping data warehouse and ETL/ELT jobs, safely outsourcing file samples and reports, and running performance benchmarks on DB appliances all require test data with the look and feel of the production database so that the applications using that test data now will perform successfully with real data later. In their 2012 book, Raza and Clyde compare test data generation against test data extraction.
IRI and its users know that using real data for testing is undesirable. The most obvious reason today is that real data risks exposing personally identifiable information (PII) that must be kept confidential. A developer or tester running processes or exercising a database system should not risk disclosing customer information, such as social security numbers, credit card numbers, and birth dates, during this phase. The real data currently available may also not be robust or realistic enough to stress-test applications or databases that will have to handle larger volumes and/or value ranges.
Unfortunately, Raza and Clyde wrote their book before RowGen v3 was released; had it been available, they might have observed that it generates test data that:
1) does not expose PII because it contains new, or randomized real, column values
2) maintains the structural and referential integrity defined in the original DDL
3) is not limited to the original database’s data volumes or value ranges
4) can be customized through the generation of scripts to address complex requirements
5) is pre-sorted and automatically bulk-loaded for the fastest population possible
6) is defined in flexible batch scripts that can be exported, re-used, and modified as needed
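To make point 1 concrete, the idea of replacing PII with new, format-preserving values can be sketched in a few lines of Python. This is an illustration only, not RowGen's implementation or its .rcl syntax; the column names and formats are hypothetical.

```python
import random

# Illustrative sketch (not RowGen itself): synthesize column values that
# look like production PII formats but contain no real customer data.

def fake_ssn(rng):
    """Random value formatted like a U.S. SSN (xxx-xx-xxxx)."""
    return f"{rng.randint(100, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"

def fake_card(rng):
    """Random value formatted like a 16-digit card number in 4-digit groups."""
    return " ".join(f"{rng.randint(0, 9999):04d}" for _ in range(4))

def make_rows(n, seed=0):
    rng = random.Random(seed)          # seeded so test runs are repeatable
    return [
        {"id": i + 1,                  # surrogate key keeps uniqueness intact
         "ssn": fake_ssn(rng),
         "card": fake_card(rng)}
        for i in range(n)
    ]

rows = make_rows(3)
```

Because the values are generated rather than copied, no production record can be reverse-engineered from the test set, yet downstream validation logic that expects these formats still exercises the same code paths.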
IRI RowGen v3 is the latest release of the world’s fastest and most robust high-volume test data generator for relational databases. RowGen runs from the Eclipse-based IRI Workbench GUI, from the command line, or from batch programs to produce the quality and quantity of test data necessary to accurately reflect the scope, layouts, and relationships within production databases, and in turn, data warehouses and operational data stores.
RowGen v3’s new DB Test Data wizard, when launched from the IRI Workbench GUI, guides users through the specification and automation of:
Parsing – by selecting the schema and tables to populate, RowGen translates the database table descriptions and integrity constraints into .rcl scripts that specify the source structure, dependent sets, and data creation, in the order necessary to populate the tables in the right format, and with all primary keys, unique indexes, and foreign key relationships respected.
Generation – by building and running the .rcl scripts to create one test file per table that can be bulk loaded, and/or saved for future use.
Population – by bulk-loading the target tables in the right order with pre-sorted test data that is structurally and referentially correct.
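The "right order" in the population step is essentially a topological sort of the foreign-key graph: a child table cannot be loaded before the parent tables its keys reference. As a sketch of that ordering problem (using a hypothetical schema, not RowGen's internals), Python's standard library can express it directly:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical schema: each table maps to the set of parent tables
# its foreign keys reference. RowGen derives this from the DDL.
fk_parents = {
    "customers":   set(),
    "products":    set(),
    "orders":      {"customers"},
    "order_items": {"orders", "products"},
}

# A valid bulk-load order: every parent precedes its children.
load_order = list(TopologicalSorter(fk_parents).static_order())
```

Loading in this order means every foreign-key value already exists in its parent table at insert time, so referential-integrity constraints never fire during the bulk load.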
The process can rapidly load huge test databases and comply with both business rules and data privacy laws. The data generated are realistic and robust enough to stress-test database operations and query applications.
RowGen v3 also supports rule- and script-based options to control specific field values and value-range distributions, so that generated data accommodates specific database constraints and best represents the appearance and occurrence rates of data in production. Users can also graph the test values and visually substantiate that they conform to linear, normalized, weighted, or standard distributions.
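The idea of matching production occurrence rates can be illustrated with a weighted draw. The column, categories, and weights below are hypothetical, and this is a generic sketch of weighted sampling rather than RowGen's own mechanism:

```python
import random
from collections import Counter

# Hypothetical production profile: an account "status" column whose
# values occur at roughly these rates.
statuses = ["ACTIVE", "DORMANT", "CLOSED"]
weights  = [0.80, 0.15, 0.05]

rng = random.Random(42)                      # seeded for repeatability
sample = rng.choices(statuses, weights=weights, k=10_000)

# Tallying the sample shows frequencies close to the target weights,
# which is what a distribution graph would confirm visually.
observed = Counter(sample)
```

Test data drawn this way exercises indexes, query plans, and reports with realistic value skew, instead of the uniform spread a naive generator would produce.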