Database and solution architects depend on realistic test data to:
- help create new databases, prototype ETL jobs or applications
- benchmark performance in new or existing platforms
- stress-test systems
- protect confidential information in existing systems if database work is outsourced or used for demonstrations.
Using production data for these purposes risks exposing personally identifiable information (PII) or proprietary information, and that data may not reflect the types or volume of data that will be encountered in the future.
Test data must therefore not be real, but must appear real: it must represent what would be found in production, and conform to the value and volume characteristics, as well as the business rules, needed to test any software or system accurately.
For more information on how IRI RowGen software enhances test data realism, refer to Making Test Data Realistic – Without Taking It from Production.
RowGen includes a number of functions and wizards for synthesizing realistic sets of test data. Through the Set File from Column wizard, RowGen can create set files extracted from production column values. These set files contain actual, but randomly selected, pairs of column values that exist together in production. Each such pair is referred to as a joined pair, or valid pair.
These set files would normally contain innocuous information that does not have to be protected and that makes sense to pair, such as a city and state, or a state and zip code. Why is a joined pair important? Users testing a database or data warehouse system want to be able to tell whether data is being loaded and organized properly.
By loading known, valid pairs into their test data, users can visually validate results quickly. The valid pairs provide a method of testing data loading, field entry parameters, and the subsequent presentation of the data.
[Figure: example from the RowGen ‘Set File from Column’ wizard]
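Apart from the wizard, the valid-pair idea itself can be sketched in a few lines of Python. This is only an illustration, not RowGen syntax or output: the sample rows, the function names, and the tab-delimited layout of the set file are all assumptions made for the example.

```python
import random

# Hypothetical stand-in for production rows; only the innocuous
# city and state columns are read, never the name column.
production_rows = [
    {"name": "A. Smith", "city": "Melbourne", "state": "FL"},
    {"name": "B. Jones", "city": "Orlando", "state": "FL"},
    {"name": "C. Brown", "city": "Austin", "state": "TX"},
    {"name": "D. White", "city": "Melbourne", "state": "FL"},
    {"name": "E. Green", "city": "Denver", "state": "CO"},
]


def extract_valid_pairs(rows, col_a, col_b, sample_size, seed=None):
    """Return a random sample of distinct (col_a, col_b) pairs found in rows."""
    distinct = sorted({(row[col_a], row[col_b]) for row in rows})
    rng = random.Random(seed)
    return rng.sample(distinct, min(sample_size, len(distinct)))


def to_set_file_lines(pairs):
    """Format each pair as one tab-delimited line (assumed set file layout)."""
    return ["\t".join(pair) for pair in pairs]


def invalid_pairs(test_rows, valid, col_a, col_b):
    """Return the test rows whose (col_a, col_b) pair is not a known valid pair."""
    allowed = set(valid)
    return [row for row in test_rows if (row[col_a], row[col_b]) not in allowed]


pairs = extract_valid_pairs(production_rows, "city", "state", sample_size=3, seed=1)
for line in to_set_file_lines(pairs):
    print(line)

# Build test rows from the valid pairs, plus one deliberately mismatched row.
test_rows = [{"id": i, "city": c, "state": s} for i, (c, s) in enumerate(pairs)]
test_rows.append({"id": 99, "city": "Austin", "state": "CO"})

# Only the mismatched row should be reported.
print(invalid_pairs(test_rows, pairs, "city", "state"))
```

Because every pair written to the set file was observed together in the source rows, any city and state combination that shows up after loading and is not in the set points to a loading or mapping problem rather than to bad input data.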
Another useful test data generation feature built into RowGen is the ability to create composite keys. A composite key is a key made up of two or more attributes that uniquely identify an occurrence.
RowGen can generate all permutations of two or more attributes or fields, that is, every combination of their values, to produce all possible pairs and composite keys. Depending on the testing scenario, those composite keys can represent each unique record that could be needed for testing.
This process can be used to create a primer for all-pairs testing by generating the initial data up front, and then allowing the user to pare it down to carefully selected test vectors that properly test all combinations of scenarios for a system.
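As a rough sketch of what generating the initial data up front means, the following Python builds every combination of values across several attributes. It is not RowGen code; the attribute names and values are hypothetical, and `itertools.product` is used here simply as one way to enumerate the combinations.

```python
from itertools import product

# Hypothetical attribute domains; each list holds distinct values.
regions = ["EAST", "WEST"]
product_codes = ["P100", "P200", "P300"]
years = [2023, 2024]


def composite_keys(*domains):
    """Return every combination of one value per domain as a tuple."""
    return list(product(*domains))


keys = composite_keys(regions, product_codes, years)

print(len(keys))  # 2 * 3 * 2 = 12 combinations
print(keys[0])    # ('EAST', 'P100', 2023)

# Each combination occurs exactly once, so it can serve as a composite key.
assert len(set(keys)) == len(keys)
```

The number of rows is the product of the domain sizes, so it grows quickly as attributes are added, which is why this full set is treated as a starting point to be pared down.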
The reasoning behind all-pairs testing is basic: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those that depend on interactions between pairs of parameters, which all-pairs testing can catch. All-pairs testing is therefore widely regarded as a reasonable cost-benefit compromise between exhaustive combinatorial testing, which is often computationally infeasible, and less thorough methods that fail to exercise all possible pairs of parameters.
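To make the paring-down step concrete, here is a minimal sketch of one common approach: a greedy selection that keeps picking the row covering the most not-yet-covered value pairs. This is an assumption chosen for illustration, not RowGen's method, and greedy selection does not guarantee the smallest possible test set. The parameter names and values are hypothetical.

```python
from itertools import combinations, product


def pairs_in(row):
    """Return every pair of (parameter index, value) items exercised by one row."""
    return {
        ((i, row[i]), (j, row[j]))
        for i, j in combinations(range(len(row)), 2)
    }


def all_pairs_subset(domains):
    """Greedily pick rows from the full combination set until all pairs are covered."""
    candidates = list(product(*domains))
    uncovered = set()
    for row in candidates:
        uncovered |= pairs_in(row)

    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_in(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


# Hypothetical test parameters.
domains = [
    ["Linux", "Windows", "macOS"],  # operating system
    ["Oracle", "DB2", "MySQL"],     # target database
    ["bulk", "insert"],             # load method
]

full_set = list(product(*domains))
test_vectors = all_pairs_subset(domains)

print(len(full_set))      # 3 * 3 * 2 = 18 exhaustive combinations
print(len(test_vectors))  # fewer rows, yet every value pair still appears
for row in test_vectors:
    print(row)
```

Every pair of values from any two parameters appears in at least one selected row, so pairwise interaction bugs remain reachable while the number of test cases drops well below the exhaustive set.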
RowGen v3 is IRI’s latest offering in safe, intelligent, high-volume test data generation for relational databases, sequential files, and formatted report targets. RowGen runs from the IRI Workbench GUI (built on Eclipse™), on the command line, or from batch programs, to produce the quality and quantity of test data necessary to accurately reflect the scope, layouts, and relationships within production databases and data warehouses. For more information on RowGen, see http://www.iri.com/products/rowgen.