This article is part of a 4-step series.
Step 1: Goal Setting & Team Building
Test data is typically needed for purposes such as:
- stress-testing the functions and performance of applications
- prototyping database load/query and data warehouse (DW) ETL/ELT operations
- benchmarking prospective new hardware or software
- outsourcing development or proofs of concept
- demonstrating systems with real-looking, but not real, sample data
In all these cases, the test data needs to be as realistic as possible, but it also needs to be safe and de-personalized. Sometimes it’s enough to mask real data and test with that. Sometimes, however, production data cannot be used for testing (even if masked), because it does not exist yet or because access to the source data is restricted. Production data may also lack the variety needed to exercise every application use case, or be too small to stress-test the future capacity of the solution (think healthcare.gov).
Subsetting and masking production data can also be more laborious than automatically creating data from scratch using existing metadata (such as database DDL or a COBOL copybook). In that case, the goal is to generate test data that has the properties, but not the actual values, of production data.
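To make the idea of generating data “with the properties, but not the actual values” of production data more concrete, here is a minimal Python sketch. It is not how RowGen or any IRI tool works internally; the `Column` class, the `generate_rows` function, and the value ranges are hypothetical, standing in for metadata you might parse from a DDL statement. Each generated value respects the column’s type, length, and range, but none of it comes from a real record.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical column metadata, e.g. as parsed from a DDL statement."""
    name: str
    sql_type: str      # "INTEGER", "VARCHAR", or "DECIMAL" in this sketch
    length: int = 0    # maximum length for VARCHAR columns
    low: float = 0     # lower bound for numeric columns
    high: float = 0    # upper bound for numeric columns


def generate_value(col: Column, rng: random.Random):
    """Return a random value that matches the column's type and constraints."""
    if col.sql_type == "INTEGER":
        return rng.randint(int(col.low), int(col.high))
    if col.sql_type == "DECIMAL":
        return round(rng.uniform(col.low, col.high), 2)
    if col.sql_type == "VARCHAR":
        size = rng.randint(1, col.length)
        return "".join(rng.choices(string.ascii_uppercase, k=size))
    raise ValueError(f"unsupported type: {col.sql_type}")


def generate_rows(columns, count, seed=42):
    """Generate `count` rows; a fixed seed makes the test set reproducible."""
    rng = random.Random(seed)
    return [{c.name: generate_value(c, rng) for c in columns} for _ in range(count)]


# Metadata roughly equivalent to (hypothetical table):
# CREATE TABLE customer (id INTEGER, name VARCHAR(12), balance DECIMAL(8,2))
customer = [
    Column("id", "INTEGER", low=1, high=99999),
    Column("name", "VARCHAR", length=12),
    Column("balance", "DECIMAL", low=0, high=10000),
]

rows = generate_rows(customer, count=5)
for row in rows:
    print(row)
```

Seeding the generator is a deliberate choice here: testers can regenerate exactly the same data set when they need to reproduce a failure, and change the seed when they want fresh values.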
Once you have determined that test data is needed, and whether it can be drawn from existing data (masked with a tool like IRI FieldShield) or must be generated from scratch (with a tool like IRI RowGen), the project manager should identify who requires the test data and document their particular technical requirements for it (see Step 2).
Are they the same people who have access to the production data to be masked, or to the metadata needed to generate production-quality data? Can they work together? Identify who will:
- obtain, mask, and/or create the test data (e.g., DBA, programmer)
- deliver the test data sets to the stakeholders, and/or populate the target tables directly
- validate the test data’s quality and quantity (i.e., whether there is enough of it to be useful; e.g., application developer, benchmark tester)
- assess and verify its compliance with internal and governmental data privacy regulations (e.g., CISO, data governance or stewardship lead)
- use the test data (e.g., developer, solution architect, DBA), and give feedback to the provider(s)
- document and version-control the metadata for the project(s) (e.g., application developer)
- store, relocate, and/or dispose of the test data sets after use, as needed (e.g., system administrator)
Some of these roles will overlap, or you might be the only person doing it all! Also, the tools used to mask or create the test data are not necessarily the same ones used to manage it.
Because FieldShield and RowGen share the same metadata and Eclipse GUI, browsing/acquisition, masking and/or generation, and test data asset management can all take place in the same environment. That environment also provides Java application development, project management and version control, and database access for browsing, population, and SQL testing. BIRT can also be used to visualize the test data’s distribution in charts and graphs. You may not need all of that functionality or control, but it’s worth considering, and it’s convenient to have it all in one place.