{"id":16403,"date":"2023-01-24T17:18:57","date_gmt":"2023-01-24T22:18:57","guid":{"rendered":"https:\/\/www.iri.com\/blog\/?p=16403"},"modified":"2025-11-12T06:26:42","modified_gmt":"2025-11-12T11:26:42","slug":"iri-test-data-generation","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/","title":{"rendered":"IRI Test Data Generation"},"content":{"rendered":"<p>We started this series of articles by\u00a0<a href=\"https:\/\/www.iri.com\/blog\/test-data\/iri-voracity-and-test-design-automation\/\">talking about test design automation<\/a> and the need to introduce automation throughout your testing processes. In this blog, we come full circle to talk, once again, about testing.<\/p>\n<p>Now, however, we are going to discuss test data generation specifically. This forms a significant subset of test data management, because it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. Regardless, you will need some degree of test data generation capability. For generating masked data, see the previous article in this series, in which we discuss this topic at length. For synthetic data generation, read on, although note that \u2013 at least in IRI\u2019s case \u2013 the two can quite readily be combined.<\/p>\n<p>Synthetic data generation can be thought of either as an alternative to the traditional method of generating test data via subsetting and masking or as an addition to it, in which case it is used to generate the data that replaces your sensitive data. 
In either case, one of the most important aspects to understand about synthetic data generation is that it does not just generate random data. Rather, it uses sophisticated methods to analyze the structure of an existing data set, then produces a new data set composed of data that is entirely fake individually but that possesses the same statistical properties as the original data set when considered as a whole. This is sometimes referred to as \u201cpreserving statistical integrity\u201d (in contrast to preserving referential integrity, which is vitally important for masking data consistently across relational databases while maintaining existing relational structures in the masked data). Thus, you end up with a selection of entirely safe data that cannot possibly be used to identify an individual but is still just as useful as the original, sensitive data set for the purposes of testing. That is the ideal, anyway \u2013 the degree to which various vendor offerings actually achieve this varies considerably.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-16404 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2023\/01\/IRI-Fig-Test-Data-Generation-970x454-1-300x140.jpg\" alt=\"\" width=\"626\" height=\"292\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/IRI-Fig-Test-Data-Generation-970x454-1-300x140.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/IRI-Fig-Test-Data-Generation-970x454-1-768x359.jpg 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/IRI-Fig-Test-Data-Generation-970x454-1.jpg 970w\" sizes=\"(max-width: 626px) 100vw, 626px\" \/><\/p>\n<p><a href=\"https:\/\/www.iri.com\/products\/voracity\">IRI Voracity<\/a> \u2013 or, more specifically, IRI RowGen \u2013 is available for generating \u201crealistic but not real\u201d synthetic data. 
It places particular emphasis on test data customization: on giving you fine-grained control over what data is generated, and moreover, how and where it is generated. For instance, at a basic level, it can either generate test data based on available information provided to it or select data randomly from a \u201c<a href=\"https:\/\/www.iri.com\/blog\/test-data\/all-about-iri-set-files-a-primer\/\">set file<\/a>\u201d that has been prepared ahead of time, either by hand or in IRI Workbench. These set files may themselves consist of synthetic data, or of real data that has been isolated from any associated data to the point that it is not identifying. Set files can also be simple lists or have multiple columns.\u00a0 The general idea is that multiple set files can be drawn from simultaneously to create a holistic data profile for a person or other entity that doesn\u2019t actually exist, but that has realistic attributes drawn from your actual data.<\/p>\n<p>Various <a href=\"https:\/\/www.iri.com\/blog\/iri\/iri-workbench\/data-generation-rules-workbench\/\">generation functions<\/a> are available for creating test data sets, including both the specific \u2013 say, national ID number generation \u2013 and the generic \u2013 such as generating data according to a predefined, weighted statistical distribution. There are multiple ways to customize the end results of these functions: test data can be generated in such a way that each value is unique, each value in a set file can be mandated to be used exactly once, and so on. You can even define your own compound data formats.<\/p>\n<p>In short, whether synthetic test data is randomly generated or selected, its production characteristics \u2013 including original data formats and sizes, value ranges, key relationships, and frequency distributions \u2013 are preserved. 
In other words, extensive test data customization is available, which is useful for tailoring your generated test data to your specific business needs. Moreover, this extends beyond what data you generate to how and where you generate it (which means that you could, for instance, generate data as part of a <a href=\"https:\/\/www.prnewswire.com\/news-releases\/iri-now-delivering-test-data-automation-for-popular-devops-pipelines-301700673.html\">CI\/CD pipeline<\/a>). This adds further depth of functionality to the test data generation, and more specifically the test data customization, offered by RowGen in particular and Voracity in general.<\/p>\n<section>\n<div class=\"container plain-width\">\n<div class=\"faq-section\">\n<h3>Frequently Asked Questions (FAQs)<\/h3>\n<div class=\"faq-item\">\n<div class=\"faq-question\">1. What is test data generation and why is it important? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Test data generation is the process of creating data sets for use in database, application, and systems testing. It is important because it allows developers and testers to work with data that mimics real-world scenarios while protecting sensitive information. This ensures thorough testing without risking privacy or compliance violations.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">2. How does synthetic data generation differ from subsetting and masking? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Synthetic data generation creates entirely new, fake data that preserves the statistical properties of real data, whereas subsetting and masking involve taking a sample of real data and de-identifying it. 
Synthetic data eliminates any risk of exposing real individuals\u2019 information while still being representative for testing.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">3. What does \u201cpreserving statistical integrity\u201d mean in synthetic data generation? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Preserving statistical integrity means that while the data values are entirely fake, they retain the same distribution patterns, value ranges, and relationships as the original data. This makes the synthetic data just as useful as real data for analytics, testing, and performance validation.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">4. How does IRI RowGen support test data generation? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\"><a href=\"https:\/\/www.iri.com\/products\/rowgen\">IRI RowGen<\/a> provides fine-grained control over test data creation by allowing users to define what data is generated, how it is generated, and where it is delivered. It supports both random generation and set-file-driven generation, enabling realistic yet non-identifiable data production.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">5. Can RowGen combine masking and synthetic data generation? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Yes. RowGen can generate synthetic values to replace the sensitive data targeted by masking operations, allowing both approaches to work together. This ensures secure test data that is still representative of production systems.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">6. How customizable is the data generated by RowGen? 
<i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">RowGen offers a high level of customization, including unique value generation, use of set files with specific frequency distributions, and compound data formats. Users can tailor the generated data to match their business rules, relational structures, and compliance needs.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">7. What are set files and how are they used in RowGen? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Set files are predefined lists of data values that RowGen can use for test data generation. They can contain synthetic or real de-identified data and can be used to build realistic profiles by combining multiple set files to represent entities like people or organizations.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">8. Can RowGen generate test data for specific use cases like national IDs or emails? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Yes. RowGen includes generation functions for specific data types such as national IDs, emails, phone numbers, and more. It can also create generic data using statistical distributions or user-defined rules.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">9. How does RowGen support CI\/CD pipelines? <i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">RowGen can be integrated into CI\/CD pipelines to automatically generate test data during build and deployment processes. This helps ensure that every testing environment has consistent, secure, and relevant data.<\/div>\n<\/div>\n<div class=\"faq-item\">\n<div class=\"faq-question\">10. What are the benefits of using synthetic test data over production data? 
<i class=\"faq-icon fa fa-plus\"><\/i><i class=\"faq-icon fa fa-minus\"><\/i><\/div>\n<div class=\"faq-answer\">Synthetic test data eliminates privacy risks, complies with data protection regulations, and allows full coverage of edge cases that may not exist in production data. It also ensures that sensitive customer information is never exposed in testing environments.<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>We started this series of articles by\u00a0talking about test design automation and the need to introduce automation throughout your testing processes. In this blog, we come full circle to talk, once again, about testing. Now, however, we are going to discuss test data generation specifically. This forms a significant subset of test data management, being<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\" title=\"IRI Test Data Generation\">Read More<\/a><\/div>\n","protected":false},"author":188,"featured_media":16405,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[1,34,29],"tags":[789,88,191],"class_list":["post-16403","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-transformation2","category-business","category-test-data","tag-iri-voracity","tag-test-data-2","tag-test-data-generation"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>IRI Test Data Generation - IRI<\/title>\n<meta name=\"description\" content=\"We are going to discuss test data generation specifically. 
This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. Regardless, you will need some degree of test data generation capabilities.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"IRI Test Data Generation\" \/>\n<meta property=\"og:description\" content=\"We are going to discuss test data generation specifically. This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. 
Regardless, you will need some degree of test data generation capabilities.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2023-01-24T22:18:57+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-12T11:26:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"970\" \/>\n\t<meta property=\"og:image:height\" content=\"248\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Daniel Howard, Bloor Research\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Howard, Bloor Research\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\"},\"author\":{\"name\":\"Daniel Howard, Bloor Research\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/9099cd77d0c3653574d152e6897fa38f\"},\"headline\":\"IRI Test Data Generation\",\"datePublished\":\"2023-01-24T22:18:57+00:00\",\"dateModified\":\"2025-11-12T11:26:42+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\"},\"wordCount\":1268,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg\",\"keywords\":[\"IRI Voracity\",\"test data\",\"test data generation\"],\"articleSection\":[\"Data Transformation\",\"IRI Business\",\"Test Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\",\"url\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\",\"name\":\"IRI Test Data Generation - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg\",\"datePublished\":\"2023-01-24T22:18:57+00:00\",\"dateModified\":\"2025-11-12T11:26:42+00:00\",\"description\":\"We are going to discuss test data generation specifically. This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. 
Regardless, you will need some degree of test data generation capabilities.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg\",\"width\":970,\"height\":248},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"IRI Test Data Generation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/9099cd77d0c3653574d152e6897fa38f\",\"name\":\"Daniel Howard, Bloor Research\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9386a20e3db8c225dffad46b5ba7c313?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9386a20e3db8c225dffad46b5ba7c313?s=96&d=blank&r=g\",\"caption\":\"Daniel Howard, Bloor Research\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/dhoward\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"IRI Test Data Generation - IRI","description":"We are going to discuss test data generation specifically. 
This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. Regardless, you will need some degree of test data generation capabilities.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/","og_locale":"en_US","og_type":"article","og_title":"IRI Test Data Generation","og_description":"We are going to discuss test data generation specifically. This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. 
Regardless, you will need some degree of test data generation capabilities.","og_url":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/","og_site_name":"IRI","article_published_time":"2023-01-24T22:18:57+00:00","article_modified_time":"2025-11-12T11:26:42+00:00","og_image":[{"width":970,"height":248,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","type":"image\/jpeg"}],"author":"Daniel Howard, Bloor Research","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Daniel Howard, Bloor Research","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/"},"author":{"name":"Daniel Howard, Bloor Research","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/9099cd77d0c3653574d152e6897fa38f"},"headline":"IRI Test Data Generation","datePublished":"2023-01-24T22:18:57+00:00","dateModified":"2025-11-12T11:26:42+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/"},"wordCount":1268,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","keywords":["IRI Voracity","test data","test data generation"],"articleSection":["Data Transformation","IRI Business","Test 
Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/","url":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/","name":"IRI Test Data Generation - IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","datePublished":"2023-01-24T22:18:57+00:00","dateModified":"2025-11-12T11:26:42+00:00","description":"We are going to discuss test data generation specifically. This forms a significant subset of test data management, being that it is an integral part of keeping your test data secure while allowing it to retain the characteristics that are important to your tests. To wit, either you will want to find any sensitive data within the data you want to test, then replace much or all of it with masked data; or you will want to generate entire sets of synthetic data (which is to say, data that is realistic but not real) for testing purposes. Or, quite possibly, both. 
Regardless, you will need some degree of test data generation capabilities.","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","width":970,"height":248},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/test-data\/iri-test-data-generation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"IRI Test Data Generation"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/9099cd77d0c3653574d152e6897fa38f","name":"Daniel Howard, Bloor Research","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9386a20e3db8c225dffad46b5ba7c313?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9386a20e3db8c225dffad46b5ba7c313?s=96&d=blank&r=g","caption":"Daniel Howard, Bloor 
Research"},"url":"https:\/\/www.iri.com\/blog\/author\/dhoward\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2023\/01\/BR-1000-IRI-TD-Customisation-banner-977-x-250px-970x248-1.jpg","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16403"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/188"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=16403"}],"version-history":[{"count":6,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16403\/revisions"}],"predecessor-version":[{"id":18761,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/16403\/revisions\/18761"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/16405"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=16403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=16403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=16403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}