{"id":6770,"date":"2015-03-13T15:26:42","date_gmt":"2015-03-13T19:26:42","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=6770"},"modified":"2021-12-13T11:29:03","modified_gmt":"2021-12-13T16:29:03","slug":"creating-test-data-for-cassandra-datastax","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/","title":{"rendered":"Creating Test Data for Cassandra"},"content":{"rendered":"<p><em>UPDATE: Q1&#8217;17: Native JSON file generation, and available JDBC and ODBC drivers for NoSQL RDBs like Cassandra, now provide a more seamless connection than the CSV\/import approach shown below. Please contact <a href=\"mailto:fieldshield@iri.com\">rowgen@iri.com<\/a> or <a href=\"mailto:voracity@iri.com\">voracity@iri.com<\/a> for details.<\/em><\/p>\n<p>DataStax\/Apache Cassandra cannot readily generate and populate realistic\u00a0prototypes for testing queries or planning capacity; the Cassandra <a href=\"http:\/\/www.datastax.com\/documentation\/cassandra\/1.2\/cassandra\/tools\/toolsCStress_t.html\" target=\"_blank\" rel=\"noopener\">stress tool<\/a>\u00a0only inserts random values at the time of this writing. This article explains how to use the <a href=\"http:\/\/www.iri.com\/products\/rowgen\" target=\"_blank\" rel=\"noopener\">IRI RowGen<\/a>\u00a0test data synthesis product (or the\u00a0<a href=\"http:\/\/www.iri.com\/products\/voracity\">IRI Voracity<\/a>\u00a0data management platform which includes RowGen and IRI FieldShield &amp; DarkShield <a href=\"https:\/\/www.iri.com\/solutions\/data-masking\">data masking<\/a> products) to create and load artificial but realistic test data into\u00a0Cassandra\u00a0via a\u00a0CSV file with\u00a0production data characteristics.<\/p>\n<p>In this\u00a0example, we know that our table will contain customers with Usernames, First and Last Names, Email Addresses, and Credit Card Numbers. 
To create our test data, we must first generate set files containing test values for each of those categories. A set file is a list of entries, each with one or more tab-delimited values. Set files may already exist, or you can create them manually or generate them automatically from database columns through the \u2018Generate New Set File\u2019 wizard in IRI RowGen.<\/p>\n<p>Of course,\u00a0you will need to\u00a0consider the structure and content of the test data your table needs. See <a href=\"http:\/\/www.iri.com\/blog\/test-data\/test-data-management-test-data-needs-assessment\/\" target=\"_blank\" rel=\"noopener\">this article<\/a> for typical planning considerations.<\/p>\n<p><strong>Generating Names<\/strong><\/p>\n<p>1) Create a compound data value job script (first and last names combined) named \u201cCreateNamesSet.rcl\u201d that RowGen can execute to produce a set file. Call the output \u201cUser.set\u201d because these names will also be used as the basis for our usernames.<\/p>\n<p>2) Create three fields to be generated in User.set: last name, tab separator, and first name. Name the first field \u201cLastName\u201d and choose the method that will select values from an IRI-provided set file called \u201cnames_last.set\u201d. 
Add the literal value \u201c\\t\u201d as a tab separator, and then repeat the process used for LastName to create a FirstName field that selects values from &#8220;names_first.set&#8221;.<\/p>\n<p>3) Run CreateNamesSet.rcl with RowGen, either on the command line or from the IRI Workbench GUI, to produce the tab-delimited User.set file of first and last names, which will be used both in the generation of usernames and in the final test file build that populates our prototype table.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/User1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6871 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/User1-e1426860409209.png\" alt=\"Cassandra_RowGen_User\" width=\"600\" height=\"578\" \/><\/a><\/p>\n<p><strong>Generating Usernames<br \/>\n<\/strong>For Usernames, we will create a set file that uses the User.set file generated above. Usernames for this example will combine the last name, the first initial, and a randomly generated number between 100 and 999.<\/p>\n<p>1) Create a new RowGen job script with the Compound Data Wizard, call it \u201cCreateUsernamesSet.rcl\u201d, and name the output set file \u201cUsernames.set\u201d.<\/p>\n<p>2) Build compound username values with three components named Part1, Part2, and Part3.<\/p>\n<p>3) For Part1, choose the method that will select values from (browse to) the previously-generated User.set file, and specify \u2018ALL\u2019 for the selection type to maintain the association between users, usernames, and email addresses. Set the size to 5.<\/p>\n<p>4) For Part2, repeat the process used for Part1, but select \u2018Row\u2019 for the selection type and set the column index to 2. Set the size to 1. 
This guarantees that every last name is used in the generation, and that the first letter of the first name in the same row is appended to the username.<\/p>\n<p>5) For Part3, specify the generation of a numeric value between 100 and 999 to append a random integer to each username.<\/p>\n<p>Upon execution of CreateUsernamesSet.rcl, we see that each username contains the first five letters of the last name, then the first initial, then a random 3-digit number:<\/p>\n<p lang=\"zxx\"><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Usernames.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6872 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Usernames-e1426860426474.png\" alt=\"Cassandra_RowGen_Usernames\" width=\"600\" height=\"556\" \/><\/a><\/p>\n<p><strong>Generating Emails<br \/>\n<\/strong>Next, we will create an email set file that appends randomly selected domain names to the username values. Because some email services are more popular than others, we will also create a weighting system to reflect a higher frequency of Yahoo and Gmail domains.<\/p>\n<p>1) Run RowGen\u2019s \u2018New Custom Test Data\u2019 job wizard to create a job called \u201cCreateEmailsSet\u201d that produces a set file called \u201cEmails.set\u201d.<\/p>\n<p>2) Produce the username part of the email. In the Test Data Definition dialog, click New Field, and rename the first field Usernames. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as Usernames.set. Set the size to 9 and click OK.<\/p>\n<p>3) Produce the domain part of the email (which includes the @ symbol). In the Layout Fields dialog, click New Field, rename it to \u201caddress\u201d, and double-click on it. In the Generation Field dialog, specify a \u201d ,\u201d with a position of 10 and a size of 20. 
In the Data Generation \/ Data Distribution section below, click \u201cDefine \u2026\u201d to create a new data distribution of items named \u201cWeightedEmails\u201d.<\/p>\n<p>4) In the New Distribution Wizard, choose \u2018Weighted Distribution of Items\u2019 and enter each pair below into the ratio and literal text boxes, respectively, adding each to the list.<\/p>\n<pre><span style=\"color: #111111;\"><span style=\"font-family: Consolas, 'Andale Mono', Monaco, Courier, 'Courier New', Verdana, sans-serif;\"><span style=\"font-size: small;\">(32 | @gmail.com), (32 | @yahoo.com), (2 | @ibm.com), (4 | @msn.com), (2 | @ymail.com), (2 | @inmail.com), (2 | @cnet.net), (2 | @chase.org), (1 | @iri.com), (1 | @gdic.com), (1 | @aci.com), (2 | @oracle.net), (1 | @gmx.org), (4 | @aol.com), (2 | @inbox.com), (2 | @hushmail.com), (2 | @outlook.com), (2 | @zoho.com), (2 | @yandex.net), (2 | @mail.com)<\/span><\/span><\/span><\/pre>\n<p>After you enter these values, click Next in the original wizard to move into the Data Targets dialog. Use \u201cAdd Data Target \u2026\u201d to specify the output file \u201cEmails.set\u201d. This file will also be used when we build the final test file.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Emails.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6873 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Emails-e1426860460394.png\" alt=\"Cassandra_RowGen_Emails\" width=\"600\" height=\"461\" \/><\/a><\/p>\n<blockquote lang=\"zxx\"><p><em>The email domains with the highest weights (Gmail and Yahoo) show up most frequently, with the others appearing only occasionally.<\/em><\/p><\/blockquote>\n<p><strong>Generating Credit Card Numbers<br \/>\n<\/strong>Lastly, we will create computationally valid card numbers in the format XXXX-XXXX-XXXX-XXXX. 
The first four digits reflect actual Issuer Identification Numbers (IINs) of various credit card companies, and the last digit is a check digit used to validate the number.<\/p>\n<p>To do this, create a new (empty) job, call it \u201cCreateCCNSet.rcl\u201d (or .scl), populate it with the script below to create \u201cCCN.set\u201d, and run it. The \/INCOLLECT value in RowGen scripts determines the number of rows generated.<\/p>\n<p>RowGen\u2019s purpose-built CCN generation function, ccn_gen(\u201cANY\u201d, \u201c-\u201d), is called to populate this field. Note that similar functions exist for <a href=\"http:\/\/www.iri.com\/blog\/test-data\/united-states-social-security-number\/\" target=\"_blank\" rel=\"noopener\">US<\/a> and <a href=\"http:\/\/www.iri.com\/blog\/test-data\/korea-social-security-number\/\" target=\"_blank\" rel=\"noopener\">Korean<\/a> social security numbers, and the national IDs of <a href=\"http:\/\/www.iri.com\/blog\/test-data\/generating-test-nid-data-italy-fiscal-codes\/\" target=\"_blank\" rel=\"noopener\">Italy<\/a> and <a href=\"http:\/\/www.iri.com\/blog\/test-data\/netherlands-social-fiscal\/\" target=\"_blank\" rel=\"noopener\">The Netherlands<\/a>.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/CreditCardNumber.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6874 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/CreditCardNumber-e1426860474180.png\" alt=\"Cassandra_RowGen_CreditCardNumber\" width=\"600\" height=\"415\" \/><\/a><\/p>\n<p><strong>Creating the Final Test File<\/strong><br \/>\nWith all set files built, it is time to use them in the test CSV file we\u2019ll create and load into a Cassandra table.<\/p>\n<p>1) Run RowGen\u2019s \u2018New Custom Test Data\u2019 job wizard to create a job called \u201cCreateCassUserData.rcl\u201d that will generate the Customers.csv file, the file we will then import into Cassandra.<\/p>\n<p>2) Click \u201cLayout Fields 
\u2026\u201d to enter the Layout Fields dialog. Click New Field and rename the first field to Usernames. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as Usernames.set; then select ALL for its selection type.<\/p>\n<p>3) Click New Field and rename the second field to LastNames. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as User.set; then select ALL for its selection type.<\/p>\n<p>4) Click New Field and rename the third field to FirstNames. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as User.set; then select ROWS for its selection type and set the column index to 2.<\/p>\n<p>5) Click New Field and rename the fourth field to Email. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as Emails.set; then select ALL for its selection type.<\/p>\n<p>6) Click New Field and rename the fifth field to CreditCardNumbers. Double-click on it to launch the Generation Field dialog and \u201cDefine \u2026\u201d its Set file as CCN.set; then select ALL for its selection type.<\/p>\n<p>7) After you enter these values, click Next in the original wizard to move into the Data Targets dialog. 
Use \u201cAdd Data Target \u2026\u201d to specify the output file Customers.csv; then run the script in the Workbench or on the command line to generate that file:<\/p>\n<pre><span style=\"color: #111111;\"><span style=\"font-family: Consolas, 'Andale Mono', Monaco, Courier, 'Courier New', Verdana, sans-serif;\"><span style=\"font-size: small;\">rowgen \/spec=CreateCassUserData.rcl<\/span><\/span><\/span><\/pre>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/FinalScript.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6875 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/FinalScript-e1426860487976.png\" alt=\"Cassandra_RowGen_FinalScript\" width=\"600\" height=\"391\" \/><\/a><\/p>\n<p>Note that RowGen, in addition to producing this CSV file at runtime, could also have produced multiple other file, database, formatted-report, named-pipe, procedural, and even real-time BIRT display targets, with fields from the generated test data, all at the same time.<\/p>\n<p><strong>Importing to Cassandra<\/strong><br \/>\nTo import the CSV file into your Cassandra database, run the following COPY command in cqlsh:<\/p>\n<pre><span style=\"color: #111111;\"><span style=\"font-family: Consolas, 'Andale Mono', Monaco, Courier, 'Courier New', Verdana, sans-serif;\"><span style=\"font-size: small;\"><span style=\"color: #333333;\">COPY &lt;Table you are importing data to&gt; (field1fromCSV, field2fromCSV, ...) 
FROM '&lt;Path to CSV&gt;';<\/span><\/span><\/span><\/span><\/pre>\n<p><span style=\"color: #111111;\"><span style=\"font-family: Consolas, 'Andale Mono', Monaco, Courier, 'Courier New', Verdana, sans-serif;\"><span style=\"font-size: small;\"> <a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Import.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6876 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Import-e1426860502581.png\" alt=\"Cassandra_RowGen_Import\" width=\"549\" height=\"277\" \/><\/a><\/span><\/span><\/span><\/p>\n<p>Here are the records in the test table:<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Display.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6877 size-full\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/03\/Display-e1426860515391.png\" alt=\"Cassandra_RowGen_Display\" width=\"550\" height=\"266\" \/><\/a><\/p>\n<p>For more information on the generation options available, see the\u00a0<strong>Test File Targets<\/strong> section at:\u00a0<a href=\"http:\/\/www.iri.com\/products\/rowgen\/technical-details\" target=\"_blank\" rel=\"noopener\">http:\/\/www.iri.com\/products\/rowgen\/technical-details<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>UPDATE: Q1&#8217;17: Native JSON file generation, and available JDBC and ODBC drivers for NoSQL RDBs like Cassandra, now provide a more seamless connection than the CSV\/import approach shown below. Please contact rowgen@iri.com or voracity@iri.com for details. 
DataStax\/Apache Cassandra cannot readily generate and populate realistic\u00a0prototypes for testing queries or planning capacity; the Cassandra stress tool\u00a0only inserts<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\" title=\"Creating Test Data for Cassandra\">Read More<\/a><\/div>\n","protected":false},"author":61,"featured_media":6814,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[108,8,29],"tags":[523,510,521,512,522,1641,1640,395,524,525,1643,1639,1642],"class_list":["post-6770","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data-2","category-data-protection","category-test-data","tag-build-test-data","tag-cassandra-datastax","tag-creating-realistic-test-data","tag-csv-file","tag-importing-to-cassandra","tag-nosql-db","tag-nosql-test-data","tag-production-data","tag-prototype","tag-stress","tag-synthetic-csv","tag-synthetic-data","tag-synthetic-json"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Creating Test Data for Cassandra - IRI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Creating Test Data for Cassandra\" \/>\n<meta property=\"og:description\" content=\"UPDATE: Q1&#8217;17: Native JSON file 
generation, and available JDBC and ODBC drivers for NoSQL RDBs like Cassandra, now provide a more seamless connection than the CSV\/import approach shown below. Please contact rowgen@iri.com or voracity@iri.com for details. DataStax\/Apache Cassandra cannot readily generate and populate realistic\u00a0prototypes for testing queries or planning capacity; the Cassandra stress tool\u00a0only insertsRead More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2015-03-13T19:26:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-12-13T16:29:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png\" \/>\n\t<meta property=\"og:image:width\" content=\"600\" \/>\n\t<meta property=\"og:image:height\" content=\"555\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Nathan Dymora\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nathan Dymora\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\"},\"author\":{\"name\":\"Nathan Dymora\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6c3bde00b144e9786b3d024c1e45defa\"},\"headline\":\"Creating Test Data for Cassandra\",\"datePublished\":\"2015-03-13T19:26:42+00:00\",\"dateModified\":\"2021-12-13T16:29:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\"},\"wordCount\":1351,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png\",\"keywords\":[\"build test data\",\"Cassandra DataStax\",\"creating realistic test data\",\"csv file\",\"importing to Cassandra\",\"NoSQL DB\",\"NoSQL test data\",\"production data\",\"prototype\",\"stress\",\"synthetic CSV\",\"synthetic data\",\"Synthetic JSON\"],\"articleSection\":[\"Big Data\",\"Data Masking\/Protection\",\"Test Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\",\"name\":\"Creating Test 
Data for Cassandra - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png\",\"datePublished\":\"2015-03-13T19:26:42+00:00\",\"dateModified\":\"2021-12-13T16:29:03+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png\",\"width\":600,\"height\":555},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Creating Test Data for Cassandra\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management 
Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6c3bde00b144e9786b3d024c1e45defa\",\"name\":\"Nathan Dymora\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fe3589b371c7912ed817bd9e5e443745?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fe3589b371c7912ed817bd9e5e443745?s=96&d=blank&r=g\",\"caption\":\"Nathan Dymora\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/nathand\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Creating Test Data for Cassandra - IRI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/","og_locale":"en_US","og_type":"article","og_title":"Creating Test Data for Cassandra","og_description":"UPDATE: Q1&#8217;17: Native JSON file generation, and available JDBC and ODBC drivers for NoSQL RDBs like Cassandra, now provide a more seamless connection than the CSV\/import approach shown below. Please contact rowgen@iri.com or voracity@iri.com for details. DataStax\/Apache Cassandra cannot readily generate and populate realistic\u00a0prototypes for testing queries or planning capacity; the Cassandra stress tool\u00a0only insertsRead More","og_url":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/","og_site_name":"IRI","article_published_time":"2015-03-13T19:26:42+00:00","article_modified_time":"2021-12-13T16:29:03+00:00","og_image":[{"width":600,"height":555,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","type":"image\/png"}],"author":"Nathan Dymora","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Nathan Dymora","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/"},"author":{"name":"Nathan Dymora","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6c3bde00b144e9786b3d024c1e45defa"},"headline":"Creating Test Data for Cassandra","datePublished":"2015-03-13T19:26:42+00:00","dateModified":"2021-12-13T16:29:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/"},"wordCount":1351,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","keywords":["build test data","Cassandra DataStax","creating realistic test data","csv file","importing to Cassandra","NoSQL DB","NoSQL test data","production data","prototype","stress","synthetic CSV","synthetic data","Synthetic JSON"],"articleSection":["Big Data","Data Masking\/Protection","Test Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/","url":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/","name":"Creating Test Data for Cassandra - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","datePublished":"2015-03-13T19:26:42+00:00","dateModified":"2021-12-13T16:29:03+00:00","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","width":600,"height":555},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-protection\/creating-test-data-for-cassandra-datastax\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Creating Test Data for Cassandra"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/6c3bde00b144e9786b3d024c1e45defa","name":"Nathan Dymora","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fe3589b371c7912ed817bd9e5e443745?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fe3589b371c7912ed817bd9e5e443745?s=96&d=blank&r=g","caption":"Nathan 
Dymora"},"url":"https:\/\/www.iri.com\/blog\/author\/nathand\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2015\/02\/User1-e1425678943171.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6770"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/61"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=6770"}],"version-history":[{"count":29,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6770\/revisions"}],"predecessor-version":[{"id":15415,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6770\/revisions\/15415"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/6814"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=6770"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=6770"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=6770"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}