{"id":6158,"date":"2014-10-02T12:07:39","date_gmt":"2014-10-02T16:07:39","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=6158"},"modified":"2026-02-23T14:33:02","modified_gmt":"2026-02-23T19:33:02","slug":"nextform-prepares-unstructured-data-splunk-indexing","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/","title":{"rendered":"Preparing Unstructured Data for Splunk"},"content":{"rendered":"<p><em>Introduction:\u00a0This example demonstrates an older method of using the unstructured data edition of IRI NextForm to extract dark data and prepare it for ingestion in Splunk for indexing and visualization purposes. As you will read, NextForm would process the data outside of Splunk and create a CSV file for input. IRI now offers a new add-on for seamless data preparation, indexing, and visualization in Splunk and information on the add-on is found <a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/iri-voracity-add-on-for-splunk\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/em><\/p>\n<p>Splunk is not designed to index data from most unstructured, &#8220;dark data&#8221; text sources, as they are in highly encoded file formats. Attempting to index such files results in an excessive amount of encoding language that gets indexed instead of the relevant character data. Thus, Splunk cannot readily access meaningful data from these file types.<\/p>\n<p>The\u00a0<a href=\"http:\/\/www.iri.com\/blog\/migration\/data-migration\/unstructured-data-data-restructuring-wizard\/\" target=\"_blank\" rel=\"noopener\">Unstructured Data<\/a>\u00a0edition of <a href=\"http:\/\/www.iri.com\/products\/nextform\" target=\"_blank\" rel=\"noopener\">IRI NextForm<\/a>, however, can\u00a0extract specific character data from doc\/x, ppt\/x, xls\/x, pdf, rtf, txt, xml, and email repositories. 
Using regular expressions, the user can find and extract data that conforms to the desired pattern; for example, a telephone number, email address, or credit card number. And custom regular expressions can be used to extract specific data patterns.<\/p>\n<p>All\u00a0data matching the search pattern is written to a delimited text file. Once in that structured format, Splunk can automatically parse the values\u00a0for quick and easy indexing. As a forensic aside, that results\u00a0file can also include metadata information on each source from which\u00a0the data was extracted, including the file path, file size, and date of creation.<\/p>\n<p>In the how-to example below, unstructured data will be extracted from various file formats, including docx, PDF, ppt, and xlsx:<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>1)<strong> Gather<\/strong> or identify\u00a0all the files you wish to search\u00a0through\u00a0within a folder. The wizard extracts\u00a0and structures data in the\u00a0following file formats: doc, docx, eml, pdf, ppt, pptx, rtf, txt, xls, xlsx, and xml.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6194\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-1.jpg\" alt=\"image-1\" width=\"595\" height=\"234\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-1.jpg 595w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-1-300x117.jpg 300w\" sizes=\"(max-width: 595px) 100vw, 595px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>2)<strong> Create<\/strong> a new project in the <a href=\"http:\/\/www.iri.com\/products\/workbench\" target=\"_blank\" rel=\"noopener\">IRI Workbench<\/a>.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-210.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6195\" 
src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-210.jpg\" alt=\"image-2\" width=\"190\" height=\"38\" \/><\/a><\/p>\n<p>3)<strong> Start<\/strong>\u00a0the New Data Restructuring tool under the IRI icon.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-31.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6196\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-31.jpg\" alt=\"image-3\" width=\"287\" height=\"160\" \/><\/a><\/p>\n<p>4)<strong> Specify\u00a0<\/strong>the top level UNC or folder name containing all the sub-folders with files\u00a0you wish to search.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-41.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6197\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-41.jpg\" alt=\"image-4\" width=\"682\" height=\"129\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-41.jpg 682w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-41-300x56.jpg 300w\" sizes=\"(max-width: 682px) 100vw, 682px\" \/><\/a><\/p>\n<p>5) <strong>Select\u00a0<\/strong>the file types you wish to search by selecting the corresponding extensions.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-51.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6198\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-51.jpg\" alt=\"image-5\" width=\"669\" height=\"34\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-51.jpg 669w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-51-300x15.jpg 300w\" sizes=\"(max-width: 669px) 100vw, 669px\" \/><\/a><\/p>\n<p>6) <strong>Select<\/strong>\u00a0the metadata you would like to include in the output by checking the corresponding field 
types.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-71.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6200\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-71.jpg\" alt=\"image-7\" width=\"668\" height=\"52\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-71.jpg 668w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-71-300x23.jpg 300w\" sizes=\"(max-width: 668px) 100vw, 668px\" \/><\/a><\/p>\n<p>7)<strong> Name<\/strong>\u00a0the data you are searching for in the Column Name text box. In the Search Pattern text box enter the regular expression that will identify the data structure you are searching for. In this example, email addresses will be extracted, and clicking the help icon at the bottom-left of the form reveals a list of common regular expressions.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-81.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6201\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-81.jpg\" alt=\"image-8\" width=\"669\" height=\"32\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-81.jpg 669w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-81-300x14.jpg 300w\" sizes=\"(max-width: 669px) 100vw, 669px\" \/><\/a><\/p>\n<p>8) <strong>Insert <\/strong>the pattern to the table of regular expressions that will be used during the search\/extract operation.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-91.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6202\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-91.jpg\" alt=\"image-9\" width=\"674\" height=\"139\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-91.jpg 674w, 
https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-91-300x61.jpg 300w\" sizes=\"(max-width: 674px) 100vw, 674px\" \/><\/a><\/p>\n<p>9) <strong>Specify<\/strong> the delimiter to be used in the text file; for this example, enter a\u00a0comma.<\/p>\n<p>10) <strong>Browse<\/strong> to the project created in step 2 to store the output, and enter a name for the text file (.txt). Do the same for the data definition file (.ddf), which\u00a0is created at the same time. Both files can share a name because their extensions differ, but make sure the names and extensions are correct before continuing.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-101.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6203\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-101.jpg\" alt=\"image-10\" width=\"671\" height=\"91\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-101.jpg 671w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-101-300x40.jpg 300w\" sizes=\"(max-width: 671px) 100vw, 671px\" \/><\/a><\/p>\n<p>11) <strong>Examine<\/strong> the preview of the data matching the search criteria, along with the metadata you selected for each source.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-111.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6204\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-111.jpg\" alt=\"image-11\" width=\"676\" height=\"412\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-111.jpg 676w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-111-300x182.jpg 300w\" sizes=\"(max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p>12) <strong>Run<\/strong> the extraction if you&#8217;re satisfied with the previewed results. 
The target files will be generated in the designated project folder.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-121.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6205\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-121.jpg\" alt=\"image-12\" width=\"192\" height=\"159\" \/><\/a><\/p>\n<p>13) <strong>Open<\/strong> the text file and type the name of each field, in left-to-right column order, separated by commas. The field names must be on the\u00a0<strong>first line<\/strong>, above the data, to be recognized as headers.<\/p>\n<p>14) <strong>Save<\/strong> the file with a\u00a0<strong>.csv<\/strong> extension.<\/p>\n<p>15) <strong>Index<\/strong> the reformatted .csv file in Splunk, and click Save to complete the upload. Splunk recognizes the CSV format and indexes the file without any additional configuration.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-191.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6212\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-191.jpg\" alt=\"image-19\" width=\"678\" height=\"376\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-191.jpg 678w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-191-300x166.jpg 300w\" sizes=\"(max-width: 678px) 100vw, 678px\" \/><\/a><\/p>\n<p>16) <strong>View<\/strong> the data by searching for the source. 
Enter <em>source=\"File Path to CSV\"<\/em> in the search bar to view the data.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-201.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6213\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-201.jpg\" alt=\"image-20\" width=\"678\" height=\"342\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-201.jpg 678w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-201-300x151.jpg 300w\" sizes=\"(max-width: 678px) 100vw, 678px\" \/><\/a><\/p>\n<p>17) <strong>Visualize<\/strong> the data as you see fit. You can even view your charts in the internal browser of the same Eclipse IDE (<a href=\"http:\/\/www.iri.com\/products\/workbench\" target=\"_blank\" rel=\"noopener\">IRI Workbench<\/a>), so\u00a0the results appear side by side with your data preparation activities.<\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6193\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\" alt=\"image-21\" width=\"679\" height=\"399\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg 679w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211-300x176.jpg 300w\" sizes=\"(max-width: 679px) 100vw, 679px\" \/><\/a>This Splunk chart displays the number of instances over time. Contact <a href=\"mailto:support@iri.com\">support@iri.com<\/a> and reference this article if you have any technical questions.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction:\u00a0This example demonstrates an older method of using the unstructured data edition of IRI NextForm to extract dark data and prepare it for ingestion in Splunk for indexing and visualization purposes. 
As you will read, NextForm would process the data outside of Splunk and create a CSV file for input. IRI now offers a new<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\" title=\"Preparing Unstructured Data for Splunk\">Read More<\/a><\/div>\n","protected":false},"author":60,"featured_media":6193,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[108,32,2255],"tags":[611,610,561,612,613,614,553,489,615,616,617,418,618,574,619,143,620,621,550],"class_list":["post-6158","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data-2","category-business-intelligence","category-archived-articles","tag-csv","tag-dark-data","tag-ddf","tag-doc","tag-docx","tag-eml","tag-iri-nextform","tag-metadata","tag-pdf","tag-ppt","tag-pptx","tag-regular-expressions","tag-rtf","tag-splunk","tag-txt","tag-unstructured-data","tag-xls","tag-xlsx","tag-xml"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Preparing Unstructured Data for Splunk - IRI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Preparing Unstructured Data for Splunk\" \/>\n<meta property=\"og:description\" content=\"Introduction:\u00a0This example demonstrates 
an older method of using the unstructured data edition of IRI NextForm to extract dark data and prepare it for ingestion in Splunk for indexing and visualization purposes. As you will read, NextForm would process the data outside of Splunk and create a CSV file for input. IRI now offers a newRead More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2014-10-02T16:07:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-23T19:33:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"679\" \/>\n\t<meta property=\"og:image:height\" content=\"399\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kyle Grosjean\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kyle Grosjean\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\"},\"author\":{\"name\":\"Kyle Grosjean\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/b1e7df4f0bc31c27408b601af27ba5dd\"},\"headline\":\"Preparing Unstructured Data for Splunk\",\"datePublished\":\"2014-10-02T16:07:39+00:00\",\"dateModified\":\"2026-02-23T19:33:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\"},\"wordCount\":731,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\",\"keywords\":[\"csv\",\"dark data\",\"DDF\",\"doc\",\"docx\",\"eml\",\"IRI NextForm\",\"metadata\",\"pdf\",\"ppt\",\"pptx\",\"regular expressions\",\"rtf\",\"Splunk\",\"txt\",\"unstructured data\",\"xls\",\"xlsx\",\"xml\"],\"articleSection\":[\"Big Data\",\"Business Intelligence (BI&#041;\",\"Archived 
Articles\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\",\"url\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\",\"name\":\"Preparing Unstructured Data for Splunk - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\",\"datePublished\":\"2014-10-02T16:07:39+00:00\",\"dateModified\":\"2026-02-23T19:33:02+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg\",\"width\":679,\"height\":399},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#bread
crumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Preparing Unstructured Data for Splunk\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/b1e7df4f0bc31c27408b601af27ba5dd\",\"name\":\"Kyle Grosjean\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5774edb367635439d5415b05885abf26?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5774edb367635439d5415b05885abf26?s=96&d=blank&r=g\",\"caption\":\"Kyle 
Grosjean\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/kyleg\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Preparing Unstructured Data for Splunk - IRI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/","og_locale":"en_US","og_type":"article","og_title":"Preparing Unstructured Data for Splunk","og_description":"Introduction:\u00a0This example demonstrates an older method of using the unstructured data edition of IRI NextForm to extract dark data and prepare it for ingestion in Splunk for indexing and visualization purposes. As you will read, NextForm would process the data outside of Splunk and create a CSV file for input. IRI now offers a newRead More","og_url":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/","og_site_name":"IRI","article_published_time":"2014-10-02T16:07:39+00:00","article_modified_time":"2026-02-23T19:33:02+00:00","og_image":[{"width":679,"height":399,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","type":"image\/jpeg"}],"author":"Kyle Grosjean","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kyle Grosjean","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/"},"author":{"name":"Kyle Grosjean","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/b1e7df4f0bc31c27408b601af27ba5dd"},"headline":"Preparing Unstructured Data for Splunk","datePublished":"2014-10-02T16:07:39+00:00","dateModified":"2026-02-23T19:33:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/"},"wordCount":731,"commentCount":3,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","keywords":["csv","dark data","DDF","doc","docx","eml","IRI NextForm","metadata","pdf","ppt","pptx","regular expressions","rtf","Splunk","txt","unstructured data","xls","xlsx","xml"],"articleSection":["Big Data","Business Intelligence (BI&#041;","Archived Articles"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/","url":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/","name":"Preparing Unstructured Data for Splunk - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","datePublished":"2014-10-02T16:07:39+00:00","dateModified":"2026-02-23T19:33:02+00:00","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","width":679,"height":399},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/nextform-prepares-unstructured-data-splunk-indexing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Preparing Unstructured Data for Splunk"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/b1e7df4f0bc31c27408b601af27ba5dd","name":"Kyle Grosjean","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5774edb367635439d5415b05885abf26?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5774edb367635439d5415b05885abf26?s=96&d=blank&r=g","caption":"Kyle 
Grosjean"},"url":"https:\/\/www.iri.com\/blog\/author\/kyleg\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2014\/10\/image-211.jpg","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6158"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/60"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=6158"}],"version-history":[{"count":20,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6158\/revisions"}],"predecessor-version":[{"id":11635,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/6158\/revisions\/11635"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/6193"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=6158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=6158"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=6158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}