{"id":10448,"date":"2016-09-07T14:22:40","date_gmt":"2016-09-07T18:22:40","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=10448"},"modified":"2026-02-23T15:21:45","modified_gmt":"2026-02-23T20:21:45","slug":"a-fresh-look-at-data-preparation","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/","title":{"rendered":"A Fresh Look at Data Preparation"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Before data can be analyzed successfully, it must be prepared successfully. Poor-quality data produces poor results. Worse yet is data that takes too long to collect and clean because it is too big or too foreign. Raw data is usually unfit even for the imagination, much less for making decisions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Traditional BI architects and big data scientists know that the solution to this problem lies in good data preparation. Unfortunately, this is a process that rarely comes to mind when people think about data. It is neither seductive nor the glamorous side of information today. But it is a necessary step towards the insights people expect to glean from their data. 
Indeed, before data can become useful, it has to be culled, churned, and cleansed, so that the glamorous algorithmic and display magic <\/span><i><span style=\"font-weight: 400;\">can <\/span><\/i><span style=\"font-weight: 400;\">happen.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-Benefits-of-Prepared-Data.jpeg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10449\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-Benefits-of-Prepared-Data.jpeg\" alt=\"Benefits of Prepared Data\" width=\"600\" height=\"348\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-Benefits-of-Prepared-Data.jpeg 800w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-Benefits-of-Prepared-Data-300x174.jpeg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/1-Benefits-of-Prepared-Data-768x445.jpeg 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p><b><br \/>\nWhat Exactly Is Data Preparation?<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Think of an organization\u2019s accumulated data as a landfill. Data gets thrown in from every possible place. People pay little attention to it, because it\u2019s easy to send data somewhere else and forget about it. As a result, you wind up with little more than a dump of information. Not only is most of it useless in its current state, but it also takes up space, possibly in what\u2019s now called a <\/span><a href=\"http:\/\/www.iri.com\/blog\/business-intelligence\/the-use-of-data-lakes\/\"><span style=\"font-weight: 400;\">data lake<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Worse still, this data often sits in a mix of formats and frequently contains duplicates and errors. It\u2019s an unorganized and unmanageable tangle. 
To call it messy is to put it lightly.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Landfill-pile.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10450\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Landfill-pile.jpg\" alt=\"Landfill Pile\" width=\"600\" height=\"400\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Landfill-pile.jpg 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Landfill-pile-300x200.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/2-Landfill-pile-768x512.jpg 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><br \/>\n<\/a><span style=\"font-weight: 400;\">Data preparation solves these problems. It collects data and cleans it, removing duplicates and errors. It can recognize that \u201cWilliam Bartram,\u201d \u201cBill Bartram,\u201d and \u201cBill Bertram\u201d refer to the same person, and unify those records. Unstructured data, like phone calls or email, passes through the sieve of preparation and comes out in a database-friendly format. More importantly, the data now starts to become valuable.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Data preparation has gone by a handful of names. IRI first used Rick Sherman\u2019s term for it, \u201c<\/span><a href=\"http:\/\/www.iri.com\/blog\/business-intelligence\/data-franchising\/\"><span style=\"font-weight: 400;\">data franchising<\/span><span style=\"font-weight: 400;\">,<\/span><\/a><span style=\"font-weight: 400;\">\u201d when CoSort was preparing data for BI tools in 2003. 
Now it also appears as data <\/span><i><span style=\"font-weight: 400;\">blending, <\/span><\/i><span style=\"font-weight: 400;\">data <\/span><i><span style=\"font-weight: 400;\">munging<\/span><\/i><span style=\"font-weight: 400;\">, and data <\/span><i><span style=\"font-weight: 400;\">wrangling<\/span><\/i><span style=\"font-weight: 400;\">, among other newer industry buzzwords. They all describe essentially the same thing. Whichever term you prefer, the core preparation activities should include:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">discover (profile, search, extract, classify)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">pivot, slice and dice<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">transform (sort, join, aggregate, pre-calculate)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">cleanse (scrub, unify, validate)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">mask (encrypt, redact, pseudonymize)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">These activities should span the widest array of sources, with as much customization and automation as possible.<\/span><\/p>\n<p><b><br \/>\nThe Value of Data Preparation<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u201cValuable\u201d is a pretty vague word. Gold, for instance, is valuable only because we say it is. But a straw is valuable when you have a toddler with a cup, because you need it. Companies define value in their own ways, usually in terms of bottom-line impact. Data preparation begins the process of extracting whatever informational value they believe their data may yield.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Making data usable is critical to profiting from it. Thorough data preparation ensures valid, high-quality information. 
It helps maximize the return on marketing campaigns by eliminating duplicate prospects and defining segments more precisely. It unlocks the value of analytics and self-service BI (business intelligence). It even prepares data for sale, as the expanding field of <\/span><a href=\"http:\/\/www.iri.com\/blog\/iri\/business\/infonomics-and-you\/\"><span style=\"font-weight: 400;\">Infonomics<\/span><\/a><span style=\"font-weight: 400;\"> attests. Selling data alone earns US supermarket chain The Kroger Co. $100 million in incremental revenue each year.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Data preparation combines several business-critical objectives. It sources, profiles, integrates, and governs data, and readies it for analytics. By investing in data preparation, you invest in your company\u2019s future because, no matter your line of business, data is its future.<\/span><\/p>\n<p><a href=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-Data-Satisfaction.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-10451\" src=\"http:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-Data-Satisfaction.png\" alt=\"Data Satisfaction\" width=\"600\" height=\"239\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-Data-Satisfaction.png 853w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-Data-Satisfaction-300x120.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/3-Data-Satisfaction-768x306.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">Source: <\/span><i><span style=\"font-weight: 400;\">Improving Data Preparation for Business Analytics<\/span><\/i><span style=\"font-weight: 400;\">, TDWI Q3 2016<\/span><\/p>\n<p><b><br \/>\nWhat&#8217;s Holding It Back<br \/>\n<\/b><\/p>\n<p><span style=\"font-weight: 400;\">At this point, data preparation sounds pretty good. 
It\u2019s something every serious, data-driven business needs for competitive advantage. And the faster you can consistently and successfully prepare data, the better positioned your company will be to benefit from it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Many businesses nevertheless struggle to implement data preparation procedures. They fail to anticipate the volume and variety of data they will face, cannot absorb the cost and complexity of the tools designed to prepare it, or do not see sufficient ROI in the process.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In its Q3 2016 report, <\/span><i><span style=\"font-weight: 400;\">Improving Data Preparation for Business Analytics<\/span><\/i><span style=\"font-weight: 400;\">, TDWI warned that \u201cinsufficient budget is the most common barrier to improving how data is prepared for users\u2019 BI and analytics projects.\u201d The second most common barrier is \u201cnot having a strong enough business case.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This suggests, among other things, that many businesses undervalue their data. Their executives do not recognize the potential of the data they have. That makes it harder for business users to justify and budget for data preparation solutions, and expensive solutions make the task more arduous still. What would help is data preparation software that fits the analytic project budget rather than overwhelming it. That way, a clean business case can be made along with the technical one.<\/span><\/p>\n<p><b><br \/>\nThe Technology at Hand<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The most common approach to custom data preparation may be none at all. 
MOLAP, ROLAP, or HOLAP <\/span><a href=\"http:\/\/www.1keydata.com\/datawarehousing\/molap-rolap.html\"><span style=\"font-weight: 400;\">cubes<\/span><\/a><span style=\"font-weight: 400;\"> provide immediate \u201cslice and dice\u201d and calculation-based analyses of relational data, but they are limited in source scope and performance, and they lack governance. Otherwise, raw or virtual tables at rest, or data federated through a <\/span><a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/voracity-and-the-logical-data-warehouse-ldw\/\"><span style=\"font-weight: 400;\">logical data warehouse<\/span><\/a><span style=\"font-weight: 400;\">, are often fed directly into analytic processes or platforms.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Among the tools designed to address the issue in a database-agnostic way are <\/span><a href=\"http:\/\/www.iri.com\/products\/cosort\"><span style=\"font-weight: 400;\">IRI CoSort<\/span><\/a><span style=\"font-weight: 400;\"> and the larger <\/span><a href=\"http:\/\/www.iri.com\/products\/voracity\"><span style=\"font-weight: 400;\">IRI Voracity<\/span><\/a><span style=\"font-weight: 400;\"> data management platform, which uses CoSort, or MR2, Spark, Spark Streaming, Storm, or Tez in Hadoop, to prepare data. 
They support <\/span><a href=\"http:\/\/www.iri.com\/products\/workbench\/data-sources\"><span style=\"font-weight: 400;\">multi-source<\/span><\/a><span style=\"font-weight: 400;\"> data discovery, integration, migration, governance, and analytics in both preparatory and presentation frameworks that are typically cheaper, easier to configure, and faster than specialty data preparation and legacy ETL tools.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"http:\/\/www.iri.com\/products\/workbench\/voracity-gui\/display\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-11940\" src=\"http:\/\/www.iri.com\/devblog\/wp-content\/uploads\/2016\/09\/preparation_v.png\" alt=\"voracity total data management for data wrangling\" width=\"400\" height=\"176\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/preparation_v.png 1000w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/preparation_v-300x132.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/preparation_v-768x337.png 768w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">The benchmarks linked from the tabs in <\/span><a href=\"http:\/\/www.iri.com\/solutions\/business-intelligence\/bi-tool-acceleration\"><span style=\"font-weight: 400;\">this section<\/span><\/a><span style=\"font-weight: 400;\"> show a 2-20X improvement in time-to-visualization when IRI data preparation software runs ahead of legacy BI tools and newer analytic platforms, which can otherwise choke on big data. Beyond these performance gains are the established efficiencies of reusable data. 
Central data stores avoid the handling overhead and synchronization problems that come from integrating data anew for every report.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For more real-time analyses, API-level tie-ups with <\/span><a href=\"http:\/\/www.iri.com\/blog\/business-intelligence\/iri-data-source-birt\/\"><span style=\"font-weight: 400;\">BIRT<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"http:\/\/www.iri.com\/blog\/business-intelligence\/linear-regression-a-predictive-tool-in-iri-voracity\/\"><span style=\"font-weight: 400;\">Boost<\/span><\/a><span style=\"font-weight: 400;\"> functions in Eclipse combine data preparation with more advanced, open source presentation. The Voracity add-on for <\/span><a href=\"http:\/\/www.iri.com\/blog\/data-transformation2\/iri-voracity-add-on-for-splunk\/\"><span style=\"font-weight: 400;\">Splunk<\/span><\/a><span style=\"font-weight: 400;\"> indexes cloud collections while preparing raw data locally or remotely.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Newer products in the space that prepare data for analytics, such as Alteryx, Paxata, and Trifacta, are also worth a look for their features and their cloud deployment. However, they lack IRI&#8217;s many more years of experience in legacy and big data transformation, data cleansing, and data masking. Nor are they considered a <a href=\"https:\/\/www.iri.com\/solutions\/data-integration\/implement\/analytics\">Production Analytic Platform<\/a>, which Voracity has become &#8230;<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>To analyze data successfully, it must first be prepared\u00a0successfully. Poor quality data creates poor results. Worse yet is data that takes too long to collect and clean because it is too big or too foreign. Raw data is usually unfit even for the imagination, much less making decisions. 
Traditional BI architects and big data scientists<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\" title=\"A Fresh Look at Data Preparation\">Read More<\/a><\/div>\n","protected":false},"author":3,"featured_media":11618,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[108,32,776,34,2255],"tags":[273,879,52,1164,57,1032,14,1162,359,366,1383,1161,5,101,1163,100,546,789],"class_list":["post-10448","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data-2","category-business-intelligence","category-etl","category-business","category-archived-articles","tag-bi","tag-big-data-analytics","tag-business-intelligence-2","tag-data-blending","tag-data-franchising","tag-data-lake","tag-data-masking","tag-data-munging","tag-data-preparation","tag-data-quality-2","tag-data-science","tag-data-scientist","tag-data-transformation","tag-data-warehouse","tag-data-wrangling","tag-etl","tag-iri-cosort","tag-iri-voracity"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A Fresh Look at Data Preparation - IRI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Fresh Look at Data Preparation\" \/>\n<meta property=\"og:description\" content=\"To analyze data 
successfully, it must first be prepared\u00a0successfully. Poor quality data creates poor results. Worse yet is data that takes too long to collect and clean because it is too big or too foreign. Raw data is usually unfit even for the imagination, much less making decisions. Traditional BI architects and big data scientistsRead More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2016-09-07T18:22:40+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-23T20:21:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"870\" \/>\n\t<meta property=\"og:image:height\" content=\"435\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"David Friedland\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"David Friedland\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\"},\"author\":{\"name\":\"David Friedland\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/cdb89f0c0a9c88810b8516d4b140734a\"},\"headline\":\"A Fresh Look at Data Preparation\",\"datePublished\":\"2016-09-07T18:22:40+00:00\",\"dateModified\":\"2026-02-23T20:21:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\"},\"wordCount\":1166,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg\",\"keywords\":[\"BI\",\"big data analytics\",\"business intelligence\",\"data blending\",\"data franchising\",\"data lake\",\"data masking\",\"data munging\",\"data preparation\",\"data quality\",\"data science\",\"data scientist\",\"data transformation\",\"data warehouse\",\"data wrangling\",\"ETL\",\"IRI CoSort\",\"IRI Voracity\"],\"articleSection\":[\"Big Data\",\"Business Intelligence (BI&#041;\",\"ETL\",\"IRI Business\",\"Archived 
Articles\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\",\"url\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\",\"name\":\"A Fresh Look at Data Preparation - IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg\",\"datePublished\":\"2016-09-07T18:22:40+00:00\",\"dateModified\":\"2026-02-23T20:21:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg\",\"width\":870,\"height\":435},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"
position\":2,\"name\":\"A Fresh Look at Data Preparation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/cdb89f0c0a9c88810b8516d4b140734a\",\"name\":\"David Friedland\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/995ea08bc7d036da625671cb48a636eb?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/995ea08bc7d036da625671cb48a636eb?s=96&d=blank&r=g\",\"caption\":\"David Friedland\"},\"url\":\"https:\/\/www.iri.com\/blog\/author\/davidf\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"A Fresh Look at Data Preparation - IRI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/","og_locale":"en_US","og_type":"article","og_title":"A Fresh Look at Data Preparation","og_description":"To analyze data successfully, it must first be prepared\u00a0successfully. Poor quality data creates poor results. Worse yet is data that takes too long to collect and clean because it is too big or too foreign. Raw data is usually unfit even for the imagination, much less making decisions. Traditional BI architects and big data scientistsRead More","og_url":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/","og_site_name":"IRI","article_published_time":"2016-09-07T18:22:40+00:00","article_modified_time":"2026-02-23T20:21:45+00:00","og_image":[{"width":870,"height":435,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","type":"image\/jpeg"}],"author":"David Friedland","twitter_card":"summary_large_image","twitter_misc":{"Written by":"David Friedland","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/"},"author":{"name":"David Friedland","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/cdb89f0c0a9c88810b8516d4b140734a"},"headline":"A Fresh Look at Data Preparation","datePublished":"2016-09-07T18:22:40+00:00","dateModified":"2026-02-23T20:21:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/"},"wordCount":1166,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","keywords":["BI","big data analytics","business intelligence","data blending","data franchising","data lake","data masking","data munging","data preparation","data quality","data science","data scientist","data transformation","data warehouse","data wrangling","ETL","IRI CoSort","IRI Voracity"],"articleSection":["Big Data","Business Intelligence (BI&#041;","ETL","IRI Business","Archived Articles"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/","url":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/","name":"A Fresh Look at Data Preparation - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","datePublished":"2016-09-07T18:22:40+00:00","dateModified":"2026-02-23T20:21:45+00:00","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","width":870,"height":435},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/business-intelligence\/a-fresh-look-at-data-preparation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"A Fresh Look at Data Preparation"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/cdb89f0c0a9c88810b8516d4b140734a","name":"David Friedland","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/995ea08bc7d036da625671cb48a636eb?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/995ea08bc7d036da625671cb48a636eb?s=96&d=blank&r=g","caption":"David 
Friedland"},"url":"https:\/\/www.iri.com\/blog\/author\/davidf\/"}]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2016\/09\/data-preparation.jpg","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10448"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=10448"}],"version-history":[{"count":12,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10448\/revisions"}],"predecessor-version":[{"id":12554,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/10448\/revisions\/12554"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/11618"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=10448"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=10448"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=10448"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}