Data Franchising

by David Friedland

Data franchising refers to the staging or packaging of large data sets into meaningful data chunks that business users can digest and use for decision-making, particularly through advanced business intelligence (BI) software.  Additional terms for the preparation of data for BI include data wrangling and data munging.

To improve the usability and performance of BI tools, the SortCL program in IRI CoSort rapidly prepares CSV and XML “feed” files (or ODBC inserts) for their use.

[Figure: DIF architecture with processes, showing data franchising]

SortCL takes very large inputs (mainframe data sets, database tables, web logs, and other files) and performs one or more data integration and staging functions simultaneously, producing one or more outputs. These functions include:

  • select/filter
  • sort/merge
  • aggregate/calculate
  • match/join
  • cleanse/enrich
  • encrypt/mask
  • convert/reformat
  • pivot/unpivot
  • substring/custom
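
To make a few of the functions listed above concrete, here is a minimal sketch in plain Python of filtering, aggregating, and sorting raw records into a small CSV feed. It is not SortCL syntax and does not reflect how SortCL is implemented; the field names (`region`, `amount`), the `franchise` and `to_csv` helpers, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict

def franchise(rows, min_amount):
    """Filter, aggregate, and sort raw rows into a small summary."""
    totals = defaultdict(float)
    for row in rows:
        amount = float(row["amount"])
        if amount >= min_amount:              # select/filter
            totals[row["region"]] += amount   # aggregate/calculate
    return sorted(totals.items())             # sort by region name

def to_csv(summary):
    """Render the summary as a CSV feed for a BI tool to load."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["region", "total_amount"])
    for region, total in summary:
        writer.writerow([region, f"{total:.2f}"])
    return buf.getvalue()

# Hypothetical raw input; a real job would read a much larger file.
RAW = """region,amount
west,120.50
east,15.00
west,30.25
east,200.00
north,5.00
"""

summary = franchise(csv.DictReader(io.StringIO(RAW)), min_amount=10.0)
feed = to_csv(summary)
print(feed)
```

The point of the sketch is the shape of the result: five detail rows go in, and two aggregated, sorted rows come out for the BI layer to consume.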

By integrating large volumes of sequential data in the file system, SortCL takes the overhead of data transformation out of the BI layer. By combining and multi-threading the big data manipulations, SortCL also saves job design, computation, and I/O cycle time. The savings carry over to the BI front-end, since queries and displays respond faster with smaller inputs.
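
The I/O saving from combining manipulations can be illustrated with a rough, single-threaded sketch in plain Python: the input is read once, and two targets (a filtered CSV detail feed and an XML summary) are built in the same pass. This only illustrates the idea and is not SortCL's mechanism or syntax; the `one_pass` function, the field names, and the sample rows are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def one_pass(rows, threshold):
    """Read the input once and build two targets in the same loop."""
    detail = io.StringIO()
    writer = csv.writer(detail, lineterminator="\n")
    writer.writerow(["id", "amount"])
    count, total = 0, 0.0
    for row in rows:
        amount = float(row["amount"])
        count += 1                      # summary covers every record
        total += amount
        if amount >= threshold:         # detail feed keeps only large ones
            writer.writerow([row["id"], f"{amount:.2f}"])
    root = ET.Element("summary")
    ET.SubElement(root, "records").text = str(count)
    ET.SubElement(root, "total").text = f"{total:.2f}"
    return detail.getvalue(), ET.tostring(root, encoding="unicode")

# Hypothetical input rows standing in for a large extract file.
ROWS = [
    {"id": "a1", "amount": "50.00"},
    {"id": "a2", "amount": "7.50"},
    {"id": "a3", "amount": "12.25"},
]

detail_csv, summary_xml = one_pass(ROWS, threshold=10.0)
print(detail_csv)
print(summary_xml)
```

Running the filter and the summary as two separate jobs would read the source twice; folding them into one loop reads it once, which is the kind of saving the paragraph above describes.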

CoSort’s SortCL is routinely used for data franchising into BI platforms like Business Objects, Cognos, MicroStrategy, Splunk, Tableau, and Excel spreadmarts. IRI also partners directly with best-of-breed dashboard vendors like Dimensional Insight and IVIZ Group, as SortCL can pare huge, disparate legacy and database extract files into aggregated, sorted, and filtered CSV and XML subsets designed to populate their Diver and iDashboard platforms, respectively. SortCL can also prepare data for SOA, web services, data modeling, security, and advanced statistical applications like R, SAS, and SPSS.

Once the prepared data has been ingested into the BI platform, users can continue to run a variety of custom queries, modifications, and dynamic reports to visualize, and interact with, data at multiple levels of granularity, and cycle their data through additional query and display processes.

Because the IRI Workbench IDE supporting CoSort runs on Eclipse, BIRT users can consume SortCL data targets directly, and produce custom reports in the same environment. They can even specify an IRI Data Source through ODA to combine data integration in the same runtime operation with the report display (i.e., simultaneous data preparation and presentation).

Finally, SortCL itself also includes standard reporting functionality. This means you can run detail, summary, and delta reports (usually in batch processes) and franchise data for more sophisticated BI tools at the same time.
