CoSort Rapidly Stages and Integrates Massive Files
Challenges:
Bringing disparate data into a central place from diverse systems can be a daunting technical and performance feat. Sophisticated modeling and ETL tools are adept at connecting to, identifying, and mapping between proprietary DB sources and targets.
Unfortunately, performance often suffers when data volumes are large. Most tools cannot rapidly extract, transform, or load data sourced from:
• tables with more than 10 million rows
• legacy index files and data sets
• multi-gigabyte XML, LDIF and other file formats
Solutions:
The CoSort product's SortCL tool can dramatically speed data integration through single-pass, flat-file processing. IRI and others recommend the use of sequential files for data integration and staging for their relative portability, simplicity, and efficiency.
The CoSort-compatible Fast Extract (FACT) tool for Oracle can help by unloading large tables into flat files, or by piping Oracle data directly into SortCL jobs for integration with other files.
SortCL can merge and join multiple files simultaneously to create a single source of truth. It refers to data by field name, so integration jobs can use those names as symbolic references to, for example:
• map sources to targets (reformat and convert)
• filter (select and de-duplicate)
• manipulate (sort and join keys, lookup, etc.)
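As a rough illustration of the single-pass idea (not SortCL syntax, and not how SortCL is implemented), the Python sketch below merges two pre-sorted delimited files by a named key field, drops duplicate keys, and maps each row to a target layout. The function name `merge_sources`, the field names, and the sample data are all hypothetical.

```python
import csv
import heapq
import io
from operator import itemgetter


def merge_sources(sources, key_field, target_fields):
    """Merge pre-sorted delimited sources in one pass.

    Rows are addressed by field name. Duplicate keys are dropped
    (the first source listed wins), and each output row keeps only
    the target fields.
    """
    readers = [csv.DictReader(src) for src in sources]
    # heapq.merge streams the inputs; it assumes each one is already
    # sorted on key_field and never loads a whole file into memory.
    merged = heapq.merge(*readers, key=itemgetter(key_field))
    last_key = object()  # sentinel that never equals a real key
    for row in merged:
        if row[key_field] == last_key:
            continue  # filter: de-duplicate on the key
        last_key = row[key_field]
        # map: reformat the source row to the target layout
        yield {name: row.get(name, "") for name in target_fields}


# Hypothetical sample inputs standing in for two flat files.
crm = io.StringIO("id,name,region\n001,Ada,EU\n003,Grace,US\n")
billing = io.StringIO("id,name,balance\n002,Linus,40\n003,Grace,75\n")

rows = list(merge_sources([crm, billing], "id", ["id", "name"]))
for row in rows:
    print(row)
```

Because the inputs are consumed as streams, the same pattern works on open file handles or pipes, provided each input is already ordered on the key.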
In addition to integrating disparate data sources, SortCL can:
• protect sensitive fields (encrypt, de-ID, mask)
• derive and present information for reports
• create output file subsets for BI tools
• perform multiple transformations
• pre-sort on index keys to speed bulk DB loads
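Two of these tasks, masking a sensitive field and pre-sorting on an index key ahead of a bulk load, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, field names, and sample data are hypothetical, the masking rule (keep the last four characters) is one simple choice among many, and the in-memory sort would need to be replaced by an external sort for truly large files.

```python
import csv
import io


def mask_value(value, keep=4):
    """Replace all but the last `keep` characters with asterisks."""
    hidden = max(len(value) - keep, 0)
    return "*" * hidden + value[hidden:]


def mask_and_presort(src, key_field, sensitive_field):
    """Mask one field, then order rows on the load table's index key."""
    rows = list(csv.DictReader(src))  # assumption: the file fits in memory
    for row in rows:
        row[sensitive_field] = mask_value(row[sensitive_field])
    # Rows that arrive in index order generally let a bulk loader
    # build or maintain the index with less random I/O.
    rows.sort(key=lambda r: r[key_field])
    return rows


# Hypothetical sample input standing in for a staged flat file.
staged = io.StringIO(
    "acct,ssn\n"
    "0003,123-45-6789\n"
    "0001,987-65-4321\n"
    "0002,555-00-1111\n"
)

load_ready = mask_and_presort(staged, "acct", "ssn")
for row in load_ready:
    print(row["acct"], row["ssn"])
```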