How do you currently manipulate VLDB table and flat-file data? ETL tools, Perl scripts, custom programs, and PL/SQL procedures can be expensive and hard to maintain. Complex GUIs and coding syntax make specification difficult, and runtime performance may be lacking.
Can you leverage multiple CPUs and cores for big jobs, run several tasks in the same I/O, and dynamically allocate resources to optimize performance? Are your data and job definitions easy enough for a non-expert to safely modify?
The SortCL program in IRI's CoSort package does the heavy lifting of data transformation in the world's largest data warehouses, operational data stores and clickstream data webhouses. And it does it with speed and simplicity that no other tool or method can match.
Optimize ETL operations as you combine sorts, joins, and aggregates in a single job script, partition, and I/O pass. At the same time, de-duplicate and filter, convert and remap, look up, rank, pivot and unpivot, calculate, encrypt to protect, and mask to re-cast. Create custom output reports and hand off pre-processed subsets in CSV and XML format for data marts and BI tools.
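To make the idea of combining steps concrete, here is a minimal sketch in plain Python. It is not SortCL job-script syntax, and the field names (`region`, `order_id`, `amount`) are hypothetical. It only illustrates filtering, de-duplicating, sorting, and aggregating the same records in one pipeline rather than in separate jobs.

```python
from itertools import groupby
from operator import itemgetter


def transform(rows):
    """Filter, de-duplicate, sort, and aggregate rows in one pipeline.

    Each row is a dict with the hypothetical keys
    'region', 'order_id', and 'amount'.
    """
    # Filter: keep only rows with a positive amount.
    kept = (r for r in rows if r["amount"] > 0)
    # De-duplicate: identical records collapse to a single tuple in the set.
    unique = {(r["region"], r["order_id"], r["amount"]) for r in kept}
    # Sort: order by region first, then order_id, then amount.
    ordered = sorted(unique)
    # Aggregate: total the amounts per region from the sorted stream.
    return [
        (region, sum(amount for _, _, amount in group))
        for region, group in groupby(ordered, key=itemgetter(0))
    ]
```

Because `groupby` only merges adjacent records, the sort has to come before the aggregate. That dependency is why it pays to do both steps over the same sorted stream.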
Transform large volumes of data in many different tables and flat files together. Discover, define, and expose your data and manipulation definitions in simple text file metadata repositories. Use named fields for the mappings, as you:
- Map sources to targets
- Reduce script sizes and creation times
- Facilitate reorg and ETL operations
- Produce load and file-compare metadata
Perform all the same transforms that slower and more complex SQL procedures or ETL tools do, as well as:
- change data capture
- row-column rotation (pivot/unpivot)
- slowly changing dimension reporting
- star (or snowflake) schema targeting
- static, structured, running, and windowed aggregates
- discrete and operative value lookups
- format mass and other value modifications
- data cleansing
- data protection (masking, encryption, pseudonymization, etc.)
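As one example from the list above, change data capture can be pictured as comparing two keyed snapshots of the same table. The sketch below is plain Python under simple assumptions (both snapshots fit in memory as dictionaries keyed by a unique ID). It does not show how SortCL itself performs the operation, and the function name is hypothetical.

```python
def capture_changes(old, new):
    """Return (inserts, updates, deletes) between two keyed snapshots.

    'old' and 'new' map a unique key to the record stored under it.
    """
    # Keys present only in the new snapshot are inserts.
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    # Keys present only in the old snapshot are deletes.
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    # Keys present in both whose records differ are updates.
    updates = {
        k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]
    }
    return inserts, updates, deletes
```

The three result sets can then be written to separate targets, for instance one file per change type.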
Use Existing Metadata
SortCL and related facilities in the CoSort package accept many third-party data layouts, e.g., DB DDL, COBOL copybooks, CSV, LDIF and XML files, CLF and ELF web logs, and SQL*Loader Control File metadata. SortCL job scripts contain SQL-familiar commands that use and/or reference the layouts.
Meta Integration Technology, Inc. (MITI) also has a metadata model bridge (MIMB) spoke to SortCL's data definition file format. If you have file layouts already defined for popular ETL tools like Informatica or DataStage, MIMB can automatically produce the equivalent layouts for use in SortCL. Accelerate those tools without having to manually redefine your metadata.
Interoperate & Accelerate
SortCL transformations work hand-in-hand with data extraction and loading utilities. SortCL can take piped data from IRI's Fast Extract (FACT) tool, and pipe it pre-sorted into database load utilities like SQL*Loader. SortCL can also connect through ODBC to other databases and Excel to acquire and deliver data.
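The extract, sort, and load hand-off described above follows the familiar pipe-stage pattern: read records from an upstream process, order them, and write them downstream. Here is a rough Python sketch of such a stage. It stands in for the general pattern only; the function names, the comma delimiter, and the choice of key column are assumptions, and none of this is FACT, SortCL, or SQL*Loader syntax.

```python
import sys


def presort(lines, key_index=0, delimiter=","):
    """Yield delimited text lines ordered by one column.

    Keys are compared as strings, so numeric columns would need
    their own conversion before sorting.
    """
    # Strip trailing newlines and skip blank lines.
    records = [line.rstrip("\n") for line in lines if line.strip()]
    # Order the records on the chosen column.
    records.sort(key=lambda rec: rec.split(delimiter)[key_index])
    for rec in records:
        yield rec + "\n"


def run_stage(src, dst, key_index=0, delimiter=","):
    """Read lines from 'src', write them pre-sorted to 'dst'."""
    dst.writelines(presort(src, key_index, delimiter))
```

Calling `run_stage(sys.stdin, sys.stdout)` would let the stage sit between an extract command and a load command in a shell pipeline.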
SortCL transforms can run alongside ETL tools like Informatica and DataStage, to optimize their performance. SortCL jobs run on the command line, in batch scripts, from 3GL programs, via API calls, or in the IRI Workbench GUI, built on Eclipse. Easily embed these transforms to accelerate your applications.
SortCL exploits CoSort's granular performance tuning and flexible CPU licensing. IRI's continuing innovation in parallel data movement, I/O and memory management, and data manipulation functionality and consolidation, along with our meaningful industry partnerships, keeps you at the leading edge of big data transformation.