Data transformation, re-formatting, and reporting are often performed in slow, separate steps: e.g., sort, then join or aggregate, then stop in a flat file that gets handed off to data scrubbing process. Then it populates a data mart, BI tool, etc.
Alternatively, pre-process filter and format steps are performed prior to large transforms. Sometimes complex languages like Perl or Python are used to recast data.
Either way, multiple passes through the data are necessary and, sometimes, multiple platforms.
The SortCL data transformation program in IRI CoSort performs data mappings using field names as symbolic references on output. This allows you to reformat and report on data in the same job (and I/O) with ETL operations.
Specifically, as SortCL maps fixed- or variable-position fields from input to output, it can re-map (i.e., re-position, re-size, align, trim, pad), and type-convert. Additional custom layout options include changing fixed-position layouts to variable (floating) and vice versa.
Parse, strip, or rewrite header records on output. Insert special formatting characters and environment variables, including markup language commands for web-ready reports.
Perform mathematical expressions (cross-calculation) between field data, or on joined and/or aggregated values, to derive and output new detail or summary field values. Create as many simultaneous output targets and formats as you need in the same job script, and produce them in the same pass.
Reformat files from one type to another. Come in with Micro Focus ISAM and leave with CSV, or go from a huge flat file to XML, and vice versa.
For database indexing, append a "sequencer" integer field to each sorted record with the starting value and increment. This allows the database to cross-reference records over a number of fields.
Populate targets directly through ODBC, pipes, or procedures, or provide them with the flat files they need for loading or further integration.