Challenges
Structured
Structured text files are fixed- or variable-length sequential (flat) files. They can be as small as one record or contain billions of archived rows from database extracts, web logs, transaction feeds, mainframe datasets, and other applications.
You may need to:
- Sort a huge text file
- Extract data or create a report from a text file
- Convert between text and XML file formats
- Convert a text file to another format
- Encrypt or de-identify fields in a text file
- Load text data to a spreadsheet or database
- Reformat a text file from legacy or binary data
You may need to perform several of these functions at the same time, across many massive source and target files.
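To illustrate why sorting a huge text file is a challenge in its own right, here is a minimal sketch of a generic external merge sort in Python. It is not an IRI tool or API. The function names `external_sort` and `_write_run`, the `chunk_rows` parameter, and the assumptions that the input is comma-delimited, has no header row, and is sorted as text on a single column are all hypothetical choices made for illustration.

```python
import csv
import heapq
import os
import tempfile


def _write_run(rows, key_index):
    """Sort one in-memory chunk and write it to a temporary run file."""
    rows.sort(key=lambda r: r[key_index])
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def external_sort(src_path, dst_path, key_index, chunk_rows=100_000):
    """Sort a delimited file by one column (string order) in bounded memory."""
    run_paths = []
    # Phase 1: split the input into sorted runs that each fit in memory.
    with open(src_path, newline="") as src:
        chunk = []
        for row in csv.reader(src):
            chunk.append(row)
            if len(chunk) >= chunk_rows:
                run_paths.append(_write_run(chunk, key_index))
                chunk = []
        if chunk:
            run_paths.append(_write_run(chunk, key_index))

    # Phase 2: stream a k-way merge of the sorted runs into the target.
    run_files = [open(p, newline="") for p in run_paths]
    try:
        readers = [csv.reader(f) for f in run_files]
        with open(dst_path, "w", newline="") as dst:
            merged = heapq.merge(*readers, key=lambda r: r[key_index])
            csv.writer(dst).writerows(merged)
    finally:
        for f in run_files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

A call such as `external_sort("in.csv", "out.csv", key_index=0)` would sort on the first column. Even this simple version has to manage temporary files, merge passes, and key comparison, which is part of why sorting is often combined with the other tasks in a single pass.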
Unstructured
Unstructured text sources, including files and repositories in these formats:
- ASN.1 TAP3
- .DOC, .DOCX
- .EML, .OST, .PST
- .PDF, .RTF
- .PPT, .PPTX
- .TXT, .XML
- .XLS, .XLSX
can be converted, but the data within them cannot be readily extracted or used in the ways structured data can.
Solutions
IRI delivers data conversion and related manipulation functionality for text files in several products. Choose based on need:
Use IRI NextForm to convert structured text files to other formats (e.g., CSV, ODBC, or XML), or from other formats to text. NextForm supports data type conversion at the field level and record layout remapping. The NextForm 'Unstructured data' edition can parse and structure data in unstructured text files for the operations described on this page and throughout the IRI product stack.
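As a rough illustration of what field-level type conversion and record layout remapping mean, the following Python sketch reads fixed-width records and writes them as CSV with the fields reordered. It does not use NextForm or its metadata syntax. The `LAYOUT` table, the `OUTPUT_ORDER` list, the `remap` function, and the three example fields are all hypothetical.

```python
import csv
from datetime import datetime

# Hypothetical source layout: (field name, start offset, width, converter).
LAYOUT = [
    ("id", 0, 6, int),                      # zero-padded digits -> integer
    ("name", 6, 10, str.strip),             # space-padded text -> trimmed text
    ("joined", 16, 8,                       # YYYYMMDD -> ISO 8601 date
     lambda s: datetime.strptime(s, "%Y%m%d").date().isoformat()),
]

# Hypothetical target layout: same fields, different order.
OUTPUT_ORDER = ["name", "id", "joined"]


def remap(lines, out):
    """Convert fixed-width records to CSV, converting and reordering fields."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(OUTPUT_ORDER)
    for line in lines:
        record = {
            name: convert(line[start:start + width])
            for name, start, width, convert in LAYOUT
        }
        writer.writerow([record[name] for name in OUTPUT_ORDER])
```

Keeping the layout in a data structure separate from the conversion loop mirrors, in spirit, the idea of reusable file definitions: the same description of the source record could drive different targets.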
NextForm file definitions also work in SortCL programs under IRI CoSort. Re-use the metadata if you upgrade to CoSort for fast data transformation and reporting.
The SortCL program in CoSort can:
- transform the data (e.g., sort, join, aggregate, or cross-calculate) in text files
- convert text files to other file formats and create text files from those formats
- report from text file sources
using a simple 4GL for layout and manipulation definitions, or a powerful free GUI built on Eclipse.
Map one or more input files in text format to and from other file formats. Create detail, summary, or delta (change data capture and slowly changing dimension) reports from text file sources. Hand off pre-sorted, filtered, and converted subsets to BI tools, database load utilities, or other applications.
Use IRI FieldShield to protect fields in structured text files with encryption, masking, etc.
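For readers unfamiliar with field-level protection, this is a minimal Python sketch of one generic de-identification technique: replacing a column's values with keyed-hash (HMAC-SHA256) pseudonyms. It is not FieldShield and does not represent its functions. The names `pseudonymize` and `mask_column`, the 16-character token length, and the assumption that the input is a CSV file with a header row are all hypothetical.

```python
import csv
import hashlib
import hmac


def pseudonymize(value, secret):
    """Return a deterministic 16-hex-character token for a field value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_column(src_path, dst_path, column, secret):
    """Copy a CSV file (with header), replacing one column with pseudonyms."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Only the named column changes; all other fields pass through.
            row[column] = pseudonymize(row[column], secret)
            writer.writerow(row)
```

Because the token is deterministic for a given secret, equal input values map to equal tokens, so joins on the masked column still line up. The original values cannot be recovered from the tokens, which makes this a masking approach and not reversible encryption.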
Use IRI RowGen if you need test data in text file formats. RowGen uses the same layout metadata as CoSort (SortCL) and NextForm so you can easily move between test data generation and real data transformation.