Data comes in diverse forms from diverse sources, including an explosion in machine-generated data. Semi-structured formats with flexible schemas, such as JSON and XML, are now standard for sending and storing data. Conventional DBs and DWs built on fixed schemas, however, cannot readily store or process such data. It must therefore either be stored and carried around in its raw form (compromising performance) or transformed before it is loaded (losing information while adding complexity).
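To make the "losing information" half of that trade-off concrete, here is a minimal Python sketch. It is an illustration only: the column names, the sample record, and both helper functions are hypothetical, and it does not represent any particular database or IRI product.

```python
import json

# Hypothetical fixed schema: the only columns the target table accepts.
FIXED_COLUMNS = ("id", "name", "email")


def flatten_to_fixed_schema(record: dict) -> tuple:
    """Project a JSON record onto the fixed columns; everything else is dropped."""
    return tuple(record.get(column) for column in FIXED_COLUMNS)


def dropped_fields(record: dict) -> set:
    """Return the top-level keys that have no place in the fixed schema."""
    return set(record) - set(FIXED_COLUMNS)


# Made-up semi-structured record with nested and optional fields.
raw = (
    '{"id": 7, "name": "Ada", "email": "ada@example.com", '
    '"devices": [{"type": "phone", "os": "android"}], "tags": ["beta"]}'
)
record = json.loads(raw)

row = flatten_to_fixed_schema(record)   # what survives the transform
lost = dropped_fields(record)           # what the transform discarded
```

In this sketch the nested `devices` list and the `tags` array never reach the target row, which is the kind of loss a transform-before-load step can introduce.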
IRI software is designed to process semi-structured data without these trade-offs. The CoSort engine inside Voracity handles semi-structured formats natively, so you can load the data without converting its format or imposing a new external schema. By taking this direct approach, Voracity users can leverage the "unparalleled parallel performance" of the CoSort engine without sacrificing functionality or flexibility. That's why, in addition to big structured data, IRI software can process certain classes of semi-structured data, for example:
Find and organize strings based on patterns or specific values across email repositories, .pdf, .rtf, MS Office (.doc/x, .ppt/x, .xls/x), .txt, .xml, and other files, all at once. Then, in the same GUI, use IRI software on the resulting flat files and their metadata for:
- Data integration and transformation
- Data migration and replication
- Data masking (encryption, de-ID, etc.)
- DB load and query optimization
- Reporting or hand-offs to BI Tools
- Population of CRM, DB, ETL, and external apps
- Forensics on the data via exposed source file attributes
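The general idea of searching files for a pattern and emitting a flat file that carries source-file attributes can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not how IRI software is implemented: the email regex, the function names, and the output columns are all hypothetical, and the sketch reads plain-text files only (PDF, Office, and email formats would need format-specific text extraction first).

```python
import csv
import os
import re
from datetime import datetime, timezone

# Hypothetical pattern: a simple (not RFC-complete) email address matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical flat-file layout: one row per match plus source-file attributes.
FIELDS = ["file", "size_bytes", "modified_utc", "line", "match"]


def scan_files(paths, pattern=EMAIL_PATTERN):
    """Yield one flat row per pattern match found in the given text files."""
    for path in paths:
        stat = os.stat(path)
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                for match in pattern.finditer(line):
                    yield {
                        "file": os.path.basename(path),
                        "size_bytes": stat.st_size,
                        "modified_utc": modified,
                        "line": line_number,
                        "match": match.group(0),
                    }


def write_flat_file(rows, out):
    """Write rows as CSV to an open text stream and return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

A caller might pass a list of file paths to `scan_files` and stream the result into `write_flat_file` with an open CSV file. The resulting rows keep file name, size, and modification time next to each match, so the source attributes remain available to downstream steps.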
IRI and partner software can manipulate and/or mask data in-situ or on-the-fly in unstructured sources like those listed above, plus image files through OCR and other techniques. The total trove of big data -- whether analyzed in batch or in real-time feeds -- is of great interest to business and government service providers.
The Bottom Line
IRI software - and its Voracity total data management platform in particular - is the fastest, easiest, and most affordable way to blend and prepare [package, protect, and provision] structured, semi-structured, and unstructured data sources ... within your existing IT infrastructure.