Semi- and Unstructured Data



Semi-Structured Data

Data comes in diverse forms from diverse sources, including a rapidly growing volume of machine-generated data. Semi-structured formats with flexible schemas, such as JSON and XML, are now standard for sending and storing data. Conventional databases and data warehouses built on fixed schemas, however, cannot readily store or process such data. It must therefore either be stored and carried around in its raw form (compromising performance) or be transformed before it is loaded (losing information while adding complexity).
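The "losing information" half of that trade-off can be illustrated with a minimal Python sketch. It is not IRI code; the sample records, the `FIXED_COLUMNS` schema, and the helper names are all hypothetical, chosen only to show what happens when flexible JSON records are projected onto a fixed set of columns.

```python
import json

# Hypothetical machine-generated records. The second one carries a nested
# field ("tags") that the fixed schema below did not anticipate.
RAW = [
    '{"id": 1, "device": "sensor-a", "reading": 20.5}',
    '{"id": 2, "device": "sensor-b", "reading": 19.0, "tags": {"site": "north"}}',
]

# Assumed fixed table schema, for illustration only.
FIXED_COLUMNS = ("id", "device", "reading")


def to_fixed_row(record: dict) -> tuple:
    """Project a record onto the fixed columns; any other key is dropped."""
    return tuple(record.get(col) for col in FIXED_COLUMNS)


def dropped_fields(record: dict) -> set:
    """Report which top-level keys the fixed schema cannot hold."""
    return set(record) - set(FIXED_COLUMNS)


records = [json.loads(line) for line in RAW]
rows = [to_fixed_row(r) for r in records]
lost = [dropped_fields(r) for r in records]

for row, missing in zip(rows, lost):
    print(row, "dropped:", sorted(missing))
```

Keeping the raw JSON alongside the rows would avoid the loss, but only by carrying the unparsed text around, which is the other half of the trade-off described above.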

IRI software is designed to process semi-structured data without these trade-offs. The CoSort data manipulation engine in the IRI Voracity big data management platform now handles semi-structured formats natively, so you can process this data without converting its format or imposing a new external schema. In this direct mode, Voracity users can leverage the "unparalleled parallel performance" of the CoSort engine without sacrificing functionality or flexibility. That's why, in addition to big structured data, IRI software can process certain classes of static and streaming semi-structured data, for example:

  • ASN.1 call detail record (CDR) files
  • IDMS, IMS, and other legacy sources
  • MF-ISAM and Vision index files
  • MongoDB (BSON), JSON, and XML
  • Other NoSQL, Hive, and cloud / SaaS sources

Unstructured Data

You can now also search, extract, and structure data from unstructured text file sources in the IRI Workbench GUI -- and then work with the flat-file results in that same environment. See below regarding masking the data in situ.
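As a rough illustration of the search-and-extract idea, the Python sketch below scans text for patterns and writes each hit as one row of a delimited flat file. It is not how IRI Workbench is implemented; the two regular expressions, the column layout, and the function names are assumptions made for this example.

```python
import csv
import io
import re

# Assumed search patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def extract_matches(source_name: str, text: str) -> list:
    """Return one (source, line number, pattern label, value) row per match."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                rows.append((source_name, line_no, label, match.group()))
    return rows


def to_flat_file(rows: list) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(("source", "line", "type", "value"))
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical document text standing in for an extracted file body.
sample = "Contact jane.doe@example.com for access.\nSSN on file: 123-45-6789\n"
rows = extract_matches("memo.txt", sample)
flat = to_flat_file(rows)
print(flat)
```

Recording the source name and line number with each value keeps the flat file traceable back to the original document, which matters if the matches are later masked or audited.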

Find and organize strings based on patterns or specific values across email repositories, .pdf, .rtf, MS Office (.doc/x, .ppt/x, .xls/x), .txt, .xml, and other files, all at once. Then, in the same GUI, run IRI software against the resulting flat files and their metadata.

IRI software can also mask data in unstructured sources, including Excel via IRI CellShield EE and text / PDF files via IRI DarkShield. IRI is now working to mask PII in image files as well.
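To make the idea of masking inside free text concrete, here is a minimal Python sketch that redacts a matched value in place while leaving the surrounding text untouched. It does not reflect the CellShield or DarkShield implementations; the SSN pattern, the keep-last-four rule, and the `mask_ssn` name are assumptions for illustration.

```python
import re

# Assumed pattern for a US Social Security number, for illustration only.
SSN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Replace the first five digits of each SSN with X, keeping the last four."""
    return SSN.sub(lambda m: "XXX-XX-" + m.group(3), text)


original = "SSN on file: 123-45-6789. Call the office to update it."
masked = mask_ssn(original)
print(masked)
```

Because the replacement has the same length and layout as the original value, the rest of the document keeps its position and formatting.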

The Bottom Line

The total trove of big data -- whether analyzed in batch or in real-time feeds -- is of great interest to business and government service providers. IRI software -- and its Voracity total data management platform in particular -- is the fastest, easiest, and most affordable way to blend and prepare [package, protect, and provision] structured, semi-structured, and unstructured data sources ... within your existing IT infrastructure.
