Semi-Structured and Unstructured Data Handling

Semi-Structured Data

Data comes in diverse forms from diverse sources, including an explosion in machine-generated data. Semi-structured data formats with flexible schemas, such as JSON and XML, are now standard formats for exchanging and storing data. Conventional DBs and DWs built on fixed schemas, however, cannot readily store or process them. Thus the data must either be stored and carried around in its raw form (compromising performance), or transformed before it is loaded (losing information while adding complexity).
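
To make the second trade-off concrete, here is a minimal Python sketch using only the standard library. It is not IRI code; the sample records, the `FIXED_COLUMNS` list, and the helper names are all hypothetical. It shows how projecting flexible JSON records onto a fixed set of columns drops any field the schema did not anticipate.

```python
import json

# Hypothetical machine-generated records; each has a slightly different shape.
RAW = """
{"device": "sensor-1", "temp_c": 21.5}
{"device": "sensor-2", "temp_c": 19.0, "humidity": 0.41}
{"device": "sensor-3", "status": {"battery": "low"}}
"""

# Hypothetical fixed schema, as a conventional table might define it.
FIXED_COLUMNS = ["device", "temp_c"]


def load_records(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_fixed_row(record, columns):
    """Project a record onto the fixed columns; missing values become None."""
    return tuple(record.get(col) for col in columns)


def dropped_fields(record, columns):
    """Return the top-level keys that the fixed schema cannot hold."""
    return sorted(set(record) - set(columns))


records = load_records(RAW)
rows = [to_fixed_row(r, FIXED_COLUMNS) for r in records]
lost = [dropped_fields(r, FIXED_COLUMNS) for r in records]

for row, gone in zip(rows, lost):
    print(row, "dropped:", gone)
```

In this sketch the second and third records lose their `humidity` and `status` fields, and the third gains an empty `temp_c`, which is the kind of information loss a transform-before-load step can introduce.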

IRI software is designed to process semi-structured data without these trade-offs. The CoSort data manipulation engine in the IRI Voracity big data management platform now handles semi-structured formats natively, so you can process this data without converting its format or imposing a new external schema.

In this direct mode, Voracity users can leverage the "unparalleled parallel performance" of CoSort engine processing without sacrificing functionality or flexibility. That's why, in addition to big structured data, IRI software can process certain classes of static and streaming semi-structured data, for example:

ASN.1 call detail record (CDR) files
IDMS, IMS, and other legacy sources
MF-ISAM and Vision index files
NoSQL DBs including MongoDB (BSON), Cassandra, and Elasticsearch
Excel, JSON, and XML files
Hadoop Hive and cloud / SaaS sources (e.g., AWS S3)
IoT and message queues via MQTT, Kafka, MQSeries, etc.

Unstructured Data

You can now also search, extract, and structure data from unstructured text file sources in the IRI Workbench GUI -- and then do everything with the flat-file results in that environment. That means with Voracity, you get a textual ETL tool as well. In addition, it is possible to find PII in unstructured data files and mask it in place or in new targets with the same file names.
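
As a rough illustration of the general idea of extracting values from free text into a flat-file layout, the following Python sketch uses only the standard library. It does not represent how IRI Workbench works internally; the sample text, the two regular expressions, and the `line,kind,value` output layout are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical free-text input, e.g. the body of a log file or a note.
TEXT = """Ticket opened 2023-04-01 by ann@example.com.
Follow-up on 2023-04-03; cc bob@example.org for review."""

# Illustrative patterns only; real-world matching usually needs stricter rules.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_rows(text):
    """Return (line_number, kind, value) tuples for every pattern match."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                rows.append((line_no, kind, match.group()))
    return rows


def to_csv(rows):
    """Serialize the extracted rows as a flat CSV string with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["line", "kind", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(TEXT)
print(to_csv(rows))
```

Once the matches are in a delimited layout like this, they can be sorted, joined, or loaded like any other structured source.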

Specifically, by running Dark Data search jobs in IRI Workbench, users of the IRI Voracity data management platform or the IRI DarkShield data masking product can simultaneously find, mask/replace/delete, and extract (and then further process) strings based on patterns, explicit or lookup table values, machine-learned NLP models, path filters, or defined bounding-box areas, across: email repositories; NoSQL DBs like Cassandra, Elasticsearch, and MongoDB; .pdf, .rtf, and MS Office (.doc/x, .ppt/x, .xls/x) documents; .txt, .xml, .html, .hl7 / x12, JSON, and other unstructured text and log files -- as well as image and audio files -- all at once.
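
Two of the search methods named above, patterns and lookup values, can be sketched generically in Python. This is a simplified standard-library illustration and not the DarkShield API; the lookup names, the SSN-style pattern, the masking rules, and every function name here are hypothetical.

```python
import re

# Hypothetical lookup values (e.g. known names) and one illustrative pattern.
LOOKUP_VALUES = ["Ann Lee", "Bob Ray"]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_keep_last4(match):
    """Redact all digits except the last four; keep the dashes."""
    value = match.group()
    head, tail = value[:-4], value[-4:]
    return re.sub(r"\d", "*", head) + tail


def build_lookup_pattern(values):
    """Compile one alternation that matches any lookup value literally."""
    escaped = sorted((re.escape(v) for v in values), key=len, reverse=True)
    return re.compile("|".join(escaped))


def find_and_mask(text, lookup_values):
    """Return the masked text and a list of (kind, original value) findings."""
    findings = []

    def mask_ssn(match):
        findings.append(("ssn", match.group()))
        return mask_keep_last4(match)

    def mask_name(match):
        findings.append(("name", match.group()))
        return "[NAME]"

    masked = SSN_PATTERN.sub(mask_ssn, text)
    if lookup_values:  # an empty alternation would match everywhere
        masked = build_lookup_pattern(lookup_values).sub(mask_name, masked)
    return masked, findings


masked, findings = find_and_mask(
    "Ann Lee (SSN 123-45-6789) emailed Bob Ray.", LOOKUP_VALUES
)
print(masked)
print(findings)
```

The sketch returns the findings alongside the masked text so that the same pass can feed both a masked target and an extract for later review.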

And from that same Eclipse GUI, IRI software users can operate on the flat-file extracts and their metadata for:

Data integration and transformation, including textual ETL
Data migration and replication
Data masking (encryption, de-ID, redaction, etc.)*
DB load and query optimization
Reporting or wrangled hand-offs to Datadog, KNIME, Splunk and other BI/analytic platforms
Population of CRM, DB, ETL, and external apps
Forensic auditing on the searched or masked data via value extracts and/or source file attributes (for example, as analyzed in Splunk)

The Bottom Line

The total trove of big data -- whether analyzed in batch or in real-time feeds -- is of great interest to business and government service providers. IRI software -- and its Voracity total data management platform in particular -- is the fastest, easiest, and most affordable way to blend and prepare (package, protect, and provision) structured, semi-structured, and unstructured data sources within your existing IT infrastructure.
