IRI Archives

Processing Live Data Feeds in Voracity

by Alex Murray

This article demonstrates processing a web-based data source in the IRI Voracity data management platform. Static and streaming data sources specified by URL are supported by CoSort Version 10, Voracity's default data processing engine. These include flat files in formats such as CSV, as well as sources accessed through FTP/S, HTTP/S, HDFS, Kafka, MQTT, and MongoDB.

Browsing the Operational Data Store (ODS)

by Jason Koivu

What Is an ODS?

An operational data store (or “ODS”) is another paradigm for integrating enterprise data, one that is relatively simpler than a data warehouse (DW).

Connecting to Snowflake for Data Integration & Security

by Timothy Brown

Connecting to and working with data in a cloud data warehouse powered by a Snowflake database on AWS from the IRI Workbench IDE is no different than working with an on-premises SQL-compatible source.

JSON and the Truth it Captures

by Paul Friedland

“Have you stopped speeding?” You could probably object to a leading question like this in court, but what happens when an important question with only a yes or no answer is solicited on a mandatory form, and the response becomes part of an actionable database record?

Anonymizing Indirect Identifiers to Lower Re-ID Risk

by Claudia Irvine

Editor's Note: This article covers data anonymization as a form of data masking for privacy protection. In particular, it covers the concepts of quasi-identifiers and re-identification risk, and how HIPAA data de-identification standards can be applied to protect sensitive research data through anonymizing techniques such as age blurring and demographic attribute blurring, combined with re-ID risk scoring.

Production Analytic Platform #4/4: Unifying the Worlds of Information…

by Barry Devlin

This is part 4 of a 4-part series on Production Analytics. The other parts are: Processing on Par with Information [Part 1]; Data Processing Drives Efficiency [Part 2]; Processing Real World Data [Part 3].

In this final article of the series covering the Production Analytic Platform paradigm, we look at data virtualization, a key requirement in today’s multi-source, data-overloaded world.

Production Analytic Platform #3/4: Processing Real World Data

by Barry Devlin

This is part 3 of a 4-part series on Production Analytics. The other parts are: Processing on Par with Information [Part 1]; Data Processing Drives Efficiency [Part 2]; Unifying the Worlds of Information and Processing [Part 4].

Including full-function data processing in the Production Analytic Platform simplifies gathering data from external sources such as the Internet of Things and clickstreams. Such data requires both intensive exploratory modeling and the high-speed application and maintenance of those models on real-time and streaming data.

Production Analytic Platform #1/4: Process on Par with Information

by Barry Devlin

This is the first of a four-part series of blog articles examining the inherent trade-offs between data processing and information storage and presentation within traditional ETL paradigms, from the ODS to the data lake.

Production Analytic Platform #2/4: Data Processing Drives Efficiency

by Barry Devlin

This is part 2 of a 4-part series on Production Analytics. The other parts are: Processing on Par with Information [Part 1]; Processing Real World Data [Part 3]; Unifying the Worlds of Information and Processing [Part 4].

Treating data processing as a central component of data management, on a par with databases, offers new insights into improving overall efficiency and return on investment in traditional data warehouses.

Scoring Datasets for Re-ID Risk

by Dmitry Kulakov

One of the biggest concerns with releasing a dataset is the risk that a potential attacker can identify the individuals behind particular records. Even though masking or removing unique identifiers, like names and Social Security numbers, can reduce that risk substantially, it may still not be enough.