Only IRI Voracity® can manipulate and manage a huge range and volume of data in one affordable Eclipse™ pane of glass. Use it to rapidly and reliably discover, integrate, migrate, govern, and analyze data in every source.
"The IRI Voracity platform has caused me to rethink the relative importance of data processing in information-centric systems, such as data warehouses and lakes. With the right features and sufficient power, a data processing platform can complement and, indeed, extend the function of the databases traditionally considered to be at the core of these systems."
-Barry Devlin, 9sight Consulting
Check out Voracity's capabilities, the challenges it addresses in digital transformations, and its components, below. Explore the other tabs in this section, and the solution areas throughout this website to understand just how much your teams can cooperatively accomplish with the state-of-the-art technology in this platform.
Voracity uniquely combines the discovery, integration, migration, governance, and analysis of data in a variety of sources ... all from one place, and often in one pass. Manipulate, map, migrate, mask, munge, and mine structured, semi-structured, and unstructured data, and produce multiple targets at once.
Voracity addresses the challenges of data volume, variety, velocity, veracity, and value with a comprehensive data management platform that eliminates multi-tool complexity and bends the cost curve away from megavendor ETL packages and Hadoop distributions.
Voracity is powered by IRI CoSort or Hadoop engines, and everything it does is front-ended in one graphical IDE, built on Eclipse™. Beyond a massive number of included features, a plethora of free Eclipse plug-ins and proven partner technology expand what you can do with Voracity.
Voracity's core data management capabilities leverage the functionality of the IRI CoSort SortCL data definition and manipulation program.
As one of the original and few remaining viable fast processing alternatives to Hadoop, SortCL packages, presents, and provisions big data. It combines data cleansing, extraction, transformation, loading, masking, reporting -- and even synthetic test data generation -- in the same job script and multi-threaded I/O pass in your existing file system.
If, however, you still need the scalability and capability of Hadoop, you are covered. Voracity supports the execution of many SortCL jobs in MapReduce2, Spark, Spark Stream, Storm, and Tez. Compare that to Hadoop distributions you are considering, or to the disjointed Apache projects you are trying to coordinate.
All of that work in the middle starts with data discovery. Only Voracity provides at least four data profiling tools. And it ends with analytics, where you have three choices: 1) embedded BI; 2) Datadog, KNIME, and Splunk integrations; or, 3) robust data preparation (wrangling) for your chosen data visualization platform.
As the above schematic illustrates, Voracity supports the design, deployment and management of all these activities from a single Eclipse pane of glass, IRI Workbench:
Only Voracity delivers multiple, graphical job design and deployment options in the same Eclipse IDE. And only Voracity uses the latest CoSort engines while also supporting multiple Hadoop engine alternatives from that same GUI, with no additional coding required.
So, by embedding mission-critical data integration, migration, and governance capabilities, supporting Hadoop sources and engines, and by front-ending data discovery, EMM, MDM, and workflow in a continually developed Eclipse IDE, Voracity is not only functionally comprehensive, it's uniquely ergonomic, scalable, and future-proofed for new data sources and enterprise information needs.
Prepare big data subsets for analytics fast by accelerating and combining transforms in your file system -- not in the BI or DB layer. Use Voracity to de-duplicate and filter, sort and join, aggregate and segment, reformat and wrangle data all in one pass. Create reports on the fly as part of the process, too, with embedded BI. Or, send prepared data in memory to BIRT, Datadog, KNIME or Splunk in real-time, or into cubes that your app wants. Otherwise, hand off wrangled flat files or RDB view tables for use in Business Objects, Cognos, Cubeware, iDashboards, MicroStrategy, OAC/OBIEE/ODV, Power BI, QlikView, R, Splunk, Spotfire or Tableau, speeding time-to-display.
The CoSort engine in Voracity processed big data long before it was called big data, running and combining multi-gigabyte transforms in seconds, and besting 3rd-party sort, BI, DB, and ETL tools 2-20X. And when IRI turned 40 in 2018, DW industry guru Dr. Barry Devlin declared Voracity to be a production analytic platform. Learn why here.
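To make the "all in one pass" idea concrete, here is a minimal, generic Python sketch of filtering, de-duplicating, and aggregating records in a single loop, followed by a sort of the results. It is not Voracity or SortCL syntax. The function name `prepare`, the field names (`id`, `region`, `amount`), and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def prepare(records, min_amount=0.0):
    """Filter, de-duplicate, and aggregate records in one loop, then sort the totals."""
    seen_ids = set()
    totals = defaultdict(float)
    for rec in records:
        if rec["amount"] < min_amount:
            continue  # filter: drop records below the threshold
        if rec["id"] in seen_ids:
            continue  # de-duplicate: skip keys already processed
        seen_ids.add(rec["id"])
        totals[rec["region"]] += rec["amount"]  # aggregate per segment
    # sort the segments by total, largest first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data
sample = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.0},
    {"id": 1, "region": "east", "amount": 10.0},   # duplicate key
    {"id": 3, "region": "east", "amount": 5.0},
    {"id": 4, "region": "west", "amount": -1.0},   # filtered out
]

summary = prepare(sample)
```

The point of the sketch is only that each record is read once and every transform is applied while it is in hand, rather than staging intermediate results between separate tools.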
And now there are Hadoop options in Voracity too, distributing and scaling huge workloads across commodity hardware via MapReduce2, Spark, Spark Stream, Storm, and Tez.
What tools are you using now to discover, extract, process, and analyze all the data you gather or buy? Can you reach and process it all in one pane of glass? Can you quality-control and manage its metadata and master data in that same place? Can you analyze the data there too, or at least rapidly integrate and prepare it for external applications? If you use multiple tools, can you manage the expertise they require? Or if you use a legacy ETL platform, can you bear its cost?
Voracity analyzes, integrates, migrates, governs, profiles, and connects to some 150 different data sources and targets ... structured, semi-structured, and unstructured.
That includes legacy files, data and endian types, as well as popular flat and document file formats, every RDBMS, and newer big data and cloud/SaaS sources.
The biggest data volumes are still processed in regular batch cycles, something Voracity's native CoSort or Hadoop MapReduce and Tez options can optimize. But what about the need to process (transform, mask, reformat) and analyze data in real-time for instant promotional campaigns (think mobile devices), or alerts (like traffic and weather notices) that can help drivers or event-goers?
Voracity includes CoSort to integrate data in memory and files, so you can process big data 6X faster than ETL tools, 10X faster than SQL, and 20X faster than BI/analytic tools. Its typical mode, including CDC, is batch.
Voracity can process real-time, near-real-time, and streaming data through Kafka or MQTT brokers, in memory via pipes or input procedures to CoSort, or in Hadoop Spark or Storm engines ... all from the same Eclipse GUI, IRI Workbench. Other options include using the built-in job launcher to spawn Voracity jobs in near-real-time intervals, or using specialized BAM or CEP tools for managing event-driven activity.
Garbage in = garbage out, and thus data in doubt. Data quality suffers from inconsistent, inaccurate, or incomplete values. Social media data can be deceptive, unstructured data imprecise, and data ambiguity plagues MDM. Survey data can be biased, noisy or abnormal. Meanwhile, PII and secrets contained in all that data mean you have to mask it prior to shared use. Do you have a central point of control for cleaning data and making it safe?
Voracity's data discovery, fuzzy matching, value validation, scrubbing, enrichment, and unification features all improve data quality.
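As a rough illustration of what fuzzy matching and unification mean in general, the following Python sketch maps near-duplicate names to one canonical spelling using the standard library's `difflib`. It does not represent how Voracity implements these features. The function names `similarity` and `unify`, the 0.85 threshold, and the sample names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def unify(names, threshold=0.85):
    """Map each name to the first earlier name it closely resembles."""
    canonical = []
    mapping = {}
    for name in names:
        match = next(
            (c for c in canonical if similarity(name, c) >= threshold), None
        )
        if match is None:
            canonical.append(name)  # no close match: treat as a new canonical value
            match = name
        mapping[name] = match
    return mapping


# Hypothetical sample values
mapping = unify(["Acme Corporation", "ACME Corporaton", "Globex Inc"])
```

A threshold this simple will misfire on short or very similar strings, so real matching rules usually combine several comparisons and a review step.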
Voracity's comprehensive data masking functions and synthetic test data generation capabilities remove the risk of data breaches and poor prototypes.
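For readers new to masking, two common generic techniques are partial redaction and deterministic pseudonymization. The Python sketch below shows both using only the standard library. It is not FieldShield or any other IRI API. The function names `redact` and `pseudonymize`, the 16-character token length, and the sample values are hypothetical.

```python
import hashlib
import hmac


def redact(value, keep_last=4, mask_char="*"):
    """Replace all but the last `keep_last` characters with a mask character."""
    if keep_last <= 0:
        return mask_char * len(value)
    return mask_char * max(len(value) - keep_last, 0) + value[-keep_last:]


def pseudonymize(value, key):
    """Keyed hash: the same value and key always produce the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened token (length is an arbitrary choice here)


# Hypothetical sample values
masked_ssn = redact("123-45-6789")
token = pseudonymize("jane.doe@example.com", b"demo-secret-key")
```

Deterministic tokens preserve joins across masked tables because equal inputs yield equal outputs, while redaction keeps a recognizable fragment for display. Which one fits depends on how the masked data will be used.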
Consider your information and decision needs from data. For example, are you tracking consumer behavior, weather patterns, device or web log activity so that you can change promotions, make predictions, or diagnose problems? Do you see the value in an IDE easy enough for self-service data preparation and presentation, but powerful enough for IT and business user collaboration in data lifecycle management? And if you use BIRT, KNIME or Splunk, can you get data into those structures AS it's being wrangled?
Voracity is the one tool that provides access to, and discovery across, the disparate data sources behind these analyses.
Only Voracity allows you to blend, cleanse, mask, and munge tons of data fast, and feed the results to algorithmic and visualization applications in the right format -- within the same, or another, environment. That's why Dr. Barry Devlin calls Voracity a Production Analytic Platform.
The default Voracity stack uses IRI Workbench for client-side design of data-driven jobs defined in portable scripts represented in multiple graphical UIs.
Many of the same jobs also run interchangeably in Hadoop MR2, Spark, Spark Stream, Storm, or Tez.
Voracity metadata and related job script parameters are fully supported in the Workbench data model and optionally in erwin Mapping Manager (Data Catalog), for graphical creation, modification, and management.
Within the base Voracity package are:
- DB, flat-file, and dark data search and profiling, plus E-R diagramming and metadata definition wizards
- the data processing features of CoSort, NextForm and RowGen in the IRI Data Manager suite
- all data security features of the three 'shield' products in the IRI Data Protector suite
- apps for seamless analytics in Splunk and data science in KNIME
- multiple, re-entrant job design options and execution paradigms
- runtime and metadata SDKs for application development
- robust GUI help content and CLI reference manuals
More specifically, Voracity includes free use of a rich, familiar front-end job design and management environment called IRI Workbench, built on Eclipse™. Together with Voracity's back-end production engine, which you can run anywhere, IRI Workbench supports the capabilities of:
- IRI CoSort for big data manipulation and movement, including EDW integration (ETL) and data preparation for DBs and analytic tools, embedded BI, data quality, metadata and master data management, legacy sort migration, and data governance
- IRI NextForm for data and DB migration, data replication, remapping, and federation
- IRI FieldShield for masking PII in flat files and structured (1NF) RDBs, IRI CellShield EE for Excel® sheets, and IRI DarkShield for semi- and unstructured data sources
- IRI RowGen for generating (and masking) DB subsets, and for synthesizing safe but realistic file, database, and report test data
Additional capabilities of IRI Workbench include:
- default and plug-in shell UIs for command line execution and interaction
- multiple data profiling and metadata discovery and definition wizards
- metadata management and master data management (MDM)
- handoffs for BIRT (visual analytics) and a data source node for KNIME
- Sirius workflow, transform mapping, and E-R diagrams
Beyond the base edition, premium options include:
- erwin Smart Connector | automated job/metadata migration from other ETL tools
- erwin Mapping Manager | code-free source-target ETL mapping/flow generation
- CONNX source drivers | move/manipulate mainframe and other proprietary data
- DataDirect DB drivers | move/manipulate big data and cloud/SaaS data
- DW Digest | cloud dashboard for interactive BI
- IRI FACT | parallel unload of Oracle and 6 other VLDB tables to files
- Hadoop runtimes | MapReduce2, Spark, Spark Stream, Storm, and Tez options
- ValueLabs Test Data Hub (TDH) | on-demand FieldShield & RowGen DevOps datasets
- Windocks | virtualized, on-demand database (clone) images with IRI-masked/cleansed/synthesized data
Whether included with the base Voracity package, or installed as partner technology, everything runs in IRI Workbench and leverages the same, open data and manipulation metadata infrastructure for job management and deployment ... inside or outside that GUI.
Voracity jobs are also compatible with Datadog, KNIME, MF COBOL, Software AG Natural (sorting), Splunk, ValueLabs Test Data Hub, and Windocks. They can integrate with Jenkins, NodeJS, SQL, and third-party schedulers, and can be called from Actifio, Commvault, and third-party ETL tools.