Abstract: IRI and DataSwitch have partnered to create an end-to-end, multi-featured metadata and data engineering environment to: modernize legacy data and ETL jobs, create and orchestrate data mapping operations with AI, mask PII on the fly, and wrangle data in business-friendly ways for analytics.
This is the first in a series of articles introducing the tie-up between IRI Voracity data management operations and web-facing control from the no-code DataSwitch data modernization, engineering and democratization platform. The Q3’2021 corporate and code collaboration announcement is here.
You already know that IRI Voracity is the data management platform product for discovering, integrating, migrating, governing, and analyzing large and/or sensitive data in a variety of sources and silos. Voracity operations on structured data are designed in the IRI Workbench IDE built on Eclipse and powered by the ‘SortCL’ manipulation program first associated with IRI CoSort.
But as performant and versatile as the SortCL engine and IRI Workbench (Eclipse IDE) are for mapping, masking, munging and mining data, Voracity lacks a web-facing job design and server-based orchestration environment, as well as AI extensions for things like textual ETL and self-service analytics. And for those who want to convert legacy ETL tool mappings into Voracity tasks, only Erwin tools and services could help.
DataSwitch (DS) fills in these gaps with a web-based “no-code” platform for digital data modernization, in which legacy enterprise data and architectures are redefined into more customer-centric formats. More specifically, DS has three toolkits that build local or cloud data platforms, each molded around a pivotal business objective:
- To scale decision-making processes (“DS Migrate”)
- To automate manual data processing (“DS Integrate”)
- To simplify complex data analytics (“DS Democratize”)
The Combined Platforms
The new platform alliance between IRI and DataSwitch is both complementary and value-adding. As mentioned, it fills in some of the modern ETL platform gaps that Voracity had. And at the same time, Voracity brings more performant and affordable data integration, cleansing, masking, and wrangling alternatives to DataSwitch users trying to migrate their legacy data and ETL jobs.
Specific combinatory capabilities are more easily understood through the descriptions below, for:
1. Legacy Data Modernization (DS Migrate)
where many data-driven enterprises are en route to the cloud and/or looking for a way to leave their legacy ETL tool. These initiatives are meant to reduce time-to-market, save money, and achieve greater scalability and agility. There is also the unending concern for data governance and security, and a desire to become self-serviceable to drive ROI.
These modernization goals are met in the combined platform through the rapid migration of schema, data, and mapping jobs from older data warehouses and ETL tools using DS Migrate “no-code” ergonomics to create faster Voracity-powered cloud data warehouses and data integration services. For example, DS Migrate can quickly and accurately convert SQL or Informatica jobs to faster and more affordable Voracity ETL jobs orchestrated and monitored centrally in Apache Airflow.
Why does this matter? Because, according to Vanson Bourne, 83% of IT Decision Makers are not fully satisfied with the performance and output of their data management and data warehousing solutions, and because manually transforming those workloads consumes more time and resources than expected.
Along the way, of course, Voracity’s powerful data discovery, quality, masking, subsetting, and synthesis features support data governance, privacy law compliance, and test data management initiatives. In fact, in the same jobs that migrate and/or munge, you can also cleanse, mask or mock sensitive data in any schema and preserve its referential integrity.
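To illustrate why masking can preserve referential integrity, here is a minimal Python sketch of one generic technique: deterministic pseudonymization, where the same input always produces the same masked value, so joins across tables still line up. This is not Voracity's masking function or the SortCL syntax. The key, table contents, and function names are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption); a real job would load it from a key store.
SECRET_KEY = b"demo-key-not-for-production"


def mask_id(value: str, length: int = 12) -> str:
    """Return a deterministic pseudonym for value using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_column(rows, column):
    """Return copies of rows with one column replaced by its pseudonym."""
    return [{**row, column: mask_id(row[column])} for row in rows]


def orders_per_customer(customer_rows, order_rows):
    """Join orders to customers on 'ssn' and count orders per customer name."""
    name_by_key = {c["ssn"]: c["name"] for c in customer_rows}
    counts = {}
    for order in order_rows:
        name = name_by_key[order["ssn"]]
        counts[name] = counts.get(name, 0) + 1
    return counts


# Two hypothetical tables that share the sensitive key "ssn".
customers = [
    {"ssn": "123-45-6789", "name": "Ann"},
    {"ssn": "987-65-4321", "name": "Bob"},
]
orders = [
    {"order_id": 1, "ssn": "123-45-6789"},
    {"order_id": 2, "ssn": "987-65-4321"},
    {"order_id": 3, "ssn": "123-45-6789"},
]

# Mask the key in both tables independently; the join still works afterwards.
masked_customers = mask_column(customers, "ssn")
masked_orders = mask_column(orders, "ssn")
```

Because the masked key is a pure function of the original value and the secret, `orders_per_customer(masked_customers, masked_orders)` gives the same counts as it would on the unmasked tables, even though no real identifier remains in either one.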
Legacy data and metadata sources supported include Teradata, Netezza, MySQL, Oracle, Microsoft SQL Server, IBM DB2, Informatica, SSIS, and IBM DataStage. Cloud systems include PySpark, Databricks, Matillion, SnapLogic, AWS Glue, Talend, Snowflake, AWS Redshift, Google BigQuery, Exasol, etc.
2. Self-Service Data Engineering (DS Integrate)
which enables business users to simplify both structured and unstructured data integration and masking through data catalogs and predictive data mapping, resulting in domain-specific, Voracity-powered process automation and security. For example, beyond DBs and EDI files, users can graphically parse and structure data in Word and PDF files, then map, mask, munge, and mine that data in Voracity jobs.
Why does this matter? Because textual ETL is traditionally difficult and expensive, yet necessary to leverage relevant information in unstructured data repositories along with traditional structured and semi-structured (transactional database or file) sources.
DS Integrate is a self-serviceable, business-friendly, metadata-based toolkit for providing AI/ML-driven data aggregation. This part of the platform consolidates and ingests structured or unstructured data to domain-specific data applications, and supports generating AI catalogs, predictive data mapping, and NLP-based data enrichment.
Data engineers can simply upload the unstructured or structured data in any source format (including ODBC, JDBC, PDF, image, or documents), and DS Integrate will follow coding standards and best practices to convert the input into a structured data catalog. This catalog can then be converted into the data (/FIELD) layouts that Voracity can use in ETL, masking and other jobs running locally or in the cloud.
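As a rough idea of what the catalog-to-layout step could look like, the Python sketch below turns a small catalog into /FIELD lines. The catalog structure, the function name, and the column names are hypothetical, and this is not how DS Integrate does it internally. The attribute names (TYPE, POSITION, SEPARATOR) follow the general shape of SortCL data definitions as an assumption; check IRI's documentation for the exact syntax your job needs.

```python
def catalog_to_field_layout(catalog, separator=","):
    """Render one /FIELD line per catalog column for a delimited source.

    The catalog dict shape is an assumption made for this sketch.
    """
    lines = []
    for position, column in enumerate(catalog["columns"], start=1):
        lines.append(
            f'/FIELD=({column["name"].upper()}, TYPE={column["type"]}, '
            f'POSITION={position}, SEPARATOR="{separator}")'
        )
    return "\n".join(lines)


# Hypothetical catalog entry for a two-column, comma-delimited file.
sample_catalog = {
    "columns": [
        {"name": "acct_num", "type": "ASCII"},
        {"name": "balance", "type": "NUMERIC"},
    ]
}

layout = catalog_to_field_layout(sample_catalog)
print(layout)
```

Keeping the layout generation in one small function means the same catalog could feed ETL, masking, and other job scripts without the field definitions being retyped by hand.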
The bottom line is that the DataSwitch data engineering front-end and Voracity data mapping back-end here again combine seamlessly to accelerate data and process migrations. The DataSwitch drag-and-drop web UI provides business analysts with an agile, self-service approach to configuring the data acquisition and integration, which Voracity jobs can execute rapidly and automatically in production.
All this saves time and effort while reducing friction between business and IT teams. It also helps data-driven enterprises put their hired niche experts to better use, cut outsourcing costs, and become IT-independent.
3. Data Democratization (DS Democratize)
which features a conversational, AI-driven “Data-as-a-Service” to streamline and drive data and analytics consumption across the enterprise. For example, business users can ingest and integrate data from a Voracity-populated and -regulated data lake to feed their chosen analytic target.
Why does this matter? The Harvard Business Review attests that over 95% of industry leaders state that commercialized access to data and analytics throughout the enterprise is crucial to business accomplishments. And it matters to IRI and DataSwitch because the global big data and analytics market is growing annually at 12.8% through 2025 (IDC 8/21).
Once the knowledge base is available in the cloud, DS Democratize extracts meaningful and required information using Natural Language Processing (NLP) technology. The business user only has to type in a question to the chatbot and DS Democratize will generate relevant code or a SQL query and send it to the knowledge base, from where it retrieves and presents the data in the required format.
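The question-to-query flow can be pictured with a toy Python example. DS Democratize uses NLP for this step; the sketch below substitutes simple keyword rules as a stand-in, and an in-memory SQLite table plays the role of the knowledge base. The table, column names, keyword map, and function names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical mapping from question words to SQL aggregate functions.
AGGREGATES = {"total": "SUM", "average": "AVG", "highest": "MAX", "lowest": "MIN"}


def question_to_sql(question, table, measure, dimension):
    """Translate a narrow class of questions into a SQL query string.

    table, measure, and dimension are trusted identifiers, not user input.
    """
    words = re.findall(r"[a-z]+", question.lower())
    func = next((AGGREGATES[w] for w in words if w in AGGREGATES), None)
    if func is None:
        raise ValueError("unsupported question")
    if "by" in words or "per" in words:
        return (
            f"SELECT {dimension}, {func}({measure}) FROM {table} "
            f"GROUP BY {dimension} ORDER BY {dimension}"
        )
    return f"SELECT {func}({measure}) FROM {table}"


def build_demo_db():
    """Create a small in-memory table standing in for the knowledge base."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("West", 250.0), ("East", 50.0)],
    )
    return conn


def ask(conn, question):
    """Generate SQL for the question, run it, and return the result rows."""
    sql = question_to_sql(question, "sales", "amount", "region")
    return conn.execute(sql).fetchall()
```

A call such as `ask(build_demo_db(), "What is the total amount by region?")` generates a grouped SUM query and returns one row per region. A production system would replace the keyword rules with a language model and validate the generated SQL before running it.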
In this way, insights can be extracted in a conversational manner. The opportunities for literally anybody to access data insights and generate them on their own have just grown exponentially thanks to progress in user inclusivity. This maximizes the value of data and insights for a broader spectrum of enterprise users.
DataSwitch (DS) is a no-code platform for rapid data modernization, engineering and democratization. Its goal is to provide a wide range of enterprises and users hassle-free ways to redefine data architecture into more customer-centric, cloud-friendly designs and outcomes.
To that end, the new, tight API-level integration between DS and IRI Voracity can give business and IT users the best of both worlds. A large swath of high-speed discovery, integration, migration, governance and analytics functionality in Voracity is now directly accessible from all three DS cloud data toolkits above.
If your enterprise needs data profiling or modeling, transformation or migration, cleansing or masking, subsetting or synthesis, reporting or wrangling, or enrichment and consumption, email firstname.lastname@example.org or email@example.com. Our knowledgeable team will be happy to help you manage the entire cloud data engineering ecosystem.