How Do You Perform ETL?
In ETL operations, data are extracted from different sources, transformed separately, and loaded to a data warehouse (DW) database and possibly other targets.
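As a generic illustration of the extract, transform, and load stages described above, here is a minimal Python sketch. It is not Voracity or SortCL code; the sample data, table name, and function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for two separate extracts.
ORDERS_CSV = "order_id,customer_id,amount\n1,10,19.99\n2,11,5.00\n3,10,42.50\n"
CUSTOMERS_CSV = "customer_id,name\n10, alice \n11,BOB\n"


def extract(text):
    """Extract: read delimited rows from a source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(orders, customers):
    """Transform: clean names, join orders to customers, total per customer."""
    names = {c["customer_id"]: c["name"].strip().title() for c in customers}
    totals = {}
    for order in orders:
        name = names[order["customer_id"]]
        totals[name] = round(totals.get(name, 0.0) + float(order["amount"]), 2)
    return sorted(totals.items())


def load(rows, conn):
    """Load: write the transformed rows into a (stand-in) warehouse table."""
    conn.execute("CREATE TABLE customer_totals (name TEXT, total REAL)")
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three stages in order and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(ORDERS_CSV), extract(CUSTOMERS_CSV)), conn)
    query = "SELECT name, total FROM customer_totals ORDER BY name"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in run_etl():
        print(name, total)
```

Each stage is a separate function here only to make the three steps visible; a production job would stream records rather than hold them all in memory.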
This data mapping and staging is the everyday work of data integration, but the best-known ETL tools have a reputation for poor performance, high cost, and deliberate complexity. The modern exception is IRI Voracity.
For greenfield projects, Voracity's CoSort engine and Eclipse UI -- plus its seamless support for Hadoop and just about every data source -- make it the ideal platform for every major data integration paradigm.
If you have a legacy ETL tool, use Voracity to accelerate unload, sort, join, aggregation, or load tasks 2-16X. Or, auto-convert your mappings to equivalent Voracity jobs with AnalytiX DS technology and save 2-9X.
Click here to see why Voracity is now the best choice among ETL tools, or check out its key attributes below:
IRI CoSort takes data from any static or streaming source, including a pipe from FACT, and does the heavy lifting of data transformation, load pre-sort, cleansing, masking, and reporting all in the same job and I/O pass.
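To make the idea of combining several transformations in one job and I/O pass concrete, the following Python sketch reads its input once and, in that single loop, cleanses, masks, and aggregates the records before sorting the detail output. It only illustrates the concept and does not reflect how CoSort is implemented; the record layout, the masking rule, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical input records; one row has an invalid sales value.
RECORDS = [
    {"region": "west", "ssn": "123-45-6789", "sales": "100.0"},
    {"region": "east", "ssn": "987-65-4321", "sales": "250.5"},
    {"region": "west", "ssn": "555-12-3456", "sales": "bad"},
    {"region": "east", "ssn": "111-22-3333", "sales": "49.5"},
]


def mask_ssn(ssn):
    """Mask everything except the last four characters."""
    return "***-**-" + ssn[-4:]


def one_pass(records):
    """Cleanse, mask, and aggregate in one read of the input, then sort."""
    detail = []
    summary = defaultdict(float)
    rejects = 0
    for rec in records:  # the only read of the source
        try:
            sales = float(rec["sales"])  # cleansing: reject non-numeric sales
        except ValueError:
            rejects += 1
            continue
        detail.append(
            {"region": rec["region"], "ssn": mask_ssn(rec["ssn"]), "sales": sales}
        )
        summary[rec["region"]] += sales  # summary report, built as we go
    # Load pre-sort: order by region, then by descending sales.
    detail.sort(key=lambda r: (r["region"], -r["sales"]))
    return detail, dict(summary), rejects


if __name__ == "__main__":
    detail, summary, rejects = one_pass(RECORDS)
    print(detail)
    print(summary, "rejected:", rejects)
```

The point of the design is that the detail target, the summary target, and the reject count all come from one read of the source rather than from three separate jobs.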
The IRI Voracity platform can combine FACT, CoSort, and bulk loads in a visualized, scheduled ETL workflow that does not require compilation or partitioning. It can even run many CoSort (SortCL)-defined manipulation and masking jobs in Hadoop seamlessly via MapReduce 2, Spark, Spark Stream, Storm, or Tez.
Compare all this to slower, more verbose SQL and 3GL programs, and to costlier, more complex ETL and ELT platforms ... not to mention the onboarding delays of disjointed Apache projects.
ETL metadata and job definitions are automated in the IRI Workbench GUI for Voracity, built on Eclipse™. Data discovery and new job wizards, and a number of visual ETL job design and rule management options, allow you to speed-build reusable repositories and scripts without requiring an education in new syntax.
That said, Voracity metadata is the easiest in the IT industry to learn and use. It leverages the same human-readable 4GL of CoSort -- called SortCL -- that features explicit data layouts, supports SQL concepts and syntax, and enables team sharing. Even today, many users still prefer to code and tweak their job scripts in Voracity's syntax-aware, color-coded script editor, or their own.
Beyond extremely fast extract/load, and one-pass, no-partitioning-needed data transformations, the Voracity ETL environment includes:
- Change Data Capture (Delta) Reporting
- Dark Data Search/Extract/Structure
- Database and Flat-file Profiling
- Data Masking, Encryption, etc.
- Data/DB Migration and Replication
- Data Classification and Metadata Generation
- Detail and Summary Reporting
- Master Data Management
- Metadata Management and Data Lineage Analysis
- Offline Reorgs
- Slowly Changing Dimension Reporting
- Test Data Generation and DB Subsetting
Voracity supports these activities on a very broad range of structured, legacy, big data, cloud and SaaS data sources.
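As one example from the list above, change data capture (delta) reporting amounts to comparing two snapshots of the same data by key and reporting what was inserted, deleted, or updated. The Python sketch below shows that comparison in generic form; it is not Voracity's method, and the snapshot data, key name, and function name are hypothetical.

```python
# Hypothetical snapshots of the same table taken at two points in time.
YESTERDAY = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Boston"},
    {"id": 3, "city": "Chicago"},
]
TODAY = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Denver"},
    {"id": 4, "city": "Eugene"},
]


def delta_report(old_rows, new_rows, key):
    """Compare two snapshots by key and classify the differences."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    inserted = [new[k] for k in sorted(new.keys() - old.keys())]
    deleted = [old[k] for k in sorted(old.keys() - new.keys())]
    updated = [
        (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    ]
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


if __name__ == "__main__":
    for change, rows in delta_report(YESTERDAY, TODAY, "id").items():
        print(change, rows)
```

This version holds both snapshots in memory for simplicity. For large files, the same result is usually obtained by sorting both inputs on the key and comparing them in a merge-style pass.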
Create, run, and manage all your ETL (and other!) jobs in the same pane of glass ... the IRI Workbench GUI for Voracity, built on Eclipse™.
Use new job wizards, workflow and mapping palettes, dialogs and outlines, syntax-aware script editors (or any text editor you prefer), or even AnalytiX DS Mapping Manager. Only Voracity gives you the choice of defining your data and manipulations visually or through simple, self-documenting 4GL scripts. The metadata is model-driven and re-entrant, so anything you do in one mode feeds the others.
Test or run jobs individually or together in the GUI flow, or later in a (scheduled) batch operation. You have that execution flexibility because the job scripts are portable. You can run any of the pieces, or the whole project, on any platform where the engine(s) are licensed. Call them from the command line or any application.
The IRI Workbench GUI for Voracity delivers the visual metadata creation, conversion, and discovery tools you need to generate, deploy, and manage the job scripts, data definition files (DDF), and XML workflows common to all IRI software.
In the same place, you can also design and run COBOL, C/C++, Hive, Impala, Java, Perl, Python, R, SQL, and other programs supported in Eclipse, and sometimes incorporate them as steps in your Voracity workflow.
Voracity is far more than an ETL tool, yet is priced below all of them. Even if you don't use it for ETL, Voracity continues in the CoSort tradition as the fastest, least expensive change data capture, 2D reporting, and data blending tool available, because its SortCL program can join across many sources and query data in flat files.
For serious ETL architects, however, Voracity's consolidation and multi-processing of transformations in the file system or (seamlessly in) Hadoop makes it the most cost-effective big data processing alternative to DB/ELT appliances, Ab Initio, SyncSort, Teradata, and in-memory DBs.
Finally, with its free Eclipse client, freemium or low-cost opex server subscriptions, and easy on-boarding, Voracity is the most affordable data management platform to acquire and maintain.