IRI Voracity provides analytic capabilities in three ways, with a fourth pending:
1) Embedded reporting and analysis - via CoSort SortCL programs that write custom detail, summary, and trend reports in 2D formats, complete with cross-calculation, Boost-driven statistical functions, and integrated data transformation, remapping, masking, and formatting features. The reports can be descriptive or, through fuzzy logic and statistical functions, predictive.
2) Integration with BIRT in Eclipse - where, at reporting time, the BIRT charts and graphs you design are populated from 'IRI Data Sources' via ODA support for Voracity/CoSort SortCL output. What's interesting about this in-memory transfer of SortCL data and metadata to BIRT is that the data integration and preparation run when the report is requested. This saves time as well as resources, because the data is prepared outside the BI layer (with CoSort or Hadoop engines).
3) Data preparation (franchising) - which accelerates time-to-visualization for 10 third-party BI and analytic vendor platforms. This section of the IRI blog site features benchmarks in which SortCL (available to Voracity or CoSort users) prepares the data on its own, ahead of BOBJ, Cognos, MicroStrategy, QlikView, Splunk, Spotfire, R, and Tableau.
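The Python sketch below makes the first item more concrete. It hand-codes a detail report with a summary cross-calculation (each region's share of the grand total). It is only an illustration under assumed inputs: the field names `region`, `product`, and `amount` and the `build_report()` function are hypothetical. SortCL declares such reports in its job script rather than through code like this.

```python
from collections import defaultdict


def build_report(rows):
    """Return report lines: detail rows, then per-region subtotals with % of total.

    `rows` is a list of dicts with hypothetical keys "region", "product", "amount".
    """
    # Detail section: sort so each region's records are grouped together.
    detail = sorted(rows, key=lambda r: (r["region"], r["product"]))
    subtotals = defaultdict(float)
    for r in detail:
        subtotals[r["region"]] += r["amount"]
    grand_total = sum(subtotals.values())

    lines = [f"{'REGION':<8}{'PRODUCT':<10}{'AMOUNT':>10}"]
    for r in detail:
        lines.append(f"{r['region']:<8}{r['product']:<10}{r['amount']:>10.2f}")

    # Summary section with a cross-calculation: each region's share of the total.
    lines.append("-" * 34)
    for region in sorted(subtotals):
        share = 100.0 * subtotals[region] / grand_total if grand_total else 0.0
        lines.append(f"{region:<8}{'SUBTOTAL':<10}{subtotals[region]:>10.2f}{share:>5.1f}%")
    lines.append(f"{'TOTAL':<18}{grand_total:>10.2f}")
    return lines


if __name__ == "__main__":
    sales = [
        {"region": "West", "product": "A", "amount": 50.0},
        {"region": "East", "product": "B", "amount": 50.0},
        {"region": "East", "product": "A", "amount": 100.0},
    ]
    print("\n".join(build_report(sales)))
```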
Yes. CoSort offers a range of solutions for generating meaningful reports from huge volumes of data. You can use CoSort's SortCL (4GL) program as a standalone report generator, or as a staging tool to digest and hand off large volumes of data.
Not only does SortCL transform and protect huge volumes of disparate data from a variety of RDBMS sources, sequential or indexed files, and web and device logs (including ASN.1 TAP3 CDRs); it can also join, aggregate, calculate, and display that data in custom detail and summary report formats, complete with special variables and tags for web pages.
By creating output in .CSV and .XML formats, CoSort's SortCL tool can directly populate spreadsheets like Excel, databases, ETL tools, and BI tools. See the next question in this section.
Any and all BI tools that can import CSV and flat XML files, or RDBMS tables ...
IRI software's job in these cases is to prepare or "franchise" big, structured data in a centralized place. CoSort or Voracity users create Sort Control Language (SortCL) programs in 4GL text scripts -- or through wizards in the free IRI Workbench GUI, built on Eclipse™ -- that transform (filter, sort, join, aggregate, mask/encrypt, pivot, pre-calc, etc.) data in more than 125 sources prior to the hand-off.
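The Python sketch below is a rough, hypothetical illustration of preparing data before the hand-off. It filters, aggregates, and sorts records in one pass over the input, then emits CSV that a spreadsheet or BI tool could import. The column names and the `prepare()` function are assumptions made for the example, and this is not SortCL syntax.

```python
import csv
import io
from collections import defaultdict


def prepare(rows, min_amount):
    """Filter, aggregate, and sort rows, then return CSV text for a BI tool."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:                        # single pass over the input
        amount = float(row["amount"])
        if amount < min_amount:             # filter
            continue
        totals[row["customer"]] += amount   # aggregate (sum)
        counts[row["customer"]] += 1        # aggregate (count)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["customer", "orders", "total"])
    # Sort the aggregated result by total (descending), then by name.
    for customer in sorted(totals, key=lambda c: (-totals[c], c)):
        writer.writerow([customer, counts[customer], f"{totals[customer]:.2f}"])
    return out.getvalue()
```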
SortCL programs or Voracity targets can directly populate spreadsheets like Excel, databases (relational and NoSQL), ETL tools, and BI tools, including:
- SAP Business Objects
- IBM Cognos
- DI-Diver from Dimensional Insight
- iDashboards from iViz
as well as newer analytic platforms, such as:
In addition, ODA driver support in the IRI Workbench provides seamless data and metadata flows between CoSort/SortCL data preparation and BIRT/ActuateOne presentation.
Yes. Although these rules are not formally stored in an IRI Workbench roles registry, they can be applied ad hoc. Access to source and target data is defined by the DBA, and the configuration details are stored in manageable DSN files or Workbench Data Connection Registry settings, subject to workspace access permissions controlled at the file (or O/S log-in) level. XML audit log files and other metadata assets (e.g., for FieldShield data masking jobs) can be controlled at the O/S level or at the SCCS (e.g., EGit) repository level.
Multiple roles and permissions for metadata assets (DDF files, job scripts, flows, etc.) can be assigned in the AnalytiX DS data governance platform (premium option) or in free IRI Workbench hubs like EGit and other Eclipse-compatible repositories. See http://www.iri.com/blog/iri/iri-workbench/introduction-metadata-management-hub/ for an example.
Yes, access to data sources (and targets, down to the column level) can be controlled through DBA-granted or file-level permissions (managed in DSN files and the IRI Workbench data connection registry), as well as through field-level revelation authorizations controlled in (securable) job scripts and decryption keys.
Yes, in multiple ways, including access permissions to job metadata, data sources (and targets), and decryption keys, and via differential data class function/rule assignments. Contact email@example.com or firstname.lastname@example.org for assistance setting up your controls.
Does Voracity (FieldShield, or other IRI software) independently check the identity of the end user attempting to access protected data, or does it rely on the underlying database or application access controls?
Both, in that the FieldShield job script access specifications (or library-calling application) must match defined permissions for source and target DB instances. IRI also offers optional DAM/DAP technology as part of the Voracity platform, callable from the IRI Workbench.
Through client computer access controls and file permissions on the Workbench workspace. Beyond that, either the AnalytiX DS governance platform or any Eclipse-compatible SCCS like Git for metadata assets -- where permissions by role are configurable -- can lock down specific projects, jobs, and other metadata assets.
Some of the things that Voracity offers that legacy and open source ETL (much less ELT) tools do not are:
- Built-in data profiling tools for flat files, databases, and dark data (unstructured) document sources
- Raw power and scalability with or without Hadoop; i.e., built-in performance in volume, but also seamless support for Hadoop!
- A negligible learning curve: simple, explicit, accessible, and open text metadata you can easily use, modify and share
- The ability to deploy jobs outside the GUI, running them via command line, batch, or any program via system or API call
- An open source GUI you already know (Eclipse) that front-ends proven, robust manipulations and reports on big structured data
- Advanced aggregation functionality like lead/lag, ranking, running aggregates, multiplication, and expressions
- Multiple nested layers for both conditions and derived fields with support for PCREs, fuzzy matching, C (math/trig) functions, locale and 'conversion specifiers', etc.
- Composite data value definition for both production data (format masking) migration and test data generation
- Built-in data and DB profiling, migration, replication, and administration
- 12 field protection (static data masking) functions, DB subsetting, and synthetic, referentially correct test data generation
- Data-centric change data capture, slowly changing dimension and detail and summary reporting, plus trend (predictive analytics), and web log (clickstream analytics) reporting
- Seamless metadata integration with Fast Extract (FACT) for major RDBs, plus Hadoop, AnalytiX DS and MIMB-embedding platforms
- Superior price-performance, fast ROI, and immediate access to US-developer support
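For readers unfamiliar with the windowed aggregates mentioned above, this Python sketch computes a running total, a dense rank, and lag/lead values over a sequence. It shows only the general technique. The function name is hypothetical, and the sketch does not reflect how SortCL implements or declares these functions.

```python
def window_metrics(values):
    """Return per-row running total, dense rank (highest = 1), lag, and lead."""
    # Dense rank: equal values share a rank, and ranks have no gaps.
    distinct = sorted(set(values), reverse=True)
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}

    result = []
    running = 0
    for i, v in enumerate(values):
        running += v
        result.append({
            "value": v,
            "running_total": running,
            "rank": rank_of[v],
            "lag": values[i - 1] if i > 0 else None,                # previous row
            "lead": values[i + 1] if i + 1 < len(values) else None,  # next row
        })
    return result
```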
Another way to consider the differences is to look at what Voracity does not require, and why:
With Voracity, there is no need for:
| No need for | Because Voracity |
|---|---|
| separate transforms or transform stages | can combine filter, sort, join, aggregate, pivot, remap, custom, and other transforms in the same job script and I/O pass, though it can also represent and run them separately in separate task blocks |
| partitioning, manual or otherwise | automatically multi-threads and uses other system resources, limited only by your resource controls, and does not push transformations into the database layer, where they are inherently less efficient |
| manual metadata definition | provides automatic metadata discovery and format conversion tools, and is supported by AnalytiX DS Mapping Manager and CATfx templates, as well as MITI's MIMB platform |
| separate BI (reporting) tools | can produce custom-formatted detail and summary reports in the same job script and I/O pass as the transforms, and/or hand off data to files, tables, or ODA streams in Eclipse for BIRT |
| separate data masking tools | includes every single function in FieldShield, the most robust data masking and encryption tool available |
| separate test data tools | includes all the functions of RowGen, which can generate safe (no need for production data), intelligent (realistic and referentially correct) test data for DB, file, and report targets |
| long-term consulting | uses an already familiar Eclipse GUI and metadata that defines both data and ETL processes |
| separate MDM hubs or data quality tools | has a wizard for MDM, plus support for composite data type definitions, master data value lookups, joins, tables, and set files suitable for production or test data |
| a new team sharing or version control paradigm | keeps metadata repositories and job scripts compatible with any source code and metadata version control system, including AnalytiX DS and Git, CVS, or SVN in Eclipse |
| concerns about open source or support | is backed by IRI, a stable 38-year-old company with more than 40 international offices |
| a huge budget now, or a lease renewal headache later | is sold at affordable prices for perpetual or subscription use |
Yes, CoSort is a data transformation engine, and thus an ETL engine. It is not a traditional ETL package, however; IRI Voracity, which leverages CoSort for data transformation, is. Voracity can use CoSort as well as Hadoop for transformations, and FACT for extraction and for pre-sorted bulk loads into auto-configured DB load utilities.
When you use Voracity or CoSort, you benefit from high-performance (I/O-consolidated, multi-threaded) transformations like:
In addition, CoSort -- and in particular, its Sort Control Language (SortCL) program -- can also handle:
- slowly changing dimensions
- fuzzy logic lookup tables
- pivoting (normalization and denormalization)
- running, ranking, and windowed aggregates
- bulk/batch change data capture
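One common way to perform batch change data capture in a file system is to sort the old and new snapshots by key and then merge them, flagging inserts, updates, and deletes. The Python sketch below shows that sort-merge technique under stated assumptions: each row is a `(key, value)` tuple, and keys are unique within a snapshot. The function name is hypothetical, and this is not IRI's CDC implementation.

```python
def merge_diff(old_rows, new_rows):
    """Sort both snapshots by key, then merge them to emit I/U/D change records."""
    old_rows = sorted(old_rows, key=lambda row: row[0])
    new_rows = sorted(new_rows, key=lambda row: row[0])
    i = j = 0
    changes = []
    while i < len(old_rows) and j < len(new_rows):
        old_key, old_val = old_rows[i]
        new_key, new_val = new_rows[j]
        if old_key == new_key:
            if old_val != new_val:
                changes.append(("U", new_key, new_val))  # value changed
            i += 1
            j += 1
        elif old_key < new_key:
            changes.append(("D", old_key, old_val))      # key no longer present
            i += 1
        else:
            changes.append(("I", new_key, new_val))      # key is new
            j += 1
    # Rows left on either side have no counterpart in the other snapshot.
    changes.extend(("D", k, v) for k, v in old_rows[i:])
    changes.extend(("I", k, v) for k, v in new_rows[j:])
    return changes
```

Sorting first lets the comparison proceed in one sequential pass over each snapshot, which is why sort-merge is a natural fit for file-based tools.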
In Voracity, much of the above is exposed in graphical wizards, specification dialogs, workflow and transform mapping diagrams so you don't need to learn how to script those jobs. The jobs can be previewed and then run (scheduled) from the GUI, or on the command line (and thus in batch scripts and other applications, as well as third-party ETL tools that need a boost). Metadata can be change-tracked, shared, secured, and version-controlled in repositories like EGit on-premise or in the cloud.
Yes, both. Since 1999, industry experts have touted CoSort as an ETL engine for its high-performance data staging and integration capabilities. CoSort, and its SortCL program in particular, performs the heavy lifting of selection, transformation, reporting, and pre-load sorting against sequential files in an ODS or DW staging area, or on extracted tables in suspense.
CoSort's SortCL is a push-down optimization option for Informatica PowerCenter, and an option in the sequential file stage of IBM DataStage, for performing faster, combined (single-pass) sort, join, and aggregation operations. IRI has also published a press release about CoSort's 6x improvement of Informatica speed.
Besides proven integrations with, and plug 'n play sort replacements for, DataStage and Informatica, CoSort also links to Kalido, ETI, SAS, and TeraStream ETL packages. CoSort's SortCL programs can also be called as an executable from any tool that allows it, including Ab Initio, Pentaho, JasperETL, Pervasive DataRush, Hummingbird, and others, to further consolidate and optimize data transformation performance via the file system.
Yes, either through the DataStage sequential file stage or a Before-Job Subroutine. With or without the larger Voracity ETL and data management platform subscription supporting CoSort, you would use CoSort as an external data transformation hub, combining large sort, join, aggregate, reformatting, protection, and cleansing functions in a single job script and I/O pass in the file system. Voracity adds the visual ETL design environment and Hadoop execution options around CoSort.

You can also leverage AnalytiX DS technology to automatically convert most ETL jobs currently in DataStage to Voracity.
Yes. With or without the larger IRI Voracity ETL and data management platform subscription supporting IRI CoSort, you can use CoSort as an external data transformation hub, combining large sort, join, aggregate, reformatting, protection, and cleansing functions in a single job script and I/O pass in the file system, and just call those jobs into existing Informatica job flows as a command line operation. Voracity adds the visual ETL design environment and Hadoop execution options around CoSort.
You can also leverage AnalytiX DS technology to automatically convert most ETL jobs currently in PowerCenter to Voracity.
Yes, to a degree.
Voracity's core data manipulation features -- transformation, mapping, masking, and embedded report formatting -- are all built into the SortCL program you can already script and run with your CoSort license to do those things. And, the GUI for CoSort (IRI Workbench, built on Eclipse), is the same GUI Voracity and all subset IRI products in the IRI Data Manager (CoSort, FACT, NextForm) and IRI Data Protector (FieldShield, CellShield EE, RowGen) suites use! With that GUI comes access to a lot of Voracity features through the toolbar menu, like data discovery, ETL flow diagrams, MDM wizards, FieldShield data masking and RowGen test data generation wizards. So you could run a lot of those jobs with your existing SortCL license.
As a CoSort licensee, however, you are only entitled to IRI support for the feature-functions documented in the CoSort manual (and in particular, the SortCL language reference chapter), and for GUI operations based on jobs created from wizards in the "CoSort" (stopwatch icon) and "IRI" menus in the IRI Workbench toolbar. That is, you would normally be confined to the materials for CoSort users in the Welcome section and support content for CoSort in the help menu.
Voracity subscribers, on the other hand, are entitled to support from IRI for all the feature-functions and menu items exposed in the IRI Workbench GUI, including data profiling, masking, test data, ETL, MDM, CDC, slowly changing dimensions, metadata management, and BIRT integration. Voracity users can also get IRI support on optional upgrades with Voracity-compatible software like AnalytiX DS Mapping Manager, IRI FACT, running SortCL jobs in Hadoop, predictive analytic reports using SortCL and BIRT, and the Paques self-service, shard-powered BI module.
For a complete list of the available features and support provided with all IRI software, please refer to the IRI product comparison page.
CoSort's SortCL program has built-in filtering and selection logic to reduce bulk, segment, and scrub data during or after processing.
For more advanced data cleansing and quality operations, SortCL allows you to plug in your own function libraries to perform custom transformations at the field level, before or after sort, join, merge, or report processing. SortCL ships with a sample template: a Melissa Data address standardization object that cleanses the address field as records are output.
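The plug-in idea can be sketched in Python as a pipeline that sorts records and then applies user-supplied functions to named fields as each record is output. The `standardize_address()` function below is a hypothetical stand-in with a tiny abbreviation table. It is neither the Melissa Data object nor SortCL's actual plug-in interface.

```python
import re


def standardize_address(value):
    """Hypothetical cleanser: drop punctuation, uppercase, expand abbreviations."""
    abbreviations = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}
    words = re.sub(r"[.,]", " ", value).upper().split()
    return " ".join(abbreviations.get(w, w) for w in words)


def process(records, sort_key, field_functions):
    """Sort records, then apply each plugged-in function to its field on output."""
    for record in sorted(records, key=lambda r: r[sort_key]):
        cleaned = dict(record)  # copy so the input records stay unchanged
        for field, func in field_functions.items():
            cleaned[field] = func(cleaned[field])
        yield cleaned
```

Usage: `list(process(records, "id", {"address": standardize_address}))` returns the records in `id` order with a cleansed `address` field.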

CoSort SortCL, IRI Voracity, and other IRI software programs (including FACT, NextForm, FieldShield, and RowGen) can all run from the command line, and can thus be scheduled into batch streams with cron, Stonebranch Universal Controller (UAC), Cisco TES, CA Autosys, ASCI ActiveBatch, and similar applications. Proofs of concept (PoCs) have also been done with Oracle DBMS_Scheduler, UAC, and Full 360 metaController.
From within the IRI Workbench -- the free Eclipse GUI supporting all IRI software -- a Task Launch Scheduler is built in.
The company is IRI, Inc. IRI stands for Innovative Routines International. IRI was founded in New York in 1978 as Information Resources, Inc., and changed its name when it relocated to Florida in 1995, because the name Information Resources was already in use there.
We are not affiliated with the other IRI, the Chicago firm called Information Resources, Inc., which is best known for retail market research. Ironically, though, that firm is an IRI CoSort licensee of record.
CoSort is IRI's best known software product for data management and manipulation. The primary operational facility and user interface in the CoSort package is the Sort Control Language (SortCL) program.
SortCL refers to both the executable and the 4GL syntax for job scripts that contain both data definition and manipulation statements. The SortCL program has given rise to several IRI spin-off products that use the same syntax for data definition, but support a lower-cost, targeted functional subset of manipulation commands. Such SortCL-compatible, fit-for-purpose IRI products include: NextForm for data migration, FieldShield for sensitive data masking, and RowGen for test data generation.
Voracity is IRI's new "total data management" platform. It includes CoSort and all the SortCL spin-off product capabilities, as well as new data discovery (profiling) tools and visual job design features for DW/BI architects, DBAs, and business users. It also adds Hadoop engine options, cloud and big data connectors, data quality and MDM wizards, and multiple analytic frameworks.
IRI Workbench is the free, common Eclipse GUI for designing, deploying, and managing all IRI software jobs.
Until recently, IRI's primary focus and product line have served back-end systems that not many people talk about. Doing the heavy lifting for other people's products and operations (including Accenture, CGI, Cincom, Dell, Epsilon, Sabre, Sungard, and Unicon) has made IRI more of a silent partner than the total solution provider that the company's software stack proves it to be.
In actuality, IRI is also a prominent enterprise software vendor and well known to many large companies worldwide (like American Airlines, Bank of America, Comcast, Disney, EDS, and Fidelity Investments), as well as the consultancies that serve them. Both customers and analysts following IRI note that the company is not venture-backed and invests far more in R&D than it does in marketing, providing far more value to customers.
Gartner is following IRI software in their data integration, data masking, legacy migration, test data, and business intelligence research. IRI is also gaining recognition in the areas of big data (with and without Hadoop), data quality, metadata management, master data management, and unstructured data.
Yes, and yes.
From IRI's support staff, its international representatives, and expert independent consultants. Call or email us before, during, or after your evaluation to get directed to the right resource. Because it is in IRI's best interests for your business and technical goals to be met through the use of IRI software, we will collaborate with you in the development and maintenance of successful solutions.
We will. So will more than 40 international support offices. And there are many third-party consultancies familiar with IRI software who can help you implement, optimize, and support applications around the tools.
More importantly, you will find it surprisingly easy to support yourself. One of the most compelling aspects of IRI software is its simplicity. All the data and job-related metadata are shared, exposed, self-documenting, and easy to modify and extend. There is also a familiar Eclipse GUI automating script creation, integration, execution, and management.
Because they are lower. See the answer to the question above about why IRI is not better known in certain circles.
Being part of so many other applications has not required the marketing overhead of our competitors. IRI has also chosen not to be a public company, and has not been servicing external investors or debt. Our customers continue to benefit from these savings.
Call or email us for a quote -- or use this form -- knowing the product(s) you want to license after successful evaluation. Price is based on what and where you run (see the pricing FAQ). IRI accepts purchase orders and credit cards.
Yes, it is available. There is normally an upcharge to standard annual support to provide 24/7 support worldwide.
Maintenance is free in the first year after licensing. For those users covered under maintenance, minor new releases are provided free upon request or if needed for support. Major new releases are also optional, but usually chargeable upgrades. The cost of the upgrades depends on your maintenance level.
Because you are buying a perpetual-use license, once. Support is optional and you are not forced to upgrade (even though you should eventually). If you are using Voracity on a subscription basis, however, you can lock in the license and support cost for five years through a discounted up-front payment.
If you are executing Voracity jobs in a Windows, Linux, or other Unix file system using the default CoSort/SortCL (or subset) program, that executable can send event (job status) messages to the (CLI or GUI) console at various verbosity levels.
If you are executing Voracity jobs in Hadoop MR2, Spark, Storm, or Tez, the HDFS job inventory indicates status.
We believe that Hadoop provides for this automatically. Default CoSort/SortCL-based executions only allow for pause/resume in the event of insufficient sort overflow (temp) file space.
There are several logs generated, including SortCL application statistics, error logs, and a self-appending runtime performance file. There is also an optional XML audit log file, generated from each run, showing the contents of the script and environment details.
CoSort is a robust, commercial-grade software package for efficiently manipulating and managing high volumes of data. More specifically, it is a sorting, data transformation, migration and reporting package that addresses a very wide range of enterprise and development-related challenges in data integration, data masking, business intelligence, and ancillary disciplines. Please review the product description and solution sections of this web site for details.
For the purpose of this simple sort/merge question, CoSort stands for Co-routine Sort, first released for commercial use
CoSort exploits parallel processing, advanced memory management and I/O techniques, as well as task consolidation and superior algorithms, to optimize data movement and manipulation performance in existing file systems. No paradigm shifts to database engines, NoSQL, Hadoop, or appliances are necessary. Maybe a little more RAM, but that's usually enough ...
Very. Performance varies by source sizes and formats, data and job orientation, hardware configuration and resource allocation, concurrent activity and application tuning. The best benchmarks (e.g. 1GB in 12 seconds, 50GB in 2 minutes) run in memory on fast, multiple CPU Unix servers.
When you perceive a bottleneck, which could begin anywhere from 500K to 50M rows, depending on your hardware. CoSort routinely sorts in the terabyte range, and scales linearly in volume without Hadoop. Input files in the dozens or hundreds of gigabytes are now common. Any number of input and output files, and structured file formats, are simultaneously supported, including line, record and variable sequential, blocked, CSV, ISAM, LDIF, XML (flat), and Vision. IRI also publishes a full list of supported data sources.
Usually through a CoSort Resource Control (cosortrc) text file, which can be global, user-specific, and/or job-specific. On Windows, default registry settings are also set up at installation time and can be overridden by an rc file. You can specify a ceiling and floor on CPU/core threads and memory, as well as I/O buffers, and allocate (and compress) disk space for sort overflow. Several other documented job controls are also specified at setup, and can easily be modified (or secured) later.
CoSort users can display runtime information before, during, and after execution through:
• optional on-screen display levels
• self-appending and replacing log files
• application-specific statistical files
• and, a full audit trail for various compliance and forensic requirements
More than 120 now, and counting. These include single- and multi-byte character sets, Unicode, C, COBOL, and mainframe numerics. Contact IRI for help obtaining a definition if you are not sure what you have. Moreover, CoSort supports the (simultaneous) collation, conversion, and creation of more than two dozen file formats.
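The Python sketch below illustrates what mainframe data type conversion involves. It decodes EBCDIC text with Python's built-in `cp037` codec and unpacks a COBOL COMP-3 (packed decimal) field. It assumes the common sign conventions (0xC or 0xF positive, 0xD negative) and an implied decimal scale taken from the PIC clause. Real copybooks may use other code pages or sign rules, and the function name is hypothetical.

```python
from decimal import Decimal


def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode a COBOL COMP-3 (packed decimal) field.

    Each byte holds two BCD digits; the final low nibble is the sign
    (0xC or 0xF positive, 0xD negative, as assumed here). `scale` is the
    number of implied decimal places, e.g. PIC S9(3)V99 -> scale=2.
    """
    if not data:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign_nibble = nibbles.pop()
    if sign_nibble in (0x0C, 0x0F):
        sign = 1
    elif sign_nibble == 0x0D:
        sign = -1
    else:
        raise ValueError(f"unexpected sign nibble {sign_nibble:#x}")
    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError("invalid BCD digit")
        value = value * 10 + digit
    return Decimal(sign * value).scaleb(-scale)


if __name__ == "__main__":
    # cp037 is Python's built-in US/Canada EBCDIC codec.
    print(b"\xc8\xc5\xd3\xd3\xd6".decode("cp037"))           # HELLO
    print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))  # 123.45
    print(unpack_comp3(bytes([0x12, 0x3D])))                 # -123
```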
They differ, and are based on collaboration and feedback with partners and customers who own their data and job definition metadata.
OMIT COND=((63,1,CH,NE,C' '))
The translation gives me this:
/FIELD=(field_1, POSITION=1, SIZE=20)
/FIELD=(field_2, POSITION=40, SIZE=3)
/FIELD=(field_0, POSITION=63, SIZE=1, EBCDIC)
/CONDITION=(cond_0, TEST=(field_0 != " "))
/FIELD=(field_1, POSITION=1, SIZE=20)
/FIELD=(field_2, POSITION=21, SIZE=3)
This is fine, except that I don’t want the lines that have $SORTIN or $SORTOUT. How can I suppress these lines?
A. There are no specific mvs2scl options that will do this. However, you can filter the translation through grep as it runs. Execute the following on the command line:
mvs2scl job1.mvs | grep -v '$SORT' > job1.scl
job1.mvs is the MVS script to be translated, and job1.scl is the translated SortCL script without the $SORTIN or $SORTOUT lines. The -v option tells grep to output only the lines that do not contain the expression $SORT, and the single quotes keep the shell from expanding $SORT as a variable.
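For reference, the OMIT condition above drops every record whose 1-byte character field at position 63 is not a blank. The Python sketch below emulates that logic. It also applies the reformat that the translated layout appears to imply, writing positions 1-20 and 40-42 to positions 1-20 and 21-23. It assumes fixed-length records that have already been decoded from EBCDIC into strings, and the function name is hypothetical.

```python
def omit_and_reformat(records):
    """Emulate OMIT COND=((63,1,CH,NE,C' ')) plus a 1-20 / 40-42 reformat.

    Sort-card positions are 1-based and Python slices are 0-based, so
    byte 63 is index 62 and positions 40-42 are the slice [39:42].
    """
    kept = []
    for rec in records:
        if len(rec) < 63:
            raise ValueError("record shorter than 63 bytes")
        if rec[62] != " ":  # byte 63 NE blank -> the record is omitted
            continue
        kept.append(rec[0:20] + rec[39:42])  # new positions 1-20 and 21-23
    return kept
```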
Either use the CoSort Sort Stage PlugIn or SortCL alongside DataStage in the file system for large file sort, join, and aggregate transformations. You can then direct the CoSort-processed results back into the ETL engine or a DB load utility.
We have hundreds of tables and files, and thousands of fields already defined for processing in DataStage. How can we exploit CoSort on flat files and leverage existing field layouts (i.e. not re-define them manually)?
Use the Meta Integration Model Bridge (MIMB) from .dsx to CoSort SortCL DDF to automatically convert the metadata. For more information, visit http://www.metaintegration.net/Products/MIMB/Specifications/MIRIriCoSortSortClExport.html
To speed Sorter Tx operations seamlessly, use either CoSort's unique Advanced External Procedure (AEP, in PowerCenter v7) or Custom Transform (CT, in PowerCenter v8). For sorting AND other high volume transformations, run CoSort 'SortCL' program scripts (via command line, batch, GUI, API) alongside PowerCenter in the file system. This is appropriate for large file sort, join, and/or aggregate transforms (since SortCL can combine them in the same job script and I/O pass). You can then direct the CoSort output data back into the ETL stream, to a file, DB loader, etc.
We have hundreds of tables and files, and thousands of fields already defined for processing in Informatica. How can we exploit CoSort on flat files and leverage existing field layouts (i.e. not re-define them manually)?
Use the Meta Integration Model Bridge (MIMB) from .xml to CoSort SortCL DDF to automatically convert the metadata. For more information, visit http://www.metaintegration.net/Products/MIMB/Specifications/MIRIriCoSortSortClExport.html
You can enforce field-level security at the very same time that you're using the same tool (CoSort's SortCL) for high-volume transformation and reporting jobs that an ETL or BI tool cannot do as efficiently. You can thus use one tool to accomplish multiple goals in one pass. Alternatively, you can run SortCL on flat files just to protect certain fields, right alongside your other tools that transform or present the data (before or after SortCL protects it). This lets you:
- use your existing code
- protect only the fields needing security
- still keep both protected and unprotected data available to the routines and systems that need to access it.
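The Python sketch below shows this selective approach, assuming a CSV flat file with a header row. Only the named columns are masked, and every other column passes through unchanged so downstream tools can still use it. The `protect_columns()` function and the `ssn` column are hypothetical. Character masking stands in for whatever protection function a real job would apply.

```python
import csv
import io


def protect_columns(csv_text, protect, mask_char="*"):
    """Mask only the named columns; leave every other column untouched."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in protect:
            row[col] = mask_char * len(row[col])  # same length, contents hidden
        writer.writerow(row)
    return out.getvalue()
```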
Both! FieldShield and CoSort's SortCL program give you the ability to protect both kinds of data sources simultaneously with one or more field-level security functions. These IRI products can address either source type in bulk (static data masking) or surgically (dynamic data masking) through filter commands or custom stored procedure calls.
Of course, some databases have built-in column encryption. But their approach may be cumbersome or limiting for a variety of reasons, such as:
- You need to protect multiple databases, other sources, or data in motion (like flat files), and a single platform or method will not address, or be compatible with, enterprise needs
- Built-in DB encryption libraries may also be too slow, costly, or complex to implement
- They are limited to a single encryption methodology that may or may not conform to security or appearance requirements
- You may also need to leave your data as-is while it's in the database, but protect it while it's moving into or out of the database. That's where flat files come in. Data is often in a flat file format as it goes in or out of your databases.
Other encryption products encrypt an entire file, database, disk, computer, or network to protect sensitive data moving through your systems. However, encrypting more than the fields that matter can take a long time, and can cut off your access to non-sensitive data that still needs to be accessed and processed. FieldShield and CoSort's SortCL can encrypt (or otherwise protect) only the fields/columns that need it, and can do so in the same job scripts and I/O passes as big data transformation, migration, and reporting.
It depends on the version of Excel (not your O/S) that is running.
How can I tell whether I'm running a 32-bit or 64-bit version of Microsoft Office?
FieldShield and CoSort ship with multiple 128- and 256-bit encryption libraries using proven, compliant 3DES, AES, GPG, and OpenSSL algorithms. For each field, you can use the same or a different built-in encryption routine, or link to your own encryption library and specify it as a custom, field-level transformation function in a job script. You can also use the same algorithm(s) with a different encryption key for each field.
After data governance efforts have identified the files and fields at risk (usually via ETL or modeling tools like those from Exeros or GlobalIDS), you can declare the specific field-level protection functions in a FieldShield or SortCL script that makes sense for you. Both tools deliver field-protected views of sensitive data (like social security or phone numbers, salaries, medical codes) in ODBC-connected database tables and sequential files, through many techniques, including:
- field filtering (redaction)
- masking and obfuscation
- secure encryption and decryption
- anonymization and pseudonymization
- de-identification and re-identification
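The Python sketch below shows how several of these techniques differ. It implements redaction, format-preserving digit masking, and HMAC-based pseudonymization, with a separate key derived for each field. These are generic standard-library illustrations with hypothetical function names, not FieldShield's routines. HMAC tokens are one-way, so reversible protection would instead need a vetted encryption library (for example, AES).

```python
import hashlib
import hmac


def redact(value: str) -> str:
    """Field filtering (redaction): remove the value entirely."""
    return ""


def mask_keep_last(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Masking: hide all but the last `keep` digits, keeping separators in place."""
    out, digits_seen = [], 0
    for ch in reversed(value):
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen <= keep else mask_char)
        else:
            out.append(ch)
    return "".join(reversed(out))


def field_key(master_key: bytes, field_name: str) -> bytes:
    """Derive a distinct key per field from one master key (HMAC-SHA256)."""
    return hmac.new(master_key, field_name.encode("utf-8"), hashlib.sha256).digest()


def pseudonymize(value: str, key: bytes) -> str:
    """Pseudonymization: a repeatable, one-way token for `value` under `key`."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
```

For example, `mask_keep_last("123-45-6789")` keeps the layout but exposes only the last four digits. Meanwhile, `pseudonymize()` with `field_key(master, "ssn")` gives the same token every time for the same SSN.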
All FieldShield and CoSort/SortCL job scripts and field-level functions can be recorded in XML audit logs that you can secure, and query with your preferred XML reporting tool. You can also use SortCL scripts (n.b. samples are provided, where /INFILE=$path/auditlog.xml /PROCESS=XML, etc.) against these audit logs for reporting.
- Expression (calculation/function) logic
- Substring and byte shifting
- Data type conversion
- Custom function
The decision criteria for which protection function to use for each datum are:
- Security - how strong (uncrackable) must the function be
- Reversibility - whether that which was concealed must later be revealed
- Performance - how much computational overhead is associated with the algorithm
- Appearance - whether the data must retain its original format after being protected
IRI is happy to help you assess which functions best apply to your data.
Note also that you can protect one or more fields with the same or different functions, or protect one or more records entirely ("wholerec"). In each case, the condition criteria and targets/layout parameters can also be customized, and combined with data transformation and reporting in the same job. And, in fit-for-purpose multi-table wizards, or through global data classification, DBAs and data stewards can apply these protections as rules to preserve consistency and referential integrity database-wide or enterprise-wide.
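Consistency matters because a protected key must still match across tables. In the Python sketch below, one deterministic rule is applied to `customer_id` in two hypothetical tables. The rule, an HMAC-based token, was chosen here only for illustration. Because of it, the masked tables still join exactly as the originals did. The table layouts and function names are assumptions, not IRI's data class or rule mechanism.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic token: the same input and key always give the same output."""
    return "T" + hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def mask_tables(customers, orders, key):
    """Apply one rule to customer_id in both tables so the join key still matches."""
    masked_customers = [dict(c, customer_id=tokenize(c["customer_id"], key)) for c in customers]
    masked_orders = [dict(o, customer_id=tokenize(o["customer_id"], key)) for o in orders]
    return masked_customers, masked_orders


def join(customers, orders):
    """Inner join orders to customers on customer_id; return (name, order_id) pairs."""
    by_id = {c["customer_id"]: c for c in customers}
    return [(by_id[o["customer_id"]]["name"], o["order_id"])
            for o in orders if o["customer_id"] in by_id]
```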
Apart from the many data warehouse architects who use CoSort's SortCL tool and find its flat-file approach faster than SQL procedures and ETL tool steps, many experts also acknowledge the efficiency of flat-files in high volume data staging:
"The first system for which the data warehouse is responsible is the data staging area, where production data from many sources is brought in, cleansed, conformed, combined, and ultimately delivered to the data warehouse presentation systems ... The two dominant data structures in the data staging area are the flat file and the entity/relationship schema, which are directly extracted or derived from the production systems."
from "The Foundations for Modern Data Warehousing" by Ralph Kimball, appearing in Intelligent Enterprise Magazine's Data Warehouse Designer Section.
By using flat files, the SortCL program in the IRI CoSort product, which is also the default engine of the IRI Voracity data management (and integration) platform, bypasses the ETL tool overhead of DB connectivity and transformations. SortCL exploits O/S-level file I/O, parallel (multi-threaded) data manipulation across multiple CPUs and cores, and many proprietary speed techniques. Flat files also allow the tool to rapidly combine data transformation, reporting, protection, and prototyping functions in the same job and I/O pass.
With SortCL you can simultaneously filter, cleanse, standardize, transform, protect, and report on your massive collections of data. Compare this consolidation potential with how you get work done now.
Put another way, with SortCL in CoSort or Voracity you get one product, place, pass, and price, versus the complexity, cost, and time of your status quo (multiple products, places, passes, and prices).
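As an analogy for the single-pass idea (this is Python, not SortCL), the sketch below reads a small hypothetical CSV once and, for each record, filters, standardizes, masks, and accumulates summary totals, so the detail and summary outputs both come from the same pass:

```python
import csv
import io
from collections import defaultdict

# Hypothetical input; the third record is invalid and should be filtered out.
RAW = """region,name,phone,sales
east, alice ,555-0101,100
west,BOB,555-0102,250
east,carol,555-0103,-5
east,dave,555-0104,50
"""


def one_pass(text):
    """Filter, cleanse, protect, and aggregate in a single read of the input."""
    totals = defaultdict(int)
    detail = []
    for row in csv.DictReader(io.StringIO(text)):
        sales = int(row["sales"])
        if sales < 0:  # filter: drop invalid records
            continue
        name = row["name"].strip().title()  # cleanse/standardize
        phone = "XXX-" + row["phone"][-4:]  # protect: keep only the last 4 digits
        detail.append((row["region"], name, phone, sales))
        totals[row["region"]] += sales  # aggregate for the summary report
    return detail, dict(totals)


def format_summary(totals):
    """Fixed-width summary lines, one per region."""
    return [f"{region:<6}{amount:>8}" for region, amount in sorted(totals.items())]


detail, totals = one_pass(RAW)
for line in format_summary(totals):
    print(line)
```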
Through its Sort Control Language (SortCL) program, users of the IRI CoSort product or IRI Voracity platform can leverage the resources of their existing file systems to perform these kinds of jobs without the overhead and administrative constraints of databases and SQL procedures, not to mention the cost of megavendor ETL tools and ELT appliances, in-memory DBs, or complex Apache projects.
Through SortCL, you can perform and combine many of these activities simultaneously against multiple data sources of any size:
- Data Transformation > select, sort, merge, join, aggregate, re-map, pivot, cross-calc, etc.
- Data Cleansing > enrich, evaluate, filter, reformat, and validate data across disparate sources
- Data Governance > manage and mask master data, manage metadata, improve data quality
- Data Migration > remap file formats, data types, endian states, record formats
- Data Replication > copy, shift, enrich, and re-purpose data from one or more formats/platforms into others
- Data Federation > virtualize ad hoc mash-up views and formatted reports, or feed BIRT displays directly via ODA
- Data Masking > de-ID, encrypt, hash, pseudonymize, randomize, redact, tokenize and otherwise obfuscate fields
- Data Presentation > get 2D BI via detail, delta, and summary reports in custom formats, even with embedded HTML
- Data Franchising > filter, pivot, transform, and segment data into CSV, XML and ODBC hand-offs for BI tools
- Data Staging > scrub and prepare bulk data for other ETL tools, databases, data and spreadmarts, and analytic platforms
- Data Prototyping > generate safe, intelligent, and referentially correct DB, file, and custom-report-formatted test data
So, to process, present, protect, and prototype big data, SortCL and flat files remain the fastest and most cost-effective approach to consolidating information lifecycle management (ILM) activities. And if you keep your data in HDFS, many of the core data transformation, masking, and test data generation functions designed in SortCL can run in MapReduce 2, Spark, Spark Stream, Storm, or Tez through Voracity's Hadoop gateway (called "VGrid") without re-coding anything.
Dumping the data to a flat file, or piping it into CoSort, without query qualifiers encumbering the unload can help. So, save the ORDER BY, GROUP BY, DISTINCT, and JOIN work for CoSort, which can handle them all at once, faster, in the file system, and give you formatted reports in the same I/O pass.
CoSort's SortCL supports all SQL aggregate functions, including sum, average, count, maximum and minimum. But SortCL is more efficient in that it can sort on multiple keys and produce aggregate results for one or more output files in the same pass through the data, off-line.
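The pattern described here, a multi-key sort followed by aggregation while the sorted data streams past, looks like this in a minimal Python sketch (the sample rows are made up, and SortCL does this off-line on files with its own syntax). One pass over the sorted data feeds two outputs: the sorted detail and a per-group summary.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (region, month, amount) records.
rows = [
    ("west", "2024-02", 30.0),
    ("east", "2024-01", 10.0),
    ("east", "2024-01", 20.0),
    ("east", "2024-02", 5.0),
    ("west", "2024-02", 50.0),
]


def sort_and_aggregate(records):
    """Sort on two keys, then emit detail and group summaries in one pass."""
    ordered = sorted(records, key=itemgetter(0, 1))  # multi-key sort
    detail, summary = [], []
    for (region, month), group in groupby(ordered, key=itemgetter(0, 1)):
        amounts = []
        for rec in group:
            detail.append(rec)  # output 1: sorted detail
            amounts.append(rec[2])
        summary.append({  # output 2: SQL-style aggregates per group
            "region": region,
            "month": month,
            "count": len(amounts),
            "sum": sum(amounts),
            "avg": sum(amounts) / len(amounts),
            "min": min(amounts),
            "max": max(amounts),
        })
    return detail, summary


detail, summary = sort_and_aggregate(rows)
```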
On an ia64 hp server rx5670 with four 1GHz Itanium 2 CPUs and 32GB of RAM, Oracle 9i's SQL*Plus joined two 1G tables in 48 minutes. Unloading the same tables with the IRI Fast Extract (FACT) tool, piping them to flat-stream sorts and joins in the CoSort SortCL program, and then piping the result into SQL*Loader built the same joined table in 18 minutes (about 3/8, or 37.5%, of the time taken by the on-line method).
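The piped sorts and joins in that benchmark rely on the inputs being ordered on the join key. A sort-merge join, sketched generically in Python below (it illustrates the technique, not CoSort's internal implementation), walks two key-sorted inputs in step and needs no index:

```python
def merge_join(left, right):
    """Inner join two lists of tuples, each already sorted on element 0 (the key)."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair every combination.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append(lrow + rrow[1:])
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Each input is read once in key order, which is why pre-sorted flat streams join efficiently outside the database.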
Direct-path, pre-sorted loads are the fastest way to build new tables, since this loading method bypasses the overhead of Oracle's index sort. For bulk loads, CoSort the data first on the primary index key so that the subsequent index creation can bypass its sort step. For regular insertion loads, CoSort the data using the clustered index as the key.
Through database-specific APIs and parallel unloading techniques that produce portable flat files. For interface details, and to request a brochure, white paper, webinar or trial, go to Products > FACT
CoSort (or Voracity) SortCL job scripts can do many of the same jobs much faster, and with far less coding. SortCL runs outside the database on flat-file inputs, using the same relational logic and functions, i.e. SELECT WHERE, DISTINCT, ENCRYPT, ORDER BY, GROUP BY, and JOIN. For example, CoSort's SortCL uses conditional /INCLUDE and /OMIT statements to select from sequential input sources and output targets. DISTINCT is similar to /NODUPLICATES, ORDER BY is a /KEY, and GROUP BY may be a /SUM, /AVERAGE, etc.
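For readers mapping SQL habits onto SortCL, the correspondence described here can be expressed as plain Python functions over flat records. The helper names echo the SortCL statements but are hypothetical; this is an analogy, not SortCL behavior or syntax:

```python
records = [
    {"dept": "ops", "name": "Lee", "salary": 52000},
    {"dept": "dev", "name": "Kim", "salary": 71000},
    {"dept": "ops", "name": "Lee", "salary": 52000},  # exact duplicate
    {"dept": "dev", "name": "Ray", "salary": 48000},
]


def include(rows, predicate):
    """Keep matching records, like SQL WHERE (compare /INCLUDE)."""
    return [r for r in rows if predicate(r)]


def omit(rows, predicate):
    """Drop matching records, the inverse filter (compare /OMIT)."""
    return [r for r in rows if not predicate(r)]


def no_duplicates(rows):
    """Drop exact duplicate records, like DISTINCT (compare /NODUPLICATES)."""
    seen, out = set(), []
    for r in rows:
        signature = tuple(sorted(r.items()))
        if signature not in seen:
            seen.add(signature)
            out.append(r)
    return out


def order_by(rows, *keys):
    """Sort on one or more fields, like ORDER BY (compare /KEY)."""
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys))


result = order_by(no_duplicates(include(records, lambda r: r["salary"] > 50000)), "dept", "name")
```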
We have hundreds of tables and thousands of columns defined in tables, though these are also represented as external files. How can we leverage SortCL without having to re-define these layouts by hand?
Like FACT, both the AnalytiX DS Mapping Manager and the Meta Integration Model Bridge (MIMB) from Meta Integration Technology, Inc. (MITI) can automatically convert the file layout metadata used in your relational or ETL tool into CoSort/SortCL data definition file (DDF) format.
- IRI Workbench, the graphical IDE built on Eclipse, is currently free on Windows and Linux platforms.
- IRI FACT (VLDB unload) and IRI RowGen (test data generation) are priced per hostname according to the number of CPU cores (threads) licensed for use on each.
- IRI FieldShield (data masking) is seat based. IRI NextForm has multiple editions and prices.
- IRI Voracity is typically a subscription (one or five years) based only on the total number of SortCL-executing hostname licenses, which are usually database or ETL servers. There are additional charges for Hadoop and other integrated (premium feature) options available through our partners.
Refer to the licensing information on each product's description page for price ranges. Contact your IRI representative for more information and an NDA-confidential quotation for the use of any IRI software product, or for an IRI Professional Services engagement estimate.
Per-copy prices, which include the SortCL tool, typically range from the low 4 to 5 figures (USD) and are based on hardware (e.g. RAM, model number) and CPU/core configurations. Contact your IRI representative for a perpetual-use estimate in your environment.
There are discounts for successive licenses ordered at the same time, limited or expiring usage, runtime integration, and GSA schedule procurements. Your company may also have an umbrella (site), or distribution (royalty) agreement in place with pre-negotiated discounts.
Specific and final price quotes for your organization can be provided under a signed confidentiality agreement.
Support may be included in your license fee in the first year, and then become an annual option priced at a percentage of the license fee that reflects the desired level of support and upgrades.
For example, annual support at U.S. sites can be renewed for as little as 15% of the license fee basis for the hardware. This level entitles you to minor updates and license transfer credits, and a 90% discount on major new releases. At the 20% level, major new releases are included, as is first (local) and second (IRI) level support. All levels include minor or intra-version updates, and credits for license transfers and lifts.
If support at either level lapses, a major new release in later years can be licensed to existing users at 50% of its then-current license fee, which includes another year of support. 24/7 support is an additional premium that only some sites request or require.
The software does not expire. IRI's default business model allows you to keep running forever, with or without support. So, you do not have to pay us all over again at the end of some multi-year term, unless you have specifically requested and are paying us to lease instead.
The CoSort package comes with all the tools, conversion utilities, and callable libraries, as well as full .pdf documentation for the executables and APIs.
When a CoSort package is installed, default system tuning parameters are automatically configured to exploit available RAM and CPU resources. Users can manually adjust their tuning parameters (including the assignment of sort overflow disks, and audit logs) in a simple text file.
Usually through system-specific license keys. Call us to describe your situation.
That depends on which component(s) you choose to apply, and the extent of functionality you need. For a basic sort, it may take as long to choose an interface as to implement one. Implementing custom user input, compare, or output routines, or writing custom field-level transforms (e.g. adding your own cleansing or statistical functions for individual fields), will take longer.
From USD$150 to $15K per copy, or site, when embedded and redistributed within your application. It depends on what piece you integrate, how many and how quickly you distribute them, plus the (average) size of end-user hardware and cost of your application. We aim for fair licensing within your business model.
Actually, it can be any (combination) of the above.
You can choose to invisibly embed, re-license, or refer a full or partial CoSort package. Partial options include:
• Standalone interfaces: sorti, sortcl
• Third-Party Sort PlugIns: e.g., Unix sort, SAS, etc.
• Runtime libraries: cosort_r(), sortcl_routine()
That depends on your current methodology, hardware, kernel and CoSort tuning, data volume, and job specs. You can measure it during a free, confidential trial.
If so, less than you would for production. Call IRI to discuss your situation.
IRI Voracity's new Data Unification wizard follows the consolidation style of MDM, and allows you to compare, reconcile, and bucket new master values using fuzzy matching logic.
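Fuzzy matching of candidate master values can be sketched with Python's standard `difflib.SequenceMatcher`. This is a generic greedy bucketing approach under assumed normalization rules and an assumed 0.7 similarity threshold, not the algorithm Voracity's wizard uses:

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Assumed cleanup rules: lowercase, drop periods and commas, collapse spaces."""
    return " ".join(value.lower().replace(".", "").replace(",", "").split())


def bucket(values, threshold=0.7):
    """Greedily group values whose normalized forms are similar enough."""
    buckets = []  # list of (representative, members)
    for value in values:
        norm = normalize(value)
        for representative, members in buckets:
            similarity = SequenceMatcher(None, norm, normalize(representative)).ratio()
            if similarity >= threshold:
                members.append(value)
                break
        else:  # no existing bucket matched, so start a new one
            buckets.append((value, [value]))
    return buckets


candidates = ["Acme Corp.", "ACME Corp", "Acme Corporation", "Globex Inc", "Globex, Inc."]
for representative, members in bucket(candidates):
    print(representative, "->", members)
```

Greedy bucketing compares each value only against the first member of each bucket, so input order matters, and the threshold usually needs tuning against a sample of real data.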
Alternatively, you can manually move the master data to a central place and metadata repository where it is easier to clean, manage risk, and support a SOA chain. Define the files and fields in a format the CoSort SortCL tool can read, and scan for naming and attribute discrepancies. Create SortCL jobs to integrate and standardize these files, and leverage custom, field-level transformation functions to cleanse the data according to your own business rules. Once the data is prepared in the file system via rapid CoSort SortCL operations, you can re-populate master data tables as needed. SortCL join operations can also filter and report on changed master data held in files.
As we move to a centralized MDM model, all our critical metadata may sit in one vulnerable basket. Our trusted internal systems and processes touch this data. But we also need to make it available to outsourcers and for testing. What can we do to balance need-to-know and access/user requirements?
CoSort's SortCL tool can apply the necessary security filters to help you enforce your need-to-know rules, and to encrypt and anonymize specific file data at risk before it is exposed. If you need to test or outsource this data, or provide views of it to different departments, you must mask your master data at the field level so that less-sensitive data remains accessible.
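A minimal Python sketch of need-to-know enforcement follows, assuming a hypothetical per-role policy table (the role names, field names, and `mask`/`drop` actions are illustrative, not FieldShield rule syntax). Each role receives its own view of the same master record:

```python
# Hypothetical policy: per role, which fields are masked or dropped.
POLICY = {
    "qa": {"ssn": "mask", "salary": "drop"},
    "outsource": {"ssn": "drop", "name": "mask", "salary": "drop"},
    "hr": {},  # trusted internal role sees everything
}


def mask(value) -> str:
    """Hide all but the last four characters."""
    s = str(value)
    return "*" * max(len(s) - 4, 0) + s[-4:]


def view_for(role: str, record: dict) -> dict:
    """Build a role-specific view; an unknown role raises KeyError (deny by default)."""
    rules = POLICY[role]
    out = {}
    for field, value in record.items():
        action = rules.get(field, "keep")
        if action == "drop":
            continue
        out[field] = mask(value) if action == "mask" else value
    return out


master = {"name": "Ann Lee", "ssn": "123456789", "salary": 90000, "dept": "ops"}
qa_view = view_for("qa", master)
```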
All CoSort (SortCL) application scripts (whether for data transformation, reporting, protection or prototyping) are parsed and saved with runtime information in audit logs that you can secure and query with your preferred XML application or SortCL itself.
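Because the audit logs are XML, any XML-aware tool can query them. This Python sketch uses the standard `xml.etree.ElementTree` module against a hypothetical log layout (the `<job>` and `<field>` elements and their attributes are invented for illustration; check the real schema in your CoSort documentation):

```python
import xml.etree.ElementTree as ET

# Hypothetical audit-log layout for illustration only; the real schema differs.
SAMPLE = """<auditlog>
  <job user="etl1" script="mask_customers.scl" date="2024-03-01">
    <field name="ssn" function="encrypt"/>
    <field name="phone" function="mask"/>
  </job>
  <job user="dba2" script="sort_orders.scl" date="2024-03-02"/>
</auditlog>"""


def protected_fields(xml_text):
    """List (user, script, field, function) for every protected field in the log."""
    root = ET.fromstring(xml_text)
    hits = []
    for job in root.iter("job"):
        for field in job.findall("field"):
            hits.append((job.get("user"), job.get("script"), field.get("name"), field.get("function")))
    return hits


for hit in protected_fields(SAMPLE):
    print(hit)
```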
The SortCL program in IRI CoSort can flatten and process some hierarchical data, including mainframe index files, to prepare for DB loads and business intelligence tools. SortCL can also join product data with customer transactions (as well as other files with common keys) and output custom reports. These reports can include aggregation and cross-calculation of the detail and summary data produced. SortCL can also perform field-level lookups to find discrete values, and populate newly formatted files.
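The flatten, lookup, and report steps described above can be sketched in Python as follows. The nested customer structure, the price lookup table, and the report layout are all hypothetical sample data:

```python
# Hypothetical hierarchical input: each customer record nests its orders.
customers = [
    {"id": "C1", "name": "Ann", "orders": [{"sku": "P10", "qty": 2}, {"sku": "P20", "qty": 1}]},
    {"id": "C2", "name": "Bo", "orders": [{"sku": "P10", "qty": 5}]},
]

# Hypothetical product lookup table keyed on SKU.
PRICES = {"P10": 4.50, "P20": 12.00}


def flatten(records):
    """Emit one flat row per (customer, order) pair."""
    for cust in records:
        for order in cust["orders"]:
            yield {"id": cust["id"], "name": cust["name"], **order}


def report(rows):
    """Look up each SKU's price and build a fixed-width detail report with a total."""
    lines, grand_total = [], 0.0
    for row in rows:
        price = PRICES.get(row["sku"])
        if price is None:  # unmatched products are skipped in this sketch
            continue
        amount = row["qty"] * price
        grand_total += amount
        lines.append(f"{row['id']:<4}{row['name']:<6}{row['sku']:<5}{row['qty']:>4}{amount:>9.2f}")
    lines.append(f"{'TOTAL':<19}{grand_total:>9.2f}")
    return lines


for line in report(flatten(customers)):
    print(line)
```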
RowGen v3 pricing starts well below $10K for perpetual use, and increases proportionally with additional CPU cores that you may wish to exploit. Maintenance is free in the first year and then 15-20% of your license fee, depending on the level of upgrades.
It reads popular data model and file layout metadata to format the rows of test data into precise flat-file, report, and database table structures for DB populations, and it preserves the referential integrity of those tables. RowGen is a complete solution for test data because it lets you do all of the following at once:
- Generating Huge Volumes in Parallel
- Applying Custom Rules and Ranges
- Using DB Data Models and Metadata
- Preserving Realism and Relationships
- Transforming, Segmenting, and Reporting
- Formatting in Custom File Layouts
- Auditing Jobs to Verify Compliance
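The key idea behind referentially correct test data is that child rows draw their foreign keys only from generated parent keys. This Python sketch shows that idea with the standard `random` module; the tables, columns, and value ranges are hypothetical, and RowGen itself works from your actual data model metadata:

```python
import random


def generate(n_customers, n_orders, seed=42):
    """Generate parent and child tables whose foreign keys always resolve."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    first_names = ["Ann", "Bo", "Cy", "Di"]
    customers = [
        {
            "cust_id": i,
            "name": rng.choice(first_names),
            "credit_limit": rng.randrange(1000, 10001, 500),  # rule: 1,000-10,000 in steps of 500
        }
        for i in range(1, n_customers + 1)
    ]
    parent_keys = [c["cust_id"] for c in customers]
    orders = [
        {
            "order_id": 1000 + k,
            "cust_id": rng.choice(parent_keys),  # foreign key drawn from existing parents only
            "amount": round(rng.uniform(5, 500), 2),
        }
        for k in range(n_orders)
    ]
    return customers, orders


customers, orders = generate(50, 500)
```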
Yes, as long as your XML files describe structured, sequential records. The output can be another flat XML file, or any other file type that SortCL supports.
Yes, if the XML file contains structured, sequential records, and the output process type is documented in SortCL.
Yes, if those other process types are documented in SortCL. Any valid output targets can also be custom-formatted and protected for reporting, hand-offs, and outsourcing.
Yes, subject to the format limitations defined above.
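The "structured, sequential records" condition means each record element carries the same flat set of child fields. This Python sketch, using an assumed `<order>` record layout, converts such XML to CSV and rejects records whose layout differs:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flat, sequential XML records.
SAMPLE = """<orders>
  <order><id>1</id><sku>P10</sku><qty>2</qty></order>
  <order><id>2</id><sku>P20</sku><qty>1</qty></order>
</orders>"""


def xml_records_to_csv(xml_text, record_tag):
    """Write each record element as one CSV row, using the first record's layout."""
    root = ET.fromstring(xml_text)
    records = root.findall(record_tag)
    if not records:
        return ""
    columns = [child.tag for child in records[0]]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        # Every record must share the same flat child layout.
        if [child.tag for child in rec] != columns:
            raise ValueError("record layout differs; data is not flat and sequential")
        writer.writerow([child.text or "" for child in rec])
    return buf.getvalue()


print(xml_records_to_csv(SAMPLE, "order"))
```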
No, unless you run out of disk space.
At the field-level, you cannot link custom transformations that require file-level information, such as statistics. However, SortCL already supports file and record-level aggregation, plus cross-calculation, for detail and summary reporting.
SortCL sends the field data to the routine, which then sends it back in the transformed state, formatted per the other attributes in the output field description.
Write or license a library and specify it at the top of a SortCL job script (which can also be invoked in a thread-safe API call). In the /INREC or /OUTFILE phase of the script, define the library as a field attribute.
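Conceptually, a custom field-level function receives one field value and returns the transformed value. This Python sketch models that contract with a hypothetical registry and output layout; it is not the SortCL library interface, which you link as described above:

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry standing in for a linked library of field functions.
FIELD_FUNCTIONS: Dict[str, Callable[[str], str]] = {}


def field_function(name: str):
    """Register a function under a name that an output layout can refer to."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        FIELD_FUNCTIONS[name] = fn
        return fn
    return register


@field_function("title_case")
def title_case(value: str) -> str:
    return value.strip().title()


@field_function("digits_only")
def digits_only(value: str) -> str:
    return "".join(ch for ch in value if ch.isdigit())


def apply_layout(record: Dict[str, str], layout: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Build an output record; each function sees only one field value, never file-level data."""
    out = {}
    for out_name, (source, fn_name) in layout.items():
        value = record[source]
        out[out_name] = FIELD_FUNCTIONS[fn_name](value) if fn_name else value
    return out


row = apply_layout(
    {"n": " jane doe ", "p": "(555) 010-0199"},
    {"NAME": ("n", "title_case"), "PHONE": ("p", "digits_only"), "RAW": ("p", None)},
)
```

Because each call sees a single value, statistics that need the whole file stay with the job's own aggregation features, matching the limitation noted above.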
Yes, through a resource control file that is configured at set up time by the system administrator. This text file can be modified at any time for reference at a global, user or job level.
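Layered settings of this kind resolve by precedence: job overrides user, which overrides global. A Python sketch using `collections.ChainMap` shows the lookup order; the `name value` file format and parameter names are hypothetical, not CoSort's actual resource control file syntax:

```python
from collections import ChainMap


def parse_rc(text: str) -> dict:
    """Parse hypothetical 'name value' lines, ignoring blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        settings[name.upper()] = value.strip()
    return settings


global_rc = parse_rc("memory_limit 2G\nthreads 4\nwork_dir /tmp\n")
user_rc = parse_rc("threads 8  # this user may use more cores\n")
job_rc = parse_rc("work_dir /scratch/job42\n")

# Lookups search the job settings first, then user, then global.
effective = ChainMap(job_rc, user_rc, global_rc)
print(dict(effective))
```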
Not yet. Please tell us if you need this, and by when.
SAS documents the CoSort option in SAS versions 7 and 8, as reflected in the SAS and CoSort user manuals. For SAS 9, contact SAS, as they have to update the appendage for CoSort and have not yet done so. IRI has made repeated requests for SAS to update their CoSort connection. SAS is now requiring that you make your request known directly to them and IRI at the same time. Thank you for helping us help you and others who need this interface.
For CoSort's Sorti, SortCL, default (non-MIMB) SortCL Metadata converters, APIs, included Sort PlugIns, and other utilities, yes. The IRI Workbench (CoSort GUI) requires a reasonably current 32-bit JRE installation.