Yes. CoSort offers a range of solutions for generating business intelligence from huge volumes of data. You can use CoSort's SortCL (4GL) tool as a standalone report generator, or as a staging tool to digest and hand-off large volumes of data.
Not only does SortCL transform and protect huge volumes of disparate data from a variety of RDBMS sources, and sequential or index files, it can also join, aggregate, calculate, and display that data in custom detail and summary report formats, complete with special variables and tags for web pages.
By creating output in .CSV and .XML formats, CoSort's SortCL tool can directly populate spreadsheets like Excel, databases, ETL tools, and BI tools like BIRT (and ActuateOne), SAP Business Objects, IBM Cognos, Microstrategy, DI-Diver from Dimensional Insight, and iDashboards from iViz. ODA driver support in the IRI Workbench GUI (built on Eclipse) will provide direct data and metadata flows between data preparation and presentation.
For more information, please see:
Any and all BI Tools that can import CSV and flat XML files, or RDBMS tables, that CoSort populates. CoSort's job in these cases is to prepare or "franchise" big, structured data in a centralized place. CoSort users create Sort Control Language (SortCL) programs in 4GL text scripts or through wizards in the free IRI Workbench GUI, built on Eclipse.
CoSort's SortCL tool can directly populate spreadsheets like Excel, databases, ETL tools, and BI tools, like BIRT (and ActuateOne), SAP Business Objects, IBM Cognos, Microstrategy, DI-Diver from Dimensional Insight, and iDashboards from iViz. ODA driver support in the IRI Workbench GUI (built on Eclipse) will provide direct data and metadata flows between CoSort/SortCL data preparation and BIRT/ActuateOne presentation.
For more information, please see:
More than before, and getting closer to those you'd think of first. Traditionally CoSort has been an ETL engine only, specializing in high-performance (I/O-consolidated, multi-threaded) Transformations like:
In addition, CoSort -- and in particular, its Sort Control Language (SortCL) program -- can also handle:
- slowly changing dimensions
- fuzzy logic lookup tables
- pivoting (normalization and denormalization)
- running, ranking, and windowed aggregates
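To make the last item above concrete, here is a minimal Python sketch of what running, ranking, and windowed aggregates compute. It is not SortCL syntax; the function names and the sample figures are assumptions made up for illustration.

```python
from itertools import accumulate


def running_total(values):
    """Running aggregate: element i is the sum of values[0..i]."""
    return list(accumulate(values))


def dense_rank(values):
    """Ranking: highest value gets rank 1; equal values share a rank."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values), reverse=True))}
    return [order[v] for v in values]


def moving_average(values, window):
    """Windowed aggregate: mean of each value and up to window-1 predecessors."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


sales = [100, 250, 250, 75]  # hypothetical sample data
print(running_total(sales))
print(dense_rank(sales))
print(moving_average(sales, 2))
```

Each function walks the input once after any required ordering, which is why such aggregates fit naturally into a sort-based job.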
When combined with IRI's metadata-compatible Fast Extract (FACT) and bulk database Load utilities in the free IRI Workbench GUI built on Eclipse, CoSort can provide:
- big data integration and staging
- data and database migration
- change data capture
- business intelligence (see BI FAQ and blog.iri.com)
- dynamic and static data masking and encryption
- metadata and master data management
- project sharing/version control
- robust test data generation
- classic (offline) DB reorgs
CoSort's SortCL program uses simple text scripts to define and interchange many different source and target layouts and manipulations in many different formats. CoSort in the IRI Workbench GUI also offers direct GUI-driven DB connectivity and visualized task and column mappings indigenous to major ETL suites. IRI is working on built-in job scheduling, visual workflow, data lineage and metadata impact analysis. CoSort also has special plug-ins for, and/or works hand-in-hand with, major ETL tools, including Ab Initio, DataStage, Informatica, Kalido and TeraStream to speed up high volume transformations and loads. IRI continues to grow CoSort from its roots as the leading open systems sort package into a visionary platform for data integration, data masking, test data, and data migration.
Perhaps the best answer starts with what CoSort's not got:
With CoSort, there is no need for:
|separate transforms or transform stages||combines filter, sort, join, aggregate, pivot, remap, custom and other transforms in the same job script and I/O pass.|
|partitioning, manual or otherwise||automatically multi-threads and uses other system resources, limited only by your resource controls, and does not push transformations into the database layer, where they are inherently less efficient.|
|cumbersome data definition||provides automatic metadata discovery and format conversion tools, and is supported by MIMB|
|separate BI tools||can produce custom-formatted details and summary reports in the same job script and I/O pass with all the transforms, and/or hand off data to files, tables, or ODA buckets in Eclipse for direct BIRT use!|
|separate data masking tools||includes every single function in FieldShield, the most robust data masking and encryption tool available.|
|separate test data producers||includes all the functions of RowGen, which can generate safe (no need for production data), intelligent (realistic and referentially correct) test data for DB, file, and report targets|
|long-term consulting||uses an already familiar Eclipse GUI and metadata defining both data and ETL processes|
|complex MDM hubs||uses an already familiar Eclipse GUI and metadata repositories, and supports: composite data type definitions, master data value lookups, joins, tables and set files suitable for production or test data|
|a new team sharing or version control paradigm||metadata repositories and job scripts work with any source code and metadata version control system, including GIT, CVS or SVN in Eclipse|
|concerns about open source or support||is backed by IRI, a stable 35-year-old company with more than 40 international offices|
|a huge budget now, or a lease renewal headache later||is sold at affordable prices for perpetual use|
Some of the things that CoSort offers that legacy and open source ETL (much less ELT) tools do not are:
- Raw power and scalability without the need for Hadoop; i.e. built-in performance in volume
- A negligible learning curve: simple, explicit, accessible, and open text metadata you can easily use, modify and share
- The ability to deploy jobs outside the GUI, running them via command line, batch, or any program via system or API call
- An open source GUI you already know (Eclipse) that front-ends proven, robust manipulations and reports on big structured data
- Additional aggregation functionality like lead/lag, ranking and running, multiplication and expressions
- Multiple nested layers for both conditions and derived fields with support for PCRE's, C (math/trig) functions, locale and 'conversion specifiers'
- Composite data value definition for both production data (format masking) migration and test data generation
- Built-in: reporting, data and DB migration, 10 field protection categories, safe test data generation, JCL select/sort/sum parm translation
- Data-centric change data capture, slowly changing dimension, fuzzy logic lookup, and trend reporting
- Seamless metadata integration with Fast Extract (FACT) for major RDBs, and all MIMB-embedding platforms
- Superior price-performance, fast ROI, and immediate access to US-developer support
Yes, both. As far back as 1999, industry experts have been touting CoSort as an ETL engine for its high-performance data staging and integration capabilities. CoSort - and its SortCL program in particular - performs the heavy lifting of selection, transformation, reporting, and pre-load sorting against sequential files in an ODS, DW staging area, or on extracted tables in suspense.
CoSort's SortCL is a push-down optimization option for Informatica PowerCenter, and in the sequential file stage of IBM DataStage, to perform faster, combined (single-pass) sort, join, and aggregation operations. An IRI press release describes CoSort's 6x improvement of Informatica speed.
Besides proven integrations with, and plug 'n play sort replacements for, DataStage and Informatica, CoSort also links to Kalido, ETI, SAS and TeraStream ETL packages. CoSort's SortCL programs can be called as an executable from any tool allowing that as well, which would also mean Ab Initio, Pentaho, JasperETL, Pervasive DataRush, Hummingbird, and others, to further consolidate and optimize data transformation performance via the file system.
Yes. Either via CoSort's Server Edition sort stage "plug-in" or using CoSort's SortCL through the DataStage sequential file stage for external transformations (i.e. large sort, join, aggregate, reformatting, protection, and cleansing functions) in the file system.
For more information, please see:
CoSort offers a range of solutions for generating business intelligence from huge volumes of data. You can use CoSort's SortCL (4GL) tool as a standalone report generator, or as a staging tool to digest and hand-off large volumes of data.
Not only does SortCL transform and protect huge volumes of disparate data from a variety of sequential or index file sources, it can also join, aggregate, calculate, and display it in custom detail and summary report formats, complete with special variables and tags for web pages.
By creating output in .CSV and .XML formats, CoSort's SortCL tool can directly populate spreadsheets, databases, ETL tools, and other BI tools.
For more information, please see:
CoSort's SortCL tool features built-in filtering and selection logic to reduce bulk, segment, and scrub data during or after processing.
For more advanced data cleansing and quality operations, SortCL allows you to plug in your own function libraries to perform custom transformations at the field level, before or after sort, join, merge, or report processing. SortCL ships with a sample template: a Melissa Data address standardization object that cleanses the address field as records are output.
For more information, see:
CoSort SortCL operations can run from the command line and thus be scheduled into batch streams with cron, Appworx, Tidal and similar applications. A PoC was done with Oracle DBMS_Scheduler and Appworx. For an elaborate, dependent scheduler that works with other ETL software, consider Full 360's metaController, which can also offer specific hooks to SortCL job scripting parameters.
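As a rough illustration of scheduling a job from the command line, the Python sketch below assembles a command and a crontab entry for it. The `/SPECIFICATION=` argument form, the script path, the schedule, and the log path are all assumptions for this example; check them against your own installation and documentation before use.

```python
import shlex


def build_sortcl_command(script_path, executable="sortcl"):
    """Return the argument list for running one job script.

    Assumption: the job script is passed as /SPECIFICATION=<path>.
    """
    return [executable, f"/SPECIFICATION={script_path}"]


def crontab_line(schedule, command, log_path):
    """Format a crontab entry that appends stdout and stderr to a log file."""
    quoted = " ".join(shlex.quote(part) for part in command)
    return f"{schedule} {quoted} >> {shlex.quote(log_path)} 2>&1"


# Hypothetical paths: run a nightly job at 02:00.
cmd = build_sortcl_command("/opt/jobs/nightly.scl")
print(crontab_line("0 2 * * *", cmd, "/var/log/nightly.log"))
```

The same argument list could be handed to `subprocess.run` by any program that needs to launch the job directly instead of through a scheduler.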
The company is IRI, Inc. IRI stands for Innovative Routines International. The company was founded in New York in 1978 as Information Resources, Inc. and changed its name upon relocating to Florida in 1995, where the name Information Resources was already in use.
CoSort is IRI's first product and IRI is well known for it. See the products and company sections of this web site for more information on both.
Until recently, IRI's primary focus and product line has served back-end systems that not many talk about. Doing the heavy lifting for other people's products and operations (including Accenture, CGI, Cincom, Dell, Epsilon, Sabre, Sungard, and Unicon) has made IRI more of a silent partner than the total solution provider that the company's software stack supports.
In actuality, IRI is also a prominent enterprise software vendor and well known to many large companies worldwide (like American Airlines, Bank of America, Comcast, Disney, EDS, and Fidelity Investments), and the analysts that serve them. Both the customers and analysts following IRI note that the company is not venture-backed and invests far more in R&D than it does in costly marketing, providing far more value to its customers than its own fame.
Gartner is following IRI software in their data integration, data masking, legacy migration, test data, and business intelligence research. IRI is also gaining recognition in the areas of big data (without Hadoop), data quality, metadata and master data management. IRI will soon add database security (DAP) and Excel add-in software, too, exposing the company to larger audiences.
Yes, and yes.
IRI's support staff and international partners will show you - and point you to examples of - how to use the software to meet your requirements. Call or email us before, during, or after your evaluation. It is in IRI's best interests to collaborate with you in the development and maintenance of a successful solution.
We will. So will more than 40 international support offices. And there are many third-party consultancies familiar with IRI software who can help you implement, optimize, and support applications around the tools.
More importantly, you will find it surprisingly easy to support yourself. One of the most compelling aspects of IRI software is its simplicity. All the data and job-related metadata are shared, exposed, self-documenting, and easy to modify and extend. There is also a familiar Eclipse GUI automating script creation, integration, execution, and management.
They are lower. See the answer to the question above about why IRI is not better known in certain circles.
Being part of so many other applications has not required the marketing overhead of our competitors. IRI has also chosen not to be a public company, and has not been servicing external investors or debt. Our customers continue to benefit from these savings.
Call or email us for a quote based on the product(s) you want to license after successful evaluation. Price is based on what and where you run (see the pricing FAQ). IRI accepts purchase orders and credit cards.
Yes, it is available. There is normally an upcharge to standard annual support to provide 24/7 support worldwide.
Maintenance is free in the first year after licensing. For those users covered under maintenance, minor new releases are provided free upon request or if needed for support. Major new releases are also optional, but usually chargeable upgrades. The cost of the upgrades depends on your maintenance level.
Because you are buying a perpetual-use license, once. Support is optional and you are not forced to upgrade (even though you should eventually).
CoSort is a robust, commercial-grade software package for efficiently manipulating and managing high volumes of data. More specifically, it is a sorting, data transformation, migration and reporting package that addresses a very wide range of enterprise and development-related challenges in data integration, data masking, business intelligence, and ancillary disciplines. Please review the product description and solution sections of this web site for details.
For the purpose of this simple sort/merge question, CoSort stands for Co-routine Sort, first released for commercial use
CoSort also exploits parallel processing and advanced I/O techniques.
Very. Performance varies by source sizes and formats, data and job orientation, hardware configuration and resource allocation, concurrent activity and application tuning. The best benchmarks (e.g. 1GB in 12 seconds, 50GB in 2 minutes) run in memory on fast, multiple CPU Unix servers.
When you perceive a bottleneck, which could be starting at 500K to 50M rows depending on your hardware. CoSort sorts routinely in the terabyte range -- and scales linearly in volume without Hadoop. Input files in the dozens or hundreds of gigabytes are now common. Any number of input and output files -- and structured file formats -- are simultaneously supported, including line, record and variable sequential, blocked, CSV, I-SAM, LDIF, XML (flat), and Vision.
Usually through a text file, called cosortrc, which can be global, user, and/or job-specific. On Windows, default registry settings are also set up at installation time. You can specify a ceiling and floor on CPUs and memory, I/O buffers and threads, and allocate disk space for sort overflow. There are several other documented job controls also specified at setup, and easily modified (or secured) later.
CoSort users can display runtime information before, during, and after execution through:
• optional on-screen display levels
• self-appending and replacing log files
• application-specific statistical files
• and, a full audit trail for various compliance and forensic requirements
More than 120 now, and counting. These include single and multi-byte character sets, Unicode, C, COBOL, and mainframe numerics. Contact IRI to help obtain a definition if you are not sure what you have. Moreover, CoSort supports the (simultaneous) collation, conversion, and creation of more than two dozen file formats.
They differ, and are based on collaboration and feedback with partners and customers who own their jobs.
We have hundreds of tables and files, and thousands of fields already defined for processing in DataStage. How can we exploit CoSort on flat files and leverage existing field layouts (i.e. not re-define them manually)?
Use the Meta Integration Model Bridge (MIMB) from .dsx to CoSort SortCL DDF to automatically convert the metadata. For more information, visit http://www.metaintegration.net/Partners/IRI.html
To speed Sorter Tx operations seamlessly, use either CoSort's unique Advanced External Procedure (AEP, in PowerCenter v7) or Custom Transform (CT, in PowerCenter v8). For sorting AND other high volume transformations, run CoSort 'SortCL' program scripts (via command line, batch, GUI, API) alongside PowerCenter in the file system. This is appropriate for large file sort, join, and/or aggregate transforms (since SortCL can combine them in the same job script and I/O pass). You can then direct the CoSort output data back into the ETL stream, to a file, DB loader, etc.
We have hundreds of tables and files, and thousands of fields already defined for processing in Informatica. How can we exploit CoSort on flat files and leverage existing field layouts (i.e. not re-define them manually)?
Use the Meta Integration Model Bridge (MIMB) from .xml to CoSort SortCL DDF to automatically convert the metadata. For more information, visit http://www.metaintegration.net/Partners/IRI.html
You can enforce field-level security at the very same time that you're using the same tool (CoSort's SortCL) for high-volume transformation and reporting jobs that an ETL or BI tool cannot do as efficiently. You can thus use one tool to accomplish multiple goals in one pass. Alternatively, you can run SortCL on flat files just to protect certain fields, right alongside your other tools that will transform or present the data (before or after SortCL has protected it). This lets you:
- use your existing code
- protect only the fields needing security
- still keep both protected and unprotected data available to the routines and systems that need to access it.
Both! FieldShield and CoSort's SortCL program give you the ability to protect both kinds of data sources simultaneously with one or more field-level security functions. These IRI products can address either source type in bulk (static data masking) or surgically (dynamic data masking) through filter command or customized stored procedure calls.
Of course, some databases have built-in column encryption. But their approach may be cumbersome or limiting for a variety of reasons, such as:
- You need to protect multiple databases, other sources, or data in motion (like flat files), and a single platform or method will not address, or be compatible with, enterprise needs
- Built-in DB encryption libraries may also be too slow, costly, or complex to implement
- They are limited to a single encryption methodology that may or may not conform to security or appearance requirements
- You may also need to leave your data as-is while it's in the database, but protect it while it's moving into or out of the database. That's where flat files come in. Data is often in a flat file format as it goes in or out of your databases.
Other encryption products protect an entire file, database, disk, computer, or network to protect sensitive data moving through your systems. However, encrypting more than the fields that matter can take a long time, and cut off your access to the non-sensitive data that still needs to be accessed and processed. FieldShield and CoSort's SortCL can encrypt (or otherwise protect) only those fields/columns that need it, and can do it in the same job scripts and I/O passes with big data transformation, migration, and reporting.
FieldShield and CoSort ship with multiple 128 and 256-bit encryption libraries using proven, compliant 3DES, AES, GPG and OpenSSL algorithms. For each field, you can use the same or different built-in encryption routine, or link to your own encryption library and specify it as a custom, field-level transformation function in a job script. You can also use the same algorithm(s) and a different encryption key for each field as well.
After data governance efforts have identified the files and fields at risk (usually via ETL or modeling tools like those from Exeros or GlobalIDS), you can declare the specific field-level protection functions in a FieldShield or SortCL script that makes sense for you. Both tools deliver field-protected views of sensitive data (like social security or phone numbers, salaries, medical codes) in ODBC-connected database tables and sequential files, through many techniques, including:
- field filtering (redaction)
- masking and obfuscation
- secure encryption and decryption
- anonymization and pseudonymization
- de-identification and re-identification
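The idea of protecting only the fields that need it can be sketched in a few lines of Python. This is a conceptual illustration, not FieldShield or SortCL syntax: the rule names, the sample record, and the demo key are assumptions, and the keyed hash shown is a one-way pseudonym, so it does not model reversible encryption or re-identification.

```python
import hashlib
import hmac


def redact(value):
    """Field filtering: remove the value entirely."""
    return ""


def mask(value, visible=4, fill="*"):
    """Masking: hide all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def pseudonymize(value, key):
    """Repeatable keyed pseudonym (HMAC-SHA256, truncated); not reversible."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect(record, rules):
    """Apply a rule to each named field; every other field passes through."""
    return {name: rules[name](value) if name in rules else value
            for name, value in record.items()}


KEY = b"demo-key-not-for-production"  # hypothetical key for illustration
rules = {
    "ssn": mask,
    "salary": redact,
    "name": lambda v: pseudonymize(v, KEY),
}
record = {"name": "Pat Doe", "ssn": "123-45-6789", "salary": "85000", "dept": "QA"}
protected = protect(record, rules)
print(protected["ssn"], protected["dept"])
```

Because the rules are keyed by field name, the non-sensitive `dept` value stays readable for downstream processing while the three sensitive fields are each handled by a different technique.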
All FieldShield and CoSort/SortCL job scripts and field-level functions can be recorded in XML audit logs that you can secure, and query with your preferred XML reporting tool. You can also use SortCL scripts (n.b. samples are provided, where /INFILE=$path/auditlog.xml /PROCESS=XML, etc.) against these audit logs for reporting.
Use whichever technique satisfies your business requirements. In FieldShield (or CoSort's SortCL), you can apply any of these functions:
- Expression (calculation/function) logic
- Substring and byte shifting
- Data type conversion
- Custom function
You can choose to protect one or more fields with one or more techniques, or protect one or more records entirely, using condition criteria and layout parameters. Decision criteria for which function is chosen for each datum are:
- Security - how strong (uncrackable) must the function be
- Reversibility - whether that which was concealed must later be revealed
- Performance - how much computational overhead is associated with the algorithm
- Appearance - whether the data must retain its original format after being protected
IRI is happy to help you assess which functions best apply to your data.
"The first system for which the data warehouse is responsible is the data staging area, where production data from many sources is brought in, cleansed, conformed, combined, and ultimately delivered to the data warehouse presentation systems ... The two dominant data structures in the data staging area are the flat file and the entity/relationship schema, which are directly extracted or derived from the production systems."
from "The Foundations for Modern Data Warehousing" by Ralph Kimball, appearing in Intelligent Enterprise Magazine's Data Warehouse Designer Section
With SortCL - and optional functions that integrate with it - you can simultaneously filter, cleanse, standardize, transform, protect, and report on your massive collections of data. Compare this consolidation potential with how you get work done now:
CoSort: one product, place, pass, and price, vs. the complexity, cost, and time of your status quo: multiple products, places, passes, and prices.
• Data Processing
  - Integration and Transformation: Select, Filter, Sort, Join, Merge, Aggregate
  - Format Conversion: Data type translation, custom field mapping
  - Legacy Migration: Multiple, simultaneous file format migration
  - Metadata Translation: Legacy sort parm and file layout conversions
• Data Presentation
  - Custom Reporting: Segmented and formatted detail and summary BI
  - Hand-off to Advanced BI Tools: CoSort dashboard is one such option
• Data Protection
  - Safe Files for Compliance and Outsourcing: Field-level anonymization, encryption, etc.
• Data Prototyping
  - Test Data: Randomly generated fields in real file/report formats
  - Job Auditing: XML application logs for query and reporting
  - Data Validation: Verify data are in the form specified
So, to process, present, protect and prototype big data, CoSort and flat files can be a fast and easy way to get this work done - standalone, or all in one I/O pass.
Through database-specific APIs and parallel unloading techniques that produce portable flat files. For interface details, and to request a brochure, white paper, webinar or trial, go to Products > FACT
SortCL job scripts can do many of the same jobs much faster, and with far less coding. SortCL runs outside the database on flat file inputs, using the same relational logic and functions; i.e. SELECT WHERE, DISTINCT, ENCRYPT, ORDER BY, GROUP BY, JOIN. For example, CoSort's SortCL uses conditional /INCLUDE and /OMIT statements to select from sequential input sources and output targets. DISTINCT is similar to /NODUPLICATES, ORDER BY is a /KEY, GROUP BY may be a /SUM, /AVERAGE, etc.
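The relational operations named above can be pictured with a short Python sketch that selects, de-duplicates, orders, and groups flat-file records in one function. It only illustrates the logic; it is not SortCL syntax, the sample data and names are invented, and treating "duplicate" as an exact repeat of the whole record is an assumption that may differ from how SortCL defines it.

```python
import csv
import io
from itertools import groupby

# Hypothetical flat-file content for illustration.
DATA = """region,customer,amount
east,acme,100
west,bolt,40
east,acme,100
east,core,60
west,dyne,5
"""


def summarize(text, min_amount):
    """Filter, de-duplicate, sort, and total amounts per region."""
    rows = csv.DictReader(io.StringIO(text))
    # SELECT ... WHERE (conditional include): keep qualifying rows only.
    kept = [r for r in rows if int(r["amount"]) >= min_amount]
    # DISTINCT (no duplicates): drop exact repeats, keeping the first.
    seen, unique = set(), []
    for r in kept:
        signature = tuple(r.items())
        if signature not in seen:
            seen.add(signature)
            unique.append(r)
    # ORDER BY (sort key): order by region so groups are contiguous.
    unique.sort(key=lambda r: r["region"])
    # GROUP BY with a sum: total the amount within each region.
    return {region: sum(int(r["amount"]) for r in group)
            for region, group in groupby(unique, key=lambda r: r["region"])}


print(summarize(DATA, 10))
```

Sorting before grouping is what lets the aggregation run as a single sequential sweep over the ordered records.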
We have hundreds of tables and thousands of columns defined in tables, though these are also represented as external files. How can we leverage SortCL without having to re-define these layouts by hand?
Like FACT, the Meta Integration Model Bridge (MIMB) from Meta Integration Technology, Inc. (MITI) can automatically convert the file layout metadata used in your relational or ETL tool into CoSort/SortCL data definition file (DDF) format. For details, please visit the MITI web site.
- IRI Workbench, the graphical IDE built on Eclipse, is currently free on Windows and Linux platforms.
- IRI FACT (VLDB unload) and IRI RowGen (test data generation) are priced per hostname according to the number of CPU cores (threads) licensed for use on each.
- IRI FieldShield (data masking) is seat based. IRI NextForm has 5 different editions and prices.
Refer to the licensing information on their product description pages for price ranges. Contact your IRI representative for more information and an NDA-confidential quotation for the use of any IRI software product, or for an IRI Professional Services engagement estimate.
Per-copy prices, which include the SortCL tool, typically range from the low 4 to 5 figures (USD) and are based on hardware and CPU configurations. Contact your IRI representative for a perpetual-use estimate in your environment.
There are discounts for successive licenses ordered at the same time, limited or expiring usage, runtime integration, and GSA schedule procurements. Your company may also have an umbrella (site), or distribution (royalty) agreement in place with pre-negotiated discounts.
Specific and final price quotes for your organization can be provided under a signed confidentiality agreement.
It may be included in your license fee in the first year, and then become an annual option at a percentage of the license fee that reflects the desired level of support and upgrades.
The CoSort package comes with all the tools, conversion utilities, and callable libraries, as well as full .pdf documentation for the executables and APIs.
When a CoSort package is installed, default system tuning parameters are automatically configured to exploit available RAM and CPU resources. Users can manually adjust their tuning parameters (including the assignment of sort overflow disks, and audit logs) in a simple text file.
• Standalone interfaces: sorti, sortcl
• Third-Party Sort PlugIns: e.g., Unix sort, SAS, etc.
• Runtime libraries: cosort_r(), sortcl_routine()
If so, less than you would for production. Call IRI to discuss your situation.
Move the master data to a central place and metadata repository where it is easier to clean, manage risk, and support a SOA chain. Define the files and fields in the CoSort SortCL tool readable format and scan for naming and attribute discrepancies. Create SortCL jobs to integrate and standardize these files, and leverage custom, field-level transformation functions to cleanse the data according to your own business rules. Once prepared in the file system via rapid CoSort SortCL operations, you can then re-populate master data tables as needed. SortCL join operations can also filter and report on changed master data that are in files.
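Reporting on changed master data by joining two file versions on a common key can be sketched as follows. This Python example is a conceptual illustration only, not a SortCL job; the snapshot contents and the function name are hypothetical.

```python
def diff_snapshots(old, new):
    """Classify record keys as inserted, deleted, or updated.

    `old` and `new` map a master-data key to that record's field values.
    """
    inserted = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


# Hypothetical customer master snapshots: key -> (name, state).
old = {"C001": ("Acme", "NY"), "C002": ("Bolt", "FL"), "C003": ("Core", "TX")}
new = {"C001": ("Acme", "NY"), "C002": ("Bolt", "GA"), "C004": ("Dyne", "CA")}

changes = diff_snapshots(old, new)
print(changes)
```

Unchanged records such as `C001` drop out of the report, so only the deltas need to be re-populated into the master tables.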
As we move to a centralized MDM model, all our critical metadata may sit in one vulnerable basket. Our trusted internal systems and processes touch this data. But we also need to make it available to outsourcers and for testing. What can we do to balance need-to-know and access/use requirements?
All CoSort (SortCL) application scripts (whether for data transformation, reporting, protection or prototyping) are parsed and saved with runtime information in audit logs that you can secure and query with your preferred XML application or SortCL itself.
The SortCL program in IRI CoSort can flatten and process some hierarchical data, including mainframe index files, to prepare for DB loads and business intelligence tools. SortCL can also join product data with customer transactions (as well as other files with common keys) and output custom reports. These reports can include aggregation and cross-calculation of the detail and summary data produced. SortCL can also perform field-level lookups to find discrete values, and populate newly formatted files.
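A join of product data with customer transactions, producing both detail lines and per-customer summaries, might look like this in outline. The Python below is a hedged sketch of the concept rather than SortCL code; the product table, transactions, and the choice to skip unmatched records (an inner join) are assumptions for illustration.

```python
# Hypothetical lookup table: product id -> (name, unit price).
PRODUCTS = {"P1": ("Widget", 2.50), "P2": ("Gadget", 10.00)}

# Hypothetical transactions: (customer, product id, quantity).
TRANSACTIONS = [
    ("alice", "P1", 4),
    ("bob", "P2", 1),
    ("alice", "P2", 2),
    ("carol", "P9", 1),  # no matching product
]


def build_report(products, transactions):
    """Join transactions to products; return detail rows and customer totals."""
    detail, totals = [], {}
    for customer, product_id, quantity in transactions:
        if product_id not in products:
            continue  # inner join: unmatched transactions are left out
        name, price = products[product_id]
        line_total = quantity * price  # cross-calculation on the detail row
        detail.append((customer, name, quantity, line_total))
        totals[customer] = totals.get(customer, 0.0) + line_total
    return detail, totals


detail, totals = build_report(PRODUCTS, TRANSACTIONS)
for row in detail:
    print(row)
print(totals)
```

Building the detail rows and the summary totals in the same loop mirrors the idea of producing both report levels in one pass over the data.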
RowGen v3 pricing starts well below $10K for perpetual use, and increases proportionally with additional CPU cores that you may wish to exploit. Maintenance is free in the first year and then 15-20% of your license fee, depending on the level of upgrades.
* Generating Huge Volumes in Parallel
* Applying Custom Rules and Ranges
* Using DB Data Models and Metadata
* Preserving Realism and Relationships
* Transforming, Segmenting, and Reporting
* Formatting in Custom File Layouts
* Auditing Jobs to Verify Compliance
For CoSort's Sorti, SortCL, default (non-MIMB) SortCL Metadata converters, APIs, included Sort PlugIns, and other utilities, yes. The IRI Workbench (CoSort GUI) requires a reasonably current 32-bit JRE installation.