Selected Questions and Answers
Important note: The FAQs below are not a comprehensive resource; they address only a fraction of the capabilities available in IRI software and of the questions people ask.
Please visit the IRI solutions and products sections to learn more. Also, do not hesitate to contact us if you have any questions, or need details on specific features or options applicable to your use case(s).
What is the recommended hardware sizing configuration / system requirements for jobs requiring SortCL (including CoSort/Voracity) or its subsets/spinoffs?
IRI Workbench is the Eclipse front-end (graphical job design client), which runs on Windows, Linux, and macOS. The back-end SortCL program in the CoSort package and Voracity platform -- the data manipulation, mapping, munging, masking, and mining engine -- runs on all of the above, plus all flavors of Unix, including AIX, Solaris, HP-UX, and z/i/pSeries Linux. Native hosts and virtual machines should be treated the same for the purposes of this discussion.
Again, because SortCL is the back-end engine and IRI Workbench the front-end for multiple IRI products, this answer applies not only to CoSort (sort) and Voracity (ETL, etc.) jobs, but also to FieldShield, RowGen, NextForm, and DarkShield. As far as hardware is concerned, the requirements for Windows and Linux are similar.
The absolute minimum requirement for CoSort/SortCL CLI operations (only) is 40MB of RAM, but at least 512MB-2GB of system RAM available to the CoSort user is recommended. Sort jobs that fit entirely in memory are generally faster, and it is not uncommon for modern CoSort hosts to be configured with 64GB-2TB of RAM to sort without the I/O overhead of work files.
The minimum configuration for Workbench is 4GB of RAM and 10GB of free disk space, after the installation of any VMs, DBs, etc.; Workbench includes its own JRE. However, 6GB and up works best to accommodate multiple database connections and table parsing for metadata and job definitions. In fact, for schemas with hundreds of tables to enumerate, as much as 64GB of RAM could be appropriate for the Workbench machine(s) where DB-related jobs are built. Workbench and CoSort are tested and supported on Windows back to XP, and we also test with major Linux distributions using both Debian and Red Hat package management standards.
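Because Workbench is Eclipse-based, its Java heap ceiling can usually be raised the standard Eclipse way: edit the JVM arguments in the .ini file that sits next to the Workbench launcher. The exact file name and appropriate values depend on your installation; this fragment is only an illustrative sketch, not an IRI-documented setting:

    -vmargs
    -Xms1g
    -Xmx6g

Restart Workbench after the change; the -Xmx value caps how much RAM the Eclipse JVM may use for tasks like enumerating large schemas.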
We recommend, where possible, co-locating the licensed back-end (SortCL executable) on, or within close network proximity to, your database source or target server(s) for performance reasons, particularly if there are known network bottlenecks. For data masking in FieldShield and DarkShield, and test data generation via RowGen, the bottleneck is typically network performance AND I/O; the time it takes you to read and write your data now is roughly the time it will take to mask, subset, or synthesize it, too. This is therefore another reason for same-system co-location where possible, or at least optimal I/O subsystems for high-volume use of the software (e.g., fibre channel, SSDs, multiple cores, fewer concurrent processes, etc.). You can estimate that I/O floor with a raw read/write pass over the same data, as sketched below.
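As a rough, hypothetical illustration of that rule of thumb (the file path is a placeholder, and this is not an IRI utility), timing a plain copy of a representative file approximates the minimum wall-clock time a masking or synthesis job over the same data will need:

    # Time a raw read/write pass over a representative data file.
    # The elapsed time is a rough floor for masking/subsetting that data.
    time dd if=/data/customers.dat of=/tmp/copytest.dat bs=1M
    rm /tmp/copytest.dat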
If you are also going to run the CoSort engine on the same PC as Workbench, then you will have to allow extra capacity to run jobs. The requirements for the program increase with the size of the data that you intend to process at any one time. A recommended hardware platform for a PC running Workbench would be 8GB of RAM and 10GB of available disk space, plus additional disk space for temporary files equal to 1.5 times the largest data set to be processed (a quick way to check this is sketched below). A general guiding principle for hardware is that the more RAM, the better the performance.
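As a minimal shell sketch of that 1.5x temp-space rule (the input path and work-file volume are placeholders; du -b assumes GNU coreutils, as on Linux):

    # Compute 1.5x the largest data set and compare to free space on the work-file volume.
    LARGEST=$(du -b /data/largest_input.dat | awk '{print $1}')  # size in bytes
    NEEDED=$(( LARGEST * 3 / 2 ))                                # 1.5x rule of thumb
    echo "Temporary work-file space needed: $NEEDED bytes"
    df -B1 /tmp                                                  # free bytes where work files go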
For more information on CoSort-specific tuning recommendations, see this article:
https://www.iri.com/blog/iri/business/frequently-asked-cosort-tuning-questions/
Many 'big data' CoSort and Voracity users license and cosortrc-tune the product on very large multi-core Unix systems to leverage hundreds of GB of RAM and out-perform Hadoop, for example. If you run the Hadoop edition of Voracity, load balancing should be automatic. For DarkShield masking jobs through the API, multi-node load balancing is also possible through the NGINX reverse proxy server, as sketched below.
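As a hedged illustration only (the host names, port, and API path below are hypothetical placeholders, not IRI-documented values), a standard NGINX upstream block can round-robin DarkShield API requests across nodes:

    # Hypothetical NGINX config for load balancing DarkShield API nodes.
    upstream darkshield_api {
        server ds-node1.example.com:8959;   # placeholder node
        server ds-node2.example.com:8959;   # placeholder node
    }

    server {
        listen 80;
        location / {                        # adjust to your actual API path
            proxy_pass http://darkshield_api;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

NGINX defaults to round-robin distribution; weights, health checks, or least_conn balancing can be added per standard NGINX documentation.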

