R is a free programming language and software environment that statisticians and data miners use for analysis and prediction, and it has become known as a 'big data' visualization tool. Because R holds all of its objects in memory, however, it cannot work effectively with very large data sets.
The SortCL program in the IRI Voracity platform or standalone IRI CoSort package is a fast, simple, and inexpensive way to prepare big data for R efficiently -- both in terms of job design and runtime performance. See this section to understand why.
When SortCL sorts, joins, and aggregates raw datasets in a single job and I/O pass ahead of R, time-to-visualization in tools like ggplot or qplot is cut in half.
Without SortCL, R will only work on multiple small chunks of data, and requires multiple code files to produce the same result as a single SortCL script. Hadoop is another way to rapidly prepare big data sets for R, of course, and Voracity users can run SortCL jobs seamlessly in MapReduce 2, Spark, Storm, or Tez without additional coding.
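To make the idea of preparing data ahead of R concrete, here is a minimal sketch of a join, sort, and aggregate step that reduces raw records to a small summary table. It is written in plain Python for illustration only; it is not SortCL syntax and does not reflect how SortCL works internally. The sample data, the column names (store_id, amount, region), and the function name prepare_for_r are all hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical raw inputs; in practice these would be large files on disk.
SALES_CSV = """store_id,amount
s2,10.0
s1,5.5
s2,4.5
s3,20.0
s1,1.5
"""

STORES_CSV = """store_id,region
s1,East
s2,West
s3,East
"""


def prepare_for_r(sales_text, stores_text):
    """Join sales to stores, then total the amounts per region.

    Returns CSV text that is small enough to load into R in one piece.
    """
    # Join: build a lookup from the small store table.
    region_of = {row["store_id"]: row["region"]
                 for row in csv.DictReader(io.StringIO(stores_text))}

    # Attach a region to every sale, skipping sales with no matching store.
    joined = [(region_of[row["store_id"]], float(row["amount"]))
              for row in csv.DictReader(io.StringIO(sales_text))
              if row["store_id"] in region_of]

    # Sort by the grouping key so groupby sees each region as one run.
    joined.sort(key=itemgetter(0))

    # Aggregate: one output row per region.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total_amount"])
    for region, rows in groupby(joined, key=itemgetter(0)):
        writer.writerow([region, sum(amount for _, amount in rows)])
    return out.getvalue()


summary_csv = prepare_for_r(SALES_CSV, STORES_CSV)
print(summary_csv)
```

The point of the sketch is the shape of the workflow: the heavy join and aggregation happen before R is involved, so R only has to read a few summary rows (for example with read.csv) and plot them.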