R Performance Optimization with or without Hadoop

R is a free programming language and software environment that statisticians and data miners use for analysis and prediction, and it has become known as a 'big data' visualization tool. Because R holds all of its objects in memory, however, it cannot work effectively with data sets that approach or exceed available RAM.

The SortCL program in the IRI Voracity big data management platform or standalone IRI CoSort package is a fast, simple, and inexpensive way to prepare big data for R efficiently -- both in terms of job design and runtime performance.

When we ran a SortCL sort, join, and aggregation job ahead of (and instead of) R, time-to-visualization in tools like ggplot or qplot was cut in half.

R works only on multiple, small chunks of data, and requires multiple code files to produce the same result as a single SortCL job. Hadoop is another way to rapidly prepare big data sets for R, of course, and Voracity users can run most SortCL jobs seamlessly in MapReduce 2, Spark, Spark Stream, Storm, or Tez without additional coding.
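To make the idea of preparing data outside R concrete, here is a minimal sketch of a streaming aggregation that keeps only the running totals in memory and hands a small summary to the analysis tool. It is written in Python purely as an illustration; it is not SortCL syntax and not the benchmark job, and the function name, field names, and sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_stream(rows, key_field, value_field):
    """Sum value_field per key_field, one row at a time.

    Only the per-key running totals are held in memory, so the input
    can be far larger than RAM as long as the number of keys is small.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += float(row[value_field])
    # Return the keys in sorted order so the output is deterministic.
    return dict(sorted(totals.items()))


# Hypothetical sample data standing in for a file too large to load at once.
sample = io.StringIO("region,sales\neast,10.5\nwest,4.0\neast,2.5\nwest,6.0\n")
summary = aggregate_stream(csv.DictReader(sample), "region", "sales")
print(summary)  # {'east': 13.0, 'west': 10.0}
```

The resulting summary has one row per key, which is the kind of reduced data set that a visualization layer can load comfortably.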

For more information on the benchmark and how SortCL can prepare data in the same Eclipse™ environment (via the StatET for R plug-in for IRI Workbench), see:

Accelerating R Analytics

Prepare Big Data and Run R in Eclipse
