IRI Blog Articles

Diving Deeper into Data Management

 

 

🎥 Linear Regression – A Predictive Tool in IRI Voracity

by Dustin Ellsworth

Linear regression is a staple data analysis function in finance, economics, research, and many other disciplines that helps discover new correlations in data. Users of the IRI Voracity platform can now process big data from any number of sources and, at the same time, present customized trend lines to help business users make predictions.

Linear regression expands on previously covered analytic and preparatory capabilities in constituent Voracity technology.

This article (and embedded video) demonstrates how to simultaneously perform regression analysis using the “quick_stats” field function in the CoSort SortCL program, and generate a .pdf report with confidence intervals on x-y value pairs using BIRT in the IRI Workbench GUI and a Boost C++ library function. A planned update to this article will use a Workbench wizard that automates these design steps.

The auto-generated report provides a base set of statistics about the x and y values at the time the entire data set is ingested. Thus ‘big’ source data integration and calculation run simultaneously with the visualization of those results, all in the same “pane of glass” that controls both data processing and presentation. With Voracity’s task scheduler, you can run such reports at regular intervals and produce a uniform set of general statistics about the x and y data to analyze over time.

The sample report below contains a graph of the data points, a linear regression analysis line and equation, R-squared value, average x and y values, standard deviations of x and y, and corresponding confidence limits:

 Example Report

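The statistics listed for the sample report can be illustrated with a minimal Python sketch of an ordinary least-squares fit. This is not the quick_stats implementation (which, per the article, relies on a Boost C++ library function); the function name `regression_summary` and the returned keys are hypothetical, and confidence limits are left out because the article does not say which interval formula the report uses.

```python
from statistics import mean, stdev


def regression_summary(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration only; it is not part of SortCL.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two x-y pairs of equal length")

    x_bar, y_bar = mean(xs), mean(ys)

    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a one-variable fit, R-squared equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "mean_x": x_bar,
        "mean_y": y_bar,
        "stdev_x": stdev(xs),  # sample standard deviation (n - 1)
        "stdev_y": stdev(ys),
    }


summary = regression_summary([1, 2, 3, 4], [3, 5, 7, 9])
print(summary["slope"], summary["intercept"], summary["r_squared"])
```

The sample standard deviation (dividing by n - 1) is an assumption here; the report may use the population form instead.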

Use Cases 

Demonstration Script (stats-test.scl) 

Demonstration Video

Summary of quick_stats()

The function quick_stats(SOURCE X, SOURCE Y, FILENAME, X-AXIS LABEL, Y-AXIS LABEL) takes five (5) arguments, as follows:

  • Argument 1 – SOURCE X value (integer or decimal)
  • Argument 2 – SOURCE Y value (integer or decimal)
  • Argument 3 – FILENAME you want the resulting report to have. The date and time are appended to this name when the report is created (e.g., “FILENAME_2016-01-04_11.45.30”)
  • Argument 4 – X-AXIS LABEL you want on the graph (e.g., “X_AXIS_LABEL”)
  • Argument 5 – Y-AXIS LABEL you want on the graph (e.g., “Y_AXIS_LABEL”)
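The date-and-time suffix shown for Argument 3 can be reproduced with a short Python sketch. The helper name `timestamped_report_name` is hypothetical, and the use of a 24-hour clock is an assumption, since the article's single example does not settle it.

```python
from datetime import datetime


def timestamped_report_name(base_name, when=None):
    """Append a date-time suffix in the style of the article's example.

    Hypothetical helper; assumes a 24-hour clock and dot-separated time.
    """
    when = when or datetime.now()
    return f"{base_name}_{when.strftime('%Y-%m-%d_%H.%M.%S')}"


# Reproduces the example from the argument list above.
print(timestamped_report_name("FILENAME", datetime(2016, 1, 4, 11, 45, 30)))
# prints FILENAME_2016-01-04_11.45.30
```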

The linear regression line and all statistics contained within the report are calculated from the entire data set. Due to report size limitations, a maximum of 1,000 points is displayed on the graph. So, if 1,000,000 x-y value pairs are passed into the function, every one is used in the calculations, though only a random 1,000 are displayed on the graph.
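The split between "calculate on everything" and "plot a random subset" can be sketched in Python as follows. The names `points_to_plot` and `MAX_PLOT_POINTS` are hypothetical, and uniform sampling without replacement is an assumption; the article says only that the displayed points are random.

```python
import random

MAX_PLOT_POINTS = 1000  # display cap stated in the article


def points_to_plot(pairs, limit=MAX_PLOT_POINTS, seed=None):
    """Return at most `limit` x-y pairs for display.

    Statistics should still be computed from the full `pairs` list;
    only the plotted subset is reduced. Hypothetical helper.
    """
    if len(pairs) <= limit:
        return list(pairs)
    rng = random.Random(seed)  # a seed makes the subset repeatable
    return rng.sample(pairs, limit)  # uniform, without replacement


# One million pairs go into the calculations; only 1,000 are drawn.
all_pairs = [(i, 2 * i + 1) for i in range(1_000_000)]
plotted = points_to_plot(all_pairs, seed=42)
print(len(all_pairs), len(plotted))
```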

Required Preparation

Because this function is new, there are a few steps you must take to make it work:

  1. In IRI Workbench, create a new IRI Project named “Repo” from the New toolbar.
  2. Download the latest BIRT Runtime Release Build from the BIRT download page.
  3. Unzip the BIRT Runtime Release Build zip file into a temporary location.
  4. Copy the ReportEngine folder from the temporary location into the root of the drive on which the IRI software is installed. For a default installation of CoSort, this creates C:\ReportEngine.
  5. Download quick_stats.zip from the IRI Website.
  6. Unzip quick_stats.zip into your CoSort install directory, C:\IRI\CoSort95 by default.
  7. Copy all the files from the examples\quick_stats_example_files folder unzipped from quick_stats.zip into the Repo project that you created in Step 1.

BIRT Report Designer

To customize the report, you must have the BIRT Report Designer plugin installed in the IRI Workbench (Eclipse) GUI as follows:

  1. Within the IRI Workbench go to Help -> Install New Software
  2. In the ‘Work with’ box, type: http://download.eclipse.org/birt/update-site/4.4 (or the current version)
  3. Click Add and install Report Designer.

If you have any suggestions for a specific type of in-process data analysis you might find useful, please leave a comment below, or email voracity@iri.com.
