Pivot Transformations
Changing between column and row formats is an online analytical processing (OLAP) activity that optimizes data structures for queries and reports.
A pivot transformation turns multiple rows of data into one, denormalizing a data set into a more compact version by rotating the input data on a column value.
For example, a normalized sales report that includes store names, dates and sales amounts has several rows for each store. In this case, each row for that store may have a sales amount for each month. Pivoting the data on the month column produces one row per store, with month names shown as column names, and sales amount values in the month column.
An unpivot transformation on the other hand, converts columns into rows, normalizing the dataset by expanding values in multiple columns in a single record into multiple records with the same values in a single column. There are several ways these data transformations can be accomplished, depending on the platform or tool you use.
One way is a pivot table. According to Wikipedia, “a pivot table is a data summarization tool found in data visualization programs such as spreadsheets or business intelligence software. A pivot-table can sort, count, total or give the average of the data stored in one table or spreadsheet. It displays the results in a second table (called a “pivot table”) showing the summarized data.
Pivot tables are also useful for creating unweighted cross tabulations.” The user sets up and changes (rotates) the summary’s structure by dragging and dropping fields graphically. The concept is named for this pivoting of the summary table. Another way database administrators pivot their tables is through SQL, and the code is typically complex. See this blog example on data normalization.
Pivot transformations, like many other functions that can be performed with a database or ETL tool, may also be performed externally in the file system for performance reasons, or need to be, because the data sources are already in flat files.
IRI believes that data and applications should be independent, and that databases should be used for secure data storage and retrieval, not big data transformation or single-point solution control. Its customers transform big data externally, in jobs that are easier to introduce, code, and change.
IRI CoSort — and its Sort Control Language (SortCL) Program in particular — is a data transformation and reporting tool that runs outside the database layer. Pivoting and other transforms are coded in a 4GL familiar to SQL users, or with field drag/drop operations in an Eclipse GUI supporting SortCL scripts.
SortCL reports can combine a wide range of sort, join, sum, filter, string manipulation, field function, PII protection, and formatting options to produce change data capture (delta) reports, display calculations from slowly changing dimensions, create custom file, XML and HTML targets, etc. SortCL jobs are portable and can handle data in files and tables at the same time.