IRI Blog Articles

Diving Deeper into Data Management

 

 

SCD Type 1

by Susan Gegner

Dimensional data that change slowly or unpredictably is captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.

All the update methods for the different SCD types can be accomplished using the SortCL program in IRI CoSort. In IRI Voracity there is a wizard to aid in creating the SCD job scripts used when running the SortCL program. Most types use a full outer join to match records from the original data source with records in the update source based on equating a key from each. Records with matches need to be updated. Records in the update source that do not have a match need to be added to the master.

This article covers the SCD Type 1 model, where new information from the update data overwrites original information in the master source.  Records in the update source without matches are added to the new master. As this model involves overwriting old values with the current values, and maintains no history, it is not often used.

The update is accomplished by joining with respect to the field ProductCode. For this example, the current data will be the source file master1.dat and update.dat is used to change values or add records in the current master source.

The master1.dat source contains:

ProductCodeCostStartDate
C123125.5020110228
F1122365.0020120101
G10119.2520110930
J245450.5020110430
S02298.7520110515

In this case, the update data will all have the same StartDate. The update.dat source contains records with the following values:

ProductCodeCostStartDate
F1122425.0020120701
J245550.5020120701
M447101.7520120701
S022101.7520120701

The new dimensional table will have these values after the update:

ProductCodeCostStartDate
C123125.5020110228
F1122425.0020120701
G10119.2520110930
J245550.5020120701
M447139.2520120701
S022101.7520120701

In IRI Workbench, there is a wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the files that are used for processing the update is displayed.

image001

The next screen is for defining the join performed with the master and update sources

image003

Below is the sortcl job script that is produced when processing the two files master1.dat and update.dat:

/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/SCD1/master1.dat
  /PROCESS=DELIMITED
  /ALIAS=master1
  /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
  /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
  /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')
/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/update.dat
  /PROCESS=DELIMITED
  /ALIAS=update
  /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
  /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
  /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')

/JOIN FULL_OUTER NOT_SORTED master1 NOT_SORTED update WHERE MASTER1.PRODUCTCODE == UPDATE.PRODUCTCODE

/OUTFILE=/master1.dat
# This processes all except the new records
  /PROCESS=DELIMITED
  /FIELD=(MASTER1.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
  /FIELD=(COST_NEW, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"', IF MASTER1.PRODUCTCODE EQ UPDATE.PRODUCTCODE THEN UPDATE.COST ELSE MASTER1.COST)
  /FIELD=(STARTDATE_NEW, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"', IF MASTER1.PRODUCTCODE EQ UPDATE.PRODUCTCODE THEN UPDATE.STARTDATE ELSE MASTER1.STARTDATE)
  /INCLUDE WHERE MASTER1.PRODUCTCODE NE ""
/OUTFILE=master1.dat
# This processes the new records to be added
  /PROCESS=DELIMITED
  /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
  /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
  /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"') I 
  /INCLUDE WHERE MASTER1.PRODUCTCODE EQ ""
Print Friendly

{ 0 comments… add one now }

Leave a Comment

Previous post:

Next post: