IRI Blog Articles

Diving Deeper into Data Management

 

 

SCD Type 4

by Susan Gegner

Dimensional data that change slowly or unpredictably are captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.

All the update methods for the different SCD types are accomplished using the SortCL program in IRI CoSort. In IRI Voracity there is a wizard to aid in creating the SCD job scripts used when running the SortCL program. Most SCD types use a full outer join to match records from the original data source with records in the updated source based on equating a key from each. Type 4, however, accomplishes the update using a sort.

Type 4 SCD

The Type 4 model is similar to that for Type 2. The difference is that there are 2 tables or files that are maintained: one for the current costs and one to hold the history records for the costs. When new current records are added to the master, these new records are usually added to the history at the same time.

The field definitions are the same in the update, the history and the master files or tables. We will sort the records for the update and history together. A new master will be created that only has one record for each ProductCode and that record will be the most current. The history will have all the records for each ProductCode including the ones from the update source.

The starting master table will be the same as the starting master from our example of Type 1 with values as shown below.

ProductCodeCostStartDate
C123125.5020110228
F1122365.0020120101
G10119.2520110930
J245450.5020110430
S02298.7520110515

All the update data will all have the same StartDate. The update.dat source contains records with the following values:

ProductCodeCostStartDate
F1122425.0020120701
J245550.5020120701
M447101.7520120701
S022101.7520120701

The starting history source below is like the starting master in the Type 2. So Type 4 is a combination of Type 1 and Type 2 where the master is the same as a Type 1 master and the history is the same as a Type 2 master.

ProductCodeCostStartDate
C123125.5020110228
F1122365.0020120101
G10119.2520110930
G10121.2520110501
J245450.5020110430
J245425.2520101001
J245385.2520100215
S02298.7520110515

In IRI Workbench, there is a Voracity wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the sources that are used for processing the update is displayed.

image001
The next screen is where you set up the sort by selecting the keys. There are 2 keys: ProductCode then StartDate. We are grouping by ProductCode, but we also need to sort in descending order by StartDate within each ProductCode grouping. This is so that we can filter the most recent record for each ProductCode group to the new master.

image003

To make the StartDate a key that sorts in descending order, select that key and then select Edit Key. This will bring up the following screen:

image005

In the dropdown for Direction select Descending.

Here is the job script for the sort:

/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/SCD4/history4.dat
    /PROCESS=DELIMITED
    /ALIAS=history4
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')
/INFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/update.dat
    /PROCESS=DELIMITED
    /ALIAS=update
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')

/SORT
    /KEY=(PRODUCTCODE, TYPE=ASCII)
    /KEY=(STARTDATE, DESCENDING, TYPE=ASCII)

/OUTFILE=history4.dat
 # This file will contain all the records from both inputs
    /PROCESS=DELIMITED
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')

/OUTFILE=master4.dat
# Include only one record for each ProductCode
    /PROCESS=DELIMITED
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')
    /INCLUDE WHERE PRODUCTCODE

Below is the updated master produced by running the above job script using SortCL. The values are the same that are produced in a Type 1 update.

ProductCodeCostStartDate
C123125.5020110228
F1122425.0020120701
G10119.2520110930
J245550.5020120701
M447139.2520120701
S022101.7520120701

The history file contains all previous records plus the new records from the update file.

ProductCodeCostStartDate
C123125.5020110228
F1122425.0020120701
F1122365.0020120101
G10119.2520110930
G10121.2520110501
J245550.5020120701
J245450.5020110430
J245425.2520101001
J245385.2520100215
M447101.75220120701
S022101.7520120701
S02298.7520110515
Print Friendly

{ 0 comments… add one now }

Leave a Comment

Previous post:

Next post: