IRI Blog Articles

Diving Deeper into Data Management

 

 

SCD Type 3

by Susan Gegner

Dimensional data that change slowly or unpredictably is captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.

All the update methods for the different SCD types can be accomplished using the SortCL program in IRI CoSort. In IRI Voracity there is a wizard to aid in creating the SCD job scripts used when running the SortCL program. Most variations use a full outer join to match records from the original master data source with records in the updated source based on equating a key from each. Records with matches need to be updated in the master. Records in the update source that do not have a match need to be added to the master.

Here is an overview of how to update a dimensional file using SCD Type 3 where I am maintaining product costs. The update is accomplished by joining with respect to the field ProductCode.

Type 3 SCD

With the Type 3 model, the current data and the historical data are carried in the same record.  One of the major things to decide is how much history should be kept. With our example, I will keep a cost for three start dates. Enddate is the date when the cost in the record is no longer effective. If the cost is still effective, then the EndDate1 field will be set to 99991231. This is to avoid an empty value in this field. Here are the values contained in the starting master source (master3.dat):

ProductCodeCost1StartDate1EndDate1Cost2StartDate2EndDate2Cost3StartDate3EndDate3
C123125.502011022899991231
F1122365.002012010199991231
G10119.25201109309999123121.252011050120110930
J245450.502011043099991231425.252010100120110430385.252010021520101001
S02298.752011051599991231

The update records all have the same StartDate. The update.dat source contains the following field values for the records:

ProductCodeCostStartDate
F1122425.0020120701
J245550.5020120701
M447101.7520120701
S022101.7520120701

Below is the updated data. For each record that was updated, each cost, startdate, and enddate is moved to the right by one set. In the old current group for the record, the enddate is changed to the startdate for the new updated current set.  Any values for the old third set are discarded.

ProductCodeCost1StartDate1EndDate1Cost2StartDate2EndDate2Cost3StartDate3EndDate3
C123125.50201102289991231
F1122425.002012070199912312365.002012010120120701
G10119.2520110930999123121.252011050120110930
J245550.50201207019991231450.502011043020120701425.252010100120110430
M447101.75201207019991231
S022101.7520120701999123198.752011051520120701

In IRI Workbench, there is a Voracity wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the files that are used for processing the update is displayed.

image001

In the top table on the next screen, you determine how the data being updated is mapped to the master. In the middle grouping on the screen, choose the fields that give the status values for the records that are being updated.

In the dropdowns under Current and Historical Field Sets, the first set that you define, gives the field names for current values. Value Field is the dimensional value that is being updated. Start Field is the field that determines when the Value Field became active and End Field determines when it is no longer valid. Then define as many sets as you need for the number of historical values that are being kept.

image003

Here is the job script that will update the master file:

/INFILE=master3.dat
    /PROCESS=DELIMITED
    /ALIAS=master3
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",")
    /FIELD=(COST1, TYPE=NUMERIC, POSITION=2, SEPARATOR=",")
    /FIELD=(STARTDATE1, TYPE=ASCII, POSITION=3, SEPARATOR=",")
    /FIELD=(ENDDATE1, TYPE=ASCII, POSITION=4, SEPARATOR=",")
    /FIELD=(COST2, TYPE=NUMERIC, POSITION=5, SEPARATOR=",")
    /FIELD=(STARTDATE2, TYPE=ASCII, POSITION=6, SEPARATOR=",")
    /FIELD=(ENDDATE2, TYPE=ASCII, POSITION=7, SEPARATOR=",")
    /FIELD=(COST3, TYPE=NUMERIC, POSITION=8, SEPARATOR=",")
    /FIELD=(STARTDATE3, TYPE=ASCII, POSITION=9, SEPARATOR=",")
    /FIELD=(ENDATE3, TYPE=ASCII, POSITION=10, SEPARATOR=",")
/INFILE=update.dat
    /PROCESS=DELIMITED
    /ALIAS=update
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=ASCII, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')

/JOIN FULL_OUTER master3 update WHERE MASTER3.PRODUCTCODE == UPDATE.PRODUCTCODE

/OUTFILE=master3.dat
# Include only records that are to be updated
    /PROCESS=DELIMITED
    /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",")
    /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",")
    /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",")
    /FIELD=(ENDDATE1_NEW="99991231", TYPE=ASCII, POSITION=4, SEPARATOR=",")
    /FIELD=(MASTER3.COST1, TYPE=NUMERIC, POSITION=5, SEPARATOR=",")
    /FIELD=(MASTER3.STARTDATE1, TYPE=ASCII, POSITION=6, SEPARATOR=",")
    /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=7, SEPARATOR=",")
    /FIELD=(MASTER3.COST3, TYPE=NUMERIC, POSITION=8, SEPARATOR=",")
    /FIELD=(MASTER3.STARTDATE3, TYPE=ASCII, POSITION=9, SEPARATOR=",")
    /FIELD=(MASTER3.ENDATE3, TYPE=ASCII, POSITION=10, SEPARATOR=",")
    /INCLUDE WHERE MASTER3.PRODUCTCODE == UPDATE.PRODUCTCODE

/OUTFILE=master3.dat
# Include only records that are not to be updated
    /PROCESS=DELIMITED
    /FIELD=(MASTER3.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",")
    /FIELD=(MASTER3.COST1, TYPE=NUMERIC, POSITION=2, SEPARATOR=",")
    /FIELD=(MASTER3.STARTDATE1, TYPE=ASCII, POSITION=3, SEPARATOR=",")
    /FIELD=(MASTER3.ENDDATE1, TYPE=ASCII, POSITION=4, SEPARATOR=",")
    /FIELD=(MASTER3.COST2, TYPE=NUMERIC, POSITION=5, SEPARATOR=",")
    /FIELD=(MASTER3.STARTDATE2, TYPE=ASCII, POSITION=6, SEPARATOR=",")
    /FIELD=(MASTER3.ENDDATE2, TYPE=ASCII, POSITION=7, SEPARATOR=",")
    /FIELD=(MASTER3.COST3, TYPE=NUMERIC, POSITION=8, SEPARATOR=",")
    /FIELD=(MASTER3.STARTDATE3, TYPE=ASCII, POSITION=9, SEPARATOR=",")
    /FIELD=(MASTER3.ENDATE3, TYPE=ASCII, POSITION=10, SEPARATOR=",")
    /OMIT WHERE MASTER3.PRODUCTCODE == UPDATE.PRODUCTCODE
    /OMIT WHERE MASTER3.PRODUCTCODE EQ ""
/OUTFILE=C:/IRI/CoSort95/workbench.orig/workspace/SCD/SCD3/master3.dat
# Add new records
    /PROCESS=DELIMITED
    /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",")
    /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",")
    /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",")
    /FIELD=(ENDDATE1_NEW="99991231", TYPE=ASCII, POSITION=4, SEPARATOR=",")
    /INCLUDE WHERE MASTER3.PRODUCTCODE EQ "" AND UPDATE.PRODUCTCODE NE ""
Print Friendly

{ 0 comments… add one now }

Leave a Comment

Previous post:

Next post: