IRI Blog Articles

Diving Deeper into Data Management

 

 

SCD Type 6

by Susan Gegner

Dimensional data that change slowly or unpredictably are captured in Slowly Changing Dimensions (SCD) analyses. In a data warehouse environment, a dimension table has a primary key that uniquely identifies each record and other pieces of information that are known as the dimensional data.

All the update methods for the different SCD types can be accomplished using the SortCL program in IRI CoSort. In IRI Voracity there is a wizard to aid in creating the SCD job scripts used when running the SortCL program. Most variations use a full outer join to match records from the original data source with records in the updated source based on equating a key from each. Records in the update source that do not have a match need to be added to the master.

Here is an overview of how to update a dimensional file or table using SCD Type 6 where I am maintaining product costs.  The update is accomplished by joining with respect to the field ProductCode.

Type 6 SCD

Type 6 is a hybrid that is a combination of Type 1, Type 2, and Type 3. It is called Type 6 because 1+2+3=6. Each record will have will have the following fields:

  • ProductCode:  This is the identifier key field.
  • Cost: Current cost of the product.
  • HistoricalCost: Cost that became effective on the StartDate for that record.
  • StartDate: Date on which the HistoricalCost became effective.
  • EndDate: Date on which the HistoricalCost for the record was no longer the current CurrentCost. If the HistoricalCost is still the current Cost, then the EndDate is 99991231.
  • Current: Y if Cost is still current, N if it is not

For now let’s just look at the records for ProductCode J245. Let’s start when the earliest HistoricalCost was the current Cost. The HistoricalCost is the same as the Cost. There is then one record for the ProductCode J245 and it has the values shown below:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
J245385.25385.252010021599991231Y

When a new Cost needs to be implemented, the values in the update record are used to add a new current record. The CurrentCost in all records with the Product Code J245 will be changed to the Cost value from the update record while the HistoricalCost stays the same for the existing records. In addition, the EndDate for the old current record is changed to the StartDate for the new current record. In the records with ProductCode J245, the CurrentCost field in the updated Master file now has the below values:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
J245425.25425.252010100199991231Y
J245425.25385.252010021520101001N

Now we will update with the next new cost and get the following values for the J245 records:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
J245450.50450.502011043099991231Y
J245450.50425.252010100120110430N
J245450.50385.252010021520101001N

The final update creates records that have these values:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
J245550.50550.502012070199991231Y
J245550.50450.502011043020120701N
J245550.50425.252010100120110430N
J245550.50385.252010021520101001N

Here are the values in the master file prior to updating:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
C123125.50125.502011022899991231Y
F1122365.002365.002012010199991231Y
G10119.2519.252011093099991231Y
G10119.2521.252011050120110930N
J245450.50450.502011043099991231Y
J245450.50425.252010100120110430N
J245450.50385.252010021520101001N
S02298.7598.752011051599991231Y

The update data will all have the same StartDate. The update.dat source contains the following values:

ProductCodeCostStartDate
F1122425.0020120701
J245550.5020120701
M447101.7520120701
S022101.7520120701

In IRI Workbench, there is a Voracity wizard to assist in the creation of scripts for updating Dimensional files and tables. This wizard is located in the Voracity dropdown on the navigation bar. First you pick the SCD type. Then the window where you select the sources that are used for processing the update is displayed. For Type 6, the target is normally the original master file or table.

image001

With the next screen, you determine how the update data is mapped and how other field or column values are set. The update.Cost field will map to master6.CurrentCost and the update.StartDate field will map to master6.StartDate. The Flag Field is the field that is used to determine whether the record contains the most recent field values. In this case that field is master6.current. If it is the most current record, then the CurrentCost field and the HistoryCost field will have the same values. The Flag Positive Value is the value in master6.Current that determines that the field values are the most recent values for a ProductCode. When that is the case, the value is “Y” in our example and the Flag Negative Value is “N”. End Field contains the name of the field that holds the value used to determine when the Cost for the record is no longer effective and Master.EndDate holds that value. End Value is used as the value for End Field when the record is current.

image003

The next screen is for defining the join performed with the master and update sources. You should note that both sources must be ordered with respect to ProductCode. If they are not, then you need to select NOT_SORTED in the dropdown for Sort Order Option under the data source that needs to be sorted.

image007

 

Here is the first job script:

/INFILE=master6.dat
    /PROCESS=DELIMITED
    /ALIAS=master6
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(CURRENTCOST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"')
    /FIELD=(ENDDATE, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"')
    /FIELD=(CURRENT, TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"')
/INFILE=update.dat
    /PROCESS=DELIMITED
    /ALIAS=update
    /FIELD=(PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(COST, TYPE=ASCII, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(STARTDATE, TYPE=ASCII, POSITION=3, SEPARATOR=",", FRAME='\"')

/JOIN FULL_OUTER NOT_SORTED master6 update WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE

/OUTFILE=master6.dat
# Make changes to records that have the same ProductCode
# as records in the update file 
    /PROCESS=DELIMITED
    /FIELD=(MASTER6.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"')
    /FIELD=(ENDDATE_NEW, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"', IF MASTER6.CURRENT EQ "Y" THEN UPDATE.STARTDATE ELSE MASTER6.ENDDATE)
    /FIELD=(CURRENT_NEW="N", TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"')
    /INCLUDE WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE

/OUTFILE=master6.dat
# Keep the records that have no updates 
    /PROCESS=DELIMITED
    /FIELD=(MASTER6.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.CURRENTCOST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.HISTORICALCOST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.ENDDATE, TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"')
    /FIELD=(MASTER6.CURRENT, TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"')
    /OMIT WHERE MASTER6.PRODUCTCODE == UPDATE.PRODUCTCODE
    /OMIT WHERE MASTER6.PRODUCTCODE EQ ""

/OUTFILE=master6.dat
# Add the records with new product codes
    /PROCESS=DELIMITED
    /FIELD=(UPDATE.PRODUCTCODE, TYPE=ASCII, POSITION=1, SEPARATOR=",", FRAME='\"')
    /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=2, SEPARATOR=",", FRAME='\"')
    /FIELD=(UPDATE.COST, TYPE=NUMERIC, POSITION=3, SEPARATOR=",", FRAME='\"')
    /FIELD=(UPDATE.STARTDATE, TYPE=ASCII, POSITION=4, SEPARATOR=",", FRAME='\"')
    /FIELD=(ENDDATE_NEW="99991231", TYPE=ASCII, POSITION=5, SEPARATOR=",", FRAME='\"')
    /FIELD=(CURRENT_NEW="Y", TYPE=ASCII, POSITION=6, SEPARATOR=",", FRAME='\"')
    /OMIT WHERE UPDATE.PRODUCTCODE EQ ""
    /INCLUDE WHERE UPDATE.PRODUCTCODE

To review,

  • The value for CurrentCost will be the same for all records that have a common ProductCode
  • StartDate is the date when the HistoricalCost became effective
  • For the most current record of a ProductCode,
    • the values for CurrentCost and HistoricalCost are the same
    • the EndDate is 99991231
    • The value in Current is Y
  • For the records that are not the current record
    • EndDate is the date when the next more recent Cost became effective
    • The field Current has N for the value

The new master file will not be sorted because the new current master records were added to the bottom of the master file which will have values like this:

ProductCodeCostHistoricalCostStartDateEndDateCurrent
C123125.50125.502011022899991231Y
F1122425.002365.002012010120120701N
G10119.2519.252011093099991231Y
G10119.2521.252011050120110930N
J245550.50450.502011043020120701N
J245550.50425.252010100120110430N
J245550.50385.252010021520101001N
S022101.7598.752011051520120701N
F1122425.002425.002012070199991231Y
J245550.50550.502012070199991231Y
M447139.25139.252012070199991231Y
S022101.75101.752012070199991231Y
Print Friendly

{ 0 comments… add one now }

Leave a Comment

Previous post:

Next post: