Friday 5 December 2014

SCD Type1,2 and 3



SCD Type 1, SCD Type 2, SCD Type 3,Slowly Changing Dimension Types,Advantages & Disadvantages

The Slowly Changing Dimension problem is a common one particular to data warehousing.There are in general three ways to solve this type of problem, and they are categorized as follows:

  • Type 1: The new record replaces the original record. No trace of the old record exists.
  • Type 2: A new record is added into the customer dimension table.Thereby, the customer is treated essentially as two people.
  • Type 3: The original record is modified to reflect the change.
SCD Type 1,Slowly Changing Dimension Use,Example,Advantage,Disadvantage
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

Customer Key
Name
State
1001
Williams
New York

After Williams moved from New York to Los Angeles, the new information replaces the new record, and we have the following table:

Customer Key
Name
State
1001
Williams
Los Angeles

Advantages
  • This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages
  • All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Williams lived in New York before.
Usage
About 50% of the time.

When to use Type 1
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

SCD Type 2,Slowly Changing Dimension Use,Example,Advantage,Disadvantage
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:

Customer Key
Name
State
1001
Williams
New York

After Williams moved from New York to Los Angeles, we add the new information as a new row into the table:

Customer Key
Name
State
1001
Williams
New York
1005
Williams
Los Angeles

Advantages
  • This allows us to accurately keep all historical information.
Disadvantages
  • This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
  • This necessarily complicates the ETL process.
Usage
About 50% of the time.

When to use Type 2
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

SCD Type 3,Slowly Changing Dimension Use,Example,Advantage,Disadvantage
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

Customer Key
Name
State
1001
Williams
New York

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
  • Customer Key
  • Name
  • Original State
  • Current State
  • Effective Date
After Williams moved from New York to Los Angeles, the original information gets updated, and we have the following table (assuming the effective date of change is February 20, 2010):

Customer Key
Name
Original State
Current State
Effective Date
1001
Williams
New York
Los Angeles
20-FEB-2010

Advantages
  • This does not increase the size of the table, since new information is updated.
  • This allows us to keep some part of history.
Disadvantages
  • Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Williams later moves to Texas on December 15, 2003, the Los Angeles information will be lost.
Usage
Type 3 is rarely used in actual practice.

When to use Type 3
Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur for a finite number of time.

7 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Inspiring writings and I greatly admired what you have to say , I hope you continue to provide new ideas for us all and greetings success always for you..Keep update more information..

    rpa training in Chennai | best rpa training in chennai

    rpa training in pune

    rpa online training | rpa training in bangalore

    ReplyDelete
  3. Hello I am so delighted I found your blog, I really found you by mistake, while I was looking on Yahoo for something else, anyways I am here now and would just like to say thanks for a tremendous post. Please do keep up the great work.
    Data Science training in Chennai | Data science training in bangalore
    Data science training in pune| Data science online training
    Python training in Kalyan nagar

    ReplyDelete
  4. A universal message I suppose, not giving up is the formula for success I think. Some things take longer than others to accomplish, so people must understand that they should have their eyes on the goal, and that should keep them motivated to see it out til the end.
    Python training in marathahalli
    Python training institute in pune

    ReplyDelete