Friday 5 December 2014

SCD Types 1, 2 and 3



SCD Type 1, SCD Type 2, SCD Type 3: Slowly Changing Dimension Types, Advantages & Disadvantages

The Slowly Changing Dimension problem is a common one in data warehousing. In general, there are three ways to solve this type of problem, and they are categorized as follows:

  • Type 1: The new record replaces the original record. No trace of the old record exists.
  • Type 2: A new record is added into the customer dimension table. Thereby, the customer is treated essentially as two people.
  • Type 3: The original record is modified to reflect the change.
SCD Type 1: Slowly Changing Dimension Use, Example, Advantages, Disadvantages
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

Customer Key | Name     | State
1001         | Williams | New York

After Williams moved from New York to Los Angeles, the new information replaces the original record, and we have the following table:

Customer Key | Name     | State
1001         | Williams | Los Angeles

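As a minimal sketch in SQL, assuming the dimension table is called customer_dim with the columns shown above (the table and column names are illustrative, not from the original post), a Type 1 change is a single overwrite:

    -- Type 1 SCD: update in place; the New York value is gone for good
    UPDATE customer_dim
    SET    state = 'Los Angeles'
    WHERE  customer_key = 1001;
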
Advantages
  • This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages
  • All history is lost. With this methodology, it is not possible to trace back through history. For example, in this case, the company would not be able to know that Williams once lived in New York.
Usage
About 50% of the time.

When to use Type 1
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

SCD Type 2: Slowly Changing Dimension Use, Example, Advantages, Disadvantages
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:

Customer Key | Name     | State
1001         | Williams | New York

After Williams moved from New York to Los Angeles, we add the new information as a new row into the table:

Customer Key | Name     | State
1001         | Williams | New York
1005         | Williams | Los Angeles

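A sketch of the same change in SQL, again using the illustrative customer_dim table; in practice the new surrogate key (1005 here) would typically come from a sequence or key-management routine rather than being hard-coded:

    -- Type 2 SCD: keep the old row and add a new one with its own surrogate key
    INSERT INTO customer_dim (customer_key, name, state)
    VALUES (1005, 'Williams', 'Los Angeles');

Many real Type 2 implementations also carry effective/expiry dates or a current-row flag on each row so that queries can pick the right version of the customer.
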
Advantages
  • This allows us to accurately keep all historical information.
Disadvantages
  • This will cause the size of the table to grow quickly. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
  • It also complicates the ETL process.
Usage
About 50% of the time.

When to use Type 2
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

SCD Type 3: Slowly Changing Dimension Use, Example, Advantages, Disadvantages
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

Customer Key | Name     | State
1001         | Williams | New York

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
  • Customer Key
  • Name
  • Original State
  • Current State
  • Effective Date
After Williams moved from New York to Los Angeles, the original information gets updated, and we have the following table (assuming the effective date of change is February 20, 2010):

Customer Key | Name     | Original State | Current State | Effective Date
1001         | Williams | New York       | Los Angeles   | 20-FEB-2010

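As a sketch, again assuming the illustrative customer_dim table, now extended with the Original State, Current State and Effective Date columns described above:

    -- Type 3 SCD: update in place; only the original and current values survive
    UPDATE customer_dim
    SET    current_state  = 'Los Angeles',
           effective_date = DATE '2010-02-20'
    WHERE  customer_key = 1001;
    -- original_state keeps 'New York'; a further move would overwrite current_state
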
Advantages
  • This does not increase the size of the table, since the new information simply overwrites the values in the existing row.
  • This allows us to keep some part of history.
Disadvantages
  • Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Williams later moves from Los Angeles to Texas, Texas overwrites Current State and the Los Angeles information is lost.
Usage
Type 3 is rarely used in actual practice.

When to use Type 3
Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.

DataStage Most Common Errors, Warnings and Solutions




This post will help you resolve common DataStage errors and warnings.

1. While running the ./NodeAgents.sh start command, the following error appears: “LoggingAgent.sh process stopped unexpectedly”

SOL:   Kill the LoggingAgentSocketImpl process:
              ps -ef | grep LoggingAgentSocketImpl   (OR)
              ps -ef | grep Agent   (to find the process id of the above)

2. Warning: A sequential operator cannot preserve the partitioning of input data set on input port 0
SOL:    Clear the preserve partition flag before Sequential file stages.

3. Warning: A user defined sort operator does not satisfy the requirements.
SOL:   Check the order of the sorting columns, and make sure to use the same order when using a Join stage after the Sort stage to join two inputs.

4. Conversion error calling conversion routine timestamp_from_string; data may have been lost. xfmJournals,1: Conversion error calling conversion routine decimal_from_string; data may have been lost.
SOL:    Check for the correct date or decimal format, and also for null values in the date or decimal fields, before passing them to the DataStage StringToDate, DateToString, DecimalToString or StringToDecimal functions.
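As an illustrative pre-check (the staging table and column names below are hypothetical), problem rows can be located in the source before DataStage attempts the conversion:

    -- Find rows whose date or decimal strings would fail the conversion.
    SELECT *
    FROM   staging_orders
    WHERE  order_date_str IS NULL
       OR  amount_str     IS NULL
       OR  LENGTH(order_date_str) <> 10;   -- assuming dates arrive as 'YYYY-MM-DD'
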

5. “Error trying to query dsadm[]. There might be an issue in the database server”
SOL:   Check XMETA connectivity:
db2 connect to xmeta (fails with: A connection to or activation of database “xmeta” cannot be made because of BACKUP pending)
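If xmeta really is in BACKUP pending state, taking a backup clears it; a minimal sketch, assuming a DB2 command line and an existing backup directory (the path is illustrative):

    db2 backup database xmeta to /backup/xmeta
    db2 connect to xmeta
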

6. “DSR_ADMIN: Unable to find the new project location”
SOL:   The Template.ini file might be missing in /opt/ibm/InformationServer/Server.
           Copy the file from another server.
7. “Designer LOCKS UP while trying to open any stage”
SOL:   Double-click on the stage that locks up DataStage.
           Press ALT+SPACE.
           A window menu will pop up; select Restore.
           The properties window will now be shown.
           Click “X” to close this window.
           Now double-click again and check whether the properties window appears.

8. “Error setting up internal communications (fifo RT_SCTEMP/job_name.fifo)”
SOL:   Remove the locks and try to run (OR)
          Restart DSEngine and try to run (OR)
          Go to /opt/ibm/InformationServer/server/Projects/proj_name/
          ls RT_SCT*   then
          rm -f RT_SCTEMP
          then try to restart it.

9. While attempting to compile a job: “failed to invoke GenRunTime using Phantom process helper”
   RC:     /tmp space might be full
           Job status is incorrect
           Format problems with the project's uvodbc.config file
   SOL:   1)  Clean up the /tmp directory
             2)  DS Director → Job → Clear Status File
             3)  Confirm uvodbc.config has the following entry/format:
                        [ODBC DATA SOURCES]
                        <localuv>
                        DBMSTYPE = UNIVERSE
                        Network  = TCP/IP
                        Service  = uvserver
                        Host     = 127.0.0.1
10. No jobs or logs showing in the IBM DataStage Director client, although jobs are still accessible from the Designer client.
SOL:   The SyncProject command installed with DataStage 8.5 can be run to analyze and recover projects:
           SyncProject -ISFile islogin -project dstage3 dstage5 -Fix

11. CASHOUT_DTL: Invalid property value /Connection/Database (CC_StringProperty::getValue, file CC_StringProperty.cpp, line 104)
SOL:   Change the Data Connection properties manually in the produced DB2 Connector stage.
           A patch (JR35643) is available for this issue.
12. SQL0752N: Connecting to a database is not permitted within a logical unit of work when the CONNECT type 1 setting is in use.
SOL:   Issue a COMMIT or ROLLBACK statement before requesting a connection to another database.
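For example, in a DB2 script the pattern looks like this (the database names are illustrative):

    CONNECT TO salesdb;
    -- ... work against the first database ...
    COMMIT;               -- end the unit of work first
    CONNECT TO reportdb;  -- a new connection is now permitted
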

13. Failed to authenticate the current user against the selected domain: Could not connect to server.
       RC:     The client has an invalid entry in its hosts file
                  The server's listening port might be blocked by a firewall
                  The server is down

       SOL:   Update the hosts file on the client system so that the server hostname can be resolved from the client.
                  Make sure the WebSphere TCP/IP ports are opened by the firewall.
                  Make sure the WebSphere application server is running. (OR)
                  Restart WebSphere services.



14. The connection was refused or the RPC daemon is not running (81016)
         RC: The dsrpcd process must be running in order to log in to DataStage.
               If you restart DataStage but the socket used by dsrpcd (default 31538) was busy,
               dsrpcd will fail to start. The socket may be held by dsapi_slave processes that were still running, or recently killed, when DataStage was restarted.

         SOL:   Run "ps -ef | grep dsrpcd" to confirm the dsrpcd process is not running.
                    Run "ps -ef | grep dsapi_slave" to check whether any dsapi_slave processes exist; if so, kill them.
                    Run "netstat -a | grep dsrpc" to see if any processes have sockets that are ESTABLISHED, FIN_WAIT, or CLOSE_WAIT. These will prevent dsrpcd from starting. The sockets with status FIN_WAIT or CLOSE_WAIT will eventually time out and disappear, allowing you to restart DataStage. Then restart DSEngine. If the above doesn't work, reboot the system.
   
   15. "Run time error '457'. This Key is already associated with an element of this collection."
   SOL:   Needs to rebuild repository objects.
a)     Login to the Administrator client
b)     Select the project
c)     Click on Command
d)     Issue the command ds.tools
e)     Select option ‘2’
f)      Keep clicking next until it finishes.

g)     All objects will be updated.


Errors/Warnings and Solutions (quick reference)

Error/Warning: A sequential operator cannot preserve the partitioning of input data set on input port 0
Solution: Clear the preserve partition flag before Sequential File stages.

Error/Warning: A user defined sort operator does not satisfy the requirements.
Solution: Check the order of the sorting columns, and make sure to use the same order when using a Join stage after the Sort stage to join two inputs.

Error/Warning: Xfm_header,1: Conversion error calling conversion routine timestamp_from_string; data may have been lost. xfm,1: Conversion error calling conversion routine decimal_from_string; data may have been lost.
Solution: Check for the correct date or decimal format, and for null values in the date or decimal fields, before passing them to the DataStage StringToDate, DateToString, DecimalToString or StringToDecimal functions.

Error/Warning: Join_Outer: When checking operator: Dropping component “Field_Name” because of a prior component with the same name.
Solution: If you are using Join, Diff, Merge or Compare stages, make sure both links have different column names other than the key columns.

Error/Warning: oracle_source: When checking operator: When binding output interface field “Field1” to field “Field1”: Converting a nullable source to a non-nullable result.
Solution: If you are reading from an Oracle database, or in any processing stage where the incoming column is defined as nullable, and you define the metadata in DataStage as non-nullable, you will get this issue. To convert a nullable field to non-nullable, apply the available null functions in DataStage or in the extract query, e.g. null functions in Oracle: NVL, NVL2; in DataStage: IsNull, NullToEmpty, NullToZero (see the sketch after this table).

Error/Warning: ds_Trailer_Rec: When checking operator: When binding output schema variable "outRec": When binding output interface field "TrailerDetailRecCount" to field "TrailerDetailRecCount": Implicit conversion from source type "ustring" to result type "string[max=255]": Possible truncation of variable length ustring when converting to string using codepage ISO-8859-1.
Solution: In the extended column under the metadata of the Transformer, change the type to Unicode.

Error/Warning: Syntax error: Error in "group" operator: Error in output redirection: Error in output parameters: Error in modify adapter: Error in binding: Could not find type: "subrec", line 35
Solution: This is an issue with the level number of the columns added in the Transformer: the level number was blank, while the columns taken from the CFF file had it as 02.

Error/Warning: Agg_stg: When checking operator: When binding input interface field “column1” to field “column1”: Implicit conversion from source type “string[5]” to result type “dfloat”: Converting string to number.
Solution: Use an explicit data type conversion.

Error/Warning: Join_sort: When checking operator: Data claims to already be sorted on the specified keys; the ‘sorted’ option can be used to confirm this. Data will be resorted as necessary. Performance may improve if this sort is removed from the flow.
Solution: Sort the data before sending it to the Join stage, and check the order of the sorting keys and join keys to make sure both are in the same order.
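
For the nullable-to-non-nullable warning above, a minimal sketch of handling the null in the extract query (Oracle-style; the table and column names are illustrative):

    -- Replace NULLs at extract time so the DataStage column can be non-nullable.
    SELECT customer_id,
           NVL(state, 'UNKNOWN') AS state   -- DataStage-side equivalent: NullToEmpty(state)
    FROM   customers;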