A. What is DataStage Parallel Extender / Enterprise Edition (EE)?
Parallel Extender is the parallel processing engine of DataStage: it runs data extraction and transformation applications in parallel. There are two types of parallel processing (a conceptual sketch follows below):
1) Pipeline Parallelism
2) Partition Parallelism
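As a rough illustration (plain Python, not DataStage itself), pipeline parallelism means a downstream stage consumes rows while the upstream stage is still producing them, and partition parallelism means the same logic runs independently on each partition of the data:

# Pipeline parallelism: generators model the streaming behaviour --
# transform() consumes each row as extract() produces it, instead of
# waiting for the full dataset (real stages run as concurrent processes).
def extract():
    for n in range(6):
        yield n

def transform(rows):
    for n in rows:
        yield n * 10

pipelined = list(transform(extract()))

# Partition parallelism: the same transform applied independently to
# each partition (here the rows are split round-robin across 2 "nodes").
partitions = [[n for n in range(6) if n % 2 == p] for p in range(2)]
partitioned = [list(transform(part)) for part in partitions]
print(pipelined, partitioned)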
B. What is a conductor node?
Ans -> Every parallel job run contains a conductor process (where execution starts), a section leader process for each processing node, a player process for each set of combined operators, and an individual player process for each uncombined operator.
Whenever we want to kill a job we must destroy the player processes first, then the section leader processes, and finally the conductor process.
C. How do you execute a DataStage job from the command line prompt?
Use the "dsjob" command, for example:
$dsjob -run -jobstatus projectname jobname
dsjob also supports options such as:
-stop - stop a running job
-lprojects - list the projects
-ljobs - list the jobs in a project
-lstages - list the stages present in a job
-llinks - list the links
-projectinfo - return project information (host name and project name)
-jobinfo - return job information (job status, run time, end time, etc.)
-stageinfo - return stage information (stage name, stage type, input rows, etc.)
-linkinfo - return link information
-lparams - list the parameters in a job
-paraminfo - return parameter information
-log - add a text message to the log
-logsum - display the log
-logdetail - display the log with details such as event id, time and message
-lognewest - display the newest log id
-report - display a report containing generated time, start time, elapsed time, status, etc.
-jobid - job id information
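A minimal Python sketch of scripting these commands (assuming dsjob is on the PATH; the project and job names are placeholders):

import subprocess

project, job = "projectname", "jobname"   # placeholder names

# Run the job; with -jobstatus, dsjob waits for completion and its exit
# code reflects the finished job's status rather than plain success/failure.
run = subprocess.run(["dsjob", "-run", "-jobstatus", project, job])
print("job status code:", run.returncode)

# Print the job information and the log summary listed above.
subprocess.run(["dsjob", "-jobinfo", project, job])
subprocess.run(["dsjob", "-logsum", project, job])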
D. Difference between sequential file, dataset and fileset?
Sequential File:
1. Extracts/loads from/to a sequential file of at most 2 GB.
2. When used as a source, the data is converted from ASCII to the native format at compile time.
3. Does not support null values.
4. A sequential file can only be accessed on one node.
Dataset:
1. It preserves partitioning. It stores data on the nodes, so when you read from a dataset you do not have to repartition the data.
2. It stores data in binary in the internal format of DataStage, so it takes less time to read/write from a dataset to any other source/target.
3. You cannot view the data without DataStage.
4. It creates two types of files to store the data:
A) Descriptor file: created in the defined folder/path.
B) Data files: created in the dataset folders mentioned in the configuration file.
5. A dataset (.ds) file cannot be opened directly; instead, use the Data Set Management utility in the client tools (such as Designer and Manager) or the ORCHADMIN command line tool.
Fileset:
1. It stores data in a format similar to that of a sequential file. The only advantage of using a fileset over a sequential file is that it preserves the partitioning scheme.
2. You can view the data, but in the order defined by the partitioning scheme.
3. A fileset creates a .fs file, which is stored in ASCII format, so you can open it directly to see the paths of the data files and their schema.
# What are the main differences between the Lookup, Join and Merge stages?
All are used to join tables, but they differ as follows.
Lookup: used when the reference data is small, because the reference data is held in memory. If the reference data is very large, it will take a long time to load and to perform the lookup.
Join: used when the reference data is very large, because it accesses the data directly from disk, so the processing time is less compared to a lookup. However, in a join we cannot capture the rejected data, so for that we go for merge.
Merge: used when we want to capture rejected data (rows where the join key is not matched). For every update link there is a reject link to capture rejected data.
Significant differences that I have noticed are:
1) Number of reject links:
(Join) does not support a reject link.
(Merge) has as many reject links as there are update links (if there are n input links, then 1 will be the master link and n-1 will be update links).
2) Data selection:
(Join) There are various ways in which data is selected, e.g. there are different types of joins (inner; left, right and full outer; cross join; etc.), so you have different selection criteria for dropping/selecting a row.
(Merge) Data in the master record and update records is merged only when both have the same value for the merge key columns.
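A conceptual Python sketch (not DataStage code) of the in-memory lookup versus the merge-with-reject-link behaviour:

# Normal in-memory lookup: build a dict from the (small) reference data,
# then probe it for each input row - fast only while the dict fits in memory.
reference = {1: "NY", 2: "CA"}                       # hypothetical reference rows
source = [(1, "Smith"), (2, "Yamada"), (3, "Lee")]
looked_up = [(key, name, reference.get(key)) for key, name in source]

# Merge-style processing: rows whose key has no match on the master link
# go to a reject link instead of being silently dropped (as in a join).
master_keys = set(reference)
merged = [row for row in source if row[0] in master_keys]
rejects = [row for row in source if row[0] not in master_keys]
print(looked_up, merged, rejects)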
# What are the different types of lookup? When should one use a sparse lookup in a job?
In DataStage 7.5, two types of lookup are available: 1. Normal 2. Sparse
From DataStage 8.0.1 onwards, three types are available: 1. Normal 2. Sparse 3. Range
Normal lookup: the reference data is first loaded into memory and then the lookup is performed, so it takes more execution time if the reference data is high in volume. A normal lookup takes the entire table into memory and performs the lookup there.
Sparse lookup: the SQL query is fired directly on the database for each input record, so the lookup is performed at the database level; this is faster than a normal lookup when the reference table is too large to hold in memory. That is, the reference link is directly connected to a DB2/OCI stage, and a query is fired on the DB table one record at a time to fetch the result. (A conceptual sketch follows the range example below.)
Range lookup: this helps you search records within a particular range. It searches only the records in that range instead of the entire record set, which gives good performance. Define the range expression by selecting the upper bound and lower bound range columns and the required operators.
For example:
For example:
Account_Detail.Trans_Date >= Customer_Detail.Start_Date AND
Account_Detail.Trans_Date <= Customer_Detail.End_Date
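A conceptual Python sketch of the normal-versus-sparse trade-off, using an in-memory SQLite table to stand in for the database:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, region TEXT)")
con.executemany("INSERT INTO ref VALUES (?, ?)", [(1, "NY"), (2, "CA")])

source = [1, 3, 2]   # hypothetical input keys

# Normal lookup: pull the whole reference table into memory once.
in_memory = dict(con.execute("SELECT id, region FROM ref"))
normal = [in_memory.get(key) for key in source]

# Sparse lookup: fire one parameterized query per input row.
sparse = []
for key in source:
    row = con.execute("SELECT region FROM ref WHERE id = ?", (key,)).fetchone()
    sparse.append(row[0] if row else None)

print(normal, sparse)   # ['NY', None, 'CA'] both times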
# What are the use and types of the Funnel stage in DataStage?
The Funnel stage is a processing stage. It copies multiple input data sets to a single output data set. This operation is useful for combining separate data sets into a single large data set. The stage can have any number of input links and a single output link.
The Funnel stage can operate in one of three modes:
- Continuous Funnel combines the records of the input data in no guaranteed order. It takes one record from each input link in turn. If data is not available on an input link, the stage skips to the next link rather than waiting.
- Sort Funnel combines the input records in the order defined by the value(s) of one or more key columns and the order of the output records is determined by these sorting keys.
- Sequence copies all records from the first input data set to the output data set, then all the records from the second input data set, and so on.
For all methods the metadata of all input data sets must be identical, and the column names must be the same on all input links.
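A rough Python sketch of the three modes (plain lists standing in for input links with identical metadata):

import heapq, itertools

a = [(1, "a1"), (3, "a3")]   # hypothetical input links, already
b = [(2, "b2"), (4, "b4")]   # sorted on the key for the sort funnel

# Sequence mode: all of the first input, then all of the second, and so on.
sequence = list(itertools.chain(a, b))

# Sort Funnel mode: combine the inputs in key order; heapq.merge performs
# exactly this kind of k-way merge of pre-sorted inputs.
sort_funnel = list(heapq.merge(a, b, key=lambda row: row[0]))

# Continuous mode guarantees no order; round-robin interleaving is one
# possible outcome when every link has a row ready.
continuous = [row for pair in itertools.zip_longest(a, b) for row in pair if row]
print(sequence, sort_funnel, continuous)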
# What is the difference between link sort and the Sort stage?
If the volume of data is low, we go for a link sort.
If the volume of data is high, we go for the Sort stage.
"Link Sort" uses scratch disk (physical location on
disk), whereas
"Sort Stage" uses server RAM (Memory). Hence we can
change the default memory size in "Sort Stage"
Using the Sort stage you can create a key change column, which is not possible with a link sort.
Within the Sort stage you can increase the memory size per partition, and you can set the 'don't sort' option on sort keys that are already sorted.
A link sort and a stage sort both do the same thing. Only the Sort stage provides you with more options, such as the amount of memory to be used, removing duplicates, sorting in ascending or descending order, creating key change columns, etc. These options are not available while using a link sort.
# Details of data partitioning and collecting methods in DataStage?
Partitioning divides data into smaller segments, which are then processed independently by each node in parallel. It takes advantage of parallel architectures like SMP, MPP, grid computing and clusters. The partitioning methods are:
- Auto
- Same
- Round robin
- Hash
- Entire
- Random
- Range
- Modulus
Collecting is the opposite of partitioning and can be defined as the process of bringing data partitions back into a single sequential stream (one data partition). The collecting methods are:
1. Auto
2. Round Robin
3. Ordered
4. Sort Merge
** Data partitioning methods: DataStage supports several data partitioning methods, which can be implemented in parallel stages:
· Auto - the default. DataStage Enterprise Edition decides between using Same or Round Robin partitioning. Typically Same partitioning is used between two parallel stages, and Round Robin is used between a sequential stage and an EE stage.
· Same - the existing partitioning remains unchanged. No data is moved between nodes.
· Round robin - rows are alternated evenly across partitions. This partitioning method guarantees an exact load balance (the same number of rows processed) between nodes and is very fast.
· Hash - rows with the same key column (or multiple columns) go to the same partition. Hash is used very often and sometimes improves performance; however, it is important to keep in mind that hash partitioning does not guarantee load balance, and misuse may lead to skewed data and poor performance.
· Entire - all rows from a dataset are distributed to each partition. Rows are duplicated and the data volume is significantly increased.
· Random - rows are randomly distributed across partitions.
· Range - an expensive refinement of hash partitioning. It is similar to hash, but the partition mapping is user-determined and the partitions are ordered. Rows are distributed according to the values in one or more key fields, using a range map (the Write Range Map stage needs to be used to create it). Range partitioning requires processing the data twice, which makes it hard to find a reason for using it.
· Modulus - data is partitioned on one specified numeric field by calculating the modulus of the field value against the number of partitions. Not used very often.
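A conceptual Python sketch (not DataStage internals) of how round robin, hash and modulus assign rows to partitions:

NUM_PARTITIONS = 4

def round_robin(rows):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for i, row in enumerate(rows):
        parts[i % NUM_PARTITIONS].append(row)   # exact load balance
    return parts

def hash_partition(rows, key):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        # The same key value always lands in the same partition, but
        # nothing guarantees the partitions stay balanced (skew risk).
        parts[hash(row[key]) % NUM_PARTITIONS].append(row)
    return parts

def modulus_partition(rows, key):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[row[key] % NUM_PARTITIONS].append(row)   # numeric field only
    return parts

rows = [{"id": n, "name": "r%d" % n} for n in range(10)]
print(modulus_partition(rows, "id"))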
** Data collecting methods: a collector combines partitions into a single sequential stream. DataStage parallel jobs support the following collecting algorithms:
· Auto - the default algorithm reads rows from a partition as soon as they are ready. This may produce different row orders in different runs with identical data; the execution is non-deterministic.
· Round Robin - picks rows from the input partitions patiently, for instance: first row from partition 0, next from partition 1, even if other partitions can produce rows faster than partition 1.
· Ordered - reads all rows from the first partition, then the second partition, then the third, and so on.
· Sort Merge - produces a globally sorted sequential stream from rows that are sorted within each partition, using the following algorithm: always pick the partition that produces the row with the smallest key value. The resulting order of un-keyed columns is non-deterministic.
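A minimal sketch of the Sort Merge collector as a k-way merge, assuming every partition is already sorted on the key:

import heapq

def sort_merge_collect(partitions, key=lambda row: row[0]):
    # Seed the heap with the first row of each non-empty partition.
    heap = [(key(p[0]), i, 0) for i, p in enumerate(partitions) if p]
    heapq.heapify(heap)
    out = []
    while heap:
        k, i, j = heapq.heappop(heap)        # partition with the smallest key
        out.append(partitions[i][j])
        if j + 1 < len(partitions[i]):       # advance within that partition
            heapq.heappush(heap, (key(partitions[i][j + 1]), i, j + 1))
    return out

print(sort_merge_collect([[(1, "a"), (4, "d")], [(2, "b"), (3, "c")]]))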
# Removing duplicates using the Sort stage and the Remove Duplicates stage - what is the difference?
We can remove duplicates using both stages, but in the Sort stage we can also flag duplicate records, using the create key change column property.
1) The advantage of using the Sort stage over the Remove Duplicates stage is that the Sort stage allows us to capture the duplicate records, whereas the Remove Duplicates stage does not.
2) Using a Sort stage we can only retain the first record per key. When we need to retain the last record instead, we sort the relevant field in the opposite (descending) order, so that the record we want becomes the first one.
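A small Python sketch of retain-first deduplication plus a key-change-style flag, assuming the rows are already sorted on the key:

from itertools import groupby

rows = [(100, "a"), (100, "b"), (101, "c")]   # already sorted on the key

# Remove Duplicates behaviour: keep only the first row per key.
first_per_key = [next(group) for _, group in groupby(rows, key=lambda r: r[0])]

# Sort stage key change column: 1 on the first row of each key group, else 0.
flagged = []
for _, group in groupby(rows, key=lambda r: r[0]):
    for i, row in enumerate(group):
        flagged.append(row + ((1 if i == 0 else 0),))
print(first_per_key)   # [(100, 'a'), (101, 'c')]
print(flagged)         # [(100, 'a', 1), (100, 'b', 0), (101, 'c', 1)]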
# What is the difference between the Copy and Transformer stages?
A Copy stage has no constraints or derivations, so it should surely perform better than a Transformer. If you want a copy of a dataset, use the Copy stage; if there are business rules to be applied to the dataset, use the Transformer stage.
We also use the Copy stage to change the metadata of the input dataset (for example, renaming a column).
# What is the use of the Pivot Enterprise stage?
The Pivot Enterprise stage is a processing stage that pivots data horizontally and vertically.
· Specifying a horizontal pivot operation: use the Pivot Enterprise stage to horizontally pivot data, mapping sets of input columns onto single output columns.
Table 1. Input data for a simple horizontal pivot operation
REPID | last_name | Jan_sales | Feb_sales | Mar_sales
100   | Smith     | 1234.08   | 1456.80   | 1578.00
101   | Yamada    | 1245.20   | 1765.00   | 1934.22
Table 2. Output data for a simple horizontal pivot operation
REPID | last_name | Q1sales | Pivot_index
100   | Smith     | 1234.08 | 0
100   | Smith     | 1456.80 | 1
100   | Smith     | 1578.00 | 2
101   | Yamada    | 1245.20 | 0
101   | Yamada    | 1765.00 | 1
101   | Yamada    | 1934.22 | 2
· Specifying a vertical pivot operation: use the Pivot Enterprise stage to vertically pivot data and then map the resulting columns onto the output columns.
Table 1. Input data for a vertical pivot operation
REPID | last_name | Q_sales
100   | Smith     | 1234.08
100   | Smith     | 1456.80
100   | Smith     | 1578.00
101   | Yamada    | 1245.20
101   | Yamada    | 1765.00
101   | Yamada    | 1934.22
Table 2. Output data for a vertical pivot operation
REPID | last_name | Q_sales (January) | Q_sales1 (February) | Q_sales2 (March) | Q_sales_average
100   | Smith     | 1234.08           | 1456.80             | 1578.00           | 1422.96
101   | Yamada    | 1245.20           | 1765.00             | 1934.22           | 1648.14
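As a rough illustration (plain Python, not the stage itself), the horizontal pivot turns the three monthly columns into rows with a pivot index, and the vertical pivot groups those rows back into columns plus the average:

input_rows = [
    {"REPID": 100, "last_name": "Smith", "Jan_sales": 1234.08,
     "Feb_sales": 1456.80, "Mar_sales": 1578.00},
    {"REPID": 101, "last_name": "Yamada", "Jan_sales": 1245.20,
     "Feb_sales": 1765.00, "Mar_sales": 1934.22},
]

# Horizontal pivot: one output row per pivoted column, plus Pivot_index.
horizontal = [
    {"REPID": r["REPID"], "last_name": r["last_name"],
     "Q1sales": r[col], "Pivot_index": i}
    for r in input_rows
    for i, col in enumerate(["Jan_sales", "Feb_sales", "Mar_sales"])
]

# Vertical pivot: group the pivoted rows back into one row per REPID.
vertical = {}
for row in horizontal:
    rep = vertical.setdefault(row["REPID"],
                              {"last_name": row["last_name"], "sales": []})
    rep["sales"].append(row["Q1sales"])
for rep in vertical.values():
    rep["Q_sales_average"] = round(sum(rep["sales"]) / len(rep["sales"]), 2)
print(horizontal[0], vertical[100])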
Stage variable - an intermediate processing variable that retains its value during a read and does not pass the value into a target column.
Derivation - an expression that specifies the value to be passed on to the target column.
Constraint - like a filter condition that limits the number of records coming from the input according to a business rule.
The order of execution in a Transformer is: stage variables, then constraints, then derivations.
# What is the difference between the Change Capture and Change Apply stages?
The Change Capture stage is used to get the difference between two sources, i.e. an after dataset and a before dataset. The source used as the reference to capture the changes is called the after dataset; the source in which we are looking for the changes is called the before dataset. Change Capture adds one field, called the change code, to its output. From this change code one can recognize what kind of change a row represents: delete, insert or update.
The Change Apply stage is used along with the Change Capture stage. It takes the change codes from the Change Capture stage and applies all the changes to the before dataset based on the change code.
In short: Change Capture captures the changes between the two sources, and Change Apply applies those changes to produce the output file.
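A rough Python sketch of the idea (the numeric change-code values here are illustrative; in the real stages they are configurable properties):

INSERT, DELETE, EDIT = 1, 2, 3   # illustrative change codes

before = {1: "Smith", 2: "Yamada"}          # key -> value
after = {2: "Yamada-Jones", 3: "Lee"}

# Change Capture: compare after against before and emit change records.
changes = []
for key in before.keys() - after.keys():
    changes.append((key, None, DELETE))
for key, value in after.items():
    if key not in before:
        changes.append((key, value, INSERT))
    elif before[key] != value:
        changes.append((key, value, EDIT))

# Change Apply: replay the change records onto the before dataset.
result = dict(before)
for key, value, code in changes:
    if code == DELETE:
        del result[key]
    else:                                   # INSERT or EDIT
        result[key] = value
assert result == after
print(changes, result)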