We need to identify either the
transformation is Synchronous (or) Asynchronous.
Asynchronous transformation always takes
more time to run then Synchronous.
· Synchronous Transformation:
Here the rows are processed as it is getting from the
source.
Ex: Derived column, data conversion etc.
· Asynchronous Transformation:
In this case until the required rows, the system
collects and it doesn’t perform any operation.
It requires more memory and processes many rows.
These transformations either blocks partially or fully.
So these transformation can also called as “Partially blocking or Full blocking”
transformations.
Ex: Sort, Aggregate, Merge etc.
· Remove unused columns from the pipeline:
Extract required data from source system
like table or file. Need to avoid the tendency to pull everything available on
the source. If you can follow the same method it will impact the future, it
eats up network bandwidth, consumes system resource (I/O & CPU), requires
extra storage, and it kills the overall performance of the ETL System.
· Avoid unnecessary sorting in
the pipeline:
Behaviour of sorting is to kills the
performance and avoid unnecessary sorting can enhance the performance of the
package flow.
Sometimes the source data has already been sorted before
going to the next steps. Such pre-sorting can occur when the SELECT query used
an ORDER BY clause or when the data was inserted into the source in sorted
order. For such pre-sorted source data, you can provide a hint that the data is
sorted, and thereby avoid the use of a Sort transformation to satisfy the
sorting requirements of certain downstream transformations. (For example, the
Merge and Merge Join transformations require sorted inputs.)
To
provide a hint that the data is sorted, you must do the following tasks:
·
Set the IsSorted property on
the output of an upstream data flow component to True.
·
Specify the sort key columns on
which the data is sorted.
No comments:
Post a Comment