Lookup, Rank and Normalizer transformation in Informatica
May, 2006
Prepared at
INFOSYS TECHNOLOGIES LIMITED
India
Document Name: Lookup, Rank and Normalizer transformation in Informatica.doc
Version Rev.: 0.0a
Author’s Name: Rajat Kashyap
Author’s Email:

Table of Contents
1. Transformations Overview
1.1 Active Transformation
1.2 Passive Transformation
2. Lookup Transformation
2.1 Connected lookup
2.2 Unconnected lookups
2.3 Specifying database location for a Lookup transformation
2.4 SQL override in Lookups
3. Rank Transformation
3.1 The Rank Port
3.2 Rank Index
4. Normalizer Transformation
5. References
1. Transformations Overview


A transformation is a repository object that generates, modifies, or passes data.
Transformations, in a mapping, represent the operations the Informatica Server performs
on the data. Data passes into and out of transformations through ports which are linked in
a mapping or mapplet.
Transformations can be broadly classified as Active/Passive or Connected/
Unconnected.
1.1 Active Transformation
An Active transformation is one that can increase or reduce the number of rows passing
through it. An example of an Active transformation is the Filter transformation, which
passes only those rows that match a specified condition or criterion. Another example
is the Normalizer, which can increase the number of rows passing through it.
1.2 Passive Transformation
A Passive transformation does not change the number of rows that pass through it, such
as an Expression transformation that performs a calculation on data and passes all rows
through the transformation.

Transformations can be Connected or Unconnected.
Connected transformations are connected to other transformations. An unconnected
transformation is not connected to other transformations in the mapping. It is called
within another transformation, and returns a value to that transformation. Filter, Joiner
and Expression are examples of connected transformations, whereas Lookup and
Stored Procedure transformations can be either connected or unconnected.
2. Lookup Transformation
The Lookup transformation is a Passive transformation and can be either Connected or
Unconnected.
It is used to look up data in a relational table or view. A lookup definition can be imported
either from source or from target tables. You can import a lookup definition from any relational
database to which both the Informatica Client and Server can connect. You can use
multiple Lookup transformations in a mapping.
You can create a lookup by clicking Transformations > Create > Lookup in the Designer.

Fig Lkp1.1
Lookup definition can be imported either from source or from target tables.
Fig Lkp1.2
Lookups can be configured to be connected or unconnected, cached or uncached.
The Informatica Server queries the lookup table based on the lookup ports in the
transformation. It compares Lookup transformation port values to lookup table column
values based on the lookup condition. Use the result of the lookup to pass to other
transformations and the target.
Connected and unconnected Lookup transformations receive input and send output
in different ways.
You can improve session performance by caching the lookup table. If you cache the
lookup table, you can choose to use a dynamic or static cache. By default, the lookup
cache remains static and does not change during the session. With a dynamic cache, the
Informatica Server inserts or updates rows in the cache during the session. When you
cache the target table as the lookup, you can look up values in the target and insert them
if they do not exist, or update them if they do.
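The difference between a static and a dynamic cache can be sketched in Python. This is only an illustrative model under stated assumptions, not how the Informatica Server is implemented: a dictionary stands in for the lookup cache, and the helper names `lookup_static` and `lookup_dynamic` and the sample data are hypothetical.

```python
# Illustrative model only: a dict stands in for the lookup cache.

def lookup_static(cache, key):
    """Static cache: read-only for the whole session."""
    return cache.get(key)  # None when there is no match


def lookup_dynamic(cache, key, row):
    """Dynamic cache: insert the row if the key is new, update it if it differs."""
    if key not in cache:
        cache[key] = row
        return "insert"
    if cache[key] != row:
        cache[key] = row
        return "update"
    return "unchanged"


# Hypothetical target table cached as the lookup.
target_cache = {101: {"last_name": "Rao"}}

print(lookup_static(target_cache, 999))                           # key not cached
print(lookup_dynamic(target_cache, 102, {"last_name": "Iyer"}))   # new key
print(lookup_dynamic(target_cache, 101, {"last_name": "Rao"}))    # identical row
print(lookup_dynamic(target_cache, 101, {"last_name": "Roy"}))    # changed row
```

The value returned by `lookup_dynamic` plays the role of the decision a mapping would make downstream: insert the row into the target, update it, or do nothing.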
You can configure a connected Lookup transformation to receive input directly from the
mapping pipeline, or you can configure an unconnected Lookup transformation to receive
input from the result of an expression in another transformation.
2.1 Connected lookup
A connected lookup receives input values directly from the pipeline, uses a dynamic or
static cache, and can return multiple columns from the same row or insert into the
dynamic lookup cache. If there is no match for the lookup condition, the Informatica
Server returns the default value for all output ports. If you configure dynamic caching,
the Informatica Server inserts rows into the cache or leaves it unchanged.
If there is a match for the lookup condition, the Informatica Server returns the result of
the lookup condition for all lookup/output ports. A connected lookup passes multiple output
values to another transformation, links its lookup/output ports to another transformation, and
also supports user-defined default values.
For each input row, the Informatica Server queries the lookup table or cache based on the
lookup ports and the condition in the transformation.
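The per-row behaviour of a connected lookup can be pictured with a small Python sketch. It assumes a hypothetical in-memory lookup table keyed by EmployeeId and hypothetical default values; the function name `connected_lookup` is illustrative and not part of any Informatica API.

```python
# Hypothetical lookup table keyed by EmployeeId.
LOOKUP_TABLE = {
    1: {"FirstName": "Asha", "LastName": "Rao"},
    2: {"FirstName": "Vikram", "LastName": "Iyer"},
}

# Hypothetical user-defined defaults for the lookup/output ports.
DEFAULTS = {"FirstName": "UNKNOWN", "LastName": "UNKNOWN"}


def connected_lookup(row):
    """Return the input row extended with every lookup/output port."""
    match = LOOKUP_TABLE.get(row["EmployeeId"])
    # No match: every output port falls back to its default value.
    outputs = match if match is not None else DEFAULTS
    return {**row, **outputs}


# Rows arrive straight from the pipeline, one lookup per input row.
pipeline = [{"EmployeeId": 1}, {"EmployeeId": 7}]
for out_row in map(connected_lookup, pipeline):
    print(out_row)
```

Each input row yields exactly one output row, and several columns can come back from the same lookup row.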
Fig Lkp1.3
Here, in Fig Lkp1.3, EmployeeId is the input port, and the lookup condition is
based on it. If you are creating a shared lookup that will be used in different mappings
with different conditions, create as many input ports as required.
You can specify the condition for lookup in the condition tab:
Fig Lkp1.4
Fields are fetched based on the above lookup condition. While creating a shared
lookup, make sure that you specify at least one lookup condition. You can add further
conditions based on your business needs.
Fig Lkp 1.5
2.2 Unconnected lookups
An unconnected lookup receives input values from the result of a :LKP expression in
another transformation. You can use only a static cache here, and you designate one return
port (R), i.e. you can return only one column from each row.
In unconnected lookups, if there is no match for the lookup condition, the Informatica
Server returns NULL. If there is a match for the lookup condition, the Informatica Server
returns the result of the lookup condition into the return port.
Fig Lkp1.6
The lookup/output/return port passes the value to the transformation calling the :LKP
expression. The general format of the expression is:
:LKP.lookup_transformation_name(argument, argument, ...)
The arguments are local input ports that match the Lookup transformation input ports
used in the lookup condition.

Follow these guidelines when writing an expression that calls an unconnected
Lookup transformation:
• The order in which you list each argument must match the order of the lookup
conditions in the Lookup transformation. Also, the number of arguments must match
the number of lookup conditions and input ports in the lookup.
• The datatypes for the ports in the expression must match the datatypes for the
input ports in the Lookup transformation. The Designer does not validate the
expression if the datatypes do not match.
• If one port in the lookup condition is not a lookup/output port, the Designer does
not validate the expression.
• The arguments (ports) in the expression must be in the same order as the input
ports in the lookup condition.
• If you use incorrect :LKP syntax, the Designer marks the mapping invalid.
• If you call a connected Lookup transformation in a :LKP expression, the Designer
marks the mapping invalid.
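As a rough analogy, an unconnected lookup behaves like a function that is called with positional arguments and hands back a single value. The Python sketch below assumes hypothetical lookup data and hypothetical names (`lkp_employee`, `exp_lookup_employee`); it mirrors the calling pattern only and is not Informatica expression syntax.

```python
# Hypothetical lookup data: Employee_ID -> last name (the single return port).
EMPLOYEE_LAST_NAMES = {1: "Rao", 2: "Iyer"}


def lkp_employee(employee_id):
    """Stand-in for an unconnected lookup: one return value, None if no match."""
    return EMPLOYEE_LAST_NAMES.get(employee_id)


def exp_lookup_employee(row):
    """Stand-in for the calling expression.

    Arguments are passed by position, in the order of the lookup conditions.
    """
    return {**row, "Last_Name": lkp_employee(row["Employee_ID"])}


print(exp_lookup_employee({"Employee_ID": 2}))
print(exp_lookup_employee({"Employee_ID": 5}))  # no match gives None, like NULL
```

Because the call is positional, the order and number of arguments carry all the meaning, which is why the guidelines above insist on matching them to the lookup conditions.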
Fig Lkp1.7
In the figure above, an unconnected lookup is used. The expression exp_lookupEmployee
makes a call to the unconnected lookup to fetch the last name of the Employee based on
Employee_ID.
Fig Lkp1.8
Fig Lkp1.9
Unconnected lookups have a major drawback. If you create a shared unconnected
lookup and someone else later changes it to suit their own requirements by adding
an additional input port, all existing calls to that lookup become invalid because they
no longer match the order and number of the input ports in the lookup.
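The same failure mode appears in any positional calling convention. The Python sketch below is an analogy only, with hypothetical data and function names: `lkp_employee_v1` stands for the original shared lookup and `lkp_employee_v2` for the version after a second input port has been added.

```python
# Hypothetical lookup data keyed by (employee id, department id).
EMPLOYEES = {(1, 10): "Rao", (2, 20): "Iyer"}


def lkp_employee_v1(employee_id):
    """Original shared lookup: one input port."""
    for (emp_id, _dept_id), last_name in EMPLOYEES.items():
        if emp_id == employee_id:
            return last_name
    return None


def lkp_employee_v2(employee_id, department_id):
    """The same lookup after someone adds a second input port."""
    return EMPLOYEES.get((employee_id, department_id))


def call_is_valid(lookup, *args):
    """Report whether an existing positional call still fits the lookup's inputs."""
    try:
        lookup(*args)
        return True
    except TypeError:  # wrong number of positional arguments
        return False


print(call_is_valid(lkp_employee_v1, 1))      # existing call, original lookup
print(call_is_valid(lkp_employee_v2, 1))      # existing call, changed lookup
print(call_is_valid(lkp_employee_v2, 1, 10))  # call updated to the new signature
```

Every caller written against the one-argument version has to be revisited once the extra port exists, which is the maintenance cost described above.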
2.3 Specifying database location for a Lookup transformation
While configuring a Lookup transformation, you can use either the $Source or $Target
variable to specify its database location. You set these variables in the Location
Information property of the Lookup transformation.
Fig Lkp1.10
When you configure a session, you can specify a database connection value for $Source
or $Target. This ensures the Informatica Server uses the correct database connection for
the variable when it runs the session.
These parameters are passed from the Unix script when the ETL runs. You can also
hardcode the connection name, but this is not considered good practice and should be
limited to testing purposes only.
There may be a requirement to look up a table that is in neither the source database
nor the target database. In my project, for example, I had to code
ETLs that move data from one table in Stage to another. In a scenario like this,
if you are looking up a table from the warehouse, you cannot parameterize the database
location, so you have to hardcode the lookup database location.
2.4 SQL override in Lookups
You can use the SQL override property, while configuring the lookup, to override the
default SQL statement used to query the lookup table. It specifies the SQL statement you want
the Informatica Server to use for querying lookup values, and it can be used only when
the lookup cache is enabled. Enter only the SELECT, FROM, and WHERE clauses when
you enter the SQL override.
By default, the Informatica Server generates an ORDER BY statement for a cached
lookup that contains all lookup ports. To increase performance, you can suppress the
default ORDER BY statement and enter an override ORDER BY with fewer columns.
To override the default ORDER BY statement, specify the ORDER BY statement and
place a comment notation after the ORDER BY statement to suppress the default
ORDER BY statement that the Informatica Server generates.
Make sure that the ORDER BY statement contains the condition ports in the same order as they
appear in the lookup condition; otherwise the session will fail.
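The suppression trick can be modelled in a few lines of Python. The sketch rests on two assumptions that are not stated in this document: that the generated ORDER BY is appended to the end of the override text, and that the comment notation is the SQL-style `--`. The table and column names are hypothetical.

```python
def generated_order_by(lookup_ports):
    """Default clause: ORDER BY over all lookup ports."""
    return "ORDER BY " + ", ".join(lookup_ports)


def effective_query(override_sql, lookup_ports):
    """Assumption: the generated clause is appended to the override text."""
    return override_sql + " " + generated_order_by(lookup_ports)


# Hypothetical lookup ports and table.
lookup_ports = ["EMPLOYEE_ID", "FIRST_NAME", "LAST_NAME"]

override = (
    "SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME "
    "FROM EMPLOYEES "
    "ORDER BY EMPLOYEE_ID --"  # trailing comment marker (assumed notation)
)

# Everything after the comment marker is ignored by the database,
# so only the shorter, hand-written ORDER BY takes effect.
print(effective_query(override, lookup_ports))
```

In this model the hand-written clause orders by the condition port only, while the longer generated clause ends up inside the comment.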
3. Rank Transformation

Rank Transformation is an Active and Connected transformation.
The Rank transformation allows you to rank the incoming data
based on a particular field. It also lets you select only the top or
bottom rank, or a specified number of top or bottom ranks. You can use a Rank transformation to
return the largest or smallest numeric value in a port or group.
The Rank transformation differs from the MIN/MAX functions of the Aggregator in
that MIN or MAX lets you select only one minimum or maximum value from the
data, whereas the Rank transformation allows you to select a set of top or bottom records.
You connect all ports representing the same row set to the transformation. Only the rows
that fall within that rank, based on some measure you set when you configure the
transformation, pass through the Rank transformation.
As an active transformation, the Rank transformation might change the number of rows
passed through it. You might pass 100 rows to the Rank transformation, but select to rank
only the top 10 rows, which pass from the Rank transformation to another transformation.
While configuring the Rank transformation, the Designer asks for the number of ranks. Specify
the number of top or bottom ranks needed in the target.
Fig Rank1.1
You can connect ports from only one transformation to the Rank transformation.
The Rank transformation allows you to create local variables and write non-aggregate
expressions.
During the workflow, the Informatica Server caches input data until it can perform the
rank calculations. Informatica Server compares an input row with rows in the data cache.
If the input row out-ranks a cached row, the Informatica Server replaces the cached row
with the input row.
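The caching behaviour described above resembles a bounded top-N selection. The Python sketch below illustrates that idea with a min-heap from the standard `heapq` module; it is an analogy under that assumption, not the server's actual algorithm, and the function name `top_n` and the sample rows are hypothetical.

```python
import heapq


def top_n(rows, rank_port, n):
    """Keep only the n highest-ranked rows, replacing cached rows that are out-ranked."""
    cache = []  # min-heap of (rank value, arrival order, row)
    for order, row in enumerate(rows):
        entry = (row[rank_port], order, row)
        if len(cache) < n:
            heapq.heappush(cache, entry)
        elif entry[0] > cache[0][0]:
            # The input row out-ranks the weakest cached row: replace it.
            heapq.heapreplace(cache, entry)
    # Highest value first; earlier arrivals first among equal values.
    return [row for _, _, row in sorted(cache, key=lambda e: (-e[0], e[1]))]


rows = [{"Marks": m} for m in (55, 91, 78, 91, 60)]
print(top_n(rows, "Marks", 3))
```

Only n rows are ever held in the cache, however many rows arrive, which is why 100 input rows can leave the transformation as 10.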
3.1 The Rank Port
The Rank transformation includes input/output ports, variable ports and a rank port.
The rank port designates the column whose values you want to rank. Only one
rank port can be used in the transformation, and it must be linked to another
transformation.

Fig Rank1.2
In the example above, ranking is based on the Marks field, i.e. the rows are ranked
by the value of the Marks field.
3.2 Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The Informatica Server uses the Rank Index port to store the ranking position for each
row in a group. It is an output port only and can be passed directly to the target.
If two rank values match, they receive the same value in the rank index and the
transformation skips the next value.
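The tie rule can be made concrete with a short Python sketch. The function name `rank_index` and the sample values are hypothetical; the logic simply follows the rule stated above, where equal values share a rank and the following rank is skipped.

```python
def rank_index(values, top=True):
    """Assign rank positions; equal values share a rank and the next rank is skipped."""
    ordered = sorted(values, reverse=top)
    ranks = []
    for position, value in enumerate(ordered, start=1):
        if ranks and value == ranks[-1][0]:
            ranks.append((value, ranks[-1][1]))  # tie: reuse the previous rank
        else:
            ranks.append((value, position))      # otherwise rank = position
    return ranks


for value, index in rank_index([90, 85, 90, 70]):
    print(value, index)
```

With two rows tied on 90, both receive rank 1, the row with 85 receives rank 3, and rank 2 is never assigned.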
Fig Rank1.3
If you need to rank all the rows and all the ranks are
needed in the target, specify the maximum number of records that can reach the Rank
transformation from the source. If you do not know how many records can
come from the source, specify the maximum number of ranks that the Rank
transformation can handle, i.e. 2147483647. The Rank transformation will then rank
however many records come in.
In my project, I had to rank a record based on eight different
fields. I needed eight Rank transformations, because a Rank transformation can
rank on one and only one field. I also needed eight Expression transformations, one
before each Rank transformation, to hold all the fields along with the previous rank
index.
The Rank transformation also allows you to group information. While configuring the
transformation, you can set one of its input/output ports as a group by port.
For example, to select the top 10 rankers in each class, set the class ID as the group by port (Fig
Rank1.2). For each unique value in the group port, the transformation creates a group of
rows falling within the rank definition.
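Grouped ranking can be sketched in Python as follows. The port names Class_Id and Marks follow the examples in this document; the function `top_n_per_group` and the sample rows are hypothetical, and the sketch makes no claim about how the server builds its groups internally.

```python
from collections import defaultdict


def top_n_per_group(rows, group_port, rank_port, n):
    """Return the n highest-ranked rows for each distinct value of the group port."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_port]].append(row)

    result = {}
    for key, members in groups.items():
        ranked = sorted(members, key=lambda r: r[rank_port], reverse=True)
        result[key] = ranked[:n]  # rows falling within the rank definition
    return result


students = [
    {"Class_Id": "A", "Name": "Asha", "Marks": 88},
    {"Class_Id": "A", "Name": "Ravi", "Marks": 95},
    {"Class_Id": "A", "Name": "Meera", "Marks": 71},
    {"Class_Id": "B", "Name": "Kiran", "Marks": 64},
    {"Class_Id": "B", "Name": "Divya", "Marks": 79},
]

for class_id, top_rows in top_n_per_group(students, "Class_Id", "Marks", 2).items():
    print(class_id, [r["Name"] for r in top_rows])
```

Each class is ranked independently, so the top rows of one class never displace rows of another.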
4. Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.

The Normalizer transformation normalizes records from COBOL and relational sources,
allowing you to organize the data according to your own needs.
Normalization is the process of organizing data. In database terms, this includes creating
normalized tables and establishing relationships between those tables.
A Normalizer transformation is used mainly with COBOL sources, where data is usually
stored in de-normalized format. The Normalizer transformation can also be
used with relational sources to create multiple rows from a single row of data. For each
new record it creates, the Normalizer transformation generates a unique identifier. You
can use this key value to join the normalized records. A Normalizer transformation can
appear anywhere in a data flow when you normalize a relational source.
You cannot drag fields into a Normalizer transformation. You have to add the
fields manually and specify the Level and Occurs attributes for them.
Say you have a Student table which has data pertaining to a particular student. A record
in Student table has marks scored in various subjects. Now, you want to insert the data in
Marks table which will have a row for each subject for each student. Here you want to
create as many rows, to be inserted in Marks table, as there are subjects in the Student
table.
Fig Nor1.1
In the example above, all the fields except Marks have the ‘Occurs’ attribute set to one,
because all the normalized records should have the same Student_ID, Name and
Class_Id as the parent record.
Marks has an ‘Occurs’ value of three because we want to break each record into three records
based on the marks. So each row in the source table will have three corresponding rows
in the target table.
Fig Nor1.2
In the above example, the Normalizer will have three inputs for Marks, one for each subject,
and will create three records, one for each input.
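The Student-to-Marks example can be sketched in Python. Student_ID, Name, Class_Id and Marks come from the example above; the function `normalize`, the output columns `Key` and `Subject_No`, and the sample data are hypothetical stand-ins, with `Key` representing the unique identifier the transformation is said to generate for each new record.

```python
from itertools import count


def normalize(student_rows, occurs=3):
    """Turn one source row holding `occurs` Marks values into `occurs` target rows."""
    keys = count(start=1)  # hypothetical stand-in for the generated unique identifier
    out = []
    for row in student_rows:
        for occurrence, marks in enumerate(row["Marks"][:occurs], start=1):
            out.append({
                "Key": next(keys),
                # Fields with Occurs = 1 are repeated on every output row.
                "Student_ID": row["Student_ID"],
                "Name": row["Name"],
                "Class_Id": row["Class_Id"],
                # The multiple-occurring field is split across the rows.
                "Subject_No": occurrence,
                "Marks": marks,
            })
    return out


students = [{"Student_ID": 1, "Name": "Asha", "Class_Id": "A", "Marks": [82, 74, 91]}]
for marks_row in normalize(students):
    print(marks_row)
```

One source row with three marks becomes three target rows that share the same Student_ID, Name and Class_Id.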
5. References
• Informatica Help File

• www.informatica.com