78 Hands-On Microsoft SQL Server 2008 Integration Services
the FileUsageType property of the connection manager to indicate how you want to
use the File Connection Manager, that is, whether you want to create a file or folder
or use an existing one.
Flat File Connection Manager
This connection manager provides access to data in a flat file. It is used to extract data
from a flat-file source or load data to a destination and can use delimited, fixed-width,
or ragged-right format. This connection manager accesses only one file. If you want to
reference multiple flat files, you must use a Multiple Flat Files Connection Manager.
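As a rough illustration of the three layouts outside SSIS, the following Python sketch parses one record in each format. The column widths, the comma delimiter, and the function names are assumptions made for this example; they are not taken from the Flat File Connection Manager itself.

```python
import csv
import io

def parse_delimited(text, delimiter=","):
    """Delimited: columns are separated by a delimiter character."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

def parse_fixed_width(line, widths):
    """Fixed width: every column occupies a fixed number of characters."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields

def parse_ragged_right(line, widths):
    """Ragged right: fixed-width columns, except the last one, which
    runs to the end of the row (widths covers all but the last column)."""
    fields = parse_fixed_width(line, widths)
    fields.append(line[sum(widths):].rstrip("\r\n"))
    return fields
```

In the ragged-right case only the leading columns need a declared width, which is why the last value may contain commas or vary in length from row to row.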
FTP Connection Manager
Use this connection manager whenever you want to upload or download files using File
Transfer Protocol (FTP). It enables you to connect to an FTP server using anonymous
authentication or basic authentication. The default port used for FTP connection is 21.
FTP Connection Manager can send and receive files using active or passive mode. In
active mode, the server initiates the data connection back to the client; in passive
mode, the client initiates the data connection to the server.
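The same settings (server, port 21, credentials, and active or passive mode) appear in most FTP clients. The sketch below uses Python's standard ftplib module purely as an analogy; it is not how SSIS implements the connection manager, and the function names and parameters are hypothetical.

```python
import ftplib

def make_ftp_client(passive=True):
    """Create an unconnected FTP client and choose the transfer mode."""
    client = ftplib.FTP()       # no host given, so no connection is opened yet
    client.set_pasv(passive)    # True: passive mode, False: active mode
    return client

def download_file(host, remote_name, local_path,
                  user="anonymous", password="", passive=True):
    """Download one file using anonymous or basic (user/password) login."""
    client = make_ftp_client(passive)
    client.connect(host, ftplib.FTP_PORT)   # FTP_PORT is the default port, 21
    try:
        client.login(user, password)
        with open(local_path, "wb") as fh:
            client.retrbinary(f"RETR {remote_name}", fh.write)
    finally:
        client.close()
```

Passive mode is usually the safer choice when the client sits behind a firewall, because the client opens both the control and the data connection.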
HTTP Connection Manager
Whenever you want to upload or download files using HTTP (port 80), use this
connection manager. It enables you to connect to a web server using HTTP. The Web
Service task provided in Integration Services uses this connection manager. Like the
FTP Connection Manager, the HTTP Connection Manager allows connections using
anonymous authentication or basic authentication.
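To make the difference between anonymous and basic authentication concrete, here is a minimal Python sketch using only the standard library. It is an illustration outside SSIS; the function names and the example URL in the usage note are assumptions.

```python
import base64
import urllib.request

def build_request(url, user=None, password=None):
    """Return a GET request; add a Basic Authorization header
    only when credentials are supplied (otherwise it is anonymous)."""
    request = urllib.request.Request(url)
    if user is not None:
        raw = f"{user}:{password or ''}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request

def download(url, local_path, user=None, password=None):
    """Fetch the resource over HTTP and save it to a local file."""
    request = build_request(url, user, password)
    with urllib.request.urlopen(request) as response, open(local_path, "wb") as fh:
        fh.write(response.read())
```

For example, build_request("http://example.com/data.csv") produces an anonymous request, while passing a user name and password adds the header. Basic authentication only encodes the credentials, so it offers no confidentiality on a plain HTTP connection.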
MSMQ Connection Manager
When you’re working with mainframe systems or with systems built on a messaging
architecture, you will need to use Message Queuing within your packages, which
requires an MSMQ Connection Manager. For example, if you want to use the Message
Queue task in Integration Services, you need to add an MSMQ Connection Manager,
which enables the package to connect to a message queue.
Analysis Services Connection Manager
If you are creating an Analysis Services project or database as part of your solution, you
may want to update or process Analysis Services objects as part of your SSIS jobs. One
simple example could be that your SSIS packages update the data mart nightly, after
Chapter 3: Nuts and Bolts of the SSIS Workflow 79
which you may want to process the cube and dimensions to include the latest data in
the SSAS database. For such reasons, you may include the Analysis Services
Connection Manager in your SSIS packages. This connection manager provides
access to Analysis Services objects such as cubes and dimensions by allowing you to
connect to an Analysis Services database or an Analysis Services project in the same
solution, though you can connect to an Analysis Services project only at design time.
You will use the Analysis Services Connection Manager with the Analysis Services
Processing task, Analysis Services Execute DDL task, or Data Mining Model Training
destination objects in your package.
Multiple Files Connection Manager
When you have to connect to multiple files within your Script task or Script component
scripts, you will use the Multiple Files Connection Manager. When you add this
connection manager, you can add multiple files or folders to be referenced. Those multiple
files and folders show up as a pipe-delimited list in the ConnectionString property of
this connection manager. To specify multiple files or folders, you can also use wildcards.
Suppose, for example, that you want to use all the text files in the C:\SSIS folder. You
could add the Multiple Files Connection Manager by choosing only one file in the
C:\SSIS folder, going to the Properties window of the connection manager, and setting
the value of the ConnectionString property to C:\SSIS\*.txt.
Similar to the File Connection Manager, the Multiple Files Connection Manager
has a FileUsageType property to indicate the usage type, that is, whether you want to
create a file or folder or use an existing one.
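The following Python sketch shows one way a script could interpret such a ConnectionString value: split it on the pipe character and expand any wildcard entries. This is an assumption about how you might consume the string in your own code, not a description of SSIS internals, and the function name is hypothetical.

```python
import glob

def expand_connection_string(connection_string):
    """Split a pipe-delimited list of paths and expand wildcard entries."""
    files = []
    for entry in connection_string.split("|"):
        entry = entry.strip()
        if not entry:
            continue                                # ignore empty segments
        if any(ch in entry for ch in "*?"):
            files.extend(sorted(glob.glob(entry)))  # wildcard: list the matches
        else:
            files.append(entry)                     # plain path: keep as given
    return files
```

With a value such as C:\SSIS\*.txt, the function would return every text file currently in that folder, so the list is resolved at the moment the script runs rather than when the connection manager was configured.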
Multiple Flat Files Connection Manager
Because you can reference only one flat file using the Flat File Connection Manager, you
use the Multiple Flat Files Connection Manager when you need to reference more than
one flat file. You can access data in flat files having delimited, fixed-width, or
ragged-right format. In the GUI of this connection manager, you can select multiple
files by using the Browse button and highlighting multiple files. These files are then
listed as a pipe-delimited list in the connection manager. You can also use wildcards to
specify multiple files. Suppose, for example, that you want to use all the text files in
the C:\SSIS folder. To do this, you would enter C:\SSIS\*.txt in the File Names field.
However, note that all these files must have the same format.
So, when you have multiple flat files to import from a folder, you have two options.
One is to loop over the files using Foreach Loop Container, read the filenames and
pass those filenames one by one to the Flat File Connection Manager so that the
files can be imported iteratively. The second option is to use a Multiple Flat Files
Connection Manager where you don’t need to use a looping construct; rather, this
connection manager reads all the files, collates the data, and passes the data directly to
the downstream components in a single iteration as if the data were coming from
a single source such as a database table instead of multiple flat files.
Each option suits particular scenarios. For example, if you have to import several
files from the same folder and you’re not much concerned about auditing and lineage
(that is, knowing which file each row came from), you can use the Multiple
Flat Files Connection Manager method. This method bulk-imports the data quickly
compared with looping over the files one at a time. The cost of this speed is paid
in resource utilization: because all the files are read within the same batch, CPU
utilization and memory requirements are quite high, although for a short duration
that depends on the file sizes. The iterative method, on the other hand, deals with
one file at a time, requiring less CPU and memory but for a longer duration. Based on
the file sizes, the lineage and auditing requirements, the resource availability on your
server, and the time window available to import data, you can choose one of these two
methods.
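The trade-off between the two options can be sketched in a few lines of Python. This is an analogy only: the load callback stands in for the downstream components, the files are assumed to be comma-delimited with no header row, and all names are hypothetical.

```python
import csv
import glob
import itertools

def read_rows(path):
    """Yield the rows of one comma-delimited file."""
    with open(path, newline="") as fh:
        yield from csv.reader(fh)

def import_iteratively(pattern, load):
    """Option 1: one file per iteration, so each batch knows its source file."""
    for path in sorted(glob.glob(pattern)):
        load(list(read_rows(path)), source=path)

def import_collated(pattern, load):
    """Option 2: all files presented as a single batch; per-file lineage is lost."""
    paths = sorted(glob.glob(pattern))
    rows = itertools.chain.from_iterable(read_rows(p) for p in paths)
    load(list(rows), source=pattern)
```

In the first function the callback receives the file path with every batch, which is what makes auditing possible; in the second it sees only the wildcard pattern, and the whole data set is held in one list at once.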
ODBC Connection Manager
This connection manager enables an Integration Services package to connect to a wide
range of relational database management systems (RDBMS) using the Open Database
Connectivity (ODBC) protocol.
OLE DB Connection Manager
This connection manager enables an Integration Services package to connect to a data
source using an OLE DB provider. OLE DB is a Microsoft specification designed as a
successor to ODBC and is intended to be faster, more efficient, and more stable; it is
an open specification for accessing several kinds of data. Many of the Integration
Services tasks and data flow components use the OLE DB Connection Manager. For
example, the OLE DB source adapter and OLE DB destination adapter use the OLE DB
Connection Manager to extract and load data, and the Execute SQL task can use an
OLE DB Connection Manager to connect to an SQL Server database and run queries.
SMO Connection Manager
SQL Server Management Objects (SMO) is a collection of objects that can be programmed
to manage SQL Server. SMO is an upgrade to SQL-DMO, a set of APIs you use to
create and manage SQL Server database objects. SMO performs better, is more scalable,
and is easier to use than SQL-DMO. The SMO Connection Manager enables an
Integration Services package to connect to an SMO server and hence enables you to
manage SQL Server objects using SMO scripts. For example, Integration Services
transfer tasks use an SMO connection to transfer objects from one server to another.
SMTP Connection Manager
An SMTP Connection Manager enables an Integration Services package to connect to
a Simple Mail Transfer Protocol (SMTP) server. For example, when you want to send
an e-mail notification from a package, you can use Send Mail Task and configure it to
use SMTP Connection Manager to connect to an SMTP server.
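For readers who want to see what such a notification involves at the protocol level, the following Python sketch builds a message and hands it to an SMTP server. It is an analogy to the Send Mail task, not its implementation; the addresses, package name, and function names are made up for the example.

```python
import smtplib
from email.message import EmailMessage

def build_notification(sender, recipient, package_name, status):
    """Compose a short plain-text status message for one package run."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Package {package_name}: {status}"
    message.set_content(f"The package {package_name} finished with status {status}.")
    return message

def send_notification(message, host, port=25):
    """Connect to the SMTP server (the role the connection manager plays) and send."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)
```

Separating message construction from sending mirrors the split between the task, which defines the message, and the connection manager, which only knows how to reach the server.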
SQL Server Compact Edition Connection Manager
When you need to connect to an SQL Server Compact database, you will use an SQL
Server Compact Connection Manager. SQL Server Compact Destination adapter uses
this connection to load data into a table in an SQL Server Compact Edition database.
If you’re running the package that uses this connection manager on a 64-bit server,
you will need to run it in 32-bit mode, as the SQL Server Compact Edition provider is
available only in a 32-bit version.

WMI Connection Manager
Windows Management Instrumentation (WMI) enables you to access management
information in enterprise systems such as networks, computers, managed devices, and
other managed components using the Web-Based Enterprise Management (WBEM)
standard. Using a WMI Connection Manager, your Integration Services package can
manage and automate administrative tasks in an enterprise environment.
Microsoft Connector 1.0 for SAP BI
You can import and export data between Integration Services and SAP BI by using
Microsoft Connector 1.0 for SAP BI. Using this connector in Integration Services, you
can integrate a non-SAP data source with SAP BI or can use SAP BI as a data source
in your data integration application. The Microsoft Connector for SAP BI is a set of
managed components that transfers data from and to an SAP NetWeaver BI version
7 system in both Full and Delta modes via standard interfaces. This connector is not
installed in the default installation; rather it is an add-in to Integration Services and
you have to download the installation files separately from the Microsoft SQL Server
2008 Feature Pack download web page. The SAP BI Connector can be installed on an
Enterprise or a Developer Edition of SQL Server 2008 Integration Services; however,
you can transfer data between SAP BI 7.0 and any of the versions from SQL Server
2000 and later. The SAP BI connector provides three main components:
• SAP BI Source
• SAP BI Destination
• SAP BI Connection Manager
As you can guess, SAP BI Source can be used to extract data from an SAP BI system,
SAP BI Destination can be used to load data into an SAP BI system and the SAP BI
Connection Manager helps to manage the RFC connection between the Integration
Services package and SAP BI. When you install the SAP BI connector, the SAP BI
Connection Manager is displayed in the list of connection managers; however, you will
need to add the SAP BI Source and SAP BI Destination manually. You can do this by
right-clicking the Data Flow Sources in the Toolbox, selecting the Choose Items option,
and selecting SAP BI Source from the list in the SSIS Data Flow Items tab. Similarly,
you can add the SAP BI Destination by right-clicking the Data Flow Destinations
in the Toolbox.
Figure 3-2 shows the SAP BI Connection Manager in the Add SSIS
Connection Manager dialog box, the SAP BI Source in Data Flow Sources section, and
the SAP BI Destination in the Data Flow Destinations section of the Toolbox.
Microsoft Connector for Oracle by Attunity
The Microsoft connectors for Oracle and Teradata are developed by Attunity and have been
implemented in the same fashion as the SAP BI connector. That is, when you install
these connectors, you get a connection manager, a Source component, and a Destination
component, though you will have to manually add the source and destination components
to the Data Flow Designer Toolbox. Refer to Figure 3-2 to see how these components
have been implemented. The Oracle connector has been developed to achieve optimal
performance when transferring data from or to an Oracle database using Integration
Services. The connector is implemented as a set of managed components and is available
for Enterprise and Developer Editions of SQL Server 2008 Integration Services only.
The Attunity Oracle Connector supports Oracle 9.2.0.4 and higher-version databases
and requires Oracle client software version 10.x or 11.x be installed on the same
computer where SSIS will be using this connector. The connector provides:
• Fast Load: a bulk load destination using OCI (Oracle Call Interface) Direct Path.
• Arrayed Load: a bulk load destination that inserts rows in batches, with each entire
batch inserted under the same transaction.
• Bulk Extract Source: a bulk extract source using OCI Array Binding.
Microsoft Connector for Teradata by Attunity
The Microsoft Connector for Teradata is a set of managed components developed
to achieve optimal performance for transferring data from or to a Teradata database
using Integration Services. The connector is available for the Enterprise and Developer
Editions of SQL Server 2008 Integration Services only. The SSIS components for
Teradata (that is, Teradata Source, Teradata Destination, and Teradata Connection
Manager; see Figure 3-2) use the Teradata Parallel Connector (TPC) for connectivity.
Figure 3-2 SSIS connection managers and data flow sources and destinations
The Microsoft Connector for Teradata supports
• Teradata Database version 2R6.0
• Teradata Database version 2R6.1
• Teradata Database version 2R6.2
• Teradata Database version 12.0
To use this connector, you will have to install Teradata Parallel Transporter (TPT)
version 12.0 and the Teradata ODBC driver (version 12 recommended) on the same
computer where SSIS will be using this connector. You can use this connector for
• Bulk Load Destination using TPT FastLoad
• Incremental Load Destination using TPT Tpump
• Bulk Extract Source using TPT
Data Sources and Data Source Views
We have talked about connection managers that can be added in the packages. However,
you might have noticed two folders, Data Sources and Data Source Views, in your project
in Solution Explorer. These folders can also contain data source connections. However,
these are only design-time objects and aren’t available at run time. The connection
managers embedded in the packages are used at run time.
Data Sources
You can create design-time data source objects in Integration Services, Analysis Services,
and Reporting Services projects in BIDS. A data source is a connection to a data store—
for example, a database. You can create a data source by right-clicking the Data Sources
node and selecting the New Data Source option. This will start the Data Source Wizard
that will help you create a data source. So, the data source object gets created outside the
package and you reference it later in the package. Once a data source is created, it can be
referenced by multiple packages. You can reference a data source in a package by right-
clicking in the Connection Managers area and selecting the New Connection from Data
Source option from the context menu.
When you reference a data source inside a package, it is added as a connection
manager and is used at run time. This approach of having a data source
created outside a package and then referencing it or embedding it in the package as
a connection manager has several benefits. You can provide a consistent approach
in your packages to make managing connections easier. You can update all the
connection managers used in various packages that reference a data source by
simply making a change at one place only—in the data source itself, as the data
source provides synchronization between itself and the connection managers. Last,
you can delete a data source any time without affecting the connection managers
in the packages. This is possible because there is no dependency between the two.
Connection managers don’t need data sources to be able to work, as they are complete
in themselves. The only link between a data source and the connection managers
that reference it is that the connection managers are synchronized with the data
source when changes occur. The data sources and the data source views are only
design-time objects that help in managing the connection managers across several
packages. At run time, the package doesn’t need a data source to be present, as it uses
the connection managers that are embedded in it anyway. Data sources are not used
when building packages programmatically.
Data Source View
A data source view, built on a data source, is a named, saved subset that defines the
underlying schema of a relational data source. A data source view can include metadata
that can define sources, destinations, and lookup tables for SSIS tasks, transformations,
and data adapters. While a data source is a connection to a data store, the data source
views are used to reference more specific objects such as tables or views or their
subsets. As you can apply filters on a data source view, you can in fact create multiple
data source view objects from a data source. For example, a data source can reference
a database, while different data source views can be created to reference its different
tables or views. To use a data source view in a package, you must first add the data
source to the package.
Using data source views can be beneficial. You can use a data source view in multiple
packages, and refreshing a data source view picks up the changes in its underlying data
sources. Data source views can also cache metadata of the data sources on which they are
built, and you can extend a data source view by adding calculated columns, new relationships,
and so on. You can consider this as an additional abstraction layer provided to you for
polishing the data model or aligning the metadata as per your package requirements. This
can be a very powerful facility in case you’re dealing with third-party databases or working
with systems where it is not easy for you to make a change.
The data source view can be referenced by data flow components such as OLE DB
source and lookup transformations. To reference a data source view, you instantiate
the data source and then refer to the data source view in the component. Figure 3-3
shows an OLE DB source referencing a CampaignZone1 data source view, where
Campaign is a data source. Once you add a data source view to a package, it is resolved
to an SQL statement and stored in a property of the component using it. You create
a data source view by using the Data Source View Wizard and then modify it in the
Data Source View Designer. Data source views are not used when building packages
programmatically.
SSIS Variables
Variables are used to store values. They enable SSIS objects to communicate with
each other in the package as well as between parent and child packages at run time. You
can use variables in a variety of ways—for example, you can load results of an Execute
SQL task to a variable, change the way a package works by dynamically updating its
parameters at run time using variables, control looping within a package by using a
loaded variable, raise an error when a variable is altered, use them in scripts, or evaluate
them as an expression.
Figure 3-3 Referencing a data source view inside an OLE DB source
DTS 2000 provides global variables, for which users set the values in a single area
in the package and then use those values over and over. This allows users to extend the
dynamic abilities of packages. As the global variables are defined at the package level,
sometimes managing all the variables at a single place becomes quite challenging for
complex packages. SSIS has improved on this shortcoming by assigning a scope to the
variables. Scopes are discussed in greater detail a bit later in the chapter in the section
“User-Defined Variables.”
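As a loose analogy for what scoping buys you, the Python sketch below models a package, a loop container inside it, and a task inside the loop as nested lookup scopes. The variable names and values are invented for illustration, and the model only captures the idea that a variable is visible in the container that defines it and in that container's children.

```python
from collections import ChainMap

# Package scope: visible everywhere in the package, much like a DTS global variable.
package_scope = {"User::BatchDate": "2008-01-31"}

# Loop container scope: sees its own variables plus those of the package.
loop_scope = ChainMap({"User::FileName": "sales_01.txt"}, package_scope)

# Task scope inside the loop: sees its own, the loop's, and the package's variables.
task_scope = loop_scope.new_child({"User::RowCount": 0})

def is_visible(scope, name):
    """Return True when the variable can be resolved from the given scope."""
    return name in scope
```

Here is_visible(task_scope, "User::BatchDate") is true, while is_visible(package_scope, "User::FileName") is false, so a variable defined on the loop does not clutter the package-level list.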
Integration Services provides two types of variables—system variables and user-
defined variables—that you can configure and use in your packages. System variables
are made available in the package and provide environmental information or the state
of the system at run time. You don’t have to create the system variables, as they are
provided for you, and hence you can use them in your packages straightaway. However,
you must create a user-defined variable before you can use it in your package. To see the
variables available in a package in BIDS, either go to the Variables window or go to the
Package Explorer tab and expand the Variables folder.
System Variables
The preconfigured variables provided in Integration Services are called system variables.
While you create user-defined variables to meet the needs of your packages, you cannot
create additional system variables. They are read-only; however, you can configure
them to raise an event when they change their value. System variables store informative
values about the packages and their objects, which can be used in expressions to
customize packages, containers, tasks, and event handlers. Different containers have
different system variables available to them. For example, PackageID is available in the
package scope, whereas TaskID is available in the Data Flow Task scope. Some of the
more frequently used system variables for different containers are defined in Table 3-1.
Using these system variables, you can actually extract interesting information from
the packages on the fly. For example, at run time using system variables, you can log
who started which package at what time. This is exactly what you are going to do in the
following Hands-On exercise.
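Before working through the exercise in the designer, it may help to see the shape of such a log in code. The Python sketch below writes one row per package start to an in-memory SQLite table. It is not the exercise's solution: the PackageLog table, the function name, and the sample values are hypothetical, and the three variable names (System::PackageName, System::UserName, System::StartTime) are assumed here to be the system variables that hold this information.

```python
import sqlite3
from datetime import datetime

def log_package_start(conn, system_vars):
    """Record who started which package at what time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PackageLog "
        "(PackageName TEXT, UserName TEXT, StartTime TEXT)"
    )
    conn.execute(
        "INSERT INTO PackageLog VALUES (?, ?, ?)",
        (
            system_vars["System::PackageName"],
            system_vars["System::UserName"],
            system_vars["System::StartTime"].isoformat(),
        ),
    )
    conn.commit()

# Usage with made-up values standing in for the run-time system variables.
conn = sqlite3.connect(":memory:")
log_package_start(conn, {
    "System::PackageName": "LoadSales",
    "System::UserName": "DOMAIN\\etl",
    "System::StartTime": datetime(2008, 1, 31, 2, 0, 0),
})
```

In a package, the equivalent would typically be an insert statement whose parameters are mapped to the system variables rather than to a Python dictionary.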
Hands-On: Using System Variables to Create Custom Logs
This exercise demonstrates how you can create a custom log for an Integration Services
package.
