Hands-On Microsoft SQL Server 2008 Integration Services, Part 39


To train the data mining models using this destination, you need a connection to
SQL Server Analysis Services, where the mining structure and the mining models
reside. For this, you can use Analysis Services Connection Manager to connect to an
instance of Analysis Services or to the Analysis Services project. The Data Mining
Model Training Editor has two tabs, Connection and Columns, in which you can
configure the required properties. In the Connection tab, you specify the connection
manager for Analysis Services in the Connection Manager field and then specify the
mining structure that contains the mining models you want to train with this data. Once
you select a mining structure in the Mining structure field, the list of mining models
is displayed in the Mining models area, and this destination adapter will train all the
models contained within the specified mining structure. In the Columns tab, you
can map available input columns to the Mining structure columns. The processing of
the mining model requires data to be sorted, which you can achieve by adding a sort
transformation before the data mining model training destination.
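If you prefer not to add a Sort transformation, sorting the data in the source query is a
common alternative. The following is a minimal sketch only, assuming a hypothetical
dbo.vTargetMail view with CustomerKey as the case key; depending on the rest of your data
flow, you may also need to mark the source output as sorted:

    -- Hypothetical source query feeding the Data Mining Model Training destination.
    -- Sorting on the case key here can stand in for a separate Sort transformation.
    SELECT  CustomerKey,
            Age,
            CommuteDistance,
            BikeBuyer
    FROM    dbo.vTargetMail
    ORDER BY CustomerKey;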
DataReader Destination
When your ADO.NET–compliant application needs to access data from the data flow
of an SSIS package, you can use the DataReader destination. Integration Services can
provide data straight from the pipeline to your ADO.NET application in cases where
processing needs to happen on demand as users request it, with the application reading
the results through the ADO.NET DataReader interface. The SSIS data processing extension
makes this data available via the DataReader destination. An excellent use of the DataReader destination is as a
data source for an SSRS report.
The DataReader destination doesn’t have a custom UI but uses the Advanced Editor
to expose all the properties organized in three tabs. You can specify Name, Description,
LocaleID, and ValidateExternalMetadata properties in the Common Properties section
of the Component Properties tab. In the Custom Properties section, you can specify a
ReadTimeout value in milliseconds, and if this value is exceeded, you can choose to fail
the component in the FailOnTimeout field.
In the Input Columns tab, you can select the columns you want to output, assign each
of them an output alias, and specify a usage type of READONLY or READWRITE
from the drop-down list box. Finally, the Input And Output Properties tab lists only the
input column details, as the DataReader destination has only one input and no error output.
Dimension Processing Destination
One of the frequent uses of Integration Services is to load data warehouse dimensions
using the dimension processing destination. This destination can be used to load and
process an SQL Server Analysis Services dimension. Being a destination, it has no
output and one input, and it does not support an error output.
The dimension processing destination has a custom user interface, but the Advanced
Editor can also be used to modify properties that are not available in the custom editor.
In the Dimension Processing Destination Editor, the properties are grouped logically
in three different pages. In the Connection Manager page, you can specify the connection
manager for Analysis Services to connect to the Analysis Services server or an Analysis
Services project. Using this connection manager, the Dimension Processing Destination
Editor accesses all the dimensions in the source and displays them as a list for you to
select the one you want to process. Next you can choose the processing method from
add (incremental), full, or update options. In the Mappings page, you can map the
Available Input Columns to the Available Destination Columns using a drag-and-
drop operation.
The Advanced page allows you to configure error handling in the dimension
processing destination. You can choose from several options to configure the way you
want the errors to be handled:
- By default, this destination uses the default Analysis Services error handling,
  which you can change by unchecking the Use Default Error Configuration check box.
- When the dimension processing destination processes a dimension to populate values
  from the underlying columns, an unacceptable key value may be encountered. In such
  cases, you can use the Key Error Action field to specify that the record be
  discarded by selecting the DiscardRecord value, or you can convert the unacceptable
  key value to the UnknownMember value. UnknownMember is a property of the Analysis
  Services dimension indicating that the supporting column doesn't have a value.
- Next you can specify the processing error limits and choose either to ignore errors
  or to stop on error. If you select the Stop On Error option, you can specify the
  error threshold using the Number Of Errors option. You can also specify the
  on-error action, either to stop processing or to stop logging when the error
  threshold is reached, by selecting the StopProcessing or StopLogging value.
- You can also configure specific error conditions such as these:
  - When the destination raises a Key Not Found error, you can set it to IgnoreError
    or ReportAndStop; by default, it is ReportAndContinue.
  - Similarly, you can configure the Duplicate Key error, for which the default
    action is IgnoreError. You can set it to ReportAndStop or ReportAndContinue if
    you wish.
  - When a null key is converted to the UnknownMember value, you can choose
    ReportAndStop or ReportAndContinue. By default, the destination will IgnoreError.
  - When a null key value is not allowed in the data, this destination will
    ReportAndContinue by default. However, you can set it to IgnoreError or
    ReportAndStop.
- You can specify a path for the error log using the Browse button.
Excel Destination
Using the Excel destination, you can output data straight to an Excel workbook,
worksheets, or ranges. You use an Excel Connection Manager to connect to an Excel
workbook. Like an Excel Source, the Excel destination treats the worksheets and
ranges in an Excel workbook as tables or views. The Excel destination has one regular
input and one error output.
This destination has its own custom user interface that you can use to configure its
properties; the Advanced Editor can also be used to modify the remaining properties.
The Excel Destination Editor lists its properties in three different pages.
In the Connection Manager page, you can select the name of the connection manager
from the drop-down list in the OLE DB Connection Manager field. Then you can
choose one of these three data access mode options:
- Table or view: Lets the Excel destination load data into an Excel worksheet or
  named range; specify the name of the worksheet or the range in the Name Of The
  Excel Sheet field.
- Table name or view name variable: Works like the Table Or View option except that
  the name of the table or view is contained within a variable that you specify in
  the Variable Name field.
- SQL command: Allows you to load the results of an SQL statement to an Excel file;
  an illustrative example follows this list.
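For the SQL command option, the statement you provide identifies the worksheet and the
columns the destination writes to. The following is a minimal sketch only, assuming a
workbook with a worksheet named Orders whose first row holds the column names OrderID,
OrderDate, and Amount; in the Jet/ACE SQL dialect used by the Excel connection manager,
the worksheet is addressed as [Orders$]:

    -- Hypothetical command text for the SQL command access mode; the columns it
    -- returns become the destination columns you map on the Mappings page.
    SELECT OrderID, OrderDate, Amount
    FROM [Orders$]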
In the Mappings page, you can map Available Input Columns to the Available
Destination Columns using a drag-and-drop operation. In the Error Output page you
can configure the behavior of the Excel destination for errors and truncations. You can
ignore the failure, redirect the data, or fail the component for each of the columns in
case of an error or a truncation.
Flat File Destination
Every now and then you may need to output data from disparate sources to a text file, as
this is often the most convenient way to share data with external systems. You
can build an Integration Services package to connect to those disparate sources, extract
data using customized extraction rules, and output the required data set to a text file
using the flat file destination adapter. This destination requires a Flat File Connection
Manager to connect to a text file. When you configure a Flat File Connection Manager,
you also configure various properties to specify the type of the file and how the data will
reside in the file. For example, you can choose the format of the file to be delimited,
fixed width, or ragged right (also called mixed format). You also specify how the columns
and rows will be delimited and the data type of each column. In this way, the Flat File
Connection Manager provides a basic structure to the file, which the destination adapter
uses as is. This destination has one input and no error output.
The Flat File destination has a simple customized user interface, though you can
also use the Advanced Editor to configure some of the properties. In the Flat File
Destination Editor, you can specify the connection manager you want to use for this
destination in the Flat File Connection Manager field and select the check box for
“Overwrite data in the file” if you want to overwrite the existing data in the flat file.
Next you are given an opportunity to provide a block of text in the Header field, which
can be added before the data as a header to the file. In the Mappings page, you can map
Available Input Columns to the Available Destination Columns.
OLE DB Destination
You can use the OLE DB destination when you want to load your transformed data
to OLE DB–compliant databases, such as Microsoft SQL Server, Oracle, or Sybase
database servers. This destination adapter requires an OLE DB Connection Manager
with an appropriate OLE DB provider to connect to the data destination. The OLE
DB destination has one regular input and one error output.
This destination adapter has a custom user interface that can be used to configure
most of the properties; alternatively, you can use the Advanced Editor. In the
OLE DB Destination Editor, you can specify an OLE DB connection manager in
the Connection Manager page. If you haven't configured an OLE DB Connection
Manager in the package yet, you can create a new connection by clicking New. Once
you’ve specified the OLE DB Connection Manager, you can select the data access
mode from the drop-down list. Depending on the option you choose, the editor
interface changes to collect the relevant information. Here you have five options to
choose from:
- Table or view: You can load data into a table or view in the database specified by
  the OLE DB Connection Manager. Select the table or the view from the drop-down
  list in the Name Of The Table Or The View field. If you don't already have a table
  in the database where you want to load data, you can create a new table by
  clicking New. An SQL statement for creating a table is generated for you when you
  click New. The columns use the same data type and length as the input columns,
  which you can change if you want. However, if you provide the wrong data type or a
  shorter column length, you will not be warned and may get errors at run time. If
  you are happy with the CREATE TABLE statement, all you need to do is provide a
  table name replacing the [OLE DB Destination] string after CREATE TABLE in the SQL
  statement (an illustrative example follows this list).
- Table or view - fast load: The data is loaded into a table or view as in the
  preceding option; however, you can configure additional options when you select
  the fast load data access mode. The additional fast load options are:
  - Keep identity: During loading, the OLE DB destination needs to know whether it
    has to keep the identity values coming in the data or assign unique values
    itself to the columns configured as identity columns.
  - Keep nulls: Tells the OLE DB destination to keep the null values in the data.
  - Table lock: Acquires a table lock during the bulk load operation to speed up the
    loading process. This option is selected by default.
  - Check constraints: Checks the constraints at the destination table during the
    data loading operation. This option is selected by default.
  - Rows per batch: Specifies the number of rows in a batch. The loading operation
    handles the incoming rows in batches, and the setting in this box affects the
    buffer size, so you should test a suitable value for this field based on the
    memory available to this process at run time on your server.
  - Maximum insert commit size: You can specify a number in this box to indicate the
    maximum size that the OLE DB destination handles to commit during loading. The
    default value of 2147483647 indicates that this many rows are considered a
    single batch and will be handled together, i.e., they will commit or fail as a
    single batch. Use this setting carefully, taking into consideration how busy
    your system is and how many rows you want to handle in a single batch. A smaller
    value means more commits, and hence the overall loading will take more time;
    however, if the server is a transactional server hosting other applications,
    this might be a good way to share resources on the server. If the server is a
    dedicated reporting or data mart server, or you are loading at a time when other
    activities on the server are quiet, a higher value in this box will reduce the
    overall loading time.
Make sure you use fast load data access mode when loading with double-byte character
set (DBCS) data; otherwise, you may get corrupted data loaded in your table or view.
The DBCS is a set of characters in which each character is represented by two bytes.
The environments using ideographic writing systems such as Japanese, Korean, and
Chinese use DBCS, as they contain more characters than can be represented by 256 code
points. These double-byte characters are commonly called Unicode characters. Examples
of data types that support Unicode data in SQL Server are nchar, nvarchar, and ntext,
whereas Integration Services has DT_WSTR and DT_NTEXT data types to support
Unicode character strings.
- Table name or view name variable: This data access mode works like the table or
  view access mode except that you supply, in the Variable Name field, the name of a
  variable that contains the name of the table or the view.
- Table name or view name variable - fast load: This data access mode works like the
  table or view - fast load access mode except that you supply, in the Variable Name
  field, the name of a variable that contains the name of the table or the view. You
  still specify the fast load options in this data access mode.
- SQL command: Load the result set of an SQL statement using this option. You can
  provide the SQL query in the SQL Command Text dialog box or build a query by
  clicking Build Query.
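As mentioned in the Table or view option above, clicking New generates a CREATE TABLE
statement from the input columns, using the placeholder name [OLE DB Destination]. The
sketch below shows the general idea only; the column names, types, and lengths here are
hypothetical, and yours will follow your own input columns:

    -- Statement roughly as generated by the editor (hypothetical input columns):
    CREATE TABLE [OLE DB Destination] (
        [CustomerID] int,
        [CustomerName] nvarchar(50),
        [OrderAmount] numeric(18, 2)
    )

    -- Replace the placeholder with the table name you want, and widen any column
    -- whose generated length looks too tight, before clicking OK:
    CREATE TABLE [dbo].[CustomerOrders] (
        [CustomerID] int,
        [CustomerName] nvarchar(100),
        [OrderAmount] numeric(18, 2)
    )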
In the Mappings page, you can map Available Input Columns to the Available
Destination Columns using a drag-and-drop operation, and in the Error Output page,
you can specify the behavior when an error occurs.
Partition Processing Destination
The partition processing destination is used to load and process an SQL Server Analysis
Services partition and works like a dimension processing destination. This destination
has a custom user interface that is like the one for the dimension processing destination.
This destination adapter requires the Analysis Services Connection Manager to connect
to the cubes and their partitions that reside in an Analysis Services server or the Analysis
Services project.
The Partition Processing Destination Editor has three pages to configure properties.
In the Connection Manager page, you can specify an Analysis Services Connection
Manager and can choose from the three processing methods—Add (incremental) for
incremental processing; Full, which is a default option and performs full processing
of the partition; and Data only to perform update processing of the partition. In the
Mappings page, you can map Available Input Columns to the Available Destination
Columns using a drag-and-drop operation. In the Advanced page you can configure
error-handling options when various types of errors occur. Error-handling options are
similar to those available on the Advanced page of dimension processing destination.
Raw File Destination
Sometimes you may need to stage data in between processes, for which you will want
to extract data at the fastest possible speed. For example, if you have multiple packages
that work on a data set one after another—i.e., a package needs to export the data at
the end of its operation for the next package to continue its work on the data—a raw
file destination and raw file source combination can be excellent choices. The raw
file destination writes raw data to the destination raw file in an SSIS native form that
doesn’t require translation. This raw data can be imported back to the system using the
raw file source discussed earlier. Using the raw file destination to export and raw file
source to import data back into the system results in high performance for the staging
or export/import operation. However, if you have binary large object (BLOB) data
that needs to be handled in such a fashion, Raw File destination cannot help you, as it
doesn’t support BLOB objects.
The Raw File Destination Editor has two pages to expose the configurable properties.
The Connection Managers page allows you to select an access mode—File name or File
name from variable—to specify how the filename information is provided. You can either
specify the filename and path in the File Name field directly or you can use a variable to
pass these details. Note that the Raw File destination doesn’t use a connection manager
to connect to the raw file and hence you don’t specify a connection manager in this page;
it connects to the raw file directly using the specified filename or by reading the filename
from a variable.
Next, you can choose from the following four options to write data to a file in the
Write Option field:
- Append: Lets you use an existing file and append data to the already existing
  data. This option requires that the metadata of the appended data match the
  metadata of the existing data in the file.
- Create Always: This is the default option; it always creates a new file using the
  filename details provided either directly in the File Name field or indirectly in
  a variable specified in the Variable Name field.
- Create Once: In situations where you are using the data flow inside repeating
  logic, i.e., inside a loop container, you may want to create a new file in the
  first iteration of the loop and then append data to the file in the second and
  later iterations. You can achieve this by using this option.
- Truncate And Append: If you have an existing raw file that you want to write the
  data into, but want to delete the existing data before the new data is written,
  you can use this option to truncate the existing file first and then append the
  data to it.
In all these options, wherever you use an existing file, the metadata of the data being
loaded to the destination must match with the metadata of the file specified.
In the Columns tab, you can select the columns you want to write into the raw file
and assign them an output alias as well.
Recordset Destination
Sometimes you may need to take a record set from the data flow to pass it over to
other elements in the package. Of course, in this instance you do not want to write to
an external storage and then read from it unnecessarily. You can achieve this by using
a variable and the recordset destination that populates an in-memory ADO record set
to the variable at run time.
This destination adapter doesn’t have its own custom user interface but uses the
Advanced Editor to expose its properties. When you double-click this destination, the
Advanced Editor for Recordset destination opens and displays properties organized in
three tabs. In the Component Properties tab, you can specify the name of the variable
to hold the record set in the Variable Name field. In the Input Columns tab, you can
select the columns you want to extract out to the variable and assign an alias to each of
the selected columns, along with specifying whether each is a read-only or a read-write
column. As this destination has only one input and no error output, the Input And Output
Properties tab lists only the input columns.
Script Component Destination
You can use the script component as a data flow destination when you choose Destination
in the Select Script Component Type dialog box. On being deployed as a destination, this
component supports only one input and no output, as you know data flow destinations
don’t have an output. The script component as a destination is covered in Chapter 11.
SQL Server Compact Destination
Integration Services stretches out to give you an SQL Server Compact destination,
enabling your packages to write data straight to an SQL Server Compact database
table. This destination uses the SQL Server Compact Connection Manager to connect
to an SQL Server Compact database. The SQL Server Compact Connection Manager
lets your package connect to a compact database file, and then you can specify the table
you want to update in an SQL Server Compact destination.
You need to create an SQL Server Compact Connection Manager before you can
configure an SQL Server Compact destination. This destination does not have a
custom user interface and hence uses the Advanced Editor to expose its properties.
When you double-click this destination, the Advanced Editor for SQL Server
Compact destination opens with four tabs. Choose the connection manager for
a Compact database in the Connection Manager tab. Specify the table name you
want to update in the Table Name field under the Custom Properties section of the
Component Properties tab.
In the Column Mappings tab, you can map Available Input Columns to the Available
Destination Columns using a drag-and-drop operation. The Input and Output Properties
tab shows you the External Columns and Input Columns in the Input Collection and the
Output Columns in the Error Output Collection. SQL Server Compact destination has
one input and supports an error output.
SQL Server Destination
We have looked at two different ways to import data into SQL Server—using the Bulk
Insert Task in Chapter 5 and the OLE DB destination earlier in this chapter. Though
both are capable of importing data into SQL Server, they suffer from some limitations.
The Bulk Insert task is a faster way to import data but is a part of the control flow,
not the data flow, and doesn’t let you transform data before import. The OLE DB
destination is part of the data flow and lets you transform the data before import;
however, it isn’t the fastest method to import data into SQL Server. The SQL Server
destination combines benefits of both the components—it lets you transform the data
before import and use the speed of the Bulk Insert task to import data into local SQL
Server tables and views. The SQL Server destination can write data into a local SQL
Server only. So, if you want to import data faster to an SQL Server table or a view on
the same server where the package is running, use an SQL Server destination rather
than an OLE DB destination. Being a destination adapter, this has one input only and
does not support an error output.
SQL Server destination has a custom user interface, though you can also use the
Advanced Editor to configure its properties. In the Connection Manager page of the
SQL Destination Editor, you can specify a connection manager, a data source, or a data
source view in the Connection Manager field to connect to an SQL Server database.
Then select a table or view from the drop-down list in the Use A Table Or View field.
You also have an option to create a new connection manager or a table or view by clicking
the New buttons provided. In the Mappings page, you can map Available Input Columns
to the Available Destination Columns using a drag-and-drop operation.
You specify the Bulk Insert options in the Advanced page of the SQL Destination
Editor dialog box. You can configure the following ten options in this page:
- Keep identity: This option is not checked by default. Check this box to keep the
  identity values coming in the data rather than using the unique values assigned by
  SQL Server.
- Keep nulls: This option is not checked by default. Check this box to retain the
  null values.
- Table lock: This option is checked by default. Uncheck this option if you don't
  want to lock the table during loading. This option may impact the availability of
  the tables being loaded to other applications or users. If you want to allow
  concurrent use of the SQL Server tables that are being loaded by this destination,
  uncheck this box; however, if you are running this package at a quiet time, i.e.,
  when no other applications or users are accessing the tables being loaded, or you
  do not want to allow concurrent use of those tables, it is better to leave the
  default setting.
- Check constraints: This option is checked by default, meaning any constraint on
  the table being loaded will be checked during loading. If you're confident the
  data being loaded does not break any constraints and want a faster import, you may
  uncheck this box to save the processing overhead of checking constraints.
- Fire triggers: This option is not checked by default. Check this box to let the
  bulk insert operation execute insert triggers on the target tables during loading.
  Choosing to execute insert triggers on the destination table may affect the
  performance of the loading operation.
- First row: Specify a value for the first row from which the bulk insert will
  start.
- Last row: Specify a value in this field for the last row to insert.
- Maximum number of errors: Provide a value for the maximum number of rows that
  cannot be imported due to errors in the data before the bulk insert operation
  stops. Leave the First Row, Last Row, and Maximum Number Of Errors fields blank to
  indicate that you do not want to specify any limits. However, if you're using the
  Advanced Editor, use a value of -1 to indicate the same.
- Timeout: Specify the number of seconds in this field before the bulk insert
  operation times out.
- Order columns: Specify a comma-delimited list of columns in this field to sort the
  data on, in ascending or descending order.
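Most of these check boxes correspond to options of the T-SQL BULK INSERT statement,
which can make their effect easier to picture. The statement below is an illustrative
sketch only; the table name, file path, and column names are hypothetical, the SQL
Server destination issues the bulk load itself rather than running such a statement,
and the Timeout setting has no direct WITH option:

    -- Hypothetical T-SQL counterpart of the Advanced page settings.
    BULK INSERT dbo.SalesStage
    FROM 'C:\loads\sales.dat'
    WITH (
        KEEPIDENTITY,                        -- Keep identity
        KEEPNULLS,                           -- Keep nulls
        TABLOCK,                             -- Table lock
        CHECK_CONSTRAINTS,                   -- Check constraints
        FIRE_TRIGGERS,                       -- Fire triggers
        FIRSTROW = 2,                        -- First row
        LASTROW = 100000,                    -- Last row
        MAXERRORS = 10,                      -- Maximum number of errors
        ORDER (OrderDate ASC, OrderID ASC)   -- Order columns
    );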
Data Flow Paths
First, think of how you connect tasks in the control flow. You click the first task in the
control flow to highlight the task and display a green arrow, representing output from
the task. Then you drag the green arrow onto the next task in the work flow to create a
connection between the tasks, represented by the green line by default. The green line,
called a precedence constraint, enables you to define conditions under which the following
tasks can be executed. In the data flow, you connect the components in the same way you