to adjust names and locations for specific target servers. UNC is a method of identifying a path so
that it can be accessed from anywhere on the network a package may be run; it takes the form of
\\servername\sharename\path\file.ext. The many file connection managers are listed here:
■ Flat File: Presents a text file as if it were a table, with locale and header options. The file can
be in one of three formats:
■ Delimited: File data is separated by column delimiters (e.g., comma) and row delimiters
(e.g., {CR}{LF}).
■ Fixed Width: File data has known sizes without column or row delimiters. When opened
in Notepad, such a file appears as if all data is on a single line.
■ Ragged Right: File data is interpreted using fixed width for all columns except the last,
which is terminated by the row delimiter.
Only files that use the delimited format are able to interpret zero-length strings as null.
■ Multiple Flat Files: Same as the Flat File connection manager, but it allows multiple files to
be selected, either individually or using wildcards. Data then appears as a single large table to
Integration Services elements.
■ File: Identifies a file or folder in the file system without specifying content. Such file pointers
are used by several elements within Integration Services, including the file system and FTP tasks
for file manipulation and the Execute SQL task to identify the file from which a SQL statement
should be read. The usage type (Create file, Existing file, Create folder, Existing folder) ensures
that the correct type of file pointer is created.
■ Multiple Files: Same as the file connection manager, but it allows multiple files to be selected,
either individually or using wildcards
■ Excel: Identifies a file containing a group of cells that can be interpreted as a table (0 or 1
header rows, data rows below without row or column gaps)
Special
Beyond Database and File connection managers, several other types are provided:
■ Cache: Defines a data cache location. The cache is first populated using the Cache transform and
then used by Lookup transforms within Data Flow tasks. The cache is a write once, read many data
store: All the data to be included in the cache must be written by a single Cache transform but can
then be used by many Lookup transforms. Configuring the connection manager requires that index
columns be selected, so it is often easiest to use the New button from within the Cache transform
to create the connection manager, as it provides the column meta-data.
Configure the connection manager by marking the columns that will be used to look up rows
in the Columns tab. Mark the first column in the lookup as index position 1, the second as 2,
and so on. The lookups performed on a cache must use all of the marked columns and no
others to find the row. By default, the cache is created in memory and is available only in the
current package. Make the cache available on disk for use by subsequent packages by enabling
file cache on the General tab and identifying the .CAW file to be used to store the cached data.
■ FTP: Defines a connection to an FTP server. For most situations, entering the server name and
credentials is sufficient to define the connection. This is used with the FTP task to move and
remove files or create and remove directories using FTP.
■ HTTP: Defines a connection to a Web Service. Enter the URL of the WSDL (Web Services
Description Language) file for the Web Service in question; for example,
http://MyServer/reportserver/reportservice.asmx?wsdl points to the WSDL for Reporting
Services on MyServer. Used with the Web Service task to access Web Service methods.
■ MSMQ: Defines a connection to a Microsoft Message Queue; used in conjunction with a
Message Queue task to send or receive queued messages.
■ SMO: Specifies the name and authentication method to be used with Database Transfer tasks
(Transfer Objects, Transfer Logins, etc.).
■ SMTP: Specifies the name of the Simple Mail Transfer Protocol Server for use with the Send
Mail task. Older SMTP server versions may not support all the commands necessary to send
e-mail from Integration Services.
■ WMI: Defines a server connection for use with Windows Management Instrumentation tasks,
which enable logged and current event data to be collected.
Control flow elements
The Control Flow tab provides an environment for defining the overall work flow of the package. The
following elements are the building blocks of that work flow.
Containers
Containers provide important features for an Integration Services package, including iteration over a
group of tasks and isolation for error and event handling.
In addition to containers, the Integration Services Designer will also create task groups. Define a group
by selecting a number of Control Flow items, right-clicking one of the selected items, and choosing
Group. This encloses several tasks in a group box that can be collapsed into a single title bar. Note,
however, that this group has no properties and cannot participate in the container hierarchy — in short,
it is a handy visual device that has no effect on how the package executes.
The containers available are as follows:
■ TaskHost: This container is not visible in a package, but implicitly hosts any task that is not
otherwise enclosed in a container. Understanding this default container helps understand error
and event handler behaviors.
■ Sequence: This simply contains a number of tasks without any iteration features, but it pro-
vides a shared event and error-handling context, allows shared variables to be scoped to the
container level instead of the package level, and enables the entire container to be disabled at
once during debugging.
■ For Loop: This container provides the advantages of a Sequence container but runs the
tasks in the container as if the tasks were in a C# for loop. For example, given an integer
variable @LoopCount, assigning the For Loop properties InitExpression to @LoopCount=0,
EvalExpression to @LoopCount<3, and AssignExpression to @LoopCount=@LoopCount+1 will
execute the contents of the container three times, with @LoopCount containing the values
0, 1, and 2 on successive iterations.
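To make the roles of the three expressions concrete, here is a conceptually equivalent loop sketched in T-SQL; it is only an analogy for how the container evaluates its expressions, and the PRINT line stands in for whatever tasks the container holds:

DECLARE @LoopCount int = 0;           -- InitExpression: @LoopCount=0
WHILE @LoopCount < 3                  -- EvalExpression: @LoopCount<3
BEGIN
    -- The tasks placed inside the For Loop container would run here.
    PRINT 'Iteration ' + CAST(@LoopCount AS varchar(10));
    SET @LoopCount = @LoopCount + 1;  -- AssignExpression: @LoopCount=@LoopCount+1
END;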
■ Foreach Loop: This container provides iteration over the contents of the container based on
various lists of items:
■ File: Each file in a wildcarded directory specification
■ Item: Each item in a manually entered list
■ ADO: Each row in a variable containing an ADO recordset or ADO.NET data set
■ ADO.NET Schema Rowset: Each item in the schema rowset
■ Nodelist: Each node in an XPath result set
■ SMO: List of server objects (such as jobs, databases, file groups)
Describe the list to be iterated on the Collection page, and then map each item being iterated
over to a corresponding variable. For example, a File loop requires a single string variable
mapped to index 0, but an ADO loop requires n variables for n columns, with indexes 0
through n-1.
Control flow tasks
Tasks that can be included in control flow are as follows:
■ ActiveX Script: Enables legacy VB and Java scripts to be included in Integration Services. New
scripts should use the Script task instead. Consider migrating legacy scripts where possible
because this task will not be available in future versions of SQL Server.
■ Analysis Services Execute DDL: Sends Analysis Services Scripting Language (ASSL) scripts to
an Analysis Services server to create, alter, or process cube and data mining structures. Often
such scripts can be created using the Script option in SQL Server Management Studio.
■ Analysis Services Processing Task: Identifies an Analysis Services database, a list of objects
to process, and processing options
■ Bulk Insert: Provides the fastest mechanism to load a flat file into a database table without
transformations. Specify source file and destination table as a minimum configuration. If the
source file is a simple delimited file, then specify the appropriate row and column delimiters;
otherwise, create and specify a format file that describes the layout of the source file. Error
rows cannot be redirected, but rather cause the task to fail.
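The task's work for a simple comma-delimited file corresponds roughly to a T-SQL BULK INSERT statement such as the following sketch; the file and table names are hypothetical, and the task is configured through its editor rather than by writing this statement:

BULK INSERT dbo.TargetTable            -- hypothetical destination table
FROM 'C:\stuff\source.txt'             -- hypothetical source file
WITH (
    FIELDTERMINATOR = ',',             -- column delimiter
    ROWTERMINATOR = '\n',              -- row delimiter
    FIRSTROW = 2                       -- skip a header row, if one exists
);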

■ Data Flow: Provides a flexible structure for loading, transforming, and storing data as config-
ured on the Data Flow tab. See the section ‘‘Data Flow Components,’’ later in this chapter for
the components that can be configured in a Data Flow task.
■ Data Profiling: Builds an XML file to contain an analysis of selected tables. Available anal-
yses include null ratio, column length for string columns, statistics for numeric columns,
value distribution, candidate keys, and inter-column dependencies. Open the result-
ing file in the Data Profile Viewer to explore the results. Alternately, the analysis results
can be sent to an XML variable for programmatic inspection as part of a data validation
regimen.
Configure by setting the destination and file overwrite behavior on the General page. Select
profiles to run either by pressing the Quick Profile button to select many profiles for a single
table or by switching to the Profile Requests page to add profiles for one or more tables
individually. Add a new profile request manually by clicking the Profile Type pull-down list on
the first empty row.
■ Data Mining Query: Runs prediction queries against existing, trained data mining models.
Specify the Analysis Services database connection and mining structure name on the Min-
ing Model tab. On the Build Query tab, enter the DMX query, using the Build New Query
button to invoke the Query Builder if desired. The DMX query can be parameterized by
placing parameter names of the form
@MyParamName in the query string. If parameters are
used, then map from the parameter name (without the @ prefix) to a corresponding vari-
able name on the Parameter Mapping tab. Results can be sent to variable(s) on the Result Set
tab, to a database table on the Output tab, or to both:
■ Single-row result sets can be stored directly into variables on the Result Set tab by mapping
each Result (column) Name returned by the query to the corresponding target variable,
choosing the Single Row result type for each mapping.

■ Multiple-row result sets can be stored in a variable of type
Object for later use with a
Foreach loop container or other processing. On the Result Set tab, map a single Result
Name of 0 (zero) to the object variable, with a result type of Full Result Set.
■ Independent of any variable mappings, both single-row and multiple-row result sets can be
sent to a table by specifying the database connection and table name on the Output tab.
■ Execute DTS 2000 Package: Enables legacy DTS packages to be executed as part of the
Integration Services work flow. Specify the package location, authentication information, and
DTS-style Inner/Outer variable mappings. Optionally, once the package is identified, it can be
loaded as part of the Integration Services package. Additional downloads are required in SQL
Server 2008 to enable DTS package execution; see Books Online for details.
■ Execute Package: Executes the specified Integration Services package, enabling packages to
be broken down into smaller, reusable pieces. Invoking a child package requires substantial
overhead, so weigh the number of invocations per run when deciding whether to use child packages.
For example, one or two child packages per file or table processed is probably fine, but one
package per row processed is probably not. The child package will participate in a transaction
if the Execute Package task is configured to participate. Variables available to the Execute
Package task can be used by the child package by creating a ‘‘parent package variable’’ configu-
ration in the child package, mapping each parent package variable to a locally defined package
variable as needed.
■ Execute Process: Executes an external program or batch file. Specify the program to be run
in the Executable property, including the extension (e.g., MyApp.exe), and the full path if
the program is not included in the computer's PATH setting (e.g., C:\stuff\MyApp.exe).
Place any switches or arguments that would normally follow the program name on the
command line in the Arguments property. Set other execution-time parameters as appropriate,
such as WorkingDirectory, or SuccessValue so that Integration Services knows whether the
task succeeded. The StandardInputVariable property allows the text of a variable to be
supplied to applications that read from StdIn (e.g., find or grep). The StandardOutputVariable
and StandardErrorVariable properties enable the task's normal and error messages to be
captured in variables.
■ Execute SQL: Runs a SQL script or query, optionally returning results into variables. On the
General page of the editor, set the
ConnectionType and Connection properties to specify
which database the query will run against.
SQLSourceType specifies how the query will be
entered:
■ Direct Input: Enter into the
SQLStatement property by typing in the property page,
pressing the ellipses to enter the query in a text box, pressing the Browse button to read
the query from a file into the property, or pressing the Build Query button to invoke the
Query Builder.
■ File connection: Specify a file that the query will be read from at runtime.
■ Variable: Specify a variable that contains the query to be run.
A query can be made dynamic either by using parameters or by setting the SQLStatement
property using the Expressions page of the editor. Using expressions is slightly more com-
plicated but much more flexible, because parameter use is limited: parameters can appear
only in the WHERE clause and, with the exception of ADO.NET connections, only in stored
procedure executions or simple queries. If parameters are to be used, the query is entered
with a marker for each parameter to be replaced, and then each marker is mapped to a
variable via the Parameter Mapping page. Parameter markers and mapping vary according to
connection manager type (sample queries follow this list):
■ OLE DB: Write the query leaving a ? to mark each parameter location, and then refer to
each parameter using its order of appearance in the query to determine a name: 0 for the
first parameter, 1 for the second, and so on.
■ ODBC: Same as OLE DB, except parameters are named starting at 1 instead of 0
■ ADO: Write the query using ? to mark each parameter location, and specify any non-
numeric parameter name for each parameter. For ADO, it is the order in which the
variables appear on the mapping page (and not the name) that determines which parameter
they will replace.
■ ADO.NET: Write the query as if the parameters were variables declared in Transact-SQL
(e.g.,
SELECT name FROM mytable WHERE id = @ID), and then refer to the parameter
by name for mapping.
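As a minimal sketch of the two most common marker styles, the same query (using the mytable example above) would be written as follows; the parameter values themselves come from the variables listed on the Parameter Mapping page:

-- OLE DB: ? marker; map a variable to the parameter named 0 on the Parameter Mapping page
SELECT name FROM mytable WHERE id = ?;

-- ADO.NET: named parameter, written as if it were a T-SQL variable; map it by the name ID
SELECT name FROM mytable WHERE id = @ID;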
The
ResultSet property (General page) specifies how query results are returned to variables:
■ None: Results are not captured.
■ Single row: Results from a singleton query can be stored directly into variables. On the
Result Set tab, map each result name returned by the query to the corresponding target
variable. As with input parameters, result names vary according to connection manager
type. OLE DB, ADO, and ADO.NET connections map columns by numeric order starting
at 0. ODBC also allows numeric mapping but starts at 1 for the first column. In addition,
OLE DB and ADO connections allow columns to be mapped by column name instead of
number.
■ Full result set: Multiple-row result sets are stored in a variable of type
Object for later
use with a Foreach loop container or other processing. On the Result Set tab, map a single
result name of 0 (zero) to the object variable, with a result type of Full Result Set.
■ XML: Results are stored in an XML DOM document for later use with a Foreach loop
container or other processing. On the Result Set tab, map a single result name of 0 (zero)
to the object variable, with a result type of Full Result Set.

■ File System Task: Provides a number of file (copy, delete, move, rename, set attributes)
and folder (copy, create, delete, delete content, move) operations. Source and destination
files/folders can be specified by either a File connection manager or a string variable that
contains the path. Remember to set the appropriate usage type when configuring a File con-
nection manager (e.g., Create folder vs. Existing folder). Set the
OverwriteDestination or
UseDirectoryIfExists properties to obtain the desired behavior for preexisting objects.
■ FTP: Supports a commonly used subset of FTP functionality, including send/receive/delete files
and create/remove directories. Specify the server via an FTP connection manager. Any remote
file/path can be specified via either direct entry or a string variable that contains the file/path.
A local file/path can be specified via either a File connection manager or a string variable that
contains the file/path. Wildcards are accepted in filenames. Use
OverWriteFileAtDest to
specify whether target files can be overwritten, and
IsAsciiTransfer to switch between
ASCII and binary transfer modes.
■ Message Queue: Sends or receives queued messages via MSMQ. Specify the message connec-
tion, send or receive, and the message type.
New in 2008
Script tasks and script components now use the Visual Studio Tools for Applications (VSTA) development
environment. This enables C# code to be used in addition to the Visual Basic code supported by SQL
Server 2005. Scripts also have full access to Web and other assembly references, compared to the subset of
.NET assemblies available in SQL Server 2005.
■ Script: This task allows either Visual Basic 2008 or Visual C# 2008 code to be embedded in a
task. Properties include the following:

■ ScriptLanguage: Choose which language to use to create the task. Once the script has
been viewed/edited, this property becomes read-only.
■ ReadOnlyVariables/ReadWriteVariables: List the read and read/write variables to be
accessed within the script, separated by commas, in these properties. Attempting to access
a variable not listed in these properties results in a run-time error. Entries are case sensi-
tive, so
myvar and MyVar are considered different variables, although using the new Select
Variables dialog will eliminate typos.
■ EntryPoint: Name of the class that contains the entry point for the script. There is nor-
mally no reason to change the default name (ScriptMain). It generates the following code
shell:
Public Class ScriptMain
    Public Sub Main()
        ' Add your code here
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class
At the end of execution, the script must return Dts.TaskResult as either success or failure
to indicate the outcome of the task. Variables can be referenced through the Dts.Variables
collection. For example, Dts.Variables("MyVar").Value exposes the value of the MyVar
variable. Be aware that the collection is case sensitive, so referencing "myvar" will not
return the value of "MyVar". The Dts object exposes several other useful members, including
the Dts.Connections collection to access connection managers, the Dts.Events.Fire* methods
to raise events, and the Dts.Log method to write log entries. See "Interacting with the Package
in the Script Task" in SQL Server 2008 Books Online for additional details.
■ Send Mail: Sends a text-only SMTP e-mail message. Specify the SMTP connection manager
and all the normal e-mail fields (To, From, etc.). Separate multiple addresses with commas
(not semicolons). The source of the message body is specified by the MessageSourceType
property: Direct Input for entering the body as text in the MessageSource property, File
Connection to read the message from a file at runtime, or Variable to use the contents of a
string variable as the message body. Attachments are entered as pipe-delimited file specs.
Missing attachment files cause the task to fail.
■ Transfer Database: Copies or moves an entire database between SQL Server instances.
Choose between the faster
DatabaseOffline method (which detaches, copies files, and
reattaches the databases) or the slower
DatabaseOnline (which uses SMO to create the
target database). Identify the source and destination servers via SMO connection managers.
For the DatabaseOnline method, specify the source and destination database names and
the path for each destination file to be created. The DatabaseOffline method requires the
same information, plus a network share path for each source and destination file, as the copy
must move the physical files. Specifying UNC paths for the network share path is the most
general, but packages that run on one of the servers can reference local paths for that
server. Using the DatabaseOnline method requires that any objects on which the database
depends, such as logins, be in place before the database is transferred.

■ Transfer Error Messages: Transfers custom error messages (à la sp_addmessage) from one
server to another. Identify the source and destination servers via SMO connection managers
and the list of messages to be transferred.
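For reference, a custom message of the kind this task transfers is created on the source server with sp_addmessage; the message number and text below are hypothetical (user-defined message numbers must be greater than 50000):

EXEC sp_addmessage
    @msgnum   = 50001,                          -- hypothetical user-defined message number
    @severity = 16,
    @msgtext  = N'Order %d failed validation.'; -- hypothetical message text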
■ Transfer Jobs: Copies SQL Agent jobs from one SQL Server instance to another. Identify the
source and destination servers via SMO connection managers and the list of jobs to be
transferred. Any resources required by the jobs being copied (e.g., databases) must be
available for the copy to succeed.
■ Transfer Logins: Copies logins from one SQL Server instance to another. Identify the source
and destination servers via SMO connection managers and the list of logins to be transferred.
The list may consist of selected logins, all logins on the source server, or all logins that have
access to selected databases (see the
LoginsToTransfer property in the Task dialog).
■ Transfer Master Stored Procedures: Copies any custom stored procedures from the master
database on one server to the master database on another server. Identify the source and
destination servers via SMO connection managers, and then select to either copy all custom
stored procedures or individually mark the procedures to be copied.
■ Transfer Objects: Copies any database-level object from one SQL Server instance to another.
Identify the source and destination servers via SMO connection managers and the database on
each server. For each type of object, select to either copy all such objects or to individually
identify which objects to transfer, and then enable copy options (e.g.,
DropObjectsFirst,
CopyIndexes, etc.).
■ Web Service: Executes a Web Service call, storing the output in either a file or a vari-
able. Specify an HTTP connection manager and a local file in which to store WSDL
information. If the HTTP connection manager points directly at the WSDL file (e.g.,

http://MyServer/MyService/MyPage.asmx?wsdl for the MyService Web Service
on
MyServer), then use the Download WSDL button to fill the local copy of the WSDL file;
otherwise, manually retrieve and create the local WSDL file. Setting
OverwriteWSDLFile to
true will store the latest Web Service description into the local file each time the task is run.
Once connection information is established, switch to the Input page to choose the service
and method to execute, and then enter any parameters required by the chosen method. The
Output page provides options to output to either a file, as described by a File connection
manager, or a variable. Take care to choose a variable with a data type compatible with the
result the Web Service will return.
■ WMI Data Reader: Executes a WMI Query Language (WQL) query against a server to retrieve
event log, configuration, and other management information. Select a WMI connection
manager and specify a WQL query (e.g., SELECT * FROM win32_ntlogevent WHERE
logfile = 'system' AND timegenerated > '20080911' for all system event log entries since
9/11/2008) from direct input, a file containing a query, or a string variable containing a
query. Choose an output format by setting the OutputType property to "Data table" for a
comma-separated values list, "Property name and value" for one comma-separated
name/property combination per row with an extra newline between records, or "Property
value" for one property value per row without names. Use DestinationType and
Destination to send the query results to either a file or a string variable.
■ WMI Event Watcher: Similar to a WMI data reader but instead of returning data, the task
waits for a WQL specified event to occur. When the event occurs or the task times out,
the SSIS task events WMIEventWatcherEventOccurred or WMIEventWatcherEventTimeout
can fire, respectively. For either occurrence, specify the action (log and fire event or
log only) and the task disposition (return success, return failure, or watch again). Set the task
timeout (in seconds) using the
Timeout property, with 0 specifying no timeout.
■ XML: Performs operations on XML documents, including comparing two documents (diff),
merging two documents, applying diff output (diffgram) to a document, validating a document
against a DTD, and performing XPath queries or XSLT transformations. Choose a source docu-
ment as direct input, a file, or a string variable, and an output as a file or a string variable. Set
other properties as appropriate for the selected
OperationType.
Maintenance Plan tasks
Maintenance Plan tasks provide the same elements that are used to build maintenance plans for use
in custom package development. Tasks use an ADO.NET connection manager to identify the server
being maintained, but any database selected in the connection manager is superseded by the databases
identified within each Maintenance Plan task. Any questions about what a particular task does can be
answered by pressing the View T-SQL button on the maintenance task; representative statements
are sketched after the task list below.
For more information about database maintenance, see Chapter 42, ‘‘Maintaining the
Database.’’
The available tasks are as follows:
■ Back Up Database: Creates a native SQL backup of one or more databases
■ Check Database Integrity: Performs a DBCC
CHECKDB
■ Execute SQL Server Agent Job: Starts the selected SQL Agent job via the sp_start_job
stored procedure
■ Execute T-SQL Statement: A simplified SQL-Server-only statement execution. It does not
return results or set variables; use the Execute SQL task for more complex queries.

■ History Cleanup: Trims old entries from backup/restore, maintenance plan, and SQL Agent
job history
■ Maintenance Cleanup: Prunes old maintenance plan, backup, or other files
■ Notify Operator: Performs an
sp_notify_operator, sending a message to selected
on-duty operators defined on that SQL Server
■ Rebuild Index: Issues an
ALTER INDEX REBUILD for each table, indexed view, or both in
the selected databases
■ Reorganize Index: Uses
ALTER INDEX REORGANIZE to reorganize either all or selected
indexes within the databases chosen. It optionally compacts large object data.
■ Shrink Database: Performs a DBCC
SHRINKDATABASE
■ Update Statistics: Issues an UPDATE STATISTICS statement for column, index, or all
statistics in the selected databases
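To give a feel for what the View T-SQL button mentioned above will show, the following is a rough sketch of the kind of statements these tasks issue; the database, table, operator, and job names are hypothetical, and the exact statements depend on the options chosen in each task:

DBCC CHECKDB (N'SalesDB');                               -- Check Database Integrity
ALTER INDEX ALL ON dbo.Orders REBUILD;                   -- Rebuild Index
ALTER INDEX ALL ON dbo.Orders REORGANIZE;                -- Reorganize Index
UPDATE STATISTICS dbo.Orders;                            -- Update Statistics
DBCC SHRINKDATABASE (N'SalesDB');                        -- Shrink Database
EXEC msdb.dbo.sp_start_job @job_name = N'Nightly Load';  -- Execute SQL Server Agent Job
EXEC msdb.dbo.sp_notify_operator @name = N'DBA OnCall',  -- Notify Operator
     @subject = N'Maintenance complete', @body = N'Nightly maintenance finished.';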
Data flow components
This section describes the individual components that can be configured within a Data Flow task:
sources of data for the flow, destinations that output the data, and optional transformations that can
change the data in between. See the ‘‘Data Flow’’ section earlier in this chapter for general information
about configuring a Data Flow task.
Sources
Data Flow sources supply the rows of data that flow through the Data Flow task. Right-clicking a source
on the design surface reveals that each source has two different editing options: Edit (basic) and Show
Advanced Editor, although in some cases the basic Edit option displays the Advanced Editor anyway.
The common steps to configuring a source are represented by the pages of the basic editor:
■ Connection Manager: Specify the particular table, file(s), view, or query that will provide the
data for this source. Several sources will accept either a table name or a query string from a
variable.
■ Columns: Choose which columns will appear in the data flow. Optionally, change the default
names of the columns in the data flow.
■ Error Output: Specify what to do for each column should an error occur. Each type of error
can be ignored, cause the component to fail (default), or redirect the problem row to an error
output. Truncation errors occur when a string is longer than the destination allows; "Error"
errors catch all other types of failures. Don't be confused by the "Description" column: it is not
another type of error, but merely provides a description of the context under which the error
could occur.
The advanced editor provides the same capabilities as the basic editor in a different format, plus much
finer control over input and output columns, including names and data types. When the rows sent
to the data flow are already sorted, they can be marked as such using the advanced editor. On the
Input and Output Properties tab, choose the top node of the tree and set the IsSorted property
to
true. Then select each of the output (data flow) columns that make up the sort and enter a
SortKeyPosition value, beginning with 1 and incrementing by 1 for each column used in sorting.
To mark a column as sorted descending, specify a negative
SortKeyPosition. For example, giving the Date and Category columns SortKeyPosition
values of -1 and 2, respectively, will mark the Date descending and the Category ascending.
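As an illustration of a source whose rows arrive pre-sorted in that order, the query below uses a hypothetical dbo.Sales table; its ORDER BY matches the markings just described (Date descending as the first sort key, Category ascending as the second):

SELECT [Date], Category, Amount      -- Amount is a hypothetical extra column
FROM dbo.Sales                       -- hypothetical source table
ORDER BY [Date] DESC, Category;      -- matches SortKeyPosition -1 (Date) and 2 (Category)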
The available sources are as follows:
■ OLE DB: The preferred method of reading database data. It requires an OLE DB connection
manager.
■ ADO.NET: Uses an ADO.NET connection manager to read database data, either by identifying
a database o bject or entering a query to execute.
■ Flat File: Requires a Flat File connection manager. Delimited files translate zero-length strings
into null values for the data flow when the RetainNulls property is true.
■ Excel: Uses an Excel connection manager and either a worksheet or named ranges as tables. A
SQL command can be constructed using the Build Query button that selects a subset of rows.
Data types are assigned to each column by sampling the first few rows, but can be adjusted
using the advanced editor.
■ Raw: Reads a file written by the Integration Services Raw File destination (see the following
‘‘Destinations’’ section) in a preprocessed format, making this a very fast method of retrieving
data, often used when data processed by one stage of a package needs to be stored and reused
by a later stage. Because the data has already been processed once, no error handling or output
configuration is required. The input filename is directly specified without using a connection
manager.
■ XML: Reads a simple XML file and presents it to the data flow as a table, using either an
inline schema (a header in the XML file that describes the column names and data types) or
an XSD (XML Schema Definition) file. The XML source does not use a connection manager;
instead, specify the input filename and then either specify an XSD file or indicate that the file
contains an inline schema. (Set the
UseInlineSchema property to true or select the check
box in the basic editor).
■ Script: A script component can act as a source, destination, or transformation of a data flow.
Use a script as a source to generate test data or to format a complex external source of data.
For example, a poorly formatted text file could be read and parsed into individual columns by
a script. Start by dragging a script transform onto the design surface, choosing Source from the
pop-up Select Script Component Type dialog. On the Inputs and Outputs page of the editor,
add as many outputs as necessary, renaming them as desired. Within each output, define
columns as appropriate, carefully choosing the corresponding data types. On the Script page of
the editor, list the read and read/write variables to be accessed within the script, separated by
commas, in the
ReadOnlyVariables and ReadWriteVariables properties, respectively.
Click the Edit Script button to expose the code itself, and note that the primary method to be
coded overrides

CreateNewOutputRows, as shown in this simple example:
Public Overrides Sub CreateNewOutputRows()
    ' Create 20 rows of random integers between 1 and 100
    Randomize()
    Dim i As Integer