
Microsoft SQL Server 2008 Tutorial, Part 75


Nielsen c31.tex V4 - 07/21/2009 2:03pm Page 702
Part V Data Connectivity
  @datasrc = 'C:\SQLServerBible\CHA1_Schedule.xls',
  @provstr = 'Excel 5.0';
Excel spreadsheets are not multi-user spreadsheets. SQL Server can’t perform a distributed
query that accesses an Excel spreadsheet while that spreadsheet is open in Excel.
Linking to MS Access
Not surprisingly, SQL Server links easily to MS Access databases. SQL Server uses the OLE DB Jet provider to connect to Jet and request data from the MS Access .mdb file.
FIGURE 31-3
Prior to the conversion to SQL Server, the Cape Hatteras Adventures company was managing its tour
schedule in the CHA1_Schedule.xls spreadsheet.
Executing Distributed Queries 31
FIGURE 31-4
Tables are defined within the Excel spreadsheet as named ranges. The CHA1_Schedule spreadsheet
has five named ranges.
Because Access is a database, there’s no trick to preparing it for linking, as there is with Excel. Each
Access table will appear as a table under the Linked Servers node in Management Studio.
The Cape Hatteras Adventures customer/prospect list was stored in Access prior to upsizing the database to SQL Server. The following code from the CHA2_Convert.sql script links to the CHA1_Customers.mdb Access database so SQL Server can retrieve the data and populate the SQL Server tables:
EXEC sp_addlinkedserver
  'CHA1_Customers',
  'Access 2003',
  'Microsoft.Jet.OLEDB.4.0',
  'C:\SQLServerBible\CHA1_Customers.mdb';
If you are having difficulty with a distributed query, one of the first places to check is the security context. Excel expects that connections do not establish a security context, so the non-mapped user login should be set to no security context:
EXEC sp_addlinkedsrvlogin
  @rmtsrvname = 'CHA1_Schedule',
  @useself = 'false';
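Once a linked server is defined, it can help to confirm that SQL Server actually sees the remote tables before writing larger queries. The following is a minimal sketch, assuming the CHA1_Customers linked server defined above exists and that the Access file contains a Customers table (the table name is taken from later examples in this chapter):

```sql
-- List the tables that the linked server exposes
EXEC sp_tables_ex @table_server = 'CHA1_Customers';

-- Pull a few rows; database and owner are left empty for an Access source
SELECT TOP (5) *
  FROM CHA1_Customers...Customers;
```

If the first call returns no rows or raises a provider error, the data-source path and the security context are the likely places to look.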
Developing Distributed Queries
Once the link to the external data source is established, SQL Server can reference the external data
within queries. Table 31-2 shows the four basic syntax methods that are available, which differ in
query-processing location and setup method.
TABLE 31-2
Distributed Query Method Matrix
Link Setup                          Query-Execution Location
                                    Local SQL Server     External Data Source (Pass-Through)
Linked Server                       Four-part name       Four-part name, OpenQuery()
Ad Hoc Link Declared in the Query   OpenDataSource()     OpenRowSet()
Distributed queries and Management Studio
Management Studio doesn’t supply a graphic method for initiating a distributed query. There’s no way
to drag a linked server or remote table into the Query Designer. However, the distributed query can be
entered manually in the SQL pane and then executed as a query.
Using the Query Editor, the name of the linked server can be dragged from the Object Explorer to the
Query Editor.

Distributed views
Views are saved SQL SELECT statements. While I don’t recommend building a client/server application based on views, they are useful for ad hoc queries. Because most users (and even developers) are unfamiliar with the various methods of performing distributed queries, wrapping a distributed query inside a view might be a good idea.
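As an illustration of wrapping a distributed query in a view, here is a minimal sketch. It assumes a linked server named [SQL2008RC0\London] that exposes the Family database (the names used elsewhere in this chapter); the view name vLondonPerson is hypothetical:

```sql
-- CREATE VIEW must be the first statement in its batch
CREATE VIEW dbo.vLondonPerson
AS
SELECT LastName, FirstName
  FROM [SQL2008RC0\London].Family.dbo.Person;
GO

-- Users query the view without knowing the data is remote
SELECT LastName, FirstName
  FROM dbo.vLondonPerson;
```

If the remote server is later moved, only the view definition needs to change, not the queries that use it.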
Local-distributed queries
A local-distributed query sounds like an oxymoron, but it’s a query that pulls the external data into
SQL Server and then processes the query at the local SQL Server. Because the processing occurs at the
local SQL Server, local-distributed queries use T-SQL syntax and are sometimes called T-SQL distributed
queries.
Using the four-part name
If the data is in another SQL Server, then a complete four-part name is required:
Server.Database.Schema.ObjectName
The four-part name may be used in any SELECT or data-modification query. On my writing computer is a second instance of SQL Server called [SQL2008RC0\London]. The object’s owner name is required if the query accesses an external SQL Server.
The following query retrieves the Person table from that second instance:
SELECT LastName, FirstName
FROM [SQL2008RC0\London].Family.dbo.Person;
Result:
LastName   FirstName
---------- ----------
Halloway   Kelly
Halloway   James

When performing an INSERT, UPDATE, or DELETE command as a distributed query, either the four-part name or a distributed query function must be substituted for the table name. For example, the following SQL code, extracted from the CHA2_Convert.sql script that populates the CHA2 sample database, uses the four-part name as the source for an INSERT command. The query retrieves base camps from the Excel spreadsheet and inserts them into SQL Server:
INSERT BaseCamp(Name)
  SELECT DISTINCT [Base Camp]
    FROM CHA1_Schedule...[Base_Camp]
    WHERE [Base Camp] IS NOT NULL;
If you’ve already executed CHA2_Convert.sql and populated your copy of CHA2, then you may want to re-execute CHA2_Create.sql in order to start with an empty database.
As another example of using the four-part name for a distributed query, the following code updates the
Family database on the second SQL Server instance:
UPDATE [SQL2008RC0\London].Family.dbo.Person
  SET LastName = 'Wilson'
  WHERE PersonID = 1;
OpenDataSource()
Using the OpenDataSource() function is functionally the same as using a four-part name to access a linked server, except that the OpenDataSource() function defines the link within the function instead of referencing a pre-defined linked server. While defining the link in code bypasses the linked server requirement, if the link location changes, then the change will affect every query that uses OpenDataSource(). In addition, OpenDataSource() won’t accept variables as parameters.
The OpenDataSource() function is substituted for a server in the four-part name and may be used within any DML statement.
The syntax for the OpenDataSource() function seems simple enough:
OPENDATASOURCE ( provider_name, init_string )
However, there’s more to it than the first appearance betrays. The init string is a semicolon-delimited string containing several parameters (the exact parameters used depend on the external data source and are not described here; see Books Online for a full overview). The potential parameters within the init string include data source, location, extended properties, connection timeout, user ID, password, and catalog. The init string must define the entire external data-source connection, and the security context, within a function. No quotes are required around the parameters within the init string. A common error in building OpenDataSource() distributed queries is mixing up the commas and semicolons.
If OpenDataSource() is connecting to another SQL Server using Windows authentication, then authentication delegation via Kerberos security is required.
A relatively straightforward example of the OpenDataSource() function is using it as a means of accessing a table within another SQL Server instance:
SELECT FirstName, Gender
  FROM OPENDATASOURCE(
    'SQLOLEDB',
    'Data Source=SQL2008VPC\London;User ID=Joe;Password=j'
  ).Family.dbo.Person;
Result:
FirstName  Gender
---------- ------
Adam       M
Alexia     F
The following example of a distributed query that uses OpenDataSource() references the Cape Hatteras Adventures sample database. Because an Access location contains only one database and the tables don’t require the owner to specify the table, the database and owner are omitted from the four-part name:
SELECT ContactFirstName, ContactLastName
  FROM OPENDATASOURCE(
    'Microsoft.Jet.OLEDB.4.0',
    'Data Source=C:\SQLServerBible\CHA1_Customers.mdb'
  )...Customers;
Result:
ContactFirstName ContactLastName
---------------- ---------------
Neal             Garrison
Melissa          Anderson
Gary             Quill
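Because OpenDataSource() won’t accept variables as parameters, one common workaround is to build the statement as a string and run it with sp_executesql. The following is a hedged sketch of that approach, not something taken from the CHA2 scripts; the variable names @Path and @sql are hypothetical:

```sql
-- The file location is held in a variable (illustrative value from this chapter)
DECLARE @Path nvarchar(260) = N'C:\SQLServerBible\CHA1_Customers.mdb';
DECLARE @sql nvarchar(max);

-- Build the full statement; any single quotes in the path are doubled
SET @sql = N'SELECT ContactFirstName, ContactLastName
  FROM OPENDATASOURCE(
    ''Microsoft.Jet.OLEDB.4.0'',
    ''Data Source=' + REPLACE(@Path, N'''', N'''''') + N'''
  )...Customers;';

EXEC sys.sp_executesql @sql;
```

Dynamic SQL carries its own risks, so the path should come from a trusted source rather than from user input.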
To illustrate using OpenDataSource() in an update query, the following query example will update any rows inside the CHA1_Schedule.xls Excel 2000 spreadsheet. A named range was previously defined as Tours '=Sheet1!$E$5:$E$24', which now appears to the SQL query as a table within the data source. Rather than update an individual spreadsheet cell, this query performs an UPDATE operation that affects every row in which the tour column is equal to Gauley River Rafting and updates the Base Camp column to the value Ashville.
The distributed SQL Server query will use OLE DB to call the Jet engine, which will open the Excel spreadsheet file. Because the spreadsheet is opened by a user, the file is now unavailable to anyone else. Excel is a single-user database. The OpenDataSource() function supplies only the server name in a four-part name; as with Access, the database and owner values are omitted:
UPDATE OpenDataSource(
    'Microsoft.Jet.OLEDB.4.0',
    'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;User ID=Admin;Password=;Extended properties=Excel 5.0'
  )...Tour
  SET [Base Camp] = 'Ashville'
  WHERE Tour = 'Gauley River Rafting';
Figure 31-5 illustrates the query execution plan for the distributed UPDATE query, beginning at the right with a Remote Scan operation that returns all 19 rows from the Excel named range. The data is then processed within SQL Server. The details of the Remote Update logical operation reveal that the distributed UPDATE query actually updated only two rows.
To complete the example, the following query reads from the same Excel spreadsheet and verifies that the update took place. Again, the OpenDataSource() function is only pointing the distributed query to an external server:
SELECT *
  FROM OpenDataSource(
    'Microsoft.Jet.OLEDB.4.0',
    'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;User ID=Admin;Password=;Extended properties=Excel 5.0'
  )...Tour
  WHERE Tour = 'Gauley River Rafting';
FIGURE 31-5
The query execution plan for the distributed query using OpenDataSource()
Result:
Base Camp  Tour
---------- --------------------
Ashville   Gauley River Rafting
Ashville   Gauley River Rafting
Pass-through distributed queries
A pass-through query executes a query at the external data source and returns the result to SQL Server. The primary reason for using a pass-through query is to reduce the amount of data passed from the server (the external data source) to the client (SQL Server). Rather than pull a million rows into SQL Server so that it can use 25 of them, it may be better to select those 25 rows at the external data source.
Be aware that the pass-through query will use the query syntax of the external data source. If the external data source is Oracle or Access, then PL/SQL or Access SQL must be used in the pass-through query.
In the case of a pass-through query that modifies data, the type of remote data source determines whether the update is performed locally or remotely:
■ When another SQL Server is being updated, the remote SQL Server will perform the update.
■ When non–SQL Server data is being updated, the data providers determine where the update will be performed. Often, the pass-through query merely selects the correct rows remotely. The selected rows are returned to SQL Server, modified inside SQL Server, and then returned to the remote data source for the update.
Two forms of local distributed queries exist, one for linked servers and one for external data sources defined in the query; likewise, two forms of explicitly declared pass-through distributed queries exist. OpenQuery() uses an established linked server, and OpenRowSet() declares the link within the query.
Using the four-part name
If the distributed query is accessing another SQL Server, then the four-part name becomes a hybrid distributed query method. Depending on the FROM clause and the WHERE clause, SQL Server will attempt to pass as much of the query as possible to the external SQL Server to improve performance.
When building a complex distributed query using the four-part name, it’s difficult to predict how much of the query SQL Server will pass through. I’ve seen SQL Server take a single query and, depending on the WHERE clause, pass the whole query through, turn each table into a separate pass-through query, or pass through only one table.
OpenQuery()
For pass-through queries, the OpenQuery() function leverages a linked server, so it’s the easiest to develop. It also handles changes in server configuration without changing the code.
The OpenQuery() function is used within the SQL DML statement as a table. The function accepts only two parameters: the name of the linked server and the pass-through query. The next query uses OpenQuery() to retrieve data from the CHA1_Schedule Excel spreadsheet:
SELECT *
  FROM OPENQUERY(CHA1_Schedule,
    'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"');
Result:
Tour                 Base Camp
-------------------- ----------
Gauley River Rafting Ashville
Gauley River Rafting Ashville
The OpenQuery() pass-through query requires almost no processing by SQL Server. The Remote Scan returns exactly two rows to SQL Server. The WHERE clause is executed by the Jet engine as it reads from the Excel spreadsheet.
In the next example, the OpenQuery() requests the Jet engine to extract only the two rows requiring the update. The actual UPDATE operation is performed in SQL Server, and the result is written back to the external data set. In effect, the pass-through query is performing only the SELECT portion of the UPDATE command:
UPDATE OPENQUERY(CHA1_Schedule,
    'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"')
  SET [Base Camp] = 'Ashville';
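The same pattern applies when the linked server is another SQL Server. As a sketch, assuming [SQL2008RC0\London] is defined as a linked server and exposes the Family database used earlier in this chapter, the two queries below request the same row. With the four-part name, SQL Server decides how much of the work to send to the remote server; with OpenQuery(), the inner SELECT is guaranteed to run remotely:

```sql
-- Four-part name: SQL Server chooses how much to pass through
SELECT LastName, FirstName
  FROM [SQL2008RC0\London].Family.dbo.Person
  WHERE PersonID = 1;

-- OpenQuery(): the filter is explicitly executed at the remote server
SELECT LastName, FirstName
  FROM OPENQUERY([SQL2008RC0\London],
    'SELECT LastName, FirstName
       FROM Family.dbo.Person
       WHERE PersonID = 1');
```

Comparing the execution plans of the two forms is a reasonable way to see what was actually passed through in a given case.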
OpenRowSet()
The OpenRowSet() function is the pass-through counterpart to the OpenDataSource() function. Both require the remote data source to be fully specified in the distributed query. OpenRowSet() adds a parameter to specify the pass-through query:
SELECT ContactFirstName, ContactLastName
  FROM OPENROWSET ('Microsoft.Jet.OLEDB.4.0',
    'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
    'SELECT * FROM Customers WHERE CustomerID = 1');
Result:

ContactFirstName ContactLastName
---------------- ---------------
Tom              Mercer
Best Practice
Of the four distributed-query methods, the best option is the OpenQuery() function. With OpenQuery(), you have specific control over which data will be processed where. In addition, it has the advantage of predefined links, making the query more robust if the server configuration changes.
To perform an update using the OpenRowSet() function, use the function in place of the table being modified. The following code sample modifies the customer’s last name in an Access database. The WHERE clause of the UPDATE command is handled by the pass-through portion of the OpenRowSet() function:
UPDATE OPENROWSET ('Microsoft.Jet.OLEDB.4.0',
    'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
    'SELECT * FROM Customers WHERE CustomerID = 1')
  SET ContactLastName = 'Wilson';
Distributed Transactions
Transactions are key to data integrity. If the logical unit of work includes modifying data outside the local SQL Server, then a standard transaction is unable to handle the atomicity of the transaction. If a failure should occur in the middle of the transaction, then a mechanism must be in place to roll back the partial work; otherwise, a partial transaction will be recorded and the database will be left in an inconsistent state.
Chapter 66, “Managing Transactions, Locking, and Blocking,” explores the ACID properties of a database and transactions.
Distributed Transaction Coordinator
SQL Server uses the Distributed Transaction Coordinator (DTC) to handle multiple server transactions, commits, and rollbacks. The DTC service uses a two-phase commit scheme for multiple server transactions. The two-phase commit ensures that every server is available and handling the transaction by performing the following steps:
1. Each server is sent a “prepare to commit” message.
2. Each server performs the first phase of the commit, ensuring that it is capable of committing the transaction.
3. Each server replies when it has finished preparing for the commit.
4. Only after every participating server has responded positively to the “prepare to commit” message is the actual commit message sent to each server.
If the logical unit of work only involves reading from the external SQL Server, then the DTC is not
required. Only when remote updates are occurring is a transaction considered a distributed transaction.
The Distributed Transaction Coordinator is a separate service from SQL Server. DTC is started or
stopped with the SQL Server Service Manager.
Only one instance of DTC runs per server regardless of how many SQL Server instances may be installed or running on that server. The actual service name is msdtc.exe, and it consumes only about 2.5 MB of memory.
DTC must be running when a distributed transaction is initiated or the transaction will fail.
Developing distributed transactions
Distributed transactions are similar to local transactions with a few extensions to the syntax:
SET XACT_ABORT ON;
BEGIN DISTRIBUTED TRANSACTION;
In case of error, the xact_abort connection option will cause the current transaction, rather than only the current T-SQL statement, to be rolled back. The xact_abort ON option is required for any distributed transactions accessing a remote SQL Server and for most other OLE DB connections as well; but if xact_abort ON is not in the code, then SQL Server will automatically convert the transaction to xact_abort ON as soon as a distributed query is executed.
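Put together, a complete distributed transaction might look like the following sketch. It reuses the [SQL2008RC0\London] linked server and the Family.dbo.Person table from earlier in this chapter; the local Family database with a matching Person table is an assumption made only for illustration:

```sql
SET XACT_ABORT ON;

BEGIN DISTRIBUTED TRANSACTION;

  -- Remote update through the linked server
  UPDATE [SQL2008RC0\London].Family.dbo.Person
    SET LastName = 'Wilson'
    WHERE PersonID = 1;

  -- Local update (assumes a local Family database with the same table)
  UPDATE Family.dbo.Person
    SET LastName = 'Wilson'
    WHERE PersonID = 1;

COMMIT TRANSACTION;
```

With xact_abort on, an error in either UPDATE rolls back both, so the two servers are not left disagreeing about the row.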

The BEGIN DISTRIBUTED TRANSACTION command, which determines whether the DTC service is available, is not strictly required. If a transaction is initiated with only BEGIN TRAN, then the transaction is escalated to a distributed transaction, and DTC is checked as soon as a distributed query is executed. It’s considered a better practice to use BEGIN DISTRIBUTED TRANSACTION so that DTC is checked at the beginning of the transaction. When DTC is not running, an 8501 error is raised automatically: