Microsoft SQL Server 2008 Study Guide, Part 33

Nielsen c11.tex V4 - 07/23/2009 1:54pm Page 282
Part II Manipulating Data With Select
relational division query would list only those students who passed the required courses and no others.
A relational division with a remainder, also called an approximate divide, would list all the students who
passed the required courses and include students who passed any additional courses. Of course, that
example is both practical and academic.
Relational division is more complex than a join. A join simply finds any matches between two data sets.
Relational division finds exact matches between two data sets. Joins/subqueries and relational division
solve different types of questions. For example, the following questions apply to the sample databases
and compare the two methods:
■ Joins/subqueries:
    ■ CHA2: Who has ever gone on a tour?
    ■ CHA2: Who lives in the same region as a base camp?
    ■ CHA2: Who has attended any event in his or her home region?
■ Exact relational division:
    ■ CHA2: Who has gone on every tour in his or her home state but no tours outside it?
    ■ OBXKites: Who has purchased every kite but nothing else?
    ■ Family: Which women (widows or divorcees) have married the same husbands as each other, but
      no other husbands?
■ Relational division with remainders:
    ■ CHA2: Who has gone on every tour in his or her home state, and possibly other tours as well?
    ■ OBXKites: Who has purchased every kite and possibly other items as well?
    ■ Family: Which women have married the same husbands and may have married other men as well?
Relational division with a remainder
Relational division with a remainder essentially extracts the quotient while allowing some leeway for
rows that meet the criteria but contain additional data as well. In real-life situations this type of division
is typically more useful than an exact relational division.
The previous OBX Kites sales question ("Who has purchased every kite and possibly other items as
well?") is a good one to use to demonstrate relational division. Because it takes five tables to go from
contact to product category, and because the question refers to the join between OrderDetail and
Product, this question involves enough complexity that it simulates a real-world relational-database
problem.
The toy category serves as a good example category because it contains only two toys and no one has
purchased a toy in the sample data, so the query will answer the question "Who has purchased at least
one of every toy sold by OBX Kites?" (Yes, my kids volunteered to help test this query.)
First, the following data will mock up a scenario in the OBXKites database. The only toys are
ProductCode 1049 and 1050. The OBXKites database uses unique identifiers for primary keys and
therefore uses stored procedures for all inserts. The first Order and OrderDetail inserts will list the
stored procedure parameters so the following stored procedure calls are easier to understand:
USE OBXKites;
DECLARE @OrderNumber INT;
Including Data with Subqueries and CTEs 11
The first person, ContactCode 110, orders exactly all toys:
EXEC pOrder_AddNew
  @ContactCode = '110',
  @EmployeeCode = '120',
  @LocationCode = 'CH',
  @OrderDate = '2002/6/1',
  @OrderNumber = @OrderNumber output;
EXEC pOrder_AddItem
  @OrderNumber = @OrderNumber,
  @Code = '1049',
  @NonStockProduct = NULL,
  @Quantity = 12,
  @UnitPrice = NULL,
  @ShipRequestDate = '2002/6/1',
  @ShipComment = NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 3, NULL, NULL, NULL;
The second person, ContactCode 111, orders exactly all toys, and toy 1050 twice:
EXEC pOrder_AddNew
  '111', '119', 'JR', '2002/6/1', @OrderNumber output;
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddNew
  '111', '119', 'JR', '2002/6/1', @OrderNumber output;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 6, NULL, NULL, NULL;
The third person, ContactCode 112, orders all toys plus some other products:
EXEC pOrder_AddNew
  '112', '119', 'JR', '2002/6/1', @OrderNumber output;
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 5, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1001', NULL, 5, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1002', NULL, 5, NULL, NULL, NULL;
The fourth person, ContactCode 113, orders one toy:
EXEC pOrder_AddNew
  '113', '119', 'JR', '2002/6/1', @OrderNumber output;
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
In other words, only customers 110 and 111 order all the toys and nothing else. Customer 112
purchases all the toys, as well as some kites. Customer 113 is an error check because she bought only
one toy.
At least a couple of methods exist for coding a relational-division query. The original method, proposed
by Chris Date, involves using nested correlated subqueries to locate rows in and out of the sets. A more
direct method has been popularized by Joe Celko: It involves comparing the row count of the dividend
and divisor data sets.
Basically, Celko's solution is to rephrase the question as "For whom is the number of toys ordered equal
to the number of toys available?"
The query is asking two questions. The outer query groups the toy orders by contact, and the subquery
counts the number of products in the toy product category. The outer query's HAVING clause then
compares the distinct count of toys each contact has ordered against the count of toys available:
-- Is number of toys ordered
SELECT Contact.ContactCode
  FROM dbo.Contact
    JOIN dbo.[Order]
      ON Contact.ContactID = [Order].ContactID
    JOIN dbo.OrderDetail
      ON [Order].OrderID = OrderDetail.OrderID
    JOIN dbo.Product
      ON OrderDetail.ProductID = Product.ProductID
    JOIN dbo.ProductCategory
      ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
  WHERE ProductCategory.ProductCategoryName = 'Toy'
  GROUP BY Contact.ContactCode
  HAVING COUNT(DISTINCT Product.ProductCode) =
    -- equal to number of toys available?
    (SELECT Count(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID
             = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy');
Result:
ContactCode
-----------
110
111
112
Some techniques in the previous query, namely GROUP BY, HAVING, and COUNT(), are
explained in the next chapter, "Aggregating Data."
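For comparison, the nested correlated subquery method attributed to Chris Date earlier is commonly
written as a double NOT EXISTS: return each customer for whom no toy exists that the customer has not
purchased. The following is a minimal sketch of that idea and does not use the OBXKites schema. The
table variables @Toy and @Purchase, and their rows, are hypothetical stand-ins that mirror the four
sample customers.

```sql
-- Hypothetical stand-ins for the OBXKites tables (assumed names, not the real schema).
DECLARE @Toy TABLE (ProductCode CHAR(4) PRIMARY KEY);
DECLARE @Purchase TABLE (ContactCode CHAR(3) NOT NULL, ProductCode CHAR(4) NOT NULL);

INSERT INTO @Toy (ProductCode)
  VALUES ('1049'), ('1050');

INSERT INTO @Purchase (ContactCode, ProductCode)
  VALUES ('110', '1049'), ('110', '1050'),
         ('111', '1049'), ('111', '1050'), ('111', '1050'),
         ('112', '1049'), ('112', '1050'), ('112', '1001'), ('112', '1002'),
         ('113', '1049');

-- Division with remainder: keep a customer only when there is
-- no toy that lacks a matching purchase row for that customer.
SELECT DISTINCT P.ContactCode
  FROM @Purchase AS P
  WHERE NOT EXISTS
    (SELECT *
       FROM @Toy AS T
       WHERE NOT EXISTS
         (SELECT *
            FROM @Purchase AS P2
            WHERE P2.ContactCode = P.ContactCode
              AND P2.ProductCode = T.ProductCode))
  ORDER BY P.ContactCode;
```

With these rows the query should return contacts 110, 111, and 112, the same division-with-remainder
answer as the counting method. Customer 113 drops out because toy 1050 has no matching purchase row.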
Exact relational division
Exact relational division finds exact matches without any remainder. It takes the basic question of
relational division with remainder and tightens the method so that the dividend will have no extra rows
that cause a remainder.
In practical terms it means that the example question now asks, "Who has ordered only every toy?"
If you address this query with a modified form of Joe Celko's method, the pseudocode becomes "For
whom is the number of toys ordered equal to the number of toys available, and also equal to the total
number of products ordered?" If a customer has ordered additional products other than toys, then the
third part of the question eliminates that customer from the result set.
The SQL code contains two primary changes to the previous query. One, the outer query must find
both the number of toys ordered and the number of all products ordered. It does this by finding the
toys purchased in a derived table and joining the two data sets. Two, the HAVING clause must be
modified to compare the number of toys available with both the number of toys purchased and the
number of all products purchased, as follows:
-- Exact Relational Division
-- Is number of all products ordered
SELECT Contact.ContactCode
  FROM dbo.Contact
    JOIN dbo.[Order]
      ON Contact.ContactID = [Order].ContactID
    JOIN dbo.OrderDetail
      ON [Order].OrderID = OrderDetail.OrderID
    JOIN dbo.Product
      ON OrderDetail.ProductID = Product.ProductID
    JOIN dbo.ProductCategory P1
      ON Product.ProductCategoryID = P1.ProductCategoryID
    JOIN
      -- and number of toys ordered
      (SELECT Contact.ContactCode, Product.ProductCode
         FROM dbo.Contact
           JOIN dbo.[Order]
             ON Contact.ContactID = [Order].ContactID
           JOIN dbo.OrderDetail
             ON [Order].OrderID = OrderDetail.OrderID
           JOIN dbo.Product
             ON OrderDetail.ProductID = Product.ProductID
           JOIN dbo.ProductCategory
             ON Product.ProductCategoryID =
               ProductCategory.ProductCategoryID
         WHERE ProductCategory.ProductCategoryName = 'Toy'
      ) ToysOrdered
      ON Contact.ContactCode = ToysOrdered.ContactCode
  GROUP BY Contact.ContactCode
  HAVING COUNT(DISTINCT Product.ProductCode) =
    -- equal to number of toys available?
    (SELECT Count(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID
             = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy')
    -- AND equal to the total number of any product ordered?
    AND COUNT(DISTINCT ToysOrdered.ProductCode) =
    (SELECT Count(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID
             = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy');
The result is a list of contacts for whom the number of toys purchased (2) and the number of total
products purchased (2) are both equal to the number of toys available (2):
ContactCode
-----------
110
111
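The same exact-division test can also be expressed without a derived table by counting toys with a
CASE expression inside the aggregate. This is an alternative formulation rather than the query shown
above, and it runs against hypothetical table variables (@Product and @Purchase) that stand in for the
OBXKites tables.

```sql
-- Hypothetical stand-ins for the OBXKites tables (assumed names and rows).
DECLARE @Product TABLE (ProductCode CHAR(4) PRIMARY KEY, Category VARCHAR(10) NOT NULL);
DECLARE @Purchase TABLE (ContactCode CHAR(3) NOT NULL, ProductCode CHAR(4) NOT NULL);

INSERT INTO @Product (ProductCode, Category)
  VALUES ('1049', 'Toy'), ('1050', 'Toy'), ('1001', 'Kite'), ('1002', 'Kite');

INSERT INTO @Purchase (ContactCode, ProductCode)
  VALUES ('110', '1049'), ('110', '1050'),
         ('111', '1049'), ('111', '1050'), ('111', '1050'),
         ('112', '1049'), ('112', '1050'), ('112', '1001'), ('112', '1002'),
         ('113', '1049');

SELECT Pu.ContactCode
  FROM @Purchase AS Pu
    JOIN @Product AS Pr
      ON Pu.ProductCode = Pr.ProductCode
  GROUP BY Pu.ContactCode
  -- distinct toys ordered = toys available (CASE yields NULL for non-toys,
  -- and COUNT ignores NULLs)
  HAVING COUNT(DISTINCT CASE WHEN Pr.Category = 'Toy' THEN Pr.ProductCode END) =
           (SELECT COUNT(*) FROM @Product WHERE Category = 'Toy')
  -- distinct products of any kind ordered = toys available (no remainder)
     AND COUNT(DISTINCT Pu.ProductCode) =
           (SELECT COUNT(*) FROM @Product WHERE Category = 'Toy')
  ORDER BY Pu.ContactCode;
```

With these rows only contacts 110 and 111 should qualify. Contact 112 has two distinct toys but four
distinct products, so the second comparison removes it. SQL Server may also report a warning that
NULL values were eliminated by an aggregate, which is expected here.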
Composable SQL
Composable SQL, also called select from output or DML table source (in SQL Server BOL), is the ability
to pass data from an insert, update, or delete’s output clause to an outer query. This is a very powerful
new way to build subqueries, and it can significantly reduce the amount of code and improve the
performance of code that needs to write to one table, and then, based on that write, write to another table.
To track the evolution of composable SQL (illustrated in Figure 11-3), SQL Server has always had
DML triggers, which include the inserted and deleted virtual tables. Essentially, these are a view to the
DML modification that fired the triggers. The deleted table holds the before image of the data, and the
inserted table holds the after image.
Since SQL Server 2005, any DML statement that modifies data (INSERT, UPDATE, DELETE, and, as of
SQL Server 2008, MERGE) can have an optional OUTPUT clause that can SELECT from the virtual
inserted and deleted tables. The OUTPUT clause can pass the data to the client or insert it directly
into a table.
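As a minimal sketch of the OUTPUT clause on its own, the following batch updates one row and captures
its before and after images in a second table. The table variables @Department and @Audit are
hypothetical and exist only for this illustration.

```sql
-- Hypothetical table variables; not part of any sample database.
DECLARE @Department TABLE (Name VARCHAR(50) PRIMARY KEY, GroupName VARCHAR(50) NOT NULL);
DECLARE @Audit TABLE (OldValue VARCHAR(50), NewValue VARCHAR(50));

INSERT INTO @Department (Name, GroupName)
  VALUES ('Sales', 'Sales and Marketing'),
         ('Engineering', 'Research and Development');

-- Deleted holds the before image and Inserted the after image of each updated row.
UPDATE @Department
  SET GroupName = 'Output Clause Test'
  OUTPUT Deleted.GroupName, Inserted.GroupName
    INTO @Audit (OldValue, NewValue)
  WHERE Name = 'Sales';

SELECT OldValue, NewValue
  FROM @Audit;
```

Dropping the INTO line would return the same two columns to the client instead of storing them.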
The inserted and deleted virtual tables are covered in Chapter 26, "Creating DML Triggers,"
and the OUTPUT clause is detailed in Chapter 15, "Modifying Data."
In SQL Server 2008, composable SQL can place a DML statement and its OUTPUT clause in a subquery
and then select from that subquery. The primary benefit of composable SQL, as opposed to just
using the OUTPUT clause to insert into a table, is that OUTPUT clause data may be further filtered and
manipulated by the outer query.
FIGURE 11-3
Composable SQL is an evolution of the inserted and deleted tables.
[Figure: a DML statement (Insert, Update, Delete, Merge) exposes the Inserted and Deleted tables
(SQL 2000); the Output clause passes them to the client, a table variable, temp tables, or tables
(SQL 2005); Select From Output lets an Insert select from a DML subquery (SQL 2008).]
The following script first creates a table and then runs a composable SQL query. The subquery has an
UPDATE command with an OUTPUT clause. The OUTPUT clause passes the oldvalue and newvalue
columns to the outer query, which inserts them into the CompSQL table. A final SELECT then reads the
captured rows back, filtering out any TestData rows:
CREATE TABLE CompSQL (oldvalue varchar(50), newvalue varchar(50));
INSERT INTO CompSQL (oldvalue, newvalue)
  SELECT oldvalue, newvalue
    FROM
      (UPDATE HumanResources.Department
         SET GroupName = 'Composable SQL Test'
         OUTPUT Deleted.GroupName as 'oldvalue',
                Inserted.GroupName as 'newvalue'
         WHERE Name = 'Sales') Q;
SELECT oldvalue, newvalue
  FROM CompSQL
  WHERE newvalue <> 'TestData';
Result:
oldvalue               newvalue
---------------------  ---------------------
Sales and Marketing    Composable SQL Test
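The script above inserts every row that the OUTPUT clause returns. To illustrate the filtering benefit
described earlier, the following sketch adds a WHERE clause to the outer query so that only rows whose
value actually changed are logged. The temp tables #Department and #GroupChange are hypothetical, and
the sketch assumes SQL Server 2008 or later with the database compatibility level set to 100 or higher.

```sql
-- Hypothetical temp tables (assumed names); not part of AdventureWorks.
CREATE TABLE #Department (Name VARCHAR(50) PRIMARY KEY, GroupName VARCHAR(50) NOT NULL);
CREATE TABLE #GroupChange (Name VARCHAR(50), OldValue VARCHAR(50), NewValue VARCHAR(50));

INSERT INTO #Department (Name, GroupName)
  VALUES ('Sales', 'Sales and Marketing'),
         ('Marketing', 'Sales and Marketing'),
         ('Support', 'Commercial');

-- The inner UPDATE touches all three rows; the outer WHERE keeps only
-- the OUTPUT rows whose GroupName really changed.
INSERT INTO #GroupChange (Name, OldValue, NewValue)
  SELECT Q.Name, Q.OldValue, Q.NewValue
    FROM (UPDATE #Department
            SET GroupName = 'Commercial'
            OUTPUT Inserted.Name,
                   Deleted.GroupName AS OldValue,
                   Inserted.GroupName AS NewValue) AS Q
    WHERE Q.OldValue <> Q.NewValue;

SELECT Name, OldValue, NewValue
  FROM #GroupChange;

DROP TABLE #GroupChange;
DROP TABLE #Department;
```

Under these assumptions #GroupChange should end up with the Sales and Marketing rows only, because the
Support row was already in the Commercial group. Temp tables are used here instead of table variables
purely as a cautious choice for the sketch.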
Note several restrictions on composable SQL:
■ The update DML in the subquery must modify a local table and cannot be a partitioned view.
■ The composable SQL query cannot include nested composable SQL, aggregate functions, subqueries,
ranking functions, full-text features, user-defined functions that perform data access, or the
textptr function.
■ The target table must be a local base table with no triggers, no foreign keys, no merge
replication, or updatable subscriptions for transactional replication.
Summary
While the basic nuts and bolts of subqueries may appear simple, they open a world of possibilities, as
they enable you to build complex nested queries that pull and twist data into the exact shape that is
needed to solve a difficult problem. As you continue to play with subqueries, I think you’ll agree that
herein lies the power of SQL — and if you’re still developing primarily with the GUI tools, this might
provide the catalyst to move you to developing SQL using the query text editor.
A few key points from this chapter:
■ Simple subqueries are executed once and the results are inserted into the outer query.
■ Subqueries can be used in nearly every portion of the query — not just as derived tables.
■ Correlated subqueries refer to the outer query, so they can't be executed by themselves.
Conceptually, the outer query is executed and the results are passed to the correlated subquery,
which is executed once for every row in the outer query.
■ You don’t need to memorize how to code relational division; just remember that if you need to
join not on any row but every row, then relational division is the set-based solution to do the
job.
■ Composable SQL is useful if you need to write to multiple tables from a single transaction, but
there are plenty of limitations.
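The first and third points can be seen side by side in a short sketch. The table variable @OrderSketch
and its rows are hypothetical and are not taken from the sample databases.

```sql
-- Hypothetical data for illustration only.
DECLARE @OrderSketch TABLE (ContactCode CHAR(3) NOT NULL, Amount INT NOT NULL);

INSERT INTO @OrderSketch (ContactCode, Amount)
  VALUES ('110', 10), ('110', 30), ('111', 20);

-- Simple subquery: it has no reference to the outer query, so it can run
-- by itself, and its single value (the average, 20) feeds the outer WHERE.
SELECT ContactCode, Amount
  FROM @OrderSketch
  WHERE Amount > (SELECT AVG(Amount) FROM @OrderSketch);

-- Correlated subquery: it refers to the outer row (O.ContactCode), so it
-- cannot run by itself; conceptually it is evaluated once per outer row.
SELECT O.ContactCode, O.Amount
  FROM @OrderSketch AS O
  WHERE O.Amount = (SELECT MAX(O2.Amount)
                      FROM @OrderSketch AS O2
                      WHERE O2.ContactCode = O.ContactCode);
```

The first query should return the single row for contact 110 with amount 30. The second should return
each contact's largest order: 30 for contact 110 and 20 for contact 111.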
The previous chapters established the foundation for working with SQL, covering the SELECT
statement, expressions, joins, and unions, while this chapter expanded the SELECT with powerful
subqueries and CTEs. If you're reading through this book sequentially, congratulations: you are now
over the hump of learning SQL. If you can master relational algebra and subqueries, the rest is a piece
of cake.
The next chapter continues to describe the repertoire of data-retrieval techniques with aggregation
queries, where using subqueries pays off.
Nielsen c12.tex V4 - 07/21/2009 12:46pm Page 289
Aggregating Data
IN THIS CHAPTER
Calculating sums and averages
Statistical analysis
Grouping data within a query
Solving aggravating aggregation problems
Generating cumulative totals
Building crosstab queries with the case, pivot, and dynamic methods
The Information Architecture Principle in Chapter 2 implies that information, not just data, is an
asset. Turning raw lists of keys and data into useful information often requires summarizing data and
grouping it in meaningful ways. While summarization and analysis can certainly be performed with
other tools, such as Reporting Services, Analysis Services, or an external tool such as SAS, SQL is a
set-based language, and a fair amount of summarizing and grouping can be performed very well within
the SQL SELECT statement.
SQL excels at calculating sums, max values, and averages for the entire data set
or for segments of data. In addition, SQL queries can create cross-tabulations,
commonly known as pivot tables.
Simple Aggregations
The premise of an aggregate query is that instead of returning all the selected
rows, SQL Server returns a single row of computed values that summarizes the
original data set, as illustrated in Figure 12-1. More complex aggregate queries
can slice the selected rows into subsets and then summarize every subset.
The types of aggregate calculations range from totaling the data to performing
basic statistical operations.
It's important to note that in the logical order of the SQL query, the aggregate functions (indicated
by the Summing function in the diagram) occur following the FROM clause and the WHERE filters. This
means that the data can be assembled and filtered prior to being summarized without needing to use a
subquery, although sometimes a subquery is still needed to build more complex aggregate queries (as
detailed later in the "Aggravating Queries" section in this chapter).
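A small sketch makes the ordering concrete: the WHERE filter removes rows first, and only the
surviving rows are summarized. The table variable @RawDataSketch and its four rows are hypothetical
and are not the chapter's RawData table.

```sql
-- Hypothetical data for illustration only.
DECLARE @RawDataSketch TABLE (Region VARCHAR(10) NOT NULL, Amount INT NULL);

INSERT INTO @RawDataSketch (Region, Amount)
  VALUES ('South', 10), ('South', 20), ('North', 5), ('North', NULL);

-- FROM and WHERE run logically before the aggregates, so only the
-- two South rows reach COUNT(*) and SUM(); no subquery is needed.
SELECT COUNT(*) AS RowsCounted,   -- 2
       SUM(Amount) AS Total       -- 30
  FROM @RawDataSketch
  WHERE Region = 'South';
```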
What’s New with Query Aggregations?
Microsoft continues to evolve T-SQL's ability to aggregate data. SQL Server 2005 included the
capability to roll your own aggregate functions using the .NET CLR. SQL Server 2008 expands this
feature by removing the 8,000-byte limit on intermediate results for CLR user-defined aggregate
functions.
The most significant enhancement to query aggregation in SQL Server 2008 is the ability to use grouping sets
to further define the CUBE and ROLLUP functions with the GROUP BY clause.
WITH ROLLUP and WITH CUBE have been deprecated, as they are non-ISO-compliant syntax for special
cases of the ISO-compliant syntax. They are replaced with the new, more powerful, syntax for ROLLUP and
CUBE.
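As a rough illustration of the syntax change, the following sketch runs the same rollup three ways.
The table variable @SalesSketch and its rows are hypothetical, and the sketch assumes SQL Server 2008
or later.

```sql
-- Hypothetical data for illustration only.
DECLARE @SalesSketch TABLE (Region VARCHAR(10) NOT NULL, Category CHAR(1) NOT NULL, Amount INT NOT NULL);

INSERT INTO @SalesSketch (Region, Category, Amount)
  VALUES ('South', 'X', 10), ('South', 'Y', 20), ('North', 'X', 5);

-- Older, non-ISO form
SELECT Region, Category, SUM(Amount) AS Total
  FROM @SalesSketch
  GROUP BY Region, Category WITH ROLLUP;

-- ISO-compliant form
SELECT Region, Category, SUM(Amount) AS Total
  FROM @SalesSketch
  GROUP BY ROLLUP (Region, Category);

-- The same groupings spelled out as grouping sets:
-- detail rows, one subtotal per region, and a grand total
SELECT Region, Category, SUM(Amount) AS Total
  FROM @SalesSketch
  GROUP BY GROUPING SETS ((Region, Category), (Region), ());
```

All three statements should produce the same set of rows (row order is not guaranteed without an
ORDER BY): three detail rows, two regional subtotals, and one grand total of 35.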
FIGURE 12-1
The aggregate function produces a single row result from a data set.
[Figure: data source(s) pass through From and Where, and the selected Col(s)/Expr(s) are then summed
into a single row.]
Basic aggregations
SQL includes a set of aggregate functions, listed in Table 12-1, which can be used as expressions in the
SELECT statement to return summary data.
On the website: The code examples for this chapter use a small table called RawData. The code to
create and populate this data set is at the beginning of the chapter's script. You can download the
script from www.SQLServerBible.com.
CREATE TABLE RawData (
RawDataID INT NOT NULL IDENTITY PRIMARY KEY,
Region VARCHAR(10) NOT NULL,
Category CHAR(1) NOT NULL,
Amount INT NULL,
SalesDate Date NOT NULL
);
TABLE 12-1
Basic Aggregate Functions
Aggregate Function             | Data Type Supported       | Description
-------------------------------+---------------------------+------------------------------------------
sum()                          | Numeric                   | Totals all the non-null values in the column.
avg()                          | Numeric                   | Averages all the non-null values in the column. The result has the same data type as the input, so the input is often converted to a higher precision, such as avg(cast(col as float)).
min()                          | Numeric, string, datetime | Returns the smallest number or the first datetime or the first string according to the current collation from the column.
max()                          | Numeric, string, datetime | Returns the largest number or the last datetime or the last string according to the current collation from the column.
Count[_big](*)                 | Any data type (row-based) | Performs a simple count of all the rows in the result set up to 2,147,483,647. The count_big() variation uses the bigint data type and can handle up to 2^63-1 rows.
Count[_big]([distinct] column) | Any data type (row-based) | Performs a simple count of all the rows with non-null values in the column in the result set up to 2,147,483,647. The distinct option eliminates duplicate values. Will not count blobs.
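The null handling and the avg() data type note in Table 12-1 can be checked with a short sketch. The
table variable @AmountSketch and its four rows are hypothetical.

```sql
-- Hypothetical data: three non-null values (1, 2, 2) and one NULL.
DECLARE @AmountSketch TABLE (Amount INT NULL);

INSERT INTO @AmountSketch (Amount)
  VALUES (1), (2), (2), (NULL);

SELECT COUNT(*) AS AllRows,                       -- 4: counts rows, including the NULL row
       COUNT(Amount) AS NonNullAmounts,           -- 3: NULL is skipped
       COUNT(DISTINCT Amount) AS DistinctAmounts, -- 2: the values 1 and 2
       SUM(Amount) AS Total,                      -- 5
       AVG(Amount) AS IntAvg,                     -- 1: integer input gives an integer result
       AVG(CAST(Amount AS FLOAT)) AS FloatAvg,    -- approximately 1.667
       MIN(Amount) AS Smallest,                   -- 1
       MAX(Amount) AS Largest                     -- 2
  FROM @AmountSketch;
```

SQL Server may also report a warning that a NULL value was eliminated by an aggregate, which is
expected with this data.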
This simple aggregate query counts the number of rows in the table and totals the Amount column. In
lieu of returning the actual rows from the RawData table, the query returns the summary row with the
row count and total. Therefore, even though there are 24 rows in the RawData table, the result is a
single row:
SELECT COUNT(*) AS Count,
SUM(Amount) AS [Sum]
FROM RawData;
Result:
Count       Sum
----------- -----------
24          946