Guide to Learning Microsoft SQL Server 2008, part 29

Nielsen c10.tex V4 - 07/21/2009 12:42pm Page 242
Part II Manipulating Data With Select
INSERT dbo.One(OnePK, Thing1)
VALUES (2, 'New Thing');
INSERT dbo.One(OnePK, Thing1)
VALUES (3, 'Red Thing');
INSERT dbo.One(OnePK, Thing1)
VALUES (4, 'Blue Thing');
INSERT dbo.Two(TwoPK, OnePK, Thing2)
VALUES (1, 0, 'Plane');
INSERT dbo.Two(TwoPK, OnePK, Thing2)
VALUES (2, 2, 'Train');
INSERT dbo.Two(TwoPK, OnePK, Thing2)
VALUES (3, 3, 'Car');
INSERT dbo.Two(TwoPK, OnePK, Thing2)
VALUES (4, NULL, 'Cycle');
FIGURE 10-9
The Red Thing Blue Thing example has data to view every type of join.
[Figure: table One (Old Thing, New Thing, Red Thing, Blue Thing) and table Two (Plane, Train, Car, Cycle)]
An inner join between table One and table Two will return only the two matching rows:
SELECT Thing1, Thing2
FROM dbo.One
INNER JOIN dbo.Two
ON One.OnePK = Two.OnePK;


Result:
Thing1          Thing2
--------------- ---------------
New Thing       Train
Red Thing       Car
A left outer join will extend the inner join and include the rows from table One without a match:
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK;
Merging Data with Joins and Unions 10
All the rows are now returned from table One, but two rows are still missing from table Two:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       Car
Blue Thing      NULL
A full outer join will retrieve every row from both tables, regardless of a match between the tables:
SELECT Thing1, Thing2
FROM dbo.One
FULL OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK;
The plane and cycle from table Two are now listed along with every row from table One:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       Car
Blue Thing      NULL
NULL            Plane
NULL            Cycle
As this example shows, full outer joins are an excellent tool for finding all the data, even bad data. Set
difference queries, explored later in this chapter, build on outer joins to zero in on bad data.
Placing the conditions within outer joins
When working with inner joins, a condition has the same effect whether it’s in the JOIN clause or the
WHERE clause, but that’s not the case with outer joins:
■ When the condition is in the JOIN clause, SQL Server includes all rows from the outer table
and then uses the condition to include rows from the second table.
■ When the restriction is placed in the WHERE clause, the join is performed and then the WHERE
clause is applied to the joined rows.
The following two queries demonstrate the effect of the placement of the condition.
In the first query, the left outer join includes all rows from table One and then joins those rows from
table Two where OnePK is equal in both tables and Thing1's value is New Thing. The result is all the
rows from table One, and rows from table Two that meet both join restrictions:
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK
AND One.Thing1 = 'New Thing';
Result:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       NULL
Blue Thing      NULL
The second query first performs the left outer join, producing the same four rows as the previous query
but without the AND condition. The WHERE clause then restricts that result to those rows where Thing1
is equal to New Thing. For this data, the net effect is the same as if an inner join had been used (but
it might take more execution time):
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK
WHERE One.Thing1 = 'New Thing';
Result:
Thing1          Thing2
--------------- ---------------
New Thing       Train
Multiple outer joins
Coding a query with multiple outer joins can be tricky. Typically, the order of data sources in the FROM
clause doesn’t matter, but here it does. The key is to code them in a sequential chain. Think through it
this way:
1. Grab all the customers regardless of whether they’ve placed any orders.
2. Then grab all the orders regardless of whether they’ve shipped.
3. Then grab all the ship details.

When chaining multiple outer joins, stick to left outer joins, as mixing left and right outer joins
becomes very confusing very fast. Be sure to unit test the query with a small sample set of data to
ensure that the outer join chain is correct.
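The chain described above can be sketched as follows. This is only an illustration under stated assumptions: the table variables @Customer, @SalesOrder, and @ShipDetail and their columns are hypothetical and are not part of any sample database used in this chapter.

```sql
-- Hypothetical tables for illustration only (SQL Server 2008 syntax).
DECLARE @Customer   TABLE (CustomerID INT PRIMARY KEY, CustomerName VARCHAR(20));
DECLARE @SalesOrder TABLE (OrderID INT PRIMARY KEY, CustomerID INT);
DECLARE @ShipDetail TABLE (ShipDetailID INT PRIMARY KEY, OrderID INT, ShipDate DATE);

INSERT @Customer   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT @SalesOrder VALUES (10, 1), (11, 2);          -- Cy has no orders
INSERT @ShipDetail VALUES (100, 10, '2009-07-21');   -- order 11 has not shipped

-- 1. All customers, 2. then their orders if any, 3. then ship details if any.
SELECT C.CustomerName, O.OrderID, S.ShipDate
  FROM @Customer AS C
    LEFT OUTER JOIN @SalesOrder AS O
      ON C.CustomerID = O.CustomerID
    LEFT OUTER JOIN @ShipDetail AS S
      ON O.OrderID = S.OrderID
  ORDER BY C.CustomerID;
```

Every customer appears in the result: the customer without an order shows NULL for both OrderID and ShipDate, and the order without a ship detail shows NULL for ShipDate only. Each join in the chain hangs off the table introduced just before it, which is what keeps the sequence readable.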
Self-Joins
A self-join is a join that refers back to the same table. This type of unary relationship is often used to
extract data from a reflexive (also called a recursive) relationship, such as organizational charts (employee
to boss). Think of a self-join as a table being joined with a temporary copy of itself.
The Family sample database uses two self-joins between a child and his or her parents, as shown in
the database diagram in Figure 10-10. The mothers and fathers are also people, of course, and are listed
in the same table. They link back to their parents, and so on. The sample database is populated with
five fictitious generations that can be used for sample queries.
FIGURE 10-10
The database diagram of the Family database includes two unary relationships (children to parents)
on the left and a many-to-many unary relationship (husband to wife) on the right.
The key to constructing a self-join is to include a second reference to the table using a table alias. Once
the table is available twice to the SELECT statement, the self-join functions much like any other join. In
the following example, the dbo.Person table is referenced using the table alias Mother:
Switching over to the Family sample database, the following query locates the children of Audry Halloway:
USE Family;
SELECT Child.PersonID, Child.FirstName,
Child.MotherID, Mother.PersonID
FROM dbo.Person AS Child
INNER JOIN dbo.Person AS Mother
ON Child.MotherID = Mother.PersonID
WHERE Mother.LastName = 'Halloway'
AND Mother.FirstName = 'Audry';
The query uses the Person table twice. The first reference (aliased as Child) is joined with the second
reference (aliased as Mother), which is restricted by the WHERE clause to only Audry Halloway.
Only the rows with a MotherID that points back to Audry will be included in the inner join. Audry's
PersonID is 6 and her children are as follows:
PersonID    FirstName   MotherID    PersonID
----------- ----------- ----------- -----------
8           Melanie     6           6
7           Corwin      6           6
9           Dara        6           6
10          James       6           6
While the previous query adequately demonstrates a self-join, it would be more useful if the mother
weren't hard-coded in the WHERE clause, and if more information were provided about each birth, as
follows:
SELECT CONVERT(NVARCHAR(15),C.DateOfBirth,1) AS Date,
C.FirstName AS Name, C.Gender AS G,
ISNULL(F.FirstName + ' ' + F.LastName, ' * unknown *')
AS Father,
M.FirstName + ' ' + M.LastName AS Mother
FROM dbo.Person AS C
LEFT OUTER JOIN dbo.Person AS F
ON C.FatherID = F.PersonID
INNER JOIN dbo.Person AS M
ON C.MotherID = M.PersonID
ORDER BY C.DateOfBirth;
This query makes three references to the Person table: the child, the father, and the mother, with
mnemonic one-letter aliases. The result is a better listing:
Date     Name     G Father           Mother
-------- -------- - ---------------- ------------------
5/19/22  James    M James Halloway   Kelly Halloway
8/05/28  Audry    F Bryan Miller     Karen Miller
8/19/51  Melanie  F James Halloway   Audry Halloway
8/30/53  James    M James Halloway   Audry Halloway
2/12/58  Dara     F James Halloway   Audry Halloway
3/13/61  Corwin   M James Halloway   Audry Halloway
3/13/65  Cameron  M Richard Campbell Elizabeth Campbell

For more ideas about working with hierarchies and self-joins, refer to Chapter 17,
"Traversing Hierarchies."
Cross (Unrestricted) Joins
The cross join, also called an unrestricted join, is a pure relational algebra multiplication of the two
source tables. Without a join condition restricting the result set, the result set includes every possible
combination of rows from the data sources. Each row in data set one is matched with every row in data
set two — for example, if the first data source has five rows and the second data source has four rows,
a cross join between them would result in 20 rows. This type of result set is referred to as a Cartesian
product.

Using the One/Two sample tables, a cross join is constructed in Management Studio by omitting the join
condition between the two tables, as shown in Figure 10-11.
FIGURE 10-11
A graphical representation of a cross join is simply two tables without a join condition.
In code, this type of join is specified by the keywords CROSS JOIN and the lack of an ON condition:
SELECT Thing1, Thing2
FROM dbo.One
CROSS JOIN dbo.Two;
The result of a join without restriction is that every row in table One matches with every row from
table Two:
Thing1          Thing2
--------------- ---------------
Old Thing       Plane
New Thing       Plane
Red Thing       Plane
Blue Thing      Plane
Old Thing       Train
New Thing       Train
Red Thing       Train
Blue Thing      Train
Old Thing       Car
New Thing       Car
Red Thing       Car
Blue Thing      Car
Old Thing       Cycle
New Thing       Cycle
Red Thing       Cycle
Blue Thing      Cycle
Sometimes cross joins are the result of someone forgetting to draw the join in a graphical-query tool;
however, they are useful for populating databases with sample data, or for creating empty "pigeonhole"
rows for population during a procedure.
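As a rough sketch of the sample-data use, the following batch multiplies two small lists into one row per combination. The table variables @Color and @Item and the SampleID and SampleName columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical lists for illustration only (SQL Server 2008 syntax).
DECLARE @Color TABLE (Color VARCHAR(10));
DECLARE @Item  TABLE (Item  VARCHAR(10));

INSERT @Color VALUES ('Red'), ('Blue'), ('Green');
INSERT @Item  VALUES ('Plane'), ('Train');

-- 3 colors x 2 items = 6 generated rows, numbered for use as sample keys.
SELECT ROW_NUMBER() OVER (ORDER BY C.Color, I.Item) AS SampleID,
       C.Color + ' ' + I.Item AS SampleName
  FROM @Color AS C
    CROSS JOIN @Item AS I;
```

Adding a third list to the cross join multiplies the row count again, so a handful of short lists can produce a large test set.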
Understanding how a cross join multiplies data is also useful when studying relational division, the
inverse of relational multiplication. Relational division requires subqueries, so it’s explained in the next
chapter.
Exotic Joins
Nearly all joins are based on a condition of equality between the primary key of a primary table and the
foreign key of a secondary table, which is why the inner join is sometimes called an equi-join. Although
it’s commonplace to base a join on a single equal condition, it is not a requirement. The condition
between the two columns is not necessarily equal, nor is the join limited to one condition.
The ON condition of the join is in reality nothing more than a WHERE condition restricting the product
of the two joined data sets. Where-clause conditions may be very flexible and powerful, and the same is
true of join conditions. This understanding of the ON condition enables the use of three powerful
techniques: θ (theta) joins, multiple-condition joins, and non-key joins.
Multiple-condition joins
If a join is nothing more than a condition between two data sets, then it makes sense that multiple
conditions are possible at the join. In fact, multiple-condition joins and θ joins go hand-in-hand. Without
the ability to use multiple-condition joins, θ joins would be of little value.
If the database schema uses natural primary keys, then there are probably tables with composite primary
keys, which means queries must use multiple-condition joins.
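A minimal sketch of such a composite-key join follows. It assumes two hypothetical table variables, @OrderLine and @Shipment, that are not part of any sample database in this chapter.

```sql
-- Hypothetical tables for illustration only (SQL Server 2008 syntax).
DECLARE @OrderLine TABLE (
  OrderNumber INT NOT NULL,
  LineNumber  INT NOT NULL,
  ProductCode VARCHAR(10) NOT NULL,
  PRIMARY KEY (OrderNumber, LineNumber)   -- composite natural key
);
DECLARE @Shipment TABLE (
  OrderNumber INT NOT NULL,
  LineNumber  INT NOT NULL,
  ShipDate    DATE NOT NULL
);

INSERT @OrderLine VALUES (1, 1, 'Plane'), (1, 2, 'Train'), (2, 1, 'Car');
INSERT @Shipment  VALUES (1, 1, '2009-07-21'), (2, 1, '2009-07-22');

-- Both key columns are needed to identify a single order line.
SELECT OL.OrderNumber, OL.LineNumber, OL.ProductCode, S.ShipDate
  FROM @OrderLine AS OL
    INNER JOIN @Shipment AS S
      ON OL.OrderNumber = S.OrderNumber
      AND OL.LineNumber = S.LineNumber;
```

Joining on OrderNumber alone would pair the shipment for order 1, line 1 with both lines of order 1, which is why every column of the composite key belongs in the ON clause.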

Join conditions can refer to any table in the FROM clause, enabling interesting three-way joins:
FROM A
INNER JOIN B
ON A.col = B.col
INNER JOIN C
ON B.col = C.col
AND A.col = C.col;
The first query in the previous section, "Placing the Conditions within Outer Joins," was a
multiple-condition join.
θ (theta) joins
A theta join (depicted throughout as θ) is a join based on a non-equal ON condition. In relational
theory, conditional operators (=, >, <, >=, <=, <>) are called θ operators. While the equals condition
is technically a θ operator, it is commonly used, so only joins with conditions other than equal are
referred to as θ joins.
The θ condition may be set within Management Studio's Query Designer using the join Properties
dialog, as previously shown in Figure 10-7.
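In code, a θ join is simply an ON clause that uses an operator other than equals. The following sketch matches each sale to a discount tier by range; the table variables @Sale and @DiscountTier and their columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical tables for illustration only (SQL Server 2008 syntax).
DECLARE @Sale         TABLE (SaleID INT PRIMARY KEY, Quantity INT);
DECLARE @DiscountTier TABLE (TierName VARCHAR(10), MinQty INT, MaxQty INT);

INSERT @Sale         VALUES (1, 3), (2, 12), (3, 50);
INSERT @DiscountTier VALUES ('None', 1, 9), ('Bulk', 10, 49), ('Pallet', 50, 999);

-- Two non-equal conditions together place each quantity in exactly one tier.
SELECT S.SaleID, S.Quantity, T.TierName
  FROM @Sale AS S
    INNER JOIN @DiscountTier AS T
      ON S.Quantity >= T.MinQty
      AND S.Quantity <= T.MaxQty;
```

Neither condition alone would be enough: >= by itself would match a large sale to every lower tier as well. This is one reason θ joins and multiple-condition joins tend to appear together.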
Non-key joins
Joins are not limited to primary and foreign keys. The join can match a row in one data source with a
row in another data source using any column, as long as the columns share compatible data types and
the data match.
For example, an inventory allocation system would use a non-key join to find products that are expected
to arrive from the supplier before the customer's required ship date. A non-key join between the
PurchaseOrder and OrderDetail tables with a θ condition between PO.DateExpected and
OD.DateRequired will filter the join to those products that can be allocated to the customer's orders.
The following code demonstrates the non-key join (this is not in a sample database):
SELECT OD.OrderID, OD.ProductID, PO.POID
FROM OrderDetail AS OD
INNER JOIN PurchaseOrder AS PO
ON OD.ProductID = PO.ProductID

AND OD.DateRequired > PO.DateExpected;
When working with inner joins, non-key join conditions can be placed in the WHERE clause or in the
JOIN. Because the conditions compare similar values between two joined tables, I often place these
conditions in the JOIN portion of the FROM clause, rather than the WHERE clause. The critical difference
depends on whether you view the conditions as a part of creating the record set upon which the rest
of the SQL SELECT statement is acting, or as a filtering task that follows the FROM clause. Either way,
the query-optimization plan is identical, so use the method that is most readable and seems most logical
to you. Note that when constructing outer joins, the placement of the condition in the JOIN or in the
WHERE clause yields different results, as explained earlier in the section "Placing the Conditions within
Outer Joins."
Asking the question, "Who are twins?" of the Family sample database uses all three exotic
join techniques in the join between person and twin. The join contains three conditions. The
Person.PersonID <> Twin.PersonID condition is a θ join that prevents a person from being
considered his or her own twin. The join condition on MotherID, while a foreign key, is nonstandard
because it is being joined with another foreign key. The DateOfBirth condition is definitely a non-key
join condition:
SELECT Person.FirstName + ' ' + Person.LastName AS Person,
Twin.FirstName + ' ' + Twin.LastName AS Twin,
Person.DateOfBirth
FROM dbo.Person
INNER JOIN dbo.Person AS Twin
ON Person.PersonID <> Twin.PersonID
AND Person.MotherID = Twin.MotherID
AND Person.DateOfBirth = Twin.DateOfBirth;
The following is the same query, this time with the exotic join condition moved to the WHERE clause.
Not surprisingly, SQL Server’s Query Optimizer produces the exact same query execution plan for each
query:
SELECT Person.FirstName + ' ' + Person.LastName AS Person,
Twin.FirstName + ' ' + Twin.LastName AS Twin,
Person.DateOfBirth
FROM dbo.Person
INNER JOIN dbo.Person AS Twin
ON Person.MotherID = Twin.MotherID
AND Person.DateOfBirth = Twin.DateOfBirth
WHERE Person.PersonID <> Twin.PersonID;
Result:
Person          Twin            DateOfBirth
--------------- --------------- ------------------------
Abbie Halloway  Allie Halloway  1979-010-14 00:00:00.000
Allie Halloway  Abbie Halloway  1979-010-14 00:00:00.000
The difficult query scenarios at the end of the next chapter also demonstrate exotic joins, which are
often used with subqueries.
Set Difference Queries
A query type that's useful for analyzing the correlation between two data sets is a set difference query,
sometimes called a left (or right) anti-semi join, which finds the difference between the two data sets
based on the conditions of the join. In relational algebra terms, it removes the divisor from the dividend,
leaving the difference. This type of query is the inverse of an inner join. Informally, it's called a find
unmatched rows query.
Set difference queries are great for locating out-of-place data or data that doesn’t match, such as rows
that are in data set one but not in data set two (see Figure 10-12).
FIGURE 10-12
The set difference query finds data that is outside the intersection of the two data sets.
[Figure: Table One and Table Two, each with a region labeled Set Difference.]
Left set difference query
A left set difference query finds all the rows on the left side of the join without a match on the right side
of the join.
Using the One and Two sample tables, the following query locates all rows in table One without a
match in table Two, removing set two (the divisor) from set one (the dividend). The result will be the
rows from set one that do not have a match in set two.
The outer join already includes the rows outside the intersection, so to construct a set difference query
use an OUTER JOIN with an IS NULL restriction on the second data set's primary key. This will return
all the rows from table One that do not have a match in table Two:
USE tempdb;
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK
WHERE Two.TwoPK IS NULL;
Table One’s difference is as follows:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
Blue Thing      NULL
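For comparison only, the same unmatched rows can also be found with a correlated NOT EXISTS subquery rather than the outer join technique described above. This sketch is an alternative formulation, not the method this section teaches, and it assumes the dbo.One and dbo.Two tables have been created in tempdb as in the earlier examples.

```sql
USE tempdb;

-- Rows in table One for which no row in table Two carries the same OnePK.
SELECT Thing1
  FROM dbo.One
  WHERE NOT EXISTS
    (SELECT *
       FROM dbo.Two
       WHERE Two.OnePK = One.OnePK);
```

With the sample data this should list the same two Thing1 values as the outer join version. Because nothing from table Two reaches the result set, there is no Thing2 column to display.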