Joe Celko's SQL for Smarties: Advanced SQL Programming

CHAPTER 15: EXISTS() PREDICATE

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
FROM Personnel AS P1
WHERE P1.birthday NOT IN
(SELECT C1.birthday
FROM Celebrities AS C1
WHERE C1.birth_city = 'New York');

and you would think that the EXISTS version would be:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
FROM Personnel AS P1
WHERE NOT EXISTS
(SELECT *
FROM Celebrities AS C1
WHERE C1.birth_city = 'New York'
AND C1.birthday = P1.birthday);

Assume that Gloria Glamour is our only New Yorker and we still do
not know her birthday. In the NOT EXISTS version, the subquery will be
empty for every employee, because her NULL birthday will not test equal
to any known employee birthday. That means that the NOT EXISTS
predicate will return TRUE, and every employee will be reported as born
on a day without a famous New Yorker. But now look at the IN predicate
version, which will have a single NULL in the subquery result. The
NOT IN predicate then reduces to NOT (Personnel.birthday = NULL), which
is always UNKNOWN, and we will get no employees back.
Likewise, you cannot, in general, transform the quantified
comparison predicates into EXISTS predicates, because of the
possibility of NULL values. Remember that x <> ALL <subquery> is
equivalent to x NOT IN <subquery>, and x = ANY <subquery> is
equivalent to x IN <subquery>, so this should not surprise you.
In general, the EXISTS predicates will run faster than the IN
predicates. The problem is in deciding whether to build the query or the
subquery first; the optimal approach depends on the size and distribution
of values in each, and that cannot usually be known until runtime.
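
The Gloria Glamour example can be checked mechanically. The following is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in SQL engine; the sample employees (Alice, Bob) and the column types are invented for illustration, and other SQL products may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel
(emp_name TEXT NOT NULL PRIMARY KEY,
 birthday TEXT NOT NULL);
CREATE TABLE Celebrities
(emp_name TEXT NOT NULL PRIMARY KEY,
 birth_city TEXT NOT NULL,
 birthday TEXT);  -- NULL means the birthday is not known
INSERT INTO Personnel VALUES ('Alice', '1970-01-15');
INSERT INTO Personnel VALUES ('Bob', '1982-06-30');
-- Our only New Yorker, with an unknown birthday.
INSERT INTO Celebrities VALUES ('Gloria Glamour', 'New York', NULL);
""")

# NOT IN version: the subquery yields a single NULL, so the test is UNKNOWN.
not_in_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE P1.birthday NOT IN
       (SELECT C1.birthday
          FROM Celebrities AS C1
         WHERE C1.birth_city = 'New York')
 ORDER BY P1.emp_name
""").fetchall()

# NOT EXISTS version: the correlated subquery is empty for every employee.
not_exists_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE NOT EXISTS
       (SELECT *
          FROM Celebrities AS C1
         WHERE C1.birth_city = 'New York'
           AND C1.birthday = P1.birthday)
 ORDER BY P1.emp_name
""").fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

Under these assumptions the NOT IN query returns no rows, while the NOT EXISTS query returns both employees.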

15.2 EXISTS and INNER JOINs

The [NOT] EXISTS predicate is almost always used with a correlated
subquery. Very often the subquery can be “flattened” into a JOIN, which
will frequently run faster than the original query. Our sample query can
be converted into:

SELECT P1.emp_name, ' has the same birthday as a famous person!'
FROM Personnel AS P1, Celebrities AS C1
WHERE P1.birthday = C1.birthday;

The advantage of the JOIN version is that it allows us to show
columns from both tables. We should make the query more informative
by rewriting it:

SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
FROM Personnel AS P1, Celebrities AS C1
WHERE P1.birthday = C1.birthday;

This new query could be written with an EXISTS() predicate, but
that is a waste of resources.

SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
FROM Personnel AS P1, Celebrities AS C1
WHERE EXISTS
(SELECT *
FROM Celebrities AS C2
WHERE P1.birthday = C2.birthday
AND C1.emp_name = C2.emp_name);

15.3 NOT EXISTS and OUTER JOINs

The NOT EXISTS version of this predicate is almost always used with a
correlated subquery. Very often the subquery can be “flattened” into an
OUTER JOIN, which will frequently run faster than the original query.
Our other sample query was:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
FROM Personnel AS P1
WHERE NOT EXISTS
(SELECT *
FROM Celebrities AS C1
WHERE C1.birth_city = 'New York'
AND C1.birthday = P1.birthday);

We can replace it with:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
FROM Personnel AS P1
LEFT OUTER JOIN
Celebrities AS C1
ON C1.birth_city = 'New York'
AND C1.birthday = P1.birthday
WHERE C1.emp_name IS NULL;

This assumes that we know each and every celebrity name in the
Celebrities table. If the column in the WHERE clause could have
NULLs in its base table, then we could not tell those NULLs apart from
the NULLs generated by the OUTER JOIN. The test for NULL should always
be on (a column of) the primary key, which cannot be NULL. Relating this
back to the example, how could a celebrity be a celebrity with an
unknown name? Even The Unknown Comic had a name (“The Unknown Comic”).
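
The equivalence of the two forms can be tried on a small data set. This is a sketch only, assuming Python's standard sqlite3 module as the engine; the sample rows (Alice, Bob, Carol, Sam Star) are invented, and Celebrities.emp_name is declared as the primary key so that the IS NULL test is safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel
(emp_name TEXT NOT NULL PRIMARY KEY,
 birthday TEXT NOT NULL);
CREATE TABLE Celebrities
(emp_name TEXT NOT NULL PRIMARY KEY,  -- key column, never NULL
 birth_city TEXT NOT NULL,
 birthday TEXT);
INSERT INTO Personnel VALUES ('Alice', '1970-01-15');
INSERT INTO Personnel VALUES ('Bob', '1982-06-30');
INSERT INTO Personnel VALUES ('Carol', '1990-03-03');
INSERT INTO Celebrities VALUES ('Gloria Glamour', 'New York', '1970-01-15');
INSERT INTO Celebrities VALUES ('Sam Star', 'Chicago', '1982-06-30');
""")

not_exists_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE NOT EXISTS
       (SELECT *
          FROM Celebrities AS C1
         WHERE C1.birth_city = 'New York'
           AND C1.birthday = P1.birthday)
 ORDER BY P1.emp_name
""").fetchall()

# Flattened form: unmatched employees get a NULL-padded Celebrities row.
outer_join_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
       LEFT OUTER JOIN
       Celebrities AS C1
       ON C1.birth_city = 'New York'
          AND C1.birthday = P1.birthday
 WHERE C1.emp_name IS NULL
 ORDER BY P1.emp_name
""").fetchall()

print(not_exists_rows)
print(outer_join_rows)
```

With this data both queries report Bob and Carol; Alice is dropped because she shares a birthday with the one New Yorker.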

15.4 EXISTS() and Quantifiers
Formal logic makes use of quantifiers that can be applied to
propositions. The two forms are “For all x, P(x)” and “For some x, P(x)”.
The first is written as ∀ (an inverted uppercase A) and the second is
written as ∃ (a reversed uppercase E), if you want to look up formulas
in a textbook. The quantifiers put into symbols such statements as “all
men are mortal” or “some Cretans are liars” so they can be manipulated.
The big question more than 100 years ago was that of existential
import in formal logic. Everyone agreed that saying “all men are mortal”
implies that “no men are not mortal,” but does it also imply that “some
men are mortal,” that is, that we have to have at least one man who is
mortal?
Existential import lost the battle and the modern convention is that
“All men are mortal” has the same meaning as “There are no men who
are immortal,” but does not imply that any men exist at all. This is the
convention followed in the design of SQL. Consider the statement “some
salesmen are liars” and the way we would write it with the EXISTS()
predicate in SQL:

EXISTS(SELECT *
FROM Personnel AS P1, Liars AS L1
WHERE P1.job = 'Salesman'
AND P1.emp_name = L1.emp_name);

If we are more cynical about salesmen, we might want to formulate
the predicate “all salesmen are liars” with the EXISTS predicate in SQL,
using the transform rule just discussed:

NOT EXISTS(SELECT *
FROM Personnel AS P1
WHERE P1.job = 'Salesman'
AND P1.emp_name NOT IN
(SELECT L1.emp_name
FROM Liars AS L1));

That says, informally, “there are no salesmen who are not liars” in
English. In this case, the IN predicate can be changed into a JOIN, which
should improve performance and be a bit easier to read.
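
The lack of existential import can be seen by evaluating both predicates against an empty Personnel table. The sketch below assumes Python's standard sqlite3 module; the table layouts, the helper truth(), and the sample names Sam and Tom are invented for illustration.

```python
import sqlite3

SOME_SALESMEN_ARE_LIARS = """
SELECT EXISTS
       (SELECT *
          FROM Personnel AS P1, Liars AS L1
         WHERE P1.job = 'Salesman'
           AND P1.emp_name = L1.emp_name)
"""

ALL_SALESMEN_ARE_LIARS = """
SELECT NOT EXISTS
       (SELECT *
          FROM Personnel AS P1
         WHERE P1.job = 'Salesman'
           AND P1.emp_name NOT IN
               (SELECT L1.emp_name
                  FROM Liars AS L1))
"""

def truth(conn, sql):
    """Evaluate a one-row, one-column predicate query as a Python bool."""
    return bool(conn.execute(sql).fetchone()[0])

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel
(emp_name TEXT NOT NULL PRIMARY KEY,
 job TEXT NOT NULL);
CREATE TABLE Liars
(emp_name TEXT NOT NULL PRIMARY KEY);
""")

# No salesmen at all: "all" holds vacuously, "some" does not.
empty_all = truth(conn, ALL_SALESMEN_ARE_LIARS)
empty_some = truth(conn, SOME_SALESMEN_ARE_LIARS)

# One salesman who is a liar: both statements hold.
conn.execute("INSERT INTO Personnel VALUES ('Sam', 'Salesman')")
conn.execute("INSERT INTO Liars VALUES ('Sam')")
liar_all = truth(conn, ALL_SALESMEN_ARE_LIARS)
liar_some = truth(conn, SOME_SALESMEN_ARE_LIARS)

# Add an honest salesman: "all" fails, "some" still holds.
conn.execute("INSERT INTO Personnel VALUES ('Tom', 'Salesman')")
mixed_all = truth(conn, ALL_SALESMEN_ARE_LIARS)
mixed_some = truth(conn, SOME_SALESMEN_ARE_LIARS)

print(empty_all, empty_some, liar_all, liar_some, mixed_all, mixed_some)
```

Liars.emp_name is declared NOT NULL here on purpose; a NULL in that column would make the NOT IN test UNKNOWN, as discussed earlier in the chapter.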

15.5 EXISTS() and Referential Constraints

Standard SQL was designed so that the declarative referential constraints
could be expressed as EXISTS() predicates in a CHECK() clause. For
example:

CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL
REFERENCES ZipCodeData(state_code)
);

could be written as:

CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL,
CONSTRAINT valid_state_code
CHECK (EXISTS(SELECT *
FROM ZipCodeData AS Z1
WHERE Z1.state_code = Addresses.state_code))
);

There is no advantage to this expression for the DBA, since you
cannot attach referential actions with the CHECK() constraint. However,
an SQL database can use the same mechanisms in the SQL compiler for
both constructions.

15.6 EXISTS and Three-Valued Logic

This example is due to an article by Lee Fesperman at FirstSQL. Using
Chris Date’s “SupplierParts” table with three rows:

CREATE TABLE SupplierParts
(sup_nbr CHAR(2) NOT NULL PRIMARY KEY,
part_nbr CHAR(2) NOT NULL,
qty INTEGER CHECK (qty > 0));

sup_nbr  part_nbr  qty
======================
'S1'     'P1'      NULL
'S2'     'P1'      200
'S3'     'P1'      1000

The row (‘S1’, ‘P1’, NULL) means that supplier ‘S1’ supplies part ‘P1’
but we do not know what quantity he has.
The query we wish to answer is “Find suppliers of part ‘P1’, but not in
a quantity of 1000 on hand.” The correct answer is ‘S2’. All suppliers in
the table supply ‘P1’, but we do know ‘S3’ supplies the part in quantity
1000 and we do not know in what quantity ‘S1’ supplies the part. The
only supplier we can select for certain is ‘S2’.
An SQL query to retrieve this result would be:

SELECT spx.sup_nbr
FROM SupplierParts AS spx
WHERE spx.part_nbr = 'P1'
AND 1000
NOT IN (SELECT spy.qty
FROM SupplierParts AS spy
WHERE spy.sup_nbr = spx.sup_nbr
AND spy.part_nbr = 'P1');

According to Standard SQL, this query should return only ‘S2’, but
when we transform the query into a seemingly equivalent version, using
EXISTS instead, we obtain:

SELECT spx.sup_nbr
FROM SupplierParts AS spx
WHERE spx.part_nbr = 'P1'
AND NOT EXISTS
(SELECT *
FROM SupplierParts AS spy
WHERE spy.sup_nbr = spx.sup_nbr
AND spy.part_nbr = 'P1'
AND spy.qty = 1000);

This will return (‘S1’, ‘S2’). You can argue that this is the wrong
answer because we do not definitely know whether or not ‘S1’ supplies
‘P1’ in quantity 1000. The EXISTS() predicate will return TRUE or
FALSE, even in situations where a subquery’s predicate returns an
UNKNOWN (i.e., NULL = 1000).
The solution is to modify the predicate that deals with the quantity in
the subquery to explicitly say whether or not you want to give the
“benefit of the doubt” to the NULL. You have several alternatives:

1. (spy.qty = 1000) IS NOT FALSE
This uses the new predicates in Standard SQL for testing logical
values. Frankly, this is confusing to read and worse to maintain.
2. (spy.qty = 1000 OR spy.qty IS NULL)
This uses another test predicate, but the optimizer can probably use
any index on the qty column.
3. (COALESCE(spy.qty, 1000) = 1000)
This is portable and easy to maintain. The only disadvantage is that
some SQL products might not be able to use an index on the qty
column, because it is in an expression.

The real problem is that the query was formed with a double
negative in the form of a NOT EXISTS and an implicit IS NOT FALSE
condition. The problem stems from the fact that the EXISTS()
predicate is one of the few two-valued predicates in SQL, and that
(NOT (NOT UNKNOWN)) = UNKNOWN.
For another approach based on Dr. Codd’s second relational model,
visit www.FirstSQL.com and read some of the white papers by Lee
Fesperman. He used the two NULLs Codd proposed to develop a product.
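
The three-row SupplierParts example is small enough to run directly. The sketch below assumes Python's standard sqlite3 module as the engine; the helper names suppliers() and not_exists_query() are invented for illustration. Alternative 1 (IS NOT FALSE) is left out because support for that predicate varies between products and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SupplierParts
(sup_nbr CHAR(2) NOT NULL PRIMARY KEY,
 part_nbr CHAR(2) NOT NULL,
 qty INTEGER CHECK (qty > 0));
INSERT INTO SupplierParts VALUES ('S1', 'P1', NULL);
INSERT INTO SupplierParts VALUES ('S2', 'P1', 200);
INSERT INTO SupplierParts VALUES ('S3', 'P1', 1000);
""")

def suppliers(sql):
    """Run a query and return the supplier numbers as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

NOT_IN_QUERY = """
SELECT spx.sup_nbr
  FROM SupplierParts AS spx
 WHERE spx.part_nbr = 'P1'
   AND 1000 NOT IN (SELECT spy.qty
                      FROM SupplierParts AS spy
                     WHERE spy.sup_nbr = spx.sup_nbr
                       AND spy.part_nbr = 'P1')
"""

def not_exists_query(qty_test):
    """Build the NOT EXISTS version with a chosen test on spy.qty."""
    return f"""
SELECT spx.sup_nbr
  FROM SupplierParts AS spx
 WHERE spx.part_nbr = 'P1'
   AND NOT EXISTS (SELECT *
                     FROM SupplierParts AS spy
                    WHERE spy.sup_nbr = spx.sup_nbr
                      AND spy.part_nbr = 'P1'
                      AND {qty_test})
"""

not_in_result = suppliers(NOT_IN_QUERY)
naive_result = suppliers(not_exists_query("spy.qty = 1000"))
# Alternative 2: treat an unknown quantity as a possible 1000.
or_null_result = suppliers(
    not_exists_query("(spy.qty = 1000 OR spy.qty IS NULL)"))
# Alternative 3: same intent, written with COALESCE.
coalesce_result = suppliers(
    not_exists_query("(COALESCE(spy.qty, 1000) = 1000)"))

print(not_in_result, naive_result, or_null_result, coalesce_result)
```

Under these assumptions the NOT IN query and both modified NOT EXISTS queries return only 'S2', while the unmodified NOT EXISTS query also returns 'S1'.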

CHAPTER 16: Quantified Subquery Predicates

A quantifier is a logical operator that states the quantity of objects
for which a statement is TRUE. This is a logical quantity, not a numeric
quantity; it relates a statement to the whole set of possible objects. In
everyday life, you see statements like “There is only one mouthwash
that stops dinosaur breath,” “All doctors drive Mercedes,” or “Some
people got rich investing in cattle futures,” which are quantified.

The first statement, about the mouthwash, is a uniqueness
quantifier. If there were two or more products that could save us from
dinosaur breath, the statement would be FALSE. The second statement
has what is called a universal quantifier, since it deals with all
doctors; find one exception and the statement is FALSE. The last
statement has an existential quantifier, since it asserts that one or more
people exist who got rich on cattle futures; find one example and the
statement is TRUE.

SQL has forms of these quantifiers that are not quite like those in
formal logic. They are based on extending the use of comparison
predicates to allow result sets to be quantified, and they use SQL’s
three-valued logic, so they do not return just TRUE or FALSE.


16.1 Scalar Subquery Comparisons

Standard SQL allows both scalar and row comparisons, but most queries
use only scalar expressions. If a subquery returns a single-row,
single-column result table, it is treated as a scalar value in Standard
SQL in virtually any place a scalar could appear. For example, to find
out if we have any teachers who are more than one year older than the
students, I could write:

SELECT T1.teacher_name
FROM Teachers AS T1
WHERE T1.birthday < (SELECT MAX(S1.birthday) - INTERVAL '365' DAY
FROM Students AS S1);

In this case, the scalar subquery will be run only once and reduced to
a constant value by the optimizer before scanning the Teachers table.
A correlated subquery is more complex, because it will have to be
executed for each value from the containing query. For example, to find
which suppliers have sent us fewer than 100 parts, we would use this
query. Notice how the SUM(quantity) has to be computed for each
supplier number, sup_nbr.

SELECT sup_nbr, sup_name
FROM Suppliers
WHERE 100 > (SELECT SUM(quantity)
FROM Shipments
WHERE Shipments.sup_nbr = Suppliers.sup_nbr);

If a scalar subquery returns a NULL, we have rules for handling
comparison with NULLs. But what if it returns an empty result, as for a
supplier that has not shipped us anything? In Standard SQL, the empty
result table is converted to a NULL of the appropriate data type.
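
The effect of an empty scalar subquery can be observed on a small example. This is a sketch under stated assumptions: Python's standard sqlite3 module is the engine, and the suppliers (Acme, Bolt, Cog) and shipment quantities are invented. The COALESCE variant at the end is one possible way to count "no shipments" as zero, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers
(sup_nbr TEXT NOT NULL PRIMARY KEY,
 sup_name TEXT NOT NULL);
CREATE TABLE Shipments
(sup_nbr TEXT NOT NULL,
 quantity INTEGER NOT NULL);
INSERT INTO Suppliers VALUES ('S1', 'Acme');
INSERT INTO Suppliers VALUES ('S2', 'Bolt');
INSERT INTO Suppliers VALUES ('S3', 'Cog');   -- has shipped nothing
INSERT INTO Shipments VALUES ('S1', 30);
INSERT INTO Shipments VALUES ('S1', 40);
INSERT INTO Shipments VALUES ('S2', 500);
""")

# The correlated scalar subquery, shown as a column: NULL for 'S3'.
totals = conn.execute("""
SELECT sup_nbr,
       (SELECT SUM(quantity)
          FROM Shipments
         WHERE Shipments.sup_nbr = Suppliers.sup_nbr)
  FROM Suppliers
 ORDER BY sup_nbr
""").fetchall()

# The query from the text: 100 > NULL is UNKNOWN, so 'S3' is not returned.
under_100 = conn.execute("""
SELECT sup_nbr, sup_name
  FROM Suppliers
 WHERE 100 > (SELECT SUM(quantity)
                FROM Shipments
               WHERE Shipments.sup_nbr = Suppliers.sup_nbr)
 ORDER BY sup_nbr
""").fetchall()

# Hypothetical variant: treat "no shipments" as a total of zero.
under_100_with_zero = conn.execute("""
SELECT sup_nbr, sup_name
  FROM Suppliers
 WHERE 100 > COALESCE((SELECT SUM(quantity)
                         FROM Shipments
                        WHERE Shipments.sup_nbr = Suppliers.sup_nbr), 0)
 ORDER BY sup_nbr
""").fetchall()

# A non-aggregate scalar subquery with no rows also yields NULL (None).
empty_scalar = conn.execute(
    "SELECT (SELECT quantity FROM Shipments WHERE sup_nbr = 'S3')"
).fetchone()[0]

print(totals, under_100, under_100_with_zero, empty_scalar)
```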



In Standard SQL, you can place scalar or row subqueries on either
side of a comparison predicate as long as they return comparable results.
But you must be aware of the rules for row comparisons. For example,
the following query will find the product manager who has more of his
product at the stores than in the warehouse:

SELECT manager_name, product_nbr
FROM Stores AS S1
WHERE (SELECT SUM(qty)
FROM Warehouses AS W1
WHERE S1.product_nbr = W1.product_nbr)
< (SELECT SUM(qty)
FROM RetailStores AS R1
WHERE S1.product_nbr = R1.product_nbr);

Here is a programming tip: the main problem with writing these
queries is getting a result with more than one row in it. You can
guarantee uniqueness in several ways. An aggregate function on an
ungrouped table will always be a single value. A JOIN with the
containing query based on a key will always be a single value.

16.2 Quantifiers and Missing Data

The quantified predicates are used with subquery expressions to
compare a single value to those of the subquery, and take the general
form <value expression> <comp op> <quantifier> <subquery>. The
predicate "<value expression> <comp op> [ANY|SOME] <table expression>"
is equivalent to taking each row, s, (assume that they are numbered
from 1 to n) of <table expression> and testing
"<value expression> <comp op> s" with ORs between the expanded
expressions:

((<value expression> <comp op> s1)
OR (<value expression> <comp op> s2)
...
OR (<value expression> <comp op> sn))

When you get a single TRUE result, the whole predicate is TRUE.
As long as <table expression> has cardinality greater than zero
and one non-NULL value, you will get a result of TRUE or FALSE. The
keyword SOME is the same as ANY, and the choice is just a matter of style
and readability. Likewise, "<value expression> <comp op> ALL
<table expression>" takes each row, s, of <table expression>
and tests <value expression> <comp op> s with ANDs between
the expanded expressions:

((<value expression> <comp op> s1)
AND (<value expression> <comp op> s2)
...
AND (<value expression> <comp op> sn))
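
The OR and AND expansions above can be modeled in a few lines. This is an illustrative sketch in plain Python, not how any SQL engine implements the predicates: None stands for UNKNOWN (and for a NULL operand), and the function names compare, or3, and3, quantified_any, and quantified_all are invented for the example. Following the SQL rules, ANY over an empty table is FALSE and ALL over an empty table is TRUE.

```python
import operator
from typing import Any, Callable, Iterable, Optional

Truth = Optional[bool]  # True, False, or None for SQL's UNKNOWN

def compare(op: Callable[[Any, Any], bool], x: Any, s: Any) -> Truth:
    """A comparison with a NULL (None) operand is UNKNOWN."""
    if x is None or s is None:
        return None
    return op(x, s)

def or3(a: Truth, b: Truth) -> Truth:
    """Three-valued OR: TRUE wins, then UNKNOWN, then FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and3(a: Truth, b: Truth) -> Truth:
    """Three-valued AND: FALSE wins, then UNKNOWN, then TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def quantified_any(op, x: Any, rows: Iterable[Any]) -> Truth:
    """x <comp op> ANY rows, as the OR of the row-by-row comparisons."""
    result: Truth = False  # the OR of zero terms is FALSE
    for s in rows:
        result = or3(result, compare(op, x, s))
    return result

def quantified_all(op, x: Any, rows: Iterable[Any]) -> Truth:
    """x <comp op> ALL rows, as the AND of the row-by-row comparisons."""
    result: Truth = True  # the AND of zero terms is TRUE
    for s in rows:
        result = and3(result, compare(op, x, s))
    return result

print(quantified_any(operator.eq, 2, [1, 2, None]))  # a single TRUE decides
print(quantified_any(operator.eq, 3, [1, 2, None]))  # NULL leaves it UNKNOWN
print(quantified_all(operator.gt, 5, [1, None]))     # NULL leaves it UNKNOWN
print(quantified_all(operator.gt, 0, [1, None]))     # a single FALSE decides
```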
