
Joe Celko's SQL for Smarties - Advanced SQL Programming



292 CHAPTER 14: THE [NOT] IN() PREDICATE

SELECT *
FROM JohnsBook AS J1
WHERE NOT EXISTS
(SELECT *
FROM QualityGuide AS Q1
WHERE Q1.restaurant_name = J1.restaurant_name);

The reason the second version will probably run faster is that it can test for existence using the indexes on both tables. The NOT IN() version has to test all the values in the subquery table for inequality. Many SQL implementations will construct a temporary table from the IN() predicate subquery, if it has a WHERE clause, but the temporary table will not have any indexes. The temporary table can also have duplicates and a random ordering of its rows, so that the SQL engine has to do a full-table scan.
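A minimal sketch, using Python's built-in sqlite3 module as a stand-in DBMS, that runs both versions and checks that they return the same restaurants when neither column holds NULLs. The column types and sample rows are hypothetical, and the PRIMARY KEYs give each table an index. It does not time the queries, since whether NOT EXISTS() actually wins depends on the optimizer.

```python
import sqlite3

def unrated_restaurants(conn: sqlite3.Connection) -> tuple[list, list]:
    """Run the NOT IN() and NOT EXISTS() versions and return both results."""
    not_in = conn.execute(
        """SELECT restaurant_name FROM JohnsBook
           WHERE restaurant_name NOT IN
                 (SELECT restaurant_name FROM QualityGuide)
           ORDER BY restaurant_name"""
    ).fetchall()
    not_exists = conn.execute(
        """SELECT J1.restaurant_name FROM JohnsBook AS J1
           WHERE NOT EXISTS
                 (SELECT * FROM QualityGuide AS Q1
                  WHERE Q1.restaurant_name = J1.restaurant_name)
           ORDER BY J1.restaurant_name"""
    ).fetchall()
    return not_in, not_exists

conn = sqlite3.connect(":memory:")
# Hypothetical data; PRIMARY KEY also builds an index on each column
conn.executescript("""
    CREATE TABLE JohnsBook (restaurant_name VARCHAR(30) NOT NULL PRIMARY KEY);
    CREATE TABLE QualityGuide (restaurant_name VARCHAR(30) NOT NULL PRIMARY KEY);
    INSERT INTO JohnsBook VALUES ('Alma'), ('Bistro 9'), ('Casa Roja');
    INSERT INTO QualityGuide VALUES ('Bistro 9'), ('Dumpling Hut');
""")
not_in, not_exists = unrated_restaurants(conn)
print(not_in)      # [('Alma',), ('Casa Roja',)]
print(not_exists)  # [('Alma',), ('Casa Roja',)]
```

The equivalence holds here only because the columns are declared NOT NULL; Section 14.3 shows what a NULL in the subquery does to NOT IN().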

14.2 Replacing ORs with the IN() Predicate



A simple trick that beginning SQL programmers often miss is that an IN() predicate can often replace a set of ORed predicates. For example:
SELECT *
FROM QualityControlReport
WHERE test_1 = 'passed'
OR test_2 = 'passed'
OR test_3 = 'passed'
OR test_4 = 'passed';
can be rewritten as:
SELECT *
FROM QualityControlReport
WHERE 'passed' IN (test_1, test_2, test_3, test_4);
The reason this is difficult to see is that programmers get used to thinking of either a subquery or a simple list of constants. They miss the fact that the IN() predicate list can be a list of expressions. The optimizer would have handled each of the original predicates separately in the WHERE clause, but it has to handle the IN() predicate as a single item, which can change the order of evaluation. This might or might not be faster than the list of ORed predicates for a particular query. This formulation might cause the predicate to become nonindexable; you should check the indexability rules of your particular DBMS.
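A quick check, sketched with Python's sqlite3 module, that the ORed form and the IN() form select the same rows. The report_id column and the sample rows are hypothetical; row 4 includes NULLs to show that both forms treat an UNKNOWN comparison the same way. This says nothing about speed or indexability, which the text rightly leaves to your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical report rows; row 4 has NULL test results
conn.executescript("""
    CREATE TABLE QualityControlReport
    (report_id INTEGER NOT NULL PRIMARY KEY,
     test_1 CHAR(6), test_2 CHAR(6), test_3 CHAR(6), test_4 CHAR(6));
    INSERT INTO QualityControlReport VALUES
      (1, 'passed', 'failed', 'failed', 'failed'),
      (2, 'failed', 'failed', 'failed', 'failed'),
      (3, 'failed', 'failed', 'failed', 'passed'),
      (4, NULL,     'failed', NULL,     'failed');
""")

ored = conn.execute("""
    SELECT report_id FROM QualityControlReport
    WHERE test_1 = 'passed' OR test_2 = 'passed'
       OR test_3 = 'passed' OR test_4 = 'passed'
    ORDER BY report_id""").fetchall()

# The IN() list is a list of column expressions, not constants
in_list = conn.execute("""
    SELECT report_id FROM QualityControlReport
    WHERE 'passed' IN (test_1, test_2, test_3, test_4)
    ORDER BY report_id""").fetchall()

print(ored)     # [(1,), (3,)]
print(in_list)  # [(1,), (3,)]
```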
14.3 NULLs and the IN() Predicate
NULLs make some special problems in a NOT IN() predicate with a
subquery. Consider these two tables:
CREATE TABLE Table1 (x INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3), (4);
CREATE TABLE Table2 (x INTEGER);
INSERT INTO Table2 VALUES (1), (NULL), (2);
Now execute the query:
SELECT *
FROM Table1
WHERE x NOT IN (SELECT x FROM Table2);
Let’s work it out step by painful step:
1. Do the subquery:
SELECT *
FROM Table1
WHERE x NOT IN (1, NULL, 2);
2. Convert the NOT IN() to its definition:
SELECT *
FROM Table1
WHERE NOT (x IN (1, NULL, 2));
3. Expand the IN() predicate:
SELECT *
FROM Table1
WHERE NOT ((x = 1) OR (x = NULL) OR (x = 2));
4. Apply DeMorgan’s law:
SELECT *
FROM Table1
WHERE ((x <> 1) AND (x <> NULL) AND (x <> 2));

5. Perform the constant logical expression:
SELECT *
FROM Table1
WHERE ((x <> 1) AND UNKNOWN AND (x <> 2));
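6. Reduce the AND to a constant. Strictly speaking, the expression is FALSE when x is 1 or 2 and UNKNOWN otherwise, but it can never be TRUE: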
SELECT *
FROM Table1
WHERE UNKNOWN;
7. The results are always empty.
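The same tables can be run through Python's sqlite3 module to confirm the derivation. The second query selects the NOT IN() predicate itself as a column so that its truth value is visible for each row: SQLite reports FALSE as 0 and UNKNOWN as None. It shows that the predicate is FALSE for x = 1 and x = 2 and UNKNOWN for the rest, and never TRUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (x INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);
    CREATE TABLE Table2 (x INTEGER);
    INSERT INTO Table2 VALUES (1), (NULL), (2);
""")

# No row of Table1 makes the NOT IN() predicate TRUE
rows = conn.execute(
    "SELECT * FROM Table1 WHERE x NOT IN (SELECT x FROM Table2)").fetchall()
print(rows)  # []

# The predicate's value per row: 0 is FALSE, None is UNKNOWN
truth = conn.execute(
    "SELECT x, x NOT IN (SELECT x FROM Table2) FROM Table1 ORDER BY x").fetchall()
print(truth)  # [(1, 0), (2, 0), (3, None), (4, None)]
```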
Now try this with another set of tables:
CREATE TABLE Table3 (x INTEGER);
INSERT INTO Table3 VALUES (1), (2), (NULL), (4);
CREATE TABLE Table4 (x INTEGER);
INSERT INTO Table4 VALUES (1), (3), (2);
Let’s work out the same query step by painful step again.
1. Do the subquery:
SELECT *
FROM Table3
WHERE x NOT IN (1, 3, 2);
2. Convert the NOT IN() to a Boolean expression:
SELECT *
FROM Table3
WHERE NOT (x IN (1, 3, 2));
3. Expand the IN() predicate:
SELECT *
FROM Table3
WHERE NOT ((x = 1) OR (x = 3) OR (x = 2));
4. Apply DeMorgan’s law:
SELECT *
FROM Table3
WHERE ((x <> 1) AND (x <> 3) AND (x <> 2));
5. Compute the result set; I will show it as a UNION ALL with substitutions, with the value of each search condition in a comment:
SELECT *
FROM Table3
WHERE ((1 <> 1) AND (1 <> 3) AND (1 <> 2)) -- FALSE
UNION ALL
SELECT *
FROM Table3
WHERE ((2 <> 1) AND (2 <> 3) AND (2 <> 2)) -- FALSE
UNION ALL
SELECT *
FROM Table3
WHERE ((CAST(NULL AS INTEGER) <> 1)
AND (CAST(NULL AS INTEGER) <> 3)
AND (CAST(NULL AS INTEGER) <> 2)) -- UNKNOWN
UNION ALL
SELECT *
FROM Table3
WHERE ((4 <> 1) AND (4 <> 3) AND (4 <> 2)); -- TRUE
6. The result is one row = (4).
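The hand evaluation in step 5 can be mechanized with a small model of SQL's three-valued logic. In this sketch, which is hypothetical and not part of any library, Python's True, False, and None stand for TRUE, FALSE, and UNKNOWN; not_in() applies DeMorgan's law exactly as step 4 does, and a row survives the WHERE clause only when the result is True.

```python
from typing import Optional

# SQL truth values: True = TRUE, False = FALSE, None = UNKNOWN
Truth = Optional[bool]

def sql_ne(a: Optional[int], b: Optional[int]) -> Truth:
    """a <> b; any comparison involving NULL is UNKNOWN."""
    if a is None or b is None:
        return None
    return a != b

def sql_and(*terms: Truth) -> Truth:
    """Three-valued AND: any FALSE wins, then any UNKNOWN, else TRUE."""
    if any(t is False for t in terms):
        return False
    if any(t is None for t in terms):
        return None
    return True

def not_in(x: Optional[int], values: list[Optional[int]]) -> Truth:
    """x NOT IN (v1, v2, ...) rewritten as (x <> v1) AND (x <> v2) AND ..."""
    return sql_and(*(sql_ne(x, v) for v in values))

table3 = [1, 2, None, 4]
table4 = [1, 3, 2]
for x in table3:
    print(x, not_in(x, table4))
# 1 False
# 2 False
# None None
# 4 True

# WHERE keeps only rows whose search condition is TRUE
result = [x for x in table3 if not_in(x, table4) is True]
print(result)  # [4]
```

The same helper reproduces the first example: not_in(3, [1, None, 2]) is None and not_in(1, [1, None, 2]) is False, so no row of Table1 qualifies.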
14.4 IN() Predicate and Referential Constraints
One of the most popular uses for the IN() predicate is in a CHECK()
clause on a table. The usual form is a list of values that are legal for a
column, such as:
CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL
CONSTRAINT valid_state_code
CHECK (state_code IN ('AL', 'AK', ...))
);
This method works fine with a small list of values, but it has problems with a longer list. It is very important to arrange the values in the order in which they are most likely to match the two-letter state_code, to speed up the search.
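A minimal sketch of the CHECK() constraint in action, using Python's sqlite3 module. The list is cut down to three hypothetical codes, and the sample addresses are made up. An INSERT with a code outside the list is rejected with an IntegrityError; the exact error message varies between SQLite versions, so the sketch does not rely on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Short, static list of legal codes (hypothetical subset)
conn.execute("""
    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL
       CONSTRAINT valid_state_code
       CHECK (state_code IN ('CA', 'TX', 'NY')))""")

conn.execute("INSERT INTO Addresses VALUES ('Ann', '1 Elm St', 'Austin', 'TX')")
try:
    conn.execute("INSERT INTO Addresses VALUES ('Bob', '2 Oak St', 'Nowhere', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print(exc)  # names the failed CHECK constraint; wording varies by version

row_count = conn.execute("SELECT COUNT(*) FROM Addresses").fetchone()[0]
print(rejected, row_count)  # True 1
```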
In Standard SQL a constraint can reference other tables, so you could
write the same constraint as:
CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL,
CONSTRAINT valid_state_code
CHECK (state_code
IN (SELECT state_code
FROM ZipCodes AS Z1
WHERE Z1.state_code = Addresses.state_code))
);
The advantage of this is that you can change the ZipCodes table and thereby change the effect of the constraint on the Addresses table. This is fine for adding more data in the outer reference (e.g., Quebec joins the United States and gets the code ‘QB’), but it has a bad effect when you try to delete data in the outer reference (e.g., California secedes from the United States and every row with ‘CA’ for a state code is now invalid).

As a rule of thumb, use the IN() predicate in a CHECK() constraint when the list is short, static, and unique to one table. When the list is short, static, but not unique to one table, then use a CREATE DOMAIN statement, and put the IN() predicate in a CHECK() constraint on the domain.
Use a REFERENCES clause to a lookup table when the list is long and dynamic, or when several other schema objects (VIEWs, stored procedures, etc.) reference the values. A separate table can have an index, and that makes a big difference in searching and doing joins.
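SQLite does not accept subqueries in CHECK() constraints, so this sketch with Python's sqlite3 module illustrates the REFERENCES alternative instead. The lookup table name StateCodes and all rows are hypothetical. Note that SQLite only enforces REFERENCES after PRAGMA foreign_keys = ON. Adding a code to the lookup table widens the constraint immediately, and deleting a code that is still in use is refused (SQLite's default NO ACTION behavior) rather than silently leaving invalid rows behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when on
conn.executescript("""
    CREATE TABLE StateCodes
    (state_code CHAR(2) NOT NULL PRIMARY KEY);
    INSERT INTO StateCodes VALUES ('CA'), ('NY'), ('TX');

    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL
       REFERENCES StateCodes (state_code));
    INSERT INTO Addresses VALUES ('Ann', '1 Elm St', 'Fresno', 'CA');
""")

def try_sql(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Quebec joins: add the code, and the new value is legal at once
added_qb = try_sql("INSERT INTO StateCodes VALUES ('QB')")
moved_to_qb = try_sql("INSERT INTO Addresses VALUES ('Bob', '2 Oak St', 'Montreal', 'QB')")
# An unknown code is rejected
bad_code = try_sql("INSERT INTO Addresses VALUES ('Cy', '3 Elm St', 'Nowhere', 'ZZ')")
# California secedes: the delete fails because 'CA' is still referenced
ca_deleted = try_sql("DELETE FROM StateCodes WHERE state_code = 'CA'")
print(added_qb, moved_to_qb, bad_code, ca_deleted)  # True True False False
```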
14.5 IN() Predicate and Scalar Queries
As mentioned before, the list of an IN() predicate can be any scalar
expression. This includes scalar subqueries, but most people do not
seem to know that this is possible. For example, given tables that model
warehouses, trucking centers, and so forth, we can find if we have a
product, identified by its UPC code, somewhere in the enterprise.
SELECT P.upc
FROM Picklist AS P
WHERE P.upc
IN ((SELECT upc FROM Warehouse AS W WHERE W.upc = P.upc),
(SELECT upc FROM TruckCenter AS T WHERE T.upc = P.upc),
(SELECT upc FROM Garbage AS G WHERE G.upc = P.upc));
The empty result sets will become NULLs in the list. The alternative to this is usually a chain of OUTER JOINs or an ORed list of EXISTS() predicates.
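A runnable sketch of the scalar-subquery list with Python's sqlite3 module, using the correlation names P, W, T, and G. The UPC values are hypothetical, and each table has upc as its PRIMARY KEY so that every scalar subquery returns at most one row; in Standard SQL a scalar subquery that returns more than one row is an error (SQLite instead quietly uses the first row). The second UPC is in none of the tables, so its list becomes (NULL, NULL, NULL), the predicate is UNKNOWN, and the row is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Picklist (upc CHAR(12) NOT NULL PRIMARY KEY);
    CREATE TABLE Warehouse (upc CHAR(12) NOT NULL PRIMARY KEY);
    CREATE TABLE TruckCenter (upc CHAR(12) NOT NULL PRIMARY KEY);
    CREATE TABLE Garbage (upc CHAR(12) NOT NULL PRIMARY KEY);
    INSERT INTO Picklist VALUES ('000000000001'), ('000000000002'), ('000000000003');
    INSERT INTO Warehouse VALUES ('000000000001');
    INSERT INTO TruckCenter VALUES ('000000000003');
""")

# Each list element is a correlated scalar subquery; empty ones become NULL
rows = conn.execute("""
    SELECT P.upc
    FROM Picklist AS P
    WHERE P.upc
          IN ((SELECT upc FROM Warehouse AS W WHERE W.upc = P.upc),
              (SELECT upc FROM TruckCenter AS T WHERE T.upc = P.upc),
              (SELECT upc FROM Garbage AS G WHERE G.upc = P.upc))
    ORDER BY P.upc""").fetchall()
print(rows)  # [('000000000001',), ('000000000003',)]
```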


CHAPTER 15

EXISTS() Predicate

The EXISTS predicate is very natural. It is a test for a nonempty set. If there are any rows in its subquery, it is TRUE; otherwise, it is FALSE. This predicate does not give an UNKNOWN result. The syntax is:


<exists predicate> ::= EXISTS <table subquery>

It is worth mentioning that a <table subquery> is always inside parentheses to avoid problems in the grammar during parsing.
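A small check with Python's sqlite3 module that EXISTS() is only ever TRUE or FALSE, even when the only row in the subquery is a NULL. SQLite reports TRUE as 1, FALSE as 0, and UNKNOWN as None; the table names are hypothetical. The last query compares that same NULL with a value, which does produce UNKNOWN, for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmptyTable (x INTEGER);
    CREATE TABLE OnlyNulls (x INTEGER);
    INSERT INTO OnlyNulls VALUES (NULL);
""")

no_rows = conn.execute("SELECT EXISTS (SELECT * FROM EmptyTable)").fetchone()
null_row = conn.execute("SELECT EXISTS (SELECT * FROM OnlyNulls)").fetchone()
# A comparison with the same NULL is UNKNOWN
compare = conn.execute("SELECT (SELECT x FROM OnlyNulls) = 1").fetchone()

print(no_rows)   # (0,)
print(null_row)  # (1,)
print(compare)   # (None,)
```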
In SQL-89, the rules stated that the subquery had to have a SELECT clause with one column or a *. If the SELECT * option was used, the database engine would (in theory) pick one column and use it. This fiction was needed because SQL-89 defined subqueries as having only one column.
Some early SQL implementations would work better with the EXISTS(SELECT <column> ...), EXISTS(SELECT <constant> ...), or EXISTS(SELECT * ...) versions of the predicate. Today, there is no difference in the three forms in the major products, so EXISTS(SELECT * ...) is the preferred form.
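A quick sketch with Python's sqlite3 module that the three forms select the same rows. The sample employees and celebrities, and the celeb_name column, are hypothetical; only the SELECT list inside the subquery changes from one run to the next.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel (emp_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    CREATE TABLE Celebrities (celeb_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    INSERT INTO Personnel VALUES ('Alice', '1990-03-01'), ('Bob', '1985-07-04');
    INSERT INTO Celebrities VALUES ('Star A', '1990-03-01');
""")

# The three SELECT lists the text mentions
forms = {
    "column": "SELECT C1.birthday",
    "constant": "SELECT 1",
    "star": "SELECT *",
}
results = {}
for label, select_clause in forms.items():
    sql = f"""SELECT P1.emp_name FROM Personnel AS P1
              WHERE EXISTS ({select_clause}
                            FROM Celebrities AS C1
                            WHERE P1.birthday = C1.birthday)
              ORDER BY P1.emp_name"""
    results[label] = conn.execute(sql).fetchall()
print(results)  # every form returns [('Alice',)]
```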
Indexes are very useful for EXISTS() predicates because they can be searched while the base table is left alone completely. For example, we want to find all employees who were born on the same day as any famous person. The query could be:


SELECT P1.emp_name, ' has the same birthday as a famous person!'
FROM Personnel AS P1
WHERE EXISTS
(SELECT *
FROM Celebrities AS C1
WHERE P1.birthday = C1.birthday);

If the table Celebrities has an index on its birthday column, the optimizer will get the current employee's birthday P1.birthday and look up that value in the index. If the value is in the index, the predicate is TRUE and we do not need to look at the Celebrities table at all. If it is not in the index, the predicate is FALSE and there is still no need to look at the Celebrities table. This should be fast, since indexes are smaller than their tables and are structured for very fast searching.
However, if Celebrities has no index on its birthday column, the query may have to look at every row to see if there is a birthday that matches the current employee's birthday. There are some tricks that a good optimizer can use to speed things up in this situation.
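To see whether a particular engine actually uses the index, ask it for its plan. This sketch uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the index name celeb_birthday_idx and the sample rows are hypothetical. The plan text differs between SQLite versions, so no expected plan is shown; with the index in place, one would expect it to be mentioned in the step for the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel (emp_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    CREATE TABLE Celebrities (celeb_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    CREATE INDEX celeb_birthday_idx ON Celebrities (birthday);
    INSERT INTO Personnel VALUES ('Alice', '1990-03-01'), ('Bob', '1985-07-04');
    INSERT INTO Celebrities VALUES ('Star A', '1990-03-01'), ('Star B', '1970-01-01');
""")

query = """
    SELECT P1.emp_name, ' has the same birthday as a famous person!'
    FROM Personnel AS P1
    WHERE EXISTS
          (SELECT *
           FROM Celebrities AS C1
           WHERE P1.birthday = C1.birthday)"""

matches = conn.execute(query).fetchall()
print(matches)  # [('Alice', ' has the same birthday as a famous person!')]

# The last column of each plan row is a human-readable step description
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])
```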

15.1 EXISTS and NULLs

A NULL might not be a value, but it does exist in SQL. This is often a problem for a new SQL programmer who is having trouble with NULLs and how they behave. Think of them as being like a brown paper bag: you know that something is inside because you lifted it, but you do not know exactly what that something is. For example, we want to find all the employees who were not born on the same day as a famous person. This can be answered with the negation of the original query, like this:

SELECT P1.emp_name, ' was born on a day without a famous person!'
FROM Personnel AS P1
WHERE NOT EXISTS
(SELECT *
FROM Celebrities AS C1
WHERE P1.birthday = C1.birthday);

But assume that among the celebrities we have a movie star who will not admit her age, shown in the row ('Gloria Glamour', NULL). A new SQL programmer might expect that Ms. Glamour would match everyone, since there is a chance that any of them may share her birthday when some tabloid newspaper finally gets a copy of her birth certificate. Actually, she will not match anyone in this query, since we do not know her birthday yet. Work out the subquery in the usual way to convince yourself:


WHERE NOT EXISTS
(SELECT *
FROM Celebrities
WHERE P1.birthday = NULL);
becomes:
WHERE NOT EXISTS
(SELECT *
FROM Celebrities
WHERE UNKNOWN);
becomes:
WHERE TRUE;

And you see that the search condition inside the subquery tests to UNKNOWN because of the NULL comparison, and therefore fails whenever we look at Ms. Glamour's row, so her row can never make the EXISTS() TRUE.
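The derivation can be checked on data with Python's sqlite3 module. The employees, the celeb_name column, and 'Star A' are hypothetical; Gloria Glamour's row follows the text. Bob's birthday matches no known celebrity, and Gloria's NULL row never satisfies the subquery's WHERE clause, so NOT EXISTS() is TRUE for Bob and he is reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel (emp_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    CREATE TABLE Celebrities (celeb_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    INSERT INTO Personnel VALUES ('Alice', '1990-03-01'), ('Bob', '1985-07-04');
    INSERT INTO Celebrities VALUES ('Star A', '1990-03-01'), ('Gloria Glamour', NULL);
""")

rows = conn.execute("""
    SELECT P1.emp_name
    FROM Personnel AS P1
    WHERE NOT EXISTS
          (SELECT *
           FROM Celebrities AS C1
           WHERE P1.birthday = C1.birthday)
    ORDER BY P1.emp_name""").fetchall()
print(rows)  # [('Bob',)]
```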
Another problem with NULLs is found when you attempt to convert IN predicates to EXISTS predicates. Using our example of matching our employees to famous people, the query can be rewritten as:

SELECT P1.emp_name, ' was born on a day without a famous person!'
FROM Personnel AS P1
WHERE P1.birthday NOT IN
(SELECT C1.birthday
FROM Celebrities AS C1);
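With Gloria Glamour's NULL birthday in Celebrities, the NOT IN() rewrite does not behave like the NOT EXISTS() query. This sketch with Python's sqlite3 module uses the same hypothetical rows as before. Every employee's NOT IN() test is FALSE or UNKNOWN, so nobody is reported. Filtering the NULLs out of the subquery with IS NOT NULL restores the expected answer for this data; that agreement relies on Personnel.birthday having no NULLs itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel (emp_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    CREATE TABLE Celebrities (celeb_name VARCHAR(20) NOT NULL PRIMARY KEY, birthday DATE);
    INSERT INTO Personnel VALUES ('Alice', '1990-03-01'), ('Bob', '1985-07-04');
    INSERT INTO Celebrities VALUES ('Star A', '1990-03-01'), ('Gloria Glamour', NULL);
""")

# Bob's test becomes '1985-07-04' NOT IN ('1990-03-01', NULL), which is UNKNOWN
not_in = conn.execute("""
    SELECT P1.emp_name FROM Personnel AS P1
    WHERE P1.birthday NOT IN
          (SELECT C1.birthday FROM Celebrities AS C1)
    ORDER BY P1.emp_name""").fetchall()

# Removing the NULLs from the subquery gives the NOT EXISTS() answer here
not_in_filtered = conn.execute("""
    SELECT P1.emp_name FROM Personnel AS P1
    WHERE P1.birthday NOT IN
          (SELECT C1.birthday FROM Celebrities AS C1
           WHERE C1.birthday IS NOT NULL)
    ORDER BY P1.emp_name""").fetchall()

print(not_in)           # []
print(not_in_filtered)  # [('Bob',)]
```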

However, consider a more complex version of the same query, where the celebrity has to have been born in New York City. The IN predicate would be:
