Joe Celko's SQL for Smarties: Advanced SQL Programming

12 CHAPTER 1: DATABASE DESIGN
When you use NULLs in math calculations, they propagate in the
results so that the answer is another NULL. When you use them in
logical expressions or comparisons, they return a logical value of
UNKNOWN and give SQL its strange three-valued logic. They sort either
always high or always low in the collation sequence. They group
together for some operations, but not for others. In short, NULLs
cause a lot of irregular features in SQL, which we will discuss later.
Your best bet is just to memorize the situations and the rules for
NULLs when you cannot avoid them.
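As a rough illustration of these rules, here is a minimal sketch that runs a few statements through Python's standard sqlite3 module. The Ratings table is a made-up example, and SQLite is only one product; others can differ in details such as whether NULLs sort high or low.

```python
import sqlite3

# In-memory SQLite database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Arithmetic on a NULL propagates the NULL (Python sees it as None).
null_sum = conn.execute("SELECT 1 + NULL").fetchone()[0]

# Comparing NULL to anything, even another NULL, gives UNKNOWN,
# which SQLite reports as NULL as well.
null_cmp = conn.execute("SELECT NULL = NULL").fetchone()[0]

# GROUP BY is one of the operations that puts NULLs into one group.
conn.execute("CREATE TABLE Ratings (rating INTEGER)")
conn.executemany("INSERT INTO Ratings VALUES (?)", [(None,), (None,), (5,)])
groups = conn.execute(
    "SELECT rating, COUNT(*) FROM Ratings GROUP BY rating"
).fetchall()

print(null_sum, null_cmp, groups)
```

The two NULL ratings land in a single group even though NULL = NULL is not TRUE, which is the kind of irregularity the text warns about.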
CHECK() Constraint
The CHECK() constraint tests the rows of the table against a logical
expression, which SQL calls a search condition, and rejects rows whose
search condition returns FALSE. However, the constraint accepts rows
when the search condition returns TRUE or UNKNOWN. This is not the
same rule as the WHERE clause, which rejects rows that test UNKNOWN.
The reason for this “benefit-of-the-doubt” feature is so that it will be
easy to write constraints on NULL-able columns.
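The difference between the two rules can be seen in a small sketch using Python's sqlite3 module. The Reviews table and its columns are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews ("
    " review_id INTEGER NOT NULL PRIMARY KEY,"
    " rating INTEGER CHECK (rating BETWEEN 1 AND 10))"
)
conn.execute("INSERT INTO Reviews VALUES (1, 7)")     # search condition is TRUE
conn.execute("INSERT INTO Reviews VALUES (2, NULL)")  # UNKNOWN, still accepted

try:
    conn.execute("INSERT INTO Reviews VALUES (3, 11)")  # FALSE, rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The CHECK() let the NULL row in, but the same predicate in a
# WHERE clause does not return it.
stored_ids = [row[0] for row in conn.execute(
    "SELECT review_id FROM Reviews ORDER BY review_id")]
where_ids = [row[0] for row in conn.execute(
    "SELECT review_id FROM Reviews WHERE rating BETWEEN 1 AND 10")]

print(rejected, stored_ids, where_ids)
```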
The usual technique is to do simple range checking, such as CHECK
(rating BETWEEN 1 AND 10), or to verify that a column’s value is in
an enumerated set, such as CHECK (sex IN (0, 1, 2, 9)), with
this constraint. Remember that the sex column could also be set to
NULL, unless a NOT NULL constraint is also added to the column’s
declaration. Although it is optional, it is a really good idea to use a
constraint name. Without it, most SQL products will create a huge, ugly,
unreadable random string for the name, since they need to have one in
the schema tables. If you provide your own, you can drop the constraint
more easily and understand the error messages when the constraint is
violated.
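To show why a constraint name helps, the following sketch (Python's sqlite3 module; the table and the name rating_in_range are invented for the example) triggers a violation and captures the error text. The exact wording of the message is product-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews ("
    " rating INTEGER NOT NULL,"
    " CONSTRAINT rating_in_range"
    "  CHECK (rating BETWEEN 1 AND 10))"
)

try:
    conn.execute("INSERT INTO Reviews VALUES (42)")
    message = ""
except sqlite3.IntegrityError as err:
    # SQLite puts the constraint name into the error text.
    message = str(err)

print(message)
```

With a name of your own in the message, it is immediately clear which rule the offending row broke.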

For example, you can enforce the rule that a firm must not hire
anyone younger than 21 years of age for a job that requires a
liquor-serving license by using a single CHECK() clause that compares
the applicant’s birth date and hire date. However, you cannot put the
current system date into the CHECK() clause logic for an obvious
reason: it is always changing.
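One possible shape for such a constraint is sketched below with Python's sqlite3 module. The Bartenders table, its columns, and the constraint name are assumptions made for the example, dates are stored as ISO 8601 strings, and date(..., '+21 years') is SQLite's own date arithmetic rather than Standard SQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Bartenders ("
    " emp_name VARCHAR(30) NOT NULL PRIMARY KEY,"
    " birth_date DATE NOT NULL,"   # stored as 'YYYY-MM-DD' text
    " hire_date DATE NOT NULL,"
    " CONSTRAINT legal_age_at_hire"
    "  CHECK (hire_date >= date(birth_date, '+21 years')))"
)

# Turned 21 on 2011-06-15, hired afterwards: accepted.
conn.execute(
    "INSERT INTO Bartenders VALUES ('Ann', '1990-06-15', '2012-01-01')")

# Only 16 on the hire date: rejected. Both dates are stored columns,
# so the rule never depends on the current system date.
try:
    conn.execute(
        "INSERT INTO Bartenders VALUES ('Bob', '1995-03-10', '2012-01-01')")
    underage_rejected = False
except sqlite3.IntegrityError:
    underage_rejected = True

hired = [row[0] for row in conn.execute("SELECT emp_name FROM Bartenders")]
print(underage_rejected, hired)
```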
The real power of the CHECK() clause comes from writing complex
expressions that verify relationships with other rows, with other tables,
or with constants. Before SQL-92, the CHECK() constraint could only
reference columns in the table in which it was declared. In Standard
SQL, the CHECK() constraint can reference any schema object. As an
example of how complex things can get, consider a database of movies.
1.1 Schema and Table Creation 13
First, let’s enforce the rule that no country can export more than ten
titles.
CREATE TABLE Exports
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL, -- use 2-letter ISO nation codes
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT National_Quota
 CHECK ( -- reference to same table; every country's count must be <= 10
        10 >= ALL (SELECT COUNT(movie_title)
                     FROM Exports AS E1
                    GROUP BY E1.country_code))
);
When doing a self-join, you must use the base table name and not all
correlation names. Let’s make sure no movies from different countries
have the same title.
CREATE TABLE ExportMovies
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL,
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT National_Quota
 CHECK (NOT EXISTS -- self-join
        (SELECT *
           FROM ExportMovies AS E1
          WHERE ExportMovies.movie_title = E1.movie_title
            AND ExportMovies.country_code <> E1.country_code))
);
Here is a way to enforce the rule that you cannot export a movie to its
own country of origin.
CREATE TABLE ExportMovies
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL,
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT Foreign_film
 CHECK (NOT EXISTS -- reference to second table
        (SELECT *
           FROM Movies AS M1
          WHERE M1.movie_title = ExportMovies.movie_title
            AND M1.country_of_origin = ExportMovies.country_code)));
These table-level constraints often use a NOT EXISTS() predicate.
Despite the fact that you can often do a lot of work in a single constraint,
it is better to write a lot of small constraints, so that you know exactly
what went wrong when one of them is violated.
Another important point to remember is that all constraints are true if
the table is empty. That case is handled by the CREATE ASSERTION
statement, which we will discuss shortly.
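Many SQL products do not accept a subquery inside a CHECK() clause, so one hedged way to experiment with a rule like Foreign_film is to run its search condition as an ordinary query that lists the offending rows. The sketch below does that with Python's sqlite3 module. The cut-down Movies and ExportMovies definitions and the sample rows are assumptions, since the text does not give the full Movies table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal tables: only the columns the rule needs.
conn.execute(
    "CREATE TABLE Movies ("
    " movie_title CHAR(25) NOT NULL PRIMARY KEY,"
    " country_of_origin CHAR(2) NOT NULL)"
)
conn.execute(
    "CREATE TABLE ExportMovies ("
    " movie_title CHAR(25) NOT NULL,"
    " country_code CHAR(2) NOT NULL,"
    " PRIMARY KEY (movie_title, country_code))"
)

# Rows returned here are the rows the NOT EXISTS() rule forbids.
VIOLATIONS = (
    "SELECT E.movie_title, E.country_code"
    "  FROM ExportMovies AS E"
    " WHERE EXISTS"
    "       (SELECT *"
    "          FROM Movies AS M1"
    "         WHERE M1.movie_title = E.movie_title"
    "           AND M1.country_of_origin = E.country_code)"
)

# With empty tables nothing can violate the rule, so it holds vacuously.
empty_violations = conn.execute(VIOLATIONS).fetchall()

conn.execute("INSERT INTO Movies VALUES ('Seven Samurai', 'JP')")
conn.executemany(
    "INSERT INTO ExportMovies VALUES (?, ?)",
    [("Seven Samurai", "US"), ("Seven Samurai", "JP")],
)
violations = conn.execute(VIOLATIONS).fetchall()
print(empty_violations, violations)
```

This is an audit query, not a constraint: it reports bad rows after the fact instead of keeping them out.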
UNIQUE and PRIMARY KEY Constraints
The UNIQUE constraint says that no duplicate values are allowed in the
column or columns involved.
<unique specification> ::= UNIQUE | PRIMARY KEY
File system programmers understand the concept of a PRIMARY
KEY, but for the wrong reasons. They are used to a file, which can have
only one key because that key is used to determine the physical order of
the records within the file. Tables have no order; the term PRIMARY KEY
in SQL has to do with defaults in referential actions, which we will
discuss later.
There are some subtle differences between UNIQUE and PRIMARY
KEY. A table can have only one PRIMARY KEY but many UNIQUE
constraints. A PRIMARY KEY is automatically declared to have a NOT
NULL constraint on it, but a UNIQUE column can have NULLs in a row
unless you explicitly add a NOT NULL constraint. Adding the NOT NULL
whenever possible is a good idea, as it makes the column into a proper
relational key.
There is also a multiple-column form of the <unique specification>,
which is usually written at the end of the column declarations. It is a
list of columns in parentheses after the proper keyword, and it means
that the combination of those columns is unique. For example, I might
declare PRIMARY KEY (city, department) so I can be sure that though
I have offices in many cities and many identical departments in those
offices, there is only one personnel department in Chicago.
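A small sketch of both points, using Python's sqlite3 module: a compound PRIMARY KEY on (city, department) and a NULL-able UNIQUE column. The Offices table and the phone_ext column are made up for the illustration. Note that the treatment of NULLs under UNIQUE varies between products; SQLite happens to allow several of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Offices ("
    " city VARCHAR(30) NOT NULL,"
    " department VARCHAR(30) NOT NULL,"
    " phone_ext INTEGER UNIQUE,"          # NULL-able UNIQUE column
    " PRIMARY KEY (city, department))"    # compound key
)

# The same department in two cities is fine, and SQLite lets both
# rows carry a NULL in the UNIQUE column.
conn.execute("INSERT INTO Offices VALUES ('Chicago', 'Personnel', NULL)")
conn.execute("INSERT INTO Offices VALUES ('Atlanta', 'Personnel', NULL)")

# A second personnel department in Chicago violates the compound key.
try:
    conn.execute("INSERT INTO Offices VALUES ('Chicago', 'Personnel', 101)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

office_count = conn.execute("SELECT COUNT(*) FROM Offices").fetchone()[0]
print(duplicate_rejected, office_count)
```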
REFERENCES Clause
The <references specification> is the simplest version of a
referential constraint definition, which can be quite tricky. For now, let
us just consider the simplest case:
<references specification> ::=
[CONSTRAINT <constraint name>]
REFERENCES <referenced table name>[(<reference column>)]
This relates two tables together, so it is different from the other
options we have discussed so far. What this says is that the value in this
column of the referencing table must appear somewhere in the
referenced table’s column named in the constraint. Furthermore, the
referenced column must be in a UNIQUE constraint. For example, you
can set up a rule that the Orders table can have orders only for goods
that appear in the Inventory table.
If no <reference column> is given, then the PRIMARY KEY
column of the referenced table is assumed to be the target. This is one of
those situations where the PRIMARY KEY is important, but you can
always play it safe and explicitly name a column. There is no rule to
prevent several columns from referencing the same target column. For
example, we might have a table of flight crews that has pilot and copilot
columns that both reference a table of certified pilots.
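The flight crew case might be sketched as follows with Python's sqlite3 module. The Pilots and FlightCrews tables, their columns, and the sample names are invented for the example, and the PRAGMA line is an SQLite requirement for turning on foreign key enforcement, not part of Standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute(
    "CREATE TABLE Pilots (pilot_name VARCHAR(30) NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE FlightCrews ("
    " flight_nbr INTEGER NOT NULL PRIMARY KEY,"
    # Two columns referencing the same target column.
    " pilot VARCHAR(30) NOT NULL REFERENCES Pilots (pilot_name),"
    " copilot VARCHAR(30) NOT NULL REFERENCES Pilots (pilot_name))"
)
conn.executemany(
    "INSERT INTO Pilots VALUES (?)", [("Amelia",), ("Charles",)])

# Both crew members are certified pilots: accepted.
conn.execute("INSERT INTO FlightCrews VALUES (101, 'Amelia', 'Charles')")

# The copilot is not in the referenced table: rejected.
try:
    conn.execute("INSERT INTO FlightCrews VALUES (102, 'Amelia', 'Icarus')")
    unknown_rejected = False
except sqlite3.IntegrityError:
    unknown_rejected = True

flights = [row[0] for row in conn.execute(
    "SELECT flight_nbr FROM FlightCrews")]
print(unknown_rejected, flights)
```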
A circular reference is a relationship in which one table references a
second table, which in turn references the first table. The old gag about
“you cannot get a job until you have experience, and you cannot get
experience until you have a job!” is the classic version of this.
Notice that the columns in a multicolumn FOREIGN KEY must
match to a multicolumn PRIMARY KEY or UNIQUE constraint. The
syntax is:
[CONSTRAINT <constraint name>]
FOREIGN KEY (<column list>)
REFERENCES <referenced table name>[(<reference column list>)]
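A multicolumn reference following this syntax could look like the sketch below, again run through Python's sqlite3 module. The Offices and Staff tables and the constraint name staff_works_in_office are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute(
    "CREATE TABLE Offices ("
    " city VARCHAR(30) NOT NULL,"
    " department VARCHAR(30) NOT NULL,"
    " PRIMARY KEY (city, department))"
)
conn.execute(
    "CREATE TABLE Staff ("
    " emp_name VARCHAR(30) NOT NULL PRIMARY KEY,"
    " city VARCHAR(30) NOT NULL,"
    " department VARCHAR(30) NOT NULL,"
    " CONSTRAINT staff_works_in_office"
    "  FOREIGN KEY (city, department)"
    "  REFERENCES Offices (city, department))"
)
conn.execute("INSERT INTO Offices VALUES ('Chicago', 'Personnel')")
conn.execute("INSERT INTO Staff VALUES ('Ann', 'Chicago', 'Personnel')")

# The pair (Chicago, Sales) does not exist as a whole in Offices,
# even though 'Chicago' alone does, so the row is rejected.
try:
    conn.execute("INSERT INTO Staff VALUES ('Bob', 'Chicago', 'Sales')")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

print(mismatch_rejected)
```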
Referential Actions
The REFERENCES clause can have two subclauses that take actions when
a database event changes the referenced table. This feature came with
Standard SQL and took a while to be implemented in most SQL
products. The two database events are updates and deletes, and the
subclauses look like this:
<referential triggered action> ::=
<update rule> [<delete rule>] | <delete rule> [<update rule>]
<update rule> ::= ON UPDATE <referential action>
<delete rule> ::= ON DELETE <referential action>
<referential action> ::= CASCADE | SET NULL | SET DEFAULT | NO ACTION
When the referenced table is changed, one of the referential actions is
set in motion by the SQL engine.
1. The CASCADE option will change the values in the referencing
table to the new value in the referenced table. This is a very
common method of DDL programming that allows you to set
up a single table as the trusted source for an identifier. This way
the system can propagate changes automatically.
This removes one of the arguments for nonrelational system-
generated surrogate keys. In early SQL products that were based
on a file system for their physical implementation, the values
were repeated for both the referenced and referencing tables.
Why? The tables were regarded as separate units, like files.
Later SQL products regarded the schema as a whole. The
referenced values appeared once in the referenced table, and
the referencing tables obtained them by following pointer
chains to that one occurrence in the schema. The results are
much faster update cascades, a physically smaller database,
faster joins, and faster aggregations.
2. The SET NULL option will change the values in the referencing
table to a NULL. Obviously, the referencing column needs to be
NULL-able.
3. The SET DEFAULT option will change the values in the
referencing table to the default value of that column.
Obviously, the referencing column needs to have some
DEFAULT declared for it, but each referencing column can have
its own default in its own table.
4. The NO ACTION option explains itself. Nothing is changed in
the referencing table, and it is possible that some error message
about reference violation will be raised. If a referential
constraint does not specify an ON UPDATE or ON DELETE
rule, then NO ACTION is implicit. You will also see the reserved
word RESTRICT in some products instead of NO ACTION.
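The first two options in the list above can be watched in action with a short sketch using Python's sqlite3 module. The Inventory and Orders tables echo the earlier example in the text, but their columns (sku, order_nbr) and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute("CREATE TABLE Inventory (sku CHAR(5) NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Orders ("
    " order_nbr INTEGER NOT NULL PRIMARY KEY,"
    " sku CHAR(5)"                 # NULL-able, as SET NULL requires
    "  REFERENCES Inventory (sku)"
    "  ON UPDATE CASCADE"
    "  ON DELETE SET NULL)"
)
conn.execute("INSERT INTO Inventory VALUES ('A0001')")
conn.execute("INSERT INTO Orders VALUES (1, 'A0001')")

# ON UPDATE CASCADE: the new key value flows into the referencing table.
conn.execute("UPDATE Inventory SET sku = 'B0002' WHERE sku = 'A0001'")
after_update = conn.execute(
    "SELECT sku FROM Orders WHERE order_nbr = 1").fetchone()[0]

# ON DELETE SET NULL: the referencing column is set to NULL.
conn.execute("DELETE FROM Inventory WHERE sku = 'B0002'")
after_delete = conn.execute(
    "SELECT sku FROM Orders WHERE order_nbr = 1").fetchone()[0]

print(after_update, after_delete)
```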
Standard SQL has more options about how matching is done
between the referenced and referencing tables. Most SQL products have
not implemented them, so I will not mention them anymore.
Standard SQL has deferrable constraints. This option lets the
programmer defer the checking of a constraint during a transaction, so
that the table can be put into a state that would otherwise be illegal.
However, at the end of the transaction, all the constraints are enforced.
Many SQL products have implemented these options, and they can be
quite handy, but I will not mention them until we get to the section on
transaction control.
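As a preview, here is a hedged sketch of a deferrable foreign key in SQLite, driven from Python's sqlite3 module. The tables and values are invented, and in SQLite the deferred check runs when the transaction is committed; other products offer further controls.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# the BEGIN and COMMIT statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute("CREATE TABLE Inventory (sku CHAR(5) NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Orders ("
    " order_nbr INTEGER NOT NULL PRIMARY KEY,"
    " sku CHAR(5) NOT NULL"
    "  REFERENCES Inventory (sku)"
    "  DEFERRABLE INITIALLY DEFERRED)"
)

# The order is inserted before its inventory row exists. That state is
# illegal, but it is repaired before the COMMIT, so the COMMIT succeeds.
conn.execute("BEGIN")
conn.execute("INSERT INTO Orders VALUES (1, 'A0001')")
conn.execute("INSERT INTO Inventory VALUES ('A0001')")
conn.execute("COMMIT")

# This time the illegal state is never repaired, so the COMMIT fails.
conn.execute("BEGIN")
conn.execute("INSERT INTO Orders VALUES (2, 'Z9999')")
try:
    conn.execute("COMMIT")
    commit_failed = False
except sqlite3.IntegrityError:
    commit_failed = True
    conn.execute("ROLLBACK")

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(commit_failed, order_count)
```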
1.1.4 UNIQUE Constraints versus UNIQUE Indexes
UNIQUE constraints are not the same thing as UNIQUE indexes.
Technically speaking, indexes do not even exist in Standard SQL. They
were considered too physical to be part of a logical model of a language.
In practice, however, virtually all products have some form of “access
enhancement” for the DBA to use, and most often, it is an index.
The column referenced by a FOREIGN KEY has to be either a
PRIMARY KEY or a column with a UNIQUE constraint; a unique index
on the same set of columns cannot be referenced, since the index is on
one table and not a relationship between two tables.
Although there is no order to a constraint, an index is ordered, so the
unique index might be an aid for sorting. Some products construct special
index structures for the declarative referential integrity (DRI) constraints,
which in effect “pre-JOIN” the referenced and referencing tables.
All the constraints can be defined as equivalent to some CHECK
constraint. For example:
PRIMARY KEY = CHECK (UNIQUE (SELECT <key columns> FROM <table>)
AND (<key columns>) IS NOT NULL)
UNIQUE = CHECK (UNIQUE (SELECT <key columns> FROM <table>))
NOT NULL = CHECK (<column> IS NOT NULL)
These predicates can be reworded in terms of other predicates and
subquery expressions, and then passed on to the optimizer.
1.1.5 Nested UNIQUE Constraints
One of the basic tricks in SQL is representing a one-to-one or
many-to-many relationship with a table that references the two (or more)
entity tables involved by their primary keys. This third table has several
popular names, such as “junction table” or “join table,” but we know that
it is a relationship. This type of table needs constraints to ensure that the
relationships work properly.

For example, here are two tables:
CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
);
CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY
);
Yes, I know using names for a key is a bad practice, but it will make
my examples easier to read. There are a lot of different relationships that
we can make between these two tables. If you don’t believe me, just
watch the Jerry Springer Show sometime. The simplest relationship table
looks like this:
CREATE TABLE Couples
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls (girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE);
The Couples table allows us to insert rows like this:
('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')
('Joe Celko', 'Hilary Duff')
Oops! I am shown twice with Hilary Duff, because the Couples table
does not have its own key. This mistake is easy to make, but the way to
fix it is not obvious.

CREATE TABLE Orgy
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON DELETE CASCADE
   ON UPDATE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls (girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key
The Orgy table gets rid of the duplicated rows and makes this a
proper table. The primary key for the table is made up of two or more
columns and is called a compound key because of that fact. These are
valid rows now.
('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')
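The difference between the keyless table and the compound-key table can be checked with a quick sketch in Python's sqlite3 module. The table names follow the text, but the REFERENCES clauses are left out here to keep the example short, which is a simplification of the declarations above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Couples ("
    " boy_name VARCHAR(30) NOT NULL,"
    " girl_name VARCHAR(30) NOT NULL)"    # no key at all
)
conn.execute(
    "CREATE TABLE Orgy ("
    " boy_name VARCHAR(30) NOT NULL,"
    " girl_name VARCHAR(30) NOT NULL,"
    " PRIMARY KEY (boy_name, girl_name))"  # compound key
)
pair = ("Joe Celko", "Hilary Duff")

# Without a key, the same pair goes in twice without complaint.
conn.execute("INSERT INTO Couples VALUES (?, ?)", pair)
conn.execute("INSERT INTO Couples VALUES (?, ?)", pair)
couples_rows = conn.execute("SELECT COUNT(*) FROM Couples").fetchone()[0]

# With the compound key, the second copy of the pair is rejected.
conn.execute("INSERT INTO Orgy VALUES (?, ?)", pair)
try:
    conn.execute("INSERT INTO Orgy VALUES (?, ?)", pair)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(couples_rows, duplicate_rejected)
```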
But the only restriction on the couples is that they appear only once.
Every boy can be paired with every girl, much to the dismay of the Moral
Majority. I think I want to make a rule that guys can have as many gals as
they want, but the gals have to stick to one guy.
The way I do this is to use a NOT NULL UNIQUE constraint on the
girl_name column, which makes it a key. It is a simple key, since it is
only one column, but it is also a nested key, because it appears as a
subset of the compound PRIMARY KEY.
CREATE TABLE Playboys
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Girls (girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key
The Playboys table is a proper table, without duplicated rows, but it
also enforces the condition that I get to play around with one or more
ladies.
('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')
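To see the nested key at work, the sketch below loads the rows from the original set into a cut-down Playboys table using Python's sqlite3 module. The REFERENCES clauses are omitted for brevity, which is an assumption of this example and not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Playboys ("
    " boy_name VARCHAR(30) NOT NULL,"
    " girl_name VARCHAR(30) NOT NULL UNIQUE,"   # nested key
    " PRIMARY KEY (boy_name, girl_name))"       # compound key
)

accepted = []
rejected = []
for pair in [("Joe Celko", "Hilary Duff"),
             ("Joe Celko", "Lindsay Lohan"),
             ("Toby McGuire", "Lindsay Lohan")]:
    try:
        conn.execute("INSERT INTO Playboys VALUES (?, ?)", pair)
        accepted.append(pair)
    except sqlite3.IntegrityError:
        # The girl already appears in the table, so UNIQUE fires.
        rejected.append(pair)

print(accepted)
print(rejected)
```

Moving the UNIQUE constraint from girl_name to boy_name reverses the rule, so that each boy may appear only once while a girl may appear many times.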
The women might want to go the other way and keep company with a
series of men.
CREATE TABLE Playgirls
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls (girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key
The Playgirls table would permit these rows from our original set.
('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')
Think about all of these possible keys for a minute. The compound
PRIMARY KEY is now redundant. If each boy appears only once in the
table, or each girl appears only once in the table, then each (boy_name,
girl_name) pair can appear only once. However, the redundancy can be
useful in searching the table, because it will probably create extra
indexes that give us a covering of both names. The query engine then
can use just the index and never touch the base tables.
The Moral Majority is pretty upset about this Hollywood scandal and
would love for us to stop running around and settle down in nice stable
couples.
CREATE TABLE Marriages
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Girls (girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- redundant compound key!!
Since one of the goals of an RDBMS (relational database management
system) is to remove redundancy, why would I have that compound
primary key? One reason might be to get a covering index on both
columns for performance. But the more likely answer is that this is an
error that a smart optimizer will spot. I leave same-sex marriages as an
exercise for the reader.
The Marriages table allows us to insert these rows from the original set.
('Joe Celko', 'Hilary Duff')
('Toby McGuire', 'Lindsay Lohan')
However, SQL products and theory do not always match. Many
products make the assumption that the PRIMARY KEY is somehow
special in the data model and will be the way that they should access the
table most of the time.
In fairness, making special provision for the PRIMARY KEY is not a
bad assumption, because the REFERENCES clause uses the PRIMARY
KEY of the referenced table as the default. Many new SQL programmers
are not aware that a FOREIGN KEY constraint can also reference any
UNIQUE constraint in the same table or in another table. The following
nightmare code will give you an idea of the possibilities. The multiple
column versions follow the same syntax.
CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
 self_ref INTEGER NOT NULL
   REFERENCES Foo (foo_key),
 outside_ref_1 INTEGER NOT NULL
   REFERENCES Bar (bar_key),
 outside_ref_2 INTEGER NOT NULL
   REFERENCES Bar (other_key)
);
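The listing does not show the Bar table, so the sketch below assumes one in which bar_key is the PRIMARY KEY and other_key carries a NOT NULL UNIQUE constraint. It uses Python's sqlite3 module and leaves out the self_ref column to keep the example small; both choices are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# Hypothetical referenced table with two candidate keys.
conn.execute(
    "CREATE TABLE Bar ("
    " bar_key INTEGER NOT NULL PRIMARY KEY,"
    " other_key INTEGER NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE Foo ("
    " foo_key INTEGER NOT NULL PRIMARY KEY,"
    " outside_ref_1 INTEGER NOT NULL"
    "  REFERENCES Bar (bar_key),"       # targets the PRIMARY KEY
    " outside_ref_2 INTEGER NOT NULL"
    "  REFERENCES Bar (other_key))"     # targets the UNIQUE column
)
conn.execute("INSERT INTO Bar VALUES (1, 100)")

# Both references find a matching row in Bar: accepted.
conn.execute("INSERT INTO Foo VALUES (10, 1, 100)")

# 999 does not appear in Bar.other_key: rejected.
try:
    conn.execute("INSERT INTO Foo VALUES (11, 1, 999)")
    bad_ref_rejected = False
except sqlite3.IntegrityError:
    bad_ref_rejected = True

foo_keys = [row[0] for row in conn.execute("SELECT foo_key FROM Foo")]
print(bad_ref_rejected, foo_keys)
```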