Joe Celko's SQL for Smarties: Advanced SQL Programming

22 CHAPTER 1: DATABASE DESIGN
CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,
other_key INTEGER NOT NULL UNIQUE,
...);
1.1.6 Overlapping Keys
But let’s get back to the nested keys. Just how far can we go with them?
My favorite example is a teacher’s schedule kept in a table like this [I am
leaving out reference clauses and CHECK() constraints]:
CREATE TABLE Schedule
(teacher_name VARCHAR(15) NOT NULL,
class_name CHAR(15) NOT NULL,
room_nbr INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher_name, class_name, room_nbr, period));
That choice of a primary key is the most obvious one: use all the
columns. A typical row would look like this:
('Mr. Celko', 'Database 101', 222, 6)
The rules we want to enforce are:
1. A teacher is in only one room each period.
2. A teacher teaches only one class each period.
3. A room has only one class each period.
4. A room has only one teacher in it each period.
Stop reading and see what you come up with for an answer. Okay,
now consider using one constraint for each rule in the list, thus.
CREATE TABLE Schedule_1 -- version one, WRONG!
(teacher_name VARCHAR(15) NOT NULL,
class_name CHAR(15) NOT NULL,
room_nbr INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher_name, room_nbr, period), -- rule #1
UNIQUE (teacher_name, class_name, period), -- rule #2
UNIQUE (class_name, room_nbr, period), -- rule #3
UNIQUE (teacher_name, room_nbr, period), -- rule #4
PRIMARY KEY (teacher_name, class_name, room_nbr, period));
We know that there are four ways to pick three things from a set of
four things. While column order is important in creating an index, we
can ignore it for now and then worry about index tuning later.
I could drop the PRIMARY KEY as redundant if I have all four of these
constraints in place. But what happens if I drop the PRIMARY KEY and
then one of the constraints?
CREATE TABLE Schedule_2 -- still wrong
(teacher_name VARCHAR(15) NOT NULL,
class_name CHAR(15) NOT NULL,
room_nbr INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher_name, room_nbr, period), -- rule #1
UNIQUE (teacher_name, class_name, period), -- rule #2
UNIQUE (class_name, room_nbr, period)); -- rule #3
I can now insert these rows in the second version of the table:
('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
This gives me a very tough sixth-period teaching load, because I have
to be in two different rooms at the same time. Things can get even worse
when another teacher is added to the schedule:
('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

('Ms. Shields', 'Database 101', 223, 6)
Ms. Shields and I are both in room 223, trying to teach different
classes at the same time. Matthew Burr looked at the constraints and the
rules, and he came up with this analysis.
CREATE TABLE Schedule_3 -- correct version
(teacher_name VARCHAR(15) NOT NULL,
class_name CHAR(15) NOT NULL,
room_nbr INTEGER NOT NULL,
period INTEGER NOT NULL,
UNIQUE (teacher_name, period), -- rules #1 and #2
UNIQUE (room_nbr, period),
UNIQUE (class_name, period)); -- rules #3 and #4
If a teacher is in only one room each period, then given a period and
a teacher I should be able to determine only one room; i.e., room is
functionally dependent upon the combination of teacher and period.
Likewise, if a teacher teaches only one class each period, then class is
functionally dependent upon the combination of teacher and period.
The same thinking holds for the last two rules: class is functionally
dependent upon the combination of room and period, and teacher is
functionally dependent upon the combination of room and period.
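The functional dependency argument above can be checked mechanically on sample rows. What follows is only a small Python sketch, not anything from the original text; the helper name is_functionally_dependent is an assumption made up for the illustration. It reports whether each combination of the determinant columns maps to a single value of the dependent column.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """Return True if every combination of the determinant columns
    is paired with exactly one value of the dependent column."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


# The two rows from the text that double-book Mr. Celko in period 6.
rows = [
    {"teacher_name": "Mr. Celko", "class_name": "Database 101",
     "room_nbr": 222, "period": 6},
    {"teacher_name": "Mr. Celko", "class_name": "Database 102",
     "room_nbr": 223, "period": 6},
]

room_ok = is_functionally_dependent(rows, ("teacher_name", "period"), "room_nbr")
class_ok = is_functionally_dependent(rows, ("teacher_name", "period"), "class_name")
teacher_ok = is_functionally_dependent(rows, ("room_nbr", "period"), "teacher_name")

print(room_ok, class_ok, teacher_ok)
```

For these two rows the room and the class both fail to depend on (teacher, period), which is exactly what rules #1 and #2 forbid, while teacher still depends on (room, period).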
With the constraints that were provided in the first version, you will
find that the rules are not enforced. For example, I could enter the
following rows:
('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
These rows violate the first and second rules.
However, the unique constraints first provided in Schedule_2 do not
capture this violation and will allow the rows to be entered.
The following constraint:

UNIQUE (teacher_name, room_nbr, period)
checks the complete combination of teacher, room, and period, and
since ('Mr. Celko', 222, 6) is different from ('Mr. Celko', 223, 6), the
DDL does not find any problem with both rows being entered, even
though that means that Mr. Celko is in more than one room during the
same period.
The constraint:
UNIQUE (teacher_name, class_name, period)
does not catch its associated rule either, since ('Mr. Celko', 'Database
101', 6) is different from ('Mr. Celko', 'Database 102', 6). As a result, Mr.
Celko is able to teach more than one class during the same period, thus
violating rule #2. It seems that we’d also be able to add the following
row:
('Ms. Shields', 'Database 103', 222, 6)
This violates the third and fourth rules.
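The difference between Schedule_2 and Schedule_3 is easy to try out. The sketch below uses Python's sqlite3 module with an in-memory database, which is an assumption of this illustration and not the product the text has in mind; the load helper is likewise made up for the example. It feeds the same three rows from the text to both tables and counts how many each one accepts.

```python
import sqlite3

ROWS = [
    ("Mr. Celko", "Database 101", 222, 6),
    ("Mr. Celko", "Database 102", 223, 6),    # same teacher and period, second room
    ("Ms. Shields", "Database 101", 223, 6),  # second teacher in room 223, same period
]

COLUMNS = """
    (teacher_name VARCHAR(15) NOT NULL,
     class_name CHAR(15) NOT NULL,
     room_nbr INTEGER NOT NULL,
     period INTEGER NOT NULL,
"""

DDL = {
    # three-column constraints from the "still wrong" version
    "Schedule_2": "CREATE TABLE Schedule_2" + COLUMNS + """
     UNIQUE (teacher_name, room_nbr, period),
     UNIQUE (teacher_name, class_name, period),
     UNIQUE (class_name, room_nbr, period))""",
    # two-column constraints from the corrected version
    "Schedule_3": "CREATE TABLE Schedule_3" + COLUMNS + """
     UNIQUE (teacher_name, period),
     UNIQUE (room_nbr, period),
     UNIQUE (class_name, period))""",
}


def load(table):
    """Insert ROWS one at a time; return (accepted, rejected) counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL[table])
    accepted = rejected = 0
    for row in ROWS:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            rejected += 1
    conn.close()
    return accepted, rejected


print("Schedule_2:", load("Schedule_2"))
print("Schedule_3:", load("Schedule_3"))
```

Schedule_2 takes all three rows, because each three-column combination is distinct. Schedule_3 keeps only the first row: the second repeats (teacher_name, period) and the third repeats (class_name, period).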
1.1.7 CREATE ASSERTION Constraints
In Standard SQL, CREATE ASSERTION allows you to apply a constraint
on the tables within a schema, but not to attach the constraint to any
particular table. The syntax is:
<assertion definition> ::=
CREATE ASSERTION <constraint name> <assertion check>
[<constraint attributes>]
<assertion check> ::=
CHECK <left paren> <search condition> <right paren>
As you would expect, there is a DROP ASSERTION statement, but no
ALTER statement. An assertion can do things that a CHECK() clause
attached to a table cannot do, because it is outside of the tables involved.
A CHECK() constraint is always TRUE if the table is empty.

For example, it is very hard to make a rule that the total number of
employees in the company must be equal to the total number of
employees in all the health plan tables.
CREATE ASSERTION Total_health_Coverage
CHECK ((SELECT COUNT(*) FROM Personnel)
= (SELECT COUNT(*) FROM HealthPlan_1)
+ (SELECT COUNT(*) FROM HealthPlan_2)
+ (SELECT COUNT(*) FROM HealthPlan_3));
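Where CREATE ASSERTION is not available, the same search condition can at least be run as an ordinary query and tested by the application. The following is a sketch under stated assumptions: it uses Python's sqlite3 module, the four tables are reduced to a single ssn column (the text does not give their definitions), and the function name total_health_coverage_holds is invented for the example. Unlike a real assertion, nothing here stops a bad update; it only reports whether the rule currently holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one-column stand-ins; the real table definitions are not in the text
    CREATE TABLE Personnel (ssn CHAR(9) NOT NULL PRIMARY KEY);
    CREATE TABLE HealthPlan_1 (ssn CHAR(9) NOT NULL PRIMARY KEY);
    CREATE TABLE HealthPlan_2 (ssn CHAR(9) NOT NULL PRIMARY KEY);
    CREATE TABLE HealthPlan_3 (ssn CHAR(9) NOT NULL PRIMARY KEY);
""")


def total_health_coverage_holds(conn):
    """Evaluate the assertion's search condition as a plain query."""
    (ok,) = conn.execute("""
        SELECT (SELECT COUNT(*) FROM Personnel)
             = ((SELECT COUNT(*) FROM HealthPlan_1)
              + (SELECT COUNT(*) FROM HealthPlan_2)
              + (SELECT COUNT(*) FROM HealthPlan_3))
    """).fetchone()
    return bool(ok)


empty_ok = total_health_coverage_holds(conn)      # all counts are zero

conn.execute("INSERT INTO Personnel VALUES ('111111111')")
after_hire = total_health_coverage_holds(conn)    # employee without a plan

conn.execute("INSERT INTO HealthPlan_2 VALUES ('111111111')")
after_enroll = total_health_coverage_holds(conn)  # counts match again

print(empty_ok, after_hire, after_enroll)
```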
1.1.8 Using VIEWs for Schema Level Constraints
Until you can get CREATE ASSERTION constraints, you have to use
procedures and triggers to get the same effects. Consider a schema for a
chain of stores that has three tables, thus:
CREATE TABLE Stores
(store_nbr INTEGER NOT NULL PRIMARY KEY,
store_name CHAR(35) NOT NULL,
...);
CREATE TABLE Personnel
(ssn CHAR(9) NOT NULL PRIMARY KEY,
last_name CHAR(15) NOT NULL,
first_name CHAR(15) NOT NULL,
...);
The first two explain themselves. The third table, following, shows
the relationship between stores and personnel, namely who is assigned
to what job at which store and when this happened. Thus:
CREATE TABLE JobAssignments
(store_nbr INTEGER NOT NULL
REFERENCES Stores (store_nbr)
ON UPDATE CASCADE
ON DELETE CASCADE,
ssn CHAR(9) NOT NULL
REFERENCES Personnel (ssn)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date TIMESTAMP,
CHECK (start_date <= end_date),
job_type INTEGER DEFAULT 0 NOT NULL -- unassigned = 0
CHECK (job_type BETWEEN 0 AND 99),
PRIMARY KEY (store_nbr, ssn, start_date));
Let’s invent some job_type codes, such as 0 = 'unassigned', 1 =
'stockboy', etc., until we get to 99 = 'Store Manager'. We have a rule that
each store has, at most, one manager. In Standard SQL, you could write a
constraint like this:
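CREATE ASSERTION ManagerVerification
CHECK (1 >= ALL (SELECT COUNT(*)
FROM JobAssignments
WHERE job_type = 99
GROUP BY store_nbr));
This is actually a bit subtler than it looks. The WHERE clause discards
every row that is not a manager before the grouping, so a store without a
manager produces no group and passes the test vacuously. If you change
the >= to =, the meaning does not change: each store that has a manager
must have exactly one, but stores without a manager are still allowed.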
But as we said, most SQL products still do not allow CHECK()
constraints that apply to the table as a whole, nor do they support the
schema-level CREATE ASSERTION statement.
So, how to do this? You might use a trigger, which will involve
proprietary, procedural code. Despite the SQL/PSM Standard, most
vendors implement very different trigger models and use their
proprietary 4GL in the body of the trigger.
We need to set TRIGGERs that validate the state of the table after each
INSERT and UPDATE operation. If we DELETE an employee, this will not
create more than one manager per store. The skeleton for these triggers
would be something like this:
CREATE TRIGGER CheckManagers
AFTER UPDATE ON JobAssignments -- same for INSERT
IF 1 < ANY (SELECT COUNT(*)
FROM JobAssignments
WHERE job_type = 99
GROUP BY store_nbr)
THEN ROLLBACK;
ELSE COMMIT;
END IF;
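Since trigger syntax differs from product to product, here is what the skeleton might look like in one concrete dialect. This is a sketch in SQLite's trigger language, driven from Python's sqlite3 module; both the choice of SQLite and the cut-down JobAssignments table (no dates, no REFERENCES clauses) are assumptions of the example. SQLite has no IF statement in triggers, so the test moves into a WHEN clause and RAISE(ABORT, ...) undoes the offending statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down table: dates and REFERENCES clauses are left out
    CREATE TABLE JobAssignments
    (store_nbr INTEGER NOT NULL,
     ssn CHAR(9) NOT NULL,
     job_type INTEGER DEFAULT 0 NOT NULL
       CHECK (job_type BETWEEN 0 AND 99),
     PRIMARY KEY (store_nbr, ssn));

    CREATE TRIGGER CheckManagers_ins
    AFTER INSERT ON JobAssignments
    WHEN (SELECT COUNT(*) FROM JobAssignments
           WHERE job_type = 99
             AND store_nbr = NEW.store_nbr) > 1
    BEGIN
        SELECT RAISE(ABORT, 'more than one manager in a store');
    END;

    CREATE TRIGGER CheckManagers_upd
    AFTER UPDATE ON JobAssignments
    WHEN (SELECT COUNT(*) FROM JobAssignments
           WHERE job_type = 99
             AND store_nbr = NEW.store_nbr) > 1
    BEGIN
        SELECT RAISE(ABORT, 'more than one manager in a store');
    END;
""")


def try_sql(sql):
    """Run one statement; return True if it was accepted."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


first_manager = try_sql("INSERT INTO JobAssignments VALUES (1, '111111111', 99)")
clerk = try_sql("INSERT INTO JobAssignments VALUES (1, '222222222', 1)")
second_manager = try_sql("INSERT INTO JobAssignments VALUES (1, '333333333', 99)")
promotion = try_sql("UPDATE JobAssignments SET job_type = 99 WHERE ssn = '222222222'")
other_store = try_sql("INSERT INTO JobAssignments VALUES (2, '333333333', 99)")

managers = conn.execute("""
    SELECT store_nbr, COUNT(*) FROM JobAssignments
     WHERE job_type = 99
     GROUP BY store_nbr ORDER BY store_nbr
""").fetchall()
print(first_manager, clerk, second_manager, promotion, other_store, managers)
```

The second manager for store 1 is refused whether it arrives as an INSERT or as a promotion by UPDATE, while a manager for store 2 goes through.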
But being a fanatic, I want a pure SQL solution that is declarative
within the limits of most current SQL products.
Let’s create two tables. This first table is a Personnel table for the store
managers only and it is keyed on their Social Security numbers. Notice
the use of DEFAULT and CHECK() on their job_type to ensure that this
is really a “managers only” table.
CREATE TABLE Job_99_Assignments
(store_nbr INTEGER NOT NULL PRIMARY KEY
REFERENCES Stores (store_nbr)
ON UPDATE CASCADE
ON DELETE CASCADE,
ssn CHAR(9) NOT NULL
REFERENCES Personnel (ssn)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date TIMESTAMP,
CHECK (start_date <= end_date),
job_type INTEGER DEFAULT 99 NOT NULL
CHECK (job_type = 99));
This second table is a Personnel table for employees who are not
store managers and it is also keyed on Social Security numbers. Notice
the use of DEFAULT for a starting position of 'unassigned' and CHECK()
on their job_type to ensure that this is really a “no managers allowed”
table.
CREATE TABLE Job_not99_Assignments
(store_nbr INTEGER NOT NULL
REFERENCES Stores (store_nbr)
ON UPDATE CASCADE
ON DELETE CASCADE,
ssn CHAR(9) NOT NULL PRIMARY KEY
REFERENCES Personnel (ssn)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
end_date TIMESTAMP,
CHECK (start_date <= end_date),
job_type INTEGER DEFAULT 0 NOT NULL
CHECK (job_type BETWEEN 0 AND 98) -- no 99 code
);
From these two tables, build this UNIONed view of all the job
assignments in the entire company and show that to users.
CREATE VIEW JobAssignments (store_nbr, ssn, start_date,
end_date, job_type)
AS
(SELECT store_nbr, ssn, start_date, end_date, job_type
FROM Job_not99_Assignments
UNION ALL
SELECT store_nbr, ssn, start_date, end_date, job_type
FROM Job_99_Assignments)
The key and job_type constraints in each table, working together, will
guarantee at most one manager per store. The next step is to add
INSTEAD OF triggers to the VIEW or write stored procedures, so the
users can insert, update, and delete from it easily. A simple stored
procedure, without error handling or input validation, would be:
CREATE PROCEDURE InsertJobAssignments
(IN new_store_nbr INTEGER, IN new_ssn CHAR(9), IN new_start_date
TIMESTAMP, IN new_end_date TIMESTAMP, IN new_job_type INTEGER)
LANGUAGE SQL
IF new_job_type <> 99
THEN INSERT INTO Job_not99_Assignments
VALUES (new_store_nbr, new_ssn, new_start_date, new_end_date,
new_job_type);
ELSE INSERT INTO Job_99_Assignments
VALUES (new_store_nbr, new_ssn, new_start_date, new_end_date,
new_job_type);
END IF;
Likewise, a procedure to terminate an employee:
CREATE PROCEDURE FireEmployee (IN new_ssn CHAR(9))
LANGUAGE SQL
BEGIN ATOMIC
-- the procedure is not told the job type, so clear both tables
DELETE FROM Job_not99_Assignments
WHERE ssn = new_ssn;
DELETE FROM Job_99_Assignments
WHERE ssn = new_ssn;
END;
If a developer attempts to change the JobAssignments VIEW directly
with an INSERT, UPDATE, or DELETE, he will get an error message
telling him that the VIEW is not updatable because it contains a UNION
operation. That is a good thing in one way, because we can force the
developer to use only the stored procedures.
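The other route mentioned earlier, INSTEAD OF triggers on the VIEW, can be sketched in a product that supports them. The example below uses SQLite through Python's sqlite3 module, which is an assumption of this illustration; the two base tables are cut down to three columns and the trigger names are invented. Each trigger routes an INSERT on the VIEW to the matching base table, and the PRIMARY KEY on Job_99_Assignments.store_nbr does the real work of rejecting a second manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down base tables: dates and REFERENCES clauses are left out
    CREATE TABLE Job_99_Assignments
    (store_nbr INTEGER NOT NULL PRIMARY KEY,  -- one manager row per store
     ssn CHAR(9) NOT NULL,
     job_type INTEGER DEFAULT 99 NOT NULL
       CHECK (job_type = 99));

    CREATE TABLE Job_not99_Assignments
    (store_nbr INTEGER NOT NULL,
     ssn CHAR(9) NOT NULL PRIMARY KEY,
     job_type INTEGER DEFAULT 0 NOT NULL
       CHECK (job_type BETWEEN 0 AND 98));

    CREATE VIEW JobAssignments
    AS SELECT store_nbr, ssn, job_type FROM Job_not99_Assignments
       UNION ALL
       SELECT store_nbr, ssn, job_type FROM Job_99_Assignments;

    CREATE TRIGGER JobAssignments_ins_99
    INSTEAD OF INSERT ON JobAssignments
    WHEN NEW.job_type = 99
    BEGIN
        INSERT INTO Job_99_Assignments (store_nbr, ssn, job_type)
        VALUES (NEW.store_nbr, NEW.ssn, NEW.job_type);
    END;

    CREATE TRIGGER JobAssignments_ins_not99
    INSTEAD OF INSERT ON JobAssignments
    WHEN NEW.job_type <> 99
    BEGIN
        INSERT INTO Job_not99_Assignments (store_nbr, ssn, job_type)
        VALUES (NEW.store_nbr, NEW.ssn, NEW.job_type);
    END;
""")


def try_insert(store_nbr, ssn, job_type):
    """Insert through the VIEW; return True if the row was accepted."""
    try:
        conn.execute(
            "INSERT INTO JobAssignments (store_nbr, ssn, job_type) VALUES (?, ?, ?)",
            (store_nbr, ssn, job_type))
        return True
    except sqlite3.IntegrityError:
        return False


manager = try_insert(1, "111111111", 99)
stockboy = try_insert(1, "222222222", 1)
second_manager = try_insert(1, "333333333", 99)  # key violation in Job_99_Assignments

view_rows = conn.execute(
    "SELECT store_nbr, ssn, job_type FROM JobAssignments ORDER BY ssn").fetchall()
print(manager, stockboy, second_manager, view_rows)
```

No counting query is needed anywhere: the constraint is carried entirely by the key, which is the point of splitting the table.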
Again, this is an exercise in programming a solution within certain
limits. The TRIGGER is probably going to give better performance than
the VIEW.
1.1.9 Using PRIMARY KEYs and ASSERTIONs for Constraints
Let’s do another version of the “stores and personnel” problem given in
section 1.1.8.
CREATE TABLE JobAssignments
(ssn CHAR(9) NOT NULL PRIMARY KEY -- nobody is in two Stores
REFERENCES Personnel (ssn)
ON UPDATE CASCADE
ON DELETE CASCADE,
store_nbr INTEGER NOT NULL
REFERENCES Stores (store_nbr)
ON UPDATE CASCADE
ON DELETE CASCADE);
The key on the Social Security number will ensure that nobody is at
two stores, and that a store can have many employees assigned to it.
Ideally, you want an SQL-92 constraint to check that each employee
does have a branch assignment.
The first attempt is usually something like this.
CREATE ASSERTION Nobody_Unassigned
CHECK (NOT EXISTS
(SELECT *
FROM Personnel AS P
LEFT OUTER JOIN
JobAssignments AS J
ON P.ssn = J.ssn
WHERE J.ssn IS NULL
AND P.ssn
IN (SELECT ssn FROM JobAssignments
UNION
SELECT ssn FROM Personnel)));
However, this example is overkill and does not prevent an employee
from being at more than one store. There are probably indexes on the
Social Security number values in both the Personnel and JobAssignments
tables, so computing a COUNT() should be cheap. This assertion
will also work.
CREATE ASSERTION Everyone_assigned_one_store
CHECK ((SELECT COUNT(ssn) FROM JobAssignments)
= (SELECT COUNT(ssn) FROM Personnel));
This is a surprise to people at first, because they expect to see a JOIN
to do the one-to-one mapping between personnel and job assignments.
But the PK-FK (primary key–foreign key) requirement provides that for
you. Any unassigned employee will make the Personnel table bigger than
the JobAssignments table, and an employee in JobAssignments must
have a match in Personnel. Good optimizers extract things like that as
predicates and use them, which is why we want declarative referential
integrity, instead of triggers and application-side logic.

You will need to have a stored procedure that inserts into both tables
as a single transaction. The updates and deletes will cascade and clean up
the job assignments.
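As a rough illustration of the "insert into both tables as a single transaction" step, here is a sketch in Python's sqlite3 module standing in for a stored procedure. SQLite, the one-column Stores and Personnel tables, and the function names hire and everyone_assigned_one_store are all assumptions of the example. The count comparison from the assertion is run as a plain query, since SQLite has no CREATE ASSERTION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- one-column stand-ins for the Stores and Personnel tables
    CREATE TABLE Stores (store_nbr INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE Personnel (ssn CHAR(9) NOT NULL PRIMARY KEY);
    CREATE TABLE JobAssignments
    (ssn CHAR(9) NOT NULL PRIMARY KEY  -- nobody is in two Stores
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     store_nbr INTEGER NOT NULL
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE);
    INSERT INTO Stores VALUES (1);
""")


def hire(conn, ssn, store_nbr):
    """Insert into both tables as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Personnel VALUES (?)", (ssn,))
            conn.execute("INSERT INTO JobAssignments VALUES (?, ?)",
                         (ssn, store_nbr))
        return True
    except sqlite3.IntegrityError:
        return False


def everyone_assigned_one_store(conn):
    """The assertion's search condition, run as a plain query."""
    (ok,) = conn.execute("""
        SELECT (SELECT COUNT(ssn) FROM JobAssignments)
             = (SELECT COUNT(ssn) FROM Personnel)
    """).fetchone()
    return bool(ok)


hired = hire(conn, "111111111", 1)
bad_store = hire(conn, "222222222", 99)  # no store 99, so both inserts roll back
(personnel_count,) = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()
rule_holds = everyone_assigned_one_store(conn)

with conn:
    conn.execute("DELETE FROM Personnel WHERE ssn = '111111111'")
(assignments_left,) = conn.execute("SELECT COUNT(*) FROM JobAssignments").fetchone()

print(hired, bad_store, personnel_count, rule_holds, assignments_left)
```

The failed hire leaves no orphan row in Personnel, and deleting the employee cascades to JobAssignments, so the two counts stay equal throughout.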
Let’s change the specs a bit and allow employees to work at more than
one store. If we want to have employees in multiple Stores, we could
change the keys on JobAssignments, thus.
CREATE TABLE JobAssignments
(ssn CHAR(9) NOT NULL
REFERENCES Personnel (ssn)
ON UPDATE CASCADE
ON DELETE CASCADE,
store_nbr INTEGER NOT NULL
REFERENCES Stores (store_nbr)
ON UPDATE CASCADE
ON DELETE CASCADE,
PRIMARY KEY (ssn, store_nbr));
Then use a COUNT(DISTINCT ssn) in the assertion.
CREATE ASSERTION Everyone_assigned_at_least_once
CHECK ((SELECT COUNT(DISTINCT ssn) FROM JobAssignments)
= (SELECT COUNT(ssn) FROM Personnel));
You must be aware that the uniqueness constraints and assertions
work together; a change in one or both of them can also change this rule.
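A small experiment shows why the DISTINCT matters once the key changes. This sketch again assumes SQLite via Python's sqlite3 module and cut-down tables; the sample Social Security numbers are invented. One employee works at two stores, so the plain count of assignment rows no longer matches the head count, while the distinct count does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down tables: Stores and the cascade clauses are left out
    CREATE TABLE Personnel (ssn CHAR(9) NOT NULL PRIMARY KEY);
    CREATE TABLE JobAssignments
    (ssn CHAR(9) NOT NULL
       REFERENCES Personnel (ssn),
     store_nbr INTEGER NOT NULL,
     PRIMARY KEY (ssn, store_nbr));

    INSERT INTO Personnel VALUES ('111111111');
    INSERT INTO Personnel VALUES ('222222222');
    INSERT INTO JobAssignments VALUES ('111111111', 1);
    INSERT INTO JobAssignments VALUES ('111111111', 2);  -- same person, second store
    INSERT INTO JobAssignments VALUES ('222222222', 1);
""")

plain, distinct, personnel = conn.execute("""
    SELECT (SELECT COUNT(ssn) FROM JobAssignments),
           (SELECT COUNT(DISTINCT ssn) FROM JobAssignments),
           (SELECT COUNT(ssn) FROM Personnel)
""").fetchone()

print(plain, distinct, personnel)
```

With the compound key, the earlier COUNT(ssn) comparison would reject this perfectly legal data; only the COUNT(DISTINCT ssn) form expresses "everyone is assigned at least once".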

1.1.10 Avoiding Attribute Splitting
Attribute splitting takes many forms. It occurs when you have a single
attribute, but put its values in more than one place in the schema. The
most common form of attribute splitting is to create separate tables for
each value. Another form of attribute splitting is to create separate rows
in the same table for part of each value. These concepts are probably
easier to show with examples.
Attribute Split Tables
If I were to create a database with a table for male employees and a
separate table for female employees, you would immediately see that
