
Joe Celko's SQL for Smarties: Advanced SQL Programming


92 CHAPTER 2: NORMALIZATION
Consider this actual problem, which appeared on CompuServe’s
ORACLE forum some years ago. A pharmaceutical company has an
inventory table and a price changes table that look like this:
CREATE TABLE Drugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0));
CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date DATE NOT NULL,
 end_date DATE NOT NULL
   CONSTRAINT started_before_ended
   CHECK(start_date <= end_date),
 price DECIMAL(8,2) NOT NULL,
 PRIMARY KEY (drug_nbr, start_date));
Every order has to use the order date to find what the selling price
was when the order was placed. The current price will have a value of
“eternity” (a dummy date set so high that it will not be reached, such as
‘9999-12-31’). The (end_date + INTERVAL '1' DAY) of one price
will be equal to the start_date of the next price for the same drug.
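The order-date lookup described above can be sketched as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module; the sample rows and the price_on() helper are hypothetical and not part of the original problem. Dates are stored as ISO-8601 strings because SQLite has no native DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date TEXT NOT NULL,   -- ISO-8601 text stands in for DATE
 end_date TEXT NOT NULL,
 price DECIMAL(8,2) NOT NULL,
 CHECK (start_date <= end_date),
 PRIMARY KEY (drug_nbr, start_date));

-- Hypothetical sample data: two closed periods and the current price,
-- whose end_date is the "eternity" value.
INSERT INTO Prices VALUES (1, '2001-01-01', '2001-06-30', 10.00);
INSERT INTO Prices VALUES (1, '2001-07-01', '2001-12-31', 12.50);
INSERT INTO Prices VALUES (1, '2002-01-01', '9999-12-31', 13.25);
""")


def price_on(drug_nbr, order_date):
    """Return the price in effect on order_date, or None if no period covers it."""
    row = conn.execute(
        "SELECT price FROM Prices"
        " WHERE drug_nbr = ? AND ? BETWEEN start_date AND end_date",
        (drug_nbr, order_date),
    ).fetchone()
    return row[0] if row else None
```

Because the periods do not overlap, the BETWEEN predicate matches at most one row per drug and order date.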
While this is normalized, performance was bad. Every report, invoice,
or query will have a JOIN between Drugs and Prices. The trick might be
to add more columns to the Drugs table, like this:
CREATE TABLE Drugs
(drug_nbr INTEGER PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0),
 current_start_date DATE NOT NULL,
 current_end_date DATE NOT NULL,
 CONSTRAINT current_start_before_ended
   CHECK(current_start_date <= current_end_date),
 current_price DECIMAL(8,2) NOT NULL,
 prior_start_date DATE NOT NULL,
 prior_end_date DATE NOT NULL,
 CONSTRAINT prior_start_before_ended
   CHECK(prior_start_date <= prior_end_date
         AND current_start_date = prior_end_date + INTERVAL '1' DAY),
 prior_price DECIMAL(8,2) NOT NULL);
This covered more than 95% of the orders in the actual company,
because very few orders have more than two price changes before they
are taken out of stock. The odd exception was trapped by a procedural
routine.
The other method is to add CHECK() constraints that will enforce the
rules destroyed by denormalization. We will discuss this later, but the
overhead for insertion, updating, and deleting to the table is huge. In
fact, in many cases denormalized tables cannot be changed until a
complete set of columns is built outside the table. Furthermore, while
one set of queries is improved, all others are damaged.
Today, however, only data warehouses should be denormalized. JOINs
are far cheaper than they were, and the overhead of handling
exceptions with procedural code is far greater than any extra database
overhead.
2.11.5 Row Sorting
On May 27, 2001, Fred Block posted a problem on the SQL Server
Newsgroup. I will change the problem slightly, but the idea was that he
had a table with five character string columns that had to be sorted
alphabetically within each row. This “flatten table” denormalization is a
very common one that might involve months of the year as columns, or
other things that are acting as repeating groups in violation of 1NF.
Let’s declare the table and dive into the problem:
CREATE TABLE Foobar
(key_col INTEGER NOT NULL PRIMARY KEY,
c1 VARCHAR(20) NOT NULL,
c2 VARCHAR(20) NOT NULL,
c3 VARCHAR(20) NOT NULL,
c4 VARCHAR(20) NOT NULL,
c5 VARCHAR(20) NOT NULL);
This means that we want this condition to hold:
CHECK ((c1 <= c2) AND (c2 <= c3)
AND (c3 <= c4) AND (c4 <= c5))
Obviously, if he had added this constraint to the table in the first
place, we would be fine. Of course, that would have pushed the problem
to the front end, and I would not have a topic for this section.
What was interesting was how everyone who read this newsgroup
posting immediately envisioned a stored procedure that would take the
five values, sort them and return them to their original row in the table.
The only way to make this approach work for the whole table was to
write an update cursor and loop through all the rows of the table. Itzik
Ben-Gan posted a simple procedure that loaded the values into a
temporary table, then pulled them out in sorted order, starting with the
minimum value, using a loop.
Another trick is the Bose-Nelson sort, which I wrote about in Dr.
Dobb’s Journal back in 1985 (“Bose-Nelson Sort,” Dr. Dobb’s Journal,
September 1985, pp. 282-296). This sort is a recursive procedure that takes
an integer and then generates swap pairs for a vector of that size. A swap
pair is a pair of position numbers from 1 to (n) in the vector that need to
be exchanged if they are out of order. These swap pairs are also related to
Sorting Networks in the literature (see Donald Knuth, Art of Computer
Programming, Volume 3: Sorting and Searching, 2nd Edition, April 24,
1998, ISBN: 0-201-89685-0).
You are probably thinking that this method is a bit weak, because
the results are only good for sorting a fixed number of items. But a
table only has a fixed number of columns, so that is not a problem in
denormalized SQL.
You can set up a sorting network that will sort five items, with the
minimal number of exchanges, nine swaps, like this:
Swap(c1, c2);
Swap(c4, c5);
Swap(c3, c5);
Swap(c3, c4);
Swap(c1, c4);
Swap(c1, c3);
Swap(c2, c5);
Swap(c2, c4);
Swap(c2, c3);
2.11 Key Types 95
You might want to deal yourself a hand of five playing cards in one
suit to see how it works. Put the cards face down on the table and pick
up the pairs, swapping them if required, then turn over the row to see
that it is in sorted order when you are done.
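The card experiment can also be run mechanically. The following Python sketch, whose function names are my own, applies the nine swap pairs listed above to every permutation of five distinct values and checks that each one comes out sorted.

```python
from itertools import permutations

# The nine swap pairs from the text, as 1-based column positions.
SWAPS = [(1, 2), (4, 5), (3, 5), (3, 4), (1, 4),
         (1, 3), (2, 5), (2, 4), (2, 3)]


def apply_network(values, swaps=SWAPS):
    """Run each compare-exchange in order and return the resulting list."""
    v = list(values)
    for lo, hi in swaps:
        if v[lo - 1] > v[hi - 1]:
            v[lo - 1], v[hi - 1] = v[hi - 1], v[lo - 1]
    return v


def sorts_every_hand():
    """True if the network sorts all 120 orderings of five distinct cards."""
    return all(apply_network(p) == sorted(p) for p in permutations(range(5)))
```

The order of the swaps matters; the same nine pairs applied in a different sequence are not guaranteed to sort.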
In theory, any comparison-based sort of (n) items needs at least
CEILING (log2 (n!)) comparisons in the worst case, and as (n)
increases, this approaches O(n*log2(n)).
Computer science majors will remember this “Big O” expression as the
expected performance of the best sorting algorithms, such as Quicksort.
The Bose-Nelson method is very good for small values of (n). If (n < 9)
then it is perfect, actually. But as things get bigger, Bose-Nelson
approaches O(n ^ 1.585). In English, this method is good for a fixed size
list of 16 or fewer items, but it goes to Hell after that.
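To make the CEILING (log2 (n!)) figure concrete, here is a small Python sketch that tabulates it for small (n). The function name is hypothetical; the calculation uses exact integer arithmetic rather than floating-point logarithms.

```python
from math import factorial


def comparison_lower_bound(n):
    """CEILING(log2(n!)), computed exactly: for k >= 1 it equals (k - 1).bit_length()."""
    return (factorial(n) - 1).bit_length()


# Lower bounds for n = 1 through 8.
bounds = [comparison_lower_bound(n) for n in range(1, 9)]
```

For (n = 5) the bound is 7, while the fixed sequence above uses nine swaps. The bound applies to sorts that choose each comparison based on earlier outcomes; a fixed swap sequence cannot adapt in that way, which is why it needs more.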
You can write a version of the Bose-Nelson procedure that will output
the SQL code for a given value of (n). The obvious direct way to do a
Swap() is to write a chain of UPDATE statements. Remember that in
SQL, the SET clause assignments happen in parallel, so you can easily
write a SET clause that exchanges the two items when they are out of
order. Using the above swap chain, we get this block of code:
BEGIN ATOMIC
-- Swap(c1, c2);
UPDATE Foobar
   SET c1 = c2, c2 = c1
 WHERE c1 > c2;
-- Swap(c4, c5);
UPDATE Foobar
   SET c4 = c5, c5 = c4
 WHERE c4 > c5;
-- Swap(c3, c5);
UPDATE Foobar
   SET c3 = c5, c5 = c3
 WHERE c3 > c5;
-- Swap(c3, c4);
UPDATE Foobar
   SET c3 = c4, c4 = c3
 WHERE c3 > c4;
-- Swap(c1, c4);
UPDATE Foobar
   SET c1 = c4, c4 = c1
 WHERE c1 > c4;
-- Swap(c1, c3);
UPDATE Foobar
   SET c1 = c3, c3 = c1
 WHERE c1 > c3;
-- Swap(c2, c5);
UPDATE Foobar
   SET c2 = c5, c5 = c2
 WHERE c2 > c5;
-- Swap(c2, c4);
UPDATE Foobar
   SET c2 = c4, c4 = c2
 WHERE c2 > c4;
-- Swap(c2, c3);
UPDATE Foobar
   SET c2 = c3, c3 = c2
 WHERE c2 > c3;
END;
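The chain of UPDATE statements can be generated and exercised with a short script. This is a sketch only, assuming SQLite through Python's sqlite3 module, where a transaction stands in for BEGIN ATOMIC; the helper names and the sample rows are hypothetical.

```python
import sqlite3

# The nine swap pairs from the text, as 1-based column positions.
SWAPS = [(1, 2), (4, 5), (3, 5), (3, 4), (1, 4),
         (1, 3), (2, 5), (2, 4), (2, 3)]


def swap_update(lo, hi, table="Foobar"):
    """Build one compare-exchange UPDATE for columns c<lo> and c<hi>."""
    a, b = f"c{lo}", f"c{hi}"
    return f"UPDATE {table} SET {a} = {b}, {b} = {a} WHERE {a} > {b};"


def sort_rows(conn):
    """Run the whole swap chain inside a single transaction."""
    with conn:
        for lo, hi in SWAPS:
            conn.execute(swap_update(lo, hi))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foobar (key_col INTEGER NOT NULL PRIMARY KEY,"
    " c1 VARCHAR(20) NOT NULL, c2 VARCHAR(20) NOT NULL,"
    " c3 VARCHAR(20) NOT NULL, c4 VARCHAR(20) NOT NULL,"
    " c5 VARCHAR(20) NOT NULL)"
)
conn.executemany(
    "INSERT INTO Foobar VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "e", "d", "c", "b", "a"),
     (2, "b", "a", "d", "c", "e"),
     (3, "a", "a", "c", "b", "b")],
)
sort_rows(conn)
rows = conn.execute(
    "SELECT c1, c2, c3, c4, c5 FROM Foobar ORDER BY key_col"
).fetchall()
```

Each statement touches only the rows where that particular pair is out of order, so rows that are already sorted are never written.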
This is fully portable, Standard SQL code, and it can be machine-
generated. But that parallelism is useful. It is worthwhile to combine
some of the UPDATE statements. But you have to be careful not to change
the effective sequence of the swap operations.
If you look at the first two UPDATE statements, you can see that they
do not overlap. This means you could roll them into one statement like
this:
-- Swap(c1, c2) AND Swap(c4, c5);
UPDATE Foobar
   SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
       c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
       c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
       c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
 WHERE c4 > c5 OR c1 > c2;
The advantage of doing this is that you have to execute only one
UPDATE statement, not two. Updating a table, even on nonkey
columns, usually locks the table and prevents other users from getting
to the data. If you could roll the statements into one single UPDATE,
you would have the best of all possible worlds, but I doubt that the
code would be easy to read.
We can see this same pattern in the pair of statements:
Swap(c1, c3);
Swap(c2, c5);
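A general template for rolling non-overlapping swaps into one statement might look like the following Python sketch. The combined_update() helper and the grouping list are my own illustration, and SQLite (via sqlite3) is assumed only as a convenient place to run the generated SQL.

```python
import sqlite3


def combined_update(pairs, table="Foobar"):
    """Merge non-overlapping compare-exchanges into one UPDATE statement."""
    cols = [c for pair in pairs for c in pair]
    if len(set(cols)) != len(cols):
        raise ValueError("swap pairs share a column; keep them separate")
    sets, preds = [], []
    for lo, hi in pairs:
        a, b = f"c{lo}", f"c{hi}"
        sets.append(f"{a} = CASE WHEN {a} <= {b} THEN {a} ELSE {b} END")
        sets.append(f"{b} = CASE WHEN {a} <= {b} THEN {b} ELSE {a} END")
        preds.append(f"{a} > {b}")
    return (f"UPDATE {table} SET " + ", ".join(sets)
            + " WHERE " + " OR ".join(preds) + ";")


# The nine-swap chain with adjacent, non-overlapping swaps grouped together.
GROUPS = [[(1, 2), (4, 5)], [(3, 5)], [(3, 4)], [(1, 4)],
          [(1, 3), (2, 5)], [(2, 4)], [(2, 3)]]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foobar (key_col INTEGER NOT NULL PRIMARY KEY,"
    " c1 VARCHAR(20) NOT NULL, c2 VARCHAR(20) NOT NULL,"
    " c3 VARCHAR(20) NOT NULL, c4 VARCHAR(20) NOT NULL,"
    " c5 VARCHAR(20) NOT NULL)"
)
conn.execute("INSERT INTO Foobar VALUES (1, 'e', 'd', 'c', 'b', 'a')")
for group in GROUPS:
    conn.execute(combined_update(group))
result = conn.execute("SELECT c1, c2, c3, c4, c5 FROM Foobar").fetchone()
```

Grouping this way cuts the nine statements to seven while keeping the effective order of the swaps, because only neighbors in the chain that share no column are merged.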
But there are other patterns, so you can write general templates for
them. Consider this one:
Swap(x, y);
Swap(x, z);
Write out all possible triplets and apply these two operations on
them, thus:
(x, y, z) => (x, y, z)
(x, z, y) => (x, z, y)
(y, x, z) => (x, y, z)
(y, z, x) => (x, z, y)
(z, x, y) => (x, z, y)
(z, y, x) => (x, z, y)
The result of this pattern is that x is the lowest of the three values,
but y and z are not necessarily left in sorted order. Properly sorting
them as well would have the advantage of saving exchanges later and
also of reducing the subset of rows operated upon by each UPDATE
statement. With a little thought, we can write the following symmetric
piece of code.
-- Swap(x, y) AND Swap(x, z);
UPDATE Foobar
   SET x = CASE WHEN x BETWEEN y AND z THEN y
                WHEN z BETWEEN y AND x THEN y
                WHEN y BETWEEN z AND x THEN z
                WHEN x BETWEEN z AND y THEN z
                ELSE x END,
       y = CASE WHEN x BETWEEN y AND z THEN x
                WHEN x BETWEEN z AND y THEN x
                WHEN z BETWEEN x AND y THEN z
                WHEN z BETWEEN y AND x THEN z
                ELSE y END,
       z = CASE WHEN x BETWEEN z AND y THEN y
                WHEN z BETWEEN x AND y THEN y
                WHEN y BETWEEN z AND x THEN x
                WHEN z BETWEEN y AND x THEN x
                ELSE z END
 WHERE x > z OR x > y;
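The behavior of this three-column statement can be checked against all six orderings of three distinct values. The sketch below assumes SQLite through Python's sqlite3 module and uses a hypothetical table named Triples with columns literally called x, y, and z.

```python
import sqlite3
from itertools import permutations

THREE_WAY = """
UPDATE Triples
   SET x = CASE WHEN x BETWEEN y AND z THEN y
                WHEN z BETWEEN y AND x THEN y
                WHEN y BETWEEN z AND x THEN z
                WHEN x BETWEEN z AND y THEN z
                ELSE x END,
       y = CASE WHEN x BETWEEN y AND z THEN x
                WHEN x BETWEEN z AND y THEN x
                WHEN z BETWEEN x AND y THEN z
                WHEN z BETWEEN y AND x THEN z
                ELSE y END,
       z = CASE WHEN x BETWEEN z AND y THEN y
                WHEN z BETWEEN x AND y THEN y
                WHEN y BETWEEN z AND x THEN x
                WHEN z BETWEEN y AND x THEN x
                ELSE z END
 WHERE x > z OR x > y;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Triples (id INTEGER PRIMARY KEY,"
    " x INTEGER NOT NULL, y INTEGER NOT NULL, z INTEGER NOT NULL)"
)
# One row per ordering of the values 1, 2, 3.
conn.executemany("INSERT INTO Triples (x, y, z) VALUES (?, ?, ?)",
                 list(permutations((1, 2, 3))))
conn.execute(THREE_WAY)
after = conn.execute("SELECT x, y, z FROM Triples ORDER BY id").fetchall()
```

Every row that the WHERE clause selects comes out fully sorted. The one ordering where x is already lowest but y and z are reversed is not selected, so it is left alone, just as the two swaps would leave it.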
While it is very tempting to write more and more of these pattern
templates, it might be more trouble than it is worth, because of the
increased maintenance burden and reduced readability.
Here is an SQL/PSM program for the Bose-Nelson sort, based on the
version given in Frederick Hegeman’s “Sorting Networks” article for The
C/C++ User’s Journal (Hegeman 1993). It assumes that you have a
procedure called PRINT() for output to a text file. You can translate it
into the programming language of your choice easily, as long as it
supports recursion.
CREATE PROCEDURE BoseSort (IN i INTEGER, IN j INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
DECLARE m INTEGER;
IF j > i
THEN SET m = i + (j-i+1)/2 -1;
CALL BoseSort(i,m);
CALL BoseSort(m+1, j);
CALL BoseMerge(i, m, m+1, j);
END IF;
END;
CREATE PROCEDURE BoseMerge (IN i1 INTEGER, IN i2 INTEGER,
                            IN j1 INTEGER, IN j2 INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
-- Merge the sorted runs at positions i1..i2 and j1..j2.
-- Integer division is assumed to truncate.
DECLARE i_mid INTEGER;
DECLARE j_mid INTEGER;
IF i2 = i1 AND j2 = j1
THEN CALL PRINT('swap', i1, j1);
ELSE IF i2 = i1+1 AND j2 = j1
     THEN CALL PRINT('swap', i1, j1);
          CALL PRINT('swap', i2, j1);
     ELSE IF i2 = i1 AND j2 = j1+1
          THEN CALL PRINT('swap', i1, j2);
               CALL PRINT('swap', i1, j1);
          ELSE SET i_mid = i1 + (i2-i1+1)/2 - 1;
               -- round the j split up only when the i run is even
               -- and the two runs differ in length
               IF MOD((i2-i1+1), 2) = 0 AND i2-i1 <> j2-j1
               THEN SET j_mid = j1 + (j2-j1+2)/2 - 1;
               ELSE SET j_mid = j1 + (j2-j1+1)/2 - 1;
               END IF;
               CALL BoseMerge(i1, i_mid, j1, j_mid);
               CALL BoseMerge(i_mid+1, i2, j_mid+1, j2);
               CALL BoseMerge(i_mid+1, i2, j1, j_mid);
          END IF;
     END IF;
END IF;
END;
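As one example of such a translation, here is a Python sketch of the same recursion. It follows the structure of Hegeman's published routine as I understand it, collects the swap pairs in a list instead of calling PRINT(), and checks the result with the zero-one principle (a network that sorts every sequence of 0s and 1s sorts everything). The function names are my own.

```python
from itertools import product


def bose_pairs(n):
    """Return the Bose-Nelson swap pairs (1-based positions) for n items."""
    pairs = []

    def bose_sort(i, j):
        # Sort positions i..j: sort each half, then merge the halves.
        if j > i:
            m = i + (j - i + 1) // 2 - 1
            bose_sort(i, m)
            bose_sort(m + 1, j)
            bose_merge(i, m, m + 1, j)

    def bose_merge(i1, i2, j1, j2):
        x = i2 - i1 + 1  # length of the first sorted run
        y = j2 - j1 + 1  # length of the second sorted run
        if x == 1 and y == 1:
            pairs.append((i1, j1))
        elif x == 1 and y == 2:
            pairs.extend([(i1, j2), (i1, j1)])
        elif x == 2 and y == 1:
            pairs.extend([(i1, j1), (i2, j1)])
        else:
            a = x // 2
            b = y // 2 if x % 2 else (y + 1) // 2
            i_mid = i1 + a - 1
            j_mid = j1 + b - 1
            bose_merge(i1, i_mid, j1, j_mid)
            bose_merge(i_mid + 1, i2, j_mid + 1, j2)
            bose_merge(i_mid + 1, i2, j1, j_mid)

    bose_sort(1, n)
    return pairs


def network_sorts(n):
    """Check the generated network on all 2**n sequences of 0s and 1s."""
    swaps = bose_pairs(n)
    for bits in product((0, 1), repeat=n):
        v = list(bits)
        for lo, hi in swaps:
            if v[lo - 1] > v[hi - 1]:
                v[lo - 1], v[hi - 1] = v[hi - 1], v[lo - 1]
        if v != sorted(v):
            return False
    return True
```

Calling bose_pairs(5) reproduces the nine-swap list used for the Foobar table earlier in this section, in the same order.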



CHAPTER 3: Numeric Data in SQL

SQL is not a computational or procedural language; the arithmetic
capability of SQL is weaker than that of any other language you have
ever used. But there are some tricks that you need to know when
working with numbers in SQL and when passing them to a host
program. Much of the arithmetic and the functions are defined by
implementations, so you should experiment with your particular
product and make notes on the defaults, precision, and tools in the
math library of your database.
You should also read Chapter 21, which deals with the related topic
of aggregate functions. This chapter deals with the arithmetic that you
would use across a row, instead of down a column; they are not quite
the same.

3.1 Numeric Types

The SQL Standard has a wide range of numeric types. The idea is that
any host language can find an SQL numeric type that matches one of
its own.
You will also find some vendor extensions in the numeric data
types, the most common of which is MONEY. This is really a DECIMAL
or NUMERIC data type, which also accepts and displays currency
symbols in input and output.
