Joe Celko's SQL for Smarties - Advanced SQL Programming


582 CHAPTER 25: ARRAYS IN SQL

j INTEGER NOT NULL CHECK (j > 0),
CHECK ((SELECT MAX(i) FROM MyMatrix)
= (SELECT COUNT(i) FROM MyMatrix)),
CHECK ((SELECT MAX(j) FROM MyMatrix)
= (SELECT COUNT(j) FROM MyMatrix)));

The constraints see that the subscripts of each element are within
proper range. I am starting my subscripts at one, but a little change in
the logic would allow any value.

25.3.1 Matrix Equality

This test for matrix equality is from the article “SQL Matrix Processing”
(Mrdalj, Vujovic, and Jovanovic 1996). Two matrices are equal if their
cardinalities and the cardinality of their intersection are all equal.

SELECT COUNT(*) FROM MatrixA
UNION
SELECT COUNT(*) FROM MatrixB
UNION
SELECT COUNT(*)
FROM MatrixA AS A, MatrixB AS B
WHERE A.i = B.i
AND A.j = B.j
AND A.element = B.element;

You have to decide how to use this query in your context. If it returns
one number, they are the same; otherwise, they are different.
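
One possible way to use the query, offered only as a sketch: wrap it in a derived table and count the distinct cardinalities that come back. The names Cardinalities, tally, and verdict are made up for this illustration.

```sql
-- UNION removes duplicates, so equal matrices leave exactly one row.
SELECT CASE WHEN COUNT(*) = 1
            THEN 'equal'
            ELSE 'not equal' END AS verdict
  FROM (SELECT COUNT(*) AS tally FROM MatrixA
        UNION
        SELECT COUNT(*) FROM MatrixB
        UNION
        SELECT COUNT(*)
          FROM MatrixA AS A, MatrixB AS B
         WHERE A.i = B.i
           AND A.j = B.j
           AND A.element = B.element) AS Cardinalities;
```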



25.3.2 Matrix Addition

Matrix addition and subtraction are possible only between matrices of
the same dimensions. The obvious way to do the addition is simply:

SELECT A.i, A.j, (A.element + B.element) AS total
FROM MatrixA AS A, MatrixB AS B
WHERE A.i = B.i
AND A.j = B.j;

But properly, you ought to add some checking to be sure the matrices
match. We can assume that both start numbering subscripts with either
one or zero.

25.3 Matrix Operations in SQL 583

SELECT A.i, A.j, (A.element + B.element) AS total
FROM MatrixA AS A, MatrixB AS B
WHERE A.i = B.i
AND A.j = B.j
AND (SELECT COUNT(*) FROM MatrixA) =
(SELECT COUNT(*) FROM MatrixB)
AND (SELECT MAX(i) FROM MatrixA) =
(SELECT MAX(i) FROM MatrixB)
AND (SELECT MAX(j) FROM MatrixA) =
(SELECT MAX(j) FROM MatrixB);

Likewise, to make the addition permanent, you can use the same
basic query in an UPDATE statement:

UPDATE MatrixA
SET element = element + (SELECT element
FROM MatrixB
WHERE MatrixB.i = MatrixA.i
AND MatrixB.j = MatrixA.j)
WHERE (SELECT COUNT(*) FROM MatrixA)
= (SELECT COUNT(*) FROM MatrixB)
AND (SELECT MAX(i) FROM MatrixA)
= (SELECT MAX(i) FROM MatrixB)
AND (SELECT MAX(j) FROM MatrixA)
= (SELECT MAX(j) FROM MatrixB);

25.3.3 Matrix Multiplication

Multiplication by a scalar constant is direct and easy:

UPDATE MyMatrix
SET element = element * :constant;

Matrix multiplication is not as big a mess as might be expected.
Remember that the first matrix must have the same number of columns
as the second matrix has rows. That means C[i, j] is the sum over k of
A[i, k] * B[k, j], which we can show with an example:

CREATE TABLE MatrixA
(i INTEGER NOT NULL
   CHECK (i BETWEEN 1 AND 10), -- pick your own bounds
 k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixB.k range
 element INTEGER NOT NULL,
 PRIMARY KEY (i, k));

MatrixA
i  k  element
===============
1  1     2
1  2    -3
1  3     4
2  1    -1
2  2     0
2  3     2

CREATE TABLE MatrixB
(k INTEGER NOT NULL
   CHECK (k BETWEEN 1 AND 10), -- must match MatrixA.k range
 j INTEGER NOT NULL
   CHECK (j BETWEEN 1 AND 4), -- pick your own bounds
 element INTEGER NOT NULL,
 PRIMARY KEY (k, j));

MatrixB
k  j  element
===============
1  1    -1
1  2     2
1  3     3
2  1     0
2  2     1
2  3     7
3  1     1
3  2     1
3  3    -2

CREATE VIEW MatrixC(i, j, element)
AS SELECT i, j, SUM(MatrixA.element * MatrixB.element)
FROM MatrixA, MatrixB
WHERE MatrixA.k = MatrixB.k
GROUP BY i, j;

This is taken directly from the definition of multiplication.
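
As a check on the sample data above, the view can be queried directly. The values in the comments were worked out by hand from the definition, and the ORDER BY is added here only to make the listing easier to read.

```sql
SELECT i, j, element
  FROM MatrixC
 ORDER BY i, j;

-- i  j  element
-- 1  1      2    (2)(-1)  + (-3)(0) + (4)(1)
-- 1  2      5    (2)(2)   + (-3)(1) + (4)(1)
-- 1  3    -23    (2)(3)   + (-3)(7) + (4)(-2)
-- 2  1      3    (-1)(-1) + (0)(0)  + (2)(1)
-- 2  2      0    (-1)(2)  + (0)(1)  + (2)(1)
-- 2  3     -7    (-1)(3)  + (0)(7)  + (2)(-2)
```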

25.3.4 Other Matrix Operations

The transposition of a matrix is easy to do:

CREATE VIEW TransA (i, j, element)
AS SELECT j, i, element FROM MatrixA;

Again, you can make the change permanent with an UPDATE statement:

UPDATE MatrixA
SET i = j, j = i;

Multiplication by a column or row vector is just a special case of
matrix multiplication, but a bit easier. Given the vector V and MatrixA:

-- matrix times column vector: result[i] = SUM over j of A[i, j] * V[j]
SELECT A.i, SUM(A.element * V.element) AS element
FROM MatrixA AS A, VectorV AS V
WHERE V.j = A.j
GROUP BY A.i;

Cross tabulations and other statistical functions traditionally use an
array to hold data. But you do not need a matrix for them in SQL.
It is possible to do other matrix operations in SQL, but the code
becomes so complex, and the execution time so long, that it is simply
not worth the effort. If a reader would like to submit queries for
eigenvalues and determinants, I will be happy to put them in future
editions of this book.

25.4 Flattening a Table into an Array

Reports and data warehouse summary tables often want to see an array
laid horizontally across a line. The original one element/one column
approach to mapping arrays was based on seeing such reports and
duplicating that structure in a table. A subscript is often an enumeration,
denoting a month or another time period, rather than an integer.
For example, a row in a “Salesmen” table might have a dozen
columns, one for each month of the year, each of which holds the total
commission earned in a particular month. The year is really an array,
subscripted by the month. The subscripts-and-value approach requires
more work to produce the same results. It is often easier to explain a
technique with an example. Let us imagine a company that collects time
cards from its truck drivers, each with the driver’s name, the week within
the year (numbered 0 to 51 or 52, depending on the year), and his total
hours. We want to produce a report with one line for each driver and six
weeks of his time across the page. The Timecards table looks like this:

CREATE TABLE Timecards
(driver_name CHAR(25) NOT NULL,
 week_nbr INTEGER NOT NULL
   CONSTRAINT valid_week_nbr
   CHECK (week_nbr BETWEEN 0 AND 52),
 work_hrs INTEGER
   CONSTRAINT zero_or_more_hours
   CHECK (work_hrs >= 0),
 PRIMARY KEY (driver_name, week_nbr));

We need to “flatten out” this table to get the desired rows for the
report. First, create a working storage table from which the report can be
built:

CREATE TEMPORARY TABLE TimeReportWork -- working storage
(driver_name CHAR(25) NOT NULL,
 wk1 INTEGER, -- important that these columns are NULL-able
 wk2 INTEGER,
 wk3 INTEGER,
 wk4 INTEGER,
 wk5 INTEGER,
 wk6 INTEGER);

Notice two important points about this table. First, there is no
primary key; second, the weekly data columns are NULL-able. This table
is then filled with time card values:

INSERT INTO TimeReportWork (driver_name, wk1, wk2, wk3, wk4, wk5, wk6)
SELECT driver_name,
SUM(CASE WHEN week_nbr = :rpt_week_nbr THEN work_hrs ELSE 0 END) AS wk1,
SUM(CASE WHEN week_nbr = :rpt_week_nbr - 1 THEN work_hrs ELSE 0 END) AS wk2,
SUM(CASE WHEN week_nbr = :rpt_week_nbr - 2 THEN work_hrs ELSE 0 END) AS wk3,
SUM(CASE WHEN week_nbr = :rpt_week_nbr - 3 THEN work_hrs ELSE 0 END) AS wk4,
SUM(CASE WHEN week_nbr = :rpt_week_nbr - 4 THEN work_hrs ELSE 0 END) AS wk5,
SUM(CASE WHEN week_nbr = :rpt_week_nbr - 5 THEN work_hrs ELSE 0 END) AS wk6
FROM Timecards
WHERE week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
GROUP BY driver_name;

The number of weeks in the WHERE clause will vary with the period
covered by the report. The parameter :rpt_week_nbr is the “week of the
report,” and the query computes backwards from it for the prior five
weeks. If a driver did not work in a particular week, the corresponding
weekly column gets a zero hour total. However, if the driver has not
worked at all in the last six weeks, we could lose him completely (no
time cards, no summary). Depending on the nature of the report, you
might consider using an OUTER JOIN to a Personnel table to be sure you
have all the drivers’ names.

The NULLs are coalesced to zero in this example, but if you drop the
ELSE 0 clauses, the SUM() will have to deal with a week of all NULLs
and return a NULL. This enables you to tell the difference between a
driver who was missing for the reporting period and a driver who worked
zero hours but turned in a time card for that period. That difference
could be important for computing the payroll.
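
The two suggestions above can be combined in one query. What follows is only a sketch: Personnel is assumed to be a table with one row per driver and a driver_name column, which the text does not define. Dropping the ELSE 0 clauses lets a week with no time card come back as NULL, and the OUTER JOIN keeps drivers who turned in nothing at all.

```sql
-- Personnel (driver_name) is a hypothetical table of all drivers.
SELECT P.driver_name,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr     THEN T.work_hrs END) AS wk1,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr - 1 THEN T.work_hrs END) AS wk2,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr - 2 THEN T.work_hrs END) AS wk3,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr - 3 THEN T.work_hrs END) AS wk4,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr - 4 THEN T.work_hrs END) AS wk5,
       SUM(CASE WHEN T.week_nbr = :rpt_week_nbr - 5 THEN T.work_hrs END) AS wk6
  FROM Personnel AS P
       LEFT OUTER JOIN Timecards AS T
       ON T.driver_name = P.driver_name
      AND T.week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
 GROUP BY P.driver_name;
```

The week range is tested in the ON clause rather than in a WHERE clause so that a driver with no matching time cards still produces a row, with NULL in all six weekly columns.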

25.5 Comparing Arrays in Table Format

It is often necessary to compare one array or set of values with another
when the data is represented in a table. Remember that comparing two
sets does not involve the order of the elements, whereas comparing two
arrays does. For this discussion, let us create two tables, one for
employees and one for their dependents. The children are subscripted in
the order of their births; that is, 1 is the oldest living child, 2 is
the second oldest, and so forth.

CREATE TABLE Employees
(emp_id INTEGER PRIMARY KEY,
 emp_name CHAR(15) NOT NULL);

CREATE TABLE Dependents
(emp_id INTEGER NOT NULL, -- the parent
 kid CHAR(15) NOT NULL, -- the array element
 birthorder INTEGER NOT NULL, -- the array subscript
 PRIMARY KEY (emp_id, kid));

The query “Find pairs of employees whose children have the same set
of names” is very restrictive, but we can make it more so by requiring
that the children be named in the same birth order. Both Mr. X and Mr.
Y must have exactly the same number of dependents; both sets of names
must match. We can assume that no parent has two children with the
same name (George Foreman does not work here) or born at the same
time (we will order twins). Let us begin by inserting test data into the
Dependents table, thus:

Dependents
emp_id  kid      birthorder
===========================
1       'Dick'   2
1       'Harry'  3
1       'Tom'    1
2       'Dick'   3
2       'Harry'  1
2       'Tom'    2
3       'Dick'   2
3       'Harry'  3
3       'Tom'    1
4       'Harry'  1
4       'Tom'    2
5       'Curly'  2
5       'Harry'  3
5       'Moe'    1

In this test data, employees 1, 2, and 3 all have dependents named
‘Tom’, ‘Dick’, and ‘Harry’. The birth order is the same for the children
of employees 1 and 3, but not for employee 2.

For testing purposes, you might consider adding an extra child to the
family of employee 3, and so forth, to play with this data.

Though there are many ways to solve this query, this approach will
give us some flexibility that others would not. Construct a VIEW that
gives us the number of dependents for each employee:

CREATE VIEW Familysize (emp_id, tally)
AS
SELECT emp_id, COUNT(*)
FROM Dependents
GROUP BY emp_id;

Create a second VIEW that holds pairs of employees who have
families of the same size. (This VIEW is also useful for other
statistical work, but that is another topic.)

CREATE VIEW Samesize (emp_id1, emp_id2, tally)
AS SELECT F1.emp_id, F2.emp_id, F1.tally
FROM Familysize AS F1, Familysize AS F2
WHERE F1.tally = F2.tally
AND F1.emp_id < F2.emp_id;
We will test for set equality by doing a self-JOIN on the dependents
of employees with families of the same size. If one set can be mapped
onto another with no children left over, and in the same birth order,
then the two sets are equal.
SELECT D1.emp_id, ' named his ',
S1.tally, ' kids just like ',
D2.emp_id
FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
WHERE S1.emp_id1 = D1.emp_id
AND S1.emp_id2 = D2.emp_id
AND D1.kid = D2.kid
AND D1.birthorder = D2.birthorder
GROUP BY D1.emp_id, D2.emp_id, S1.tally
HAVING COUNT(*) = S1.tally;
If birth order is not important, then drop the predicate
D1.birthorder = D2.birthorder from the query.
This is a form of exact relational division, with a second column
equality test as part of the criteria.
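
For comparison, here is the set-only form of the query with the birth-order predicate dropped, written as a sketch against the Dependents test data and the Samesize view shown earlier. The column aliases emp_id1 and emp_id2 and the ORDER BY are additions for readability; the expected rows in the comments were traced by hand from the test data.

```sql
-- Set comparison only: birth order is ignored.
SELECT D1.emp_id AS emp_id1, D2.emp_id AS emp_id2, S1.tally
  FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
 WHERE S1.emp_id1 = D1.emp_id
   AND S1.emp_id2 = D2.emp_id
   AND D1.kid = D2.kid
 GROUP BY D1.emp_id, D2.emp_id, S1.tally
HAVING COUNT(*) = S1.tally
 ORDER BY emp_id1, emp_id2;

-- emp_id1  emp_id2  tally
--    1        2       3
--    1        3       3
--    2        3       3
```

With the predicate D1.birthorder = D2.birthorder put back, only the pair (1, 3) survives, because employee 2 has the same three names in a different birth order.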


CHAPTER 26

Set Operations

By set operations, I mean union, intersection, and set differences,
where the sets in SQL are tables. These are the basic operators used in
elementary set theory, which has been taught in the United States
public school systems for decades. Since the relational model is based
on sets, you would expect that SQL would have had a good variety of
set operators from the start. However, this was not the case. Standard
SQL has added the basic set operators, but they are still not common
in actual products.
There is another problem in SQL that you did not have in high
school set theory. SQL tables are multisets (also called bags), which
means that, unlike sets, they allow duplicate elements (rows or
tuples). Dr. Codd’s relational model is stricter and uses only true sets.
SQL handles these duplicate rows with an ALL or DISTINCT modifier
in different places in the language; ALL preserves duplicates, and
DISTINCT removes them.
So that we can discuss the result of each operator formally, let R be
a row that is a duplicate of some row in TableA, or of some row in
TableB, or of both. Let m be the number of duplicates of R in TableA
and let n be the number of duplicates of R in TableB, where (m >= 0)
and (n >= 0). Informally, the engines will pair off the two tables on a
row-per-row basis in set operations. We will see how this works for
each operator.
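
To make m and n concrete, here is a small sketch using UNION, the one set operator that nearly every product supports. The one-column layout and the sample rows are assumptions made up for this illustration; they are not tables from the text.

```sql
-- Hypothetical one-column tables for illustration only.
CREATE TABLE TableA (x INTEGER NOT NULL);
CREATE TABLE TableB (x INTEGER NOT NULL);

INSERT INTO TableA VALUES (1), (1), (2);  -- m = 2 for the row (1)
INSERT INTO TableB VALUES (1), (3);       -- n = 1 for the row (1)

-- ALL preserves duplicates: the row (1) appears m + n = 3 times.
SELECT x FROM TableA
UNION ALL
SELECT x FROM TableB;

-- Without ALL, duplicates are removed: the row (1) appears once.
SELECT x FROM TableA
UNION
SELECT x FROM TableB;
```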
