
SQL Visual QuickStart Guide


Listing 15.4 defines the sequence shown in Figure 15.4. You can use a sequence generator in a few ways. The SQL standard provides the built-in function NEXT VALUE FOR to increment a sequence value, as in:
INSERT INTO shipment(
    part_num,
    desc,
    quantity)
  VALUES(
    NEXT VALUE FOR part_seq,
    'motherboard',
    5);
If you’re creating a column of unique values, you can use the keyword IDENTITY to define a sequence right in the CREATE TABLE statement:
CREATE TABLE parts (
  part_num INTEGER GENERATED ALWAYS AS
    IDENTITY(INCREMENT BY 1
      MINVALUE 1
      MAXVALUE 10000
      START WITH 1
      NO CYCLE),
  desc VARCHAR(100),
  quantity INTEGER);
This table definition lets you omit NEXT VALUE FOR when you insert a row:
INSERT INTO parts(
    desc,
    quantity)
  VALUES(
    'motherboard',
    5);
SQL also provides ALTER SEQUENCE and DROP SEQUENCE to change and remove sequence generators.
Chapter 15
Generating Sequences
Listing 15.4 Create a sequence generator for the consecutive integers 1 to 10,000. See Figure 15.4 for the result.
CREATE SEQUENCE part_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 10000
START WITH 1
NO CYCLE;
1
2
3
...
9998
9999
10000
Figure 15.4 The sequence that Listing 15.4 generates.
✔ Tip

Oracle, DB2, and PostgreSQL support CREATE SEQUENCE, ALTER SEQUENCE, and DROP SEQUENCE. In Oracle, use NOCYCLE instead of NO CYCLE. Check your DBMS documentation to see how sequences are used in your system. Most DBMSs don’t support IDENTITY columns because they have other (pre-SQL:2003) ways to define columns with unique values. See Table 3.18 in “Unique Identifiers” in Chapter 3. PostgreSQL’s generate_series() function offers a quick way to generate numbered rows.
A one-column table containing a sequence of consecutive integers makes it easy to solve problems that would otherwise be difficult with SQL’s limited computational power. Sequence tables aren’t really part of the data model; they’re auxiliary tables that are adjuncts to queries and other “real” tables.
You can create a sequence table by using one of the methods just described. Alternatively, you can create one by using Listing 15.5, which creates the sequence table seq by cross-joining the intermediate table temp09 with itself. The CAST expression concatenates digit characters into sequential numbers and then casts them as integers. You can drop temp09 after seq is created. Figure 15.5 shows the result. The table seq contains the integer sequence 0, 1, 2, …, 9999. You can shrink or grow this sequence by changing the SELECT and FROM expressions in the INSERT INTO seq statement.
SQL Tricks
Listing 15.5 Create a one-column table that contains
consecutive integers. See Figure 15.5 for the result.
CREATE TABLE temp09 (
i CHAR(1) NOT NULL PRIMARY KEY
);
INSERT INTO temp09 VALUES('0');
INSERT INTO temp09 VALUES('1');
INSERT INTO temp09 VALUES('2');
INSERT INTO temp09 VALUES('3');
INSERT INTO temp09 VALUES('4');
INSERT INTO temp09 VALUES('5');
INSERT INTO temp09 VALUES('6');
INSERT INTO temp09 VALUES('7');
INSERT INTO temp09 VALUES('8');
INSERT INTO temp09 VALUES('9');
CREATE TABLE seq (
i INTEGER NOT NULL PRIMARY KEY

);
INSERT INTO seq
SELECT CAST(t1.i || t2.i ||
t3.i || t4.i AS INTEGER)
FROM temp09 t1, temp09 t2,
temp09 t3, temp09 t4;
DROP TABLE temp09;
i
----
0
1
2
3
4
...
9996
9997
9998
9999
Figure 15.5 Result of Listing 15.5.
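As a rough check of Listing 15.5, the same statements can be run from a short script. The sketch below assumes SQLite, driven through Python's standard sqlite3 module; SQLite isn't one of the DBMSs covered here, but it accepts the || concatenation and the CAST expression unchanged. The variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database (an assumption; not one of the book's DBMSs).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Intermediate table holding the ten digit characters '0' through '9'.
cur.execute("CREATE TABLE temp09 (i CHAR(1) NOT NULL PRIMARY KEY)")
cur.executemany("INSERT INTO temp09 VALUES (?)", [(str(d),) for d in range(10)])

cur.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")

# Cross-join temp09 with itself: each result row is one 4-digit string,
# which CAST turns into an integer.
cur.execute("""
    INSERT INTO seq
    SELECT CAST(t1.i || t2.i || t3.i || t4.i AS INTEGER)
      FROM temp09 t1, temp09 t2, temp09 t3, temp09 t4
""")
cur.execute("DROP TABLE temp09")

row_count, lowest, highest = cur.execute(
    "SELECT COUNT(*), MIN(i), MAX(i) FROM seq").fetchone()
print(row_count, lowest, highest)
```

Dropping t4 from both the SELECT and FROM expressions would shrink the sequence to 0 through 999; adding a fifth instance of temp09 would grow it to 0 through 99999.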
A sequence table is especially useful for enumerative and datetime functions. Listing 15.6 lists the 95 printable characters in the ASCII character set (if that’s the character set in use). See Figure 15.6 for the result.
Listing 15.7 adds monthly intervals to today’s date (7-March-2005) for the next six months. See Figure 15.7 for the result. This example works on Microsoft SQL Server; the other DBMSs have similar functions that increment dates.
Sequence tables are handy for normalizing data that you’ve imported from a non-relational environment such as a spreadsheet. Suppose that you have the following non-normalized table, named au_orders, showing the order of the authors’ names on each book’s cover:
title_id author1 author2 author3
-------- ------- ------- -------
T01      A01     NULL    NULL
T02      A01     NULL    NULL
T03      A05     NULL    NULL
T04      A03     A04     NULL
T05      A04     NULL    NULL
T06      A02     NULL    NULL
T07      A02     A04     NULL
T08      A06     NULL    NULL
T09      A06     NULL    NULL
T10      A02     NULL    NULL
T11      A06     A03     A04
T12      A02     NULL    NULL
T13      A01     NULL    NULL
Listing 15.8 cross-joins au_orders with seq to produce Figure 15.8. You can DELETE the result rows with nulls in the column au_id, leaving the result set looking like the table title_authors in the sample database. Note that Listing 15.8 does the reverse of Listing 8.18 in Chapter 8.
Listing 15.6 List the characters associated with a set
of character codes. See Figure 15.6 for the result.
SELECT
i AS CharCode,
CHR(i) AS Ch
FROM seq
WHERE i BETWEEN 32 AND 126;
CharCode Ch
-------- --
32
33       !
34       "
35       #
36       $
37       %
38       &
39       '
40       (
41       )
42       *
43       +
44       ,
45       -
46       .
47       /
48       0
49       1
50       2
51       3
52       4
...
Figure 15.6 Result of Listing 15.6.
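For readers who want to try Listing 15.6 without a server, here is a minimal sketch that assumes SQLite through Python's sqlite3 module. SQLite names the function char() rather than CHR(), and the small seq table built here is only a stand-in for the one from Listing 15.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
# Stand-in for the book's seq table; only 0 through 127 are needed here.
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(128)])

# SQLite's char(i) returns the character with code point i.
printable = conn.execute("""
    SELECT i AS CharCode, char(i) AS Ch
      FROM seq
     WHERE i BETWEEN 32 AND 126
     ORDER BY i
""").fetchall()

print(len(printable))
for char_code, ch in printable[:5]:
    print(char_code, ch)
```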
Listing 15.7 Increment today’s date to six months
hence, in one-month intervals. See Figure 15.7 for
the result.
SELECT
i AS MonthsAhead,
DATEADD("m", i, CURRENT_TIMESTAMP)
AS FutureDate
FROM seq
WHERE i BETWEEN 1 AND 6;

MonthsAhead FutureDate
----------- ----------
1           2005-04-07
2           2005-05-07
3           2005-06-07
4           2005-07-07
5           2005-08-07
6           2005-09-07
Figure 15.7 Result of Listing 15.7.
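Other DBMSs spell the date arithmetic differently. As one illustration, the sketch below assumes SQLite (through Python's sqlite3 module), where the date() function takes a modifier string such as '+3 months'. The start date is hard-coded to 7 March 2005 instead of the current date so that the output matches Figure 15.7; the small seq table is a stand-in for the one from Listing 15.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
# Stand-in for the book's seq table.
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(13)])

# Build the modifier ('+1 months', '+2 months', ...) from the sequence value.
future_dates = conn.execute("""
    SELECT i AS MonthsAhead,
           date('2005-03-07', '+' || i || ' months') AS FutureDate
      FROM seq
     WHERE i BETWEEN 1 AND 6
     ORDER BY i
""").fetchall()

for months_ahead, future_date in future_dates:
    print(months_ahead, future_date)
```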
Listing 15.8 Normalize the table au_orders. See Figure 15.8 for the result.
SELECT title_id,
    (CASE WHEN i=1 THEN '1'
          WHEN i=2 THEN '2'
          WHEN i=3 THEN '3'
     END) AS au_order,
    (CASE WHEN i=1 THEN author1
          WHEN i=2 THEN author2
          WHEN i=3 THEN author3
     END) AS au_id
  FROM au_orders, seq
  WHERE i BETWEEN 1 AND 3
  ORDER BY title_id, i;
title_id au_order au_id
-------- -------- -----
T01      1        A01
T01      2        NULL
T01      3        NULL
T02      1        A01
T02      2        NULL
T02      3        NULL
T03      1        A05
T03      2        NULL
T03      3        NULL
T04      1        A03
T04      2        A04
T04      3        NULL
T05      1        A04
T05      2        NULL
T05      3        NULL
T06      1        A02
T06      2        NULL
T06      3        NULL
T07      1        A02
T07      2        A04
T07      3        NULL
T08      1        A06
T08      2        NULL
T08      3        NULL
T09      1        A06
T09      2        NULL
T09      3        NULL
T10      1        A02
T10      2        NULL
T10      3        NULL
T11      1        A06
T11      2        A03
T11      3        A04
T12      1        A02
T12      2        NULL
T12      3        NULL
T13      1        A01
T13      2        NULL
T13      3        NULL
Figure 15.8 Result of Listing 15.8.
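The normalization, including the DELETE step that removes the null rows, can be sketched end to end as follows. This assumes SQLite through Python's sqlite3 module; the table name title_authors_new is hypothetical, and au_order is kept as an integer here rather than the character values that Listing 15.8 produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE au_orders (
        title_id CHAR(3) NOT NULL PRIMARY KEY,
        author1  CHAR(3),
        author2  CHAR(3),
        author3  CHAR(3))
""")
AU_ORDERS = [
    ("T01", "A01", None, None), ("T02", "A01", None, None),
    ("T03", "A05", None, None), ("T04", "A03", "A04", None),
    ("T05", "A04", None, None), ("T06", "A02", None, None),
    ("T07", "A02", "A04", None), ("T08", "A06", None, None),
    ("T09", "A06", None, None), ("T10", "A02", None, None),
    ("T11", "A06", "A03", "A04"), ("T12", "A02", None, None),
    ("T13", "A01", None, None),
]
conn.executemany("INSERT INTO au_orders VALUES (?, ?, ?, ?)", AU_ORDERS)

conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(10)])

# Materialize the cross-join result: one row per title and author slot.
conn.execute("""
    CREATE TABLE title_authors_new AS
    SELECT title_id,
           i AS au_order,
           CASE WHEN i = 1 THEN author1
                WHEN i = 2 THEN author2
                WHEN i = 3 THEN author3
           END AS au_id
      FROM au_orders, seq
     WHERE i BETWEEN 1 AND 3
""")
rows_before = conn.execute(
    "SELECT COUNT(*) FROM title_authors_new").fetchone()[0]

# Remove the placeholder rows for author slots that were empty.
conn.execute("DELETE FROM title_authors_new WHERE au_id IS NULL")
normalized = conn.execute("""
    SELECT title_id, au_order, au_id
      FROM title_authors_new
     ORDER BY title_id, au_order
""").fetchall()
print(rows_before, len(normalized))
```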
✔ Tips

If you have a column of sequential integers that’s missing some numbers, you can find the missing numbers by EXCEPTing the column from a sequence column. See “Finding Different Rows with EXCEPT” earlier in this chapter.

To run Listing 15.5 in Microsoft Access and Microsoft SQL Server, change the CAST expression to:
t1.i + t2.i + t3.i + t4.i
To run Listing 15.5 in MySQL, change the CAST expression to:
CONCAT(t1.i, t2.i, t3.i, t4.i)
To run Listing 15.6 in Microsoft SQL Server and MySQL, change CHR(i) to CHAR(i).
To run Listing 15.8 in Microsoft Access, change the CASE expressions to Switch() function calls (see the DBMS Tip in “Evaluating Conditional Values with CASE” in Chapter 5):
(Switch(i=1, '1', i=2, '2',
  i=3, '3')) AS au_order,
(Switch(i=1, author1, i=2, author2,
  i=3, author3)) AS au_id
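The first tip above, about gaps in a column of sequential integers, might look like this in practice. The sketch assumes SQLite through Python's sqlite3 module; the invoices table and its invoice_num column are hypothetical, invented only to have a column with gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
# Stand-in for the book's seq table.
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(100)])

# Hypothetical table whose key column skips 3, 6, and 7.
conn.execute("CREATE TABLE invoices (invoice_num INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO invoices VALUES (?)",
                 [(n,) for n in (1, 2, 4, 5, 8)])

# Every integer between the lowest and highest key, minus the keys present.
missing = [row[0] for row in conn.execute("""
    SELECT i
      FROM seq
     WHERE i BETWEEN (SELECT MIN(invoice_num) FROM invoices)
                 AND (SELECT MAX(invoice_num) FROM invoices)
    EXCEPT
    SELECT invoice_num
      FROM invoices
     ORDER BY 1
""")]
print(missing)
```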
Calendar Tables
Another useful auxiliary table is a calendar table. One type of calendar table has a primary-key column that contains a row for each calendar date (past and future) and other columns that indicate the date’s attributes: business day, holiday, international holiday, fiscal-month end, fiscal-year end, Julian date, business-day offsets, and so on. Another type of calendar table stores the starting and ending dates of events (in the columns event_id, start_date, and end_date, for example). Spreadsheets have more date-arithmetic functions than DBMSs, so it might be easier to build a calendar table in a spreadsheet and then import it as a database table.
Even if your DBMS has plenty of date-arithmetic functions, it might be faster to look up data in a calendar table than to call these functions in a query.
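A sequence table is one way to populate the first type of calendar table. The sketch below assumes SQLite through Python's sqlite3 module and builds a one-week calendar with a single attribute column; the table and column names (calendar, cal_date, is_business_day) are hypothetical, and holidays are ignored, so "business day" here only means Monday through Friday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
# Stand-in for the book's seq table.
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(366)])

conn.execute("""
    CREATE TABLE calendar (
        cal_date        TEXT    NOT NULL PRIMARY KEY,
        is_business_day INTEGER NOT NULL)
""")

# One row per date for the week starting Monday, 7 March 2005.
# strftime('%w', ...) returns '0' for Sunday through '6' for Saturday.
conn.execute("""
    INSERT INTO calendar
    SELECT date('2005-03-07', '+' || i || ' days'),
           CASE WHEN strftime('%w', date('2005-03-07', '+' || i || ' days'))
                     IN ('0', '6')
                THEN 0 ELSE 1
           END
      FROM seq
     WHERE i BETWEEN 0 AND 6
""")

calendar = conn.execute("""
    SELECT cal_date, is_business_day
      FROM calendar
     ORDER BY cal_date
""").fetchall()
for cal_date, is_business_day in calendar:
    print(cal_date, is_business_day)
```

Widening the BETWEEN range extends the calendar; further attribute columns would be filled in the same way or imported from a spreadsheet.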
Finding Sequences, Runs, and Regions
A sequence is a series of consecutive values without gaps. A run is like a sequence, but the values don’t have to be consecutive, just increasing (that is, gaps are allowed). A region is an unbroken series of values that all are equal.
Finding these series requires a table that has at least two columns: a primary-key column that holds a sequence of consecutive integers and a column that holds the values of interest. The table temps (Listing 15.9 and Figure 15.9) shows a series of high temperatures over 15 days.
As a set-oriented language, SQL isn’t a good choice for finding series of values. The following queries won’t run very fast, so if you have a lot of data to analyze, you might consider exporting it to a statistical package or using a procedural host language.
✔ Tip

These queries are based on the ideas in David Rozenshtein, Anatoly Abramovich, and Eugene Birger’s Optimizing Transact-SQL: Advanced Programming Techniques (SQL Forum Press). You can use the queries’ common framework to create similar queries that find other series of values.
Listing 15.9 List all the columns in the table temps. See Figure 15.9 for the result.
SELECT *
FROM temps;
id hi_temp
-- -------
1  49
2  46
3  48
4  50
5  50
6  50
7  51
8  52
9  53
10 50
11 50
12 47
13 50
14 51
15 52
Figure 15.9 Result of Listing 15.9.
Listing 15.10 finds all the sequences in temps and lists each sequence’s start position, end position, and length. See Figure 15.10 for the result. This query is a lot to take in at first glance, but it’s easier to understand it if you look at it piecemeal. Then you’ll be able to understand the rest of the queries in this section.
The subquery’s WHERE clause subtracts id from hi_temp, yielding (internally):
id hi_temp diff
-- ------- ----
1  49      48
2  46      44
3  48      45
4  50      46
5  50      45
6  50      44
7  51      44
8  52      44
9  53      44
10 50      40
11 50      39
12 47      35
13 50      37
14 51      37
15 52      37
In the column diff, note that successive differences are constant for sequences (50 – 6 = 44, 51 – 7 = 44, and so on). To find neighboring rows, the outer query cross-joins two instances of the same table (t1 and t2), as described in “Calculating Running Statistics” earlier in this chapter. The condition WHERE (t1.id < t2.id) guarantees that any t1 row represents an element with an index (id) lower than the corresponding t2 row.
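To see the diff column directly rather than only "internally", the subtraction can be run as an ordinary query. This is a minimal sketch assuming SQLite through Python's sqlite3 module, with the temps data typed in from Figure 15.9; the name HI_TEMPS is illustrative.

```python
import sqlite3

# High temperatures for ids 1 through 15, as in Figure 15.9.
HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)",
                 list(enumerate(HI_TEMPS, start=1)))

# Rows that belong to the same sequence share the same difference.
diffs = conn.execute("""
    SELECT id, hi_temp, hi_temp - id AS diff
      FROM temps
     ORDER BY id
""").fetchall()
for row in diffs:
    print(row)
```

Note that equal differences alone aren't enough: ids 2 and 6 both have a difference of 44 but aren't neighbors, which is why the full query also checks every row between the two endpoints.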
Listing 15.10 List the starting point, ending point, and length of each sequence in the table temps. See Figure 15.10 for the result.
SELECT
    t1.id AS StartSeq,
    t2.id AS EndSeq,
    t2.id - t1.id + 1 AS SeqLen
  FROM temps t1, temps t2
  WHERE (t1.id < t2.id)
    AND NOT EXISTS(
      SELECT *
        FROM temps t3
        WHERE (t3.hi_temp - t3.id <>
                 t1.hi_temp - t1.id
               AND t3.id BETWEEN
                 t1.id AND t2.id)
           OR (t3.id = t1.id - 1
               AND t3.hi_temp - t3.id =
                 t1.hi_temp - t1.id)
           OR (t3.id = t2.id + 1
               AND t3.hi_temp - t3.id =
                 t1.hi_temp - t1.id)
    );
StartSeq EndSeq SeqLen
-------- ------ ------
6        9      4
13       15     3
Figure 15.10 Result of Listing 15.10.
The subquery detects sequence breaks with the condition
t3.hi_temp - t3.id <> t1.hi_temp - t1.id
The third instance of temps (t3) in the subquery is used to determine whether any row in a candidate sequence (t3) has the same difference as the sequence’s first row (t1). If so, it’s a sequence member. If not, the candidate pair (t1 and t2) is rejected.
The last two OR conditions determine whether the candidate sequence’s borders can expand. If a row satisfies either condition, the current candidate sequence can be extended, so it’s rejected in favor of a longer one.
✔ Tip

To find only sequences of n or more rows, add the WHERE condition
AND (t2.id - t1.id) >= n - 1
To change Listing 15.10 to find all sequences of four or more rows, for example, replace
WHERE (t1.id < t2.id)
with
WHERE (t1.id < t2.id)
  AND (t2.id - t1.id) >= 3
The result is:
StartSeq EndSeq SeqLen
-------- ------ ------
6        9      4
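Listing 15.10 and the minimum-length condition from the tip can be tried together in one small script. The sketch assumes SQLite through Python's sqlite3 module; find_sequences and its min_len parameter are hypothetical names, with min_len playing the role of n in the tip.

```python
import sqlite3

# High temperatures for ids 1 through 15, as in Figure 15.9.
HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)",
                 list(enumerate(HI_TEMPS, start=1)))


def find_sequences(conn, min_len=2):
    """Return (start, end, length) for each sequence of min_len or more rows."""
    return conn.execute("""
        SELECT t1.id AS StartSeq,
               t2.id AS EndSeq,
               t2.id - t1.id + 1 AS SeqLen
          FROM temps t1, temps t2
         WHERE (t1.id < t2.id)
           AND (t2.id - t1.id) >= ? - 1
           AND NOT EXISTS(
               SELECT *
                 FROM temps t3
                WHERE (t3.hi_temp - t3.id <> t1.hi_temp - t1.id
                       AND t3.id BETWEEN t1.id AND t2.id)
                   OR (t3.id = t1.id - 1
                       AND t3.hi_temp - t3.id = t1.hi_temp - t1.id)
                   OR (t3.id = t2.id + 1
                       AND t3.hi_temp - t3.id = t1.hi_temp - t1.id))
         ORDER BY t1.id
    """, (min_len,)).fetchall()


all_sequences = find_sequences(conn)
long_sequences = find_sequences(conn, min_len=4)
print(all_sequences)
print(long_sequences)
```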
Listing 15.11 finds all the runs in temps and lists each run’s start position, end position, and length. See Figure 15.11 for the result. The logic of this query is similar to that of the preceding one but accounts for run values needing only to increase, not (necessarily) be consecutive. The fourth instance of temps (t4) is needed because there doesn’t have to be a constant difference between id and hi_temp values. The subquery cross-joins t3 and t4 to check rows in the middle of a candidate run, whose borders are t1 and t2. For every element between t1 and t2 (limited by BETWEEN), t3 and its predecessor t4 are compared to see whether their values are increasing.
Listing 15.11 List the starting point, ending point, and length of each run in the table temps. See Figure 15.11 for the result.

SELECT
    t1.id AS StartRun,
    t2.id AS EndRun,
    t2.id - t1.id + 1 AS RunLen
  FROM temps t1, temps t2
  WHERE (t1.id < t2.id)
    AND NOT EXISTS(
      SELECT *
        FROM temps t3, temps t4
        WHERE (t3.hi_temp <= t4.hi_temp
               AND t4.id = t3.id - 1
               AND t3.id BETWEEN
                 t1.id + 1 AND t2.id)
           OR (t3.id = t1.id - 1
               AND t3.hi_temp <
                 t1.hi_temp)
           OR (t3.id = t2.id + 1
               AND t3.hi_temp >
                 t2.hi_temp)
    );
StartRun EndRun RunLen
-------- ------ ------
2        4      3
6        9      4
12       15     4
Figure 15.11 Result of Listing 15.11.
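The sketch below runs Listing 15.11 and compares it with a single-pass loop, as a small illustration of the earlier remark that a procedural host language may suit this kind of search better. It assumes SQLite through Python's sqlite3 module; runs_by_scan is a hypothetical helper written for this comparison, not part of any library.

```python
import sqlite3

# High temperatures for ids 1 through 15, as in Figure 15.9.
HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)",
                 list(enumerate(HI_TEMPS, start=1)))

# Listing 15.11: t3 and its predecessor t4 are compared inside the candidate run.
runs_by_sql = conn.execute("""
    SELECT t1.id AS StartRun,
           t2.id AS EndRun,
           t2.id - t1.id + 1 AS RunLen
      FROM temps t1, temps t2
     WHERE (t1.id < t2.id)
       AND NOT EXISTS(
           SELECT *
             FROM temps t3, temps t4
            WHERE (t3.hi_temp <= t4.hi_temp
                   AND t4.id = t3.id - 1
                   AND t3.id BETWEEN t1.id + 1 AND t2.id)
               OR (t3.id = t1.id - 1 AND t3.hi_temp < t1.hi_temp)
               OR (t3.id = t2.id + 1 AND t3.hi_temp > t2.hi_temp))
     ORDER BY t1.id
""").fetchall()


def runs_by_scan(values):
    """Single pass: (start, end, length) of each maximal increasing run of 2+ rows."""
    found, start = [], 0
    for pos in range(1, len(values) + 1):
        # A run ends at the last row or when the next value fails to increase.
        if pos == len(values) or values[pos] <= values[pos - 1]:
            if pos - start >= 2:
                found.append((start + 1, pos, pos - start))  # ids are 1-based
            start = pos
    return found


print(runs_by_sql)
print(runs_by_scan(HI_TEMPS))
```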
Listing 15.12 finds all regions in temps with a high temperature of 50 and lists each region’s start position, end position, and length. See Figure 15.12 for the result.
✔ Tips

To rank regions by length, add an ORDER BY clause to the outer query:
ORDER BY t2.id - t1.id DESC

To list the individual ids that fall in a region (with value 50), type:
SELECT DISTINCT t1.id
  FROM temps t1, temps t2
  WHERE t1.hi_temp = 50
    AND t2.hi_temp = 50
    AND ABS(t1.id - t2.id) = 1;
The standard function ABS(), which all DBMSs support, returns the absolute value of its argument. The result is:
id
--
4
5
6
10
11
Listing 15.12 List the starting point, ending point, and length of each region (with value 50) in the table temps. See Figure 15.12 for the result.
SELECT
    t1.id AS StartReg,
    t2.id AS EndReg,
    t2.id - t1.id + 1 AS RegLen
  FROM temps t1, temps t2
  WHERE (t1.id < t2.id)
    AND NOT EXISTS(
      SELECT *
        FROM temps t3
        WHERE (t3.hi_temp <> 50
               AND t3.id BETWEEN
                 t1.id AND t2.id)
           OR (t3.id = t1.id - 1
               AND t3.hi_temp = 50)
           OR (t3.id = t2.id + 1
               AND t3.hi_temp = 50)
    );
StartReg EndReg RegLen
-------- ------ ------
4        6      3
10       11     2
Figure 15.12 Result of Listing 15.12.
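The region query and the two tips (ranking by length and listing individual ids) can be combined in one script. This sketch assumes SQLite through Python's sqlite3 module and passes the value of interest as a named parameter, :target, instead of hard-coding 50; the parameter name is an illustration, not something the listing defines.

```python
import sqlite3

# High temperatures for ids 1 through 15, as in Figure 15.9.
HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)",
                 list(enumerate(HI_TEMPS, start=1)))

# Listing 15.12 with the ORDER BY from the tip, longest region first.
regions = conn.execute("""
    SELECT t1.id AS StartReg,
           t2.id AS EndReg,
           t2.id - t1.id + 1 AS RegLen
      FROM temps t1, temps t2
     WHERE (t1.id < t2.id)
       AND NOT EXISTS(
           SELECT *
             FROM temps t3
            WHERE (t3.hi_temp <> :target
                   AND t3.id BETWEEN t1.id AND t2.id)
               OR (t3.id = t1.id - 1 AND t3.hi_temp = :target)
               OR (t3.id = t2.id + 1 AND t3.hi_temp = :target))
     ORDER BY t2.id - t1.id DESC
""", {"target": 50}).fetchall()

# Individual ids that have at least one neighbor with the same target value.
region_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT t1.id
      FROM temps t1, temps t2
     WHERE t1.hi_temp = :target
       AND t2.hi_temp = :target
       AND ABS(t1.id - t2.id) = 1
     ORDER BY t1.id
""", {"target": 50})]

print(regions)
print(region_ids)
```

The isolated reading of 50 at id 13 appears in neither result, because both queries require at least two neighboring rows with the target value.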
