
Joe Celko's SQL for Smarties: Advanced SQL Programming




CHAPTER 13: BETWEEN and OVERLAPS Predicates

The BETWEEN and OVERLAPS predicates both offer a shorthand way of showing that one value lies within a range defined by two other values. The BETWEEN predicate works with scalar range limits; the OVERLAPS predicate looks at two time periods (defined either by start and end points or by a starting time and an INTERVAL) to see if they overlap in time.

13.1 The BETWEEN Predicate

The predicate

<value expression> [NOT] BETWEEN <low value expression> AND <high value expression>

is a feature of SQL that is used often enough to deserve special attention. It is also just tricky enough to fool beginning programmers. This predicate is actually just shorthand for the expression:

((<low value expression> <= <value expression>)
 AND (<value expression> <= <high value expression>))

The NOT BETWEEN form is simply the negation of this expression.

Please note that the end points are included in this definition. This predicate works with any data types that can be compared. Most programmers miss this fact and use it only for numeric values, but it can be used for character strings and temporal data as well. The <high value expression> and <low value expression> can be expressions or constants, but again, programmers tend to use just constants.
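A quick way to see both points is to evaluate a few BETWEEN expressions on a real engine. The sketch below uses SQLite through Python's standard sqlite3 module, chosen only because it needs no setup. SQLite has no native DATE type, so the dates here are ISO-8601 strings, which happen to compare in calendar order. The helper name `check` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def check(expr):
    """Evaluate one SQL boolean expression; SQLite returns 1 (TRUE), 0 (FALSE) or None (UNKNOWN)."""
    return conn.execute("SELECT " + expr).fetchone()[0]

# Both end points satisfy the predicate.
low_end = check("12 BETWEEN 12 AND 15")
high_end = check("15 BETWEEN 12 AND 15")

# String literals compare with SQLite's default BINARY collation.
string_in = check("'cat' BETWEEN 'apple' AND 'dog'")
string_out = check("'zebra' BETWEEN 'apple' AND 'dog'")

# ISO-8601 date strings sort in calendar order, so they work as temporal limits.
date_in = check("'2005-02-15' BETWEEN '2005-02-01' AND '2005-02-28'")

# The expanded definition gives the same answer as the shorthand.
expanded = check("('2005-02-01' <= '2005-02-15') AND ('2005-02-15' <= '2005-02-28')")

# NOT BETWEEN negates the result.
outside = check("16 NOT BETWEEN 12 AND 15")

print(low_end, high_end, string_in, string_out, date_in, expanded, outside)
# prints: 1 1 1 0 1 1 1
```

Other products may handle string collation and date types differently. The point here is only that the predicate is not limited to numbers.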


13.1.1 Results with NULL Values

The results of this predicate with NULL values for <value expression>, <low value expression>, or <high value expression> follow directly from the definition. If both <low value expression> and <high value expression> are NULL, the result is UNKNOWN for any value of <value expression>. If <low value expression> or <high value expression> is NULL, but not both of them, the result is determined by the value of <value expression> and its comparison with the remaining non-NULL term: FALSE if that comparison fails, and UNKNOWN otherwise. If <value expression> is NULL, the results are UNKNOWN for any values of <low value expression> and <high value expression>.
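The NULL rules above fall out of the three-valued AND in the expanded definition. This Python sketch is a teaching model, not a description of how any particular engine implements BETWEEN. In it, None stands for both a NULL value and the UNKNOWN truth value. The helper names `sql_le`, `sql_and` and `between` are illustrative.

```python
from typing import Optional

# None stands for SQL NULL when it is a value and for UNKNOWN when it is a truth value.
Truth = Optional[bool]

def sql_le(a, b) -> Truth:
    """a <= b under SQL rules: any comparison involving NULL is UNKNOWN."""
    if a is None or b is None:
        return None
    return a <= b

def sql_and(p: Truth, q: Truth) -> Truth:
    """Three-valued AND: FALSE dominates, then UNKNOWN, then TRUE."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def between(x, low, high) -> Truth:
    """x BETWEEN low AND high, expanded as (low <= x) AND (x <= high)."""
    return sql_and(sql_le(low, x), sql_le(x, high))

print(between(None, 12, 15))    # None: the tested value is NULL
print(between(13, None, None))  # None: both limits are NULL
print(between(13, None, 15))    # None: 13 <= 15 is TRUE, but NULL <= 13 is UNKNOWN
print(between(20, None, 15))    # False: 20 <= 15 is FALSE, which settles the AND
```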


13.1.2 Results with Empty Sets

Notice that if <high value expression> is less than <low value expression>, the expression will always be FALSE unless the value is NULL; then it is UNKNOWN. That is a bit confusing, since there is no value to which <value expression> could resolve itself that would produce a TRUE result. But this follows directly from expanding the definition:

x BETWEEN 12 AND 15      depends on the value of x
x BETWEEN 15 AND 12      always FALSE, unless x is NULL
x BETWEEN NULL AND 15    FALSE if x > 15, otherwise UNKNOWN
NULL BETWEEN 12 AND 15   always UNKNOWN
x BETWEEN 12 AND NULL    FALSE if x < 12, otherwise UNKNOWN
x BETWEEN x AND x        always TRUE, unless x is NULL
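The table can be checked against an actual engine. This sketch uses SQLite via Python's sqlite3 module. There, a NULL result comes back as None, and TRUE and FALSE come back as 1 and 0. The sample values of x are arbitrary choices: NULL, one value below the range, one inside it, and one above it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def between(x, low, high):
    """Let SQLite evaluate x BETWEEN low AND high; Python None is bound as SQL NULL.
    Returns 1 (TRUE), 0 (FALSE) or None (UNKNOWN)."""
    return conn.execute("SELECT ? BETWEEN ? AND ?", (x, low, high)).fetchone()[0]

samples = [None, 10, 13, 20]  # NULL, below, inside and above the range 12..15

rows = {
    "x BETWEEN 12 AND 15":   [between(x, 12, 15) for x in samples],
    "x BETWEEN 15 AND 12":   [between(x, 15, 12) for x in samples],
    "x BETWEEN NULL AND 15": [between(x, None, 15) for x in samples],
    "x BETWEEN 12 AND NULL": [between(x, 12, None) for x in samples],
    "x BETWEEN x AND x":     [between(x, x, x) for x in samples],
}
for label, results in rows.items():
    print(f"{label:24} {results}")
```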

13.1.3 Programming Tips

The BETWEEN range includes the end points, so you have to be careful. Here is an example that deals with changing a percent range on a test into a letter grade:

Grades
low_score  high_score  grade
============================
90         100         'A'
80          90         'B'
70          80         'C'
60          70         'D'
00          60         'F'

However, this will not work when a student gets a grade on the borderlines (90, 80, 70, or 60), because such a score falls into two ranges at once. One way to solve the problem is to change the table by adding 1 to the low scores. Of course, the student who got 90.1 now falls into the gap between the 'B' range (81 to 90) and the 'A' range (91 to 100), and he will argue that he should have gotten an ‘A’. If you add 0.01 to the low scores instead, the student who got 90.001 will make the same argument, and so forth. This is a problem with a continuous variable. A better solution might be to change the predicate to

(score BETWEEN low_score AND high_score)
AND (score > low_score)

or simply to

((low_score < score) AND (score <= high_score))

Neither approach will perform much differently in this example, since few values will fall on the borders between grades and this table is very, very small.
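The boundary problem is easy to reproduce. The following sketch uses SQLite via Python's sqlite3 module and stores the Grades table above with REAL scores. The helper functions `grades_between` and `grades_half_open` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Grades
(low_score REAL NOT NULL,
 high_score REAL NOT NULL,
 grade CHAR(1) NOT NULL PRIMARY KEY)""")
conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)",
                 [(90, 100, "A"), (80, 90, "B"), (70, 80, "C"), (60, 70, "D"), (0, 60, "F")])

def grades_between(score):
    """Grades whose closed range [low_score, high_score] contains score."""
    sql = ("SELECT grade FROM Grades "
           "WHERE ? BETWEEN low_score AND high_score ORDER BY grade")
    return [g for (g,) in conn.execute(sql, (score,))]

def grades_half_open(score):
    """Grades whose range (low_score, high_score] contains score."""
    sql = ("SELECT grade FROM Grades "
           "WHERE low_score < ? AND ? <= high_score ORDER BY grade")
    return [g for (g,) in conn.execute(sql, (score, score))]

print(grades_between(90))      # ['A', 'B']: a border score matches two rows
print(grades_half_open(90))    # ['B']
print(grades_half_open(90.1))  # ['A']
print(grades_half_open(0))     # []: 0 is not greater than the lowest low_score
```

With BETWEEN, a score of 90 lands in both the 'A' and 'B' rows. With the half-open form, each border score lands in exactly one row. Note that under the half-open form a score of exactly 0 matches no row at all. The bottom of the 'F' range therefore needs separate handling, for example a low_score below 0.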
As a sidebar, the reader might want to look up an introductory book on fuzzy logic. In that model, an entity can have a degree of membership in a set, rather than being strictly in or out of the set. Some experimental databases use fuzzy logic.
However, some indexing schemes might make the BETWEEN predicate the better choice for larger tables of this sort. They will keep index values in trees whose nodes hold a range of values (look up a description of the B-Tree family in a computer science book). An optimizer can compare the range of values in the BETWEEN predicate to the range of values in the index nodes as a single action. If the BETWEEN predicate were presented as two comparisons, it might execute them as separate actions against the database, which would be slower.

13.2 OVERLAPS Predicate

The OVERLAPS predicate is a feature not yet available in most SQL implementations, because it requires more of the Standard SQL temporal data features than most implementations have. Many programmers have been faking the functionality of the INTERVAL data type with the existing date and time features of their products.

13.2.1 Time Periods and OVERLAPS Predicate

An INTERVAL is a measure of temporal duration, expressed in units such as days, hours, minutes, and so forth. This is how you add or subtract days to or from a date, hours and minutes to or from a time, and so forth. When INTERVALs are more generally available, you will also have an OVERLAPS predicate, which compares two time periods. These time periods are defined as row values with two columns. The first column (the starting time) of the pair is always a <datetime> data type. The second column (the termination time) is either a <datetime> or an <interval> data type that can be used to compute a <datetime> value. If the starting and termination times are the same, this is an instantaneous event.
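One way to picture this pair is as a small data structure. The Python sketch below uses datetime for the <datetime> column and timedelta as a stand-in for the SQL INTERVAL type. The names `make_period` and `is_instantaneous` are hypothetical and are not part of any SQL API.

```python
from datetime import datetime, timedelta
from typing import Tuple, Union

# A time period as a (start, termination) pair of datetimes.
Period = Tuple[datetime, datetime]

def make_period(start: datetime, second: Union[datetime, timedelta]) -> Period:
    """Build a period from a start plus either an end point or a duration.
    timedelta stands in for the SQL INTERVAL type in this sketch."""
    if isinstance(second, timedelta):
        return (start, start + second)  # compute the termination time from the interval
    return (start, second)

def is_instantaneous(p: Period) -> bool:
    """Equal start and termination times describe an event, not a span."""
    return p[0] == p[1]

meeting = make_period(datetime(2005, 2, 1, 9, 0), timedelta(hours=2))
alarm = make_period(datetime(2005, 2, 1, 7, 30), datetime(2005, 2, 1, 7, 30))
print(meeting[1])               # 2005-02-01 11:00:00
print(is_instantaneous(alarm))  # True
```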
The result of the <overlaps predicate> is formally defined as the result of the following expression:

(S1 > S2 AND NOT (S1 >= T2 AND T1 >= T2))
OR (S2 > S1 AND NOT (S2 >= T1 AND T2 >= T1))
OR (S1 = S2 AND (T1 <> T2 OR T1 = T2))

In this expression, S1 and S2 are the starting times of the two time
periods, and T1 and T2 are their termination times.
The rules for the OVERLAPS predicate should be intuitive, but they are not. The principles that we wanted in the standard were:

1. A time period includes its starting point, but does not include its end point. The reason for this model is that it follows the ISO convention that there is no 24:00 today; midnight is 00:00 tomorrow. Half-open durations have closure properties that are useful. The concatenation of two half-open durations is a half-open duration.
2. If the time periods are not instantaneous, they overlap when they share a common time period.
3. If the first term of the predicate is a time period and the second term is an instantaneous event (a <datetime> data type), they overlap when the second term is in the time period (but is not the end point of the time period).
4. If the first and second terms are both instantaneous events, they overlap only when they are equal.
5. If the starting time is NULL and the finishing time is a <datetime> value, the finishing time becomes the starting time and we have an event. If the starting time is NULL and the finishing time is an INTERVAL value, then both the finishing and starting times are NULL.
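Principle 1 can be illustrated with a small Python sketch that uses datetime values and a hypothetical helper `in_period` to test half-open membership. It shows two things. First, midnight belongs only to the day it starts. Second, two adjacent days join into one period without counting any instant twice.

```python
from datetime import datetime

def in_period(start: datetime, end: datetime, t: datetime) -> bool:
    """Half-open membership: the period includes its start point but not its end point."""
    return start <= t < end

feb_1 = (datetime(2005, 2, 1), datetime(2005, 2, 2))
feb_2 = (datetime(2005, 2, 2), datetime(2005, 2, 3))
midnight = datetime(2005, 2, 2, 0, 0)

# Midnight is 00:00 of Feb 2, not "24:00" of Feb 1, so it belongs to exactly one day.
print(in_period(*feb_1, midnight), in_period(*feb_2, midnight))  # False True

# Concatenating the two adjacent days gives another half-open period.
both_days = (feb_1[0], feb_2[1])
hours = [datetime(2005, 2, d, h) for d in (1, 2) for h in range(24)]
# Every hour of the combined period lies in exactly one of the two days.
covered_once = all(in_period(*feb_1, t) + in_period(*feb_2, t) == 1 for t in hours)
print(covered_once, in_period(*both_days, datetime(2005, 2, 3)))  # True False
```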
Please consider how your intuition reacts to these results when the granularity is at the YEAR-MONTH-DAY level. Remember that a day begins at 00:00.

(today, today) OVERLAPS (today, today)            is TRUE
(today, tomorrow) OVERLAPS (today, today)         is TRUE
(today, tomorrow) OVERLAPS (tomorrow, tomorrow)   is FALSE
(yesterday, today) OVERLAPS (today, tomorrow)     is FALSE
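To check those four results against the formal definition, here is a Python transcription of the expression given earlier, with None standing in for NULL and UNKNOWN. It is a sketch with three assumptions:

* each period's start is no later than its end;
* both termination values are dates rather than INTERVALs;
* the helper names are hypothetical.

```python
import operator
from datetime import date, timedelta
from typing import Optional

Truth = Optional[bool]  # None stands for UNKNOWN (and for a NULL date)

def sql_cmp(fn, a, b) -> Truth:
    """Apply a comparison, yielding UNKNOWN if either side is NULL."""
    return None if a is None or b is None else fn(a, b)

def not3(p: Truth) -> Truth:
    return None if p is None else not p

def and3(p: Truth, q: Truth) -> Truth:
    if p is False or q is False:
        return False
    return None if p is None or q is None else True

def or3(p: Truth, q: Truth) -> Truth:
    if p is True or q is True:
        return True
    return None if p is None or q is None else False

def overlaps(s1, t1, s2, t2) -> Truth:
    """(s1, t1) OVERLAPS (s2, t2), transcribed term by term from the formal definition."""
    gt, ge, eq, ne = operator.gt, operator.ge, operator.eq, operator.ne
    case1 = and3(sql_cmp(gt, s1, s2), not3(and3(sql_cmp(ge, s1, t2), sql_cmp(ge, t1, t2))))
    case2 = and3(sql_cmp(gt, s2, s1), not3(and3(sql_cmp(ge, s2, t1), sql_cmp(ge, t2, t1))))
    case3 = and3(sql_cmp(eq, s1, s2), or3(sql_cmp(ne, t1, t2), sql_cmp(eq, t1, t2)))
    return or3(or3(case1, case2), case3)

today = date(2005, 2, 1)
tomorrow = today + timedelta(days=1)
yesterday = today - timedelta(days=1)

print(overlaps(today, today, today, today))           # True
print(overlaps(today, tomorrow, today, today))        # True
print(overlaps(today, tomorrow, tomorrow, tomorrow))  # False
print(overlaps(yesterday, today, today, tomorrow))    # False
print(overlaps(today, None, today, tomorrow))         # None: (T1 <> T2 OR T1 = T2) is UNKNOWN
```

The odd-looking (T1 <> T2 OR T1 = T2) term is TRUE for any two non-NULL values. Its effect shows in the last call: when the start points are equal and a termination time is NULL, the result is UNKNOWN rather than TRUE.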
Since the OVERLAPS predicate is not yet common in SQL products,
let’s see what we have to do to handle overlapping times. Consider a table
of hotel guests with the days of their stays and a table of special events
being held at the hotel. The tables might look like this:
CREATE TABLE Guests
(guest_name CHARACTER(30) NOT NULL PRIMARY KEY,
 arrival_date DATE NOT NULL,
 depart_date DATE NOT NULL);
Guests
guest_name        arrival_date  depart_date
==============================================
'Dorothy Gale'    '2005-02-01'  '2005-11-01'
'Indiana Jones'   '2005-02-01'  '2005-02-01'
'Don Quixote'     '2005-01-01'  '2005-10-01'
'James T. Kirk'   '2005-02-01'  '2005-02-28'
'Santa Claus'     '2005-12-01'  '2005-12-25'
CREATE TABLE Celebrations
(celeb_name CHARACTER(30) PRIMARY KEY,
 start_date DATE NOT NULL,
 finish_date DATE NOT NULL);
Celebrations
celeb_name              start_date    finish_date
==================================================
'Apple Month'           '2005-02-01'  '2005-02-28'
'Christmas Season'      '2005-12-01'  '2005-12-25'
'Garlic Festival'       '2005-01-15'  '2005-02-15'
'National Pear Week'    '2005-01-01'  '2005-01-07'
'New Year's Day'        '2005-01-01'  '2005-01-01'
'St. Fred's Day'        '2005-02-24'  '2005-02-24'
'Year of the Prune'     '2005-01-01'  '2005-12-31'
The BETWEEN operator will work just fine with single dates that fall between the starting and finishing dates of these celebrations. Please remember, though, that the BETWEEN predicate will include the end point of an interval, and the OVERLAPS predicate will not. To find out if a particular date occurs during an event, you can simply write queries like:
SELECT guest_name, ' arrived during ', celeb_name
FROM Guests, Celebrations
WHERE arrival_date BETWEEN start_date AND finish_date
AND arrival_date <> finish_date;
This query will find the guests who arrived at the hotel during each event. The final predicate can be kept, if you want to conform to the ANSI convention that a period does not include its end point, or dropped, if that makes more sense in your situation. From now on, we will keep both end points to make the queries easier to read.
SELECT guest_name, ' arrived during ', celeb_name
FROM Guests, Celebrations
WHERE arrival_date BETWEEN start_date AND finish_date;
Results
guest_name " arrived during " celeb_name
=========================================================
'Dorothy Gale' 'arrived during' 'Apple Month'
'Dorothy Gale' 'arrived during' 'Garlic Festival'
'Dorothy Gale' 'arrived during' 'Year of the Prune'
'Indiana Jones' 'arrived during' 'Apple Month'
'Indiana Jones' 'arrived during' 'Garlic Festival'
'Indiana Jones' 'arrived during' 'Year of the Prune'
'Don Quixote' 'arrived during' 'National Pear Week'
'Don Quixote' 'arrived during' 'New Year's Day'
'Don Quixote' 'arrived during' 'Year of the Prune'
'James T. Kirk' 'arrived during' 'Apple Month'
'James T. Kirk' 'arrived during' 'Garlic Festival'
'James T. Kirk' 'arrived during' 'Year of the Prune'
'Santa Claus' 'arrived during' 'Christmas Season'
'Santa Claus' 'arrived during' 'Year of the Prune'
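The two arrival queries can be run as written against the sample data. This sketch uses SQLite via Python's sqlite3 module and stores the dates as ISO-8601 text, because SQLite lacks a DATE type. The column types are therefore simplified from the CREATE TABLE statements above, and the literal ' arrived during ' column is left out.

```python
import sqlite3

GUESTS = [("Dorothy Gale", "2005-02-01", "2005-11-01"),
          ("Indiana Jones", "2005-02-01", "2005-02-01"),
          ("Don Quixote", "2005-01-01", "2005-10-01"),
          ("James T. Kirk", "2005-02-01", "2005-02-28"),
          ("Santa Claus", "2005-12-01", "2005-12-25")]
CELEBRATIONS = [("Apple Month", "2005-02-01", "2005-02-28"),
                ("Christmas Season", "2005-12-01", "2005-12-25"),
                ("Garlic Festival", "2005-01-15", "2005-02-15"),
                ("National Pear Week", "2005-01-01", "2005-01-07"),
                ("New Year's Day", "2005-01-01", "2005-01-01"),
                ("St. Fred's Day", "2005-02-24", "2005-02-24"),
                ("Year of the Prune", "2005-01-01", "2005-12-31")]

conn = sqlite3.connect(":memory:")
# SQLite has no DATE type; ISO-8601 text compares in calendar order.
conn.execute("CREATE TABLE Guests (guest_name TEXT NOT NULL PRIMARY KEY, "
             "arrival_date TEXT NOT NULL, depart_date TEXT NOT NULL)")
conn.execute("CREATE TABLE Celebrations (celeb_name TEXT PRIMARY KEY, "
             "start_date TEXT NOT NULL, finish_date TEXT NOT NULL)")
conn.executemany("INSERT INTO Guests VALUES (?, ?, ?)", GUESTS)
conn.executemany("INSERT INTO Celebrations VALUES (?, ?, ?)", CELEBRATIONS)

# Both end points of each celebration count.
closed = set(conn.execute(
    "SELECT guest_name, celeb_name FROM Guests, Celebrations "
    "WHERE arrival_date BETWEEN start_date AND finish_date").fetchall())

# Half-open version: an arrival on the finishing date does not count.
half_open = set(conn.execute(
    "SELECT guest_name, celeb_name FROM Guests, Celebrations "
    "WHERE arrival_date BETWEEN start_date AND finish_date "
    "AND arrival_date <> finish_date").fetchall())

print(len(closed), len(half_open))  # 14 13
print(closed - half_open)
```

The only row lost to the end-point test is Don Quixote's arrival on New Year's Day, a one-day event whose start and finish are the same date.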
The obvious question is which guests were at the hotel during each
event. A common programming error when trying to find out if two
intervals overlap is to write the query with the
BETWEEN predicate, thus:
SELECT guest_name, ' was here during ', celeb_name
FROM Guests, Celebrations
WHERE arrival_date BETWEEN start_date AND finish_date
OR depart_date BETWEEN start_date AND finish_date;
This is wrong, because it does not cover the case where the event began and finished during the guest’s visit. Seeing his error, the programmer will sit down and draw a timeline diagram of all four possible overlapping cases, as shown in Figure 13.1.

Figure 13.1 Timeline Diagram of All Possible Overlapping Cases.

So the programmer adds more predicates, thus:

SELECT guest_name, ' was here during ', celeb_name
FROM Guests, Celebrations
WHERE arrival_date BETWEEN start_date AND finish_date
OR depart_date BETWEEN start_date AND finish_date
OR start_date BETWEEN arrival_date AND depart_date
OR finish_date BETWEEN arrival_date AND depart_date;

A thoughtful programmer will notice that the last predicate is not needed and might drop it. If the event finishes during the guest’s stay, then either it also started during the stay, or the guest arrived during the event. Either way, this is a correct query. But it is not the best answer. There are only two cases where a guest’s stay at the hotel and an event do not overlap: either the guest checked out before the event started, or the event ended before the guest arrived. If you want to work through the logic, the OR-ed predicate above reduces to the negation of these two cases once you also add the conditions that arrival_date <= depart_date and start_date <= finish_date. But it is easier to see in a timeline diagram (Figure 13.2).
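The sample data can measure how the incomplete two-predicate query, the four-predicate query, and the negated form differ. This sketch uses SQLite via Python's sqlite3 module, with dates stored as ISO-8601 text. The helper `pairs` is a hypothetical name.

```python
import sqlite3

GUESTS = [("Dorothy Gale", "2005-02-01", "2005-11-01"),
          ("Indiana Jones", "2005-02-01", "2005-02-01"),
          ("Don Quixote", "2005-01-01", "2005-10-01"),
          ("James T. Kirk", "2005-02-01", "2005-02-28"),
          ("Santa Claus", "2005-12-01", "2005-12-25")]
CELEBRATIONS = [("Apple Month", "2005-02-01", "2005-02-28"),
                ("Christmas Season", "2005-12-01", "2005-12-25"),
                ("Garlic Festival", "2005-01-15", "2005-02-15"),
                ("National Pear Week", "2005-01-01", "2005-01-07"),
                ("New Year's Day", "2005-01-01", "2005-01-01"),
                ("St. Fred's Day", "2005-02-24", "2005-02-24"),
                ("Year of the Prune", "2005-01-01", "2005-12-31")]

conn = sqlite3.connect(":memory:")
# SQLite has no DATE type; ISO-8601 text compares in calendar order.
conn.execute("CREATE TABLE Guests (guest_name TEXT NOT NULL PRIMARY KEY, "
             "arrival_date TEXT NOT NULL, depart_date TEXT NOT NULL)")
conn.execute("CREATE TABLE Celebrations (celeb_name TEXT PRIMARY KEY, "
             "start_date TEXT NOT NULL, finish_date TEXT NOT NULL)")
conn.executemany("INSERT INTO Guests VALUES (?, ?, ?)", GUESTS)
conn.executemany("INSERT INTO Celebrations VALUES (?, ?, ?)", CELEBRATIONS)

def pairs(where):
    """Return the (guest_name, celeb_name) pairs that satisfy a WHERE clause."""
    sql = "SELECT guest_name, celeb_name FROM Guests, Celebrations WHERE " + where
    return set(conn.execute(sql).fetchall())

arrive = "arrival_date BETWEEN start_date AND finish_date"
depart = "depart_date BETWEEN start_date AND finish_date"
starts = "start_date BETWEEN arrival_date AND depart_date"
finishes = "finish_date BETWEEN arrival_date AND depart_date"

two_way = pairs(f"{arrive} OR {depart}")                            # the incomplete query
four_way = pairs(f"{arrive} OR {depart} OR {starts} OR {finishes}")  # the corrected query
three_way = pairs(f"{arrive} OR {depart} OR {starts}")               # last predicate dropped
negated = pairs("NOT ((depart_date < start_date) OR (arrival_date > finish_date))")

print(len(two_way), len(four_way), len(three_way), len(negated))  # 14 19 19 19
print(sorted(four_way - two_way))
```

The five pairs the first query misses are stays that enclose an event entirely, such as Don Quixote and Apple Month.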
Both cases can be represented in one SQL statement as:

SELECT guest_name, celeb_name
FROM Guests, Celebrations
WHERE NOT ((depart_date < start_date) OR (arrival_date > finish_date));
VIEW GuestCelebrations
guest_name celeb_name
======================================
'Dorothy Gale' 'Apple Month'
'Dorothy Gale' 'Garlic Festival'
'Dorothy Gale' 'St. Fred's Day'
'Dorothy Gale' 'Year of the Prune'
'Indiana Jones' 'Apple Month'
'Indiana Jones' 'Garlic Festival'
'Indiana Jones' 'Year of the Prune'
'Don Quixote' 'Apple Month'
'Don Quixote' 'Garlic Festival'
'Don Quixote' 'National Pear Week'
'Don Quixote' 'New Year's Day'
'Don Quixote' 'St. Fred's Day'
'Don Quixote' 'Year of the Prune'
'James T. Kirk' 'Apple Month'
'James T. Kirk' 'Garlic Festival'
'James T. Kirk' 'St. Fred's Day'
'James T. Kirk' 'Year of the Prune'
'Santa Claus' 'Christmas Season'
'Santa Claus' 'Year of the Prune'
Figure 13.2 Timeline Diagram.

The reason for using the NOT in the WHERE clause is so that you can add or remove it to reverse the sense of the query. This VIEW is handy for other queries. For example, to find out how many celebrations each guest could have seen, you would write:
CREATE VIEW GuestCelebrations (guest_name, celeb_name)
AS SELECT guest_name, celeb_name
FROM Guests, Celebrations
WHERE NOT ((depart_date < start_date) OR (arrival_date > finish_date));
SELECT guest_name, COUNT(*) AS celebcount
FROM GuestCelebrations
GROUP BY guest_name;
Results
guest_name celebcount
=========================
'Dorothy Gale' 4
'Indiana Jones' 3
'Don Quixote' 6
'James T. Kirk' 4
'Santa Claus' 2
Then, to find out how many guests were at the hotel during each
celebration, you would write:
SELECT celeb_name, COUNT(*) AS guestcount
FROM GuestCelebrations
GROUP BY celeb_name;
Results
celeb_name              guestcount
==================================
'Apple Month'           4
'Christmas Season'      1
'Garlic Festival'       4
'National Pear Week'    1
'New Year's Day'        1
'St. Fred's Day'        3
'Year of the Prune'     5
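The view and both GROUP BY queries can be reproduced end to end. This sketch uses SQLite via Python's sqlite3 module, with dates stored as ISO-8601 text. The column list in CREATE VIEW requires SQLite 3.9.0 or later.

```python
import sqlite3

GUESTS = [("Dorothy Gale", "2005-02-01", "2005-11-01"),
          ("Indiana Jones", "2005-02-01", "2005-02-01"),
          ("Don Quixote", "2005-01-01", "2005-10-01"),
          ("James T. Kirk", "2005-02-01", "2005-02-28"),
          ("Santa Claus", "2005-12-01", "2005-12-25")]
CELEBRATIONS = [("Apple Month", "2005-02-01", "2005-02-28"),
                ("Christmas Season", "2005-12-01", "2005-12-25"),
                ("Garlic Festival", "2005-01-15", "2005-02-15"),
                ("National Pear Week", "2005-01-01", "2005-01-07"),
                ("New Year's Day", "2005-01-01", "2005-01-01"),
                ("St. Fred's Day", "2005-02-24", "2005-02-24"),
                ("Year of the Prune", "2005-01-01", "2005-12-31")]

conn = sqlite3.connect(":memory:")
# SQLite has no DATE type; ISO-8601 text compares in calendar order.
conn.execute("CREATE TABLE Guests (guest_name TEXT NOT NULL PRIMARY KEY, "
             "arrival_date TEXT NOT NULL, depart_date TEXT NOT NULL)")
conn.execute("CREATE TABLE Celebrations (celeb_name TEXT PRIMARY KEY, "
             "start_date TEXT NOT NULL, finish_date TEXT NOT NULL)")
conn.executemany("INSERT INTO Guests VALUES (?, ?, ?)", GUESTS)
conn.executemany("INSERT INTO Celebrations VALUES (?, ?, ?)", CELEBRATIONS)

# The view from the text: every (guest, celebration) pair whose date ranges overlap.
conn.execute("""CREATE VIEW GuestCelebrations (guest_name, celeb_name)
AS SELECT guest_name, celeb_name
     FROM Guests, Celebrations
    WHERE NOT ((depart_date < start_date) OR (arrival_date > finish_date))""")

per_guest = dict(conn.execute(
    "SELECT guest_name, COUNT(*) FROM GuestCelebrations GROUP BY guest_name"))
per_celebration = dict(conn.execute(
    "SELECT celeb_name, COUNT(*) FROM GuestCelebrations GROUP BY celeb_name"))

print(per_guest)
print(per_celebration)
```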
This last query is only part of the story. What the hotel management
really wants to know is how many room nights were sold for a
