Managing Time in Relational Databases - P5

the row representing that assertion will cease to be asserted on
that date even if no correcting assertion is supplied to replace it.
The last reason an assertion end date may be changed is to
lock an assertion which has been updated or deleted by a
deferred transaction, until the resulting deferred assertion
becomes current. We will have more to say about deferred
transactions, deferred assertions and locking in Chapter 13.
Now() and UTC
Keeping our notation DBMS agnostic, and keeping the clock
tick granularity generic, we will refer to the current moment, to
right now, as Now().⁷ SQL Server may use getdate(), and DB2
may use Current Timestamp or Current Date. Depending on our
clock tick duration, we might need to use a date formatting
function to set the current granularity. In our examples, we generally
use one month as our clock tick granularity. However, for our
purposes, Now() can take on values at whatever level of
granularity we choose to use, including day, second or microsecond.
Now() is usually assumed to represent the current moment by
using local time. But local time may change because of entering
or leaving Daylight Saving Time. And another issue is time zone.
At any one time, data about to update a database may exist in a
different time zone than the database itself. Users about to retrieve
data from a database may exist in a different time zone than the
database itself. And, of course, federated queries may attempt to
join data from databases located in different time zones.
So the data values returned by Now() can change for reasons
other than the passage of time. Daylight Saving Time can
change those values. At any one point in time, those values can
differ because of time zones. Clearly, we need a reference
framework, and a set of values, that will not change for any reason
other than the passage of time, and that will be the same value,
at any point in time, the world over and year round.
This reference framework is Coordinated Universal Time
(UTC).⁸ To make use of UTC, our Asserted Versioning Framework
will convert local time to UTC on maintenance and queries, and
will store Asserted Versioning temporal parameters, such as begin
and end dates, in UTC. For example, with Policy_AV being an
asserted version table of insurance policies, we would insert a
policy like this:
INSERT INTO Policy_AV (oid, asr_beg_dt .....)
VALUES (55, CURRENT TIMESTAMP - CURRENT TIMEZONE .....)

⁷ Now() is a function that returns the current date. It is not a value. However, we will
often use it to designate a specific point in time. For example, we may say that a time
period starts at Now() and continues on until 12/31/9999. This is a shorthand way of
emphasizing that, whenever that time period was created, it was given as its begin
date the value returned by Now() at that moment.
⁸ However, even in UTC, some variations in time values do not reflect the passage of
time. We are referring here to the periodic adjustments in UTC made by adding or
removing leap seconds, as we described in an earlier section of this chapter.

Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH

Queries will perform better if we do the time conversion
before using the value as a selection predicate in the SQL
itself. This is because most optimizers treat functions that
appear in predicates as non-indexable. For example, in DB2,
we should write:
SET :my-cut = TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
SELECT ..... FROM .....
WHERE oid = 55
AND asr_beg_dt <= :my-cut
AND asr_end_dt > :my-cut
rather than
SELECT ..... FROM .....
WHERE oid = 55
AND asr_beg_dt <=
TIMESTAMP(:my-local-time-value) - CURRENT TIMEZONE
AND .....
However, if these functions are used for display purposes, then
there is no reason to exclude them from the queries. For example:
SELECT asr_beg_dt + CURRENT TIMEZONE AS my_local_asr_beg_dt .....
FROM .....
It would also be useful to add alternate columns for the
temporal dates in our views that have the translation to local time
performed already.
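
The convert-first pattern above can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions, not the Asserted Versioning Framework itself: the fixed UTC-5 offset, the helper to_utc_text and the sample rows are hypothetical, and SQLite stands in for DB2. The local time is converted to UTC once, in the host program, and the result is bound as a plain parameter so that the predicate compares a column to a value.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for the user's local time zone.
LOCAL_ZONE = timezone(timedelta(hours=-5))


def to_utc_text(local_moment: datetime) -> str:
    """Convert a timezone-aware local datetime to a UTC text value."""
    return local_moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Policy_AV (oid INTEGER, asr_beg_dt TEXT, asr_end_dt TEXT)"
)
# Assertion dates are stored in UTC (sample data, invented for this sketch).
conn.executemany(
    "INSERT INTO Policy_AV VALUES (?, ?, ?)",
    [
        (55, "2010-01-01 00:00:00", "2010-06-01 00:00:00"),
        (55, "2010-06-01 00:00:00", "9999-12-31 00:00:00"),
    ],
)

# Do the conversion once, outside the SQL, then bind the result.
my_local_time_value = datetime(2010, 5, 31, 20, 0, tzinfo=LOCAL_ZONE)
my_cut = to_utc_text(my_local_time_value)  # 2010-06-01 01:00:00 in UTC

rows = conn.execute(
    "SELECT oid, asr_beg_dt FROM Policy_AV "
    "WHERE oid = ? AND asr_beg_dt <= ? AND asr_end_dt > ?",
    (55, my_cut, my_cut),
).fetchall()
```

In this sample, 8 p.m. on May 31 local time is already June 1 in UTC, so the query selects the row whose assertion period begins on June 1. Comparing against the local value without conversion would have selected the other row.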
The Very Concept of Bi-Temporality
Business IT professionals were using tables with both an
effective date and a physical row create date by the early 90s.⁹
But they were doing so with apparently no knowledge of
academic work on bi-temporality. At that time, these version
tables which also contained a row create date were state of the
art in best practice methods for managing temporal data. We will
discuss them in the next chapter.

⁹ Or timestamps, or other datatypes. We remind the reader that, throughout this book,
we use the date datatype for all temporal columns, and a first of the month value for
all our dates. This simplifies the presentation without affecting any of the semantics.
In real business applications, of course, these columns would often be timestamps.

With a row creation date, of course, any query can be
restricted to the rows present in a table as of any specific date
by including a WHERE clause predicate that qualifies only
those rows whose create date is less than or equal to the
specified date. With two effective dates, tables like these are also able
to specify one of the two temporal dimensions that make up
full bi-temporality.
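
As a minimal sketch of the row create date predicate just described, the following uses Python's sqlite3 module. The table Policy_Version, its column names and its rows are hypothetical, invented only to show the shape of the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical version table: two effective dates plus a row create date.
conn.execute(
    "CREATE TABLE Policy_Version ("
    " policy_id INTEGER, eff_beg_dt TEXT, eff_end_dt TEXT,"
    " row_crt_dt TEXT, policy_type TEXT)"
)
conn.executemany(
    "INSERT INTO Policy_Version VALUES (?, ?, ?, ?, ?)",
    [
        (55, "2009-01-01", "2009-07-01", "2009-01-01", "HMO"),
        (55, "2009-07-01", "9999-12-31", "2009-07-01", "PPO"),
        # A later correction to the first half of 2009, created in 2010.
        (55, "2009-01-01", "2009-07-01", "2010-02-01", "POS"),
    ],
)


def rows_present_as_of(as_of_date: str):
    """Return the rows that physically existed in the table on as_of_date."""
    return conn.execute(
        "SELECT eff_beg_dt, policy_type FROM Policy_Version "
        "WHERE row_crt_dt <= ? ORDER BY row_crt_dt",
        (as_of_date,),
    ).fetchall()


before_correction = rows_present_as_of("2009-12-01")
after_correction = rows_present_as_of("2010-03-01")
```

The first call sees only the two rows created in 2009. The second also sees the correction created in February 2010.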
The standard temporal model uses the term “valid time”
where we use the term “effective time”. But the difference is
purely verbal. We have found no differences between how valid
time works in the standard model, and how effective time works
in Asserted Versioning. We use “effective time” because it is the
preferred term among business IT professionals, and also
because it readily adapts itself to other grammatical forms such
as “becoming effective” and “in effect”.
The standard model states that “(v)alid time ... captur(es) the
history of a changing reality, and transaction time ... captur(es)
the sequence of states of a changing table ... A table supporting
both is termed a ‘bi-temporal table’” [2000, Snodgrass, p. 20]. But
as we will see later, Asserted Versioning does not define
bi-temporality in exactly the same way. The difference lies primarily in the
second of the two temporal dimensions, what computer scientists
call “transaction time” and what we call “assertion time”. While a
transaction begin date always indicates when a row is physically
inserted into a table, an assertion begin date indicates when we
are first willing to assert, or claim, that a row is a true statement
about the object it represents, during that row’s effective (valid)
time period, and also that the quality of the row’s data is good
enough to base business decisions on.
In the standard temporal model, the beginning of a transaction
time period is the date on which the row is created. Obviously,
once the row is created, that date cannot be changed. But in the
Asserted Versioning temporal model, an assertion time period
begins either on the date a row is created, or on a later date.
Because an assertion begin date is not necessarily the same
as the date on which its row is physically created, Asserted
Versioning needs, in addition to the pair of dates that define this
time period, an additional date which is the physical creation
date of each row. That date serves as an audit trail, and as a
means of reconstructing a table as it physically existed at any
past point in time.
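
The distinction between an assertion begin date and a physical row creation date can be sketched as a small Python data structure. This is an illustration only: asr_beg_dt and asr_end_dt follow the column names used in the SQL earlier in this chapter, while eff_beg_dt, eff_end_dt, row_crt_dt, the class name and the sample row are assumptions made for the sketch. Time periods are treated as including their begin date and excluding their end date, matching the predicates asr_beg_dt <= :my-cut AND asr_end_dt > :my-cut.

```python
from dataclasses import dataclass
from datetime import date

END_OF_TIME = date(9999, 12, 31)


@dataclass(frozen=True)
class AssertedVersionRow:
    oid: int
    eff_beg_dt: date   # effective (valid) time period
    eff_end_dt: date
    asr_beg_dt: date   # assertion time period
    asr_end_dt: date
    row_crt_dt: date   # physical creation date, kept as an audit trail

    def is_asserted(self, as_of: date) -> bool:
        """True if we stand behind this row at the given moment."""
        return self.asr_beg_dt <= as_of < self.asr_end_dt

    def physically_existed(self, as_of: date) -> bool:
        """True if the row was already in the table at the given moment."""
        return self.row_crt_dt <= as_of


# A row created in March 2010 but not asserted until June 2010.
deferred = AssertedVersionRow(
    oid=55,
    eff_beg_dt=date(2010, 1, 1), eff_end_dt=END_OF_TIME,
    asr_beg_dt=date(2010, 6, 1), asr_end_dt=END_OF_TIME,
    row_crt_dt=date(2010, 3, 1),
)
```

In April 2010 this row physically exists but is not yet asserted, which is exactly the case that a single transaction begin date cannot express.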
What are these rows with future assertion begin dates? To
take a single example, they might be rows for which we have
some of the business data, but not all of it, rows which are in
the process of being made ready “for prime time”. These
rows, which may be assertions about past, present or future
versions, are not yet ready, we will say, to become part of the
production data in the table, not yet ready to become rows that
we are willing to present to the world and of which we are
willing to say “We stand behind the statements these rows make.
We claim that the statements they make are (or are likely to
become) true, and that the information these rows provide
meets the standards of reliability understood (or explicitly
stated) to apply to all rows in this table”.
So the semantics of the standard temporal model are fully
supported by Asserted Versioning. But Asserted Versioning adds
the semantics of what we call deferred assertions, and which we
have just briefly described. As we will see in later chapters,
deferred assertions are just one kind of internalized pipeline
dataset, and the internalization of pipeline datasets can eliminate
a large part of the IT maintenance budget by eliminating the need
to manage pipeline datasets as distinct physical objects.
Allen Relationships
Allen relationships describe all possible positional
relationships between two time periods along a common
timeline. This includes the special case of one or both time periods
being a point in time, i.e. being exactly one clock tick in length.
There are 13 Allen relationships in total. Six have a
corresponding inverse relationship, and one does not. Standard
treatments of the Allen relationships may be found in both
[2000, Snodgrass] and [2002, Date, Darwen, Lorentzos]. We have
found it useful to reconstruct the Allen relationships as a binary
taxonomy. Our taxonomy is shown in Figure 3.4.
In this diagram, the leaf nodes include a graphic in which
there are two timelines, each represented by a dashed line. All
the leaf nodes but one have an inverse, and that one is
italicized; when two time periods are identical, they do not
have a distinct inverse. Thus, this taxonomy specifies 13
leaf-node relationships which are, in fact, precisely the 13 Allen
relationships.
The names of the Allen relationships are standard, and have
been since Allen wrote his seminal article in 1983. But those
names, and the names of the higher-level nodes in our own
taxonomy of the Allen relationships, are also expressions in
ordinary language. In order to distinguish between the ordinary
language and the technical uses of these terms, we will include
the names of Allen relationships and our other taxonomy nodes
in brackets when we are discussing them. We will also underline
the non-leaf node relationships in the taxonomy, to emphasize
that they are relationships we have defined, and are not one of
the Allen relationships.
In the following discussion, the first time period in a pair of
them is the one that is either earlier than the other, or not longer
than the other.
Given two time periods on a common timeline, either they
have at least one clock tick in common or they do not. If they
do, we will say that they [intersect] one another. If they do not,
we will say that they [exclude] one another.
If there is an [intersects] relationship between two time
periods, then either one [fills] the other or each [overlaps] the other.
If one time period [fills] another, then all its clock ticks are also
in the time period it [fills], but not necessarily vice versa. If one
time period [overlaps] another, then the latter also overlaps the
former; but, being the later of the two time periods, we say that
the latter time period has the inverse relationship, [overlaps⁻¹].
In the overlaps cases, each has at least one clock tick that the
other does not have, as well as having at least one clock tick that
the other does have.

Time Period Relationships Along a Common Timeline
    Intersects
        Fills
            Equals
            Occupies
                Aligns
                    Starts
                    Finishes
                During
        Overlaps
    Excludes
        Before
        Meets
Figure 3.4 The Asserted Versioning Allen Relationship Taxonomy.

If two time periods [exclude] one another, then they do not
share any clock ticks, and they are either non-contiguous or
contiguous. If there is at least one clock tick between them, they are
non-contiguous and we say that one is [before] the other.
Otherwise they are contiguous and we say that one [meets] the other.
If one time period [fills] the other, then either they are [equal],
or one [occupies] the other. If they are [equal], then neither has a
clock tick that the other does not have. If one [occupies] the
other, then all the clock ticks in the occupying time period are
also in the occupied time period, but not vice versa.
If one time period [occupies] the other, then either they share
an [aligns] relationship, or one occurs [during] the other. If they
are aligned, then they either start on the same clock tick or end
on the same clock tick, and we say that one either [starts] or
[finishes] the other. Otherwise, one occurs [during] the other,
beginning after the other and ending before it. Note that if two
time periods are aligned, one cannot both [start] and [finish]
the other because if it did, it would be [equal] to the other.
If one time period [starts] another, they both begin on the
same clock tick. If one [finishes] the other, they both end on
the same clock tick. If one time period [occupies] another, but
they are not aligned, then one occurs [during] the other.
Now let’s consider the special case in which one of the two
time periods is a point in time, i.e. is exactly one clock tick in
length, and the other one contains two or more clock ticks. This
point in time may either [intersect] or [exclude] the time period.
If the point in time [intersects] the time period, it also [fills] and
[occupies] that time period. If it [aligns] with the time period,
then it either [starts] the time period or [finishes] it. Otherwise,
the point in time occurs [during] the time period. If the point
in time [excludes] the time period, then either may be [before]
the other, or they may [meet].
Finally, let’s consider one more special case, that in which both
the time periods are points in time. Those two points in time may
be [equal], or one may be [before] the other, or they may [meet].
There are no other Allen relationships possible for them.
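
The binary taxonomy described above can be walked mechanically. The following Python function is a minimal sketch under stated assumptions: a time period is a pair of integer clock ticks that includes its begin tick and excludes its end tick, so a point in time is a period exactly one tick long. The function name allen_path and this representation are illustrative choices, not part of Asserted Versioning. The function ignores which period is named first and reports the path from the root of the taxonomy to the leaf, so inverse relationships are not distinguished.

```python
from typing import List, Tuple

Period = Tuple[int, int]  # (begin, end): begin tick included, end tick excluded


def allen_path(p1: Period, p2: Period) -> List[str]:
    """Return the taxonomy path, root to leaf, relating two time periods."""
    (b1, e1), (b2, e2) = p1, p2
    if b1 >= e1 or b2 >= e2:
        raise ValueError("a time period must contain at least one clock tick")

    # [excludes]: the two periods have no clock tick in common.
    if e1 <= b2 or e2 <= b1:
        gap = max(b1, b2) - min(e1, e2)  # clock ticks between the two periods
        return ["excludes", "before" if gap > 0 else "meets"]

    path = ["intersects"]
    # [fills]: every clock tick of one period is also in the other.
    one_fills_other = (b2 <= b1 and e1 <= e2) or (b1 <= b2 and e2 <= e1)
    if not one_fills_other:
        return path + ["overlaps"]

    path.append("fills")
    if (b1, e1) == (b2, e2):
        return path + ["equals"]

    path.append("occupies")
    if b1 == b2:
        return path + ["aligns", "starts"]
    if e1 == e2:
        return path + ["aligns", "finishes"]
    return path + ["during"]


point_in_period = allen_path((4, 5), (1, 8))
adjacent_points = allen_path((4, 5), (5, 6))
```

Here point_in_period is the path intersects, fills, occupies, during, and adjacent_points is the path excludes, meets, which agrees with the two special cases for points in time.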
As we will see later, four of these Allen relationship categories
are especially important. They will be discussed in later
chapters, but we choose to mention them here.
(i) The [intersects] relationship is important because for a
temporal insert transaction to be valid, its effective time period
cannot intersect that of any episode for the same object
which is already in the target table. By the same token, for
a temporal update or delete transaction to be valid, the
target table must already contain at least one episode for the
same object whose effective time period does [intersect]
the time period designated by the transaction.
(ii) The [fills] relationship is important because violations of
the temporal analog of referential integrity always involve
the failure of a child time period to [fill] a parent time
period. We will be frequently discussing this relationship
from the parent side, and we would like to avoid having to
say things like “... failure of a parent time period to be
filled by a child time period”. So we will use the term
“includes” as a synonym for “is filled by”, i.e. as a synonym
for [fills⁻¹]. Now we can say “... failure of a parent time
period to include a child time period”.
(iii) The [before] relationship is important because it
distinguishes episodes from one another. Every episode of
an object is non-contiguous with every other episode of
the same object, and so for each pair of them, one of them
must be [before] the other.
(iv) The [meets] relationship is important because it groups
versions for the same object into episodes. A series of
versions for the same object that are all contiguous, i.e. that all
[meet], fall within the same episode of that object.
Advanced Indexing Strategies
Indexes are one way to improve performance. And it should
be clear that it would be a serious performance handicap if we
could not define indexes over either or both of the two time
periods of a bi-temporal table. But this proves to be more complex
than it might at first sight appear to be.
The issue is that traditional indexes contain pointers to rows,
pointers which are based on discrete values, while the two time
periods of rows in bi-temporal tables are not discrete values,
but rather an unbroken and non-overlapping sequence of such
values. Such rows occupy points in effective (valid) time or in
assertion (transaction) time only as a limit case. What they really
occupy are intervals along those two timelines. That’s the reason
we need two dates to describe each of them. Traditional
balanced-tree indexes work well with discrete values, including
such discrete values as dates. But they don’t work well with
intervals of time, i.e. with time periods.
But indexing methods which manage intervals are being
developed. Specifically, some bi-temporal indexing methods
manage the two intervals for a bi-temporal object as a single
unit, which would appear as a rectangle on a Cartesian graph in
which one temporal dimension is represented by the X-axis and
the other by the Y-axis.
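
To make the rectangle picture concrete, here is a minimal Python sketch of the geometry only. It is not an index and not any particular DBMS feature: the class name, field names and sample values are assumptions, with effective time on one axis and assertion time on the other, and each interval including its begin tick and excluding its end tick.

```python
from typing import NamedTuple


class BiTemporalRectangle(NamedTuple):
    """One row's extent: [eff_beg, eff_end) by [asr_beg, asr_end)."""
    eff_beg: int
    eff_end: int
    asr_beg: int
    asr_end: int

    def covers(self, eff_tick: int, asr_tick: int) -> bool:
        """True if the row is in effect at eff_tick, as asserted at asr_tick."""
        return (self.eff_beg <= eff_tick < self.eff_end
                and self.asr_beg <= asr_tick < self.asr_end)

    def intersects(self, other: "BiTemporalRectangle") -> bool:
        """True if the two rectangles share at least one point."""
        return (self.eff_beg < other.eff_end and other.eff_beg < self.eff_end
                and self.asr_beg < other.asr_end and other.asr_beg < self.asr_end)


row = BiTemporalRectangle(eff_beg=10, eff_end=20, asr_beg=5, asr_end=9999)
query_window = BiTemporalRectangle(eff_beg=18, eff_end=25, asr_beg=0, asr_end=6)
hit = row.intersects(query_window)
```

A query for one effective date as of one assertion date is then a point-in-rectangle test, and a query over ranges in both dimensions is a rectangle intersection test, which is the kind of search an interval-aware index would have to support.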
Another approach is to manage each of the two temporal
dimensions separately. One reason for taking this approach is that,
for the standard temporal model, the two temporal dimensions
behave differently. Specifically, for the standard model,
transaction time always moves forwards, whereas valid time can move
forwards or backwards. This means that a bi-temporal row can
be inserted into a table proactively in valid time, but can never
be inserted into a table proactively in transaction time.
Asserted Versioning, as we have already pointed out, supports
both forwards and backwards movement in both temporal
dimensions. So for Asserted Versioning, there is no difference
in behavior which would justify separating the two temporal
dimensions for indexing purposes. Specifically, Asserted
Versioning supports both proactive (future-dated) versions and
proactive assertions (i.e. deferred assertions) and also both
retroactive versions and an approval transaction which can move
deferred assertions backwards in time, but not prior to Now().
In Chapter 15, we will describe certain indexing strategies that
will improve performance using today’s DBMS index designs.
Temporal Extensions to SQL
Following [2000, Snodgrass], we will refer to a future release
of the SQL language that will contain temporal extensions as
SQL3. A more detailed discussion may be found in that book,
although we should note that the book is, at the time of
publication of this book, 10 years old.
Temporal Upward Compatibility
One issue related to changes in the SQL standard is temporal
upward compatibility. In describing SQL3, Snodgrass states that
“(t)emporal upward compatibility at its core says this: ‘Take an
application that doesn’t involve time, that concerns only the
current reality ... Alter one or more of the tables so that they now
have temporal support ... The application should run as
before, without changing a single line of code’” [2000, Snodgrass,
p. 449].
This cannot be an objective for Asserted Versioning, because
we are limited to current SQL, not to a future release of SQL that
builds temporal extensions into the language itself. But we can
come close. We can achieve this objective for queries by using
a view which filters out all but current data, and by redirecting