Managing Time in Relational Databases, Part 4

approach is better at tracking changes to persistent objects and
to relationships other than metric balances.
State Temporal Data: Uni-Temporal and Bi-Temporal Data
At this point in our discussion, we are concerned with state
data rather than with event data, and with state data that is
queryable rather than state data that needs to be reconstructed.
What then are the various options for managing temporal
queryable state data?
First of all, we need to recognize that there are two kinds of
states to manage. One is the state of the things we are interested
in, the states those things pass through as they change over time.
But there is another kind of state, that being the state of the data
itself. Data, such as rows in tables, can be in one of two states:
correct or incorrect. (As we will see in Chapter 12, it can also
be in a third state, one in which it is neither correct nor
incorrect.) Version tables and assertion tables record, respectively,
the state of objects and the state of our data about those objects.
Uni-Temporal State Data
In a conventional Customer table, each row represents the current
state of a customer. Each time the state of a customer changes,
i.e. each time a row is updated, the old data is overwritten with
the new data. Adding one (or sometimes two) dates or timestamps to
the primary key of the table makes it a uni-temporal table. But
since we already know that there are two different temporal
dimensions that can be associated with data, we know to ask
“What kind of uni-temporal table?”
As we saw in the Preface, there are uni-temporal version
tables and uni-temporal assertion tables. Version tables keep
track of changes that happen in the real world, changes to the
objects represented in those tables. Each change is recorded as
a new version of an object. Assertion tables keep track of
corrections we have made to data we later discovered to be in error.
Each correction is recorded as a new assertion about the object.
The versions make up a true history of what happened to those
objects. The assertions make up a virtual logfile of corrections
to the data in the table.
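The difference between the two kinds of uni-temporal table can be sketched as follows. This is only an illustration under our own assumptions: the table and column names (customer_version, customer_assertion, eff_begin, asr_begin and so on) are hypothetical, SQLite is used through Python's standard sqlite3 module purely for convenience, periods are treated as including their begin date and excluding their end date, and '9999-12-31' stands in for "until further notice".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Version table: one row per state the customer was in, in the real world.
-- The begin date added to the primary key makes the table uni-temporal.
CREATE TABLE customer_version (
    cust_id   TEXT NOT NULL,
    eff_begin TEXT NOT NULL,   -- when this state began to be in effect
    eff_end   TEXT NOT NULL,   -- when it ceased to be in effect (exclusive)
    name      TEXT NOT NULL,
    PRIMARY KEY (cust_id, eff_begin)
);
-- Assertion table: one row per claim we made about the customer.
CREATE TABLE customer_assertion (
    cust_id   TEXT NOT NULL,
    asr_begin TEXT NOT NULL,   -- when we began to assert this row
    asr_end   TEXT NOT NULL,   -- when we withdrew the assertion (exclusive)
    name      TEXT NOT NULL,
    PRIMARY KEY (cust_id, asr_begin)
);
""")

# A real change: the customer's name changed on 2010-06-01 (two versions).
conn.executemany("INSERT INTO customer_version VALUES (?, ?, ?, ?)", [
    ("C1", "2009-01-01", "2010-06-01", "Smith"),
    ("C1", "2010-06-01", "9999-12-31", "Jones"),
])
# A correction: the name was misspelled until we fixed it on 2009-03-15
# (two assertions about the same, unchanged customer).
conn.executemany("INSERT INTO customer_assertion VALUES (?, ?, ?, ?)", [
    ("C1", "2009-01-01", "2009-03-15", "Smyth"),
    ("C1", "2009-03-15", "9999-12-31", "Smith"),
])

def name_in_effect(cust_id, as_of):
    """What was the customer's name in the real world on the given date?"""
    row = conn.execute(
        "SELECT name FROM customer_version "
        "WHERE cust_id = ? AND eff_begin <= ? AND ? < eff_end",
        (cust_id, as_of, as_of)).fetchone()
    return row[0] if row else None

def name_asserted(cust_id, as_of):
    """What did our data say the customer's name was on the given date?"""
    row = conn.execute(
        "SELECT name FROM customer_assertion "
        "WHERE cust_id = ? AND asr_begin <= ? AND ? < asr_end",
        (cust_id, as_of, as_of)).fetchone()
    return row[0] if row else None
```

The two tables have the same shape, which is why they are easy to confuse. The first function answers a question about the customer; the second answers a question about our data.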
Usually, when table-level temporal data is discussed, the
tables turn out to be version tables, not assertion tables. In their
book describing the alternative temporal model [2002, Date,
Darwen, Lorentzos], the authors focus on uni-temporal
versioned data. Bi-temporality is not even alluded to until the
Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS
penultimate chapter, at which point it is suggested that “logged
time history” tables be used to manage the other temporal
dimension. Since bi-temporality receives only a passing mention
in that book, we choose to classify the alternative temporal
model as a uni-temporal model.
In IT best practices for managing temporal data, which we
will discuss in detail in Chapter 4, once again the temporal
tables are version tables, and error correction is an issue that is
mostly left to take care of itself.[4] For the most part, it does so
by overwriting incorrect data.[5] This is why we classify IT best
practices as uni-temporal models.
The Alternative Temporal Model
What we call the alternative temporal model was developed
by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their
book Temporal Data and the Relational Model (Morgan Kaufmann,
2002).[6] This model is based in large part on techniques
developed by Dr. Lorentzos to manage temporal data by breaking
temporal durations down into temporally atomic components,
applying various transformations to those components, and then
re-assembling the components back into those temporal durations.
As the authors note, the applicability of this technique is not
restricted to temporal data.
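The break-down, transform and re-assemble technique described above can be illustrated with a small sketch. It is not the notation used in that book. The function names unpack and pack, the one-day granularity, and the convention that a period includes its begin date and excludes its end date are all assumptions made for this example.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)  # assumed granularity of a temporally atomic component

def unpack(periods):
    """Break each (begin, end) period, end exclusive, into its individual days."""
    days = set()
    for begin, end in periods:
        d = begin
        while d < end:
            days.add(d)
            d += ONE_DAY
    return days

def pack(days):
    """Re-assemble a set of days into maximal (begin, end) periods."""
    periods = []
    for d in sorted(days):
        if periods and periods[-1][1] == d:
            # This day continues the period being built: extend it by one day.
            periods[-1] = (periods[-1][0], d + ONE_DAY)
        else:
            periods.append((d, d + ONE_DAY))
    return periods

a = [(date(2010, 1, 1), date(2010, 1, 10)), (date(2010, 1, 8), date(2010, 1, 15))]
b = [(date(2010, 1, 5), date(2010, 1, 7))]

# Transformation 1: coalesce overlapping periods.
merged = pack(unpack(a))
# Transformation 2: remove the days covered by b, using plain set difference.
difference = pack(unpack(a) - unpack(b))
```

Once the periods are reduced to atomic components, the transformations themselves are ordinary set operations. All of the temporal reasoning is confined to the two conversion steps.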
As we said, except for the penultimate chapter in that book,
the entire book is a discussion of uni-temporal versioned tables.
In that chapter, the authors recommend that if there is a
requirement to keep track of the assertion time history of a table
(which they call “logged-time history”), it be implemented by
means of an auxiliary table which is maintained by the DBMS.
[4] Lacking criteria to distinguish the best from the rest, the term “best practices” has
come to mean little more than “standard practices”. What we call “best practices”, and
which we discuss in Chapter 4, are standard practices we have seen used by many of
our clients.
[5] An even worse solution is to mix up versions and assertions by creating a new row,
with a begin date of Now(), both every time there is a real change, and also every time
there is an error in the data to correct. When that happens, we no longer have a history
of the changes things went through, because we cannot distinguish versions from
corrections. And we no longer have a “virtual logfile” of corrections because we don’t
know how far back the corrections should actually have taken effect.
[6] The word “model”, as used here and also in the phrases “alternative model” and
“Asserted Versioning model”, obviously doesn’t refer to a data model of specific subject
matter. It means something like theory, but with an emphasis on its applicability to
real-world problems. So “the relational model”, as we use the term, for example,
means something like “relational theory as implemented in current relational
technology”.
In addition, these authors do not attempt, in their book, to
explain how this method of managing temporal data would work
with current relational technology. Like much of the computer
science research on temporal data, they allude to SQL operators
and other constructs that do not yet exist, and so their book is in
large part a recommendation to the standards committees to
adopt the changes to the SQL language which they describe.
Because our own concern is with how to implement temporal
concepts with today’s technologies, and also with how to
support both kinds of uni-temporal data, as well as fully bi-temporal
data, we will have little more to say about the alternative
temporal model in this book.
Best Practices
Over several decades, a best practice has emerged in managing
temporal queryable state data. It is to manage this kind of
data by versioning otherwise conventional tables. The result is
versioned tables which, logically speaking, are tables which
combine the history tables and current tables described previously.
Past, present and future states of customers, for example, are
kept in one and the same Customer table. Corrections may or
may not be flagged; but if they are not, it will be impossible to
distinguish versions created because something about a customer
changed from versions created because past customer
data was entered incorrectly. On the other hand, if they are
flagged, the management and use of these flags will quickly
become difficult and confusing.
There are many variations on the theme of versioning, which
we have grouped into four major categories. We will discuss
them in Chapter 4.
The IT community has always used the term “version” for this
kind of uni-temporal data. And this terminology seems to reflect
an awareness of an important concept that, as we shall see, is
central to the Asserted Versioning approach to temporal data. For the
term “version” naturally raises the question “A version of what?”, to
which our answer is “A version of anything that can persist and
change over time”. This is the concept of a persistent object, and
it is, most fundamentally, what Asserted Versioning is about.
Bi-Temporal State Data
We now come to our second option, which is to manage
both versions and assertions and, most importantly, their
interdependencies. This is bi-temporal data management, the
subject of both Dr. Rick Snodgrass’s book [2000, Snodgrass] and
of our book.
The Standard Temporal Model
What we call the standard temporal model was developed
by Dr. Rick Snodgrass in his book Developing Time-Oriented
Database Applications in SQL (Morgan Kaufmann, 2000).
Based on the computer science work current at that time,
and especially on the work Dr. Snodgrass and others had done
on the TSQL (temporal SQL) proposal to the SQL standards
committees, it shows how to implement both uni-temporal
and bi-temporal data management using then-current DBMSs
and then-current SQL.
We emphasize that, as we are writing, Dr. Snodgrass’s book is
a decade old. We use it as our baseline view of computer science
work on bi-temporal data because most of the computer science
literature exists in the form of articles in scientific journals that
are not readily accessible to many IT professionals. We also
emphasize that Dr. Snodgrass did not write that book as a
compendium of computer science research for an IT audience.
Instead, he wrote it as a description of how some of that research
could be adapted to provide a means of managing bi-temporal
data with the SQL and the DBMSs available at that time.
One of the greatest strengths of the standard model is that it
discusses and illustrates both the maintenance and the querying
of temporal data at the level of SQL statements. For example, it
shows us the kind of code that is needed to apply the temporal
analogues of entity integrity and referential integrity to temporal
data. And for any readers who might think that temporal data
management is just a small step beyond the versioning they
are already familiar with, many of the constraint-checking
SQL statements shown in Dr. Snodgrass’s book should suffice
to disabuse them of that notion.
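To give a feel for the kind of constraint-checking SQL involved, here is a minimal sketch of one such check. It is not taken from Dr. Snodgrass's book. It assumes that the temporal analogue of entity integrity can be read as "no two versions of the same object may overlap in time", and it uses a hypothetical customer_version table, closed-open periods, and SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_version (
    cust_id   TEXT NOT NULL,
    eff_begin TEXT NOT NULL,   -- inclusive
    eff_end   TEXT NOT NULL,   -- exclusive
    status    TEXT NOT NULL,
    PRIMARY KEY (cust_id, eff_begin)
)""")
conn.executemany("INSERT INTO customer_version VALUES (?, ?, ?, ?)", [
    ("C1", "2010-01-01", "2010-07-01", "standard"),
    ("C1", "2010-07-01", "9999-12-31", "gold"),      # starts where the last ends: fine
    ("C2", "2010-01-01", "2010-09-01", "standard"),
    ("C2", "2010-06-01", "9999-12-31", "gold"),      # starts before the last ends: overlap
])

# The conventional primary key does not catch the C2 problem, because the two
# C2 rows have different begin dates. Finding it takes a self-join that
# compares the periods of every pair of versions of the same customer.
OVERLAP_CHECK = """
SELECT a.cust_id, a.eff_begin, b.eff_begin
FROM customer_version a
JOIN customer_version b
  ON a.cust_id = b.cust_id
 AND a.eff_begin < b.eff_begin     -- consider each pair once, earlier row first
 AND b.eff_begin < a.eff_end       -- later row begins before earlier row ends
ORDER BY a.cust_id, a.eff_begin
"""
violations = conn.execute(OVERLAP_CHECK).fetchall()
```

Even this single check needs a self-join, and it only detects violations after the fact. Preventing them on every insert, update and delete, and doing the same for the temporal analogue of referential integrity, takes considerably more code.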
The Asserted Versioning Temporal Model
What we call the Asserted Versioning temporal model is our
own approach to managing temporal data. Like the standard
model, it attempts to manage temporal data with current
technology and current SQL.
The Asserted Versioning model of uni-temporal and bi-temporal
data management supports all of the functionality of the
standard model. In addition, it extends the standard model’s
notion of transaction time by permitting data to be physically
added to a table prior to the time when that data will appear
in the table as production data, available for use. This is done
by means of deferred transactions, which result in deferred
assertions, those being the inserted, updated or logically deleted
rows resulting from those transactions.[7] Deferred assertions,
although physically co-located in the same tables as other data,
will not be immediately available to normal queries. But once
time in the real world reaches the beginning of their assertion
periods, they will, by that very fact, become currently asserted
data, part of the production data that makes up the database
as it is perceived by its users.
We emphasize that deferred assertions are not the same thing
as rows describing what things will be like at some time in the
future. Those latter rows are current claims about what things
will be like in the future. They are ontologically post-dated.
Deferred assertions are rows describing what things were, are,
or will be like, but rows which we are not yet willing to claim
make true statements. They are epistemologically post-dated.
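The distinction between the two kinds of post-dating can be made concrete with a small sketch. The table name customer_av, its column names, and the closed-open period convention are assumptions made for this illustration only; they are not the schema that Asserted Versioning actually uses, which is introduced later in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_av (
    cust_id   TEXT NOT NULL,
    eff_begin TEXT NOT NULL,   -- when the described state is in effect
    eff_end   TEXT NOT NULL,
    asr_begin TEXT NOT NULL,   -- when we are willing to claim the row is true
    asr_end   TEXT NOT NULL,
    status    TEXT NOT NULL,
    PRIMARY KEY (cust_id, eff_begin, asr_begin)
)""")
conn.executemany("INSERT INTO customer_av VALUES (?, ?, ?, ?, ?, ?)", [
    # Asserted since 2010-01-01, describing the present.
    ("C1", "2010-01-01", "2011-01-01", "2010-01-01", "9999-12-31", "standard"),
    # Asserted since 2010-01-01, describing the future: ontologically post-dated.
    ("C1", "2011-01-01", "9999-12-31", "2010-01-01", "9999-12-31", "gold"),
    # Describes the present, but not asserted until 2010-09-01:
    # a deferred assertion, epistemologically post-dated.
    ("C2", "2010-01-01", "9999-12-31", "2010-09-01", "9999-12-31", "standard"),
])

def currently_asserted(now):
    """Rows a normal query would see at the moment `now`."""
    return conn.execute(
        "SELECT cust_id, eff_begin, status FROM customer_av "
        "WHERE asr_begin <= ? AND ? < asr_end "
        "ORDER BY cust_id, eff_begin",
        (now, now)).fetchall()

seen_in_june = currently_asserted("2010-06-01")
seen_in_september = currently_asserted("2010-09-01")
```

In June the future-dated C1 row is already visible, because we already claim it is true. The C2 row is physically present but invisible, and it becomes visible in September without any further update to the table.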
Another way that Asserted Versioning differs from the standard
temporal model is in the encapsulation and simplification
of integrity constraints. The encapsulation of integrity
constraints is made possible by distinguishing temporal transactions
from physical transactions. Temporal transactions are the ones
that users write. The corresponding physical transactions are
what the DBMS applies to asserted version tables. The Asserted
Versioning Framework (AVF) uses an API to accept temporal
transactions. Once it validates them, the AVF translates each
temporal transaction into one or more physical transactions.
By means of triggers generated from a combination of a logical
data model together with supplementary metadata, the AVF
enforces temporal semantic constraints as it submits physical
transactions to the DBMS.
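The idea of one temporal transaction becoming several physical transactions can be sketched as follows. This is not the AVF's API, which the text has not yet described. The Version class, the translate_update function and the tuple format of the physical operations are all hypothetical, and the sketch handles only the version (effective time) dimension, ignoring assertion time entirely.

```python
from dataclasses import dataclass

FOREVER = "9999-12-31"  # assumed stand-in for "until further notice"

@dataclass(frozen=True)
class Version:
    obj_id: str
    eff_begin: str   # inclusive
    eff_end: str     # exclusive
    data: str

def translate_update(versions, obj_id, eff_from, new_data):
    """Translate one temporal update ("from eff_from onward, the data is
    new_data") into a list of physical operations on version rows."""
    ops = []
    for v in versions:
        if v.obj_id != obj_id or v.eff_end <= eff_from:
            continue  # a different object, or a version wholly before the update
        if v.eff_begin < eff_from:
            # The version straddles eff_from: shorten it, then add a new
            # version carrying the new data for the remainder of its period.
            ops.append(("UPDATE", v, {"eff_end": eff_from}))
            ops.append(("INSERT", Version(obj_id, eff_from, v.eff_end, new_data)))
        else:
            # The version lies wholly within the update's range.
            ops.append(("UPDATE", v, {"data": new_data}))
    if not ops:
        # Validation: there is nothing in effect for this object to update.
        raise ValueError("temporal update rejected: no version in effect")
    return ops

versions = [Version("C1", "2010-01-01", FOREVER, "standard")]
ops = translate_update(versions, "C1", "2010-06-01", "gold")
```

Here the user writes a single update, and two physical operations result: one that ends the existing version on 2010-06-01 and one that inserts its successor. The user never has to work out that split.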
[7] The term “deferred transaction” was suggested by Dr. Snodgrass during a series of
email exchanges which the authors had with him in the summer of 2008.
The simplification of these integrity constraints is made
possible by introducing the concept of an episode. With non-temporal
tables, a row representing an object can be inserted into that table
at some point in time, and later deleted from the table. After it is
deleted, of course, that table no longer contains the information
that the row was ever present. Corresponding to the period of
time during which that row existed in that non-temporal table,
there would be an episode in an asserted version table, consisting
of one or more temporally contiguous rows for the same object.
So an episode of an object in an asserted version table is in effect
during exactly the period of time that a row for that object would
exist in a non-temporal table. And just as a deletion in a
conventional table can sometime later be followed by the insertion of a
new row with the same primary key, the termination of an
episode in an asserted version table can sometime later be
followed by the insertion of a new episode for the same object.
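The notion of an episode as a run of temporally contiguous versions can be sketched in a few lines. The tuple layout (object id, begin, end), the closed-open period convention, and the function name episodes are assumptions made for this example.

```python
from itertools import groupby

def episodes(versions):
    """Collapse (obj_id, eff_begin, eff_end) versions into episodes: maximal
    runs of versions of the same object in which each version begins exactly
    where the previous one ends."""
    result = []
    ordered = sorted(versions)  # by object, then by begin date
    for obj_id, group in groupby(ordered, key=lambda v: v[0]):
        current = None
        for _, begin, end in group:
            if current is not None and current[2] == begin:
                # Contiguous with the episode being built: extend it.
                current = (obj_id, current[1], end)
            else:
                # A gap (or the first version): close the old episode, open a new one.
                if current is not None:
                    result.append(current)
                current = (obj_id, begin, end)
        result.append(current)
    return result

versions = [
    ("C1", "2010-01-01", "2010-04-01"),
    ("C1", "2010-04-01", "2010-08-01"),   # contiguous: same episode
    ("C1", "2011-01-01", "9999-12-31"),   # after a gap: a new episode
    ("C2", "2010-02-01", "2010-03-01"),
]
found = episodes(versions)
```

Customer C1 has two episodes, which matches what a non-temporal table would have experienced: a row inserted in January 2010, deleted in August 2010, and re-inserted with the same primary key in January 2011.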
In a non-temporal table, each row must conform to entity
integrity and referential integrity constraints. In an asserted
version table, each version must conform to temporal entity
integrity and temporal referential integrity constraints. As we will
see, the parallels are in more than name only. Temporal entity
integrity really is entity integrity applied to temporal data.
Temporal referential integrity really is referential integrity applied to
temporal data.
Glossary References
Glossary entries whose definitions form strong
interdependencies are grouped together in the following list. The
same glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be
several other, and often many other, glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfamiliar term is encountered.
as-is
as-was
Asserted Versioning
Asserted Versioning Framework (AVF)
episode
persistent object
state
thing
physical transaction
temporal transaction
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the Asserted Versioning temporal model
the standard temporal model
PART 2
AN INTRODUCTION TO ASSERTED VERSIONING
Chapter Contents
3. The Origins of Asserted Versioning: Computer Science Research
4. The Origins of Asserted Versioning: The Best Practices
5. The Core Concepts of Asserted Versioning
6. Diagrams and Other Notations
7. The Basic Scenario
Part 1 provided the context for Asserted Versioning, a history
and a taxonomy of various ways in which temporal data has
been managed over the last several decades. Here in Part 2, we
introduce Asserted Versioning itself and prepare the way for
the detailed discussion in Part 3 of how Asserted Versioning
actually works.
In Chapter 3, we discuss the origins of Asserted Versioning in
computer science research. Based on the work of computer
scientists, we introduce the concepts of a clock tick and an
atomic clock tick, the latter of which, in their terminology, is
called a chronon. We go on to discuss the various ways in which
time periods are represented by pairs of dates or of timestamps,
since SQL does not directly support the concept of a time period.
There are only a finite number of ways that two time periods
can be situated, with respect to one another, along a common
Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00024-8
Copyright © 2010 Elsevier Inc. All rights of reproduction in any form reserved.
timeline. For example, one time period may entirely precede or
entirely follow another, they may partially overlap or be
identical, they may start at different times but end at the same time,
and so on. These different relationships among pairs of time
periods have been identified and catalogued, and are called the
Allen relationships. They will play an important role in our
discussions of Asserted Versioning because there are various
ways in which we will want to compare time periods. With the
Allen relationships as a completeness check, we can make sure
that we have considered all the possibilities.
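As an illustration of how the Allen relationships serve as a completeness check, the following sketch classifies any pair of time periods into exactly one of the thirteen relationships. It assumes each period is a (begin, end) pair with begin earlier than end, where the end point is the first clock tick not in the period, so that two periods "meet" when one ends exactly where the other begins. The function name and the spelling of the relationship names are our own choices for this example.

```python
def allen_relation(a, b):
    """Name the Allen relationship of period a to period b.
    Each period is a (begin, end) pair with begin < end, end exclusive."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # The periods share some but not all clock ticks, and no end points.
    return "overlaps" if a1 < b1 else "overlapped-by"

# Every pair of periods that can be drawn on a four-tick timeline.
periods = [(i, j) for i in range(4) for j in range(i + 1, 4)]
relations_seen = {allen_relation(p, q) for p in periods for q in periods}
```

Because every branch returns a name and the branches are mutually exclusive, any comparison logic written against periods can be tested one relationship at a time, with confidence that no case has been left out.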
Another important section of this chapter discusses the
difference between the computer science notion of transaction
time, and our own notion of assertion time. This difference is
based on our development of the concepts of deferred
transactions and deferred assertions, and on their subsumption under
the more general concept of a pipeline dataset.
In Chapter 4, we discuss the origins of Asserted Versioning in
IT best practices, specifically those related to versioning. We
believe that these practices are variations on four basic methods
of versioning data. In this chapter, we present each of these
methods by means of examples which include sample tables
and a running commentary on how inserts, updates and deletes
affect the data in those tables.
In Chapter 5, we present the conceptual foundations of
Asserted Versioning. The core concepts of objects, episodes,
versions and assertions are defined, a discussion which leads us to
the fundamental statement of Asserted Versioning, that every
row in an asserted version table is the assertion of a version of
an episode of an object. We continue on to discuss how time
periods are represented in asserted version tables, how temporal
entity integrity and temporal referential integrity enforce the core
semantics of Asserted Versioning, and finally how Asserted
Versioning internalizes the complexities of temporal data
management.
In Chapter 6, we introduce the schema common to all
asserted version tables, as well as various diagrams and notations
that will be used in the rest of the book. We also introduce the
topic of how Asserted Versioning supports the dynamic views
that hide the complexities of that schema from query authors
who would otherwise likely be confused by that complexity.
When an object is represented by a row in a non-temporal
table, the sequence of events begins with the insertion of that
row, continues with zero or more updates, and either continues
on with no further activity, or ends when the row is eventually
deleted. When an object is represented in an asserted version