Object-oriented Databases

Lecture Notes: Scientific Databases, Prof. Gaston Gonnet (2012)

Stephanie Fingerhuth and Thomas Tschager
Object-oriented databases (OODBs) can be viewed as an extension
of relational databases (RDBs): the attributes of the database can be
objects which are defined in an object-oriented language (OOL). In contrast to RDBs, OODBs are thus not completely isolated and portable,
but are tightly connected to their OOL. Objectivity (C++, C#, Java,
Python, Smalltalk and XML), ObjectStore (C++, Java, .NET), and O2
(C++) are the most prominent products. The field of OODBs is in general much smaller than the RDB field.

OODBs vs. RDBs
OODBs are data management systems that store data in tuples of
attributes organized in relations (Figure 1). They are thus similar
to RDBs (previous lecture), but there are also major differences. The
first distinctive difference is that the attributes of an OODB can be
objects, i.e. an attribute need not contain a single entry, but can consist
of a collection of bits (see example 1 and the section on objects). Objects are
defined in an OOL like Python, Java, C++ or a language unique to
the database.
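
GRADES
LectureName | ECTS | S-ID | Grade
SDB | 4 | 001 | 5.75
SDB | 4 | 002 | 5.5
... | ... | ... | ...

STUDENT
S-ID | S-Name | S-Birth
001 | Alice | <date::01>
002 | Bob | <date::02>
... | ... | ...

Relationship: GRADES n:1 STUDENT (via S-ID)

DATE (object type of S-Birth): Day, Month, Year, Age()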

Example 1 (OODBs are RDBs where attributes can be objects). We
want to store the name and age of all students. In RDBs, the age of a student is stored, whereas in OODBs it can be computed by a complex object storing only the birthday and a method to compute the age. Using
an RDB, the table Student contains the tuple Student.S-ID (integer),
Student.S-Name (string), Student.S-Birth (date), and Student.Age
(integer). Using an OODB, the table Student contains the tuple Student.S-ID
(integer), Student.S-Name (string), and Student.S-Birth, where
Student.S-Birth is an object storing the integers Day, Month, Year and
the method Age() (Figure 1).
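
The object-valued attribute of example 1 can be sketched in Python, one of the OOLs mentioned above. This is only an illustration under assumptions: the class and field names mirror Figure 1, while Alice's birthday and the reference date are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Date:
    """Sketch of the DATE object: stores Day, Month, Year; the age is computed."""
    day: int
    month: int
    year: int

    def age(self, today: Optional[date] = None) -> int:
        # The age is never stored; it is derived from the birthday on each call.
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.month, self.day)
        return today.year - self.year - (0 if had_birthday else 1)


@dataclass
class Student:
    s_id: int
    s_name: str
    s_birth: Date  # an object-valued attribute instead of a stored Age column


# Made-up sample row; the birthday is not taken from the lecture notes.
alice = Student(1, "Alice", Date(day=14, month=3, year=1990))
print(alice.s_birth.age(today=date(2012, 10, 18)))
```

Because only the birthday is stored, the value returned by `age()` stays correct as time passes, which a stored integer column would not.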

Figure 1: OODBs store data in tuples
(rows) of attributes (columns) organized
in relations (tables). Unlike RDBs, the
attributes can be objects defined by the
underlying OOL. In the example DATE
is an object that stores the integers Day,
Month and Year as well as the method
Age().



A second distinction from RDBs is that there exists no established
standard. The Object Data Management Group (ODMG, http://www.odbms.org/ODMG/) defined the Object Data Management Standard
ODMG 3.0 (2000). A major component of this standard is the Object
Query Language (OQL), a non-procedural language similar to SQL
for RDBs. OQL is based on SQL (see the similarity to SQL in example 2);
it supports update and query functionalities. But unlike SQL, OQL
is not an established standard; it has never been fully implemented.
This is mainly because of the tight connection between OODBs and
programming languages: an OODB depends closely on an object-oriented language. This causes unavoidable differences between the
various OODBs.

Example 2. Imagine the university using the database introduced above
wants to find suitable candidates for a scholarship. The criteria that have to
be met are
• grades that are on average better than 5.5 and
• being less than 25 years old.
Suitable candidates can be found by querying the tables Students and
Grades (figure 1) using OQL

    SELECT Student.S-ID, AVG(Grades.grade)
    FROM
        SELECT S-ID, S-Name
        FROM Student
        WHERE Birth.Age < 25
    WHERE Grades.S-ID = Student.S-ID
    GROUP BY Student.S-ID

where Birth.Age is a function computing the age of a student from the
birthday (part of the object that forms the attribute Birth in Students)
and AVG(Grades.grade) computes the average of all grades obtained by
the same student. The SQL statement would look similar, but we would not
be able to make use of a method to compute the age. Therefore, we would
need a more complex SQL statement, e.g. using arithmetic or the function
DATEDIFF.
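
The selection described in example 2 can also be sketched in plain Python, as a rough illustration of what such a query has to compute. All rows, the lecture name "DB2", the fixed reference date and the helper `scholarship_candidates` are assumptions made for this sketch. The filter on the average grade, which the query as printed does not show, is applied explicitly here.

```python
from datetime import date
from statistics import mean

TODAY = date(2012, 10, 18)  # fixed reference date so the result is reproducible


class Birth:
    """Birthday object with an age method, as in Figure 1."""

    def __init__(self, day, month, year):
        self.day, self.month, self.year = day, month, year

    def age(self, today=TODAY):
        before_birthday = (today.month, today.day) < (self.month, self.day)
        return today.year - self.year - (1 if before_birthday else 0)


# Made-up rows: Student(S-ID, S-Name, Birth) and Grades(LectureName, S-ID, Grade)
students = [
    {"s_id": 1, "s_name": "Alice", "birth": Birth(14, 3, 1990)},
    {"s_id": 2, "s_name": "Bob", "birth": Birth(2, 7, 1985)},
]
grades = [
    {"lecture": "SDB", "s_id": 1, "grade": 5.75},
    {"lecture": "DB2", "s_id": 1, "grade": 5.5},
    {"lecture": "SDB", "s_id": 2, "grade": 5.5},
]


def scholarship_candidates(students, grades, max_age=25, min_avg=5.5):
    result = []
    for s in students:
        if s["birth"].age() >= max_age:  # corresponds to WHERE Birth.Age < 25
            continue
        # Join on S-ID and collect this student's grades (the GROUP BY step).
        own = [g["grade"] for g in grades if g["s_id"] == s["s_id"]]
        if own and mean(own) > min_avg:  # average better than 5.5
            result.append((s["s_id"], mean(own)))
    return result


print(scholarship_candidates(students, grades))
```

The age test calls a method of the stored object, which is the step that plain SQL would have to replace by date arithmetic.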

Although OODBs are conceptually ideal for scientific databases
(SDBs), the main disadvantages are:
• not as frequently used as RDBs: fewer tools and libraries and less support
available
• fewer good implementations
• restrictions imposed by use of a specific OOL
• research-your-own is very popular
Furthermore, RDBs are converging towards OODBs, which makes OODBs
more and more obsolete. The convergence is possible through object-relational mapping (ORM). The key idea is to store objects defined
in the object-oriented language in an RDB using a group of attributes,
such that the properties and relationships are conserved. The object
can be restored with all functionalities. Therefore, this mapping
creates a virtual OODB using an RDB. However, this approach has
some conceptual difficulties (object-relational impedance mismatch),
which arise from the different concepts of RDBs (relational algebra)
and OODBs (object orientation).
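
The key idea of ORM can be sketched by hand with Python's standard `sqlite3` module. This is a minimal illustration and not how any particular ORM library works: the table layout, the column names and the helpers `save` and `load` are assumptions of this sketch.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int


@dataclass
class Student:
    s_id: int
    s_name: str
    s_birth: Date


def save(conn, student):
    # The Date object is flattened into a group of plain columns.
    b = student.s_birth
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
        (student.s_id, student.s_name, b.day, b.month, b.year),
    )


def load(conn, s_id):
    # The row is turned back into objects, so their methods are available again.
    row = conn.execute(
        "SELECT s_id, s_name, birth_day, birth_month, birth_year "
        "FROM student WHERE s_id = ?",
        (s_id,),
    ).fetchone()
    return Student(row[0], row[1], Date(row[2], row[3], row[4]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (s_id INTEGER PRIMARY KEY, s_name TEXT, "
    "birth_day INTEGER, birth_month INTEGER, birth_year INTEGER)"
)
original = Student(1, "Alice", Date(14, 3, 1990))  # made-up sample data
save(conn, original)
restored = load(conn, 1)
print(restored == original)
```

The relational table only knows five scalar columns; the knowledge that three of them form one date lives in the mapping code, which is one face of the impedance mismatch.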

Glue language
All parts outside the database are programmed in the glue language.
These parts include, for example, backups, archives, auditing,
computation, validation, output production and filters (see figure 2
"General Picture/Flow of the SDB" from the first lecture). One glue
language is thus always needed to connect an OODB to the outside.
In addition, there should never be more than a single glue language,
as the glue language used should be the OOL the OODB is based on.
Thus we arrive at:
Maxim 3: Oh No! #(glue languages) = 1

Objects in SDBs
Objects in the context of SDBs are named collections of bits which can
include attributes or any other components. They are understood
by the system without additional knowledge. Understanding means
knowing the following:
• the type (rich types, e.g. the object DATE in figure 1)
• the size
• the values
• the validity
Objects are fully described by blocks, selectors and constructors.
Blocks can for example be numbers, strings, object references, or
closed entities we do not desire to look inside (pictures, movies, PDFs
etc.). Constructors, on the other hand, are functions that are called
when an object is initialized. They guarantee, for example, that the
object is valid (see table 1 for an example of validity rules).

Note: Unlike in RDBs, a matrix can be stored
in an OODB in such a way that the
system knows all properties (e.g.
number of rows and columns, the
values and the validity) of the matrix.

Name | Type | Validity rules
Day | integer | 0 < Day ≤ 31; (Month = 2 ∧ Year mod 4 ≠ 0) ⇒ Day ≤ 28
Month | integer | 0 < Month ≤ 12
Year | integer | 0 < Year ≤ today().getYear()

Table 1: The fields of the object date
(see figure 1) with some basic validity
rules. The validity rules are checked by
the constructor and on every update
event.
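
A constructor that enforces such rules, and re-checks them on every update, might look as follows in Python. This is a sketch under assumptions: the class name `ValidatedDate` is hypothetical, the Month bound is read as 0 < Month ≤ 12, and the February rule is read as applying to years that are not divisible by 4 (a simplified leap-year test that ignores century exceptions).

```python
from datetime import date


class ValidatedDate:
    """Date object whose validity rules run in the constructor and on updates."""

    def __init__(self, day, month, year):
        # Validate first, then store the fields without triggering __setattr__.
        self._check(day, month, year)
        object.__setattr__(self, "day", day)
        object.__setattr__(self, "month", month)
        object.__setattr__(self, "year", year)

    @staticmethod
    def _check(day, month, year):
        if not 0 < day <= 31:
            raise ValueError("Day must satisfy 0 < Day <= 31")
        if not 0 < month <= 12:
            raise ValueError("Month must satisfy 0 < Month <= 12")
        if not 0 < year <= date.today().year:
            raise ValueError("Year must satisfy 0 < Year <= current year")
        # Simplified leap-year rule, as assumed in the lead-in.
        if month == 2 and year % 4 != 0 and day > 28:
            raise ValueError("February has at most 28 days in a non-leap year")

    def __setattr__(self, name, value):
        # Every update is checked against the would-be new state before it is applied.
        candidate = {"day": self.day, "month": self.month, "year": self.year}
        candidate[name] = value
        self._check(candidate["day"], candidate["month"], candidate["year"])
        object.__setattr__(self, name, value)


d = ValidatedDate(28, 2, 2011)
try:
    d.day = 29  # rejected: 2011 is not a leap year
except ValueError as err:
    print("rejected:", err)
```

Since an invalid update raises before anything is written, the object can never be observed in an invalid state.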

It is also possible that objects themselves contain objects. These
subordinate objects can be either included or referenced. Here, an
included object is a direct part of the superordinate object, whereas
a referenced object is an object that also exists outside of the superordinate object. Each attribute has a name, a type and validity rules
which are enforced by the object constructor.
Another important feature of objects is that they allow the addition
of arbitrary fields that can also be empty. Thus a value can be added
to the fields of some objects of a class without having to update all of them.
This is a very desirable property for databases, as not every possible
evolution can be foreseen in the process of designing a database. This
is more difficult using RDBs: even though nowadays ORMs allow
for the migration of database schemas, changing the design of the
relations in RDBs remains inconvenient.
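
The addition of arbitrary fields can be illustrated with ordinary Python objects. The field `email` and its value are hypothetical and only serve to show that one object can gain a field while the others are left untouched.

```python
class Student:
    def __init__(self, s_id, s_name):
        self.s_id = s_id
        self.s_name = s_name


alice = Student(1, "Alice")
bob = Student(2, "Bob")

# A field that was not foreseen at design time is added to one object only.
alice.email = "alice@example.org"

# Objects without the field report it as empty instead of failing.
for s in (alice, bob):
    print(s.s_name, getattr(s, "email", None))
```

In an RDB the same change would require altering the relation, so that every row gains the new column.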

Normal Forms

All attributes or objects in an OODB have to fulfil the normal forms (as
known from RDBs, see RDB lecture notes): A → B → C, where →
means functionally dependent. C ↛ B ↛ A is a violation of the normal
forms. The 12 rules of Codd (see Appendix 1) also apply to OODBs if
adapted.
Normal forms (see also previous lecture on RDBs): The normal
forms (NF) were defined in RDB theory in order to avoid anomalies
after insert, update or delete events. The first three normal forms
were formulated by E. F. Codd in the early seventies:
1NF defines the relation property: an attribute has
to contain atomic values.
2NF: No non-prime attribute is dependent on any proper subset of
any candidate key of the table.
3NF: Every non-prime attribute is non-transitively dependent on
every candidate key.

Codd, E.F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13 (6): 377-387, June 1970.
Codd, E.F. Further Normalization of the Data Base Relational Model. IBM Research Report RJ909, August 1971.



Example 3 (Violation of normal forms). The relation GRADES in figure 1
violates the second normal form: The candidate key is the set {LectureName,
S-ID}. The non-prime attribute ECTS depends only on LectureName.
This database design can cause various anomalies: A change of the credit
points for a lecture causes update anomalies if not all corresponding
rows are updated. Moreover, a lecture can only be added once the grade
of at least one student is available.
Figure 2 shows an alternative design, which is in second normal form.
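
GRADES
LectureName | S-ID | Grade
SDB | 001 | 5.75
SDB | 002 | 5.5
... | ... | ...

STUDENT
S-ID | S-Name | S-Birth
001 | Alice | <date::01>
002 | Bob | <date::02>
... | ... | ...

LECTURES
LectureName | ECTS
SDB | 4
... | ...

Relationships: GRADES n:1 STUDENT (via S-ID); LECTURES 1:n GRADES (via LectureName)

DATE (object type of S-Birth): Day, Month, Year, Age()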
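
The update anomaly of the Figure 1 design and its disappearance in the Figure 2 design can be made concrete with a small Python sketch. The dictionaries stand in for the relations; the lecture "NewLecture" and the changed ECTS values are made up for illustration.

```python
# Figure 1 design: ECTS is repeated in every GRADES row (violates 2NF).
grades_fig1 = [
    {"lecture": "SDB", "ects": 4, "s_id": 1, "grade": 5.75},
    {"lecture": "SDB", "ects": 4, "s_id": 2, "grade": 5.5},
]

# Updating only one row leaves the data inconsistent (update anomaly).
grades_fig1[0]["ects"] = 5
ects_values = {row["ects"] for row in grades_fig1 if row["lecture"] == "SDB"}
print(ects_values)  # the same lecture now carries two different ECTS values

# Figure 2 design: ECTS is stored once in LECTURES, keyed by LectureName.
lectures = {"SDB": {"ects": 4}}
grades_fig2 = [
    {"lecture": "SDB", "s_id": 1, "grade": 5.75},
    {"lecture": "SDB", "s_id": 2, "grade": 5.5},
]

lectures["SDB"]["ects"] = 5           # one update, seen by all grade rows
lectures["NewLecture"] = {"ects": 6}  # hypothetical lecture, no grade needed yet


def ects_for(row):
    # The credit points are looked up through the LectureName reference.
    return lectures[row["lecture"]]["ects"]


print([ects_for(r) for r in grades_fig2])
```

In the second design a lecture can exist without any grade, and its credit points have exactly one place where they can change.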


As mentioned above, objects have names. Names can be
URLs or URIs, and the object names (onames) are used to reference
the object within the DB. The names can for example be composed as
Type:Locator:ID, where Locator is in principle a query, a name (of
a file or database) or a computation (see for example Students.S-Birth
in figure 1).
Finally, objects have selectors, which allow parts of individual objects to be selected. They also supply attribute names or default values if not
defined. Furthermore, selectors can compute results from the objects.

Operations in OODBs
Although operations in OODBs are dependent on the underlying
OOL (see comparison of frequently used OOLs in Appendix 2), they
share some common characteristics of object-oriented languages.
First of all, operators can be polymorphic. Polymorphism is the
notion of using a common operator for various types of inputs. For
example a + b, a · b, a^b, a ∧ b will adapt to the type of operands they
are applied to.
This often means that an operator has different implementations
for various types of inputs, a concept called operator overloading (see
example 4). The implementation used is chosen based on the type of
the inputs, i.e. there are different implementations for adding up two
integers or two matrices; the result will be valid in both cases.

Figure 2: Table Grades is in the second
normal form (in contrast to figure 1), as
the only non-prime attribute depends neither
on LectureName alone nor on S-ID alone.

Example 4 (Operator overloading and computations on the fly). In
Darwin the operator + can be used to add up integers

    > a := 5 + 7;
    a := 12

but also to add random numbers to an existing Stat() data structure

    > c := Stat('one million [0,1] random numbers'):
    > to 1e6 do c + Rand() od:
    > print(c);
    one million [0,1] random numbers: number of samples 1e+06
    mean = 0.49983 +- 0.00057
    variance = 0.08332 +- 0.00015
    skewness=0.00086821, excess=-1.19832
    minimum=1.43307e-06, maximum=0.999997

The statistical information provided by the Stat() structure c is hereby not
stored, but actually computed on the fly. This can be seen if the union of
another Stat() structure e with c is printed:

    > e := Stat('another million [0,1] random numbers'):
    > to 1e6 do e + Rand() od:
    > print (c union e);
    one million [0,1] random numbers and another million [0,1]
    random numbers: number of sample points=2e+06
    mean = 0.50006 +- 0.00040
    variance = 0.08332 +- 0.00010
    skewness=0.000104374, excess=-1.19933
    minimum=3.62886e-07, maximum=1

Here c union e is not the union of the fixed statistical values of c and e, but
the statistical values of the union of c and e.
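
The same idea can be sketched in Python. This is not Darwin's Stat() implementation; it is a hypothetical class that keeps only raw sums, overloads `+=` to record a sample and `|` to play the role of union, and computes mean and variance only when they are requested. The labels and the sample size are made up.

```python
import random


class Stat:
    """Keeps only raw sums; mean and variance are computed when asked for."""

    def __init__(self, label):
        self.label = label
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def __iadd__(self, x):
        # Overloaded +=: adding a number records one more sample.
        self.n += 1
        self.total += x
        self.total_sq += x * x
        return self

    def __or__(self, other):
        # Overloaded |: pool the raw sums of both structures.
        merged = Stat(f"{self.label} and {other.label}")
        merged.n = self.n + other.n
        merged.total = self.total + other.total
        merged.total_sq = self.total_sq + other.total_sq
        return merged

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        # Population variance derived from the stored sums.
        return self.total_sq / self.n - self.mean ** 2


random.seed(0)
c = Stat("first sample")
e = Stat("second sample")
for _ in range(10_000):
    c += random.random()
    e += random.random()

u = c | e
print(u.label, u.n, round(u.mean, 3), round(u.variance, 3))
```

Because `u` is built from the pooled sums, its mean and variance describe the combined sample rather than a combination of two previously printed results.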
Objects in OOLs can also have methods which can be accessed
like attributes. The result of a method is not stored, but computed on
the fly (see example 4, where the statistical values mean, variance,
skewness, excess, minimum and maximum are computed on the fly).
Hence, methods correspond to the notion of views: A view is a stored
query, the result of which is computed on the fly based on stored
information.
As can be seen in example 2, searching in OODBs works similarly to SQL queries in RDBs with Select... From... Where statements.
Apart from the obvious difference that not only attributes, but also
objects and attributes of objects can be used, the main difference is
that the methods of objects can also be used for queries. Again, the
values have to be computed on the fly. This has two consequences:
first, query optimization in OODBs is complicated; the complexity of
the model and of query optimization are positively correlated. Second,
OODB systems are slower and less efficient than their RDB counterparts because of the overhead in storing objects and the increased
complexity in interpretation.

Summary
OODBs are similar to RDBs, but they have the huge advantage that
their attributes can be objects. Objects are collections of bits which
are understood by the system without any further knowledge. This,
together with the fact that computations on the fly and the addition of
arbitrary fields are possible, makes OODBs very appealing for SDBs.
But there are also many drawbacks. First, OODBs are used less frequently than RDBs; there are thus fewer tools, less support and fewer
libraries available. Also, there are fewer good implementations and
no established standards. Second, OODBs are dependent on and
thus restricted by an OOL. Third, OODBs are less efficient and much
slower than their RDB counterparts because of the overhead in storing objects and increased complexity in interpretation. And fourth,
OODBs are becoming more and more obsolete, with ORM allowing for
virtual OODBs in RDBs and thus convergence of RDBs to OODBs.
Nevertheless, OODBs are a very attractive model for SDBs because of
their flexibility.



Appendix 1
Codd’s 12 rules
(cited from http://en.wikipedia.org/wiki/Codd%27s_12_rules#The_rules on 18/10/2012)

Rule (0): The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule: All information in a relational database (including table and column names) is represented in only one way, namely as a value in a table.

Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values: The database management system must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model: The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database’s structure (catalog) using the same query language that they use to access the database’s data.

Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that has a linear syntax, can be used both interactively and within application programs, supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully when a distributed version of the DBMS is first introduced and when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule: If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.


Appendix 2
Comparison of frequently used OOLs

Table 2: Overview of some OO languages, as provided on the course website (http://www.cbrg.ethz.ch/education/SDB/languages.pdf) on 20/10/2012.

Object names
    C++:    A,B
    Java:   A,B
    Python: A,B
    Darwin: A,B

Attribute names
    C++:    X,Y,Z
    Java:   X,Y,Z
    Python: X,Y,Z
    Darwin: X,Y,Z

Variables
    C++:    a,b
    Java:   a,b
    Python: a,b
    Darwin: a,b

Functions
    C++:    f,g
    Java:   f,g
    Python: f,g
    Darwin: f,g

Object construction
    C++:    A a(...); A *a = new A(...)
    Java:   A a = new A(...)
    Python: a = A(...)
    Darwin: a := A(...)

Type conversion
    C++:    (B)a
    Java:   (B)a
    Python: B(a)
    Darwin: B(a)

Object selection
    C++:    a.X
    Java:   a.X, java.lang.reflect.*
    Python: a.X, getattr(a,"X")
    Darwin: a[X], a['X'], a[other] (computed value)

Polymorphic functions
    C++:    f(a)
    Java:   f(a)
    Python: f(a)
    Darwin: f(a)

Polymorphic methods
    C++:    virtual
    Java:   (all)
    Python: (all)
    Darwin: A_f, A_B (converter)

Polymorphic operators
    C++:    A::operator+(...)
    Java:   (only predefined)
    Python: a+b, c | set([1]), 5*'d'; A.__add__(self,other):
    Darwin: a+b, c union 1, 5*d

Inheritance
    C++:    class A : public B,C (multiple inheritance, different protection levels)
    Java:   public class A extends B implements C (classes and interfaces)
    Python: class A(B):
    Darwin: Inherit(A,B); ExtendClass(A,B,[name,type,def],...)

Generics/Templates for classes
    C++:    template<class A> class B
    Java:   public class A<B,C>
    Python: no
    Darwin: no

Generics/Templates for functions/methods
    C++:    template<class A> A max(A a,A b)
    Java:   public static <A> A max(A a,A b)
    Python: all parameters generic
    Darwin: no

Introspection
    C++:    yes
    Java:   yes
    Python: yes
    Darwin: Introspection(A)

Reflection
    C++:    no
    Java:   yes
    Python: yes
    Darwin: GetMethods(A)
