Chapter 30
Semistructured Data and XML
Transparencies
© Pearson Education Limited 1995, 2005
Chapter 30 - Objectives

What semistructured data is.

Concepts of the Object Exchange Model (OEM), a
model for semistructured data.

Basics of Lore, a semistructured DBMS, and its
query language, Lorel.

Main language elements of XML.

Difference between well-formed and valid XML
documents.

How Document Type Definitions (DTDs) can be
used to define valid syntax of an XML document.

How Document Object Model (DOM) compares
with OEM.

About other related XML technologies.

Limitations of DTDs and how XML Schema
overcomes these limitations.

How RDF and RDF Schema provide a foundation
for processing metadata.

W3C XQuery Language.

How to map XML to databases.

SQL:2003 support for XML.

Introduction

In 1998 XML 1.0 was formally ratified by W3C.

Yet it has already impacted every aspect of programming,
including graphical interfaces, embedded systems,
distributed systems, and database management.

It is already becoming the de facto standard for data
communication within the software industry, and is
quickly replacing EDI systems as the primary medium
for data interchange among businesses.

Some analysts believe it will become the language in
which most documents are created and stored, both
on and off the Internet.

Semistructured Data
Data that may be irregular or incomplete and have
a structure that may change rapidly or
unpredictably.

Semistructured data is data that has some structure,
but structure may not be rigid, regular, or
complete.

Generally, data does not conform to a fixed schema
(the terms schema-less and self-describing are
sometimes used).

Semistructured Data

Information normally associated with schema is
contained within data itself.

Some forms of semistructured data have no
separate schema; in others a schema exists but places
only loose constraints on data.

Unfortunately, relational, object-oriented, and
object-relational DBMSs do not handle data of this
nature particularly well.

Semistructured Data

Has gained importance recently for various
reasons:

may be desirable to treat Web sources like a
database, but cannot constrain these sources with a
schema;

may be desirable to have a flexible format for data
exchange between disparate databases;

emergence of XML as standard for data
representation and exchange on the Web, and
similarity between XML documents and
semistructured data.

Example 30.1

Note, data is not regular:

for John White, hold first and last names, but
for Ann Beech store single name and also store
a salary;

for property at 2 Manor Rd, store a monthly
rent whereas for property at 18 Dale Rd, store
an annual rent;

for property at 2 Manor Rd, store property type
(flat) as a string, whereas for property at 18
Dale Rd, store type (house) as an integer value.

Object Exchange Model (OEM)

Data in OEM is schema-less and self-describing,
and can be thought of as labeled directed graph
where nodes are objects, consisting of:

unique object identifier (for example, &7),

descriptive textual label (street),

type (string),

a value (“22 Deer Rd”).

Objects are either atomic or complex:

an atomic object contains a value of a base type (e.g.,
integer or string) and in the diagram has no outgoing
edges;

all other objects are complex objects, whose type
is a set of object identifiers.

Object Exchange Model (OEM)

A label indicates what the object represents and is
used to identify the object and to convey the
meaning of the object, and so should be as
informative as possible.

Labels can change dynamically.

A name is a special label that serves as an alias for
a single object and acts as an entry point into the
database (for example, DreamHome is a name that
denotes object &1).

Object Exchange Model (OEM)

An OEM object can be considered as a quadruple
(label, oid, type, value).

For example:
{ Staff, &4, set, { &9, &10} }
{ name, &9, string, “Ann Beech”}
{ salary, &10, decimal, 12000}
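
The quadruple view can be made concrete with a small sketch. The Python below is an illustration only and is not part of Lore: the class name OEMObject, the dictionary db, and the helper children are hypothetical names chosen for this example, and oids are simply held as strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class OEMObject:
    """One OEM object viewed as the quadruple (label, oid, type, value)."""
    label: str
    oid: str
    type: str
    value: Union[str, int, float, FrozenSet[str]]

    @property
    def is_atomic(self) -> bool:
        # Complex objects have type "set"; every other type is a base type.
        return self.type != "set"


# The three objects from the example above.
staff = OEMObject("Staff", "&4", "set", frozenset({"&9", "&10"}))
name = OEMObject("name", "&9", "string", "Ann Beech")
salary = OEMObject("salary", "&10", "decimal", 12000)

# Index objects by oid so the identifiers inside a set can be resolved.
db = {obj.oid: obj for obj in (staff, name, salary)}


def children(obj: OEMObject, db: dict) -> list:
    """Return the subobjects of a complex object, ordered by oid number."""
    if obj.is_atomic:
        return []
    return sorted((db[oid] for oid in obj.value), key=lambda o: int(o.oid[1:]))


for child in children(staff, db):
    print(child.label, child.oid, child.type, child.value)
```

The value of the complex object holds only identifiers, so following an edge in the graph amounts to a lookup by oid.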

Lore and Lorel

Lore (Lightweight Object REpository) is a multi-user
DBMS, supporting crash recovery,
materialized views, bulk loading of files in some
standard format (XML is supported), and a
declarative update language.

Has an external data manager that enables data
from external sources to be fetched dynamically
and combined with local data during query processing.

Lorel

Lorel (the Lore language) is an extension to
OQL. Lorel was intended to handle:

queries that return meaningful results even when
some data is absent;

queries that operate uniformly over single-valued and
set-valued data;

queries that operate uniformly over data with
different types;

queries that return heterogeneous objects;

queries where the object structure is not fully known.

Lorel

Supports declarative path expressions for
traversing graph structures and automatic
coercion for handling heterogeneous and typeless
data.

A path expression is essentially a sequence of
edge labels (L1.L2…Ln), which for given graph
yields set of nodes. For example:

DreamHome.PropertyForRent yields set of nodes
{&5, &6};

DreamHome.PropertyForRent.street yields set of
nodes containing strings {“2 Manor Rd”, “18 Dale
Rd”}.
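
A minimal sketch of how such a simple path expression could be evaluated over a labeled graph is shown below. It is not Lore's implementation. The edge table contains only objects mentioned in these slides, and the names edges, values, names, and eval_path are hypothetical.

```python
# Outgoing labeled edges per object: oid -> list of (edge label, child oid).
edges = {
    "&1": [("Staff", "&3"), ("Staff", "&4"),
           ("PropertyForRent", "&5"), ("PropertyForRent", "&6")],
    "&5": [("street", "&11")],
    "&6": [("street", "&14")],
}

# Values of the atomic objects used in this example.
values = {"&11": "2 Manor Rd", "&14": "18 Dale Rd"}

# Names are entry points into the database.
names = {"DreamHome": "&1"}


def eval_path(path: str) -> list:
    """Follow the labels L1.L2...Ln from a name and return the oids reached."""
    head, *labels = path.split(".")
    current = [names[head]]
    for label in labels:
        reached = []
        for oid in current:
            for edge_label, child in edges.get(oid, []):
                if edge_label == label and child not in reached:
                    reached.append(child)
        current = reached
    return current


print(eval_path("DreamHome.PropertyForRent"))
print([values[oid] for oid in eval_path("DreamHome.PropertyForRent.street")])
```

A label that does not occur simply yields an empty set of nodes rather than an error, which fits the aim of returning meaningful results when some data is absent.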

Lore and Lorel


Also supports general path expressions that
provide for arbitrary paths:

‘|’ indicates selection;

‘?’ indicates zero or one occurrences;

‘+’ indicates one or more occurrences;

‘*’ indicates zero or more occurrences.

For example:

DreamHome.(Branch | PropertyForRent).street

would match path beginning with DreamHome,
followed by either a Branch edge or a
PropertyForRent edge, followed by a street edge.
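
One way to get a feel for these operators is to treat a general path expression as a regular expression over dotted label paths. The sketch below does this with Python's re module. It is a purely textual approximation and not how Lore evaluates paths, and the helper names to_regex and matches are hypothetical. Writing an optional step as (.Oversees)? is also an assumption about notation made for this example.

```python
import re


def to_regex(path_expr: str) -> "re.Pattern":
    """Turn a general path expression into a regex over dotted label paths."""
    compact = path_expr.replace(" ", "")
    # A dot separates labels, so it must match a literal dot in the regex.
    return re.compile(compact.replace(".", r"\."))


def matches(path_expr: str, label_path: str) -> bool:
    """True if the whole label path is matched by the path expression."""
    return to_regex(path_expr).fullmatch(label_path) is not None


expr = "DreamHome.(Branch | PropertyForRent).street"
print(matches(expr, "DreamHome.Branch.street"))
print(matches(expr, "DreamHome.PropertyForRent.street"))
print(matches(expr, "DreamHome.Staff.street"))

# '?' makes the grouped step optional (zero or one occurrences).
optional = "DreamHome.Staff(.Oversees)?.street"
print(matches(optional, "DreamHome.Staff.Oversees.street"))
```

The check is on label paths only. Whether a matching path actually exists in a given database still has to be determined against the graph.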

Example 30.2 – Example Lorel Queries
Find properties overseen by Ann Beech.
SELECT s.Oversees
FROM DreamHome.Staff s
WHERE s.name = "Ann Beech"

Data in FROM clause contains objects &3 and &4.
Applying WHERE restricts this set to object &4.
Then apply SELECT clause.
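
The three evaluation steps can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the staff objects are modeled as dictionaries, the labels fName and lName for John White's name are assumed (the slides say only that first and last names are held), and select_oversees is a hypothetical helper.

```python
# DreamHome.Staff objects; only labels mentioned in the slides are certain.
staff_objects = [
    # John White: name is a complex object (fName/lName are assumed labels).
    {"oid": "&3", "name": {"fName": "John", "lName": "White"}},
    # Ann Beech: name is a single string, and a salary is also stored.
    {"oid": "&4", "name": "Ann Beech", "salary": 12000,
     "Oversees": ["&5", "&6"]},
]


def select_oversees(staff: list, wanted_name: str) -> list:
    """Mimic SELECT s.Oversees FROM DreamHome.Staff s WHERE s.name = ..."""
    result = []
    for s in staff:                                # FROM DreamHome.Staff s
        if s.get("name") == wanted_name:           # WHERE s.name = wanted_name
            result.extend(s.get("Oversees", []))   # SELECT s.Oversees
    return result


print(select_oversees(staff_objects, "Ann Beech"))
```

Object &3 is skipped without an error even though its name is not a string and it has no Oversees edge in this sketch, which is the tolerant behavior Lorel aims for with irregular data.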

Example 30.2 – Example Lorel Queries
Answer
  PropertyForRent &5
    street &11 “2 Manor Rd”
    type &12 “Flat”
    monthlyRent &13 375
    OverseenBy &4
  PropertyForRent &6
    street &14 “18 Dale Rd”
    type &15 1
    annualRent &16 7200
    OverseenBy &4

Example 30.2 – Example Lorel Queries
Find all properties with annual rent.
SELECT DreamHome.PropertyForRent
FROM DreamHome.PropertyForRent.annualRent
Answer
  PropertyForRent &6
    street &14 “18 Dale Rd”
    type &15 1
    annualRent &16 7200
    OverseenBy &4

Example 30.2 – Example Lorel Queries
Find all staff who oversee two or more
properties.

SELECT DreamHome.Staff.name
FROM DreamHome.Staff SATISFIES
2 <= COUNT(SELECT DreamHome.Staff
WHERE DreamHome.Staff.Oversees)
Answer
  name &9 “Ann Beech”

DataGuide

A dynamically generated and maintained
structural summary of database, which
serves as a dynamic schema.

Has three properties:

conciseness: every label path in the database
appears exactly once in the DataGuide;

accuracy: every label path in DataGuide exists
in original database;

convenience: a DataGuide is an OEM (or XML)
object, so can be stored and accessed using same
techniques as for source database.

DataGuides

Can determine whether a given label path of length
n exists in source database by considering at most
n objects in the DataGuide.

For example, to verify whether path
Staff.Oversees.annualRent exists, need only
examine outgoing edges of objects &19, &21, and
&22 in our DataGuide.

Further, only objects that can follow Branch are
the two outgoing edges of object &20.
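
The lookup described above can be sketched as follows. The code builds a DataGuide by grouping source objects into target sets, in the style of determinizing a finite automaton, and then checks a label path by following one edge per label. It is an illustration and not Lore's algorithm. The source graph contains only objects mentioned in these slides (the subobjects of &3 are omitted), DataGuide objects are identified by their target sets rather than by oids such as &19, and the names source, build_dataguide, and path_exists are hypothetical.

```python
from collections import deque

# Source database: oid -> list of (edge label, child oid).
source = {
    "&1": [("Staff", "&3"), ("Staff", "&4"),
           ("PropertyForRent", "&5"), ("PropertyForRent", "&6")],
    "&4": [("name", "&9"), ("salary", "&10"),
           ("Oversees", "&5"), ("Oversees", "&6")],
    "&5": [("street", "&11"), ("type", "&12"),
           ("monthlyRent", "&13"), ("OverseenBy", "&4")],
    "&6": [("street", "&14"), ("type", "&15"),
           ("annualRent", "&16"), ("OverseenBy", "&4")],
}


def build_dataguide(source: dict, root: str):
    """Return (start, guide); each guide object is a set of source oids."""
    start = frozenset({root})
    guide = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in guide:
            continue  # already expanded, which also handles cycles
        by_label = {}
        for oid in state:
            for label, child in source.get(oid, []):
                by_label.setdefault(label, set()).add(child)
        # Each label leaves a guide object at most once (conciseness).
        guide[state] = {lab: frozenset(t) for lab, t in by_label.items()}
        queue.extend(guide[state].values())
    return start, guide


def path_exists(start, guide: dict, label_path: str) -> bool:
    """Check a label path of length n by visiting at most n guide objects."""
    state = start
    for label in label_path.split("."):
        state = guide[state].get(label)
        if state is None:
            return False
    return True


start, guide = build_dataguide(source, "&1")
print(path_exists(start, guide, "Staff.Oversees.annualRent"))
print(path_exists(start, guide, "Staff.annualRent"))
```

Because every label appears at most once on the outgoing edges of a guide object, the check never has to branch, regardless of how many source objects share the path.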

DataGuides

DataGuides can be classified as strong or weak:

strong is where each set of label paths that share
same target set in the DataGuide is exactly the
set of label paths that share same target set in
source database.