Due to the dynamic properties of sensor data, the probability of satisfying the
condition in a sub-query at a node may change with time. Therefore the coordinator
node needs to reorder the sequence of the participating nodes periodically. The
reorder procedure is performed when the following condition is satisfied: the
evaluation has stopped at the same node, called the false node, consecutively for a
pre-defined period of time, and the false node is not the first node. Satisfaction of
these conditions suggests that the sensor data values generated at the false node have
a high probability of being false in the next evaluation as well. Hence the coordinator
node reorders the sequence of the nodes using the following procedure (a code sketch
follows the list):
a. The false node becomes the first node in the sequence.
b. All the nodes following the false node remain in the same relative order with
respect to each other.
c. All the nodes that were in front of the false node keep their original relative
order, but they rejoin the sequence after the last node of the original
sequence.
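A minimal sketch of this reordering rule, assuming the coordinator holds the sequence
as a list of node identifiers (class and method names are illustrative, not taken from
CMQES):

    import java.util.ArrayList;
    import java.util.List;

    public final class SeqPushReorder {
        // Moves the false node to the front; its followers keep their order;
        // its former predecessors are appended at the end, also in order.
        static List<Integer> reorder(List<Integer> sequence, int falseIndex) {
            List<Integer> reordered = new ArrayList<>();
            reordered.addAll(sequence.subList(falseIndex, sequence.size())); // false node first, then its followers
            reordered.addAll(sequence.subList(0, falseIndex));               // former predecessors go to the back
            return reordered;
        }
    }

For example, reorder([n1, n2, n3, n4], 2) yields [n3, n4, n1, n2].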
4 Implementation
CMQES is implemented with MICA Motes [MM]. In CMQES, one of the MSPUs is
connected to the base station through a mote interface board; it is the base station
MSPU. CMQES contains two main software components: the sensor program in the
MSPU, and the server program in the base station. The sensor program is
implemented in NesC, a C-like language for TinyOS [TINY], and is responsible for
capturing sensor data values, evaluating sub-queries, submitting sub-query results to
the coordinator nodes, and collecting performance statistics at each MSPU. We have
implemented SeqPush in the sensor program. The evaluation results of a CMQ and
performance statistics are periodically sent to the base station through the base
station MSPU for reporting.
The second program is the server program residing at the base station. This
program is implemented in Java, with MS Windows 2000 and MS SQL Server as the
operating system and database, respectively. The server program is responsible for
collecting results and performance statistics. In addition, the following parameters
can be set using the server program at the base station for submitting a CMQ and
controlling the operations at the MSPUs (a code sketch follows the list):
1. The sampling rate of the MSPUs.
2. The number of nodes participating in a CMQ.
3. The aggregation function to be performed at a node, i.e., calculating the mean,
maximum, or minimum of the sub-query values.
4. The condition for the sub-query of a CMQ at an MSPU.
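On the server side, such a submission can be pictured as a small parameter holder
(all names are illustrative; the actual CMQES interface is not given in this excerpt):

    // Illustrative container for the four CMQ settings listed above.
    public final class CmqSubmission {
        final int samplingRateMs;        // 1. sampling rate of the MSPUs
        final int[] participatingNodes;  // 2. nodes participating in the CMQ
        final String aggregation;        // 3. "MEAN", "MAX", or "MIN"
        final String subQueryCondition;  // 4. condition evaluated at each MSPU, e.g. "light > 500"

        CmqSubmission(int samplingRateMs, int[] participatingNodes,
                      String aggregation, String subQueryCondition) {
            this.samplingRateMs = samplingRateMs;
            this.participatingNodes = participatingNodes;
            this.aggregation = aggregation;
            this.subQueryCondition = subQueryCondition;
        }
    }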
5 Demonstration and Sample Results
In the demonstration, we can submit CMQs for processing at the MSPUs through the
interface at the base station, as shown in Figure 1. Currently, each MSPU is
programmed to capture the light intensity of its surrounding environment periodically,
i.e., every 2 sec, as sensor data values, and a message is sent every 30 sec to the
coordinator node. The sampling rate and message reporting period can be varied at
the base station. The message size for communication in TinyOS is 34 bytes. Five
bytes are reserved for the message header, and the message contains the results of
10 evaluations, with each reading taking up 2 bytes. The remaining 9 bytes hold the
cycle number, message type, and destination address. Currently, the transmission
delay of a single message from one MSPU to another in the testing environment is
between 300 ms and 700 ms. As TinyOS provides only a best-effort message delivery
service, a lost message is considered a missed evaluation cycle and logged
accordingly by the coordinator node.
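The 34-byte layout just described can be made concrete with a small packing sketch.
The field order and the widths of the last 9 bytes are assumptions for illustration;
the text only fixes the totals (5-byte header, 10 readings of 2 bytes each, and 9 bytes
for cycle number, message type, and destination address):

    import java.nio.ByteBuffer;

    public final class ReportMessage {
        static final int MESSAGE_SIZE = 34; // TinyOS message size used by CMQES

        static byte[] pack(byte[] header, short[] readings, int cycle, byte type, int destAddr) {
            if (header.length != 5 || readings.length != 10)
                throw new IllegalArgumentException("expected 5 header bytes and 10 readings");
            ByteBuffer buf = ByteBuffer.allocate(MESSAGE_SIZE);
            buf.put(header);                           // 5-byte message header
            for (short r : readings) buf.putShort(r);  // 10 evaluations x 2 bytes = 20 bytes
            buf.putInt(cycle);                         // cycle number (4 bytes, assumed width)
            buf.put(type);                             // message type (1 byte)
            buf.putInt(destAddr);                      // destination address (4 bytes, assumed width)
            return buf.array();                        // 5 + 20 + 9 = 34 bytes
        }
    }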
Fig. 1. Program at the base station
Fig. 2. Real-time display of received messages and statistics

Our experimental results show that the number of messages transmitted in the central
aggregation scheme (CAS), in which all participating MSPUs periodically submit
sub-query results to a central MSPU for data aggregation, is much larger than in
SeqPush. Two ammeters are connected to one of the participating nodes and to the
coordinator node to measure the energy consumption rates of the nodes when
different operations are performed at the nodes.
The result captured by the base station is displayed in real time as shown in Fig. 2.
The statistics include:
(1) Number of messages transmitted, including sent and received messages.
(2) Number of successful evaluations and number of missed query results due to
loss of messages.
(3) Number of node reorders in SeqPush.
References
[MM] Crossbow Technology, http://www.xbow.com
[TINY] TinyOS, http://www.tinyos.net

eVitae: An Event-Based Electronic Chronicle
Bin Wu, Rahul Singh, Punit Gupta, Ramesh Jain
Experiential Systems Group, Georgia Institute of Technology
Atlanta, GA 30308, USA
{binwu,rsingh,jain}@ece.gatech.edu

Abstract. We present an event-based system for storing, managing, and
presenting personal multimedia history. At the heart of our approach is
information organization using the concept of an event. Events allow modeling
of the data in a manner that is independent of media. To store events, a novel
database called EventBase is developed which is indexed by events. The unique
characteristics of events make multidimensional querying and multiple
perspective explorations of personal history information feasible. In this demo
we present the major functions of eVitae.
1 Introduction

Personal history systems electronically record important activities from a person’s life
in the form of photographs, text, video, and audio. Examples of some existing systems
such as [3] have shown that there are a number of salient challenges in this domain.
First, information is fundamentally anchored to space and time and people often
exploit them as cues for querying information. Second, the data as the carrier of
information stays in separate silos, which fragments meaningful information across
data sources. Third, to break down these silos, an information model independent of media is
required to depict the content of information. Lastly, presentation of the information
must be dynamically generated according to individual users’ preferences.
We have developed a personal eChronicle [1] called eVitae [4]. In eVitae we
utilize a novel generative theory based upon the concept of event to design an
information system [2]. In this paper, we show how the eVitae system, as an access
environment, ingests heterogeneous data into meaningful information conveyed by
events, aids the user in quickly focusing on what is of interest, and presents a
multidimensional environment for exploration of events with their details stored in
appropriate media.
2 Event-Based Data Modeling
The approach we employ to design and implement eVitae is based on the notion of
events [6]. An event is an occurrence or happening of significance that can be defined
as a region or collection of regions in spatial-temporal-attribute space. Given k
attributes, an event is formally denoted as e = (t, s, a1, ..., ak) and is uniquely
identified by an eID (event identifier). In this notation t characterizes the event
temporally, s denotes the spatial location(s) associated with the event, and
a1, ..., ak are the attributes associated with the event. An event is defined by its
event model, which includes the mandatory attributes space, time, transcluded-media,
event-name, and event-topic, and a finite set of free attributes.
Events can be grouped together into collections, called event categories. Formally, an
event category can be represented as a pair (cID, E), where cID identifies the category
and E is the set of events that comprise it. Event categories provide a powerful
construct to support multiple ways of organizing information, the definition of
complex queries and notifications, and personalized views of the parts of the
information space the user is interested in.
By this definition, the information layer implemented by events breaks down the data
silos. This layer uses an event-based data model to construct a new index that is
independent of data type. The organization, indexing, and storage of events conveying
potentially changing information are accomplished by parsing the data as it is entered
and storing all events in a database of events called EventBase. The data is parsed by
the system and events are produced using the event model. EventBase also stores
links to the original data sources, which means the system can present the
appropriate media in the context of a particular event. EventBase is an extension of a
traditional database. In the prototype implementation of the eVitae system, we use
MySQL as the database to store and index events.
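A minimal sketch of this event model in code, with illustrative field names (the
excerpt does not give the actual EventBase schema):

    import java.util.List;
    import java.util.Map;

    // Illustrative event model: the mandatory attributes (time, space, transcluded
    // media, name, topic) plus a finite set of free attributes. Media are kept as
    // links to the original data sources, not as copies.
    public final class Event {
        final long eId;                  // unique event identifier, the index key in EventBase
        final String eventName;
        final String eventTopic;
        final long startTime, endTime;   // temporal extent t
        final String location;           // spatial extent s
        final List<String> mediaLinks;   // transcluded media
        final Map<String, String> freeAttributes;

        Event(long eId, String eventName, String eventTopic, long startTime,
              long endTime, String location, List<String> mediaLinks,
              Map<String, String> freeAttributes) {
            this.eId = eId; this.eventName = eventName; this.eventTopic = eventTopic;
            this.startTime = startTime; this.endTime = endTime; this.location = location;
            this.mediaLinks = mediaLinks; this.freeAttributes = freeAttributes;
        }
    }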
3 System Architecture
The architecture of the eVitae system comprises three modules, namely Event Entry,
EventBase, and the What-You-See-Is-What-You-Get (WYSIWYG) query and exploration
environment. The key features of the system are briefly discussed as follows.
EventBase. EventBase is the backend of the eVitae system and stores the events. The
transclusion of media is maintained by storing links between an event and the data it
is based upon. EventBase uses the eID attribute of events as the unified index and is
supported by MySQL.
WYSIWYG query and exploration environment. This is an integrated interaction
environment to explore the electronic chronicle of a person, as shown in Figure 1. By
using temporal and spatial relationships as cues and by defining event categories to
organize events, we create an exploratory environment for the user. The event exhibit
panel (Fig. 1) presents an overall picture of events. Options for zooming, filtering,
extraction, viewing relations, history keeping, and details-on-demand make the
environment appealing and powerful.
Event Entry. An event may include multifarious data from different sources, such as
video, audio, sensor readings, and text. The Event Entry module of the system is
designed to produce events using event models and to record the links between events
and related data.
Fig. 1. WYSIWYG Query and Exploration Environment
4 Conclusion
The eVitae system demonstrates an event-based approach for the organization, storage,
management, and querying of personal history information comprising multifarious
data. The system provides an event entry module for importing data from
heterogeneous data sources and assimilating them into events. This information is
stored in EventBase which is indexed by events. Event-based modeling allows
multidimensional querying and exploration of personal history information.
Furthermore, flexibility in event definition and organization allows exploration of the
data from multiple perspectives. Preliminary results indicate that an event-based
system not only offers significant advantages in information organization, but also in
exploration and discovery of information from data.
References
1. R. Jain. "Multimedia Electronic Chronicles", IEEE MultiMedia, pp. 102-103,
Volume 10, Issue 03, July 2003.
2. R. Jain. "Events in Heterogeneous Data Environments", Proc. International
Conference on Data Engineering, Bangalore, March 2003.
3. J. Gemmell, G. Bell, R. Lueder, S. Drucker, and C. Wong. "MyLifeBits: fulfilling
the Memex vision", ACM Multimedia, pp. 235-238, ACM, 2002.
4. R. Singh, B. Wu, P. Gupta, R. Jain. "eVitae: Designing Experiential eChronicles",
ESG Technical Report GT-ESG-01-10-03.
CAT: Correct Answers of Continuous Queries Using Triggers

Goce Trajcevski¹, Peter Scheuermann¹*, Ouri Wolfson², and Nimesh Nedungadi²

¹ Department of Electrical and Computer Engineering, Northwestern University,
Evanston, IL 60208, {goce,peters}@ece.northwestern.edu
² Department of Computer Science, University of Illinois at Chicago,
Chicago, IL 60607, {wolfson,nnedunga}@cs.uic.edu

1 Introduction and Motivation

Consider the query Q1: "Retrieve all the motels which will be no further than 1.5
miles from my route, sometime between 7:00PM and 8:30PM", which a mobile
user posed to a Moving Objects Database (MOD). Processing such queries
is of interest to a wide range of applications (e.g., tourist information systems
and context awareness [1,2]). These queries pertain to the future of a dynamic
world. Since the MOD is only a model of the objects moving in the real world, the
accuracy of the representation has to be continuously verified and updated, and
the answer-set of Q1 has to be re-evaluated at every clock-tick¹. However, the
re-evaluation of such queries can be avoided if an update to the MOD does not
affect the answer-set.
The motion of a moving object is typically represented as a trajectory – a
sequence of 3D points in (x, y, t) space – and its projection on the X-Y plane is
called a route. The details of the construction, based on electronic maps and the
speed-profiles of the city blocks, are given in [5]. After a trajectory is constructed,
a traffic abnormality may occur at a future time-point, due to an accident,
road-work, etc., and once it occurs we need to identify the trajectories that are
affected and update them properly (c.f. [5]). In the sequel, we focus on the impact
of abnormalities on continuous queries.

Figure 1 shows three trajectories and their respective routes. Suppose a
road-work that will last 5 hours starts at 4:30PM on the segment between A and B,
slowing down the speed between 4:30PM and 9:30PM. One of the objects enters
that segment after 4:30PM, so the future portion of its trajectory needs to be
modified: as illustrated by the thicker portion of the trajectory, instead of being
at the point B at 4:50PM, the object will be there at 5:05PM. A key observation
is that if this object, whose trajectory was modified, issued the query Q1, we
have to re-evaluate the answer.
* Research partially supported by NSF grant EIA-0000536.
¹ Hence the name continuous queries – formally defined for MOD in [3].
Fig. 1. Trajectories, Routes and Updates
2 System Architecture
Figure 2 illustrates the main components and behavioral aspects of the CAT
system. There are tables which: store the trajectories (MOT) and the landmarks
of interest (BUILDINGS); keep track of the queries posed and their answers
(PENDING_QUERIES and ANSWERS); and store the information about the
traffic abnormalities (TRAFFIC_ABN). The trajectories and the landmarks were
obtained using real maps of Cook County, Chicago.
At the heart of the CAT system is a set of triggers, part of which we
illustrate in the context of the example query Q1. If the object posed the query
Q1 at 3:30PM, its answer contains two motels, to which the object should be
close at 7:55PM and 8:20PM, respectively. When an abnormality is detected, its
relevant parameters are inserted in the TRAFFIC_ABN table. This, however,
"awakes" the first trigger, whose event is:

ON INSERT TO TRAFFIC_ABN

This trigger checks if the abnormality affects some of the trajectories stored in the
MOT and, if so, updates their future part. However, the action part of
this trigger, which updates the MOT, is in turn the event for the second trigger:

ON UPDATE TO MOT
IF HAS_PENDING_QUERIES

This trigger checks if the corresponding moving object has posed some queries
which are still of interest (i.e., their time has not "expired"). The condition is
satisfied for our object, and the trigger's action part re-evaluates the query Q1,
based on the new future portion of the trajectory.
Fig. 2. Behavioral Aspects of the CAT
Due to the delay, the object will now be near the first motel at 8:35PM, which is
a bit too late for the user. On the other hand, it will be near the other motel at
7:05PM; before the traffic incident, that motel was not part of the answer (the
object would have had the desired proximity already at 6:50PM, outside the
query's time interval).
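A minimal sketch of the two triggers, installed through JDBC against an Oracle
server as in the demo setup. The table names follow the paper; the column names
and the stored procedures update_affected_trajectories, has_pending_queries, and
reevaluate_queries are illustrative stand-ins for the paper's UDFs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public final class CatTriggers {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//localhost:1521/XE", "cat", "cat");
            try (Statement st = con.createStatement()) {
                // Trigger 1: an inserted abnormality updates the affected trajectories in MOT.
                st.execute(
                    "CREATE OR REPLACE TRIGGER on_traffic_abn " +
                    "AFTER INSERT ON TRAFFIC_ABN FOR EACH ROW " +
                    "BEGIN update_affected_trajectories(:NEW.segment_id, :NEW.from_time, :NEW.to_time); END;");
                // Trigger 2: an updated trajectory re-evaluates the pending queries of that object.
                st.execute(
                    "CREATE OR REPLACE TRIGGER on_mot_update " +
                    "AFTER UPDATE ON MOT FOR EACH ROW " +
                    "BEGIN " +
                    "  IF has_pending_queries(:NEW.object_id) = 1 THEN " +
                    "    reevaluate_queries(:NEW.object_id); " +
                    "  END IF; " +
                    "END;");
            }
        }
    }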
3 Demonstration
All the back-end components are implemented using Oracle 9i as a server. We
used User-Defined Types (UDT) to model the entities and User-Defined
Functions (UDF) to implement the processing, exploiting the Oracle Spatial
predicates.
The front-end client, which is the GUI presented to the end-user, is implemented
in Java. The GUI gives the options of specifying the queries (i.e., time of
submission, relevant time interval, objects of interest, etc.). Once the user
clicks the SUBMIT button, the query is evaluated and its answer is displayed.
In the server, the query is assigned an id number and stored in the
PENDING_QUERIES table. Clearly, in a real MOD application, the client will be
either a wireless (mobile) user or a web browser-based one, properly interfaced
to the server.
To test the execution of the triggers and the updates of the answer(s) to the
posed continuous queries, the GUI offers a window for generating a traffic
abnormality. The user enters the beginning and end times of the incident as
well as its "type" (which determines the impact on the speed-profile). He also
enters the route segments along which the traffic incident is spread. The moment
this information is submitted to the server, the affected trajectories are updated
and the new answer(s) to the posed continuous queries are displayed back to
the user.
References
1. A. Hinze and A. Voisard. Location- and time-based information delivery in
tourism. In SSTD, 2003.
2. A. Pashtan, R. Blatter, A. Heusser, and P. Scheuermann. CATIS: A context-aware
tourist information system. In IMC, 2003.
3. A. P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao. Modeling and querying
moving objects. In ICDE, 1997.
4. G. Trajcevski and P. Scheuermann. Triggers and continuous queries in moving
objects databases. In MDDS, 2003.
5. G. Trajcevski, O. Wolfson, B. Xu, and P. Nelson. Real-time traffic updates in
moving object databases. In MDDS, 2002.
Hippo: A System for Computing Consistent Answers to a Class of SQL Queries

Jan Chomicki¹, Jerzy Marcinkowski², and Slawomir Staworko¹

¹ Dept. of Computer Science and Engineering, University at Buffalo,
{chomicki,staworko}@cse.buffalo.edu
² Instytut Informatyki, Wroclaw University, Poland

1 Motivation and Introduction
Integrity constraints express important properties of data, but the task of pre-
serving data consistency is becoming increasingly problematic with new database
applications. For example, in the case of integration of several data sources, even
if the sources are separately consistent, the integrated data can violate the in-
tegrity constraints. The traditional approach, removing the conflicting data, is
not a good option because the sources can be autonomous. Another scenario
is a long-running activity where consistency can be violated only temporarily
and future updates will restore it. Finally, data consistency may be neglected
because of efficiency or other reasons.
In [1] Arenas, Bertossi, and Chomicki have proposed a theoretical framework
for querying inconsistent databases. Consistent query answers are defined to be
those query answers that are true in every repair of a given database instance. A
repair is a consistent database instance obtained by changing the given instance
using a minimal set of insertions/deletions. Intuitively, consistent query answers
are independent of the way the inconsistencies in the data would be resolved.
Example 1. Assume that an instance of the relation Student is as follows:

Name     Address
Smith    Los Angeles
Smith    New York
Jones    Chicago

Assume also that the functional dependency Name → Address is given. The
above instance has two repairs: one obtained by deleting the first tuple, the
other by deleting the second tuple. A query asking for the address of Jones
returns Chicago as a consistent answer because the third tuple is in both repairs.
However, the query asking for the address of Smith has no consistent answers
because the addresses in the two repairs are different. On the other hand, the
query asking for those people who live in Los Angeles or New York returns Smith
as a consistent answer.
This conservative definition of consistent answers has one shortcoming: the
number of repairs. Even for a single functional dependency, the number of repairs
can be exponential in the number of tuples in the database [3]. Nevertheless,
several practical mechanisms for the computation of consistent query answers
without computing all repairs have been developed (see [5] for a survey): query
rewriting [1], logic programs [2,4,9], and compact representations of repairs [6,7].
The first is based on rewriting the input query Q into a query Q' such that
the evaluation of Q' returns the set of consistent answers to Q. This method
works only for SJD¹ queries in the presence of universal binary constraints. The
second approach uses disjunctive logic programs to specify all repairs, and then
with the help of a disjunctive LP system [8] finds the consistent answers to a
given query. Although this approach is applicable to very general queries in the
presence of universal constraints, the complexity of evaluating disjunctive logic
programs makes this method impractical for large databases.
2 The System Hippo
The system Hippo is an implementation of the third approach. All information
about integrity violations is stored in a conflict hypergraph, in which every
hyperedge connects a set of tuples that together violate an integrity constraint.
Using the conflict hypergraph, we can determine whether a given tuple belongs to
the set of consistent answers without constructing all repairs [6]. Because the
conflict hypergraph has polynomial size, this method has polynomial data
complexity and allows us to deal efficiently even with large databases [7]. Currently,
our application computes consistent answers to SJUD queries in the presence of
denial constraints (a class containing functional dependency constraints and
exclusion constraints). Allowing union in the query language is crucial for being
able to extract indefinite disjunctive information from an inconsistent database
(see Example 1).
Future work includes support for restricted foreign key constraints, universal
tuple-generating dependencies, and full PSJUD² queries. However, because
computing consistent query answers for SPJ queries is co-NP-data-complete [3,6],
polynomial data complexity cannot be guaranteed once projection is allowed.
The whole system is implemented in Java as an RDBMS frontend. Hippo
works with any RDBMS that can execute SQL queries and provides a JDBC
access interface (we use PostgreSQL). The data stored in the RDBMS does not
need to be altered.
The flow of data in Hippo is presented in Figure 1. Before processing any
input query, the system performs Conflict Detection and creates the Conflict
Hypergraph for further usage. We assume that the number of conflicts is small
enough for the hypergraph to be stored in main memory. The only output of
the system is the Answer Set, consisting of the consistent answers to the input
Query in the database instance DB with respect to a set of integrity constraints
IC.

¹ When describing a query class, P stands for projection, S for selection, U for union,
J for cartesian product, and D for difference.
² Currently, our application supports only those cases of projection that don't
introduce existential quantifiers in the corresponding relational calculus query.

Fig. 1. Data flow in Hippo
The processing of the Query starts with Enveloping. As a result of this step
we get a query defining the Candidates (candidate consistent query answers). This
query subsequently undergoes Evaluation by the RDBMS. For every tuple from
the set of candidates, the system uses the Prover to check if the tuple is a consistent
answer to the Query; depending on the result of this check, the tuple is either
added to the Answer Set or not.
For every tuple that the Prover processes, several membership checks typically
have to be performed. In the base version of the system this is done by simply
executing the appropriate membership queries on the database. This is a costly
procedure and it has a significant influence on the overall time performance of
the system. We have introduced several optimizations addressing this problem.
In general, by modifying the expression defining the envelope (the set of
candidates), the optimizations allow us to answer the required membership checks
without executing any queries on the database. Also, using an expression selecting
a subset of the set of consistent query answers, we can significantly reduce
the number of tuples that have to be processed by the Prover. A more detailed
description of these techniques can be found in [7].
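The candidate-then-prover loop just described can be rendered schematically as
follows (types and names are illustrative, not the actual Hippo interfaces):

    import java.util.ArrayList;
    import java.util.List;

    public final class ConsistentAnswers {
        // The Prover decides membership using the in-memory conflict hypergraph,
        // without materializing any repairs.
        interface Prover { boolean isConsistentAnswer(Object tuple); }

        // The envelope query, evaluated by the RDBMS, yields the candidates;
        // the Prover keeps only the tuples that are true in every repair.
        static List<Object> answerSet(List<Object> candidates, Prover prover) {
            List<Object> answers = new ArrayList<>();
            for (Object tuple : candidates) {
                if (prover.isConsistentAnswer(tuple)) {
                    answers.add(tuple);
                }
            }
            return answers;
        }
    }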
3 Demonstration
The presentation of the Hippo system will consist of three parts. First, we will
demonstrate that using consistent query answers we can extract more informa-
tion from an inconsistent database than in the approach where the input query is
evaluated over the database from which the conflicting tuples have been removed.
Secondly, we will show the advantages of our method over competing approaches
by demonstrating the expressive power of supported queries and integrity con-
straints. And finally, we will compare the running times of our approach and
the query rewriting approach, showing that our approach is more efficient. For
every query being tested, we will also measure the execution time of this query
by the RDBMS backend (it corresponds to the approach when we ignore the
fact that the database is inconsistent). This will allow us to conclude that the
time overhead of our approach is acceptable.
References
1. M. Arenas, L. Bertossi, and J. Chomicki. Consistent Query Answers in Inconsistent
Databases. In ACM Symposium on Principles of Database Systems (PODS), pages
68–79, 1999.
2. M. Arenas, L. Bertossi, and J. Chomicki. Answer Sets for Consistent Query
Answering in Inconsistent Databases. Theory and Practice of Logic Programming,
3(4–5):393–424, 2003.
3. M. Arenas, L. Bertossi, J. Chomicki, X. He, V. Raghavan, and J. Spinrad. Scalar
Aggregation in Inconsistent Databases. Theoretical Computer Science,
296(3):405–434, 2003.
4. P. Barcelo and L. Bertossi. Logic Programs for Querying Inconsistent Databases.
In International Symposium on Practical Aspects of Declarative Languages (PADL),
pages 208–222. Springer-Verlag, LNCS 2562, 2003.
5. L. Bertossi and J. Chomicki. Query Answering in Inconsistent Databases. In
J. Chomicki, R. van der Meyden, and G. Saake, editors, Logics for Emerging
Applications of Databases. Springer-Verlag, 2003.
6. J. Chomicki and J. Marcinkowski. Minimal-Change Integrity Maintenance Using
Tuple Deletions. Technical Report cs.DB/0212004, arXiv.org e-Print archive,
December 2002. Under journal submission.
7. J. Chomicki, J. Marcinkowski, and S. Staworko. Computing Consistent Query
Answers Using Conflict Hypergraphs. In preparation.
8. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. Declarative Problem-Solving in DLV.
In J. Minker, editor, Logic-Based Artificial Intelligence, pages 79–103. Kluwer, 2000.
9. G. Greco, S. Greco, and E. Zumpano. A Logical Framework for Querying and
Repairing Inconsistent Databases. IEEE Transactions on Knowledge and Data
Engineering, 15(6):1389–1408, 2003.
An Implementation of P3P Using Database Technology

Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
{ragrawal,kiernan,srikant}@almaden.ibm.com

1 Introduction
The privacy of personal information on the Internet has become a major concern
for governments, businesses, media, and the public. Platform for Privacy Pref-
erences (P3P), developed by the World Wide Web Consortium (W3C), is the
most significant effort underway to enable web users to gain more control over
their private information. P3P provides mechanisms for a web site to encode its
data-collection and data-use practices in a standard XML format, known as a
P3P policy [3], which can be programmatically checked against a user’s privacy
preferences.
This demonstration presents an implementation of the server-centric
architecture for P3P proposed in [1]. The novel aspect of this implementation is that
it makes use of proven database technology, as opposed to the prevailing
client-centric implementation based on specialized policy-preference matching
engines. Not only does this implementation have qualitative advantages, but our
experiments also indicate that it performs significantly better (15-30 times faster)
than the sole public-domain client-centric implementation and that the latency
introduced by preference matching is small enough (0.16 second on average) for
real-world deployments of P3P [1].
2 Overview of P3P
The P3P protocol has two parts:
1. Privacy Policies: An XML format in which a web site can encode its
data-collection and data-use practices [3]. For example, an online bookseller can
publish a policy which states that it uses a customer's name and home
phone number for telemarketing purposes, but that it does not release this
information to external parties.
2. Privacy Preferences: An XML format for specifying privacy preferences and
an algorithm for programmatically matching preferences against policies.
The W3C APPEL working draft provides such a format and a corresponding
policy-preference matching algorithm [2]. For example, a privacy-conscious
consumer may define a preference stating that she does not want retailers
to use her personal information for telemarketing and product promotion.
Fig. 1. Client-centric policy-preference matching
2.1 Client-Centric Implementation
A client-centric architecture for implementing P3P has been described in [4].
As a user browses a web site, the site’s P3P policy is fetched to the client side.
The policy is then checked by a specialized APPEL engine against the user’s
APPEL preference to see if the policy conforms to the preference (see Figure 1).
There are two prominent implementations of this architecture: Microsoft IE6
and AT&T Privacy Bird.

2.2 Server-Centric Implementation
Figure 2 shows the server-centric architecture we have developed. A web site de-
ploying P3P first installs its privacy policy in a database system. Then database
querying is used for matching a user’s privacy preference against privacy policies.
The server-centric implementation has several advantages, including: setting up
the infrastructure necessary for ensuring that web sites act according to their
stated policies, allowing P3P to be deployed in the thin, mobile clients that are
likely to dominate Internet access in the future, and allowing site owners to refine
their policies based on the privacy preferences of their users.
3 Demonstration
Our implementation consists of both client and server components.
3.1 Client Components
We extend Microsoft Internet Explorer to invoke preference checking at the
server before a web page is accessed. The IE extension allows a user to specify
her privacy preference at different sensitivity levels. It invokes the preference
checking by sending the preference to the server.
Fig. 2. Server-centric policy-preference matching
3.2 Server Components
We define a schema in DB2 for storing policy data in the relational tables. This
schema contains a table for every element defined in the P3P policy. The tables
are linked using foreign keys reflecting the XML structure of the policies. We
extend IBM Tivoli Privacy Wizard (a web-based GUI tool for web site owners to
define P3P policies) with the functionality of parsing and shredding P3P policies
as a set of records in the database tables.

When the server receives the APPEL preference from the client, it translates
the preference into SQL queries to be run against the policy tables. The SQL
queries corresponding to the preference are submitted to the database engine.
The result of the query evaluation yields the action to be taken. The evaluation
result is sent back to the client. If the policy does not conform to the preference,
the IE extension blocks the web page and shows a message to the user.
Otherwise, the requested web page is displayed.
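On the server side, one APPEL rule can be pictured as a SQL query over the policy
tables. The sketch below assumes a simplified, illustrative table
POLICY_STATEMENT(policy_id, data_item, purpose); the actual schema described
above (one table per P3P element, linked by foreign keys) is richer:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public final class PreferenceMatcher {
        // Illustrative check: does the policy use the home phone number for telemarketing?
        static boolean violates(Connection db, int policyId) throws SQLException {
            String sql = "SELECT COUNT(*) FROM POLICY_STATEMENT " +
                         "WHERE policy_id = ? AND data_item = 'user.home-info.telecom.telephone' " +
                         "AND purpose = 'telemarketing'";
            try (PreparedStatement ps = db.prepareStatement(sql)) {
                ps.setInt(1, policyId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getInt(1) > 0; // a match means the preference is violated: block the page
                }
            }
        }
    }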
References
1. Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu.
Implementing P3P using database technology. In 19th Int'l Conference on Data
Engineering, Bangalore, India, March 2003.
2. Lorrie Cranor, Marc Langheinrich, and Massimo Marchiori. A P3P Preference
Exchange Language 1.0 (APPEL1.0). W3C Working Draft, April 2002.
3. Lorrie Cranor, Marc Langheinrich, Massimo Marchiori, Martin Presler-Marshall,
and Joseph Reagle. The Platform for Privacy Preferences 1.0 (P3P1.0)
Specification. W3C Recommendation, April 2002.
4. The World Wide Web Consortium. P3P 1.0: A New Standard in Online Privacy.
Available from http://www.w3.org/P3P/.
XQBE: A Graphical Interface for XQuery
Engines
Daniele Braga, Alessandro Campi, and Stefano Ceri
Politecnico di Milano
Piazza Leonardo da Vinci, 32 – 20133 Milano, Italy
{braga,campi,ceri}@elet.polimi.it
Abstract. XQuery is increasingly popular among computer scientists
with a SQL background, since queries in XQuery and SQL require com-
parable skills to be formulated. However, the number of these experts

is limited, and the availability of easier XQuery “dialects” could be ex-
tremely valuable. Something similar happened with QBE, initially pro-
posed as an alternative to SQL, that has then become popular as the
user-friendly query language supported by MS Access. We designed and
implemented XQBE, a visual dialect of XQuery that uses hierarchical
structures, coherent with the hierarchical nature of XML, to denote the
input and output documents.
Our demo consists of examples of queries in XQBE and shows how our
prototype allows switching between equivalent representations of the
same query.
1 Introduction
The diffusion of XML creates a pressing need to provide the capability to query
XML data to a wide spectrum of users, typically lacking computer programming
skills. This demonstration presents a user-friendly interface, based on an
intuitive visual query language (XQBE, XQuery By Example), that we developed
for this purpose, inspired by QBE [2]. QBE showed that a visual interface to
a query language is effective in supporting the intuitive formulation of queries
when the basic graphical constructs are close to the visual abstraction of the
underlying data model. Accordingly, while QBE is a relational query language,
based on the representation of tables, XQBE is based on the use of annotated
trees, to adhere to the hierarchical nature of XML. XQBE was designed with the
objectives of being intuitive and easy to map directly to XQuery. Our interface is
capable of generating the visual representation of many XQuery statements that
belong to a subset of XQuery, defined by our translation algorithm (sketched
later).
XQBE allows for arbitrarily deep nesting of XQuery FLWOR expressions,
construction of new XML elements, and restructuring of existing documents.
However, the expressive power of XQBE is limited in comparison with XQuery,
which is Turing-complete. The particular purpose of XQBE makes usability one

Fig. 1. A sample document (bib.xml)
of its critical success factors, and we considered this aspect during the whole de-
sign and implementation process. Still from a usability viewpoint, our prototype
is a first step towards an integrated environment to support both XQuery and
XQBE, where users alternate between the XQBE and XQuery representations.
2 XQuery by Example
XQBE is fully described in [1]. Here we only introduce its basics by means of the
query (Q1) "List books published by Addison-Wesley after 1991, including their
year and title", over the data in Figure 1. Its XQBE version is in Figure 2(a),
while its XQuery version is:

<bib>
{
  for $b in doc("bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year="{$b/@year}">{$b/title}</book>
}
</bib>
A query is formed by a source part (on the left) and a construct part (on
the right). Both parts contain labelled graphs that express properties of XML
fragments: the source part describes the XML data to be matched, the construct
part specifies which are to be retained. Correspondence between the two parts
is expressed by explicit bindings. XML elements in the target are represented as
labelled rectangles, their attributes as black circles (with the name on the arc),
and their PCDATA as an empty circle. In the construct part, the paths that
branch out of a bound node indicate which of its contents are to be retained. In
Figure 2(a) the source part matches the book elements with a year greater than
1991 and a publisher equal to "Addison-Wesley". The binding edge between the
book nodes states that the result shall contain as many book elements as those
matched. The trapezoidal bib node means that all the generated books are to be
contained in one bib element.
The translation process translates an XQBE query into a sentence of the
XQuery subset defined by the grammar in Figure 3. The generated translation of
Q1 corresponds to the XQuery statement shown above.
It is also possible to obtain the XQBE version of an XQuery statement; the
automatically generated XQBE version of Q1 is shown in Figure 2(b).
Fig. 2. The XQBE version of Q1(a) and the automatically generated XQBE for Q1(b).
Fig. 3. EBNF specification of the XQuery subset expressible with XQBE
3 Conclusions
The contribution of our work is the availability of an environment in which users
can query XML data with a GUI, access the generated XQuery statement, and
also visualize the XQBE version of a large class of XQuery statements. Moreover,
they can modify either representation and observe the changes in the other.
References
1. D. Braga and A. Campi. A graphical environment to query XML data with
XQuery. In Proc. of the 4th WISE, Roma (Italy), December 2003.
2. M. M. Zloof. Query-by-Example: A data base language. IBM Systems Journal, 1977.
P2P-DIET: One-Time and Continuous Queries in Super-Peer Networks

Stratos Idreos, Manolis Koubarakis, and Christos Tryfonopoulos
Dept. of Electronic and Computer Engineering
Technical University of Crete, GR73100 Chania, Greece
{sidraios,manolis,trifon}@intelligence.tuc.gr

1 Introduction
In peer-to-peer (P2P) systems a very large number of autonomous computing
nodes (the peers) pool together their resources and rely on each other for data
and services. P2P systems are application level virtual or overlay networks that
have emerged as a natural way to share data and resources. The main applica-
tion scenario considered in recent P2P data sharing systems is that of one-time
querying: a user poses a query (e.g., “I want music by Moby”) and the system
returns a list of pointers to matching files owned by various peers in the network.
Then, the user can go ahead and download files of interest. The complementary
scenario of selective dissemination of information (SDI) or selective information
push is also very interesting. In an SDI scenario, a user posts a continuous query
to the system to receive notifications whenever certain resources of interest ap-
pear in the system (e.g., when a song of Moby becomes available). SDI can be as
useful as one-time querying in many target applications of P2P networks ranging
from file sharing, to more advanced applications such as alert systems for digital
libraries, e-commerce networks etc.
At the Intelligent Systems Laboratory of the Technical University of Crete,
we have recently concentrated on the problem of SDI in P2P networks in the
context of the project DIET. Our work, summarized in [3], has culminated in the
implementation of P2P-DIET, a service that unifies one-time and continuous
query processing in P2P networks with super-peers. P2P-DIET is a direct
descendant of DIAS, a Distributed Information Alert System for digital libraries,
that was presented in [4]. P2P-DIET combines one-time querying as found in
other super-peer networks and SDI as proposed in DIAS. P2P-DIET has been
implemented on top of the open source DIET Agents Platform
(http://diet-agents.sourceforge.net/) and it is available online.

2 The System P2P-DIET
A high-level view of the P2P-DIET architecture is shown in Figure 1(a) and
a layered view in Figure 1(b). There are two kinds of nodes: super-peers and
clients. All super-peers are equal and have the same responsibilities, thus the
super-peer subnetwork is a pure P2P network. Each super-peer serves a fraction
of the clients and keeps indices on the resources of those clients.

Fig. 1. The architecture and the layered view of P2P-DIET
Clients can run on user computers. Resources (e.g., files in a file-sharing
application) are kept at client nodes, although it is possible in special cases
to store resources at super-peer nodes. Clients are equal to each other only
in terms of download. Clients download resources directly from the resource
owner client. A client is connected to the network through a single super-peer
node, which is the access point of the client. It is not necessary for a client
to be connected to the same access point continuously since client migration
is supported in P2P-DIET. Clients can connect, disconnect or even leave from
the system silently at any time. To enable a higher degree of decentralization
and dynamicity, we also allow clients to use dynamic IP addresses. Routing of
queries (one-time or continuous) is implemented using minimum weight spanning
trees for the super-peer subnetwork. After connecting to the network, a client
may publish resources by sending resource metadata to its access point, post a
one-time query to discover matching resources, or subscribe with a continuous
query to be notified when resources of interest are published in the future. A
user may download a file at the time he receives a notification, or save it in
his saved-notifications folder for future use. Additionally, a client can download
a resource even when he has migrated to another access point. The feature of

stored notifications guarantees that notifications matching disconnected users
will be delivered to them upon connection. If a resource owner is disconnected,
the interested client may arrange a rendezvous with the resource. P2P-DIET
also offers the ability to add or remove super-peers. Additionally, it supports a
simple fault-tolerance protocol based on are-you-alive messages. Finally, P2P-
DIET provides message authentication and message encryption. For the detailed
protocols see [5].
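The client-side operations just described can be summarized in a small interface
sketch (all names are illustrative; the actual P2P-DIET API is not part of this
excerpt):

    import java.util.List;

    // Illustrative view of what a P2P-DIET client asks of its access point (super-peer).
    interface P2PDietClient {
        void publish(ResourceMetadata metadata);           // advertise a resource to the access point
        List<ResourceMetadata> oneTimeQuery(String query); // one-time querying: pointers to matching resources
        long subscribe(String continuousQuery);            // SDI: be notified about future matches
        void unsubscribe(long subscriptionId);
        void migrateTo(String newAccessPoint);             // client migration to another super-peer
    }

    final class ResourceMetadata {
        String ownerClientId; // resources are downloaded directly from the owner client
        String attributes;    // e.g., title and artist in a file-sharing application
    }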
The current implementation of P2P-DIET to be demonstrated supports the
data model of [4] and is currently being extended to support the more expressive
model also described in [4]. Each super-peer utilises efficient query processing
algorithms based on indexing of resource metadata and queries, and a hierarchical
organisation of queries (poset) that captures query subsumption as in [1]. A
sophisticated index that exploits commonalities between continuous queries is
maintained at each super-peer, enabling the quick identification of the continuous
queries that match incoming resource metadata. In this area, our work extends
and improves the indexing algorithms of SIFT [6] and is reported in [2].
References
1. A. Carzaniga, D.S. Rosenblum, and A.L. Wolf. Design and evaluation of a
wide-area event notification service. ACM Transactions on Computer Systems,
19(3):332–383, August 2001.
2. C. Tryfonopoulos and M. Koubarakis. Selective Dissemination of Information in
P2P Networks: Data Models, Query Languages, Algorithms and Computational
Complexity. Technical Report TUC-ISL-02-2003, Intelligent Systems Laboratory,
Dept. of Electronic and Computer Engineering, Technical University of Crete,
July 2003.
3. M. Koubarakis, C. Tryfonopoulos, S. Idreos, and Y. Drougas. Selective Information
Dissemination in P2P Networks: Problems and Solutions. ACM SIGMOD Record,
Special issue on Peer-to-Peer Data Management, K. Aberer (editor), 32(3),
September 2003.
4. M. Koubarakis, T. Koutris, C. Tryfonopoulos, and P. Raftopoulou. Information
Alert in Distributed Digital Libraries: The Models, Languages and Architecture
of DIAS. In Proceedings of the 6th European Conference on Research and Advanced
Technology for Digital Libraries (ECDL 2002), volume 2458 of Lecture Notes in
Computer Science, pages 527–542, September 2002.
5. S. Idreos and M. Koubarakis. P2P-DIET: A Query and Notification Service Based
on Mobile Agents for Rapid Implementation of P2P Applications. Technical Report
TUC-ISL-01-2003, Intelligent Systems Laboratory, Dept. of Electronic and
Computer Engineering, Technical University of Crete, June 2003.
6. T.W. Yan and H. Garcia-Molina. Index structures for selective dissemination of
information under the boolean model. ACM Transactions on Database Systems,
19(2):332–364, 1994.
HEAVEN: A Hierarchical Storage and Archive Environment for
Multidimensional Array Database Management Systems

Bernd Reiner and Karl Hahn
FORWISS (Bavarian Research Center for Knowledge Based Systems)
Technical University Munich
Boltzmannstr. 3, D-85747 Garching b. München, Germany
{reiner,hahnk}@forwiss.tu-muenchen.de
Abstract. The intention of this paper is to present HEAVEN, a solution for the
intelligent management of large-scale datasets held on tertiary storage systems.
We introduce the current state-of-the-art technique for the storage and retrieval of
large spatio-temporal array data in the High Performance Computing (HPC)
area. A major bottleneck identified today is fast and efficient access to and
evaluation of high performance computing results. We address the necessity of
developing techniques for efficient retrieval of requested subsets of large
datasets from mass storage devices. Furthermore, we show the benefit of
managing large spatio-temporal data sets, e.g. generated by simulations of
climate models, with Database Management Systems (DBMS). This means
DBMS need a smart connection to tertiary storage systems with optimized
access strategies. HEAVEN is based on the multidimensional array DBMS
RasDaMan.
1 Introduction
Large-scale scientific experiments often generate large amounts of multidimensional
data sets. Data volume may reach hundreds of terabytes (up to petabytes). Typically,
these data sets are stored as files permanently in an archival mass storage system on
up to thousands of magnetic tapes. The access times and/or transfer times of these
kinds of tertiary storage devices, even if robotically controlled, are relatively slow.
Nevertheless, tertiary storage systems are currently the state of the art for storing
such large volumes of data. Concerning data access in the HPC area, the main
disadvantages are high access latency compared to hard disk devices and the lack
of direct access. A major bottleneck for scientific applications is the inability to
access specific subsets of data. If only a subset of such a large data set is required,
the whole file must be transferred from the tertiary storage media. Taking into
account the time required to load, search, read, rewind and unload several
cartridges, it can take many hours or days to retrieve a subset of interest from a
large data set. Entire files must be loaded from magnetic tape, even if only a
subset of the file is needed for further processing. Processing data across a
multitude of data sets (for example, time slices) is hard to support. Evaluation of
search criteria requires network transfer of each required data set, implying a
sometimes prohibitively immense

amount of data to be shipped. Hence, many interesting and important evaluations
are currently impossible. Another disadvantage is that access to data sets is done
on an inadequate semantic level. Applications accessing HPC data have to deal
with directories, file names, and data formats instead of accessing multidimensional
data in terms of simulation space or time interval. Examples of large-scale HPC
data are
climate-modeling simulations, cosmological experiments and atmospheric data
transmitted by satellites. Such natural phenomena can be modeled as spatio-temporal
array data of some specific dimensionality. Their common characteristic is that a huge
amount of Multidimensional Discrete Data (MDD) has to be stored. To overcome
the above-mentioned shortcomings and to provide flexible management of
spatio-temporal data, we implemented HEAVEN (Hierarchical Storage and Archive
Environment for Multidimensional Array Database Management Systems).
Fig. 1. HEAVEN system architecture

2 HEAVEN System Architecture
HEAVEN combines the advantage of storing large amounts of data with the
realization of efficient data access and management through a DBMS. This means
the DBMS must be extended with easy-to-use functionalities to automatically
store and retrieve data to/from tertiary storage systems without user interaction.
We implemented such intelligent concepts and integrated them into the kernel of
the first commercial multidimensional array DBMS, RasDaMan (Raster Data
Management). RasDaMan is specially designed for generic multidimensional array
data of arbitrary size and dimensionality.
Fig. 1 depicts the architecture of the extended RasDaMan system (a
client/server architecture with a conventional DBMS) with the tertiary storage
connection. We realize the tertiary storage support by connecting a Hierarchical
Storage Management (HSM) System with RasDaMan. Such HSM systems have
been developed to manage tertiary storage archive systems and can handle
thousands of tertiary storage media. The virtual file system of an HSM system is
separated into a limited cache, on which the user works (loads or stores his data),
and a tertiary storage system with robot-controlled tape libraries. The HSM
system automatically migrates or stages data to or from the tertiary storage media
if necessary. To realize the retrieval of subsets of large data sets (MDDs),
RasDaMan stores MDDs subdivided into sub-arrays, called tiles. Detailed
information about tiling can be found in [1, 3, 5]. Tiles are the smallest unit of data
access in RasDaMan. The size of tiles (32 KByte to 640 KByte)
is optimized for main memory and hard disk access. Such tile sizes are much too
small for data sets held on tertiary storage media [2]. It is necessary to choose
different granularities for hard disk access and tape access, because they differ
significantly in their access characteristics (random vs. sequential access). A
promising idea is to introduce an additional data granularity, as provided by the
newly developed Super-Tile concept. The main goal of the Super-Tile concept is a
smart combination of several small MDD tiles into one Super-Tile, minimizing
tertiary storage access costs. Smart means exploiting the good transfer rate of
tertiary storage devices and taking advantage of other concepts like clustering of
data. Super-Tiles are the access (import/export) granularity of MDD on tertiary
storage media. Extensive tests have shown that a Super-Tile size of about
150 MByte shows good performance characteristics in most cases. The retrieval of
data stored on hard disk or on tertiary storage media is transparent for the user;
only the access time is higher if the data is stored on tertiary storage media. Three
further strategies for reducing tertiary storage access time are clustering, query
scheduling and caching. More information can be found in [4].
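A minimal sketch of the Super-Tile idea, greedily grouping consecutive tiles until a
target size (about 150 MByte, per the measurements above) is reached; the class
name and the grouping policy are illustrative assumptions, not the RasDaMan
implementation:

    import java.util.ArrayList;
    import java.util.List;

    public final class SuperTileBuilder {
        static final long SUPER_TILE_TARGET = 150L * 1024 * 1024; // ~150 MByte per the paper's tests

        // Packs small tiles (32-640 KByte each) into Super-Tiles,
        // the unit of import/export on tertiary storage media.
        static List<List<byte[]>> build(List<byte[]> tiles) {
            List<List<byte[]>> superTiles = new ArrayList<>();
            List<byte[]> current = new ArrayList<>();
            long currentSize = 0;
            for (byte[] tile : tiles) {
                if (currentSize + tile.length > SUPER_TILE_TARGET && !current.isEmpty()) {
                    superTiles.add(current); // close the current Super-Tile
                    current = new ArrayList<>();
                    currentSize = 0;
                }
                current.add(tile);
                currentSize += tile.length;
            }
            if (!current.isEmpty()) superTiles.add(current);
            return superTiles;
        }
    }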
3 Demonstration
We will demonstrate HEAVEN using the visual front-end RView to interactively
submit RasQL (RasDaMan query language) queries and display result sets
containing 1-D to 4-D data. For demonstrating tertiary storage access we will use
our own HSM system with a connected SCSI DDS-4 tape drive. The demonstration
will start by showing sample retrieval, thereby introducing basic RasQL concepts.
Queries will encompass both search and array manipulation operations. We will
show typical access cases like retrieval of subsets and data access across a
multitude of objects. Furthermore, a performance comparison of DBMS cache area
access, HSM cache access, HSM tape access and traditional access will be
presented. Next, selected queries will serve to demonstrate the effect of several
optimization concepts, like inter- and intra-Super-Tile clustering. Performance
improvements regarding query scheduling and caching algorithms will also be
shown. This allows discussing how they contribute to overall performance. Finally,
implications of the physical data organization within RasDaMan (tiling strategies)
and on tertiary storage media (the Super-Tile concept) will be presented. Queries
such as sub-cube extraction and cuts along different space axes indicate strengths
and weaknesses of particular tiling and Super-Tile schemata. Various tiling
strategies are offered as a database tuning possibility, similar to indexes, for
optimal query performance.
References
1. Chen L. T., Drach R., Keating M., Louis S., Rotem D., Shoshani A.: Efficient
organization and access of multi-dimensional datasets on tertiary storage.
Information Systems, vol. 20, no. 2, pp. 155-183, 1995.
2. Chen L. T., Rotem D., Shoshani A., Drach R.: Optimizing Tertiary Storage
Organization and Access for Spatio-Temporal Datasets. NASA Goddard Conf. on
MSS, 1995.