Hershey • London • Melbourne • Singapore
Information Science Publishing
Transformation of
Knowledge,
Information and Data:
Theory and Applications
Patrick van Bommel
University of Nijmegen, The Netherlands
Acquisition Editor: Mehdi Khosrow-Pour
Senior Managing Editor: Jan Travers
Managing Editor: Amanda Appicello
Development Editor: Michele Rossi
Copy Editor: Alana Bubnis
Typesetter: Jennifer Wetzel
Cover Design: Mindy Grubb
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Publishing (an imprint of Idea Group Inc.)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail:
Web site:
and in the United Kingdom by
Information Science Publishing (an imprint of Idea Group Inc.)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856


Fax: 44 20 7379 3313
Web site:
Copyright © 2005 by Idea Group Inc. All rights reserved. No part of this book may be
reproduced in any form or by any means, electronic or mechanical, including photocopying,
without written permission from the publisher.
Library of Congress Cataloging-in-Publication Data
Transformation of knowledge, information and data : theory and applications / Patrick van
Bommel, editor.
p. cm.
Includes bibliographical references and index.
ISBN 1-59140-527-0 (h/c) — ISBN 1-59140-528-9 (s/c) — ISBN 1-59140-529-7 (eisbn)
1. Database management. 2. Transformations (Mathematics) I. Bommel, Patrick van, 1964-
QA76.9.D3T693 2004
005.74—dc22 2004017926
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed
in this book are those of the authors, but not necessarily of the publisher.
Table of Contents
Preface
Section I: Fundamentals of Transformations
Chapter I
Transformation-Based Database Engineering
Jean-Luc Hainaut, University of Namur, Belgium
Chapter II
Rule-Based Transformation of Graphs and the Product Type
Renate Klempien-Hinrichs, University of Bremen, Germany
Hans-Jörg Kreowski, University of Bremen, Germany
Sabine Kuske, University of Bremen, Germany
Chapter III
From Conceptual Database Schemas to Logical Database Tuning
Jean-Marc Petit, Université Clermont-Ferrand 2, France
Mohand-Saïd Hacid, Université Lyon 1, France
Chapter IV
Transformation Based XML Query Optimization
Dunren Che, Southern Illinois University, USA
Chapter V
Specifying Coherent Refactoring of Software Artefacts with
Distributed Graph Transformations
Paolo Bottoni, University of Rome “La Sapienza”, Italy
Francesco Parisi-Presicce, University of Rome “La Sapienza”, Italy
and George Mason University, USA
Gabriele Taentzer, Technical University of Berlin, Germany
Section II: Elaboration of Transformation Approaches
Chapter VI
Declarative Transformation for Object-Oriented Models
Keith Duddy, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Anna Gerber, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Michael Lawley, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Kerry Raymond, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Jim Steel, CRC for Enterprise Distributed Systems Technology
(DSTC), Queensland, Australia
Chapter VII
From Conceptual Models to Data Models
Antonio Badia, University of Louisville, USA
Chapter VIII
An Algorithm for Transforming XML Documents Schema into
Relational Database Schema
Abad Shah, University of Engineering & Technology (UET),
Pakistan
Jacob Adeniyi, King Saud University, Saudi Arabia
Tariq Al Tuwairqi, King Saud University, Saudi Arabia
Chapter IX
Imprecise and Uncertain Engineering Information Modeling in
Databases: Models and Formal Transformations
Z. M. Ma, Université de Sherbrooke, Canada
Section III: Additional Topics
Chapter X
Analysing Transformations in Performance Management
Bernd Wondergem, LogicaCMG Consulting, The Netherlands
Norbert Vincent, LogicaCMG Consulting, The Netherlands
Chapter XI
Multimedia Conversion with the Focus on Continuous Media
Maciej Suchomski, Friedrich-Alexander University of
Erlangen-Nuremberg, Germany
Andreas Märcz, Dresden, Germany
Klaus Meyer-Wegener, Friedrich-Alexander University of
Erlangen-Nuremberg, Germany
Chapter XII
Coherence in Data Schema Transformations: The Notion of Semantic
Change Patterns
Lex Wedemeijer, ABP Pensioenen, The Netherlands
Chapter XIII
Model Transformations in Designing the ASSO Methodology
Elvira Locuratolo, ISTI, Italy
About the Authors
Index
Preface
Background
Data today is in motion, going from one location to another. It increasingly
moves between systems, system components, persons, departments, and organizations.
This is essential, as it indicates that data is actually used, rather than
just stored. In order to emphasize the actual use of data, we may also speak of
information or knowledge.
When data is in motion, there is not only a change of place or position. Other
aspects are changing as well. Consider the following examples:
• The data format may change when it is transferred between systems.
This includes changes in data structure, data model, data schema, data
types, etc.
• Also, the interpretation of data may vary when it is passed on from one
person to another. Changes in interpretation are part of data semantics
rather than data structure.
• The level of detail may change in the exchange of data between depart-
ments or organizations, e.g., going from co-workers to managers or from
local authorities to the central government. In this context, we often see
changes in level of detail by the application of abstraction, aggregation,
generalization, and specialization.
• Moreover, the systems development phase of data models may vary.
This is particularly the case when implementation-independent data mod-
els are mapped to implementation-oriented models (e.g., semantic data
models are mapped to operational database specifications).

These examples illustrate just a few possibilities of changes in data. Numerous
other applications exist and everybody uses them all the time. Most applications
are of vital importance for the intelligent functioning of systems, persons, de-
partments, and organizations.
In this book, the fundamental treatment of moving knowledge, information, or
data, with changing format, interpretation, level of detail, development phase,
etc., is based on the concept of transformation. The generally accepted terms
conversion, mutation, modification, evolution, or revision may be used in
specific contexts, but the central concept is transformation.
Note that this definition covers well-known topics such as rewriting and
versioning, and that it is relevant for collaborative information systems and data
warehouses. Although data transformation is typically applied in a networked
context (e.g., Internet or intranet), it is applied in other contexts as well.
Framework
Transformation techniques have received a lot of attention in academic as well
as in industrial settings. Most of these techniques have one or more of the
following problems:
• Loss of data: the result of the transformation does not adequately de-
scribe the original data.
• Incomprehensibility: the effect of the transformation is not clear.
• Focus on instances: data instances are transformed, without incorpora-
tion of data types.
• Focus on types: data types are transformed, without incorporation of
data instances.
• Lack of provable correctness: the transformation cannot be proved correct.
We therefore aim at generic approaches for the treatment of data transforma-
tions. Some of the questions we deal with are the following: What is an ad-
equate data transformation technique? What are the requirements for the input
and output of those techniques? What are the problems in existing approaches?

What are the possibilities of a generic approach in important areas such as the
semantic web, supply chain management, the global information community,
and information security?
The theory and applications in this book are rooted in database schema trans-
formation, as well as in database contents transformation. This allows for other
transformations, including transformation of document type definitions (DTDs)
and of concrete documents. It is obvious that graph transformations are rel-
evant here. Note that we do not particularly focus on specific kinds of data or
documents (e.g., RDBMS, HTML or XML), although the models under consid-
eration do not exclude such a focus.
From Source to Target
Here we discuss general aspects of the move from source to target. They deal
with the basic assumptions underlying all transformation processes.
• Source. This is the structure to be transformed, or in other words, it is the
input to the transformation process. An important distinction is made be-
tween formal and informal sources. If the source is informal, the transfor-
mation process cannot be fully automated. We usually then have a partly
automated transformation aiming at support, with sufficient possibilities
for interaction. As an example, a modeling process often is the mapping of
an informal view to a formal model. In this book, the input and output of
most transformations are assumed to be available in some formal lan-
guage.
• Target. This is the resulting structure, so it is the output of the transforma-
tion process. A main question here is how the relation between the target
and the source is defined. Even when the transformation process has
been completed, it is important that the relation of the target with the
source remains clear. One way of establishing such a clear relation is to
have the target defined in terms of the source. This is also helpful in
providing correctness proofs.

• Applicability. In some cases, transformations are not really general in the
sense that the possible source and target are rather restricted. If, for ex-
ample, a theoretical model of transformations only allows for exotic tar-
gets, not being used in practical situations, the theoretical model suffers
from applicability problems.
• Structure vs. access operations. Besides the transformation of struc-
tures, we must provide mechanisms for the transformation of access op-
erations. These operations may be modification operations as well as re-
trieval operations. Consequently, we have a source structure with corre-
sponding access operations, and a target structure with equivalent opera-
tions. This situation is shown in Figure 1. The transformation kernel con-
tains all metadata relevant for the transformation.
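The pairing of a structure transformation with an operation transformation can be sketched as follows. This is a minimal illustration rather than a method taken from the book: the nested customer/order source, the flat row target, and all function names are hypothetical. It shows a retrieval and a modification operation on the source, their counterparts on the target, and a check that both sides return the same answers.

```python
# Hypothetical source structure: customer -> list of order numbers.
# Hypothetical target structure: flat list of (customer, order) rows.

def structure_transformation(source):
    """Map the nested source structure to flat target rows."""
    return [(c, o) for c, orders in source.items() for o in orders]

# Access operations on the source structure.
def src_orders_of(source, customer):
    return sorted(source.get(customer, []))

def src_add_order(source, customer, order):
    source.setdefault(customer, []).append(order)

# Equivalent operations on the target structure
# (what an operation transformation would have to produce).
def tgt_orders_of(rows, customer):
    return sorted(o for c, o in rows if c == customer)

def tgt_add_order(rows, customer, order):
    rows.append((customer, order))

source = {"bob": [7, 3], "carol": [5]}
target = structure_transformation(source)

# Apply the same modification on both sides, then compare retrievals.
src_add_order(source, "carol", 9)
tgt_add_order(target, "carol", 9)
equivalent = all(
    src_orders_of(source, c) == tgt_orders_of(target, c) for c in source
)
```

Comparing results on one sample only illustrates what equivalence of operations means; it does not establish it in general.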
Correctness
Evidently, the correctness of transformations is of vital importance. What pur-
pose would transformations have, if the nature of the result is uncertain? A
general setup for guaranteeing transformation correctness consists of three
steps.
• Wellformedness conditions. First, we describe the required properties of
the target explicitly. We prefer to have basic (independent) wellformedness
conditions here, as this facilitates the systematic treatment in the next
steps.
• Transformation algorithm. Next, we describe the construction of the
target on the basis of the source at hand. This construction process is
defined in the transformation algorithm, which may be enhanced using
guidance parameters. Guidance is interpreted as the development towards
target structures having certain desirable qualities.
• Correctness proof. Finally, we prove that the result of the algorithm sat-
isfies the wellformedness conditions. As a consequence, the resulting struc-
ture is correct in the sense that all wellformedness conditions are satis-

fied. Moreover, when specific guidance parameters are used, we have to
prove that the resulting structure not only satisfies all wellformedness con-
ditions, but has the desirable qualities (indicated by guidance parameters)
as well.
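The three steps above can be mirrored in a small sketch. Everything in it is an assumption made for illustration: the source format, the two wellformedness conditions, and the `sort_rows` guidance parameter are hypothetical. The final check is a runtime test on one input, which stands in for, but is much weaker than, the correctness proof the text asks for.

```python
# Step 1: wellformedness conditions on the target (hypothetical choices).
def no_null_values(rows):
    return all(c is not None and o is not None for c, o in rows)

def no_duplicates(rows):
    return len(rows) == len(set(rows))

WELLFORMEDNESS = [no_null_values, no_duplicates]

# Step 2: transformation algorithm, with one guidance parameter.
def transform(source, sort_rows=False):
    """Flatten customer -> orders into unique (customer, order) rows.

    Customers without orders yield no row, so no null values arise.
    sort_rows is a guidance parameter: it steers the result towards
    a desirable quality (here, sorted output).
    """
    pairs = ((c, o) for c, orders in source.items() for o in orders)
    rows = list(dict.fromkeys(pairs))  # drops duplicates, keeps order
    return sorted(rows) if sort_rows else rows

# Step 3: check the result (a test on one input, not a proof).
def is_sorted(rows):
    return rows == sorted(rows)

def check(source, sort_rows=False):
    target = transform(source, sort_rows)
    ok = all(condition(target) for condition in WELLFORMEDNESS)
    if sort_rows:
        ok = ok and is_sorted(target)  # quality promised by the guidance
    return target, ok

source = {"bob": [7, 3, 7], "alice": [], "carol": [5]}
target, ok = check(source, sort_rows=True)
```

Keeping the conditions as independent predicates follows the text's preference for basic wellformedness conditions: each one can be checked, or proved, separately.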
Sequences of Transformations
Transformations may be composed or applied in sequences. Such sequences
sometimes consist of a relatively small number of steps. In more complex prob-
lem areas, however, this is no longer possible. Then, transformation sequences
will be longer and due to the various options in each transformation step, the
outcome of the overall sequence is not a priori known. This is particularly the
case when non-deterministic (e.g., random or probabilistic) transformation pro-
cesses are considered.
Figure 1. Framework for transformation of structures and operations
[Diagram labels: transformation kernel; source structure; target structure;
source operations; target operations; structure transformation; operation
transformation.]
Although the outcome is not a priori known, it is often desirable to predict the
nature of the result. One way of predicting the behavior of probabilistic
transformation processes is through the use of Markov theory. Here the
probabilities of a single transformation step are summarized in a transition
matrix, such that transformation sequences can be considered by matrix
multiplication.
We will illustrate the definition of a single-step matrix for two basic cases. In
the first case, consider a transformation in a solution space S where each input
x∈S has as possible output some y∈N(x), where N(x)⊆S and x∉N(x). So each
neighbor y∈N(x) can be produced from x by the application of some transfor-
mation rule. Then the probability P(x,y) for the transformation of x into some
y∈N(x) has the following property:
P(x,y) × |N(x)| = 1 (1)
Evidently for y∉N(x) we have P(x,y)=0. With this property it is guaranteed that
P(x,y) is a stochastic matrix, since 0 ≤ P(x,y) ≤ 1 and Σ_{y∈S} P(x,y) = 1.
Note that in the above transformation the production of all results is equally
likely.
In the second case, we consider situations where the production of all results is
not equally likely. Consider a transformation in a solution space S where each
input x∈S has as possible output some y∈B(x), where B(x)⊆N(x) contains all
better neighbors of x. Then the probability P(x,y) for the transformation of x
into some y∈B(x) is given by the above mentioned formula (1). However, as a
result of accepting only improving transformations, this formula alone no longer
guarantees that P(x,y) is a stochastic matrix. The consequence of rejecting all
neighbors in N(x)-B(x) is that a transformation may fail, leaving x unchanged.
So now we have to consider P(x,x). This probability has the following property:
P(x,x) × |N(x)| = |N(x)| - |B(x)| (2)
In this case we have P(x,y)=0 for y ∉ {x}∪B(x). Now we have described a
hill climbing transformation sequence. Note that the matrix underlying hill
climbing transformations is indeed a stochastic matrix: each row sums to
|B(x)|/|N(x)| + (|N(x)| - |B(x)|)/|N(x)| = 1.
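Both single-step matrices can be built and checked on a toy example. The sketch below assumes a hypothetical four-state solution space arranged as a path, with "better" meaning a larger state number; these choices are illustrative only. It uses exact fractions so that the row sums can be compared with 1 without rounding, and it multiplies the hill climbing matrix by itself to obtain two-step probabilities.

```python
from fractions import Fraction

def uniform_matrix(states, neighbors):
    """Case 1: P(x,y) = 1/|N(x)| for y in N(x), else 0 (formula 1)."""
    return {
        x: {
            y: Fraction(1, len(neighbors[x])) if y in neighbors[x] else Fraction(0)
            for y in states
        }
        for x in states
    }

def hill_climbing_matrix(states, neighbors, better):
    """Case 2: only better neighbors are accepted; a rejected step stays at x."""
    matrix = {}
    for x in states:
        n = len(neighbors[x])
        improving = [y for y in neighbors[x] if better(y, x)]
        row = {y: Fraction(0) for y in states}
        for y in improving:
            row[y] = Fraction(1, n)               # formula (1)
        row[x] = Fraction(n - len(improving), n)  # formula (2)
        matrix[x] = row
    return matrix

def multiply(p, q, states):
    """Matrix product: probabilities of a two-step transformation sequence."""
    return {
        x: {z: sum(p[x][y] * q[y][z] for y in states) for z in states}
        for x in states
    }

def is_stochastic(p, states):
    in_range = all(0 <= p[x][y] <= 1 for x in states for y in states)
    rows_sum_to_one = all(sum(p[x].values()) == 1 for x in states)
    return in_range and rows_sum_to_one

# Hypothetical solution space: a path 0 - 1 - 2 - 3, so x is never in N(x).
states = [0, 1, 2, 3]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

p_uniform = uniform_matrix(states, neighbors)
p_hill = hill_climbing_matrix(states, neighbors, lambda y, x: y > x)
p_hill_two_steps = multiply(p_hill, p_hill, states)
```

In this example state 3 has no better neighbor, so its row places all probability on staying put, which is the failed transformation that formula (2) accounts for.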
We will now give an overview of the book. It consists of three parts:
fundamentals of transformations, elaboration of transformation approaches, and
additional topics. These three sections contain 13 chapters. It is possible to start in
a later chapter (e.g., in Section II or III), without reading all earlier chapters
(e.g., more theoretical chapters in Section I).
Fundamentals of Transformations
Section I is about fundamentals and consists of five chapters. The focus of
Chapter I is databases: Transformation-Based Database Engineering. Here
we consider the basic theory of the transformation of data schemata, where
reversibility of transformations is also considered. We describe the use of basic
transformations in the construction of more complex (higher-level) transforma-
tions. Several possibilities are recognized here, including compound transfor-
mations, and predicate-driven and model-driven transformations. Basic trans-
formations and their higher-level derivations are embedded within database (for-
ward) design processes as well as within database reverse design processes.
Most models to be transformed are defined in terms of graphs. In Chapter II
we will therefore focus on graph transformations: Rule-Based Transforma-
tion of Graphs and the Product Type. Graph transformations are based on
rules. These rules yield new graphs, produced from a given graph. In this ap-
proach, conditions are used to have more control over the transformation pro-
cess. This allows us to indicate the order of rule application. Moreover, the
result (product) of the transformation is given special attention. In particular,
the type of the product is important. This sets the context for defining the pre-
cise relation between two or more graph transformations.
Having embedded our transformations within the graph transformation context,
Chapter III proceeds with graphs for concrete cases: From Conceptual Data-
base Schemas to Logical Database Tuning. Here we present several algo-
rithms, aiming at the production of directed graphs. In databases we have sev-
eral aims in transformations, including efficiency and freedom from null values.
Note that wellformedness of the input (i.e., a conceptual model) as well as

wellformedness of the output (i.e., the database) is addressed.
It is evident that graphs have to be transformed, but what about operations on
graphs? In systems design this corresponds with query transformation and op-
timization. We apply this to markup languages in Chapter IV: Transformation
Based XML Query Optimization. After representing document type defini-
tions in terms of a graph, we consider paths in the graph and an algebra for text
search. Equivalent algebraic expressions set the context for optimization, as we
know from database theory. Here we combine the concepts from previous chap-
ters, using rule-based transformations. However, the aim of the transformation
process now is optimization.
In Chapter V, the final chapter of Section I, we consider a highly specialized
foundation in the theory behind applications: Specifying Coherent Refactoring
of Software Artefacts with Distributed Graph Transformations. Modifications
in the structure of systems are recorded in terms of so-called “refactoring”.
This means that a coordinated evolution of system components becomes possible.
Again, this graph transformation is rule-based. We use this approach to
reason about the behavior of the system under consideration.
Elaboration of Transformation Approaches
In Section II, we consider elaborated approaches to transformation. The focus
of Chapter VI is object-oriented transformation: Declarative Transformation
for Object-Oriented Models. This is relevant not only for object-oriented data
models, but for object-oriented programming languages as well. The transfor-
mations under consideration are organized according to three styles of trans-
formation: source-driven, target-driven, and aspect-driven transformations. Al-
though source and target will be clear, the term “aspect” needs some clarifica-
tion. In aspect-driven transformations, we use semantic concepts for setting up
the transformation rule. A concrete SQL-like syntax is used, based on rule,
forall, where, make, and linking statements. This also allows for the
definition of patterns.
It is generally recognized that in systems analysis we should use conceptual
models, rather than implementation models. This creates the context for trans-
formations of conceptual models. In Chapter VII we deal with this: From Con-
ceptual Models to Data Models. Conceptual models are often expressed in
terms of the Entity-Relationship approach, whereas implementation models are
often expressed in terms of the relational model. Classical conceptual model
transformations thus describe the mapping from ER to relational models. Hav-
ing UML in the conceptual area and XML in the implementation area, we now
also focus on UML to XML transformations.
We proceed with this in the next chapter: An Algorithm for Transforming
XML Documents Schema into Relational Database Schema. A typical approach
to the generation of a relational schema from a document definition
starts with preprocessing the document definition and finding the root node of
the document. After generating trees and a corresponding relational schema,
we should determine functional dependencies and other integrity constraints.
During postprocessing, the resulting schema may be normalized in case this is
desirable. Note that the performance (efficiency) of such algorithms is a criti-
cal factor. The proposed approach is illustrated in a case study based on library
documents.
Transformations are often quite complex. If data is inaccurate, we have a fur-
ther complication. In Chapter IX we deal with this: Imprecise and Uncertain
Engineering Information Modeling in Databases: Models and Formal
Transformations. Uncertainty in information modeling is usually based on fuzzy
sets and probability theory. Here we focus on transformations in the context of
fuzzy Entity-Relationship models and fuzzy nested relations. In the models used
in this transformation, the known graphical representation is extended with fuzzy
elements, such as fuzzy type symbols.
Additional Topics

In Section III, we consider additional topics. The focus of Chapter X is the
application of transformations in a new area: Analysing Transformations in
Performance Management. The context of these transformations is an orga-
nizational model, along with a goal model. This results in a view of organiza-
tional management based on cycles of transformations. Typically, we have trans-
formations of organizational models and goal models, as well as transforma-
tions of the relationship between these models. Basic transformations are the
addition of items and detailing of components.
Next we proceed with the discussion of different media: Multimedia Conver-
sion with the Focus on Continuous Media. It is evident that the major chal-
lenge in multimedia research is the systematic treatment of continuous media.
When focusing on transformations, we enter the area of streams and convert-
ers. As in previous chapters, we again base ourselves on graphs here, for in-
stance chains of converters, yielding a graph of converters. Several qualities
are relevant here, such as quality of service, quality of data, and quality of
experience. This chapter introduces specific transformations for media-type
changers, format changers, and content changers.
The focus of Chapter XII is patterns in schema changes: Coherence in Data
Schema Transformations: The Notion of Semantic Change Patterns. Here
we consider updates of data schemata during system usage (operational
schema). When the schema is transformed into a new schema, we try to find
coherence. A catalogue of semantic changes is presented, consisting of a num-
ber of basic transformations. Several important distinctions are made, for ex-
ample, between appending an entity and superimposing an entity. Also, we have
the redirection of a reference to an owner entity, along with extension and
restriction of entity intent. The basic transformations were found during empiri-
cal studies in real-life cases.
In Chapter XIII, we conclude with the advanced approach: Model Transfor-
mations in Designing the ASSO Methodology. The context of this methodol-
ogy is ease of specifying schemata and schema evolution during system usage.

The transformations considered here particularly deal with subtyping (also called
is-a relationships). This is covered by the transformation of class hierarchies or
more general class graphs. It is evident that schema consistency is one of the
properties required. This is based on consistency of class definitions, with
inductive approaches by: (a) requiring that initialization adheres to application
constraints, and (b) requiring that all operations preserve all constraints.
Conclusions
This book contains theory and applications of transformations in the context of
information systems development. As data today is frequently moving between
systems, system components, persons, departments, and organizations, the need
for such transformations is evident.
When data is in motion, there is not only a change of place or position. Other
aspects are changing as well. The data format may change when it is trans-
ferred between systems, while the interpretation of data may vary when it is
passed on from one person to another. Moreover, the level of detail may change
in the exchange of data between departments or organizations, and the systems
development phase of data models may vary, e.g., when implementation-inde-
pendent data models are mapped to implementation-oriented models.
The theory presented in this book will help in the development of innovative
applications. Existing applications presented in this book demonstrate the power
of current transformation approaches. We are confident that this book
contributes to the understanding, the systematic treatment and refinement, and the
teaching of new and existing transformations.
Further Reading
Kovacs, Gy. & van Bommel, P. (1997). From conceptual model to OO data-
base via intermediate specification. Acta Cybernetica, (13), 103-140.
Kovacs, Gy. & van Bommel, P. (1998). Conceptual modelling based design of
object-oriented databases. Information and Software Technology, 40(1), 1-14.
van Bommel, P. (1993, May). A randomised schema mutator for evolutionary
database optimisation. The Australian Computer Journal, 25(2), 61-69.

van Bommel, P. (1994). Experiences with EDO: An evolutionary database
optimizer. Data & Knowledge Engineering, 13, 243-263.
van Bommel, P. (1995, July). Database design by computer aided schema trans-
formations. Software Engineering Journal, 10(4), 125-132.
van Bommel, P., Kovacs, Gy. & Micsik, A. (1994). Transformation of database
populations and operations from the conceptual to the internal level.
Information Systems, 19(2), 175-191.
van Bommel, P., Lucasius, C.B. & Weide, Th.P. van der (1994). Genetic algo-
rithms for optimal logical database design. Information and Software
Technology, 36(12), 725-732.
van Bommel, P. & Weide, Th.P. van der (1992). Reducing the search space for
conceptual schema transformation. Data & Knowledge Engineering, 8,
269-292.
Acknowledgments
The editor gratefully acknowledges the help of all involved in the production of
this book. Without their support, this project could not have been satisfactorily
completed. A further special note of thanks goes also to all the staff at Idea
Group Publishing, whose contributions throughout the whole process from in-
ception of the initial idea to final publication have been invaluable.
Deep appreciation and gratitude are due to Theo van der Weide and other
members of the Department of Information Systems at the University of Nijmegen,
The Netherlands, for the discussions about transformations of information models.
Most of the authors of chapters included in this book also served as reviewers
for chapters written by other authors. Thanks go to all those who provided
constructive and comprehensive reviews. Special thanks also go to the publish-
ing team at Idea Group Publishing, in particular to Michele Rossi, Carrie
Skovrinskie, Jan Travers, and Mehdi Khosrow-Pour.
In closing, I wish to thank all of the authors for their insights and excellent
contributions to this book.

Patrick van Bommel, PhD
Nijmegen, The Netherlands
February 2004

Section I
Fundamentals of
Transformations
Copyright © 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Chapter I
Transformation-Based
Database Engineering
Jean-Luc Hainaut, University of Namur, Belgium
Abstract
In this chapter, we develop a transformational framework in which many
database engineering processes can be modeled in a precise way, and in
which properties such as semantics preservation and propagation can be
studied rigorously. Indeed, the transformational paradigm is particularly
suited to database schema manipulation and translation, that are the basis
of such processes as schema normalization and optimization, model
translation, reverse engineering, database integration and federation or
database migration. The presentation first develops a theoretical framework
based on a rich, wide spectrum specification model. Then, it describes how
more complex transformations can be built through predicate-based filtering
and composition. Finally, it analyzes two major engineering activities,
namely database design and reverse engineering, modeled as goal-oriented
schema transformations.
Motivation and Introduction
Modeling software design as the systematic transformation of formal
specifications into efficient programs, and building CASE¹ tools that support it,
has long been considered one of the ultimate goals of software engineering. For instance,
Balzer (1981) and Fikas (1985) consider that the process of developing a
program [can be] formalized as a set of correctness-preserving transfor-
mations [ ] aimed to compilable and efficient program production. In this
context, according to Partsch (1983),
“a transformation is a relation between two program schemes P
and P’ (a program scheme is the [parameterized] representation
of a class of related programs; a program of this class is obtained
by instantiating the scheme parameters). It is said to be correct if
a certain semantic relation holds between P and P’.”
These definitions still hold for database schemas, which are special kinds of
abstract program schemes. The concept of transformation is particularly attrac-
tive in this realm, though it has not often been made explicit (for instance, as a
user tool) in current CASE tools. A (schema) transformation is most generally
considered to be an operator by which a data structure S1 (possibly empty) is
replaced by another structure S2 (possibly empty) which may have some sort of
equivalence with S1. Some transformations change the information contents of
the source schema, particularly in schema building (adding an entity type or an
attribute) and in schema evolution (removing a constraint or extending a
relationship type). Others preserve it and will be called semantics-preserving or
reversible. Among them, we will find those which just change the nature of a
schema object, such as transforming an entity type into a relationship type or
extracting a set of attributes as an independent entity type.
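A reversible transformation of this kind can be sketched with a toy schema representation. The dictionary layout, the operator names, and the `ADDRESS` and `lives_at` names below are assumptions made for illustration, not the chapter's formalism. The sketch extracts a set of attributes as an independent entity type, applies the inverse operator, and recovers the original schema.

```python
import copy

def extract_attributes(schema, entity, attributes, new_entity, relationship):
    """Move `attributes` of `entity` into `new_entity`, linked by `relationship`."""
    result = copy.deepcopy(schema)
    result["entities"][entity] -= set(attributes)
    result["entities"][new_entity] = set(attributes)
    result["relationships"][relationship] = (entity, new_entity)
    return result

def merge_attributes(schema, entity, new_entity, relationship):
    """Inverse operator: fold `new_entity` back into `entity`."""
    result = copy.deepcopy(schema)
    result["entities"][entity] |= result["entities"].pop(new_entity)
    del result["relationships"][relationship]
    return result

# Hypothetical schema: entity types map to sets of attribute names.
schema = {
    "entities": {"CUSTOMER": {"id", "name", "street", "city"}},
    "relationships": {},
}

extracted = extract_attributes(
    schema, "CUSTOMER", ["street", "city"], "ADDRESS", "lives_at"
)
restored = merge_attributes(extracted, "CUSTOMER", "ADDRESS", "lives_at")
```

The round trip only shows that the schema text is recovered. A full treatment of semantics preservation would also have to account for constraints, such as the cardinalities of the new relationship type, which this sketch ignores.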

Transformations that are proved to preserve the correctness of the original
specifications have been proposed in practically all the activities related to
schema engineering: schema normalization (Rauh, 1995), DBMS² schema
translation (Hainaut, 1993b; Rosenthal, 1988), schema integration (Batini, 1992;
McBrien, 2003), schema equivalence (D’Atri, 1984; Jajodia, 1983; Kobayashi,
1986; Lien, 1982), data conversion (Navathe, 1980; Estiévenart, 2003), reverse
engineering (Bolois, 1994; Casanova, 1984; Hainaut, 1993, 1993b), schema
optimization (Hainaut, 1993b; Halpin, 1995), database interoperability (McBrien,
2003; Thiran, 2001) and others. The reader will find in Hainaut (1995) an
illustration of numerous application domains of schema transformations.
The goal of this chapter is to develop and illustrate a general framework for
database transformations in which all the processes mentioned above can be
formalized and analyzed in a uniform way. We present a wide spectrum
formalism in which all the information/data models currently used can be
specified, and on which a set of basic transformational operators is defined. We
also study the important property of semantics-preservation of these operators.
Next, we explain how higher-level transformations can be built through three
mechanisms, from mere composition to complex model-driven transformation.
The database design process is revisited and given a transformational interpretation.
The same exercise is carried out in the next section for database reverse
engineering, and then we conclude the chapter.
Schema Transformation Basics
This section describes a general transformational theory that will be used as the
basis for modeling database engineering processes. First, we discuss some
preliminary issues concerning the way such theories can be developed. Then, we
define a wide-spectrum model from which operational models (i.e., those which
are of interest for practitioners) can be derived. The next sections are dedicated
to the concept of transformation, to its semantics-preservation property, and to
the means to prove it. Finally, some important basic transformations are
described.
• Warning. In the database world, a general formalism in which database
specifications can be built is called a model. The specification of a database
expressed in such a model is called a schema.
Developing Transformational Theories
Developing a general purpose transformational theory requires deciding on the
specification formalism, i.e., the model, in which the schemas are expressed and
on the set of transformational operators. A schema can be defined as a set of
constructs (entity types, attributes, keys, indexes, etc.) borrowed from a definite
model whose role is to state which constructs can be used, according to which
assembly rules, in order to build valid schemas. For simplicity, the concept of
entity type is called a construct of the ERA³ model, while entity type CUSTOMER
is a construct of a specific schema. They are given the same name,
though the latter is an instance of the former.
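The idea that a model states which constructs can be used and according to which assembly rules can be sketched in a few lines. This is an illustrative sketch only: the class names `Construct` and `Model`, the rule `attributes_have_owner` and the toy ERA model are assumptions introduced here, not definitions taken from the chapter.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Construct:
    kind: str         # construct of the model, e.g. "entity type"
    name: str         # construct of the schema, e.g. "CUSTOMER"
    parent: str = ""  # owner name for attributes, empty otherwise


Schema = List[Construct]
Rule = Callable[[Schema], bool]


@dataclass
class Model:
    name: str
    kinds: Set[str]                              # which constructs can be used
    rules: List[Rule] = field(default_factory=list)  # assembly rules

    def accepts(self, schema: Schema) -> bool:
        """A schema is valid if it uses only allowed kinds and obeys every rule."""
        kinds_ok = all(c.kind in self.kinds for c in schema)
        return kinds_ok and all(rule(schema) for rule in self.rules)


def attributes_have_owner(schema: Schema) -> bool:
    """Toy assembly rule: each attribute belongs to an entity type of the schema."""
    entity_names = {c.name for c in schema if c.kind == "entity type"}
    return all(c.parent in entity_names for c in schema if c.kind == "attribute")


era = Model(
    "ERA (toy)",
    {"entity type", "attribute", "relationship type"},
    [attributes_have_owner],
)

valid = [
    Construct("entity type", "CUSTOMER"),
    Construct("attribute", "Customer ID", "CUSTOMER"),
]
orphan = [Construct("attribute", "Customer ID", "CUSTOMER")]
foreign = valid + [Construct("index", "IDX_CUSTOMER")]
```

With these definitions, `era.accepts(valid)` holds, while `orphan` breaks the assembly rule and `foreign` uses a construct kind that the toy model does not offer.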
Though some dedicated theories rely on a couple of models, such as those which
are intended to produce relational schemas from ERA schemas, the most
interesting theories are based on a single formalism. Such a formalism defines
the reference model on which the operators are built. According to its generality
and its abstraction level, this model defines the scope of the theory, which can
address a more or less wide spectrum of processes. For instance, building a
theory on the relational model will allow us to describe, and to reason about, the
transformation of relational schemas into other relational schemas. The 1NF⁴
normalization theory is a popular example. Another example would be a
transformational theory based on the ORM (Object-Role model) that would
provide techniques for transforming (normalizing, optimizing) conceptual schemas
into other schemas of the same abstraction level (de Troyer, 1993; Proper, 1998).
The hard challenge is to choose a unique model that can address not only intra-model
transformations, but also inter-model operators, such as ORM-to-relational
conversion.
To identify such models, let us consider a set of models Γ that includes, among
others, all the operational formalisms that are of interest for a community of
practitioners, whatever the underlying paradigm, the age and the abstraction
level of these formalisms. For instance, in a large company whose information
system relies on many databases (be they based on legacy or modern technolo-
gies) that have been designed and maintained by several teams, this set is likely
to include several variants of the ERA model, UML class diagrams, several
relational models (e.g., Oracle 5 to 10 and DB2 UDB), the object-relational
model, the IDMS and IMS models and of course the standard file structure model
on which many legacy applications have been developed.
Let us also consider the transitive inclusion relation “≤” such that M ≤ M’, where
M≠M’ and M,M’ ∈ Γ, means that all the constructs of M also appear in M’.⁵ For
instance, if M denotes the standard relational model and M’ the object-relational
model, then M ≤ M’ holds, since each schema expressed in M is a valid
schema according to model M’.
Now, we consider a model M* in Γ, such that:
∀M∈Γ, M≠M*: M ≤ M*,
and a model M0 in Γ, for which the following property holds:
∀M∈Γ, M≠M0: M0 ≤ M.

(Γ, ≤) forms a lattice of models, in which M0 denotes the bottom node and M*
the top node.
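The inclusion relation and its two extreme models can be sketched by reducing each model to the set of construct names it offers. This is a deliberately coarse assumption: the model names and construct names below are hypothetical, and the toy bottom model "core" only stands in for M0 rather than reproducing its nodes, edges and labels.

```python
from typing import Dict, FrozenSet, Optional

Model = FrozenSet[str]

CORE: Model = frozenset({"object type", "attribute"})
RELATIONAL: Model = CORE | {"identifier", "reference"}
OBJECT_RELATIONAL: Model = RELATIONAL | {"compound attribute", "multivalued attribute"}
ERA: Model = CORE | {"identifier", "association", "compound attribute"}

GAMMA: Dict[str, Model] = {
    "core": CORE,
    "relational": RELATIONAL,
    "object-relational": OBJECT_RELATIONAL,
    "era": ERA,
    # The widest model gathers every construct of the other ones.
    "wide": RELATIONAL | OBJECT_RELATIONAL | ERA,
}


def included(m1: Model, m2: Model) -> bool:
    """M1 <= M2: all the constructs of M1 also appear in M2."""
    return m1 <= m2


def top(gamma: Dict[str, Model]) -> Optional[str]:
    """Name of a model that includes every model of gamma, if one exists."""
    for name, model in gamma.items():
        if all(included(other, model) for other in gamma.values()):
            return name
    return None


def bottom(gamma: Dict[str, Model]) -> Optional[str]:
    """Name of a model included in every model of gamma, if one exists."""
    for name, model in gamma.items():
        if all(included(model, other) for other in gamma.values()):
            return name
    return None
```

In this toy Γ, `top(GAMMA)` is "wide" and `bottom(GAMMA)` is "core", while the relational and ERA models are not comparable: each offers a construct that the other lacks.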
M0, admittedly non-empty, is made up of a very small set of elementary abstract
constructs, typically nodes, edges and labels. An ERA schema S comprising an
entity type E with two attributes A1 and A2 would be represented in M0 by the
nodes n1, n2, n3 which are given the labels “E”, “A1” and “A2”, and by the edges
(n1,n2) and (n1,n3).
On the contrary, M* will include a greater variety of constructs, each of them
being a natural abstraction of one or several constructs of lower-level models.
This model should include, among others, the concepts of object type, attribute
and inter-object association, so that the contents of schema S will be represented
in M* by an object type with name “E” comprising two attributes with names “A1”
and “A2”.
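The two representations of schema S can be put side by side. In the sketch below, `ObjectType` plays the role of an M*-style description and `Graph` that of an M0-style description; both class names, and the function `lower` that derives one from the other, are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Graph:
    """M0-style structure: labeled nodes and edges, nothing else."""
    labels: Dict[str, str] = field(default_factory=dict)  # node id -> label
    edges: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class ObjectType:
    """M*-style structure: an object type with named attributes."""
    name: str
    attributes: List[str]


def lower(obj: ObjectType) -> Graph:
    """Represent an object type as one node plus one node and edge per attribute."""
    g = Graph()
    g.labels["n1"] = obj.name
    for i, attribute in enumerate(obj.attributes, start=2):
        node = f"n{i}"
        g.labels[node] = attribute
        g.edges.append(("n1", node))
    return g


schema_s = ObjectType("E", ["A1", "A2"])
g = lower(schema_s)
```

The resulting graph has nodes n1, n2, n3 labeled "E", "A1" and "A2" and the edges (n1,n2) and (n1,n3), which matches the M0 representation described in the text.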
Due to their high level of abstraction, models M0 and M* are good candidates to
develop a transformational theory relying on a single model. Considering the
context-dependent definition of Γ, M0 and M*, we cannot assert that these
concepts are unique. Therefore, there is no guarantee that a universal theory can
be built.
Approaches based on M0 generally define data structures as semantics-free
binary graphs on which a small set of rewriting operators are defined. The
representation of an operational model M such as ERA, relational or XML, in M0
requires some additional features such as typed nodes (object, attribute, associa-
tion and roles for instance) and edges, as well as ad hoc assembly rules that
define patterns. A transformation specific to M is also defined by a pattern, a sort
of macro-transformation, defined by a chain of M0 transformations. McBrien
(1998) is a typical example of such theories. We can call this approach
constructive or bottom-up, since we build operational models and transformations
by assembling elementary building blocks.
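The bottom-up construction of a model-specific transformation from a chain of elementary graph operators can be sketched as follows. This is not the formalism of McBrien (1998): the typed-node encoding, the operators `add_node` and `add_edge`, and the macro `add_attribute` are hypothetical and only illustrate the chaining principle.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Graph:
    nodes: FrozenSet[Tuple[str, str, str]] = frozenset()  # (id, type, label)
    edges: FrozenSet[Tuple[str, str]] = frozenset()       # (source id, target id)


Op = Callable[[Graph], Graph]


def add_node(node_id: str, node_type: str, label: str) -> Op:
    """Elementary operator: add one typed, labeled node."""
    return lambda g: Graph(g.nodes | {(node_id, node_type, label)}, g.edges)


def add_edge(src: str, dst: str) -> Op:
    """Elementary operator: add one edge."""
    return lambda g: Graph(g.nodes, g.edges | {(src, dst)})


def chain(*ops: Op) -> Op:
    """Compose operators left to right into a single operator."""
    return lambda g: reduce(lambda acc, op: op(acc), ops, g)


def add_attribute(owner_id: str, attr_id: str, name: str) -> Op:
    """Macro-transformation of a toy ERA model: one attribute node plus its edge."""
    return chain(add_node(attr_id, "attribute", name), add_edge(owner_id, attr_id))


start = Graph(frozenset({("n1", "object", "E")}))
result = chain(
    add_attribute("n1", "n2", "A1"),
    add_attribute("n1", "n3", "A2"),
)(start)
```

Each macro is itself an operator, so macros compose in the same way as the elementary operators they are built from.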
The approaches based on M* naturally require a larger set of rewriting rules. An
operational model M is defined by specializing M*, that is, by selecting a subset
of concepts and by defining restrictive assembly rules. For instance, a relational
schema can be defined as a set of object types (tables), a set of attributes
(columns), each associated with an object type (at least one attribute per object
type) and a set of uniqueness (keys) and inclusion (foreign keys) constraints.
This model does not include the concept of association. The transformations of
M are those of M* which remain meaningful. This approach can be qualified as
specialization or top-down, since an operational model and its transformational
operators are defined by specializing (i.e., selecting, renaming, restricting) M*
constructs and operators. DB-MAIN (Hainaut, 1996b) is an example of this
approach. In the next section, we describe the main aspects of its model, named GER.⁶
Data Structure Specification Model
Database engineering is concerned with building, converting and transforming
database schemas at different levels of abstraction, and according to various
paradigms. Some processes, such as normalization, integration and optimization
operate in a single model, and will require intra-model transformations. Other
processes, such as logical design, use two models, namely the source and target
models. Finally, some processes, among others, reverse engineering and feder-
ated database development, can operate on an arbitrary number of models (or
on a hybrid model made up of the union of these models) as we will see later on.
The GER model is a wide-spectrum formalism that has been designed to:
• express conceptual, logical and physical schemas, as well as their manipu-
lation,
• support all the data-centered engineering processes, and
• support all DMS⁷ models and the production and manipulation of their
schemas.
The GER is an extended entity-relationship model that includes, among others,
the concepts of schema, entity type, entity collection, domain, attribute, relation-
ship type, keys, as well as various constraints. In this model, a schema is a
description of data structures. It is made up of specification constructs which
can be, for convenience, classified into the usual three abstraction levels, namely
conceptual, logical and physical. We will enumerate some of the main constructs
that can appear at each level:
• A conceptual schema comprises entity types (with/without attributes;
with/without identifiers), super/subtype hierarchies (single/multiple, total
and disjoint properties), relationship types (binary/N-ary; cyclic/acyclic;
with/without attributes; with/without identifiers), roles of relationship type
(with min-max cardinalities; with/without explicit name; single/multi-entity-
type), attributes (of entity or relationship types; multi/single-valued; atomic/
compound; with cardinality), identifiers (of entity type, relationship type,
multivalued attribute; comprising attributes and/or roles), constraints (in-
clusion, exclusion, coexistence, at-least-one, etc.)
• A logical schema comprises record types, fields, arrays, foreign keys,
redundancy, etc.
• A physical schema comprises files, record types, fields, access keys (a
generic term for index, calc key, etc.), physical data types, bag and list
multivalued attributes, and other implementation details.
It is important to note that these levels are not part of the model. The schema of
Figure 1 illustrates some major concepts borrowed from these three levels. Such a
hybrid schema could appear in reverse engineering.
One remarkable characteristic of wide-spectrum models is that all the transformations,
including inter-model ones, appear as intra-model operators. This has
highly interesting consequences. First, a transformation Σ designed for manipu-
lating schemas in an operational model M1 can be used in a model M2 as well,
provided that M2 includes the constructs on which Σ operates. For instance, most
transformations dedicated to COBOL data structure reverse engineering appear
to be valid for relational schemas as well. This strongly reduces the number of
operators. Secondly, any new model can profit from the techniques and
reasoning that have been developed for current models. For instance, designing
methods for translating conceptual schemas into object-relational structures or
into XML schemas (Estiévenart, 2003), or reverse engineering OO-databases
(Hainaut, 1997) have proved particularly easy since these new methods can be,
to a large extent, derived from standard ones.
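The reuse condition stated above, namely that a transformation designed for one operational model can be used in another model that includes the constructs it operates on, can be sketched as a simple inclusion test. The construct lists and the two transformation names below are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Transformation:
    name: str
    operates_on: FrozenSet[str]  # constructs the operator reads or rewrites


def usable_in(t: Transformation, model_constructs: FrozenSet[str]) -> bool:
    """A transformation can be used in a model that includes all its constructs."""
    return t.operates_on <= model_constructs


# Toy descriptions of two operational models as subsets of wide-spectrum constructs.
COBOL_FILES = frozenset(
    {"entity type", "attribute", "compound attribute",
     "multivalued attribute", "reference group"}
)
RELATIONAL = frozenset(
    {"entity type", "attribute", "identifier", "reference group"}
)

make_reference_explicit = Transformation(
    "make_reference_explicit",
    frozenset({"entity type", "attribute", "reference group"}),
)
flatten_compound = Transformation(
    "flatten_compound",
    frozenset({"entity type", "compound attribute"}),
)
```

In this toy setting, `make_reference_explicit` is usable in both models, whereas `flatten_compound` is usable only where compound attributes exist.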
The GER model has been given a formal semantics in terms of an extended NF2
model (Hainaut, 1989, 1996). This semantics will allow us to analyze the
properties of transformations, and particularly to precisely describe how, and
under which conditions, they propagate and preserve the information contents of
schemas.
Figure 1. Typical hybrid schema made up of conceptual constructs (e.g.,
entity types PERSON, CUSTOMER, EMPLOYEE and ACCOUNT,
relationship type of, identifiers Customer ID of CUSTOMER), logical
constructs (e.g., record type ORDER, with various kinds of fields including
an array, foreign keys ORIGIN and DETAIL.REFERENCE) and physical
objects (e.g., table PRODUCT with primary key PRO_CODE and indexes
PRO_CODE and CATEGORY, table space PRODUCT.DAT) (Note that the
identifier of ACCOUNT, stating that the accounts of a customer have
distinct Account numbers, makes it a dependent or weak entity type.)
[Figure 1 diagram not reproduced. Objects recoverable from the extracted text: PERSON (Name, Address), with a marker T; EMPLOYEE (Employe Nbr, Date Hired; id: Employe Nbr); CUSTOMER (Customer ID; id: Customer ID); ACCOUNT (Account NBR, Amount; id: of.CUSTOMER, Account NBR); relationship type of (roles 1-1 and 0-N); ORDER (ORD-ID, DATE_RECEIVED, ORIGIN, DETAIL[1-5] array of REFERENCE and QTY-ORD; id: ORD-ID; ref: ORIGIN; ref: DETAIL[*].REFERENCE); PRODUCT (PRO_CODE, CATEGORY, DESCRIPTION, UNIT_PRICE; id: PRO_CODE, acc; acc: CATEGORY); PRODUCT.DAT containing PRODUCT.]
Let us note that we have discarded the UML class model as a candidate for M*
due to its intrinsic weaknesses, including its lack of agreed-upon semantics, its
non-regularity and the absence of essential concepts. On the other hand, a
carefully defined subset of the UML model could be a realistic basis for
constructive approaches.
Specifying Operational Models with the GER
In this section, we illustrate the specialization mechanism by describing a
popular operational formalism, namely the standard 1NF relational model. All the
other models, be they conceptual, logical or physical can be specified similarly.
A relational schema mainly includes tables, domains, columns, primary keys,
unique constraints, not null constraints and foreign keys. The relational model can
therefore be defined as in Figure 2. A GER schema made up of constructs from
the first column only, and that satisfies the assembly rules, can be called relational.
As a consequence, a relational schema cannot comprise is-a relations, relation-
ship types, multivalued attributes or compound attributes.
The physical aspects of the relational data structures can be addressed as well.
Figure 3 gives additional specifications through which physical schemas for a
specific RDBMS can be specified. These rules generally include limitations such
as no more than 64 columns per index, or the total length of the components
of any index cannot exceed 255 characters.
Figure 2. Defining standard relational model as a subset of the GER model

GER constructs | relational constructs | assembly rules
---------------|-----------------------|---------------
schema | database schema |
entity type | table | an entity type includes at least one attribute
simple domain | domain |
single-valued and atomic attribute with cardinality [0-1] | nullable column |
single-valued and atomic attribute with cardinality [1-1] | not null column |
primary identifier | primary key | a primary identifier comprises attributes with cardinality [1-1]
secondary identifier | unique constraint |
reference group | foreign key | the composition of the reference group must be the same as that of the target identifier
GER names | SQL names | the GER names must follow the SQL syntax
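The assembly rules of Figure 2 can be read as a conformance check on a schema. The sketch below encodes a simplified version of them. Several points are assumptions made for this illustration: the classes `Attribute` and `EntityType`, the function `relational_violations`, the regular expression standing in for "the SQL syntax", and the choice to compare a reference group with the primary identifier of its target only (and by number of components only).

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplified assumption for SQL names: a letter followed by letters, digits or "_".
SQL_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


@dataclass
class Attribute:
    name: str
    card: Tuple[int, int] = (1, 1)  # (min, max); max > 1 means multivalued
    atomic: bool = True


@dataclass
class EntityType:
    name: str
    attributes: List[Attribute]
    primary_id: List[str] = field(default_factory=list)
    # Reference groups as (local attribute names, target entity type name).
    references: List[Tuple[List[str], str]] = field(default_factory=list)


def relational_violations(schema: List[EntityType]) -> List[str]:
    """Return the assembly rules of the toy relational model that the schema breaks."""
    by_name = {e.name: e for e in schema}
    errors: List[str] = []
    for e in schema:
        attrs = {a.name: a for a in e.attributes}
        if not e.attributes:
            errors.append(f"{e.name}: no attribute")
        for n in [e.name, *attrs]:
            if not SQL_NAME.match(n):
                errors.append(f"{e.name}: name '{n}' is not a plain SQL name")
        for a in e.attributes:
            if not a.atomic or a.card not in ((0, 1), (1, 1)):
                errors.append(f"{e.name}.{a.name}: not single-valued and atomic")
        for n in e.primary_id:
            if n not in attrs or attrs[n].card != (1, 1):
                errors.append(f"{e.name}: identifier component '{n}' is not [1-1]")
        for local, target in e.references:
            t = by_name.get(target)
            if t is None or len(local) != len(t.primary_id):
                errors.append(f"{e.name}: reference to '{target}' does not match")
    return errors


good = [
    EntityType("PRODUCT",
               [Attribute("PRO_CODE"), Attribute("CATEGORY", (0, 1))],
               ["PRO_CODE"]),
    EntityType("DETAIL",
               [Attribute("REFERENCE"), Attribute("QTY_ORD")],
               [],
               [(["REFERENCE"], "PRODUCT")]),
]
bad = [
    EntityType("ORDER",
               [Attribute("ORD-ID"), Attribute("DETAIL", (1, 5), atomic=False)],
               ["ORD-ID"]),
]
```

The schema `good` raises no violation, while `bad` is rejected twice: the name ORD-ID does not follow the simplified naming rule, and DETAIL is a multivalued compound attribute. A real check would also have to exclude reserved words such as ORDER, which this sketch ignores.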