
Handbook of Research on Geoinformatics - Hassan A. Karimi (Part 2)


Querying GML
distances between spatial objects and calculating
relative direction. Some other forms of spatial
predicates are realized as a set of functions.
The basic syntax of GML-QL is the same as
that of XQuery, with added spatial functions.
The following are some examples of GML-QL
queries.
Query 1: List the name, population, and area of each country in the file “Country.XML”.
FOR $c IN document("Country.XML")
RETURN
<Country>
  <GML:name>$c/Name/text()</GML:name>
  <pop>$c/pop</pop>
  <area>Area($c/shape)</area>
</Country>
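Vatsavai (2002) does not say how Area() is computed. As an illustration only, the following Python sketch computes the planar area of a simple polygon with the shoelace formula, assuming the shape is available as a list of (x, y) vertices in projected (planar) coordinates; the function name polygon_area is hypothetical and not part of GML-QL.

```python
def polygon_area(ring):
    """Planar area of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) vertices; a closing vertex equal to the
    first one (as in a GML LinearRing) is allowed but not required.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the duplicated closing vertex
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    # The sign depends on vertex order (clockwise or not); area is its magnitude.
    return abs(twice_area) / 2.0


# A 4 x 3 rectangle written as a closed ring, as a GML polygon exterior would be.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]))  # 12.0
```

Geographic (latitude/longitude) coordinates would first have to be projected; the formula assumes a plane.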
Query 2: The St. Lawrence River can supply
water to the cities which are within 300 km, if
needed. List the cities which can use water from
the St. Lawrence. This query illustrates buffer
analysis and spatial join operations.
FOR $r IN document("River.XML"),
    $c IN document("City.XML")
WHERE overlap(buffer($r/Shape, 300), $c/Shape) == 1
RETURN <CityName>
         <cname>$c/Name</cname>
       </CityName>
Both queries use expressions similar to those of XQuery. In the first example, the document function opens the Country.XML document and binds each country element in that document, in turn, to the variable $c. The result is then constructed as defined in the RETURN clause. Values bound to a variable can be used to construct new elements in the result where necessary. In this example, the function Area() is used to calculate the area of a country.
The second example illustrates buffer analysis and spatial join operations by using the overlap() and buffer() functions.
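For a city modelled as a point, overlap(buffer(r, 300), c) holds exactly when the city lies within 300 km of the river's centre line, because the buffer of r by d is the set of points at distance at most d from r. The Python sketch below evaluates Query 2 in that way with a nested-loop spatial join. It assumes rivers are polylines and cities are points in planar kilometre coordinates; all function names and data are hypothetical, and no spatial index is used.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment's supporting line and clamp to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(point, polyline, distance):
    """True if `point` lies inside buffer(polyline, distance)."""
    return any(
        point_segment_distance(point, polyline[i], polyline[i + 1]) <= distance
        for i in range(len(polyline) - 1)
    )


def buffer_join(rivers, cities, distance):
    """Nested-loop spatial join: (river, city) pairs within `distance`."""
    return [
        (r_name, c_name)
        for r_name, r_line in rivers.items()
        for c_name, c_pt in cities.items()
        if within_buffer(c_pt, r_line, distance)
    ]


# Hypothetical data in projected kilometre coordinates.
rivers = {"River A": [(0, 0), (500, 0), (900, 200)]}
cities = {"City 1": (100, 250), "City 2": (450, 400), "City 3": (800, 50)}
print(buffer_join(rivers, cities, 300))  # [('River A', 'City 1'), ('River A', 'City 3')]
```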
Although Vatsavai (2002) mentioned that the spatial operators in this query language are implemented as a set of functions, the details of these functions were not given. We therefore assume that this query language has neither a particular application nor an implementation. Nevertheless, it offered a novel and interesting approach to defining a GML query language based on XQuery and the OGC-SQL (Open Geospatial Consortium, 1999) spatial operators.
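Table 1 defines the spatial predicates in terms of intersections of closed point sets a, b and their interiors I(a), I(b). The Python sketch below instantiates those definitions for one easy case, assumed here for illustration: closed, axis-aligned rectangles with non-zero width and height (the A/A group), where the interior is the open rectangle and every set test reduces to coordinate comparisons. Crosses is omitted, and the class and function names are hypothetical rather than the API of any language discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Closed, axis-aligned rectangle with non-zero width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def closed_meet(a, b):
    """a ∩ b ≠ ∅ for the closed point sets (a shared boundary counts)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def interiors_meet(a, b):
    """I(a) ∩ I(b) ≠ ∅: the open interiors overlap (strict inequalities)."""
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def disjoint(a, b):
    return not closed_meet(a, b)


def touches(a, b):
    return closed_meet(a, b) and not interiors_meet(a, b)


def within(a, b):
    # a ∩ b = a is equivalent to a ⊆ b.
    subset = b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax
    return subset and interiors_meet(a, b)


def contains(a, b):
    return within(b, a)


def intersects(a, b):
    return not disjoint(a, b)


parcel = Rect(0, 0, 10, 10)
house = Rect(2, 2, 4, 4)
neighbour = Rect(10, 0, 20, 10)  # shares the edge x = 10 with parcel
print(contains(parcel, house), touches(parcel, neighbour), disjoint(house, neighbour))
# True True True
```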
Domain: topologically closed geometries a and b (I(x) denotes the interior of x; dim(x) its dimension)
Operator          Definition                                                                   Applies to
Disjoint(a,b)     a ∩ b = ∅                                                                    A/A, L/L, L/A, P/A, P/L
Touches(a,b)      (I(a) ∩ I(b) = ∅) ∧ (a ∩ b ≠ ∅)                                              P/L, P/A, L/L, L/A
Crosses(a,b)      (dim(I(a) ∩ I(b)) < max(dim(I(a)), dim(I(b)))) ∧ (a ∩ b ≠ a) ∧ (a ∩ b ≠ b)   A/A, P/A, L/A, L/L
Within(a,b)       (a ∩ b = a) ∧ (I(a) ∩ I(b) ≠ ∅)                                              A/A, P/A, L/A, L/L
Contains(a,b)     Within(b,a)                                                                  A/A, P/A, L/A, L/L
Intersects(a,b)   not Disjoint(a,b)
Table 1. Definition of predicates
CXQuery (Constraint XML Query Language)
CXQuery or Constraint XML Query Language (Chen and Revesz, 2002) is a declarative, Datalog-style language for querying and updating XML documents. It employs the syntax and semantics of constraint query languages (Kanellakis et al., 1990). The input of a CXQuery query is a set of XML documents, and its output is also an XML document. When CXQuery is used to define views, the query result is not materialized.
A CXQuery expression contains a rule head and a rule body, separated by the “:-” symbol. The rule body contains a set of predicates separated by semicolons, which stand for the logical operation “and”. To keep expressions simple, CXQuery employs a subset of XPath functionality to navigate the hierarchical structure of XML documents and to avoid namespace conflicts. Since most XML documents exchanged in e-Business have relatively restricted structures, CXQuery considers only XML documents that have internal DTD definitions or links to external DTD definitions.
Due to these difculties, to date there is no
query language proposal which supports querying
spatial XML documents. Since both CXQuery
and many constraint query languages are based
on Prolog, they can be easily combined. Since
constraint query languages can express spatio-
temporal queries, the combination leads to a
query language for XML documents that contain
spatio-temporal data. Moreover, combination
can be easily implemented on top of a constraint
database system.
Query 3 shows a spatial query: Find all build-
ings located in citycampus and belonging to the
Computer Science department.
citycampus(id, constraint) :-
    document("citycampus.XML"),
    citycampus(id, departments, buildings, BoundedBy),
    constraint(x, y, BoundedBy);
Building(name, dept, constraint) :-
    document("campus.XML"),
    Building(name, dept, spatial),
    constraint(x, y, spatial);
Building(name, dept, constraint) :-
    Building(name, constraint),
    citycampus(id, constraint),
    contains(citycampus/constraint, Building/constraint),
    Building/dept = "Computer Science".
The rst two rules construct the constraint
representation of the spatial data from the XML
documents. The third rule uses a spatial func-
tion contains() to test the spatial relation of two
spatial objects.
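The text does not spell out the constraint representation built by the first two rules. In constraint databases a region is commonly stored as a conjunction of linear inequalities over x and y, and the Python sketch below assumes exactly that for the campus (each constraint p·x + q·y ≤ r), with buildings given as vertex lists. Because an intersection of half-planes is convex, a polygon lies inside it exactly when all of its vertices do, which is what the hypothetical contains() function checks. Names and data are invented for illustration; this mirrors the intent of Query 3, not CXQuery's actual evaluation.

```python
# A region is a conjunction of linear constraints p*x + q*y <= r,
# stored as (p, q, r) triples, in the spirit of constraint databases.


def rectangle_constraints(xmin, ymin, xmax, ymax):
    """Constraint form of an axis-aligned rectangle."""
    return [(-1, 0, -xmin), (1, 0, xmax), (0, -1, -ymin), (0, 1, ymax)]


def satisfies(point, constraints):
    x, y = point
    return all(p * x + q * y <= r for p, q, r in constraints)


def contains(region_constraints, vertices):
    """A convex constraint region contains a polygon iff it contains all its vertices."""
    return all(satisfies(v, region_constraints) for v in vertices)


campus = rectangle_constraints(0, 0, 100, 60)
buildings = {
    "B1": {"dept": "Computer Science", "shape": [(10, 10), (30, 10), (30, 25), (10, 25)]},
    "B2": {"dept": "Computer Science", "shape": [(90, 50), (120, 50), (120, 70), (90, 70)]},
    "B3": {"dept": "Physics", "shape": [(40, 5), (60, 5), (50, 20)]},
}

# Query 3: buildings inside the campus that belong to Computer Science.
answer = [
    name
    for name, b in buildings.items()
    if b["dept"] == "Computer Science" and contains(campus, b["shape"])
]
print(answer)  # ['B1']
```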
One way in which CXQuery improves upon XQuery is by specifying schemas for the results of queries. Chen and Revesz (2002) claim that query results without schemas are of limited use for defining views, integrating data, updating, and further querying. XQuery, for its part, can query the results of a query without a schema being provided.
The main focus of this query language is to provide schema information in the query result. In addition, because CXQuery is derived from constraint query languages, which can express spatio-temporal queries, the combination yields a query language for XML documents that contain spatio-temporal data.
Gquery
Gquery (Boucelma and Colonna, 2004) is yet another GML query language based on XQuery. Unlike the authors of GML-QL, Boucelma and Colonna (2004) define a set of Gquery-specific spatial operators and basic data types. Its data types are polygon, line and point, mirroring the basic data types defined for GML.
The spatial operators can be classified into three groups: operators that return a boolean (equal, inside and cut), operators that return a float (distance, perimeter and length) and operators that return a spatial type (convexhull, center, intersection).
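Gquery's operators are only listed by name here. Assuming their usual geometric meaning (length of a polyline, perimeter of a polygon's closed ring, convex hull of a point set), the Python sketch below shows two float-returning operators and one spatial-type operator. The function names echo the Gquery operator names, but the implementations are hypothetical; the convex hull uses Andrew's monotone chain algorithm.

```python
import math


def length(line):
    """Length of a polyline given as a list of (x, y) points (float result)."""
    return sum(math.dist(line[i], line[i + 1]) for i in range(len(line) - 1))


def perimeter(polygon):
    """Perimeter of a polygon: length of its ring closed back to the start."""
    return length(list(polygon) + [polygon[0]])


def convexhull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b turns left (counter-clockwise).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the duplicated end points


square_with_inner_point = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convexhull(square_with_inner_point)
print(hull)                      # [(0, 0), (2, 0), (2, 2), (0, 2)]
print(perimeter(hull))           # 8.0
print(length([(0, 0), (3, 4)]))  # 5.0
```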
Query 4 is an example. It obtains the intersection point between a road and a river:
for $n in city
return intersection($n/road, $n/river)
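How intersection() computes its result is not described. Assuming the road and the river are polylines, one straightforward way to obtain their crossing points is to intersect every pair of segments, as in the Python sketch below; the function names are hypothetical, and parallel (including collinear) segments are simply reported as having no single intersection point.

```python
def segment_intersection(p1, p2, q1, q2):
    """Intersection point of segments p1p2 and q1q2, or None.

    Parallel (including collinear overlapping) segments return None.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    # Parameters along each segment; both must lie in [0, 1].
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def polyline_intersections(road, river):
    """All crossing points between two polylines (pairwise segment test)."""
    points = []
    for i in range(len(road) - 1):
        for j in range(len(river) - 1):
            hit = segment_intersection(road[i], road[i + 1], river[j], river[j + 1])
            if hit is not None:
                points.append(hit)
    return points


road = [(0, 0), (10, 10)]
river = [(0, 10), (10, 0)]
print(polyline_intersections(road, river))  # [(5.0, 5.0)]
```

A crossing that falls exactly on a shared vertex of two consecutive segments is reported once per segment, so a production version would deduplicate the result.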
Gquery is designed for use in a particular mediator architecture, which provides an integrated view of the data supplied by all sources; Gquery makes it possible to access and manipulate these integrated data.
Conclusion
Currently, there is a large set of query languages over XML. Although each is based on a different algebra and data model, all of them have the same aim: to query semi-structured data.
There are fewer query languages for GML documents. Since GML is an XML encoding, the features of XML query languages can be applied to GML. Accordingly, a GML query language should extend an XML query language with spatial features.
In this chapter we have discussed four query languages over GML. The first is a novel extension of a previous query language over XML. It is based on a robust data model and algebra, and it offers all the features of an XML query language together with a wide set of spatial operators. Since it was the first approach in this area, it has inspired other query languages (Chung et al., 2004).
The other three approaches are extensions of XQuery, with different aims and perspectives. The first of these, GML-QL, was the first GML query language proposal based on XQuery. Since the literature about GML-QL is rather scarce, we assume that this query language has neither a particular application nor an implementation. Furthermore, Vatsavai (2002) did not give details about its spatial operators and functions.
Although the second of these, CXQuery, is based on XQuery, it offers an interesting approach to a spatial query language over GML. CXQuery allows users to query and update XML documents using the syntax and semantics of constraint query languages. In our view, this query language is currently the best approach for querying GML.
The last approach, Gquery, denes a set of
spatial operators for GML. It is a specic ap-
proach to be applied in a particular mediator
architecture.
In conclusion, GML can represent database resources on the Web, among other things, and these resources can be queried with a specific query language. Query languages over GML are a reality.
References
Abiteboul, S., Quass, D., McHugh, J., Widom, J.,
& Wiener, J. (1997). The Lorel Query Language
for Semistructured Data. International Journal
on Digital Libraries, 1(1), 68-88.
Beech, D., Malhotra, A., & Rys, M. (1999).
A Formal Data Model and Algebra for XML.
/>FallY99/ malhotra-slides/malhotra.pdf.
Boucelma, O., & Colonna, F. (2004). Mediation
for Online Geoservices. In 4th International
Workshop on Web and Wireless Geographical
Information Systems. W2GIS 2004. Korea.
Chen, Y., & Revesz, P. (2002). CXQuery: A Novel
XML Query Language. In Proc. of International
Conference on Advances in Infrastructure for
Electronic Business, Science, and Medicine on
the Internet (SSGRR’02).
Chung, W., Park, S., & Bae, H. (2004). An Extension of XQuery for Moving Objects over GML. In Proc. of the International Conference on Information Technology: Coding and Computing (ITCC 2004). IEEE.
Córcoles, J. E., & González, P. (2001). A Specification of a Spatial Query Language over GML. In ACM-GIS 2001, 9th ACM International Symposium on Advances in Geographic Information Systems. Atlanta, USA.
Deutsch, A., Fernandez, M., Florescu, D., Levy, A.,
& Suciu, D. (1999). XML-QL: A Query Language
for XML. Computer Networks, 31, 11-16.
Kanellakis, P. C., Kuper, G. M., & Revesz, P. (1990). Constraint Query Languages. In Proc. of the Symposium on Principles of Database Systems.
Open Geospatial Consortium. (1999). Simple
Features Specication For SQL, 05-1341. Open
Geospatial Consortium. Retrieved 13th January,
2005, from
Open Geospatial Consortium (2003). Geography
Markup Language – GML. Retrieved 13th January, 2005, from />documents/02-023r4.pdf.
Robie, J. (1998). The design of XQL. Retrieved
13th January, 2005, from />XSL/Group/1998/09/XQL-design.html.
Vatsavai, R. (2002). GML-QL: A Spatial Query Language Specification for GML. Retrieved 13th January, 2005, from blestone-concepts.com/ucgis2summer2002/vatsavai/vatsavai.htm.
W3C. (1998). XSL. Retrieved 13th January, 2005, from http://www.w3.org/TR/REC-XML.
W3C. (2001). XQuery: A Query Language for XML. Retrieved 13th January, 2005, from http://www.w3.org/TR/2001/WD-XQuery-20010215.
W3C. (2005). Extensible Markup Language
– XML. Retrieved 13th January, 2005, from

Key Terms
Feature: A feature is an application object
that represents a physical entity, e.g. a building, a
river, or a person. A feature may or may not have
geometric aspects.
Markup Language: A language that combines text with extra information about the text. The extra information is expressed using markup, which is intermingled with the primary text.
Query Language: A computer language used to make queries into databases and information systems.
Semi-Structured Data: Data with incomplete structure. The data are described directly using a simple syntax, e.g., XML or GML.
Chapter III
Image Database Indexing Techniques
Michael Vassilakopoulos
University of Central Greece, Greece
Antonio Corral
University of Almería, Spain
Boris Rachev
Technical University of Varna, Bulgaria
Irena Valova
University of Rousse, Bulgaria
Mariana Stoeva
Technical University of Varna, Bulgaria
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
Image Databases (IDBs) are a kind of Spatial Database in which a large number of images are stored and queried. In this chapter, techniques for indexing an IDB for efficiently processing several kinds of queries, such as retrieval based on features, content or structure, processing of joins, and queries by example, are reviewed. The main indexing techniques used in IDBs are either members of the R-tree family (data-driven structures) or members of the quadtree family (space-driven structures). Although research on IDB indexing has been going on for several years, there are still significant research challenges, which are also discussed in this chapter. IDBs and their indexing structures bring together two different disciplines (databases and image processing), and interdisciplinary research efforts are required. Moreover, dealing with the semantic gap (successful integrated retrieval based on low-level features and high-level semantic features) and querying between images and other kinds of spatial data are also significant future research directions.
Introduction
Image Databases (IDBs) are a special kind of Spatial Database in which a large number of images are stored and queried. IDBs have a plethora of applications in modern life, for example in medical, multimedia, and educational applications. In the framework of Geographical Information Systems (GIS), digital images (raster data) may represent changes in cultivated land, sunny areas, or the distinction between urban environments and the countryside.
Apart from the raster format, GIS data may be stored in vector format (points, line segments, polygons, etc.). Each of these data formats has certain advantages, which makes choosing between them a challenge. Raster data lead to faster computation for several operations (e.g., overlays) and are well suited to remote sensing. On the other hand, they have a fixed resolution, which limits the level of detail. In this article, we focus on raster data (image databases) and their indexing techniques.
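To make the claim about overlays concrete: when two rasters share the same grid, an overlay is a single pass that combines corresponding cells, with no geometric intersection computations at all. The Python sketch below assumes two binary rasters (1 = the cell belongs to the class) and answers a question such as which cultivated cells lie in polluted areas; the grids and the function name are hypothetical.

```python
def overlay_and(raster_a, raster_b):
    """Cell-by-cell AND of two binary rasters defined on the same grid."""
    if len(raster_a) != len(raster_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(raster_a, raster_b)
    ):
        raise ValueError("rasters must share the same grid")
    return [[a & b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(raster_a, raster_b)]


# 1 = cell belongs to the class, 0 = it does not (hypothetical 3 x 4 grids).
cultivation = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
polluted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
result = overlay_and(cultivation, polluted)
print(result)                          # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
print(sum(map(sum, result)), "cells")  # 3 cells
```

The same pass works for any cell-wise operator (OR, difference, reclassification), which is why raster overlays scale linearly with the number of cells.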
Since the start of the 1980s, several structures for spatial objects have been proposed in the literature for efficient storage and retrieval of image collections. Based on these methods, many kinds of useful queries on image data may be processed efficiently. These include:

• Queries about the content of additional properties (descriptive information) that have been embedded for each image (e.g., which images have been used on the covers of children’s books?).
• Queries about the characteristics/features of the images, such as color, texture, shape, etc. (e.g., find the images that depict a vivid blue sky).
• Queries for retrieving images with specified content (e.g., find the images that contain the sub-image of a specified chair).
• Queries by example or sketch (e.g., a sample image is chosen or drawn by the user, and images similar to this sample are sought).
• Structural queries (e.g., find the images that contain a number of specific objects in a specified arrangement).
• Image joins (e.g., find the cultivation areas that lie within areas of polluted atmosphere).
• Queries that combine regional data and other sorts of spatial data (e.g., find the cities, represented by point data, that lie within 5 km of cotton cultivations).
• Temporal queries on sequences of evolving images (e.g., find whether the area of wheat cultivation in this prefecture has increased during the last two years).
The importance of image indexing and querying techniques led major Database Management System vendors to embed related extensions in the core engines of their products (e.g., DB2 has embedded QBIC technology (Flickner et al., 1995), and Oracle provides Content-Based Image Retrieval (CBIR) based on Virage (Annamalai et al., 2000)).
Background
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. In a binary image, each pixel is either black or white, while in a greyscale (color) image each pixel corresponds to a shade of gray (a color) from a set of permitted greyscale (color) values.
Each image represents a scene containing objects and regions. An IDB is an organized collection of digital images aimed at the management and efficient processing of queries on this image collection. There are numerous publications in the literature related to the processing of queries on image features such as color (e.g., distribution of colors, dominant colors, and color moments), texture (the pattern of change of the image surface, usually expressed by a combination of characteristics such as coarseness, contrast, directionality, uniformity, regularity, density, frequency, etc.) and shape (the physical structure of objects, or the geometric shapes present in the image). In several of these publications (emerging from the image processing/computer vision community), the term indexing refers to the features corresponding to each image and to the algorithm used for computing the similarity between them (the algorithm often works by an exhaustive comparison with all the images present in the database). In this article, indexing is used in the context of databases and corresponds to the access methods (data structures) used to speed up query processing.
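The computer-vision sense of "indexing" described above (a feature per image, a similarity measure, and an exhaustive comparison) can be sketched in a few lines of Python. The sketch assumes greyscale images given as rows of integer grey levels, uses a normalised grey-level histogram as the feature and histogram intersection as the similarity measure; these choices, the function names and the data are illustrative assumptions, not the method of any particular cited system.

```python
from collections import Counter


def color_histogram(image, levels):
    """Normalised histogram of a greyscale image (rows of ints in [0, levels))."""
    counts = Counter(pixel for row in image for pixel in row)
    total = sum(counts.values())
    return [counts[v] / total for v in range(levels)]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query, collection, levels):
    """Exhaustive (sequential-scan) search: compare the query with every image."""
    hq = color_histogram(query, levels)
    scores = {
        name: histogram_intersection(hq, color_histogram(img, levels))
        for name, img in collection.items()
    }
    return max(scores, key=scores.get), scores


collection = {
    "dark": [[0, 0], [0, 1]],
    "bright": [[3, 3], [3, 2]],
    "mixed": [[0, 3], [1, 2]],
}
best, scores = most_similar([[0, 0], [1, 1]], collection, levels=4)
print(best)  # dark
```

The cost of this search grows linearly with the collection size, which is precisely what the database access methods reviewed below try to avoid.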
Several publications that contain review material have appeared in the literature. Rui et al. (1999) review numerous papers covering several aspects of CBIR, including multidimensional indexing, and identify open research issues. Smeulders et al. (2000) is another detailed review of CBIR techniques, covering research up to the year 2000, which also includes a subsection on storage and indexing. The last section of this paper presents the authors’ view on CBIR’s future trends. Manolopoulos et al. (2000) give an overview of indexing for structural and feature-based queries. Veltkamp and Tanase (2001) surveyed numerous CBIR systems, providing the information available for each of them on several technical aspects, including the use of indexing structures. One conclusion of this survey is that “Indexing data structures are often not used”. Manouvrier et al. (2005) present a detailed review of quadtree-based indexing in the image domain, ranging from image representation and storage to CBIR. Price (2006) maintains an extensive Computer Vision bibliography (an invaluable tool for the researcher) that contains many references to image indexing.
Main Focus of the Article
In this section, we review the main indexing techniques that have been proposed for image databases. These techniques are grouped and classified by the family of their structure. The two main families are the R-tree family (data-driven structures) and the quadtree family (space-driven structures) (subsection 2.1.2 of chapter 6, Manolopoulos et al., 2000).
Chang et al. (1987) proposed the use of 2-D strings for the structural representation of objects appearing in an image. Using this technique, structural queries can be answered by exhaustive comparisons with all the images in the IDB. Petrakis and Orphanoudakis (1993, 1996) used hash-based indexing to speed up processing. 2-D strings are an efficient representation of the “left/right” and “below/above” relationships. Petrakis and Faloutsos (1997), Petrakis (2002) and Petrakis et al. (2002) adopted Attributed Relational Graphs (ARGs), the most general image structure representation method, where individual objects or regions are represented by graph nodes and their relationships by edges between such nodes. The method developed by Petrakis and Faloutsos (1997) achieves fast query processing by making certain assumptions about the presence of objects in each image. Petrakis (2002) and Petrakis et al. (2002) relax these assumptions. All these ARG-based methods achieve high performance by indexing ARGs with structures of the R-tree family.
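The idea behind 2-D strings can be illustrated with a simplified Python sketch. It assumes that each object is reduced to a single reference point (e.g., its centroid), that "<" means "left of" (or "below") along an axis and "=" means "same position", and that a structural query such as "is the tree left of the house?" is answered by inspecting the u string. The original formalism of Chang et al. (1987) has further details not modelled here; all names and data are hypothetical.

```python
def two_d_string(objects):
    """Return the (u, v) 2-D string of {symbol: (x, y)} reference points."""

    def axis_string(axis):
        # Sort by coordinate on this axis; ties broken by symbol name.
        ordered = sorted(objects.items(), key=lambda item: (item[1][axis], item[0]))
        parts = [ordered[0][0]]
        for (_, prev_pt), (sym, pt) in zip(ordered, ordered[1:]):
            parts.append("=" if pt[axis] == prev_pt[axis] else "<")
            parts.append(sym)
        return " ".join(parts)

    return axis_string(0), axis_string(1)  # u: along x, v: along y


def left_of(u, a, b):
    """True if a precedes b in u with at least one '<' between them."""
    tokens = u.split()
    ia, ib = tokens.index(a), tokens.index(b)
    return ia < ib and "<" in tokens[ia:ib]


scene = {"tree": (1, 2), "house": (3, 1), "car": (3, 0)}
u, v = two_d_string(scene)
print(u)  # tree < car = house
print(v)  # car < house < tree
print(left_of(u, "tree", "house"), left_of(u, "car", "house"))  # True False
```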
An R-tree is a balanced multiway tree for secondary storage, where each node is associated with a Minimum Bounding Rectangle (MBR), the minimum rectangle that bounds the data elements contained in the node. The MBR of the root bounds all the data stored in the tree. Figure 1 depicts some rectangles (MBRs of data elements) on the right and the corresponding R-tree on the left. Dotted lines denote the bounding rectangles of the subtrees rooted at inner nodes. The most widely used R-tree variant is the R*-tree; for more details refer to Gaede and Günther (1998).
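The way MBRs let an R-tree prune whole subtrees during a window query can be shown with a small in-memory Python sketch. It is a simplification assumed for illustration: the tree is built once by sorting entries on the x-coordinate of their MBR centres and packing a fixed number of them per node, which is neither the insertion and split algorithm of the original R-tree nor that of the R*-tree. All names and data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # inner node: child Nodes
    entries: list = field(default_factory=list)   # leaf: (mbr, object_id)


def union(rects):
    """Minimum Bounding Rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(entries, max_fanout=3):
    """Static bulk load (non-empty input): sort by centre x, pack max_fanout per node."""
    entries = sorted(entries, key=lambda e: (e[0][0] + e[0][2]) / 2)
    level = [Node(union([m for m, _ in entries[i:i + max_fanout]]),
                  entries=entries[i:i + max_fanout])
             for i in range(0, len(entries), max_fanout)]
    while len(level) > 1:  # build upper levels until a single root remains
        level = [Node(union([n.mbr for n in level[i:i + max_fanout]]),
                      children=level[i:i + max_fanout])
                 for i in range(0, len(level), max_fanout)]
    return level[0]


def window_query(node, window):
    """Ids of data MBRs overlapping `window`; subtrees whose MBR misses it are skipped."""
    if not overlaps(node.mbr, window):
        return []
    if node.entries:
        return [oid for mbr, oid in node.entries if overlaps(mbr, window)]
    return [oid for child in node.children for oid in window_query(child, window)]


data = [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B"), ((5, 5, 6, 7), "C"),
        ((8, 1, 9, 2), "D"), ((4, 0, 5, 1), "E")]
root = build(data)
print(sorted(window_query(root, (0.5, 0.5, 4.5, 2.5))))  # ['A', 'B', 'E']
```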
Papadias et al. (1998) treat the problem of structural image queries as a Multiple Constraint Satisfaction (MCS) problem. Both the images and the queries are mapped to regions in a multidimensional space and are indexed by structures of the R-tree family. Query processing is treated as a general form of spatial join (multi-way spatial joins).
QBIC (Flickner et al., 1995) was one of the first systems that introduced multidimensional indexing to enhance the performance of CBIR. Color, shape and texture features are extracted from the images and are represented by points in high-dimensional spaces. The Karhunen-Loève Transform is used to perform dimension reduction of the feature data, in order to overcome the degradation in performance of multidimensional index structures as dimensionality increases (a situation known as the “curse of dimensionality”; Lin et al., 1994), and a structure of the R-tree family (an R*-tree) is used as the multidimensional indexing structure.
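The Karhunen-Loève Transform amounts to projecting the mean-centred feature vectors onto the eigenvectors of their covariance matrix with the largest eigenvalues (principal component analysis). The dependency-free Python sketch below finds those eigenvectors by power iteration with deflation; the function names, iteration count and data are assumptions for illustration, not QBIC's implementation.

```python
import random


def mean_center(vectors):
    """Subtract the per-dimension mean from every feature vector."""
    d = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(d)]
    return [[v[j] - mean[j] for j in range(d)] for v in vectors]


def covariance(centered):
    """Sample covariance matrix of mean-centred vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvectors(cov, k, iterations=200, seed=0):
    """Leading eigenvectors of a symmetric PSD matrix: power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflation: remove the component just found so the next one dominates.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
        basis.append(v)
    return basis


def klt_reduce(vectors, k):
    """Karhunen-Loeve transform: project vectors onto their k principal axes."""
    centered = mean_center(vectors)
    basis = top_eigenvectors(covariance(centered), k)
    return [[sum(c[j] * b[j] for j in range(len(c))) for b in basis] for c in centered]


# Hypothetical 3-D feature vectors that all lie on the line t * (1, 2, 0).
features = [[t, 2 * t, 0] for t in range(5)]
reduced = klt_reduce(features, k=1)
print([round(abs(p[0]), 3) for p in reduced])  # [4.472, 2.236, 0.0, 2.236, 4.472]
```

Because the sample points vary along a single direction, one coordinate preserves all pairwise distances; for real features, k is chosen to retain most of the variance.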
Seidl and Kriegel (2001) present techniques for adaptable similarity search. They use quadratic distance functions that are evaluated using multidimensional index structures of the R-tree family (especially X-trees), dimensionality reduction and approximation techniques (for an introduction to X-trees, see Manolopoulos et al., 2000).
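A quadratic (form) distance has the general shape d_A(x, y) = sqrt((x - y)^T A (x - y)), where the matrix A encodes how similar the individual feature dimensions are to each other; with A equal to the identity it reduces to the Euclidean distance. The Python sketch below evaluates it directly for three-bin histograms. The similarity matrix values are invented for illustration, and the index-supported evaluation of Seidl and Kriegel (2001) is not reproduced.

```python
import math


def quadratic_form_distance(x, y, a):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for a symmetric positive-definite A."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    value = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(value)


# Hypothetical three-bin colour histograms; A treats neighbouring bins as similar.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
h1, h2, h3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

# Euclidean case: h2 and h3 are equally far from h1 (sqrt(2) each).
print(quadratic_form_distance(h1, h2, identity), quadratic_form_distance(h1, h3, identity))
# With A: the neighbouring bin h2 is closer (1.0) than the distant bin h3 (sqrt(2)).
print(quadratic_form_distance(h1, h2, similarity), quadratic_form_distance(h1, h3, similarity))
```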
For efcient processing of queries in image
databases, Quadtrees have also been extensively
used as indexing mechanisms. The Quadtree is a
four-way unbalanced tree where each node cor-
responds to a subquadrant of the quadrant of its
father node (the root corresponds to the whole
space). These trees subdivide space in a hierarchi-

cal and regular fashion. They are mainly designed
for main memory, however several alternatives for
secondary memory have been proposed. The most
widely used Quadtree is the Region Quadtree that
stores regional data in the form of raster images.
Figure 2 depicts an 8x8 pixel array and the corre-
sponding Quadtree. Note that black/white squares
represent black/white regions, while circles rep-
resent internal nodes (gray regions). The Linear
Region Quadtree is an external memory version
of the Region Quadtree, where each quadrant is
represented by a codeword stored in a B+-tree;
for more details refer to Samet (1990).
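The following Python sketch builds a Region Quadtree from a small binary image and then derives linear-quadtree style locational codes for its black leaves. It assumes a square image whose side is a power of two and the quadrant numbering 0 = NW, 1 = NE, 2 = SW, 3 = SE; the exact codeword format of Linear Region Quadtrees varies in the literature (see Samet, 1990), and a sorted list stands in for the B+-tree. Function names and data are hypothetical.

```python
def build_quadtree(img, x=0, y=0, size=None):
    """Region quadtree of a square binary image (side a power of two).

    Returns 0 (white leaf), 1 (black leaf) or a list of four children
    in the order NW, NE, SW, SE (a gray internal node).
    """
    if size is None:
        size = len(img)
    first = img[y][x]
    if all(img[y + i][x + j] == first for i in range(size) for j in range(size)):
        return first  # uniform quadrant: stop subdividing
    half = size // 2
    return [build_quadtree(img, x, y, half),                # NW
            build_quadtree(img, x + half, y, half),         # NE
            build_quadtree(img, x, y + half, half),         # SW
            build_quadtree(img, x + half, y + half, half)]  # SE


def black_codes(node, prefix=""):
    """Quaternary locational codes of the black leaves (one digit per level)."""
    if node == 1:
        return [prefix]
    if node == 0:
        return []
    return [c for i, child in enumerate(node) for c in black_codes(child, prefix + str(i))]


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(image)
print(tree)                       # [1, 0, 0, [0, 1, 1, 1]]
print(sorted(black_codes(tree)))  # ['0', '31', '32', '33']
```

Storing only the black-leaf codes, in sorted order, is what lets the linear representation live in an external-memory B+-tree instead of a pointer-based tree.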
Quadtrees have been used for CBIR (representation and querying by image features) by several researchers as a mechanism for calculating image similarity by defining appropriate similarity measures. Examples of such work follow. In some research efforts, complete Quadtrees with a fixed number of levels are used, since they lead to sufficiently precise results. Each node in the Quadtree stores the features that correspond to its quadrant, for example, a color histogram (Lin et al., 2001) or a combination of feature histograms (Malki et al., 1999). De Natale and Granelli (2001) use unbalanced Quadtrees for image segmentation into dominant colors. Each quadtree is modelled by a binary array representing its structure and a label array representing the dominant color associated with each node or leaf. Ahmad and Grosky (2003) use unbalanced Quadtrees to decompose an image into a spatial arrangement of feature points (extracted using image processing techniques) and to quantify image similarity, while remaining independent of geometric variations. For search and retrieval, an indexing scheme based on image signatures and quadtrees is used. Chakrabarti et al. (2000) use Quadtrees to represent two-dimensional shapes and perform shape-based similarity retrieval. The proposed representation is designed to exhibit invariance to scale, translation and rotation.
Figure 1. An example of an R-tree
Figure 2. An example of a Region Quadtree
Overlapping has been applied to Linear Region Quadtrees (Tzouramanis et al., 2004). In this and previous papers by the same authors, four different extensions of the Linear Region Quadtree are presented for indexing a sequence of evolving raster data. Moreover, temporal window queries are defined and studied. These queries relate to the evolution of regional data inside a window over time.
Quadtrees have also been used for creating an IDB in which image retrieval, insertion, deletion, comparison and set operations can be applied. A single quadtree is used for all images; its nodes are associated with the list of images that have information in the respective quadrants. Vassilakopoulos and Manolopoulos (1995) proposed Dynamic Inverted Quadtrees, while Jomier et al. (2000) proposed a version suitable for binary, grayscale or color images.
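The core idea of a single quadtree whose nodes carry lists of images can be sketched in Python as an index from quadrant codes (at every level) to the set of binary images that have black pixels in that quadrant. The sketch assumes square binary images of side 2^depth, the quadrant numbering 0 = NW, 1 = NE, 2 = SW, 3 = SE, and a dictionary keyed by quadrant code in place of an explicit tree; it is not the Dynamic Inverted Quadtree itself, and all names and data are hypothetical.

```python
from collections import defaultdict


def quadrant_code(x, y, depth):
    """Quaternary code (0=NW, 1=NE, 2=SW, 3=SE) of pixel (x, y) at full depth."""
    code = ""
    for level in range(depth - 1, -1, -1):
        code += str(((y >> level) & 1) * 2 + ((x >> level) & 1))
    return code


def build_inverted_quadtree(images, depth):
    """Map every quadrant code (all levels) to the ids of images with black pixels there."""
    index = defaultdict(set)
    for image_id, img in images.items():
        for y, row in enumerate(img):
            for x, pixel in enumerate(row):
                if pixel:
                    code = quadrant_code(x, y, depth)
                    for level in range(depth + 1):  # "" is the root quadrant
                        index[code[:level]].add(image_id)
    return index


images = {
    "img1": [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "img2": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
    "img3": [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]],
}
index = build_inverted_quadtree(images, depth=2)
print(sorted(index["0"]))  # images with black pixels in the NW quadrant: ['img1', 'img3']
print(sorted(index["3"]))  # SE quadrant: ['img2', 'img3']
```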
Corral et al. (1999) combine two different kinds of data and two different kinds of indexing structures. They present five algorithms suitable for processing join queries between point data stored in an R-tree and image data stored in a Linear Region Quadtree.
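To show what such a join computes, the Python sketch below uses the simplest possible strategy: for each point, compute the locational code of its pixel and look up whether any prefix of that code is a black leaf of the linear quadtree. It assumes integer pixel coordinates, the quadrant numbering 0 = NW, 1 = NE, 2 = SW, 3 = SE, and a plain dictionary of points in place of an R-tree. It is a naive baseline for illustration, not one of the five algorithms of Corral et al. (1999); all names and data are hypothetical.

```python
def locational_code(x, y, depth):
    """Quaternary code (0=NW, 1=NE, 2=SW, 3=SE) of pixel (x, y) at full depth."""
    digits = []
    for level in range(depth - 1, -1, -1):
        digits.append(str(((y >> level) & 1) * 2 + ((x >> level) & 1)))
    return "".join(digits)


def point_region_join(points, black_leaf_codes, depth):
    """Pairs (point_id, leaf_code) where the point falls inside a black leaf.

    A pixel lies in a leaf iff the leaf's code is a prefix of the pixel's
    full-depth code, so at most depth + 1 lookups are needed per point.
    """
    leaves = set(black_leaf_codes)
    result = []
    for point_id, (x, y) in points.items():
        code = locational_code(x, y, depth)
        for level in range(depth + 1):
            if code[:level] in leaves:
                result.append((point_id, code[:level]))
                break  # leaves are disjoint, so at most one match
    return result


black = ["0", "31", "32", "33"]  # black leaves of a hypothetical 4 x 4 image
cities = {"c1": (1, 0), "c2": (2, 0), "c3": (3, 3)}
print(point_region_join(cities, black, depth=2))  # [('c1', '0'), ('c3', '33')]
```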
Due to space limitations, only the most prominent IDB indexing structures are reviewed in this article. The choice of an indexing method among them depends on the application, since each of the above techniques has been designed around a specific problem setting. A qualitative comparison between them is an interesting direction for future work that lies beyond the scope and the size limit of this article. Descriptions of several other Spatial Access Methods that have been used in IDBs can be found in Samet (1990), Gaede and Günther (1998) and Manolopoulos et al. (2000).
Future Trends
IDBs are related to two different scientic commu-
nities: database and image processing / computer
vision researchers. Multidimensional access meth-
ods as well as information retrieval techniques and
their use for query processing, constitute the key
meeting point of the two worlds. Several of the
techniques of the image processing community
could make further use of access methods or/and
adapt to their properties, leading to more efcient
processing of image related queries. Related to the
previous research direction is the further develop-
ment of systems able to retrieve (and, in general,
process queries) from image collections existing
in different sources, including the WWW (Rui et

al. 1999) and indexing techniques are expected
to play a dominant role in them.
In Zhao and Grosky (2002) one of the rst
techniques for integrated image retrieval based
on low-level features and high-level semantic
features of images is presented. Mojsilovic et al.
(2004) present a methodology for semantic-based
image retrieval based on low-level image descrip-
tors. However, neither of these works is based on
indexing structures. Since image retrieval based
on both these kinds is features is crucial for the
usefulness of CBIR systems (for a discussion of
the semantic gap see subsection 2.4 of, Smeulders
et al. 2000) and still remains one of the big chal-
25
Image Database Indexing Techniques
lenges for researchers, indexing structures could
be used in this context for calculating the correla-
tions between low-level features and high-level
concepts efciently.
In Mao et al. (2005), distance-based tree structures are used for computing the similarity of images, which are represented by features reflecting their structure, texture and color. Although the high dimensionality of the feature space suggests that distance-based indexing techniques are outperformed by sequential scan (the curse of dimensionality), the authors show that the intrinsic dimensionality of real data is low, and they apply distance-based indexing that is specifically designed to reflect the intrinsic clustering of real data. The design and study of more generalized techniques in this direction is another research challenge.
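Distance-based (metric) indexing needs only a distance function that obeys the triangle inequality, not explicit coordinates. As a representative example, assumed here and not necessarily the structure used in MoBIoS, the Python sketch below builds a vantage-point tree and answers range queries, pruning a subtree whenever the triangle inequality proves it cannot contain an answer. The final line checks that the result equals that of a sequential scan; names and data are hypothetical.

```python
import math
import random


class VPNode:
    """One vantage point plus the median radius that splits its subtree."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # subtree of points with dist(point, p) <  radius
        self.outside = outside  # subtree of points with dist(point, p) >= radius


def build_vp_tree(points, dist, rng):
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    distances = [dist(vantage, p) for p in points]
    radius = sorted(distances)[len(distances) // 2]  # median distance
    inside = [p for p, d in zip(points, distances) if d < radius]
    outside = [p for p, d in zip(points, distances) if d >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def range_search(node, query, r, dist, out):
    """Append to `out` every indexed point within distance r of `query`."""
    if node is None:
        return out
    d = dist(query, node.point)
    if d <= r:
        out.append(node.point)
    # Triangle inequality: any answer p satisfies d - r <= dist(vantage, p) <= d + r.
    if d - r < node.radius:
        range_search(node.inside, query, r, dist, out)
    if d + r >= node.radius:
        range_search(node.outside, query, r, dist, out)
    return out


rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_vp_tree(data, math.dist, random.Random(0))
query = (0.5, 0.5)
hits = range_search(tree, query, 0.1, math.dist, [])
brute = [p for p in data if math.dist(query, p) <= 0.1]  # sequential scan
print(sorted(hits) == sorted(brute))  # True
```

Whether the pruning beats a sequential scan depends on how clustered the data are, which is exactly the point made by Mao et al. (2005) about intrinsic dimensionality.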
Despite the extensive research performed in spatial/spatio-temporal databases, storing a large database of (possibly evolving) images or regional data sets, and being able to efficiently answer queries between these data and other sorts of spatial/spatio-temporal data, or queries involving the notion of time, is still a big research challenge. An example is efficiently answering queries such as: find the boats (moving points) that were inside the storm (changing regional data) during this morning (a time interval).
Conclusion
In this paper, we have reviewed techniques related to indexing an image database as a means of efficiently processing several kinds of queries, such as retrieval based on features, content or structure, processing of joins, and queries by example. Although research in this scientific area has been going on for several years, there are still significant research challenges. Image databases and their indexing structures bring together two different disciplines (databases and image processing), and developing a true Image Database System requires interdisciplinary research efforts. Moreover, the semantic gap persists, and querying between images and other kinds of spatial data has not yet attracted enough attention.
References
Ahmad, I., & Grosky, W. I. (2003). Indexing and
retrieval of images by spatial constraints. Journal
of Visual Communication and Image Representa-
tion, 14(3), 291-320.
Annamalai, M., Chopra, R., DeFazio, S., & Mavris, S. (2000). Indexing images in Oracle8i. In Proc. SIGMOD’00, 539-547.
Chakrabarti, K., Ortega-Binderberger, M., Por-
kaew, K., Zuo, P., & Mehrotra, S. (2000). Similar
Shape Retrieval in MARS. In Proc. IEEE Int. Conf.
on Multimedia and Expo (II), 709-712.
Chang, S. K., Shi, Q. Y., & Yan, C. W. (1987).
Iconic indexing by 2-d strings. IEEE Trans. Pat-
tern Anal. Machine Intell., 9, 413-427.
Corral, A., Vassilakopoulos, M., & Manolopoulos,
Y. (1999). Algorithms for Joining R-trees and
Linear Region Quadtrees. In Proc. of SSD’99,
LNCS 1651, 251-269. Springer-Verlag.
De Natale, F. G. B., & Granelli, F. (2001). Struc-
tured-Based Image Retrieval Using a Structured
Color Descriptor. In Proc. Int. Workshop on
Content-Based Multimedia Indexing (CBMI’01),
109-115.
Flickner, M., Sawhney, H., Ashley, J., Huang, Q.,
Dom, B., Gorkani, M., Hafner, J., Lee, D., Pet-
kovic, D., Steele, D., & Yanker, P. (1995). Query
by Image and Video Content: The QBIC System.
IEEE Computer, 28(9), 23-32.
Gaede, V., & Günther, O. (1998). Multidimensional
Access Methods. ACM Computing Surveys, 30(2),
170-231.
Jomier, G., Manouvrier, M., & Rukoz, M. (2000).
Storage and Management of Similar Images.
Journal of the Brazilian Computer Society (JBCS),
3(6), 13-26.
Lin, K.-I., Jagadish, H. V., & Faloutsos, C. (1994).
The TV-tree - an index structure for high-dimen-
sional data. VLDB Journal, 3, 517-542.
Lin, S., Özsu, M. T., Oria, V., & Ng, R.
(2001). An Extendible Hash for Multi-Precision
Similarity Querying of Image Databases. In Proc.
of VLDB’2001, 221-230.
Malki, J., Boujemaa, N., Nastar, C., & Winter, A.
(1999). Region Queries without Segmentation for
Image Retrieval by Content. In Proc. of 3rd Int.
Conf. on Visual Information Systems (Visual’99),
115-122.
Manolopoulos, Y., Theodoridis, Y., & Tsotras,
V. (2000). Image and Multimedia indexing. In
Advanced Database Indexing, Kluwer Publish-
ers, 167-184.
Manouvrier, M., Rukoz, M., & Jomier, G. (2005)
Quadtree-Based Image Representation and Re-
trieval. In Manolopoulos Y., Papadopoulos A. &
Vassilakopoulos M. (Eds.). Spatial Databases:
Technologies, Techniques and Trends. Idea Group
Publishing, Information Science Publishing and
IRM Press, 81-106.
Mao, R., Iqbal, Q., Liu, W., & Miranker, D. (2005).
Case study: Distance-Based Image Retrieval in
the MoBIoS DBMS. In Proc. of the 5th Int. Conf.
on Computer and Information Technology (CIT-
2005), pp. 49-57.
Mojsilovic, A., Gomes, J., & Rogowitz, B. (2004).
Semantic-Friendly Indexing and Querying of Images
Based on the Extraction of the Objective
Semantic Cues. In Special Issue on Content-Based
Image Retrieval of IJCV (56), No. 1-2, 79-107.
Papadias, D., Mamoulis, N., & Delis, V. (1998).
Algorithms for Querying by Spatial Structure. In
Proc. VLDB’98, 546-557.
Petrakis, E., & Orphanoudakis, S. (1993). Meth-
odology for the representation, indexing and
retrieval of images by content. Image Vision
Comput. 11(8), 504-521.
Petrakis, E., & Orphanoudakis, S. (1996). A
Generalized Approach for Image Indexing and
Retrieval Based on 2-D String. In Jungert C.
& Tortora G. (eds.). Intelligent Image Database
Systems (Important Contributions in the Field of
Spatial Projections), World Scientific Publishing
Co., 197-218.
Petrakis, E. (2002). Fast Retrieval by Spatial
Structure in Image Databases. J. Vis. Lang. Comput.,
13(5), 545-569.
Petrakis, E., & Faloutsos, C. (1997). Similarity
Searching in Medical Image Databases. IEEE
Trans. Knowl. Data Eng. 9(3), 435-447
Petrakis, E., Faloutsos, C., & Lin, K-I (2002).
ImageMap: An Image Indexing Method Based
on Spatial Similarity. IEEE Trans. Knowl. Data
Eng., 14(5), 979-987
Price, K. (2006). Annotated Computer Vision
Bibliography. />bibliography/contents.html
Rui, Y., Huang, T. S., & Chang, S.-F. (1999).
Image retrieval: Current techniques, promising
directions, and open issues. Journal of Visual
Communication and Image Representation,
10(1), 39-62.
Samet, H. (1990). The Design and Analysis of
Spatial Data Structures. Addison Wesley.
Seidl, T., & Kriegel, H.-P. (2001). Adaptable Similarity
Search in Large Image Databases. In Veltkamp, R.,
Burkhardt, H., & Kriegel, H.-P. (Eds.), State-of-the-Art
in Content-Based Image and Video Retrieval,
Kluwer Publishers, 297-317.
Smeulders, A., Worring, M., Santini, S., Gupta, A.,
& Jain, R. (2000). Content-Based Image Retrieval
at the End of the Early Years. IEEE Trans. Pattern
Anal. Mach. Intell. 22(12), 1349-1380.
Tzouramanis, T., Vassilakopoulos, M., & Manolopoulos,
Y. (2004). Benchmarking access methods
for time-evolving regional data. Data & Knowl.
Eng., 49(3), 243-286.
Vassilakopoulos, M., & Manolopoulos, Y. (1995).
Dynamic Inverted Quadtree - a Structure for
Pictorial Databases. Information Systems. Spe-
cial Issue on Multimedia Information Systems,
20(6), 483-500.
Veltkamp, R. C., & Tanase, M. (2001). Content-based
image retrieval systems: A survey.
http://www.aa-lab.cs.uu.nl/cbirsurvey/.
Zhao, R., & Grosky, W. I. (2002). Bridging the
semantic gap in image retrieval. In Distributed
multimedia databases: techniques & applications,
Idea Group Publishing, 14-36.
Key Terms
Access Method or Index Structure: A
technique of organizing data that allows the
efcient retrieval of data according to a set of
search criteria.
Color Features of an Image: Characteristics
of an image related to the presence of color in-
formation, like distribution of colors, dominant
colors, or color moments.
Content-Based Image Retrieval: Searching
for images in image databases according to their
visual contents, like searching for images with
specic color, texture, or shape properties, for
images containing specic objects, or containing
objects in a specied arrangement.
Image Database: An organized collection
of digital images aimed at the efficient management
and the processing of queries on this image
collection.
Query Processing: Extracting information
from a large amount of data without actually
changing the underlying database where the data
are organized.
Semantic Features of an Image: The contents
of an image according to human perception, like
the objects present in the image or the concepts
/ situations related to the image.
Shape Features of an Image: The physical
structure(s) of the objects, or the geometric shapes
present in the image.
Similarity of Images: The degree of likeness
between images according to a number of features,
like color, texture, shape, and semantic features.
Structural Features of an Image: The ar-
rangement of the objects depicted in the image.
Texture Features of an Image: The pattern(s)
of the image’s surface change, usually expressed
by a combination of characteristics like coarse-
ness, contrast, directionality, uniformity, regular-
ity, density, and frequency.
Chapter IV
Different Roles and Definitions
of Spatial Data Fusion
Patrik Skogster
Rovaniemi University of Applied Sciences, Finland

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
Geographic information is created by manipulating geographic (or spatial) data (generally known by
the abbreviation geodata) in a computerized system. Geo-spatial information and geomatics are issues
of modern business and research. It is essential to provide their different definitions and roles in order
to get an overall picture of the issue. This article discusses the problem of definitions, as well as
the technologies and challenges within spatial data fusion.
Introduction
Due to the rapid advances in database systems
and information technology over the last decade,
researchers in information systems, decision science,
artificial intelligence (AI), machine learning,
and data mining communities are facing a new
challenge: discovering and deriving useful and
actionable knowledge from massive data sets.
During the last decade, many researchers have
also studied how to exploit the synergy in information
from multiple sources. This phenomenon
includes terminology such as spatial data fusion,
information fusion, knowledge (and/or belief)
fusion, and many more.
Terminology
Geospatial data has many denitions, but one point
of view is that it is data consisting of geographical
information, geostatistics and geotextual infor-
mation. This theme was handled already in the
mid 80’s by Crist & Cicone (1984a). According
to Crist and Cicone (1984a), geostatistics are data

that is related to a national or subnational unit
and can be georeferenced. Geotextual data are
dened as text databases (like treaty databases)
that are linked to some geographic entity. Crist
and Cicone also (1984b) argue that data fusion is
not just overlaying maps.
Information fusion is a term dened as “a for-
mal framework in which are expressed the means
and tools for the alliance of data originating from
different sources” (Wald 1999). Wald continues
(2000) that spatial data fusion is therefore “the
formal framework that expresses the means and
tools for the alliance of data originating from dif-
ferent sources”. It must be remembered, though,
that every denition always reects the current
subject. Wald´s (2000) focus is mainly on the
prominent vision of remote sensing data, where
discussion is about pixel fusion, image fusion,
sensor fusion and measurement fusion.
The term “information fusion” can, in other
words, be used when different information and
data are used to solve problems. Locational data
can be added to this information fusion context
and the result is spatial data fusion. As Kim
(2005) describes, “information fusion can be
implemented at two different levels: raw data
and intermediate data”. Information fusion at the
raw data level basically means taking advantage of the
synergy from considering multiple instances of the same
pattern (e.g., considering two temporal series
based on different measurement systems), while
information fusion at the intermediate data level takes
the synergy from utilizing multiple patterns (e.g.,
utilizing both temporal and spatial patterns) (Hall
& Llinas 2001). Information fusion at the raw
data level becomes important, for example, when
two different measurements have recorded the
same activities or events (e.g., flood level or market
share) at the same location on a regular basis
(Vanderhaegen & Muro 2005; Pereira 2002).
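The raw-data-level case can be illustrated with a short sketch. Assume two gauges record the same flood level at the same site, and each gauge has a known measurement variance. The chapter does not prescribe a fusion method, so inverse-variance weighting is swapped in here as one common way to combine such readings. The readings, variances and the helper names `Reading` and `fuse_inverse_variance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    value: float     # measured flood level (hypothetical units: metres)
    variance: float  # assumed known measurement variance of the sensor


def fuse_inverse_variance(readings: list[Reading]) -> Reading:
    """Fuse simultaneous readings of the same quantity at the same location.

    Each reading is weighted by 1/variance. The fused variance is
    1 / sum(1/variance), which is never larger than the best input variance.
    """
    if not readings:
        raise ValueError("need at least one reading")
    weights = [1.0 / r.variance for r in readings]
    total = sum(weights)
    value = sum(w * r.value for w, r in zip(weights, readings)) / total
    return Reading(value=value, variance=1.0 / total)


# Two hypothetical gauges measuring the same river level at the same time.
gauge_a = Reading(value=4.20, variance=0.04)
gauge_b = Reading(value=4.50, variance=0.01)
fused = fuse_inverse_variance([gauge_a, gauge_b])
print(f"{fused.value:.2f} m, variance {fused.variance:.3f}")
```

The more precise gauge dominates the result, and the fused variance is smaller than either input variance. This reduced variance is the "synergy" gained from considering two series of the same pattern.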
Knowledge fusion is the process by which het-
erogeneous information from multiple sources is
merged to create knowledge that is more complete,
less uncertain, and less conicting than the input.
Knowledge fusion can be seen as a process that
creates knowledge. Knowledge fusion can also
involve annotating the output information with
meta-level information about the provenance of
the information used and the mode of aggregation
(Hunter and Liu 2005; Hunter and Summerton
2004).
Spatial data fusion is a combination of the
above-mentioned forms of fusion with the dimension
of spatiality. It is by definition an enormous and complex
field, comprising issues ranging from registration
and pixel-level fusion of data for improving
spatial resolution to managerial decision-level
fusion using previously computed information
stored in geographic information systems
(Malhotra 1998).

Technologies within Spatial Data Fusion
It has been estimated that up to 80% of all data
stored in corporate databases may have a spatial
component (Franklin 1992). To support analytical
processes, today’s organizations deploy data ware-
houses and client tools such as OLAP (On-Line
Analytical Processing) to access, visualize, and
analyze integrated, aggregated and summarized
data. The term “multidimensional” was estab-
lished in the mid-1980s by computer scientists
who were involved in the extraction of meaningful
information from very large statistical databases
(Rafanelli 2003).
Since a large part of this data has a spatial
component, better client tools are required to
take full advantage of the geometry of the spatial
phenomena or objects being analyzed. In this
regard, Spatial OLAP (SOLAP) technology can
be one solution (Rivest et al. 2005). A SOLAP
tool can be dened as “a type of software that
allows rapid and easy navigation within spatial
databases and that offers many levels of informa-
tion granularity, many themes, many epochs and
many display modes synchronized or not: maps,
tables and diagrams” (Bedard et al. 2005).
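The idea of navigating "many levels of information granularity" can be made concrete with a roll-up along a spatial dimension hierarchy (municipality, region, nation). This is the kind of operation a SOLAP tool might perform when a user zooms out on a map. The sketch below is only an illustration: the hierarchy, place names, population figures and the `roll_up` function are hypothetical. A real SOLAP server would also aggregate the geometries, not just the measures.

```python
from collections import defaultdict

# Hypothetical spatial dimension: each municipality maps to its parent
# region, and each region maps to its parent nation.
MUNICIPALITY_TO_REGION = {"Alpha": "North", "Beta": "North", "Gamma": "South"}
REGION_TO_NATION = {"North": "Examplia", "South": "Examplia"}

# Hypothetical fact table: (municipality, year, population).
FACTS = [
    ("Alpha", 2005, 1200),
    ("Beta", 2005, 800),
    ("Gamma", 2005, 1500),
    ("Alpha", 2006, 1250),
]


def roll_up(facts, level: str) -> dict:
    """Sum the population measure per (member at `level`, year)."""
    totals = defaultdict(int)
    for municipality, year, population in facts:
        if level == "municipality":
            member = municipality
        elif level == "region":
            member = MUNICIPALITY_TO_REGION[municipality]
        elif level == "nation":
            member = REGION_TO_NATION[MUNICIPALITY_TO_REGION[municipality]]
        else:
            raise ValueError(f"unknown level: {level}")
        totals[(member, year)] += population
    return dict(totals)


print(roll_up(FACTS, "region"))
print(roll_up(FACTS, "nation"))
```

Drilling down is the reverse navigation: the user returns to a finer level of the same hierarchy.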
As an alternative to traditional statistical
regression models, new algorithms have been
developed and presented by the machine learning,
artificial intelligence, and data mining
communities (Kim 2005). These algorithms
include decision tree algorithms (Quinlan 1993),
support vector machines (SVM) (Burges 1998),
and genetic algorithms (Goldberg 1989). In
particular, many algorithms based on artificial
neural networks (ANNs) and their variants have
been shown to be very successful in predicting,
classifying, and describing temporally correlated
data (Giles et al. 2001). ANNs can also be applied
to various business applications (Christakos et al. 2002)
such as analytical review procedures in auditing
(Koskivaara 2004), stock market predictions (Saad
et al. 1998), market segmentation (Hruschka &
Natter 1999) and Web usage profiling (Anandarajan
2002). It is the universal approximation
property of ANNs that makes them one of the
most popular algorithms to analyze temporally
correlated stochastic processes. The universal
approximation property implies that, given a
sufficiently large number of hidden nodes, multi-layer
neural networks can approximate any continuous function
arbitrarily closely (Hornik et al. 1989). Commonly,
multi-layer perceptrons with sigmoidal and radial
basis functions have been used as alternatives to
the linear stochastic autoregressive AR(p) model.
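The linear AR(p) model and a multi-layer perceptron can be contrasted side by side. The sketch below predicts the next value of a series from its last p values in two ways. The first is a linear AR(p) combination. The second passes the same lags through one hidden layer of sigmoidal units, the kind of network the universal approximation result refers to. The coefficients are hypothetical and hand-picked rather than trained, and the function names are illustrative.

```python
import math


def ar_predict(history, phi, c=0.0):
    """One-step AR(p) forecast: c + phi[0]*x[t-1] + ... + phi[p-1]*x[t-p]."""
    p = len(phi)
    if len(history) < p:
        raise ValueError("history shorter than model order p")
    lags = history[::-1][:p]  # most recent value first: x[t-1], x[t-2], ...
    return c + sum(coef * x for coef, x in zip(phi, lags))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_predict(history, hidden_weights, hidden_biases, output_weights, output_bias=0.0):
    """One-step forecast from a single hidden layer of sigmoidal units.

    hidden_weights[k] holds the weights of hidden unit k over the last p lags;
    the output is a linear combination of the hidden activations.
    """
    p = len(hidden_weights[0])
    if len(history) < p:
        raise ValueError("history shorter than number of lags")
    lags = history[::-1][:p]
    hidden = [
        sigmoid(b + sum(w * x for w, x in zip(ws, lags)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))


series = [1.0, 2.0, 3.0]
print(ar_predict(series, phi=[0.5, 0.25]))  # 0.5*3 + 0.25*2 = 2.0
```

Because each hidden unit applies a nonlinear squashing function, adding more units lets the network represent nonlinear dependencies on past values that no choice of AR coefficients can capture.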
Another type of neural network, time-delay
neural networks (TDNNs), has been used to approximate
a stochastic process. In TDNNs, input
patterns arrive at hidden nodes at different times
through delayed connections and thus can influence
subsequent inputs. A different type of neural
network, recurrent neural networks (RNNs), has
also been proposed to model temporally correlated
data sets. “Jordan” (Jordan 1986) and “Elman”
networks (Elman 1990) are two representative
examples of RNNs. Both networks employ
feedback connections to enhance the limited
representational power of networks due to a nar-
row time window. For example, Jordan networks
have a feedback loop from the output layer to an
additional input called the context layer. However,
both types of networks are still restricted in the
sense that they cannot deal with an arbitrarily
long time window (Dorffner 1996).
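The two recurrent architectures differ only in where the context comes from. The minimal scalar sketch below has one input, one hidden unit and one output, with hypothetical weights. In the Elman variant the context is the previous hidden activation; in the Jordan variant it is the previous output. The sketch is simplified: it omits refinements such as self-connections on the context units, and the function name `run_recurrent` is illustrative.

```python
import math


def run_recurrent(inputs, w_in, w_ctx, w_out, feedback="elman"):
    """Run a one-unit recurrent net over a sequence and return its outputs.

    feedback="elman":  context = previous hidden activation
    feedback="jordan": context = previous output
    """
    if feedback not in ("elman", "jordan"):
        raise ValueError("feedback must be 'elman' or 'jordan'")
    context = 0.0  # the context layer starts empty
    outputs = []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_ctx * context)
        y = w_out * hidden
        outputs.append(y)
        # Feed back either the hidden state (Elman) or the output (Jordan).
        context = hidden if feedback == "elman" else y
    return outputs


seq = [1.0, 0.0, 0.0]
print(run_recurrent(seq, w_in=1.0, w_ctx=0.5, w_out=2.0, feedback="elman"))
print(run_recurrent(seq, w_in=1.0, w_ctx=0.5, w_out=2.0, feedback="jordan"))
```

The input is zero after the first step, yet both networks keep producing non-zero outputs because the first input persists through the context. That persisting trace fades over time, which illustrates why such networks still struggle with arbitrarily long time windows.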
Since transactional systems are not designed
to support decision-making processes, new types
of systems have been developed to perform data
fusion, such as those developed by Fischer and
Ostwald (2001). The solutions are technically
called Analytical Systems and are known on the
market as Business Intelligence (BI) solutions.
Rivest et al. (2005) explain that these systems,
in which the data warehouse is usually a central
component, are optimized to facilitate complex
analysis and to improve the performance of
database queries involving thousands or more
occurrences. According to Meeks and Dasgupta
(2004), “in the short-term, the models need refinement,
primarily through applying statistical
confidence to the scoring functions”. Popular web
search engines can also be seen as BI solutions.
They use several different evaluation schemes
such as keyword proximity, keyword density, and
synonym matching, among others, to estimate the
quality of links and files returned from Internet
text searches. Analyses made by their servers
combine different search terms with relevant
available data; in other words, search engines perform
data fusion.
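The fusion performed by search engines can be sketched as a weighted combination of separate evaluation schemes. The scoring functions below are hypothetical simplifications, since real engines do not publish their formulas:

- keyword density is the share of document tokens that are query terms;
- keyword proximity is the inverse width of the smallest token window containing every query term.

The equal weights are likewise an assumption.

```python
def keyword_density(doc_terms, query_terms):
    """Share of document tokens that are query terms."""
    if not doc_terms:
        return 0.0
    hits = sum(1 for term in doc_terms if term in query_terms)
    return hits / len(doc_terms)


def keyword_proximity(doc_terms, query_terms):
    """1 / width of the shortest token window holding every query term (0 if none)."""
    needed = set(query_terms)
    if not needed:
        return 0.0
    best = None
    for start in range(len(doc_terms)):
        seen = set()
        for end in range(start, len(doc_terms)):
            if doc_terms[end] in needed:
                seen.add(doc_terms[end])
            if seen == needed:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break
    return 0.0 if best is None else 1.0 / best


def fused_score(doc, query, weights=(0.5, 0.5)):
    """Weighted linear fusion of the two evidence sources (hypothetical weights)."""
    doc_terms = doc.lower().split()
    query_terms = set(query.lower().split())
    w_density, w_proximity = weights
    return (w_density * keyword_density(doc_terms, query_terms)
            + w_proximity * keyword_proximity(doc_terms, query_terms))


docs = ["flood map of the river basin", "river levels rose before the flood warning"]
ranked = sorted(docs, key=lambda d: fused_score(d, "river flood"), reverse=True)
print(ranked[0])
```

A linear combination is only the simplest fusion scheme. Changing the weights changes the ranking, which is one place where the "statistical confidence" mentioned by Meeks and Dasgupta could enter.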
Extensible Markup Language (XML) is
applied to control all definitions of discovered
patterns and rules to ensure the consistency of
the proposed knowledge map. The advantage
of XML is that it represents a compromise between
flexibility, simplicity, and readability by
both humans and machines (World Wide Web
Consortium (W3C), 2000). XML is therefore rapidly
becoming an information-exchange standard for
integrating data among various Internet-based
applications (Bertino and Ferrari 2001). It should,
though, be noted that numerous other data
fusion standards also exist, especially in the
web-browsing context.
Fusion rule technology is a logic-based approach
to knowledge fusion. A set of fusion rules
is a way of specifying how to merge structured
reports. Structured reports are XML documents
where the data entries are restricted to individual
words or simple phrases, such as names
and domain-specific terminology, numbers and
units. Different sets of fusion rules, with different
merging criteria, can be used to analyze
a set of structured reports by examining
the results of merging. More information can
be found in Hunter & Liu (2005) and Hunter &
Summerton (2004).
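A rough illustration of merging structured reports: the sketch below parses two XML weather reports with Python's standard `xml.etree.ElementTree` and applies a few hand-written rules. The element names, the data and the rules themselves are hypothetical. Hunter and Liu express fusion rules in a logical language, which this sketch does not reproduce. Following the knowledge-fusion idea described earlier, the merged report is annotated with provenance.

```python
import xml.etree.ElementTree as ET

REPORT_A = """<report><city>Oulu</city><date>2006-04-25</date>
<rain unit="mm">12</rain><outlook>showers</outlook><source>Station A</source></report>"""
REPORT_B = """<report><city>Oulu</city><date>2006-04-25</date>
<rain unit="mm">15</rain><outlook>rain</outlook><source>Station B</source></report>"""


def merge_reports(xml_a: str, xml_b: str):
    """Merge two structured reports using hypothetical fusion rules.

    Rule 1: only merge if city and date agree (otherwise return None).
    Rule 2: for rainfall, keep the larger value (a cautious choice);
            both reports are assumed to use millimetres.
    Rule 3: for the outlook, keep agreeing values, else mark the conflict.
    Rule 4: record both sources as provenance metadata.
    """
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    if a.findtext("city") != b.findtext("city") or a.findtext("date") != b.findtext("date"):
        return None
    merged = ET.Element("report")
    ET.SubElement(merged, "city").text = a.findtext("city")
    ET.SubElement(merged, "date").text = a.findtext("date")
    rain = max(float(a.findtext("rain")), float(b.findtext("rain")))
    ET.SubElement(merged, "rain", unit="mm").text = f"{rain:g}"
    out_a, out_b = a.findtext("outlook"), b.findtext("outlook")
    outlook = ET.SubElement(merged, "outlook")
    if out_a == out_b:
        outlook.text = out_a
    else:
        outlook.set("conflict", "true")
        outlook.text = f"{out_a} | {out_b}"
    provenance = ET.SubElement(merged, "provenance")
    for src in (a.findtext("source"), b.findtext("source")):
        ET.SubElement(provenance, "source").text = src
    return merged


result = merge_reports(REPORT_A, REPORT_B)
print(ET.tostring(result, encoding="unicode"))
```

Swapping Rule 2 for "take the minimum" or "take the mean" is one way to compare different sets of fusion rules on the same reports by inspecting the merged output.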
Asynchronous JavaScript and XML (AJAX)
is used to build dynamic web pages on the
client side. Data is read from the server or sent
to the server by JavaScript requests. However,
some processing at the server side is required to
handle requests, e.g., finding and storing the data.
This is accomplished more easily with the use of a
framework dedicated to processing Ajax requests. In
the article that coined the term “AJAX”, Garrett
(2005) describes the technology as “an intermediary
between the user and the server.” This Ajax
engine is intended to eliminate the so-called waiting
time for the user when the page attempts to access
the server. The goal of the framework is to
provide this Ajax engine and associated server-side
and client-side functions.
Challenges in Spatial Data Fusion
The use of various spatio-temporal data and
information usually greatly improves decision-making
in many different fields (Christakos et
al. 2003). Examples can be found in Meeks and
Dasgupta (2004). When using spatial and temporal
information to improve decision making, attention
must be paid to uncertainty and sensitivity
issues (Crosetto & Tarantola 2001).
Because the spatial data fusion process is by
origin a process of data assimilation, the
challenges it faces are largely related
to the data handling process. These include the
ability to accept higher data rates and volumes,
improved analysis performance and improved
multiple operations. Data integration processes,
synchronous sampling and common measurement
standards are being developed to optimize data
fusion performance. This involves increasing the
efficiency of both data management and data collection.
Fusion processes are not yet robust enough.
They must be capable of accepting wider ranges
of data types, accommodating natural language
extracted text reports, historical data and vari-
ous spatial information (maps, charts, images).
Therefore, the processes must have learning abili-
ties. Fusion processes must develop adaptive and
learning properties, using the operating process
to add to the knowledge base while exhibiting a
sensitivity to “unexpected” or “too good to be
true” situations that may indicate countermeasures
or deception activities.

The performance of processes needs to in-
crease exponentially. When the amount of pro-
cessed data increases and analyses become more
complicated, efcient, linked data structures are
required to handle the wide variety of data. Data
volumes for global and even regional spatial da-
tabases will be measured in millions of terabits,
and short access times are demanded for even
broad searches.
The future of spatial data fusion is highly
uncertain. Nevertheless, merging
spatial data through the use of WMS (Web Map
Services) or GeoRSS (Really Simple Syndication)
seems to be becoming increasingly common practice,
as are strategic decisions made in a spatial data
infrastructure (SDI) context.
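Merging spatial data through WMS typically starts with a GetMap request. The sketch below builds one from the key-value parameters that the OGC WMS 1.3.0 specification defines. The server URL and layer names are hypothetical placeholders, and `getmap_url` is an illustrative helper, not part of any library. In WMS 1.3.0 the BBOX axis order follows the CRS, so for EPSG:4326 latitude comes first.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height,
               crs="EPSG:4326", fmt="image/png", styles=None):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_axis1, min_axis2, max_axis1, max_axis2) in the axis order of
    the CRS; for EPSG:4326 in WMS 1.3.0 that is (min_lat, min_lon, max_lat, max_lon).
    Several layers from one server are overlaid via a comma-separated LAYERS list.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": ",".join(styles) if styles else "",  # empty = default styles
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


url = getmap_url("https://example.org/wms", ["rivers", "flood_zones"],
                 (60.0, 20.0, 70.0, 30.0), 800, 600)
print(url)
```

The response is a rendered image, not the underlying features. Layers from different WMS servers can therefore be combined visually by stacking images that share the same CRS, BBOX and size, but the data themselves are not merged.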
The collection of data and its availability can
also be seen as a strategic matter. Roberts et
al. (2005) highlight the importance of making
sense of networks that comprise many nodes and
are animated by flows of resources and knowledge.
The transfer of managerial practices and
knowledge is essential to the functionality of
these networks and resources. A survey made
by Vanderhaegen & Muro (2005) reveals that
almost all of the organisations (90%) making
use of spatial data “experience problems with the
availability, quality and use of spatial data”. In
general, the organisations using the widest range
of data types experienced the greatest difficulties
in using the data.
The quality of the spatial data is still only
one of the many factors that must be taken into
consideration within spatial data fusion. Clearly,
the results of any spatial data fusion are only as
good as the data on which they are based (Johnson et
al. 2001). One approach to improve data quality
is the imposition of constraints upon data entered
into the database (Cockcroft 1997). The proposal
is that “better decisions can be made by account-
ing risks due to errors in spatial data” (Van Oort
& Bregt 2005).
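Constraint-based quality control can be pictured as a validation step that rejects geometries before they are written to the database. The checks below are a hypothetical selection for illustration, not Cockcroft's (1997) taxonomy:

- coordinate ranges;
- a closed ring with at least four points and three distinct vertices;
- non-zero area, via the shoelace formula.

The in-memory list stands in for a database table, and the function names are illustrative.

```python
def polygon_errors(ring):
    """Return a list of integrity violations for a polygon ring of (lon, lat) pairs."""
    errors = []
    for lon, lat in ring:
        if not (-180.0 <= lon <= 180.0):
            errors.append(f"longitude out of range: {lon}")
        if not (-90.0 <= lat <= 90.0):
            errors.append(f"latitude out of range: {lat}")
    if len(ring) < 4 or ring[0] != ring[-1]:
        errors.append("ring must be closed (first == last) with at least 4 points")
    if len(set(ring)) < 3:
        errors.append("ring needs at least 3 distinct vertices")
    # Shoelace formula over consecutive vertex pairs: twice the signed area.
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    if twice_area == 0:
        errors.append("ring has zero area")
    return errors


def insert_if_valid(table, feature_id, ring):
    """Append the feature to `table` only if it passes every constraint."""
    errors = polygon_errors(ring)
    if errors:
        raise ValueError(f"{feature_id}: " + "; ".join(errors))
    table.append((feature_id, ring))


parcels = []  # stand-in for a database table
insert_if_valid(parcels, "parcel-1",
                [(24.0, 60.0), (25.0, 60.0), (25.0, 61.0), (24.0, 60.0)])
```

Rejecting a bad geometry at entry time is cheaper than discovering it later, when it may already have propagated into fused results.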
Conclusion
A great amount of data located in various databases
has a spatial component. New innovative
applications can be produced by assimilating this
information with other data. This paper introduced
the terminology and technology associated with
spatial data fusion. Data fusion is the process of
detection, association, correlation, and combination
of data and information from multiple sources. To
lead the reader further, the material in the list of
references is suggested; it allows
one to go beyond traditional transactional
data fusion capabilities.
References
Anandarajan, M. (2002). Proling Web Usage
in the Workplace: A Behavior-based Articial

Intelligence Approach. Journal of Management
Information Systems. 19(1), 243-266
Bertino, E., & Ferrari, E. (2001). XML and data
integration. IEEE Internet Computing, 5(6), 75-
76.
Burges, C. J. C. (1998). A Tutorial on Support
Vector Machines. Data Mining and Knowledge
Discovery, 2(2), 1-27.
Christakos, G., Bogaert, P., & Serre, M. (2002).
Temporal GIS: Advanced Functions for Field-Based
Applications. Berlin: Springer.
Crist, E. P., & Cicone, R. C. (1984a). Comparisons
of the dimensionality and features of simulated
Landsat-4 MSS and TM data. Remote Sensing of
Environment, 14(1-3), 235-246
Crist, E. P., & Cicone, R. C. (1984b). A physi-
cally-based transformation of Thematic Mapper
data-the TM tasseled cap. IEEE Transactions
on Geoscience and Remote Sensing, 22(3), 256-
263.
Cockcroft, S. (1997). A Taxonomy of Spatial
Data Integrity Constraints. GeoInformatica 1(4),
327-343
Crosetto, M., & Tarantola, S. (2001). Uncertainty
and sensitivity analysis: Tools for GIS-based
model implementation. International Journal
of Geographical Information Science 15(5),
415–437.
Dorffner, G. (1996). Neural Networks for Time
Series Processing. Neural Network World, 6(4),
447-468.
Elman, J. L. (1990). Finding Structure in Time.
Cognitive Science, 14(2), 179-212.
Fischer, G., & Ostwald, J. (2001). Knowledge
management: problems, promises, realities, and
challenges. IEEE Intelligent Systems, 16(1), 60-
72.
Franklin, C. (1992). An introduction to geographic
information systems: Linking maps to databases.
Database 15(2), 13–21.
Garrett, J. J. (2005). Ajax: A New Approach to
Web Applications. ptivepath.
com/publications/essays/archives/000385.php,
visited 25.4.2006.
Giles, C. L., Lawrence, S., & Tsoi, A. C. (2001).
Noisy Time Series Prediction using a Recurrent
Neural Network and Grammatical Inference.
Machine Learning, 44(1/2), 161-183.
Goldberg, D. E. (1989). Genetic Algorithms in
Search, Optimization and Machine Learning.
New York: Addison-Wesley.
Hall, D. L., & Llinas, J. (2001). Handbook on Mul-
tisensor Data Fusion. Boca Raton: CRC Press.
Hornik, K., Stinchcombe, M., & White, H.
(1989). Multi-layer Feedforward Networks are
Universal Approximators. Neural Networks,
2(5), 359-366.
Hruschka, H., & Natter, M. (1999). Comparing
Performance of Feedforward Neural Nets and
K-means for Market Segmentation. European
Journal of Operational Research, 114(2), 346-353.
Hunter, A., & Liu, W. (2005). Fusion rules for
merging uncertain information. Information
Fusion, 7, 97-134.
Hunter, A., & Summerton, R. (2004). Fusion rules
for context-dependent aggregation of structured
news reports. Journal of Applied Non-classical
Logic, 14(3), 329-366.
Johnson, R. G. (2001). In: United States Imagery
and Geospatial Information Service Geospatial
Transition Plan. Bethesda, MD: National Imagery
and Mapping Agency.
Jordan, M. I. (1986). Serial Order: A Parallel
Distributed Processing Approach. Technical Re-
port ICS 8604. San Diego: Institute for Cognitive
Sciences, University of California.
Kim, Y. (2005). Information fusion via a hier-
archical neural network model. The Journal of
Computer Information Systems, 45(4), 1-14.
Koskivaara, E. (2004). Articial neural networks
for analytical review in auditing. Publications
of the Turku School of Economics and Business
Administration A-7.
Malhotra, Y. (1998). Deciphering the knowledge
management hype. Journal for Quality & Par-
ticipation, 2(14), 58-60.
Meeks, W. L., & Dasgupta, S. (2004). Geospatial
information utility: an estimation of the relevance
of geospatial information to users. Decision
Support Systems, 38(1), 47-63.
Van Oort, P. A. J., & Bregt, A. K. (2005). Do Users
Ignore Spatial Data Quality? A Decision-Theoretic
Perspective. Risk Analysis, 25(6), 1599-1610.
Pereira, G. M. (2002). A typology of spatial and
temporal scale relations. Geographical Analysis,
34(1), 21–33.
Quinlan, J. R. (1993). Programs for Machine
Learning. San Mateo, CA: Morgan Kaufmann.
Rafanelli, M. (2003). Multidimensional Data-
bases: Problems and Solutions. London: Idea
Group Publishing.
Rivest, S., Bedard, Y., Proulx, M-L., Nadeau, M.,
Hubert, F., & Pastor, J. (2005). SOLAP technology:
Merging business intelligence with geospatial
technology for interactive spatio-temporal
exploration and analysis of data. ISPRS Journal
of Photogrammetry and Remote Sensing, 60(1),
17-33.
Roberts, S. M., Jones, J. P., & Frohling, O. (2005).
NGOs and the globalization of managerialism: A
research framework, 33(11), 1845-1864.
Saad, E., Prokhorov, D., & Wunsch, D. (1998).
Comparative Study of Stock Trend Prediction
Using Time Delay, Recurrent and Probabilistic
Neural Networks. IEEE Transactions on Neural
Networks, 9(6), 1456-1470.

Vanderhaegen, M., & Muro, E. (2005). Contri-
bution of a European spatial data infrastructure
to the effectiveness of EIA and SEA studies.
Environmental Impact Assessment Review, 25(2),
123-142.
Wald, L. (2000). A Conceptual Approach To The
Fusion Of Earth Observation Data. Surveys in
Geophysics, 2(2-3), 177-186.
Wald, L. (1999). Some Terms of Reference in Data
Fusion. IEEE Transactions on Geosciences and
Remote Sensing, 37(3), 1190-1193.
World Wide Web Consortium (2000), Extensible
Markup Language (XML) 1.0, 2nd ed., World Wide
Web Consortium, available at: www.w3.org/TR/
REC-xml (1st ed. published in 1998), Vol. W3C
Recommendation.
Key Terms
AJAX Processes: A scripting technique for
silently loading new data from the server. Al-
though AJAX scripts commonly use the soon to
be standardized XMLHttpRequest object, they
could also use a hidden iframe or frame. An AJAX
script is useless by itself. It also requires a DOM
Scripting component to embed the received data
in the document.
Artificial Intelligence (AI): A multidisciplinary
field encompassing computer science, neuroscience,
philosophy, psychology, robotics, and
linguistics, devoted to the reproduction of
the methods or results of human reasoning and
brain activity.
Articial Neural Networks (ANN): Also
called a simulated neural network (SNN) or just a
neural network (NN), is an interconnected group
of articial neurons that uses a mathematical or
computational model for information processing
based on a connectionist approach to computa-
tion.
Data Mining: The analysis of data to establish
relationships and identify patterns.
Extensible Markup Language (XML): A
W3C-recommended general-purpose markup
language for creating special-purpose markup
languages, capable of describing many different
kinds of data. XML is a way of describing data.
GeoRSS: “RSS” is variously used to refer to
the following: Really Simple Syndication (RSS
2.0), Rich Site Summary (RSS 0.91, RSS 1.0)
and RDF Site Summary (RSS 0.9 and 1.0). It
can be dened as a family of web feed formats.
In the RSS- context geographical data is known
as geoRSS.
Geospatial Data: Data consisting of geo-
graphical information, geostatistics and geotex-
tual information.
Raw Data: Uninterpreted data from a storage
medium. The maximum amount of raw data that
can be copied from a storage medium equals the
capacity of the medium.
Spatial Data Infrastructure (SDI): Often
used to denote the relevant base collection of
technologies, policies and institutional arrange-
ments that facilitate the availability of and access
to spatial data.
Web Map Service (WMS): Produces maps of
spatially referenced data dynamically from geo-
graphic information. This international standard
defines a “map” to be a portrayal of geographic
information as a digital image file suitable for
display on a computer screen. A map is not the
data itself. WMS-produced maps are generally
rendered in a pictorial format such as PNG, GIF
or JPEG. This is in contrast to a Web Feature
Service (WFS), which returns the actual data.
Chapter V
Spatial Data Infrastructures
Carlos Granell
Universitat Jaume I, Spain
Michael Gould
Universitat Jaume I, Spain
Miguel Ángel Manso
Technical University of Madrid, Spain
Miguel Ángel Bernabé
Technical University of Madrid, Spain
Abstract
Geographic Information Systems (GIS) are data-centric applications that rely on the input and constant
maintenance of large quantities of basic and thematic spatial data in order to be useful tools for
decision-making. This chapter presents the institutional collaboration framework and the major technology
components to facilitate discovery and sharing of spatial data: Spatial Data Infrastructures (SDI). We
review the essential software components (metadata editors and associated catalogue services, spatial
data content repositories, client applications, and middleware or intermediate geospatial services) that
define SDIs as heterogeneous distributed information systems. Finally we highlight future research needs
in the areas of semantic interoperability of SDI services and in improved institutional collaboration.
Introduction
Geographic Information Systems (GIS) and related
spatial applications are data-centric in the
sense that they rely on the input and constant
maintenance of large quantities of reference spatial data,
on top of which integrators and end-users produce
value-added thematic geographic information for
the purpose of decision-making. A typical GIS
workflow can be simplified as consisting of three
components: 1) data entry and reformatting, 2) data
processing (geoprocessing), and 3) presentation
of results to the user. In practice, this apparently
simple workflow is constrained by two key factors.
The first is limited interoperability among GIS
components, because most are tightly coupled
to specific data formats or to other software,
complicating the task of integrating components
from multiple vendors. The second is that the
basic spatial data (reference data) necessary to
begin geoprocessing are in many cases not readily
available, because they are poorly documented,
outdated, too expensive, or available under
restrictive licensing conditions. This second factor
has been seriously limiting the ability of government
employees, researchers and businesses to
exploit geographic information, unnecessarily
increasing project costs and, thus, negatively
affecting the economy.
Many government administrations have rec-
ognized this critical problem and have initiated
coordinated actions to facilitate the discovery and
sharing of spatial data, creating the institutional
basis for Spatial Data Infrastructures (SDI) (van
Loenen and Kok 2004). The Global Spatial Data
Infrastructure (GSDI) association (www.gsdi.org)
denes SDI as a coordinated series of agreements
on technology standards, institutional arrange-
ments, and policies that enable the discovery
and facilitate the availability of and access to
spatial data. The SDI, once agreed upon and
implemented, serves to connect GIS and other
spatial data users to a myriad of spatial data
sources, the majority of which are held by public
sector agencies.
In 1990 the U.S. Federal Geographic Data
Committee (FGDC) was created, and in 1994
then-President William Clinton, through Executive
Order 12906, asked it to establish a national SDI
in conjunction with organizations from state,
local, and tribal governments, the academic
community, and the private sector. Three years
later the European Umbrella Organization for
Geographic Information (EUROGI) was created
with the mission to develop a unified European
approach to the use of geographic technologies
(a mission far from complete). More recently, the
European Commission launched the Infrastructure
for Spatial Information in Europe (INSPIRE)
initiative for the creation of a European Spatial
Data Infrastructure (ESDI), based on a Framework
Directive (European legislation) defining
how European member states should go about
facilitating discovery and access to integrated
and interoperable spatial information services
and their respective data sources. As the number
of national SDIs increased, reaching about half
the nations worldwide by 2004 (Masser 2005;
Crompvoets et al. 2004), the Global Spatial Data
Infrastructure (GSDI) Association was created to
promote international cooperation and collaboration
in support of local, national, and international
SDI developments.
The basic creation and management principles
of SDI apply to any and all spatial jurisdictions
in a spatial hierarchy, from municipalities to
regions, states, nations, and international areas.
Béjar et al. (2004) show how each SDI at each level in the hierarchy can be created in accordance with its thematic (e.g., soils, transportation) and geographical (e.g., municipality, nation) coverage, following international standards-based processes and interfaces, to help ensure that the SDIs fit together like puzzle pieces, both geographically and vertically (thematically). This harmonization is necessary for seamless discovery and exploitation of spatial data across jurisdictional boundaries, for example when responding to floods or forest fires, two important cross-border applications. In practice this harmonization has been difficult to achieve because of political as well as semantic differences between neighboring regions. An early (1980s) European exercise in cross-border harmonization, stitching together nationally produced pieces of the Coordinated Information on the European Environment (CORINE) land cover database, highlighted some of these discrepancies at regional and national borders: experts on both sides disagreed on how to classify the same cross-border land cover regions.
SDI Essential Components
Although SDIs are primarily institutional collaboration frameworks, they also define and guide the implementation of heterogeneous distributed information systems, consisting of four main software components linked via the Internet: 1) metadata editors and associated catalogue services, 2) spatial data content repositories, 3) client applications for user search and access to spatial data, and 4) middleware or intermediate geoprocessing services, which assist the user in finding and transforming spatial data for use in the client-side application.
Figure 1 summarizes these essential technology components, as generally accepted within the geographic information standards organizations Open Geospatial Consortium (OGC) (www.opengeospatial.org) and ISO Technical Committee 211 (www.isotc211.org), and synthesized by the FGDC and NASA. This conceptual architecture may be interpreted as a traditional 3-tier client-middleware-server model, in which GI applications seek spatial data content that is discovered and then possibly transformed or processed by intermediary services before presentation by the client application. But the architecture may also be interpreted using the web services ‘publish-find-bind’ triangle model (Gottschalk et al. 2002), whereby spatial data content (and service) offers are published to catalogue servers, which are later queried to discover (find) data or services; the client application then binds to (consumes or executes) them.
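The publish-find-bind cycle can be illustrated with a minimal in-memory sketch in Python. The `Catalogue` and `Record` classes, their fields, and the endpoint callables are hypothetical stand-ins for a real catalogue server and the remote data or processing services it advertises; discovery is modelled here as a keyword match plus a bounding-box intersection test, which is only one simple way a catalogue might filter records. The example records and coordinates are illustrative values, not real datasets.

```python
from dataclasses import dataclass, field
from typing import Callable

# Bounding box as (min_lon, min_lat, max_lon, max_lat) in decimal degrees (assumption).
BBox = tuple[float, float, float, float]


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Record:
    """Hypothetical catalogue entry describing an offered dataset or service."""
    title: str
    keywords: set[str]
    bbox: BBox
    endpoint: Callable[[BBox], str]  # stands in for the remote service to bind to


@dataclass
class Catalogue:
    records: list[Record] = field(default_factory=list)

    def publish(self, record: Record) -> None:
        # Provider side: register the data or service offer with the catalogue.
        self.records.append(record)

    def find(self, keyword: str, area: BBox) -> list[Record]:
        # Client side: discover offers matching a keyword and intersecting an area.
        return [r for r in self.records
                if keyword in r.keywords and bboxes_intersect(r.bbox, area)]


def bind(record: Record, area: BBox) -> str:
    # Client side: consume (invoke) the discovered service for the area of interest.
    return record.endpoint(area)


if __name__ == "__main__":
    cat = Catalogue()
    cat.publish(Record("Example land cover", {"landcover"}, (-9.5, 36.0, 3.3, 43.8),
                       lambda a: f"land cover for {a}"))
    cat.publish(Record("Example roads", {"transportation"}, (-9.5, 36.0, 3.3, 43.8),
                       lambda a: f"roads for {a}"))
    hits = cat.find("landcover", (-4.0, 40.0, -3.0, 41.0))
    print([r.title for r in hits])  # ['Example land cover']
    print(bind(hits[0], (-4.0, 40.0, -3.0, 41.0)))
```

In a real SDI the three roles are separated by the network: providers publish metadata to a catalogue service, clients query it, and binding means calling the service endpoint recorded in the metadata.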

Figure 1. High-level SDI architecture, taken from the FGDC-NASA Geospatial Interoperability Reference Model (GIRM) (FGDC 2003)
Regardless of the precise conceptual model
adopted, what is common among nearly all SDIs
is the primary goal of improving discovery and
access to spatial data. Discovery relies on documenting the datasets to be shared with metadata that follows international standards such as ISO 19115/19139. Metadata describing the content, geographic and temporal coverage, authorship, access and usage rights, and other attributes of a dataset are created within GIS applications or externally using specialized text editors. The metadata files are stored in standard XML formats and then sent (published) to a data catalogue server, in many cases one located at a central node of the SDI, although in principle catalogue servers may be distributed anywhere on the network.
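To make the idea of an XML metadata record concrete, the following Python sketch builds a heavily simplified fragment in the style of the ISO 19139 encoding of ISO 19115, using only the standard library. The namespace URIs and element names (`MD_Metadata`, `fileIdentifier`, `CI_Citation`, `EX_GeographicBoundingBox`, and so on) follow that encoding, but the record deliberately omits mandatory elements such as contact, date stamp, and language, so it would not validate against the official schemas; the helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the ISO 19139 XML encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def sub(parent: ET.Element, ns: str, tag: str) -> ET.Element:
    """Append a namespaced child element and return it."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def text(parent: ET.Element, tag: str, value: str, kind: str = "CharacterString") -> None:
    """Append <gmd:tag><gco:kind>value</gco:kind></gmd:tag>."""
    sub(sub(parent, GMD, tag), GCO, kind).text = value


def minimal_metadata(file_id: str, title: str, abstract: str,
                     west: float, east: float, south: float, north: float) -> str:
    """Hypothetical helper: a simplified (NOT schema-valid) metadata record."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    text(root, "fileIdentifier", file_id)
    ident = sub(sub(root, GMD, "identificationInfo"), GMD, "MD_DataIdentification")
    citation = sub(sub(ident, GMD, "citation"), GMD, "CI_Citation")
    text(citation, "title", title)
    text(ident, "abstract", abstract)
    # Geographic coverage as a longitude/latitude bounding box.
    bbox = sub(sub(sub(sub(ident, GMD, "extent"), GMD, "EX_Extent"),
                   GMD, "geographicElement"), GMD, "EX_GeographicBoundingBox")
    for tag, value in (("westBoundLongitude", west), ("eastBoundLongitude", east),
                       ("southBoundLatitude", south), ("northBoundLatitude", north)):
        text(bbox, tag, str(value), kind="Decimal")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_metadata("example-0001", "Example land cover",
                           "Illustrative record only.", -9.5, 3.3, 36.0, 43.8))
```

A string like this is what a metadata editor would hand to a catalogue server when publishing; production tools generate the full set of mandatory elements and validate the result against the ISO 19139 schemas.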
Users wishing to discover spatial data sources
normally access catalogue search interfaces via
web applications called Geoportals (Bernard
et al. 2005), examples of which may be found
at and http://eu-
geoportal.jrc.it/. The geoportal is an interface
façade, both hiding the implementation details of
the underlying catalogue query mechanisms, and
inviting participation in the SDI community. In
addition to discovery queries, the geoportal also
normally provides free access to quick looks or
small samples of datasets that are discovered.
This spatial data visualization is frequently implemented using Web Map Service (WMS) interfaces (OGC 2006), which allow heterogeneous client and server products from multiple vendors, whether proprietary or free software, to be integrated. A WMS-based service receives a request for a given spatial data layer and geographical extent, renders the data (originally in vector or raster format) as a bitmap image in a standard MIME format such as JPEG, GIF, or PNG, and delivers the image to the web client (a browser or GIS/SDI client).
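On the client side, such a request is an HTTP GET whose query string names the layer, extent, and image format. The Python sketch below assembles a GetMap URL using the parameter names of WMS 1.3.0; the server address and layer name are hypothetical placeholders. Note that for EPSG:4326, WMS 1.3.0 expects the bounding box in latitude/longitude axis order, a frequent source of errors.

```python
from urllib.parse import urlencode


def getmap_url(base_url: str, layer: str, bbox: tuple[float, float, float, float],
               width: int, height: int, crs: str = "EPSG:4326",
               image_format: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap request URL.

    bbox values are joined in the order given; for EPSG:4326, WMS 1.3.0
    expects (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    # Leave ',', ':' and '/' unescaped so the URL stays readable.
    return f"{base_url}?{urlencode(params, safe=',:/')}"


if __name__ == "__main__":
    print(getmap_url("https://example.org/wms", "landcover",
                     (36.0, -9.5, 43.8, 3.3), 800, 600))
```

Sending this URL to a WMS server would return a PNG image of the requested layer for that extent, 800 by 600 pixels, which the geoportal can display directly as a quick look.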
More sophisticated spatial data (web) services are becoming available, many of which also follow de jure ISO standards and de facto specifications from organizations such as OGC, the Organization for the Advancement of Structured Information Standards (OASIS), and the W3C. These include services providing concrete functionality such as coordinate transformation, basic image processing, basic geostatistics, and the composition or chaining of individual services into more complex services.
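Service chaining can be sketched as function composition, where each "service" takes the previous one's output as input. In the Python sketch below, local functions stand in for remote web services: the coordinate transformation is the spherical Web Mercator (EPSG:3857) forward formula, chosen purely as an example, followed by a hypothetical bounding-box service. A production SDI would call remote services and would normally delegate transformations to a dedicated library such as PROJ.

```python
import math
from functools import reduce
from typing import Callable

Point = tuple[float, float]
Service = Callable[[list[Point]], list[Point]]

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator


def to_web_mercator(points: list[Point]) -> list[Point]:
    """Coordinate transformation: (lon, lat) degrees -> Web Mercator metres.

    Valid only for latitudes within about +/-85.05 degrees.
    """
    out = []
    for lon, lat in points:
        x = EARTH_RADIUS_M * math.radians(lon)
        y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
        out.append((x, y))
    return out


def bounding_box(points: list[Point]) -> list[Point]:
    """Hypothetical service returning the lower-left and upper-right corners."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), max(ys))]


def chain(*services: Service) -> Service:
    """Compose services so that each one's output becomes the next one's input."""
    return lambda data: reduce(lambda acc, svc: svc(acc), services, data)


transform_then_extent = chain(to_web_mercator, bounding_box)

if __name__ == "__main__":
    print(transform_then_extent([(-9.5, 36.0), (3.3, 43.8)]))
```

Because `chain` only assumes that adjacent services agree on their input and output types, new steps can be inserted without changing the others, which is the property that makes composed SDI services attractive.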
Summarizing both the institutional and technological aspects, Article 1 of the proposed INSPIRE Directive (EC 2004) lists the following necessary components for SDIs:
“The component elements of those infrastruc-
tures shall include metadata, spatial datasets and
spatial data services; network services and tech-
nologies; agreements on sharing, access and use;
and coordination and monitoring mechanisms,
processes and procedures.”
Caution is needed before describing narrower initiatives, projects, or products that satisfy only a subset of these requirements as SDIs. This applies especially to institutional agreements and cooperation for improved access to spatial data: the principles and components above should be explicitly addressed.
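The Article 1 list can be read as a checklist. The Python sketch below encodes it as a set of six labels (the split into these labels is our own paraphrase of the quoted text, not an official INSPIRE vocabulary) and reports which components a given initiative leaves uncovered; it is an illustration of the caution above, not an assessment tool.

```python
# Component elements from Article 1 of the INSPIRE proposal (EC 2004),
# paraphrased here as short labels (our own grouping).
INSPIRE_COMPONENTS = frozenset({
    "metadata",
    "spatial datasets",
    "spatial data services",
    "network services and technologies",
    "agreements on sharing, access and use",
    "coordination and monitoring mechanisms, processes and procedures",
})


def missing_components(provided: set[str]) -> list[str]:
    """Return the listed components an initiative does not cover, sorted."""
    return sorted(INSPIRE_COMPONENTS - provided)


def qualifies_as_sdi(provided: set[str]) -> bool:
    """True only when every listed component is covered."""
    return not missing_components(provided)


if __name__ == "__main__":
    # A hypothetical technology-only project: data, metadata and services,
    # but no sharing agreements or coordination mechanisms.
    project = {"metadata", "spatial datasets", "spatial data services",
               "network services and technologies"}
    print(qualifies_as_sdi(project), missing_components(project))
```

The example project fails the check precisely on the institutional components, which is the situation the text warns about.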
Future Research
SDI researchers are active on several fronts,
but two main areas of interest are: improving
institutional collaboration and SDI effectiveness
(including cost-benefit analyses and more
elaborate data access policy), and SDI component
implementation and testing. In the second area
fall topics such as semantic interoperability and
composition of SDI (web) services, the integra-
tion of so-called disruptive technologies such as
Google Earth and similar commercial services,
grass-roots initiatives contributing user-generated
data, integration with grid computing and with
e-Government solutions, and exploitation of data
from diverse sensor networks.
For further details on SDI the reader should
consult the Spatial Data Infrastructure Cookbook
(GSDI 2004) and the European Commission’s
International Journal of SDI Research (http://ijsdir.jrc.it).

References
Béjar, R., Gallardo, P., Gould, M., Muro, P.,
Nogueras, J., & Zarazaga, J. (2004). A high level
architecture for national SDI: The Spanish case.
EC-GI&GIS Workshop, Warsaw, June 2004.
Retrieved April 4, 2006, from -gis.
org/Workshops/10ec-gis/.
Bernard, L., Kanellopoulos, I., Annoni, A., &
Smits, P. (2005). The European geoportal — one
step towards the establishment of a European
Spatial Data Infrastructure. Computers, Environ-
ment and Urban Systems, 29(1), 15–31.
Crompvoets, J., Bregt, A., Rajabifard, A., &
Williamson, I. (2004). Assessing the worldwide
status of national spatial data clearinghouses.
International Journal of Geographical Informa-
tion Science, 18(7), 665-689.
EC Commission of The European Commu-
nities (2004). Proposal for a Directive of the
European Parliament and of the Council estab-
lishing an infrastructure for spatial information
in the Community (INSPIRE), COM(2004) 516.
Retrieved April 4, 2006, from .
it/proposal/EN.pdf.
FGDC (2003). The Geospatial Interoperability Reference Model, version 1.1. Federal Geographic Data Committee Geospatial Applications Interoperability (GAI) Working Group. Retrieved April 4, 2006.
Gottschalk, K., Graham, S., Krueger, S., & Snell, J. (2002). Introduction to Web services architecture. IBM Systems Journal, 41(2). Retrieved April 4, 2006, from son.ibm.com/journal/sj/412/gottschalk.html
GSDI (2004). Spatial Data Infrastructure Cook-
book (English version 2.0). Retrieved April 4,
2006, from kindex.asp.
Masser, I. (2005). GIS Worlds; Creating Spatial
Data Infrastructures. Redlands, California: ESRI
Press.
OGC (2006). OpenGIS Web Map Service
(WMS) implementation specication, version
1.3. Retrieved April 4, 2006, from http://portal.
opengeospatial.org/les/?artifact_id=14416.
van Loenen, B., & Kok, B.C. (Eds.). (2004). Spa-
tial data infrastructure and policy development
in Europe and the United States. Delft: Delft
University Press.
Key Terms
FGDC: Federal Geographic Data Committee,
an interagency committee established in the US
in 1990, with the mandate to create and support
data sharing, in the form of the US National SDI.
GI: Geographic information, the subset of
information pertaining to, or referenced to, known
locations on or near the Earth’s surface.
GSDI: Global Spatial Data Infrastructure
Association. An umbrella organization grouping
national, regional and local organizations dedicated to the creation and maintenance of SDIs around the world.
OGC: Open Geospatial Consortium, a membership body of 300-plus organizations from the commercial, government and academic sectors that creates consensus interface specifications in an effort to maximize interoperability among software dealing with geographic data. http://www.opengeospatial.org.
SDI: Spatial Data Infrastructure
