
The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management (Part 7)



Chapter 7

Word Sense and Hypernymic Taxonomic Representation
Sense 1: depository financial institution, bank, banking concern, banking company —
(a financial institution that accepts deposits and channels the money into lending
activities; "he cashed a check at the bank"; "that bank holds the mortgage on my
home")
  ⇒ financial institution, financial organization, financial organisation — (an
    institution (public or private) that collects funds (from the public or other
    institutions) and invests them in financial assets)
    ⇒ institution, establishment — (an organization founded and united for a
      specific purpose)
      ⇒ organization, organisation — (a group of people who work together)
        ⇒ social group — (people sharing some social relation)
          ⇒ group, grouping — (any number of entities (members)
            considered as a unit)
Sense 2: bank — (sloping land (especially the slope beside a body of water); "they
pulled the canoe up on the bank"; "he sat on the bank of the river and watched the
currents")
  ⇒ slope, incline, side — (an elevated geological formation; "he climbed the steep
    slope"; "the house was built on the side of the mountain")
    ⇒ geological formation, geology, formation — (the geological features of the
      earth)
      ⇒ natural object — (an object occurring naturally; not made by man)
        ⇒ object, physical object — (a tangible and visible entity; an entity that
          can cast a shadow; "it was full of rackets, balls, and other objects")
          ⇒ entity, physical thing — (that which is perceived or known or
            inferred to have its own physical existence (living or
            nonliving))


Sense 3: bank — (a supply or stock held in reserve for future use (especially in
emergencies))
Figure 7.9 WordNet entry for bank: First three word senses and their hypernymic
taxonomies conceptual model.

The if part of the rule is sometimes called the antecedent; the then part is called
the consequent. Rules are like axioms or constraints. Although we briefly talk
about axioms in the next section, most of the discussion will have to wait until
Chapter 8. These logical rules are related to rules you may be more familiar
with: the production rules of expert systems. Production rules are condition-action rules of the form:

If condition X is true, then perform action Y.

where X again is an arbitrarily complex set of conditions that hold (or are true) in
the current state of the environment, and Y is an arbitrarily complex set of actions.


Understanding Taxonomies


Actions here include setting specific values to variables, asserting variables
(conditions) to be true, or executing other production rules, in a rule-chaining
style sometimes called forward-chaining (or top-down or left-to-right inference,
the prototypical reasoning method employed by expert systems). In other
words, if the antecedent of the production rule is true, then the actions of the
consequent are executed, thereby changing the state of the environment, and
so possibly enabling the conditions of other rules in the entire rule set to
become true, thus causing them to fire (become activated). Other common synonyms for production rules are demon and trigger, the latter sometimes used as
a mechanism in database technology for changing the state of a database.
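The following minimal Python sketch shows the firing cycle of forward chaining. It assumes that a rule's action only asserts new facts and never assigns variables or retracts facts. The rule names and facts (`r1`, `has_fever`, and so on) are hypothetical. Real expert-system shells match rules far more efficiently, for example with the Rete algorithm. This loop simply rescans the rule set until no rule changes anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A condition-action rule: if all conditions hold, assert the conclusions."""
    name: str
    conditions: frozenset   # facts that must all be present (the antecedent)
    conclusions: frozenset  # facts the action asserts (the consequent)


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact; return facts and firing order."""
    facts = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when its antecedent holds in the current state
            # and its action would actually change that state.
            if rule.conditions <= facts and not rule.conclusions <= facts:
                facts |= rule.conclusions
                fired.append(rule.name)
                changed = True
    return facts, fired


# Hypothetical rule set: firing r1 enables the conditions of r2.
RULES = [
    Rule("r2", frozenset({"suspect_flu", "flu_season"}), frozenset({"recommend_test"})),
    Rule("r1", frozenset({"has_fever", "has_cough"}), frozenset({"suspect_flu"})),
]

if __name__ == "__main__":
    final_facts, order = forward_chain({"has_fever", "has_cough", "flu_season"}, RULES)
    print(order)  # ['r1', 'r2']
```

Rule `r2` cannot fire on the first pass. It becomes eligible only after `r1` changes the state by asserting `suspect_flu`, which is the rule-enabling behavior described above.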
The opposite type of rule execution in expert systems is called backward-chaining (bottom-up, right-to-left, goal-directed reasoning). A goal is matched against the consequent of a rule, and that rule's conditions then become new goals. The new goals are in turn matched against the consequents of other rules.5 In
general, the production rules of expert systems are essentially nonlogical
implementations of inference—that is, they simulate inference. Although
production rules are still in use today, in practice, more modern knowledge
technologies (such as ontological engineering, which we discuss in Chapter 8)
employ logical rules in true logical inference.
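Backward chaining can be sketched the same way. In this hypothetical Python example, a goal is proved if it is already a known fact. Otherwise it is proved if some rule concludes it and all of that rule's conditions can themselves be proved, recursively, as subgoals. The rule base and fact names are illustrative only. The `seen` set is an added assumption that stops infinite regress through circular rules.

```python
# Each rule pairs a set of conditions (antecedent) with one concluded goal (consequent).
RULES = [
    ({"has_fever", "has_cough"}, "suspect_flu"),
    ({"suspect_flu", "flu_season"}, "recommend_test"),
]


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Return True if `goal` is a known fact or can be derived by working backward."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rule chains
        return False
    for conditions, conclusion in rules:
        # Match the goal against a rule's consequent; its conditions become subgoals.
        if conclusion == goal and all(
            backward_chain(sub, facts, rules, seen | {goal}) for sub in conditions
        ):
            return True
    return False


if __name__ == "__main__":
    print(backward_chain("recommend_test", {"has_fever", "has_cough", "flu_season"}, RULES))  # True
```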
In a conceptual model, it truly is possible to define and express the subclass of
relation between a parent class and a child class. Object-oriented modeling languages such as UML (and tools such as Rational Rose that
use UML) are rich enough to express the semantics of the subclass of relation
between two given classes.6 What is also important is that the definitions of a
class, superclass, and subclass be semantically well specified at the meta-model level so that the object-model level classes such as Person and its subclass Employee can be well specified semantically. The object-model level is
the level that we are interested in. It is the level at which we construct our
domain and system models. The meta-model level is the level that defines the
constructs such as class, relation, and attribute that we will use at the object-model level to define our content models. The meta-model level is often the
level where the conceptual modeling language (such as UML) itself is defined.
What is defined at the modeling language level enables us to express things in
that language (i.e., construct our own models using the language) at the object
level. This notion of meta level and object level can be confusing, so it is a topic
that we will return to in the next chapter when we look at ontologies.

5 For a more detailed description of expert systems and their problems, see Obrst and Liu (2003), pp. 113 to 116.
6 For readers unfamiliar with the object-oriented programming paradigm, we suggest Graham (2000) and Rumbaugh et al. (1991). For general information on and specifications of UML, see … For information on Rational Rose and UML, see …ional.com/uml/index.jsp.




The Entity-Relationship (ER) model or language (and the Enhanced or Extended
ER or EER model)7 that is used to define a conceptual schema for a database is
also considered a conceptual modeling language. When one designs a database, one first creates a conceptual schema (which is where the initial conception of the domain of the eventual database is modeled), reduces that to a
logical schema, and finally reduces that in turn to a physical schema. These
schemas represent levels of abstraction: from the human conceptual level to
the database table/column level to the actual implemented tables, columns,
and keys.
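Here is a minimal sketch of the final reduction step. It assumes a hypothetical conceptual model in which Person has a subtype Employee. One common physical design gives each entity its own table and lets the subtype share the supertype's key. Python's standard `sqlite3` module is used only for convenience, and all table and column names are illustrative.

```python
import sqlite3

# Physical schema for a hypothetical conceptual model: Person with subtype Employee.
# The subtype table reuses the supertype's key, one common way to map an EER
# subclass relationship onto tables.
DDL = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE employee (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    employer  TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create the tables in memory and load a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob')")
    conn.execute("INSERT INTO employee VALUES (1, 'Acme')")
    return conn


def employees(conn: sqlite3.Connection) -> list:
    """Reassemble the Employee entity by joining the subtype table to its supertype."""
    rows = conn.execute(
        "SELECT p.name, e.employer FROM person p "
        "JOIN employee e ON e.person_id = p.person_id ORDER BY p.name"
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(employees(build_db()))  # [('Alice', 'Acme')]
```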

Logical Theory
The upper-right endpoint designates a logical theory. Ontologies represented as
logical theories are directly semantically interpretable by our software. This is
the high-end notion of an ontology: a logical theory. Much of current ontological engineering and knowledge representation (we will talk about these disciplines in more detail later) aspires to building ontologies as logical theories.
We investigate ontologies and Semantic Web languages used to express
ontologies more in Chapter 8. For now, all we need to say about logical theories is that they are built on axioms (a range of primitive to complex statements
asserted to be true) and inference rules (rules that, given premises/
assumptions, provide valid conclusions), which together are used to prove
theorems about the domain represented by the ontology-as-logical-theory.
The whole set of axioms, inference rules, and theorems together constitute the
logical theory.
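Here is a minimal Python sketch of this vocabulary. It assumes a toy theory whose axioms are subclass-of assertions (the class names are hypothetical). Its only inference rule is transitivity: from subClassOf(A, B) and subClassOf(B, C), conclude subClassOf(A, C). Applying the rule until nothing new follows yields the theorems. Real ontology languages and reasoners are far richer than this.

```python
# Axioms: primitive statements asserted to be true, as (subclass, superclass) pairs.
AXIOMS = {
    ("Employee", "Person"),
    ("Person", "Agent"),
    ("Agent", "Thing"),
}


def transitivity(statements):
    """Inference rule: subClassOf(a, b) and subClassOf(b, c) yield subClassOf(a, c)."""
    return {(a, c) for (a, b) in statements for (b2, c) in statements if b == b2}


def theorems(axioms):
    """Apply the rule until no new statement is derived; return only the derived ones."""
    known = set(axioms)
    while True:
        new = transitivity(known) - known
        if not new:
            return known - set(axioms)
        known |= new
```

For the axioms shown, `sorted(theorems(AXIOMS))` gives `[('Employee', 'Agent'), ('Employee', 'Thing'), ('Person', 'Thing')]`. These are three statements that were never asserted but follow from the axioms by the rule.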
In a logical theory, we can express the semantics of a model to the highest
degree possible. The subclass of relation can become a richer relation, perhaps
defined as the disjoint subclass of relation with the property of transitivity. A
class’s superclass relation to its subclasses can also be defined as exhaustive—
that is, the subclasses exhaustively partition the superclass. Similar fine
semantic distinctions can be made of relations and attributes, and other modeling constructs such as facets, which represent meta data associated with relations (or assertions on assertions).
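The following Python sketch checks the disjoint and exhaustive constraints against hypothetical instance data. It tests extensions rather than proving anything from axioms, as a logical theory would. Disjointness means no instance falls under two subclasses. Exhaustiveness means every instance of the superclass falls under at least one subclass.

```python
def check_partition(superclass_members, subclasses):
    """Report which members violate disjointness or exhaustiveness.

    superclass_members: set of instances of the superclass
    subclasses: mapping of subclass name -> set of its instances
    """
    # Disjoint: no member may belong to more than one subclass.
    overlaps = {
        m for m in superclass_members
        if sum(m in members for members in subclasses.values()) > 1
    }
    # Exhaustive: every member must belong to at least one subclass.
    uncovered = superclass_members - set().union(*subclasses.values())
    return {"disjoint": not overlaps, "exhaustive": not uncovered,
            "overlaps": overlaps, "uncovered": uncovered}


if __name__ == "__main__":
    people = {"ann", "bob", "cy"}
    print(check_partition(people, {"Adult": {"ann", "bob"}, "Minor": {"cy"}}))
```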

Ontology
Now that we have looked at the Ontology Spectrum, ranging from taxonomies
to logical theories, can we define what an ontology is? Let’s look at a preliminary definition and save the elaboration until the next chapter. An ontology defines
the common words and concepts (meanings) used to describe and represent an
area of knowledge, and so standardizes the meanings. Ontologies are used by
people, databases, and applications that need to share domain information (a
domain is just a specific subject area or area of knowledge, like medicine, counterterrorism, imagery, automobile repair, etc.). Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among
them. They encode knowledge in a domain and also knowledge that spans
domains. So, they make that knowledge reusable.

7 For the distinction between ER and EER and the kinds of schemas built for databases, refer to nearly any standard database text. We like Halpin (1995) and Ullman (1989).
An ontology includes the following:

■ Classes (general things) in the many domains of interest
■ Instances (particular things)
■ Relationships among those things
■ Properties (and property values) of those things
■ Functions of and processes involving those things
■ Constraints on and rules involving those things
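One way to picture this list is as a single data structure. The Python sketch below is only an illustration, with hypothetical field names and a toy geography example. It leaves out functions and processes. Real ontology languages (discussed in Chapter 8) give these components formal semantics that a plain container cannot.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    classes: set = field(default_factory=set)        # general things
    instances: dict = field(default_factory=dict)    # particular thing -> its class
    relations: set = field(default_factory=set)      # (thing, relation, thing) triples
    properties: dict = field(default_factory=dict)   # (thing, property) -> value
    constraints: list = field(default_factory=list)  # predicates the content must satisfy

    def violations(self):
        """Return the names of constraints that the current content breaks."""
        return [c.__name__ for c in self.constraints if not c(self)]


def instances_have_known_classes(onto):
    """Hypothetical constraint: every instance must belong to a declared class."""
    return all(cls in onto.classes for cls in onto.instances.values())


if __name__ == "__main__":
    geo = Ontology(
        classes={"City", "State"},
        instances={"Front Royal": "City", "Virginia": "State"},
        relations={("Front Royal", "located-in", "Virginia")},
        properties={("Virginia", "abbreviation"): "VA"},
        constraints=[instances_have_known_classes],
    )
    print(geo.violations())  # []
```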

Having completed our discussion of the Ontology Spectrum, let’s now turn to
describing a language (actually a language and an entire modeling paradigm)
that is often used to model Web objects and the things that can be said of
Web objects, and that can structure that model into a taxonomy or a set of
taxonomies.

Topic Maps
This section briefly describes Topic Maps (sometimes abbreviated TM). Topic
Maps is a technology that has arisen in recent years to address the issue of
semantically characterizing and categorizing documents and sections of documents on the Web with respect to their content—in other words, what topics or
subject areas those documents actually address. As such, they are closely
related to other efforts in general characterized as the Semantic Web. Topic
Maps provides a content-oriented index into a set of documents, much like the
index of a book but with this qualification: an index of a book does not typically characterize the contents of that book as a set of linked topics, but rather
as a set of mostly isolated subject references with occasional cross-references to
other subjects.
A Topic Map, however, does act as a set of linked topics that index a document
collection. In addition, in the Topic Maps paradigm, one can have multiple
topic maps indexing the same Web document collections (much as a book may
have multiple indexes, such as a subject index, a name index, and so forth; the
important point here is that one can have multiple topic maps indexing the
subjects in different ways). Topic maps can be viewed as information overlays
on documents or arbitrary information resources. They enable content-based
navigation over these resources irrespective of the latter’s form. Topic maps
thus act as taxonomies—ways of describing, classifying, and indexing an
information space consisting of Web and, as we’ll see, non-Web objects.
Whether or not Topic Maps can constitute full-fledged ontologies is subject to
some dispute, and we will hold off on that discussion until the next chapter.

Topic Maps Standards
The development of Topic Maps began in the pre-XML and pre-WWW era
when SGML (Standard Generalized Markup Language, a document composition language, of which a simpler subset became XML) reigned supreme.

SGML was based on DTDs that later became the driving structural definition
of early XML, now largely being superseded by XML Schema. So, the early
Topic Maps standard was in fact based on SGML and used a non-XML syntax.
The problem, then as now, is this: How do you characterize the semantics of
your documents? How do you represent what your content means—in a way
that a machine can use?
Topic Maps today, as defined by the International Organization for Standardization
(ISO) 13250 standard (hereafter referred to as ISO 13250),8 are specified in
terms of two different interchange syntaxes: a more recent one based on XML
and an older one based on an SGML DTD that used the ISO/IEC 10744 HyTime
standard (a standard for specifying hypertext that includes resource addressing and linking). To simplify the exposition, this chapter focuses only on the
XML TM syntax, referred to as XTM.9
Figure 7.10 shows the components of the Topic Maps standard and their relationship to each other. The ISO 13250 components are on the left, and the
OASIS Published Subject Indicator Technical Committees are on the right.
Note that items marked with a * have yet to be fully defined—though versions
do exist. The Standard Application Model (SAM) defines the formal data
model of Topic Maps and its semantics in natural language.10 The Reference
Model is intended to be a more abstract model of Topic Maps than SAM and to
enable Topic Maps to semantically interoperate with other knowledge representation formalisms and Semantic Web ontology languages.11 The Topic Map
Query Language (TMQL) will be an SQL-like language for querying topic map
information. The Topic Map Constraint Language (TMCL) will give a database
schema-like capability to Topic Maps, enabling constraints on the meaning to be
defined for Topic Maps. Both TMQL and TMCL are dependent on the final
elaboration of SAM, which is itself dependent on RM.12
8 For additional information on the various Topic Maps standards, see Biezunski et al. (2002).
9 Garshol and Moore (2002a).
10 Garshol and Moore (2002b).
11 See Newcomb and Biezunski (2002) for a view of what the RM might look like.
12 Biezunski et al. (2002) makes these relationships clear.


[Figure: components of the Topic Maps standard. Left, ISO 13250: *Reference Model, Standard Application Model, HyTime Syntax, XTM Syntax, *Topic Map Query Language, *Topic Map Constraint Language. Right, OASIS: Published Subjects TC, XML Vocabulary TC, Geography & Languages TC. Key: * = future.]

Figure 7.10 Components of the Topic Maps Standard.

The products of the OASIS technical committees are intended to be layered
onto the ISO 13250 standard’s products.13 The Published Subjects Technical
Committee will define and manage published subjects (which will be discussed
shortly), and establish usage requirements for these. The XML Vocabulary
Technical Committee will define the vocabulary to enable Topic Maps to interact with existing and emerging XML standards and technologies; the vocabulary will be defined as published subjects according to the standards defined by
the Published Subjects TC. Finally, the Geography and Languages Technical
Committee will define geographical country, region, and language-based published subjects to ensure interoperability across geographical and linguistic
boundaries. All of the OASIS technical committees are currently actively pursuing their objectives.
Listing 7.1 depicts a simple XTM topic map. We will refer to this example in
the subsequent discussion of the important concepts of Topic Maps.14
13 See OASIS Topic Maps technical committees.
14 The left-hand side of Figure 7.10 is adapted from Biezunski et al. (2002).


<topic id="Front Royal">
  <instanceOf><topicRef xlink:href="#city"/></instanceOf>
  <baseName>
    <baseNameString>Front Royal</baseNameString>
    <variant>
      <parameters><topicRef xlink:href="#display"/></parameters>
      <variantName>
        <resourceData>Gateway to Skyline Drive</resourceData>
      </variantName>
    </variant>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
    <resourceRef xlink:href="…"/></occurrence>
</topic>
<topic id="Winchester">
  <instanceOf><topicRef xlink:href="#city"/></instanceOf>
  <baseName>
    <baseNameString>Winchester</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
    <resourceRef xlink:href="…"/></occurrence>
</topic>

Listing 7.1 A Simple XTM topic map: Topics, occurrences.
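Because XTM is ordinary XML, a topic map can be read with general-purpose tools. The sketch below uses Python's standard `xml.etree.ElementTree` on an abridged copy of the first topic in Listing 7.1. The copy is wrapped in a `topicMap` element that declares the XTM 1.0 and XLink namespaces. The occurrence URL `http://example.org/front-royal` is a hypothetical placeholder, and the function name `summarize` is illustrative.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"    # XTM 1.0 namespace, Clark notation
XLINK = "{http://www.w3.org/1999/xlink}href"   # qualified name of xlink:href

# Abridged copy of Listing 7.1; the occurrence URL is a hypothetical placeholder.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="Front Royal">
    <instanceOf><topicRef xlink:href="#city"/></instanceOf>
    <baseName>
      <baseNameString>Front Royal</baseNameString>
      <variant>
        <parameters><topicRef xlink:href="#display"/></parameters>
        <variantName><resourceData>Gateway to Skyline Drive</resourceData></variantName>
      </variant>
    </baseName>
    <occurrence>
      <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
      <resourceRef xlink:href="http://example.org/front-royal"/>
    </occurrence>
  </topic>
</topicMap>"""


def summarize(xml_text):
    """Yield (id, type, base name, display variant, occurrence URLs) for each topic."""
    root = ET.fromstring(xml_text)
    for topic in root.iter(XTM + "topic"):
        type_ref = topic.find(f"{XTM}instanceOf/{XTM}topicRef")
        base = topic.find(f"{XTM}baseName/{XTM}baseNameString")
        display = topic.find(
            f"{XTM}baseName/{XTM}variant/{XTM}variantName/{XTM}resourceData")
        urls = [r.get(XLINK) for r in topic.iter(XTM + "resourceRef")]
        yield (
            topic.get("id"),
            type_ref.get(XLINK) if type_ref is not None else None,
            base.text if base is not None else None,
            display.text if display is not None else None,
            urls,
        )


if __name__ == "__main__":
    for row in summarize(DOC):
        print(row)
```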

Topic Maps Concepts
The XTM standard15 identifies the key concepts of Topic Maps. The key concepts are topic, association, occurrence, subject descriptor, and scope. We describe
these concepts in the following text.

Topic
Anything can be a topic—that is, any distinct subject of interest for which
assertions can be made. Nearly everything in Topic Maps can become a topic,
including many of the other XTM constructs we talk about in this section. A
topic is a representation of the subject; according to the XTM standard, it acts
as a resource that is a proxy for the subject.
15 See Pepper and Moore (2001) for the online XTM V1.0 standard.

The notion of subject in Topic Maps deserves some discussion. A subject is the
what—for instance, “Front Royal, Virginia” or “the Mars Lander” or “inventory control” or “agriculture”; a topic is an information representation of the
what. So a topic represents the subject that is referred to. If the subject is “Front
Royal,” then the topic would be Front Royal. Because subjects can be anything,
topics can be anything. A topic is just a construct in Topic Maps, one of the
essential building blocks. The way the subject of a topic is referred to is by having the topic point to a resource that expresses the subject. The resource either
constitutes the subject (and so addresses the subject) or indicates the subject.16 In
either case, the subject of the topic is represented by an occurrence of a resource,
and it is the nature of that resource that determines the addressability of the
subject. If the resource uses the resourceRef XTM construct, then it constitutes
the subject and is addressable. If the resource uses the subjectIndicatorRef construct, then it indicates the subject and is not directly addressable. Web objects
are addressable; non-Web objects are not directly addressable and so must be
indicated (for example, all occurrences of the same topic are about the same
subject, though they are distinct resources). A resource occurrence can also have
a data value that is directly specified inline.
In Listing 7.1, each topic is enclosed by the <topic> and </topic> delimiters.
The first topic is identified by id="Front Royal". The topic is an instance of
another topic, identified by the <topicRef> markup:
<instanceOf><topicRef xlink:href="#city"/></instanceOf>

In this case, Front Royal is a city, so the topic Front Royal is itself an instance of
the topic reference city. Because the resourceRef construct is used, this example
illustrates a topic that constitutes the subject, and the resource is addressable:

<occurrence>
...
<resourceRef xlink:href="…"/></occurrence>

A topic is identified by a name. The primary way of identifying a topic is
by its base name. In the example, the base name of the topic is represented as:
<baseName>
<baseNameString>Front Royal</baseNameString>
...
</baseName>
16 See Biezunski (2003), p. 19.

The <baseName> and </baseName> delimiters enclose this base name. The base
name is meant to uniquely identify the topic (within a particular scope, which
we will discuss later). In addition to the base name, however, a variant name,
specifically, a display name and/or a sort name, can be used. In the example, a
display name is represented, within the base name markup:
<variant>
<parameters><topicRef xlink:href="#display"/></parameters>
<variantName>
<resourceData>Gateway to Skyline Drive</resourceData>
</variantName>
</variant>


Each topic is implicitly an instance of a topic type, that is, the class of the topic,
though the type may not be explicitly marked in any given topic map. If the
topic type is not explicitly marked, then the topic is considered implicitly of
type …. A similar circumstance holds for typing associations and occurrences: If no type is specified, then
an association or an occurrence is defined to be, respectively, of type
… or http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence.

Occurrence
As noted in the preceding text, an occurrence is a resource specifying some
information about a topic. The resource is either addressable (using a URI) or
has a data value specified inline. For the former, resourceRef is used. The example in Listing 7.1 illustrates this usage:
<occurrence>
...
<resourceRef xlink:href="…"/></occurrence>

For the latter, the inline value, resourceData, is used (this is not part of Listing 7.1)
for arbitrary character data:
<occurrence>
...
<resourceData>Front Royal is on the Shenandoah River
</resourceData>
</occurrence>

Note, however, that in Listing 7.1, the alternative use of resourceData is
exemplified—not to specify an occurrence, but to specify a variant name:
<variantName>
<resourceData>Gateway to Skyline Drive</resourceData>
</variantName>




Like topics, occurrences can also be of different types, specified by the topicRef
markup. Occurrences are ways to characterize a topic. Because they can represent any information to be associated with a topic, they can also act as attributes of a topic, though XTM does not really distinguish attributes from other
information, a distinction that is sometimes made in other schema or knowledge representation languages.

Association
An association is the relationship between (one or more) topics. Associations
are delimited by <association> and </association>. In Listing 7.2, the association
located-in is asserted to hold between two topic references: Front Royal (indicated by the URI that is the value of one topicRef) and Virginia (indicated by
the URI that is the value of the other topicRef). The specification of the semantics of located-in is not explicitly represented but is assumed to be defined by or
known to the creator of the topic map (and could remain implicit).
<association>
  <instanceOf><topicRef xlink:href="#located-in"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#city"/></roleSpec>
    <topicRef xlink:href="#Front-Royal"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#state"/></roleSpec>
    <topicRef xlink:href="#Virginia"/>
  </member>
</association>

Listing 7.2 Topic map associations.


As depicted in the preceding example, the association located-in is specified to
be an (undirected) relationship between two members. Each member names the topic
(or topics) taking part in the association. Here the two members contain the topics
identified by the URIs #Front-Royal and #Virginia, demarcated by the topicRef
constructs. This example also shows an important aspect of associations: The topics that are related by the association assume
different roles in that association. The topic referenced as #Front-Royal is in the
#city role, and the topic #Virginia is in the #state role of the #located-in association. An association is similar to the database notion of a relation or, as we shall
see in the next section comparing Topic Maps to RDF/S and in the next chapter on ontologies, to the ontology notion of a predicate (sometimes also called
relation or property). An association role specifies how a particular topic acts as
a member of an association, its manner of playing in that association. If there
were a uses association between Sammy Sosa and a Rawlings 34-inch Pro
Model baseball bat, then Sammy would be in the batter role and the Rawlings
would be in the bat role, as the following hypothetical portion of a topic map
makes clear:
<association>
  <instanceOf><topicRef xlink:href="#uses"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#batter"/></roleSpec>
    <topicRef xlink:href="#Sammy-Sosa"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#bat"/></roleSpec>
    <topicRef xlink:href="#…Baseball-Bat"/>
  </member>
</association>
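The role structure can be read mechanically. This minimal Python sketch uses only the standard library's ElementTree. It assumes that Listing 7.2 is wrapped in a `topicMap` element declaring the XTM and XLink namespaces. It also assumes that every association has an `instanceOf` and every member has a `roleSpec`, as in the listing. It collects which topic plays which role in each association type, and the function name `role_players` is hypothetical.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"

# Listing 7.2, wrapped in a topicMap element so that the namespaces are declared.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <association>
    <instanceOf><topicRef xlink:href="#located-in"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#city"/></roleSpec>
      <topicRef xlink:href="#Front-Royal"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#state"/></roleSpec>
      <topicRef xlink:href="#Virginia"/>
    </member>
  </association>
</topicMap>"""


def role_players(xml_text):
    """Map each association type to a {role: [player topics]} dictionary."""
    result = {}
    root = ET.fromstring(xml_text)
    for assoc in root.iter(XTM + "association"):
        assoc_type = assoc.find(f"{XTM}instanceOf/{XTM}topicRef").get(HREF)
        roles = result.setdefault(assoc_type, {})
        for member in assoc.findall(XTM + "member"):
            role = member.find(f"{XTM}roleSpec/{XTM}topicRef").get(HREF)
            # Direct topicRef children are the players; the roleSpec's topicRef is nested.
            players = [t.get(HREF) for t in member.findall(XTM + "topicRef")]
            roles.setdefault(role, []).extend(players)
    return result


if __name__ == "__main__":
    print(role_players(DOC))
```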

Subject Descriptor
We’ve looked at subjects in our discussion of topics. A subject indicator is just a
way of indicating subjects. And topics are really the information representation
of subjects. Typically (as we’ve seen), a subject is indicated by defining a resource.
If two given topics in fact use the same resource, then their subjects (identified
or indicated by those resources) are identical. For example, see Listing 7.3.
<topic id="Front Royal">
  <instanceOf><topicRef xlink:href="#city"/></instanceOf>
  <baseName>
    <baseNameString>Front Royal</baseNameString>
    <variant>
      <parameters><topicRef xlink:href="#display"/></parameters>
      <variantName>
        <resourceData>Gateway to Skyline Drive</resourceData>
      </variantName>
    </variant>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
    <resourceRef xlink:href="…"/></occurrence>
</topic>
<topic id="Front Royal, Virginia">
  <instanceOf><topicRef xlink:href="#city"/></instanceOf>
  <baseName>
    <baseNameString>Front Royal, Virginia</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
    <resourceRef xlink:href="…"/></occurrence>
</topic>

Listing 7.3 Topic map subject indicators.

In the listing, we’d like to say that both topics (Front Royal and Front Royal, Virginia) are really about the same subject. This judgment is confirmed, not by the
near identity of the strings “Front Royal” and “Front Royal, Virginia” (whose
string and concept similarity is apparent to a human being), but by the fact
that both topics have the same resource or subject indicator, as represented by
the common occurrence specification:
<occurrence>
  <instanceOf><topicRef xlink:href="#portal"/></instanceOf>
  <resourceRef xlink:href="…"/></occurrence>
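Merging on a shared subject indicator can be sketched in a few lines of Python. The sketch simplifies under stated assumptions. Topics are plain dictionaries with hypothetical `names` and `indicators` fields. Two topics are taken to share a subject when they share any indicator URI, and merging takes the union of their names and indicators. A real topic map engine applies more rules than this. The example.org URIs are hypothetical placeholders.

```python
def merge_by_indicator(topics):
    """Merge topics that share at least one subject indicator URI.

    Each topic is a dict with 'names' and 'indicators' (both sets).
    Returns a list of merged topics whose indicator sets are pairwise disjoint.
    """
    merged = []
    for topic in topics:
        current = {"names": set(topic["names"]), "indicators": set(topic["indicators"])}
        keep = []
        # Absorb every already-merged topic that shares an indicator with this one.
        for other in merged:
            if other["indicators"] & current["indicators"]:
                current["names"] |= other["names"]
                current["indicators"] |= other["indicators"]
            else:
                keep.append(other)
        merged = keep + [current]
    return merged


if __name__ == "__main__":
    topics = [
        {"names": {"Front Royal"}, "indicators": {"http://example.org/front-royal"}},
        {"names": {"Front Royal, Virginia"}, "indicators": {"http://example.org/front-royal"}},
        {"names": {"Winchester"}, "indicators": {"http://example.org/winchester"}},
    ]
    for t in merge_by_indicator(topics):
        print(sorted(t["names"]))
```

The strings "Front Royal" and "Front Royal, Virginia" are never compared. The two topics merge only because they point at the same resource.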

The XTM standard also allows for a published subject indicator or, more simply,
a published subject. A published subject is simply a subject that has general definition and usage and is identified by a specific published reference. In fact, the
XTM standard states that there are default, mandatory published subjects,
made mandatory by the requirements of the XTM standard itself. They include
topic, association, occurrence, class-instance relationship, class, instance, superclass-subclass relationship, superclass, subclass, suitability for sorting, and suitability for display.17

Scope
Scope in Topic Maps is similar to the notion of namespace in other markup languages. Scope specifies the applicability or context of the topic, its occurrences
and resources, and its associations. Subjects have a scope. The names of topics
are unique within a scope. Resources specified within a particular topic have
the same scope as that topic. That is why topics that have the same base name
in the same scope should be merged: they indicate the same subject.
In XTM, scope can also be stated explicitly. A <scope> element, containing one
or more topic references, may appear within a base name, an occurrence, or an
association. When no <scope> element is given, the characteristic is taken to
hold in the unconstrained scope, that is, in all contexts.

17 See Pepper and Moore (2001), Section 2.3.2, “XTM Mandatory Published Subject Indicators,” for the specification of these.
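The name-based merging rule can be sketched as well. This hypothetical Python example assumes each topic is given as a list of (base name, scope) pairs. A scope is modeled as a frozen set of topic references, with the empty set standing for no scope restriction. The topic ids and the `#french` theme are made up. The function only reports which topics the rule would merge. It does not combine groups that share a topic transitively.

```python
from collections import defaultdict

# Assumed model of "no scope restriction": an empty set of scoping topics.
UNCONSTRAINED = frozenset()


def merge_by_name(topics):
    """Group topic ids whose base names coincide within the same scope.

    topics: dict of topic id -> list of (base name, scope) pairs, where a scope is a
    frozenset of topic references. Returns sorted groups of ids to be merged.
    """
    by_key = defaultdict(set)
    for topic_id, names in topics.items():
        for name, scope in names:
            by_key[(name, scope)].add(topic_id)
    return sorted(sorted(ids) for ids in by_key.values() if len(ids) > 1)


if __name__ == "__main__":
    topics = {
        "t1": [("Front Royal", UNCONSTRAINED)],
        "t2": [("Front Royal", UNCONSTRAINED)],
        "t3": [("Front Royal", frozenset({"#french"}))],  # same string, different scope
    }
    print(merge_by_name(topics))  # [['t1', 't2']]
```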

Topic Maps versus RDF
We are now able to compare Topic Maps to RDF.18 We will discuss RDF Schema
(abbreviated RDFS) and its constructs as needed to provide context for comparing RDF/S (which is how we will abbreviate the combination of RDF and
RDFS) and Topic Maps. In general, however, we will postpone a more detailed
description of RDFS to the next chapter.19 We will see that RDF and Topic Maps
are fairly aligned; their respective concepts can be reasonably mapped to each
other. On the one hand, it will seem as though they provide redundant functionality. On the other hand, we will try to demonstrate that they actually complement each other.

The crucial distinction is this: RDF expresses instance-level semantic relations
phrased in terms of a triple. RDFS expresses class-level relations describing
acceptable instance-level relations phrased in terms of a triple, which will be
described in more detail shortly.
All of the following are equivalent notions of a triple:
<subject, verb, object>
<object1, relation1, object2>
<resource, property, property-value>
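The three notations describe the same structure, which can be modeled as a three-field record. The Python sketch below stores a few instance-level statements and answers a simple pattern query, with `None` acting as a wildcard. The URIs under example.org, the `j:` property names, and the second creator are hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # also: object1, resource
    predicate: str  # also: verb, relation1, property
    object: str     # also: object2, property-value


STATEMENTS = [
    Triple("http://example.org/page1", "j:Creator", "John Author Livingston"),
    Triple("http://example.org/page1", "j:Language", "en"),
    Triple("http://example.org/page2", "j:Creator", "Jane Doe"),
]


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.object == o)]


if __name__ == "__main__":
    print([t.object for t in match(STATEMENTS, p="j:Creator")])
```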

RDF Revisited
In Chapter 5, we examined RDF and RDF Schema. We saw that RDF has the
following important concepts: resource, property (and property value), and
statement. Let’s take a brief look at each of these.
RDF was developed primarily to represent meta data resources about Web
objects and to support the meaning-preserving exchange of information about
those objects. A resource is anything being described by an RDF expression.
A resource can be a Web page (an HTML or XML document) in whole or part,
a collection of Web pages, and even objects that do not exist on the Web. This
is similar to the notion of addressability in XTM; some objects exist in the real
world and can only be indicated and not directly accessed. Resources are
named by using a URI and can also include an optional anchor identifier.

18 For an extended comparison, see Freese (2003).
19 For the RDF specification, see Lassila and Swick (1999). For the most recent revision of the RDF/XML Syntax Specification, see Beckett (2001). For the RDFS specification, see Brickley and Guha (2002).
A property is a specific piece of information used to describe a resource. It can
be an aspect, characteristic, attribute, or relation. These can mean different
things to different people, so we won’t try to distinguish these concepts here
but will discuss them in the next chapter. A property of a resource will have a
defined meaning and can have a defined range of acceptable property values
(either simple enumerated types or more complex values), or it will simply
“relate” to other resources and will typically have relationships with other
properties. A property value can thus be another resource (again, identified by
a URI) or a literal (a primitive XML data type or a simple string).
A statement in RDF pulls resources, properties, and property values together.
Statements are typically called triples—though, as we shall see, they can also
be viewed as graphs—because they include a subject (the resource), a predicate/verb (the property), and an object (the property value or another
resource). For example, the following is an RDF statement in XML serialization
syntax:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j="…">
  <rdf:Description rdf:about="…">
    <j:Creator>John Author Livingston</j:Creator>
  </rdf:Description>
</rdf:RDF>

In this example, the entire statement is delimited by <rdf:RDF> and
</rdf:RDF>. The subject here is the resource specified by the about attribute (…shome.org/Home/JohnAL). The predicate is the property Creator. The object is
the literal John Author Livingston. The statement is equivalent to
the English statement:
“The creator of page … is John Author Livingston”
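Because the RDF/XML form is ordinary XML, the triple can be recovered with a general-purpose parser. This minimal Python sketch uses the standard `xml.etree.ElementTree`. As an assumption, it fills the elided URIs with hypothetical example.org addresses. It handles only `rdf:Description` elements whose properties are namespaced and have literal values. Full RDF/XML needs a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same shape as the example above, with hypothetical example.org URIs filled in.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j="http://example.org/schema#">
  <rdf:Description rdf:about="http://example.org/Home/JohnAL">
    <j:Creator>John Author Livingston</j:Creator>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Extract (subject, predicate URI, literal object) triples from rdf:Description elements."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # prop.tag is "{namespace}LocalName"; joining the parts gives the property URI.
            ns, local = prop.tag[1:].split("}")
            out.append((subject, ns + local, (prop.text or "").strip()))
    return out


if __name__ == "__main__":
    print(triples(DOC))
```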

RDF statements can also be depicted as directed graphs. The graph form equivalent to the preceding triple representation is shown in Figure 7.11. Note that the
figure is simplified slightly. For example, namespace information has been
removed. Actually, the property creator is defined in the namespace prefixed by j:.


[Figure: the subject (resource), the page's URI, is linked by a directed predicate (property) arc, Creator, to the object (property value or resource), John Author Livingston.]

Figure 7.11 RDF statement as a graph.

Comparing Topic Maps and RDF
Both Topic Maps and RDF attempt to describe the information content of Web
objects in terms of resources. Both standards exist in order to establish content
meta data (data being about other data) about Web objects, to make those
objects and their content more easily accessible. In Topic Maps, a topic is a Web
object having occurrences (defined as resources—i.e., arbitrary information
about the topic). The subject of the topic itself is represented by an occurrence
of a resource, which can be addressable or not. Recall that an addressable subject is a Web object; a nonaddressable, indicated subject is not a Web object.
Topics are linked by associations, and each topic in an association has a particular role that it plays in that association. But RDF was explicitly developed to
enable the description (and linkage) of meta data to Web objects, whereas
Topic Maps was meant to enable multiple content-based indexing of documents. If that distinction is kept in mind, then Topic Maps and RDF can be seen
to be complementary paradigms. If indexing (or overlays of topic structure)
represent the linking of subjects, then in fact it might be the case that RDF
could represent the set of assertions that attempt to constitute the meaning of
those subjects. In that case, Topic Maps and RDF can equitably coexist, each
borrowing on the other’s strengths and purposes.
In RDF, a resource (subject) has a property (predicate, relation), which has a
property value (object), which in turn can be a resource. This complicates the
picture somewhat with respect to Topic Maps, because Topic Maps
doesn't have the same notion of a resource's property itself being a resource,
which by definition can have its own properties, and so on. This kind of recursive linking makes RDF somewhat more complicated than Topic Maps. Whether Topic
Maps evolves comparable machinery remains an open question. Currently, it is probably easier to represent a complicated topic map in RDF
than to represent a complicated set of RDF assertions in Topic Maps.
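The point that an RDF property is itself a resource can be illustrated with a toy triple store in Python. This is a minimal sketch, not an RDF library: resources are plain strings, "ex:JohnAL" is a hypothetical short name for the page, and rdfs:label and rdfs:comment (real RDF Schema properties) are used here only as illustrative predicates. The predicate j:Creator shows up as the subject of its own triples, which is the recursion described above.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def about(self, resource):
        """Return the sorted (predicate, object) pairs whose subject is `resource`."""
        return sorted((p, o) for s, p, o in self.triples if s == resource)

    def described_properties(self):
        """Return predicates that also appear as subjects (properties treated as resources)."""
        predicates = {p for _, p, _ in self.triples}
        subjects = {s for s, _, _ in self.triples}
        return sorted(predicates & subjects)


store = TripleStore()
# The statement from the text; "ex:JohnAL" is a hypothetical name for the page.
store.add("ex:JohnAL", "j:Creator", "John Author Livingston")
# Because j:Creator is itself a resource, it can be the subject of its own triples.
store.add("j:Creator", "rdfs:label", "Creator")
store.add("j:Creator", "rdfs:comment", "The person who created the resource.")

print(store.described_properties())  # ['j:Creator']
```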



Table 7.5 shows the closest comparable constructs between Topic Maps
and RDF.
The table cannot do real justice to the mappings between the constructs in
these two paradigms. So many qualifications would have to be made about the equivalence between a topic and a resource (Is a
topic really a resource? Is a resource about a subject in the way a topic is? Isn't the mapping of these constructs closer to a mapping between comparable triples?) that the comparison is ultimately more suggestive than exact.
Topic Maps does not yet have a defined Reference Model (RM), whereas RDF
already has RDF Schema, which is another distinction between the two paradigms. RDF Schema is a meta level, or more abstract model, that describes the
object level of RDF. When the RM is defined (possibly with assistance from the
Topic Map Constraint Language, itself under development), the two paradigms
may have more comparable formal power for defining
assertions about topics and associations, or resources and properties, in terms
of the semantics of those assertions. This is a topic (pun slightly intended) that
we address in more detail in the next chapter.
Table 7.5 Comparing Topic Maps to RDF

TOPIC MAPS CONSTRUCTS      RDF CONSTRUCTS
Topic                      Resource
Occurrence [20]            Property; property value
Association [21]           Property
Scope                      Namespace
Subject [22]               Resource
20 Occurrence in the Topic Maps paradigm is, strictly speaking, more like an instance in the
object-oriented or ontology paradigms. With respect to RDF, a TM occurrence, because it is
something relevant to a topic, can be either a resource or a property: an instance in RDF is a
triple specifying a specific object having a specific property/relation to another specific
object, that is, a resource having a property and a value for that property (all of which can
technically be resources).

21 An association is a relation between subjects (i.e., topics). As such, it is perhaps better
understood as a type of property from the RDF perspective.

22 Although a subject is technically not a first-class construct in Topic Maps, we include it in the
comparison because it crucially stands behind the notion of topic, which is the first-class notion.




Summary
A taxonomy is a hierarchic classification (typically in a tree structure) of real-world objects. In information technology, a taxonomy is used to classify the
information correlates of those objects. Because taxonomies are so closely
related to other classification, vocabulary description, and information model
representations, this chapter also described a framework called the Ontology
Spectrum. The Ontology Spectrum distinguishes taxonomies from other representations in this space: thesauri, conceptual models, and logical theories.
Taxonomies are important because they help structure and provide at least a
simple semantics for an information space.
This chapter also introduced Topic Maps and the various TM standards. Any
given topic map is at least a taxonomy in the sense that it tries to say something about how subjects are structured and related, using the notions of topics and associations. One can have multiple topic maps covering the same
collection of Web and non-Web objects, just as one can have multiple indexes
of the same document or documents.
If Topic Maps is a way of describing and structuring an information space in
terms of topics and associations, then, in contrast, RDF is a Web language for
describing and structuring an information space in terms of resources and properties. But after revisiting what RDF is—and to a limited extent, introducing
some aspects of RDF Schema, which we look at more closely in the next
chapter—we saw that Topic Maps and RDF actually have many similarities.
The primary differences between the two paradigms are (1) they were developed by different communities for slightly different classification tasks and
(2) RDF has a schema level (RDF Schema) that enables you to describe a set of
properties and the relationships between these properties and other
resources—in other words, a meta model to the RDF object model—whereas
Topic Maps currently does not have such a level. With the eventual development of the Reference Model and a Topic Map Constraint Language, however,
this latter distinction may be weakened.
As we shall see in the next chapter, RDF and Topic Maps pave the way for
increasing the representational capabilities of an information model over that
of a taxonomy. Both paradigms provide some of the essential building blocks
for constructing the semantically richer notion of ontologies.


CHAPTER 8

Understanding Ontologies

“Ontology is the very first science. Ontology involves discovering categories and fitting objects into them in ways
that make sense . . . . When we make a list of things to do,
or of records and books we most want to buy, or videos
we intend to rent, we are categorizing—we are engaging
in rudimentary ontology. By prioritizing items in a list, we
are assigning relationships among various things. Ontology can be relatively simple, or it can be quite complex.
Ontology becomes more complex, and even daunting,
when we begin to grapple with large domains of objects
with complex relationships among them. For instance,
anyone who has attempted to outline the processes and
components of even a relatively small enterprise has
experienced the brain-cramps that can come with
complex ontology.”
—David Koepsell, Center for Commercial Ontology: Prospectus
Ontologies are about vocabularies and their meanings, with explicit, expressive,
and well-defined semantics, possibly machine-interpretable. So what does
this statement mean? What's a vocabulary? What's a meaning? What is semantics? What does machine-interpretable mean? What is ontology, and what are
ontologies? In this chapter, we define what ontology is and what ontologies are
in clear and simple language, with meaningful examples. You may discover
many ideas that are strange at first, such as semantics, knowledge representation, domain, reference, truth-function, intension, extension, axiom, theorem,
and theory, but you will be given useful, incisive, and simple explanations of what
those ideas are, how they can be used in practice in your information technology projects, and where semantic technologies are heading.
You will also be happy to know that ontologies do have something to do with
taxonomies, discussed in the previous chapter. In fact, ontologies extend taxonomies considerably. Ontologies are to taxonomies as three- (or higher-) dimensional
space is to two-dimensional space. In other words, ontologies
enable you to specify the semantics of your domain, your enterprise, or your
community, or across many communities, in great and, if needed, ever greater detail.
You'll also learn a bit about some languages used to express ontologies, including the W3C's emerging Web Ontology Language (OWL).


Overview of Ontologies
So what is ontology, and what are ontologies? Before looking at some definitions, let’s take a look at an actual ontology.

Ontology Example
Figure 8.1 shows a simple human resources ontology created in the ontology
management tool called Protégé (http://protege.stanford.edu). You'll notice
that there are classes such as Person, Organization, and Employee. In an ontology, these are more properly called concepts, because they are intended to correspond
to the mental concepts that human beings have when they understand a particular body of knowledge, subject matter area, or domain (these phrases are
used interchangeably here), such as the human resources domain.
These concepts and the relationships between them are usually implemented
as classes, relations, properties, attributes, and values (of the properties/
attributes). What Figure 8.1 primarily depicts, then, are concepts of the important
entities of the domain, which are implemented as classes. Examples are Person, Organization, and Employee. Also depicted are the relations between
these entity-focused concepts, such as employee_of, managed_by, and manages. Finally, properties or attributes are depicted. Examples include address,
name, birthdate, and ssn under the Person class. These properties or attributes
have either explicit values or, more often, value ranges. The value range
of the property employee_of, defined for the class Employee,
for example, is the class Organization. By range we mean that any value of
employee_of for an instance of Employee must be an instance of the class
Organization.
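The range constraint just described can be sketched in a few lines of Python. This is an illustrative model, not how Protégé stores ontologies: classes carry a single superclass link, and a value satisfies the range of employee_of if its class is Organization or a subclass of it. Company and Department as subclasses of Organization follow Listing 8.1; the individuals Acme and Alice are hypothetical.

```python
class OntClass:
    """A named class with an optional superclass (a single isa link)."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass

    def is_a(self, other):
        """True if this class is `other` or reaches it by following isa links."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


class Individual:
    """An instance of some OntClass."""

    def __init__(self, name, cls):
        self.name = name
        self.cls = cls


# A fragment of the human resources hierarchy.
Organization = OntClass("Organization")
Company = OntClass("Company", Organization)
Department = OntClass("Department", Organization)
Employee = OntClass("Employee")

# Range declarations: property name -> class that every value must belong to.
RANGES = {"employee_of": Organization}


def in_range(prop, value):
    """Check that `value` is an instance of the declared range of `prop`."""
    return value.cls.is_a(RANGES[prop])


acme = Individual("Acme", Company)      # hypothetical company
alice = Individual("Alice", Employee)   # hypothetical employee
print(in_range("employee_of", acme))    # True: a Company is an Organization
print(in_range("employee_of", alice))   # False: an Employee is not an Organization
```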
Immediately we see that an ontology tries to capture the meaning (what we will
call semantics) of a particular subject area or area of knowledge that corresponds
to what a human being knows about that domain. An ontology also characterizes that meaning in terms of concepts and their relationships. Furthermore, an
ontology is often represented as classes, relations, properties, attributes, and
values.
Figure 8.1 is a graphical fragment of a simple ontology attempting to model the
human resources domain (person, employee, organization), their subclasses
(staff employee, management employee, company, group, division, and
department), their properties, and the relationships among those concepts.


[Figure: class diagram with Person, Employee, Staff_Employee, Management_Employee, President, Vice_President, Director, Manager, Organization, Company, Group, Division, and Department, connected by isa, employee_of, employs, managed_by, manages, and part_of links.]

Figure 8.1 Graphical ontology example: Human resources.

Listing 8.1 is a fragment of the textual view of the same ontology; see the companion Web site (…) for the same ontology represented in RDF/S. In the case of Listing 8.1, the
language used is the Open Knowledge Base Connectivity Language (OKBC, …).
Figure 8.1 and Listing 8.1 together underscore an important point: There is
no logical difference between a graphical and a textual rendition of an ontology (or
of any other model, for that matter).
This fact is important, because a key point of this chapter is that an ontology is
represented in a knowledge representation language (such as a Semantic Web language like RDF/S, DAML+OIL, or OWL, or an ontology language that predates the Semantic Web, such as Ontolingua/KIF/Common Logic, OKBC,
CycL, or Prolog). Furthermore, such ontology languages are in turn typically



based on a particular logic, with the logic itself being a language with a syntax
and a semantics (these latter concepts are explained later in this chapter).
Sometimes, therefore, we call the language in which the ontology is represented a logic-based language. So ultimately it does not matter whether you see
a graphical or a textual rendition of an ontology; the two are logically equivalent.
What matters is the expressive power of the underlying language used to
represent the ontology.
; Lines beginning with ;+ carry Protégé-specific facets; CLIPS reads them as comments.
(defclass Organization
    (is-a USER)
    (role concrete)
    (single-slot managed_by
        (type INSTANCE)
;+      (allowed-classes Management_Employee)
;+      (cardinality 1 1)
        (create-accessor read-write))
    (single-slot part_of
        (type SYMBOL)
;+      (allowed-parents Organization)
;+      (cardinality 0 1)
        (create-accessor read-write))
    (multislot employs
        (type SYMBOL)
;+      (allowed-parents Employee)
        ; ?VARIABLE means no upper bound on the number of values
        (cardinality 1 ?VARIABLE)
        (create-accessor read-write)))

(defclass Department
    (is-a Organization)
    (role concrete)
    (single-slot part_of
        (type SYMBOL)
;+      (allowed-parents Division)
;+      (cardinality 0 1)
        (create-accessor read-write)))

(defclass Company
    (is-a Organization)
    (role concrete)
    (single-slot part_of
        (type SYMBOL)
;+      (allowed-parents Company)
;+      (cardinality 0 1)
        (create-accessor read-write)))

(defclass Person
    (is-a USER)
    (role concrete)
    (single-slot birthdate
        (type STRING)
;+      (cardinality 1 1)
        (create-accessor read-write))
    (single-slot name_
        (type STRING)
;+      (cardinality 1 1)
        (create-accessor read-write))
    (single-slot address
        (type STRING)
;+      (cardinality 1 1)
        (create-accessor read-write))
    (single-slot ssn
        (type STRING)
;+      (cardinality 1 1)
        (create-accessor read-write)))

... some classes omitted for brevity
(see companion Web site for complete listing) ...

Listing 8.1 Textual ontology example: Human resources.
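To show what the cardinality facets in Listing 8.1 mean operationally, here is a small Python sketch that checks a frame's slot values against [min, max] bounds transcribed from the managed_by, part_of, and employs slots. It is an illustration only, not the checking that Protégé or CLIPS actually performs; the frame contents (Pat, Lee, Alice, Bob) are hypothetical, and ?VARIABLE is modeled as math.inf.

```python
import math
from dataclasses import dataclass


@dataclass
class SlotSpec:
    name: str
    min_card: int
    max_card: float  # math.inf plays the role of ?VARIABLE (no upper bound)


# Facets transcribed from the managed_by, part_of, and employs slots in Listing 8.1.
ORG_SLOTS = [
    SlotSpec("managed_by", 1, 1),
    SlotSpec("part_of", 0, 1),
    SlotSpec("employs", 1, math.inf),
]


def cardinality_violations(slots, frame):
    """Return names of slots whose number of values in `frame` is outside [min, max]."""
    bad = []
    for slot in slots:
        count = len(frame.get(slot.name, []))
        if not slot.min_card <= count <= slot.max_card:
            bad.append(slot.name)
    return bad


# Hypothetical organization frames: slot name -> list of values.
ok_frame = {"managed_by": ["Pat"], "employs": ["Alice", "Bob"]}
bad_frame = {"managed_by": ["Pat", "Lee"], "part_of": []}
print(cardinality_violations(ORG_SLOTS, ok_frame))   # []
print(cardinality_violations(ORG_SLOTS, bad_frame))  # ['managed_by', 'employs']
```

Treating a missing slot as zero values lets the same check catch both "too many" (two managers for a single-valued managed_by) and "too few" (no employees when employs requires at least one).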

A corollary issue is that high-end ontology languages are backed by a rigorous
formal logic, which makes the ontology machine-interpretable. By
machine-interpretable we mean that the computer and its software can
interpret the semantics of the model directly, without human involvement.
Software supported by ontologies moves up to the human knowledge/conceptual level; humans do not have to move down to the machine level. Interaction
with computers takes place at our level, not theirs. This is an extremely important point, and it underscores the value of ontologies. In the following sections,
we elaborate on these issues so that you understand the importance of ontologies
in the coming Semantic Web.
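As a very small illustration of software drawing a conclusion from a model on its own, this Python sketch computes the transitive closure of isa links. It is a deliberately tiny stand-in for what a logic-based reasoner does, not how any particular ontology language implements inference. The class names come from Figure 8.1, but the specific links (including Employee isa Person) are assumptions made for the example, and the loop assumes the hierarchy has no cycles.

```python
# Direct isa links: subclass -> superclass. Class names follow Figure 8.1;
# the specific links here are assumed for illustration. Assumes no cycles.
ISA = {
    "Manager": "Management_Employee",
    "Management_Employee": "Employee",
    "Staff_Employee": "Employee",
    "Employee": "Person",
}


def superclasses(cls):
    """Follow isa links upward and return every inferred superclass, nearest first."""
    found = []
    while cls in ISA:
        cls = ISA[cls]
        found.append(cls)
    return found


def is_a(cls, ancestor):
    """Infer whether `cls` is a kind of `ancestor` even if no direct link says so."""
    return cls == ancestor or ancestor in superclasses(cls)


print(superclasses("Manager"))    # ['Management_Employee', 'Employee', 'Person']
print(is_a("Manager", "Person"))  # True, although no single link states it
```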

Ontology Definitions
The description, the picture, and the code are helpful, but just what is an ontology? An ontology defines the common words and concepts (the meaning) used
to describe and represent an area of knowledge. But what does that definition
mean? Let's delve into just what ontology is and what ontologies are. If you
look up ontology in the dictionary, you'll find the following definition (from
Merriam-Webster OnLine, …):


1. A branch of metaphysics concerned with the nature and relations of being
2. A particular theory about the nature of being or the kinds of existents
This definition indicates that the term originates in philosophy, specifically in the
part of metaphysics that systematically studies the principles underlying a
particular subject, most often the nature of being and the nature of experience.
Often these days, a distinction is made between "big O" Ontology and
"little o" ontology. "Big O" Ontology is the philosophical discipline. "Little o"
ontology is the information technology engineering discipline that has
emerged over the past eight or so years. Much as ordinary taxonomies differ from
taxonomies as used in information technology, there is a comparable distinction
for ontologies. Information technology offers the following two definitions.
The first is a simple paraphrase in everyday language of the
more technical second definition, but to understand the second, it
helps to build on an elucidation of the first:
■ An ontology defines the common words and concepts (the meaning) used
to describe and represent an area of knowledge.

■ An ontology is an engineering product consisting of "a specific vocabulary used to describe [a part of] reality, plus a set of explicit assumptions
regarding the intended meaning of that vocabulary"1; in other words, the
specification of a conceptualization.2

Let's try to unpack these definitions. The first definition has two parts:

■ Describing and representing an area of knowledge

■ Defining the common words and concepts of the description

Recall from the previous chapter what we learned about a domain: A domain is
a subject matter area or area of knowledge. Some examples of areas of knowledge or domains are medicine, automobile repair, financial planning, machine
tooling, business management, physics, textiles, and geopolitics. Describing an
area of knowledge is the act of expressing, in either written or spoken words,
the important points about a specific area of knowledge. For example, in
describing automobile repair, we would probably talk about the following:
■ The kinds of cars there are (sedans, station wagons, sports cars, luxury
cars, compacts, domestic and foreign cars)

■ The types of engines (corresponding perhaps to the types of fuel used:
gasoline, diesel, electric-powered, hybrid)

■ The particular engines (for example, a 1995–96 V-6 Ford Taurus 244/4.0
Aerostar Automatic with Block Casting # 95TM-AB, Head Casting 95TM)

■ The manufacturers (Ford, General Motors, Chevrolet, Nissan, Honda,
Volvo, Volkswagen, Saab, Hyundai, and so on)

■ The things that constitute cars (engines, brake systems, cooling systems,
electric systems, suspension, body, and so on) and their properties (an
engine has 4, 6, 8, or 12 cylinders; brake pads have different compositions
such as semimetallic or nonferrous material)

1 Guarino (1998, p. 4).

2 Gruber (1993).

We’ll see in the next section a more complicated, technical definition of description, one that brings into our discussion the semantic notion of intension.
An important part of automobile repair is elaborating how to repair various
cars and their subsystems: diagnosis, the tools to use in diagnosis and repair, the parts
to use in the repair process, costing and estimating repairs, managing
an automobile repair facility, certifying excellence in automobile repair,
and so on. When describing an area of knowledge (a domain), we describe
the important things in the domain, their properties, and the relationships among
the things. If we were to elaborate our description (because, say, we were writing a paper or a book on automobile repair), we might even include rules about
the domain, such as the following diagnosis rule, which specifies how to determine what is wrong with an automobile system in order to repair it: If the car
won't start and it doesn't turn over, check and clean the battery connections.
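A rule like this can be represented as data rather than buried in program logic. The following Python sketch is one hypothetical encoding, not a real expert-system shell: each rule pairs a condition over observations with a recommended action. The observation keys starts and turns_over are assumptions made for the example, and a missing observation is treated as normal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Observations = Dict[str, bool]


@dataclass
class Rule:
    name: str
    condition: Callable[[Observations], bool]
    action: str


# The diagnosis rule from the text; the observation keys are hypothetical.
# A missing observation defaults to True (assumed normal).
RULES: List[Rule] = [
    Rule(
        name="no_start_no_turn_over",
        condition=lambda obs: not obs.get("starts", True) and not obs.get("turns_over", True),
        action="Check and clean the battery connections.",
    ),
]


def diagnose(obs: Observations) -> List[str]:
    """Return the recommended action of every rule whose condition holds."""
    return [rule.action for rule in RULES if rule.condition(obs)]


print(diagnose({"starts": False, "turns_over": False}))
# ['Check and clean the battery connections.']
print(diagnose({"starts": False, "turns_over": True}))
# []
```

Keeping rules as data means new diagnosis rules can be added to RULES without changing diagnose itself.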
Therefore, a description is or can be an ontology. As we saw in Chapter 7, it
includes the same kinds of concepts:

■ Classes (general things) in the many domains of interest

■ Instances (particular things)

■ The relationships among those things

■ The properties (and property values) of those things

■ The functions of and processes involving those things

■ Constraints on and rules involving those things
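Several of these kinds of concepts can be sketched for the automobile repair domain in Python. This is a toy model under stated assumptions, not a recommended ontology design: the class names come from the description above, the instance names car_1 and engine_1 and the has_part relation are hypothetical, and the only constraint encoded is the cylinder count mentioned in the text.

```python
from dataclasses import dataclass, field

# Classes (general things), each mapped to its parent class (None = top level).
CLASSES = {"Car": None, "Sedan": "Car", "StationWagon": "Car",
           "Engine": None, "DieselEngine": "Engine"}


@dataclass
class Instance:
    """A particular thing: its name, its class, and its property values."""
    name: str
    cls: str
    properties: dict = field(default_factory=dict)


# Instances (particular things); the names are hypothetical.
car = Instance("car_1", "Sedan")
engine = Instance("engine_1", "DieselEngine", {"cylinders": 6})

# Relationships among things, stored as (subject, relation, object) triples.
RELATIONS = [("car_1", "has_part", "engine_1")]

# Constraint from the text: an engine has 4, 6, 8, or 12 cylinders.
ALLOWED_CYLINDERS = {4, 6, 8, 12}


def parts_of(name):
    """Return the names of things related to `name` by has_part."""
    return [o for s, r, o in RELATIONS if s == name and r == "has_part"]


def satisfies_cylinder_constraint(inst):
    """True unless the instance declares a cylinder count the constraint forbids."""
    count = inst.properties.get("cylinders")
    return count is None or count in ALLOWED_CYLINDERS


print(parts_of("car_1"))                      # ['engine_1']
print(satisfies_cylinder_constraint(engine))  # True
```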

In addition to describing an area of knowledge, we also need to represent that
description. What does representation mean? Representing means that we
encode the description in a way that enables someone to use it. A
description consists of words and phrases in a natural language (such as
English or Chinese), that is, vocabulary/terminology and sentences that combine terms to express relationships among them (we'll treat vocabulary and terminology as equivalent here and use term for the individual



word). So representing means that we represent the description using terms
and sentences. We define the terms (or we already have the terms defined in
our mental lexicon), and then we combine those defined terms in ways that
elaborate more of the meaning about the area of knowledge.
In information technology, however, we use a slightly more complicated notion
of representing. We represent in order to use the description in information
technology; in other words, we create a model that software will be able to utilize. We represent the classes, instances, relationships, properties, and rules for
the area of knowledge. We use the terms of the natural-language description as
labels for the underlying concepts, that is, the meaning of the area of knowledge, consisting of classes, properties, and relationships. Typically, we represent or codify the ontology in a logic-based knowledge representation language
(which we discuss a bit later) rather than a natural language, because we want
to represent our description as clearly, precisely, and unambiguously as possible, and natural language can be very ambiguous. We also want to make its
meaning available for information technology use. Representation thus has to
do with defining the terms (the vocabulary that acts as labels for the concepts),
and that means also defining the concepts and their relationships that are
behind the labels and that constitute the meaning of the model of the knowledge area we are interested in.
The first definition dealt with describing and representing an area of knowledge. What about the second definition of ontology? What does the specification of a conceptualization mean? Let’s try to clarify that definition by referring
to the different parts of the first definition.3
A conceptualization is a way of thinking about part of the world. When we conceive of the world or a part of the world, we have in mind, literally, a mental
model of that part of the world. For example, when we conceive of automobile
repair, we probably have a set of mental images of automobiles, their subsystems and parts, an automobile garage or repair shop, mechanics in uniforms,
and so on. If we were to describe these images, we would probably do so
according to the first definition—in terms of the things that are important to
the notion of automobile repair, and their properties and relationships. Given
a particular way of thinking about a part of the world (a subject area or
domain), in other words, a conceptualization (we conceive it to be this way or
that way, and not some other way), when we seek to describe it to ourselves
or another person in a fairly detailed and precise way, we say we are specifying it.

Table 8.1 displays some of the key terminology you’ll learn about in this chapter, along with shorthand definitions. You may want to refer back to this table
periodically as you encounter one of the terms in the text.
3 See Guarino and Giaretta (1995) for elaboration of various definitions of ontology.

