THE SEMANTIC WEB
CRAFTING INFRASTRUCTURE FOR AGENCY
Bo Leuf
Technology Analyst, Sweden
Copyright © 2006 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries):
Visit our Home Page on www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted
in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued
by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the
permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions
Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ,
England, or emailed to , or faxed to (+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the subject matter
covered. It is sold on the understanding that the Publisher is not engaged in rendering professional
services. If professional advice or other expert assistance is required, the services of a competent
professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop # 02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Leuf, Bo, Technology Analyst, Sweden.
The Semantic Web: crafting infrastructure for agency/Bo Leuf.
p. cm.
Includes bibliographical references and index.
ISBN 0-470-01522-5
1. Semantic Web. I. Title.
TK5105.88815.L48 2005
025.04–dc22 2005024855
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN-13 978-0-470-01522-3 (HB)
ISBN-10 0-470-01522-5 (HB)
Typeset in 10/12pt Times Roman by Thomson Press (India) Limited, New Delhi
Printed and bound in Great Britain by Antony Rowe, Chippenham, Wilts
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
I dedicate this book to the many visionaries, architects, and programmers,
who all made the Internet an interesting place to ‘live’ and work in.
And especially to Therese.

Contents
Foreword
Preface

Part I Content Concepts
1 Enhancing the Web
Chapter 1 at a Glance
There and Back Again
Resource Identifiers
Extending Web Functionality
From Flat Hyperlink Model
To Richer Informational Structures
The Collaboration Aspect
Extending the Content Model
Mapping the Infosphere
Well-Defined Semantic Models
2 Defining the Semantic Web
Chapter 2 at a Glance
From Model to Reality
The Semantic Web Concept
Representational Models
The Road Map
Identity
Markup
Relationships
Reasoning
Agency on the Web
Semantic P2P
The Visual Tour
The Map
The Architectural Goals
The Implementation Levels
3 Web Information Management
Chapter 3 at a Glance
The Personal View
Creating and Using Content
Authoring
Publishing
Exporting Databases
Distribution
Searching and Sifting the Data
Semantic Web Services
Security and Trust Issues
XML Security
Trust
A Commercial or Free Web
4 Semantic Web Collaboration and Agency
Chapter 4 at a Glance
Back to Basics
The Return of P2P
WebDAV
Peer Aspects
Peering the Data
Peering the Services
Edge Computing
Automation and Agency
Kinds of Agents
Multi-Agent Systems

Part II Current Technology Overview
5 Languages and Protocols
Chapter 5 at a Glance
Markup
The Bare Bones (HTML)
Disentangling the Logic (XHTML and XML)
Emerging Special-Purpose Markup
Higher-Level Protocols
A Messaging Framework (SOAP versus REST)
Expressing Assertions (RDF)
The Graphical Approach
Exchanging Metadata (XMI)
Applying Semantic Rules
RDF Query Languages
Multi-Agent Protocols
6 Ontologies and the Semantic Web
Chapter 6 at a Glance
Ontology Defined
Ontologies for the Web
Ontology Types
Building Ontologies
Web Ontology Language, OWL
Other Web Ontology Efforts
Knowledge Representation
Conceptual Graphs
Promoting Topic Maps
Description Logics
7 Organizations and Projects
Chapter 7 at a Glance
Major Players
W3C
Semantic Web Communities
Dublin Core Metadata Initiative
DARPA
EU-Chartered Initiatives
AIC at SRI
Bioscience Communities
Biopathways Consortium
8 Application and Tools
Chapter 8 at a Glance
Web Annotation
The Annotea Project
Evaluating Web Annotation
Infrastructure Development
Develop and Deploy an RDF Infrastructure
Building Ontologies
Information Management
Haystack
Digital Libraries
Applying RDF Query Solutions
Project Harmony
DSpace
Simile
Web Syndication
RSS and Other Content Aggregators
Metadata Tools
Browsing and Authoring Tools
Metadata Gathering Tools
9 Examples of Deployed Systems
Chapter 9 at a Glance
Application Examples
Retsina Semantic Web Calendar Agent
MusicBrainz and Freedb
Semantic Portals and Syndication
SUMO Ontology
Open Directory Project
Ontolingua
Industry Adoption
Adobe XMP
Sun Global Knowledge Engineering (GKE)
Implemented Web Agents
Agent Environments
Intelligent Agent Platforms

Part III Future Potential
10 The Next Steps
Chapter 10 at a Glance
What Would It Be Like?
Success on the Web
Medical Monitoring
Smart Maintenance
And So It Begins
Meta-Critique
Where Are We Now?
Intellectual Property Issues
The Road Goes Ever On
Reusable Ontologies
Device Independence
11 Extending the Concept
Chapter 11 at a Glance
Externalizing from Virtual to Physical
The Personal Touch
Pervasive Connectivity
User Interaction
Engineering Automation Adaptability
Whither the Web?
Evolving Human Knowledge
Towards an Intelligent Web
The Global Brain
Conclusions
Standards and Compliance
The Case for the Semantic Web
We can Choose the Benefits

Part IV Appendix Material
Appendix A Technical Terms and References
At a Glance
Glossary
RDF
RDF Schema Example Listing
RDF How-to
Appendix B Semantic Web Resources
At a Glance
Further Reading
Book Resources
Other Publications
Web Resources
Miscellaneous
Appendix C Lists
Index

Foreword
As an individual, as a technologist, as a business person, and as a civic participant, you
should be concerned with how the Semantic Web is going to change the way our knowledge-
based society functions.

An encouraging sign is that one of the first large-scale community-based data pools, the
Wikipedia, has grown to well over half a million articles. It is an ominous indicator that one
of the first large-scale governmental metadata assignment projects is going on in China for
the purpose of restricting personal access to political information.
The Semantic Web is not a single technology; rather it is a cluster of technologies,
techniques, protocols, and processes. As computing becomes more powerful and
more ubiquitous, the amount of control that information technology will hold over people’s
lives will become more pervasive, and the individual’s personal control ever less.
At the same time, the employment of anonymous intelligent agents may buy individuals a
new measure of privacy. The Semantic Web is the arena in which these struggles will be
played out.
The World Wide Web profoundly transformed the way people gain access to information;
the Semantic Web will equally profoundly change the way machines access information.
This change will transform yet again our own roles as creators, consumers and manipulators
of knowledge.
—Mitchel Ahren, Director of Marketing Operations,
AdTools | Digital Marketing Concepts, Inc.

Preface
This is a book that could fall into several different reader categories – popular, academic,
technical – with perhaps delusional ambitions of being both an overview and a detailed study
of emerging technology. The idea of writing The Semantic Web grew out of my previous two
books, The Wiki Way (Addison-Wesley, 2001) and Peer to Peer (Addison-Wesley, 2002). It
seemed a natural progression, going from open co-authoring on the Web to open peer-
sharing and communication, and then onto the next version of the Web, involving peer-
collaboration between both software agents and human users.
Started in 2002, this book had a longer and far more difficult development process than the
previous two. Quite honestly, there were moments when I despaired of its completion and
publication. The delay of publication until 2005, however, did bring some advantages,
mainly in being able to incorporate much revised material that otherwise would have been
waiting for a second edition.
The broader ‘Semantic Web’ as a subject still remains more of a grand vision than an
established reality. Technology developments in the field are both rapid and unpredictable,
subject to many whims of fate and fickle budget allocations.
The field is also ‘messy’ with many diverging views on what it encompasses. I often felt
like the intrepid explorer of previous centuries, swinging a machete to carve a path through
thick unruly undergrowth, pencil-marking a rough map of where I thought I was in relation
to distant shorelines and major landmarks.
My overriding concern when tackling the subject of bleeding-edge technologies can be
summed up as providing answers to two fundamental reader questions:
• What does this mean?
• Why should I care?
To anticipate the detailed exposition of the latter answer, my general answer is that you –
we – should care, because these technologies not only can, but most assuredly will, affect us
more than we can possibly imagine today.
Purpose and Target
The threefold purpose of the book is rather simple, perhaps even simplistic:
• Introduce an arcane subject comprehensively to the uninitiated.
• Provide a solid treatment of the current ‘state of the art’ for the technologies involved.
• Outline the overall ‘vision of the future’ for the new Web.
My guiding ambition was to provide a single volume replete with historical background,
state of the art, and vision. Intended to be both informative and entertaining, the approach
melds practical information and hints with in-depth analysis.
The mix includes conceptual overviews, philosophical reflection, and contextual material
from professionals in the field – in short, all things interesting. It includes the broad strokes
for introducing casual readers to ontologies and automated processing of semantics (not a
unique approach, to be sure), but also covers a sampling of the real-world implementations
and works-in-progress.
However, the subject matter did not easily lend itself to simple outlines or linear
progressions, so I fear the result may be perceived as somewhat rambling. Well, that’s
part of the journey at this stage. Yet with the help of astute technical reviewers and the
extended period of preparation, I was able to refine the map and sharpen the focus
considerably. I could thus better triangulate the book’s position on the conceptual maps of
both the experts and the interested professionals.
The technologies described in this book will define the next generation Internet and Web.
They may conceivably define much of your future life and lifestyle as well, just as the
present day Web has become central to the daily activities of many – the author included.
Therefore, it seems fitting also to contemplate the broader implications of the technologies,
both for personal convenience and as instigator or mediator of social change.
These technologies can affect us not only by the decisions to implement and deploy them,
but sometimes even more in the event of a decision not to use them. Either way, the decision
taken must be an informed one, ideally anchored in a broad public awareness and under-
standing of what the decision is about and with some insight into what the consequences
might be. Even a formal go-ahead decision is not sufficient in itself. The end result is shaped
significantly by general social acceptance and expectations, and it may even be rejected by
the intended users.
Some readers might find the outlined prospects more alarming than enticing – that is
perhaps as it should be. As with many new technologies, the end result depends significantly
on social and political decisions that for perspective require not just a clear vision but
perhaps also a glimpse of some dangers lurking along the way.
We can distinguish several categories of presumptive readers:
• The casual reader looking for an introduction and overview, who can glean enough
information to set the major technologies in their proper relationships and thus catch a
reflection of the vision.
• The ‘senior management’ types looking for buzzwords and ‘the next big thing’ explained
in sufficient detail to grasp, yet not in such unrelenting technical depth as to daze.
• The industry professional, such as a manager or the person responsible for technology,
who needs to get up to speed on what is happening in the field. Typically, the professional
wants both general technology overviews and implementation guides in order to make
informed decisions.
• The student in academic settings who studies the design and implementation of the core
technologies and the related tools.
The overall style and structure of the book is held mainly at a moderate level of technical
difficulty. On the other hand, core chapters are technical and detailed enough to be used as a
course textbook. All techno-jargon terms are explained early on.
In The Semantic Web, therefore, you are invited to a guided journey through the often
arcane realms of Web technology. The narrative starts with the big picture, a landscape as if
seen from a soaring plane. We then circle areas of specialist study, thermal-generating ‘hot-
spots’, subjects that until recently were known mainly from articles in technical journals with
limited coverage of a much broader field, or from the Web sites of the institutions involved in
the research and development. Only in the past few years has the subject received wider
public notice with the publication of several overview books, as noted in Appendix B.
Book Structure
The book is organized into three fairly independent parts, each approaching the subject from
a different direction. There is some overlap, but you should find that each part is
complementary. Therefore, linear cover-to-cover reading might not be the optimal approach.
For some readers, the visions and critiques in Part III might be a better starting point than the
abstract issues of Part I, or the technicalities in Part II.
Part I sets the conceptual foundations for the later discussions. These first four chapters
present mostly high-level overviews intended to be appropriate for almost anyone.
• The first chapter starts with the basics and defines what Web technology is all about. It
also introduces the issues that led to the formulation of the Semantic Web initiative.
• Chapter 2 introduces the architectural models relevant to a discussion of Web technologies
and defines important terminology.
• Chapter 3 discusses general issues around creating and managing the content and
metadata structures that form the underpinnings of the Semantic Web.
• Finally, Chapter 4 looks at online collaboration processes, which constitute an important
motivating application area for Semantic Web activities.
Part II focuses on the technologies behind the Semantic Web initiative. These core
chapters also explore representative implementations for chosen implementation areas,
providing an in-depth mix of both well-known and lesser-known solutions that illustrate
different ways of achieving Semantic Web functionality. The material is detailed enough for
Computer Studies courses and as a guide for more technical users actually wanting to
implement and augment parts of the Semantic Web.
• Chapter 5 provides layered analysis of the core protocol technologies that define Web
functionality. The main focus is on the structures and metadata assertions used to describe
and manage published data.
• Chapter 6 is an in-depth study of ontologies, the special structures used to represent term
definitions and meaningful relationships.
• Chapter 7 introduces the main organizations active in defining specifications and protocols
that are central to the Semantic Web.
• Chapter 8 examines application areas of the Semantic Web where prototype tools are
already implemented and available.
• Chapter 9 expands on the previous chapters by examining application areas where some
aspect of the technology is deployed and usable today.
Part III elevates the discussion into the misty realms of analysis and speculation.
• Chapter 10 provides an ‘insights’ section that considers the immediate future potential for
Semantic Web solutions, and the implications for users.
• Chapter 11 explores some directions in which future Web functionality might develop in
the longer term, such as ubiquitous connectivity and the grander theme of managing
human knowledge.
Finally, the appendices supplement the main body of the book with a terminological
glossary, references, and resources – providing additional detail that while valuable did not
easily fit into the flow of the main text.
Navigation
This book is undeniably filled with a plethora of facts and explanations, and it is written
more in the style of a narrative than of reference-volume itemization. Despite the
narrative ambitions, texts like this require multiple entry points and quick ways to locate
specific details.
As a complement to the detailed table of contents and index, each chapter’s ‘at a glance’
page provides a quick overview of the main topics covered in that chapter. Technical terms in
bold are often provided with short explanations in the Appendix A glossary.
Scattered throughout the text you will find the occasional numbered ‘Bit’ where some
special insight or factoid is singled out and highlighted. Calling the element a ‘Bit’ seemed to
convey about the right level of unpretentious emphasis – they are often just my two bits’
worth of comment. Bits serve the additional purpose of providing visual content cues for the
reader and are therefore given their own List of Bits in Appendix C.
When referencing Web resources, I use the convention of omitting the ‘http://’ prefix
because modern Web browsers accept addresses typed in without it. Although almost
ubiquitous, the ‘www.’ prefix is not always required, and in cases where a cited Web address
properly lacks it, I tried to catch instances where it might have been added incorrectly in the
copyedit process.
The Author
As an independent consultant in Sweden for 30 years, I have been responsible for software
development, localization projects, and design-team training. A special interest was adapting
an immersive teaching methodology developed for human languages to technical-training
fields.
Extensive experience in technical communication and teaching is coupled with a deep
understanding of cross-platform software product design, user interfaces, and usability
analysis. All provide a unique perspective to writing technical books and articles.
Monthly featured contributions to a major Swedish computer magazine make up the bulk
of my technical analyst writing at present. Coverage of, and occasional speaking engage-
ments at, select technology conferences have allowed me to meet important developers.
I also maintain several professional and recreational Internet Web sites, providing
commercial Web hosting and Wiki services for others.
Collaborative Efforts
A great many people helped make this book possible by contributing their enthusiasm, time,
and effort – all in the spirit of the collaborative peer community that both the Web and book
authoring encourage. Knowledgeable professionals and colleagues offered valuable time in
several rounds of technical review to help make this book a better one and I express my
profound gratitude for their efforts. I hope they enjoy the published version.
My special thanks go to the many reviewers who participated in the development work.
The Web’s own creator, and now director of the World Wide Web Consortium, Sir Tim
Berners-Lee, also honoured me with personal feedback.
Thanks are also due to the editors and production staff at John Wiley & Sons, Ltd.
Personal thanks go to supportive family members for enduring long months of seemingly
endless research and typing, reading, and editing – and for suffering the general mental
absentness of the author grappling with obscure issues.
Errata and Omissions
Any published book is neither ‘finished’ nor perfect, just hopefully the best that could be
done within the constraints at hand. The hardest mistakes to catch are the things we think we
know. Some unquestioned truths can simply be wrong, can have changed since we learned
them, or may have more complex answers than we at first realized.
Swedish has the perceptive word hemmablind, literally blind-at-home, which means that
we tend not to see the creeping state of disarray in our immediate surroundings – think of
how unnoticed dust ‘bunnies’ can collect in corners and how papers can stack up on all
horizontal surfaces. The concept is equally applicable to not always noticing changes to our
particular fields of knowledge until someone points them out.
Omissions are generally due to the fact that an author must draw the line somewhere in
terms of scope and detail. This problem gets worse in ambitious works such as this one that
attempt to cover a large topic. I have tried in the text to indicate where this line is drawn
and why.
Alternatively, I might sometimes make overly simplified statements that someone,
somewhere, will be able to point to and say ‘Not so!’. My excuse is that not everything
can be fully verified, and sometimes the simple answer is good enough for the focus at
hand.
A book is also a snapshot. During the course of writing, things changed! Constantly!
Rapidly! In the interval between final submission and the printed book, not to mention by the
time you read this, they have likely changed even more. Not only does the existing software
continue to evolve, or sometimes disappear altogether, but new implementations can
suddenly appear from nowhere and change the entire landscape overnight.
Throughout the development process, therefore, book material was under constant update
and revision. A big headache involved online resource links; ‘link-rot’ is deplorable but
inevitable. Web sites change, move, or disappear. Some resources mentioned in the text
might therefore not be found and others not mentioned might be perceived as better.
The bottom line in any computer-related field is that any attempt to make a definitive
statement about such a rapidly moving target is doomed to failure. But we have to try.
Book Support and Contacting the Author
The Internet has made up-to-date reader support a far easier task than it used to be, and the
possibilities continue to amaze and stimulate me.
Reader feedback is always appreciated. Your comments and factual corrections will be
used to improve future editions of the book, and to update the support Web site. You may
e-mail me at , but to get past the junk filters, please use a meaningful subject
line and clearly reference the book. You may also write to me c/o the publisher.
Authors tend to get a lot of correspondence in connection with a published book. Please be
patient if you write and do not get an immediate response – it might not be possible. I do try
to at least acknowledge received reader mail within a reasonable time.
However, I suggest first visiting the collaborative wiki farm (follow links from www.leuf.com/TheSemanticWeb),
where you can meet an entire community of readers, find updates and errata, and
participate in discussions about the book. The main attraction of book-related Web resources is the
contacts you can form with other readers. Collectively, the readers of such a site always have more
answers and wisdom than any number of individual authors.
Thank you for joining me in this journey.
Bo Leuf
Technology Analyst, Sweden
(Gothenburg, Sweden, 2003–2005)

Part I
Content Concepts

1
Enhancing the Web
Although most of this book can be seen as an attempt to navigate through a landscape of
potential and opportunity for a future World Wide Web, it is prudent, as in any navigational
exercise, to start by determining one’s present location. To this end, the first chapter is a
descriptive walkabout in the current technology of the Web – its concepts and protocols. It
sets out first principles relevant to the following exploration, and it explains the terms
encountered.
In addition, a brief Web history is provided, embedded in the technical descriptions. Much
more than we think, current and future technology is designed and implemented in ways that
critically depend on the technology that came before. A successor technology is usually a
reaction, a complement, or an extension to previous technology – rarely a simple plug-in
replacement out of the blue. New technologies invariably carry a legacy, sometimes
inheriting features and conceptual aspects that are less appropriate in the new setting.
Technically savvy readers may recognize much material in this chapter, but I suspect many
will still learn some surprising things about how the Web works. It is a measure of the success
of Web technology that the average user does not need to know much of anything technical to
surf the Web. Most of the technical detail is well-hidden behind the graphical user interfaces –
it is essentially click-and-go. It is also a measure of success that fundamental enhancements
(that is, to the basic Web protocol, not features that rely on proprietary plug-in components)
have already been widely deployed in ways that are essentially transparent to the user, at
least if the client software is regularly updated.
Chapter 1 at a Glance
Chapter 1 is an overview chapter designed to give a background in broad strokes on Web
technology in general, and on the main issues that led to the formulation of the Semantic
Web. A clear explanation of relevant terms and concepts prepares the reader for the more
technical material in the rest of the book.
There and Back Again sets the theme by suggesting that the chapter is a walkabout in the
technology fields relevant to the later discussions that chapter by chapter revisit the main
concepts, but in far greater detail.