Web Data Management
Sourav S. Bhowmick
Wee Keong Ng
Sanjay K. Madria
Web Data Management
A Warehouse Approach
With 106 Illustrations
Sourav S. Bhowmick
and Wee Keong Ng
School of Computer Engineering
Nanyang Technological University
50 Nanyang Avenue
Blk N4 2A-32
Nanyang, 639798
Singapore
Sanjay K. Madria
University of Missouri
Department of Computer Science
1870 Miner Circle Drive
310 Computer Science Building
Rolla, MO 65409
USA
Library of Congress Cataloging-in-Publication Data
Bhowmick, Sourav S.
Web data management : a warehouse approach / Sourav S. Bhowmick, Sanjay K.
Madria, Wee Keong Ng.
p. cm. — (Springer professional computing)
Includes bibliographical references and index.
ISBN 0-387-00175-1 (alk. paper)
1. Web databases. 2. Database management. 3. Data warehousing. I. Madria, Sanjay
Kumar. II. Ng, Wee Keong. III. Title. IV. Series.
QA76.9.W43B46 2003
005.75′8—dc21
2003050523
ISBN 0-387-00175-1
Printed on acid-free paper.
© 2004 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of
the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief
excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified
as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed in the United States of America.
9 8 7 6 5 4 3 2 1
SPIN 10901038
Typesetting: Pages created by the author using a Springer TeX macro package.
www.springer-ny.com
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Index
table node pool, 374
tag, 52
tag attributes, 203
tag object, 79
tagless, 147
tags, 72, 147
target node type identifier, 129
Technion (Israel Institute of Technology), 21
Thunderstone, 20
TopBlend, 369
topic-specific PIW crawlers, 11
topological structure, 149
transaction management, 48
TranScm, 208
traverse, 50
treatment statement, 127
trigger, 13
truncation, 18
TSIMMIS, 37
tumour, 127
tuple, 27
tuple set, 257
type coercion, 48
type identifiers, 260
type-checking, 48
ULIXES, 46, 146, 147, 205
unbounded length paths, 203
UNIX, 21
unordered, 56
UnQL, 47, 146, 149
URL, 27, 127, 158, 202, 208
URL-minder, 369
user-driven coupling, 252
valid coupling query, 194
valid query, 189
validity checking phase, 181
validity conditions, 165
value, 100
value-driven predicate, 114
variables, 30
versions, 374
view-definition language, 42
virtual loose schema, 42
visibility, 399
visualization, 8
visualize, 353
VScmDL, 208
W3QL, 21, 22
W3QS, 21, 30, 146, 147, 204, 208
warehouse, 149
warehouse data, 207
warehouse document pool, 374
warehouse node pool, 374
warehousing, 1
Web, 1, 17
web algebra, 417
web algebraic operators, 16, 251, 367, 418
web bag, 273, 419
web cartesian product, 288
web coalesce, 358
web correlate, 424
web crawler, 5, 20
Web data, 1, 14, 18, 147, 203, 257, 417
web delta manager, 11
web deltas, 12
web directory services, 425
web distinct, 273, 417
Web documents, 127, 255, 357, 418, 420
web join, 213, 417
web manipulator, 11
web marts, 13
web miner, 11, 417
web objects, 202, 418
web operators, 251, 417
web project, 11, 216, 251, 417
web query, 99, 161, 202
web query processing systems, 391
web ranking, 424
web schema, 181, 207, 252, 355, 399, 418
web schema pool, 375
web select, 168, 247
Web sites, 2, 8
web sort, 364
web table, 12, 176, 207, 251, 353, 355, 371, 391, 418
web table generation phase, 253
web tuple pool, 374, 375
web tuples, 200, 207, 251, 391, 418
web tuples generation phase, 253
web union, 252, 417
web warehouse, 1, 2, 5, 10, 205, 207, 287, 389, 392
web warehousing, 417
WebCQ, 370
WebGUIDE, 369
WebLog, 21, 28, 30, 147, 149, 204, 208
WebOQL, 40, 44, 146, 147, 204
webs, 45
WebSQL, 21, 27, 30, 147, 149, 203, 204, 208
WHIRL, 40
WHOM, 10, 94, 418
WHOWEDA, 11, 17, 146, 207, 251, 289, 367, 418
WordNet, 32
World Wide Web, 1
wrapper, 7, 8, 35
Wrapper Specification Language (WSL), 38
WWW, 14, 18, 146, 390
X-Terminal, 24
XML, 8, 17, 210
XML Graph, 57
XML-QL, 52, 56, 146, 147, 205
XML-QL query, 58
XPath, 90
Yahoo, 5, 20
YAT, 209
YATL, 52, 147, 205