Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA


Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany

7446


Stephen W. Liddle, Klaus-Dieter Schewe,
A Min Tjoa, Xiaofang Zhou (Eds.)

Database and Expert
Systems Applications
23rd International Conference, DEXA 2012
Vienna, Austria, September 3-6, 2012
Proceedings, Part I



Volume Editors
Stephen W. Liddle
Brigham Young University, Marriott School
784 TNRB, Provo, UT 84602, USA
E-mail:
Klaus-Dieter Schewe
Software Competence Center Hagenberg
Softwarepark 21, 4232 Hagenberg, Austria

E-mail:
A Min Tjoa
Vienna University of Technology, Institute of Software Technology
Favoritenstraße 9-11/188, 1040 Wien, Austria
E-mail:
Xiaofang Zhou
University of Queensland
School of Information Technology and Electrical Engineering
Brisbane, QLD 4072, Australia
E-mail:

ISSN 0302-9743
e-ISSN 1611-3349
e-ISBN 978-3-642-32600-4
ISBN 978-3-642-32599-1
DOI 10.1007/978-3-642-32600-4
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012943836
CR Subject Classification (1998): H.2.3-4, H.2.7-8, H.2, H.3.3-5, H.4.1, H.5.3, I.2.1,
I.2.4, I.2.6, J.1, C.2
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web
and HCI
© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)


Preface

This volume includes invited papers, research papers, and short papers presented
at DEXA 2012, the 23rd International Conference on Database and Expert Systems Applications, held in Vienna, Austria. DEXA 2012 continued the long and
successful DEXA tradition begun in 1990, bringing together a large collection of
bright researchers, scientists, and practitioners from around the world to share
new results in the areas of database, intelligent systems, and related advanced
applications.
The call for papers resulted in the submission of 179 papers, of which 49 were
accepted as regular research papers, and 37 were accepted as short papers. The
authors of these papers come from 43 different countries. The papers discuss a
range of topics including:
Database query processing, in particular XML queries
Labeling of XML documents
Computational efficiency
Data extraction
Personalization, preferences, and ranking
Security and privacy
Database schema evaluation and evolution
Semantic Web
Privacy and provenance
Data mining
Data streaming
Distributed systems
Searching and query answering
Structuring, compression and optimization
Failure, fault analysis, and uncertainty
Prediction, extraction, and annotation
Ranking and personalization
Database partitioning and performance measurement
Recommendation and prediction systems
Business processes
Social networking

In addition to the papers selected by the Program Committee, two internationally recognized scholars delivered keynote speeches:
Georg Gottlob: DIADEM: Domains to Databases
Yamine Aït-Ameur: Stepwise Development of Formal Models for Web Services
Compositions: Modelling and Property Verification


In addition to the main conference track, DEXA 2012 also included seven
workshops that explored the conference theme within the context of life sciences,
specific application areas, and theoretical underpinnings.
We are grateful to the hundreds of authors who submitted papers to DEXA
2012 and to our large Program Committee for the many hours they spent carefully reading and reviewing these papers. The Program Committee was also
assisted by a number of external referees, and we appreciate their contributions
and detailed comments.
We are thankful to the Institute of Software Technology at Vienna University of Technology for organizing DEXA 2012, and for the excellent working
atmosphere provided. In particular, we recognize the efforts of the conference
Organizing Committee led by the DEXA 2012 General Chair A Min Tjoa. We
are grateful to the Workshop Chairs Abdelkader Hameurlain, A Min Tjoa, and
Roland R. Wagner.
Finally, we are especially grateful to Gabriela Wagner, whose professional
attention to detail and skillful handling of all aspects of the Program Committee
management and proceedings preparation were most helpful.
September 2012


Stephen W. Liddle
Klaus-Dieter Schewe
Xiaofang Zhou


Organization

Honorary Chair
Makoto Takizawa, Seikei University, Japan

General Chair
A Min Tjoa, Technical University of Vienna, Austria

Conference Program Chair
Stephen Liddle, Brigham Young University, USA
Klaus-Dieter Schewe, Software Competence Center Hagenberg and Johannes Kepler University Linz, Austria
Xiaofang Zhou, University of Queensland, Australia

Publication Chair
Vladimir Marik, Czech Technical University, Czech Republic

Program Committee
Witold Abramowicz, The Poznan University of Economics, Poland
Rafael Accorsi, University of Freiburg, Germany
Hamideh Afsarmanesh, University of Amsterdam, The Netherlands
Riccardo Albertoni, OEG, Universidad Politécnica de Madrid, Spain
Rachid Anane, Coventry University, UK
Annalisa Appice, Università degli Studi di Bari, Italy
Mustafa Atay, Winston-Salem State University, USA
James Bailey, University of Melbourne, Australia
Spiridon Bakiras, City University of New York, USA
Zhifeng Bao, National University of Singapore, Singapore
Ladjel Bellatreche, ENSMA, France
Morad Benyoucef, University of Ottawa, Canada
Catherine Berrut, Grenoble University, France
Debmalya Biswas, Nokia Research, Germany
Athman Bouguettaya, RMIT, Australia
Danielle Boulanger, MODEME, University of Lyon, France
Omar Boussaid, University of Lyon, France
Stephane Bressan, National University of Singapore, Singapore
Patrick Brezillon, University of Paris VI (UPMC), France
Yiwei Cao, RWTH Aachen University, Germany
Silvana Castano, Università degli Studi di Milano, Italy


VIII

Organization

Barbara Catania, Università di Genova, Italy
Michelangelo Ceci, University of Bari, Italy
Cindy Chen, University of Massachusetts Lowell, USA
Phoebe Chen, La Trobe University, Australia
Shu-Ching Chen, Florida International University, USA
Hao Cheng, Yahoo
James Cheng, Nanyang Technological University, Singapore
Reynold Cheng, The University of Hong Kong, China
Max Chevalier, IRIT - SIG, Université de Toulouse, France
Byron Choi, Hong Kong Baptist University, Hong Kong
Henning Christiansen, Roskilde University, Denmark
Soon Ae Chun, City University of New York, USA
Eliseo Clementini, University of L’Aquila, Italy
Oscar Corcho, Universidad Politécnica de Madrid, Spain
Bin Cui, Peking University, China
Deborah Dahl, Conversational Technologies
Jérôme Darmont, Université de Lyon (ERIC Lyon 2), France
Andre de Carvalho, University of Sao Paulo, Brazil
Guy De Tré, Ghent University, Belgium
Olga De Troyer, Vrije Universiteit Brussel, Belgium
Roberto De Virgilio, Università Roma Tre, Italy
John Debenham, University of Technology, Sydney, Australia
Hendrik Decker, Universidad Politécnica de Valencia, Spain
Zhi-Hong Deng, Peking University, China
Vincenzo Deufemia, Università degli Studi di Salerno, Italy
Claudia Diamantini, Università Politecnica delle Marche, Italy
Juliette Dibie-Barthélemy, AgroParisTech, France
Ying Ding, Indiana University, USA
Zhiming Ding, Chinese Academy of Sciences, China
Gillian Dobbie, University of Auckland, New Zealand
Peter Dolog, Aalborg University, Denmark
Dejing Dou, University of Oregon, USA
Cedric du Mouza, CNAM, France
Johann Eder, University of Klagenfurt, Austria
David Embley, Brigham Young University, USA
Suzanne M. Embury, The University of Manchester, UK
Bettina Fazzinga, University of Calabria, Italy
Leonidas Fegaras, The University of Texas at Arlington, USA
Stefano Ferilli, University of Bari, Italy
Flavio Ferrarotti, Victoria University of Wellington, New Zealand
Filomena Ferrucci, Università di Salerno, Italy
Flavius Frasincar, Erasmus University Rotterdam, The Netherlands
Bernhard Freudenthaler, Software Competence Center Hagenberg, Austria



Hiroaki Fukuda, Shibaura Institute of Technology, Japan
Steven Furnell, University of Plymouth, UK
Aryya Gangopadhyay, University of Maryland Baltimore County, USA
Yunjun Gao, Zhejiang University, China
Manolis Gergatsoulis, Ionian University, Greece
Fabio Grandi, University of Bologna, Italy
Carmine Gravino, University of Salerno, Italy
Sven Groppe, Lübeck University, Germany
William Grosky, University of Michigan, USA
Jerzy Grzymala-Busse, University of Kansas, USA
Francesco Guerra, Università degli Studi Di Modena e Reggio Emilia, Italy
Giovanna Guerrini, University of Genoa, Italy
Antonella Guzzo, University of Calabria, Italy
Abdelkader Hameurlain, Paul Sabatier University, Toulouse, France
Ibrahim Hamidah, Universiti Putra Malaysia, Malaysia
Wook-Shin Han, Kyungpook National University, Korea
Takahiro Hara, Osaka University, Japan
Theo Härder, TU Kaiserslautern, Germany
Francisco Herrera, University of Granada, Spain
Steven Hoi, Nanyang Technological University, Singapore
Estevam Rafael Hruschka Jr., Federal University of Sao Carlos, Brazil, and Carnegie Mellon University, USA
Wynne Hsu, National University of Singapore, Singapore
Yu Hua, Huazhong University of Science and Technology, China
Jimmy Huang, York University, Canada
Xiaoyu Huang, South China University of Technology, China
Ionut Emil Iacob, Georgia Southern University, USA
Sergio Ilarri, University of Zaragoza, Spain
Abdessamad Imine, University of Nancy, France
Yoshiharu Ishikawa, Nagoya University, Japan
Adam Jatowt, Kyoto University, Japan
Peiquan Jin, University of Science and Technology, China
Anne Kao, Boeing Phantom Works, USA
Dimitris Karagiannis, University of Vienna, Austria
Stefan Katzenbeisser, Technical University of Darmstadt, Germany
Yiping Ke, Institute of High Performance Computing, Singapore
Sang-Wook Kim, Hanyang University, Korea
Hiroyuki Kitagawa, University of Tsukuba, Japan
Carsten Kleiner, University of Applied Sciences and Arts Hannover, Germany
Ibrahim Korpeoglu, Bilkent University, Turkey
Harald Kosch, University of Passau, Germany


Michal Krátký, VSB-Technical University of Ostrava, Czech Republic
Arun Kumar, IBM Research, India
Ashish Kundu, IBM T.J. Watson Research Center, Hawthorne, USA
Josef Küng, University of Linz, Austria
Kwok-Wa Lam, University of Hong Kong, Hong Kong
Nadira Lammari, CNAM, France
Gianfranco Lamperti, University of Brescia, Italy
Mong Li Lee, National University of Singapore, Singapore
Alain Toinon Leger, Orange - France Telecom R&D, France
Daniel Lemire, LICEF Research Center, Canada
Lenka Lhotska, Czech Technical University, Czech Republic
Wenxin Liang, Dalian University of Technology, China
Lipyeow Lim, University of Hawaii at Manoa, USA
Tok Wang Ling, National University of Singapore, Singapore
Sebastian Link, University of Auckland, New Zealand
Volker Linnemann, University of Lübeck, Germany
Chengfei Liu, Swinburne University of Technology, Australia
Chuan-Ming Liu, National Taipei University of Technology, Taiwan
Fuyu Liu, Microsoft Corporation, USA
Hong-Cheu Liu, University of South Australia, Australia
Jorge Lloret Gazo, University of Zaragoza, Spain
Miguel Ángel López Carmona, University of Alcalá de Henares, Spain
Jiaheng Lu, Renmin University, China
Jianguo Lu, University of Windsor, Canada
Alessandra Lumini, University of Bologna, Italy
Hui Ma, Victoria University of Wellington, New Zealand
Qiang Ma, Kyoto University, Japan
Stéphane Maag, TELECOM SudParis, France
Nikos Mamoulis, University of Hong Kong, Hong Kong
Elio Masciari, ICAR-CNR, Università della Calabria, Italy
Norman May, SAP AG, Germany
Jose-Norberto Mazón, University of Alicante, Spain
Dennis McLeod, University of Southern California, USA
Brahim Medjahed, University of Michigan - Dearborn, USA
Harekrishna Misra, Institute of Rural Management Anand, India
Jose Mocito, INESC-ID/FCUL, Portugal
Riad Mokadem, IRIT, Paul Sabatier University, France
Lars Mönch, FernUniversität in Hagen, Germany
Yang-Sae Moon, Kangwon National University, Korea
Reagan Moore, University of North Carolina at Chapel Hill, USA


Franck Morvan, IRIT, Paul Sabatier University, Toulouse, France
Mirco Musolesi, University of Birmingham, UK
Ismael Navas-Delgado, University of Málaga, Spain
Wilfred Ng, University of Science and Technology, Hong Kong
Javier Nieves Acedo, Deusto University, Spain
Mourad Oussalah, University of Nantes, France
Gultekin Ozsoyoglu, Case Western Reserve University, USA
George Pallis, University of Cyprus, Cyprus
Christos Papatheodorou, Ionian University and “Athena” Research Centre, Greece
Marcin Paprzycki, Polish Academy of Sciences, Warsaw Management Academy, Poland
Oscar Pastor Lopez, Universidad Politecnica de Valencia, Spain
Jovan Pehcevski, European University, Macedonia
Reinhard Pichler, Technische Universität Wien, Austria
Clara Pizzuti, ICAR-CNR, Italy
Jaroslav Pokorny, Charles University in Prague, Czech Republic
Elaheh Pourabbas, National Research Council, Italy
Fausto Rabitti, ISTI, CNR Pisa, Italy
Claudia Raibulet, Università degli Studi di Milano-Bicocca, Italy
Isidro Ramos, Technical University of Valencia, Spain
Praveen Rao, University of Missouri-Kansas City, USA
Rodolfo F. Resende, Federal University of Minas Gerais, Brazil
Claudia Roncancio, Grenoble University / LIG, France
Edna Ruckhaus, Universidad Simon Bolivar, Venezuela
Massimo Ruffolo, ICAR-CNR, Italy
Igor Ruiz Agúndez, Deusto University, Spain
Giovanni Maria Sacco, University of Turin, Italy
Shazia Sadiq, University of Queensland, Australia
Simonas Saltenis, Aalborg University, Denmark
Carlo Sansone, Università di Napoli “Federico II”, Italy
Igor Santos Grueiro, Deusto University, Spain
N.L. Sarda, I.I.T. Bombay, India
Marinette Savonnet, University of Burgundy, France
Raimondo Schettini, Università degli Studi di Milano-Bicocca, Italy
Erich Schweighofer, University of Vienna, Austria
Florence Sedes, IRIT, Paul Sabatier University, Toulouse, France
Nazha Selmaoui, University of New Caledonia, France
Patrick Siarry, Université Paris 12 (LiSSi), France
Gheorghe Cosmin Silaghi, Babes-Bolyai University of Cluj-Napoca, Romania
Leonid Sokolinsky, South Ural State University, Russia


Bala Srinivasan, Monash University, Australia
Umberto Straccia, Italian National Research Council, Italy
Darijus Strasunskas, Strasunskas Forskning, Norway
Lena Stromback, Swedish Meteorological and Hydrological Institute, Sweden
Aixin Sun, Nanyang Technological University, Singapore
David Taniar, Monash University, Australia
Cui Tao, Mayo Clinic, USA
Maguelonne Teisseire, Irstea - TETIS, France
Sergio Tessaris, Free University of Bozen-Bolzano, Italy
Olivier Teste, IRIT, University of Toulouse, France
Stephanie Teufel, University of Fribourg, Switzerland
Jukka Teuhola, University of Turku, Finland
Taro Tezuka, University of Tsukuba, Japan
Bernhard Thalheim, Christian Albrechts Universität Kiel, Germany
J.M. Thevenin, University of Toulouse I Capitole, France
Helmut Thoma, Thoma SW-Engineering, Basel, Switzerland
A Min Tjoa, Technical University of Vienna, Austria
Vicenc Torra, IIIA-CSIC, Spain
Traian Truta, Northern Kentucky University, USA
Theodoros Tzouramanis, University of the Aegean, Greece
Marco Vieira, University of Coimbra, Portugal
Jianyong Wang, Tsinghua University, China
Junhu Wang, Griffith University, Brisbane, Australia
Qing Wang, The Australian National University, Australia
Wei Wang, University of New South Wales, Sydney, Australia
Wendy Hui Wang, Stevens Institute of Technology, USA
Andreas Wombacher, University Twente, The Netherlands
Lai Xu, Bournemouth University, UK
Ming Hour Yang, Chung Yuan Christian University, Taiwan
Xiaochun Yang, Northeastern University, China
Haruo Yokota, Tokyo Institute of Technology, Japan
Zhiwen Yu, Northwestern Polytechnical University, China
Xiao-Jun Zeng, University of Manchester, UK
Zhigang Zeng, Huazhong University of Science and Technology, China
Xiuzhen (Jenny) Zhang, RMIT University Australia, Australia
Yanchang Zhao, RDataMining.com, Australia
Yu Zheng, Microsoft Research Asia, China
Qiang Zhu, The University of Michigan, USA
Yan Zhu, Southwest Jiaotong University, Chengdu, China


External Reviewers
Hadjali Allel, ENSSAT, France
Toshiyuki Amagasa, Tsukuba University, Japan
Flora Amato, University of Naples Federico II, Italy
Abdelkrim Amirat, University of Nantes, France
Zahoua Aoussat, University of Algiers, Algeria
Radim Bača, Technical University of Ostrava, Czech Republic
Dinesh Barenkala, University of Missouri-Kansas City, USA
Riad Belkhatir, University of Nantes, France
Yiklun Cai, University of Hong Kong
Nafisa Afrin Chowdhury, University of Oregon, USA
Shumo Chu, Nanyang Technological University, Singapore
Ercument Cicek, Case Western Reserve University, USA
Camelia Constantin, UPMC, France
Ryadh Dahimene, CNAM, France
Matthew Damigos, NTUA, Greece
Franca Debole, ISTI-CNR, Italy
Saulo Domingos de Souza Pedro, Federal University of Sao Carlos, Brazil
Laurence Rodrigues do Amaral, Federal University of Uberlandia, Brazil
Andrea Esuli, ISTI-CNR, Italy
Qiong Fang, University of Science and Technology, Hong Kong
Nikolaos Fousteris, Ionian University, Greece
Filippo Furfaro, DEIS, University of Calabria, Italy
Jose Manuel Gimenez, Universidad de Alcala, Spain
Reginaldo Gotardo, Federal University of Sao Carlos, Brazil
Fernando Gutierrez, University of Oregon, USA
Zeinab Hmedeh, CNAM, France
Hai Huang, KAUST, Saudi Arabia
Lili Jiang, Lanzhou University, China
Shangpu Jiang, University of Oregon, USA
Hideyuki Kawashima, University of Tsukuba, Japan
Selma Khouri, LIAS/ENSMA, France
Christian Koncilia, University of Klagenfurt, Austria
Cyril Labbe, Université Joseph Fourier, Grenoble, France
Thuy Ngoc Le, National University of Singapore, Singapore
Fabio Leuzzi, University of Bari, Italy
Luochen Li, National University of Singapore, Singapore
Jing Li, University of Hong Kong
Xumin Liu, Rochester Institute of Technology, USA
Yifei Lu, University of New South Wales, Australia
Jia-Ning Luo, Ming Chuan University, Taiwan


Ivan Marsa Maestre, Universidad de Alcala, Spain
Ruslan Miniakhmetov, South Ural State University, Chelyabinsk, Russia
Jun Miyazaki, Nara Institute of Science and Technology, Japan
Bo Ning, Dalian Maritime University, China
Goke Ojewole, NVIDIA, USA
Ermelinda Oro, ICAR-CNR, Italy
Constantin Pan, South Ural State University, Chelyabinsk, Russia
Mandeep Pannu, Coventry University, UK
Srivenu Paturi, University of Missouri-Kansas City, USA
Xinjian Qi, Case Western Reserve University, USA
Gang Qian, University of Central Oklahoma, USA
Jianbin Qin, University of New South Wales, Australia
Vineet Rajani, Max Planck Institute for Software Systems, Germany
Hongda Ren, Tsinghua University, China
Sara Romano, University of Naples Federico II, Italy
Sherif Sakr, NICTA, Australia
Federica Sarro, University of Salerno, Italy
Wei Shen, Tsinghua University, China
Vasil Slavov, University of Missouri-Kansas City, USA
Daniela Stojanova, Jozef Stefan Institute, Slovenia
Umberto Straccia, ISTI-CNR, Italy
Guilaine Talens, MODEME labs, University of Lyon-Jean Moulin Lyon, France
Bin Wang, Northeastern University, China
Changzhou Wang, Boeing Research and Technology, USA
Hao Wang, University of Hong Kong
Jia Wang, The Chinese University of Hong Kong, Hong Kong
Yousuke Watanabe, Tokyo Institute of Technology, Japan
Huanhuan Wu, The Chinese University of Hong Kong, Hong Kong
Zhiqiang Xu, Nanyang Technological University, Singapore
Kefeng Xuan, Monash University, Australia
Da Yan, University of Science and Technology, Hong Kong
Qi Yu, Rochester Institute of Technology, USA
Gneg Zhao, Monash University, Australia
Zhou Zhao, University of Science and Technology, Hong Kong
Mikhail Zymbler, South Ural State University, Chelyabinsk, Russia


Table of Contents – Part I

Keynote Talks
DIADEM: Domains to Databases
Tim Furche, Georg Gottlob, and Christian Schallhart

Stepwise Development of Formal Models for Web Services Compositions: Modelling and Property Verification
Yamine Ait-Ameur and Idir Ait-Sadoune

XML Queries and Labeling I
A Hybrid Approach for General XML Query Processing
Huayu Wu, Ruiming Tang, Tok Wang Ling, Yong Zeng, and Stéphane Bressan

SCOOTER: A Compact and Scalable Dynamic Labeling Scheme for XML Updates
Martin F. O’Connor and Mark Roantree

Reuse the Deleted Labels for Vector Order-Based Dynamic XML Labeling Schemes
Canwei Zhuang and Shaorong Feng

Computational Efficiency
Towards an Efficient Flash-Based Mid-Tier Cache
Yi Ou, Jianliang Xu, and Theo Härder

Evacuation Planning of Large Buildings Using Ladders
Alka Bhushan, Nandlal L. Sarda, and P.V. Rami Reddy

A Write Efficient PCM-Aware Sort
Meduri Venkata Vamsikrishna, Zhan Su, and Kian-Lee Tan

XML Queries
Performance Analysis of Algorithms to Reason about XML Keys
Flavio Ferrarotti, Sven Hartmann, Sebastian Link, Mauricio Marin, and Emir Muñoz

Finding Top-K Correct XPath Queries of User’s Incorrect XPath Query
Kosetsu Ikeda and Nobutaka Suzuki


Analyzing Plan Diagrams of XQuery Optimizers
H.S. Bruhathi and Jayant R. Haritsa

Data Extraction
Spreadsheet Metadata Extraction: A Layout-Based Approach
Somchai Chatvichienchai

Automated Extraction of Semantic Concepts from Semi-structured Data: Supporting Computer-Based Education through the Analysis of Lecture Notes
Thushari Atapattu, Katrina Falkner, and Nickolas Falkner

A Confidence-Weighted Metric for Unsupervised Ontology Population from Web Texts
Hilário Oliveira, Rinaldo Lima, João Gomes, Rafael Ferreira, Fred Freitas, and Evandro Costa

Personalization, Preferences, and Ranking
Situation-Aware User’s Interests Prediction for Query Enrichment
Imen Ben Sassi, Chiraz Trabelsi, Amel Bouzeghoub, and Sadok Ben Yahia

The Effective Relevance Link between a Document and a Query
Karam Abdulahhad, Jean-Pierre Chevallet, and Catherine Berrut

Incremental Computation of Skyline Queries with Dynamic Preferences
Tassadit Bouadi, Marie-Odile Cordier, and René Quiniou

Databases and Schemas
Efficient Discovery of Correlated Patterns in Transactional Databases Using Items’ Support Intervals
R. Uday Kiran and Masaru Kitsuregawa

On Checking Executable Conceptual Schema Validity by Testing
Albert Tort, Antoni Olivé, and Maria-Ribera Sancho

Querying Transaction-Time Databases under Branched Schema Evolution
Wenyu Huo and Vassilis J. Tsotras


Privacy and Provenance
Fast Identity Anonymization on Graphs
Xuesong Lu, Yi Song, and Stéphane Bressan

Probabilistic Inference of Fine-Grained Data Provenance
Mohammad Rezwanul Huq, Peter M.G. Apers, and Andreas Wombacher

Enhancing Utility and Privacy-Safety via Semi-homogenous Generalization
Xianmang He, Wei Wang, HuaHui Chen, Guang Jin, Yefang Chen, and Yihong Dong

XML Queries and Labeling II
Processing XML Twig Pattern Query with Wildcards
Huayu Wu, Chunbin Lin, Tok Wang Ling, and Jiaheng Lu

A Direct Approach to Holistic Boolean-Twig Pattern Evaluation
Dabin Ding, Dunren Che, and Wen-Chi Hou

Full Tree-Based Encoding Technique for Dynamic XML Labeling Schemes
Canwei Zhuang and Shaorong Feng

Data Streams
Top-k Maximal Influential Paths in Network Data
Enliang Xu, Wynne Hsu, Mong Li Lee, and Dhaval Patel

Learning to Rank from Concept-Drifting Network Data Streams
Lucrezia Macchia, Michelangelo Ceci, and Donato Malerba

Top-k Context-Aware Queries on Streams
Loïc Petit, Sandra de Amo, Claudia Roncancio, and Cyril Labbé

Structuring, Compression and Optimization
Fast Block-Compressed Inverted Lists (Short Paper)
Giovanni M. Sacco

Positional Data Organization and Compression in Web Inverted Indexes (Short Paper)
Leonidas Akritidis and Panayiotis Bozanis

Decreasing Memory Footprints for Better Enterprise Java Application Performance (Short Paper)
Stoyan Garbatov and João Cachopo


Knowledge-Driven Syntactic Structuring: The Case of Multidimensional Space of Music Information
Wladyslaw Homenda and Mariusz Rybnik

Data Mining I
Mining Frequent Itemsets Using Node-Sets of a Prefix-Tree
Jun-Feng Qu and Mengchi Liu

MAX-FLMin: An Approach for Mining Maximal Frequent Links and Generating Semantical Structures from Social Networks
Erick Stattner and Martine Collard

Road Networks and Graph Search
Sequenced Route Query in Road Network Distance Based on Incremental Euclidean Restriction
Yutaka Ohsawa, Htoo Htoo, Noboru Sonehara, and Masao Sakauchi

Path-Based Constrained Nearest Neighbor Search in a Road Network (Short Paper)
Yingyuan Xiao, Yan Shen, Tao Jiang, and Heng Wang

Efficient Fuzzy Ranking for Keyword Search on Graphs (Short Paper)
Nidhi R. Arora, Wookey Lee, Carson Kai-Sang Leung, Jinho Kim, and Harshit Kumar

Author Index


Table of Contents – Part II


Query Processing I
Consistent Query Answering Using Relational Databases through Argumentation
Cristhian A.D. Deagustini, Santiago E. Fulladoza Dalibón, Sebastián Gottifredi, Marcelo A. Falappa, and Guillermo R. Simari

Analytics-Driven Lossless Data Compression for Rapid In-situ Indexing, Storing, and Querying
John Jenkins, Isha Arkatkar, Sriram Lakshminarasimhan, Neil Shah, Eric R. Schendel, Stephane Ethier, Choong-Seock Chang, Jacqueline H. Chen, Hemanth Kolla, Scott Klasky, Robert Ross, and Nagiza F. Samatova

Prediction, Extraction, and Annotation
Prediction of Web User Behavior by Discovering Temporal Relational Rules from Web Log Data (Short Paper)
Xiuming Yu, Meijing Li, Incheon Paik, and Keun Ho Ryu

A Hybrid Approach to Text Categorization Applied to Semantic Annotation (Short Paper)
José Luis Navarro-Galindo, José Samos, and M. José Muñoz-Alférez

An Unsupervised Framework for Topological Relations Extraction from Geographic Documents (Short Paper)
Corrado Loglisci, Dino Ienco, Mathieu Roche, Maguelonne Teisseire, and Donato Malerba

Failure, Fault Analysis, and Uncertainty
Combination of Machine-Learning Algorithms for Fault Prediction in High-Precision Foundries
Javier Nieves, Igor Santos, and Pablo G. Bringas

A Framework for Conditioning Uncertain Relational Data
Ruiming Tang, Reynold Cheng, Huayu Wu, and Stéphane Bressan

Cause Analysis of New Incidents by Using Failure Knowledge Database
Yuki Awano, Qiang Ma, and Masatoshi Yoshikawa


Ranking and Personalization
Modeling and Querying Context-Aware Personal Information Spaces
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Rania Kh´efifi, Pascal Poizat, and Fatiha Sa¨ıs

103

Ontology-Based Recommendation Algorithms for Personalized
Education (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Amir Bahmani, Sahra Sedigh, and Ali Hurson

111

Towards Quantitative Constraints Ranking in Data Clustering . . . . . . . . .
Eya Ben Ahmed, Ahlem Nabli, and Fa¨ıez Gargouri

121

A Topic-Oriented Analysis of Information Diffusion in a Blogosphere . . .
Kyu-Hwang Kang, Seung-Hwan Lim, Sang-Wook Kim,
Min-Hee Jang, and Byeong-Soo Jeong

129

Searching I
Trip Tweets Search by Considering Spatio-temporal Continuity of User
Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Keisuke Hasegawa, Qiang Ma, and Masatoshi Yoshikawa


141

Incremental Cosine Computations for Search and Exploration of Tag
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Raymond Vermaas, Damir Vandic, and Flavius Frasincar

156

Impression-Aware Video Stream Retrieval System with Temporal
Color-Sentiment Analysis and Visualization . . . . . . . . . . . . . . . . . . . . . . . . .
Shuichi Kurabayashi and Yasushi Kiyoki

168

Database Partitioning and Performance
Dynamic Workload-Based Partitioning for Large-Scale Databases
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Miguel Liroz-Gistau, Reza Akbarinia, Esther Pacitti,
Fabio Porto, and Patrick Valduriez

183

Dynamic Vertical Partitioning of Multimedia Databases Using Active
Rules (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lisbeth Rodríguez and Xiaoou Li

191

RTDW-bench: Benchmark for Testing Refreshing Performance of
Real-Time Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jacek Jedrzejczak, Tomasz Koszlajda, and Robert Wrembel

199

Middleware and Language for Sensor Streams (Short Paper) . . . . . . . . . . .
Pedro Furtado

207


Semantic Web
Statistical Analysis of the owl:sameAs Network for Aligning Concepts
in the Linking Open Data Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Gianluca Correndo, Antonio Penta, Nicholas Gibbins, and
Nigel Shadbolt

215

Paragraph Tables: A Storage Scheme Based on RDF Document
Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Akiyoshi Matono and Isao Kojima

231

Data Mining II

Continuously Mining Sliding Window Trend Clusters in a Sensor
Network (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Annalisa Appice, Donato Malerba, and Anna Ciampi

248

Generic Subsequence Matching Framework: Modularity, Flexibility,
Efficiency (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
David Novak, Petr Volny, and Pavel Zezula

256

Distributed Systems
R-Proxy Framework for In-DB Data-Parallel Analytics . . . . . . . . . . . . . . .
Qiming Chen, Meichun Hsu, Ren Wu, and Jerry Shan

266

View Selection under Multiple Resource Constraints in a Distributed
Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Imene Mami, Zohra Bellahsene, and Remi Coletta

281

Web Searching and Query Answering
The Impact of Modes of Mediation on the Web Retrieval Process
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mandeep Pannu, Rachid Anane, and Anne James

297


Querying a Semi-automated Data Integration System . . . . . . . . . . . . . . . . .
Cheikh Niang, Béatrice Bouchou, Moussa Lo, and Yacine Sam

305

Recommendation and Prediction Systems
A New Approach for Date Sharing and Recommendation in Social
Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Dawen Jia, Cheng Zeng, Wenhui Nie, Zhihao Li, and Zhiyong Peng

314

A Framework for Time-Aware Recommendations . . . . . . . . . . . . . . . . . . . . .
Kostas Stefanidis, Irene Ntoutsi, Kjetil Nørvåg, and
Hans-Peter Kriegel

329


A Hybrid Time-Series Link Prediction Framework for Large Social
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jia Zhu, Qing Xie, and Eun Jung Chin

345

Query Processing II

A Comparison of Top-k Temporal Keyword Querying over Versioned
Text Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wenyu Huo and Vassilis J. Tsotras

360

An Efficient SQL Rewrite Approach for Temporal Coalescing in the
Teradata RDBMS (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mohammed Al-Kateb, Ahmad Ghazal, and Alain Crolotte

375

HIP: Information Passing for Optimizing Join-Intensive Data
Processing Workloads on Hadoop (Short Paper) . . . . . . . . . . . . . . . . . . . . .
Seokyong Hong and Kemafor Anyanwu

384

Query Processing III
All-Visible-k-Nearest-Neighbor Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Yafei Wang, Yunjun Gao, Lu Chen, Gang Chen, and Qing Li

392

Algorithm for Term Linearizations of Aggregate Queries with
Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Victor Felea and Violeta Felea

408


Evaluating Skyline Queries on Spatial Web Objects (Short Paper) . . . . . .
Alfredo Regalado, Marlene Goncalves, and Soraya Abad-Mota

416

Alternative Query Optimization for Workload Management
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Zahid Abul-Basher, Yi Feng, Parke Godfrey, Xiaohui Yu,
Mokhtar Kandil, Danny Zilio, and Calisto Zuzarte

424

Searching II
Online Top-k Similar Time-Lagged Pattern Pair Search in Multiple
Time Series (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hisashi Kurasawa, Hiroshi Sato, Motonori Nakamura, and
Hajime Matsumura

432

Improving the Performance for the Range Search on Metric Spaces
Using a Multi-GPU Platform (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . .
Roberto Uribe-Paredes, Enrique Arias, José L. Sánchez,
Diego Cazorla, and Pedro Valero-Lara

442


A Scheme of Fragment-Based Faceted Image Search (Short Paper) . . . . .
Takahiro Komamizu, Mariko Kamie, Kazuhiro Fukui,
Toshiyuki Amagasa, and Hiroyuki Kitagawa

450

Indexing Metric Spaces with Nested Forests (Short Paper) . . . . . . . . . . . .
José Martinez and Zineddine Kouahla

458

Business Processes and Social Networking
Navigating in Complex Business Processes . . . . . . . . . . . . . . . . . . . . . . . . . .
Markus Hipp, Bela Mutschler, and Manfred Reichert

466

Combining Information and Activities in Business Processes
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Giorgio Bruno

481

Opinion Extraction Applied to Criteria (Short Paper) . . . . . . . . . . . . . . . .
Benjamin Duthil, François Trousset, Gérard Dray,
Jacky Montmain, and Pascal Poncelet

489

SocioPath: Bridging the Gap between Digital and Social Worlds
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Nagham Alhadad, Philippe Lamarre, Yann Busnel,
Patricia Serrano-Alvarado, Marco Biazzini, and
Christophe Sibertin-Blanc

497

Data Security, Privacy, and Organization
Detecting Privacy Violations in Multiple Views Publishing
(Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Deming Dou and Stéphane Coulondre

506

Anomaly Discovery and Resolution in MySQL Access Control
Policies (Short Paper) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mohamed Shehab, Saeed Al-Haj, Salil Bhagurkar, and Ehab Al-Shaer

514

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

523


DIADEM: Domains to Databases
Tim Furche, Georg Gottlob, and Christian Schallhart
Department of Computer Science, Oxford University,
Wolfson Building, Parks Road, Oxford OX1 3QD



Abstract. What if you could turn all websites of an entire domain into
a single database? Imagine all real estate offers, all airline flights, or
all your local restaurants’ menus automatically collected from hundreds
or thousands of agencies, travel agencies, or restaurants, presented as a
single homogeneous dataset.
Historically, this has required tremendous effort by the data providers
and whoever is collecting the data: Vertical search engines aggregate
offers through specific interfaces which provide suitably structured data.
The semantic web vision replaces the specific interfaces with a single one,
but still requires providers to publish structured data.
Attempts to turn human-oriented HTML interfaces back into their
underlying databases have largely failed due to the variability of web
sources. In this paper, we demonstrate that this is about to change: The
availability of comprehensive entity recognition together with advances
in ontology reasoning have made possible a new generation of knowledge-driven,
domain-specific data extraction approaches. To that end, we introduce
DIADEM, the first automated data extraction system that can
turn nearly any website of a domain into structured data, working fully
automatically, and present some preliminary evaluation results.

1 Introduction

Most websites with offers on books, real estate, flights, or any number of other
products are generated from some database. However, being meant for human
consumption, they make the data accessible only through increasingly
sophisticated search and browse interfaces. Unfortunately, this poses a
significant challenge for automatically processing these offers, e.g., for price
comparison, market analysis, or improved search interfaces. To obtain the data
driving such applications, we have to explore human-oriented HTML interfaces
and extract the data made accessible through them, without requiring any human
involvement.
Automated data extraction has long been a dream of the web community,
whether to improve search engines, to “model every object on the planet”¹, or to

The research leading to these results has received funding from the European
Research Council under the European Community’s Seventh Framework Programme
(FP7/2007–2013) / ERC grant agreement DIADEM, no. 246858.
¹ Bing’s new aim, />
S.W. Liddle et al. (Eds.): DEXA 2012, Part I, LNCS 7446, pp. 1–8, 2012.
© Springer-Verlag Berlin Heidelberg 2012


[Figure 1: diagram not reproduced. Recoverable labels: product provider with a structured API (XML/JSON), a semantic API (RDF), and a template-based HTML interface; form filling; object identification; cleaning & integration; domain database; single agency, few attributes; whole domain, single schema, rich attributes; other providers; floor plans, maps, energy performance chart, tables, flat text.]

Fig. 1. Data extraction with DIADEM

bootstrap the semantic web vision. Web extraction comes roughly in two shapes,
namely web information extraction (IE), extracting facts from flat text at very
large scale, and web data extraction (DE), extracting complex objects based on
text, but also layout, page and template structure, etc. Data extraction often
uses some techniques from information extraction such as entity and relationship
recognition, but not vice versa. Historically, IE systems are domain-independent
and web-scale [15,12], but at a rather low recall. DE systems fall into two
categories: domain-independent, low-accuracy systems [3,14,13] based on discovering
the repeated structure of HTML templates common to a set of pages, and highly
accurate, but site-specific systems [16,4] based on machine learning.
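
As an illustration of the first category, the following Python sketch looks for repeated template structure across a set of pages. It is a deliberately simplified stand-in, not the method of [3,14,13]: it assumes that a record field shows up as a root-to-text tag path that repeats on every page, and the names `TagPathCollector`, `repeated_paths`, and `min_repeats` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not be pushed.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Records the root-to-node tag path of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.paths.append("/".join(self.stack))


def repeated_paths(pages, min_repeats=2):
    """Tag paths that occur at least min_repeats times on every page."""
    common = None
    for html in pages:
        collector = TagPathCollector()
        collector.feed(html)
        collector.close()
        counts = Counter(collector.paths)
        frequent = {path for path, n in counts.items() if n >= min_repeats}
        common = frequent if common is None else common & frequent
    return sorted(common or [])
```

A path that repeats on every page is only a candidate record field. Nothing in the sketch says what the field means, which is one way to see why such domain-independent approaches trade accuracy for generality.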
In this paper, we argue that a new trade-off is necessary to make highly
accurate, fully automated web extraction possible at a large scale. We trade off
scope for accuracy and automation: By limiting ourselves to a specific domain
where we can provide substantial knowledge about that domain and the representation of its objects on web sites, automated data extraction becomes possible
at high accuracy. Though not fully web-scale, one domain often covers thousands
or even tens of thousands of web sites: To achieve a coverage above 80% for typical attributes in common domains, it does not suffice to extract only from large,
popular web sites. Rather, we need to include objects from thousands of small,
long-tail sources, as shown in [5] for a number of domains and attributes.
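
The coverage argument can be made concrete with a small calculation. The sketch below uses invented figures, not the measurements reported in [5]: it assumes a hypothetical domain with 3 large portals of 1,000 objects each and 1,000 small sites of 10 objects each, and counts how many sites must be included, largest first, to reach a given coverage. The function name `sites_needed` is likewise an assumption for illustration.

```python
def sites_needed(objects_per_site, target_percent):
    """Number of largest sites needed to cover target_percent of all objects."""
    total = sum(objects_per_site)
    covered = 0
    ranked = sorted(objects_per_site, reverse=True)
    for used, size in enumerate(ranked, start=1):
        covered += size
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            return used
    return len(objects_per_site)


# Hypothetical domain: 3 large portals and 1,000 small long-tail sites.
hypothetical_domain = [1000] * 3 + [10] * 1000
print(sites_needed(hypothetical_domain, 80))  # prints 743
```

Under these assumed numbers the three large portals hold only 3,000 of 13,000 objects, so 80% coverage requires 740 of the small sites as well.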

