ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT
Frontiers in Artificial Intelligence and Applications
FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of
monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA
series contains several sub-series, including “Information Modelling and Knowledge Bases” and
“Knowledge-Based Intelligent Engineering Systems”. It also includes the biannual ECAI, the
European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the
European Coordinating Committee on Artificial Intelligence – sponsored publications. An
editorial panel of internationally well-known scholars is appointed to provide a high quality
selection.
Series Editors:
J. Breuker, R. Dieng, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras,
R. Mizoguchi, M. Musen and N. Zhong
Volume 131
Recently published in this series
Vol. 130. K. Zieliński and T. Szmuc (Eds.), Software Engineering: Evolution and Emerging
Technologies
Vol. 129. H. Fujita and M. Mejri (Eds.), New Trends in Software Methodologies, Tools and
Techniques
Vol. 128. J. Zhou et al. (Eds.), Applied Public Key Infrastructure
Vol. 127. P. Ritrovato et al. (Eds.), Towards the Learning Grid
Vol. 126. J. Cruz, Constraint Reasoning for Differential Models
Vol. 125. C.-K. Looi et al. (Eds.), Artificial Intelligence in Education
Vol. 124. T. Washio et al. (Eds.), Advances in Mining Graphs, Trees and Sequences
Vol. 123. P. Buitelaar et al. (Eds.), Ontology Learning from Text: Methods, Evaluation and
Applications
Vol. 122. C. Mancini, Cinematic Hypertext – Investigating a New Paradigm
Vol. 121. Y. Kiyoki et al. (Eds.), Information Modelling and Knowledge Bases XVI
Vol. 120. T.F. Gordon (Ed.), Legal Knowledge and Information Systems – JURIX 2004: The
Seventeenth Annual Conference
Vol. 119. S. Nascimento, Fuzzy Clustering via Proportional Membership Model
Vol. 118. J. Barzdins and A. Caplinskas (Eds.), Databases and Information Systems – Selected
Papers from the Sixth International Baltic Conference DB&IS’2004
Vol. 117. L. Castillo et al. (Eds.), Planning, Scheduling and Constraint Satisfaction: From
Theory to Practice
Vol. 116. O. Corcho, A Layered Declarative Approach to Ontology Translation with
Knowledge Preservation
ISSN 0922-6389
Artificial Intelligence Research and Development
Edited by
Beatriz López
Institute of Informatics and Applications,
University of Girona, Spain
Joaquim Meléndez
Institute of Informatics and Applications,
University of Girona, Spain
Petia Radeva
Computer Vision Center & Department of Computer Science,
Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
and
Jordi Vitrià
Computer Vision Center & Department of Computer Science,
Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2005 The authors.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 1-58603-560-6
Library of Congress Control Number: 2005932065
Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail:
Distributor in the UK and Ireland
IOS Press/Lavis Marketing
73 Lime Walk
Headington
Oxford OX3 7AD
England
fax: +44 1865 750079

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
e-mail:
LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.
PRINTED IN THE NETHERLANDS
Artificial Intelligence Research and Development v
B. López et al. (Eds.)
IOS Press, 2005
© 2005 The authors. All rights reserved.
Preface
Artificial Intelligence (AI) has been, from its beginning, the wave of evolution in computer
science. It is in good health, and a proof of this is the fact that many companies label their
novelties as ‘smart’ or ‘intelligent’ regardless of the features included in them; the term
‘society of knowledge’ has been coined to draw society nearer to the future and as a symbol
of breakthrough. From this perspective, AI has reached its maturity and has exploded into
an endless set of sub-areas, getting in touch with all other disciplines to assist situation
assessment, analysis and interpretation of music, management of environmental and biological
systems, planning trains, routing of communication networks, assisting medical diagnosis
or powering auctions.
The wide variety of Artificial Intelligence application areas has meant that AI researchers
often become scattered across different micro-specialized conferences. There are few occasions
where the AI research community joins together, although computer scientists and engineers
can find a lot of interesting ideas in the cross-fertilization of results coming from
all of these application areas. The Catalan Association for Artificial Intelligence (ACIA¹) is
aware of the benefit of this contact, and with this aim it organizes an annual conference to
promote synergies in the research community within its area of influence. This book provides
a representative selection of papers resulting from this activity. The advances made by ACIA
people and its area of influence have been gathered in this single volume as an update of
previous volumes published in 2003 and 2004 (corresponding to series numbers 100 and 113).
The book is organized according to the different sessions in which the papers were presented
at the Eighth Catalan Conference on Artificial Intelligence, held in Alguer (Italy) on
October 26–28, 2005, namely: Neural Networks, Computer Vision, Applications, Machine
Learning, Reasoning, Planning and Robotics, and Multi-Agent Systems. Papers have
been selected after a double-blind review process in which distinguished AI researchers
from all over Europe participated. Among the 77 papers received, 26 were selected as oral
presentations and 25 as posters. The quality of the papers was high on average, and the
choice between an oral and a poster session was based on the degree of discussion that a
paper could generate more than on its quality. All of the papers collected in this volume
would be of interest to any computer scientist or engineer interested in AI.
We would like to express our sincere gratitude to all the authors and members of the
scientific and organizing committees that have made this conference a success. Our special
thanks also to the plenary speakers for their effort in preparing the lectures.

Alghero, October 2005
Beatriz López (University of Girona)
Joaquim Meléndez (University of Girona)
Petia Radeva (Computer Vision Center, UAB)
Jordi Vitrià (Computer Vision Center, UAB)
¹ ACIA, the Catalan Association for Artificial Intelligence, is a member of the European Coordinating
Committee for Artificial Intelligence (ECCAI).
Conference Organization
CCIA 2005 was organized by the University of Girona, the Computer Vision Center, the
Universitat Autònoma de Barcelona and the Associació Catalana d’Intelligència Artificial.
General Chairs
Beatriz López, University of Girona
Joaquim Meléndez, University of Girona
Petia Radeva, Computer Vision Center
Jordi Vitrià, Computer Vision Center
Scientific Committee
Isabel Aguiló, Universitat de les Illes Balears
Josep Aguilar, Centre National de la Recherche Scientifique
Cecilio Angulo, Technical University of Catalonia
Rene Bañares-Alcántara, University of Oxford
Ester Bernadó, Ramon Llull University
Vicent Botti, Technical University of Valencia
Jaume Casasnovas, Universitat de les Illes Balears
Jesus Cerquides, University of Barcelona
M. Teresa Escrig, Universitat Jaume I
Francesc Ferri, University of Valencia
Rafael García, University of Girona
Josep M. Garrell, Ramon Llull University
Héctor Geffner, Pompeu Fabra University
Elisabet Golobardes, Ramon Llull University
Antoni Grau, Technical University of Catalonia
M. Angeles Lopez, Universitat Jaume I
Beatriz López, University of Girona
Maite López, University of Barcelona
Joan Martí, University of Girona
Enric Martí, Computer Vision Center
Joaquim Meléndez, University of Girona
José del R. Millán, IDIAP Research Institute
Margaret Miró, Universitat de les Illes Balears
Antonio Moreno, Rovira i Virgili University
Eva Onaindia, Technical University of Valencia
Miquel Angel Piera, Universitat Autònoma de Barcelona
Filiberto Pla, Universitat Jaume I
Enric Plaza, Artificial Intelligence Research Institute
Monique Polit, University of Perpignan
Josep Puyol-Gruart, Artificial Intelligence Research Institute
Petia Radeva, Computer Vision Center
Ignasi R. Roda, University of Girona
Josep Lluís de la Rosa, University of Girona
Xari Rovira, Ramon Llull University
Mónica Sànchez, Technical University of Catalonia
Miquel Sànchez-Marrè, Technical University of Catalonia
Massimo Tistarelli, Università degli Studi di Sassari
Ricardo Toledo, Computer Vision Center
Miguel Toro, University of Sevilla
Vicenç Torra, Artificial Intelligence Research Institute
Enric Trillas, Technical University of Madrid
Magda Valls, University of Lleida
Llorenç Valverde, Universitat de les Illes Balears
Jordi Vitrià, Computer Vision Center
Additional referees
Arantza Aldea (Oxford Brookes University), Yolanda Bolea (Technical University of
Catalunya), Mercedes E. Narciso Farias (Universitat Autònoma de Barcelona), Lluis Godo
(Artificial Intelligence Research Institute), Luis González Abril (University of Sevilla),
Felip Mañà (University of Lleida), Robert Martí (University of Girona), Pablo Noriega
(Artificial Intelligence Research Institute), Raquel Ros (Artificial Intelligence Research
Institute), Francisco Ruiz Vegas (Technical University of Catalunya), Aïda Valls (Rovira i
Virgili University), Pere Vila (University of Girona).
Organizing Committee
Esteve del Acebo, ARLab, University of Girona.
Marc Carreras, VICOROB, University of Girona.
Joan Colomer, eXiT, University of Girona.
Xavier Cufí, VICOROB, University of Girona.
Joan Martí, VICOROB, University of Girona.
Josep Lluís Marzo, BCDS, University of Girona.
Ignasi Rodríguez-Roda, LEQUIA, University of Girona.
Josep Lluís de la Rosa, ARLab, University of Girona.
Massimo Tistarelli, Università degli Studi di Sassari.
Josep Vehí, MICE, University of Girona.
Web manager and Secretariat
Xavier Ortega, Montse Vila, Maria Brugué
Sponsoring Institutions
Ciutat de l'Alguer, Assessorat de Cultura
Contents
Preface
Beatriz López, Joaquim Meléndez, Petia Radeva and Jordi Vitrià
Conference Organization
Invited Talks
On the Design of Bio-Inspired Multi-Agent Systems for Coordination and Control
Paul Valckenaers
Massimo Tistarelli Talk
Massimo Tistarelli
1. Neural Networks
Direct Policy Search Reinforcement Learning for Robot Control
Andres El-Fakdi, Marc Carreras and Narcís Palomeras
A Hopfield Network for the Portfolio Selection Problem
Alberto Fernández and Sergio Gómez
Neural Networks for Estimating the Efficiency of a WWTP Biologic Treatment
Frédérik Thiery, Stéphane Grieu, Adama Traore, Maxime Estaben and Monique Polit
Learning Human-Level AI Abilities to Drive Racing Cars
Francisco Gallego, Faraón Llorens, Mar Pujol and Ramón Rizo
Feature Selection and Outliers Detection with Genetic Algorithms and Neural Networks
Agusti Solanas, Enrique Romero, Sergio Gómez, Josep M. Sopena, Rene Alquézar and Josep Domingo-Ferrer
2. Computer Vision
Multispectral Image Segmentation for Fruit Quality Estimation
Adolfo Martínez-Usó, Filiberto Pla and Pedro García-Sevilla
Detecting Mammographic Abnormalities from Image Registration Results
Robert Martí, David Raba, Caroline Rubin and Reyer Zwiggelaar
On the Usefulness of Supervised Learning for Vessel Border Detection in IntraVascular Imaging
Aura Hernàndez, Debora Gil and Petia Radeva
Image Segmentation Based on Inter-Feature Distance Maps
Susana Álvarez, Xavier Otazu and Maria Vanrell
Staff and Graphical Primitive Segmentation in Old Handwritten Music Scores
Alicia Fornés, Josep Lladós and Gemma Sánchez
Real-Time Face Tracking for Context-Aware Computing
Bogdan Raducanu and Jordi Vitrià
Experimental Study of the Usefulness of External Face Features for Face Classification
Àgata Lapedriza, David Masip and Jordi Vitrià
Angle Images Using Gabor Filters in Cardiac Tagged MRI
Joel Barajas, Jaume Garcia-Barnés, Francesc Carreras, Sandra Pujadas and Petia Radeva
Classifying Natural Objects on Outdoor Scenes
Anna Bosch, Xavier Muñoz, Joan Martí and Arnau Oliver
Mass Segmentation Using a Pattern Matching Approach with a Mutual Information Based Metric
Arnau Oliver, Jordi Freixenet, Joan Martí and Marta Peracaula
Feature Selection with Non-Parametric Mutual Information for Adaboost Learning
Xavier Baró and Jordi Vitrià
3. Applications
OntoMusic: From Scores to Expressive Music Performances
Pere Ferrera and Josep Puyol-Gruart
Knowledge Production and Integration for Diagnosis, Treatment and Prognosis in Medicine
John A. Bohada and David Riaño
Application of Clustering Techniques in a Network Security Testing System
Guiomar Corral, Elisabet Golobardes, Oriol Andreu, Isard Serra, Elisabet Maluquer and Àngel Martínez
Llaüt: Fuzzy Logic Based Selective Bot
Rut Garí, Ricardo Galli, Llorenç Valverde and Juan Fornés
A Hybrid System Combining Self Organizing Maps with Case Based Reasoning in Structural Assessment
L.E. Mujica, J. Vehí and J. Rodellar
Acquiring Unobtrusive Relevance Feedback Through Eye-Tracking in Ambient Recommender Systems
Gustavo González, Beatriz López, Cecilio Angulo and Josep Lluís de la Rosa
Evolution Strategies for DS-CDMA Pseudonoise Sequence Design
Rosa Maria Alsina Pagès, Ester Bernadó Mansilla and Jose Antonio Morán Moreno
Evaluation of Knowledge Bases by Means of Multi-Dimensional OWA Operators
Isabel Aguiló, Javier Martín, Gaspar Mayor and Jaume Suñer
Automatic Discovery of Synonyms and Lexicalizations from the Web
David Sánchez and Antonio Moreno
4. Machine Learning
Imbalanced Training Set Reduction and Feature Selection Through Genetic Optimization
R. Barandela, J.K. Hernández, J.S. Sánchez and F.J. Ferri
A Case-Based Methodology for Feature Weighting Algorithm Recommendation
Héctor Núñez and Miquel Sànchez-Marrè
Comparison of Strategies Based on Evolutionary Computation for the Design of Similarity Functions
A. Fornells Herrera, J. Camps Dausà, E. Golobardes i Ribé and J.M. Garrell i Guiu
Using Symbolic Descriptions to Explain Similarity on CBR
Eva Armengol and Enric Plaza
A Clustering-Based Fuzzy Classifier
Isabela Drummond and Sandra Sandri
Multilingual Question Classification Based on Surface Text Features
E. Bisbal, D. Tomás, L. Moreno, J.L. Vicedo and A. Suárez
5. Reasoning
On Warranted Inference in Possibilistic Defeasible Logic Programming
Carlos Chesñevar, Guillermo Simari, Lluís Godo and Teresa Alsinet
A Discretization Process in Accordance with a Qualitative Ordered Output
Francisco J. Ruiz, Cecilio Angulo, Núria Agell, Xari Rovira, Mónica Sánchez and Francesc Prats
Supervised Fuzzy Control of Dissolved Oxygen in a SBR Pilot Plant
M.F. Teran, J. Colomer, J. Meléndez and J. Colprim
On the Consistency of a Fuzzy C-Means Algorithm for Multisets
Vicenç Torra and Sadaaki Miyamoto
6. Planning and Robotics
A CBR System for Autonomous Robot Navigation
Raquel Ros, Ramon López de Màntaras, Carles Sierra and Josep Lluís Arcos
The Use of a Reasoning Process to Solve the Almost SLAM Challenge at the Robocup Legged League
M. Teresa Escrig Monferrer and Juan Carlos Peris Broch
An Anytime Approach for On-Line Planning
Oscar Sapena and Eva Onaindía
Bug-Based T²: A New Globally Convergent Approach to Reactive Navigation
Javier Antich and Alberto Ortiz
A Heuristic Technique for the Capacity Assessment of Periodic Trains
M. Abril, M.A. Salido, F. Barber, L. Ingolotti, A. Lova and P. Tormos
Development of a Webots Simulator for the Lauron IV Robot
Julio Pacheco and Francesc Benito
A Preliminary Study on the Relaxation of Numeric Features in Planning
Antonio Garrido, Eva Onaindía and Donato Hernández
Multi-Objective Multicast Routing Based on Ant Colony Optimization
Diego Pinto, Benjamín Barán and Ramón Fabregat
Solving the GMM-Model with a MOEA
F. Solano, R. Fabregat, B. Barán, Y. Donoso and J.L. Marzo
7. Multiagent Systems
Auctioning Substitutable Goods
Andrea Giovannucci, Juan A. Rodríguez-Aguilar and Jesús Cerquides
The Agent Reputation and Trust (ART) Testbed Architecture
Karen K. Fullam, Tomas B. Klos, Guillaume Muller, Jordi Sabater-Mir, Zvi Topol, K. Suzanne Barber, Jeffrey Rosenschein and Laurent Vercouter
Towards an Organizational MAS Methodology
Estefania Argente, Vicente Julian, Soledad Valero and Vicente Botti
Modelling the Human Values Scale in Recommender Systems: A First Approach
Javier Guzmán, Gustavo González, Josep L. de la Rosa and José A. Castán
Solving Ceramic Tile Factory Production Programming by MAS
E. Argente, A. Giret, S. Valero, P. Gómez and V. Julian
Integrating Information Sources for Recommender Systems
Silvana Aciar, Josefina López Herrera and Josep Lluis de la Rosa
OntoPathView: A Simple View Definition Language for the Collaborative Development of Ontologies
E. Jimenez, R. Berlanga, I. Sanz, M.J. Aramburu and R. Danger
Author Index
Invited Talks
2QWKH'HVLJQRI%LRLQVSLUHG0XOWLDJHQW6\VWHPV
IRU&RRUGLQDWLRQDQG&RQWURO

3DXO9DOFNHQDHUV
.DWKROLHNH8QLYHUVLWHLW/HXYHQ0HFKDQLFDO(QJLQHHULQJ'HSDUWPHQW'LYLVLRQ30$
%HOJLXP
$EVWUDFW
7KHILUVWSDUWRIWKHSUHVHQWDWLRQVXUYH\VIXQGDPHQWDOLQVLJKWVLQWKHµVFLHQFHRIWKH
DUWLILFLDO¶ DV FRLQHG E\ +HUEHUW 6LPRQ )URP WKHVH LQVLJKWV D QXPEHU RI GHVLJQ
UHTXLUHPHQWVDQGSULQFLSOHVIRUFRPSOH[DGDSWLYHPDQLQYHQWHGV\VWHPVDUHGHULYHG

7KHVHFRQGSDUWGLVFXVVHVDPXOWLDJHQWFRRUGLQDWLRQDQGFRQWUROV\VWHPDQVZHULQJ
WKHDERYHUHTXLUHPHQWVIRUDVSHFLILFDSSOLFDWLRQGRPDLQ7KHV\VWHPGHVLJQLVLQVSLUHG
E\WKHEHKDYLRURIVRFLDOLQVHFWVDQWFRORQLHVDQGLQFOXGHVVRPHµVRFLDOHQJLQHHULQJ¶RI
WKHDJHQWVRFLHW\7KH GLVFXVVLRQIRFXVHVRQWKH VRFLHW\RIWKHDJHQWVLWVDUFKLWHFWXUH
VWUXFWXUHDQGLQWHUDFWLRQPHFKDQLVPVWKHLQWHUQDODUFKLWHFWXUHRIDVLQJOHDJHQWLVQRW
GLVFXVVHG7KLVVDPSOH0$6GHVLJQLVFRPSXWDWLRQDOO\HIILFLHQWDQG JLYHVLWVXVHUIXOO
FRQWURORYHUWKHFRPSXWDWLRQDQGFRPPXQLFDWLRQHIIRUWZKHUHWKHUHVXOWLVµEHVWHIIRUW¶
RQO\7KHVHFRRUGLQDWLRQDQGFRQWUROV\VWHPVKDQGOHDJRLQJFRQFHUQWKH\GRQRWOLPLW
WKHPVHOYHVWRRQHVKRWSUREOHPVROYLQJ
$ERXWWKHVSHDNHU
3DXO9DOFNHQDHUVUHFHLYHGWKHDSSOLHGPDWKHPDWLFVHQJLQHHULQJGHJUHHLQWKH
FRPSXWHUVFLHQFHHQJLQHHULQJGHJUHHLQ DQGWKHPHFKDQLFDOHQJLQHHULQJ3K'
GHJUHHLQDOOIURPWKH.DWKROLHNH8QLYHUVLWHLW/HXYHQ%HOJLXP6LQFHKHLV
ZLWK WKH PHFKDQLFDO HQJLQHHULQJ GHSDUWPHQW GLYLVLRQ 30$ RI WKH .DWKROLHNH
8QLYHUVLWHLW /HXYHQ +LV PDLQ UHVHDUFK LQWHUHVWV DUH LQ GLVWULEXWHG LQWHOOLJHQW
PDQXIDFWXULQJ FRQWURO PXOWLDJHQW FRRUGLQDWLRQ DQG FRQWURO DQG GHVLJQ WKHRU\ IRU
HPHUJHQWV\VWHPV
3DXO9DOFNHQDHUVLVWKHYLFHFKDLURI,)$&7&RQPDQXIDFWXULQJSODQWFRQWURO
+H KDV SXEOLVKHG PRUH WKDQ  SXEOLFDWLRQV LQ WKH GRPDLQ +H LV D PHPEHU RI WKH
VWHHULQJFRPPLWWHHRIWKH,061HWZRUNRI([FHOOHQFHLQZKLFKKHLVFKDLULQJWKH6,*RQ
EHQFKPDUNLQJ RI PDQXIDFWXULQJ FRQWURO V\VWHPV +H FXUUHQWO\ SDUWLFLSDWHV LQ WKH (8
*URZWKSURMHFW03$RQPRGXODUSODQWDUFKLWHFWXUHWKH(8*URZWKSURMHFW0$%(RQ
PXOWLDJHQW EXVLQHVV HQYLURQPHQWV DQG LV WKH GDLO\ FRRUGLQDWRU RI WKH FRQFHUWHG
UHVHDUFKDFWLRQ$J&R±IXQGHGE\WKH.8/HXYHQUHVHDUFKFRXQFLO±RQDJHQWEDVHG
FRRUGLQDWLRQDQGFRQWURO,QWKHUHFHQWSDVW3DXO9DOFNHQDHUVSDUWLFLSDWHGLQWKH,06
SURMHFWRQ +RORQLF0DQXIDFWXULQJ 6\VWHPV DV PHPEHU RI WKH WHFKQLFDOFRRUGLQDWLRQ
ERDUGDQGZDV WKH FRRUGLQDWRU RIWKH ,06 :RUNLQJ *URXSDQG WKH (8(VSULW /75
SURMHFW0$6&$'$RQPXOWLDJHQWPDQXIDFWXULQJFRQWURO
0iVVLPR7LVWDUHOOL7DON

0iVVLPR7LVWDUHOOL
&RPSXWHU9LVLRQ/DERUDWRU\8QLYHUVLWjGHJOL6WXGLGL6DVVDUL
,WDO\

$ERXWWKHVSHDNHU
0DVVLPR7LVWDUHOOLUHFHLYHGWKHGHJUHHLQ(OHFWURQLF(QJLQHHULQJLQIURPWKH
8QLYHUVLW\RI*HQRDDQGWKH3K'LQ&RPSXWHU6FLHQFHDQG(OHFWURQLF(QJLQHHULQJLQ
IURPWKH8QLYHUVLW\RI *HQRD6LQFH KHKDV EHHQLQYROYHGDVFRRUGLQDWRU
SULQFLSDO LQYHVWLJDWRU DQG WDVN PDQDJHU LQ YDULRXV SURMHFWV IXQGHG E\ WKH (XURSHDQ
&RPPXQLW\ $PRQJ WKHP 3 ,08 3 92,/$ ),567 %5 6(&21'
%59$37,'(029$,'DQG/759,56%6
'XULQJ   DQG  KH KDV EHHQ YLVLWLQJ WKH 'HSDUWPHQW RI &RPSXWHU
6FLHQFH7ULQLW\&ROOHJH'XEOLQ,UHODQG,QKHZDVDYLVLWLQJVFLHQWLVWDW7KLQNLQJ
0DFKLQHVDQGWKH0,7&DPEULGJH0DVVDFKXVVHWWV+HLVFXUUHQWO\DVVRFLDWHSURIHVVRU
DWWKH)DFXOW\RI$UFKLWHFWXUHRIWKH8QLYHUVLW\RI6DVVDUL
+LVPDLQUHVHDUFKLQWHUHVWVFRYHUELRORJLFDODQGDUWLILFLDOYLVLRQELRPHWULFVURERWLF
QDYLJDWLRQDQGYLVXRPRWRUFRRUGLQDWLRQ+HLVDXWKRURIPRUHWKDQVFLHQWLILFSDSHUV
LQ FRQIHUHQFHV DQG LQWHUQDWLRQDO MRXUQDOV ,Q  KH ZDV WKH FKDLUPDQ IRU WKH ,QWO
ZRUNVKRSRQ$GYDQFHVLQ)DFLDO,PDJH$QDO\VLVDQG5HFRJQLWLRQ7HFKQRORJ\DQGLQ
IRUWKH,QWOZRUNVKRSRQ³%LRPHWULF$XWKHQWLFDWLRQ´+HZDVDVVRFLDWHHGLWRUIRU
WKHMRXUQDO,PDJHDQG9LVLRQ&RPSXWLQJKHLVFRHGLWRUIRUWKHVSHFLDOLVVXHRI,(((
7UDQVDFWLRQVRQ&LUFXLWVDQG6\VWHPVIRU9LGHR7HFKQRORJ\RQ,PDJHDQG9LGHR%DVHG
%LRPHWULFVDQGLQKHZDVWKHGLUHFWRURIWKH,QWO6XPPHU6FKRRORQ%LRPHWULFV

















1. Neural Networks
Direct Policy Search Reinforcement Learning for Robot Control
Andres El-Fakdi¹, Marc Carreras and Narcís Palomeras
University of Girona, Spain
Abstract.
In this paper, we present Policy Methods as an alternative to Value Methods
for solving Reinforcement Learning problems. The paper proposes a Direct Policy
Search algorithm that uses a Neural Network to represent the control policies.
Details about the algorithm and the update rules are given. The main application of
the proposed algorithm is to implement robot control systems, in which the
generalization problem usually arises. In this paper, we demonstrate the suitability of our
algorithm on an RL benchmark that was specially designed to test the generalization
capability of RL algorithms. Results confirm that policy methods obtain better
results than value methods in these situations.
Keywords. Reinforcement learning, Direct Policy Search and Robot Learning
1. Introduction
A commonly used methodology in robot learning is Reinforcement Learning (RL) [1].
In RL, an agent tries to maximize a scalar evaluation (reward or punishment) obtained
as a result of its interaction with the environment. The goal of an RL system is to find
an optimal policy which maps the state of the environment to an action which in turn
will maximize the accumulated future rewards. Most RL techniques are based on Finite
Markov Decision Processes (FMDP), implying finite state and action spaces. The main
advantage of RL is that it does not use any knowledge database, so the learner is not
told what to do, as occurs in most forms of machine learning, but instead must discover
which actions yield the most reward by trying them. Therefore, this class of learning is
suitable for online robot learning. The main disadvantages are a long convergence time and the
lack of generalization among continuous variables.
¹ Correspondence to: Andres El-Fakdi, Edifici PIV, Campus Montilivi, Universitat de Girona, 17071 Girona,
Spain. Tel.: +34 972 419 871; Fax: +34 972 418 259; E-mail:
The dominant approach for solving the RL problem has been the use of a value
function but, although it has been shown to work well in many applications, it has
several limitations. If the state space is not completely observable (POMDP), small changes
in the estimated value of an action cause it to be, or not be, selected, and this can
result in convergence problems [2]. Over the past few years, studies have shown that
approximating a policy directly can be easier than working with value functions, and better
results can be obtained [3,4]. Instead of approximating a value function, new
methodologies approximate a policy using an independent continuous function approximator with
its own parameters, trying to maximize the expected reward. Examples of direct policy
methods are the REINFORCE algorithm [5], the direct-gradient algorithm [6] and
certain variants of the actor-critic framework [7]. Policy methods have several advantages
over value-function based methods. A problem for which the policy is easier to
represent should be solved using policy algorithms [4]. Working this way should
decrease the computational complexity and, for learning control systems which
operate in the physical world, the reduction in time consumption would be notable.
Furthermore, learning systems should be designed to explicitly account for the resulting
violations of the Markov property. Studies have shown that stochastic policy-only methods
can obtain better results when working in POMDP than those obtained with
deterministic value-function methods [8]. On the other hand, policy methods learn much more
slowly than RL algorithms using value functions [3] and they typically find only local
optima of the expected reward [9].
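The sensitivity of value-based action selection to small estimation changes can be illustrated with a minimal sketch. It assumes a greedy rule over two hypothetical value estimates and, for contrast, a softmax over the same numbers treated as action preferences; the function names and the numbers are illustrative only and are not taken from the paper.

```python
import math

def greedy_action(values):
    """Index of the highest estimated action value (deterministic selection)."""
    return max(range(len(values)), key=lambda a: values[a])

def softmax_probs(prefs):
    """Action probabilities from a softmax over action preferences."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

before = [1.00, 1.01]  # hypothetical estimates for two actions
after = [1.01, 1.00]   # the same estimates after a tiny change

# The greedy choice flips from one action to the other...
print(greedy_action(before), greedy_action(after))
# ...while the stochastic policy's probabilities barely move.
print(softmax_probs(before), softmax_probs(after))
```

In this sketch the selected action changes abruptly under the greedy rule, whereas the action probabilities shift by well under one percentage point.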
We propose the use of an online Direct Policy Search (DPS) algorithm, based on
Baxter and Bartlett’s direct-gradient algorithm OLPOMDP [10], for its application in the
control system of a real system, such as a robot. This algorithm has the goal of learning a
state/action mapping that will be applied in the control system. The policy is represented
by a neural network whose input is a representation of the state, whose output is
action selection probabilities, and whose weights are the policy parameters. The proposed
method is based on stochastic gradient descent with respect to the policy parameter
space; it does not need a model of the environment to be given, and it is incremental,
requiring only a constant amount of computation per step. The objective of the agent is to
compute a stochastic policy [8], which assigns a probability to each action.
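A policy of this kind can be sketched as a small feed-forward network whose softmax output gives the action selection probabilities. The sketch below is only an illustration under stated assumptions: the single tanh hidden layer, the layer sizes, the weight range and all function names are hypothetical choices, not the architecture used by the authors.

```python
import math
import random

def init_policy(n_inputs, n_hidden, n_actions, rng):
    """Small random weights for a one-hidden-layer network (sizes are illustrative)."""
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_actions)]
    return w1, w2

def action_probabilities(theta, observation):
    """Forward pass: observation -> tanh hidden layer -> softmax over actions."""
    w1, w2 = theta
    x = list(observation) + [1.0]  # append bias input
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    h = hidden + [1.0]             # append bias unit
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    m = max(logits)                # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    """Draw an action index according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
theta = init_policy(n_inputs=2, n_hidden=4, n_actions=3, rng=rng)
probs = action_probabilities(theta, observation=[0.5, -0.2])
print(probs, sample_action(probs, rng))
```

The weights `theta` play the role of the policy parameters, and sampling from `probs` rather than taking the most probable action is what makes the policy stochastic.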
The work presented in this paper is the continuation of a research line on robot
learning using RL, in which a more conventional value-function algorithm was first
investigated [11,12]. The robot task used to test the algorithm was the learning of a
target-following behavior with an underwater robot. This robot task has already been tested
in a simulation environment, obtaining very satisfactory results [13]. In this paper, we
describe our DPS algorithm in detail and evaluate it on an RL benchmark, the
"mountain-car" task, to show the high generalization capability of policy methods.
2. The DPS algorithm
A partially observable Markov decision process (POMDP) consists of a state space S, an
observation space Y and a control space U. For each state i ∈ S there is a deterministic
reward r(i). As mentioned before, the algorithm is designed to work on-line: at every time
step the learner (our robot) will be given an observation of the state and, according to the
policy followed at that moment, it will generate a control action. As a result, the learner
will be driven to another state and will receive a reward associated with this new state. This
reward will allow us to update the controller’s parameters that define the policy followed
at every iteration, resulting in a final policy considered to be optimal or close to optimal.
The algorithm procedure is summarized in Table 1. The schema of the ANN, used to
implement the control policy, can be seen in Figure 1.
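The observe, act, receive reward, update cycle described above can be sketched in a few lines. The sketch assumes the OLPOMDP update as it is commonly stated (an eligibility trace z decayed by a factor beta and incremented by the gradient of the log action probability, and parameters moved by alpha times the reward times z). It substitutes a linear softmax policy for the neural network to keep the gradient short, and it uses a hypothetical two-state toy task; the names `olpomdp_step`, `alpha`, `beta` and the task itself are illustrative assumptions, not the contents of Table 1.

```python
import math
import random

def policy_probs(theta, y):
    """Softmax action probabilities for a linear policy; theta[a] weights observation y."""
    logits = [sum(w * yi for w, yi in zip(row, y)) for row in theta]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def olpomdp_step(theta, z, y, u, reward, alpha, beta):
    """One online update: decay trace z, add grad log pi(u|y), move theta along reward * z."""
    probs = policy_probs(theta, y)  # evaluated once, before theta is modified
    for a in range(len(theta)):
        for k in range(len(y)):
            grad_log = ((1.0 if a == u else 0.0) - probs[a]) * y[k]
            z[a][k] = beta * z[a][k] + grad_log
            theta[a][k] += alpha * reward * z[a][k]

def run(T=2000, alpha=0.05, beta=0.9, seed=0):
    """Toy two-state task (hypothetical): action 1 switches state, reward is 1 in state 1."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # initial parameters
    z = [[0.0, 0.0], [0.0, 0.0]]      # initial trace set to zero
    state = 0                         # initial state
    for _ in range(T):
        y = [1.0, 0.0] if state == 0 else [0.0, 1.0]  # observation of the state
        u = rng.choices([0, 1], weights=policy_probs(theta, y), k=1)[0]
        state = state if u == 0 else 1 - state        # environment transition
        reward = 1.0 if state == 1 else 0.0           # reward of the new state
        olpomdp_step(theta, z, y, u, reward, alpha, beta)
    return theta

theta = run()
print(policy_probs(theta, [1.0, 0.0]), policy_probs(theta, [0.0, 1.0]))
```

Each iteration costs a fixed amount of work regardless of how long the learner has been running, which is the incremental property the text refers to. On a toy task like this one would expect the policy to drift toward switching in state 0 and staying in state 1, but the sketch makes no claim about convergence speed.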
The algorithm works as follows: having initialized the parameter vector $\theta_0$, the
initial state $i_0$ and the gradient $z_0 = 0$, the learning procedure will be iterated $T$ times. At
A. El-Fakdi et al. / Direct Policy Search Reinforcement Learning for Robot Control10

×