
Advanced Information and Knowledge Processing

Yasser Mohammad
Toyoaki Nishida

Data Mining
for Social
Robotics
Toward Autonomously Social Robots


Advanced Information and Knowledge Processing
Series editors
Lakhmi C. Jain
Bournemouth University, Poole, UK, and
University of South Australia, Adelaide, Australia
Xindong Wu
University of Vermont


Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably


combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index.

More information about this series is available from the publisher.

Yasser Mohammad Toyoaki Nishida


Data Mining for Social
Robotics
Toward Autonomously Social Robots



Yasser Mohammad
Department of Electrical Engineering
Assiut University
Assiut
Egypt

Toyoaki Nishida
Department of Intelligence Science
and Technology
Kyoto University

Kyoto
Japan

ISSN 1610-3947
ISSN 2197-8441 (electronic)
Advanced Information and Knowledge Processing
ISBN 978-3-319-25230-8
ISBN 978-3-319-25232-2 (eBook)
DOI 10.1007/978-3-319-25232-2
Library of Congress Control Number: 2015958552
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by SpringerNature
The registered company is Springer International Publishing AG Switzerland


Preface


Robots are here!
Service robots are beginning to live with us and to occupy the same social space we
live in. These robots should be able to understand humans' natural interactive
behavior and to respond to it correctly. To do that, they need to learn from their
interactions with humans. Considering the exceptional cognitive abilities of Homo
sapiens, two features immediately pop up, namely, autonomy and sociality.
Autonomy is what we consider when we think of a human's ability to play chess,
think about the origin of the universe, plan for hunts or investments, build a robust,
stable perception of her environment, etc. This was the feature that most inspired the
early work in AI with its focus on computation and deliberative techniques. It was
also the driving force behind more recent advances that returned the interactive
nature of autonomy to the spotlight including reactive robotics, behavioral robotics,
and the more recent interest in embodiment.
Sociality, or the ability to act appropriately in the social domain, is another easily
discerned feature of human intelligence. Even playing chess has a social component,
for if there were no social environment it would be hard to imagine a single autonomous
agent coming up with this two-player game. Humans do not only occupy physical
space but also occupy a social space that shapes them while they shape it.
Interactions between agents in this social space can be considered as efficient
utilization of natural interaction protocols which can be roughly defined as a kind of
multi-scale synchrony between interaction partners.
The interplay between autonomy and sociality is a major theoretical and practical concern for modern social robotics. Robots are expected to be autonomous
enough to justify their treatment as something different from an automobile and
they should be socially interactive enough to occupy a place in our humanly
constructed social space. Robotics researchers usually focus on one of these two
aspects but we believe that a breakthrough in the field is expected only when the
interplay between these two factors is understood and leveraged.
This is where data mining techniques (especially time-series analysis methods)
come into the picture. Using algorithms like change point discovery, motif
discovery, and causality analysis, future social robots will be able to make sense of
what they see humans do, and using techniques developed for programming by
demonstration they may be able to autonomously socialize with us.
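To give an intuition for the first of these techniques, the following toy sketch (hypothetical illustration, not an algorithm from this book, which covers principled methods in Chapter 3) scores each point of a time series by the difference between the means of the windows just before and just after it; a peak in the score suggests a change in the generating process:

```python
import random

def change_score(series, window):
    """Score each point by the absolute difference between the mean of
    the window before it and the mean of the window after it.  Larger
    scores suggest a change in the generating process near that point."""
    scores = [0.0] * len(series)
    for t in range(window, len(series) - window):
        before = series[t - window:t]
        after = series[t:t + window]
        scores[t] = abs(sum(after) / window - sum(before) / window)
    return scores

# Toy series: a shift in mean from 0.0 to 2.0 at t = 100.
random.seed(0)
series = [random.gauss(0.0, 0.3) for _ in range(100)] + \
         [random.gauss(2.0, 0.3) for _ in range(100)]

scores = change_score(series, window=20)
estimated_cp = max(range(len(scores)), key=scores.__getitem__)
print(estimated_cp)  # close to the true change point at t = 100
```

Real change point discovery methods replace the simple mean-difference score with statistically grounded divergence measures, but the sliding two-window structure is the same.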
This book tries to bridge the gap between autonomy and sociality by reporting
our efforts to design and evaluate a novel control architecture for autonomous,
interactive robots and agents that allows the robot/agent to learn natural social
interaction protocols (both implicit and explicit) autonomously using unsupervised
machine learning and data mining techniques. This shows how autonomy can
enhance sociality. The book also reports our efforts to utilize the social interactivity
of the robot to enhance its autonomy using a novel fluid imitation approach.
The book consists of two parts with different (yet complementary) emphases that
introduce the reader to this exciting new field at the intersection of robotics, psychology, human–machine interaction, and data mining.
One goal that we tried to achieve in writing this book was to provide a
self-contained work that can be used by practitioners in our three fields of interest
(data mining, robotics, and human–machine interaction). For this reason we strove
to provide all necessary details of the algorithms used and the experiments reported
not only to ease reproduction of results but also to provide readers from these three
widely separated fields with the essential and necessary knowledge of the other
fields required to appreciate the work and reuse it in their own research and
creations.
Assiut, Egypt
Kyoto, Japan
October 2015


Yasser Mohammad
Toyoaki Nishida


Contents

1 Introduction . . . 1
   1.1 Motivation . . . 1
   1.2 General Overview . . . 5
   1.3 Relation to Different Research Fields . . . 7
      1.3.1 Interaction Studies . . . 7
      1.3.2 Robotics . . . 9
      1.3.3 Neuroscience and Experimental Psychology . . . 11
      1.3.4 Machine Learning and Data Mining . . . 11
      1.3.5 Contributions . . . 11
   1.4 Interaction Scenarios . . . 12
   1.5 Nonverbal Communication in Human–Human Interactions . . . 14
   1.6 Nonverbal Communication in Human–Robot Interactions . . . 17
      1.6.1 Appearance . . . 18
      1.6.2 Gesture Interfaces . . . 18
      1.6.3 Spontaneous Nonverbal Behavior . . . 19
   1.7 Behavioral Robotic Architectures . . . 21
      1.7.1 Reactive Architectures . . . 21
      1.7.2 Hybrid Architectures . . . 22
      1.7.3 HRI Specific Architectures . . . 23
   1.8 Learning from Demonstrations . . . 24
   1.9 Book Organization . . . 26
   1.10 Supporting Site . . . 27
   1.11 Summary . . . 28
   References . . . 28

Part I Time Series Mining

2 Mining Time-Series Data . . . 35
   2.1 Basic Definitions . . . 35
   2.2 Models of Time-Series Generating Processes . . . 36
      2.2.1 Linear Additive Time-Series Model . . . 36
      2.2.2 Random Walk . . . 37
      2.2.3 Moving Average Processes . . . 38
      2.2.4 Auto-Regressive Processes . . . 40
      2.2.5 ARMA and ARIMA Processes . . . 40
      2.2.6 State-Space Generation . . . 41
      2.2.7 Markov Chains . . . 42
      2.2.8 Hidden Markov Models . . . 43
      2.2.9 Gaussian Mixture Models . . . 45
      2.2.10 Gaussian Processes . . . 47
   2.3 Representation and Transformations . . . 50
      2.3.1 Piecewise Aggregate Approximation . . . 51
      2.3.2 Symbolic Aggregate Approximation . . . 52
      2.3.3 Discrete Fourier Transform . . . 54
      2.3.4 Discrete Wavelet Transform . . . 55
      2.3.5 Singular Spectrum Analysis . . . 56
   2.4 Learning Time-Series Models from Data . . . 67
      2.4.1 Learning an AR Process . . . 67
      2.4.2 Learning an ARMA Process . . . 70
      2.4.3 Learning a Hidden Markov Model . . . 73
      2.4.4 Learning a Gaussian Mixture Model . . . 76
      2.4.5 Model Selection Problem . . . 77
   2.5 Time Series Preprocessing . . . 77
      2.5.1 Smoothing . . . 77
      2.5.2 Thinning . . . 78
      2.5.3 Normalization . . . 78
      2.5.4 De-Trending . . . 79
      2.5.5 Dimensionality Reduction . . . 80
      2.5.6 Dynamic Time Warping . . . 81
   2.6 Summary . . . 82
   References . . . 83

3 Change Point Discovery . . . 85
   3.1 Approaches to CP Discovery . . . 86
   3.2 Markov Process CP Approach . . . 87
   3.3 Two Models Approach . . . 90
   3.4 Change in Stochastic Processes . . . 93
   3.5 Singular Spectrum Analysis Based Methods . . . 94
      3.5.1 Alternative SSA CPD Methods . . . 98
   3.6 Change Localization . . . 98
   3.7 Comparing CPD Algorithms . . . 99
      3.7.1 Confusion Matrix Measures . . . 100
      3.7.2 Divergence Measures . . . 101
      3.7.3 Equal Sampling Rate . . . 104
   3.8 CPD for Measuring Naturalness in HRI . . . 105
   3.9 Summary . . . 107
   References . . . 107

4 Motif Discovery . . . 109
   4.1 Motif Discovery Problem(s) . . . 109
   4.2 Motif Discovery in Discrete Sequences . . . 110
      4.2.1 Projections Algorithm . . . 114
      4.2.2 GEMODA Algorithm . . . 115
   4.3 Discretization Algorithms . . . 118
      4.3.1 MDL Extended Motif Discovery . . . 120
   4.4 Exact Motif Discovery . . . 124
      4.4.1 MK Algorithm . . . 125
      4.4.2 MK+ Algorithm . . . 127
      4.4.3 MK++ Algorithm . . . 129
      4.4.4 Motif Discovery Using Scale Normalized Distance Function (MN) . . . 131
   4.5 Stochastic Motif Discovery . . . 134
      4.5.1 Catalano's Algorithm . . . 134
   4.6 Constrained Motif Discovery . . . 136
      4.6.1 MCFull and MCInc . . . 136
      4.6.2 Real-Valued GEMODA . . . 138
      4.6.3 Greedy Motif Extension . . . 138
      4.6.4 Shift-Density Constrained Motif Discovery . . . 140
   4.7 Comparing Motif Discovery Algorithms . . . 143
   4.8 Real World Applications . . . 144
      4.8.1 Gesture Discovery from Accelerometer Data . . . 144
      4.8.2 Differential Drive Motion Pattern Discovery . . . 145
      4.8.3 Basic Motions Discovery from Skeletal Tracking Data . . . 145
   4.9 Summary . . . 146
   References . . . 147

5 Causality Analysis . . . 149
   5.1 Causality Discovery . . . 150
   5.2 Correlation and Causation . . . 150
   5.3 Granger-Causality and Its Extensions . . . 151
   5.4 Convergent Cross Mapping . . . 153
   5.5 Change Causality . . . 162
   5.6 Application to Guided Navigation . . . 165
      5.6.1 Robot Guided Navigation . . . 165
   5.7 Summary . . . 166
   References . . . 166

Part II Autonomously Social Robots

6 Introduction to Social Robotics . . . 171
   6.1 Engineering Social Robots . . . 171
   6.2 Human Social Response to Robots . . . 174
   6.3 Social Robot Architectures . . . 177
      6.3.1 C4 Cognitive Architecture . . . 177
      6.3.2 Situated Modules . . . 181
      6.3.3 HAMMER . . . 185
   6.4 Summary . . . 190
   References . . . 190

7 Imitation and Social Robotics . . . 193
   7.1 What Is Imitation? . . . 193
   7.2 Imitation in Animals and Humans . . . 196
   7.3 Social Aspects of Imitation in Robotics . . . 200
      7.3.1 Imitation for Bootstrapping Social Understanding . . . 201
      7.3.2 Back Imitation for Improving Perceived Skill . . . 202
   7.4 Summary . . . 204
   References . . . 204

8 Theoretical Foundations . . . 207
   8.1 Autonomy, Sociality and Embodiment . . . 207
   8.2 Theory of Mind . . . 211
   8.3 Intention Modeling . . . 218
      8.3.1 Traditional Intention Modeling . . . 219
      8.3.2 Intention in Psychology . . . 221
      8.3.3 Challenges for the Theory of Intention . . . 222
      8.3.4 The Proposed Model of Intention . . . 223
   8.4 Guiding Principles . . . 224
   8.5 Summary . . . 225
   References . . . 225

9 The Embodied Interactive Control Architecture . . . 229
   9.1 Motivation . . . 229
   9.2 The Platform . . . 230
   9.3 Key Features of EICA . . . 233
   9.4 Action Integration . . . 234
      9.4.1 Behavior Level Integration . . . 236
      9.4.2 Action Level Integration . . . 236
   9.5 Designing for EICA . . . 237
   9.6 Learning Using FPGA . . . 238
   9.7 Application to Explanation Scenario . . . 240
      9.7.1 Fixed Structure Gaze Controller . . . 241
   9.8 Application to Collaborative Navigation . . . 242
   9.9 Summary . . . 243
   References . . . 243

10 Interacting Naturally . . . 245
   10.1 Main Insights . . . 245
   10.2 EICA Components . . . 247
   10.3 Down–Up–Down Behavior Generation (DUD) . . . 249
   10.4 Mirror Training (MT) . . . 252
   10.5 Summary . . . 253
   References . . . 253

11 Interaction Learning Through Imitation . . . 255
   11.1 Stage 1: Interaction Babbling . . . 255
      11.1.1 Learning Intentions . . . 256
      11.1.2 Controller Generation . . . 257
   11.2 Stage 2: Interaction Structure Learning . . . 259
      11.2.1 Single-Layer Interaction Structure Learner . . . 259
      11.2.2 Interaction Rule Induction . . . 261
      11.2.3 Deep Interaction Structure Learner . . . 264
   11.3 Stage 3: Adaptation During Interaction . . . 266
      11.3.1 Single-Layer Interaction Adaptation Algorithm . . . 266
      11.3.2 Deep Interaction Adaptation Algorithm . . . 268
   11.4 Applications . . . 269
      11.4.1 Explanation Scenario . . . 270
      11.4.2 Guided Navigation Scenario . . . 271
   11.5 Summary . . . 272
   References . . . 272

12 Fluid Imitation . . . 275
   12.1 Introduction . . . 276
   12.2 Example Scenarios . . . 278
   12.3 The Fluid Imitation Engine (FIE) . . . 279
   12.4 Perspective Taking . . . 280
      12.4.1 Transforming Environmental State . . . 280
      12.4.2 Calculating Correspondence Mapping . . . 282
   12.5 Significance Estimator . . . 286
   12.6 Self Initiation Engine . . . 288
   12.7 Application to the Navigation Scenario . . . 288
   12.8 Summary . . . 290
   References . . . 290

13 Learning from Demonstration . . . 293
   13.1 Early Approaches . . . 294
   13.2 Optimal Demonstration Methods . . . 295
      13.2.1 Inverse Optimal Control . . . 295
      13.2.2 Inverse Reinforcement Learning . . . 299
      13.2.3 Dynamic Movement Primitives . . . 302
   13.3 Statistical Methods . . . 307
      13.3.1 Hidden Markov Models . . . 307
      13.3.2 GMM/GMR . . . 307
   13.4 Symbolization Approaches . . . 313
   13.5 Summary . . . 316
   References . . . 316

14 Conclusion . . . 319

Index . . . 325


Chapter 1

Introduction

How can we create a social robot that people not only operate but also relate to? This book
is an attempt to answer that question, and it advocates the same approach
used by infants to grow into social beings: developing natural interaction capacity
autonomously. We will try to flesh out this answer by providing a computational
framework for the autonomous development of social behavior based on data mining
techniques.
The focus of this book is on how to utilize the link between autonomy and sociality in order to improve both capacities in service robots and other kinds of embodied
agents. We will focus mostly on robots, but the techniques developed are applicable
to other kinds of embodied agents as well. The book reports our efforts to enhance the
sociality of robots through autonomous learning of natural interaction protocols, as
well as to enhance their autonomy through imitation learning in a natural environment (what we call fluid imitation). The treatment is not symmetric: most of the book
focuses on the first of these two directions because it is the less studied of the two in
the literature, as will be shown later in this chapter. This chapter presents the motivation
of our work and provides a road map of the research reported in the rest of the book.

1.1 Motivation
Children are amazing learners. Within a few years, normal children succeed in learning
skills that are beyond what any available robot can currently achieve. One of the key
reasons for this superiority of child learning over any available robotic learning
system, we believe, is that the learning mechanisms of the child evolved over
millions of years to suit the environment in which learning takes place. This match
between the learning mechanism and the learned skill is very difficult to engineer,
as it is related to historical embodiment (Ziemke 2003), which means that the agent
and its environment undergo some form of joint evolution through their interaction.
It is our position that a breakthrough in robotic learning can occur once robots can
get a similar chance to co-evolve their learning mechanisms with the environments
in which they are embodied.
Another key reason for this superiority is the existence of the caregiver. The caregiver helps the child in all stages of learning and at the same time provides the fail-safe
mechanism that allows the child to make the mistakes that are necessary for learning. The
care-giver cannot succeed in this job without being able to communicate/interact with
the child using appropriate interaction modalities. During the first months and years
of life, the child is unable to use verbal communication and in this case nonverbal
communication is the only modality available for the care-giver. This importance
of nonverbal communication in this key period in the development of any human
being is a strong motivation to study this form of communication and ways to endow
robots and other embodied agents with it. Even during adulthood, human beings still
use nonverbal communication continuously either consciously or unconsciously as
researchers estimate that over 70 % of human communication is nonverbal (Argyle
2001). It is our position here that robots need to engage in nonverbal communication
using natural means with their human partners in order for them to learn the most
from these interactions as well as to be more acceptable as partners (not just tools)
in the social environment.
Research in learning from demonstration (imitation) (Billard and Siegwart 2004)
can be considered as an effort to provide a care-giver-like partner for the robot to help
in teaching it basic skills or to build complex behaviors from already learned basic
skills. In this type of learning, the human partner shows the robot how to execute
some task, then the robot watches and learns a model of the task and starts executing
it. In some systems the partner can also verbally guide the robot (Rybski et al. 2007),
correct robot mistakes (Iba et al. 2005), use active learning by guiding the robot's limbs to
do the task (Calinon and Billard 2007), etc. In most cases the focus of the research is
on the task itself, not the interaction that is going on between the robot and the teacher.

A natural question, then, is: who taught the robot this interaction protocol? In nearly
all cases, the interaction protocol is fixed by the designer. This does not allow the
robot to learn how to interact which is a vital skill for robot’s survival (as it helps
learning from others) and acceptance (as it increases its social competence).
Teaching robots interaction skills (especially nonverbal interaction skills) is more
complex than teaching them other object-related skills because of the inherent ambiguity of nonverbal behavior; its dependency on social context, culture, and personal
traits; and the sensitivity of nonverbal behavior to slight modifications of behavior
execution. Another reason for this difficulty is that learning from a teacher requires
an interaction protocol, so how can we teach robots the interaction protocol itself?
One major goal of this book is to overcome these difficulties and develop a computational framework that allows the robot to learn how to interact using nonverbal
communication protocols from human partners.
Considering learning from demonstration (imitation learning) again, most work in
the literature focuses on how to do the imitation and how to solve the correspondence
problem (difference in form factor between the imitatee and the imitator) (Billard
and Siegwart 2004) but rarely on the question of what to imitate from the continuous
stream of behaviors that other agents in the environment are constantly executing.



Based on our proposed deep link between sociality and autonomy we propose to
integrate imitation more within the normal functioning of the robot/agent by allowing
it to discover for itself interesting behavioral patterns to imitate, best times to do the
imitation and best ways to utilize feedback using natural social cues. This completes
the cycle of autonomy–sociality relation and is discussed toward the end of this book
(Chap. 12).
The proposed approach for achieving autonomous sociality can be summarized
as autonomous development of natural interactive behavior for robots and embodied
agents. This section will try to unwrap this description by giving an intuitive sense
of each of the terms involved in it.
The word “robot” was introduced into English by the Czech playwright, novelist and
journalist Karel Capek (1890–1938) in his 1920 hit play, R.U.R.,
or Rossum’s Universal Robots. The root of the word is an old Church Slavonic word,
robota, for servitude, forced labor or drudgery. This may bring to mind the vision of
an industrial robot in a factory content to forever do what it was programmed to do.
Nevertheless, this is not the sense by which Capek used the word in his play. R.U.R.
tells the story of a company using the latest science to mass produce workers who
lack nothing but a soul. The robots perform all the work that humans preferred not to
do and, soon, the company is inundated with orders. At the end, the robots revolt, kill
most of the humans only to find that they do not know how to produce new robots.
In the end, there is a deux ex machina moment, when two robots somehow acquire
the human traits of love and compassion and go off into the sunset to make the world
anew.
A brilliant insight of Čapek in this play is his understanding that working machines without the social sense of humans cannot enter our social life. They can be dangerous and can only be redeemed by acquiring some sense of sociality. As technology has advanced, robots are moving out of the factories and into our lives. Robots now provide services for the elderly, work in our offices and hospitals, and are starting to live in our homes. This makes it ever more important for robots to become social.
The word “behave” has two interrelated meanings. Sometimes it is used to point
to autonomous behavior or achieving tasks, yet in other cases it is used to stress
being polite or social. This double meaning is one of the main themes connecting
the technical pieces of our research: autonomy and sociality are interrelated. Truly
social robots cannot be but truly autonomous robots.
An agent X is said to be autonomous from an entity Y toward a goal G if and only
if X has the power to achieve G without needing help from Y (Castelfranchi and
Falcone 2004). The first feature of our envisioned social robot is that it is autonomous
from its designer toward learning and executing natural interaction protocols. The exact notion of autonomy and its incorporation in the proposed system are discussed
in Chap. 8. For now, it will be enough to use the aforementioned simple definition of
autonomy.
The term “developmental” is used here to describe processes and mechanisms related to progressive learning during an individual’s life (Cicchetti and Tucker 1994). This progressive learning usually unfolds into distinctive stages. The proposed system is developmental in the sense that it provides clearly distinguishable stages of progression in learning that cover the robot’s—or agent’s—life. The system is also
developmental in the sense that its first stages require watching the behaviors of developed agents (e.g. humans or other robots) without the ability to engage in these interactions, while the final stage requires actual engagement in interactions to achieve any progress in learning. This situation is similar to the development of interaction skills in children (Breazeal et al. 2005a).
An interaction protocol is defined here as multi-layered synchrony in behavior
between interaction partners (Mohammad and Nishida 2009). Interaction protocols
can be explicit (e.g. verbal communication, sign language etc.) or implicit (e.g. rules
for turn taking and gaze control). Interaction protocols are in general multi-layered
in the sense that the synchrony needs to be sustained at multiple levels (e.g. body
alignment at the lowest level and verbal turn taking at a higher level). Special cases of single-layer interaction protocols certainly exist (e.g. human–computer interaction through text commands and printouts), but they are too simple to benefit from the techniques described in this work.
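To make the notion of a multi-layered protocol concrete, the sketch below represents a protocol as a stack of synchrony predicates that must all hold simultaneously. The layer names, state encoding, and rules are our own illustrative assumptions, not the implementation described later in this book:

```python
# Hypothetical sketch: an interaction protocol as layered synchrony rules.
# Layer names and predicates are illustrative assumptions only.

class ProtocolLayer:
    def __init__(self, name, in_sync):
        self.name = name
        self.in_sync = in_sync  # predicate over (own_state, partner_state)

class InteractionProtocol:
    """Multi-layered protocol: synchrony must hold at every layer."""
    def __init__(self, layers):
        self.layers = layers

    def synchronized(self, own, partner):
        return all(layer.in_sync(own, partner) for layer in self.layers)

protocol = InteractionProtocol([
    ProtocolLayer("body-alignment", lambda a, b: a["facing"] == b["facing"]),
    ProtocolLayer("turn-taking",    lambda a, b: a["speaking"] != b["speaking"]),
])

own     = {"facing": "partner", "speaking": True}
partner = {"facing": "partner", "speaking": False}
print(protocol.synchronized(own, partner))  # True: both layers in sync
```

Encoded this way, breaking synchrony at any single layer (e.g. looking away while turn taking is intact) breaks the protocol as a whole, which is exactly the multi-layered requirement stated above.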
Interaction Protocols can have a continuous range of naturalness depending on
how well they satisfy the following two properties:
1. A natural interaction protocol minimizes negative emotions of the partners compared with any other interaction protocol. Negative emotions here include stress,
high cognitive loads, frustration, etc.

2. A natural interaction protocol follows the social norms of interaction usually
utilized in human–human interactions within the appropriate culture and context
leading to a state of mutual-intention.
The first feature stresses the psychological aspect associated with naturalness
(Nishida et al. 2014 provides detailed discussion). The second one emphasizes the
social aspect of naturalness and will be discussed in Chap. 8.
Targets of this work are robots and embodied agents. A robot is usually defined
as a computing system with sensors and actuators that can affect the real world
(Brooks 1986). An embodied agent is defined here as an agent that is historically
embodied in its environment (as defined in Chap. 8) and equipped with sensors and
actuators that can affect this environment. Embodied Conversational Agents (ECA)
(Cassell et al. 2000) can fulfill this definition if their capabilities are grounded in the
virtual environments they live within. Some of the computational algorithms used
for learning and pattern analysis that will be presented in this work are of general
nature and can be used as general tools for machine learning; nevertheless, the whole architecture relies heavily on the agent’s ability to sense and change its environment
and partners (as part of this environment) and so it is not designed to be directly
applicable to agents that cannot satisfy these conditions. Also the architecture is
designed to allow the agent to develop (in the sense described earlier in this section)
in its own environment and interaction contexts which facilitates achieving historical
embodiment.
Putting things together, we can expand the goal of this book in two steps: from
autonomously social robots to autonomous development of natural interaction protocols for robots and embodied agents, which in turn can be summarized as: design and evaluation of systems that allow robots and other embodied agents to progressively
acquire and utilize a grounded multi-layered representation of socially accepted synchronizing behaviors required for human-like interaction capacity that can reduce
the stress levels of their human partners and can achieve a state of mutual intention.
The robot (or embodied agent) learns these behaviors (protocols) independent of its
own designer in the sense that it uses only unsupervised learning techniques for all
its developmental stages and it develops its own computational processes and their
connections as part of this learning process.
This analysis of our goal reveals some aspects of the concepts social and
autonomous as used in this book. Even though these terms will be discussed in much more detail later (Chaps. 6 and 8), we will try to give an intuitive description
of both concepts here.
In robotics and AI research, the term social is associated with behaving according to some rules accepted by the group, and the concept is in many instances related more to the sociality of insects and other social animals than to the sense of natural interaction with humans highlighted earlier in this section. In this book, on the other hand, we
focus explicitly on sociality as the ability to interact naturally with human beings
leading to more fluid interaction. This means that we are not interested in robots
that can coordinate between themselves to achieve goals (e.g. swarm robotics) or
that are operated by humans through traditional computer mediated interfaces like
keyboards, joysticks or similar techniques.
We are not much interested in tele-operated robots, even though the research presented here can be of value for such robots, mainly because these robots lack—in most cases—the needed sense of autonomy. The highest achievement of a tele-operated robot is usually to disappear altogether, giving the operating human the sense of being in direct contact with the robot’s environment and allowing her to control this environment without much cognitive effort. The social robots we envision in this book are not meant to disappear and be taken for granted; rather, we want them to become salient features of the social environment of their human partners. These two goals are not merely different but opposite. This sense of sociality as the ability to interact
naturally is pervasive in this book. Chapter 6 delves more into this concept.

1.2 General Overview
The core part of this book (Part II) represents our efforts to realize a computational framework within which robots can develop social capacities as explained in the previous sections and can use these capacities for enhancing their task competence. This
work can be divided into three distinct—but inter-related—phases of research. The
core research aimed at developing a robotic architecture that can achieve autonomous
development of interactive behavior and providing proof-of-concept experiments to
support this claim (Chaps. 9–11 in this book). The second phase was real-world applications of the proposed system to two main scenarios (Sect. 1.4). The third phase


6

1 Introduction

[Figure 1.1 here: two interaction partners, each with basic interactive behaviors (Nod, Look@…, Confirm, …) learned through data mining; their intentions are coupled through a simulation-based ToM, and the interaction protocol itself is learned through imitation]
Fig. 1.1 The concept of interaction protocols as coupling between the intentions of different partners implemented through a simulation-based theory of mind

focused on the fluid imitation engine (Chap. 12) which tries to augment learning
from demonstration techniques (Chap. 13) with a more natural interaction mode.
The core research was concerned with the development of the Embodied Interactive Control Architecture (EICA). This architecture was designed from the ground
up to support the long term goal of achieving Autonomous Development of Natural
Interactive Behavior (ADNIB). The architecture is based on two main theoretical
hypotheses formulated in accordance with recent research in neuroscience, experimental psychology and robotics. Figure 1.1 shows a conceptual view of our approach.
Social behavior is modeled by a set of interaction protocols that in turn implement
dynamic coupling between the intentions of interaction partners. This coupling is
achieved through simulation of the mental state of the other agent (ToM). ADNIB
is achieved by learning both the basic interactive acts using elementary time-series
mining techniques and higher level protocols using imitation.

The basic platform described in Chap. 9 was used to implement autonomously
social robots through a special architecture that is described in detail in Chap. 10.
Chapter 11 is dedicated to the details of the developmental algorithms used to achieve
autonomous sociality and to case studies of their applications. This developmental
approach passes through three stages:
Interaction Babbling: During this stage, the robot learns the basic interactive acts
related to the interaction type at hand. The details of the algorithms used at this
stage are given in Sect. 11.1.
Interaction Structure Learning: During this stage, the robot uses the basic interactive acts it learned in the previous stage to learn a hierarchy of probabilistic/dynamical systems that implement the interaction protocol at different time
scales and abstraction levels. Details of this algorithm are given in Sect. 11.2.



Interactive Adaptation: During this stage, the robot actually engages in human–
robot interactions to adapt the hierarchical model it learned in the previous stage
to different social situations and partners. Details of this algorithm are given in
Sect. 11.3.
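The three stages above can be sketched as a toy pipeline. Everything here (function names, the use of action labels, the frequency-based “protocol”) is a deliberately simplified stand-in for the actual algorithms of Sects. 11.1–11.3:

```python
# Toy pipeline for the three developmental stages. All names and data shapes
# are simplified illustrations, not the book's actual algorithms.

def interaction_babbling(recordings):
    """Stage 1: discover the basic interactive acts from interaction records."""
    return sorted({act for record in recordings for act in record})

def structure_learning(basic_acts, recordings):
    """Stage 2: learn a toy 'protocol' as the most frequent successor of each act."""
    follows = {}
    for record in recordings:
        for a, b in zip(record, record[1:]):
            if a in basic_acts and b in basic_acts:
                follows.setdefault(a, []).append(b)
    return {a: max(set(bs), key=bs.count) for a, bs in follows.items()}

def interactive_adaptation(protocol, feedback):
    """Stage 3: overwrite transitions that the human partner corrected."""
    return {**protocol, **feedback}

recordings = [["look", "nod", "confirm"],
              ["look", "nod", "confirm"],
              ["look", "nod", "look"]]
acts = interaction_babbling(recordings)        # ['confirm', 'look', 'nod']
protocol = structure_learning(acts, recordings)
adapted = interactive_adaptation(protocol, {"confirm": "look"})
print(protocol)  # {'look': 'nod', 'nod': 'confirm'}
```

The point of the sketch is the division of labor: stage 1 needs only passive records, stage 2 builds structure over the discovered acts, and stage 3 is the only step that requires live interaction.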
The final phase was concerned with enhancing the current state of learning by
demonstration research by allowing the agent to discover interesting patterns of
behavior. This is reported in Chap. 12.

1.3 Relation to Different Research Fields
The work reported in this book relied on the research done in multiple disciplines
and contributed to these disciplines to different degrees. In this section we briefly
describe the relation between different disciplines and this work.
This work involved three main subareas: the development of the architecture itself (EICA), the learning algorithms used to learn the controller in the stages highlighted in the previous section, and the evaluation of the resulting behavior. Each one of these areas was based on results found by many researchers in robotics, interaction studies, psychology, neuroscience, etc.

1.3.1 Interaction Studies
Interaction studies is a wide area of research focusing on the study of human–human
interactions in various contexts including face to face situations. A classical theory
of human–human interaction is the speech act theory according to which utterances
(and other communicative behaviors) are actions performed by rational agents (e.g.
humans) to further their goals and achieve their desires based on their beliefs and
intentions (Austin 1975). In some sense, our work is extending the notion of speech
acts to what we can call interaction acts that represent socially meaningful signals
issued through interactive actions including both verbal and nonverbal behaviors.
The speech act theory involves analysis of utterances at three levels:
1. locutionary act which involves the actual physical activity generating the utterance
and its direct meaning.
2. illocutionary act which involves the intended socially meaningful action that
the act was invoked to provoke. This includes assertion, direction, commission,
expression and declaration.
3. perlocutionary act which encapsulates the actual outcome of the utterance including convincing, inspiring, persuading, etc.
The most important point of the speech-act theory for our purposes is its clear
separation between locutionary acts and illocutionary acts. Notice that the same utterance with the same locutionary act may invoke different illocutionary acts based on the context, shared background of interaction partners, and nonverbal behavior associated with the locutionary act. This entails that understanding the social significance
of verbal behavior depends crucially on the ability of interacting partners to parse the nonverbal aspects of the interaction (which we encapsulate in the interaction protocol) and its context (which selects the appropriate interaction protocol for action
generation and understanding).
Another related theory to our work is the contribution theory of Herbert Clark
(Clark and Brennan 1991) which provides a specific model of communication. The
principle constructs of the contribution theory are the common ground and grounding.
The common ground is a set of beliefs that are held by interaction partners and,
crucially, known by them to be held by other interaction partners. This means that
for a proposition b to be a part of the common ground it must not only be a member
of the belief set of all partners involved in the interaction but a second order belief B
that has the form ‘for all interaction partners P: P believes b’ must also be a member
of the belief set of all partners.
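This definition can be made concrete with a small sketch. The encoding below (plain strings for propositions, tuples for second-order beliefs) is our own illustrative representation, not part of contribution theory itself:

```python
# Illustrative encoding of Clark's common ground: a proposition b is common
# ground when every partner believes b, and every partner also believes that
# every partner believes b. The tuple encoding is an assumption.

def is_common_ground(b, belief_sets):
    """belief_sets maps each partner name to a set of believed propositions."""
    partners = list(belief_sets)
    first_order = all(b in belief_sets[p] for p in partners)
    second_order = all(("believes", q, b) in belief_sets[p]
                       for p in partners for q in partners)
    return first_order and second_order

beliefs = {
    "A": {"b", ("believes", "A", "b"), ("believes", "B", "b")},
    "B": {"b", ("believes", "A", "b"), ("believes", "B", "b")},
}
print(is_common_ground("b", beliefs))  # True

beliefs["B"].discard(("believes", "A", "b"))
print(is_common_ground("b", beliefs))  # False: a second-order belief is missing
```

The second call fails even though both partners still believe b itself, which is exactly the distinction the theory draws between shared belief and common ground.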
Building this common ground is achieved through a process that Clark calls grounding. Grounding is carried out through speech-acts and nonverbal behaviors, organized in a hierarchy of contributions, where each contribution consists of two phases:
Presentation phase: involving the presentation of an utterance U by A to B, expecting B to provide some evidence e that (s)he understood U. By understanding here we mean not only getting the direct meaning but also the underlying speech act, which depends on the context and nonverbal behavior of A.
Acceptance phase: involving B showing evidence e or stronger that it understood U .
Both the presentation and acceptance phases are required to ensure that the content
of utterance U (or the act it represents) is correctly encoded as a common ground for
both partners A and B upon which future interaction can be built.
Clark distinguishes three methods of accepting an utterance in the acceptance
phase:
1. Acknowledgment through back channels including continuers like uh, yeah and
nodding.
2. Initiation of a relevant next turn. A common example is using an answer in the
acceptance phase to accept a question given in the presentation phase. The answer
here does not only involve information about the question asked but also reveals
that B understood what A was asking about. In some cases, not answering a question reveals understanding. The question/answer pattern is an example of a
more general phenomenon called adjacency pairs in which issuing the second
part of the pair implies acceptance of the first part.
3. The simplest form of acceptance is continued attention. Just by not interrupting
or changing attention focus, B can signal acceptance to A. One very ubiquitous
way to achieve this form of attention is joint or mutual gaze which signals without
any words the focus of attention. When the focus of attention is relevant to the
utterance, A can assume that B understood the presented utterance.



It is important to notice that in two of these three methods, nonverbal behavior
is the major component of the acceptance response. Add to this that the utterance
meaning is highly dependent on the accompanying nonverbal behavior and we can
see clearly the importance of the nonverbal interaction protocol in achieving natural
interaction between humans. This suggests that social robots will need to achieve
comparable levels of fluency in nonverbal interaction protocols if they are to succeed
in acting as interaction partners.
The grounding process itself shows another important feature of these interaction
protocols. Acceptance can be achieved through another presentation/acceptance dyad
leading to a hierarchical structure. An example of this hierarchical organization can
be seen in an example due to Clark and Brennan (1991):
Alan: Now, –um, do you and your husband have a j– car?
Barbara: – have a car?
Alan: yeah.
Barbara: no –


The acceptance phase of the first presentation involved a complete presentation of a clarifying question (“have a car?”) and a complete acceptance of this second presentation (“yeah”). The acceptance of the first presentation is not complete until Barbara’s final answer (“no”).
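The nested presentation/acceptance structure of this exchange can be modeled as a recursive data type. The encoding below is an illustrative sketch, not part of contribution theory itself:

```python
# Illustrative sketch of Clark's hierarchical contributions: the acceptance
# of one presentation may itself be a full contribution (a nested dyad).

class Contribution:
    def __init__(self, presentation, acceptance):
        self.presentation = presentation
        self.acceptance = acceptance  # a string, or a nested Contribution

    def grounded(self):
        """A contribution is grounded once its acceptance (recursively) is."""
        if isinstance(self.acceptance, Contribution):
            return self.acceptance.grounded()
        return self.acceptance is not None

# Alan's question is accepted only via a nested clarification exchange.
dialogue = Contribution(
    presentation="Alan: do you and your husband have a car?",
    acceptance=Contribution(
        presentation="Barbara: have a car?",
        acceptance=Contribution(
            presentation="Alan: yeah.",
            acceptance="Barbara: no",
        ),
    ),
)
print(dialogue.grounded())  # True: every nested acceptance is complete
```

An unanswered presentation (acceptance still `None`) leaves the whole chain ungrounded, mirroring the observation that the first presentation is not complete until Barbara's final answer.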
This hierarchical structure of conversation and interaction in general informed the
design of our architecture as will be explained in detail in Chap. 10.
The importance of gaze in the acceptance phase of interactions and signaling
continued attention inspired our work in gaze control as shown by the selection of
interaction scenarios (Sect. 1.4) and applications of our architecture (Sects. 9.7 and
11.4).

1.3.2 Robotics
For the purposes of this chapter, we can define robotics as building and evaluating
physically situated agents. Robotics itself is a multidisciplinary field using results
from artificial intelligence, mechanical engineering, computer science, machine
vision, data mining, automatic control, communications, electronics etc.
This work can be viewed as the development of a novel architecture that supports
a specific kind of Human–Robot Interaction (namely grounded nonverbal communication). It utilizes results of robotics research in the design of the controllers used
to drive the robots in all evaluation experiments. It also utilizes previous research in
robotic architectures (Brooks 1986) and action integration (Perez 2003) as the basis
for the proposed EICA architecture (See Chap. 9).
Several threads of research in robotics contributed to the work reported in this
book. The most obvious of these is research in robotic architectures to support
human–robot interaction and social robotics in general (Sect. 6.3).




This direction of research is as old as robotics itself. Early architectures were
deliberative in nature and focused on allowing the robot to interact with its (usually
unchanging) environment. Limitations of software and hardware reduced the need
for fast response in these early robots with Shakey as the flagship of this generation of
robots. Shakey was developed from approximately 1966 through 1972. The robot’s
environment was limited to a set of rooms and corridors with light switches that could be interacted with.
Shakey’s programming language was LISP and it used STRIPS (Stanford Research
Institute Problem Solver) as its planner. A STRIPS system consists of an initial state,
a goal state and a set of operations that can be performed. Each operation has a set
of preconditions that must be satisfied for the operation to be executable, and a set of
postconditions that are achieved once the operation is executed. A plan in STRIPS
is an ordered list of operations to be executed in order to go from the initial state to
the goal state. Just deciding whether a plan exists is a PSPACE-complete problem.
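As an illustration of the STRIPS formulation, the toy planner below searches breadth-first over operator sequences. The tiny domain (a robot flipping a light switch) and all names are our own simplified assumptions, not Shakey's actual planner:

```python
# A toy STRIPS-style planner: breadth-first search over operator sequences.
# Operators and the light-switch domain are illustrative assumptions.
from collections import deque

def plan(initial, goal, operators):
    """Return an ordered list of operator names leading from initial to goal."""
    frontier = deque([(frozenset(initial), [])])
    seen = {frozenset(initial)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # all goal conditions satisfied
            return steps
        for name, pre, add, delete in operators:
            if pre <= state:                   # preconditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None                                # no plan exists

# Each operator: (name, preconditions, postconditions added, facts deleted).
ops = [
    ("go-to-switch", {"at-door"}, {"at-switch"}, {"at-door"}),
    ("flip-switch", {"at-switch", "light-off"}, {"light-on"}, {"light-off"}),
]
print(plan({"at-door", "light-off"}, {"light-on"}, ops))
# ['go-to-switch', 'flip-switch']
```

Even this toy version shows the state-space blowup: every applicable operator spawns a new state, which is why plan existence is already PSPACE-complete in general.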
Even from this very first example, the importance of goal-directed behavior and the ability to autonomously decide on a course of action is clear. Having people interact with the robot only complicates the problem because of the difficulty of predicting human behavior in general.
Due to the real-time restrictions of modern robots, this deliberative approach was later mostly replaced by reactive architectures that make the sense–act loop much shorter, in the hope of reacting to the environment in a timely manner (Brooks et al. 1998). We believe
that reactive architectures cannot, without extensions, handle the requirements of
contextual decision making hinted at in our discussion of interaction studies because
of the need to represent the interaction protocol and reason about the common ground
with interaction partners.
For these reasons (and others discussed in more details in Chap. 8) we developed
a hybrid robotic architecture that can better handle the complexity of modeling and
executing interaction protocols as well as learning them.
The second thread of robotics research that our system is based upon is the tradition
of intelligent robotics, which focuses on robots that can learn new skills. Advances in this area mirror the changes in robotic architecture from more deliberative to more
reactive approaches followed by several forms of hybrid techniques.
An important approach for robot learning that is gaining more interest from roboticists is learning from demonstration. In the standard learning from demonstration
(LfD) setting, a robot is shown some behavior and is expected to learn how to execute
it. Several approaches to LfD have been proposed over the years starting from inverse
optimal control in the 1990s and the two currently most influential approaches are statistical modeling and Dynamical Motion Primitives (Chap. 13). The work reported
in this book extends this work by introducing a complete system for learning not
only how to imitate a demonstration but for segmenting relevant demonstrations
from continuous input streams in what we call fluid imitation discussed in details in
Chap. 12.



1.3.3 Neuroscience and Experimental Psychology
Neuroscience can be defined as the study of the neural system in humans and animals.
The design of EICA and its top-down, bottom-up action generation mechanism was
inspired in part by some results of neuroscience including the discovery of mirror
neurons (Murata et al. 1997) and their role in understanding the actions of others and
learning the Theory of Mind (See Sect. 8.2).
Experimental Psychology is the experimental study of thought. EICA architecture
is designed based on two theoretical hypotheses. The first of them (intention through
interaction hypothesis) is based on results in experimental psychology especially the
controversial assumption that conscious intention, at least sometimes, follows action
rather than preceding it (See Sect. 8.3).

1.3.4 Machine Learning and Data Mining
Machine learning is defined here as the development and study of algorithms that allow artificial agents (machines) to improve their behavior over time. Unsupervised as well as supervised machine learning techniques were used in various parts of this work to model behavioral patterns and learn interaction protocols. For example, the
Floating Point Genetic Algorithm FPGA (presented in Chap. 9) is based on previous
results in evolutionary computing.
Data mining is defined here as the development of algorithms and systems to
discover knowledge from data. In the first developmental stage (interaction babbling)
we aim at discovering the basic interactive acts from records of interaction data (See
Sect. 11.1.1). Researchers in data mining developed many algorithms to solve this
problem including algorithms for detection of change points in time series as well as
motif discovery in time series. We used these algorithms as the basis for development
of novel algorithms more useful for our task.
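As a baseline illustration of what change point discovery does, the naive detector below scores each time step by the difference in mean between two adjacent sliding windows. This is only a textbook baseline of our own; the algorithms developed in this book (Sect. 3.5) are considerably more sophisticated:

```python
# A naive change-point score (illustrative baseline, not the book's CPD
# algorithms): compare the means of two adjacent sliding windows.

def change_scores(x, w):
    """Score each index by |mean(next w points) - mean(previous w points)|."""
    scores = [0.0] * len(x)
    for t in range(w, len(x) - w):
        before = sum(x[t - w:t]) / w
        after = sum(x[t:t + w]) / w
        scores[t] = abs(after - before)
    return scores

series = [0.0] * 20 + [5.0] * 20   # one abrupt change at index 20
scores = change_scores(series, w=5)
print(scores.index(max(scores)))   # 20: the change point scores highest
```

On noisy real-world interaction records such a mean-difference score produces many spurious peaks, which motivates the higher-specificity CPD algorithms contributed by this work.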

1.3.5 Contributions
The main contribution of this work is to provide a complete framework for developing
nonverbal interactive behavior in robots using unsupervised learning techniques. This
can form the basis for developing future robots that exhibit ever improving interactive
abilities and that can adapt to different cultural conditions and contexts. This work
also contributed to various fields of research:
Robotics: The main contributions to the robotics field are:
• The EICA architecture provides a common platform for implementing both
autonomous and interactive behaviors.



• The proposed learning system can be used to teach robots new behaviors by utilizing natural interaction modalities, which is a step toward general-purpose robots that can be bought and then trained by novice users to do various tasks under human supervision.
Experimental Psychology: This work provides a computational model for testing
theories about intention and theory of mind development.
Machine Learning: A novel Floating Point Genetic Algorithm was developed to
learn the parameters of any parallel system including the platform of EICA (See
Chap. 9). The Interaction Structure Learning algorithms presented in Sect. 11.2
can be used to learn a hierarchy of dynamical systems representing relations
between interacting processes at multiple levels of abstraction.
Data Mining: Section 3.5 introduces several novel change point discovery algorithms that can be shown to provide higher specificity than a traditional CPD algorithm on both synthetic and real-world data sets. This work also defines the constrained motif discovery problem and provides several algorithms for solving it (See Sect. 4.6), as well as algorithms for the discovery of causal relations between variables represented by time-series data (Sect. 5.5).
Interaction Studies: This book reports the development of several gaze controllers
that could achieve human-like gazing behavior based on the approach–avoidance
model as well as autonomous learning of interaction protocols. These controllers
can be used to study the effects of variations in gaze behavior on the interaction
smoothness. Moreover, we studied the effect of mutual and back imitation on the perception of the robot’s imitative skill in Chap. 7.
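As background for the Floating Point Genetic Algorithm mentioned under the machine learning contributions (detailed in Chap. 9), a generic real-coded genetic algorithm can be sketched as follows. All operator choices, parameter values, and the toy fitness function are illustrative assumptions, not the FPGA itself:

```python
# A generic real-coded genetic algorithm (background sketch only; the FPGA
# of Chap. 9 uses its own operators tailored to parallel processes).
import random

def real_ga(fitness, dim, pop_size=30, generations=60, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            w = rng.random()                    # blend (arithmetic) crossover
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            child = [x + rng.gauss(0, sigma) for x in child]  # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Maximize a simple concave function peaked at (0.5, 0.5).
best = real_ga(lambda v: -sum((x - 0.5) ** 2 for x in v), dim=2)
print(best)  # close to [0.5, 0.5]
```

The appeal of a floating-point encoding is visible even in this sketch: crossover and mutation act directly on real-valued parameters, which is what makes such algorithms suitable for tuning the continuous parameters of a parallel control system.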

1.4 Interaction Scenarios
To test the ideas presented in this work we used two interaction scenarios in most of
the experiments done and presented in this book.
The first interaction scenario is presented in Fig. 1.2. In this scenario the participant
is guiding a robot to follow a predefined path (drawn or projected on the ground) using
free hand gestures. There is no predefined set of gestures to use and the participant
can use any hand gestures (s)he finds useful for the task. This scenario is referred to
as the guided navigation scenario in this book.
Optionally, a set of virtual objects is placed at different points of the path; these are not visible to the participant but are sensed by the robot (only when approaching them) using virtual infrared sensors. Using these objects it is possible to adjust the distribution of knowledge about the task between the participant and the robot. If
no virtual objects are present in the path, the participant–robot relation becomes a
master–slave one as the participant knows all the information about the environment
(the path) and the robot can only follow the commands of the participant. If objects are
present in the environment the relation becomes more collaborative as the participant
now has partial knowledge about the environment (the path) while the robot has the

