
LNCS 10122

Panos M. Pardalos · Piero Conca
Giovanni Giuffrida · Giuseppe Nicosia (Eds.)

Machine Learning,
Optimization,
and Big Data
Second International Workshop, MOD 2016
Volterra, Italy, August 26–29, 2016
Revised Selected Papers



Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland


John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany




Editors
Panos M. Pardalos
Department of Industrial and Systems Engineering
University of Florida
Gainesville, FL
USA

Piero Conca
Semantic Technology Laboratory
National Research Council (CNR)
Catania
Italy

Giovanni Giuffrida
Dipartimento di Sociologia e Metodi della Ricerca Sociale
Università di Catania
Catania
Italy

Giuseppe Nicosia
Department of Mathematics and Computer Science
University of Catania
Catania
Italy

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-51468-0
ISBN 978-3-319-51469-7 (eBook)
DOI 10.1007/978-3-319-51469-7
Library of Congress Control Number: 2016961276
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer International Publishing AG 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland


Preface


MOD is an international workshop embracing the fields of machine learning, optimization, and big data. The second edition, MOD 2016, was held on August
26–29, 2016, in Volterra (Pisa, Italy), a stunning medieval town dominating the picturesque countryside of Tuscany.
The key role of machine learning, optimization, and big data in developing solutions
to some of the greatest challenges we are facing is undeniable. MOD 2016 attracted
leading experts from the academic world and industry with the aim of strengthening the
connection between these institutions. The 2016 edition of MOD represented a great
opportunity for professors, scientists, industry experts, and postgraduate students to
learn about recent developments in their own research areas and to learn about research
in contiguous research areas, with the aim of creating an environment to share ideas
and trigger new collaborations.
As program chairs, we were honored to organize a premier workshop in these areas
and to receive a large variety of innovative and original scientific contributions.
During this edition, four plenary lectures were presented:
Nello Cristianini, Bristol University, UK
George Michailidis, University of Florida, USA
Stephen H. Muggleton, Imperial College London, UK
Panos Pardalos, University of Florida, USA
There were also two tutorial speakers:
Luigi Malagò, Shinshu University, Nagano, Japan
Luca Oneto and Davide Anguita, Polytechnic School, University of Genova, Italy
Furthermore, an industrial panel on “Machine Learning, Optimization and Data
Science for Real-World Applications” was also offered:
Amr Awadallah, Founder and CTO at Cloudera, San Francisco, USA
Giovanni Giuffrida, CEO and co-founder at Neodata Group, Italy
Andy Petrella, Data Scientist and co-founder at Data Fellas, Liege, Belgium
Daniele Quercia, Head of Social Dynamics group at Bell Labs, Cambridge, UK
Fabrizio Silvestri, Facebook Inc., USA
Moderator: Donato Malerba, University of Bari, Italy and Consorzio Interuniversitario Nazionale per l’Informatica (CINI)
MOD 2016 received 97 submissions, and each manuscript was independently
reviewed via a blind review process by a committee of at least five members.
These proceedings contain 40 research articles written by leading scientists in the fields
of machine learning, computational optimization, and data science, presenting a substantial array of ideas, technologies, algorithms, methods, and applications.



This conference could not have been organized without the contributions of these
researchers, and we thank them all for participating. Sincere thanks also go to the
Program Committee, formed by more than 300 scientists from academia and industry,
for their valuable work in selecting the scientific contributions.
Finally, we would like to express our appreciation to the keynote speakers, tutorial
speakers, and the industrial panel who accepted our invitation, and to all the authors
who submitted their research papers to MOD 2016.
August 2016

Panos M. Pardalos
Piero Conca
Giovanni Giuffrida
Giuseppe Nicosia


Organization

MOD 2016 Committees
General Chair
Giuseppe Nicosia, University of Catania, Italy

Conference and Technical Program Committee Co-chairs
Panos Pardalos, University of Florida, USA
Piero Conca, University of Catania, Italy
Giovanni Giuffrida, University of Catania, Italy
Giuseppe Nicosia, University of Catania, Italy

Tutorial Chair
Giuseppe Narzisi, New York Genome Center, NY, USA

Industrial Session Chairs
Ilaria Bordino, UniCredit R&D, Italy
Marco Firrincieli, UniCredit R&D, Italy
Fabio Fumarola, UniCredit R&D, Italy
Francesco Gullo, UniCredit R&D, Italy

Organizing Committee
Piero Conca, CNR and University of Catania, Italy
Jole Costanza, Italian Institute of Technology, Milan, Italy
Giuseppe Narzisi, New York Genome Center, USA
Andrea Patané, University of Catania, Italy
Andrea Santoro, University of Catania, Italy
Renato Umeton, Harvard University, USA

Publicity Chair
Giovanni Luca Murabito, DiGi Apps, Italy

Technical Program Committee
Ajith Abraham, Machine Intelligence Research Labs, USA
Andy Adamatzky, University of the West of England, UK
Agostinho Agra, University of Aveiro, Portugal
Hernán Aguirre, Shinshu University, Japan
Nesreen Ahmed, Intel Research Labs, USA
Youhei Akimoto, Shinshu University, Japan
Leman Akoglu, Stony Brook University, USA
Richard Allmendinger, University College London, UK
Paula Amaral, University Nova de Lisboa, Portugal
Ekart Aniko, Aston University, UK
Paolo Arena, University of Catania, Italy
Ashwin Arulselvan, University of Strathclyde, UK
Jason Atkin, University of Nottingham, UK
Martin Atzmueller, University of Kassel, Germany
Chloé-Agathe Azencott, Mines ParisTech Institut Curie, France
Jaume Bacardit, Newcastle University, UK
James Bailey, University of Melbourne, Australia
Baski Balasundaram, Oklahoma State University, USA
Wolfgang Banzhaf, Memorial University, Canada
Helio Barbosa, Laboratório Nacional Computação Científica, Brazil
Thomas Bartz-Beielstein, Cologne University of Applied Sciences, Germany
Simone Bassis, University of Milan, Italy
Christian Bauckhage, Fraunhofer IAIS, Germany
Aurélien Bellet, Télécom ParisTech, France
Gerardo Beni, University of California at Riverside, USA
Tanya Berger-Wolf, University of Illinois at Chicago, USA
Heder Bernardino, Universidade Federal de Juiz de Fora, Brazil
Daniel Berrar, Shibaura Institute of Technology, Japan
Martin Berzins, University of Utah, USA
Rajdeep Bhowmik, Cisco Systems, Inc., USA
Albert Bifet, University of Waikato, New Zealand
Mauro Birattari, Université Libre de Bruxelles, Belgium
J. Blachut, University of Liverpool, UK
Konstantinos Blekas, University of Ioannina, Greece
Maria J. Blesa, Universitat Politècnica de Catalunya, Spain
Christian Blum, Basque Foundation for Science, Spain
Flavia Bonomo, Universidad de Buenos Aires, Argentina
Gianluca Bontempi, Université Libre de Bruxelles, Belgium
Pascal Bouvry, University of Luxembourg, Luxembourg
Larry Bull, University of the West of England, UK
Tadeusz Burczynski, Polish Academy of Sciences, Poland
Róbert Busa-Fekete, University of Paderborn, Germany
Sergio Butenko, Texas A&M University, USA
Stefano Cagnoni, University of Parma, Italy
Mustafa Canim, IBM T.J. Watson Research Center, USA
Luigia Carlucci Aiello, Sapienza Università di Roma, Italy
Tania Cerquitelli, Politecnico di Torino, Italy
Uday Chakraborty, University of Missouri St. Louis, USA
Lijun Chang, University of New South Wales, Australia
W. Art Chaovalitwongse, University of Washington, USA
Ying-Ping Chen, National Chiao Tung University, Taiwan
Keke Chen, Wright State University, USA
Kaifeng Chen, NEC Labs America, USA
Silvia Chiusano, Politecnico di Torino, Italy
Miroslav Chlebik, University of Sussex, UK
Sung-Bae Cho, Yonsei University, South Korea
Siang Yew Chong, University of Nottingham, Malaysia Campus, Malaysia
Philippe Codognet, University of Tokyo, Japan
Pietro Colombo, Università dell’Insubria, Italy
Ernesto Costa, University of Coimbra, Portugal
Jole Costanza, Fondazione Istituto Italiano di Tecnologia, Italy
Maria Daltayanni, University of California Santa Cruz, USA
Raj Das, University of Auckland, New Zealand
Mahashweta Das, Hewlett Packard Labs, USA
Kalyanmoy Deb, Michigan State University, USA
Noel Depalma, Joseph Fourier University, France
Clarisse Dhaenens, University of Lille 1, France
Luigi Di Caro, University of Turin, Italy
Gianni Di Caro, IDSIA, Switzerland
Tom Diethe, University of Bristol, UK
Federico Divina, Pablo de Olavide University, Spain
Stephan Doerfel, University of Kassel, Germany
Karl Doerner, Johannes Kepler University Linz, Austria
Rafal Drezewski, AGH University of Science and Technology, Poland
Ding-Zhu Du, University of Texas at Dallas, USA
George S. Dulikravich, Florida International University, USA
Talbi El-Ghazali, University of Lille, France
Michael Emmerich, Leiden University, The Netherlands
Andries Engelbrecht, University of Pretoria, South Africa
Roberto Esposito, University of Turin, Italy
Cesar Ferri, Universitat Politècnica de València, Spain
Steffen Finck, Vorarlberg University of Applied Sciences, Austria
Jordi Fonollosa, Institute for Bioengineering of Catalonia, Spain
Carlos M. Fonseca, University of Coimbra, Portugal
Giuditta Franco, University of Verona, Italy
Piero Fraternali, Politecnico di Milano, Italy
Valerio Freschi, University of Urbino, Italy
Enrique Frias-Martinez, Telefonica Research, Spain
Marcus Gallagher, University of Queensland, Australia
Patrick Gallinari, Pierre et Marie Curie University, France
Xavier Gandibleux, University of Nantes, France
Amir Hossein Gandomi, The University of Akron, USA
Inmaculada Garcia Fernandez, University of Almeria, Spain
Deon Garrett, Icelandic Institute Intelligent Machine, Iceland
Paolo Garza, Politecnico di Torino, Italy
Martin Josef Geiger, Helmut Schmidt University, Germany
Michel Gendreau, École Polytechnique de Montréal, Canada
Kyriakos Giannakoglou, National Technical University of Athens, Greece
Giovanni Giuffrida, University of Catania, Italy and Neodata Intelligence Inc., Italy
Aris Gkoulalas Divanis, IBM Dublin Research Lab, Ireland
Christian Gogu, Université Toulouse III, France
Michael Granitzer, University of Passau, Germany
Mario Guarracino, ICAR-CNR, Italy
Heiko Hamann, University of Paderborn, Germany
Jin-Kao Hao, University of Angers, France
William Hart, Sandia Labs, USA
Richard F. Hartl, University of Vienna, Austria
Mohammad Hasan, Indiana University Purdue University, USA
Geir Hasle, SINTEF, Norway
Verena Heidrich-Meisner, Extraterrestrial Physics CAU Kiel, Germany
Eligius Hendrix, Universidad de Málaga, Spain
Carlos Henggeler Antunes, University of Coimbra, Portugal
Alfredo G. Hernández-Díaz, Pablo de Olavide University, Spain
Francisco Herrera, University of Granada, Spain
J. Michael Herrmann, University of Edinburgh, UK
Jaakko Hollmén, Aalto University, Finland
Vasant Honavar, Penn State University, USA
Hongxuan Huang, Tsinghua University, China
Fabrice Huet, University of Nice, France
Sam Idicula, Oracle, USA
Yoshiharu Ishikawa, Nagoya University, Japan
Christian Jacob, University of Calgary, Canada
Hasan Jamil, University of Idaho, USA
Gareth Jones, Dublin City University, Ireland
Laetitia Jourdan, Inria/LIFL/CNRS, France
Narendra Jussien, Ecole des Mines de Nantes/LINA, France
Valeriy Kalyagin, Higher School of Economics, Russia
George Karakostas, McMaster University, Canada
George Karypis, University of Minnesota, USA
Ioannis Katakis, University of Athens, Greece
Saurabh Kataria, Xerox Research, USA
Graham Kendall, University of Nottingham, UK
Kristian Kersting, TU Dortmund University, Germany
Zeynep Kiziltan, University of Bologna, Italy
Joshua Knowles, University of Manchester, UK
Andrzej Kochut, IBM T.J. Watson Research Center, USA
Yun Sing Koh, University of Auckland, New Zealand
Igor Konnov, Kazan University, Russia
Petros Koumoutsakos, ETHZ, Switzerland
Georg Krempl, University of Magdeburg, Germany
Erhun Kundakcioglu, Ozyegin University, Turkey
Halina Kwasnicka, Wroclaw University of Technology, Poland
Joerg Laessig, University of Applied Sciences Zittau/Görlitz, Germany
Albert Y.S. Lam, The University of Hong Kong, Hong Kong, SAR China
Niklas Lavesson, Blekinge Institute of Technology, Sweden
Kang Li, Groupon Inc., USA
Edo Liberty, Yahoo Labs, USA
Arnaud Liefooghe, University of Lille, France
Weifeng Liu, China University of Petroleum, China
Giosue’ Lo Bosco, Università di Palermo, Italy
Fernando Lobo, University of Algarve, Portugal
Marco Locatelli, University of Parma, Italy
Manuel Lopez-Ibanez, University of Manchester, UK
Jose A. Lozano, The University of the Basque Country, Spain
Paul Lu, University of Alberta, Canada
Angelo Lucia, University of Rhode Island, USA
Luigi Malagò, Shinshu University, Japan
Lina Mallozzi, University of Naples Federico II, Italy
Vittorio Maniezzo, University of Bologna, Italy
Yannis Manolopoulos, Aristotle University of Thessaloniki, Greece
Marco Maratea, University of Genova, Italy
Elena Marchiori, Radboud University, The Netherlands
Tiziana Margaria, Lero, Ireland
Juan Enrique Martinez-Legaz, Universitat Autònoma de Barcelona, Spain
Basseur Matthieu, LERIA Angers, France
Giancarlo Mauri, University of Milano-Bicocca, Italy
Suzanne McIntosh, NYU Courant Institute and Cloudera Inc., USA
Gabor Melli, VigLink, USA
Silja Meyer-Nieberg, Universität der Bundeswehr München, Germany
Alessio Micheli, University of Pisa, Italy
Martin Middendorf, University of Leipzig, Germany
Taneli Mielikäinen, Nokia, Finland
Kaisa Miettinen, University of Jyväskylä, Finland
Marco A. Montes De Oca, Clypd, Inc., USA
Antonio Mora, University of Granada, Spain
Christian L. Müller, Simons Center for Data Analysis, USA
Mohamed Nadif, University of Paris Descartes, France
Hidemoto Nakada, National Institute of Advanced Industrial, Japan
Amir Nakib, Université Paris Est Créteil, France
Mirco Nanni, ISTI-CNR Pisa, Italy
Giuseppe Nicosia, University of Catania, Italy
Jian-Yun Nie, Université de Montréal, Canada
Xia Ning, Indiana University Purdue, USA
Eirini Ntoutsi, Ludwig-Maximilians-Universität München, Germany
Salvatore Orlando, Università Ca’ Foscari Venezia, Italy
Sinno Jialin Pan, Nanyang Technological University, Singapore
Pan Pan, Alibaba Inc., China
George Papastefanatos, IMIS/RC Athena, Greece
Luis Paquete, University of Coimbra, Portugal
Andrew J. Parkes, University of Nottingham, UK
Ioannis Partalas, Viseo R&D, France
Jun Pei, University of Florida, USA
Nikos Pelekis, University of Piraeus, Greece
David Pelta, University of Granada, Spain
Diego Perez, University of Essex, UK
Vincenzo Piuri, University of Milan, Italy
Silvia Poles, Noesis Solutions NV, Belgium
George Potamias, FORTH-ICS, Greece
Adam Prugel-Bennett, University of Southampton, UK
Buyue Qian, IBM T.J. Watson Research Center, USA
Chao Qian, Nanjing University, China
Günther Raidl, Technische Universität Wien, Austria
Helena Ramalhinho, Universitat Pompeu Fabra, Spain
Jan Ramon, Inria Lille, France
Vitorino Ramos, Technical University of Lisbon, Portugal
Zbigniew Ras, University of North Carolina, USA
Khaled Rasheed, University of Georgia, USA
Jan Rauch, University of Economics Prague, Czech Republic
Steffen Rebennack, University of Florida, USA
Celso Ribeiro, Universidade Federal Fluminense, Brazil
Florian Richoux, Université de Nantes, France
Juan J. Rodriguez, University of Burgos, Spain
Andrea Roli, University of Bologna, Italy
Samuel Rota Bulò, Fondazione Bruno Kessler, Italy
Arnab Roy, Fujitsu Laboratories of America, USA
Alessandro Rozza, Università di Napoli-Parthenope, Italy
Thomas Runarsson, University of Iceland, Iceland
Berc Rustem, Imperial College London, UK
Florin Rusu, University of California, Merced, USA
Nick Sahinidis, Carnegie Mellon University, USA
Lorenza Saitta, Università del Piemonte Orientale, Italy
Horst Samulowitz, IBM Research, USA
Ganesh Ram Santhanam, Iowa State University, USA
Vítor Santos Costa, Universidade do Porto, Portugal
Claudio Sartori, University of Bologna, Italy
Frédéric Saubion, University of Angers, France
Andrea Schaerf, University of Udine, Italy
Robert Schaefer, AGH University of Science and Technology, Poland
Fabio Schoen, University of Florence, Italy
Christoph Schommer, University of Luxembourg, Luxembourg
Oliver Schuetze, CINVESTAV-IPN, Mexico
Michèle Sebag, University of Paris-Sud, France
Giovanni Semeraro, University of Bari, Italy
Roberto Serra, University of Modena Reggio Emilia, Italy
Marc Sevaux, Université de Bretagne-Sud, France
Junming Shao, University of Electronic Science and Technology, China
Ruey-Lin Sheu, National Cheng-Kung University, Taiwan
Patrick Siarry, Université de Paris 12, France
Dan Simovici, University of Massachusetts Boston, USA
Karthik Sindhya, University of Jyväskylä, Finland
Anthony Man-Cho So, The Chinese University of Hong Kong, Hong Kong, SAR China
Christine Solnon, LIRIS-CNRS, France
Oliver Stein, Karlsruhe Institute of Technology, Germany
Catalin Stoean, University of Craiova, Romania
Thomas Stützle, Université Libre de Bruxelles, Belgium
Ponnuthurai Suganthan, Nanyang Technological University, Singapore
Johan Suykens, K.U. Leuven, Belgium
El-Ghazali Talbi, University of Lille, France
Domenico Talia, University of Calabria, Italy
Wei Tan, IBM, USA
Letizia Tanca, Politecnico di Milano, Italy
Ke Tang, University of Science and Technology of China, China
Andrea Tettamanzi, University Nice Sophia Antipolis, France
Jerzy Tiuryn, Warsaw University, Poland
Heike Trautmann, TU Dortmund University, Germany
Vincent S. Tseng, National Chiao Tung University, Taiwan
Theodoros Tzouramanis, University of the Aegean, Greece
Satish Ukkusuri, Purdue University, USA
Giorgio Valentini, University of Milan, Italy
Pascal Van Hentenryck, University of Michigan, USA
Analucia Varbanescu, University of Amsterdam, The Netherlands
Carlos A. Varela, Rensselaer Polytechnic Institute, USA
Iraklis Varlamis, Harokopio University of Athens, Greece
Eleni Vasilaki, University of Sheffield, UK
Sébastien Verel, Université du Littoral Côte d’Opale, France
Vassilios Verykios, Hellenic Open University, Greece
Henna Viktor, University of Ottawa, Canada
Maksims Volkovs, University of Toronto, Canada
Dean Vucinic, Vrije Universiteit Brussel, Belgium
Jason Wang, New Jersey Institute of Technology, USA
Jianwu Wang, University of Maryland, USA
Lipo Wang, NTU, Singapore
Liqiang Wang, University of Wyoming, USA
Lin Wu, The University of Adelaide, Australia
Yinglong Xia, IBM T.J. Watson Research Center, USA
Ning Xiong, Mälardalen University, Sweden
Chang Xu, Peking University, China
Xin Xu, George Washington University, USA
Shengxiang Yang, De Montfort University, UK
Qi Yu, Rochester Institute of Technology, USA
Tina Yu, Memorial University of Newfoundland, Canada
Kunpeng Zhang, University of Illinois at Chicago, USA
Nan Zhang, The George Washington University, USA
Qingfu Zhang, City University of Hong Kong, Hong Kong, SAR China
Rui Zhang, IBM Research, Almaden, USA
Ying Zhao, Tsinghua University, China
Bin Zhou, University of Maryland, USA
Zhi-Hua Zhou, Nanjing University, China
Djamel A. Zighed, University of Lyon 2, France
Antanas Zilinskas, Vilnius University, Lithuania
Julius Zilinskas, Vilnius University, Lithuania


Contents

Machine Learning: Multi-site Evidence-Based Best Practice Discovery . . . 1
Eva K. Lee, Yuanbo Wang, Matthew S. Hagen, Xin Wei, Robert A. Davis, and Brent M. Egan

Data-Based Forest Management with Uncertainties and Multiple Objectives . . . 16
Markus Hartikainen, Kyle Eyvindson, Kaisa Miettinen, and Annika Kangas

Metabolic Circuit Design Automation by Multi-objective BioCAD . . . 30
Andrea Patané, Piero Conca, Giovanni Carapezza, Andrea Santoro, Jole Costanza, and Giuseppe Nicosia

A Nash Equilibrium Approach to Metabolic Network Analysis . . . 45
Angelo Lucia and Peter A. DiMaggio

A Blocking Strategy for Ranking Features According to Probabilistic Relevance . . . 59
Gianluca Bontempi

A Scalable Biclustering Method for Heterogeneous Medical Data . . . 70
Maxence Vandromme, Julie Jacques, Julien Taillard, Laetitia Jourdan, and Clarisse Dhaenens

Neural Learning of Heuristic Functions for General Game Playing . . . 82
Leo Ghignone and Rossella Cancelliere

Comparing Hidden Markov Models and Long Short Term Memory Neural Networks for Learning Action Representations . . . 94
Maximilian Panzner and Philipp Cimiano

Dynamic Multi-Objective Optimization with jMetal and Spark: A Case Study . . . 106
José A. Cordero, Antonio J. Nebro, Cristóbal Barba-González, Juan J. Durillo, José García-Nieto, Ismael Navas-Delgado, and José F. Aldana-Montes

Feature Selection via Co-regularized Sparse-Group Lasso . . . 118
Paula L. Amaral Santos, Sultan Imangaliyev, Klamer Schutte, and Evgeni Levin

Economic Lot-Sizing Problem with Remanufacturing Option: Complexity and Algorithms . . . 132
Kerem Akartunalı and Ashwin Arulselvan


A Branch-and-Cut Algorithm for a Multi-item Inventory Distribution Problem . . . 144
Agostinho Agra, Adelaide Cerveira, and Cristina Requejo

Adaptive Targeting in Online Advertisement: Models Based on Relative Influence of Factors . . . 159
Andrey Pepelyshev, Yuri Staroselskiy, Anatoly Zhigljavsky, and Roman Guchenko

Design of Acoustic Metamaterials Through Nonlinear Programming . . . 170
Andrea Bacigalupo, Giorgio Gnecco, Marco Lepidi, and Luigi Gambarotta

Driver Maneuvers Inference Through Machine Learning . . . 182
Mauro Maria Baldi, Guido Perboli, and Roberto Tadei

A Systems Biology Approach for Unsupervised Clustering of High-Dimensional Data . . . 193
Diana Diaz, Tin Nguyen, and Sorin Draghici

Large-Scale Bandit Recommender System . . . 204
Frédéric Guillou, Romaric Gaudel, and Philippe Preux

Automatic Generation of Sitemaps Based on Navigation Systems . . . 216
Pasqua Fabiana Lanotte, Fabio Fumarola, Donato Malerba, and Michelangelo Ceci

A Customer Relationship Management Case Study Based on Banking Data . . . 224
Ivan Luciano Danesi and Cristina Rea

Lagrangian Relaxation Bounds for a Production-Inventory-Routing Problem . . . 236
Agostinho Agra, Adelaide Cerveira, and Cristina Requejo

Convergence Rate Evaluation of Derivative-Free Optimization Techniques . . . 246
Thomas Lux

The Learnability of Business Rules . . . 257
Olivier Wang, Changhai Ke, Leo Liberti, and Christian de Sainte Marie

Dynamic Programming with Approximation Function for Nurse Scheduling . . . 269
Peng Shi and Dario Landa-Silva

Breast Cancer’s Microarray Data: Pattern Discovery Using Nonnegative Matrix Factorizations . . . 281
Nicoletta Del Buono, Flavia Esposito, Fabio Fumarola, Angelina Boccarelli, and Mauro Coluccia


Optimizing the Location of Helicopter Emergency Medical Service Operating Sites . . . 293
Maurizio Bruglieri, Cesare Cardani, and Matteo Putzu

An Enhanced Infra-Chromatic Bound for the Maximum Clique Problem . . . 306
Pablo San Segundo, Jorge Artieda, Rafael Leon, and Cristobal Tapia

Cultural Ant Colony Optimization on GPUs for Travelling Salesman Problem . . . 317
Olgierd Unold and Radosław Tarnawski

Combining Genetic Algorithm with the Multilevel Paradigm for the Maximum Constraint Satisfaction Problem . . . 330
Noureddine Bouhmala

Implicit Location Sharing Detection in Social Media Turkish Text Messaging . . . 341
Davut Deniz Yavuz and Osman Abul

Fuzzy Decision-Making of a Process for Quality Management . . . 353
Feyza Gürbüz and Panos M. Pardalos

A Bayesian Network Profiler for Wildfire Arsonists . . . 379
Rosario Delgado, José Luis González, Andrés Sotoca, and Xavier-Andoni Tibau

Learning Optimal Decision Lists as a Metaheuristic Search for Diagnosis of Parkinson’s Disease . . . 391
Fernando de Carvalho Gomes and José Gilvan Rodrigues Maia

Hermes: A Distributed-Messaging Tool for NLP . . . 402
Ilaria Bordino, Andrea Ferretti, Marco Firrincieli, Francesco Gullo, Marcello Paris, and Gianluca Sabena

Deep Learning for Classification of Dental Plaque Images . . . 407
Sultan Imangaliyev, Monique H. van der Veen, Catherine M.C. Volgenant, Bart J.F. Keijser, Wim Crielaard, and Evgeni Levin

Multiscale Integration for Pattern Recognition in Neuroimaging . . . 411
Margarita Zaleshina and Alexander Zaleshin

Game Theoretical Tools for Wing Design . . . 419
Lina Mallozzi, Giovanni Paolo Reina, Serena Russo, and Carlo de Nicola


Fastfood Elastic Net: Combining Variable Selection with Kernel Expansion Approximations . . . 427
Sonia Kopel, Kellan Fluette, Geena Glen, and Paul E. Anderson

Big Data Analytics in a Public General Hospital . . . 433
Ricardo S. Santos, Tiago A. Vaz, Rodrigo P. Santos, and José M. Parente de Oliveira

Inference of Gene Regulatory Network Based on Radial Basis Function Neural Network . . . 442
Sanrong Liu, Bin Yang, and Haifeng Wang

Establishment of Optimal Control Strategy of Building-Integrated Photovoltaic Blind Slat Angle by Considering Interior Illuminance and Electricity Generation . . . 451
Taehoon Hong, Jeongyoon Oh, Kwangbok Jeong, Jimin Kim, and Minhyun Lee

Author Index . . . 455



Machine Learning: Multi-site Evidence-Based
Best Practice Discovery
Eva K. Lee (1,2,3, corresponding author), Yuanbo Wang (1,2,3), Matthew S. Hagen (1,2,3),
Xin Wei (1,2,3), Robert A. Davis (4,5), and Brent M. Egan (4,5)

1 Center for Operations Research in Medicine and HealthCare, Atlanta, GA, USA
2 NSF I/UCRC Center for Health Organization Transformation, Atlanta, GA, USA
3 Georgia Institute of Technology, Atlanta, GA, USA
4 University of South Carolina School of Medicine, Greenville, SC, USA
5 Care Coordination Institute, Greenville, SC, USA

Abstract. This study establishes interoperability among electronic medical
records from 737 healthcare sites and performs machine learning for best
practice discovery. A mapping algorithm is designed to disambiguate free text
entries and to provide a unique and unified way to link content to structured
medical concepts despite the extreme variations that can occur during clinical
diagnosis documentation. Redundancy is reduced through concept mapping.
A SNOMED-CT graph database is created to allow for rapid data access and
queries. These integrated data can be accessed through a secure web-based
portal. A classification model based on discriminant analysis via mixed integer programming (DAMIP) is then designed to uncover discriminatory characteristics that can predict the quality of treatment outcomes. We
demonstrate system usability by analyzing data from patients with Type II diabetes. DAMIP
establishes a classification rule on a training set that achieves greater than
80% blind predictive accuracy on an independent set of patients. By including
features obtained from structured concept mapping, the predictive accuracy is
improved to over 88%. The results facilitate evidence-based treatment and
optimization of site performance through best practice dissemination and
knowledge transfer.
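The concept-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' mapping algorithm: it assumes a simple normalize-then-look-up approach, and the abbreviation table, the synonym table, and the concept identifiers (`CONCEPT_T2DM`, `CONCEPT_HTN`) are hypothetical placeholders rather than real SNOMED-CT codes. It shows how several free-text variants of the same diagnosis collapse to one concept, which is the sense in which concept mapping reduces redundancy.

```python
import re

# Hypothetical abbreviation expansions (illustrative only, not a clinical lexicon).
ABBREVIATIONS = {
    "t2dm": "type 2 diabetes mellitus",
    "dm2": "type 2 diabetes mellitus",
    "htn": "hypertension",
}

# Hypothetical synonym table: normalized phrase -> placeholder concept identifier.
# The identifiers are NOT real SNOMED-CT codes.
SYNONYMS = {
    "type 2 diabetes mellitus": "CONCEPT_T2DM",
    "type ii diabetes": "CONCEPT_T2DM",
    "hypertension": "CONCEPT_HTN",
    "high blood pressure": "CONCEPT_HTN",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


def map_entries(entries: list[str]) -> tuple[set[str], list[str]]:
    """Map free-text entries to concept IDs.

    Returns the set of distinct concepts found and the entries that matched nothing.
    """
    concepts: set[str] = set()
    unmapped: list[str] = []
    for entry in entries:
        key = normalize(entry)
        if key in SYNONYMS:
            concepts.add(SYNONYMS[key])
        else:
            unmapped.append(entry)
    return concepts, unmapped


if __name__ == "__main__":
    found, missing = map_entries(
        ["Type II Diabetes", "T2DM", "HTN.", "High blood-pressure", "chest pain"]
    )
    print(sorted(found), missing)  # prints ['CONCEPT_HTN', 'CONCEPT_T2DM'] ['chest pain']
```

Five entries reduce to two concepts, and the one entry with no match is reported rather than silently dropped. A production system would additionally need fuzzy matching, negation handling, and a curated terminology source.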

1 Introduction
Individual health systems provide various services and allocate different resources for
patient care. Healthcare resources including professional and staff time are often
constrained. Making clinical decisions is complicated because it requires physicians
to infer information from a given case and determine the best treatment based on their
knowledge [1]. Addressing these problems is essential for delivering effective care
plans to patients.
Data from electronic medical records (EMRs) can reveal critical variables that
impact treatment outcomes and inform allocation of limited time and resources,
allowing physicians to practice evidence-based treatment tailored to individual patient
conditions. On a larger scale, realistically modifiable social determinants of health that
will improve community health can potentially be discovered and addressed.

© Springer International Publishing AG 2016
P.M. Pardalos et al. (Eds.): MOD 2016, LNCS 10122, pp. 1–15, 2016.
DOI: 10.1007/978-3-319-51469-7_1

Although EMR adoption is spreading across the industry, many providers continue
to document clinical findings, procedures and outcomes with “free text” natural language on their EMRs [2]. They have difficulty (manually) mapping concepts to standardized terminologies and struggle with application programs that use structured
clinical data. This creates challenges for multi-site comparative effectiveness studies.
Standardized clinical terminologies are essential in facilitating interoperability
among EMR systems. They allow seamless sharing and exchange of healthcare
information for quality care delivery and coordination among multiple sites. However,
the volume and number of available clinical terminologies are large and are expanding.
Further, due to the increase in medical knowledge, and the continued development of
more advanced computerized medical systems, the use of clinical terminologies has
extended beyond diagnostic classification [3]. SNOMED-CT is a multidisciplinary
terminology system that is used for clinical decision support, ICU monitoring, indexing
medical records, medical research, and disease surveillance [4]. LOINC is a set of
universal names for expressing laboratory tests and clinical observations [5]. RxNorm
is a normalized naming system for medicines and drugs [6]. The Unified Medical
Language System (UMLS) is a terminology integration system developed by the US
National Library of Medicine (NLM) to promote the interoperability of systems and
mapping between the multitude of available clinical terminologies [7].
Many systems have been developed to map heterogeneous terminologies to support
communication and semantic interoperability between healthcare centers. STRIDE
mapped RxNorm concepts to the SNOMED-CT hierarchy and used the RxNorm
relationships in UMLS to link pharmacy data from two EMR sources in the Stanford
University Medical Center [8]. Carlo et al. classified ICD-9 diagnoses from unstructured discharge summaries using mapping tables in UMLS [9]. Patel and Cimino used
the existing linkages in UMLS to predict new potential terminological relationships
[10]. While many of these studies focus on only one type of standardized concept, our work
herein designs a mapping system that can accurately map medication,
laboratory result, and diagnosis entries from multiple EMRs. Each entry is
mapped to predefined terms in the SNOMED-CT ontology. Due to the hierarchical
nature of SNOMED-CT, similarities between patient diagnoses, laboratory results, and
medications can be found more easily. Hence mapped concepts can be generalized to
find shared characteristics among patients. Our work thus creates a more powerful
system than previous studies, because multiple types of concepts can be mapped and
compared in a hierarchical manner.
This study includes 737 clinical sites and de-identified data for over 2.7 million
patients with data collected from January 1990 to December 2012. To the best of our
knowledge, analysis of EMR data across hundreds of healthcare sites and millions of
patients has not been attempted previously. Such analysis requires effective database
management, data extraction, preprocessing, and integration. In addition, temporal data
mining of longitudinal health data cannot currently be achieved through statistically
and computationally efficient methodologies and is still under-explored [1]. This is a
particularly important issue when analyzing outcome and health conditions for chronic
disease patients.


Machine Learning: Multi-site Evidence-Based Best Practice


In this paper, we first establish interoperability among EMRs from 737 clinical sites
by developing a system that can accurately map free text to concise structured medical
concepts. Multiple concepts are mapped, including patient diagnoses, laboratory
results, and medications, which allows shared characterization and hierarchical comparison. We then leverage a mixed integer programming-based classification model
(DAMIP) [11] to establish classification rules with relatively small subsets of discriminatory features that can be used to predict diabetes treatment outcome. Previous
studies in diabetes have identified features such as demographics, obesity, hypertension, and genetics that appear to be linked to the development and progression of type
II diabetes [12–14]. However, little is known about how these features interact with
treatment characteristics to affect patient outcome.

2 Methods and Design
2.1 Integrating and Mapping of Data to Standardized Clinical Concepts and Terminologies

This study utilized EMR data of 2.7 million patients collected from 737 healthcare
facilities. A relational database was first designed with Postgres 9.3 to store these data.
Thirteen tables were created containing patient records pertaining to procedures,
demographics, diagnosis codes, laboratory measurements and medications. Indexes
were developed for each table to enable rapid search and table joins for data querying.
The data size for indexes is an additional 11 GB, totaling 27 GB for the entire database. We label this the CCI-health database, where CCI stands for Care Coordination
Institute.
In the CCI-health database, 2.46 million patients are associated with a diagnosis,
1.33 million are associated with laboratories, and 955,000 are linked to medications.
Laboratory and medication records are described with free text entries without unique
codes for each entry. Since clinicians may describe identical treatments with many
possible variations, it is essential to map entries to structured concepts without ambiguity. Overall, 803 unique lab phrases and 9,755 medication phrases were extracted
from the patient records. Metamap [15–17], developed at the National Library of
Medicine, is a natural language processing tool that maps biomedical text to UMLS [7]
concepts. In this study, we used Metamap to recognize laboratory terms from the
LOINC [5] terminology and medication terms from the RxNorm [6] terminology, using
the MRCONSO and RXNCONSO tables, respectively. The final
step was to map associated terms to concepts in the SNOMED-CT ontology.
SNOMED-CT [4] is a comprehensive medical ontology with detailed hierarchical
relationships, containing 521,844 concepts and 4.65 million relationships (July
2015 release). LOINC and RxNorm terms established from the CCI-health database
were linked to SNOMED using the UMLS MRREL and RXNREL tables. In our
implementation, for LOINC, only concepts that have the name ‘procedure’ were
returned from the MRREL table. For RxNorm, only concepts that have “has_form”,
“has_ingredient”, and “has_tradename” relationships were returned from the RXNREL
table. When a medication entry in an EMR and a SNOMED concept were named
completely differently, a link could still be found through relationships such as
tradenames and ingredients.

Fig. 1. The diagram shows the mapping procedure for laboratory phrases to SNOMED-CT
concepts.

Fig. 2. This diagram shows the mapping procedure for medication phrases to SNOMED-CT
concepts.

Figures 1 and 2 show the workflows for mapping laboratory
and medication phrases to SNOMED-CT concepts.
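The tradename and ingredient linking described above can be illustrated with a small in-memory sketch. It is not the authors' implementation: it assumes RXNREL rows have already been parsed into hypothetical `(rxcui1, rela, rxcui2)` tuples, treats the kept relationships as undirected (sidestepping RXNREL's direction convention), and uses a hypothetical `rxnorm_to_snomed` dictionary in place of the actual SNOMED-CT linkage. All identifiers in the example are made up.

```python
from collections import deque

# Relationship labels that the text says were kept from the RXNREL table.
ALLOWED_RELA = {"has_form", "has_ingredient", "has_tradename"}


def build_links(rows):
    """Build an undirected adjacency map from (rxcui1, rela, rxcui2) tuples,
    keeping only the allowed relationship labels."""
    links = {}
    for cui1, rela, cui2 in rows:
        if rela not in ALLOWED_RELA:
            continue
        links.setdefault(cui1, set()).add(cui2)
        links.setdefault(cui2, set()).add(cui1)
    return links


def find_snomed(start, links, rxnorm_to_snomed, max_hops=2):
    """Breadth-first search from an RxNorm concept; collect the SNOMED-CT IDs
    attached to any concept reachable within max_hops kept relationships."""
    found = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cui, hops = queue.popleft()
        found |= rxnorm_to_snomed.get(cui, set())
        if hops == max_hops:
            continue
        for nxt in links.get(cui, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return found


# Hypothetical example: a brand-name entry reaches SNOMED only via its ingredient.
rows = [
    ("BRAND_1", "has_tradename", "DRUG_1"),
    ("DRUG_1", "has_ingredient", "INGR_1"),
    ("DRUG_1", "isa", "CLASS_1"),  # not an allowed label, so it is ignored
]
links = build_links(rows)
print(find_snomed("BRAND_1", links, {"INGR_1": {"SCT_1"}}))  # {'SCT_1'}
```

The `max_hops` cap (an assumption, not from the text) keeps the search from wandering too far from the original entry through chains of shared ingredients.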


The CCI-health database employs ICD-9 [18] codes for patient diagnoses. This
makes the mapping procedure to SNOMED-CT concepts slightly different from those
designed for laboratories and medications. The ICD9CM_SNOMED_MAP table in
UMLS can be used to map ICD-9 codes directly to SNOMED-CT concepts. However, this
table does not cover all ICD-9 codes that are associated with patients in the CCI-health
database. Metamap was therefore used to analyze the descriptions of the remaining ICD
codes that are not found in the ICD9CM_SNOMED_MAP table. The MRCONSO
table was used to map the UMLS concepts returned by Metamap to associated
SNOMED-CT concepts (Fig. 3).
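The two-stage diagnosis mapping (direct table lookup first, text-based recognition as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions: plain dictionaries stand in for the ICD9CM_SNOMED_MAP and MRCONSO lookups, and `metamap` is a hypothetical callable standing in for running Metamap on a code's description. Codes, CUIs, and SNOMED IDs in the example are placeholders.

```python
def map_icd9(codes, direct_map, descriptions, metamap, cui_to_snomed):
    """Map ICD-9 codes to sets of SNOMED-CT concept IDs.

    direct_map    -- ICD-9 code -> SNOMED IDs (stands in for ICD9CM_SNOMED_MAP)
    descriptions  -- ICD-9 code -> description text
    metamap       -- hypothetical callable: text -> UMLS CUIs (stands in for Metamap)
    cui_to_snomed -- UMLS CUI -> SNOMED IDs (stands in for the MRCONSO lookup)
    """
    mapped = {}
    for code in codes:
        if code in direct_map:
            # Step 1: direct table lookup.
            mapped[code] = set(direct_map[code])
            continue
        # Step 2: fall back to concept recognition on the code's description.
        snomed = set()
        for cui in metamap(descriptions.get(code, "")):
            snomed |= cui_to_snomed.get(cui, set())
        mapped[code] = snomed
    return mapped


def fake_metamap(text):
    """Toy stand-in for Metamap: recognizes a single placeholder concept."""
    return ["CUI_X"] if "kidney" in text else []


result = map_icd9(
    ["CODE_A", "CODE_B", "CODE_C"],
    direct_map={"CODE_A": {"SCT_A"}},
    descriptions={"CODE_B": "chronic kidney disease"},
    metamap=fake_metamap,
    cui_to_snomed={"CUI_X": {"SCT_B"}},
)
print(result)  # {'CODE_A': {'SCT_A'}, 'CODE_B': {'SCT_B'}, 'CODE_C': set()}
```

Codes that neither stage can resolve come back with an empty set, which makes them easy to list for manual review.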

Fig. 3. This figure shows the mapping of diagnosis ICD-9 codes to SNOMED-CT concepts.


SNOMED-CT provides a rich hierarchy enabling semantic distances between
concepts to be measured by path distances. We developed a Neo4j Graph Database for
the CCI-health data to rapidly compute common ancestor queries between the mapped
SNOMED CT terms. In our Neo4j Graph Database, tab delimited files of all SNOMED
concepts and relationships are exported from SNOMED CT Postgres relational database tables. The tab delimited files are then directly loaded into Neo4j community
edition 2.0.0 using their batch inserter ( />batchinsert.html). This results in a SNOMED Graph Database that has many cycles.
The cycles greatly impede graph operations such as returning all paths between nodes.
Using the Cypher query language, we can quickly identify all cycles with 2 or more
nodes. Each cycle can then be broken by deleting one of its edges, chosen by criteria
involving node depth and in-degree (number of incoming edges to a node).
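The text does not give the exact depth and in-degree rule used to pick which edge to delete, so the sketch below swaps in a different, standard technique: it removes every back edge found by a depth-first search, which is guaranteed to leave an acyclic graph but may delete different edges than the authors' criterion. It works on a plain Python dictionary rather than Neo4j, and the example graph is hypothetical (edges point from parent to child here; direction does not matter for breaking cycles).

```python
def remove_back_edges(graph):
    """Return (dag, removed): a copy of `graph` (node -> list of successors)
    with every depth-first-search back edge deleted, plus the deleted edges.
    Deleting all back edges of one DFS always leaves an acyclic graph."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}
    dag = {node: list(succ) for node, succ in graph.items()}
    removed = []
    # Deterministic visiting order: keys first, then nodes seen only as successors.
    order = list(graph) + [v for succ in graph.values() for v in succ]
    for start in order:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            descended = False
            for nxt in successors:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    # nxt is on the current path, so node -> nxt closes a cycle.
                    dag[node].remove(nxt)
                    removed.append((node, nxt))
                elif state == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
            if not descended:
                color[node] = BLACK
                stack.pop()
    return dag, removed


# Hypothetical hierarchy fragment with a two-node cycle between A and B.
graph = {"root": ["A"], "A": ["B"], "B": ["A", "C"], "C": []}
dag, removed = remove_back_edges(graph)
print(removed)  # [('B', 'A')]
```

The search is iterative rather than recursive so that deep hierarchies do not hit Python's recursion limit.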
After an acyclic SNOMED-CT Graph database was created, graph computations
such as shortest paths and common ancestor queries can be performed rapidly. This is
beneficial since laboratories, diagnoses, and medications are all mapped to many
SNOMED-CT concepts that can be too specific for machine learning analysis. In this
study, all nodes are assigned a depth level according to the minimum number of edges
that must be traversed to reach the root node. The root node is at depth level 0. All the
mapped SNOMED-CT concepts can then be generalized to ancestor concepts at a shallower
depth level (closer to the root). It is important to choose an appropriate depth level to
accurately distinguish patient characteristics from one another. For medications and
diagnoses, a depth level of 2 is chosen. A depth level of 3 is chosen for laboratories, since
assigning lower depth levels returned concepts that are too general. For a given SNOMED-CT concept, Neo4j
can quickly calculate all possible paths to the root node. With the Cypher query
language, Neo4j returns all nodes for a given depth level that are crossed from all
possible paths to the root of the hierarchy. This method was applied to all mapped
SNOMED-CT concepts, converting each into equivalent nodes at a more
general depth level. After conversion, the data were reshaped so that each row
contains one patient with a yes/no column for each general SNOMED-CT
concept. Some mapped SNOMED-CT concepts are too common to be informative and were
removed before analysis. Examples include “Disorder of body system (362965005)”,
“Measurement procedure (122869004)”, “Chemical procedure (83762000)”, and
“Types of drugs (278471000)”. Upon cleaning, we arrived at the final integrated
dataset that includes additional features for predictive analysis.
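The depth assignment, generalization, and yes/no reshaping steps can be sketched in plain Python. The paper performs these with Cypher queries in Neo4j; this in-memory version is only an illustration, assuming the hierarchy is already acyclic and stored as a hypothetical `parents` dictionary (concept to direct parents). The `exclude` argument is where overly common concepts such as “Disorder of body system (362965005)” would be dropped. The concept and patient IDs in the example are made up.

```python
from collections import deque


def depths_from_root(parents, root):
    """Depth of each concept = minimum number of is-a edges to the root (root = 0).
    `parents` maps concept -> list of direct parents in an acyclic hierarchy."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    depth = {root: 0}
    queue = deque([root])
    while queue:  # BFS from the root reaches each concept first via a shortest path
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def generalize(concept, parents, depth, level):
    """All concepts at `level` that lie on some path from `concept` up to the root."""
    seen = {concept}
    stack = [concept]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return {c for c in seen if depth.get(c) == level}


def feature_matrix(patient_concepts, parents, depth, level, exclude=()):
    """One 0/1 row per patient, one column per generalized concept not in `exclude`."""
    general = {}
    for pid, concepts in patient_concepts.items():
        ancestors = set().union(*(generalize(c, parents, depth, level) for c in concepts))
        general[pid] = ancestors - set(exclude)
    columns = sorted(set().union(*general.values()))
    rows = {pid: [int(col in g) for col in columns] for pid, g in general.items()}
    return columns, rows


# Hypothetical hierarchy: X has two parents, so it generalizes to two level-2 concepts.
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "B1": ["B"],
           "X": ["A1", "B1"], "Y": ["A1"]}
depth = depths_from_root(parents, "root")
columns, rows = feature_matrix({"p1": ["X"], "p2": ["Y"]}, parents, depth, level=2)
print(columns, rows)  # ['A1', 'B1'] {'p1': [1, 1], 'p2': [1, 0]}
```

Following the text, `level=2` would be used for medications and diagnoses and `level=3` for laboratories.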

2.2 Clustering Patients for Multi-site Outcome Analysis

The CCI-health database contains 267,666 diabetic patients. Each patient is characterized by 24 features including hospital site, demographics, laboratory tests and
results, prescriptions, treatment duration, chronic conditions, blood pressure, number of
visits and visit frequencies (Table 1). For each patient, treatment duration is determined
by calculating the elapsed time between diagnosis (indicated by the first prescription of
a diabetic medication) and the last recorded activity (i.e. procedure, lab, etc.). These
variables are considered potential features that may influence treatment outcome. They
are used as input for our classification analysis.
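The treatment-duration definition above can be written as a short function. This is a sketch under assumptions the text does not state: durations are measured in days, prescriptions are hypothetical `(date, drug_name)` pairs, and `diabetic_drugs` is a hypothetical set naming the medications that count as diabetic.

```python
from datetime import date


def treatment_duration_days(prescriptions, activity_dates, diabetic_drugs):
    """Days from the first diabetic prescription (taken as the diagnosis date)
    to the last recorded activity; None if either is missing."""
    starts = [day for day, drug in prescriptions if drug in diabetic_drugs]
    if not starts or not activity_dates:
        return None
    return (max(activity_dates) - min(starts)).days


# Hypothetical patient record.
prescriptions = [(date(2009, 6, 1), "lisinopril"), (date(2010, 1, 5), "metformin")]
activity_dates = [date(2010, 3, 1), date(2012, 1, 5)]  # labs, procedures, ...
print(treatment_duration_days(prescriptions, activity_dates, {"metformin"}))  # 730
```

The earlier non-diabetic prescription is ignored, so the clock starts at the first metformin order rather than at the patient's first visit.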
To establish the outcome group status for these diabetic patients, glycated
hemoglobin (HbA1c) lab measurement series throughout the treatment duration were
used as indicators of treatment outcome. In our analysis, only patients with 7 or more
HbA1c measurements recorded and no missing features were included. This resulted in
3,875 patients. On each patient’s HbA1c measurement series, we applied an equally
weighted sliding-window average with a window size of five measurements to reduce noise.
Figure 4 shows a patient’s HbA1c data before and after the sliding
window is performed.
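The inclusion rule and the five-point smoothing can be sketched as follows. The text does not say how the ends of a series are handled; this version assumes only full windows are averaged, so a series of n values yields n − 4 smoothed values. Feature names and values in the example are hypothetical.

```python
def eligible(hba1c_series, features, min_measurements=7):
    """Inclusion rule from the text: at least 7 HbA1c values and no missing feature."""
    return len(hba1c_series) >= min_measurements and all(
        value is not None for value in features.values()
    )


def moving_average(values, window=5):
    """Equal-weight moving average over every full window of `window` values."""
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


series = [8.0, 7.5, 7.0, 6.5, 6.0, 6.5, 7.0]  # hypothetical HbA1c values (%)
if eligible(series, {"age": 61, "site": "S1"}):
    print(moving_average(series))  # [7.0, 6.7, 6.6]
```

Padding the ends instead (for example, repeating the first and last values) would keep the series length unchanged; either choice is compatible with the description above.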
The 3,875 patients were clustered into two outcome groups based on their
smoothed HbA1c lab measurement series. Since each patient has a different number of
records, a method for clustering time series of different lengths is required. Here we
compared these measurements based on the method proposed by Caiado et al. [19].
First, a periodogram of each patient’s smoothed HbA1c measurements was calculated.
Next, discrepancy statistics were calculated for each pair of periodograms and used as
the distance between each pair of patients. When two patients’ series differ in length, the shorter series is extended by adding zeros and the
zero-padding discrepancy statistics between the two series were used. Lastly, using the
distance matrix filled with discrepancy statistics, agglomerative clustering with average

