Lecture Notes in Computer Science 4192
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Bernd Mohr Jesper Larsson Träff
Joachim Worringen Jack Dongarra (Eds.)
Recent Advances
in Parallel Virtual Machine
and Message Passing Interface
13th European PVM/MPI User’s Group Meeting
Bonn, Germany, September 17-20, 2006
Proceedings
Volume Editors
Bernd Mohr
Forschungszentrum Jülich GmbH
Zentralinstitut für Angewandte Mathematik
52425 Jülich, Germany
E-mail:
Jesper Larsson Träff
C&C Research Laboratories NEC Europe Ltd.
Rathausallee 10, 53757 Sankt Augustin, Germany
E-mail:
Joachim Worringen
Dolphin Interconnect Solutions ASA
R&D Germany
Siebengebirgsblick 26, 53343 Wachtberg, Germany
E-mail:
Jack Dongarra
University of Tennessee
Computer Science Department
1122 Volunteer Blvd, Knoxville, TN 37996-3450, USA
E-mail:
Library of Congress Control Number: 2006931769
CR Subject Classification (1998): D.1.3, D.3.2, F.1.2, G.1.0, B.2.1, C.1.2
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN 0302-9743
ISBN-10 3-540-39110-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-39110-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 11846802 06/3142 543210
Preface
Since its inception in 1994 as a European PVM user’s group meeting, Eu-
roPVM/MPI has evolved into the foremost international conference dedicated to
the latest developments concerning MPI (Message Passing Interface) and PVM
(Parallel Virtual Machine). These include fundamental aspects of these message
passing standards, implementation, new algorithms and techniques, performance
and benchmarking, support tools, and applications using message passing. Despite
its focus, EuroPVM/MPI is open to new message-passing and other parallel and
distributed programming paradigms beyond MPI and PVM.
Over the years the meeting has successfully brought together developers, re-
searchers and users from both academia and industry. EuroPVM/MPI has con-
tributed to furthering the understanding of message passing programming in
these paradigms, and has positively influenced the quality of many implementa-
tions of both MPI and PVM through exchange of ideas and friendly competition.
EuroPVM/MPI takes place each year at a different European location, and
the 2006 meeting was the 13th in the series. Previous meetings were held in
Sorrento (2005), Budapest (2004), Venice (2003), Linz (2002), Santorini (2001),
Balatonfüred (2000), Barcelona (1999), Liverpool (1998), Cracow (1997), Munich
(1996), Lyon (1995), and Rome (1994). EuroPVM/MPI 2006 took place in Bonn,
Germany, 17 – 20 September, 2006, and was organized jointly by the C&C
Research Labs, NEC Europe Ltd., and the Research Center Jülich.
Contributions to EuroPVM/MPI 2006 were submitted in May as either full
papers or posters, or (with a later deadline) as full papers to the special session
ParSim on “Current Trends in Numerical Simulation for Parallel Engineering En-
vironments” (see page 356). Out of the 75 submitted full papers, 38 were selected
for presentation at the conference. Of the 9 submitted poster abstracts, 6 were cho-
sen for the poster session. The ParSim session received 11 submissions, of which
5 were selected for this special session. The task of reviewing was carried out
smoothly within very strict time limits by a large program committee and a num-
ber of external referees, counting members from most of the American and Euro-
pean groups involved in MPI and PVM development, as well as from significant
user communities. Almost all papers received 4 reviews, some even 5, and none
fewer than 3, which provided a solid basis for the program chairs to make the final
selection for the conference program. The result was a well-balanced and focused
program of high quality. All authors are thanked for their contribution to the con-
ference. Out of the accepted 38 papers, 3 were selected as outstanding contribu-
tions to EuroPVM/MPI 2006, and were presented in a special, plenary session:
– “Issues in Developing a Thread-Safe MPI Implementation” by William Gropp
and Rajeev Thakur (page 12)
– “Scalable Parallel Suffix Array Construction” by Fabian Kulla and Peter
Sanders (page 22)
– “Formal Verification of Programs That Use MPI One-Sided Communica-
tion” by Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev
Thakur and William Gropp (page 30)
“Late and breaking results”, which were submitted in August as brief ab-
stracts and therefore not included in these proceedings, were presented in the
eponymous session. Like the “Outstanding Papers” session, this was a premiere
at EuroPVM/MPI 2006.
Complementing the emphasis in the call for papers on new message-passing
paradigms and programming models, the invited talks by Richard Graham,
William Gropp and Al Geist addressed possible shortcomings of MPI for emerg-
ing, large-scale systems, covering issues on fault-tolerance and heterogeneity,
productivity and scalability, while the invited talk of Katherine Yelick dealt
with advantages of higher-level, partitioned global address space languages. The
invited talk of Vaidy Sunderam discussed challenges to message-passing pro-
gramming in dynamic metacomputing environments. Finally, with the invited
talk of Ryutaro Himeno, the audience gained insight into the role and design of
the projected Japanese peta-scale supercomputer.
An important part of EuroPVM/MPI is the technically oriented vendor session.
At EuroPVM/MPI 2006 eight significant vendors of hard- and software for
high-performance computing (Etnus, IBM, Intel, NEC, Dolphin Interconnect
Solutions, Hewlett-Packard, Microsoft, and Sun) presented their latest products
and developments.
Prior to the conference proper, four tutorials on various aspects of message
passing programming (“Using MPI-2: A Problem-Based Approach”, “Perfor-
mance Tools for Parallel Programming”, “High-Performance Parallel I/O”, and
“Hybrid MPI and OpenMP Parallel Programming”) were given by experts in
the respective fields.
Information about the conference can be found at the conference Web-site
, which will be kept available.
The proceedings were edited by Bernd Mohr, Jesper Larsson Träff and Joachim
Worringen. The EuroPVM/MPI 2006 logo was designed by Bernd Mohr and
Joachim Worringen.
The program and general chairs would like to thank all who contributed to
making EuroPVM/MPI 2006 a fruitful and stimulating meeting, be they tech-
nical paper or poster authors, program committee members, external referees,
participants, or sponsors.
September 2006
Bernd Mohr
Jesper Larsson Träff
Joachim Worringen
Jack Dongarra
Organization
General Chair
Jack Dongarra University of Tennessee, USA
Program Chairs
Bernd Mohr Forschungszentrum Jülich, Germany
Jesper Larsson Träff C&C Research Labs, NEC Europe, Germany
Joachim Worringen C&C Research Labs, NEC Europe, Germany
Program Committee
George Almasi IBM, USA
Ranieri Baraglia CNUCE Institute, Italy
Richard Barrett ORNL, USA
Gil Bloch Mellanox, Israel
Arndt Bode Technical University of Munich, Germany
Marian Bubak AGH Cracow, Poland
Hakon Bugge Scali, Norway
Franck Cappello Université de Paris-Sud, France
Barbara Chapman University of Houston, USA
Brian Coghlan Trinity College Dublin, Ireland
Yiannis Cotronis University of Athens, Greece
Jose Cunha New University of Lisbon, Portugal
Marco Danelutto University of Pisa, Italy
Frank Dehne Carleton University, Canada
Luiz DeRose Cray, USA
Frederic Desprez INRIA, France
Erik D’Hollander University of Ghent, Belgium
Beniamino Di Martino Second University of Naples, Italy
Jack Dongarra University of Tennessee, USA
Graham Fagg University of Tennessee, USA
Edgar Gabriel University of Houston, USA
Al Geist Oak Ridge National Laboratory, USA
Patrick Geoffray Myricom, USA
Michael Gerndt TU München, Germany
Andrzej Goscinski Deakin University, Australia
Richard L. Graham LANL, USA
William D. Gropp Argonne National Laboratory, USA
Erez Haba Microsoft, USA
Program Committee (cont’d)
Rolf Hempel DLR - German Aerospace Center, Germany
Dieter Kranzlmüller Johannes Kepler Universität Linz, Austria
Rainer Keller HLRS, Germany
Stefan Lankes RWTH Aachen, Germany
Erwin Laure CERN, Switzerland
Laurent Lefevre INRIA/LIP, France
Greg Lindahl QLogic, USA
Thomas Ludwig University of Heidelberg, Germany
Emilio Luque Universitat Autònoma de Barcelona, Spain
Ewing Rusty Lusk Argonne National Laboratory, USA
Tomas Margalef Universitat Autònoma de Barcelona, Spain
Bart Miller University of Wisconsin, USA
Bernd Mohr Forschungszentrum Jülich, Germany
Matthias Müller Dresden University of Technology, Germany
Salvatore Orlando University of Venice, Italy
Fabrizio Petrini PNNL, USA
Neil Pundit Sandia National Laboratories, USA
Rolf Rabenseifner HLRS, Germany
Thomas Rauber Universität Bayreuth, Germany
Wolfgang Rehm TU Chemnitz, Germany
Casiano Rodriguez-Leon Universidad de La Laguna, Spain
Michiel Ronsse University of Ghent, Belgium
Peter Sanders Universität Karlsruhe, Germany
Martin Schulz Lawrence Livermore National Laboratory, USA
Jeffrey Squyres Open System Lab, Indiana
Vaidy Sunderam Emory University, USA
Bernard Tourancheau Université de Lyon/INRIA, France
Jesper Larsson Träff C&C Research Labs, NEC Europe, Germany
Carsten Trinitis TU München, Germany
Jerzy Wasniewski Danish Technical University, Denmark
Roland Wismueller University of Siegen, Germany
Felix Wolf Forschungszentrum Jülich, Germany
Joachim Worringen C&C Research Labs, NEC Europe, Germany
Laurence T. Yang St. Francis Xavier University, Canada
External Referees
(excluding members of the Program Committee)
Dorian Arnold
Christian Bell
Boris Bierbaum
Ron Brightwell
Michael Brim
Carsten Clauss
Rafael Corchuelo
Karen Devine
Frank Dopatka
Gábor Dózsa
Renato Ferrini
Rainer Finocchiaro
Igor Grobman
Yuri Gurevich
Torsten Höfler
Andreas Hoffmann
Ralf Hoffmann
Sascha Hunold
Mauro Iacono
Adrian Kacso
Matthew Legendre
Frederic Loulergue
Ricardo Peña Marí
Torsten Mehlan
Frank Mietke
Alexander Mirgorodskiy
Francesco Moscato
Zsolt Nemeth
Raik Nagel
Raffaele Perego
Laura Ricci
Rolf Riesen
Francisco Fernández Rivera
Nathan Rosenblum
John Ryan
Carsten Scholtes
Silke Schuch
Stephen F. Siegel
Nicola Tonellotto
Gara Miranda Valladares
Salvatore Venticinque
John Walsh
Zhaofang Wen
For the ParSim session the following external referees provided reviews.
Georg Acher
Tobias Klug
Michael Ott
Daniel Stodden
Max Walter
Josef Weidendorfer
Conference Organization
Bernd Mohr
Jesper Larsson Träff
Joachim Worringen
Sponsors
The conference would have been substantially more expensive and much less
pleasant to organize without the generous support of a good many industrial
sponsors. Platinum and Gold level sponsors also gave talks at the vendor ses-
sion on their latest products in parallel systems and message passing software.
EuroPVM/MPI 2006 gratefully acknowledges the contributions of the sponsors
to a successful conference.
Platinum Level Sponsors
Etnus, IBM, Intel, and NEC.
Gold Level Sponsors
Dolphin Interconnect Solutions, Hewlett-Packard, Microsoft, and Sun.
Standard Level Sponsor
QLogic.
Table of Contents
Invited Talks
Too Big for MPI? 1
Al Geist
Approaches for Parallel Applications Fault Tolerance 2
Richard L. Graham
Where Does MPI Need to Grow? 3
William D. Gropp
Peta-Scale Supercomputer Project in Japan and Challenges to Life and
Human Simulation in Japan 4
Ryutaro Himeno
Resource and Application Adaptivity in Message Passing Systems 5
Vaidy Sunderam
Performance Advantages of Partitioned Global Address Space
Languages 6
Katherine Yelick
Tutorials
Using MPI-2: A Problem-Based Approach 7
William D. Gropp, Ewing Lusk
Performance Tools for Parallel Programming 8
Bernd Mohr, Felix Wolf
High-Performance Parallel I/O 10
Robert Ross, Joachim Worringen
Hybrid MPI and OpenMP Parallel Programming 11
Rolf Rabenseifner, Georg Hager, Gabriele Jost, Rainer Keller
Outstanding Papers
Issues in Developing a Thread-Safe MPI Implementation 12
William Gropp, Rajeev Thakur
Scalable Parallel Suffix Array Construction 22
Fabian Kulla, Peter Sanders
Formal Verification of Programs That Use MPI One-Sided
Communication 30
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby,
Rajeev Thakur, William Gropp
Collective Communication
MPI Collective Algorithm Selection and Quadtree Encoding 40
Jelena Pješivac–Grbović, Graham E. Fagg, Thara Angskun,
George Bosilca, Jack J. Dongarra
Parallel Prefix (Scan) Algorithms for MPI 49
Peter Sanders, Jesper Larsson Träff
Efficient Allgather for Regular SMP-Clusters 58
Jesper Larsson Träff
Efficient Shared Memory and RDMA Based Design for MPI_Allgather
over InfiniBand 66
Amith Ranjith Mamidala, Abhinav Vishnu, Dhabaleswar K. Panda
Communication Protocols
High Performance RDMA Protocols in HPC 76
Tim S. Woodall, Galen Mark Shipman, George Bosilca,
Richard L. Graham, Arthur B. Maccabe
Implementation and Shared-Memory Evaluation of MPICH2 over the
Nemesis Communication Subsystem 86
Darius Buntinas, Guillaume Mercier, William Gropp
MPI/CTP: A Reconfigurable MPI for HPC Applications 96
Manjunath Gorentla Venkata, Patrick G. Bridges
Debugging and Verification
Correctness Checking of MPI One-Sided Communication Using
Marmot 105
Bettina Krammer, Michael M. Resch
An Interface to Support the Identification of Dynamic MPI 2 Processes
for Scalable Parallel Debugging 115
Christopher Gottbrath, Brian Barrett, Bill Gropp,
Ewing “Rusty” Lusk, Jeff Squyres
Modeling and Verification of MPI Based Distributed Software 123
Igor Grudenic, Nikola Bogunovic
Fault Tolerance
FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services:
A Case Study 133
David Dewolfs, Jan Broeckhove, Vaidy Sunderam,
Graham E. Fagg
Scalable Fault Tolerant Protocol for Parallel Runtime
Environments 141
Thara Angskun, Graham E. Fagg, George Bosilca,
Jelena Pješivac–Grbović, Jack J. Dongarra
An Intelligent Management of Fault Tolerance in Cluster Using
RADICMPI 150
Angelo A. Duarte, Dolores Rexachs, Emilio Luque
Extended mpiJava for Distributed Checkpointing and Recovery 158
Emilio Hernández, Yudith Cardinale, Wilmer Pereira
Metacomputing and Grid
Running PVM Applications on Multidomain Clusters 166
Franco Frattolillo
Reliable Orchestration of Distributed MPI-Applications in a
UNICORE-Based Grid with MetaMPICH and MetaScheduling 174
Boris Bierbaum, Carsten Clauss, Thomas Eickermann,
Lidia Kirtchakova, Arnold Krechel, Stephan Springstubbe,
Oliver Wäldrich, Wolfgang Ziegler
The New Multidevice Architecture of MetaMPICH in the Context of
Other Approaches to Grid-Enabled MPI 184
Boris Bierbaum, Carsten Clauss, Martin Pöppe, Stefan Lankes,
Thomas Bemmerl
Using an Enterprise Grid for Execution of MPI Parallel
Applications – A Case Study 194
Adam K.L. Wong, Andrzej M. Goscinski
Parallel I/O
Self-adaptive Hints for Collective I/O 202
Joachim Worringen
Exploiting Shared Memory to Improve Parallel I/O Performance 212
Andrew B. Hastings, Alok Choudhary
High-Bandwidth Remote Parallel I/O with the Distributed Memory
Filesystem MEMFS 222
Jan Seidel, Rudolf Berrendorf, Marcel Birkner,
Marc-André Hermanns
Effective Seamless Remote MPI-I/O Operations with Derived Data
Types Using PVFS2 230
Yuichi Tsujita
Implementation Issues
Automatic Memory Optimizations for Improving MPI Derived
Datatype Performance 238
Surendra Byna, Xian-He Sun, Rajeev Thakur, William Gropp
Improving the Dynamic Creation of Processes in MPI-2 247
Márcia C. Cera, Guilherme P. Pezzi, Elton N. Mathias,
Nicolas Maillard, Philippe O.A. Navaux
Object-Oriented Message Passing
Non-blocking Java Communications Support on Clusters 256
Guillermo L. Taboada, Juan Touriño, Ramón Doallo
Modernizing the C++ Interface to MPI 266
Prabhanjan Kambadur, Douglas Gregor, Andrew Lumsdaine,
Amey Dharurkar
Limitations and Extensions
Can MPI Be Used for Persistent Parallel Services? 275
Robert Latham, Robert Ross, Rajeev Thakur
Observations on MPI-2 Support for Hybrid Master/Slave Applications
in Dynamic and Heterogeneous Environments 285
Claudia Leopold, Michael Süß
What MPI Could (and Cannot) Do for Mesh-Partitioning on
Non-homogeneous Networks 293
Guntram Berti, Jesper Larsson Träff
Performance
Scalable Parallel Trace-Based Performance Analysis 303
Markus Geimer, Felix Wolf, Brian J.N. Wylie, Bernd Mohr
TAUg: Runtime Global Performance Data Access Using MPI 313
Kevin A. Huck, Allen D. Malony, Sameer Shende, Alan Morris
Tracing the MPI-IO Calls’ Disk Accesses 322
Thomas Ludwig, Stephan Krempel, Julian Kunkel, Frank Panse,
Dulip Withanage
Measuring MPI Send and Receive Overhead and Application
Availability in High Performance Network Interfaces 331
Douglas Doerfler, Ron Brightwell
Challenges and Issues in Benchmarking MPI 339
Keith D. Underwood
Implementation and Usage of the PERUSE-Interface in Open MPI 347
Rainer Keller, George Bosilca, Graham Fagg, Michael Resch,
Jack J. Dongarra
ParSim
Current Trends in Numerical Simulation for Parallel Engineering
Environments 356
Martin Schulz, Carsten Trinitis
MPJ Express Meets Gadget: Towards a Java Code for Cosmological
Simulations 358
Mark Baker, Bryan Carpenter, Aamir Shafi
An Approach for Parallel Fluid-Structure Interaction on Unstructured
Meshes 366
Ulrich Küttler, Wolfgang A. Wall
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective
Operations 374
Torsten Hoefler, Peter Gottschling, Wolfgang Rehm,
Andrew Lumsdaine
Parallel DSMC Gasflow Simulation of an In-Line Coater for Reactive
Sputtering 383
A. Pflug, M. Siemers, B. Szyszka
Parallel Simulation of T-M Processes in Underground Repository of
Spent Nuclear Fuel 391
Jiří Starý, Radim Blaheta, Ondřej Jakl, Roman Kohut
Poster Abstracts
On the Usability of High-Level Parallel IO in Unstructured Grid
Simulations 400
Dries Kimpe, Stefan Vandewalle, Stefaan Poedts
Automated Performance Comparison 402
Joachim Worringen
Improved GROMACS Scaling on Ethernet Switched Clusters 404
Carsten Kutzner, David van der Spoel, Martin Fechner,
Erik Lindahl, Udo W. Schmitt, Bert L. de Groot,
Helmut Grubmüller
Asynchronity in Collective Operation Implementation 406
Alexandr Konovalov, Alexandr Kurylev, Anton Pegushin,
Sergey Scharf
PARUS: A Parallel Programming Framework for Heterogeneous
Multiprocessor Systems 408
Alexey N. Salnikov
Application of PVM to Protein Homology Search 410
Mitsuo Murata
Author Index 413
Too Big for MPI?
Al Geist
Oak Ridge National Laboratory
Oak Ridge, Tennessee, USA
In 2008 the National Leadership Computing Facility at Oak Ridge National
Laboratory will have a petaflop system in place. This system will have tens of
thousands of processors and petabytes of memory. This capability system will
focus on application problems that are so hard that they require weeks on the
full system to achieve breakthrough science in nanotechnology, medicine, and
energy. With long running jobs on such huge computing systems the question
arises: Are the computers and applications getting too big for MPI? This talk
will address several reasons why the answer to this question may be yes.
The first reason is the growing need for fault tolerance. This talk will re-
view the recent efforts in adding fault tolerance to MPI and the broader need
for holistic fault tolerance across petascale machines. The second reason is the
potential need by these applications for new features or capabilities that don’t
exist in the MPI standard. A third reason is the emergence of new languages
and programming paradigms on the horizon.
This talk will discuss the DARPA High Productivity Computing Systems
project and the new languages Chapel, Fortress, and X10 being developed by
Cray, Sun, and IBM, respectively.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 1, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Approaches for Parallel Applications Fault
Tolerance
Richard L. Graham
Advanced Computing Laboratory
Los Alamos National Laboratory
Los Alamos, NM 87544, USA
System component failures (hardware and software, permanent and transient)
are an integral part of the life cycle of any computer system. The degree to which
a system suffers from these failures depends on factors such as system
complexity, system design and implementation, and system size. These errors may
lead to catastrophic application failure (termination of an application run due
to a CPU failure), silent application errors (such as network data corruption),
or application hangs (such as when a network interface card (NIC) malfunctions),
all wasting valuable computer time. For certain classes of computer systems,
dealing with these failures is a requirement to provide a simulation environment
reliable enough to meet end-user needs. Also, the more automated these solutions
are, requiring minimal or no end-user intervention, the more likely they are to be
used to achieve the required application stability. Dealing with failure, or fault
tolerance, while minimizing application performance degradation, is an active
research area, with no consensus as to what the optimal solution strategies are,
or even which failures need to be considered. Errors include transient data
transmission errors (dropped or corrupt packets), transient and permanent network
failures (NIC), and process failure, to list a few. The current MPI standard
addresses a limited number of failure scenarios, with application termination
being the default response to failure. While the standard provides a mechanism
for users to override this default response, it does not define error codes that
provide information on system-level failures, hardware or software. Nonetheless,
these need to be addressed to provide end-users with systems that meet their
computing needs. Building on experience gained in the LA-MPI, FT-MPI, and
LAM/MPI projects, the Open MPI collaboration has implemented, and continues to
implement, optional solutions that deal with a number of failure scenarios, to
reduce application failure rates to acceptable levels, that is, to lengthen the
mean time to failure. The types of errors currently being dealt with include
transient network data transmission errors, transient and permanent NIC failures,
and process failure. The talk will discuss fault detection, fault recovery
methods, and the degree to which applications need to be modified, if at all, to
benefit from these. In addition, the performance impact of these solutions on
several applications will be discussed.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 2, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Where Does MPI Need to Grow?
William D. Gropp
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, Illinois, USA
MPI has been a successful parallel programming model. The combination of per-
formance, scalability, composability, and support for libraries has made it rela-
tively easy to build complex parallel applications. However, MPI is by no means
the perfect parallel programming model. This talk will review the strengths of
MPI with respect to other parallel programming models and discuss some of the
weaknesses and limitations of MPI in the areas of performance, productivity,
scalability, and interoperability. The talk will conclude with a discussion of what
extensions (or even changes) may be needed in MPI, and what issues should be
addressed by combining MPI with other parallel programming models.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 3, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Peta-Scale Supercomputer Project in Japan and
Challenges to Life and Human Simulation in
Japan
Ryutaro Himeno
RIKEN Advanced Center for Computing and Communication
Hirosawa 2-1, Wako, Saitama 351-0198, Japan
After two years of discussion, we are about to start the Peta-Scale
Supercomputer project. The target performance is currently 10 Peta FLOPS for a
few selected codes and one Peta FLOPS for major applications. It will start
operation in March, 2011, and then its capability will be enlarged in 2011.
Finally, it will be completed in March, 2012. The project will end in March, 2013.
This project includes two important items in software development: grid
middleware and application software in Nano Science and Life Science. Grid
middleware development is planned because the supercomputer center that will
operate the Peta-scale supercomputer is intended to provide services not for a
specific institute or application area, like the Earth Simulator, but for general
use as a national infrastructure. Nano and Life sciences are the major
application areas we are going to emphasize, as well as industrial applications.
We are starting to select the target applications to build a benchmark suite
covering various scientific and industrial applications. We are also discussing
the concept design and will finalize it in Summer, 2006. I will introduce the
project plan and application areas, especially Life science, in detail at the
conference.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 4, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Resource and Application Adaptivity in Message
Passing Systems
Vaidy Sunderam
Department of Mathematics and Computer Science
Emory University
Atlanta, Georgia, USA
Clusters and MPPs are traditional platforms for message passing applications,
but there is growing interest in more dynamic metacomputing environments.
The latter are characterized by dynamic changes in the availability and capacity
of both nodes and interconnects. This talk will discuss fundamental challenges
in executing message passing programs in such environments, and analyze the
issue of adaptivity from the resource and application points of view. Pragmatic
solutions to some of these challenges will then be described, along with new
approaches to dealing with the aggregation of multidomain computing platforms
for distributed memory concurrent computing.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 5, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Performance Advantages of Partitioned Global
Address Space Languages
Katherine Yelick
Computer Science Division
University of California at Berkeley
Berkeley, California, USA
For nearly a decade, the Message Passing Interface (MPI) has been the domi-
nant programming model for high performance parallel computing, in large part
because it is universally available and scales to thousands of processors. In this
talk I will describe some of the alternatives to MPI based on a Partitioned Global
Address Space model of programming, such as UPC and Titanium. I will show
that these models offer significant advantages in performance as well as pro-
grammer productivity, because they allow the programmer to build global data
structures and perform one-sided communication in the form of remote reads
and writes, while still giving programmers control over data layout. In particu-
lar, I will show that these languages make more effective use of cluster networks
with RDMA support, allowing them to outperform two-sided communication on
both microbenchmarks and bandwidth-limited computational problems, such as
global FFTs. The key optimizations are overlapping communication with computation
and pipelining communication. Surprisingly, sending smaller messages more
frequently can be faster than a few large messages if overlap with computation
is possible. This creates an interesting open problem for global scheduling of
communication, since the simple strategy of maximum aggregation is not always
best. I will also show some of the productivity advantages of these languages
through application case studies, including complete Titanium implementations
of two different application frameworks: an immersed boundary method package
and an elliptic solver using adaptive mesh refinement.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 6, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Using MPI-2: A Problem-Based Approach
William D. Gropp and Ewing Lusk
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, Illinois, USA
{gropp, lusk}@mcs.anl.gov
MPI-2 introduced many new capabilities, including dynamic process manage-
ment, one-sided communication, and parallel I/O. Implementations of these fea-
tures are becoming widespread. This tutorial shows how to use these features by
showing all of the steps involved in designing, coding, and tuning solutions to
specific problems. The problems are chosen for their practical use in applications
as well as for their ability to illustrate specific MPI-2 topics. Complete examples
will be discussed and full source code will be made available to the attendees.
B. Mohr et al. (Eds.): PVM/MPI 2006, LNCS 4192, p. 7, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Performance Tools for Parallel Programming
Bernd Mohr and Felix Wolf
Research Centre J¨ulich
J¨ulich, Germany
{b.mohr, f.wolf}@fz-juelich.de
Extended Abstract. Application developers are facing new and more complicated
performance tuning and optimization problems as architectures become more
complex. In order to achieve reasonable performance on these systems, HPC users
need help from performance analysis tools. In this tutorial we will introduce the
principles of experimental performance instrumentation, measurement, and
analysis, with an overview of the major issues, techniques, and resources in
performance tools development, as well as an overview of the performance
measurement tools available from vendors and research groups.
The focus of this tutorial will be on experimental performance analysis, which
is currently the method of choice for tuning large-scale, parallel systems. The
goal of experimental performance analysis is to provide the data and insights re-
quired to optimize the execution behavior of applications or system components.
Using such data and insights, application and system developers can choose to
optimize software and execution environments along many axes, including execu-
tion time, memory requirements, and resource utilization. In this tutorial we will
present a broad range of techniques used for the development of software for per-
formance measurement and analysis of scientific applications. These techniques
range from mechanisms for simple code timings to multi-level hardware/software
measurements. In addition, we will present state-of-the-art tools from research
groups, as well as from software and hardware vendors, including practical tips
and tricks on how to use them for performance tuning.
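As a minimal illustration of the simplest technique mentioned above, manual code timing, the following sketch accumulates wall-clock time and call counts per named code region. Python is used only for brevity; in an MPI program one would typically read MPI_Wtime() before and after a region in the same way. The helper names timed_region and report are hypothetical and not part of any tool covered in the tutorial.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

region_seconds = defaultdict(float)  # region name -> accumulated wall-clock time
region_calls = defaultdict(int)      # region name -> number of timed executions

@contextmanager
def timed_region(name):
    """Time the enclosed block with a monotonic, high-resolution clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        region_seconds[name] += time.perf_counter() - start
        region_calls[name] += 1

def report():
    """Return (name, calls, seconds) rows, most expensive region first."""
    rows = [(n, region_calls[n], region_seconds[n]) for n in region_seconds]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Usage: wrap the regions of interest, then print a flat timing summary.
for _ in range(3):
    with timed_region("compute"):
        sum(i * i for i in range(100_000))
with timed_region("setup"):
    data = list(range(10))
for name, calls, secs in report():
    print(f"{name:10s} {calls:3d} {secs:.6f} s")
```

Such hand-placed timers are cheap and easy to understand, but they only measure what the programmer already suspects; the instrumentation techniques discussed next automate the placement of measurement points.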
When designing, developing, or using a performance tool, one has to decide
which instrumentation technique to use. We will cover the main instrumentation
techniques, which can be divided into static techniques, applied during code
development, compilation, or linking, and dynamic techniques, applied during
execution. The most common instrumentation approach augments source code with
calls to specific instrumentation libraries; during execution, these library
routines collect behavioral data. One example of a static instrumentation system
that will be covered in detail is the MPI profiling interface, which is part of
the MPI specification and was defined to provide a mechanism for the quick
development of performance analysis systems for parallel programs. In addition,
we will present similar work (POMP, OPARI) that has been proposed in the context
of OpenMP. In contrast to static instrumentation, dynamic instrumentation allows
users to interactively change instrumentation points, focusing measurements on
code regions where performance problems have been detected. An example of such a
dynamic instrumentation system is the DynInst project from the University of
Maryland and the University of Wisconsin, which provides an infrastructure to
help tool developers build performance tools. We will compare and contrast these
instrumentation approaches.
Regardless of the instrumentation mechanism, there are two dimensions that
need to be considered for performance data collection: when the performance
collection is triggered and how the performance data is recorded. The triggering
mechanism can be activated by an external agent, such as a timer or a hardware
counter overflow, or internally, by code inserted through instrumentation. The
former is also known as sampling or asynchronous, while the latter is sometimes
referred to as synchronous. Performance data can be summarized during runtime
and stored in the form of a profile, or can be stored in the form of traces. We
will present these approaches and discuss how each one reflects a different bal-
ance among data volume, potential instrumentation perturbation, accuracy, and
implementation complexity. Performance data should be stored in a format that
allows the generality and extensibility necessary to represent a diverse set of
performance metrics and measurement points, independent of language and ar-
chitecture idiosyncrasies. We will describe common trace file formats (Vampir,
CLOG, SLOG, EPILOG), as well as profile data formats based on the eXtensible
Markup Language (XML), which is becoming a standard for representing
performance data.
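To make the profile/trace distinction concrete, the following sketch assumes a simple enter/exit event model and a small hypothetical trace. It keeps the trace as a list of timestamped events, reduces it to a profile of call counts and inclusive time per process and region, and serializes that profile as XML. The function names and the XML element and attribute names are invented for illustration and do not follow any of the formats named above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical trace of one process: (timestamp [s], rank, event kind, region name).
trace = [
    (0.0, 0, "enter", "main"),
    (0.1, 0, "enter", "MPI_Send"),
    (0.4, 0, "exit", "MPI_Send"),
    (0.5, 0, "enter", "MPI_Send"),
    (0.6, 0, "exit", "MPI_Send"),
    (1.0, 0, "exit", "main"),
]

def trace_to_profile(events):
    """Reduce enter/exit events to call counts and inclusive time per (rank, region)."""
    stacks = defaultdict(list)  # rank -> stack of open (region, enter timestamp)
    profile = defaultdict(lambda: {"calls": 0, "inclusive": 0.0})
    for t, rank, kind, region in events:
        if kind == "enter":
            stacks[rank].append((region, t))
            continue
        name, t_enter = stacks[rank].pop()
        if name != region:
            raise ValueError(f"unbalanced events: exit {region} while in {name}")
        entry = profile[(rank, region)]
        entry["calls"] += 1
        entry["inclusive"] += t - t_enter
    return dict(profile)

def profile_to_xml(profile):
    """Serialize a profile with an invented XML layout (not a real tool's schema)."""
    root = ET.Element("profile")
    for (rank, region), m in sorted(profile.items()):
        ET.SubElement(root, "region", rank=str(rank), name=region,
                      calls=str(m["calls"]), inclusive=f"{m['inclusive']:.3f}")
    return ET.tostring(root, encoding="unicode")

prof = trace_to_profile(trace)
print(profile_to_xml(prof))
```

The trace grows with every event recorded, whereas this profile has one entry per (rank, region) pair regardless of run length; in exchange, the profile can no longer show when or in what order individual calls happened.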
Hardware performance counters have become an essential asset for application
performance tuning. We will discuss in detail how users can access hardware
performance counters using application programming interfaces such as PAPI
and PCL, in order to correlate the behavior of the application to one or more of
the components of the hardware.
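Reading the counters themselves is done through the measurement API, for example by surrounding a code region with PAPI_start and PAPI_stop calls in C. Assuming the raw values have already been read, the sketch below shows the next step of relating them to hardware behavior through derived metrics: instructions per cycle and L1 data-cache misses per thousand instructions. The dictionary keys are PAPI preset event names; the function name, the metric labels, and the sample values are hypothetical.

```python
def derived_metrics(counts):
    """Derive ratios from raw counter values read for one code region.

    `counts` maps PAPI preset event names to integer counts. Returns instructions
    per cycle (IPC) and, if L1 data-cache misses were recorded, misses per
    thousand instructions.
    """
    cycles = counts["PAPI_TOT_CYC"]
    instructions = counts["PAPI_TOT_INS"]
    metrics = {"IPC": instructions / cycles if cycles else 0.0}
    if "PAPI_L1_DCM" in counts and instructions:
        metrics["L1_DCM_per_kilo_ins"] = counts["PAPI_L1_DCM"] * 1000 / instructions
    return metrics

# Hypothetical counts for one region, for illustration only.
region_counts = {"PAPI_TOT_CYC": 2_000_000, "PAPI_TOT_INS": 3_000_000, "PAPI_L1_DCM": 45_000}
print(derived_metrics(region_counts))
```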
Visualization systems should provide natural and intuitive user interfaces,
as well as methods for users to manipulate large data collections, so that
they can grasp the essential features of large performance data sets. In addition,
given the diversity of performance data, and the fact that performance problems
can arise at several levels, visualization systems should also be able to provide
multiple levels of detail, so that users can focus on interesting yet complex
behavior while avoiding irrelevant or unnecessary details. We will discuss the
different visualization and presentation approaches currently used in
state-of-the-art research tools, as well as in tools from software and hardware vendors.
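One simple way to offer more than one level of detail is to show a coarse summary of a metric across all processes first, and then point the user only at the processes that deviate from it. The sketch below assumes a per-process metric such as time spent in MPI; the function name, the sample values, and the threshold of 1.5 times the mean are illustrative choices, not taken from any particular tool.

```python
from statistics import mean

def summarize(per_rank, outlier_factor=1.5):
    """Coarse view of one metric across processes, plus ranks worth drilling into.

    per_rank maps rank -> metric value (e.g. seconds spent in MPI).
    A rank is flagged if its value exceeds outlier_factor times the mean
    (an arbitrary threshold chosen for illustration).
    """
    values = list(per_rank.values())
    avg = mean(values)
    summary = {"min": min(values), "mean": avg, "max": max(values)}
    outliers = sorted(r for r, v in per_rank.items() if v > outlier_factor * avg)
    return summary, outliers

# Hypothetical per-rank MPI times in seconds.
mpi_time = {0: 1.0, 1: 1.1, 2: 0.9, 3: 4.0}
summary, drill_down = summarize(mpi_time)
print(summary, "inspect ranks:", drill_down)
```

With thousands of processes, the top level stays three numbers wide, and only the flagged ranks need to be shown at full detail.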
The tutorial will conclude with a discussion of open research problems.
Given the complexity of state-of-the-art parallel applications, new performance
tools must be deeply integrated, combining instrumentation, measurement, data
analysis, and visualization. In addition, they should be able to guide or perform
performance remediation. Ideally, these environments should scale to hundreds or
thousands of processors, support analysis of distributed computations, and be
portable across a wide range of parallel systems. Also, performing a whole series
of experiments (studies) should be supported to allow comparative or scalability
analysis. We will discuss research efforts in automating the process of
performance analysis, such as the projects of the APART working group. We
conclude the tutorial with a discussion of issues related to the analysis
of grid applications.
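In its simplest form, a scalability study over a series of experiments compares run times for a fixed problem size at different process counts. The sketch below computes relative speedup S(p) = T(p0)/T(p) and parallel efficiency E(p) = p0 T(p0) / (p T(p)), using the smallest measured process count p0 as the baseline; with p0 = 1 these reduce to the usual definitions. The function name and the timings in the usage lines are hypothetical.

```python
def strong_scaling(runtimes):
    """Relative speedup and parallel efficiency from a series of runs.

    runtimes maps process count p -> wall-clock time T(p) for a fixed problem
    size. The smallest p is the baseline p0, so S(p) = T(p0)/T(p) and
    E(p) = p0*T(p0) / (p*T(p)).
    """
    p0 = min(runtimes)
    t0 = runtimes[p0]
    return {
        p: {"speedup": t0 / t, "efficiency": (p0 * t0) / (p * t)}
        for p, t in sorted(runtimes.items())
    }

# Hypothetical timings in seconds, for illustration only.
for p, row in strong_scaling({1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}).items():
    print(f"p={p:2d}  S={row['speedup']:5.2f}  E={row['efficiency']:.2f}")
```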
High-Performance Parallel I/O
Robert Ross¹ and Joachim Worringen²
¹ Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, Illinois, USA
² Dolphin Interconnect Solutions R&D Germany
Wachtberg, Germany
Effectively using I/O resources on HPC machines is a black art. The purpose
of this tutorial is to shed light on the state of the art in parallel I/O and to
provide the knowledge necessary for attendees to best leverage the I/O resources
available to them.
In the first half of the tutorial we discuss the software involved in parallel
I/O. We cover the entire I/O software stack from parallel file systems at the
lowest layer, to intermediate layers (such as MPI-IO), and finally high-level I/O
libraries (such as HDF5). The emphasis is not just on how to use these layers,
but also on ways to use them that result in high performance. As part of this discussion
we will present benchmark results from current systems.
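How each process's data maps onto the file largely determines how many separate requests independent I/O generates. The following sketch assumes a two-dimensional array of 8-byte elements stored row by row in a single file and lists the byte ranges one process touches: a block of whole rows is a single contiguous range, while a block of columns breaks into one small range per row. Noncontiguous patterns of this kind are what MPI-IO file views and collective calls (for example MPI_File_set_view and MPI_File_write_all) let the library optimize, for instance by aggregating requests across processes. The function name is hypothetical.

```python
def file_segments(n_rows, n_cols, row_range, col_range, elem_size=8):
    """Byte ranges (offset, length) a process touches in a row-major 2-D array file.

    The process owns rows [r0, r1) and columns [c0, c1) of an n_rows x n_cols
    array stored row by row with elem_size-byte elements. Ranges that are
    adjacent in the file are merged into one.
    """
    r0, r1 = row_range
    c0, c1 = col_range
    segments = []
    for r in range(r0, r1):
        offset = (r * n_cols + c0) * elem_size
        length = (c1 - c0) * elem_size
        if segments and segments[-1][0] + segments[-1][1] == offset:
            prev_offset, prev_length = segments[-1]
            segments[-1] = (prev_offset, prev_length + length)  # contiguous: extend
        else:
            segments.append((offset, length))
    return segments

# A 1000 x 1000 array split four ways by rows versus by columns.
by_rows = file_segments(1000, 1000, (0, 250), (0, 1000))
by_cols = file_segments(1000, 1000, (0, 1000), (0, 250))
print(len(by_rows), "segment(s) vs", len(by_cols), "segment(s)")
```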
The second half of the tutorial will be hands-on, with the participants solving
typical problems of parallel I/O using different approaches. The performance of
these approaches will be evaluated on different machines at remote sites, using
various types of file systems. The results are then compared to get a full picture
of the performance differences and characteristics of the chosen approaches on
the different platforms.
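Comparing approaches across machines and file systems requires measuring throughput the same way everywhere. The following single-process sketch, a hypothetical helper and not the tutorial's benchmark, writes a given number of bytes in fixed-size blocks, forces the data to storage with fsync, and reports MB/s. It does not use MPI-IO or exercise parallel access; it only illustrates the bytes-over-elapsed-time measurement and lets one observe the effect of the block size.

```python
import os
import tempfile
import time

def write_bandwidth(path, total_bytes, block_size):
    """Write total_bytes to path in block_size chunks and return MB/s (1 MB = 10**6 bytes).

    The timed interval includes flushing Python's buffer and os.fsync, so the
    result reflects data handed to the storage system, not just to the page cache.
    """
    block = b"\0" * block_size
    n_full, rest = divmod(total_bytes, block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_full):
            f.write(block)
        if rest:
            f.write(block[:rest])
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total_bytes / elapsed / 1e6

if __name__ == "__main__":
    # Point the directory at the file system under test; a temporary one is used here.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "bw.dat")
        for bs in (4096, 1 << 20):
            mbps = write_bandwidth(target, 16 << 20, bs)
            print(f"block {bs:>8} B: {mbps:8.1f} MB/s")
```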
Basic knowledge of parallel (MPI) programming in C and/or Fortran is
assumed. For the second half, each participant should bring their own notebook
computer, running either Windows XP or Linux (x86). A limited number of
loan notebook computers is available on request.