30
Distributed object-based Grid computing environments
Tomasz Haupt¹ and Marlon E. Pierce²
¹Mississippi State University, Starkville, Mississippi, United States
²Indiana University, Bloomington, Indiana, United States
30.1 INTRODUCTION
Computational Grid technologies hold the promise of providing global scale distributed
computing for scientific applications. The goal of projects such as Globus [1], Legion [2],
Condor [3], and others is to provide some portion of the infrastructure needed to support
ubiquitous, geographically distributed computing [4, 5]. These metacomputing tools
provide such services as high-throughput computing, single login to resources distributed
across multiple organizations, and common Application Programming Interfaces (APIs)
and protocols for information, job submission, and security services across multiple
organizations. This collection of services forms the backbone of what is popularly known as
the computational Grid, or just the Grid.
The service-oriented architecture of the Grid, with its complex client tools and
programming interfaces, is difficult for application developers and end users to use. The
perception of complexity of the Grid environment comes from the fact that Grid services
often address issues at levels that are too low for the application developers (in terms
of API and protocol stacks). Consequently, there are not many Grid-enabled applications,
and in general, the Grid adoption rate among the end users is low.
Grid Computing – Making the Global Infrastructure a Reality. Edited by F. Berman, A. Hey and G. Fox
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-85319-0
By way of contrast, industry has undertaken enormous efforts to develop easy-to-use
interfaces that hide the complexity of underlying systems. Through Web portals the user has
access to a wide variety of services such as weather forecasts, stock market quotes and
online trading, calendars, e-mail, auctions, air travel reservations and ticket purchasing, and
many others yet to be imagined. It is the simplicity of the user interface, which hides all
implementation details from the user, that has contributed to the unprecedented success of
the idea of a Web browser.
Grid computing environments (GCEs) such as computational Web portals are an extension
of this idea. GCEs are used for aggregating, managing, and delivering Grid services to
end users, hiding the underlying complexities behind user-friendly interfaces. A computational
Web portal takes advantage of the technologies and standards developed for Internet
computing, such as HTTP, HTML, XML, CGI, Java, CORBA [6, 7], and Enterprise JavaBeans
(EJB) [8], using them to provide browser-based access to High Performance Computing
(HPC) systems (both on the Grid and off). A further potential advantage of these environments
is that they may be merged with more mainstream Internet technologies, such as
information delivery and archiving and collaboration.
Besides simply providing a good user interface, computing portals designed around
distributed object technologies provide the concept of persistent state to the Grid. The Grid
infrastructure is implemented as a bag of services. Each service performs a particular
transaction following a client-server model. Each transaction is either stateless or supports only
a conversational state. This model closely resembles the HTTP-based Web transaction model: the
user makes a request by pointing the Web browser to a particular URL, and a Web server
responds with the corresponding, possibly dynamically generated, HTML page. However,
Web developers found this model too restrictive very early on. Nowadays, most Web
servers utilize object- or component-oriented technologies, such as EJB or CORBA, for
session management, multistep transaction processing, persistence, user profiles, providing
enterprise-wide access to resources including databases, and for incorporating third-party
services. There is a remarkable similarity between the current capabilities of the Web
servers (the Web technologies), augmented with Application Servers (the Object and
Component Technologies), and the required functionality of a Grid Computing Environment.
This paper provides an overview of Gateway and the Mississippi Computational Web
Portal (MCWP). These projects are being developed separately at Indiana University and
Mississippi State University, respectively, but they share a common design heritage. The
key features of both MCWP and Gateway are the use of XML for describing portal
metadata and the use of distributed object technologies in the control tier.
30.2 DEPLOYMENT AND USE OF COMPUTING PORTALS
In order to make concrete the discussion presented in the introduction, we describe below
our deployed portals. These provide short case studies on the types of portal users and
the services that they require.
30.2.1 DMEFS: an application of the Mississippi Computational Web Portal
The Distributed Marine Environment Forecast System (DMEFS) [9] is a project of the
Mississippi State team that is funded by the Office of Naval Research. DMEFS’s goal is
to provide an open framework to simulate the littoral environments across many temporal
and spatial scales in order to accelerate the evolution of timely and accurate forecasting.
DMEFS is expected to provide a means for substantially reducing the time to develop,
prototype, test, validate, and transition simulation models to operation, as well as support
a genuine, synergistic collaboration among the scientists, the software engineers, and the
operational users. In other words, the resulting system must provide an environment for
model development, including model coupling, model validation and data analysis, routine
runs of a suite of forecasts, and decision support.
Such a system has several classes of users. The model developers are expected to be
computer-savvy domain specialists. Operational users, who routinely run the simulations
to produce daily forecasts, have only limited knowledge of how the simulations actually
work, while decision-support users are typically interested only in accessing the end
results. The first type of user typically benefits from services such as archiving and
data pedigree as well as support for testing and validation. The second type benefits
from an environment that simplifies the complicated task of setting up and running the
simulations, while the third type needs ways of obtaining and organizing results.
DMEFS is in its initial deployment phase at the Naval Oceanographic Office Major
Shared Resource Center (MSRC). In the next phase, DMEFS will develop and integrate
metadata-driven access to heterogeneous, distributed data sources (databases, data
servers, scientific instruments). It will also provide support for data quality assessment,
data assimilation, and model validation.
30.2.2 Gateway support for commodity codes
The Gateway computational Web portal is deployed at the Army Research Laboratory
MSRC, with additional deployment approved for the Aeronautical Systems Center MSRC.
Gateway’s initial focus has been on simplifying access to commercial codes for novice
HPC users. These users are assumed to understand the preprocessing and postprocessing
tools of their codes on their desktop PC or workstation but not to be familiar with
common HPC tasks such as queue script writing and job submission and management.
Problems using HPC systems are often aggravated by the use of different queuing systems
between and even within the same center, poor access for remote users caused by slow
network speeds at peak hours, changing locations for executables, and licensing issues
for commercial codes. Gateway attempts to hide or manage as many of these details
as possible, while providing a browser front end that encapsulates sets of commands
into relatively few portal actions. Currently, Gateway supports job creation, submission,
monitoring, and archiving for ANSYS, ZNS, and Fluent, with additional support planned
for CTH. Gateway interfaces to these codes are currently being tested by early users.
Because Gateway must deal with applications with restricted source codes, we wrap
these codes in generic Java proxy objects that are described in XML. The interfaces for
the invocation of these services likewise are expressed in XML, and we are in the process
of converting our legacy service description to the Web service standard Web Services
Description Language (WSDL) [10].
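
To make the idea of an XML-described proxy more concrete, the following is a minimal sketch, not the Gateway implementation. The descriptor schema (the `application`, `host`, `executable`, and `queueSystem` names), the class `ApplicationProxy`, and the host name `hpc.example.org` are all assumptions made for this illustration. The proxy never inspects the wrapped code; it only records how to launch it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical generic proxy for a code whose source the portal cannot touch.
class ApplicationProxy {
    final String name;
    final String host;
    final String executable;
    final String queueSystem;

    ApplicationProxy(String name, String host, String executable, String queueSystem) {
        this.name = name;
        this.host = host;
        this.executable = executable;
        this.queueSystem = queueSystem;
    }

    // Builds a proxy from an XML descriptor. The schema is illustrative only.
    static ApplicationProxy fromXml(String xml) {
        try {
            Element app = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Element host = (Element) app.getElementsByTagName("host").item(0);
            return new ApplicationProxy(
                    app.getAttribute("name"),
                    host.getAttribute("name"),
                    text(host, "executable"),
                    text(host, "queueSystem"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed application descriptor", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // The command the portal would place in a job script for this application.
    String commandLine(String inputFile) {
        return executable + " " + inputFile;
    }
}

class ProxyDemo {
    static final String DESCRIPTOR =
            "<application name=\"solver\">"
          + "<host name=\"hpc.example.org\">"
          + "<executable>/opt/solver/bin/solver</executable>"
          + "<queueSystem>PBS</queueSystem>"
          + "</host></application>";

    public static void main(String[] args) {
        ApplicationProxy proxy = ApplicationProxy.fromXml(DESCRIPTOR);
        System.out.println(proxy.name + " on " + proxy.host + ": " + proxy.commandLine("case1.inp"));
    }
}
```

Because the descriptor is data rather than code, an administrator could in principle add a new application or host by editing XML alone, which is the property the wrapping approach relies on.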

Gateway also provides secure file transfer, job monitoring and job management through a
Web browser interface. These are currently integrated with the application interfaces but have
proven popular on their own and so will be provided as stand-alone services in the future.
Future plans for Gateway include integration with the Interdisciplinary Computing
Environment (ICE) [11], which provides visualization tools and support for light code coupling
through a common data format. Gateway will support secure remote job creation and
management for ICE-enabled codes, as well as secure, remote, sharable visualization services.
30.3 COMPUTING PORTAL SERVICES
One may build computational environments such as those above out of a common set of
core services. We list the following as the base set of abstract service definitions, which
may be (but are not necessarily) implemented more or less directly with typical Grid
technologies in the portal middle tier.
1. Security: Allow access only to authenticated users, give them access only to authorized
areas, and keep all communications private.
2. Information resources: Inform the user about available codes and machines.
3. Queue script generation: On the basis of the user’s choice of code and host, create a
script to run the job for the appropriate queuing system.
4. Job submission: Through a proxy process, submit the job with the selected resources
for the user.
5. Job monitoring: Inform the user of the status of his submitted jobs, and more generally
provide events that allow loosely coupled applications to be staged.
6. File transfer and management: Allow the user to transfer files between his desktop
computer and a remote system and to transfer files between remote systems.
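
As an illustration of the queue script generation service in the list above, the following sketch maps one job request onto two queuing systems. It is a simplified example under stated assumptions: the classes `JobRequest`, `PbsScriptGenerator`, `LsfScriptGenerator`, and `ScriptGenerators` are hypothetical, and the `#PBS` and `#BSUB` directives shown are commonly used forms; real sites typically require additional, site-specific options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical job request: the choices the user made in the portal.
class JobRequest {
    final String jobName;
    final String command;
    final int processors;
    final int wallMinutes;

    JobRequest(String jobName, String command, int processors, int wallMinutes) {
        this.jobName = jobName;
        this.command = command;
        this.processors = processors;
        this.wallMinutes = wallMinutes;
    }
}

// One generator per queuing system; the user never sees the difference.
interface QueueScriptGenerator {
    String generate(JobRequest job);
}

class PbsScriptGenerator implements QueueScriptGenerator {
    public String generate(JobRequest job) {
        String wall = String.format("%02d:%02d:00", job.wallMinutes / 60, job.wallMinutes % 60);
        return "#!/bin/sh\n"
             + "#PBS -N " + job.jobName + "\n"
             + "#PBS -l ncpus=" + job.processors + "\n"
             + "#PBS -l walltime=" + wall + "\n"
             + job.command + "\n";
    }
}

class LsfScriptGenerator implements QueueScriptGenerator {
    public String generate(JobRequest job) {
        String wall = String.format("%d:%02d", job.wallMinutes / 60, job.wallMinutes % 60);
        return "#!/bin/sh\n"
             + "#BSUB -J " + job.jobName + "\n"
             + "#BSUB -n " + job.processors + "\n"
             + "#BSUB -W " + wall + "\n"
             + job.command + "\n";
    }
}

// Selects a generator from the queuing system recorded for the chosen host.
class ScriptGenerators {
    private static final Map<String, QueueScriptGenerator> BY_QUEUE = new LinkedHashMap<>();
    static {
        BY_QUEUE.put("PBS", new PbsScriptGenerator());
        BY_QUEUE.put("LSF", new LsfScriptGenerator());
    }

    static String scriptFor(String queueSystem, JobRequest job) {
        QueueScriptGenerator generator = BY_QUEUE.get(queueSystem);
        if (generator == null) {
            throw new IllegalArgumentException("Unsupported queuing system: " + queueSystem);
        }
        return generator.generate(job);
    }
}

class ScriptDemo {
    public static void main(String[] args) {
        JobRequest job = new JobRequest("demo", "./solver case1.inp", 8, 90);
        System.out.print(ScriptGenerators.scriptFor("PBS", job));
        System.out.print(ScriptGenerators.scriptFor("LSF", job));
    }
}
```

The same `JobRequest` yields different scripts depending only on the host's queuing system, so a center that uses several queuing systems can be supported without changing the user interface.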
Going beyond the initial core services above, both MCWP and Gateway have identified
the following GCE-specific services, which they have implemented or are in the process of
implementing.
1. Metadata-driven resource allocation and monitoring: While indispensable for acquiring
adequate resources for an application, allocation of remote resources adds to the
complexity of all user tasks. To simplify this chore, one requires a persistent and
platform-independent way to express computational tasks. This can be achieved by the
introduction of application metadata. This user service combines standard authentication,
information, resource allocation, and file transfer Grid services with GCE services:
metadata discovery, retrieval and processing, metadata-driven Resource Specification
Language (RSL) (or batch script) generation, resource brokerage, access to remote file
systems and data servers, logging, and persistence.
2. Task composition or workflow specification and management: This user service automates
mundane user tasks with data preprocessing and postprocessing, file transfers,
format conversions, scheduling, and so on. It replaces the nonportable ‘spaghetti’ shell
scripts currently widely used. It requires task composition tools capable of describing
the workflow in a platform-independent way, since some parts of the workflow may be
performed on remote systems. The workflow is built hierarchically from reusable modules
(applications), and it supports different mechanisms for triggering execution of
modules: from static sequences with branches to data flow to event-driven systems. The
workflow manager combines information, resource brokers, events, resource allocation
and monitoring, file transfer, and logging services.
3. Metadata-driven, real-time data access service: Certain simulation types perform
assimilation of observational data or analyze experimental data in real time. These
data are available from many different sources in a variety of formats. Built on top of
the metadata, file transfer and persistence services, this user service closely interacts
with the resource allocation and monitoring or workflow management services.
4. User space, persistency, and pedigree service: This user service provides support for
reuse and sharing of applications and their configuration, as well as for preserving the
pedigree of all jobs submitted by the user. The pedigree information allows the user
to reproduce any previous result on the one hand and to localize the product of any
completed job on the other. It collects data generated by other services, in particular,
by the resource allocation and workflow manager.
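
The task composition service described in the list above can be illustrated with a small platform-independent workflow description. The sketch below is only one possible reading, covering the simplest triggering mechanism (a static sequence with dependencies); data-flow and event-driven triggering are not shown. The class `Workflow` and the module names are hypothetical, and the ordering uses a standard topological sort rather than anything taken from MCWP or Gateway.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical workflow: reusable modules plus "runs after" dependencies.
class Workflow {
    // Module name -> names of the modules that must complete first.
    private final Map<String, Set<String>> prerequisites = new LinkedHashMap<>();

    Workflow add(String module, String... after) {
        prerequisites.put(module, new LinkedHashSet<>(Arrays.asList(after)));
        return this;
    }

    // Orders modules so that each runs only after its prerequisites (Kahn's algorithm).
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : prerequisites.entrySet()) {
            for (String prerequisite : entry.getValue()) {
                if (!prerequisites.containsKey(prerequisite)) {
                    throw new IllegalStateException("Unknown module: " + prerequisite);
                }
            }
            pending.put(entry.getKey(), entry.getValue().size());
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.remove();
            order.add(done);
            // Release every module that was waiting on the one just completed.
            for (Map.Entry<String, Set<String>> entry : prerequisites.entrySet()) {
                if (entry.getValue().contains(done)) {
                    int left = pending.get(entry.getKey()) - 1;
                    pending.put(entry.getKey(), left);
                    if (left == 0) {
                        ready.add(entry.getKey());
                    }
                }
            }
        }
        if (order.size() != prerequisites.size()) {
            throw new IllegalStateException("Workflow contains a cycle");
        }
        return order;
    }
}

class WorkflowDemo {
    static Workflow forecast() {
        return new Workflow()
                .add("stage-in")
                .add("preprocess", "stage-in")
                .add("simulate", "preprocess")
                .add("postprocess", "simulate")
                .add("archive", "simulate");
    }

    public static void main(String[] args) {
        System.out.println(forecast().executionOrder());
    }
}
```

Nothing in the description names a host or a shell, which is what would allow a workflow manager to place individual modules on remote systems.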
30.4 GRID PORTAL ARCHITECTURE
A computational Web portal is implemented as a multitier system composed of clients
running on the users’ desktops or laptops, portal servers providing user-level services (i.e.
portal middleware), and backend servers providing access to the computing resources.
30.4.1 The user interface
The user interacts with the portal through either a Web browser, a client application, or
both. The central idea of both the Gateway and the MCWP user interfaces is to allow
users to organize their work into problem contexts, which are then subdivided into session
contexts in Gateway terminology, or projects and tasks using MCWP terms. Problems
(or projects) are identified by a descriptive name handle provided by the user, with
sessions automatically created and time-stamped to give them unique names. Within a
particular session (or task), the user chooses applications to run and selects computing
resources to use. This interface organization is mapped to components in the portal
middleware (user space, persistency, and pedigree services) described below. In both cases,
the Web browser–based user interface is developed using JavaServer Pages (JSP), which
allow us to dynamically generate Web content and interface easily with our Java-based
middleware.
The Gateway user interface provides three tracks: code selection, problem archive, and
administration. The code selection track allows the user to start a new problem, make
an initial request for resources, and submit the job request to the selected host’s queuing
system. The problem archive allows the user to revisit and edit old problem sessions so
that the job can be submitted to a different machine, with a different input file, and
so forth. Changes to a particular session are stored under a newly generated session name.
The administration track allows privileged users to add applications and host computers
to the portal, modify the properties of these entities, and verify their installation. This
information is stored in an XML data record, described below.
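
The organization of user work into problems and sessions, and the rule that editing an archived session produces a new session rather than overwriting the old one, can be sketched as follows. This is an illustrative model only: the classes `ProblemContext` and `SessionContext`, the naming scheme, and the settings keys are assumptions, not the Gateway data model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical session: an immutable record of one set of job choices.
class SessionContext {
    final String name;
    final Map<String, String> settings;

    SessionContext(String name, Map<String, String> settings) {
        this.name = name;
        Map<String, String> copy = new LinkedHashMap<>(settings);
        this.settings = Collections.unmodifiableMap(copy);
    }
}

// Hypothetical problem: a user-named container of time-stamped sessions.
class ProblemContext {
    final String name;
    private final List<SessionContext> sessions = new ArrayList<>();

    ProblemContext(String name) {
        this.name = name;
    }

    // The time stamp plus a running index keeps names unique within a problem.
    SessionContext newSession(Map<String, String> settings, Instant now) {
        String sessionName = name + "/" + now + "#" + (sessions.size() + 1);
        SessionContext session = new SessionContext(sessionName, settings);
        sessions.add(session);
        return session;
    }

    // Editing never overwrites: the change is stored under a new session name.
    SessionContext edit(SessionContext old, String key, String value, Instant now) {
        Map<String, String> changed = new LinkedHashMap<>(old.settings);
        changed.put(key, value);
        return newSession(changed, now);
    }

    List<SessionContext> archive() {
        return Collections.unmodifiableList(sessions);
    }
}

class ContextDemo {
    public static void main(String[] args) {
        ProblemContext problem = new ProblemContext("wing-stress");
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("host", "hpc1");
        settings.put("input", "a.inp");
        SessionContext first = problem.newSession(settings, Instant.now());
        SessionContext second = problem.edit(first, "host", "hpc2", Instant.now());
        System.out.println(first.name + " -> " + first.settings);
        System.out.println(second.name + " -> " + second.settings);
    }
}
```

Keeping old sessions unchanged is what lets an archive of this kind double as a record of how each earlier job was configured.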
The MCWP user interface provides five distinct views of the system, depending on the
user role: developer, analyst, operator, customer, and administrator. The developer view
combines the selection and archive tracks. The analyst view provides tools for data
selection and visualization. The operator view allows advance scheduling of tasks
for routine runs (similar to creating a cron table). The customer view allows access to
routinely generated and postprocessed results (plots, maps, and so forth). Finally, the
administrator view allows configuration and controlling of all operations performed by the portal.
30.4.2 Component-based middleware
The portal middleware naturally splits into two layers: the actual implementation of the
user services and the presentation layer responsible for providing mechanisms for the
user interactions with the services. The presentation layer accepts the user requests and
returns the service responses. Depending on the implementation strategy for the client,
the services’ responses are directly displayed in the Web browser or consumed by the
client-side application.
A key feature of both Gateway and MCWP is that they provide a container-based
middle tier that holds and manages the (distributed) proxy wrappers for basic services
like those listed above. This allows us to build user interfaces to services without worrying
about the implementation of those services. Thus, for example, we may implement the
portal using standard service implementations from the Globus toolkit, we may implement
some core services ourselves for stand-alone resources, or we may implement the portal
as a mixture of these different service implementation styles.
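
The separation between user interfaces and service implementations described above can be sketched with a small container that holds one proxy per service interface. All names here (`JobSubmissionService`, `LocalJobSubmission`, `GridJobSubmission`, `ServiceContainer`, `PortalPage`) are hypothetical, and the two implementations are stand-ins that only return an identifier; in particular, `GridJobSubmission` makes no real Globus calls.

```java
import java.util.HashMap;
import java.util.Map;

// The abstract service definition that the user interface is written against.
interface JobSubmissionService {
    String submit(String host, String script);
}

// Stand-in for a service implemented by the portal itself for a stand-alone resource.
class LocalJobSubmission implements JobSubmissionService {
    private int next = 1;

    public String submit(String host, String script) {
        return "local:" + host + ":" + (next++);
    }
}

// Stand-in for a Grid-backed implementation (no real Grid toolkit is invoked here).
class GridJobSubmission implements JobSubmissionService {
    private int next = 1;

    public String submit(String host, String script) {
        return "grid:" + host + ":" + (next++);
    }
}

// Minimal container: holds one proxy per service interface.
class ServiceContainer {
    private final Map<Class<?>, Object> services = new HashMap<>();

    <T> void bind(Class<T> type, T implementation) {
        services.put(type, implementation);
    }

    <T> T lookup(Class<T> type) {
        Object implementation = services.get(type);
        if (implementation == null) {
            throw new IllegalStateException("No service bound for " + type.getName());
        }
        return type.cast(implementation);
    }
}

// User-interface code depends only on the interface, never on the implementation.
class PortalPage {
    private final ServiceContainer container;

    PortalPage(ServiceContainer container) {
        this.container = container;
    }

    String runJob(String host, String script) {
        return container.lookup(JobSubmissionService.class).submit(host, script);
    }
}

class ContainerDemo {
    public static void main(String[] args) {
        ServiceContainer container = new ServiceContainer();
        container.bind(JobSubmissionService.class, new LocalJobSubmission());
        PortalPage page = new PortalPage(container);
        System.out.println(page.runJob("hpc1", "echo hello"));
        // Swapping the implementation requires no change to PortalPage.
        container.bind(JobSubmissionService.class, new GridJobSubmission());
        System.out.println(page.runJob("hpc1", "echo hello"));
    }
}
```

Rebinding the interface changes which backend handles the request while `PortalPage` stays untouched, which is the sense in which a portal could mix different service implementation styles.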
The Gateway middle tier consists of two basic sections: a Web server running a servlet
engine and a distributed CORBA-based middle tier (WebFlow). This is illustrated in
Figure 30.1. The Web server typically runs a single Java Virtual Machine (JVM) on
a single server host that contains local JavaBean components. These components may
implement specific local services or they may act as proxies for WebFlow-distributed
components running in different JVMs on a nest of host computers. WebFlow servers
consist of a top-level master server and any number of child servers. The master server
acts as a gatekeeper and manages the life cycle of the children. These child servers can in
turn provide access to remote backend services such as HPCs running Portable Batch System
(PBS) or Load Sharing Facility (LSF) queuing systems, a Condor flock, a Globus grid,
and data storage devices. By running different WebFlow child servers on different hosts,
we may easily span organizational barriers in a lightweight fashion. For more information
on the WebFlow middleware, see References [12, 13, 14]. For a general overview of the
role of commodity technologies in computational Grids, see Reference [15].
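
The master and child server relationship can be pictured with the following in-process sketch. It is a deliberate simplification under stated assumptions: real WebFlow components are distributed CORBA objects running in separate JVMs, whereas here `MasterServer` and `ChildServer` are hypothetical plain Java classes, and the host names are placeholders. The sketch shows only the two roles named in the text: gatekeeping and life-cycle management.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical child server: fronts the backend services of one host.
class ChildServer {
    final String host;
    private boolean running;

    ChildServer(String host) {
        this.host = host;
    }

    void start() {
        running = true;
    }

    void stop() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }

    String handle(String request) {
        return host + " handled " + request;
    }
}

// Hypothetical master server: gatekeeper and life-cycle manager for the children.
class MasterServer {
    private final Map<String, ChildServer> children = new LinkedHashMap<>();

    // Creates the child for a host if needed, then starts it.
    ChildServer startChild(String host) {
        ChildServer child = children.get(host);
        if (child == null) {
            child = new ChildServer(host);
            children.put(host, child);
        }
        child.start();
        return child;
    }

    // Stops the child and forgets it.
    void stopChild(String host) {
        ChildServer child = children.remove(host);
        if (child != null) {
            child.stop();
        }
    }

    // Gatekeeping: requests reach a child only through the master.
    String dispatch(String host, String request) {
        ChildServer child = children.get(host);
        if (child == null || !child.isRunning()) {
            throw new IllegalStateException("No running child server on " + host);
        }
        return child.handle(request);
    }
}

class MasterDemo {
    public static void main(String[] args) {
        MasterServer master = new MasterServer();
        master.startChild("pbs.example.org");
        master.startChild("lsf.example.org");
        System.out.println(master.dispatch("pbs.example.org", "submit"));
        master.stopChild("pbs.example.org");
    }
}
```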

The MCWP application server is implemented using EJB. The user space is a hierarchy
of entities: users, projects, tasks, and applications. The abstract application metadata tree is
implemented as entity beans as well, with the host-independent information in one database
table and host-dependent information in another. Finally, there are two entities related
