
PART B
Grid architecture and
technologies
Reprint from International Journal of High Performance Computing Applications © 2001 Sage Publications, Inc. (USA). Minor changes to the original have been made to conform with house style.
6
The anatomy of the Grid
Enabling Scalable Virtual Organizations

Ian Foster,¹,² Carl Kesselman,³ and Steven Tuecke¹

¹Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States, ²Department of Computer Science, The University of Chicago, Chicago, Illinois, United States, ³Information Sciences Institute, The University of Southern California, California, United States
6.1 INTRODUCTION
The term ‘the Grid’ was coined in the mid-1990s to denote a proposed distributed com-
puting infrastructure for advanced science and engineering [1]. Considerable progress has
since been made on the construction of such an infrastructure (e.g., [2–5]), but the term
‘Grid’ has also been conflated, at least in popular perception, to embrace everything from advanced networking to artificial intelligence. One might wonder whether the term has
any real substance and meaning. Is there really a distinct ‘Grid problem’ and hence a need
for new ‘Grid technologies’? If so, what is the nature of these technologies, and what
is their domain of applicability? While numerous groups have interest in Grid concepts and share, to a significant extent, a common vision of Grid architecture, we do not see
consensus on the answers to these questions.
Our purpose in this article is to argue that the Grid concept is indeed motivated by
a real and specific problem and that there is an emerging, well-defined Grid technology
base that addresses significant aspects of this problem. In the process, we develop a
detailed architecture and roadmap for current and future Grid technologies. Furthermore,
we assert that while Grid technologies are currently distinct from other major technology
trends, such as Internet, enterprise, distributed, and peer-to-peer computing, these other
trends can benefit significantly from growing into the problem space addressed by Grid
technologies.
The real and specific problem that underlies the Grid concept is coordinated resource
sharing and problem solving in dynamic, multi-institutional virtual organizations. The
sharing that we are concerned with is not primarily file exchange but rather direct access
to computers, software, data, and other resources, as is required by a range of collabora-
tive problem-solving and resource-brokering strategies emerging in industry, science, and
engineering. This sharing is, necessarily, highly controlled, with resource providers and
consumers defining clearly and carefully just what is shared, who is allowed to share,
and the conditions under which sharing occurs. A set of individuals and/or institutions
defined by such sharing rules form what we call a virtual organization (VO).

The following are examples of VOs: the application service providers, storage service
providers, cycle providers, and consultants engaged by a car manufacturer to perform
scenario evaluation during planning for a new factory; members of an industrial con-
sortium bidding on a new aircraft; a crisis management team and the databases and
simulation systems that they use to plan a response to an emergency situation; and mem-
bers of a large, international, multiyear high-energy physics collaboration. Each of these
examples represents an approach to computing and problem solving based on collaboration
in computation- and data-rich environments.
As these examples show, VOs vary tremendously in their purpose, scope, size, duration,
structure, community, and sociology. Nevertheless, careful study of underlying technology
requirements leads us to identify a broad set of common concerns and requirements. In
particular, we see a need for highly flexible sharing relationships, ranging from client-
server to peer-to-peer; for sophisticated and precise levels of control over how shared
resources are used, including fine-grained and multistakeholder access control, delegation,
and application of local and global policies; for sharing of varied resources, ranging from
programs, files, and data to computers, sensors, and networks; and for diverse usage
modes, ranging from single user to multiuser and from performance sensitive to cost-
sensitive and hence embracing issues of quality of service, scheduling, co-allocation,
and accounting.
Current distributed computing technologies do not address the concerns and require-
ments just listed. For example, current Internet technologies address communication and
information exchange among computers but do not provide integrated approaches to
the coordinated use of resources at multiple sites for computation. Business-to-business
exchanges [6] focus on information sharing (often via centralized servers). So do virtual
enterprise technologies, although here sharing may eventually extend to applications
and physical devices (e.g., [7]). Enterprise distributed computing technologies such as
CORBA and Enterprise Java enable resource sharing within a single organization. The
Open Group’s Distributed Computing Environment (DCE) supports secure resource sharing across sites, but most VOs would find it too burdensome and inflexible. Storage
service providers (SSPs) and application service providers (ASPs) allow organizations to
outsource storage and computing requirements to other parties, but only in constrained
ways: for example, SSP resources are typically linked to a customer via a virtual private
network (VPN). Emerging ‘distributed computing’ companies seek to harness idle com-
puters on an international scale [31] but, to date, support only highly centralized access to
those resources. In summary, current technology either does not accommodate the range
of resource types or does not provide the flexibility and control on sharing relationships
needed to establish VOs.
It is here that Grid technologies enter the picture. Over the past five years, research and
development efforts within the Grid community have produced protocols, services, and
tools that address precisely the challenges that arise when we seek to build scalable VOs.
These technologies include security solutions that support management of credentials and
policies when computations span multiple institutions; resource management protocols
and services that support secure remote access to computing and data resources and the
co-allocation of multiple resources; information query protocols and services that provide
configuration and status information about resources, organizations, and services; and
data management services that locate and transport datasets between storage systems and
applications.
Because of their focus on dynamic, cross-organizational sharing, Grid technologies
complement rather than compete with existing distributed computing technologies. For
example, enterprise distributed computing systems can use Grid technologies to achieve
resource sharing across institutional boundaries; in the ASP/SSP space, Grid technologies
can be used to establish dynamic markets for computing and storage resources, hence
overcoming the limitations of current static configurations. We discuss the relationship
between Grids and these technologies in more detail below.
In the rest of this article, we expand upon each of these points in turn. Our objectives
are to (1) clarify the nature of VOs and Grid computing for those unfamiliar with the
area; (2) contribute to the emergence of Grid computing as a discipline by establishing
a standard vocabulary and defining an overall architectural framework; and (3) define clearly how Grid technologies relate to other technologies, explaining both why emerging
technologies do not yet solve the Grid computing problem and how these technologies
can benefit from Grid technologies.
It is our belief that VOs have the potential to change dramatically the way we use
computers to solve problems, much as the Web has changed how we exchange infor-
mation. As the examples presented here illustrate, the need to engage in collaborative
processes is fundamental to many diverse disciplines and activities: it is not limited to
science, engineering, and business activities. It is because of this broad applicability of
VO concepts that Grid technology is important.
6.2 THE EMERGENCE OF VIRTUAL ORGANIZATIONS
Consider the following four scenarios:
1. A company needing to reach a decision on the placement of a new factory invokes
a sophisticated financial forecasting model from an ASP, providing it with access to
appropriate proprietary historical data from a corporate database on storage systems
operated by an SSP. During the decision-making meeting, what-if scenarios are run
collaboratively and interactively, even though the division heads participating in the
decision are located in different cities. The ASP itself contracts with a cycle provider
for additional ‘oomph’ during particularly demanding scenarios, requiring of course
that cycles meet desired security and performance requirements.
2. An industrial consortium formed to develop a feasibility study for a next-generation
supersonic aircraft undertakes a highly accurate multidisciplinary simulation of the
entire aircraft. This simulation integrates proprietary software components developed
by different participants, with each component operating on that participant’s comput-
ers and having access to appropriate design databases and other data made available
to the consortium by its members.
3. A crisis management team responds to a chemical spill by using local weather and soil
models to estimate the spread of the spill, determining the impact based on population
location as well as geographic features such as rivers and water supplies, creating a short-term mitigation plan (perhaps based on chemical reaction models), and task-
ing emergency response personnel by planning and coordinating evacuation, notifying
hospitals, and so forth.
4. Thousands of physicists at hundreds of laboratories and universities worldwide come
together to design, create, operate, and analyze the products of a major detector at
CERN, the European high energy physics laboratory. During the analysis phase, they
pool their computing, storage, and networking resources to create a ‘Data Grid’ capable
of analyzing petabytes of data [8–10].
These four examples differ in many respects: the number and type of participants, the
types of activities, the duration and scale of the interaction, and the resources being shared.
But they also have much in common, as discussed in the following (see also Figure 6.1).
In each case, a number of mutually distrustful participants with varying degrees of
prior relationship (perhaps none at all) want to share resources in order to perform some
task. Furthermore, sharing is about more than simply document exchange (as in ‘virtual
enterprises’ [11]): it can involve direct access to remote software, computers, data, sen-
sors, and other resources. For example, members of a consortium may provide access to
specialized software and data and/or pool their computational resources.
Resource sharing is conditional: each resource owner makes resources available, subject
to constraints on when, where, and what can be done. For example, a participant in VO
P of Figure 6.1 might allow VO partners to invoke their simulation service only for
‘simple’ problems. Resource consumers may also place constraints on properties of the
resources they are prepared to work with. For example, a participant in VO Q might
accept only pooled computational resources certified as ‘secure.’
"Participants in P
can run program
A"
"Participants in P
can run program

B"
"Participants in P
can read data D"
"Participants in
Q can use
cycles if idle
and budget not
exceeded"
Ray tracing using cycles
provided by cycle sharing
consortrum
Multidisciplinary design
using programs & data at
multiple locations
P
Q
Figure 6.1 An actual organization can participate in one or more VOs by sharing some or all
of its resources. We show three actual organizations (the ovals), and two VOs: P, which links
participants in an aerospace design consortium, and Q, which links colleagues who have agreed to
share spare computing cycles, for example, to run ray tracing computations. The organization on
the left participates in P, the one to the right participates in Q, and the third is a member of both
P and Q. The policies governing access to resources (summarized in quotes) vary according to the
actual organizations, resources, and VOs involved.
The implementation of such constraints requires mechanisms for expressing policies, for establishing the identity
of a consumer or resource (authentication), and for determining whether an operation is
consistent with applicable sharing relationships (authorization).
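To make this concrete, a sharing rule of the kind quoted in Figure 6.1 can be pictured as a tuple relating a VO, an operation, and a resource to a site-local condition; authorization then amounts to checking an authenticated request against the applicable rules. The following Python sketch is purely illustrative (the names and structure are ours, not part of any Grid protocol):

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SharingRule:
    vo: str                            # VO the rule applies to, e.g. "P" or "Q"
    operation: str                     # e.g. "run", "read", "use-cycles"
    resource: str                      # local name of the shared resource
    condition: Callable[[Dict], bool]  # site-local constraint, e.g. "only if idle"

def authorized(rules: List[SharingRule], vo: str, operation: str,
               resource: str, context: Dict) -> bool:
    """Return True if some rule permits the requested operation.

    'context' carries attributes of the (already authenticated) request,
    such as current load or accumulated cost, used by rule conditions.
    """
    return any(r.vo == vo and r.operation == operation and
               r.resource == resource and r.condition(context)
               for r in rules)

# The policies quoted in Figure 6.1, expressed as rules.
rules = [
    SharingRule("P", "run", "programA", lambda ctx: True),
    SharingRule("P", "read", "dataD", lambda ctx: True),
    SharingRule("Q", "use-cycles", "cluster1",
                lambda ctx: ctx["idle"] and ctx["spent"] < ctx["budget"]),
]

authorized(rules, "Q", "use-cycles", "cluster1",
           {"idle": True, "spent": 40, "budget": 100})   # -> True

The point of the sketch is only that policy expression, authentication, and authorization are separable concerns: the rule set belongs to the resource owner, while the context is established by the authentication machinery.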
Sharing relationships can vary dynamically over time, in terms of the resources involv-
ed, the nature of the access permitted, and the participants to whom access is permitted.
And these relationships do not necessarily involve an explicitly named set of individuals,
but rather may be defined implicitly by the policies that govern access to resources. For example, an organization might enable access by anyone who can demonstrate that he or
she is a ‘customer’ or a ‘student.’
The dynamic nature of sharing relationships means that we require mechanisms for
discovering and characterizing the nature of the relationships that exist at a particular
point in time. For example, a new participant joining VO Q must be able to determine
what resources it is able to access, the ‘quality’ of these resources, and the policies that
govern access.
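One hypothetical way to picture the discovery mechanism this implies is a queryable registry of resource descriptions that a new participant can filter by VO membership and by the attributes it cares about. The registry contents, attribute names, and query function below are invented for illustration and do not correspond to any particular Grid information service:

from typing import Dict, List

# Invented registry entries: each resource advertises the VOs it serves,
# some measured attributes, and a summary of its access policy.
registry: List[Dict] = [
    {"name": "cluster1", "vos": ["Q"], "cpus": 64, "policy": "idle cycles only"},
    {"name": "archive7", "vos": ["P"], "cpus": 0, "policy": "read-only storage"},
    {"name": "cluster2", "vos": ["P", "Q"], "cpus": 128, "policy": "budgeted use"},
]

def discover(vo: str, **minimums) -> List[Dict]:
    """Return resources visible to 'vo' whose attributes meet the given minimums."""
    def matches(entry: Dict) -> bool:
        return vo in entry["vos"] and all(
            entry.get(attr, 0) >= value for attr, value in minimums.items())
    return [entry for entry in registry if matches(entry)]

# A participant newly joining VO Q asks what it may use with at least 100 CPUs.
discover("Q", cpus=100)   # -> [the 'cluster2' entry]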
Sharing relationships are often not simply client-server, but peer to peer: providers
can be consumers, and sharing relationships can exist among any subset of participants.
Sharing relationships may be combined to coordinate use across many resources, each
owned by different organizations. For example, in VO Q, a computation started on one
pooled computational resource may subsequently access data or initiate subcomputations
elsewhere. The ability to delegate authority in controlled ways becomes important in such
situations, as do mechanisms for coordinating operations across multiple resources (e.g.,
coscheduling).
The same resource may be used in different ways, depending on the restrictions placed
on the sharing and the goal of the sharing. For example, a computer may be used only
to run a specific piece of software in one sharing arrangement, while it may provide
generic compute cycles in another. Because of the lack of a priori knowledge about how
a resource may be used, performance metrics, expectations, and limitations (i.e., quality
of service) may be part of the conditions placed on resource sharing or usage.
These characteristics and requirements define what we term a virtual organization, a
concept that we believe is becoming fundamental to much of modern computing. VOs
enable disparate groups of organizations and/or individuals to share resources in a con-
trolled fashion, so that members may collaborate to achieve a shared goal.
6.3 THE NATURE OF GRID ARCHITECTURE
The establishment, management, and exploitation of dynamic, cross-organizational VO
sharing relationships require new technology. We structure our discussion of this technology in terms of a Grid architecture that identifies fundamental system components,
specifies the purpose and function of these components, and indicates how these compo-
nents interact with one another.
In defining a Grid architecture, we start from the perspective that effective VO operation
requires that we be able to establish sharing relationships among any potential participants.
Interoperability is thus the central issue to be addressed. In a networked environment,
interoperability means common protocols. Hence, our Grid architecture is first and fore-
most a protocol architecture, with protocols defining the basic mechanisms by which
VO users and resources negotiate, establish, manage, and exploit sharing relationships.
A standards-based open architecture facilitates extensibility, interoperability, portability,
and code sharing; standard protocols make it easy to define standard services that pro-
vide enhanced capabilities. We can also construct application programming interfaces and
software development kits (see Appendix for definitions) to provide the programming
abstractions required to create a usable Grid. Together, this technology and architecture
constitute what is often termed middleware (‘the services needed to support a common
set of applications in a distributed network environment’ [12]), although we avoid that
term here because of its vagueness. We discuss each of these points in the following.
Why is interoperability such a fundamental concern? At issue is our need to ensure that
sharing relationships can be initiated among arbitrary parties, accommodating new partici-
pants dynamically, across different platforms, languages, and programming environments.
In this context, mechanisms serve little purpose if they are not defined and implemented so
as to be interoperable across organizational boundaries, operational policies, and resource
types. Without interoperability, VO applications and participants are forced to enter into
bilateral sharing arrangements, as there is no assurance that the mechanisms used between
any two parties will extend to any other parties. Without such assurance, dynamic VO
formation is all but impossible, and the types of VOs that can be formed are severely
limited. Just as the Web revolutionized information sharing by providing a universal pro-
tocol and syntax (HTTP and HTML) for information exchange, so we require standard
protocols and syntaxes for general resource sharing.
Why are protocols critical to interoperability? A protocol definition specifies how
distributed system elements interact with one another in order to achieve a specified
behavior, and the structure of the information exchanged during this interaction. This
focus on externals (interactions) rather than internals (software, resource characteristics)
has important pragmatic benefits. VOs tend to be fluid; hence, the mechanisms used to
discover resources, establish identity, determine authorization, and initiate sharing must
be flexible and lightweight, so that resource-sharing arrangements can be established
and changed quickly. Because VOs complement rather than replace existing institutions,
sharing mechanisms cannot require substantial changes to local policies and must allow
individual institutions to maintain ultimate control over their own resources. Since pro-
tocols govern the interaction between components, and not the implementation of the
components, local control is preserved.
Why are services important? A service (see Appendix) is defined solely by the pro-
tocol that it speaks and the behaviors that it implements. The definition of standard
services – for access to computation, access to data, resource discovery, coscheduling,
data replication, and so forth – allows us to enhance the services offered to VO partici-
pants and also to abstract away resource-specific details that would otherwise hinder the
development of VO applications.
Why do we also consider application programming interfaces (APIs) and software
development kits (SDKs)? There is, of course, more to VOs than interoperability, pro-
tocols, and services. Developers must be able to develop sophisticated applications in
complex and dynamic execution environments. Users must be able to operate these
applications. Application robustness, correctness, development costs, and maintenance
costs are all important concerns. Standard abstractions, APIs, and SDKs can accelerate
code development, enable code sharing, and enhance application portability. APIs and
SDKs are an adjunct to, not an alternative to, protocols. Without standard protocols,
interoperability can be achieved at the API level only by using a single implementation
everywhere – infeasible in many interesting VOs – or by having every implementation
know the details of every other implementation. (The Jini approach [13] of downloading protocol code to a remote site does not circumvent this requirement.)
In summary, our approach to Grid architecture emphasizes the identification and defi-
nition of protocols and services, first, and APIs and SDKs, second.
6.4 GRID ARCHITECTURE DESCRIPTION
Our goal in describing our Grid architecture is not to provide a complete enumeration of
all required protocols (and services, APIs, and SDKs) but rather to identify requirements
for general classes of component. The result is an extensible, open architectural structure
within which can be placed solutions to key VO requirements. Our architecture and the
subsequent discussion organize components into layers, as shown in Figure 6.2. Compo-
nents within each layer share common characteristics but can build on capabilities and
behaviors provided by any lower layer.
In specifying the various layers of the Grid architecture, we follow the principles of
the ‘hourglass model’ [14]. The narrow neck of the hourglass defines a small set of core
[Figure 6.2 depicts the layered Grid protocol architecture (Fabric, Connectivity, Resource, Collective, Application) alongside the Internet protocol architecture (Link, Internet, Transport, Application).]
Figure 6.2 The layered Grid architecture and its relationship to the Internet protocol architecture.
Because the Internet protocol architecture extends from network to application, there is a mapping
from Grid layers into Internet layers.
abstractions and protocols (e.g., TCP and HTTP in the Internet), onto which many different high-level behaviors can be mapped (the top of the hourglass), and which themselves can
be mapped onto many different underlying technologies (the base of the hourglass). By
definition, the number of protocols defined at the neck must be small. In our architecture,
the neck of the hourglass consists of Resource and Connectivity protocols, which facilitate
the sharing of individual resources. Protocols at these layers are designed so that they
can be implemented on top of a diverse range of resource types, defined at the Fabric
layer, and can in turn be used to construct a wide range of global services and application-
specific behaviors at the Collective layer – so called because they involve the coordinated
(‘collective’) use of multiple resources.
Our architectural description is high level and places few constraints on design
and implementation. To make this abstract discussion more concrete, we also
list, for illustrative purposes, the protocols defined within the Globus Toolkit [15]
and used within such Grid projects as the NSF’s National Technology Grid [5],
NASA’s Information Power Grid [4], DOE’s DISCOM [2], GriPhyN (www.griphyn.org),
NEESgrid (www.neesgrid.org), Particle Physics Data Grid (www.ppdg.net), and the
European Data Grid (www.eu-datagrid.org). More details will be provided in a
subsequent paper.
6.4.1 Fabric: Interfaces to local control
The Grid Fabric layer provides the resources to which shared access is mediated by
Grid protocols: for example, computational resources, storage systems, catalogs, network
resources, and sensors. A ‘resource’ may be a logical entity, such as a distributed file
system, computer cluster, or distributed computer pool; in such cases, a resource imple-
mentation may involve internal protocols (e.g., the NFS storage access protocol or a
cluster resource management system’s process management protocol), but these are not
the concern of Grid architecture.
Fabric components implement the local, resource-specific operations that occur on spe-
cific resources (whether physical or logical) as a result of sharing operations at higher
levels. There is thus a tight and subtle interdependence between the functions implemented at the Fabric level, on the one hand, and the sharing operations supported, on the other.
Richer Fabric functionality enables more sophisticated sharing operations; at the same
time, if we place few demands on Fabric elements, then deployment of Grid infrastruc-
ture is simplified. For example, resource-level support for advance reservations makes it
possible for higher-level services to aggregate (coschedule) resources in interesting ways
that would otherwise be impossible to achieve.
However, as in practice few resources support advance reservation ‘out of the box,’
a requirement for advance reservation increases the cost of incorporating new resources
into a Grid.
Experience suggests that at a minimum, resources should implement enquiry mecha-
nisms that permit discovery of their structure, state, and capabilities (e.g., whether they
support advance reservation), on the one hand, and resource management mechanisms
that provide some control of delivered quality of service, on the other. The following brief and partial list provides a resource-specific characterization of capabilities (a minimal interface sketch in code follows the list).

• Computational resources: Mechanisms are required for starting programs and for mon-
itoring and controlling the execution of the resulting processes. Management mecha-
nisms that allow control over the resources allocated to processes are useful, as are
advance reservation mechanisms. Enquiry functions are needed for determining hard-
ware and software characteristics as well as relevant state information such as current
load and queue state in the case of scheduler-managed resources.

• Storage resources: Mechanisms are required for putting and getting files. Third-party
and high-performance (e.g., striped) transfers are useful [16]. So are mechanisms for
reading and writing subsets of a file and/or executing remote data selection or reduction
functions [17]. Management mechanisms that allow control over the resources allocated
to data transfers (space, disk bandwidth, network bandwidth, CPU) are useful, as are
advance reservation mechanisms. Enquiry functions are needed for determining hard-
ware and software characteristics as well as relevant load information such as available
space and bandwidth utilization.


• Network resources: Management mechanisms that provide control over the resources
allocated to network transfers (e.g., prioritization, reservation) can be useful. Enquiry
functions should be provided to determine network characteristics and load.

• Code repositories: This specialized form of storage resource requires mechanisms for
managing versioned source and object code: for example, a control system such as CVS.

• Catalogs: This specialized form of storage resource requires mechanisms for imple-
menting catalog query and update operations: for example, a relational database [18].
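To make the distinction between enquiry and management mechanisms concrete, the sketch below shows one possible shape of a Fabric-level interface for a computational resource. It is an assumption-laden illustration, not the interface of the Globus Toolkit or of any real scheduler; actual Fabric components expose their own native protocols and APIs:

from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ResourceDescription:
    """Structure, state, and capabilities reported by the enquiry mechanism."""
    os_version: str
    cpu_count: int
    current_load: float
    queue_length: int
    supports_advance_reservation: bool

class ComputeFabric:
    """Hypothetical local interface to a computational Fabric resource."""

    def describe(self) -> ResourceDescription:
        """Enquiry: report hardware/software characteristics and current state."""
        raise NotImplementedError

    def start_process(self, executable: str, args: List[str],
                      limits: Optional[Dict[str, float]] = None) -> str:
        """Management: start a program, optionally under resource limits;
        returns a local job identifier."""
        raise NotImplementedError

    def control_process(self, job_id: str, action: str) -> None:
        """Management: suspend, resume, or terminate a running job."""
        raise NotImplementedError

    def reserve(self, start: float, duration: float, cpus: int) -> Optional[str]:
        """Advance reservation, where supported; returns a handle or None."""
        raise NotImplementedError

Higher layers would speak Grid protocols to a service that wraps such an interface rather than calling it directly; the interface merely names the local operations those protocols ultimately rely on.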
Globus Toolkit: The Globus Toolkit has been designed to use (primarily) existing fab-
ric components, including vendor-supplied protocols and interfaces. However, if a vendor
does not provide the necessary Fabric-level behavior, the Globus Toolkit includes the miss-
ing functionality. For example, enquiry software is provided for discovering structure and
state information for various common resource types, such as computers (e.g., OS ver-
sion, hardware configuration, load [19], scheduler queue status), storage systems (e.g.,
available space), and networks (e.g., current and predicted future load [20, 21]), and for packaging this information in a form that facilitates the implementation of higher-
level protocols, specifically at the Resource layer. Resource management, on the other
hand, is generally assumed to be the domain of local resource managers. One excep-
tion is the General-purpose Architecture for Reservation and Allocation (GARA) [22],
which provides a ‘slot manager’ that can be used to implement advance reservation for
resources that do not support this capability. Others have developed enhancements to the
Portable Batch System (PBS) [23] and Condor [24, 25] that support advance reservation
capabilities.
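The essence of such a ‘slot manager’ can be conveyed in a few lines: reservations are time slots against a fixed capacity, and a new request is accepted only if, at every instant it covers, the already-committed capacity still leaves room for it. This is a simplified sketch of the general idea, under our own assumptions, not GARA's actual design:

from dataclasses import dataclass
from typing import List

@dataclass
class Slot:
    start: float   # reservation start time
    end: float     # reservation end time
    amount: int    # capacity reserved (e.g. CPUs or Mb/s)

class SlotManager:
    """Toy advance-reservation manager for a resource of fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: List[Slot] = []

    def _committed(self, t: float) -> int:
        return sum(s.amount for s in self.slots if s.start <= t < s.end)

    def reserve(self, start: float, end: float, amount: int) -> bool:
        # Committed capacity only increases at existing slot starts, so it is
        # enough to check the requested start time and any slot starts that
        # fall inside the requested interval.
        check_points = {start} | {s.start for s in self.slots if start <= s.start < end}
        if all(self._committed(t) + amount <= self.capacity for t in check_points):
            self.slots.append(Slot(start, end, amount))
            return True
        return False

mgr = SlotManager(capacity=16)
mgr.reserve(100.0, 200.0, 10)   # accepted
mgr.reserve(150.0, 250.0, 10)   # rejected: 20 CPUs would be committed from 150 to 200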
6.4.2 Connectivity: Communicating easily and securely
The Connectivity layer defines core communication and authentication protocols required
for Grid-specific network transactions. Communication protocols enable the exchange of data between Fabric layer resources. Authentication protocols build on communication
services to provide cryptographically secure mechanisms for verifying the identity of
users and resources.
Communication requirements include transport, routing, and naming. While alternatives
certainly exist, we assume here that these protocols are drawn from the TCP/IP protocol
stack: specifically, the Internet (IP and ICMP), transport (TCP, UDP), and application
(DNS, OSPF, RSVP, etc.) layers of the Internet layered protocol architecture [26]. This
is not to say that in the future, Grid communications will not demand new protocols that
take into account particular types of network dynamics.
With respect to security aspects of the Connectivity layer, we observe that the com-
plexity of the security problem makes it important that any solutions be based on existing
standards whenever possible. As with communication, many of the security standards
developed within the context of the Internet protocol suite are applicable.
Authentication solutions for VO environments should have the following characteris-
tics [27]:

• Single sign-on: Users must be able to ‘log on’ (authenticate) just once and then have
access to multiple Grid resources defined in the Fabric layer, without further user
intervention.

• Delegation [28–30]: A user must be able to endow a program with the ability to run on that user’s behalf, so that the program is able to access the resources on which the user is authorized. The program should (optionally) also be able to conditionally delegate a subset of its rights to another program (sometimes referred to as restricted delegation); a minimal sketch of this idea appears after the list.

• Integration with various local security solutions: Each site or resource provider may
employ any of a variety of local security solutions, including Kerberos and Unix secu-
rity. Grid security solutions must be able to interoperate with these various local
solutions. They cannot, realistically, require wholesale replacement of local security
solutions but rather must allow mapping into the local environment.


• User-based trust relationships: In order for a user to use resources from multiple providers together, the security system must not require each of the resource providers to cooperate or interact with each other in configuring the security environment. For example, if a user has the right to use sites A and B, the user should be able to use sites A and B together without requiring that the security administrators of A and B interact.
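The delegation requirement can be pictured as a chain of short-lived credentials, each carrying at most a subset of the rights of the credential that signed it; a relying party accepts a right only if every credential up the chain grants it. The sketch below illustrates only this restriction-propagation idea and does not reflect the actual data structures of X.509 proxy certificates or the Grid Security Infrastructure:

from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Credential:
    subject: str                            # who may act with this credential
    rights: FrozenSet[str]                  # operations it permits, e.g. {"read:dataD"}
    issuer: Optional["Credential"] = None   # credential that signed it (None = user's own)

    def delegate(self, to: str, rights: FrozenSet[str]) -> "Credential":
        """Issue a proxy carrying at most this credential's rights (restricted delegation)."""
        if not rights <= self.rights:
            raise ValueError("cannot delegate rights the delegator does not hold")
        return Credential(subject=to, rights=rights, issuer=self)

    def effective_rights(self) -> FrozenSet[str]:
        """A right is effective only if every credential up the chain grants it."""
        if self.issuer is None:
            return self.rights
        return self.rights & self.issuer.effective_rights()

user = Credential("alice", frozenset({"run:programA", "read:dataD"}))
job = user.delegate("alice/job-42", frozenset({"read:dataD"}))
job.effective_rights()   # -> frozenset({'read:dataD'}); the job cannot run programA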