
13
Autonomic computing and Grid
Pratap Pattnaik, Kattamuri Ekanadham, and Joefon Jann
Thomas J. Watson Research Center, Yorktown Heights, New York, United States
13.1 INTRODUCTION
The goal of autonomic computing is the reduction of complexity in the management of
large computing systems. The evolution of computing systems faces a continuous growth
in the number of degrees of freedom the system must manage in order to be efficient. Two
major factors contribute to this increase. First, computing elements such as the CPU,
memory, disks and network have historically advanced at nonuniform rates. The disparity
between the capabilities and speeds of the various elements opens up a number of different
strategies for a task, depending on the environment; this in turn calls for a dynamic
strategy that makes judicious choices to achieve the targeted efficiency. Second, systems
tend to have a global scope, both in the demand for their services and in the resources
they employ to render those services. Changes in the demands or resources in one part
of the system can have a significant effect on other parts
of the system. Recent experiences with Web servers (related to popular events such as
the Olympics) emphasize the variability and unpredictability of demands and the need
to rapidly react to the changes. A system must perceive the changes in the environment
and must be ready with a variety of choices, so that suitable strategies can be quickly
selected for the new environment. The autonomic computing approach is to orchestrate
the management of the functionalities, efficiencies and the qualities of services of large
computing systems through logically distributed, autonomous controlling elements, and to
achieve a harmonious functioning of the global system within the confines of its stipulated
behavior, while individual elements make locally autonomous decisions. In this approach,
one moves from a resource/entitlement model to a goal-oriented model. To significantly
reduce system-management complexity, one must clearly delineate the boundaries of these
controlling elements. The reduction in complexity is achieved mainly by making a
significant number of decisions locally in these elements. If the local decision process
is associated with a smaller time constant, it is easy to revise it before much damage is
done globally.

Grid Computing – Making the Global Infrastructure a Reality. Edited by F. Berman, A. Hey and G. Fox.
© 2003 John Wiley & Sons, Ltd. ISBN: 0-470-85319-0
Since Grid Computing, by its very nature, involves the controlled sharing of computing
resources across distributed, autonomous systems, we believe that there are a number of
synergistic elements between Grid computing and autonomic computing and that the
advances in the architecture in either one of these areas will help the other. In Grid
computing also, local servers are responsible for enforcing local security objectives and
for managing various queuing and scheduling disciplines. Thus, the concept of cooperation
in a federation of several autonomic components to accomplish a global objective is a
common theme for both autonomic computing and Grid computing. As the architecture
of Grid computing continues to improve and rapidly evolve, as expounded in a number
of excellent papers in this issue, we have taken the approach of describing the autonomic
server architecture in this paper. We make some observations on the ways we perceive it
to be a useful part of the Grid architecture evolution.
The choice of the term autonomic in autonomic computing is influenced by an analogy
with biological systems [1, 2]. In this analogy, a component of a system is like an organ-
ism that survives in an environment. A vital aspect of such an organism is a symbiotic
relationship with others in the environment – that is, it renders certain services to others
in the environment and it receives certain services rendered by others in the environment.
A more interesting aspect for our analogy is its adaptivity – that is, it makes constant
efforts to change its behavior in order to fit into its environment. In the short term, the
organism perseveres to perform its functions despite adverse circumstances, by readjusting
itself within the degrees of freedom it has. In the long term, evolution of a new species
takes place, where environmental changes force permanent changes to the functionality
and behavior. While there may be many ways to perform a function, an organism uses
its local knowledge to adopt a method that economizes its resources. Rapid response to
external stimuli in order to adapt to the changing environment is the key aspect we are
attempting to mimic in autonomic systems.
The autonomic computing paradigm imparts this same viewpoint to the components of
a computing system. The environment is the collection of components in a large system.
The services performed by a component are reflected in the advertised methods of the
component that can be invoked by others. Likewise, a component receives the services of
others by invoking their methods. The semantics of these methods constitute the behavior
that the component attempts to preserve in the short term. In the long term, as technology
progresses new resources and new methods may be introduced. Like organisms, the com-
ponents are not perfect. They do not always exhibit the advertised behavior exactly. There
can be errors, impreciseness or even cold failures. An autonomic component watches for
these variations in the behavior of other components that it interacts with and adjusts to
the variations.
Reduction of complexity is not a new goal. During the evolution of computing sys-
tems, several concepts emerged that help manage the complexity. Two notable concepts
are particularly relevant here: object-oriented programming and fault-tolerant comput-
ing. Object-oriented designs introduced the concept of abstraction, in which the interface
specification of an object is separated from its implementation. Thus, implementation of
an object can proceed independent of the implementation of dependent objects, since it
uses only their interface specifications. The rest of the system is spared from knowing or
dealing with the complexity of the internal details of the implementation of the object.
Notions of hierarchical construction, inheritance and overloading make it easy to develop
different functional behaviors while reusing their common parts. An autonomic system
takes a similar approach, except that the alternative implementations are designed to
improve performance rather than to provide different behaviors. The environment is
constantly monitored and suitable implementations are dynamically chosen for the best
performance.
Fault-tolerant systems are designed with additional support that can detect and cor-
rect any fault out of a predetermined set of faults. Usually, redundancy is employed to

overcome faults. Autonomic systems generalize the notion of fault to encompass any
behavior that deviates from the expected or the negotiated norm, including performance
degradation or change-of-service costs based on resource changes. Autonomic systems
do not expect that other components operate correctly according to stipulated behavior.
The input–output responses of a component are constantly monitored and when a compo-
nent’s behavior deviates from the expectation, the autonomic system readjusts itself either
by switching to an alternative component or by altering its own input–output response
suitably.
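A minimal sketch of this monitoring-and-readjustment loop (the names, the validity predicate and the retry policy are our assumptions, not the chapter's):

```python
class MonitoredCaller:
    """Invokes a primary service, checks each response against the
    expected behavior, and switches to an alternative after repeated
    deviations (the generalized notion of a 'fault')."""

    def __init__(self, primary, backup, is_valid, tolerance=3):
        self._services = [primary, backup]
        self._active = 0           # index of the service in use
        self._is_valid = is_valid  # predicate on (request, response)
        self._tolerance = tolerance
        self._deviations = 0

    def call(self, request):
        service = self._services[self._active]
        try:
            response = service(request)
            ok = self._is_valid(request, response)
        except Exception:
            ok = False
        if ok:
            self._deviations = 0
            return response
        self._deviations += 1
        if self._deviations >= self._tolerance and self._active == 0:
            self._active = 1  # readjust: switch to the alternative
            self._deviations = 0
        # Retry once on whichever service is now active.
        return self._services[self._active](request)
```

Note that a cold failure, an exception and a semantically wrong answer are all treated uniformly as deviations from the stipulated behavior.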
Section 13.2 describes the basic structure of a typical autonomic component, delineat-
ing its behavior, observation of environment, choices of implementation and an adaptive
strategy. While many system implementations may have these aspects buried in some
detail, it is necessary to identify them and delineate them, so that the autonomic nature
of the design can be improved in a systematic manner. Section 13.3 illustrates two spec-
ulative methodologies to collect environmental information. Some examples from server
design are given to illustrate them. Section 13.4 elaborates on the role of these aspects in
a Grid computing environment.
13.2 AUTONOMIC SERVER COMPONENTS
The basic structure of any Autonomic Server Component, C, is depicted in Figure 13.1,
in which all agents that interact with C are lumped into one entity, called the environment.
This includes clients that submit input requests to C, other components whose services can
be invoked by C and resource managers that control the resources for C. An autonomic
component has four basic specifications:
AutonomicComp ::= ⟨BehaviorSpec, StateSpec, MethodSpec, StrategySpec⟩
BehaviorSpec ::= ⟨InputSet Σ, OutputSet Φ, ValidityRelation β ⊆ Σ × Φ⟩
StateSpec ::= ⟨InternalState Ψ, EstimatedExternalState ξ̂⟩
MethodSpec ::= ⟨MethodSet Π, each π ∈ Π : Σ × Ψ × ξ̂ → Φ × Ψ × ξ̂⟩
StrategySpec ::= ⟨Efficiency η, Strategy α : Σ × Ψ × ξ̂ → Π⟩
The functional behavior of C is captured by a relation β ⊆ Σ × Φ, where Σ is the
input alphabet, Φ is the output alphabet and β is a relation specifying the valid
input–output pairs. Thus, if C receives an input u ∈ Σ, it delivers an output v ∈ Φ
satisfying the relation β(u, v). The output variability permitted by the relation β
(as opposed to a function) is very common to most systems. As illustrated in
Figure 13.1, a client is satisfied to get any one of the many possible outputs
(v, v′, ...) for a given input u, as long as they satisfy some property specified
by β. All implementations of the component preserve this functional behavior.
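To make the relation-versus-function distinction concrete, here is a small hypothetical example (ours, not the chapter's): a component that approximates square roots is correct under its validity relation β whenever the output lands within a tolerance, so many distinct outputs are valid for one input.

```python
def beta(u, v, eps=1e-3):
    """Validity relation: v is an acceptable output for input u
    if v*v is within eps of u; many v satisfy this for one u."""
    return u >= 0 and abs(v * v - u) <= eps

def newton_sqrt(u, iters=20):
    # One possible implementation preserving the behavior beta:
    # Newton's iteration for the square root.
    v = u if u > 1 else 1.0
    for _ in range(iters):
        v = 0.5 * (v + u / v)
    return v
```

A table-lookup or hardware implementation could return a slightly different v for the same u; each remains a valid implementation as long as beta(u, v) holds.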
The state information maintained by a component comprises two parts: internal state Ψ
and external state ξ. Internal state, Ψ, contains the data structures used by an
implementation and any other variables used to keep track of input–output history and
resource utilization. The external state, ξ, is an abstraction of the environment of C
and includes information on the input arrival process, the current level of resources
available to C and the performance levels of the other components of the system whose
services are invoked by C. The component C has no control over the variability in the
ingredients of ξ, as they are governed by agents outside C. The input arrival process
is clearly outside C. We assume an external global resource manager that may supply or
withdraw resources from C dynamically. Finally, the component C has no control over how
other components are performing and must expect arbitrary variations (including failure)
in their health. Thus the state information, ξ, is dynamically changing and is
distributed throughout the system.
[Figure 13.1 appears here: the environment (clients, resource managers and other
services) surrounds the autonomic server component C, which holds an internal state Ψ,
an estimated state ξ̂ of the environment, and a set of implementations π1, π2, π3 ∈ Π;
for inputs u ∈ Σ it may produce any of the outputs v, v′, v′′ ∈ Φ satisfying β(u, v).]

Figure 13.1 Schematic view of an autonomic component and its environment.
C cannot have complete and accurate knowledge of ξ at any time. Hence, the best C can
do is to keep an estimate, ξ̂, of ξ at any time, and to update it periodically as and
when it receives correct information from the appropriate sources.
An implementation, π, is the usual input–output transformation based on state,
π : Σ × Ψ × ξ̂ → Φ × Ψ × ξ̂, where any input–output pair u ∈ Σ, v ∈ Φ produced will
satisfy the relation β(u, v). There must be many implementations, π ∈ Π, available to
the autonomic component in order to adapt to the situation; a single implementation
provides no degree of freedom. Each implementation may require different resources and
data structures. For any given input, different implementations may produce different
outputs (of different quality), although all of them must satisfy the relation β.
Finally, the intelligence of the autonomic component is in the algorithm α that chooses
the best implementation for any given input and state. Clearly, switching from one
implementation to another might be expensive, as it involves restructuring of resources
and data. The component must establish a cost model that defines the efficiency, η, at
which the component is operating at any time. The objective is to maximize η. In
principle, the strategy, α, evaluates whether it is worthwhile to switch the current
implementation for a given input and state, based on the costs involved and the benefit
expected. Thus, the strategy is a function of the form α : Σ × Ψ × ξ̂ → Π. As long as
the current implementation is in place, the component continues to make local decisions
based on its estimate of the external state. When actual observation of the external
state indicates significant deviations from the estimate, an evaluation is made to
choose the right implementation, to optimize η. This leads to the following two aspects
that can be studied separately.
Firstly, given that the component has up-to-date and accurate knowledge of the state
of the environment, it must have an algorithm to determine the best implementation to
adopt. This is highly dependent upon the system characteristics, the costs involved and
the estimated benefits from the different implementations. An interesting design
criterion is to choose the time constants for a change of implementation so that the
system quickly enters a stable state. Criteria and models for such designs are under
investigation; here we give a few examples.
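One such example can be sketched as a simple cost-benefit test (the cost model and all names here are illustrative assumptions): switch implementations only when the expected gain, over the period the new conditions are expected to persist, outweighs the one-time restructuring cost.

```python
def should_switch(current_rate, candidate_rate, switch_cost, horizon):
    """Switch only if the extra work completed over the planning
    horizon exceeds the work lost while restructuring.

    current_rate, candidate_rate: estimated work units per second
    switch_cost: seconds lost restructuring resources and data
    horizon: seconds the new conditions are expected to persist
    """
    gain = (candidate_rate - current_rate) * horizon
    return gain > switch_cost * candidate_rate

def strategy(impls, state_estimate, current, switch_cost, horizon):
    """alpha: pick the best implementation for the estimated external
    state, but keep the current one unless switching pays off."""
    best = max(impls, key=lambda i: impls[i](state_estimate))
    if best == current:
        return current
    if should_switch(impls[current](state_estimate),
                     impls[best](state_estimate),
                     switch_cost, horizon):
        return best
    return current
```

The horizon parameter is one way to model the time constants mentioned above: a short horizon damps oscillation between implementations, a long one lets the component pursue larger gains.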
Secondly, a component may keep an estimate of the external state (which is distributed
and dynamically changing) and must devise a means to correct its estimate periodically,
so that the deviation from the actual state is kept within bounds. We examine this question
in the next section.
13.3 APPROXIMATION WITH IMPERFECT KNOWLEDGE

A general problem faced by all autonomic components is the maintenance of an estimate,
ξ̂, of a distributed and dynamically changing external state, ξ, as accurately as
possible. We examine two possible ways of doing this: by self-observation and by
collective observation.
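Before turning to these, note that self-observation can be as simple as smoothing locally measured samples. The sketch below (our illustration, not from the chapter) maintains an exponentially weighted estimate of one ingredient of the external state, the request arrival rate, and flags deviations large enough to warrant re-evaluating the current implementation.

```python
class StateEstimate:
    """Keeps a running estimate (xi-hat) of one ingredient of the
    external state xi: here, the request arrival rate."""

    def __init__(self, alpha=0.2):
        self._alpha = alpha  # weight given to each new observation
        self.rate = None     # current estimate, None until first sample

    def observe(self, sample):
        # Fold a newly observed arrival rate into the estimate.
        if self.rate is None:
            self.rate = float(sample)
        else:
            self.rate += self._alpha * (sample - self.rate)
        return self.rate

    def deviates(self, sample, threshold=0.5):
        # Has the environment drifted far enough from the estimate
        # to warrant re-evaluating the current implementation?
        if self.rate is None:
            return False
        return abs(sample - self.rate) > threshold * self.rate
```

The smoothing weight plays the role of a time constant: a small alpha keeps the estimate stable between corrections, a large one tracks the environment more aggressively.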
