Figures 2(b) and 2(d) show that the team theoretic strategy performs better than the other strategies. Another study examines the effect of sensor radius on $T_d$ (Figure 2(c)). Here, we considered a random target map and carried out three simulations for each sensor radius; the effect shown is the average of the three simulations. The figure shows that, for this particular case, a sensor radius of about 25 gives the best performance compared to any other sensor radius. The performance of the team theory, greedy, and full communication strategies depends on the sensor range. If the sensor radius is small, a UAV can sense only a very small area and the decisions taken will not be effective. We expect that with an increase in sensor range the performance will also improve. In the case of team theory this is not true, because if we consider a large sensor range, the estimated value of the virtual target will be incorrect. This is because the area sensed by the $k$th UAV can include regions beyond the search region space where there are no targets. But the $i$th UAV does not consider this fact and assumes an equal density of targets everywhere. This gives unnecessarily more weightage to the virtual target, and the overall performance decreases. This effect can be seen in Figure 2(c). This problem can be resolved if we consider other parameters, such as target density gradients, or restrict attention to the search space.
The ratio of the search value to the target value also plays a crucial role. If we give equal priority to searching and to attacking a target, then a UAV may opt for the search task even though there is a target near it. On the other hand, if we increase the value of the target, then there is a possibility that the UAV may loiter in the vicinity of a target which has already been destroyed. In our simulations, we considered the search value to be 25% of the target attack value, and this yielded good results. But a more focused study is necessary to examine this aspect of the problem.
4 Task Allocation using Negotiation
In this section, we present a task allocation algorithm for multiple UAVs performing search and attack tasks in an unknown region, using a negotiation scheme, for the scenario given in Section 3. Here we assume that once a target is attacked it is destroyed, and hence a battle damage assessment task on the target need not be performed. This is one of the very few applications available that exploits negotiation for a network of UAVs involved in a practical decision-making problem.
4.1 Problem Formulation
Consider N UAVs/agents performing a search and destroy operation on a bounded region containing M targets whose exact positions are not known a priori. The basic problem of task allocation is to efficiently assign agent $A_i \in N$ to target $m_i \in M$ such that the mission is completed as quickly as possible. The task allocation problem can be solved by using either a centralized controller or a decentralized controller. In the former case, each agent communicates the information it has to the central controller, which solves a task allocation algorithm and assigns each agent to a particular task. However, implementing this task allocation strategy in real time requires large communication overheads, and it will not scale to large numbers of agents and targets. Such strategies are also not robust to failures. Hence, a decentralized task allocation strategy, which avoids many of these problems, may be more advantageous when implemented on a multi-agent system. One way of implementing a decentralized task allocation strategy would be to make each agent broadcast its information to all the other agents, so that each agent has the information required to solve the task allocation problem independently and assign a task to itself. The implementation of this strategy also requires a large amount of communication among the agents. To reduce this demand, one can define a neighbourhood concept for each agent, so that an agent communicates its information only to those agents that are in its neighbourhood. The neighbourhood can be range dependent (in which case it is dynamic), pre-defined (in which case it is static), or randomly selected. In this work, we assume only range-dependent neighbourhoods for the agents.
The implementation of decentralized task allocation with a finite communication range poses several challenging problems. For instance, consider Case A in Figure 3, where agents $A_1$ and $A_2$ have target $T_1$ in their sensor range and an allocation has to take place as to which agent should be assigned to the target. The task allocation can be done using a greedy strategy, in which case both agents would move towards the same target, which is not desirable. Another task allocation mechanism used in the multi-robot literature is based on auctions [20]. But in the application under consideration, since the system of UAVs is decentralized, each agent would become an auctioneer, and hence both agents would auction the same target.

Fig. 3. Some scenarios for decision-making (Cases A-D)
Consider Case B in Figure 3, where $A_1$ has $T_1$ and $T_2$ in its sensor range while $A_2$ has only $T_2$. The auction mechanism requires broadcast of all the targets and their associated costs; resolving conflicts using auctions is a difficult task. In Case C, we can see that $A_1$ sees $T_1$ while $A_3$ is already on its way to attack $T_1$. So, $A_1$ wastes some resource in moving towards a target that is already assigned; since the communication is limited, it does not have access to the assignments of the other agents. Instead of $T_1$ it could have attacked $T_2$. Here too, greedy and auction algorithms would not yield good performance. In Case D, agent $A_3$ gets the auction information from $A_1$ and $A_2$ about $T_1$; now $A_3$ does not know to which agent it has to send the bid. A modification to the standard auction algorithm may eliminate some of these difficult issues; however, this would complicate the decision-making rules for multiple agents using the auction mechanism locally. These complications in using auctions under limited communication motivate us to use negotiation as a tool to handle these situations efficiently. In Case A, $A_1$ and $A_2$ can negotiate on which agent should be assigned to target $T_1$. In Case B, $A_1$ and $A_2$ can negotiate such that one agent attacks $T_1$ and the other moves towards $T_2$. In Case C, $A_2$ can detect a conflict between $A_1$ and $A_3$ and send decisions such that $A_1$ or $A_3$ moves towards $T_1$. However, in Case D, $A_3$ actually negotiates between $A_1$ and $A_2$, which are not neighbours, detects a possible conflict, and hence provides an efficient task allocation decision.

However, the implementation of the negotiation scheme involves designing negotiation rules over which the decision-making process takes place. In the next section we describe the negotiation scheme employed for decision-making.

At every time step each agent has to perform a task. The task can be (i) searching for a target or (ii) attacking a target. Each agent senses its environment, consisting of other agents and targets. An agent's assignment to a task depends on four different situations. These situations depend on the availability of neighbouring agents and targets. The four situations in which agent $A_i$ has to perform a task and play a role in the decision-making process are:

1. No targets and no neighbours
   Task: Search
   Decision role: Continue to move in the same direction
2. No targets but has neighbours
   Task: Perform search or attack. The target information may be provided by the neighbouring agents.
   Decision role: Acts as a negotiator for neighbouring agents
3. Targets are present but no neighbours
   Task: Attack
   Decision role: Select a target that yields maximum value
4. Targets as well as neighbours are present
   Task: Search or attack
   Decision role: Negotiate with neighbours

Once an agent $A_i$ is within a distance $d$ from a target, we assume that the agent can destroy the target effectively. An agent has to negotiate with its neighbouring agents for an efficient task allocation. The agents are not subject to any turn-radius constraints and hence can move in any direction. The agents have to maximize the number of targets destroyed in the search space by coordinating with their neighbouring agents through negotiation.
4.2 Decision-making
Negotiation as a Tool to Handle Uncertainty in Agent Actions
In general, negotiation refers to the communication process that facilitates
coordination and cooperation among a group of agents [27]. In multi-agent
systems, its aim is to resolve problems related to resource allocation and task
assignments between various agents in a decentralized setting.
Our approach is somewhat similar to Rubinstein’s model of strategic nego-
tiation [28] where agents make proposals that are either accepted or rejected
by other agents; and whether an agent implements its proposal or not depends
on what other agents do. However, our approach is different from Rubinstein’s
model on many counts due to the nature of the task allocation problem. Unlike
most negotiation models we do not have a situation where each proposal is
vetted by all the other agents. In fact, due to the connectivity restrictions, we
have a network of agents where an agent is not necessarily directly connected
to all other agents. So, each agent's decision is based on the response of only
those agents that are connected to it. Moreover, unlike in Rubinstein’s model,
agents make simultaneous offers at pre-defined decision epochs and the actions
are accordingly distributed between agents. Another way in which our model
differs from Rubinstein’s model is that in a task allocation problem the need
for negotiation arises mainly because of lack of information about the action
of other agents. So, the whole process of negotiation is geared towards deter-
mining the action of an agent in a coordinated autonomous fashion without
assuming any kind of hierarchy or priority among agents.
A coordinated decision by an agent is one that is not in conflict with the decisions of its neighbors. There is no conflict except that which arises due to uncertainty about agent actions. For example, a conflict occurs when more than one agent plans to attack the same target, thus decreasing the effectiveness of the mission. Resolution of such conflicts can be effected either by

(i) Direct communication/negotiation, as in the case when an agent $A_i$ and another agent $A_j$ are within communication range.
(ii) Indirect negotiation, when an agent $A_i$ and another agent $A_j$, $A_j \notin \mathcal{N}(A_i)$, want to attack the same target $T$, and $A_i$ and $A_j$ are connected through a sequence of communication links through other agents. For instance, they may be connected through a third agent $A_k$ with $A_j \in \mathcal{N}(A_k)$ and $A_k \in \mathcal{N}(A_i)$.

In the first case, since $A_j$ is within the communication range of $A_i$, it can exchange information with $A_j$ and resolve the conflict. In the second case, $A_i$ does not know about the existence of $A_j$, and so direct communication is not feasible. So, the intermediate agents are important in the negotiation process. In the negotiation scheme developed next, we will show that it is the neighboring agents who contribute to the decision-making of agent $A_i$.
Negotiation Scheme
Each agent $A_i$ performs the following actions during decision-making: (i) sends/receives proposals, (ii) processes received proposals and sends accept/reject decisions to the proposing agents, (iii) computes its own route decision, and (iv) implements the decision. All these actions happen within each negotiation cycle, as shown in Figure 4. Note that an agent $A_i$ that has no targets will have only the second segment, while agents that have targets as well as neighbouring agents will have all four segments of decision-making.
The different segments of the negotiation cycle are described below:
Send/receive proposals (NC1): Each agent evaluates the benefit associated with each target. Let $b_i(T_j)$ be the expected benefit that $A_i$ gets by attacking target $T_j$, which is given by

$$b_i(T_j) = V_j w_r - S_{ij} \qquad (19)$$

where $V_j$ is the value of target $T_j$, $w_r$ is the weight given to the search task over the task of attacking a target, and $S_{ij}$ = (time to reach target $T_j$ by agent $A_i$)/(total flight time). The benefit set $\mathcal{B}_i$ of $A_i$ consists of the benefits for all the tasks the agent has. Let $\mathcal{T}_i$ be the set of all targets. The benefit set for agent $A_i$ is represented as:

$$\mathcal{B}_i = \{ b_i(T_j) \mid T_j \in \mathcal{T}_i \} \qquad (20)$$

Agent $A_i$ chooses the target $T_{S_i}$ for which $A_i$ gets the maximum value, as

$$S_i = \arg\max_j \{ b_i(T_j) \in \mathcal{B}_i \} \qquad (21)$$
Fig. 4. A negotiation cycle: send proposals (NC1); process received proposals (NC2); send accept/reject decisions (NC3); decide action based on the accept/reject decisions received (NC4)
The proposal of agent $A_i$, sent to its neighboring agents, is of the form $Q_i = (A_i, T_{S_i}, b_i(T_{S_i}))$, containing the proposing agent's identification, the proposed target, and the value associated with $T_{S_i}$.
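To make the proposal stage concrete, a minimal Python sketch of Eqs. (19)-(21) follows. The names `Proposal`, `benefit`, and `make_proposal`, and the dictionary layout for targets, are illustrative conventions of ours, not part of the original scheme.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    agent_id: int    # proposing agent A_i
    target_id: int   # proposed target T_{S_i}
    benefit: float   # b_i(T_{S_i})

def benefit(target_value, w_r, time_to_target, total_flight_time):
    # Eq. (19): b_i(T_j) = V_j * w_r - S_ij, with S_ij the normalized travel time
    return target_value * w_r - time_to_target / total_flight_time

def make_proposal(agent_id, targets, w_r, total_flight_time):
    # Eqs. (20)-(21): build the benefit set B_i and propose the best target
    benefit_set = {tid: benefit(v, w_r, t, total_flight_time)
                   for tid, (v, t) in targets.items()}
    best = max(benefit_set, key=benefit_set.get)
    return Proposal(agent_id, best, benefit_set[best])

# targets: {target_id: (value V_j, time for A_i to reach T_j)}
print(make_proposal(1, {7: (10.0, 30.0), 9: (8.0, 5.0)}, 0.25, 100.0))
```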
Processing received proposals (NC2) and sending decisions (NC3): Let $\mathcal{Q}_i$ be the set of proposals received by agent $A_i$ from its neighbors $A_j$, including its own proposal:

$$\mathcal{Q}_i = \{ Q_j \mid L(A_j) \in \mathcal{N}(L(A_i), q_c) \}$$

where $L(A_j)$ denotes the location of agent $A_j$ and $q_c$ the communication range. Let $T^i_k$ be a target that appears in at least one of the proposals received by $A_i$; that is, $T^i_k = T_{S_j}$ for some $Q_j \in \mathcal{Q}_i$. For each such $T^i_k$, define $\mathcal{A}(T^i_k)$ as the set of agents that have proposed $T^i_k$, and $\mathcal{B}(T^i_k)$ as the set of benefit values associated with the agents in $\mathcal{A}(T^i_k)$. So,

$$\mathcal{A}(T^i_k) = \{ A_j \mid Q_j \in \mathcal{Q}_i,\ T_{S_j} = T^i_k \}, \qquad \mathcal{B}(T^i_k) = \{ b_j(T^i_k) \mid A_j \in \mathcal{A}(T^i_k) \} \qquad (22)$$
Using the above sets $\mathcal{A}(T^i_k)$ and $\mathcal{B}(T^i_k)$, agent $A_i$ sends an accept or reject decision to its neighbors using the following rules:

Rule 1: Agent $A_i$ sends accept to agent $A_j$ if

$$\mathcal{A}(T^i_k) = \{ A_j \} \qquad (23)$$

That is, $\mathcal{A}(T^i_k)$ is a singleton containing only agent $A_j$ (note that $A_j$ could be $A_i$ itself).

Rule 2: If $\mathcal{A}(T^i_k)$ is not a singleton, then agent $A_i$ sends accept to the agent in $\mathcal{A}(T^i_k)$ that obtains the maximum value by attacking target $T^i_k$, and reject to all other agents in $\mathcal{A}(T^i_k)$. That is, accept is sent to $A_{j^*} \in \mathcal{A}(T^i_k)$ if

$$j^* = \arg\max_j \{ b_j(T^i_k) \in \mathcal{B}(T^i_k) \} \qquad (24)$$

Note that Rule 2 subsumes Rule 1; they are stated separately for clarity. Again, $A_{j^*}$ can be $A_i$ itself.

Rule 3: An agent can send only one accept for one target. If there is more than one $j^*$, then the agent selects one of them.

Rule 4: For $A_i$ to decide on its action at the current search step, it has to receive accept from all the neighboring agents to which it sent its proposal.
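Rules 1-3 condense into a few lines of code. The sketch below (reusing the `Proposal` structure from the earlier sketch) computes the accept/reject verdicts an arbitrating agent would send; tie-breaking, which belongs to the deadlock resolution mechanism, is not shown.

```python
def accept_reject(proposals):
    """Rules 1-3 (a sketch): for each proposed target, accept only the proposer
    with the highest benefit and reject the rest.

    proposals: list of Proposal objects visible to the arbitrating agent.
    Returns {agent_id: 'accept' or 'reject'}.
    """
    by_target = {}                          # A(T_k): proposals grouped by target
    for p in proposals:
        by_target.setdefault(p.target_id, []).append(p)

    decisions = {}
    for group in by_target.values():
        # Rule 2 (which subsumes Rule 1): the highest benefit wins. Rule 3 holds
        # because max() returns exactly one winner (ties go to deadlock resolution).
        winner = max(group, key=lambda p: p.benefit)
        for p in group:
            decisions[p.agent_id] = 'accept' if p is winner else 'reject'
    return decisions
```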
Rule 1 implies that when an agent's proposal is not in conflict with the other agents' proposals, an accept can be sent without considering the other agents' decisions. When more than one agent proposes to attack $T_k$, there is a conflict between the proposing agents, which $A_i$ has to resolve. The conflict can be resolved by comparing the benefits proposed by the agents: agent $A_i$ compares the benefit values received for target $T_k$ and sends an accept decision to the agent with the highest benefit and reject decisions to the remaining agents. An agent $A_i$ can receive a mix of accept and reject decisions from its neighbors. If we were to allow the agent to attack a target $T_k$ merely because it got acceptance from some of the agents, the assignment would cause ineffective performance, as multiple agents could get assigned to the same target. Hence, Rule 4 guards against agents getting multiple assignments. Rules 1-4 are the key to the negotiation scheme. While implementing Rule 3, we may encounter situations where more than one agent has the same benefit value, in which case we use a deadlock resolution scheme that resolves such deadlocks.
Computing route decision (NC4): Agent $A_i$ decides whether to implement or discard its proposed task based on the accept or reject decisions received from its neighbors. The agent implements its proposal if it receives accept decisions from all its neighbors, and discards it if it receives a reject from even one of them. An agent that received a reject for its proposal from at least one neighbor goes on to the next negotiation cycle, and this process continues until it receives all accept decisions. An agent that has arrived at a decision (after receiving accept from all its neighbors) does not send any more proposals during subsequent negotiation cycles. The sequence of negotiation cycles terminates automatically when all the agents have converged to a decision. Later we will prove that only a finite number of negotiation cycles is necessary. When an agent $A_i$ receives reject for all its proposals, it adopts the search task.
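The whole cycle can be exercised end-to-end with a deliberately simplified loop. The sketch below assumes full connectivity (every agent hears every proposal, so Rule 4 reduces to a single global verdict), reuses `make_proposal` and `accept_reject` from the sketches above, and falls back to the search task when an agent has no target; the range-limited neighbourhood bookkeeping of the actual scheme is omitted.

```python
def run_negotiation(agent_targets, w_r=0.25, total_flight_time=100.0):
    """Simplified negotiation loop (a sketch, fully connected agents).

    agent_targets: {agent_id: {target_id: (value V_j, time_to_target)}}
    Returns {agent_id: assigned target_id, or None for the search task}.
    """
    assignment, undecided = {}, set(agent_targets)
    for _ in range(len(agent_targets)):        # Theorem 2: N cycles suffice
        active = {a for a in undecided if agent_targets[a]}
        for a in undecided - active:
            assignment[a] = None               # nothing in range: search task
        undecided = active
        if not undecided:
            break
        proposals = [make_proposal(a, agent_targets[a], w_r, total_flight_time)
                     for a in undecided]
        decisions = accept_reject(proposals)   # Rules 1-3
        for p in proposals:
            if decisions[p.agent_id] == 'accept':     # Rule 4 satisfied
                assignment[p.agent_id] = p.target_id
                undecided.discard(p.agent_id)
                for t in agent_targets.values():      # target is now taken
                    t.pop(p.target_id, None)
    for a in undecided:
        assignment[a] = None
    return assignment
```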
Additional Target Information Exchange
An agent that has received acceptance of its proposal may have other targets within its sensor range. An agent $A_i$ can send this information to its neighbouring agents, who can use it. The information that an agent sends is the target location and its value as perceived by $A_i$. This information is most useful for those agents that have not decided on any targets but are neighbours of $A_i$. The target information broadcast by $A_i$ can also be useful if all the proposals of an agent $A_j \in \mathcal{N}(A_i)$ are rejected.

Once an agent receives the available targets from agent $A_i$, it can make an assignment to any of the targets based on random number generation or a greedy strategy, or start a negotiation with its neighbouring agents to obtain an assignment. Here, we use the greedy strategy for simplicity.
Deadlock Resolution Mechanism
We define a deadlock as the situation in which an agent $A_i$ is unable to decide to whom it has to send an acceptance. This can happen when more than one agent, with the same $b_i(T_j)$ value, seeks target $T_j$ to attack. Since the $b_i(T_j)$ values are the same, use of Rule 2 is not possible, and agent $A_i$ cannot send acceptance to all the agents, as that would violate Rule 3. There are two possible ways of resolving a deadlock: loss information and the token algorithm.
Loss information: In this scheme, agent $A_i$ requests more information from the agents in $\mathcal{A}(T^i_k)$. This additional information aids effective decision-making. The additional information that an agent requests is the loss that each proposing agent would suffer if it chose its next best action instead of the proposed action. Let the new benefit set for agent $A_k$ be $\hat{\mathcal{B}}_k$, and let the loss $\lambda_k$ be evaluated using (25) as

$$\hat{\mathcal{B}}_k = \mathcal{B}_k \setminus \{ b_k(T_{S_k}) \}; \qquad \lambda_k = \max \mathcal{B}_k - \max \hat{\mathcal{B}}_k \qquad (25)$$

where $\setminus$ denotes set difference. When agent $A_i$ requests the loss information, the loss $\lambda_k$ is sent to agent $A_i$. Let $\Lambda_i$ represent the set of loss information received from all the agents in $\mathcal{A}(T^i_k)$. An accept is sent to the agent $A_j$ that satisfies the condition in (26), and reject is sent to the remaining agents:

$$A_j = \arg\max_k \{ \lambda_k \in \Lambda_i \} \qquad (26)$$

Suppose there are multiple $b_i(T_j)$'s at the next highest level; then the same procedure needs to be repeated. Using the loss information does not guarantee that the deadlock will be resolved: this situation can arise when multiple agents have the same loss value. In that case, we use a token algorithm, as given below.
Token Algorithm: Every agent $A_i$ carries a unique token number $K_i$. Whenever the above situation (of equal losses) occurs, wherein the agent is unable to decide to whom it has to send acceptance, the agent requests the token numbers of the agents $A_k \in \mathcal{A}(T^i_k)$. Agent $A_i$ compares these token numbers and chooses the agent $A_j$ with the least token number. The token number of $A_j$ is then increased by a number $\hat{N}$, where $\hat{N}$ is an arbitrary large number greater than $N$. This scheme ensures that an agent that has been selected earlier in this situation will not be selected again in a similar situation if there is at least one other agent that has not been selected before.
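A compact sketch of the token rule, with `break_tie` and its arguments as illustrative names of ours:

```python
def break_tie(tied_agents, tokens, n_agents):
    """Token algorithm (a sketch): choose the tied agent with the least token
    number, then raise its token so it loses the next comparable tie.

    tied_agents: ids of agents with equal benefit and equal loss values
    tokens:      {agent_id: token number K_i}, mutated in place
    """
    winner = min(tied_agents, key=lambda a: tokens[a])
    tokens[winner] += n_agents + 1      # N-hat: any number greater than N
    return winner

tokens = {1: 1, 2: 2, 3: 3}
print(break_tie([1, 2], tokens, n_agents=3))   # -> 1; tokens[1] becomes 5
```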
Some Theoretical Results
Theorem 1. If more than one agent proposes a target $T_j$, then at least one of the agents will receive all acceptances from its neighbors.

Proof. Let $\mathcal{A}(T^i_j)$ be the set of agents proposing target $T_j$. Then, by Rule 2, agent $A_i$ sends an accept decision to the agent $A_j$ that has the maximum benefit. If there are multiple agents with the same benefit, then $A_i$ invokes the deadlock resolution mechanism, by which exactly one agent receives an accept. □

Theorem 2. The negotiation terminates in a finite number of negotiation cycles.

Proof. From Theorem 1 we observe that, at each negotiation cycle, at least one of the agents gets all accepts and so decides upon a target for its next step. Since there are a finite number of agents, each agent decides upon a target to attack within a finite number of negotiation cycles. If no targets are available, the agents continue with the search task. Hence, all the agents decide upon a task in a finite number of negotiation cycles. The maximum number of negotiation cycles an agent can go through is $N$. □
4.3 Simulation Results
A simulation study is conducted on a battlefield scenario of size 100 × 100. Through these simulations we show that the negotiation scheme performs better than the greedy strategy in terms of the average number of targets destroyed. The simulation is carried out using 7 UAVs for 100 different sets of target positions, with each set having 20 targets. A priori knowledge of the number of targets present in the space and their initial positions is not available to the UAVs. We also study the performance of the negotiation and greedy schemes for various sensor radii.
From Figure 5 we can see that the negotiation scheme outperforms the greedy strategy: the number of targets destroyed using the negotiation scheme is higher, and the time taken to accomplish the mission is comparatively lower. The expected result of an increase in performance with an increase in sensor range can be seen in the performance curves of the negotiation scheme in the figure. However, this intuitive result does not hold for the greedy strategy.

Fig. 5. Average number of target hits for 100 different target positions (G: greedy strategy, N: negotiation scheme; sensor radii $s_r$ = 10 to 50)
The performance of the greedy strategy with sensor radius $s_r = 10$ is better than with the higher sensor radii $s_r = 20$ to $s_r = 50$. This is due to the fact that, with a low sensor radius, the UAVs are unable to sense the targets initially and hence move in their initial heading directions (spreading out). With a higher sensor radius, the agents are able to sense targets from their initial positions, and hence all the UAVs move in the direction of a sensed target as a swarm. Hence, the performance is worse when compared to a lower sensor radius.
We carried out another set of simulations to study the performance of the task allocation algorithm for different target distributions in the search space. In order to conduct these experiments, we define a proximity factor that determines the nature of the distribution, or spread, of the targets in the search space. The proximity factor is defined as:

$$\rho = \frac{S_r}{\frac{1}{N}\sum_{i=1}^{N} \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2}} \qquad (27)$$

where $N$ is the number of targets, $(x_i, y_i)$ represents the position of the $i$th target, $(x_c, y_c)$ is the mean of all the target positions, and $S_r$ is the sensor radius. A low proximity factor implies targets that are well separated compared to the sensor radius, while a high proximity factor means that the targets are placed very close together. Figure 6 shows different target distributions in the search space.
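The proximity factor of Eq. (27) is straightforward to compute; a sketch (function name ours):

```python
import math

def proximity_factor(targets, sensor_radius):
    # Eq. (27): rho = S_r / (mean distance of the targets from their centroid)
    n = len(targets)
    xc = sum(x for x, _ in targets) / n
    yc = sum(y for _, y in targets) / n
    mean_spread = sum(math.hypot(x - xc, y - yc) for x, y in targets) / n
    return sensor_radius / mean_spread

# Tightly clustered targets give a high rho, well-separated targets a low rho.
print(proximity_factor([(10, 10), (12, 11), (11, 9)], sensor_radius=10))
```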
The simulations are carried out using 7 UAVs in a search space containing 50 targets, with different proximity factors.

Fig. 6. Battlefield with 20 targets for proximity factors ρ = 0.625 and ρ = 0.11, with sensor radius $s_r$ = 10

Fig. 7. Number of targets destroyed for different proximity factors (ρ = 0.266, 0.443, 0.886, 1.77), for negotiation only and negotiation with information exchange

Figure 7 shows the performance of the negotiation-only and negotiation-with-target-information task allocation schemes. From the figure we can see that for lower proximity factors the number of targets destroyed is low compared to the number destroyed in the higher proximity factor cases. When the proximity factor is small, the effect of target information sharing during decision-making by the agents that have targets in their sensor range is significant. For ρ = 0.266, we can see from the figure that the performance of negotiation with target information is better than that using negotiation only. Here, the target information broadcast plays a crucial role in enhancing the performance. A similar effect can be seen for ρ = 0.443. However, for ρ = 0.886, the negotiation-only task allocation is better than that with target information exchange. This is due to the fact that the additional information about distant targets makes an agent choose distant targets to attack rather than perform search in its own neighborhood, causing UAVs to miss nearer targets outside their sensor range. For ρ = 1.77, the performance is the same for both negotiation schemes: since the proximity factor is high, all the agents can sense all the targets, and hence there is no improvement in performance with information exchange. It should be noted that the amount of information broadcast also plays a crucial role in the performance of the task allocation. Hence, there is always a tradeoff between how much information should be broadcast and the performance.
5 Search using Game Theoretic Strategies
In the previous section we considered search as one task among several to be carried out by the UAVs. However, there are applications, like search and surveillance missions, where search is the only task to be carried out. By search we mean that the UAVs are deployed in an unknown region to collect information about the region.

Consider an unknown region over which a search mission has to be carried out. Based on the a priori knowledge of the search space, an uncertainty map is constructed. The uncertainty map is discretized into cells. Here, we discretize the map into a grid of hexagonal cells, as they offer the flexibility to move in any direction while expending the same amount of energy. The uncertainty map consists of real numbers between 0 and 1 associated with each cell in the search space. These numbers represent the uncertainty with which the location of a target is known in that cell. An uncertainty value of 0 implies that everything is known about the cell (that is, one can say with certainty whether a target is located in that cell or not). On the other hand, an uncertainty value of 1 implies that nothing can be said about the location of a target in that cell.
One of the motivations for modeling a search problem in a game theoretical framework arises from the fact that this framework gives the flexibility of using two different solution concepts: one based on cooperation between players and the other based on non-cooperation. Application of these notions to economics had to take into account the fact that players are not inherently altruistic, making the cooperative framework somewhat untenable unless the cooperation is enforced by a third party. On the other hand, in the non-cooperative framework it has been shown that, in repeated games, cooperation automatically emerges as the best noncooperative solution; hence the notion of cooperation is inherent and enforceable in the non-cooperative framework. Although, when we consider cooperation between automated agents that are devoid of any selfish motive and have only a common goal in mind, it is more logical to use a cooperative framework, in our work we show that the non-cooperative framework is almost equally effective and is no more computationally time consuming than the cooperative framework. There are other reasons too, related to the specific problem structure, which justify the use of the non-cooperative framework. For instance, when the sensor performance is unreliable or noisy, or when, due to ineffective communication, the uncertainty map of each agent changes with time unknown to the other agents, different agents end up with different uncertainty maps. In such situations, the cooperative decision-making mechanism breaks down. Here we show that, when this is the case, the non-cooperative Nash strategies perform better than the cooperative strategies.
5.1 N-person Game Model
The strategies that we propose for the N agents/searchers are based on a game theoretical model. We use q-step look-ahead planning [29], where q determines the depth of the exploratory search used to obtain optimal strategies.

The objective of the agents is to select their next action, or path, at time t so as to maximize their benefits (that is, to maximize uncertainty reduction). This problem can be modelled as an N-person non-zero-sum game, with each agent as a player and the set of paths available to each agent as its set of strategies.

Another approach to decision-making in this situation, without the benefit of communication between agents, is to make some assumption on the behavior of the other agents in the search space. A player/agent may consider the rest of the N − 1 players to be one single player. Hence, we can model the N-person game as one player playing against the rest of the N − 1 players taken together as a single player (a coalition of N − 1 players). Here, we describe both models.
The payoff to each agent can be expressed in terms of search effectiveness functions. Every cell has an uncertainty value associated with it. Let $\mathcal{P}^q_i(C_{s_i})$, $i \in \{1, 2, \ldots, N\}$, be the set of all possible paths of length q for an agent $A_i$, emanating from cell $C_{s_i}$. A path $P^j_i(C_{s_i}) \in \mathcal{P}^q_i(C_{s_i})$, $j = 1, 2, \ldots, |\mathcal{P}^q_i(C_{s_i})|$, is an ordered set of cells, defined as

$$P^j_i(C_{s_i}) = \left\{ C^{j,1}_i, C^{j,2}_i, C^{j,3}_i, \ldots, C^{j,q}_i \mid C^{j,k}_i \in \mathcal{C},\ C^{j,1}_i = C_{s_i},\ C^{j,k+1}_i \in N(C^{j,k}_i) \right\} \qquad (28)$$

where $\mathcal{C}$ is the collection of all cells, $C_{s_i}$ is the current position of $A_i$, and $N(C^{j,k}_i)$ is the set of all neighboring cells of $C^{j,k}_i$.
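Enumerating the path set of Eq. (28) is a simple step-wise expansion over the grid adjacency. A sketch, with an illustrative toy adjacency dictionary (a hexagonal grid would give up to six neighbours per cell):

```python
def enumerate_paths(start, neighbours, q):
    """Eq. (28) (a sketch): all q-cell paths from `start`, each successive cell
    drawn from the neighbour set of the previous one.

    neighbours: {cell: list of adjacent cells}
    """
    paths = [[start]]
    for _ in range(q - 1):
        paths = [p + [c] for p in paths for c in neighbours[p[-1]]]
    return paths

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # toy 3-cell adjacency
print(enumerate_paths(0, adj, q=2))        # -> [[0, 1], [0, 2]]
```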
Let the uncertainty value of a cell $C_k$ at time t, as perceived by $A_i$, be $U_i(C_k, t)$. Given a path $P^j_i(C_{s_i})$ of agent $A_i$, suppose $A_i$ is at cell $C_l$ at time t; then the reduction of uncertainty associated with $C_l$, and the subsequent updated value of the uncertainty, are evaluated as follows:

Case 1: Only $A_i$ is in cell $C_l$ at time t. Then

$$v_i(t) = U_i(C_l, t)\,\beta_i \qquad (29)$$
$$U_i(C_l, t+1) = U_i(C_l, t)(1 - \beta_i) \qquad (30)$$

where $v_i(t)$ is the benefit that agent $A_i$ obtains (that is, the amount of uncertainty reduction that it achieves) when it visits cell $C_l$, and $\beta_i$ is its uncertainty reduction factor.

Case 2: More than one agent visits cell $C_l$ at time t. Let $\mathcal{A}$ represent this set of agents. Then

$$v_i(t) = \tilde{\beta}\,\hat{\beta}_i\,U_i(C_l, t) \qquad (31)$$
$$U_i(C_l, t+1) = U_i(C_l, t) - \sum_{j \in \mathcal{A}} v_j(t) \qquad (32)$$

where

$$\tilde{\beta} = 1 - \prod_{j \in \mathcal{A}} (1 - \beta_j); \qquad \hat{\beta}_i = \frac{\beta_i}{\sum_{j \in \mathcal{A}} \beta_j} \qquad (33)$$
So, given N routes $P_1, P_2, P_3, \ldots, P_N$ of the N agents, where $P_i$ is any $P^j_i \in \mathcal{P}^q_i$ (we drop the $C_{s_i}$ argument from the path notation $P^j_i(C_{s_i})$, as well as from $\mathcal{P}^q_i(C_{s_i})$, the set of all possible paths, for simplicity), the reduction in uncertainty achieved by $A_i$ at each step t ($t = 1, 2, \ldots, q$) is given by $v_i(t)$ and is computed using Case 1 or Case 2, as the case may be. Note that this computation has to be done simultaneously for all the agents. The total benefit to $A_i$ due to path $P_i$ is

$$m_i(P_1, \ldots, P_N) = \sum_{t=1}^{q} v_i(t) \qquad (34)$$

which represents the payoff obtained by $A_i$ as the agents choose strategies $P_1, P_2, \ldots, P_N$. The functions $m_i : \prod_{i=1,\ldots,N} \mathcal{P}^q_i \to \mathbb{R}_+$, from the set of paths to the uncertainty reduction value, are called the search effectiveness functions.
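Eqs. (29)-(34) can be exercised with a short simulation that walks all agents along their chosen paths simultaneously; a sketch, with `path_payoffs` and its argument layout as conventions of ours:

```python
def path_payoffs(paths, uncertainty, betas):
    """Eqs. (29)-(34) (a sketch): walk N agents along their q-step paths
    simultaneously and accumulate each agent's uncertainty reduction.

    paths:       {agent_id: [cell, cell, ...]}, all of equal length q
    uncertainty: {cell: U(cell)} current uncertainty map (copied, not mutated)
    betas:       {agent_id: uncertainty reduction factor beta_i}
    Returns {agent_id: m_i(P_1, ..., P_N)}.
    """
    u = dict(uncertainty)
    payoff = {a: 0.0 for a in paths}
    q = len(next(iter(paths.values())))
    for t in range(q):
        occupants = {}                       # agents grouped by cell at step t
        for a, p in paths.items():
            occupants.setdefault(p[t], []).append(a)
        for cell, agents in occupants.items():
            if len(agents) == 1:                       # Case 1, Eqs. (29)-(30)
                a = agents[0]
                payoff[a] += u[cell] * betas[a]
                u[cell] *= (1.0 - betas[a])
            else:                                      # Case 2, Eqs. (31)-(33)
                prod = 1.0
                for a in agents:
                    prod *= (1.0 - betas[a])
                beta_tilde = 1.0 - prod
                beta_sum = sum(betas[a] for a in agents)
                gains = {a: beta_tilde * (betas[a] / beta_sum) * u[cell]
                         for a in agents}
                for a, v in gains.items():
                    payoff[a] += v
                u[cell] -= sum(gains.values())         # Eq. (32)
    return payoff
```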
5.2 Solution Concepts
The decision to choose a particular path that would provide the maximum
information gain (or uncertainty reduction) can be based on various strategies.
We consider the following strategies: Noncooperative Nash strategy, coali-
tional Nash strategy, security strategy, cooperative strategy, greedy strategy,
and globally optimal strategy. The Nash, security, coalitional Nash, and greedy
strategies do not require any kind of communication to arrive at an optimal
decision, while cooperative and globally optimal strategies require communi-
cation to implement the decision making process.
(i) Noncooperative Nash Equilibrium Strategy: When the agents do not communicate with each other to decide on their future action at time t, and each agent assumes that the other N − 1 agents take actions that are beneficial to themselves, we can use the concept of a noncooperative Nash equilibrium.
(ii) Coalitional Nash Strategy: This is similar to the non-cooperative Nash equilibrium strategy, except that each agent assumes the other N − 1 agents to form a coalition and take actions that are jointly beneficial to them.
(iii) Security Strategy: This strategy becomes relevant when, as before, the
agents do not communicate with each other and each agent assumes the other
N − 1 agents to be adversaries. In such a situation the best strategy for the
agent is to secure its minimal benefit. Hence, it is logical for the agent to use
security strategy that would guarantee a minimal payoff.
(iv) Cooperative Strategy: The agents communicate with each other and
decide collectively (jointly) to take the best possible action. This is also the
centralized decision making case.
(v) Greedy Strategy: Agents do not communicate among themselves and use a greedy strategy. An agent does not consider the effect of the possible actions of the other agents and selects the action that yields the maximum benefit to itself. This strategy is used for comparison purposes only.
(vi) Globally Optimal Strategy: The game theoretical strategies are based on
local information up to q steps. Hence, the solution is optimal for these q steps
and not globally optimal. We can obtain a globally optimal solution by making
q equal to the largest possible number of steps in an agent’s search path.
This requires huge computational time and also increases the computational
complexity as the domain of the search effectiveness function increases. We
will not use this strategy but, for the interested researcher, some heuristic
algorithms to implement such strategies are discussed in [30].
Non-cooperative N-person Nash Equilibrium Strategy
We define a non-cooperative N-person game in normal form for N agents [31]. An N-person game consists of N search effectiveness functions $m_i$, $i = 1, \ldots, N$. The ordered N-tuple of real numbers $(m_1(P_1, \ldots, P_N), \ldots, m_N(P_1, \ldots, P_N))$ denotes the payoff to each agent, respectively. The players do not cooperate with each other and arrive at their decisions independently. In such a situation the equilibrium solution can be stated as: An N-tuple of strategies $\{P^*_1, P^*_2, \ldots, P^*_N\}$ with $P^*_i \in \mathcal{P}^q_i$ is said to constitute a noncooperative (Nash) equilibrium solution for an N-person nonzero-sum game if the following N inequalities are satisfied for all $P_i \in \mathcal{P}^q_i$, $i \in N$:

$$m_1^* \triangleq m_1(P^*_1, P^*_2, P^*_3, \ldots, P^*_N) \geq m_1(P_1, P^*_2, \ldots, P^*_{N-1}, P^*_N)$$
$$m_2^* \triangleq m_2(P^*_1, P^*_2, \ldots, P^*_N) \geq m_2(P^*_1, P_2, P^*_3, \ldots, P^*_{N-1}, P^*_N)$$
$$\vdots$$
$$m_N^* \triangleq m_N(P^*_1, P^*_2, \ldots, P^*_N) \geq m_N(P^*_1, P^*_2, P^*_3, \ldots, P^*_{N-1}, P_N) \qquad (35)$$

The N-tuple $(m_1^*, m_2^*, \ldots, m_N^*)$ is known as a noncooperative (Nash) equilibrium outcome of the N-person game in normal form. A pure strategy Nash equilibrium may not always exist. In this case we need to compute mixed strategies, which guarantee a solution to the noncooperative game.
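Since the path sets are finite, a pure-strategy Nash equilibrium, when one exists, can be found by brute force directly from the inequalities in (35). A sketch (exponential in N, so viable only for small games; names ours):

```python
from itertools import product

def pure_nash_profiles(path_sets, m):
    """Brute-force check of the inequalities in (35) (a sketch).

    path_sets: list of per-agent strategy sets [P^q_1, ..., P^q_N]
    m(i, profile): payoff m_i of agent i under the joint profile (a tuple)
    Yields every pure-strategy Nash equilibrium profile.
    """
    n = len(path_sets)
    for profile in product(*path_sets):
        # No agent can gain by a unilateral deviation from the profile.
        if all(m(i, profile) >= m(i, profile[:i] + (alt,) + profile[i + 1:])
               for i in range(n) for alt in path_sets[i]):
            yield profile
```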
Mixed Strategies: A mixed strategy for a player is a probability distribution on the space of its pure strategies. An allowable strategy for $A_i$ is to choose $P^1_i$ with probability (w.p.) $y^i_1$, $P^2_i$ w.p. $y^i_2$, ..., $P^{|\mathcal{P}^q_i|}_i$ w.p. $y^i_{|\mathcal{P}^q_i|}$, so that

$$\sum_{k=1}^{|\mathcal{P}^q_i|} y^i_k = 1, \quad \text{and} \quad 0 \leq y^i_k \leq 1 \qquad (36)$$
An N-tuple $\{y^{i*} \in Y^i,\ i \in N\}$ is said to constitute a mixed-strategy noncooperative (Nash) equilibrium solution for an N-person game in normal form if the following N inequalities are satisfied for all $y^j \in Y^j$, $j \in N$:

$$J^{1*} \triangleq \sum_{\mathcal{P}^q_1} \sum_{\mathcal{P}^q_2} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{2*}_{P_2} \cdots y^{N*}_{P_N}\, m_1(P_1, P_2, \ldots, P_N) \geq \sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_N} y^{1}_{P_1} y^{2*}_{P_2} \cdots y^{N*}_{P_N}\, m_1(P_1, P_2, \ldots, P_N)$$
$$J^{2*} \triangleq \sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{2*}_{P_2} \cdots y^{N*}_{P_N}\, m_2(P_1, P_2, \ldots, P_N) \geq \sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{2}_{P_2} y^{3*}_{P_3} \cdots y^{N*}_{P_N}\, m_2(P_1, P_2, \ldots, P_N)$$
$$\vdots$$
$$J^{N*} \triangleq \sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{2*}_{P_2} \cdots y^{N*}_{P_N}\, m_N(P_1, P_2, \ldots, P_N) \geq \sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{2*}_{P_2} \cdots y^{(N-1)*}_{P_{N-1}} y^{N}_{P_N}\, m_N(P_1, P_2, \ldots, P_N) \qquad (37)$$

The noncooperative Nash equilibrium outcome of an N-person game in mixed strategies is given by the N-tuple $\{J^{1*}, \ldots, J^{N*}\}$. If there exists an inner mixed strategy solution, then such a solution $\{y^{i*} \in \breve{Y}^i;\ i \in N\}$ of an N-person game in normal form satisfies the following set of equations:

$$\sum_{\mathcal{P}^q_2} \cdots \sum_{\mathcal{P}^q_N} y^{2*}_{P_2} \cdots y^{N*}_{P_N} \left\{ m_1(P_1, \ldots, P_N) - m_1(P^l_1, P_2, \ldots, P_N) \right\} = 0, \quad P_1 \in \mathcal{P}^q_1,\ P_1 \neq P^l_1$$
$$\sum_{\mathcal{P}^q_1} \sum_{\mathcal{P}^q_3} \cdots \sum_{\mathcal{P}^q_N} y^{1*}_{P_1} y^{3*}_{P_3} \cdots y^{N*}_{P_N} \left\{ m_2(P_1, \ldots, P_N) - m_2(P_1, P^l_2, \ldots, P_N) \right\} = 0, \quad P_2 \in \mathcal{P}^q_2,\ P_2 \neq P^l_2$$
$$\vdots$$
$$\sum_{\mathcal{P}^q_1} \cdots \sum_{\mathcal{P}^q_{N-1}} y^{1*}_{P_1} \cdots y^{(N-1)*}_{P_{N-1}} \left\{ m_N(P_1, \ldots, P_N) - m_N(P_1, \ldots, P_{N-1}, P^l_N) \right\} = 0, \quad P_N \in \mathcal{P}^q_N,\ P_N \neq P^l_N \qquad (38)$$

where $P^l_i$ is any one of the search paths in $\mathcal{P}^q_i$, and $\breve{Y}^i$ is the interior of $Y^i$. If an inner mixed strategy solution does not exist, then the above formulation may not yield a feasible solution; in that case, we may have to choose some other algorithm. The domain of the search effectiveness function grows with increasing q and with an increasing number of players. Hence, solving these algebraic equations becomes computationally time consuming. In order to reduce the computational time, we use the concept of domination [31].
Dominating Strategies: There are certain strategies for an agent that yield less profit than other strategies. For instance, consider agent $A_i$ choosing a path $P^k_i$ that has a higher benefit than a path $P^l_i$ for every possible combination of the paths of the remaining N − 1 agents. Then we can eliminate path $P^l_i$ without affecting the equilibrium solution. Since the objective of the searchers is to maximize their benefits, we are eliminating a strategy with lower benefit. In general, for $A_i$, considering the search effectiveness function $m_i$, we say that path $P^k_i$ dominates path $P^l_i$ if

$$m_i(P_1, \ldots, P^k_i, \ldots, P_N) \geq m_i(P_1, \ldots, P^l_i, \ldots, P_N), \quad \forall\, P_j \in \mathcal{P}^q_j,\ j \neq i \qquad (39)$$

and if the strict inequality holds for at least one such combination. Eliminating dominated strategies reduces the computational time required to compute the mixed equilibrium strategy. The concept of dominating strategies in non-zero-sum games as formulated above is similar to that of dominating strategies in zero-sum games [31]. The dominating strategies concept is applicable only to noncooperative games, and not to cooperative and security strategies.
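A sketch of dominated-strategy elimination per Eq. (39), by exhaustive comparison over opponent path combinations (function and argument names ours):

```python
from itertools import product

def remove_dominated(path_sets, m, i):
    """Eq. (39) (a sketch): drop every path of agent i that some other path of
    agent i weakly dominates, with strict improvement in at least one case.

    path_sets: list of per-agent strategy sets; m(i, profile) -> payoff of agent i
    """
    others = [p for j, p in enumerate(path_sets) if j != i]
    kept = []
    for cand in path_sets[i]:
        dominated = False
        for rival in path_sets[i]:
            if rival is cand:
                continue
            # Payoff gap between rival and cand for every opponent combination.
            diffs = [m(i, opp[:i] + (rival,) + opp[i:]) -
                     m(i, opp[:i] + (cand,) + opp[i:])
                     for opp in product(*others)]
            if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
                dominated = True
                break
        if not dominated:
            kept.append(cand)
    return kept
```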

Coalitional Nash Strategies
In this model, we assume that agent $A_i$ plays against the coalition of the remaining N − 1 agents. The game is modelled as a bimatrix game consisting of two search effectiveness matrices, $M^{1,i} = \{m^{1,i}_{kl}\}$ and $M^{2,i} = \{m^{2,i}_{kl}\}$. The matrix $M^{1,i}$ represents the benefit obtained by agent $A_i$. Every element $m^{1,i}_{kl} = V_i(P^k_i, \hat{P})$, where $\hat{P}$ denotes the (N − 1)-tuple of paths $(P_j)_{j \neq i}$ chosen by the coalition, represents the benefit obtained by agent $A_i$ choosing path $P^k_i \in \mathcal{P}^q_i$ while the coalition chooses strategy l, $l = 1, 2, \ldots, |\prod_{j=1, j \neq i}^{N} \mathcal{P}^q_j|$. The matrix $M^{2,i}$ represents the benefit obtained by the coalition, with every element $m^{2,i}_{kl} = \sum_{j=1, j \neq i}^{N} V_j(P^k_i, \hat{P})$. Agent $A_i$ assumes the coalition to be a single player. The players ($A_i$ and the coalition) arrive at their decisions independently. In such a situation the equilibrium solution can be stated as: A pair of strategies $\{$row $k^*$, column $l^*\}$ is said to constitute a noncooperative (Nash) equilibrium solution to the bimatrix game if the following pair of inequalities is satisfied for all $k = 1, 2, \ldots, |\mathcal{P}^q_i|$ and all $l = 1, 2, \ldots, |\prod_{j=1, j \neq i}^{N} \mathcal{P}^q_j|$:

$$m^{1,i}_{k^* l^*} \geq m^{1,i}_{k l^*}, \qquad m^{2,i}_{k^* l^*} \geq m^{2,i}_{k^* l} \qquad (40)$$

The agent $A_i$ considers strategy $k^*$ as its equilibrium strategy. Each agent computes the two search effectiveness matrices and takes the corresponding $k^*$ as its equilibrium strategy. The dimension of each search effectiveness matrix is

$$|\mathcal{P}^q_i| \times \Big|\prod_{j=1, j \neq i}^{N} \mathcal{P}^q_j\Big| \qquad (41)$$
A pure strategy Nash equilibrium may not always exist, in which case we have to use a mixed strategy equilibrium. The main disadvantage of the earlier model is that, if there is no inner mixed strategy Nash solution, we may not be able to find a feasible solution. However, in this model we can directly use the bilinear programming method to compute the mixed strategy equilibrium.
A pair $\{y^*, z^*\}$ constitutes a mixed-strategy Nash equilibrium solution to a bimatrix game $(M^{1,i}, M^{2,i})$ if, and only if, there exists a pair $(f^*, g^*)$ such that $\{y^*, z^*, f^*, g^*\}$ is a solution of the following bilinear programming problem:

$$\min_{y,z,f,g}\ \left[ -y^{\top} M^{1,i} z - y^{\top} M^{2,i} z + f + g \right] \qquad (42)$$

subject to

$$-M^{1,i} z \geq -f \cdot \mathbf{1}_{|\mathcal{P}^q_i|}, \qquad -(M^{2,i})^{\top} y \geq -g \cdot \mathbf{1}_{|\prod_{j \neq i} \mathcal{P}^q_j|}$$
$$y \geq 0, \quad z \geq 0, \quad y^{\top} \cdot \mathbf{1}_{|\mathcal{P}^q_i|} = 1, \quad z^{\top} \cdot \mathbf{1}_{|\prod_{j \neq i} \mathcal{P}^q_j|} = 1 \qquad (43)$$

where $\mathbf{1}_{|\mathcal{P}^q_i|}$ and $\mathbf{1}_{|\prod_{j \neq i} \mathcal{P}^q_j|}$ are column vectors of dimensions $|\mathcal{P}^q_i|$ and $|\prod_{j=1, j \neq i}^{N} \mathcal{P}^q_j|$, with all elements equal to 1.
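In practice, the mixed-strategy equilibria of a bimatrix game can also be computed with an off-the-shelf solver rather than by solving (42)-(43) by hand. A sketch assuming the third-party nashpy package (which uses support enumeration, not the bilinear program above); the toy matrices are ours:

```python
import numpy as np
import nashpy as nash   # third-party package: pip install nashpy

# Toy 2x2 bimatrix game: M1 is agent A_i's payoff matrix, M2 the coalition's.
M1 = np.array([[3, 0], [1, 2]])
M2 = np.array([[2, 1], [0, 3]])

game = nash.Game(M1, M2)
for y, z in game.support_enumeration():   # mixed-strategy pairs (y*, z*)
    print(y, z)
```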
Security Strategy
In the security strategy, the individual agents try to secure their minimal profits, assuming adversarial behavior by the other players. For this purpose, the coalitional form given above is the ideal framework for obtaining security strategies. Agent $A_i$ chooses the row $k^*$ whose smallest entry is no smaller than the smallest entry of any other row, which implies

$$\underline{V}(M^{1,i}) = \max_k \min_l m^{1,i}_{kl}, \qquad k^* = \arg\max_k \{ \min_l m^{1,i}_{kl} \} \qquad (44)$$

where $k = 1, 2, \ldots, |\mathcal{P}^q_i|$, and l represents a particular combination of the strategies used by the other N − 1 agents,

$$l = 1, 2, \ldots, \Big|\prod_{j=1, j \neq i}^{N} \mathcal{P}^q_j\Big| \qquad (45)$$

Further, $k^*$ is the security strategy for $A_i$, and $\underline{V}(M^{1,i})$ is the guaranteed payoff to $A_i$. Every agent computes its security strategy individually and adopts the route given by $k^*$.
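The max-min selection in (44) is a one-liner over the rows of $M^{1,i}$; a minimal sketch (names ours):

```python
def security_strategy(M1):
    """Eq. (44) (a sketch): max-min row selection on agent A_i's payoff matrix.
    M1: list of rows; M1[k][l] is the benefit of path k against coalition move l.
    Returns (k_star, guaranteed_payoff).
    """
    row_minima = [min(row) for row in M1]
    k_star = max(range(len(M1)), key=lambda k: row_minima[k])
    return k_star, row_minima[k_star]

print(security_strategy([[3, 1], [2, 2]]))   # -> (1, 2): row 1 guarantees 2
```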
Cooperative Strategy
In this strategy the agents communicate during the decision process. The agents jointly choose a strategy such that the joint payoff of the game is maximized. Each agent computes the same joint search effectiveness function $M$ and decides its q-step look-ahead path using $M$. For N agents, let the joint search effectiveness function be $M = m_1(P_1, \ldots, P_N) + \cdots + m_N(P_1, \ldots, P_N)$, which represents the joint payoff due to all the agents' actions. An N-tuple of strategies $(P^*_1, \ldots, P^*_N)$ is said to be a cooperative strategy if the following condition is satisfied:

$$m_1(P^*_1, \ldots, P^*_N) + \cdots + m_N(P^*_1, \ldots, P^*_N) = M(P^*_1, \ldots, P^*_N) \geq M(P_1, \ldots, P_N) = m_1(P_1, \ldots, P_N) + \cdots + m_N(P_1, \ldots, P_N), \quad \forall\, P_i \in \mathcal{P}^q_i \qquad (46)$$

The cooperative strategy used in game theory involves communication between the players to coordinate their actions and arrive at a mutually acceptable decision. The drawback of the cooperative strategy when used in economics, where the players are selfish by nature, is that a player may violate the mutually agreed decision to earn larger benefits at the cost of the others. In our scenario, since the agents are automated, they can be assumed to be altruistic, and hence they do not violate the agreed decision. Hence, using the search effectiveness functions, each agent can also compute the cooperative strategy without explicit communication between the agents. The equilibrium solution remains the same whether or not communication is present, provided that all the agents possess the same uncertainty map.
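Since every agent shares the same map, each can evaluate (46) independently; a brute-force sketch (again exponential in N, names ours):

```python
from itertools import product

def cooperative_strategy(path_sets, m):
    """Eq. (46) (a sketch): the joint profile maximizing the summed payoff M.
    path_sets: per-agent strategy sets; m(i, profile) -> payoff of agent i."""
    n = len(path_sets)
    return max(product(*path_sets),
               key=lambda prof: sum(m(i, prof) for i in range(n)))
```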
Greedy Strategy
In this case, the agents do not communicate among themselves and use a greedy strategy to determine their future actions. This is similar to the non-cooperative strategy used in [32]. The agent chooses a path $P^k_i$, with a look-ahead policy of q, using the following relation:

$$m_i(P^k_i) \geq m_i(P^j_i), \quad \forall\, j = 1, 2, \ldots, |\mathcal{P}^q_i| \qquad (47)$$

where $m_i(P^k_i)$ is the benefit obtained by agent $A_i$ using path $P^k_i$, evaluated using Eqn. (29).
Selection of Strategies
When there are multiple solutions, the selection of strategies by the players becomes a crucial issue. The security and greedy strategies are straightforward to implement: if there exist multiple security or greedy strategies, any one of them will guarantee the same payoff level. In fact, for security strategies the actual payoff is bound to be higher for the players so long as they stick to their security strategies. In the case of multiple cooperative strategies, since all players communicate with each other during the decision process, they can decide to adopt the strategy that is most beneficial to the overall team goal. One can also devise a protocol to automate this selection so that communication between agents can be dispensed with. But when multiple solutions occur for pure or mixed strategy Nash equilibria, the agents have to select one of them. Since every agent can evaluate the search effectiveness functions of all the other agents, they can jointly select the solution whose joint payoff is maximum. This selection does not involve any communication with the other agents, but uses the available data through evaluation of the search effectiveness functions. The method of choosing a strategy that maximizes the agent's benefit is common to all the agents. When a mixed strategy equilibrium exists, the agents can make a choice based on maximum likelihood or by random number generation. Here, we choose the maximum likelihood method.
5.3 Simulation Results
For the purpose of simulation, a region composed of hexagonal grids of size 30×30 is considered. We consider five agents with randomly located initial positions in the search space. We initially assume a perfect-information case in which every agent has the same uncertainty map throughout the search operation, although this is not a necessary condition. A typical uncertainty map is shown in Figure 8, along with the initial positions of the searchers. The percentage of uncertainty in a cell is proportional to the size of the grey area in the cell. The total uncertainty in the search space is defined as the sum of the uncertainties in all the cells.

The uncertainty map is updated at every search step in time. The simulation is carried out for look-ahead step lengths of q = 1 and q = 2. The agents' uncertainty reduction factors are assumed to remain constant throughout the
Fig. 8. A typical uncertainty map for 30×30 hexagonal grid
