DISTRIBUTED SELF-ORGANIZATION OF SWARMS TO FIND GLOBALLY ε-OPTIMAL ROUTES TO LOCALLY SENSED TARGETS

ISHANU CHATTOPADHYAY ∗

Abstract. The problem of near-optimal distributed path planning to locally sensed targets is investigated in the context of large swarms. The proposed algorithm uses only information that can be locally queried; rigorous theoretical results on convergence, robustness, and scalability are established; and the effect of system parameters such as the agent-level communication radius and agent velocities on global performance is analyzed. The fundamental philosophy of the proposed approach is to percolate local information across the swarm, enabling agents to indirectly access the global context. A gradient emerges, reflecting the performance of agents, computed in a distributed manner via local information exchange between neighboring agents. It is shown that to follow near-optimal routes to a target which can only be sensed locally, and whose location is not known a priori, each agent needs simply to move towards its "best" neighbor, where the notion of "best" is obtained by computing the state-specific language measure of an underlying probabilistic finite state automaton. The theoretical results are validated in high-fidelity simulation experiments with in excess of 10^4 agents.

Key words. Optimization; Swarms; Probabilistic State Machines

AMS subject classifications.

1. Introduction & Motivation. Path planning in a co-operative environment is a problem of great interest in multi-agent robotics. Recent developments in micromachining and MEMS have opened up the possibility of engineering extremely small and cheap robotic platforms in large numbers. Though limited in size, on-board computational resources, and power, such robots can nevertheless exploit co-operation to accomplish complex tasks [1, 2, 3, 4] including surveillance, reconnaissance, path finding, and collaborative payload conveyance. However, coordinating such engineered swarms unveils new challenges not encountered in the operation of one or a few robots [5, 6, 7]. Coordination schemes requiring unique identities for each robot, explicit routing of point-to-point communication between robots, or centralized representations of the state of an entire swarm are no longer viable. Thus, any approach to effectively control swarms must be intrinsically scalable, and must only use information that is locally available. The immediate question for the control theorist is whether such algorithms are able to guarantee any level of global performance. This is precisely the problem investigated in this paper, with an affirmative answer; a distributed scalable control algorithm is proposed that allows very large swarms (simulation results obtained with 10^4 agents) to self-organize and find near-global-optimal routes to locally known targets. The proposed algorithm uses only information that can be locally queried; rigorous theoretical results on convergence, robustness, and scalability are established; and the effect of system parameters such as the agent-level communication radius and agent velocities on global performance is analyzed.

The fundamental philosophy of the proposed approach is to percolate local information across the swarm, enabling agents to indirectly access the global context. A gradient emerges, reflecting the performance of agents, computed in a distributed manner via local information exchange between neighboring agents.
∗ Corresponding Author, email: ixc128@psu.edu, Mechanical Engineering, The Pennsylvania State University, USA (Supported in part by the U.S. Army Research Office (W911NF-07-1-0376) and the Office of Naval Research (N00014-09-1-0688))

It is shown that to follow near-optimal routes to a target which can only be sensed locally, and whose location is not known a priori, each agent needs simply to move towards its "best" neighbor, where the notion of "best" is obtained by computing the state-specific language measure [8] of an underlying probabilistic finite state automaton [9].

Gradient-based methods in swarm control are not new [7, 10, 11, 12]. The majority of reported work in this direction draws inspiration from swarming phenomena observed in nature, where self-organized exploration strategies emerge at the collective level as a result of simple rules followed by individual agents. To produce the global behavior, individuals interact using simple and mostly local communication protocols. Social insects are a good biological example of organisms collectively exploring an unknown environment, and they have served as a source of direct inspiration for research on self-organized cooperative robotic exploration and path formation in groups of robots [13, 14]. The standard engineering approach of analyzing desired global patterns and breaking them down into a set of simple rules governing individual agents is seldom applicable to large populations aspiring to accomplish complex tasks. Nevertheless, some progress has been made in this direction [15].

"We now know that such synchronized group behavior (of flocking birds) is mediated through sensory modalities such as vision, sound, pressure and odor detection. Individuals tend to maintain a personal space by avoiding those too close to themselves; group cohesion results from a longer-range attraction to others; and animals often align their direction of travel with that of nearby neighbors. These responses can account for many of the group structures we see in nature, including insect swarms and the dramatic vortex-like mills formed by some species of fish and bat. By adjusting their motion in response to that of near neighbors, individuals in groups both generate, and are influenced by, their social context; there is no centralized controller." (Collective Minds, I. D. Couzin [16])

On the other hand, the Evolutionary Robotics (ER) methodology [17] allows for the implementation of a top-down approach, in which reinforcement learning via evolutionary optimization techniques permits assessment of the system's overall performance and sequential improvement of the control laws. While such heuristic techniques have been shown to yield robust and scalable systems, assuring global performance has remained an elusive challenge. The present paper aims to fill this gap by proposing a simple control approach with provable guarantees on global performance. The key difference with the reported gradient-based techniques lies in the formal model that is developed, and in the associated theoretical results showing that the algorithm achieves near-global optimality. To the best of the author's knowledge, such an approach has not been previously investigated, primarily due to the complexity spike encountered in deriving optimal solutions in a decentralized environment.
Recent investigations [18, 19] into the solution complexity of decentralized Markov decision processes have shown that the problem is exceptionally hard even for two agents, illustrating a fundamental divide between centralized and decentralized control of MDPs. In contrast to the centralized case, the decentralized case provably does not admit polynomial-time algorithms. Furthermore, assuming EXP ≠ NEXP, the problems require super-exponential time to solve in the worst case. Moreover, since distributed systems with access to only local information can be mapped to partially observable MDPs, it follows from [20] that such problems are non-approximable, negating the possibility of obtaining optimal solutions to approximate representations.

Such negative results do not preclude the possibility of obtaining near-optimal solutions efficiently when the set of models considered is a strictly smaller subset of general MDPs. This is precisely what we achieve in this paper, by casting the path planning problem as a performance maximization problem for an underlying probabilistic finite state automaton (PFSA). In spite of similar Markovian assumptions, the PFSA model is distinct from general MDPs (see Section 2.1), and admits decentralized manipulation such that the control policy, on convergence, is within an ε bound of the global optimum. Furthermore, one can freely choose the error bound ε (and make it as small as one wishes), with the caveat that the convergence time increases (with no finite upper bound) with decreasing ε.

The present work is also distinct from Potential Field-based methodologies (PFM) widely studied in centralized single- or few-robot scenarios [21, 22]. Early PFM implementations had substantial shortcomings [23], suffering from trap situations, instability in narrow passages, etc. Some of these shortcomings have been addressed recently, leading to globally convergent potential planners [24, 25, 26, 27, 28, 29, 30]. These approaches are computationally hard even for one or a few robots, and thus not applicable in the current context. Some variations of the latter approaches have attempted to reduce the complexity by combining search algorithms and potential fields [31, 32, 33], virtual obstacle methods [34, 35], sub-goal methods [36, 37], wall-following methods [38, 39, 34, 40], etc. Nevertheless, since heuristic strategies based only on local environment information are usually applied, many of these methods cannot guarantee convergence in general.

The rest of the paper is organized in six sections. Section 2 briefly summarizes the theory of quantitative measures of probabilistic regular languages, and the pertinent approaches to centralized performance maximization of PFSA. Section 3 develops the PFSA model for a swarm, and Section 4 presents the theoretical development for decentralized PFSA optimization, thus solving the problem of computing ε-optimal routes in a static or frozen swarm. Section 5 extends the results to a dynamic swarm, where route optimization and positional updates are carried out simultaneously. Section 6 validates the theoretical development with high-fidelity simulation results. The paper concludes in Section 7 with recommendations for future work.

2. Background: Language Measure Theory.
This section summarizes the concept of signed real measure of probabilistic regular languages, and its application to performance optimization of probabilistic finite state automata (PFSA) [8]. A string over an alphabet (i.e. a non-empty finite set) Σ is a finite-length sequence of symbols from Σ [41]. The Kleene closure of Σ, denoted by Σ*, is the set of all finite-length strings of symbols, including the null string ε. xy is the concatenation of strings x and y, and the null string ε is the identity element of the concatenative monoid.

Definition 1 (PFSA). A PFSA G over an alphabet Σ is a sextuple (Q, Σ, δ, Π̃, χ, C), where Q is a set of states; δ : Q × Σ* → Q is the (possibly partial) transition map; Π̃ : Q × Σ → [0, 1] is an output mapping, or the probability morph function, which specifies the state-specific symbol generation probabilities and satisfies ∀ q_i ∈ Q, σ ∈ Σ, Π̃(q_i, σ) ≥ 0 and ∑_{σ∈Σ} Π̃(q_i, σ) = 1; the state characteristic function χ : Q → [−1, 1] assigns a signed real weight to each state, reflecting the immediate pay-off from visiting that state; and C is the set of controllable transitions that can be disabled (see Definition 2) by an imposed control policy.

Definition 2 (Control Philosophy). If δ(q_i, σ) = q_k, then the disabling of σ at q_i prevents the state transition from q_i to q_k. Thus, disabling a transition σ at a state q_i replaces the original transition with a self-loop with identical occurrence probability, i.e. we now have δ(q_i, σ) = q_i. Transitions that can be so disabled are controllable, and belong to the set C.

Definition 3. The language L(q_i) generated by a PFSA G initialized at the state q_i ∈ Q is defined as: L(q_i) = { s ∈ Σ* | δ(q_i, s) ∈ Q }. Similarly, for every q_j ∈ Q, L(q_i, q_j) denotes the set of all strings that, starting from the state q_i, terminate at the state q_j, i.e., L(q_i, q_j) = { s ∈ Σ* | δ(q_i, s) = q_j ∈ Q }.

Definition 4 (State Transition Matrix). The state transition probability matrix Π ∈ [0, 1]^{Card(Q)×Card(Q)} for a given PFSA is defined as:

∀ q_i, q_j ∈ Q,  Π_ij = ∑_{σ∈Σ s.t. δ(q_i,σ)=q_j} Π̃(q_i, σ)

Note that Π is a square non-negative stochastic matrix [42], where Π_ij is the probability of transitioning from q_i to q_j.

Notation 1. We use matrix notations interchangeably for the morph function Π̃. In particular, Π̃_ij = Π̃(q_i, σ_j) with q_i ∈ Q, σ_j ∈ Σ. Note that Π̃ ∈ [0, 1]^{Card(Q)×Card(Σ)} is not necessarily square, but each row sums to unity.

A signed real measure [43] ν^i : 2^{L(q_i)} → ℝ ≡ (−∞, +∞) is constructed on the σ-algebra 2^{L(q_i)} [8], implying that every singleton string set { s ∈ L(q_i) } is a measurable set.

Definition 5 (Language Measure). Let ω ∈ L(q_i, q_j) ⊆ 2^{L(q_i)}. The signed real measure ν^i_θ of the singleton string set {ω} is defined as:

ν^i_θ({ω}) ≜ θ (1 − θ)^{|ω|} Π̃(q_i, ω) χ(q_j)

For every choice of the parameter θ ∈ (0, 1), the signed real measure of a sublanguage L(q_i, q_j) ⊆ L(q_i) is defined as:

ν^i_θ(L(q_i, q_j)) ≜ ∑_{ω ∈ L(q_i, q_j)} θ (1 − θ)^{|ω|} Π̃(q_i, ω) χ(q_j)

The measure of L(q_i) is defined as ν^i_θ(L(q_i)) ≜ ∑_{q_j ∈ Q} ν^i_θ(L(q_i, q_j)).

Notation 2. For a given PFSA, we interpret the set of measures ν^i_θ(L(q_i)) as a real-valued vector of length Card(Q) and denote ν^i_θ(L(q_i)) as ν_θ|_i.
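To make Definition 5 concrete, the following minimal Python sketch evaluates the measure of a singleton string by chaining the state-specific generation probabilities along the string and weighting the characteristic of the terminating state; the three-state machine, its morph probabilities, and the value of θ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 3-state PFSA over the alphabet {a, b}; states are indexed 0, 1, 2.
delta = {0: {'a': 1, 'b': 2}, 1: {'a': 2, 'b': 0}, 2: {'a': 2, 'b': 2}}   # transition map
morph = {0: {'a': 0.6, 'b': 0.4}, 1: {'a': 0.5, 'b': 0.5}, 2: {'a': 0.7, 'b': 0.3}}
chi = np.array([0.0, 0.0, 1.0])           # characteristic weights chi : Q -> [-1, 1]

def singleton_measure(q_i, omega, theta):
    """nu^i_theta({omega}) = theta * (1 - theta)**|omega| * Pr(omega | q_i) * chi(q_j)."""
    prob, q = 1.0, q_i
    for sigma in omega:                   # accumulate the generation probability of omega
        prob *= morph[q][sigma]
        q = delta[q][sigma]               # follow the (possibly partial) transition map
    return theta * (1.0 - theta) ** len(omega) * prob * chi[q]

print(singleton_measure(0, "aa", theta=0.05))   # string "aa" from q_0 terminates at q_2
```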
The language measure can be expressed vectorially as (where the inverse exists for θ ∈ (0, 1] [8]):

ν_θ = θ [ I − (1 − θ)Π ]^{−1} χ    (2.1)

In the limit θ → 0+, the language measure of a singleton string can be interpreted as the product of the conditional generation probability of the string and the characteristic weight of the terminating state. Hence, the smaller the characteristic, or the smaller the probability of generating the string, the smaller its measure. Thus, if the characteristic values are chosen to represent the control specification, with more positive weights given to more desirable states, then the measure represents how good a particular string is with respect to the given specification and the given model. The limiting language measure ν_0|_i = lim_{θ→0+} θ [ I − (1 − θ)Π ]^{−1} χ |_i sums up the limiting measures of each string starting from q_i, and thus captures how good q_i is, based not only on its own characteristic, but on how good the strings generated from q_i in the future are. It is thus a quantification of the impact of q_i, in a probabilistic sense, on the future dynamical evolution [8].

Definition 6 (Supervisor). A supervisor is a control policy disabling a specific subset of the set C of controllable transitions. Hence there is a bijection between the set of all possible supervision policies and the power set 2^C. The language measure allows quantitative comparison of different supervision policies.

Definition 7 (Optimal Supervision Problem). Given G = (Q, Σ, δ, Π̃, χ, C), compute a supervisor disabling D⋆ ⊆ C such that

ν⋆_0 ≥ ν†_0 (elementwise)  ∀ D† ⊆ C

where ν⋆_0, ν†_0 are the limiting measure vectors of the supervised plants G⋆, G† under D⋆, D† respectively.

The solution to the optimal supervision problem is obtained in [8] by designing an optimal policy using ν_θ with θ ∈ (0, 1). To ensure that the computed optimal policy coincides with the one for θ → 0+, the authors choose a small non-zero value for θ in each iteration step of the design algorithm. To address numerical issues, the algorithms reported in [8] compute how small a θ is actually sufficient to ensure that the optimal solution computed with this value of θ coincides with the optimal policies for any smaller value, i.e., they compute the critical lower bound θ⋆. (This is closely related to the notion of Blackwell optimality; see Section 2.1.) Moreover, the solution obtained is stationary, efficiently computable, and can be shown to be the unique maximally permissive policy among the ones with maximal performance. Language-measure-theoretic optimization is not a search (and has several key advantages over Dynamic Programming based approaches; see Section 2.1 for details); it is an iterative sequence of combinatorial manipulations that monotonically improves the measures, leading to element-wise maximization of ν_θ (see [8]). It is shown in [8] that lim_{θ→0+} θ [ I − (1 − θ)Π ]^{−1} χ = Pχ, where the i-th row of P (denoted ℘^i) is the stationary probability vector for the PFSA initialized at state q_i. In other words, P is the Cesaro limit of the stochastic matrix Π, satisfying P = lim_{k→∞} (1/k) ∑_{j=0}^{k−1} Π^j [42].

Proposition 1 (See [8]). Since the optimization maximizes the language measure element-wise for θ → 0+, it follows that for the optimally supervised plant, the standard inner product ⟨℘^i, χ⟩ is maximized, irrespective of the starting state q_i ∈ Q.

Notation 3.
The optimal θ -dependent measure for a PFSA is denoted as ν ? θ and the limiting measure as ν ? . Algorithm 1 : Computation of Optimal Supervisor input : P , χ , C output : Optimal set of disabled transitions D ? begin 1 Set D [0] = ∅ ; /* Initial disabling set */ 2 Set ̃ Π [0] = ̃ Π ; /* Initial event prob. matrix */ 3 Set θ [0] ? = 0 . 99, Set k = 1 , Set Terminate = false ; 4 while ( Terminate == false ) do 5 Compute θ [ k ] ? ; /* Algorithm 2 */ 6 Set ̃ Π [ k ] = 1 − θ [ k ] ? 1 − θ [ k − 1] ? ̃ Π [ k − 1] ; 7 Compute ν [ k ] ; 8 for j = 1 to n do 9 for i = 1 to n do 10 Disable all controllable q i σ − → q j s.t. ν [ k ] j < ν [ k ] i ; 11 Enable all controllable q i σ − → q j s.t. ν [ k ] j = ν [ k ] i ; 12 Collect all disabled transitions in D [ k ] ; 13 if D [ k ] == D [ k − 1] then 14 Terminate = true ; 15 else 16 k = k + 1 ; 17 D ? = D [ k ] ; /* Optimal disabling set */ 18 end 19 For completeness, the key algorithms are included as Algorithms 1 and 2. 6 I. Chattopadhyay Algorithm 2 : Computation of the Critical Lower Bound θ ? input : P , χ output : θ ? begin 1 Set θ ? = 1, Set θ curr = 0; 2 Compute P , M 0 , M 1 , M 2 ; 3 for j = 1 to n do 4 for i = 1 to n do 5 if ( P χ ) i − ( P χ ) j 6 = 0 then 6 θ curr = 1 8 M 2 ∣ ∣ ( P χ ) i − ( P χ ) j ∣ ∣ 7 else 8 for r = 0 to n do 9 if ( M 0 χ ) i 6 = ( M 0 χ ) j then 10 Break ; 11 else 12 if ( M 0 M r 1 χ ) i 6 = ( M 0 M r 1 χ ) j then 13 Break ; 14 if r == 0 then 15 θ curr = |{ ( M 0 − P ) χ } i −{ ( M 0 − P ) χ } j | 8 M 2 ; 16 else 17 if r > 0 AND r ≤ n then 18 θ curr = | ( M 0 M 1 χ ) i − ( M 0 M 1 χ ) j | 2 r +3 M 2 ; 19 else 20 θ curr = 1 ; 21 θ ? = min( θ ? , θ curr ) ; 22 end 23 2.1. Relation Of The Centralized Approach To Dynamic Program- ming. In spite of underlying Markovian assumptions, the PFSA model is distinct from the standard formalism of (finite state) Controlled Markov Decision Processes (CMDP) [44, 45, 46]. In the latter, control actions are not probabilistic; the associ- ated control function maps states to unique actions in a deterministic manner, and the control problem is to decide which of the available control actions should be exe- cuted in each state. On the other hand, in the PFSA formalism, control is exerted by selectively disabling controllable probabilistic state transitions, and is thus a proba- bilistic generalization of supervisory control theory [47]. Note that while in the MDP framework, one specifies which control action to take at a given state, in the PFSA formalism one specifies which of the available control actions are not allowed at the current state, and that any of the remaining can be executed in accordance to their generation probabilities. Denoting the set of controllable transitions at state q k as Σ C , and φ : Q → Σ C as the control policy mapping the current state q k ∈ Q to the controllable move φ ( q k ) ∈ Σ C (and assuming that the control action is to dictate the agent to execute a specific controllable move and is not supervisory in nature), one can formulate an analogous optimization problem that admits solution within the DP Distributed Self-Organization Of Swarms To Find  -Optimal Routes 7 framework. The transition probabilities for a stationary policy φ is given by: ∀ q i , q j ∈ Q, σ r ∈ Σ C , P rob ( q j ∣ ∣ q i , φ ( q i ) = σ r ) = ∑ σ s.t. 
(σ ∈ (Σ \ Σ_C) ∪ {σ_r}) ∧ (δ(q_i, σ) = q_j)  Π̃(q_i, σ)    (2.2)

Immediate rewards, in DP terminology, can be related to the state characteristic χ:

g(q_i, φ(q_i)) = χ(q_i)    (2.3)

We note that the problem at hand must be solved over an infinite horizon, since the total number of transitions (i.e. the path length) is not bounded. Identifying (1 − θ) as the discount factor, the cost-to-go (to be maximized) for the infinite horizon Discounted Cost (DC) problem is given by:

J(q_0, φ) = lim_{N→∞} E( ∑_{i=0}^{N} (1 − θ)^i g(q_i, φ(q_i)) )    (2.4)

and for the infinite horizon Average Cost per stage (AC) problem:

J(q_0) = lim_{N→∞} (1/N) E( ∑_{i=0}^{N} g(q_i, φ(q_i)) )    (2.5)

AC is more appropriate, since there is no reason to "discount" events in the future. In the PFSA formalism, we solve the analogous discounted problem at sufficiently small θ, and guarantee that the solution is simultaneously average-cost optimal, primarily due to the following identity [8]:

lim_{θ→0+} θ ∑_{k=0}^{∞} (1 − θ)^k Π^k χ = lim_{N→∞} (1/N) ∑_{j=0}^{N−1} Π^j χ = Pχ    (2.6)

where P is the Cesaro limit of the stochastic matrix Π. Thus, the proposed technique can solve the problem by maximizing the language measure

lim_{θ→0+} ν_θ = lim_{θ→0+} θ ∑_{k=0}^{∞} (1 − θ)^k Π^k χ = lim_{θ→0+} θ [ I − (1 − θ)Π ]^{−1} χ    (2.7)

and thereby achieve maximization of Pχ (guaranteeing that the probability of reaching the goal is maximized, while simultaneously minimizing collision probability).

In any case, the formulated DP problem can be solved using standard solution methodologies such as Value Iteration (VI) or Policy Iteration (PI) [48, 49]. We note that for VI, we need to search for the control action that maximizes the value update over all possible control actions in each iteration. On the other hand, PI involves two steps in each iteration: (1) policy evaluation, which is very similar to the measure computation step in each iteration of the language-measure-theoretic technique, and (2) policy improvement, which involves searching for an improved action over the possible control actions for each state (and requires at least one product between a Card(Q) × Card(Q) matrix of transition probabilities and the current cost vector of length Card(Q) per state). The disabling/enabling of controllable transitions is significantly simpler compared to the search steps that both VI and PI require, and the improvement in complexity (for PI, which is closer to the proposed algorithm in the centralized case, since the latter proceeds via computing a sequence of monotonically improving policies) is by at least an asymptotic factor of O(Card(Q)^2) per iteration (for dense matrices, and O(Card(Q)) per iteration in the sparse case), which is significant for large problems.

Fig. 3.1: Agent-centric local decision-making with non-zero failure probability: (a) agent movement from q_0 towards downstream neighbors q_1, q_2 with agent-loss probabilities λ_01, λ_02; (b) local network model with virtual states q^v_01, q^v_02 and dump state D; (c) controlled local model with a disabled transition.

Remark 1.
The number of iterations is not expected to be comparable for the PFSA framework, VI and PI techniques; simulations indicate the measure-theoretic approach converges faster, and detailed investigations in this direction is a topic of future work. Another key advantage of the PFSA-based solution methodology is guaranteed Blackwell optimality [50, 44]. It is well recognized that the average cost criterion is underselective, namely the finite time behavior is completely ignored. The condition of Blackwell optimality attempts to correct this by demanding the computed controller be optimal for a continuous range of discount factors in the interval ( d 0 , 1), i.e. for θ ∈ (0 , 1 − d 0 ), where 0 < d 0 < 1. Since the PFSA-based approach maximizes the language measure for some θ = θ min , such that the optimal policy is guaranteed to be identical for all values of θ in the range [ θ min , 0], the solution satisfies the Blackwell condition. It is possible to obtain such Blackwell optimal policies within the DP framework as well, but the approach(es) are significantly more involved (See [44], Chapter 8). The ability to adapt θ at each iteration (See Algorithm 2) leads to a novel adaptive discounting scheme in the technique proposed, which solves the infinite horizon problem efficiently while using a non-zero θ = θ min at all iteration steps. 3. The Swarm Model. We consider ad-hoc mobile network of communicating agents endowed with limited computational resources. For simplicity of exposition, we develop the theoretical results under the assumption of a single target, or goal. This is not a serious restriction and can be easily relaxed. The location and identity of the target is not known a priori to the individual agents, only ones which are within the communication radius of the target can sense its presence. The communication Distributed Self-Organization Of Swarms To Find  -Optimal Routes 9 radii are assumed to be constant throughout. Inter-agent communication links are assumed to be perfect, which again can be generalized easily, within our framework. We assume agents can efficiently gather the following information: 1. (Set of Neighboring agents:) Number and unique id. of agents to which it can successfully send data via a 1-hop direct line-of-sight link. The communica- tion radius R c is assumed to be identical for each agent. The set of neighbors for each agent varies with time as the swarm evolves. 2. (Local Navigation Properties:) Navigation is assumed to occur by moving towards a chosen neighbor with constant velocity, the magnitude of which is assumed to be identical for each agent. In general, there a non-zero probabil- ity of agent failure in the course of execution this maneuver, which is assumed to be either known or learnable by the agents. However the explicit learning of these local costs is not addressed in this paper. The local network model, along with the decision-making philosophy, is illustrated in Figure 3.1. We will talk about a frozen swarm, which denotes a particular spatial configuration of the agents assumed to be fixed in time. Unless explicitly mentioned, the agents are assumed to be updating their positions in continuous time (moving with constant velocity), while changing their headings in discrete time as dictated via on-board decision-making based on locally available information, with the objective of reaching the target with minimum end-to-end probability of agent failure. 
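As a simple illustration of item 1 above, the sketch below computes each agent's neighbor set from planar positions and a common communication radius; the coordinates, the radius R_c, and the placeholder per-link failure probabilities are assumptions introduced only for illustration (the paper assumes these probabilities are known or learnable, not that they take this form).

```python
import numpy as np

R_C = 1.5                                              # assumed common communication radius
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 5.0, size=(20, 2))        # 20 agents scattered in a 5 x 5 region

def neighbor_map(pos, r_c):
    """N(q_i): agents reachable from agent i via a one-hop direct link."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    nbrs = {i: [j for j in range(len(pos)) if j != i and dist[i, j] <= r_c]
            for i in range(len(pos))}
    return nbrs, dist

nbrs, dist = neighbor_map(positions, R_C)
# Placeholder per-link failure probabilities for the links found above.
lam = {(i, j): 0.05 for i in nbrs for j in nbrs[i]}
print(nbrs[0], len(lam))
```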
We further assume that the failure probabilities mentioned above are functions of the agent locations, and possibly vary in a smooth non-increasing manner with increasing inter-agent distances. However, no time-dependence is assumed, i.e. the failure probabilities remain constant for a frozen swarm, and change (due to the positional updates) for a mobile one. The agent motion is assumed to be slow compared to the time required for the convergence of the optimization algorithm for each frozen configuration. The implications of the last assumption will be discussed in the sequel.

First we formalize a failure-prone ad-hoc network of frozen communicating agents as a probabilistic finite state automaton.

Definition 8 (Neighbor Map For A Frozen Swarm). If Q is the set of all agents in the network, then the neighbor map N : Q → 2^Q specifies, for each agent q_i ∈ Q, the set of agents N(q_i) ⊂ Q (excluding q_i) with which q_i can communicate via a single-hop direct link.

Definition 9 (Failure Probability). The failure probability λ_ij ∈ [0, 1] is defined to be the probability of unrecoverable loss of agent q_i in the course of moving towards agent q_j.

Thus, λ_ij reflects local or immediate navigation costs and estimated risks, and therefore varies with the positional coordinates of the agents q_i and q_j. These quantities are not constrained to be symmetric in general, i.e., λ_ij ≠ λ_ji. We assume the agent-based estimation of these quantities to converge fast enough in the scenario where such parameters are learned on-line. Since we are more concerned with decision optimization in this paper, we ignore the parameter estimation problem of learning the failure probabilities, which is at least intuitively justified by the existence of separated policies in large classes of similar problems.

We visualize the local network around an agent q_0 in the manner illustrated in Figure 3.1(a) (shown for two neighbors q_1 and q_2). In particular, agent q_0 attempting to move towards the current position of q_1 experiences a failure probability λ_01, while moving towards q_2 incurs a failure probability λ_02. To correctly represent this information, we require the notion of virtual states (q^v_01, q^v_02 in Figure 3.1(b)).

Fig. 3.2: A 6-agent frozen network and the corresponding 23-state PFSA model with virtual states (16 virtual states, 6 agents, 1 dump state).

Remark 2 (Necessity Of Virtual States). The virtual states are required to model the physical situation within the PFSA framework, in which transitions do not emerge from other transitions. As illustrated in Figure 3.1(a), the failure events actually occur in the course of the attempted maneuver, hence necessitating the notion of virtual states.

Definition 10 (Virtual State). Given an agent q_i and a neighbor q_j ∈ N(q_i) with a specified failure probability λ_ij, any attempted move towards q_j is assumed to be first routed to a virtual state q^v_ij, upon which there is either an automatic (i.e. uncontrollable) forwarding to q_j with probability 1 − λ_ij, or a failure with probability λ_ij. The set of all virtual states in a network with agent set Q is denoted by Q_v in the sequel.
Hence, the total number of virtual states is given by:

Card(Q_v) = ∑_{i : q_i ∈ Q} Card(N(q_i))    (3.1)

and the cardinality of the set of virtual states satisfies:

0 ≤ Card(Q_v) ≤ Card(Q)^2 − Card(Q)    (3.2)

We assume that there is a static agent at the target or goal, which we denote as q_Tgt. Local communication with this agent-at-target can be visualized as the process of sensing the target by the mobile agents. We are now ready to model an ad-hoc communicating network of frozen agents as a PFSA, whose states correspond to the agents, the virtual states, or the state reflecting agent failure.

Definition 11 (PFSA Model of Frozen Network). For a given set of agents Q, the function N : Q → 2^Q, the link-specific failure probabilities λ_ij for any agent q_i and a neighbor q_j ∈ N(q_i), and a specified target q_Tgt ∈ Q, the PFSA G_N = (Q_N, Σ, δ, Π̃, χ, C) is defined to be a model of the network, where (denoting Card(N(q_i)) = m):

◦ States:
Q_N = Q ∪ Q_v ∪ {q_D}    (3.3a)
where Q_v is the set of virtual states, and q_D is a dump state which models the loss of an agent due to failure.

◦ Alphabet:
Σ = ⋃_{i : q_i ∈ Q} ( ⋃_{j : q_j ∈ N(q_i)} {σ_ij} ) ∪ {σ_D}    (3.3b)
where σ_ij denotes navigation (attempted or actual) and σ_D denotes agent failure.

◦ Transition Map:
δ(q, σ) = q^v_ij if q = q_i, σ = σ_ij;  q_j if q = q^v_ij, σ = σ_ij;  q_D if q = q^v_ij, σ = σ_D;  q_D if q = q_D, σ = σ_D;  undefined otherwise.    (3.3c)

◦ Probability Morph Matrix:
Π̃(q, σ) = 1/m if q = q_i, σ = σ_ij;  1 − λ_ij if q = q^v_ij, σ = σ_ij;  λ_ij if q = q^v_ij, σ = σ_D;  1 if q = q_D, σ = σ_D;  0 otherwise.    (3.3d)

◦ Characteristic Weights:
χ_i = 1 if q_i = q_Tgt;  −1 if q_i = q_D;  0 otherwise.    (3.3e)

◦ Controllable Transitions:
∀ q_i ∈ Q, q_j ∈ N(q_i),  (q_i →^{σ_ij} q^v_ij) ∈ C    (3.3f)

We note that for a network with agent set Q, the PFSA model may have (and almost always has; see Figure 3.2) a significantly larger number of states. Using Eq. (3.2):

Card(Q) + 1 ≤ Card(Q_N) ≤ Card(Q)^2 + 1    (3.4)

This state explosion is not a problem for the distributed approach developed in the sequel, since we use the complete model G_N only for the purpose of deriving theoretical guarantees. Note that Definition 11 generates a PFSA model which can be optimized in a straightforward manner using the language-measure-theoretic technique described in Section 2 (see [8] for details). This would yield the optimal routing policy in terms of the disabling decisions at each agent that minimize source-to-target failure probabilities (from every agent in the network). To see this explicitly, note that the measure-theoretic approach elementwise maximizes lim_{θ→0+} θ [ I − (1 − θ)Π ]^{−1} χ = Pχ, where the i-th row of P (denoted ℘^i) is the stationary probability vector for the PFSA initialized at state q_i (see Proposition 1). Since the dump state has characteristic −1, the target has characteristic 1, and all other agents have characteristic 0, it follows that this optimization maximizes the quantity ℘^i_Tgt − ℘^i_Dump for every source state or agent q_i in the network. Note that ℘^i_Tgt, ℘^i_Dump are the stationary probabilities of reaching the target and of incurring an agent loss to the dump, respectively, from a given source q_i. Thus, maximizing ℘^i_Tgt − ℘^i_Dump for every q_i ∈ Q guarantees that the computed routing policy is indeed optimal in the stated sense.
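The construction of Definition 11 and the centralized measure of Eq. (2.1) can be illustrated on a small frozen swarm. In the Python sketch below, the four-agent neighbor map, the uniform failure probabilities, and the value of θ are assumptions chosen only for illustration, and no controllable transition is disabled (i.e. this is the unsupervised plant, not the optimized one).

```python
import numpy as np

# Toy frozen swarm: agents 0..3 with agent 3 at the target; neighbor map and
# uniform failure probabilities are illustrative assumptions.
nbrs = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
lam = {(i, j): 0.1 for i in nbrs for j in nbrs[i]}
agents, target = sorted(nbrs), 3

# State ordering of Q_N: physical agents, then one virtual state per link, then the dump state.
virt = [(i, j) for i in agents for j in nbrs[i]]
idx = {q: k for k, q in enumerate(agents)}
idx.update({('v', i, j): len(agents) + k for k, (i, j) in enumerate(virt)})
idx['D'] = len(agents) + len(virt)
n = len(idx)

Pi, chi = np.zeros((n, n)), np.zeros(n)
chi[idx[target]], chi[idx['D']] = 1.0, -1.0             # Eq. (3.3e)
for i in agents:
    m = len(nbrs[i])
    for j in nbrs[i]:
        Pi[idx[i], idx[('v', i, j)]] = 1.0 / m          # Eq. (3.3d): attempt each neighbor
        Pi[idx[('v', i, j)], idx[j]] = 1.0 - lam[i, j]  # uncontrollable forwarding
        Pi[idx[('v', i, j)], idx['D']] = lam[i, j]      # agent loss
Pi[idx['D'], idx['D']] = 1.0                            # dump state is absorbing

theta = 1e-3                                            # a small theta approximates theta -> 0+
nu = theta * np.linalg.solve(np.eye(n) - (1.0 - theta) * Pi, chi)   # Eq. (2.1)
print({q: round(float(nu[idx[q]]), 4) for q in agents})
```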
However, the procedure in [8] requires centralized computations, which is precisely what we wish to avoid. The key technical contribution in this paper is to develop a distributed approach to language-measure-theoretic PFSA optimization. In effect, the theoret- ical development in the next section allows us to carry out the language-measure- theoretic optimization of a given PFSA, in situations where we do not have access to the complete Π matrix, or the χ vector at any particular agent ( i.e. each agent has a limited local view of the network), and are restricted to communicate only with immediate neighbors. We are interested in not just computing the measure vector in a distributed manner, but optimizing the PFSA via selected disabling of controllable transitions (See Section 2). This is accomplished by Algorithm 3. 3.1. Control Approach For Mobile Agents. For the mobile network, G N ( t ) varies as a function of operation time t . For a particular instant t = t 0 , the globally 12 I. Chattopadhyay optimized model G ? N ( t 0 ) yields the local decisions for the agent maneuvers. As stated before, this global optimization can be carried out in a distributed manner, and the agents update their headings towards the neighbor which has the highest measure among all neighbors, provided it is in fact higher than the self-measure. Transition towards any neighbor with a better or equal measure compared to self via randomized choice is also acceptable, but we use the former approach for the theoretical develop- ment in the sequel. The movement however modifies the PFSA model to G N ( t + ∆ t ), and a re-optimization is required. We assume, as stated before, that the agent veloc- ities are slow enough so that they do not interfere with this computation. A crucial point is the time complexity of convergence of the distributed algorithm, which in our case, is small enough to allow this procedure to be carried out efficiently. Also, note that since the complete model is never assembled, the modifications to G N is also a local affair, e.g. updating the set of neighbors or the failure probabilities, and such local effects are felt by the remote agent via percolated information involving an unavoidable delay, which goes to ensure that the effect of all local changes are not felt simultaneously across the network. 3.2. Possibility Of Different Local Models. The local model, as described above and illustrated in Figure 3.1, assumes that errors are non-recoverable; hence the possibility of transitioning to the dump state, from which no outward transition is defined. Alternatively, we could eliminate the dump state, and simply add the transition as a self-loop, or even redistribute the probability among the remaining transitions defined at a state. It is intuitively clear that the adopted model avoids errors most aggressively. 4. Decentralized PFSA Optimization For The Frozen Swarm. In the sequel, the current measure value, for a given θ , at agent q i ∈ Q is denoted as ̂ ν θ | i , and the measure of the virtual state q v ij ∈ Q N is denoted as ̂ ν θ | ( q V ij ) . The parenthesized entry ( q V ij ) denotes the index of the virtual state q v ij in the state set Q N . Similarly, the transition probability from q i to q v ij is denoted as Π i ( q V ij ) . The subscript entry i ( q V ij ) denotes the ik th element of Π, where k = ( q V ij ). Algorithm 3 establishes a distributed, asynchronous procedure achieving: ∀ q i ∈ Q, ̂ ν θ | i global − − − − − − − − → convergence ν ? θ | i (4.1) where ν ? 
θ|_i is the optimal measure for q_i ∈ Q that would be obtained by optimizing the PFSA G_N, for a given θ, in a centralized manner (see Section 2). The optimal routing policy can then be obtained by moving towards neighboring agents which have a better or equal current measure value. If more than one such neighbor is available, then one either chooses the local destination agent randomly, in an equiprobable manner, or, as we do in this paper, moves towards one chosen from the set of neighbors with maximal measure. In the sequel we show that this forwarding policy converges to the globally optimal routing policy, i.e., that, for a sufficiently small θ, it maximizes the probability of reaching the target while simultaneously minimizing the probability of end-to-end failure. Furthermore, choosing randomly between qualifying neighboring agents leads to significant congestion resilience. These issues are elaborated in the sequel (Proposition 7).

Algorithm 3: Distributed Update of Agent Measures In Frozen Swarm
  input: G_N = (Q_N, Σ, δ, Π̃, χ, C), θ
  begin
    Initialize ∀ q_i ∈ Q, ν̂_θ|_i = 0
    while true do   /* infinite asynchronous loop */
      for each agent q_i ∈ Q do
        if N(q_i) ≠ ∅ then
          m = Card(N(q_i))
          for each agent q_j ∈ N(q_i) do
            /* (a1) Inter-agent communication */
            Query ν̂_θ|_j and failure probability λ_ij
            /* (a2) Control adaptation */
            if ν̂_θ|_j < ν̂_θ|_i then
              Π_ii = Π_ii + Π_i(q^V_ij);  Π_i(q^V_ij) = 0   /* disable */
            else if Π_i(q^V_ij) == 0 then
              Π_i(q^V_ij) = 1/m;  Π_ii = Π_ii − 1/m   /* enable */
            endif
            /* (a3) Updating virtual states */
            ν̂_θ|_(q^V_ij) = (1 − θ)(1 − λ_ij) ν̂_θ|_j
          endfor
        endif
        /* (a4) Updating agent */
        ν̂_θ|_i = ∑_{j : q_j ∈ N(q_i)} (1 − θ) Π_i(q^V_ij) ν̂_θ|_(q^V_ij) + (1 − θ) Π_ii ν̂_θ|_i + θ χ|_i
      endfor
    endw
  end

Algorithm 3 has four distinct parts, marked as (a1), (a2), (a3) and (a4). Part (a1) involves inter-agent communication, enabling a particular agent q_i ∈ Q to ascertain the current measure values of neighboring agents and the failure probabilities λ_ij on the respective links. Recall that we assume the probabilities λ_ij to be more or less constant for the frozen swarm; however, agents need to estimate these values for the generalization to the mobile case. Part (a2) is the control adaptation, in which the agents decide, based on local information, the set of allowable destination agents. Part (a3) is the computation of the updated measure values for the virtual states q^v_ij with j : q_j ∈ N(q_i). Finally, part (a4) updates the measure of the agent q_i based on the computed current measures of the virtual states. We note that Algorithm 3 only uses information that is either available locally, or that can be queried from neighboring agents.
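Before turning to the convergence properties, a minimal synchronous Python sketch of the measure and control updates of Algorithm 3 is given below; the line-graph swarm, the uniform failure probabilities, and the fixed sweep count stand in for the infinite asynchronous loop and are assumptions introduced only for illustration.

```python
# Synchronous sketch of Algorithm 3 on an assumed 5-agent line graph; agent 4 is the target.
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
lam = {(i, j): 0.1 for i in nbrs for j in nbrs[i]}      # assumed uniform failure probabilities
chi = {i: (1.0 if i == 4 else 0.0) for i in nbrs}
theta = 1e-3

nu = {i: 0.0 for i in nbrs}                             # agent measures, initialized to zero
nu_v = {(i, j): 0.0 for (i, j) in lam}                  # virtual-state measures
enabled = {(i, j): True for (i, j) in lam}              # controllable transitions start enabled

for _ in range(5000):                                   # stand-in for the infinite loop
    for i in nbrs:
        m = len(nbrs[i])
        for j in nbrs[i]:
            # (a1) query the neighbor's measure and the link failure probability
            # (a2) control adaptation: disable the link if the neighbor currently looks worse
            enabled[(i, j)] = not (nu[j] < nu[i])
            # (a3) update the virtual state of the link (i, j)
            nu_v[(i, j)] = (1.0 - theta) * (1.0 - lam[i, j]) * nu[j]
        # (a4) update the agent: enabled links carry probability 1/m each,
        #      disabled links fold into the self-loop probability Pi_ii
        p_self = sum(1.0 / m for j in nbrs[i] if not enabled[(i, j)])
        nu[i] = ((1.0 - theta) * sum(nu_v[(i, j)] / m for j in nbrs[i] if enabled[(i, j)])
                 + (1.0 - theta) * p_self * nu[i] + theta * chi[i])

print({i: round(v, 5) for i, v in nu.items()})          # each agent then moves towards the
                                                        # neighbor with the largest measure
```

In an actual deployment each agent would execute only its own (a1)-(a4) block, querying the neighbor measures over one-hop links; the synchronous sweep above merely makes the fixed point easy to inspect.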
Proposition 2 (Convergence). For a network Q modeled as G_N = (Q_N, Σ, δ, Π̃, χ, C), the distributed procedure in Algorithm 3 has the following properties:
1. The computed measure values for every agent q_i ∈ Q are non-negative and bounded above by 1, i.e.,
∀ q_i ∈ Q_N, ∀ t ∈ [0, ∞),  ν̂^t_θ|_i ∈ [0, 1]    (4.2)
2. For constant failure probabilities and a constant neighbor map N : Q → 2^Q, Algorithm 3 converges in the sense:
∀ q_i ∈ Q_N,  lim_{t→∞} ν̂^t_θ|_i = ν^∞_θ|_i ∈ [0, 1]    (4.3)
3. The convergent measure values coincide with the optimal values computed by the centralized approach:
∀ q_i ∈ Q_N,  ν^∞_θ|_i = ν⋆_θ|_i    (4.4)

Proof. (Statement 1:) Non-negativity of the measure values is obvious. For establishing the upper bound, we use induction on the computation time t. We note that all the measure values ν̂^t_θ|_i are initialized to 0 at time t = 0. The first agent to change its measure will be the target, which is updated at some time t = t_0:

ν̂^{t_0}_θ|_(q_Tgt) = 0 + θ χ(q_Tgt) = θ    (4.5)

where the first term is zero since all agents still have measure zero, and the target characteristic is χ(q_Tgt) = 1. Thus, there exists a non-trivial time instant t_0 at which:

(Induction Basis)  ∀ q_i ∈ Q_N,  ν̂^t_θ|_i ≤ 1    (4.6)

Next we assume that for time t = t′ we have:

(Induction Hypothesis)  ∀ q_i ∈ Q_N, ∀ τ ≤ t′,  ν̂^τ_θ|_i ≤ 1

We consider the next updates for physical agents and virtual states separately, and denote the time instant of the next update as t′_+. Note that t′_+ may actually be different for different agents (asynchronous operation).

(Virtual States) For any virtual state q_i = q^v_kj ∈ Q_N, where q_k, q_j ∈ Q, we have:

ν̂^{t′_+}_θ|_i = (1 − λ_kj)(1 − θ) ν̂^{t′}_θ|_j ≤ 1    (4.7)

(Physical Agents) For any q_i ∈ Q, with the set of enabled neighbors E_n = { q_j ∈ N(q_i) s.t. ν̂^{t′_+}_θ|_(q^V_ij) ≥ ν̂^{t′}_θ|_i }:

ν̂^{t′_+}_θ|_i = (1/Card(N(q_i))) ( ∑_{j : q_j ∈ E_n} (1 − θ)^2 (1 − λ_ij) ν̂^{t′}_θ|_j + ∑_{j : q_j ∈ N(q_i) \ E_n} (1 − θ) ν̂^{t′}_θ|_i ) + θ χ|_i
≤ (1/Card(N(q_i))) ( ∑_{j : q_j ∈ E_n} (1 − θ) + ∑_{j : q_j ∈ N(q_i) \ E_n} (1 − θ) ) + θ = 1

which establishes Statement 1.

(Statement 2:) We claim that for each agent q_i ∈ Q_N, the sequence of measures ν̂^t_θ|_i forms a monotonically non-decreasing sequence as a function of the computation time t. Again, we use induction on computation time. Considering the time instant t_0 (see Eqn. (4.5)), we note that we have an instant up to which all measure values have indeed changed in a non-decreasing fashion, since the measure of q_Tgt increased to θ while the other agents are still at 0; this establishes the basis. For our hypothesis, we assume that there exists some time instant t′ > t_0 such that all measure values have undergone non-decreasing updates up to t′. We consider the physical agent q_i ∈ Q which is the first one to update next, say at the instant t′_+ > t′. Referring to Algorithm 3, this update occurs by first updating the set of virtual states { q^v_ij : q_j ∈ N(q_i) }. Since the virtual states update as:

ν̂^{t′_+}_θ|_(q^v_ij) = (1 − θ)(1 − λ_ij) ν̂^{t′}_θ|_j    (4.8)

it follows from the induction hypothesis that

ν̂^{t′_+}_θ|_(q^v_ij) ≥ ν̂^{t′}_θ|_(q^v_ij)    (4.9)

If the connectivity (i.e. the forwarding decisions) for the physical agent q_i remains unchanged between the instants t′ and t′_+, then, since the measure of no neighboring agent has decreased (by the induction hypothesis):

ν̂^{t′_+}_θ|_i ≥ ν̂^{t′}_θ|_i    (4.10)

If, on the other hand, the set of disabled transitions for q_i changes (e.g. for some q_j ∈ N(q_i), the transition q_i →^{σ_ij} q^v_ij was disabled at t′ and is enabled at t′_+, or vice versa), the measure of agent q_i is increased by the additive factor ((1 − θ)/Card(N(q_i))) | ν̂^{t′}_θ|_i − ν̂^{t′}_θ|_(q^v_ij) |, which completes the inductive process and establishes our claim that the measure values form a non-decreasing sequence for each agent as a function of the computation time.
Since, a non-decreasing bounded sequence in a complete space must converge to a unique limit [43], the convergence: ∀ q i ∈ Q N , lim t →∞ ̂ ν t θ | i = ν ∞ θ | i ∈ [0 , 1] (4.11) follows from the existence of the upper bound established in Statement 1. This establishes Statement 2. (Statement 3:) From the update equations in Algorithm 3, we note that the limiting measure values satisfy: ̂ ν ∞ θ ∣ ∣ i = (1 − θ ) ∑ j ∈N ( i ) Π ij ̂ ν ∞ θ ∣ ∣ j + θχ | i ⇒ ̂ ν ∞ θ = θ [ I − (1 − θ )Π ] − 1 χ (4.12) which implies that measure values does indeed converge to the measure vector com- puted in a centralized fashion (See Eq. (2.1)). Noting that any further disabling (or re-enabling) would not increase the measure values computed by Algorithm 3, we conclude that this must be the optimal disabling set that would be obtained by the centralized language-measure theoretic optimization of PFSA G N (Section 2). This completes the proof. Proposition 3 ( Initialization Independence ). For a network Q modeled as a PFSA G N = ( Q N , Σ , δ, ̃ Π , χ, C ) , convergence of Algorithm 3 is independent of the initialization of the measure values, i.e. , if ̂ ν t θ,α denotes the measure vector at time t with arbitrary initialization α ∈ [0 , 1] Card ( Q N ) , then: lim t →∞ ̂ ν t θ,α = lim t →∞ ̂ ν t θ (4.13) where ̂ ν 0 θ,α = α and ̂ ν 0 θ = [0 · · · 0] T . 16 I. Chattopadhyay Proof . The measure update equations in Algorithm 3 dictate that the measure values will have a positive contribution from α . Denoting the contribution of α to the measure of agent q i ∈ Q at time t as C t α ( q i ), we note that the measure can be written as ̂ ν t θ,α = C t α ( q i ) + f t i , where f t i is independent of α . Furthermore, the linearity of the updates imply that C t α ( q i ) can be used to formulate an inductive argument as follows. We use k t ? ∈ N ∪ { 0 } to denote the minimum number of updates that every agent in the network has encountered up to time instant t ∈ [0 , ∞ ). We claim that: ∀ q i ∈ Q, ∀ t ∈ [0 , ∞ ) , C t α ( q i ) 5 (1 − θ ) k t ? || α || 1 (4.14) To establish this claim, we use induction on k t ? . For the basis, we note that there exists a time instant t 0 , such that ∀ τ 5 t 0 , k τ ? = 0, implying that ∀ τ 5 t 0 , C τ α ( q i ) = α i 5 (1 − θ ) 0 ∑ q j ∈ Q α j = (1 − θ ) k τ ? || α || 1 We assume that if at some t k , k t k ? = k ∈ N , then: (Induction Hypothesis) ∀ q i ∈ Q, C t k α ( q i ) 5 (1 − θ ) k || α || 1 Next let q i be an arbitrary physical agent, and consider the first update of q i at t + k > t k : ̂ ν t + k θ | i = ∑ j : q j ∈N ( q i ) (1 − θ )Π i ( q v ij ) ̂ ν t k θ | ( q v ij ) + (1 − θ )Π ii ̂ ν t k θ | i + θχ i ⇒C t + k α ( q i ) 5 ∑ j : q j ∈N ( q i ) (1 − θ )Π i ( q v ij ) (1 − λ ij )(1 − θ )(1 − θ ) k || α || 1 + (1 − θ )Π ii (1 − θ ) k || α || 1 + θχ i ⇒C t + k α ( q i ) 5 (1 − θ ) k +1 || α || 1 We note that if k t k +1 ? = k + 1, then every agent q i ∈ Q must have undergone one more update since t k implying: ∀ q i ∈ Q, C t k +1 α ( q i ) 5 (1 − θ ) k +1 || α || 1 (4.15) which completes the induction proving Eq. (4.14). Observing that lim t →∞ k t ? = ∞ , and || α || 1 < ∞ , we conclude: ∀ q i ∈ Q, lim t →∞ C t α ( q i ) = 0 (4.16) which immediately implies Eq. (4.13). Next we investigate the performance of the proposed approach, and establish guarantees on global performance achieved via local decisions dictated by Algorithm 3. We need some technical lemmas, and the notion of strongly absorbing graphs, and graph powers. Definition 12 ( Exact Power of Graph ). 
For a given graph G = ( V, E ) , the exact power G d , for d ∈ N , is a graph ( V, E ′ ) , such that ( q i , q j ) is an edge in G d , only if there exists a sequence of edges of length exactly d from agent q i to agent q j in G . Definition 13 ( Strongly Absorbing Graph ). A finite directed graph G = ( V, E ) ( V is the set of agents and E ⊆ V × V the set of edges) is defined to be strongly absorbing (SA), if: Distributed Self-Organization Of Swarms To Find  -Optimal Routes 17 1. There are one or more absorbing agents, i.e. , ∃ A $ V , s.t. every agent in A (non-empty) is absorbing. 2. There exists at least one sequence of edges from any agent to one of the absorbing agents in A . 3. If E d denotes the set of edges for the d th exact power of G , then, for distinct agents q i , q j ∈ V , ( q i , q j ) ∈ E ⇒ ∀ d ∈ N , ( q j , q i ) / ∈ E d (4.17) Lemma 1 ( Properties of SA Graphs ). Given a SA graph G = ( V, E ) , with A $ V the absorbing set: 1. The power graph G d is SA for every d ∈ N . 2. q / ∈ A ⇒ ∃ q ′ ∈ V \ { q } s.t. ( q ′ , q ) / ∈ E 3. ∃ d ∈ N ( ∀ q ∈ V \ A ( ∃ q ′ ∈ A ( ( q, q ′ ) ∈ E d ))) Proof . Statement 1 is immediate from Definition 13. Statement 2 follows imme- diately from noting: q / ∈ A ⇒ ∃ q ′ ∈ V \ { q } s.t. ( q, q ′ ) ∈ E ⇒ ( q ′ , q ) / ∈ E Statement 3 follows, since from each agent there is a path (length bounded by Card ( V )) to a absorbing state. The performance of such control policies, and particularly the convergence time- complexity is closely related to the spectral gap of the induced Markov Chains. Hence we need to compute lower bounds on the spectral gap of the chains arising in the context of the proposed optimization, which (as we shall see later) have the strongly absorbing property. The following result computes such a bound as a simple function of the non-unity diagonal entries of Π. Proposition 4 ( Spectral Bound ). Given a n-state PFSA G = ( Q, Σ , δ, ̃ Π) with a strongly absorbing graph, the magnitude of non-unity eigenvalues of the transition matrix Π is bounded above by the maximum non-unity diagonal entry of Π . Proof . Without loss of generality, we assume that G has a single absorbing state (distinct absorbing states can be merged without affecting non-unity eigenvalues). Now, μ is an eigenvalue of Π iff μ d is an eigenvalue of Π d , d ∈ N . From Lemma 1: C1 ∃ ` ∈ N s.t. Π ` has no zero entry in column corresponding to the absorbing state. Let d ? be the smallest such integer. C2 Every non-absorbing state has at least one zero element in the corresponding column of Π d ? . C3 Statements C1,C2 are true for any integer d = d ? . We denote the column of ones as e , i.e. , e = [1 · · · 1] T Since Π d is (row) stochastic, we have Π d e = e . Hence, if v is a left eigenvector for Π d with eigenvalue μ d , then: v Π d e = v e = μ d v e ⇒ (1 − μ d ) v e = 0 (4.18) implying that if μ d 6 = 1, then v e = 0. Now we construct C = [ C 1 · · · C n ], where C j = min j Π d ij (minimum column element). Considering M = Π d − e C , we note: ( v Π d = μ d v ) ∧ ( μ d 6 = 1) ⇒ vM = μ d v (4.19) Recalling that stationary probability vectors (Perron vectors) of stochastic matrices add up to unity, we have: ( v Π d = v ) ⇒ vM = v − v e C = v − C (4.20) 18 I. Chattopadhyay which, along with the fact that since C is not a column of all zeros, implies that an upper bound on the magnitudes of the eigenvalues of M provides an upper bound on the magnitude of non-unity eigenvalues for Π d . 
Now, invoking the Gerschgorin Circle Theorem [51, 52], we get: | μ d | 5 1 − ∑ j C j = 1 − C a ⇒ | μ | 5 (1 − C a ) 1 d (4.21) where C a is the minimum column element corresponding to the absorbing state. 1 − C a is the maximum probability of not reaching the absorbing state after d steps from any state, which is bounded above by ( a ) d 1 ( b ) d − d 1 where a is the maximum non-diagonal entry in Π not going to the absorbing state, b is the maximum of the non-unity diagonal entries in Π, and d 1 is a bounded integer. Since any sequence of non-selfloops is absorbed in a finite number of steps (strongly absorbing property), we have a finite bound for d 1 . Hence we have: | μ | 5 lim d →∞ a d 1 d b 1 − d 1 d = b = max q i :Π ii < 1 Π ii (4.22) This completes the proof. Next, we make rigorous our notion of policy performance, and near-global or  -optimality. Definition 14 ( Policy Performance &  -Optimality ). The performance vector ρ S of a given routing policy S is the vector of agent-specific probabilities of a packet eventually reaching the target. A policy U has Utopian performance if its performance vector (denoted as ρ U ) element-wise dominates the one for any arbitrary policy S , i.e. ∀ q i ∈ Q N , ρ U i = ρ S i . A policy P has  -optimal performance, if for  > 0 , we have: || ρ P − ρ U || ∞ 5  (4.23) For a chosen θ , the limiting policy P θ computed by Algorithm 3 results in element- wise maximization of the measure vector over all possible supervision policies (where supervision is to be understood in the sense of the defined control philosophy). ̂ ν ∞ θ is related to the policy performance vector ρ P θ as follows. Selective disabling of the transitions dictated by the policy P θ induces a controlled PFSA, which represents the optimally supervised network, for a given θ . Let the optimized transition matrix be Π ? θ , and its Cesaro limit be P ? θ . (Note: Π ? θ , P ? θ are stochastic matrices.) Then: ∀ q i ∈ Q N , P ? θ χ ∣ ∣ i, ( q Tgt ) = ρ P θ i (4.24) We would need to distinguish between the optimal measure vector ̂ ν ∞ θ ′ (optimal for a given θ = θ ′ ) computed by Algorithm 3, and the one obtained by first computing ̂ ν ∞ θ ′ and then using the PFSA structure obtained in the process to compute the measure vector for some other value of θ = θ ′′ . These two vectors may not be identical. Notation 4. In the sequel, we denote the vector obtained in the latter case as ̂ ν ∞ ( θ ′ ,θ ′′ ) implying that we have ̂ ν ∞ ( θ,θ ) = ̂ ν ∞ θ . Lemma 2. We have the following equalities: lim θ → 0 + ̂ ν ∞ ( θ ′ ,θ ) = ρ P θ ′ (4.25a) lim θ → 0 + ̂ ν ∞ ( θ,θ ) = ρ U (4.25b) Distributed Self-Organization Of Swarms To Find  -Optimal Routes 19 Proof . Recalling Eq. (4.24), and noting that for any PFSA with transition matrix Π (with Cesaro limit P ), we have lim θ → 0 + ̂ ν θ = lim θ → 0 + θ [ I − (1 − θ )Π ] − 1 χ = P χ , we have Eq. (4.25a). In general, different choices of θ result in different disabling decisions, and hence different policies. However, since there is at most a finite number of distinct policies for a finite network, there must exist a θ ? such that for all choices 0 < θ 5 θ ? , the policy remains unaltered (although the measure values may differ). Since, executing the optimization with vanishingly small θ yields a performance vector identical (in the limit) with the optimal measure vector element-wise dominating the one for any arbitrary policy, the policy obtained for 0 < θ 5 θ ? has Utopian performance. Hence: lim θ → 0 + ̂ ν ∞ ( θ,θ ) = lim θ → 0 + ̂ ν ∞ ( θ ? 
,θ ) = ρ P θ? = ρ U (4.26) This completes the proof. Computation of the critical θ ? is non-trivial from a distributed perspective, al- though centralized approaches have been reported [8]. Thus it is hard to guarantee Utopian performance in Algorithm 3. Also, θ ? may be too small resulting in an unac- ceptably poor convergence rate. Nevertheless, we will show that, given any  > 0, one can choose θ to guarantee  -optimal performance of the limiting policy in the sense of Definition 14. We would need the following result. Lemma 3. Given any PFSA, with transition matrix Π and corresponding Cesaro limit P , and μ being a non-unity eigenvalue of Π with maximal magnitude, we have: ∣ ∣∣ ∣ θ [ I − (1 − θ )Π ] − 1 − P ∣ ∣∣ ∣ ∞ 5 θ 1 − | μ | (4.27a) ∣ ∣∣ ∣ ν ( θ,θ ) − lim θ ′ → 0 + ν ( θ,θ ′ ) ∣ ∣∣ ∣ ∞ 5 θ || χ || ∞ 1 − | μ | (4.27b) Proof . Denoting M = [ I − (1 − θ )Π ] − 1 − 1 θ P , M =[ I − (1 − θ )Π] − 1 − P ∞ ∑ k =0 (1 − θ ) k = ∞ ∑ k =0 (1 − θ ) k (Π − P ) k − P =[ I − (1 − θ )(Π − P )] − 1 − P We note, that if u is a left eigenvector of Π with unity eigenvalue, then u P = u . Also, if the eigenvalue corresponding to u is strictly within the unit circle, then u P = 0. After a little algebra, it follows that if u is the left eigenspace (denoted as E (1)) corresponding to unity eigenvalues of Π, then uM = 0, otherwise, uM = 1 1 − (1 − θ ) μ u , where μ is a non-unity eigenvalue for Π. Invoking the definition of induced matrix norms, and noting || A || ∞ = || A T || 1 for any square matrix A : || M || ∞ = max || u || 1 =1 || uM || 1 = max || u || 1 =1 ∧ u / ∈ E (1) || uM || 1 (4.28) We further note that since [ I − (1 − θ )(Π − P )] − 1 is guaranteed to be invertible [8], its eigenvectors form a basis, implying: u = ∑ j c j u j , with ∣ ∣ ∣ ∣ ∣ ∣∑ j c j u j ∣ ∣ ∣ ∣ ∣ ∣ 1 = 1 (4.29) 20 I. Chattopadhyay where u j are eigenvectors of [ I − (1 − θ )(Π − P )] − 1 with non-unity eigenvalues, and c j are complex coefficients. An upper bound for || M || 1 can be now computed as: || M || ∞ 5 1 1 − (1 − θ ) | μ | ∣ ∣ ∣ ∣ ∣ ∣∑ j c j u j ∣ ∣ ∣ ∣ ∣ ∣ 1 = 1 1 − (1 − θ ) | μ | 5 1 1 −| μ | where μ is a non-unity eigenvalue for Π with maximal magnitude. This establishes Eq. (4.27a). Finally, noting: ν ( θ,θ ) − lim θ ′ → 0 + ν ( θ,θ ′ ) = ( θ [ I − (1 − θ )Π] − 1 − P ) χ establishes Eq. (4.27b). The next proposition the key result relating a specific choice of θ to guaranteed  -optimal performance. Proposition 5 ( Global  -Optimality ). Given any  > 0 , choosing θ =  / m 2 where m = max q ∈ Q Card ( N ( q )) guarantees that the limiting policy computed by Algorithm 3 is  -optimal in the sense of Definition 14. Proof . We observe that limiting measure values ̂ ν ∞ θ | i = ν ? θ | i computed by Algo- rithm 3 can be represented by convergent sums of the form ( a ij : non-negative reals): ∀ q i ∈ Q N , ̂ ν ∞ θ | i = ∞ ∑ j =1 a ij (1 − θ ) j (4.30) implying that for each q i ∈ Q , ̂ ν ∞ ( θ,θ 1 ) | i (See Notation 4) is a monotonically decreasing function of θ 1 in the domain [0 , θ ]. We note that if the following statement: ∀ q i , q j ∈ Q N , ̂ ν ∞ θ | i > ̂ ν ∞ θ | j ⇒ ∀ θ 1 5 θ, ̂ ν ∞ ( θ,θ 1 ) | i > ̂ ν ∞ ( θ,θ 1 ) | j is true, then we have Utopian performance for policy P θ , i.e. , ρ P θ = ρ U . Hence, if ρ P θ 6 = ρ U , then we must have: ∃ θ 2 < θ, ∃ q i , q j ∈ Q N , (̂ ν ∞ θ | i > ̂ ν ∞ θ | j ) ∧ (̂ ν ∞ ( θ,θ 1 ) | i > ̂ ν ∞ ( θ,θ 1 ) | j ) upon which Eq. (4.30), along with the bound established in Eq. 
(4.27a), guarantees that if q i , q j are agents (in consecutive order) that satisfy the above statement, then: lim θ 1 → 0 + ( ̂ ν ∞ ( θ,θ 1 ) | i − ̂ ν ∞ ( θ,θ 1 ) | j ) 5 β θ θ (4.31) where β θ = 1 1 −| μ | , with μ being a maximal non-unity eigenvalue of the transition matrix of the PFSA computed by Algorithm 3 at θ . Next we claim: ∀ θ ′ ∈ (0 , θ ] , || ̂ ν ∞ θ ′ − ̂ ν ∞ ( θ,θ ′ ) || ∞ 5 m 2 θ (4.32) We observe that, for any θ ′ , the optimal policy P θ ′ can be obtained by beginning with the PFSA induced by P θ (which is the optimal policy at θ ), and then executing the centralized iterative approach [8], resulting in a sequence of element-wise non- decreasing measure vectors converging to the optimal ̂ ν ∞ θ ′ : ̂ ν ∞ ( θ,θ ′ ) = ν [0] θ ′ > ν [1] θ ′ > ν [2] θ ′ > · · · ν [ k ? ] θ ′ = ̂ ν ∞ θ ′ (4.33) Distributed Self-Organization Of Swarms To Find  -Optimal Routes 21 where ν [ k ] θ ′ is the vector obtained after the k th iteration, and k ? < ∞ is the number of required iterations. Since, ν [ k ] θ ′ = θ ′ [ I − (1 − θ ′ )Π [ k ] ] − 1 χ , where the transition matrix after k th iteration is Π [ k ] and setting ∆ [ k ] θ ′ = ν [ k ] θ ′ − ̂ ν ∞ ( θ,θ ′ ) we have: ∆ [ k ] θ ′ = (1 − θ ′ ) [ I − (1 − θ ′ )Π [ k ] ] − 1 (Π [ k ] − Π [0] ) ̂ ν ∞ ( θ,θ ′ ) = 1 − θ ′ θ ′ { θ ′ [ I − (1 − θ ′ )Π [ k ] ] − 1 ︸ ︷︷ ︸ B [ k ] θ ′ }{ (Π [ k ] − Π [0] ) ̂ ν ∞ ( θ,θ ′ ) ︸ ︷︷ ︸ ω [ k ] θ ′ } For q i ∈ Q , let U (0 → k ) i be the set of transitions ( q i σ − → q j ), which are updated ( i.e. enabled if disabled or vice verse) to go from the configuration corresponding to ν [0] θ ′ to the one corresponding to ν [ k ] θ ′ . We note that: U (0 → k ) i = ( U (0 → 1) i ∩ U (0 → k ) i ) ⋃ W where W = U (0 → k ) i \ ( U (0 → 1) i ∩ U (0 → k ) i ) . The i th row of Π [1] is obtained from Π [0] [8] by disabling controllable transitions q i σ − → q j if ν [0] θ ′ | j > ν [0] θ ′ | i (and enabling otherwise), and each such update leads to a positive contribution in the corresponding row of ω [1] θ ′ . It follows that updating any transition t ≡ ( q i σ − → q j ) ∈ ( U (0 → 1) i ∩ U (0 → k ) i ) leads to a positive contribution to ω [ k ] θ ′ | i , given by: C t = ̃ Π( q i , σ ) ∣ ∣ ∣ ν [0] θ ′ ∣ ∣ i − ν [0] θ ′ ∣ ∣ j ∣ ∣ ∣ (4.34) Every t ′ ≡ ( q i σ ′ − → q k ) ∈ W causes a negative contribution to ω [ k ] θ ′ | i , given by: C t ′ = − ̃ Π( q i , σ ′ ) ∣ ∣ ∣ ν [0] θ ′ ∣ ∣ i − ν [0] θ ′ ∣ ∣ k ∣ ∣ ∣ (4.35) implying that: ω [ k ] θ ′ | i 5 ∑ r ∈ ( U (0 → 1) i ⋂ U (0 → k ) i ) C r (4.36) ⇒ ω [ k ] θ ′ | i 5 ∑ σ ∈ Σ ̃ Π( q i , σ ) β θ θ = β θ θ (See Eq. (4.31)) Since the rows corresponding to the absorbing states have no controllable transitions, absorbing states must remain absorbing through out the iterative sequence, and the corresponding entries in ω [ k ] θ ′ for all k ∈ { 0 , · · · , k ? } are strictly 0. It follows: ω [ k ] θ ′ | i = { 0 , if q i is absorbing ∈ [0 , β θ θ ] , otherwise (4.37) Stochasticity of B [ k ] θ ′ implies that in the limit θ ′ → 0 + , B [ k ] θ ′ converges to the Cesaro limit of B [ k ] θ ′ . Applying Lemma 3: ∣ ∣∣ ∣ B [ k ] θ ′ − lim θ ′ → 0 + B [ k ] θ ′ ∣ ∣∣ ∣ ∞ 5 θ ′ 1 − | μ θ ′ | , β θ ′ θ ′ (4.38) where μ θ ′ is a non-unity eigenvalue for B [ k ] θ ′ with maximal magnitude. Using the invariance of the absorbing state set, and observing that the Cesaro limit lim θ ′ → 0 + B [ k ] θ ′ has strictly zero columns corresponding to non-absorbing states, we conclude: ∀ θ ′ ∈ (0 , θ ] , ∆ [ k ] θ ′ | i 5 1 − θ ′ θ ′ β θ ′ θ ′ β θ θ 5 β θ ′ β θ θ 22 I. 
Chattopadhyay It is easy to see that the PFSA induced by P θ is strongly absorbing (Definition 13), and so is each one obtained in the iteration. Also, the virtual states in our network model have no controllable transitions, and have no self-loops. Physical agents can have self-loops arising from disablings; but for a non-absorbing agent with at most m neighbors, the self-loop probability is bounded by ( m − 1) /m , which then implies β θ ′ , β θ 5 1 1 − ( m − 1) /m = m (Proposition 4). Hence: ∀ θ ′ ∈ (0 , θ ] , || ∆ [ k ] θ ′ || ∞ 5 m 2 θ (4.39) Thus, if we choose θ = /m 2 , we can argue: ∀ k ∈ { 0 , · · · , k ? } , ∀ θ ′ ∈ (0 , θ ] , || ∆ [ k ] θ ′ || ∞ 5  ⇒ lim θ ′ → 0 + ∣ ∣ ∣ ∣ ∣ ∣̂ ν ∞ θ ′ − ̂ ν ∞ ( θ,θ ′ ) ∣ ∣ ∣ ∣ ∣ ∣ ∞ 5  ⇒ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ lim θ ′ → 0 + ̂ ν ∞ θ ′ − lim θ ′ → 0 + ̂ ν ∞ ( θ,θ ′ ) ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∞ 5  ( Continuity of norm ) ⇒ ∣ ∣∣ ∣ ρ U − ρ P θ ∣ ∣∣ ∣ ∞ 5  ( Using Lemma 2 ) which completes the proof. Once we have guaranteed convergence to a  -optimal policy, we need to compute asymptotic bounds on the time-complexity of route convergence, i.e. , how long it takes to converge to the limiting policy so that the local routing decisions no longer fluctuate. In practice, the convergence time is dependent on the network delays, the degree to which the agent updates are synchronized etc. , and is difficult to estimate. In this paper, we neglect such effects to obtain an asymptotic estimate in the perfect situation. This allows us to quantify the dependence of the convergence time on key parameters such as N , m and  . Future work will address situations where such possibly implementation-dependent effects are explicitly considered resulting in potentially smaller convergence rates. Proposition 6 ( Asymptotic Runtime Complexity ). With no communication delays and assuming synchronized updates, convergence time T c to  -optimal operation for a network of N physical agents and maximum m neighbors, satisfies: T c = O ( N m 2  (1 − γ ? ) ) where γ ? is a lower bound on failure probabilities Proof . Synchronized updates imply that we can assume the following recursion: ̂ ν [1] θ = 0 (Zero vector) (4.40a) ̂ ν [ k +1] θ = (1 − θ )Π [ k ] ̂ ν [ k ] θ + θχ (4.40b) which can be used to obtain the upper bound: ∣ ∣∣ ∣̂ ν ∞ θ − ̂ ν [ k ] θ ∣ ∣∣ ∣ ∞ 5 (1 − θ ) k (4.41) implying that after k updates, each agent is within (1 − θ ) k of its limiting value. Denoting the smallest difference of measures as ∆ ? , we note that (1 − θ ) k 5 ∆ ? would Distributed Self-Organization Of Swarms To Find  -Optimal Routes 23 guarantee that no further route fluctuation occurs, and the network operation will be  -optimal from that point onwards. To estimate ∆ ? , we note that 1) comparisons cannot be made for values closer than the machine precision M 0 , and 2) the lowest possible non-zero measure in the network occurs at the network boundaries if we assume the worst case scenario in which the failure probability is always γ ? . We recall the measure of a agent is the sum of the measures of all paths initiating from the particular agent and terminating at the target. Also, note that any such path accumulates a multiplicative factor of (1 − θ ) 2 (1 − γ ? ) in each hop. In the worst case a given agent is N hops away, and has a single path to the target, implying that the smallest non-zero measure of any agent is bounded below by ((1 − θ ) 2 (1 − γ ? )) N . Hence: ∆ ? = M 0 ( (1 − θ ) 2 (1 − γ ? ) ) N (4.42) and hence a sufficient condition for convergence is: (1 − θ ) k = M 0 ( (1 − θ ) 2 (1 − γ ? 
) )^N. Rearranging, (1 − θ)^{k − 2N} ≤ M_0 (1 − γ^⋆)^N, i.e.,

k ≥ 2N + \frac{\log M_0}{\log(1 − θ)} + N \frac{\log(1 − γ^⋆)}{\log(1 − θ)}    (4.43)

Treating M_0 as a constant, we have \frac{\log M_0}{\log(1 − θ)} = O(1/θ). Since θ must be small for near-optimal operation, and considering the worst case γ^⋆ → 1, set k_1 ≜ \frac{\log(1 − γ^⋆)}{\log(1 − θ)}, so that (1 − θ)^{k_1} = 1 − γ^⋆. For small θ, (1 − θ)^{k_1} ≈ 1 − k_1 θ, whence k_1 ≈ γ^⋆/θ ≤ \frac{1}{θ(1 − γ^⋆)}, i.e., k_1 = O\left(\frac{1}{θ(1 − γ^⋆)}\right). Hence

k = O\left(N + \frac{1}{θ} + \frac{N}{θ(1 − γ^⋆)}\right) = O\left(\frac{N}{θ(1 − γ^⋆)}\right) = O\left(\frac{N m^2}{ε(1 − γ^⋆)}\right)  (using Proposition 5)

Thus we have T_c = O(k), which completes the proof.
It follows from Proposition 6 that for constant ε and γ^⋆, and for large networks with a relatively small number of local neighbors (N ≫ m), we have T_c = O(N). Detailed simulation, on the other hand, indicates that this bound is not tight, as illustrated in Figure 6.1b, where we observe a logarithmic dependence instead. The stationary policy computed for the frozen swarm has some additional properties, which we establish next.
Proposition 7 (Properties). The limiting frozen policy is stationary and has the following additional properties:
1. it is loop-free;
2. it is the unique loop-free policy that disables the smallest set of transitions among all policies which induce the same measure vector for a given θ.
Proof. Stationarity is obvious. (1) Absence of loops follows immediately from noting that, in the limiting policy, a controllable transition q_i → q_{v(ij)} is enabled if and only if q_{v(ij)} has a limiting measure strictly greater than that of q_i, implying that any sequence of transitions (with no consecutive repeating states) reaches either the dump or the target in a finite number of steps. (2) follows directly from the uniqueness and the maximal permissivity property of optimal policies computed by language-measure-theoretic optimization (see [8]).

Table 4.1: Instantaneous Agent Data Table
Id.          Neighbor #   Current Measure   Failure Probability   Forwarding Decision
I_1 (Self)   1            ν_0               d_0 = 0               0
⋮            ⋮            ⋮                 ⋮                     ⋮
I_m          m            ν_m               d_m                   1

One can easily tabulate the data that needs to be maintained at each agent (see Table 4.1). In particular, each agent needs to know the unique network id of each neighbor it can communicate with (Col. 1), and their current measure values (Col. 3). The failure probabilities for communicating from self to each of those neighbors must be maintained as well, for the purpose of carrying out the distributed updates (Col. 4). The forwarding decision is a neighbor-specific Boolean value (Col. 5), which is set to 1 if the neighbor currently has a strictly higher measure than self, and 0 otherwise. Packets are then forwarded by choosing uniformly at random among the enabled neighbors, i.e., the ones with a true forwarding decision. Note that this agent data updates whenever the neighbors' measures (Col. 3) or the failure probabilities (Col. 4) change; however, a change in the measures does not necessarily imply a change in the forwarding decisions. Also note that the routing is inherently probabilistic, owing to the possibility that multiple enabled neighbors may exist for a given agent. Furthermore, the optimal policy disables navigation decisions to as few neighbors as possible for a specified θ (Proposition 7), and hence exploits available alternate routes in an optimal manner, thereby reducing congestion.
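The per-agent bookkeeping of Table 4.1, the synchronous measure recursion of Eq. (4.40), and the choice θ = ε/m² of Proposition 5 can be combined into a short sketch. The Python fragment below is illustrative only: the class and function names are ours, the transition matrix Π and characteristic vector χ are assumed to be supplied by the network model, and the stopping rule simply tests the bound (1 − θ)^k of Eq. (4.41) against a tolerance (the proof of Proposition 6 uses the finer threshold Δ^⋆).

```python
import random
import numpy as np

class AgentTable:
    """Per-agent data of Table 4.1 (illustrative field names)."""
    def __init__(self, neighbor_ids, failure_probs):
        self.neighbors = list(neighbor_ids)                  # Col. 1: neighbor ids
        self.fail = dict(failure_probs)                      # Col. 4: link failure prob. d_j
        self.measure = 0.0                                   # own current measure nu_i
        self.nbr_measure = {j: 0.0 for j in self.neighbors}  # Col. 3: last reported nu_j
        self.forward = {j: 0 for j in self.neighbors}        # Col. 5: Boolean decision

    def update_forwarding(self):
        # Enable exactly the neighbors whose reported measure strictly exceeds our own.
        for j in self.neighbors:
            self.forward[j] = int(self.nbr_measure[j] > self.measure)

    def next_hop(self):
        # Probabilistic routing: choose uniformly among the enabled neighbors.
        enabled = [j for j in self.neighbors if self.forward[j]]
        return random.choice(enabled) if enabled else None

def frozen_swarm_measures(Pi, chi, eps, m):
    """Synchronous recursion nu <- (1-theta) Pi nu + theta chi (Eq. 4.40),
    with theta = eps / m**2 chosen per Proposition 5."""
    theta = eps / m**2
    nu = np.zeros(len(chi))
    err, k = 1.0, 0
    while err > eps:                  # err = (1-theta)^k, the bound of Eq. (4.41)
        nu = (1.0 - theta) * Pi @ nu + theta * np.asarray(chi)
        err *= (1.0 - theta)
        k += 1
    return nu, k
```

The dense matrix form above is only a centralized stand-in used to illustrate the convergence behaviour; in the actual distributed scheme each agent updates its own entry of ν from the neighbor-reported values held in its table.

5.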
Simultaneous Navigation & Decision Optimization: The ”Unfrozen” Case. Notation 5 ( Best Neighbors ). For a fixed agent q i , the set of neighboring agents having maximal measure at operation time t is denoted as B i ( t ) . Furthermore, let b ? i ( t ) denote a randomly chosen maximal agent, towards which q i has decided to move at time t . Notation 6 ( Swarm Configuration ). Denote the vector of positional coordinates of the agents at time t as P ( t ) . Definition 15 ( Movement Mechanism ). The positional update mechanism of the swarm can now be concretely stated as: C1 After step (a4) in Algorithm 3, for each q i choose a maximal agent b ? i ( t ) , and move towards b ? i ( t ) at a constant velocity, with the following restriction. C2 If there exists q j such that q i = b ? j ( t ) , then make sure that the distance from q j is within the communication radius. Definition 16 ( Process R v s ( t, P ( t ′ )) ). The stated movement mechanism induces a sequence of swarm configurations as a function of time t denoted by R v s ( t, P ( t ′ )) , which is understood to be the achieved vector of positional coordinates of the agents as a function of time t = t ′ , beginning with the initial configuration P ( t ′ ) at time t ′ , with the constant update velocity v s > 0 . Distributed Self-Organization Of Swarms To Find  -Optimal Routes 25 Definition 17 ( The Ideal Update Process ). Assuming that the distributed route convergence for the frozen swarm occurs instantaneously, let N ∆ ′ ,v s ( t, P ( t ′ )) denote the vector of position coordinates of the agents at time t = t ′ , obtained as a result of the following sequential operation, initiated with the swarm configuration P ( t ′ ) at time t ′ : 1. Freeze swarm 2. Optimize routes via Algorithm 3 (assumed to occur instantaneously for this definition only) 3. For time ∆ ′ , move each agent q i towards its best neighbor (with some form of tie-breaking if required) with a constant velocity v s . 4. Go to step 1. Then the ideal update process is defined as the sequence of swarm configurations (as a function of the operation time t ) given by: I v s ( t, P ( t ′ )) = lim ∆ ′ → 0 + N ∆ ′ ,v s ( t, P ( t ′ )) (5.1) I v s ( t, P ( t ′ )) has the following immediate properties: 1. At any point in time t = t ′ , the routing policy in effect is globally  -optimal in the sense defined in the preceding section. 2. Infinitesimal updates at each time t occur according to such  -optimal policies. Denoting I v s ( t, P ( t ′ )) | i and R v s ( t, P ( t ′ )) | i as the positional coordinates of the agent q i at time t for the respective update processes, and P Tgt as the positional coordinate of the target, we have the following convergence results. Proposition 8 ( Convergence Of Swarm Trajectories ). For any initial configu- ration P (0) at time t = 0 , each agent eventually converges to the target, i.e. , ∀ q i ∈ Q, ̂ ν θ | i (0) > 0 ⇒ { ∀P (0) , ∀ q i ∈ Q, lim t →∞ || I v s ( t, P (0)) | i − P Tgt || = 0 ∀P (0) , ∀ q i ∈ Q, lim t →∞ || R v s ( t, P (0)) | i − P Tgt || = 0 Proof . We consider the two processes N ∆ ′ ,v s ( t, P (0)), for some ∆ ′ > 0 and R ∆ ′ ,v s ( t, P (0)). We note that condition C2 in Definition 15 is automatically satisfied for N ∆ ′ ,v s ( t, P (0)), since the distance between agent q i and b ? i ( t ) is guaranteed to be non-decreasing if the agents move with a constant velocity v s . 
This immediately implies that no agent in the swarm gets disconnected, and it follows that: ̂ ν θ | i (0) > 0 ⇒ ∀ t > 0 , ̂ ν θ | i ( t ) > 0 (5.2) since it is given that ∀ q i ∈ Q, ̂ ν θ | i (0) > 0 implying that at least one sequence of hops from agent q i to q Tgt of the form { q i , q i ′ , · · · , q Tgt } exists at time t = 0, such that ̂ ν θ | i 5 ̂ ν θ | i ′ 5 · · · 5 ̂ ν θ | Tgt = 1 (5.3) and hence at least one such sequence is guaranteed to exist for all t > 0. Let h t ( q i ) ∈ N be the minimum length of such a sequence from q i at time t . Since it is given that ∀ q i ∈ Q, ̂ ν θ | i (0) > 0, we have: max q i ∈ Q h 0 ( q i ) 5 Card ( Q ) (5.4) We note that all agents q i with h 0 ( q i ) = 1 are direct neighbors of the target, and hence simply move towards the latter at a constant velocity v s for all times until convergence. Let t ′ be the time within which all such agents do converge to the target. Then, it follows that: max q i ∈ Q h t ′ ( q i ) 5 max q i ∈ Q h 0 ( q i ) − 1 (5.5) By continually applying the above argument, we obtain a sequence of times { t ′ , t ′′ , · · · } , 26 I. Chattopadhyay such that: Card ( Q ) = max q i ∈ Q h 0 ( q i ) > max q i ∈ Q h t ′ ( q i ) > max q i ∈ Q h t ′′ ( q i ) > · · · (5.6) Finite size of the swarm, and the fact that the above argument applies for all ∆ ′ > 0, then implies the desired result. I v s ( t, P (0)) cannot be directly implemented in practice (due to the requirement of sequential freezing and instantaneous route optimization). However, it allows us to compute the performance of implementable policies by comparing how close the achieved sequence of swarm configurations is to the ideal process. Note the in spite of convergence, the ideal process I v s ( t, P (0)) differs significantly in definition from R v s ( t, P (0)), and we need to establish that the latter is in some meaningful sense close to the former. We need the following definition, and a notion of convergence rate. Definition 18 ( Path Lengths To Target ). Recall that ̂ ν θ | i ( t ) > 0 implies that at least one sequence of hops from agent q i to q Tgt of the form { q i , q i ′ , · · · , q Tgt } exists at time t , such that ̂ ν θ | i ( t ) 5 ̂ ν θ | i ′ ( t ) 5 · · · 5 ̂ ν θ | Tgt = 1 (5.7) and h t ( q i ) is the minimum length of such a hop sequence from q i at time t . We define ̃ h t ( q i ) as the physical piecewise length of such a path (denoted by the indices of the agent sequence for simplified notation i.e. writing q i as i ) S = { i = j 1 , j 2 , · · · , j r − 1 , j r , · · · , j h t ( q i ) = Tgt } (5.8) as follows: ̃ h t ( q i ) = j r = j ht ( qi ) ∑ j r = j 2 ∣ ∣∣ ∣ P ( t ) j r − 1 − P ( t ) j r ∣ ∣∣ ∣ (5.9) Proposition 9 ( Convergence Rate ). The swarm trajectories converge to the target at an exponential rate for both I v s ( t, P (0)) and R v s ( t, P (0)) , i.e. , we have: ∀ q i ∈ Q, ̃ h t ( q i ) 5 ̃ h 0 ( q i ) e − ( v s /R c ) t (5.10) where R c > 0 is the specified constant communication radius. Proof . We note that in Eq. (5.9), agent q j r ∈ B j r − 1 ( t ). Without loss of generality, we assume that q j r = b ? j r − 1 ( t ). Then it follows that: d d t ̃ h t ( q i ) = − h t ( q i ) v s (5.11) Next, we note the bound: ̃ h t ( q i ) 5 R c h t ( q i ) ⇒ − h t ( q i ) 5 − 1 R c ̃ h t ( q i ) (5.12) Using in Eq. (5.11), we obtain: d d t ̃ h t ( q i ) 5 − ( v s R c ) ̃ h t ( q i ) (5.13) which completes the proof. Definition 19 ( Swarm Diameter ). 
The swarm diameter D t ( Q ) is defined as: D t ( Q ) = 2 max q i ∈ Q ||P q i ( t ) − P Tgt || (5.14) Corollary 1 ( Corollary To Proposition 9 ). D t ( Q ) 5 2 D 0 ( Q ) e − ( v s /R c ) t Proof . Follows immediately from noting D t ( Q ) 5 2 max q i ∈ Q ̃ h t ( q i ). Corollary 2 ( Corollary To Proposition 9 ). Denoting an upper bound on the Distributed Self-Organization Of Swarms To Find  -Optimal Routes 27 time required for all agents to converge to the target as T conv , we have: v s T conv ' Const. for constant R c (5.15a) R c /T conv ' Const. for constant v s (5.15b) Proof . Replace the requirement of every agent converging to the target by one that requires almost all of them reaching the target, in the sense that the swarm diameter D T conv ' f , where 0 < f  1. Using Corollary 1, we have: v s T conv /R c 5 ln(2 /f ) (5.16) Since T conv is an upper bound on the convergence time, we can replace the inequality: v s T conv /R c ' ln(2 /f ) = Const. (5.17) which implies the desired result. Notation 7. We denote the set of unit vectors of velocity directions at time t for the processes I v s ( t, P (0)) and R v s ( t, P (0)) as ∂ I v s ( t, P (0)) and ∂ R v s ( t, P (0)) respectively. Proposition 10 ( Asymptotic Deviation From Ideal Process ). For sufficiently small v s > 0 , we have: P rob (∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∂ R v s ( t, P (0)) − ∂ I v s ( t, R v s ( t, P (0))) ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ > 0 ) (5.18) = O ( R c v s ( e ( v s /R c ) T c − 1 ) e − ( v s /R c ) t ) (5.19) where T c is the convergence time of the frozen swarm. Proof . We note that if the neighborhood maps and the failure probabilities do not change for the interval [ t − T c , t ], then the velocity vectors of the two processes will coincide. We assume that v s is small enough such that in the absence of topology up- dates, the velocity vectors for R v s ( t, P (0)) coincide with that of I v s ( t, R v s ( t, P (0))). Denoting the probability of topology update at time t as T ( t ), we note: P rob (∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∂ R v s ( t, P (0)) − ∂ I v s ( t, R v s ( t, P (0))) ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ > 0 ) = ∫ t t − T c T ( t ′ )d t ′ (5.20) T ( t ) however is bounded above by the fraction of agents ψ ( t ) at time t that have an inter-agent distance ' R c , which is a necessary condition for such agents to affect a change in their neighborhood map. In particular, we have T ( t ) = O ( ψ ( t )) (5.21) Since the swarm diameter D t ( Q ) is dominated by an exponentially decreasing function (See Corollary 1 to Proposition 9), and inter-agent distance is bounded above by D t ( Q ), we have: ψ ( t ) = O ( e − ( v s /R c ) t ) (5.22) The result then follows by standard algebra from Eq. (5.20). Proposition 10 shows that the implementable process R v s ( t, P (0)) starts coin- ciding, at least in probability, to the ideal update process for small agent velocities. Note, that as v s → 0 + in Eq. (5.19), we have, as expected: lim v s → 0 P rob (∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∂ R v s ( t, P (0)) − ∂ I v s ( t, R v s ( t, P (0))) ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ > 0 ) = O ( R c lim v s → 0 e ( v s /R c ) T c − 1 v s ) = O ( T c ) (5.23) which reflects the fact that while R v s ( t, P (0)) takes O ( T c ) time to converge, we as- 28 I. Chattopadhyay Smoothing Spline Fit Time (ms) ǫ 10 − 5 10 − 4 10 − 3 10 2 . 3 10 2 . 4 (a) Mean Time (s) N 10 2 10 3 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 1 . 2 1 . 4 (b) τ Globally optimal perf. # of corrections || ρ P || 2 × 2 0 500 1000 1500 2000 50 100 150 200 250 (c) # of corrections || ρ P || 2 × 2 0 100 200 300 400 500 600 100 200 300 (d) Fig. 
6.1: Convergence complexity for distributed optimization of the frozen network: (a) captures the O(1/ε) dependence on the specified tolerance; (b) illustrates the dependence on network size. Convergence dynamics: (c) rapid convergence in response to large random target movements; (d) robust response to large zero-mean variations in the failure probabilities.

sumed that I_{v_s}(t, R_{v_s}(t, P(0))) executes instantaneous route optimization. Similarly:

lim_{R_c → ∞} Prob( ‖ ∂R_{v_s}(t, P(0)) − ∂I_{v_s}(t, R_{v_s}(t, P(0))) ‖ > 0 ) = O(T_c)    (5.24)

which reflects the case in which no topology changes occur because all agents are neighbors of one another; the ideal process is again faster for the same reason. If there is no communication, then no optimization is possible for either process:

lim_{R_c → 0} Prob( ‖ ∂R_{v_s}(t, P(0)) − ∂I_{v_s}(t, R_{v_s}(t, P(0))) ‖ > 0 ) = 0    (5.25)

6. Simulation Results.
6.1. Complexity of Optimization In The Frozen Swarm. Extensive simulations on the NS2 network simulator are used to investigate how convergence times scale as a function of the network size (Figure 6.1). 10^2 random topologies were considered for each N (increased from 25 to 1600), and the mean times, along with the max-min bars, are plotted in Figure 6.1b. Note that the abscissa is on a logarithmic scale, and the near-linear nature of the plot indicates a logarithmic dependence of the convergence time on network size, implying that the bound computed in Proposition 6 is possibly not tight. The dependence on ε shown in Figure 6.1a (for N = 10^3) is hyperbolic, as expected, appearing as a near-linear trend after a smoothing-spline fit on a log-log scale. Convergence times are estimated from NS2 output (using the 802.11 standard). Theoretical convergence results are illustrated in Figure 6.1(c-d), generated on a frozen network of 10^4 agents. Figure 6.1c illustrates the variation of the number of route updates (# of forwarding-decision corrections) and the norm of the performance vector ρ^P (scaled up by a multiplicative factor of 2) when the target is moved around randomly on a slower time scale. Since ρ^P is the vector of end-to-end success probabilities (see Definition 14), its norm captures the expected throughput across the network. Note that target changes induce self-organizing corrections, which rapidly die down, with the performance converging close to the global optimum (ε = 0.001 was assumed in all the simulations). The failure probabilities are chosen randomly and, on average, held constant in the course of the simulation illustrated in Figure 6.1c (zero-mean Gaussian noise is added to illustrate robustness). Note that the seemingly large fluctuations in the performance norm are unavoidable; the interval τ is approximately the time it takes for information to percolate through the network, and hence at least this much time is necessary for decentralized route convergence. Figure 6.1d illustrates the effect of large zero-mean stochastic variations in the failure probabilities. Each agent estimates the failure probabilities from a simple windowed average of the link-specific packet failures. We note that large sustained fluctuations result in sustained corrections in the forwarding decisions (which no longer go to zero). However, the norm of the performance vector converges and holds steady. This clearly illustrates that the information percolation strategy induces a low-pass filter eliminating high-frequency fluctuations. A small number of route fluctuations always occur (note the non-zero number of corrections), but this does not induce significant performance variations.
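For reference, ρ^P can be computed directly from the transition matrix of the controlled PFSA: by Lemma 2 it is the θ → 0+ limit of the measure vector θ[I − (1 − θ)Π]^{-1}χ. The sketch below is illustrative only; the function names are ours, Π is assumed row-stochastic with the target (and dump) absorbing, χ is taken as the indicator of the target state, and the limit is approximated with a very small θ.

```python
import numpy as np

def measure_vector(Pi, chi, theta):
    """nu_theta = theta [I - (1 - theta) Pi]^{-1} chi  (language-measure vector)."""
    n = Pi.shape[0]
    return theta * np.linalg.solve(np.eye(n) - (1.0 - theta) * Pi, chi)

def performance_vector(Pi, target, theta=1e-9):
    """rho_i: probability that a packet from state i is eventually absorbed at the
    target, obtained as the small-theta limit of the measure vector (Lemma 2)."""
    chi = np.zeros(Pi.shape[0])
    chi[target] = 1.0
    return measure_vector(Pi, chi, theta)

# Toy example: states 0, 1 transient; 2 = target (absorbing); 3 = dump (absorbing).
Pi = np.array([[0.0, 0.8, 0.1, 0.1],
               [0.3, 0.0, 0.6, 0.1],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
rho = performance_vector(Pi, target=2)   # rho[2] = 1, rho[3] = 0, rho[0], rho[1] in (0, 1)
```

The ‖ρ^P‖ traces in Figure 6.1(c-d) are essentially this quantity, recomputed as the target moves or the failure probabilities drift.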
6.2. Simulation Studies On Mobile Swarms. First, we need to validate the theoretical development presented in Section 5. For that purpose, we consider a swarm of 10^4 agents with a single target. The agents are initially distributed uniformly over a 100 m × 100 m region. The failure probabilities are assumed to be linearly decreasing functions of the inter-agent distances, with an additional random spatial dependence. The system is simulated in accordance with the described algorithms, at various values of the communication radius R_c and the agent velocity v_s. The convergence time T_conv is the time required for nearly all agents (> 99.9%) to reach the target. The results are shown in Figure 6.2. Note that Figure 6.2a validates Proposition 10, in that the ideal process I_{v_s}(t, P(0)) closely matches the implementable process R_{v_s}(t, P(0)) with respect to the fraction of agents reaching the goal as a function of the simulation time. For our simulated system, the ideal process I_{v_s}(t, P(0)) was obtained by freezing the swarm for 1000 simulation ticks after every tick that caused any position updates. Since the frozen swarm is guaranteed to converge to the optimal routes (as shown in Section 4), this procedure ensures that the achieved position updates closely approximate the ideal process. In logging simulation time, we ignored the time spent in the frozen optimization. Figures 6.2b-6.2d validate Corollary 2 by showing that, for constant communication radius R_c, the convergence time T_conv does indeed vary hyperbolically with v_s (Figures 6.2b, 6.2c), and that, for constant v_s, it increases linearly with the communication radius R_c (Figure 6.2d). As the communication radius is decreased, we eventually encounter a point where the swarm begins to get disconnected, which is reflected in the rapid increase of T_conv at low values of R_c. If we identify R_c with the amount of energy spent in communication, we note that there is an optimal value at which T_conv is minimized. The high-traffic paths generated in the swarm are very different for different communication radii. This is illustrated in Figure 6.3 (with 10^4 agents), where we note that distinct paths develop for small R_c.

Fig. 6.2: Simulation results validating the theoretical development in Section 5: (a) I_{v_s}(t, P(0)) and R_{v_s}(t, P(0)) match closely with respect to the fraction reaching the target; (b) variation of T_conv with velocity v_s at constant communication radius R_c on a log-log scale, showing the predicted hyperbolic relationship (Corollary 2); (c) the same data on a linear scale; (d) variation of T_conv with R_c at constant v_s, showing the predicted linear relationship. At low values of R_c the swarm begins to get disconnected, resulting in a rapid increase in T_conv.
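The mobile-swarm runs implement the movement mechanism of Definition 15 on top of the frozen-swarm route optimization. A minimal per-step sketch follows; it is illustrative only: the function name and data layout are ours, ties among maximal-measure neighbors are broken uniformly at random, agents move only when a neighbor strictly improves on their own measure, and condition C2 is not enforced explicitly since, as argued in the proof of Proposition 8, it is automatically satisfied when all agents move at the same constant speed.

```python
import numpy as np

def movement_step(pos, measures, neighbors, v_s, dt):
    """One position update per Definition 15: each agent moves towards a randomly
    chosen maximal-measure neighbor at constant speed v_s for a time step dt.
    pos:       (n, 2) array of agent coordinates
    measures:  length-n array of current agent measures
    neighbors: neighbors[i] = indices of agents within communication radius of i
    """
    new_pos = pos.copy()
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue                                   # isolated agent: stay put
        top = max(measures[j] for j in nbrs)
        if top <= measures[i]:
            continue                                   # no strictly better neighbor
        best = int(np.random.choice([j for j in nbrs if measures[j] == top]))
        step = pos[best] - pos[i]
        dist = np.linalg.norm(step)
        travel = v_s * dt
        new_pos[i] = pos[best] if dist <= travel else pos[i] + (travel / dist) * step
    return new_pos
```

Iterating this step, interleaved with the distributed measure updates of Section 4, reproduces the qualitative behaviour seen in Figure 6.2; the exponential shrinkage of the swarm diameter predicted by Corollary 1 (D_t(Q) ≤ 2 D_0(Q) e^{−(v_s/R_c)t}) provides a convenient convergence check.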
6.3. Simulation Case Studies. We present simulation results for a series of different scenarios. All simulations are done with a minimum of 10^4 agents, and the communication radius R_c and the swarm velocity v_s are kept fixed at 0.1 m and 2.5 m/s, respectively, unless otherwise stated. The initial agent distribution is uniform, as before, over a 100 m × 100 m plane (with the exception of intentional voids).
1. Figure 6.4 illustrates the situation where there is a void in the initial agent distribution. We note how the high-traffic paths go around the void, resulting from the neighbor-following mechanism. This is intuitively correct, as the small communication radius implies that no information is available on the failure probabilities inside the void. As the simulation progresses, the two paths going around the void close together, reflecting the fact that, as agents on the edge of the void drift inwards due to noisy position updates, the area with no agents diminishes.
2. Figure 6.5 illustrates the scenario where we have two point targets. Note how the swarm splits automatically and moves towards a particular target. This splitting decision is not taken a priori, but naturally emerges from the execution.
3. Figure 6.6 illustrates a scenario with multiple extended targets. Note how the swarm trajectories form temporary accumulation points (seen towards the bottom of the target on the right), indicating locations from which moving towards either target is equally advantageous. Random choice of best neighbors is sufficient, without any additional effort, to ensure that such accumulations are only temporary.
4. Figure 6.7 illustrates the scenario with an obstacle, whose position needs to be locally sensed, and is not known a priori or globally.
5. Figure 6.8 illustrates the case of agents that are not interested in going to the target, but are willing to share information, i.e., to carry out the node-based computations and relay the measure values to their neighbors. The difficulty is that such agents are not necessarily moving towards the goal, so following such an agent may prove detrimental. However, as these agents move in a direction which does not improve the measure gradient, their nodal updates begin diminishing their current measures. The simulation shows that a sparse set of agents (in blue) which intend to reach the target can accomplish this in the presence of the former type (in red). Note also that in this simulation the number of blue agents was chosen to be insufficient to form a connected network. Thus information percolation is shown to be sufficient for the development of distinct paths to the target.

Fig. 6.3: Effect of communication radius R_c on swarm trajectories: plates (a)-(d) illustrate the case of high R_c = 3 m, whereas plates (e)-(i) illustrate the case of R_c = 0.1 m. Note that the low-R_c case develops distinct high-traffic paths, whereas for high R_c the trajectories are more amorphous.

Fig. 6.4: Illustrating the effect of a void in the initial agent distribution: the trajectories tend to avoid the void, since every agent is following a neighbor, leading to distinct paths.

7. Future Work. Future work will proceed in the following directions:
1.
Design explicit strategies for energy and congestion awareness within the Distributed Self-Organization Of Swarms To Find  -Optimal Routes 33 (a) (b) (c) (d) (e) (f) (g) (h) (i) Fig. 6.5: Multiple point targets: Note that the swarm automatically splits to move towards chosen targets; such choices are not made a priori but naturally emerge from the execution proposed framework. Note that each agent can regulate incoming traffic by deliberately reporting lower values of its current self-measure to its neighbors: Reported −→ r [ k ] θ ∣ ∣ i = ζ ( q i , k ) ν [ k ] θ ∣ ∣ i ← Computed (7.1) where ∀ q i ∈ Q, k ∈ [0 , ∞ ) , ζ ( q i , k ) ∈ [0 , 1] is a multiplicative factor which is modulated to have decreasing values as local congestion increases. Such modulation forces automatic self-organization to compute alternate routes that tend to avoid the particular agent. The dynamics of such context-aware modulation may be non-trivial; while for slowly varying ζ ( q i , k ), the conver- gence results presented here is expected to hold true, rapid fluctuations in (a) (b) (c) (d) (e) (f) (g) (h) (i) Fig. 6.6: Illustrating multiple extended target regions: Note that the swarm automat- ically splits to move towards chosen targets (in green); as before such choices are not made a priori but naturally emerge from the execution ζ ( q i , k ) may be problematic. 2. We assumed that the link-specific failure probabilities are estimated at the agents. Grossly incorrect estimations will translate to incorrect routing deci- sions, and decentralized strategies for robust identification of these parameters need to be investigated at a greater depth. More specifically, the proposed algorithm needs to be augmented with learning schemes (possibly reinforce- ment learning based approaches) that estimate such failure probabilities. 3. Explicit design of implementation details such as packet headers, agent data structures and pertinent neighbor-neighbor communication protocols. 4. Hardware validation on real systems. 34 (a) (b) (c) (d) (e) (f) (g) (h) (i) Fig. 6.7: Presence of obstacles with obstacle positions are locally sensed and are not known a priori or globally. Note how the high traffic paths go around REFERENCES [1] J. Deneubourg and S. Goss, “Collective patterns and decision-making,” Ethology, Ecology, and Evolution , vol. 1, 1989. [2] D. Gage, “Command and control for many-robot systems,” In Unmanned Systems Magazine , vol. 10, 1992. [3] O. Holland and C. Melhuish, “Stigmergy, self-organization, and sorting in collective robotics,” Artif. Life , vol. 5, pp. 173–202, April 1999. [Online]. Available: http://portal.acm.org/citation.cfm?id=338945.338956 [4] C. Unsal and J. Bay, “Spatial self-organization in large populations of mobile robots,” in IEEE Int. Symp. on Intelligent Control , 1994, pp. 249–254. [5] D. W. Gage, “How to communicate with zillions of robots,” in In Proceedings of SPIE Mobile Robots VIII , 1993, pp. 250–257. [6] M. Lewis and G. Bekey, “The behavioral self-organization of nanorobots using local rules,” in In Proc. 1992 IEEE/RSJ Int. Conf. Intelligent Robots and Systems , 1992. [7] D. Payton, R. Estkowski, and M. Howard, “Compound behaviors in pheromone robotics,” Robotics and Autonomous Systems , vol. 44, no. 3-4, pp. 229–240, 2001. [8] I. Chattopadhyay and A. Ray, “Language-measure-theoretic optimal control of probabilistic finite-state systems,” Int. J. Control , 2007. 35 (a) (b) (c) (d) (e) (f) (g) (h) Fig. 
6.8: Presence of randomly moving agents (in red) which are not going to the target, but are willing to share information. Note how the blue agents organize to form paths [9] ——, “Structural transformations of probabilistic finite state machines,” International Journal of Control , vol. 81, no. 5, pp. 820–835, May 2008. [10] T. Schmickl, R. Thenius, C. Moeslinger, G. Radspieler, S. Kernbach, M. Szymanski, , and K. Crailsheim, “Get in touch: cooperative decision making based on robot-to-robot colli- sions,” Autonomous Agents and Multi-Agent Systems , vol. 18, no. 1, pp. 133–155, 2009. [11] S. Nouyan, A. Campo, and M. Dorigo, “Path formation in a robot swarm - self-organized strategies to find your way home,” Swarm Intelligence , vol. 2, no. 1, pp. 1–23, 2008. [12] B. Holldobler and E. O. Wilson, The superorganism: the beauty, elegance, and strangeness of insect societies . W.W. Norton & Company, 2009. [13] J. Svennebring and S. Koenig, “Building terrain-covering ant robots: A feasibility study,” Autonomous Robots , vol. 16, no. 3, pp. 313–332, 2004. [14] I. Wagner, M. Lindenbaum, and A. Bruckstein, “Distributed covering by ant- robots using evaporating traces,” IEEE Transactions on Robotics and Automation , vol. 15, 1999. [15] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin, “Effective leadership and decision-making in animal groups on the move,” Nature , vol. 433, no. 7025, pp. 513–516, Feb 2005. [Online]. Available: http://dx.doi.org/10.1038/nature03236 [16] I. Couzin, “Collective minds,” Nature , vol. 445, no. 7129, pp. 715–715, Feb 2007. [Online]. Available: http://dx.doi.org/10.1038/445715a 36 [17] S. Nolfi and D. Floreano, Evolutionary Robotics: The Biology,Intelligence,and Technology . Cambridge, MA, USA: MIT Press, 2000. [18] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, “The complexity of decentral- ized control of markov decision processes,” in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000) , 2000, pp. 32–37. [19] ——, “The complexity of decentralized control of markov decision processes,” Math. Oper. Res. , vol. 27, no. 4, pp. 819–840, 2002. [20] C. Lusena, J. Goldsmith, and M. Mundhenk, “Nonapproximability results for partially observ- able markov decision processes,” Journal of Artificial Intelligence Research , vol. 14, p. 2001, 2001. [21] M. Kumar, D. Garg, and R. Zachery, “Multiple mobile agents control via artificial potential functions and random motion,” in Proceedings of the ASME International Mechanical Engineering Congress and Exposition . Seattle, WA: ASME, November 2007. [22] S. Sarkar, E. Halland, and M. Kumar, “Mobile robot path planning using support vector machines,” in ASME Dynamic Systems and Control Conference . Ann Arbor, Michigan: ASME, 2008. [23] J. Borenstein and Y. Koren, “Potential field methods and their inherent limitations for mobile robot navigation,” in Proceedings of the 1991 IEEE International Conference on Robotics and Automation , 1991, pp. 1398–1404. [24] R. Volpe and P. Khosla, “Manipulator control with superquadric artificial potential functions: theory and experiments,” IEEE Transactions on Systems, Man, and Cybernetics, , vol. 20, pp. 1423–1436, 1990. [25] C. Connolly, “Harmonic functions and collision probabilities,” The International Journal of Robotics Research, , vol. 16, pp. 497–507, 1997. [26] J. Vascak, “Navigation of mobile robots using potential fields and computational intelligence means,” Acta Polytechnica Hungarica, , vol. 4, pp. 63–74, 2007. [27] J. Kim and P. K. 
Khosla, “Real-time obstacle avoidance using harmonic potential functions,” IEEE Transactions on Robotics and Automation , vol. 8, no. 3, pp. 338–349, 1993. [28] K. Sato, “Deadlock-free motion planning using the laplace potential field,” Advanced Robotics , vol. 5, no. 3, pp. 449–461, 1993. [29] Akishita, S. Kawamura, and K. Hayashi, “New navigation function utilizing hydrodynamic potential for mobile robot,” in IEEE Int. Workshop on Intelligent Motion Control , 1990, pp. 413–417. [30] S. Waydo and R. M. Murray, “Vehicle motion planning using stream functions,” in Proc. IEEE Int. Conf. Robotics and Automation , 2003, pp. 2484–2491. [31] F. Sharifi and D. Vinke, “Integration of the artificial potential field approach with simulated annealing for robot path planning,” 1993, pp. 536–541. [32] J. Barraquand, B. Langlois, and J.-C. Latombe, “Numerical potential field techniques for robot path planning,” IEEE Transactions on Systems, Man, and Cybernetics, , vol. 22, pp. 224– 241, 1992. [33] J. Barraquand and J. Latombe, “Robot motion planning: a distributed representation ap- proach,” The International Journal of Robotics Research, , vol. 10, pp. 628–6449, 1991. [34] M. Park and M. Lee, “A new technique to escape local minimum in artificial potential field based path planning,” KSME International Journal, , vol. 17, pp. 1876–1885, 2003. [35] C. Li, H. Marcelo, H. Krishnan, and L. Yong, “Virtual obstacle concept for local-minimum- recovery in potential-field based navigation,” 2000, pp. 983–988. [36] G. Bell and M. Weir, “Forward chaining for robot and agent navigation using potential fields,” 2004, pp. 265–274. [37] M. Weir, A. Buck, and J. Lewis, “A mind’s eye approach to providing bug-like guarantees for adaptive obstacle navigation using dynamic potential fields,” Lecture Notes in Computer Science , vol. 4095, pp. 239–250, 2006. [38] J. Borenstein and Y. Koren, “Real-time obstacle avoidance for fast mobile robots,” IEEE Transactions on Systems, Man, and Cybernetics, , vol. 19, pp. 1179–1187, 1989. [39] X. Yun and K. Tan, “A wall-following method for escaping local minima in potential field based motion planning,” in Proceedings of , 8th International Conference on Advanced Robotics , 1997. [40] M. Mabrouk and C. McInnes, “An emergent wall following behaviour to escape local minima for swarms of agents,” IAENG International Journal of Computer , vol. 35, pp. 463–476, 2008. [41] J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed. Addison-Wesley, 2001. [42] R. Bapat and T. Raghavan, Nonnegative matrices and Applications . Cambridge University 37 Press, 1997. [43] W. Rudin, Real and Complex Analysis, 3rd ed. McGraw Hill, New York, 1988. [44] E. A. Feinberg and A. Shwartz, Eds., Handbook of Markov Decision Processes: Methods and Algorithms . Boston, MA: Kluwer, 2002. [45] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case . New York: Academic Press, 1978. [46] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models . Englewood Cliffs, NJ: Prentice Hall, 1987. [47] P. J. Ramadge and W. M. Wonham, “Supervisory control of a class of discrete event processes,” SIAM J. Control and Optimization , vol. 25, no. 1, pp. 206–230, 1987. [48] M. L. Puterman, Markov decision processes in Handbooks in Operation Research and Manage- ment Science , D. P. Heyman and M. J. Sobel, Eds. Amsterdam: NorthHolland, 1990. [49] L. G. Gubenko and E. S. 
Statland, "On controlled, discrete-time Markov decision processes," Theory Probab. Math. Statist., vol. 7, 1975.
[50] A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, and M. K. Ghosh, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Control and Optimization, vol. 31, pp. 282-344, 1993.
[51] S. Gerschgorin, "Über die Abgrenzung der Eigenwerte einer Matrix," Izv. Akad. Nauk. USSR Otd. Fiz.-Mat. Nauk, vol. 7, pp. 749-754, 1931.
[52] R. S. Varga, Gerschgorin and His Circles. Germany: Springer, 2004.