Safe and Stabilizing Distributed Multi-Path Cellular Flows Taylor T. Johnson ∗ , Sayan Mitra Coordinated Science Laboratory, University of Illinois at Urbana-Champaign,Urbana, IL 61801, USA Abstract We study the problem of distributed traffic control in the partitioned plane, where the movement of all entities (robots, vehicles, etc.) within each parti- tion (cell) is coupled. Establishing liveness in such systems is challenging, but such analysis will be necessary to apply such distributed traffic control algo- rithms in applications like coordinating robot swarms and the intelligent high- way system. We present a formal model of a distributed traffic control protocol that guarantees minimum separation between entities, even as some cells fail. Once new failures cease occurring, in the case of a single target, the protocol is guaranteed to self-stabilize and the entities with feasible paths to the target cell make progress towards it. For multiple targets, failures may cause deadlocks in the system, so we identify a class of non-deadlocking failures where all en- tities are able to make progress to their respective targets. The algorithm relies on two general principles: temporary blocking for maintenance of safety and local geographical routing for guaranteeing progress. Our assertional proofs may serve as a template for the analysis of other distributed traffic control pro- tocols. We present simulation results that provide estimates of throughput as a function of entity velocity, safety separation, single-target path complexity, failure-recovery rates, and multi-target path complexity. Keywords: distributed systems, swarm robotics, formal methods 1. Introduction Highway and air traffic flows are nonlinear switched dynamical systems that give rise to complex phenomena such as abrupt phase transitions from fast to sluggish flow [1, 2, 3]. Our ability to monitor, predict, and avoid such phe- nomena can have a significant impact on the reliability and capacity of physical traffic networks. Traditional traffic protocols, such as those implemented for air ∗ Corresponding author Email addresses: taylor.johnson@acm.org (Taylor T. Johnson), mitras@illinois.edu (Sayan Mitra) Preprint submitted to Theoretical Computer Science A June 22, 2021 arXiv:1209.2058v2 [cs.RO] 11 Oct 2012 traffic control are centralized [4]—a coordinator periodically collects information from the vehicles, decides and disseminates waypoints, and subsequently the vehicles try to blindly follow a path to the waypoint. Wireless vehicular net- works [5, 6, 7, 8] and autonomous vehicles [9, 10] present new opportunities for distributed traffic monitoring [11, 12, 13] and control [14, 15, 16, 17, 18, 19]. While these protocols may still rely on some centralized coordination, they should scale and be less vulnerable to failures compared to their centralized counterparts. In this paper, we propose a fault-tolerant distributed traffic con- trol protocol, formally model it, and formally prove its correctness. A traffic control protocol is a set of rules that determines the routing and movement of certain physical entities , such as vehicles, robots, or packages, over an underlying graph , such as a road network, air-traffic network, or ware- house conveyor system. Any traffic control protocol should guarantee: (a) ( safety ) that the entities always maintain some minimum physical separation, and (b) ( progress ) that the entities eventually arrive at a given a destination (or target) vertex. In a distributed traffic control protocol, each entity determines its own next- waypoint, or each vertex in the underlying graph determines the next-waypoints for the entities in an appropriately defined neighborhood. In this paper, we study the problem of distributed traffic control in a parti- tioned plane where the motions of entities within a partition are coupled . The problem can be described as follows (refer to Figures 1 and 2). The environ- ment —the geographical space of interest—is partitioned into regions or cells . Each entity is assigned a certain type or color . For each color, there is one source cell and one target cell of the same color. The source cells produce entities of some color, and the target cells only consume entities of a particular color, so the goal is to move entities of color c to the target of color c . The motion of all entities within a cell are coupled, in the sense that they all either move identi- cally, or they all remain stationary (we discuss the motivation for this below). If some entities within some cell i touch the boundary of a neighboring cell j , those entities are transferred to j . Thus, the role of the distributed traffic con- trol protocol is to control the motion of the cells so that the entities (a) always have the required safe separation, and (b) reach their respective targets, when feasible. The coupling mentioned above that requires entities within a cell to move identically may appear strong at first sight. After all, under low traffic condi- tions, individual drivers control the movement of their cars within a particular region of the highway, somewhat independently of the other drivers in that re- gion. However, on highways under high-traffic, high-velocity conditions, it is known that coupling may emerge spontaneously, causing the vehicles to form a fixed lattice structure and move with near-zero relative speed [1, 20]. In other scenarios, coupling arises because passive entities are moved around by active cells. For example, this occurs with packages being routed on a grid of multi- directional conveyors [21, 22], and molecules moving on a medium according to some controlled chemical gradient. Finally, even where the entities are ac- tive and cells are not, the entities can cooperate to emulate a virtual active cell expressly for the purposes of distributed coordination. This idea has been ex- 2 plored for mobile robot coordination in [23] using a cooperation strategy called virtual stationary automata [24, 25]. In this paper, we present a distributed traffic control protocol that guar- antees safety at all times , even when some cells fail permanently by crashing. The protocol also guarantees eventual progress of entities toward their targets, provided (a) that there exists a path through non-faulty cells to the entities’ re- spective targets, and (b) failures have not introduced unrecoverable deadlocks. Specifically, the protocol is self-stabilizing [26, 27], in that once new failures stop occurring, the composed system automatically returns to a state from which progress can be made. The algorithm relies on the following four mechanisms. (a) There is a routing rule to maintain local routing tables to each target at each non-faulty cell. This routing protocol is self-stabilizing and allows our pro- tocol to tolerate crash failures of cells. (b) There is a mutual exclusion and scheduling mechanism to ensure moving entities over distinctly colored overlapping paths do not introduce dead- locks. The locking and scheduling mechanism ensures one-way traffic can make progress over shared routes (traffic intersections). (c) There is a signaling rule between neighbors that guarantees safety while preventing deadlocks. Roughly speaking, the signaling mechanism at some cell fairly chooses among its neighboring cells that contain entities, indicat- ing if it is safe for one of these cells to apply a movement in the direction of the cell doing the signaling. This permission-to-move policy turns out to be necessary, because movement of neighboring cells may otherwise result in a violation of safety in the signaling cell, if entity transfers occur. (d) The movement policy causes all entities on a cell to either move with the same constant velocity in the direction of their destination, or remain sta- tionary to ensure safety. This policy abstracts more complex motion mod- eling. We establish these safety and progress properties through systematic asser- tional reasoning. For safety properties, we establish inductive invariants and for stabilization we use global ranking functions. To show that all entities reach their destinations (when feasible), we use a combination of ranking functions and fairness-based reasoning on infinite executions. These proof techniques may serve as a template for the analysis of other distributed traffic control protocols. Our analysis is generally independent of the size of the environ- ment, number of cells, and number of entities. Additionally, only neighbor- ing cells communicate with one another and the communication topology is fixed (aside from failures). For these reasons, this problem can serve as a case study in automatic parameterized verification of distributed cyber-physical systems [28, 29, 30, 31]. We present simulation results that illustrate the influence (or the lack thereof) of several factors on throughput. (a) Throughput decreases exponentially with 3 1 1.5 2 2.5 3 3.5 4 4.5 5 1 1.5 2 2.5 3 3.5 4 4.5 5 1S 2 3 4 5 T 6 7 8 9 10 11S 12 13 14 15 16 17 18 T x y Figure 1: Source cells ( 1 and 11 ) pro- duce entities that flow toward the target cell ( 18 and 5 ) of the appropriate color. Source-to-target paths overlap at cells 8 and 13 . In this execution, the blue entity on cell 7 is waiting for the red entities to leave the overlapping cells. 6T 11 12 13 14 15 16S 20T 10S Figure 2: If cell 10 moved its blue entities onto the shared one-lane “bridge” ( 11 , 12 , 13 , 14 , 15 ), then all entities would be deadlocked. path length until saturation, as which point it decreases roughly linearly with path length. (b) Throughput decreases roughly linearly with required safety separation and cell velocity. (c) Throughput decreases roughly exponentially until it saturates as a function of path complexity measured in number of turns along a path. (d) Throughput decreases roughly exponentially with failure rate, and increases linearly with recovery rates, under a model where crash failures are not permanent and cells may recover from crashing. (e) Through- put decreases roughly exponentially until it saturates as a function of the per- centage of overlapping cells between different colored targets. Contributions over Previous Work. In previous work [32], we analyzed a similar problem, but have significantly generalized our results in this paper. (A) We consider general tessellations (including triangulations) that define the partitioning, while we considered uniform square partitions in [32]. We also present results on partitioning schemes that cannot work for our for- mulation of the problem. (B) We allow entities of multiple colors, each flowing to a different target, while in [32], we only allowed entities of one color, all of which flowed to the nearest target. This generalization lets source-to-target paths of differ- ent colors overlap, creating intersections, and requires several changes to the algorithm, including adding a mutual exclusion and scheduling mech- anisms used to control traffic intersections. This generalization is signif- icant because it makes the problem applicable to a much wider class of systems. 4 (C) We extended our simulation results to allow for these generalizations, and characterized the cost on throughput due to the extra coordination re- quired to allow multiple colors. Paper Organization. The rest of the paper is organized as follows. First, Sec- tion 2 introduces the model of the physical system. Next in Section 3, we present the distributed traffic control algorithm. Then in Section 4, we define and prove the safety and progress properties. Subsection 4.1 establishes safety. Subsequently, we establish a progress property that shows entities eventually reach their targets in spite of failures (when possible). First in Subsection 4.2, it is shown that the routing protocol to find any target from any cell with a physical path through non-faulty cells to that target is self-stabilizing. Then in Subsection 4.3, we show how overlapping paths to different targets (traffic intersections) can be scheduled. Finally, in Subsection 4.4, it is shown that en- tities on any cell with a feasible physical path to their target eventually reach their target. Simulation results and interpretation are presented in Section 5, followed by a brief discussion of related work and further extensions, and a conclusion in Sections 6, 7, and 8. 2. Physical System Model We describe the physical system in this section. For a set K , we define K ⊥ 4 = K ∪ {⊥} and K ∞ 4 = K ∪ {∞} . For N ∈ N , let [ N ] 4 = { 1 , . . . , N } . The ||·|| brackets are used for the Euclidean norm of a vector. Partitioning. The system consists of N convex polygonal cells partitioning a polygonal environment. Let ID 4 = [ N ] be the set of unique identifiers for all cells in the system.The planar environment Env is some given simply con- nected polygon. A partition P of Env is a set of closed, convex polygonal cells { P i } i ∈ ID such that: (a) the interiors of the cells are pairwise disjoint, (b) the union of the cells is the original polygonal environment, and (c) cells only touch one another at a point or along an entire side. The first two conditions are the standard definition of a partition, while the third restricts any cell from being adjacent along one of its sides to more than one other cell. Thus, cell i occupies a convex polygon P i in the Euclidean plane. The boundary of cell i is denoted by ∂ P i . We denote the vertices (ex- treme points) of P i as V i . We denote the number of sides of P i as ns ( i ) . Let Side ( i, j ) 4 = ∂ P i ∩ ∂ P j be the common side of adjacent cells i and j —we will refer to Side ( i, j ) as both an index and a line segment (set of points). 5 Communications. Cell i is said to be a neighbor of cell j if the boundaries of the cells share a common side. The set of identifiers of all neighbors of cell i is denoted by Nbrs i . This definition of neighbors can naturally be represented as a graph, so let ∆ be the worst-case diameter of such a neighbor communication graph 1 . For each cell i ∈ ID and each neighboring cell j ∈ Nbrs i , let the side normal vector from i to j , denoted n ( i, j ) , be the unit vector orthogonal to Side ( i, j ) and pointing into cell j from the common side Side ( i, j ) . Each cell is controlled by software that implements the distributed traffic control algorithm described in the next section. We consider synchronous pro- tocols that operate in rounds. At each round, each cell exchanges messages bearing state information with its neighbors. Then, each cell updates its soft- ware state and decides the (possibly zero) velocity with which to move any entities on it. Until the beginning of the next round, the cells continue to oper- ate according to this velocity, which may lead to entity transfers. Entities. Each cell may contain a number of entities . Each entity occupies a cir- cular area and represents a physical object (or overapproximation of) such as an aircraft, car, robot, or package. Every entity that may ever be in the system has a unique identifier drawn from an index set I . This assumption is for pre- sentation only, and the algorithm does not rely on knowing entity identifiers. For an entity p ∈ I , we denote the coordinates of its center by p 4 = ( p x , p y ) ∈ R 2 . The open circular area (disc) centered at p of radius r representing entity p is denoted B ( p , l ) . The radius of an entity is l , and r s is the minimum required inter-entity safety gap. We define the total safety spacing radius as d 4 = r s + l . For simplicity of presentation, we work with uniform entity radii l and safety gaps r s . If they differ, we would take l and r s to be the maximums over all entities. We instantiate B ( p , l ) , which represents the physical space occupied by entity p , and we also instantiate B ( p , d ) , which is entity p ’s total safety area. Entity Colors, Source Cells, and Target Cells. There are | C | types (or colors ) of entities, where C is some finite, ordered set. The color of some entity p ∈ I is denoted as color ( p ) . For each c ∈ C , there is a source cell sid c and a target cell tid c . All other cells are ordinary cells . For simplicity of presentation, we assume there is a unique source and target, but the algorithms and the results generalize for when sid c and tid c are sets. Entity p ’s color color ( p ) designates the target cell entity p should eventually reach. The source sid c produces entities of color c and the target tid c consumes entities of color c . The sets of target and source identifiers are denoted ID T ⊆ ID and ID S ⊆ ID , respectively. Entity Movement. All the entities within a cell move identically—either they remain stationary or they move with some constant velocity 0 < v < l in the 1 The diameter of this graph is not static, it may change due to failures, but the worst case is always a path graph, so ∆ = N . 6 direction of one of the sides of the cell. Thus v is the maximum cell velocity, or the greatest distance traveled by any entity over one synchronous round. We require v > 0 to ensure progress. We require that v < l to ensure entities do not collide when transfers occur. Cell velocity may differ in each cell so long as each is upper bounded by v . This movement is determined by the algorithm controlling each cell. When a moving entity touches a side of a cell, it is instantaneously transferred to the neighboring cell beyond that side, so that the entity is entirely contained in the new cell. Safety and Transfer Regions. The safety region on side s of a cell is the area within the cell where (the centers of) new entities entering the cell from side s can be placed. For a side s of some cell i , the safety region on side s SR i ( s ) is the area on P i at most 3 d distance measured orthogonally from side s . Analo- gously, the transfer region on side s of a cell is the area within a cell where (the centers of) entities reside when those entities will be transfered to the neigh- boring cell on that side. The transfer region on side s TR i ( s ) is the region in the partition P i at most l distance measured orthogonally from side s . For a cell i , the transfer region TR i and safety region SR i are respectively the unions of TR i ( s ) and SR i ( s ) for each side s of P i . We refer to the inner side(s) of TR i , TR i ( s ) , SR i , or SR i ( s ) , as the side(s) touching the inside of the annulus, and denote them by ITR i , ITR i ( s ) , etc. For example, in Figure 3, the transfer region for the square cell 3 is the square annulus between the smaller cyan square and the larger blue square (the boundary ∂ P 3 of cell 3 ). Similarly, for the triangular cell 1 in Figure 3, the transfer region is the triangular annulus between the smaller cyan triangle and the larger blue triangle. Thus, the distance measured orthogonally between the sides of the cyan polygons representing the boundary of the transfer region, and the sides of the blue polygons is always l . In Figure 3, for the square cell 3 , the safety region is the square annulus between the smaller red square and the larger blue square. Geometric Assumptions. We assume that the polygonal environment Env and its partition P have shapes and sizes such that each cell in the partition is large enough for an entity to lie completely on it. Particularly, we require for each cell i ∈ ID that the transfer region TR i is nonempty. We also assume the following assumptions to ensure transferring entities between cells is well-defined. Assumption 1. (Projection Property): For each i ∈ ID , for each side s of P i , there exists a constant vector field over P i that drives every point in P i to some point on side s without exiting P i . By definition, the cells form a partition. However, partially because there is “empty space” between the transfer regions of the cells, the transfer regions do not form a partition. Even if we remove this empty space by translating the transfer regions so the sides of transfer regions of neighboring cells coin- cide, they still may not form a partition (see Figure 3 for an example where the transfer regions cannot form a partition). This is because, for the shared side s 7 98 99 100 101 102 103 0 1 2 3 4 5 1T 2S 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 x y Figure 3: Safety regions (areas between red and blue) and transfer regions (areas between cyan and blue) for the squares and triangles compos- ing the snub square tiling tessellation. 95 100 105 0 2 4 6 8 10 1 2 3S 4S 5S 6S 7 8 9 10 T 11 12 13 14 15 T 16 17 18 19 20 21 22 23 T 24T x y Figure 4: Blue and red paths overlap at cells 2 , 3 , and 4 . Blue entities on cells 7 and 8 have traversed the intersection and then the red source ( 4 ) produces entities. Red and blue sources pro- ducing entities simultaneously would cause a deadlock. of neighboring cells i and j , the inner sides of the transfer regions on P i and P j may have different lengths, even though the shared side s obviously had the same length for P i and P j . Assumption 2. (Transfer Feasibility): For any i ∈ ID and any j ∈ Nbrs i , con- sider their common side Side ( i, j ) . The length of the inner side ITR i ( Side ( i, j )) line segment equals the length of the inner side ITR j ( Side ( i, j )) line segment. 3. Distributed Traffic Control Algorithm Next, we describe the discrete transition system Cell i that specifies the soft- ware controlling an individual cell P i of the partition P . Preliminaries. A variable is a name with an associate type. For a variable x , its type is denoted by type ( x ) and it is the set of values that x can take. A valuation (or state) for a set of variables X is denoted by x , and is a function that maps each x ∈ X to a point in type ( x ) . Given a valuation x for X , the valuation for a particular variable v ∈ X , denoted by x .v , is the restriction of x to { v } . The set of all possible valuations of X is denoted by val ( X ) . Many variables return cell identifiers that we use to access variables of other cells using subscripts, and if the valuation of these variables are restricted to the same state, we will drop the particular state on the subscripted variables for more concise notation. For instance, suppose x . next i ∈ ID , then x . next x . next i would be written x . next next i . A discrete transition system A is a tuple 〈 X, Q 0 , A, →〉 , where: (i) X is a set of variables and val ( X ) is called the set of states , 8 1 2S 3 4 5S 6T 7 8S 9 10 11S 12 13T 14T 15T 16 17T 18T 19 20S 21 22 23 24S 25 Figure 5: Example illustrating the computa- tion of the color-shared cells and shared colors, stored in the pint [ c ] and lcs i [ c ] variables, re- spectively. The color-shared cells are any cells on overlapping paths, and lcs i [ c ] corresponds to the colors of each disjoint set of color-shared cells. 1T 2 3 4 5S 6 7 8 9 10 11S 12T 13 14 15 16 17 18 19 20 21S 22 23 24 25T Figure 6: Example illustrating the two fairness requirements (Assumption 4) for proving live- ness. Cells 9 , 14 , 19 , and 20 failed, causing the original source-target path for blue to change from cells 5 , 10 , 15 , 20 , 25 . If source cell 5 does not place new entities fairly, then entities on cells 10 and 15 may never reach the target. A similar situation occurs with paths of multiple colors in the lower part of the image. (ii) Q 0 ⊆ val ( X ) is the set of start states , (iii) A is a set of transition names , and (iv) →⊆ val ( X ) × A × val ( X ) is a set of discrete transitions . For ( x k , a, x k +1 ) ∈→ , we also use the notation x k a → x k +1 . An execution fragment of a discrete transition system A is a (possibly infinite) sequence of states α = x 0 , x 1 , . . . , such that for each index appearing in α , ( x k , a, x k +1 ) ∈→ for some a ∈ A . An execution is an execution fragment with x 0 ∈ Q 0 . A state x is said to be reachable if there exists a finite execution that ends in x . A is said to be safe with respect to a set S ⊆ val ( X ) if all reachable states are contained in S . A set S is said to be stable if, for each ( x , a, x ′ ) ∈→ , x ∈ S implies that x ′ ∈ S . A is said to stabilize to S if S is stable and every execution fragment eventually enters S . Cells. We assume messages are delivered within bounded time and computa- tions are instantaneous. Under these assumptions, the system can be modeled as a collection of discrete transition systems. The overall system is obtained by composing the transition systems of the individual cells. We first present the discrete transition system corresponding to each cell, and then describe the composition. The variables associated with each Cell i are as follows, with initial values of the variables shown in Figure 7 using the ‘:=’ notation. 9 (a) Entities i is the set of identifiers for entities located on cell i . Cell i is said to be nonempty if Entities i 6 = ∅ . (b) color i designates the entity colors on the cell, or ⊥ if there are none 2 . (c) failed i indicates whether or not i has failed. (d) NEPrev i are the nonempty neighbors attempting to move entities (of any color) toward cell i . (e) token i is a token used for fairness to indicate which neighbor may move toward i . (f) signal i is the identifier of a neighbor of Cell i that has permission to move toward Cell i . Additionally, the following variables are defined as arrays for each color c ∈ C . The notation next i [ c ] means the c th entry of the next variable of cell i , and so on for the other variables. (a) next i [ c ] is the neighbor towards which i attempts to move entities of color c . (b) dist i [ c ] is the estimated distance—the number of cells—to the nearest target cell consuming entities of color c . (c) lock i [ c ] is a boolean variable for a lock of color c that some cells require to be able to move entities. (d) path i [ c ] is the set of cell identifiers from any source of color c (and any nonempty cell with entities of color c ) to the target of color c . This vari- able and the next two are local variables, but they are storing some global information. (e) pint i [ c ] is the set of cell identifiers in traffic intersections with cells of color c (where path i [ c ] and path i [ d ] have nonempty intersection for some d 6 = c ). (f) lcs i [ c ] is the set of colors that are involved in traffic intersections with the color c path. When clear from context, the subscripts in the names of the variables are dropped. A state of Cell i refers to a valuation of all these variables, i.e., a function that maps each variable to a value of the corresponding type. The complete system is an automaton, called System , consisting of the composition of all the cells. A state of System is a valuation of all the variables for all the cells. We refer to states of System with bold letters x , x ′ , etc. 2 It will be established that cells contain entities of only a single color, see Invariant 3. 10 1 variables Entities : Set [ P ] : = {} 3 NEPrev : Set [ ID ⊥ ] : = {} signal , token : ID ⊥ : = ⊥ 5 color : C ⊥ : = ⊥ failed : B : = false 7 next : [ C → ID ⊥ ] , init ∀ c ∈ C , next [ c ] := ⊥ dist : [ C → N ∞ ] , init ∀ c ∈ C , dist [ c ] := ∞ 9 path : [ C → Set [ ID ⊥ ]] , init ∀ c ∈ C , path [ c ] : = {} pint : [ C → Set [ ID ⊥ ]] , init ∀ c ∈ C , pint [ c ] : = {} 11 nlock : [ C → B ] , init ∀ c ∈ C , nlock : = true lock : [ C → B ] , init ∀ c ∈ C , lock : = false 13 lcs : [ C → Set [ C ]] , init ∀ c ∈ C , lcs [ c ] : = {} 14 transitions fail ( i ) 16 eff failed : = true for each c ∈ C 18 dist [ c ] : = ∞ ; next [ c ] : = ⊥ 20 update eff Route ; Lock ; Signal ; Move Figure 7: Specification of Cell i listing its variables, initial conditions, and transitions. Sub- scripts are dropped for readability. Variables token i , failed i , lock i , and NEPrev i are private to Cell i , while Entities i , dist i , next i , path i , color i , and signal i can be read by neighboring cells of Cell i . This has the following interpretation for an actual message-passing implemen- tation. At the beginning of each round, Cell i broadcasts messages containing the values of these variables and receives similar values from its neighbors. Then, the computation of this round updates the local variables for each cell based on the values collected from its neighbors. Variable Entities i is a special variable because it can also be written to by the neighbors of i . This is how we model transfer of entities between cells. For a state x , for some a ∈ A such that x a → x ′ , for some i ∈ ID , for some j ∈ Nbrs i , for some entity p ∈ x . Entities i , then entity p transfers from cell i to j when p ∈ x ′ . Entities j . We use the notation p ′ to denote the state of entity p at x ′ where x a → x ′ for some a ∈ A . Actions for the Composed System. System is a discrete transition system model- ing the composition of all the cells, and has two types actions: fail s and update s. A fail ( i ) transition models the crash failure of the i th cell and sets failed i to true , dist i [ c ] to ∞ for each c ∈ C , and next i [ c ] to ⊥ for each c ∈ C . Cell i is called faulty if failed i is true , otherwise it is called non-faulty . The set of identifiers of all faulty and non-faulty cells at a state x is denoted by F ( x ) and NF ( x ) , respectively. A faulty cell does nothing—it never moves and it never commu- nicates 3 . An update transition models the evolution of all non-faulty cells over one synchronous round. For readability, we describe the state-change caused by an update transition as a sequence of four functions (subroutines), where for each non-faulty i , (a) Route computes the variables dist i and next i , 3 dist i = ∞ can be interpreted as i ’s neighbors not receiving a timely response from i . 11 1 if ¬ failed i then color i : = { c ∈ C : ∃ p ∈ Entities i ∧ color ( p ) = c } 3 if i / ∈ ID T then for each c ∈ C 5 dist i [ c ] : = ( min j ∈ Nbrs i dist j [ c ] ) + 1 if dist i [ c ] = ∞ then next i [ c ] : = ⊥ 7 else next i [ c ] : = argmin j ∈ Nbrs i 〈 dist j [ c ] , j 〉 Figure 8: Route function for Cell i . This function computes a minimum distance vector routing spanning tree rooted composed of non-faulty cells for each color, rooted at each target. (b) Lock computes the variables path i , pint i , lcs i , and lock i , (c) Signal computes (primarily) the variable signal i , and (d) Move computes the new positions of entities. We note that in the single-color case considered in [32], the Lock subroutine is unnecessary. The entire update transition is atomic, so there is no possibility to interleave fail transitions between the subroutines of update . Thus, the state of System at (the beginning of) round k + 1 is obtained by applying these four functions to the state at round k . Now we proceed to describe the distributed traffic control algorithm that is implemented through these functions. Route . For each cell and each color, the Route function (Figure 8) constructs a distance-based routing table to the target cell of that color. This relies only on neighbors’ estimates of distance to the target. Recall that failed cells have dist [ c ] set to ∞ for every color c ∈ C . From a state x , for each i ∈ NF ( x ) , the variable dist i [ c ] is updated as 1 plus the minimum value of dist j [ c ] for each neighbor j of i . If this results in dist i [ c ] being infinity, then next i [ c ] is set to ⊥ , but otherwise it is set to be the identifier with the minimum dist [ c ] where ties are broken with neighbor identifiers. Next, we introduce some definitions used to relate the system state to the variables used in the algorithm. For a state x , we inductively define the color c target distance ρ c of a cell i ∈ ID as the smallest number of non-faulty cells between i and tid c : ρ c ( x , i ) 4 =          ∞ if x . failed i , 0 if i = tid c ∧ ¬ x . failed i , 1 + min j ∈ x . Nbrs i ρ c ( x , j ) otherwise . A cell is said to be target-connected to color c if ρ c is finite. We define TC ( x , c ) 4 = { i ∈ NF ( x ) | ρ c ( x , i ) < ∞} 12 as the set of cells that are target-connected to tid c . For a state x and a color c ∈ C , we define the routing graph as G R ( x , c ) = ( V R ( x , c ) , E R ( x , c )) , where the vertices and directed edges are, respectively, V R ( x , c ) 4 = NF ( x ) and E R ( x , c ) 4 = { ( i, j ) ∈ V R ( x , c ) : ρ c ( x , j ) = ρ c ( x , i ) + 1 } . Under this definition, G R ( x , c ) is a spanning tree rooted at tid c . We will show that the graph induced by the next i [ c ] variables stabilizes to the routing graph G R ( x , c ) at some state x (). We previously introduced ∆ as the worst-case diam- eter of the communication graph, and will refer to ∆( x ) as the exact diameter at some state x . Lock . The Lock function (Figure 9) executes after Route , and schedules traf- fic over intersections (the cells where source-to-target paths of different colors overlap). To avoid deadlock scenarios, Lock maintains an invariant that entities of at most one color are on these intersections. Moving entities over intersections requires some global coordination as il- lustrated by the following analogy. Consider the policy used to coordinate cars going in opposite directions over a one-lane bridge (see Figure 2), where there is a traffic signal on each side of the bridge. The algorithm chooses one traf- fic light, allowing some cars to safely travel over the bridge in one direction. After some time, the algorithm switches the lights (first turning green to red, and after the road is clear, turning red to green) allowing traffic to flow in the opposite direction. Then this process repeats. Two parts of the previous example require global coordination and are in- cluded in the Lock function. The first is how to chose the direction in which cars are allowed to travel—this is accomplished through the use of a mutual exclusion algorithm. The second is when to allow cars to travel in the opposite direction—this is accomplished by determining when the intersection is empty. We now describe this global coordination more formally. For defining the locking algorithm, we first define intersections. For this we introduce the notion of an entity graph. Cell i is said to be in the entity graph of some color c at state x if one of the following conditions hold: (a) i is sid c , (b) in state x , i has entities of color c , or (c) in state x , i is the neighbor closest to tid c of a cell already in the entity graph. Formally, we define the color c entity graph at state x as G E ( x , c ) = ( V E ( x , c ) , E E ( x , c )) , which is the following subgraph of the color c routing graph G R ( x , c ) . The vertices of G E ( x , c ) are inductively defined as V E ( x , c ) 4 = { i ∈ NF ( x ) : i = sid c ∨ x . color i = c ∨ ( ∃ j ∈ V E ( x , c ) . ( i, j ) ∈ E R ( x , c )) } . The edges of G E ( x , c ) are E E ( x , c ) 4 = { ( i, j ) ∈ V E ( x , c ) × V E ( x , c ) : ( i, j ) ∈ E R ( x , c ) } . For example, if all cells are empty, then V E ( x , c ) is the sequence of cell identifiers defined by following the minimum distance (as defined by ρ c ) 13 1 if ¬ failed i for each c ∈ C 3 if i = sid c ∨ color i = c ∨ i ∈ path i [ c ] then path i [ c ] : = path i [ c ] ∪ { i } ∪ { next i [ c ] } 5 // gossip the entity graph for each j ∈ Nbrs i , path i [ c ] : = path i [ c ] ∪ path j [ c ] 7 // compute the set of color-shared cells pint i [ c ] := { j ∈ path i [ c ] ∩ path i [ d ] : ∃ c 6 = d ∈ C } 9 if pint i [ c ] 6 = ∅ lcs i [ c ] := 11 { d ∈ C : c 6 = d ∧ path i [ c ] ∩ path i [ d ] 6 = ∅} // graphs stabilized and i needs a lock for color c 13 if round > 2∆ ∧ i ∈ pint i [ c ] ∧ ¬ lock i [ c ] Initiate mutual exclusion algorithm between all 15 color-shared cells in pint i [ c ] using lcs i as input Eventually, a color d is returned. 17 On return, if d = c then lock i [ c ] := true // detect if color-shared cells are empty 19 if round > 2∆ ∧ i ∈ pint i [ c ] ∧ lock i [ c ] Initiate distributed snapshot algorithm to decide 21 if all color-shared cells are empty after previously being nonempty with entities of color c. 23 On return, if all cells are empty then lock i [ c ] := false Figure 9: Lock function for Cell i . This function computes the color-shared cells—the cells in intersections—for each color, and then ensures liveness by giving a lock to only one color on each intersection. from the source to the target of color c . That is, each G E ( x , c ) is a simple path graph from source to target 4 . Now we describe how the entity graph of each color c is computed by each cell i as the path i [ c ] variable. If i is on the entity graph of color c , then we add i and i ’s next variable for color c to the entity graph (see Figure 9, lines 3 and 4). Once the next i [ c ] variables stabilize () and after an additional order of diameter rounds, the variable path i [ c ] contains all the entity graphs since we gossip these graphs (line 6). That is, the graph formed by the path i [ c ] variables stabilizes to equal G E ( x , c ) , and contains the sequence of identifiers from any source or nonempty cell of color c to the target of color c (). Next, the variable pint i [ c ] is computed to be the set of cell identifiers on the color c entity graph that overlaps with any other colored entity graph (line 8). The cells involved in such non-empty intersections represent physical traffic intersections, and are called color-shared cells . These cells require coordinated locking for traffic flow to progress. Cell i is in pint i [ c ] if and only if it will need a lock for color c . Formally, we define the c color-shared cells , for a state x , for any c ∈ C , as CSC ( x , c ) = { V E ( x , c ) : ∃ d ∈ C.c 6 = d ∧ V E ( x , c ) ∩ V E ( x , d ) 6 = ∅} . 4 Once cells have failed, this may stabilize to be a tree from any cell with entities of color c to the target of color c . 14 if ¬ failed i ∧ round > 2∆ then 2 cn := { d ∈ C : ∃ j ∈ Nbrs i s.t. next j [ d ] = i ∧ color j = d } 4 if color i = ⊥ then c := choose from cn 6 else c := color i 8 NEPrev i : = { j ∈ Nbrs i : next j [ c ] = i ∧ Entities j 6 = ∅} if token i = ⊥ then 12 token i : = choose from NEPrev i 14 let j = token i if ∀ p ∈ Entities i : p / ∈ SR ( i, j ) 16 ∧ ( color i 6 = ⊥ ⇒ color i = color j ) ∧ ( j ∈ pint i [ c ] ⇒ lock j [ c ]) 18 then signal i : = j 20 if | NEPrev i | > 1 then token i : = choose from NEPrev i \ { j } 22 elseif | NEPrev i | = 1 then token i : = choose from NEPrev i 24 else token i : = ⊥ else signal i : = ⊥ ; token i : = j Figure 10: Signal function for Cell i . Cell i signals fairly to some neighbor j if it is safe for j to move its entities toward i . In Figure 1, these are cells 8 and 12 . The pint i [ c ] variables stabilize to equal CSC ( x , c ) , at some state x , for any color c (). Next, we need to determine the colors that will need to coordinate to sched- ule traffic through the color-shared cells. Then, a mutual exclusion algorithm is initiated between all cells for each disjoint set of cell colors in pint i [ c ] . Formally, we define the c shared colors , for a state x , for any c ∈ C , as SC ( x , c ) = { d ∈ C : c 6 = d ∧ CSC ( x , d ) = CSC ( x , c ) } . The lcs i [ c ] variables stabilize at some state x to equal SC ( x , c ) , for any color c . In general, up to | C | colors could be involved in intersections, as well as all the smaller permutations. For instance, consider Figure 5 with 6 colors at some state x . Here, the blue and red entity graphs overlap, green and blue entity graphs overlap, but red and green do not, and independently, the purple and yellow entity graphs overlap (that is, not with blue, red, nor green), but no colors overlap with brown. Then SC ( x , c ) is { blue, red, green } for c equal to blue, red, or green, SC ( x , c ) is { yellow, purple } for c equal to yellow or purple, and SC ( x , c ) is empty for c equal to brown. Two mutual exclusion algorithms would be initiated, one with blue, red, and green as the input set of values, and another with yellow and purple as the input set. Upon these two instances terminating, one element of the first set, say green , would be chosen and given a lock, and one element, say yellow , of the second set would also be given a lock. The entities of these colors progress over the color-shared cells toward their intended targets. Finally, once the color-shared cells are empty again, green and yellow would each be removed from the respective input sets for fairness, and another mutual exclusion algorithm is initiated. Signal . The Signal function (Figure 10) executes after Lock . It is the key part of the protocol for maintaining safe entity separations, guaranteeing each cell has entities of only a single color, and ensuring progress of entities to the target. Roughly, each cell implements this through the following policies: (a) only 15 accept entities from a neighbor when it is safe to do so, (b) only accept entities with the same color as the entities currently on the cell (or an arbitrary color if the cell is empty), (c) if a lock is needed, then only let entities move if it is acquired, and (d) ensure fairness by providing opportunities infinitely often for each nonempty neighbor to make progress. First i computes a temporary variable cn , which is the set of colors for any neighbor that has entities of some color, with the corresponding next variable set to cell i . Next, cell i picks a color c from this set if it is empty, or the color of its own entities if it is nonempty, and will attempt to allow some cell with this chosen color to move toward itself. Then, cell i sets NEPrev i to be the subset of Nbrs i for which next has been set to i and Entities is nonempty. If token i is ⊥ , then it is set to some arbitrary value in NEPrev i , but it continues to be ⊥ if NEPrev i is empty. Otherwise, token i = j for some neighbor j of i with nonempty Entities j . This is accomplished through the conditional in line 6 as a step in guaranteeing fairness. It is then checked if there is any entity p with center p in the safety region of Cell i on the side corresponding to token i . If there is such an entity, then signal i is set to ⊥ , which blocks the neighboring cell with identifier token i from moving its entities in the direction of i , thus preventing entity transfers and ensuring safety. Otherwise, if there is no entity with center in the safety region on side token i , then signal i is set to token i to allow token i to move its entities toward i . Subsequently, token i is updated to a value in NEPrev i that is different from its previous value, if that is possible according to the rules just described (lines 20–22). Move . Finally, the Move function (Figure 11) models the physical movement of all the entities on cell i over a given round. For cell i , let j be next i [ c ] , where c is color i (which may be ⊥ if cell i has no entities). Every entity in Entities i moves in the direction of j if and only if signal j is set to i . The direction followed from cell i to j is u ( i, j ) , which is any vector satisfying Assumption 1. For example, for a square (or rectangular) cell i , one choice for u ( i, j ) is the unit vector orthogonal to Side ( i, j ) and pointing into j . In the case of an equilateral triangular cell i , one choice for u ( i, j ) is also any orthogonal vector pointing into j . The movement toward cell j may lead to some entities crossing the bound- ary of Cell i into Cell j , in which case, they are removed from Entities i . If j is not the target matching the transferred entities’ color, then the removed entities are added to Entities j . In this case (line 9), any transferred entity p is placed so that D l ( p ) touches a single point of (is tangent to) Side ( i, j ) , the shared side of cells i and j , and lies on the inner side of the transfer region of cell j on side Side ( i, j ) . Resetting entity positions is a conservative approximation to the ac- tual physical movement of entities. If j is the target matching the transferred entities’ color, then the removed entities are not added to any cell and thus no longer exist in System . The source cells i ∈ ID S , in addition to the above, add a finite number of entities in each round to Entities i , such that the addition of these entities does 16 1 let c = color i let j = next i [ c ] 3 if ¬ failed i ∧ signal j = i then for each p ∈ Entities i 5 p : = p + v u ( i, j ) if p ∈ TR i then 7 Entities i : = Entities i \ { p } if j 6 = tid c then 9 Entities j : = Entities j ∪ { p } // point on shared side along movement vector 11 p : = ( p + v u ( i, j )) ∩ Side ( i, j ) // inner transfer region along orthogonal line 13 p : = ( p + n ( i, j )) ∩ ITR j ( Side ( i, j )) Figure 11: Move function for Cell i . If i has received a signal to move from j , it updates the positions of all entities on it to move in j ’s direction, which may lead to some entities transferring from cell i to j . not violate the minimum gap between entities at Cell i . In the remainder of the paper, we will analyze System to show that in spite of failures, it maintains safety and liveness properties to be introduced in the next section. 4. Analysis of Distributed Traffic Control In this section, we present an analysis of the safety and liveness properties of System . Roughly, the safety property requires that there is a minimum gap between entities on any cell, and the liveness property requires that all entities that reside on cells with feasible paths to the corresponding target eventually reach that target. 4.1. Safety and Collision Avoidance A state is safe if, for every cell, the boundaries of all entities in the cell are separated by a distance of r s . For any state x of System , we define: Safe i ( x ) 4 = ∀ p, q ∈ x . Entities i .p 6 = q ⇒ || p − q || ≥ 2 l + r s , and Safe ( x ) 4 = ∀ i ∈ ID , Safe i ( x ) . This definition allows entities in different cells to be closer than 2 l + r s apart, but their centers will be spaced by at least 2 l . We proceed by proving some pre- liminary properties of System that will be used for proving Safe is an invariant. The first property asserts that entities’ cannot come close enough to the sides of cells to reside on multiple cells. This is because any entity whose boundary touches the side of a cell is transferred to the neighboring cell on that side (if one exists), and then the entity’s position is reset to be completely within the new cell. Assumption 2 restricts the allowed partitions to ensure en- tity transfers are well-defined. For instance, some of the cells in the snub square 17 tiling in Figure 3 do not satisfy Assumption 2. Consider an entity transfer from cell 3 to cell 5 . There is no constant vector connecting the transfer regions of cell 3 to those of cell 5 . This is because the side length of the transfer region of the triangular cell 5 is shorter than the side length of the transfer region of the square cell 3 . However, in a transfer from cell 1 to cell 2 or vice-versa, the side lengths are the same. We also note that the assumption is only necessary for entity transfers from a cell with a longer transfer side length to a neighboring cell with smaller corresponding transfer side length. For example, a transfer from cell 5 to cell 3 is feasible. Under Assumption 2, we have the following invariant, which states that the l -ball around each entity in a cell is completely contained within the cell. Invariant 1. In any reachable state x , ∀ i ∈ ID , ∀ p ∈ x . Entities i , D l ( p ) \ P i = ∅ . The next invariant states that cells’ Entities sets are disjoint. This is imme- diate from the Move function since entities are only added to one cell’s Entities upon being removed from a different cell’s Entities . Invariant 2. In any reachable state x , for any i, j ∈ ID , if i 6 = j , then x . Entities i ∩ x . Entities j = ∅ . The following invariant states that cells contain entities of a single color in spite of failures. This follows from the Signal routine in Figure 10, where line 16 requires that if some neighbor j is attempting to move entities toward cell i , then the color of i is either ⊥ or equal to the color of j . Invariant 3. In any reachable state x , for all i ∈ ID , for all p, q ∈ x . Entities i , color ( p ) = color ( q ) . Next, we define a predicate that states that if signal i is set to the identifier of some neighbor j ∈ Nbrs i , then there is a large enough area from the common side between i and j where no entities reside in Cell i . Recall that Side ( i, j ) is the line segment shared between neighboring cells i and j . For a state x , H ( x ) 4 = ∀ i ∈ ID , ∀ j ∈ Nbrs i , if x . signal i = j , then the following holds: ∀ p ∈ x . Entities i , min x ∈ Side ( i,j ) || p − x || ≥ 3 d. H ( x ) is not an invariant property because once entities move the property may be violated. However, for proving safety, all that needs to be established is that at the point of computation of the signal variable this property holds. The next key lemma states this. Lemma 4. For all reachable states x , H ( x ) ⇒ H ( x S ) where x S is the state obtained by applying the Route , Lock , and Signal functions to x . Proof. Fix a reachable state x , an i ∈ ID , and an j ∈ Nbrs i such that x . signal i = j . Let x R be the state obtained by applying the Route function to x , x L be the state obtained by applying the Lock function to x R , and x S be the state obtained by applying the Signal function to x L . 18 First, observe that both H ( x R ) and H ( x L ) hold. This is because the Route and Lock functions do not change any of the variables involved in the defini- tion of H ( · ) . Next, we show that H ( x L ) implies H ( x S ) . If x S . signal i 6 = j then the statement holds vacuously. Otherwise, x S . signal i = j , then since (a) H ( x L ) holds, and (b) Figure 10, line 6 is satisfied, we have that H ( x S ) . The following lemma asserts that if there is a cycle of length two formed by the signal variables—which could occur due to failures—then entity transfers cannot occur between the involved cells in that round. Lemma 5. Let x be any reachable state and x ′ be a state that is reached from x af- ter a single update transition (round). If x . signal i = j and x . signal j = i , then x . Entities i = x ′ . Entities i and x . Entities j = x ′ . Entities j . Proof. No entities enter either x ′ . Entities i or x ′ . Entities j from any other m ∈ Nbrs i or n ∈ Nbrs j since x . signal i = j and x . signal j = i . It remains to be es- tablished that @ p ∈ x . Entities j such that p ′ ∈ x ′ . Entities i where p = p ′ or vice- versa. Suppose such a transfer occurs. For the transfer to have occurred, p must be such that p ′ = ( p x , p y ) + v u ( i, j ) by Figure 11, line 5. But for x . signal i = j to be satisfied, it must have been the case that D l ( p ) ∩ P i = ∅ by Figure 10, line 6 and since v < l , a contradiction is reached. Using the previous results, we now prove that System preserves safety even when some cells fail. Theorem 1. In any reachable state x of System , Safe ( x ) . Proof. The proof is by standard induction over the length of any execution of System . The base case is satisfied by the assumption that initial states x ∈ Q 0 satisfy Safe ( x ) . For the inductive step, consider any reachable states x , x ′ and an action a ∈ A such that x a → x ′ . Fix i ∈ ID and assuming Safe i ( x ) , we show that Safe i ( x ′ ) . If a = fail i , then Safe i ( x ′ ) since no entities move. For a = update , there are two cases to consider by Invariant 2. First, x ′ . Entities i ⊆ x . Entities i , that is, no new entities were added to i , but some may have trans- fered off i . There are two sub-cases: if x ′ . Entities i = x . Entities i , then all entities in x . Entities move identically and the spacing between two distinct entities p , q ∈ x ′ . Entities i is unchanged. Let j = next i [ c ] where c = color i by Invariant 3. That is, ∀ p, q ∈ x . Entities i , ∀ p ′ , q ′ ∈ x ′ . Entities i such that p ′ = p and q ′ = q and where p 6 = q , ∣ ∣∣ ∣ ( p ′ x , p ′ y ) − ( q ′ x , q ′ y ) ∣ ∣∣ ∣ = || ( p x , p y ) + v u ( i, j ) , ( q x , q y ) + v u ( i, j ) || (Figure 11, line 5). It follows by the inductive hypothesis that ∣ ∣∣ ∣ ( p ′ x , p ′ y ) − ( q ′ x , q ′ y ) ∣ ∣∣ ∣ ≥ d . The second sub-case arises if x ′ . Entities i ( x . Entities i , then Safe i ( x ′ ) is ei- ther vacuously satisfied or it is satisfied by the same argument just stated. The second case is when x ′ . Entities i * x . Entities i , that is, there was at least one entity transfered to i . Consider any such transferred entity p ′ ∈ x ′ . Entities i where p ′ / ∈ x . Entities i . There are two sub-cases. The first sub-case is when p ′ was added to x ′ . Entities i because i is a source, that is, i ∈ ID S . In this case, the specification of the source cells states that the entity p ′ was added to x ′ . Entities i without violating Safe i ( x ′ ) , and the proof is complete. Otherwise, 19 p ′ was added to x ′ . Entities i by some neighbor j ∈ x . Nbrs i , so p ′ ∈ x . Entities j but p ′ / ∈ x . Entities i , and p ′ ∈ x ′ . Entities i but p ′ / ∈ x ′ . Entities j . From line 9 of Figure 11, we have that that ( p ′ x , p ′ y ) = ResetEntity ( p, i, j ) . The fact that p ′ transferred from Cell j in x to Cell i in x ′ implies that x . next j = i and x . signal i = j —these are necessary conditions for the transfer by Figure 10, line 15. Thus, applying the predicate H ( x ) at state x and by Lemma 4, it follows that for ev- ery q ∈ x . Entities i , ( q x , q y ) / ∈ FR ( i, j ) . It must now be established that if p ′ is transfered to x ′ . Entities i , then every q ′ ∈ x ′ . Entities i , where q ′ 6 = p ′ satisfies ( q ′ x , q ′ y ) / ∈ FR ( i, j ) , which means that any entity q already on i did not move toward the transfered entity p that is now on i . This follows by application of Lemma 5, which states that if entities on adjacent cells move towards one another simultaneously, then a transfer of entities cannot occur. This implies that the discs of all entities q ′ in x ′ . Entities i are farther than r s of the borders of any transfered entity p ′ , implying Safe i ( x ′ ) . Finally, since i was chosen arbi- trarily, Safe ( x ′ ) . Theorem 1 shows that System is safe in spite of failures. 4.2. Stabilization of Spanning Routing Trees Next, we show under some additional assumptions, that once new failures cease to occur, System recovers to a state where each non-faulty cell with a fea- sible path to its target computes a route toward it. This route stabilization is then used in showing that any entity on a non-faulty cell with a feasible path to its target makes progress toward it. Our analysis relies on the following as- sumptions on cell failures and the placement of new entities on source cells. The first assumption states that no target cells fail, and is reasonable and nec- essary because if any target cell did fail, entities of that color obviously cannot make progress. Assumption 3. No target cells t ∈ ID T may fail. The next assumption ensures source cells place entities fairly so that they may not perpetually prevent any neighboring cell or any color-shared cell from making progress. The assumption is needed because it provides a specification of how the source cells behave, which has not been done so far. The assumption is reasonable because it essentially says that traffic is not produced perpetually without any break. Assumption 4. (Fairness): Source cells place new entities without perpetually block- ing either (i) any of their nonempty non-faulty neighbors, or (ii) any cell i ∈ CSC ( x , c ) , where c is the color of source s . Formally, the first fairness condition states, for any execution α of System , for any color c ∈ C , for any source cell sid c , if there exists an i ∈ Nbrs s , such that for every state x in α after a certain round, i ∈ x . NEPrev s , then eventually signal s becomes equal to i in some round of α . The second fairness condition states, for any execution α of System , for any state x ∈ α , for any color c ∈ C , 20 for any source cell sid c , if there exists an i ∈ NF ( x ) such that i ∈ CSC ( x , c ) , and for every state x in α after a certain round, if cell i is nonempty, then eventually signal j becomes equal to i in some round of α , where j is a neighbor of i . Such conditions can be ensured if we suppose some oracle placing entities on source cells follows the same round-robin like scheme defined in the Signal subrou- tine in Figure 10. Scenarios where each of these cases can arise are illustrated in Figure 6. A fault-free execution fragment α be a sequence of states starting from x and along which no fail ( i ) transitions occur. That is, a fault-free execution frag- ment is an execution fragment with no new failure actions, although there may be existing failures at the first state x of α , so F ( x ) need not be empty. Through- out the remainder of this section, we will consider fault-free executions that satisfy Assumptions 3 and 4. Lemma 6. Consider any reachable state x of System , any color c ∈ C , and any i ∈ TC ( x , c ) \ { tid c } . Let h = ρ c ( x , i ) . Any fault-free execution fragment α starting from x stabilizes within h rounds to a set of states S with all elements satisfying: dist i [ c ] = h, and next i [ c ] = i n , where ρ c ( x , i n ) = h − 1 . Proof. Fix an arbitrary state x , a fault-free execution fragment α starting from x , a color c ∈ C , and i ∈ TC ( x , c ) \ { tid c } . We have to show that (a) the set of states S is closed under update transitions and (b) after h rounds, the execution fragment α enters S . First, by induction on h we show that S is stable. Consider any state y ∈ S and a state y ′ that is obtained by applying an update transition to y . We have to show that y ′ ∈ S . For the base case, h = 1 , so y . dist i [ c ] = 1 and y . next i [ c ] = tid c . From lines 5 and 7 of the Route function in Figure 8, and that there is a unique tid c for each color c , it follows that y ′ . dist i [ c ] remains 1 and y ′ . next i [ c ] remains tid c . For the inductive step, the inductive hypothesis is, for any given h , if for any j ∈ NF ( x ) , y . dist j [ c ] = h and y . next j [ c ] = m , for some m ∈ ID with ρ c ( x , m ) = h − 1 , then y ′ . dist j [ c ] = h and y ′ . next j [ c ] = m. Now consider i such that ρ c ( y , i ) = ρ c ( y ′ , i ) = h + 1 . In order to show that S is closed, we have to assume that y . dist i [ c ] = h + 1 and y . next i [ c ] = m , and show that the same holds for y ′ . Since ρ c ( y ′ , i ) = h + 1 , i does not have a neighbor with target distance smaller than h . The required result follows from applying the inductive hypothesis to m and from lines 5 and 7 of Figure 8. Second, we have to show that starting from x , α enters S within h rounds. Once again, this is established by induction on h , which is ρ c ( x , i ) . Consider any state y such that ρ c ( x , i ) = ρ c ( y , i ) . The base case only includes the target distances satisfying h = ρ c ( y , i ) = 1 and follows by instantiating i n = tid c . For the inductive case, assume for the inductive hypothesis that at some state y , y . dist j [ c ] = h and y . next j [ c ] = i n such that ρ c ( y , i n ) = h − 1 , where i n is 21 the minimum identifier among all such cells (since we used cell identifiers to break ties). Observe that there is one such j ∈ y . Nbrs i by the definition of TC . Then at state y ′ , by the inductive hypothesis and lines 5 and 7 of Figure 8, y ′ . dist i [ c ] = y ′ . dist j [ c ] + 1 = h + 1 . The following corollary of Lemma 6 states that, after new failures cease occurring, for all target-connected cells, the graph induced by the next [ c ] vari- ables stabilizes to the color c routing graph, G R ( x , c ) , within at most the di- ameter of the communication graph number of rounds, which is bounded by ∆( x ) . Corollary 7. Consider any execution α of System with an arbitrary but finite se- quence of fail transitions. For any state x ∈ α at least 2∆( x ) rounds after the last fail transition, for any c ∈ C , every cell i target-connected to color c has x . next i [ c ] equal to the identifier of the next cell along such a route. The following corollary of Lemma 6 and states that within 2∆( x ) rounds after routes stabilize, for each color c ∈ C , the identifiers in the path i [ c ] vari- ables equal the vertices of the color c entity graph G E ( x , c ) . The result follows since routes stabilize and that Lock is a function of next and path variables only, and that path i variables are gossiped in Figure 9, line 6. Corollary 8. Consider any execution α of System with an arbitrary but finite se- quence of fail transitions. For any state x ∈ α at least 2∆( x ) rounds after the last fail transition, for every c ∈ C , every cell i target-connected to color c has path i [ c ] = V E ( x , c ) . The next corollary of Lemma 6 states that eventually the values of the pint [ c ] variables equal the set of color-shared cells CSC ( x , c ) for any cell i and color c . This is important because the mutual exclusion algorithm is initiated between the cells in pint [ c ] (Figure 9, line 13). Corollary 9. Consider any execution α of System with an arbitrary but finite se- quence of fail transitions. For any state x ∈ α at least 2∆( x ) rounds after the last fail transition, for every c ∈ C , every cell i target-connected to color c has x . pint [ c ] = CSC ( x , c ) . 4.3. Scheduling Entities through Color-Shared Cells In this section, we show that there is at most a single color on the set of color-shared cells if there are no failures. We then show that any cell that re- quests a lock eventually gets one, under an additional assumption that failures do not cause entities of more than one color to reside on the set of color-shared cells. Because failures cause the routing graphs and entity graphs to change, the color-shared cells that could previously be scheduled may now be dead- locked. Additionally, because we separately lock each disjoint set of color- shared cells to allow entities of some color to flow toward their target, it could be the case that the intermediate states between when the failure occurred and 22 when routes have stabilized allowed entities to move in such a way that dead- locks the system. Such deadlocks could be avoided if a centralized coordinator informs every non-faulty cell to disable their signals when a failure is detected. The assumption states that with failures, the color-shared cells either all have the same-colored entities, or have no entities (and combinations thereof). Assumption 5. Feasibility of Locking after Failures : For any reachable state x , for any color c ∈ C , consider the color-shared cells CSC ( x , c ) . For all distinct cells i, j ∈ CSC ( x , c ) either x . color i = x . color j or x . color i = ⊥ . The next lemma states that without failures, there are entities of at most a single color on the set of color-shared cells. The result is not an invariant be- cause failures may cause the set of color-shared cells to change, resulting in deadlocks, which is why we need Assumption 5. By Invariant 3, we know that there are entities of at most a single color in each cell, so the following invariant is stated in terms of the color color i of each cell. We emphasize that Assump- tion 5 is unnecessary if there are no failures, as the algorithm ensures there are entities of at most a single color on the color-shared cells by the following lemma. Lemma 10. If there are no failures, for any reachable state x , for any c ∈ C , for any i ∈ CSC ( x , c ) , if ¬ x . lock i [ c ] , then for all j ∈ CSC ( x , c ) , we have x . color j 6 = c . Proof. The proof is showing an inductive invariant, supposing no failures oc- cur. For the initial state, all cells are empty, so we have x . color i = ⊥ for any i ∈ ID . For the inductive step, we are only considering update actions by as- sumption. In the pre-state, we have ¬ x . lock i [ c ] and ∀ j ∈ CSC ( x , c ) , we have x . color j 6 = c . Fix some c ∈ C and some i ∈ CSC ( x , c ) . For any subsequent state x ′ , if x ′ . lock i [ c ] , the result follows vacuously. If ¬ x ′ . lock i [ c ] , we must show ∀ j ∈ CSC ( x , c ) that x . color j 6 = c , so fix some j ∈ CSC ( x , c ) . If j ∈ CSC ( x ′ , c ) , the result follows, since by the inductive hypothesis, x . color j = x ′ . color j 6 = c . If j / ∈ CSC ( x ′ , c ) , the condition in Signal (Figure 10, line 17) cannot be satisfied since ¬ x ′ . lock i [ c ] . Thus, no cell with entities of color c could move toward any cell in CSC ( x ′ , c ) , and we have x ′ . color j 6 = c . The next lemma states that without failures, or with “nice” failures as de- scribed by Assumption 5, that any cell requesting a lock of some color will eventually get it, and thus it may move entities onto the color-shared cells. Lemma 11. For any reachable state x satisfying Assumption 5, for any c ∈ C , for any i ∈ NF ( x ) , if i ∈ x . pint [ c ] and all cells in CSC ( x , c ) are empty, then eventually a state x ′ is reached where x ′ . lock i [ c ] . Proof. By correctness of the mutual exclusion algorithm, eventually a color d ∈ SC ( x ′ , c ) is returned and x ′ . lock i [ d ] = true (Figure 9, line 13). If c = d , then the result follows. If c 6 = d , by Lemma 10 and Assumption 5, we know that no other color aside from c has entities on any cell j ∈ CSC ( x ′ , c ) . The next time the mutual exclusion algorithm is initiated, d is excluded from the input set to the mutual exclusion algorithm (Figure 9, line 19), and by repeated argument, eventually lock i [ c ] . 23 4.4. Progress of Entities towards their Targets Using the results from the previous sections, we show that once new fail- ures cease occurring, for every color c ∈ C , every entity of color c on a cell that is target-connected eventually gets to the target of color c . The result (The- orem 2) uses two lemmas which establish that, along every infinite execution with a finite number of failures, every nonempty target-connected cell gets per- mission to move infinitely often (Lemma 13), and a permission to move allows the entities on a cell to make progress towards the target (Lemma 12). For the remainder of this section, we fix an arbitrary infinite execution α of System with a finite number of failures, satisfying Assumption 5. Let x f be any state of System at least 2∆( x ) rounds after the last failure, and α ′ be the infinite failure-free execution fragment x f , x f +1 , . . . of α starting from x f . For any c ∈ C , observe that the number of target-connected cells remains constant starting from x f for the remainder of the execution. That is, TC ( x f , c ) = TC ( x f +1 , c ) = TC ( . . . , c ) , so we fix TC ( c ) = TC ( x f , c ) . Lemma 12. For any c ∈ C , for any i ∈ TC ( c ) , for some j ∈ x f . Nbrs i , if k > f , x k . signal j = i , and x k . next i [ c ] = j , for any entity p ∈ x k . Entities i , let the distance function be defined by the lexicographically ordered tuple R ( x , p ) = 〈 ρ c ( x , i ) , ds − p 〉 , where ds is the point on the shared side Side ( i, j ) defined by the line passing through p with direction u ( i, j ) . Then, R ( x k +1 , p ) < R ( x k , p ) . Proof. The first case is when no entity transfers from i to j in the k + 1 th round: if p ′ ∈ x k +1 . Entities i such that p ′ = p , then || ds − p ′ || < || ds − p || . In this case, the result follows since a velocity v > 0 is applied towards cell j by Move in Figure 11, line 5. The second case is when some entity p transfers from i to j , so p ′ ∈ x k +1 . Entities j such that p ′ = p . In this case, we have ρ c ( x k , j ) < ρ c ( x k , i ) , since the distance between j and tid c is smaller than the distance between i and tid c since routes have stabilized by Lemma 6. In either case, R ( x k +1 , p ) < R ( x k , p ) , so entity p is closer to the appropriate target. The following lemma states that all cells with a path to the target receive a signal to move infinitely often, so Lemma 12 applies infinitely often. Lemma 13. For any c ∈ C , consider any i ∈ TC ( c ) \ tid c , such that for all k > f , if x k . Entities i 6 = ∅ , then ∃ k ′ > k such that x k ′ . signal next i [ c ] = i . Proof. Fix some c ∈ C . Since i ∈ TC ( c ) , there exists h < ∞ such that for all k > f , ρ c ( x k , i ) = h . We prove the lemma by inducting on h . The base case is h = 1 . Fix i and instantiate k ′ = f + ns ( tid c ) . By Lemma 6, for any t ∈ ID T , for all non-faulty i ∈ Nbrs t , x f . next i [ c ] = t since k > f . For all k > f , if x k . Entities i 6 = ∅ , then signal tid c changes to a different neighbor with entities every round. It is thus the case that | x k . NEPrev tid c | ≤ ns ( tid c ) and since Entities tid c = ∅ always, exactly one neighbor satisfies the conditional of Figure 10, line 6 in any round, then within ns ( tid c ) rounds, signal tid c = i . 24 For the inductive case, let k s = k + h be the step in α after which all non- faulty a ∈ Nbrs i have x k s . next a [ c ] = i by Lemma 6. Also by Lemma 6, ∃ m ∈ Nbrs i such that x k s . dist m < x k s . dist i , implying that after k s , | x k s . NEPrev i | ≤ ns ( i ) since x k s . next i = m and x k s . next m 6 = i . By the inductive hypothesis, x k s . signal next i [ c ] = i infinitely often. If i ∈ ID S , then entity initialization does not prevent x k . signal i = a from being satisfied infinitely often by the second assumption introduced in Subsection 4.2. It remains to be established that signal i = a infinitely often. Let a ∈ x k s . NEPrev i where ρ c ( x k s , a ) = h + 1 . In any of the following cases, if i ∈ x k s . pint [ c ] and all cells j ∈ CSC ( x k s , c ) are empty, then by Lemma 11, eventually lock i [ c ] . If | x k s . NEPrev i | = 1 , then since the inductive hypothesis satisfies signal next i [ c ] = i infinitely often, then Lemma 12 applies infinitely often, and thus Entities i = ∅ infinitely often, finally implying that signal i = a infinitely often. If | x k s . NEPrev i | > 1 , there are two sub-cases. The first sub-case is when no entity enters i from some d 6 = a ∈ x k s . NEPrev i , which follows by the same reasoning used in the | x k s . NEPrev i | = 1 case. The second sub-case is when a entity enters i from d , in which case it must be established that signal i = a infinitely often. This follows since if x k ′ . token i = a where k ′ > k t > k s and k t is the round at which an entity entered i from d , and the appropriate case of Lemma 4 is not satisfied, then x k ′ +1 . signal i = ⊥ and x k ′ +1 . token i = a by Figure 10, line 25. This implies that no more entities enter i from either cell d satisfying d 6 = a . Thus token i = a infinitely often follows by the same reasoning | x k s . NEPrev i | = 1 case. The final theorem establishes that entities on any cell in TC ( c ) eventually reach the target in α ′ . Theorem 2. For any c ∈ C , consider any i ∈ TC ( c ) , ∀ k > f , ∀ p ∈ x k . Entities i , ∃ k ′ > k such that p ∈ x k ′ . Entities next i [ c ] . Proof. Fix c ∈ C , i ∈ TC ( c ) , a round k > f and p ∈ x k . Entities i . Let h = max i ∈ TC ( c ) ρ c ( x f , i ) which is finite. By Lemma 6, at every round after k s = k + h for any i ∈ TC ( c ) , the sequence of identifiers β = i , x k s . next i [ c ] , x k s . next next i [ c ] [ c ] , . . . forms a fixed path to tid c . Applying Lemma 13 to i ∈ TC ( c ) shows that there exists k m ≥ k s such that x k m . signal next i [ c ] = i . Now applying Lemma 12 to x k m establishes movement of p towards x k s . next i [ c ] , which is also x k m . next i [ c ] . Lemma 13 further establishes that this occurs infinitely often, thus there is a round k ′ > k m such that p gets transferred to x k m . Entities next i [ c ] . By an induction on the sequence of identifiers in the path β , it follows that entities on any cell in TC ( c ) eventually get consumed by the target. Summary of Results In this section, we establish several invariant properties culminating in prov- ing safety of the system, which meant that entities never collide, in spite of failures. Next, we proved that the routing algorithm used to construct paths to the destinations is self-stabilizing in spite of arbitrary crash failures. We next 25 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 r s throughput v=0.05 v=0.1 v=0.2 v=0.25 Figure 12: Throughput versus safety spacing r s for several values of v , for K = 2500 , l = 0 . 25 for System with an 8 × 8 unit square tessellation. 0 1 2 3 4 5 6 7 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 number of turns along path throughput r s =0.05, v=0.2, l=0.2 r s =0.05, v=0.1, l=0.2 r s =0.05, v=0.1, l=0.1 r s =0.05, v=0.05, l=0.1 Figure 13: Throughput versus number of turns along a path, for a path of length 8 , where K = 2500 , r s = 0 . 05 , and each of l and v are varied for System with an 8 × 8 unit square tessellation. showed under an assumption that failures do not introduce deadlock scenarios that the locking algorithm allows multi-color flows to mutual exclusively take control of intersections (color-shared cells). Finally, under a fairness assump- tion, we established the main progress property through two results, that any cell gets permission to move infinitely often, and that any cell with a permis- sion to move decreases the distance of any entities on it from its destination. 5. Simulation Experiments We have performed several simulation studies of the algorithm for evaluat- ing its throughput performance. In this section, we discuss the main findings with illustrative examples taken from the simulation results. We implemented the simulator in Matlab, and all the partition figures displayed in the paper are created using it. Let the K -round throughput of System be the total number of entities arriving at the target over K rounds, divided by K . We define the average throughput (henceforth throughput) as the limit of K -round throughput for large K . All simulations start at a state where all cells are empty and subsequently entities are added to the source cells. Single-color throughput without failures as a function of r s , l , v . Rough calcula- tions show that throughput should be proportional to cell velocity v , and in- versely proportional to safety distance r s and entity radius l . Figure 12 shows throughput versus r s for several choices of v for an 8 × 8 unit square tessel- lation instance of System with a single entity color. The parameters are set to l = 0 . 25 and K = 2500 . The entities move along a line path where the source is the bottom left corner cell and the target is the top left corner cell. For the most part, the inverse relationship with v holds as expected: all other factors 26 remaining the same, a lower velocity makes each entity take longer to move away from the boundary, which causes the predecessor cell to be blocked more frequently, and thus fewer entities reach tid from any element of ID S in the same number of rounds. In cases with low velocity (for example v = 0 . 1 ) and for very small r s , however, the throughput can actually be greater than that at a slightly higher velocity. We conjecture that this somewhat surprising effect appears because at very small safety spacing, the potential for safety violation is higher with faster speeds, and therefore there are many more blocked cells per round. We also observe that the throughput saturates at a certain value of r s ( ≈ 0 . 55 ). This situation arises when there is roughly only one entity in each cell. Single-color throughput without failures as a function of the path. For a sufficiently large number of rounds K , throughput is independent of the length of the path. This of course varies based on the particular path and instance of System considered, but all other variables fixed, this relationship is observed. More in- teresting however, is the relationship between throughput and path complex- ity, measured in the number of turns along a path. Figure 13 shows through- put versus the number of turns along paths of length 8 . This illustrates that throughput decreases as the number of turns increases, up to a point at which the decrease in throughput saturates. This saturation is due to signaling and indicates that there is only one entity per cell. Single-color throughput under failure and recovery of cells. Finally, we considered a random failure and recovery model in which at each round each non-faulty cell fails with some probability p f and each faulty cell recovers with some prob- ability p r [33]. A recovery sets failed i = false and in the case of tid also resets dist tid = 0 , so that eventually Route will correct next j and dist j for any j ∈ TC . Intuitively, we expect that throughput will decrease as p f increases and in- crease as p r increases. Figure 14 demonstrates this result for 0 . 01 ≤ p f ≤ 0 . 05 and 0 . 05 ≤ p r ≤ 0 . 2 . There is a diminishing return on increasing p r for a fixed p f , in that for a fixed p f increasing p r results in smaller throughput gains. Multi-color throughput as a function of the number of intersecting cells. Now we discuss the influence of multi-color throughput. In the case where the paths between different sources and targets do not overlap, all the results from the single-color simulation results apply. In the case where the paths do overlap, the mutual exclusion algorithm runs to ensure no deadlocks occur. This addi- tional control logic will have an influence on the throughput. For the multi- color cases, we consider the summed throughput, which is the sum of the throughputs for each color. Figure 16 shows the roughly exponential decrease in throughput as the frac- tion of overlapping paths increases for two colors with path length 8 and no turns. The fraction of overlapping paths is defined as the number of vertices in the color-shared cells CSC ( x , c ) . As the fraction increases, the paths lie com- pletely on top of one another, so in this case with path length 8 , we have no overlap, 1 cell overlap, etc. 27 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0 0.01 0.02 0.03 0.04 0.05 0.06 p f throughput p r =0.05 p r =0.1 p r =0.15 p r =0.2 Figure 14: Throughput versus failure rate p f for several recovery rates p r with an initial path of length 8 , where K = 20000 , r s = 0 . 05 , l = 0 . 2 , and v = 0 . 2 for System with an 8 × 8 unit square tessellation. 0 5 10 15 0.05 0.1 0.15 0.2 0.25 path length throughput square triangular Figure 15: Throughput versus increasing path length of square (blue) and equilateral triangu- lar (red) partitions. 0 0.2 0.4 0.6 0.8 1 0 0.05 0.1 fraction of overlapping paths throughput Figure 16: Throughput versus fraction of path overlap for two colors on a 1 × 16 unit square tessellation. 0 2 4 6 8 0 0.05 0.1 0.15 0.2 number of overlapping colors throughput Figure 17: Throughput versus number of over- lapping colors on a 1 × 3 unit square tessella- tion. Multi-color throughput as a function of the number of intersecting colors. Intersec- tions (that is, having at least one color-shared cell) have a fixed cost on through- put. Specifically, the summed throughput of there being two overlapping col- ors on a cell is the same as the summed throughput of three or more. Figure 17 shows this fixed decrease in throughput as the number of overlapping col- ors increases for a fixed path of length 3 with 3 color-shared cells, where the decrease in throughput from having no overlaps to having one color overlap- ping is about 4 . 5 times. Once there are two colors, all additional colors do not decrease throughput. This observation agrees with intuition—the decrease in throughput due to an intersection is independent of the number of destinations for the entities that must pass through that intersection. 28 6. Related Work There is a large amount of work on traffic control in transportation systems (see, e.g., [4, 34]) and robotics (see, e.g., [35]). We briefly summarize some of the more related work, but highlight that we are presenting a formal model of an example of such systems. Distributed air and automotive traffic control have been studied in many contexts. Human-factors issues are considered in [36, 37] to ensure collision avoidance between the coordination of numerous pilots and a supervisory controller modeling the semi-centralized air traffic control com- ponents. The Small Aircraft Transportation Protocol (SATS) is semi-distributed air traffic control protocol designed for small airports without radar, so pilots and their aircraft coordinate among themselves to land after being assigned a landing sequence order by an automated system at the airport [16]. SATS has been formally modeled and analyzed using a combination of model checking and automated theorem proving [38]. SATS and this paper share an abstrac- tion: the physical environment is a priori partitioned into a set of regions of interest, and properties about the whole system are proved using composi- tional analysis. Safe conflict resolution maneuvers for distributed air traffic control are designed in [39]. A formal model of the traffic collision avoidance system (TCAS) is developed and analyzed for safety in [40]. TCAS is a system deployed on aircraft that alerts pilots when other aircraft are in close proximity and guides them along safe trajectories. A distributed algorithm (executed by entities, vehicles in this case) for con- trolling automotive intersections without any stop signs is presented in [18]. Some methods for ensuring liveness for automotive intersections are presented in [41]. A method to detect the mode of a hybrid system control model of an autonomous vehicle in intersections is developed in [42], and is used to reduce conservatism of the maximally controlled invariant set (the set of collision- free controls). Efficient distributed intersection control algorithms are devel- oped in [43]. There is a large amount of work on flocking [44] and platoon- ing [45, 46, 47, 48]. Only a few works consider failures in such systems, like the arbitrary failures considered in [49, 50], the actuator failures considered in [48], or in synchronization of swarm robot systems in [51]. Distributed robot coordination on discrete abstractions like [52, 23, 53, 54, 55, 56, 57] can be viewed as traffic control. For instance, [23] establishes a for- mal connection between the continuous and the discrete parts of these proto- cols, and also presents a self-stabilizing algorithm with similar analysis to the analysis in this paper. These works also decompose the continuous problem into a discrete abstraction by partitioning the environment, but all these works allow at most a single entity (robot) in each partition, while our framework allows numerous entities in each partition. If several entities are to visit some destination in [53, 56, 57], like our targets here, that destination is represented as the union of a set of partitions and each entity must reside in one of these partitions. The Kiva Systems robotic warehouse [52] is a robotic traffic control system on square partitions, and can be described in our framework by allowing a 29 single entity per cell. In these warehouse systems, there is a central coordinator scheduling tasks, but the robots are responsible for path planning using an A ∗ - like search algorithm [52]. However, several deadlock scenarios are identified when performing such path planning [54]. The Adaptive Highways Algorithm presented in [54] for scheduling entities relies on using the tentative trajectories of other robots collected by the central controller. Deadlocks are also observed in other distributed robotics path-planning algorithms on discrete partitions in [58]. Deadlock scenarios can also arise without a discrete abstraction, such as in the doorways considered in [59], the path formation algorithms of [60], or the warehouse automation system of [61]. Lastly, we mention that most of these works on traffic control from aviation, automotive, swarm robotics, and warehouse automation applications can be modeled within the framework of spatial computing [62, 63, 64]. 7. Discussion In this section, we discuss some ways to generalize assumptions used in the paper and some alternative methods. In this paper, we presented a dis- tributed traffic control algorithm for the partitioned plane, which moves en- tities without collision to their destinations, in spite of failures. While our algorithm is presented for two-dimensional partitions, an extension to some three-dimensional partitions (e.g., cubes and tetrahedra) follows in an obvious way. An extension to the more general case where there are multiple sources and multiple targets of each color—and entities of each color move toward the nearest target of that color—is straightforward, but complicates notation. Self-Stabilizing Mutual Exclusion and Distributed Snapshot Algorithms. There are a variety of mutual exclusion algorithms that could be used to determine locks (Figure 9, line 13). For this paper, we require the overall system to be stabi- lizing and therefore the locking algorithm itself should be stabilizing. To this end, any of the following algorithms could be adapted to our framework: the token circulation algorithm [65], mutual exclusion [66], group mutual exclu- sion [67], snap-stabilizing propagation of information with feedback (PIF) al- gorithm [68], or k -out-of- l mutual exclusion [69]. A self-stabilizing distributed snapshot algorithm (see [27, Ch. 5]) can be used to determine if all c color- shared cells are empty, after having had some entity of color c (Figure 9, line 19). If all cells are empty, then another round of mutual exclusion commences, ex- cluding color c from the input set. General Triangulations and Affine Dynamics. We assumed in Section 2 that the partitions satisfy several geometric assumptions for feasibility of entity trans- fers. We considered using vector fields generated by a discrete abstraction like those presented in [70, 71, 72, 73]. The affine vector fields generated on sim- plices in [70, 73] can be used to move an entity (with potentially nonholo- nomic or nonlinear dynamics) through any side of a cell in a triangulation (simplex) [70, 72] or rectangle [71]. However, it turns out that it is impossible to 30 maintain our notion of safety for such vector fields without additional collision avoidance mechanisms implemented on each entity. This is due to a simple ge- ometric observation—moving entities through a shorter side than the side they entered through may require the entities to come closer together. For example, if a cell in the triangulation has an obtuse angle, then the vector field gener- ated by [70] flowing from the longest edge to the shortest edge has negative divergence. Furthermore, a vector field having negative divergence implies the flow corresponding to any two distinct points starting in that field come closer together, hence safety cannot be maintained. The distributed problems using these discrete abstractions [53, 56, 57] avoid this by requiring at most one entity in any (triangular) partition at a time. We also mention a simple condition to ensure that triangulations have the required geometric partition properties (Assumptions 1 and 2). If all the trian- gles in the triangulation are non-obtuse, then the triangulation satisfies these assumptions. We also note that restricting allowable triangulations of an en- vironment to ones without obtuse angles is not restrictive, since any polygon can be efficiently partitioned into a triangulation with non-obtuse [74, 75] or acute [76] angles. Insufficiency of Disjoint Paths. Finding disjoint paths, such as by using the al- gorithms from [77, 78, 79, 80], could be another approach to solving the multi- color problem, but the locking mechanism used here solves a more general problem. Even without failures, there are many environments and choices of sources and targets for which there are no disjoint paths between sources and targets. One such environment is shown in Figure 4, where for two distinct colors c and d , the paths between the respective sources and targets necessar- ily overlap, so an algorithm for finding disjoint paths cannot be used as there are no disjoint paths between sources and targets. However, there are disjoint paths in some cases, so no scheduling would be necessary if these are found, but our routing algorithm does not necessarily find these, as the disjoint paths may not be shortest distance. A self-stabilizing algorithm for finding disjoint paths on planar graphs would be an enhancement to our algorithm, as it would increase throughput in the case that paths need not overlap. Back-Pressure and Wormhole Routing. Back-pressure routing [81, 82] is an al- gorithm for dynamically routing traffic over an underlying graph using con- gestion gradients. If we view the color of each entity as its intended address and consider this problem from the perspective of queuing theory, one might think back-pressure routing could provide a throughput-optimal solution for the problem. However, our physical motion model is incompatible with back- pressure routing. For a given cell, our model does not allow arbitrary choice of the next neighbor for each entity on that cell. In particular, when one cell moves its entities toward a neighboring cell, all entities sufficiently near the shared side between the two neighbors would transfer. Wormhole routing [83] is a flow control policy over a fixed underlying graph for determining when packets move to the node on the graph. Ad- 31 6 7 8 9 10 11 12 13 5 6 7 8 2 3 4 6 7 8 10 11 x y Figure 18: Hexagonal partition that does not satisfy the projection property (Assumption 1). An exten- sion to allow such partitions would require enlarg- ing the transfer region and receiving a signal from all of the potential next neighbors, which would re- quire cells 3 and 7 both to signal cell 4 to move. 1T 2 3 4S 5 6 7 8 9 10 11 12 13 14 15 16 Figure 19: Example system on a parallel- ogram partition with failed cells in black. The several turns along the path from the source to the target cause a satura- tion of entities on cells 6 , 10 , and 14 . The movement vector u ( i, j ) is defined as the unit vector parallel to the x axis for movement between horizontal neighbors, and the unit vector parallel to the vertical sides of the parallelograms between ver- tical neighbors. dresses in wormhole routing are very short and come at the beginning of a packet, so a packet can be subdivided into pieces or flits and begin being for- warded after the address is received, yielding a snake-like sequence of flits in transfer. One could also view the sequence of entities on a path toward the appropriately-colored target (see Figure 19) sequence of flits flowing to a des- tination in wormhole routing. While similar deadlock scenarios can arise in our system and wormhole routing, wormhole routing is incompatible with our system due to the motion model just like back-pressure routing. 8. Conclusion We presented a self-stabilizing distributed traffic control protocol for the partitioned plane, where each partition controls the motion of all entities within that partition. The algorithm guarantees separation between entities in the face of crash failures of the software controlling a partition. Once new failures cease occurring, it guarantees progress of all entities that are neither isolated by (a) failed partitions, nor (b) cells with entities of other colors that become deadlocked due to failures, to the respective targets. Through simulations, we presented estimates of throughput as a function of velocity, minimum sepa- ration, single-target path complexity, failure-recovery rates, and multi-target path complexity. It would be interesting to develop strategies allowing entities of different colors on a single cell. Our strategy of preventing entities of different colors from residing on a single cell simplified some analysis, but it also complicated some analysis, by making it harder to prove progress because deadlock sce- narios may frequently arise. It would be interesting to develop algorithms al- lowing mixing and sorting of colors using different types of motion coupling. 32 It would also be interesting to design algorithms that can allow relaxing the assumption on what failures may occur to ensure liveness. We believe this would require a more complex routing algorithm to temporarily move entities of some colors off the color shared cells, thus allowing some other color on the color shared cells to make progress. 9. Acknowledgments The authors thank Zhongdong Zhu for helping develop the current version of the simulator, Karthik Manamcheri for helping develop an earlier version of the simulator, and Nitin Vaidya for helpful feedback. We also thank the anonymous reviewers who helped improve the earlier version of this paper. References [1] D. Helbing, M. Treiber, Jams, waves, and clusters, Science [2] B. S. Kerner, Experimental features of self-organization in traffic flow, Phys. Rev. Lett. [3] C. Daganzo, M. Cassidy, R. Bertini, Possible explanations of phase transi- tions in highway traffic, Transportation Research A [4] M. Nolan, Fundamentals of air traffic control, [5] F. Borgonovo, L. Campelli, M. Cesana, L. Coletti, Mac for ad hoc inter- vehicle network: services and performance, in: IEEE Vehicular Technol- ogy Conf., Vol. 5, 2003, [6] M. Karpiriski, A. Senart, V. Cahill, Sensor networks for smart roads, in: Pervasive Computing and Communications Workshops, 2006. PerCom Workshops 2006. Fourth Annual IEEE International Conference on, 2006, [7] S. S. Manvi, M. S. Kakkasageri, J. Pitt, Multiagent based information dis- semination in vehicular ad hoc networks, Mob. Inf. Syst. [8] S. R. Azimi, G. Bhatia, R. R. Rajkumar, P. Mudalige, Vehicular networks for collision avoidance at intersections, SAE International Journal of Pas- senger Cars - Mechanical Systems [9] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessan- drini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, P. Ma- honey, Stanley: The Robot That Won the DARPA Grand Challenge, in: M. Buehler, K. Iagnemma, S. Singh (Eds.), The 2005 DARPA Grand Chal- lenge, Vol. 36 of Springer Tracts in Advanced Robotics, Springer Berlin / Heidelberg, 2007, 33 [10] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. N. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer, M. Gittleman, S. Harbaugh, M. Hebert, T. M. Howard, S. Kolski, A. Kelly, M. Likhachev, M. Mc- Naughton, N. Miller, K. Peterson, B. Pilnick, R. Rajkumar, P. Rybski, B. Salesky, Y.-W. Seo, S. Singh, J. Snider, A. Stentz, W. R. Whittaker, Z. Wolkowicki, J. Ziglar, H. Bae, T. Brown, D. Demitrish, B. Litkouhi, J. Nickolaou, V. Sadekar, W. Zhang, J. Struble, M. Taylor, M. Darms, D. Fer- guson, Autonomous driving in urban environments: Boss and the urban challenge, Journal of Field Robotics [11] X. Yang, L. Liu, N. Vaidya, F. Zhao, A vehicle-to-vehicle communication protocol for cooperative collision warning, in: Mobile and Ubiquitous Systems: Networking and Services. MOBIQUITOUS. The First Annual International Conference on, 2004, [12] J. Misener, R. Sengupta, H. Krishnan, Cooperative collision warning: En- abling crash avoidance with wireless technology, in: 12th World Congress on Intelligent Transportation Systems, 2005, [13] B. Hoh, M. Gruteser, R. Herring, J. Ban, D. Work, J.-C. Herrera, A. M. Bayen, M. Annavaram, Q. Jacobson, Virtual trip lines for distributed privacy-preserving traffic monitoring, in: MobiSys ’08: Proceeding of the 6th International Conference on Mobile Systems, Applications, and Ser- vices, ACM, New York, NY, USA, 2008, [14] A. Girard, J. de Sousa, J. Misener, J. Hedrick, A control architecture for integrated cooperative cruise control and collision warning systems, in: Decision and Control. Proceedings of the 40th IEEE Conference on, Vol. 2, 2001, [15] M. Mamei, F. Zambonelli, L. Leonardi, Distributed motion coordination with co-fields: a case study in urban traffic management, in: Autonomous Decentralized Systems. ISADS. The Sixth International Symposium on, 2003, [16] T. S. Abbott, K. M. Jones, M. C. Consiglio, D. M. Williams, C. A. Adams, Small aircraft transportation system, higher volume operations concept: Normal operations, Tech. Rep. NASA/TM-2004-213022, NASA [17] M. Kelly, G. Di Marzo Serugendo, A decentralised car traffic control sys- tem simulation using local message propagation optimised with a genetic algorithm, in: S. Brueckner, S. Hassas, M. Jelasity, D. Yamins (Eds.), Engi- neering Self-Organising Systems, Vol. 4335 of Lecture Notes in Computer Science, Springer, 2007, [18] H. Kowshik, D. Caveney, P. R. Kumar, Safety and liveness in intelligent in- tersections, in: Hybrid Systems: Computation and Control (HSCC), 11th International Workshop, Vol. 4981 of LNCS, 2008, 34 [19] K. Dresner, P. Stone, A multiagent approach to autonomous intersection management, Journal of Artificial Intelligence Research [20] P. Weiss, Stop-and-go science, Science News [21] Kornylak, Omniwheel brochure URL http://www.kornylak.com/images/pdf/omni-wheel.pdf. [22] K. An, A. Trewyn, A. Gokhale, S. Sastry, Model-driven performance analy- sis of reconfigurable conveyor systems used in material handling applica- tions, in: Cyber-Physical Systems (ICCPS), 2011 IEEE/ACM International Conference on, Vol. 2, IEEE, 2011, [23] S. Gilbert, N. Lynch, S. Mitra, T. Nolte, Self-stabilizing robot formations over unreliable networks, ACM Trans. Auton. Adapt. Syst. [24] S. Dolev, L. Lahiani, S. Gilbert, N. Lynch, T. Nolte, Virtual stationary au- tomata for mobile networks, in: PODC ’05: Proceedings of the twenty- fourth annual ACM symposium on Principles of distributed computing, ACM, New York, NY, USA, 2005, [25] T. Nolte, N. Lynch, A virtual node-based tracking algorithm for mobile networks, in: Distributed Computing Systems, International Conference on (ICDCS), IEEE Computer Society, Los Alamitos, CA, USA, 2007, [26] A. Arora, M. Gouda, Closure and convergence: A foundation of fault- tolerant computing, IEEE Trans. Softw. Eng. [27] S. Dolev, Self-stabilization, [28] S. M. Loos, A. Platzer, L. Nistor, Adaptive cruise control: Hybrid, dis- tributed, and now formally verified, in: M. Butler, W. Schulte (Eds.), For- mal Methods, LNCS, [29] A. Platzer, Quantified differential invariants, in: Proc. of the 14th ACM Intl. Conf. on Hybrid Systems: Computation and Control, ACM, 2011, [30] T. T. Johnson, S. Mitra, Parameterized verification of distributed cyber- physical systems: An aircraft landing protocol case study, in: ACM/IEEE 3rd International Conference on Cyber-Physical Systems, [31] T. T. Johnson, S. Mitra, A small model theorem for rectangular hybrid automata networks, in: Proceedings of the IFIP International Conference on Formal Techniques for Distributed Systems, Joint 14th Formal Methods for Open Object-Based Distributed Systems and 32nd Formal Techniques for Networked and Distributed Systems (FORTE-FMOODS), Vol. 7273 of LNCS, [32] T. T. Johnson, S. Mitra, K. Manamcheri, Safe and stabilizing distributed cellular flows, in: Proceedings of the 30th IEEE International Conference on Distributed Computing Systems (ICDCS), 35 [33] R. E. L. DeVille, S. Mitra, Stability of distributed algorithms in the face of incessant faults, in: Proceedings of 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), Springer, 2009, [34] P. A. Ioannou, Automated Highway Systems, [35] F. Bullo, J. Cort ́ es, S. Mart ́ ınez, Distributed Control of Robotic Networks, Applied Mathematics Series, Princeton University Press, 2009, [36] N. Leveson, M. de Villepin, J. Srinivasan, M. Daouk, N. Neogi, E. Bachelder, J. Bellingham, N. Pilon, G. Flynn, A safety and human- centered approach to developing new air traffic management tools, in: Proceedings Fourth USA/Europe Air Traffic Management R&D Seminar, 2001, [37] T. Prevot, Exploring the many perspectives of distributed air traffic man- agement: The multi aircraft control system (macs), in: Proceedings of the HCI-Aero, 2002, [38] C. Mu ̃ noz, V. Carre ̃ no, G. Dowek, Formal analysis of the operational con- cept for the small aircraft transportation system, in: M. Butler, C. Jones, A. Romanovsky, E. Troubitsyna (Eds.), Rigorous Development of Com- plex Fault-Tolerant Systems, Vol. 4157 of LNCS, Springer Berlin / Heidel- berg, 2006, [39] C. Tomlin, G. Pappas, S. Sastry, Conflict resolution for air traffic manage- ment: a study in multiagent hybrid systems, IEEE Trans. Autom. Control [40] C. Livadas, J. Lygeros, N. A. Lynch, High-level modeling and analysis of TCAS, in: Proceedings of the 20th IEEE Real-Time Systems Symposium (RTSS’99), 1999, [41] T.-C. Au, N. Shahidi, P. Stone, Enforcing liveness in autonomous traffic management, in: Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, [42] R. Verma, D. Vecchio, Semiautonomous multivehicle safety, Robotics Au- tomation Magazine, IEEE [43] A. Colombo, D. D. Vecchio, Efficient algorithms for collision avoidance at intersections, in: Hybrid Systems: Computation and Control (HSCC), [44] R. Olfati-Saber, Flocking for multi-agent dynamic systems: algorithms and theory, IEEE Trans. Autom. Control [45] P. Varaiya, Smart cars on smart roads: Problems of control, IEEE Trans. Autom. Control 36 [46] E. Dolginova, N. Lynch, Safety verification for automated platoon maneu- vers: A case study, in: HART’97 (International Workshop on Hybrid and Real-Time Systems), Vol. 1201 of LNCS, [47] D. Swaroop, J. K. Hedrick, Constant spacing strategies for platooning in automated highway systems, Journal of Dynamic Systems, Measurement, and Control [48] T. T. Johnson, S. Mitra, Safe flocking in spite of actuator faults using direc- tional failure detectors, Journal of Nonlinear Systems and Applications [49] V. Gupta, C. Langbort, R. Murray, On the robustness of distributed algo- rithms, in: Decision and Control. 45th IEEE Conference on, 2006, [50] M. Franceschelli, M. Egerstedt, A. Giua, Motion probes for fault detection and recovery in networked control systems, in: American Control Confer- ence, 2008, 2008, [51] A. Christensen, R. O’Grady, M. Dorigo, From fireflies to fault tolerant swarms of robots, IEEE Transactions on Evolutionary Computation [52] P. R. Wurman, R. D’Andrea, M. Mountz, Coordinating hundreds of coop- erative, autonomous vehicles in warehouses, AI Magazine [53] M. Kloetzer, C. Belta, Automatic deployment of distributed teams of robots from temporal logic motion specifications, Robotics, IEEE Trans- actions on [54] H. Roozbehani, R. D’Andrea, Adaptive highways on a grid, in: C. Pradalier, R. Siegwart, G. Hirzinger (Eds.), Robotics Research, Vol. 70 of Springer Tracts in Advanced Robotics, Springer, 2011, [55] J. W. Durham, R. Carli, P. Frasca, F. Bullo, Discrete partitioning and cover- age control for gossiping robots, Robotics, IEEE Transactions on [56] X. C. Ding, M. Kloetzer, Y. Chen, C. Belta, Automatic deployment of robotic teams, Robotics Automation Magazine, IEEE [57] Y. Chen, X. C. Ding, A. Stefanescu, C. Belta, Formal approach to the de- ployment of distributed robotic teams, Robotics, IEEE Transactions on [58] R. Luna, K. Bekris, Network-guided multi-robot path planning in discrete representations, in: Intelligent Robots and Systems (IROS). IEEE/RSJ In- ternational Conference on, 2010, [59] D. Herrero-Perez, H. Matinez-Barbera, Decentralized coordination of au- tonomous agvs in flexible manufacturing systems, in: Intelligent Robots and Systems. IROS. IEEE/RSJ International Conference on, 2008, [60] S. Nouyan, A. Campo, M. Dorigo, Path formation in a robot swarm: Self- organized strategies to find your way home, Swarm Intelligence 37 [61] A. Kamagaew, J. Stenzel, A. Nettstrater, M. ten Hompel, Concept of cel- lular transport systems in facility logistics, in: Automation, Robotics and Applications (ICARA). 5th International Conference on, 2011, [62] F. Zambonelli, M. Mamei, Spatial computing: An emerging paradigm for autonomic computing and communication, in: M. Smirnov (Ed.), Auto- nomic Communication, Vol. 3457 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2005, [63] J. Beal, J. Bachrach, Infrastructure for engineered emergence on sensor/ac- tuator networks, Intelligent Systems, IEEE [64] J. Bachrach, J. Beal, J. McLurkin, Composable continuous-space programs for robotic swarms, Neural Computing & Applications [65] C. Johnen, G. Alari, J. Beauquier, A. Datta, Self-stabilizing depth-first to- ken passing on rooted networks, in: M. Mavronicolas, P. Tsigas (Eds.), Distributed Algorithms, Vol. 1320 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 1997, pp. 260–274, [66] A. K. Datta, C. Johnen, F. Petit, V. Villain, Self-stabilizing depth-first token circulation in arbitrary rooted networks, Distributed Computing [67] J. Beauquier, S. Cantarell, A. Datta, F. Petit, Group mutual exclusion in tree networks, in: Parallel and Distributed Systems, 2002. Proceedings. Ninth International Conference on, IEEE Computer Society, 2002, [68] A. Bui, A. Datta, F. Petit, V. Villain, Snap-stabilization and pif in tree net- works, Distributed Computing [69] A. Datta, S. Devismes, F. Horn, L. Larmore, Self-stabilizing k-out-of-l ex- clusion on tree networks, in: Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, 2009, [70] C. Belta, V. Isler, G. Pappas, Discrete abstractions for robot motion plan- ning and control in polygonal environments, Robotics, IEEE Transactions on [71] C. Belta, L. Habets, Controlling a class of nonlinear systems on rectangles, Automatic Control, IEEE Transactions on [72] L. Habets, P. Collins, J. van Schuppen, Reachability and control synthe- sis for piecewise-affine hybrid systems on simplices, Automatic Control, IEEE Transactions on [73] M. Kloetzer, C. Belta, A fully automated framework for control of lin- ear systems from temporal logic specifications, Automatic Control, IEEE Transactions on [74] B. Baker, E. Grosse, C. Rafferty, Nonobtuse triangulation of polygons, Dis- crete & Computational Geometry 38 [75] M. Bern, S. Michell, J. Ruppert, Linear-size nonobtuse triangulation of polygons, Discrete & Computational Geometry [76] H. Maehara, Acute triangulations of polygons, European Journal of Com- binatorics [77] H. Mohanty, G. P. Bhattacharjee, A distributed algorithm for edge-disjoint path problem, in: Proceedings of the Sixth Conference on Foundations of Software Technology and Theoretical Computer Science, Springer-Verlag, London, UK, UK, 1986, [78] R. Ogier, V. Rutenburg, N. Shacham, Distributed algorithms for comput- ing shortest pairs of disjoint paths, Information Theory, IEEE Transactions on [79] S.-J. Lee, M. Gerla, Split multipath routing with maximally disjoint paths in ad hoc networks, in: Communications. ICC. IEEE International Confer- ence on, Vol. 10, 2001, [80] M. Marina, S. Das, On-demand multipath distance vector routing in ad hoc networks, in: Network Protocols. Ninth International Conference on, 2001, [81] L. Tassiulas, A. Ephremides, Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks, Automatic Control, IEEE Transactions on [82] B. Awerbuch, T. Leighton, A simple local-control approximation algo- rithm for multicommodity flow, in: Foundations of Computer Science. Proceedings., 34th Annual Symposium on, IEEE, 1993, [83] L. Ni, P. McKinley, A survey of wormhole routing techniques in direct networks, Computer 39