Memoryless Control Design for Persistent Surveillance under Safety Constraints

Eduardo Arvelo, Eric Kim and Nuno C. Martins
Department of Electrical and Computer Engineering and Maryland Robotics Center
University of Maryland, College Park, MD 20742
Email: {earvelo, eskim727, nmartins}@umd.edu

Abstract—This paper deals with the design of time-invariant memoryless control policies for robots that move in a finite two-dimensional lattice and are tasked with persistent surveillance of an area that contains forbidden regions. We model each robot as a controlled Markov chain whose state comprises its position in the lattice and its direction of motion. The goal is to find the minimum number of robots and an associated time-invariant memoryless control policy that guarantees that the largest set of states is persistently surveilled without ever visiting a forbidden state. We propose a design method that relies on a finitely parametrized convex program inspired by entropy maximization principles. Numerical examples are provided.

I. INTRODUCTION

We develop a method to design memoryless controllers for robots that move in a finite two-dimensional lattice. The goal is to achieve persistent surveillance, by which we mean the task of continuously visiting the largest possible set of points in the lattice. In our setup, we also impose safety constraints that dictate that certain regions are forbidden. The forbidden regions may represent areas in which robots cannot operate (such as bodies of water) or are not allowed to visit (such as restricted airspace). The goal is to deploy the minimum number of robots equipped with a control policy that guarantees persistent surveillance of the largest possible set of lattice points without ever visiting a forbidden region. The memoryless strategies proposed here are applicable to miniature robots that have severe computational constraints.

The concept of persistent surveillance is similar to the concept of coverage [5], but differs from it in that the area to be surveilled must be revisited infinitely many times. Control design for persistent surveillance has been studied in [12], [13], where a semi-heuristic control policy that minimizes the time between visits to the same region is proposed, and in [10], which proposes an algorithm for persistent surveillance of a convex polygon in the plane. These approaches, however, are not restricted to memoryless policies and do not consider safety constraints. On the implementation front, system architectures for unmanned aerial vehicles have been designed specifically for persistent surveillance purposes [2], [11].

In this paper, we model each robot as a fully observed controlled Markov chain with finite state and control spaces. This approach, which has been successfully used in the context of navigation and path planning (such as in [15], [17], [16], [4]), allows for the development of robust and highly scalable algorithms. Without loss of generality, we consider robots whose state is taken as the position on a finite two-dimensional lattice together with the direction of motion (taken from a set of four possible orientations), and we limit the control space to two control actions ("forward" and "turn right"). The limited control space illustrates how constrained actuation can be incorporated in our formulation. It is important to highlight, however, that the ideas described in this paper can be extended to more general dynamics and state/control spaces.
We use a recent result in [1] to compute the largest set of states that can be persistently surveilled under safety constraints, along with an associated memoryless control policy. The proposed solution relies on a finitely parametrized convex program, which is highly scalable and can be efficiently solved by standard convex optimization tools, such as [8]. The approach is based on the fact that the probability mass function that maximizes the entropy under convex constraints has maximal support [6]. We also show that the minimum number of robots needed to perform persistent surveillance of the largest set of states (without ever violating the safety constraint) is equal to the number of recurrent classes of the closed-loop Markov chain under the control policy computed by the proposed convex program. The recurrent classes can be found by traversing the graph of the closed-loop Markov chain.

The remainder of this paper is organized as follows. Section II provides notation, basic definitions and the problem statement. The convex program that computes the maximal set of persistently surveilled states and its associated control policy is presented in Section III. Section IV provides details on computing the smallest deployment of robots necessary for maximal persistent surveillance. We discuss limiting behavior and the use of additional constraints in Section V. Conclusions are provided in Section VI. Numerical examples are given throughout the paper to illustrate concepts and the proposed methodology.

II. PRELIMINARIES AND PROBLEM STATEMENTS

The following notation is used throughout the paper:

$\mathcal{X} \times \mathcal{Y}$: set of lattice positions
$\mathcal{O}$: set of orientations
$\mathcal{S} \overset{\text{def}}{=} \mathcal{X} \times \mathcal{Y} \times \mathcal{O}$: set of robot states
$\mathcal{F} \subset \mathcal{S}$: set of forbidden states
$\mathcal{U}$: set of control actions
$S_k$: state of the robot at time $k$
$U_k$: control action at time $k$

The state of the robot will be graphically represented as shown in Fig. 1.

[Fig. 1: Graphical representation of the state of the robot, illustrated for the states (1, 2, U) and (3, 1, R).]

In this example, we use $\mathcal{X} = \{1, 2, 3\}$, $\mathcal{Y} = \{1, 2\}$, and $\mathcal{O} = \{R, U, L, D\}$, where R, U, L and D represent the right, up, left and down directions, respectively.

The robot's dynamics are governed by the (conditional) probability of $S_{k+1}$ given the current state $S_k$ and control action $U_k$, denoted as

$$Q(s^+, s, u) \overset{\text{def}}{=} P(S_{k+1} = s^+ \mid S_k = s, U_k = u), \qquad s, s^+ \in \mathcal{S},\ u \in \mathcal{U}.$$

We denote any memoryless control policy by

$$K(u, s) \overset{\text{def}}{=} P(U_k = u \mid S_k = s), \qquad u \in \mathcal{U},\ s \in \mathcal{S},$$

where $\sum_{u \in \mathcal{U}} K(u, s) = 1$ for all $s$ in $\mathcal{S}$. The set of all such policies is denoted as $\mathcal{K}$. Note that the computation of a control action may be deterministic (when $K(u, s) = 1$ for a given action $u$) or carried out in a randomized manner, in which case the policy dictates the probabilities assigned to each control action for a given state.

Assumptions:
• Throughout the paper we assume that $Q$ is given. Hence, all quantities and sets that depend on the closed-loop behavior may be indexed only by the underlying control policy $K$.
• When multiple robots are considered, we assume that they are identical and have dynamics governed by $Q$. In these situations, every robot executes the same control policy. Moreover, multiple robots are allowed to occupy the same position.

Given a control policy $K$, the conditional state transition probability of the closed loop is represented as

$$P_K(S_{k+1} = s^+ \mid S_k = s) \overset{\text{def}}{=} \sum_{u \in \mathcal{U}} Q(s^+, s, u) K(u, s).$$

We will also refer to this quantity as $Q_K(s^+, s) \overset{\text{def}}{=} P_K(S_{k+1} = s^+ \mid S_k = s)$.
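To fix ideas computationally, the objects above admit a direct array representation. The sketch below is ours, not the paper's: the state-index ordering, array shapes, and function names are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical encoding of S = X x Y x O as integers 0..|S|-1,
# shown for the small lattice of Fig. 1.
NX, NY = 3, 2
ORIENT = ["R", "U", "L", "D"]
N_S = NX * NY * len(ORIENT)   # number of states
N_U = 2                       # actions: 0 = "forward", 1 = "turn right"

def state_index(x, y, o):
    """Map a state (x, y, o) with 1-based lattice coordinates to an integer."""
    return ((x - 1) * NY + (y - 1)) * len(ORIENT) + ORIENT.index(o)

# Q[s_plus, s, u] = P(S_{k+1} = s_plus | S_k = s, U_k = u);
# each column Q[:, s, u] must be a pmf once the dynamics are filled in.
Q = np.zeros((N_S, N_S, N_U))

# K[u, s] = P(U_k = u | S_k = s); each column is a pmf over actions.
K = np.full((N_U, N_S), 1.0 / N_U)

def closed_loop(Q, K):
    """Compose Q_K(s_plus, s) = sum_u Q(s_plus, s, u) * K(u, s)."""
    return np.einsum("psu,us->ps", Q, K)

QK = closed_loop(Q, K)  # column-stochastic once Q is specified
```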
A. Recurrence and Persistent Surveillance

A state $s \in \mathcal{S}$ is recurrent under a control policy $K$ if the probability of a robot revisiting state $s$ is one, that is:

$$P_K(S_k = s \text{ for some } k > 0 \mid S_0 = s) = 1. \tag{1}$$

We define the set of recurrent states $\mathcal{S}^R_K$ under control policy $K$ as follows:

$$\mathcal{S}^R_K \overset{\text{def}}{=} \{ s \in \mathcal{S} : \text{(1) holds} \}.$$

Remark 2.1: Membership in $\mathcal{S}^R_K$ guarantees that once a state is visited, it will be revisited infinitely many times under control policy $K$. It does not, however, guarantee that each state in $\mathcal{S}^R_K$ will be visited for all initial states in $\mathcal{S}^R_K$, because $\mathcal{S}^R_K$ may contain multiple recurrent classes. In fact, a robot will visit a certain recurrent state $s$ with probability one if and only if it is initialized in the same recurrent class. Moreover, note that once a robot enters a recurrent class, it will never exit under control policy $K$.

We say a state $s$ is persistently surveilled under control policy $K$ and initial state $s_0 \in \mathcal{S}$ if it is recurrent under $K$ and

$$P_K(S_k = s \text{ for some } k > 0 \mid S_0 = s_0) = 1. \tag{2}$$

If a state $s$ is persistently surveilled under control policy $K$ and initial state $s_0 \in \mathcal{S}^R_K$, then it must be that $s$ and $s_0$ belong to the same recurrent class. We define the set of persistently surveilled states $\mathcal{S}^{ps}_{s_0,K}$ under control policy $K$ and initial state $s_0 \in \mathcal{S}$ to be:

$$\mathcal{S}^{ps}_{s_0,K} \overset{\text{def}}{=} \{ s \in \mathcal{S}^R_K : \text{(2) holds} \}.$$

The set $\mathcal{S}^{ps}_{s_0,K}$ is a recurrent class of the closed-loop dynamics $Q_K$. Note that for every state $s$ in $\mathcal{S}^{ps}_{s_0,K}$, it holds that $\mathcal{S}^{ps}_{s,K} = \mathcal{S}^{ps}_{s_0,K}$. Moreover, if there exists a recurrent state for which $\mathcal{S}^{ps}_{s_0,K} = \mathcal{S}^R_K$, the set $\mathcal{S}^R_K$ has only one recurrent class.

Given a set $\mathcal{F}$ of forbidden states, we define the set of states that are recurrent and for which the probability of transitioning into $\mathcal{F}$ is zero. The set of $\mathcal{F}$-safe recurrent states $\mathcal{S}^R_{K,\mathcal{F}}$ under a control policy $K$ is defined as:

$$\mathcal{S}^R_{K,\mathcal{F}} \overset{\text{def}}{=} \{ s \in \mathcal{S}^R_K : Q_K(s^+, s) = 0,\ s^+ \in \mathcal{F} \}.$$

We define the maximal set of $\mathcal{F}$-safe recurrent states as:

$$\mathcal{S}^R_\mathcal{F} \overset{\text{def}}{=} \bigcup_{K \in \mathcal{K}} \mathcal{S}^R_{K,\mathcal{F}}.$$

Finally, the set of $\mathcal{F}$-safe persistently surveilled states $\mathcal{S}^{ps}_{s_0,K,\mathcal{F}}$ under a control policy $K$ and initial state $s_0 \in \mathcal{S}$ is defined as:

$$\mathcal{S}^{ps}_{s_0,K,\mathcal{F}} \overset{\text{def}}{=} \{ s \in \mathcal{S}^{ps}_{s_0,K} : Q_K(s^+, s) = 0,\ s^+ \in \mathcal{F} \}.$$

Remark 2.2: As before, $\mathcal{S}^{ps}_{s_0,K,\mathcal{F}}$ is a (safe) recurrent class of $Q_K$.

B. Problem Statement

We start by addressing the following problem:

Problem 2.3 (Maximal set of $\mathcal{F}$-safe recurrent states): Given a set of forbidden states $\mathcal{F}$, determine: (a) $\mathcal{S}^R_\mathcal{F}$; and (b) a control policy $K^*$ such that $\mathcal{S}^R_{K^*} = \mathcal{S}^R_\mathcal{F}$.

In light of Remark 2.1, note that in order to persistently surveil all possible states, we need to determine how many robots to use and in which states they should be initialized. The following problem addresses this issue.

Problem 2.4 (Maximal $\mathcal{F}$-safe persistent surveillance): Given a set of forbidden states $\mathcal{F}$, determine the minimum number of robots $r$, a control policy $\hat{K}$ and a set of initial states $\{s_1, \ldots, s_r\}$ so that

$$\bigcup_{i=1}^{r} \mathcal{S}^{ps}_{s_i,\hat{K},\mathcal{F}} = \mathcal{S}^R_\mathcal{F}. \tag{3}$$

Remark 2.5: The following is a list of important comments on Problems 2.3 and 2.4.
• There is no $K$ such that the states in $\mathcal{S} \setminus \mathcal{S}^R_\mathcal{F}$ can be $\mathcal{F}$-safe and recurrent.
• Once $r$ robots are initialized with initial states $\{s_1, \ldots, s_r\}$, it is guaranteed that the largest possible set of states will be visited infinitely many times without ever visiting a forbidden state. This guarantee can also be sanity-checked in simulation, as sketched below.
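As a quick empirical illustration of that guarantee, a candidate deployment can be simulated under the closed-loop chain. This is a minimal sketch under our own assumptions (the array encoding of $Q_K$ from the earlier snippet; all names are hypothetical), intended as a sanity check rather than as part of the paper's method.

```python
import numpy as np

def simulate_deployment(QK, initial_states, forbidden, horizon=100_000, seed=0):
    """Simulate one robot per initial state under the closed-loop chain QK
    (column-stochastic: QK[s_plus, s] = P(s_plus | s)). Returns the set of
    visited states and a flag that is False if any forbidden state was hit."""
    rng = np.random.default_rng(seed)
    n_s = QK.shape[0]
    visited = set()
    for s in initial_states:
        for _ in range(horizon):
            if s in forbidden:
                return visited, False        # safety violated
            visited.add(s)
            s = rng.choice(n_s, p=QK[:, s])  # draw S_{k+1} ~ QK(., s)
    return visited, True

# Hypothetical usage, with 'target' the maximal F-safe recurrent set:
#   visited, safe = simulate_deployment(QK, [s1, s2, s3], set(F_indices))
#   print(safe and target <= visited)
```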
We will propose a convex optimization problem that efficiently computes $\mathcal{S}^R_\mathcal{F}$ and a control policy $K^*$ such that $\mathcal{S}^R_{K^*} = \mathcal{S}^R_\mathcal{F}$. We will show that the minimum number of robots $r$ required to persistently surveil $\mathcal{S}^R_\mathcal{F}$ is the number of distinct recurrent classes of the closed-loop Markov chain under the computed control policy $K^*$.

III. COMPUTING THE MAXIMAL SET OF RECURRENT STATES: A CONVEX APPROACH

Let $\mathcal{P}_{SU}$ be the set of all probability mass functions (pmfs) with support in $\mathcal{S} \times \mathcal{U}$, and consider the following convex optimization program:

$$f^*_{SU} = \arg\max_{f_{SU} \in \mathcal{P}_{SU}} H(f_{SU}) \tag{4}$$

subject to:

$$\sum_{u^+ \in \mathcal{U}} f_{SU}(s^+, u^+) = \sum_{s \in \mathcal{S},\, u \in \mathcal{U}} Q(s^+, s, u) f_{SU}(s, u), \qquad s^+ \in \mathcal{S} \tag{5}$$

$$\sum_{u \in \mathcal{U}} f_{SU}(s, u) = 0, \qquad s \in \mathcal{F} \tag{6}$$

where $H : \mathcal{P}_{SU} \to \mathbb{R}_{\geq 0}$ is the entropy of $f_{SU}$, given by

$$H(f_{SU}) = -\sum_{u \in \mathcal{U}} \sum_{s \in \mathcal{S}} f_{SU}(s, u) \ln\big(f_{SU}(s, u)\big),$$

where we adopt the standard convention that $0 \ln(0) = 0$.

The following proposition, which is a modification of Theorem 3.1 in [1], provides a solution to Problem 2.3.

Proposition 3.1: Let $\mathcal{F}$ be given, assume that (4)-(6) is feasible, and that $f^*_{SU}$ is the optimal solution. In addition, adopt the marginal pmf $f^*_S(s) = \sum_{u \in \mathcal{U}} f^*_{SU}(s, u)$ and consider that $G : \mathcal{U} \times \mathcal{S} \to [0, 1]$ is any function satisfying $\sum_{u \in \mathcal{U}} G(u, s) = 1$ for all $s$ in $\mathcal{S}$. The following holds:

(a) $\mathcal{S}^R_\mathcal{F} = \mathcal{W}_{f^*_S}$

(b) $\mathcal{S}^R_{K^*} = \mathcal{S}^R_\mathcal{F}$ for $K^*$ given by:

$$K^*(u, s) = \begin{cases} \dfrac{f^*_{SU}(s, u)}{f^*_S(s)}, & s \in \mathcal{W}_{f^*_S} \\[4pt] G(u, s), & \text{otherwise} \end{cases} \qquad (u, s) \in \mathcal{U} \times \mathcal{S} \tag{7}$$

where $\mathcal{W}_{f^*_S}$ is the support of $f^*_S$, given by $\mathcal{W}_{f^*_S} = \{ s \in \mathcal{S} : f^*_S(s) > 0 \}$.

Comments on the proof of Proposition 3.1: The proof closely follows the proof of Theorem 3.1 in [1] and is omitted. However, it is important to highlight that constraint (5) enforces recurrence and constraint (6) enforces $\mathcal{F}$-safety. Moreover, note that the pmf that maximizes the entropy under convex constraints has maximal support (see Lemma 3.5 in [1]).

Example 3.2: Let $\mathcal{X} = \mathcal{Y} = \{1, \ldots, 5\}$, $\mathcal{O} = \{R, U, L, D\}$ and consider a robot whose action space is given by $\mathcal{U} = \{\text{"forward"}, \text{"turn right"}\}$. Moreover, let the set of forbidden states be given by $\mathcal{F} = \{(x, y, \theta) \in \mathcal{S} : (x, y) \in \{(1, 1), (1, 5), (5, 1), (5, 5), (3, 3)\}\}$, which means the robot is prohibited from visiting the center or corner locations of the lattice.

In order to specify $Q$, we first define an auxiliary conditional pmf $Q'$ on $\mathcal{X}' = \mathcal{Y}' = \{1, 2, 3\}$ and $\mathcal{O}' = \{R, U, L, D\}$. For clarity, $Q'$ is shown graphically in Fig. 2, which contains the probabilities of transitioning to the states shown as dark triangles given the previous state shown as a white triangle. There is uncertainty only for transitions that occur on the edge of the lattice. Since we consider dynamics that are spatially invariant, the transition probabilities for states not shown in Fig. 2 can be computed by appropriate manipulation of the ones shown. Similarly, $Q$ is constructed by appropriate expansion of $Q'$.

We use [8] to solve (4)-(6) and use Proposition 3.1 to compute $\mathcal{S}^R_\mathcal{F}$ and a control policy $K^*$ such that $\mathcal{S}^R_{K^*} = \mathcal{S}^R_\mathcal{F}$. The set $\mathcal{S}^R_\mathcal{F}$ can be seen in Fig. 3, where the areas in red represent the states in $\mathcal{F}$, and the triangles in blue represent the states in $\mathcal{S}^R_\mathcal{F}$. The control policy $K^*$, computed using (7), has been omitted due to space constraints.
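The paper solves (4)-(6) with the MATLAB package CVX [8]. Purely as an illustration, here is a sketch of the same program in Python with cvxpy, reusing the array encoding assumed in Section II ($Q$ of shape $(n_s, n_s, n_u)$); the function name, the support tolerance, and the uniform choice of $G$ off the support are our own assumptions, not the paper's implementation.

```python
import cvxpy as cp
import numpy as np

def max_entropy_policy(Q, forbidden, tol=1e-9):
    """Solve (4)-(6) and extract a policy via (7).
    Q[s_plus, s, u] = P(s_plus | s, u); 'forbidden' lists forbidden state
    indices. Requires an exponential-cone solver (bundled with cvxpy)."""
    n_s, _, n_u = Q.shape
    f = cp.Variable((n_s, n_u), nonneg=True)          # f_SU(s, u)

    # Right-hand side of (5): pushforward of f_SU through Q.
    push = sum(Q[:, :, u] @ f[:, u] for u in range(n_u))
    constraints = [
        cp.sum(f) == 1,                               # f_SU is a pmf
        cp.sum(f, axis=1) == push,                    # recurrence, (5)
    ]
    if len(forbidden) > 0:
        constraints.append(f[forbidden, :] == 0)      # F-safety, (6)

    cp.Problem(cp.Maximize(cp.sum(cp.entr(f))), constraints).solve()

    f_star = np.maximum(f.value, 0.0)                 # clip solver noise
    f_s = f_star.sum(axis=1)                          # marginal f*_S
    support = f_s > tol                               # W_{f*_S}, i.e. S^R_F
    K = np.full((n_u, n_s), 1.0 / n_u)                # G: uniform off-support
    K[:, support] = (f_star[support, :] / f_s[support, None]).T
    return K, support
```

Note that a numerical tolerance stands in for the exact support condition $f^*_S(s) > 0$; in practice the threshold separating supported from unsupported states should be inspected.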
IV. MAXIMAL PERSISTENT SURVEILLANCE AND ROBOT DEPLOYMENT

In this section, we provide a solution to Problem 2.4, which seeks the minimum number $r$ of robots, a control policy $\hat{K}$ and a set of initial states $\{s_1, \ldots, s_r\}$ so that $\bigcup_{i=1}^{r} \mathcal{S}^{ps}_{s_i,\hat{K},\mathcal{F}} = \mathcal{S}^R_\mathcal{F}$. In light of a previous remark, recall that the set of $\mathcal{F}$-safe persistently surveilled states $\mathcal{S}^{ps}_{s_0,K,\mathcal{F}}$ is a recurrent class of $Q_K$. In practice, this means that when a robot with initial state $S_0 = s_0$ applies control policy $K$, it is guaranteed that:
• the robot will never leave $\mathcal{S}^{ps}_{s_0,K,\mathcal{F}}$;
• every state in $\mathcal{S}^{ps}_{s_0,K,\mathcal{F}}$ will be visited infinitely many times;
• states in $\mathcal{F}$ will never be visited.

[Fig. 2: Graphical representation of some transitions in Q', showing the probabilities of the "forward" (F) and "turn right" (TR) actions at interior, edge and corner states.]

[Fig. 3: Depiction of S^R_F in blue. The red areas represent the forbidden states.]

To find all the (safe) recurrent classes in $\mathcal{S}^R_{K,\mathcal{F}}$, flood-fill-type algorithms may be used, where the graph of $Q_K$ is traversed, either in a depth-first or breadth-first manner (a sketch of such a traversal is given below). An edge from $s$ to $s^+$ of the graph of $Q_K$ exists if and only if $Q_K(s^+, s) > 0$ holds. Note that states in $\mathcal{S} \setminus (\mathcal{S}^R_{K,\mathcal{F}} \cup \mathcal{F})$ do not need to be searched.

Given $\mathcal{F}$ and a control policy $K$, let $n_K$ be the number of distinct recurrent classes of $Q_K$, and note that the following holds:

$$\bigcup_{i=1}^{n_K} \mathcal{S}^{ps}_{s_i,K,\mathcal{F}} = \mathcal{S}^R_{K,\mathcal{F}},$$

where $\{s_1, \ldots, s_{n_K}\}$ is a set of initial states, and $\{\mathcal{S}^{ps}_{s_i,K,\mathcal{F}}\}_{i=1}^{n_K}$ are distinct recurrent classes.

We define the set of all admissible control policies whose $\mathcal{F}$-safe set of recurrent states is maximal to be:

$$\mathcal{K}^R_\mathcal{F} = \{ K \in \mathcal{K} : \mathcal{S}^R_{K,\mathcal{F}} = \mathcal{S}^R_\mathcal{F} \},$$

and note that in order to solve Problem 2.4, we must:
• find a control policy $\hat{K}$ in $\mathcal{K}^R_\mathcal{F}$ such that $n_{\hat{K}} \leq n_K$ for all $K$ in $\mathcal{K}^R_\mathcal{F}$ (note that $n_{\hat{K}}$ is the minimum number of robots needed for maximal persistent surveillance);
• identify the recurrent classes in $\mathcal{S}^R_{\hat{K},\mathcal{F}}$ (by exploring the graph of $Q_{\hat{K}}$); and
• select one (any) state from each of the recurrent classes to compose the set of initial states.

Note that the control policy $K^*$ given in (7) is a candidate for maximal persistent surveillance since $\mathcal{S}^R_{K^*,\mathcal{F}} = \mathcal{S}^R_\mathcal{F}$. The following proposition shows that $n_{K^*} \leq n_K$ for all $K$ in $\mathcal{K}^R_\mathcal{F}$.

Proposition 4.1: Let $\mathcal{F}$ be given, and take $K^*$ to be the control policy in (7). The following holds:

$$n_{K^*} \leq n_K, \qquad K \in \mathcal{K}^R_\mathcal{F}.$$

Proof: Suppose there exists a control policy $\bar{K}$ in $\mathcal{K}^R_\mathcal{F}$ such that $n_{\bar{K}} < n_{K^*}$. Since $\bar{K}$ belongs to $\mathcal{K}^R_\mathcal{F}$, there must exist a control policy $\tilde{K}$ with the same sparsity pattern as $\bar{K}$ and a pmf $\tilde{f}_{SU}$ in $\mathcal{P}_{SU}$ that satisfies (5) and (6) for which:

$$\tilde{K}(u, s) = \begin{cases} \dfrac{\tilde{f}_{SU}(s, u)}{\tilde{f}_S(s)}, & s \in \mathcal{S}^R_\mathcal{F} \\[4pt] G(u, s), & \text{otherwise} \end{cases} \qquad (u, s) \in \mathcal{U} \times \mathcal{S},$$

where $\tilde{f}_S(s) = \sum_{u \in \mathcal{U}} \tilde{f}_{SU}(s, u)$ and $G : \mathcal{U} \times \mathcal{S} \to [0, 1]$. Since $Q_{\bar{K}}$ has fewer recurrent classes than $Q_{K^*}$, there must exist a pair $(s, u)$ in $\mathcal{S}^R_\mathcal{F} \times \mathcal{U}$ for which $\bar{K}(u, s) > 0$ and $K^*(u, s) = 0$ hold. Since $\bar{K}$ and $\tilde{K}$ have the same sparsity pattern, it holds that $\tilde{K}(u, s) > 0$. Therefore, it must be that $\tilde{f}_{SU}(s, u) > 0$ and $f^*_{SU}(s, u) = 0$. In other words, the support of $\tilde{f}_{SU}$ is not contained in the support of $f^*_{SU}$, which is a contradiction by Lemma 3.5 in [1].

Remark 4.2: Suppose we change the objective function in (4) to $H(f_S)$ and add the constraint $f_S(s) = \sum_{u \in \mathcal{U}} f_{SU}(s, u)$. An appropriate modification of Proposition 3.1 would enable us to find $\mathcal{S}^R_\mathcal{F}$ and an associated control policy (i.e., solve Problem 2.3), with the added benefit that the modified convex program would be computationally less intensive (since fewer calls to the entropy function would be required). However, maximizing the entropy of the marginal distribution $f_S$ would not solve the problem of maximal persistent surveillance, since Proposition 4.1 would not apply.
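Since every state in the support of $f^*_S$ is recurrent under $K^*$, and every state reachable from a recurrent state communicates with it, the forward-reachable set from any supported state is exactly its recurrent class; a plain breadth-first flood fill therefore suffices. The sketch below is ours, assuming the numpy encoding used earlier (function and variable names are hypothetical).

```python
from collections import deque
import numpy as np

def recurrent_classes(QK, support, tol=1e-12):
    """Flood-fill the graph of Q_K restricted to the supported (recurrent)
    states; returns one set of state indices per recurrent class.
    An edge from s to s_plus exists iff QK[s_plus, s] > tol."""
    classes = []
    unvisited = set(np.flatnonzero(support))
    while unvisited:
        root = next(iter(unvisited))
        cls, queue = {root}, deque([root])
        while queue:                        # breadth-first traversal
            s = queue.popleft()
            for s_plus in np.flatnonzero(QK[:, s] > tol):
                if s_plus not in cls:
                    cls.add(int(s_plus))
                    queue.append(int(s_plus))
        classes.append(cls)
        unvisited -= cls
    return classes

# n_K = len(recurrent_classes(QK, support)); one robot per class, initialized
# at any state of that class, achieves maximal persistent surveillance.
```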
Example 4.3: Consider again the example described in Example 3.2. By exploring the graph of $Q_{K^*}$, we conclude that only one robot is required to perform maximal persistent surveillance (i.e., $\mathcal{S}^R_\mathcal{F}$ contains only one recurrent class). Any state in $\mathcal{S}^R_\mathcal{F}$ may be selected as the robot's initial state.

[Fig. 4: Top left: maximal set of recurrent states S^R_F (in blue). Others: the three recurrent classes S^ps_{s1,K*,F}, S^ps_{s2,K*,F} and S^ps_{s3,K*,F}, with s1 = (1, 2, U), s2 = (2, 1, U) and s3 = (2, 4, U), whose union is S^R_F.]

Suppose that we now change the set of forbidden states to include location (4, 3), i.e., let $\mathcal{F} = \{(x, y, \theta) \in \mathcal{S} : (x, y) \in \{(1, 1), (1, 5), (5, 1), (5, 5), (3, 3), (4, 3)\}\}$. Re-solving (4)-(6), applying Propositions 3.1 and 4.1, and searching the graph of the closed-loop Markov chain, we conclude that at least three robots are required to perform maximal persistent surveillance of $\mathcal{S}^R_\mathcal{F}$ (see Fig. 4). Any state from each recurrent class may be used as an initial state, so we can choose the set of initial states to be $\{(1, 2, U), (2, 1, U), (2, 4, U)\}$. Note that the set $\mathcal{S}^R_\mathcal{F}$ is now smaller than in the previous example (34 vs. 40 states).

V. LIMITING BEHAVIOR AND OTHER CONSTRAINTS

We define $T_K(s, s_0)$, the long-term proportion of time that the robot, under control policy $K$ and having started at state $s_0$, spends in state $s \in \mathcal{S}$, as:

$$T_K(s, s_0) \overset{\text{def}}{=} \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} \mathbb{I}\big(S_i = s,\ S_0 = s_0\big),$$

where $\mathbb{I}$ is the indicator function.

A. Limiting Behavior with One Recurrent Class

Given a forbidden set $\mathcal{F}$, let $f^*_{SU}$ be the optimal solution to (4)-(6) and $K^*$ be the control policy computed in (7), and suppose $\mathcal{S}^R_{K^*,\mathcal{F}}$ has only one recurrent class. For any initial state $s_0$ in $\mathcal{S}^R_\mathcal{F}$, the following holds with probability one:

$$T_{K^*}(s, s_0) = f^*_S(s), \tag{8}$$

where $f^*_S(s) = \sum_{u \in \mathcal{U}} f^*_{SU}(s, u)$. Since we have not imposed aperiodicity on $Q_{K^*}$, we cannot state stronger convergence. However, equation (8) still provides valuable information regarding the limiting behavior of the robot.

Note that the pmf that maximizes the entropy is "as uniform as possible" (in fact, when unconstrained, the pmf that maximizes the entropy is uniform). However, additional convex constraints can be added to our formulation in order to shape the distribution of the optimal pmf and, thus, influence the limiting behavior of the robot. Consider the following constraint:

$$\sum_{(x,y) \in D,\ \theta \in \mathcal{O},\ u \in \mathcal{U}} f_{SU}\big((x, y, \theta), u\big) > \alpha, \tag{9}$$

where $D \subset \mathcal{X} \times \mathcal{Y}$ is a region of the lattice. The set $D$ can be interpreted as a region of high interest that should be surveilled more often. Suppose the convex program (4)-(6) and (9) is feasible, that $f^{**}_{SU}$ is the optimal solution and $K^{**}$ is the associated control policy. The following holds for any $s_0$ in $\mathcal{S}^R_\mathcal{F}$ with probability one:

$$\sum_{(x,y) \in D,\ \theta \in \mathcal{O}} T_{K^{**}}\big((x, y, \theta), s_0\big) > \alpha.$$

Example 5.1: Let $\mathcal{X} = \mathcal{Y} = \{1, \ldots, 10\}$, $\mathcal{O} = \{R, U, L, D\}$, and consider again a robot whose action space is given by $\mathcal{U} = \{\text{"forward"}, \text{"turn right"}\}$. The dynamics $Q$ are similar to those used in Examples 3.2 and 4.3, except that we add uncertainty to the transitions of states that lie in the interior of the grid (see Fig. 5). The probabilities for states on the edge of the grid are the same as before (see Fig. 2).

[Fig. 5: Graphical representation of some transitions in Q', showing the probabilities of the "forward" and "turn right" actions at interior states.]

Consider the set of forbidden states $\mathcal{F} = \{(x, y, \theta) \in \mathcal{S} : (x, y) \in \{(2, 2), (2, 3), (3, 2), (3, 3), (8, 8), (8, 9), (9, 8), (9, 9)\}\}$. We solve (4)-(6) using the tool in [8]. In Fig. 6, each state that belongs to $\mathcal{S}^R_\mathcal{F}$ is shown in blue, where the darker the blue, the higher the value of $f^*_S$. Note that the distribution is relatively uniform.

[Fig. 6: Depiction of S^R_F in blue. Darker blue indicates a higher value for f*_S.]
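In the cvxpy sketch given earlier, (9) maps to a single extra linear constraint; the helper below is our own illustration. The construction of `D_states` (indices of all states whose (x, y) lies in D, via the assumed `state_index` encoding, here with a 10 x 10 grid) is hypothetical, and we impose a non-strict inequality since convex solvers do not accept strict ones.

```python
import cvxpy as cp

def region_constraint(f, D_states, alpha):
    """Constraint (9): the stationary mass on the region of interest D must
    be at least alpha. 'f' is the (n_s, n_u) variable of the earlier sketch;
    'D_states' lists the indices of all states (x, y, o) with (x, y) in D."""
    return cp.sum(f[D_states, :]) >= alpha   # >= in place of the strict >

# Example 5.1 (hypothetical indices): D = {(x, y) : 3 <= x, y <= 8}, alpha = 0.75.
#   D_states = [state_index(x, y, o) for x in range(3, 9)
#               for y in range(3, 9) for o in ORIENT]
#   constraints.append(region_constraint(f, D_states, 0.75))
#   cp.Problem(cp.Maximize(cp.sum(cp.entr(f))), constraints).solve()
```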
Consider now $D = \{(x, y) \in \mathcal{X} \times \mathcal{Y} : 3 \leq x, y \leq 8\}$, and let $\alpha = 0.75$. We solve (4)-(6) and (9). The result can be seen in Fig. 7.

[Fig. 7: Depiction of S^R_F in blue with the additional constraint (9). Darker blue indicates a higher value for f**_S.]

B. Limiting Behavior with Multiple Recurrent Classes

Consider again $f^*_{SU}$ and $K^*$ as before, and, without loss of generality, let $\mathcal{S}^R_{K^*,\mathcal{F}}$ have two recurrent classes with initial states $s_1$ and $s_2$ (i.e., $\mathcal{S}^{ps}_{s_1,K^*,\mathcal{F}} \cup \mathcal{S}^{ps}_{s_2,K^*,\mathcal{F}} = \mathcal{S}^R_{K^*,\mathcal{F}}$). For any initial state $s_0$ in $\mathcal{S}^{ps}_{s_1,K^*,\mathcal{F}}$ (equivalently, $\mathcal{S}^{ps}_{s_2,K^*,\mathcal{F}}$), the following holds with probability one:

$$T_{K^*}(s, s_0) = \frac{f^*_S(s)}{\beta}, \tag{10}$$

where $\beta = \sum_{s \in \mathcal{S}^{ps}_{s_1,K^*,\mathcal{F}}} f^*_S(s)$ (equivalently, $\beta = \sum_{s \in \mathcal{S}^{ps}_{s_2,K^*,\mathcal{F}}} f^*_S(s)$).

With equation (10) in mind, note that additional convex constraints may also be used to influence the limiting behavior of the robots. Moreover, by carefully selecting the number of robots allocated to each recurrent class, one can achieve a desirable limiting behavior for the ensemble of robots.
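Equations (8) and (10) can be checked numerically by comparing empirical visit frequencies along a long sample path against the solver's marginal; since aperiodicity is not imposed, only this time average is expected to converge. A minimal sketch under the same hypothetical encoding as before:

```python
import numpy as np

def empirical_visit_frequencies(QK, s0, horizon=200_000, seed=0):
    """Estimate T_K(s, s0) as the fraction of time the chain spends in each
    state along one long sample path started at s0 (QK column-stochastic)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(QK.shape[0])
    s = s0
    for _ in range(horizon):
        s = rng.choice(QK.shape[0], p=QK[:, s])
        counts[s] += 1
    return counts / horizon

# One recurrent class: expect T ~ f_s, cf. (8).
#   T = empirical_visit_frequencies(QK, s0)
#   print(np.abs(T - f_s).max())
# Two classes: restrict to the class C containing s0 and compare T with
# f_s / beta, where beta = f_s[list(C)].sum(), cf. (10).
```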
VI. CONCLUSIONS

We have proposed methods to design memoryless strategies for controlled Markov chains that guarantee maximal persistent surveillance under safety constraints. The uncomplicated structure of the resulting controllers makes them implementable in small robots. We have described a finitely parametrized convex program that solves this problem via entropy maximization principles, and we have shown that the computed control policy yields the closed-loop Markov chain with the least number of recurrent classes.

ACKNOWLEDGMENT

This work was supported by NSF Grant CNS 0931878, AFOSR Grant FA95501110182, ONR UMD-AppEl Center and the Multiscale Systems Center, one of six research centers funded under the Focus Center Research Program.

REFERENCES

[1] E. Arvelo and N. C. Martins. Control design for Markov chains under safety constraints: a convex approach. arXiv:1209.2883, 2012.
[2] B. Bethke, J. Redding, J. P. How, M. A. Vavrina, and J. Vian. Agent capability in persistent mission planning using approximate dynamic programming. In American Control Conference, pages 1623–1628, 2010.
[3] V. S. Borkar. Controlled Markov chains with constraints. Sadhana, 15(4-5):405–413, December 1990.
[4] J. Burlet, O. Aycard, and T. Fraichard. Robust motion planning using Markov decision processes and quadtree decomposition. In IEEE International Conference on Robotics and Automation, pages 2820–2825, 2004.
[5] H. Choset. Coverage for robotics – A survey of recent results. Annals of Mathematics and Artificial Intelligence, 31:113–126, 2001.
[6] I. Csiszar and P. C. Shields. Information Theory and Statistics: A Tutorial. Foundations and Trends in Communications and Information Theory, 1(4):1–116, 2004.
[7] B. Fox. Markov renewal programming by linear fractional programming. SIAM Journal on Applied Mathematics, 14(6):1418–1432, November 1966.
[8] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx/, April 2011.
[9] O. Hernandez-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin and New York, 1996.
[10] P. F. Hokayem, D. Stipanovic, and M. W. Spong. On persistent coverage control. In IEEE Conference on Decision and Control, pages 6130–6135, 2007.
[11] N. Michael, E. Stump, and K. Mohta. Persistent surveillance with a team of MAVs. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2708–2714, 2011.
[12] N. Nigam, S. Bieniawski, I. Kroo, and J. Vian. Control of multiple UAVs for persistent surveillance: algorithm and flight test results. IEEE Transactions on Control Systems Technology, 20(5):1236–1251, 2012.
[13] N. Nigam and I. Kroo. Persistent surveillance using multiple unmanned air vehicles. In IEEE Aerospace Conference, pages 1–14, 2008.
[14] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, NY, 1994.
[15] R. Simmons and S. Koenig. Probabilistic robot navigation in partially observable environments. In International Joint Conference on Artificial Intelligence, 1995.
[16] G. Theocharous and S. Mahadevan. Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In IEEE International Conference on Robotics and Automation, pages 1347–1352, 2002.
[17] A. Undurti and J. P. How. A decentralized approach to multi-agent planning in the presence of constraints and uncertainty. In IEEE International Conference on Robotics and Automation, pages 2534–2539, 2011.
[18] P. Wolfe and G. B. Dantzig. Linear programming in a Markov chain. Operations Research, 10(5):702–710, October 1962.