LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees - Technical Report

Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

This work was supported in part by ONR-MURI N00014-09-1051, ARO W911NF-09-1-0088, AFOSR YIP FA9550-09-1-020, and NSF CNS-0834260. X. C. Ding and C. Belta are with the Department of Mechanical Engineering, Boston University, Boston, MA 02215, USA (email: {xcding; cbelta}@bu.edu). S. L. Smith is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo ON, N2L 3G1 Canada (email: stephen.smith@uwaterloo.ca). D. Rus is with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA (email: rus@csail.mit.edu).

Abstract— We present a method to generate a robot control strategy that maximizes the probability of accomplishing a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and that the robot can determine the truth value of a proposition only at its current region. Motivated by several results on partition-based abstractions, we assume that the motion is performed on a graph. To account for noisy sensors and actuators, we assume that a control action enables several transitions with known probabilities. We show that this problem can be reduced to the problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized. We provide a complete solution for the latter problem that builds on existing results from probabilistic model checking. We include an illustrative case study.

I. INTRODUCTION

Recently there has been an increased interest in using temporal logics, such as Linear Temporal Logic (LTL) and Computation Tree Logic (CTL), as motion specification languages for robotics [1]–[6]. Temporal logics are appealing because they provide formal, high level languages in which to describe complex missions, e.g., “Reach A, then B, and then C, in this order, infinitely often. Never go to A. Don’t go to B unless C or D were visited.” In addition, off-the-shelf model checking algorithms [7], [8] and temporal logic game strategies [9] can be used to verify the correctness of robot trajectories and to synthesize robot control strategies.

Motivated by several results on finite abstractions of control systems, in this paper we assume that the motion of the robot in the environment is modeled as a finite labeled transition system. This can be obtained by simply partitioning the environment and labeling the edges of the corresponding quotient graph according to the motion capabilities of the robot among the regions. Alternatively, the partition can be made in the state space of the robot dynamics, and the transition system is then a finite abstraction of a continuous or hybrid control system [10], [11].

The problem of controlling a finite transition system from a temporal logic specification has received a lot of attention during recent years. All the existing works assume that the current state can be precisely determined.
If the result of a control action is deterministic (i.e., at each state, an available control enables exactly one transition), control strategies from specifications given as LTL formulas can be found through a simple adaptation of off-the-shelf model checking algorithms [3]. If the control is nondeterministic (an available control at a state enables one of several transitions, and their probabilities are not known), the control problem from an LTL specification can be mapped to the solution of a Rabin game [12], or simpler Büchi and GR(1) games if the specification is restricted to fragments of LTL [1]. If the control is probabilistic (an available control at a state enables one of several transitions, and their probabilities are known), the transition system is a Markov Decision Process (MDP). The control problem then reduces to generating a policy (adversary) for an MDP such that the produced language satisfies a formula of a probabilistic temporal logic [13], [14]. We have recently developed a framework for deriving an MDP control strategy from a formula in a fragment of probabilistic CTL (pCTL) [15]. For probabilistic LTL, in [16], a control strategy is synthesized for an MDP in which some states are under the control of the environment, so that an LTL specification is guaranteed to be satisfied under all possible environment behaviors. The temporal logic control problems for systems with probabilistic or nondeterministic state-observation models, which include the class of Partially Observable Markov Decision Processes [17], [18], are currently open.

In this paper, we consider motion specifications given as arbitrary LTL formulas over a set of properties that can be satisfied with given probabilities at the vertices of a graph environment. We assume that the truth values of the properties can be observed only when a vertex is reached in the environment, and that the observations of these properties are independent of each other. We assume a probabilistic robot control model and that the robot can determine its current vertex precisely. Under these assumptions, we develop an algorithm to generate a control strategy that maximizes the probability of satisfying the specification. Our approach is based on mapping this problem to the problem of generating a control policy for an MDP such that the probability of satisfying an LTL formula is maximized. We provide a solution to this problem by drawing inspiration from probabilistic model checking. We illustrate the method by applying it to a numerical example of a robot navigating in an indoor environment.

The contribution of this work is twofold. First, we adapt existing approaches in probabilistic model checking (e.g., [19], [20]), and provide a complete solution to the general problem of controlling MDPs from full LTL specifications using deterministic Rabin automata. This is a significant departure from our previous work on MDP control from pCTL formulas [15], since it allows for strictly richer specifications. The increase in expressivity is particularly important in many robotic applications where the robot is expected to perform some tasks, such as surveillance, repeatedly. However, it comes at the price of increased computational complexity. Second, we allow for non-determinism not only in the robot motion, but also in the robot's observation of properties in the environment.
This allows us to model a large class of robotic problems in which the satisfaction of properties of interest can be predicted only probabilistically. For example, we can model a task where a robot is operating in an indoor environment and is required to pick up and deliver items among some rooms. The robot determines its current location using RFID tags on the floors and walls. Non-determinism occurs in observations because items may or may not be available when a robot visits a room. Non-determinism also occurs in the motion due to imprecise localization or control actuation.

The remainder of the paper is organized as follows. In Section II we introduce the necessary definitions and preliminary results. In Section III we formulate the problem and describe the technical approach. In Section IV we reformulate the problem on an MDP and show that the two problems are equivalent. We synthesize our control strategy in Section V, and an example of the proposed algorithm is given in Section VI. We conclude in Section VII.

II. PRELIMINARIES

In this section we provide background material on linear temporal logic and Markov decision processes.

A. Linear Temporal Logic

We employ Linear Temporal Logic (LTL) to describe high level motion specifications. A detailed description of the syntax and semantics of LTL is beyond the scope of this paper and can be found in, for example, [7]. Roughly, an LTL formula is built up from a set of atomic propositions Π, which are properties that can be either true or false, standard Boolean operators ¬ (negation), ∨ (disjunction), ∧ (conjunction), ⇒ (implication), and temporal operators ○ (next), U (until), ◇ (eventually) and □ (always). The semantics of LTL formulas are given over words, which are defined as infinite sequences o = o_0 o_1 . . ., where o_i ∈ 2^Π for all i. We say o ⊨ φ if the word o satisfies the LTL formula φ. The semantics of LTL is defined recursively. If φ = π is an LTL formula, where π ∈ Π, then φ is true at position i of the word if π ∈ o_i. A word satisfies an LTL formula φ if φ is true at the first position of the word; □φ means that φ is true at all positions of the word; ◇φ means that φ eventually becomes true in the word; φ_1 U φ_2 means that φ_2 eventually becomes true and φ_1 is true until this happens; ○φ means that φ becomes true at the next position of the word. More expressivity can be achieved by combining the above temporal and Boolean operators (several examples are given later in the paper). An LTL formula can be represented by a deterministic Rabin automaton, which is defined as follows.

Definition 2.1 (Deterministic Rabin Automaton): A deterministic Rabin automaton (DRA) is a tuple R = (Q, Σ, δ, q_0, F), where (i) Q is a finite set of states; (ii) Σ is a set of inputs (alphabet); (iii) δ : Q × Σ → Q is the transition function; (iv) q_0 ∈ Q is the initial state; and (v) F = {(L_1, K_1), . . . , (L_k, K_k)} is a set of pairs where L_i, K_i ⊆ Q for all i ∈ {1, . . . , k}.

A run of a Rabin automaton R, denoted by r_R = q_0 q_1 . . ., is an infinite sequence of states in R such that for each i ≥ 0, q_{i+1} = δ(q_i, α) for some α ∈ Σ. A run r_R is accepting if there exists a pair (L, K) ∈ F such that 1) there exists n ≥ 0 such that for all m ≥ n we have q_m ∉ L, and 2) there exist infinitely many indices k where q_k ∈ K. This acceptance condition means that r_R is accepting if, for some pair (L, K) ∈ F, r_R intersects L finitely many times and K infinitely many times.
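To make the acceptance condition concrete, below is a small illustrative sketch (not part of the original report) of a DRA stored as a dictionary-based structure, together with a check of the Rabin condition for an infinite run given in lasso form, i.e., a finite prefix followed by a cycle that repeats forever. The representation and all names are our own assumptions for illustration.

```python
from itertools import chain

class DRA:
    """A deterministic Rabin automaton R = (Q, Sigma, delta, q0, F)."""
    def __init__(self, states, delta, q0, pairs):
        self.states = states   # finite set of states Q
        self.delta = delta     # dict: (state, symbol) -> state, symbols in 2^Pi
        self.q0 = q0           # initial state
        self.pairs = pairs     # Rabin pairs F = [(L_1, K_1), ...], sets of states

def accepts_lasso(dra, prefix, cycle):
    """Check the Rabin condition for the infinite word prefix . cycle^omega.

    The run of a DRA on such a word is eventually periodic: we iterate the
    cycle until the state at its start repeats, collect the states visited on
    the periodic part (exactly those visited infinitely often), and test
    whether some pair (L, K) is met: L is avoided from some point on while K
    is visited infinitely often."""
    q = dra.q0
    for symbol in prefix:
        q = dra.delta[(q, symbol)]
    seen, blocks = {}, []
    while q not in seen:
        seen[q] = len(blocks)
        block = []
        for symbol in cycle:
            q = dra.delta[(q, symbol)]
            block.append(q)
        blocks.append(block)
    inf = set(chain.from_iterable(blocks[seen[q]:]))   # states visited infinitely often
    return any(not (inf & L) and (inf & K) for L, K in dra.pairs)
```

In this sketch the symbols fed to delta are subsets of Π (e.g., Python frozensets), matching the alphabet Σ = 2^Π used below.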
For any LTL formula φ over Π, one can construct a DRA with input alphabet Σ = 2^Π accepting all and only the words over Π that satisfy φ (see [21]). We refer readers to [22] and references therein for algorithms, and to freely available implementations such as [23], to translate an LTL formula over Π to a corresponding DRA.

B. Markov Decision Process and probability measure

We now introduce a labeled Markov decision process and the probability measure we will use in the upcoming sections.

Definition 2.2 (Labeled Markov Decision Process): A labeled Markov decision process (MDP) is a tuple M = (S, U, A, P, ι, Π, h), where (i) S is a finite set of states; (ii) U is a finite set of actions; (iii) A : S → 2^U gives the set of actions enabled at state s ∈ S; (iv) P : S × U × S → [0, 1] is the transition probability function such that for all states s ∈ S, ∑_{s′∈S} P(s, u, s′) = 1 if u ∈ A(s) ⊆ U, and P(s, u, s′) = 0 for all s′ ∈ S if u ∉ A(s); (v) ι : S → [0, 1] is the initial state distribution satisfying ∑_{s∈S} ι(s) = 1; (vi) Π is a set of atomic propositions; and (vii) h : S → 2^Π is a labeling function.

The quantity P(s, u, s′) represents the probability of reaching the state s′ from s by taking the control u ∈ A(s). We will now define a probability measure over paths of the MDP. To do this, we define an action function as a function µ : S → U such that µ(s) ∈ A(s) for all s ∈ S. An infinite sequence of action functions M = {µ_0, µ_1, . . .} is called a policy. One can use a policy to resolve all nondeterministic choices in an MDP by applying the action µ_k(s_k) at each time-step k. Given an initial state s_0 such that ι(s_0) > 0, an infinite sequence r^M_M = s_0 s_1 . . . on M generated under a policy M = {µ_0, µ_1, . . .} is called a path on M if P(s_i, µ_i(s_i), s_{i+1}) > 0 for all i. The subsequence s_0 s_1 . . . s_n is called a finite path. If µ_i = µ for all i, then we call the policy a stationary policy. We define Paths^M_M and FPaths^M_M as the sets of all infinite and finite paths of M under a policy M starting from any state s_0 with ι(s_0) > 0.

We can then define a probability measure over the set Paths^M_M of paths. For a path r^M_M = s_0 s_1 . . . s_n s_{n+1} . . . ∈ Paths^M_M, the prefix of length n of r^M_M is the finite subsequence s_0 s_1 . . . s_n. Let Paths^M_M(s_0 s_1 . . . s_n) denote the set of all paths in Paths^M_M with the prefix s_0 s_1 . . . s_n. (Note that s_0 s_1 . . . s_n is a finite path in FPaths^M_M.) Then, the probability measure Pr^M_M on the smallest σ-algebra over Paths^M_M containing Paths^M_M(s_0 s_1 . . . s_n) for all s_0 s_1 . . . s_n ∈ FPaths^M_M is the unique measure satisfying

Pr^M_M{Paths^M_M(s_0 s_1 . . . s_n)} = ι(s_0) ∏_{0 ≤ i < n} P(s_i, µ_i(s_i), s_{i+1}).   (1)

Finally, we can define the probability that an MDP M under a policy M satisfies an LTL formula φ over Π. A path r^M_M = s_0 s_1 . . . generates the word h(s_0) h(s_1) . . ., and with a slight abuse of notation we write r^M_M ⊨ φ if this word satisfies φ. We define

Pr^M_M(φ) = Pr^M_M{r^M_M ∈ Paths^M_M : r^M_M ⊨ φ}.   (2)

III. PROBLEM FORMULATION

A. Environment, motion and observation models

We model the environment as a graph E = (V, δ_E, Π), where V is the set of vertices corresponding to the regions of the environment, δ_E ⊆ V × V is the set of edges describing the allowed motions between regions, and Π is the set of atomic propositions. The motion capabilities of the robot are given by a finite set U of motion primitives, an enabling function A : V → 2^U giving the motion primitives available at each vertex, and a motion model Pm : V × U × V → [0, 1], where Pm(v, u, v′) is the probability that the robot reaches v′ when it applies u ∈ A(v) at v, with ∑_{v′∈V} Pm(v, u, v′) = 1. The observation model is a function Po : V × Π → [0, 1], where Po(v, π) is the probability that the atomic proposition π is observed (i.e., is true) when the robot is at vertex v; observations of different propositions, and observations made at different time-steps, are assumed to be independent.

Let Π_v = {π ∈ Π : Po(v, π) > 0} be the atomic propositions that can be observed at a vertex v. Then Z_v = {Z ∈ 2^{Π_v} : ∏_{π∈Z} Po(v, π) × ∏_{π∉Z} (1 − Po(v, π)) > 0} is the set of all possible observations at v.

B. Problem Formulation

Let the initial state of the robot be given as v_0. The trajectory of the robot in the environment is an infinite sequence r = v_0 v_1 . . ., where Pm(v_i, u, v_{i+1}) > 0 for some u, for all i. Given r = v_0 v_1 . . ., we call v_i the state of the robot at the discrete time-step i. We denote the atomic propositions observed at time-step i as o_i ∈ Z_{v_i}, and O(r) = o_0 o_1 . . . as the word observed by r. An example of a trajectory r and its observed word in an environment with given E, U, A, Pm and Po is shown in Fig. 1.

Fig. 1. An example trajectory r and its observed word O(r). We also show Z_v for all v ∈ V. A single arrow pointing towards the state v_0 indicates the initial state. The atomic proposition set is Π = {a, b} and the set of motion primitives is U = {u_1, u_2, u_3}. The probability function Po assigns probabilities to the atomic propositions at each state; the probability of an atomic proposition is shown only if it is positive (i.e., π ∈ Π_v): Po(v_0, a) = 0.2, Po(v_0, b) = 0.6, Po(v_1, b) = 1, Po(v_2, a) = 0.4, Po(v_2, b) = 1, Po(v_3, a) = 1, so that Z_{v_0} = {∅, {a}, {b}, {a, b}}, Z_{v_1} = {{b}}, Z_{v_2} = {{b}, {a, b}} and Z_{v_3} = {{a}}. The number on top of an arrow pointing from a vertex v to a vertex v′ is the probability Pm(v, u, v′) associated with a control u ∈ U. The depicted trajectory is r = v_0 v_1 v_2 v_3 v_1 . . ., generated by the control sequence u_1 u_1 u_2 u_2 . . ., with observed word O(r) = {b}{b}{a, b}{a}{b} . . .
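As an illustration of how a trajectory and its observed word arise from Pm and Po, the following sketch (our own; the numbers are illustrative and loosely in the spirit of Fig. 1 rather than an exact transcription of it) samples a finite trajectory and its observed word under a simple reactive strategy. For simplicity the example strategy is memoryless, while the strategies considered in the paper may in general be time-dependent.

```python
import random

# Motion model Pm[v][u] = {v': prob}; observation model Po[v] = {pi: prob}.
# Illustrative numbers only (not an exact transcription of Fig. 1).
Pm = {"v0": {"u1": {"v1": 1.0}},
      "v1": {"u1": {"v2": 0.8, "v3": 0.2}},
      "v2": {"u2": {"v3": 0.7, "v1": 0.3}},
      "v3": {"u2": {"v1": 1.0}, "u3": {"v3": 1.0}}}
Po = {"v0": {"a": 0.2, "b": 0.6}, "v1": {"b": 1.0},
      "v2": {"a": 0.4, "b": 1.0}, "v3": {"a": 1.0}}

def observe(v):
    """Sample the observation set o_i in Z_v (propositions are independent)."""
    return frozenset(p for p, prob in Po[v].items() if random.random() < prob)

def sample_run(v0, strategy, n):
    """Sample a length-n trajectory r and its observed word O(r),
    applying at each step the control chosen by the strategy at (v, Z)."""
    r, word = [v0], [observe(v0)]
    for _ in range(n):
        v = r[-1]
        u = strategy(v, word[-1])              # control may depend on (v, Z)
        succs, probs = zip(*Pm[v][u].items())
        r.append(random.choices(succs, weights=probs)[0])
        word.append(observe(r[-1]))
    return r, word

# Example: a strategy that ignores the observation and fixes one control per vertex.
r, word = sample_run("v0", lambda v, Z: {"v0": "u1", "v1": "u1",
                                         "v2": "u2", "v3": "u2"}[v], 5)
```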
Our desired “reactive” control strategy is in the form of an infinite sequence C = {ν_0, ν_1, . . .}, where ν_i : V × 2^Π → U and ν_i(v, Z) is defined only if Z ∈ Z_v. Furthermore, we enforce that ν_i(v, Z) ∈ A(v) for all v and all i. The reactive control strategy returns the control to be applied at each time-step, given the current state v and the observed set of propositions Z at v. Given an initial condition v_0 and a control strategy C, we can produce a trajectory r = v_0 v_1 . . . where the control applied at time-step i is ν_i(v_i, o_i). We call r and O(r) = o = o_0 o_1 . . . the trajectory and the word generated under C, respectively. Note that, given v_0 and a control strategy C, the resultant trajectory and its corresponding word are not unique, due to non-determinism in both the motion and the observations of the robot. We can now formulate the following problem:

Problem 3.1: Given the environment represented by E = (V, δ_E, Π); the robot motion model U, A and Pm; the observation model Po; and an LTL formula φ over Π, find the control strategy C that maximizes the probability that the word generated under C satisfies φ.

C. Summary of technical approach

Our approach to solving Prob. 3.1 proceeds by the construction of a labeled MDP M (see Def. 2.2), which captures all possible words that can be observed by the robot. Furthermore, each control strategy C corresponds uniquely to a policy M on M. Thus, each trajectory with an observed word under a control strategy C corresponds uniquely to a path on M under M. We then reformulate Prob. 3.1 as the problem of finding the policy on M that maximizes the probability of satisfying φ. These two problems are equivalent due to the assumption that all observations are independent. We synthesize the optimal control strategy by solving maximal reachability probability problems, inspired by results in probabilistic model checking. Our framework is more general than that of [15], due to a richer specification language and non-determinism in the observation of the environment. The trade-off is that the computational complexity of this approach is in general much larger, due to the increased size of the automaton representing the specification.

IV. MDP CONSTRUCTION AND PROBLEM REFORMULATION

As part of our approach to solve Problem 3.1, we construct a labeled MDP M = (S, U, A, P, ι, Π, h) from the environment model E, the robot motion model U, A, Pm, and the observation model Po as follows:
• S = {(v, Z) | v ∈ V, Z ∈ Z_v};
• U = U;
• A((v, Z)) = A(v);
• P((v, Z), u, (v′, Z′)) = Pm(v, u, v′) × ∏_{π∈Z′} Po(v′, π) × ∏_{π∉Z′} (1 − Po(v′, π));
• ι(s) = ∏_{π∈Z} Po(v_0, π) × ∏_{π∉Z} (1 − Po(v_0, π)) if s = (v_0, Z) for some Z ∈ Z_{v_0}, and ι(s) = 0 otherwise;
• h((v, Z)) = Z for all (v, Z) ∈ S.

An example of a constructed MDP is shown in Fig. 2. One can easily verify that M is a valid MDP, i.e., for all s ∈ S, ∑_{s′∈S} P(s, u, s′) = 1 if u ∈ A(s), P(s, u, s′) = 0 if u ∉ A(s), and ∑_{s∈S} ι(s) = 1.

Fig. 2. The constructed MDP M using E, U, A, Pm and Po from the example in Fig. 1. For each state s ∈ S, the labels next to the state show the components of s (i.e., s = (v, Z)). The number on the arrow from state (v, Z) to state (v′, Z′) denotes the transition probability P((v, Z), u, (v′, Z′)) for the action u ∈ U. The numbers on arrows pointing into the states (v_0, Z), Z ∈ Z_{v_0}, denote the initial distribution (0.32, 0.08, 0.48 and 0.12 for (v_0, ∅), (v_0, {a}), (v_0, {b}) and (v_0, {a, b}), respectively). The set of atomic propositions assigned to each state of M is the second component of the state.
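For concreteness, here is a sketch of this construction (our own; it assumes the dictionary encodings Pm[v][u][v'] and Po[v][π] used in the earlier sketch, and all helper names are ours). For illustration it takes the enabled actions at v to be those with a motion distribution defined in Pm, which is an assumption rather than part of the paper's model.

```python
from itertools import combinations

def obs_prob(v, Z, Po):
    """Probability of observing exactly the set Z at vertex v."""
    p = 1.0
    for pi, q in Po.get(v, {}).items():
        p *= q if pi in Z else (1.0 - q)
    return p

def observation_sets(v, Po):
    """All Z in 2^{Pi_v} with positive probability at v (the set Z_v)."""
    pi_v = [pi for pi, q in Po.get(v, {}).items() if q > 0]
    subsets = (frozenset(c) for r in range(len(pi_v) + 1)
               for c in combinations(pi_v, r))
    return [Z for Z in subsets if obs_prob(v, Z, Po) > 0]

def build_mdp(V, v0, Pm, Po):
    """Assemble S, A, P, iota, h of the labeled MDP M."""
    S = [(v, Z) for v in V for Z in observation_sets(v, Po)]
    A = {(v, Z): set(Pm.get(v, {}).keys()) for (v, Z) in S}   # A((v,Z)) = A(v)
    P = {}                                   # P[(s, u, s')] = transition probability
    for (v, Z) in S:
        for u, succs in Pm.get(v, {}).items():
            for (v2, Z2) in S:
                p = succs.get(v2, 0.0) * obs_prob(v2, Z2, Po)
                if p > 0:
                    P[((v, Z), u, (v2, Z2))] = p
    iota = {(v0, Z): obs_prob(v0, Z, Po) for Z in observation_sets(v0, Po)}
    h = {s: s[1] for s in S}                 # the label of (v, Z) is Z
    return S, A, P, iota, h
```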
We discuss the growth of the state space from E to M in Section V-C.

We now formulate a problem on the MDP M. We will then show that this new problem is equivalent to Prob. 3.1.

Problem 4.1: For a given labeled MDP M and an LTL formula φ, find a policy such that Pr^M_M(φ) (see Eq. (2)) is maximized.

The following proposition formalizes the equivalence between the two problems, and the one-to-one correspondence between a control strategy on E and a policy on M.

Proposition 4.2 (Equivalence of problems): A control strategy C = {ν_0, ν_1, . . .} is a solution to Problem 3.1 if and only if the policy M = {µ_0, µ_1, . . .}, where µ_i((v_i, Z_i)) = ν_i(v_i, Z_i) for each i, is a solution to Problem 4.1.

Proof: We can establish a one-to-one correspondence between a control strategy C in the environment E and a policy M on M. Given C = {ν_0, ν_1, . . .}, we obtain the corresponding M = {µ_0, µ_1, . . .} by setting µ_i((v_i, Z_i)) = ν_i(v_i, Z_i). Conversely, given M = {µ_0, µ_1, . . .}, we can generate a corresponding control strategy C = {ν_0, ν_1, . . .} such that ν_i(v_i, Z_i) = µ_i((v_i, Z_i)).

We need only to verify that we can use the same probability measure on trajectories in E and on paths in M. Due to the assumption that each observation at v is independent, the observation process is Markovian, as it depends only on the vertex at which the observation is made. Note that the probability of observing Z ∈ Z_v at a vertex v ∈ V is ∏_{π∈Z} Po(v, π) × ∏_{π∉Z} (1 − Po(v, π)). Hence, the probability of moving from a vertex v to a vertex v′ under control u and observing Z′ ∈ Z_{v′} is exactly P((v, Z), u, (v′, Z′)). Therefore, the probability of observing the finite word O(r_f) for a finite trajectory r_f = v_0 . . . v_n under C is the same (by construction of M) as the probability of traversing a finite path in FPaths^M_M whose labels give O(r_f), under the policy M corresponding to C. Since this property holds for any finite trajectory r_f, a trajectory r with word O(r) under C can be uniquely mapped to a path in Paths^M_M, and we can use the probability measure and σ-algebra (see Sec. II-B) on M under a policy M for the corresponding control strategy C. Thus, if M is a solution to Prob. 4.1, then the control strategy C corresponding to M is a solution to Prob. 3.1, and vice versa.

Due to the above proposition, we proceed by constructing a policy M on the MDP M as a solution to Prob. 4.1. We can then uniquely map M to a control strategy C in the robot environment E to obtain a solution to Prob. 3.1.

V. SYNTHESIS OF CONTROL STRATEGY

In this section we provide a solution to Prob. 3.1 by synthesizing an optimal policy for Prob. 4.1. Our approach is adapted from automata-theoretic approaches in the area of probabilistic verification and model checking (see [20] and references therein for an overview).
Probabilistic LTL model checking finds the maximum probability that a path of a given MDP satisfies an LTL specification. We modify this method to obtain an optimal policy that achieves the maximum probability. This approach is related to the work of [26], in which rewards are assigned to specifications and nondeterministic Büchi automata (NBA) are used. We do not use NBAs, since the desired product MDP cannot be directly constructed from an NBA, but only from a DRA.

A. The Product MDP

We start by converting the LTL formula φ to a DRA as defined in Def. 2.1. We denote the resulting DRA as R_φ = (Q, 2^Π, δ, q_0, F) with F = {(L_1, K_1), . . . , (L_k, K_k)}, where L_i, K_i ⊆ Q for all i = 1, . . . , k. The DRA obtained from the LTL formula φ = □◇a ∧ □◇b is shown in Fig. 3.

Fig. 3. The DRA R_φ corresponding to the LTL formula φ = □◇a ∧ □◇b. In this example, there is one pair of accepting states, F = {(L, K)}, where L = ∅ and K = {q_3, q_4}. Thus, accepting runs of this DRA must visit q_3 or q_4 (or both) infinitely often.

We now obtain an MDP as the product of a labeled MDP M and a DRA R_φ. This product MDP allows one to find runs on M that generate words satisfying the acceptance condition of R_φ.

Definition 5.1 (Product MDP): The product MDP M × R_φ between a labeled MDP M = (S, U, A, P, ι, Π, h) and a DRA R_φ = (Q, 2^Π, δ, q_0, F) is an MDP M_P = (S_P, U, A_P, P_P, ι_P), where:
• S_P = S × Q (the Cartesian product of the sets S and Q);
• A_P((s, q)) = A(s);
• P_P((s, q), u, (s′, q′)) = P(s, u, s′) if q′ = δ(q, h(s′)), and P_P((s, q), u, (s′, q′)) = 0 otherwise;
• ι_P((s, q)) = ι(s) if q = δ(q_0, h(s)), and ι_P((s, q)) = 0 otherwise.

We generate the accepting state pairs F_P of the product MDP M_P as follows: for a pair (L_i, K_i) ∈ F, a state (s, q) of M_P is in L_i^P if q ∈ L_i, and (s, q) ∈ K_i^P if q ∈ K_i. As an example, we show in Fig. 4 some of the states and transitions of the product MDP M_P = M × R_φ, where M is shown in Fig. 2 and R_φ is shown in Fig. 3. Note that the set of actions of M_P is the same as that of M.

Fig. 4. The product MDP M_P = M × R_φ, where M is shown in Fig. 2 and R_φ is shown in Fig. 3. Due to space restrictions, only the initial states (states for which the initial distribution ι_P((s, q)) is positive) and part of the state space are shown. F_P = {(L^P, K^P)}, where L^P = ∅ and states in K^P are marked.

A policy M_P = {µ_0^P, µ_1^P, . . .} on M_P directly induces a policy M = {µ_0, µ_1, . . .} on M by keeping track of the state of the product MDP (µ_i^P is an action function that returns an action corresponding to a state of M_P). Note that, given the state of M at time-step i and the state of M_P at time-step i − 1, the state of M_P at time-step i can be exactly determined. We can induce a policy M for M from a policy M_P for M_P as follows:

Definition 5.2 (Inducing a policy for M from M_P): If the state of M_P at time-step i is (s_i, q_i), then the policy M = {µ_0, µ_1, . . .} induced from M_P = {µ_0^P, µ_1^P, . . .} is obtained by setting µ_i(s_i) = µ_i^P((s_i, q_i)) for all i.
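A minimal sketch of the product construction of Def. 5.1 is given below (our own; it reuses the S, A, P, iota, h containers from the MDP sketch and the DRA container from the earlier sketch, and also lifts the Rabin pairs to the product as described above).

```python
def build_product(S, A, P, iota, h, dra):
    """Product MDP M x R_phi (Def. 5.1), plus the lifted Rabin pairs F_P."""
    SP = [(s, q) for s in S for q in dra.states]
    AP = {(s, q): A[s] for (s, q) in SP}        # actions are inherited from M
    PP, iotaP = {}, {}
    for (s, u, s2), p in P.items():             # P[(s, u, s')] = probability
        for q in dra.states:
            q2 = dra.delta[(q, h[s2])]          # the DRA reads the label of s'
            PP[((s, q), u, (s2, q2))] = p
    for s, p in iota.items():
        if p > 0:
            iotaP[(s, dra.delta[(dra.q0, h[s])])] = p
    pairsP = [({(s, q) for s in S for q in L},
               {(s, q) for s in S for q in K}) for L, K in dra.pairs]
    return SP, AP, PP, iotaP, pairsP
```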
We denote by r_P = (s_0, q_0)(s_1, q_1) . . . a path on M_P under a policy M_P. We say that r_P is accepting if and only if it satisfies the Rabin acceptance condition with F_P as the accepting state pairs, i.e., there exists a pair (L^P, K^P) ∈ F_P such that r_P intersects L^P finitely many times and K^P infinitely many times. The product MDP is constructed so that, given a path r_P = (s_0, q_0)(s_1, q_1) . . ., the path s_0 s_1 . . . on M generates a word that satisfies φ if and only if the infinite sequence q_0 q_1 . . . is an accepting run of R_φ, in which case r_P is accepting. Therefore, each accepting path of M_P uniquely corresponds to a path of M whose word satisfies φ.

B. Generating the Optimal Control Strategy

Once we obtain the product MDP M_P = M × R_φ and the accepting state pairs F_P = {(L_1^P, K_1^P), . . . , (L_k^P, K_k^P)}, the method to obtain a solution to Prob. 3.1 proceeds as follows. For each pair (L_i^P, K_i^P) ∈ F_P, we obtain a set of accepting maximal end components. An accepting maximal end component of M_P consists of a set of states S′_P ⊆ S_P and a function A′_P such that ∅ ≠ A′_P((s, q)) ⊆ A_P((s, q)) for all (s, q) ∈ S′_P. It has the property that, by taking only actions enabled by A′_P, every state in S′_P can reach every other state in S′_P and cannot reach any state outside of S′_P. Furthermore, it contains no state in L_i^P and at least one state in K_i^P. It is called maximal because it is not properly contained in any other accepting end component. Note that, for a given pair (L_i^P, K_i^P), the accepting maximal end components are pairwise disjoint.

A procedure to obtain all accepting maximal end components of an MDP is outlined in [19]. From probabilistic model checking (see [19], [20]), the maximum probability of satisfying the LTL formula φ for M equals the maximum probability of reaching any accepting maximal end component of M_P. Once an accepting maximal end component (S′_P, A′_P) is reached, all states in S′_P are visited infinitely often (and φ is satisfied) with probability 1 under any policy in which all actions in A′_P are used infinitely often.

The maximum probability of reaching a set of states B_P ⊆ S_P can be obtained as the solution of a linear program. First we find the set of states that cannot reach B_P under any policy and denote it by C_P (a simple graph analysis is sufficient to find this set). We then let x_p denote the maximum probability of reaching the set B_P from a state p ∈ S_P. We have x_p = 1 if p ∈ B_P, and x_p = 0 if p ∈ C_P. The values of x_p for the remaining states can then be determined by solving the linear program:

minimize ∑_{p∈S_P} x_p
subject to 0 ≤ x_p ≤ 1 and x_p ≥ ∑_{t∈S_P} P_P(p, u, t) x_t
for all p ∈ S_P \ (B_P ∪ C_P) and all u ∈ A_P(p).   (4)

Once x_p is obtained for all p ∈ S_P, one can identify an action u* (not necessarily unique) for each state p ∈ S_P \ (B_P ∪ C_P) such that

x_p = ∑_{t∈S_P} P_P(p, u*, t) x_t.   (5)

We define a function µ_P^* : S_P → U that returns an action u* satisfying (5) if p ∈ S_P \ (B_P ∪ C_P) (the actions for states in B_P or C_P are irrelevant and can be chosen arbitrarily). Then the optimal policy maximizing the probability of reaching B_P is the stationary policy {µ_P^*, µ_P^*, . . .}.
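A sketch of the linear program (4) and the action extraction (5) is given below (our own; it assumes the product MDP is encoded as in the earlier sketches, with SP a list of states, PP a dict keyed by (p, u, t) and AP mapping each state to its enabled actions, and that B and C are the goal set and the set of states that cannot reach it). The LP is solved with scipy; this is an illustration under those assumptions, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.optimize import linprog

def max_reach(SP, PP, AP, B, C):
    """Solve LP (4): x_p = max probability of reaching B from p, with C the
    states that cannot reach B under any policy. Also extract an optimal
    action per remaining state as in (5)."""
    unknown = [p for p in SP if p not in B and p not in C]
    idx = {p: i for i, p in enumerate(unknown)}
    x = {p: 1.0 for p in B}
    x.update({p: 0.0 for p in C})
    A_ub, b_ub = [], []
    for p in unknown:
        for u in AP[p]:
            # x_p >= sum_t P(p,u,t) x_t, rewritten as -x_p + sum_t' P x_t' <= -const
            row = np.zeros(len(unknown)); row[idx[p]] = -1.0
            const = 0.0
            for t in SP:
                q = PP.get((p, u, t), 0.0)
                if t in idx:
                    row[idx[t]] += q
                else:
                    const += q * x.get(t, 0.0)   # t is in B (x=1) or C (x=0)
            A_ub.append(row); b_ub.append(-const)
    res = linprog(c=np.ones(len(unknown)), A_ub=np.array(A_ub),
                  b_ub=np.array(b_ub), bounds=(0, 1), method="highs")
    x.update({p: res.x[i] for p, i in idx.items()})
    # (5): pick, for each remaining state, an action attaining the maximum.
    mu_star = {p: max(AP[p], key=lambda u: sum(PP.get((p, u, t), 0.0) * x[t]
                                               for t in SP)) for p in unknown}
    return x, mu_star
```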
Our desired policy M_P^*, which maximizes the probability of satisfying φ on the product MDP, is the policy maximizing the probability of reaching the union of the accepting maximal end components over all accepting state pairs in F_P, whenever the state is not in an accepting maximal end component. Otherwise, the optimal policy is to use all actions allowed in the associated accepting maximal end component infinitely often, in a round-robin fashion. The solution to Prob. 4.1 is then the policy M* on M induced by M_P^*. The desired reactive control strategy C* solving Prob. 3.1 can finally be obtained as the control strategy corresponding to M* (see Prop. 4.2). Our overall approach is summarized in Alg. 1.

Algorithm 1 Generating the optimal control strategy C* given E, U, A, Pm, Po and φ
1: Generate the MDP M from the environment model E, the motion primitives U, the enabling function A, the motion model Pm and the observation model Po.
2: Translate the LTL formula φ to a deterministic Rabin automaton R_φ.
3: Generate the product MDP M_P = M × R_φ and the accepting state pairs F_P = {(L_1^P, K_1^P), . . . , (L_k^P, K_k^P)}.
4: Find all accepting maximal end components for all pairs (L_i^P, K_i^P) ∈ F_P, and find their union B_P.
5: Find the stationary policy {µ_P^*, µ_P^*, . . .} maximizing the probability of reaching B_P by solving (4) and (5).
6: Generate the policy M_P^* = {µ_0^P, µ_1^P, . . .} as follows: µ_i^P(p) = µ_P^*(p) if p ∈ S_P \ B_P. Otherwise, p is in at least one accepting maximal end component; assuming it is (S′_P, A′_P) and A′_P(p) = {u_1, u_2, . . . , u_m}, set µ_i^P(p) = u_j where j = i mod m.
7: Generate the policy M* = {µ_0, µ_1, . . .} induced by M_P^*.
8: Generate the control strategy C* = {ν_0, ν_1, . . .} corresponding to M* by setting ν_i(v, Z) = µ_i((v, Z)) for all i.

C. Complexity

The complexity of our proposed algorithm is dictated by the size of the generated MDPs. We use |·| to denote the cardinality of a set. The number of states in M is |S| = ∑_{v∈V} |Z_v|. Hence, in the worst case, where all propositions π ∈ Π can be observed with positive probability less than 1 at all vertices v ∈ V, |S| = |V| · 2^|Π|. In practice, the number of propositions that can be non-deterministically observed at a vertex is small. For example, in an urban setting, most regions of the environment, including intersections and roads, have fixed atomic propositions; the number of intersections that can be blocked is small compared to the size of the environment.

The size |Q| of the DRA is, in the worst case, doubly exponential with respect to |Π|. However, empirical studies such as [22] have shown that in practice, the sizes of the DRAs for many LTL formulas are exponential or lower with respect to |Π|. In robot control applications, since properties in the environment are typically assigned sparsely (meaning that each region of the environment is usually assigned a small number of properties compared to |Π|), the size of the DRA can be reduced much further by removing transitions of the DRA with inputs that can never appear in the environment, and then minimizing the DRA by removing states that cannot be reached from the initial state.

The size of the product MDP M_P is |S| × |Q|. The complexity of the algorithm generating the accepting maximal end components is at most quadratic in the size of M_P (see [19]), and the complexity of finding the optimal policy from the linear program is polynomial in the size of M_P. Thus, overall, our algorithm is polynomial in the size of M_P.
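Before turning to the example, here is a brief sketch of step 6 of Alg. 1 (our own names; amecs is assumed to be the list of accepting maximal end components (S′_P, A′_P) and mu_star the stationary policy obtained from (4)–(5) for their union).

```python
def product_policy(mu_star, amecs):
    """Step 6 of Alg. 1: the (time-dependent) product policy mu^P_i.

    Outside the accepting maximal end components, play the stationary
    reachability-optimal action mu_star[p]; inside one, cycle round-robin
    through the actions allowed by the end component."""
    allowed_at = {}
    for states, allowed in amecs:                 # amecs: list of (S'_P, A'_P)
        for p in states:
            allowed_at[p] = sorted(allowed[p])    # fix an ordering u_1, ..., u_m
    def mu_P(i, p):
        if p in allowed_at:
            actions = allowed_at[p]
            return actions[i % len(actions)]      # u_j with j = i mod m
        return mu_star[p]
    return mu_P
```

By Def. 5.2, the policy on M (and hence the control strategy C*) is then obtained from mu_P by tracking the DRA component q alongside the MDP state.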
VI. EXAMPLE

The computational framework developed in this paper is implemented in MATLAB, and here we provide an example as a case study. Consider a robot navigating in an indoor environment as shown in Fig. 5. Each region of the environment is represented by a vertex v_i, and the arrows represent allowable transitions between regions. In this case study, we choose the motion primitives arbitrarily (see the caption of Fig. 5). In practice, they can either correspond to low level control actions such as “turn left”, “turn right” and “go straight”, or to high level commands such as “go from region 1 to region 2”, which can then be achieved by a sequence of low level control actions.

Fig. 5. Environment for a numerical example of the proposed approach. The regions are labeled v_1, . . . , v_13. We assume that the set of motion primitives is U = {α, β, γ}. The number of actions available at each vertex depends on the number of arrows from that vertex to adjacent vertices. We define the enabling function A so that the motion primitive α is enabled at all vertices, β is enabled at vertices v_1, v_6 and v_7, and γ is enabled at vertices v_2, v_3, v_6, v_7 and v_8. The observation probabilities Po(v_13, pickup) = 1, Po(v_13, observe9) = 0.4, Po(v_7, event7) = 1 and Po(v_9, event9) = 0.8 are indicated in the figure.

The goal of the robot is to perform a persistent surveillance mission on regions v_7 and v_9, described as follows. The robot can pick up (or receive) a surveillance task at region v_13. With probability 0.4 the robot receives the task denoted observe9; otherwise, the task is observe7. The task observe7 (or observe9) is completed by traveling to region v_7 (or v_9) and observing some specified event. In region v_7, the robot observes the event (event7) with probability 1. In region v_9, each time the robot enters the region, there is a probability of 0.8 that it observes the event (event9). Thus, the robot may have to visit v_9 multiple times before observing event9. Once the robot observes the required event, it must return to v_13 and pick up a new task. This surveillance mission can be represented by four atomic propositions, {pickup, observe9, event7, event9} (the task observe7 can be written as ¬observe9). The propositions pickup and observe9 are assigned to v_13, with Po(v_13, pickup) = 1 and Po(v_13, observe9) = 0.4. The proposition event7 is assigned to v_7 with Po(v_7, event7) = 1, and event9 is assigned to v_9 with Po(v_9, event9) = 0.8. The surveillance mission can be written as the following LTL formula:

φ = □◇pickup ∧ □(pickup ∧ ¬observe9 ⇒ ○(¬pickup U event7)) ∧ □(pickup ∧ observe9 ⇒ ○(¬pickup U event9)).

The first conjunct of φ, □◇pickup, enforces that the robot must repeatedly pick up tasks. The second conjunct pertains to the task observe7 and the third to the task observe9. These two conjuncts ensure that a new task cannot be picked up until the current task is completed (i.e., the desired event is observed). Note that if event9 is observed after observing event7, then the formula φ is not violated (and similarly if event7 is observed after observing event9).

The MDP M generated from the environment is shown in Fig. 6. For this example, we have arbitrarily chosen values for the probability transition function Pm. In practice, probabilities of transition under actuation and measurement errors can be obtained from experiments or accurate simulations (see [15]). The number of states in the MDP M is |S| = 15.

Fig. 6. MDP M generated from the environment with the given U, A, Po and Pm. The initial state s_0 is marked by an incoming arrow (ι(s_0) = 1).

We generated the deterministic Rabin automaton R_φ using the ltl2dstar tool (see [23]). The number of states |Q| is 52. Thus, the product MDP M_P has 780 states.
For the DRA generated, there is only one set in F, i.e., F = {(L, K)}, with 1 state in L and 18 states in K. Thus, the number of states in L^P is 15 and the number of states in K^P is 270. There is one accepting maximal end component in M_P, and it contains 17 states. Using the implementation of Alg. 1, we computed the maximum probability of satisfying the specification from the initial state and the optimal control strategy. The algorithm ran in approximately 7 seconds on a MacBook Pro computer with a 2.5 GHz dual core processor. For this example the maximum probability is 1, implying that the corresponding optimal control strategy almost surely satisfies φ. To illustrate the control strategy, a sample execution is shown in Fig. 7.

Fig. 7. A sample path of the robot under the optimal control strategy. The word observed along the sample path is pickup, event7, event9, {pickup, observe9}, event9, . . .

VII. CONCLUSIONS AND FINAL REMARKS

We presented a method to generate a robot control strategy that maximizes the probability of accomplishing a task. The robot motion in the environment was modeled as a graph and the task was given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the vertices with some probability. We allowed for noisy sensors and actuators by assuming that a control action enables several transitions with known probabilities. We reduced this problem to one of generating a control policy for a Markov Decision Process such that the probability of satisfying an LTL formula over its states is maximized. We then provided a complete solution to this problem by adapting existing probabilistic model checking tools.

We are currently pursuing several future directions. We are looking at proposition observation models that are not independently distributed. These models arise when the current truth value of a proposition gives information about its future truth value. We are also looking at methods for optimizing the robot control strategy for a suitable cost function when costs are assigned to the actions of an MDP. The second direction will build on our recent results on optimal motion planning with LTL constraints [27].

REFERENCES

[1] H. Kress-Gazit, G. Fainekos, and G. J. Pappas, “Where’s Waldo? Sensor-based temporal logic motion planning,” in IEEE Int. Conf. on Robotics and Automation, Rome, Italy, 2007, pp. 3116–3121.
[2] S. Karaman and E. Frazzoli, “Sampling-based motion planning with deterministic µ-calculus specifications,” in IEEE Conf. on Decision and Control, Shanghai, China, 2009, pp. 2222–2229.
[3] M. Kloetzer and C. Belta, “A fully automated framework for control of linear systems from temporal logic specifications,” IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. 287–297, 2008.
[4] S. G. Loizou and K. J. Kyriakopoulos, “Automatic synthesis of multiagent motion tasks based on LTL specifications,” in IEEE Conf. on Decision and Control, Paradise Island, Bahamas, 2004, pp. 153–158.
[5] M. M. Quottrup, T. Bak, and R. Izadi-Zamanabadi, “Multi-robot motion planning: A timed automata approach,” in IEEE Int. Conf. on Robotics and Automation, New Orleans, LA, Apr. 2004, pp. 4417–4422.
[6] T. Wongpiromsarn, U. Topcu, and R. M. Murray, “Receding horizon temporal logic planning for dynamical systems,” in IEEE Conf. on Decision and Control, Shanghai, China, 2009, pp. 5997–6004.
[7] E. M. Clarke, D. Peled, and O. Grumberg, Model Checking. MIT Press, 1999.
[8] E. A. Emerson, “Temporal and modal logic,” in Handbook of Theoretical Computer Science: Formal Models and Semantics, J. van Leeuwen, Ed. Elsevier, 1990, vol. B, pp. 995–1072.
[9] N. Piterman, A. Pnueli, and Y. Saar, “Synthesis of reactive(1) designs,” in International Conference on Verification, Model Checking, and Abstract Interpretation, Charleston, SC, 2006, pp. 364–380.
[10] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas, “Discrete abstractions of hybrid systems,” Proceedings of the IEEE, vol. 88, pp. 971–984, 2000.
[11] G. J. Pappas, “Bisimilar linear systems,” Automatica, vol. 39, no. 12, pp. 2035–2047, 2003.
[12] W. Thomas, “Infinite games and verification,” in Computer Aided Verification, ser. Lecture Notes in Computer Science, E. Brinksma and K. Larsen, Eds. Springer, 2002, vol. 2404, pp. 58–65.
[13] A. Bianco and L. de Alfaro, “Model checking of probabilistic and nondeterministic systems,” in Foundations of Software Technology and Theoretical Computer Science, ser. Lecture Notes in Computer Science. Springer, 1995, vol. 1026, pp. 499–513.
[14] M. Kwiatkowska, G. Norman, and D. Parker, “Probabilistic symbolic model checking with PRISM: A hybrid approach,” International Journal on Software Tools for Technology Transfer, vol. 6, no. 2, pp. 128–142, 2004.
[15] M. Lahijanian, J. Wasniewski, S. B. Andersson, and C. Belta, “Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees,” in IEEE Int. Conf. on Robotics and Automation, Anchorage, AK, 2010, pp. 3227–3232.
[16] C. Baier, M. Größer, M. Leucker, B. Bollig, and F. Ciesinski, “Controller synthesis for probabilistic systems,” in Proceedings of IFIP TCS, 2004.
[17] J. Pineau and S. Thrun, “High-level robot behavior control using POMDPs,” in AAAI Workshop Notes, Menlo Park, CA, 2002.
[18] N. L. Zhang and W. Zhang, “Speeding up the convergence of value iteration in partially observable Markov decision processes,” Journal of Artificial Intelligence Research, vol. 14, pp. 29–51, 2001.
[19] C. Baier and J.-P. Katoen, Principles of Model Checking. MIT Press, 2008.
[20] M. Vardi, “Probabilistic linear-time model checking: An overview of the automata-theoretic approach,” in Formal Methods for Real-Time and Probabilistic Systems, 1999, pp. 265–276.
[21] E. Grädel, W. Thomas, and T. Wilke, Automata, Logics, and Infinite Games: A Guide to Current Research, ser. Lecture Notes in Computer Science. Springer, 2002, vol. 2500.
[22] J. Klein and C. Baier, “Experiments with deterministic ω-automata for formulas of linear temporal logic,” Theoretical Computer Science, vol. 363, no. 2, pp. 182–195, 2006.
[23] J. Klein, “ltl2dstar - LTL to deterministic Streett and Rabin automata,” http://www.ltl2dstar.de/, 2007.
[24] L. Habets and J. van Schuppen, “A control problem for affine dynamical systems on a full-dimensional polytope,” Automatica, vol. 40, no. 1, pp. 21–35, 2004.
[25] C. Belta, V. Isler, and G. J. Pappas, “Discrete abstractions for robot planning and control in polygonal environments,” IEEE Transactions on Robotics, vol. 21, no. 5, pp. 864–874, 2005.
[26] C. Courcoubetis and M. Yannakakis, “Markov decision processes and regular events,” IEEE Transactions on Automatic Control, vol. 43, no. 10, pp. 1399–1418, 1998.
[27] S. L. Smith, J. Tůmová, C. Belta, and D. Rus, “Optimal path planning under temporal constraints,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Taipei, Taiwan, Oct. 2010, to appear.