Incremental Temporal Logic Synthesis of Control Policies for Robots Interacting with Dynamic Agents

Tichakorn Wongpiromsarn, Alphan Ulusoy, Calin Belta, Emilio Frazzoli and Daniela Rus

T. Wongpiromsarn is with the Singapore-MIT Alliance for Research and Technology, Singapore 117543, Singapore. nok@smart.mit.edu
A. Ulusoy and C. Belta are with Boston University, Boston, MA, USA. alphan@bu.edu, cbelta@bu.edu
E. Frazzoli and D. Rus are with the Massachusetts Institute of Technology, Cambridge, MA, USA. frazzoli@mit.edu, rus@csail.mit.edu

Abstract— We consider the synthesis of control policies from temporal logic specifications for robots that interact with multiple dynamic environment agents. Each environment agent is modeled by a Markov chain whereas the robot is modeled by a finite transition system (in the deterministic case) or a Markov decision process (in the stochastic case). Existing results in probabilistic verification are adapted to solve the synthesis problem. To partially address the state explosion issue, we propose an incremental approach where only a small subset of environment agents is incorporated in the synthesis procedure initially and more agents are successively added until we hit the constraints on computational resources. Our algorithm runs in an anytime fashion where the probability that the robot satisfies its specification increases as the algorithm progresses.

I. INTRODUCTION

Temporal logics [1], [2], [3] have recently been employed to precisely express complex behaviors of robots. In particular, given a robot specification expressed as a formula in a temporal logic, control policies that ensure or maximize the probability that the robot satisfies the specification can be automatically synthesized based on exhaustive exploration of the state space [4], [5], [6], [7], [8], [9], [10], [11], [12]. Consequently, the main limitation of existing approaches for synthesizing control policies from temporal logic specifications is almost invariably due to a combinatorial blow-up of the state space, commonly known as the state explosion problem.

In many applications, robots need to interact with external, potentially dynamic agents, including humans and other robots. As a result, the control policy synthesis problem becomes more computationally complex as more external agents are incorporated in the synthesis procedure. Consider, as an example, the problem where an autonomous vehicle needs to go through a pedestrian crossing while there are multiple pedestrians who are already at or approaching the crossing. The state space of the complete system (i.e., the vehicle and all the pedestrians) grows exponentially with the number of pedestrians. Hence, given a limited budget of computational resources, solving the control policy synthesis problem with respect to temporal logic specifications may not be feasible when the number of pedestrians is large.

In this paper, we partially address the aforementioned issue and propose an algorithm for computing a robot control policy in an anytime manner. Our algorithm progressively computes a sequence of control policies, taking into account only a small subset of the environment agents initially, and successively adds more agents to the synthesis procedure in each iteration until the computational resource constraints are exceeded.
As opposed to existing incremental synthesis approaches that handle temporal logic specifications by incrementally adding representative robot states to the synthesis procedure [8], we consider incrementally adding representative environment agents instead.

The main contribution of this paper is twofold. First, we propose an anytime algorithm for synthesizing a control policy for a robot interacting with multiple environment agents, with the objective of maximizing the probability that the robot satisfies a given temporal logic specification. Second, we propose an incremental construction of the various objects that need to be computed during the synthesis procedure. This incremental construction makes our anytime algorithm more efficient by avoiding unnecessary computation and exploiting the objects computed in the previous iteration. Experimental results show that not only do we obtain a reasonable solution much faster, but we are also able to obtain an optimal solution faster than existing approaches.

The rest of the paper is organized as follows: We provide useful definitions and descriptions of the formalisms in the following section. Section III is dedicated to the problem formulation. Section IV provides a complete solution to the control policy synthesis problem for robots that interact with environment agents. Incremental computation of control policies is discussed in Section V. Section VI presents experimental results. Finally, Section VII concludes the paper and discusses future work.

II. PRELIMINARIES

We consider systems that comprise multiple (possibly stochastic) components. In this section, we define the formalisms used in this paper to describe such systems and their desired properties. Throughout the paper, we let $X^*$, $X^\omega$ and $X^+$ denote the set of finite, infinite and nonempty finite strings, respectively, of a set $X$.

A. Automata

Definition 1: A deterministic finite automaton (DFA) is a tuple $\mathcal{A} = (Q, \Sigma, \delta, q_{init}, F)$ where
• $Q$ is a finite set of states,
• $\Sigma$ is a finite set called the alphabet,
• $\delta : Q \times \Sigma \to Q$ is a transition function,
• $q_{init} \in Q$ is the initial state, and
• $F \subseteq Q$ is a set of final states.

We use the relation notation $q \xrightarrow{w} q'$ to denote $\delta(q, w) = q'$. Consider a finite string $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n \in \Sigma^*$. A run for $\sigma$ in a DFA $\mathcal{A} = (Q, \Sigma, \delta, q_{init}, F)$ is a finite sequence of states $q_0 q_1 \ldots q_n$ such that $q_0 = q_{init}$ and $q_0 \xrightarrow{\sigma_1} q_1 \xrightarrow{\sigma_2} q_2 \xrightarrow{\sigma_3} \ldots \xrightarrow{\sigma_n} q_n$. A run is accepting if $q_n \in F$. A string $\sigma \in \Sigma^*$ is accepted by $\mathcal{A}$ if there is an accepting run of $\sigma$ in $\mathcal{A}$. The language accepted by $\mathcal{A}$, denoted by $\mathcal{L}(\mathcal{A})$, is the set of all accepted strings of $\mathcal{A}$.
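To make Definition 1 concrete, the following is a minimal sketch (not from the paper) of a DFA and its acceptance check; the two-state automaton at the bottom is a hypothetical example.

```python
# A minimal DFA sketch (illustrative only; not from the paper).
# A DFA is a tuple (Q, Sigma, delta, q_init, F); here delta is a dict
# mapping (state, symbol) pairs to successor states.

class DFA:
    def __init__(self, states, alphabet, delta, q_init, final):
        self.states = states          # Q
        self.alphabet = alphabet      # Sigma
        self.delta = delta            # delta: Q x Sigma -> Q
        self.q_init = q_init          # q_init
        self.final = final            # F

    def accepts(self, word):
        """Return True iff the run for `word` ends in a final state."""
        q = self.q_init
        for symbol in word:
            q = self.delta[(q, symbol)]
        return q in self.final

# Hypothetical example: accept every string over {a, b} containing 'b'.
dfa = DFA(
    states={"q0", "q1"},
    alphabet={"a", "b"},
    delta={("q0", "a"): "q0", ("q0", "b"): "q1",
           ("q1", "a"): "q1", ("q1", "b"): "q1"},
    q_init="q0",
    final={"q1"},
)
assert dfa.accepts("aab") and not dfa.accepts("aaa")
```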
B. Linear Temporal Logic

Linear temporal logic (LTL) is a branch of logic that can be used to reason about a time line. An LTL formula is built up from a set $\Pi$ of atomic propositions, the logic connectives $\neg$, $\vee$, $\wedge$ and $\implies$, and the temporal modal operators $\bigcirc$ ("next"), $\Box$ ("always"), $\Diamond$ ("eventually") and $\mathcal{U}$ ("until"). An LTL formula over a set $\Pi$ of atomic propositions is inductively defined as
$$\varphi := \mathrm{True} \mid p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \bigcirc\varphi \mid \varphi\,\mathcal{U}\,\varphi$$
where $p \in \Pi$. Other operators can be defined as follows: $\varphi \vee \psi = \neg(\neg\varphi \wedge \neg\psi)$, $\varphi \implies \psi = \neg\varphi \vee \psi$, $\Diamond\varphi = \mathrm{True}\,\mathcal{U}\,\varphi$, and $\Box\varphi = \neg\Diamond\neg\varphi$.

Semantics of LTL: LTL formulas are interpreted on infinite strings over $2^\Pi$. Let $\sigma = \sigma_0 \sigma_1 \sigma_2 \ldots$ where $\sigma_i \in 2^\Pi$ for all $i \ge 0$. The satisfaction relation $\models$ is defined inductively on LTL formulas as follows:
• $\sigma \models \mathrm{True}$,
• for an atomic proposition $p \in \Pi$, $\sigma \models p$ if and only if $p \in \sigma_0$,
• $\sigma \models \neg\varphi$ if and only if $\sigma \not\models \varphi$,
• $\sigma \models \varphi_1 \wedge \varphi_2$ if and only if $\sigma \models \varphi_1$ and $\sigma \models \varphi_2$,
• $\sigma \models \bigcirc\varphi$ if and only if $\sigma_1 \sigma_2 \ldots \models \varphi$, and
• $\sigma \models \varphi_1\,\mathcal{U}\,\varphi_2$ if and only if there exists $j \ge 0$ such that $\sigma_j \sigma_{j+1} \ldots \models \varphi_2$ and, for all $i$ such that $0 \le i < j$, $\sigma_i \sigma_{i+1} \ldots \models \varphi_1$.

More details on LTL can be found, e.g., in [1], [2], [3].

In this paper, we are particularly interested in a class of LTL formulas known as co-safety formulas. An important property of a co-safety formula is that any word satisfying the formula has a finite good prefix, i.e., a finite prefix that cannot be extended to violate the formula. Specifically, given an alphabet $\Sigma$, a language $L \subseteq \Sigma^\omega$ is co-safety if and only if every $w \in L$ has a good prefix $x \in \Sigma^*$ such that for all $y \in \Sigma^\omega$, we have $x \cdot y \in L$. In general, the problem of determining whether an LTL formula is co-safety is PSPACE-complete [13]. However, there is a class of co-safety formulas, known as syntactically co-safe LTL formulas, which can be easily characterized. A syntactically co-safe LTL formula over $\Pi$ is an LTL formula over $\Pi$ whose only temporal operators are $\bigcirc$, $\Diamond$ and $\mathcal{U}$ when written in positive normal form, where the negation operator $\neg$ occurs only in front of atomic propositions [3], [13]. It can be shown that for any syntactically co-safe formula $\varphi$, there exists a DFA $\mathcal{A}_\varphi$ that accepts all and only the words in $pref(\varphi)$, i.e., $\mathcal{L}(\mathcal{A}_\varphi) = pref(\varphi)$, where $pref(\varphi)$ denotes the set of all good prefixes for $\varphi$ [9].

C. Systems and Control Policies

We consider the case where each component of the system can be modeled by a deterministic finite transition system, a Markov chain or a Markov decision process, depending on the characteristics of that component. These models are defined as follows.

Definition 2: A deterministic finite transition system (DFTS) is a tuple $\mathcal{T} = (S, Act, \longrightarrow, s_{init}, \Pi, L)$ where
• $S$ is a finite set of states,
• $Act$ is a finite set of actions,
• $\longrightarrow \subseteq S \times Act \times S$ is a transition relation such that for all $s \in S$ and $\alpha \in Act$, $|Post(s, \alpha)| \le 1$ where $Post(s, \alpha) = \{s' \in S \mid (s, \alpha, s') \in \longrightarrow\}$,
• $s_{init} \in S$ is the initial state,
• $\Pi$ is a set of atomic propositions, and
• $L : S \to 2^\Pi$ is a labeling function.
$(s, \alpha, s') \in \longrightarrow$ is denoted by $s \xrightarrow{\alpha} s'$. An action $\alpha$ is enabled in state $s$ if and only if there exists $s'$ such that $s \xrightarrow{\alpha} s'$.

Definition 3: A (discrete-time) Markov chain (MC) is a tuple $\mathcal{M} = (S, P, \imath_{init}, \Pi, L)$ where $S$, $\Pi$ and $L$ are defined as in a DFTS and
• $P : S \times S \to [0, 1]$ is the transition probability function such that for any state $s \in S$, $\sum_{s' \in S} P(s, s') = 1$, and
• $\imath_{init} : S \to [0, 1]$ is the initial state distribution satisfying $\sum_{s \in S} \imath_{init}(s) = 1$.

Definition 4: A Markov decision process (MDP) is a tuple $\mathcal{M} = (S, Act, P, \imath_{init}, \Pi, L)$ where $S$, $Act$, $\imath_{init}$, $\Pi$ and $L$ are defined as in a DFTS and an MC, and $P : S \times Act \times S \to [0, 1]$ is the transition probability function such that for any state $s \in S$ and action $\alpha \in Act$, $\sum_{s' \in S} P(s, \alpha, s') \in \{0, 1\}$. An action $\alpha$ is enabled in state $s$ if and only if $\sum_{s' \in S} P(s, \alpha, s') = 1$. Let $Act(s)$ denote the set of enabled actions in $s$.

Given a complete system as the composition of all its components, we are interested in computing a control policy for the system that optimizes certain objectives. We define a control policy for a system modeled by an MDP as follows.

Definition 5: Let $\mathcal{M} = (S, Act, P, \imath_{init}, \Pi, L)$ be a Markov decision process. A control policy for $\mathcal{M}$ is a function $C : S^+ \to Act$ such that $C(s_0 s_1 \ldots s_n) \in Act(s_n)$ for all $s_0 s_1 \ldots s_n \in S^+$.
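The sketch below shows one possible lightweight encoding of the models in Definitions 3 and 4; the sparse nested-dict layout is an implementation choice for illustration, not part of the definitions, and is reused by the later sketches.

```python
# Sparse containers for an MC and an MDP (illustrative only).
from dataclasses import dataclass

@dataclass
class MarkovChain:
    P: dict        # P[s][s2] = probability; each row sums to 1
    i_init: dict   # initial state distribution
    labels: dict   # L: state -> set of atomic propositions

@dataclass
class MDP:
    P: dict        # P[s][a][s2] = probability
    i_init: dict   # initial state distribution
    labels: dict   # L: state -> set of atomic propositions

    def enabled(self, s):
        """Act(s): actions whose outgoing probabilities sum to 1."""
        return [a for a, row in self.P.get(s, {}).items()
                if abs(sum(row.values()) - 1.0) < 1e-9]
```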
Let $\mathcal{M} = (S, Act, P, \imath_{init}, \Pi, L)$ be an MDP and $C : S^+ \to Act$ be a control policy for $\mathcal{M}$. Given an initial state $s_0$ of $\mathcal{M}$ such that $\imath_{init}(s_0) > 0$, an infinite sequence $r^C_\mathcal{M} = s_0 s_1 \ldots$ on $\mathcal{M}$ generated under policy $C$ is called a path on $\mathcal{M}$ if $P(s_i, C(s_0 s_1 \ldots s_i), s_{i+1}) > 0$ for all $i$. The subsequence $s_0 s_1 \ldots s_n$ where $n \ge 0$ is the prefix of length $n$ of $r^C_\mathcal{M}$. We define $Paths^C_\mathcal{M}$ and $FPaths^C_\mathcal{M}$ as the set of all infinite paths of $\mathcal{M}$ under policy $C$ and their finite prefixes, respectively, starting from any state $s_0$ with $\imath_{init}(s_0) > 0$. For $s_0 s_1 \ldots s_n \in FPaths^C_\mathcal{M}$, we let $Paths^C_\mathcal{M}(s_0 s_1 \ldots s_n)$ denote the set of all paths in $Paths^C_\mathcal{M}$ with the prefix $s_0 s_1 \ldots s_n$. The $\sigma$-algebra associated with $\mathcal{M}$ under policy $C$ is defined as the smallest $\sigma$-algebra that contains $Paths^C_\mathcal{M}(\hat{r}^C_\mathcal{M})$ where $\hat{r}^C_\mathcal{M}$ ranges over all finite paths in $FPaths^C_\mathcal{M}$. It follows that there exists a unique probability measure $\mathrm{Pr}^C_\mathcal{M}$ on the $\sigma$-algebra associated with $\mathcal{M}$ under policy $C$ where, for any $s_0 s_1 \ldots s_n \in FPaths^C_\mathcal{M}$,
$$\mathrm{Pr}^C_\mathcal{M}\{Paths^C_\mathcal{M}(s_0 s_1 \ldots s_n)\} = \imath_{init}(s_0) \prod_{0 \le i < n} P(s_i, C(s_0 s_1 \ldots s_i), s_{i+1}).$$

From the probability vector $(x_s)_{s \in S_p}$, a memoryless control policy $C$ for the product MDP is generated as follows. For a state $s \in S_p$ with $x_s > 0$, let $\|s\|$ be the length of a shortest path from $s$ to a state in $B_p$, using only actions in $Act^{max}_p$. For a state $s \in S_p \setminus B_p$ with $x_s > 0$, $C(s) \in Act^{max}_p(s)$ is then chosen such that $P_p(s, C(s), t) > 0$ for some $t \in S_p$ with $\|t\| = \|s\| - 1$. For a state $s \in S_p$ with $x_s = 0$ or a state $s \in B_p$, $C(s) \in Act_p(s)$ can be chosen arbitrarily.²

² In the original algorithm, all the states $s \in S_p$ with $x_s = 1$ and all the states that cannot reach $B_p$ under any control policy need to be identified, but it has been shown in [16] that this step is not necessary for the correctness of the algorithm.
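As an illustration of this policy-extraction step, the sketch below performs a backward breadth-first search from $B_p$ restricted to maximizing actions, and then picks for each state an action that decreases the distance $\|s\|$ by one. The data layout ($x$, $B_p$, $Act^{max}_p$ as Python dicts) is an assumption made for illustration.

```python
# Sketch of the policy-extraction step above (assumes the probability
# vector x, accepting set B_p, and maximizing action sets Act_max are
# already computed; the data layout is hypothetical).
from collections import deque

def extract_policy(states, trans, x, B_p, Act_max, Act):
    """trans[s][a] = set of successors t with P_p(s, a, t) > 0."""
    # Backward BFS from B_p over edges using only maximizing actions;
    # dist[s] is then ||s||, the shortest-path length from s to B_p.
    preds = {s: [] for s in states}
    for s in states:
        for a in Act_max.get(s, []):
            for t in trans[s][a]:
                preds[t].append((s, a))
    dist = {s: 0 for s in B_p}
    queue = deque(B_p)
    while queue:
        t = queue.popleft()
        for (s, a) in preds[t]:
            if s not in dist and x[s] > 0:
                dist[s] = dist[t] + 1
                queue.append(s)
    # Choose C(s): a maximizing action that reduces the distance by one
    # (such an action exists for every s with x[s] > 0 by construction).
    C = {}
    for s in states:
        if s in B_p or x[s] == 0:
            C[s] = next(iter(Act[s]))      # arbitrary enabled action
        else:
            C[s] = next(a for a in Act_max[s]
                        if any(dist.get(t) == dist[s] - 1
                               for t in trans[s][a]))
    return C
```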
V. INCREMENTAL COMPUTATION OF CONTROL POLICIES

The automatic synthesis described in the previous section suffers from the state explosion problem, as the composition of $\mathcal{T}$ and all of $\mathcal{M}_1, \ldots, \mathcal{M}_N$ needs to be constructed, leading to an exponential blow-up of the state space. In this section, we propose an incremental synthesis approach where we progressively compute a sequence of control policies, taking into account only a small subset of the environment agents initially, and successively add more agents to the synthesis procedure in each iteration until we hit the computational resource constraints. Hence, even though the complete synthesis problem cannot be solved due to the computational resource limitation, we can still obtain a reasonably good control policy.

A. Overview of Incremental Computation of Control Policies

Initially, we consider a small subset $\mathcal{M}^0 \subset \{\mathcal{M}_1, \ldots, \mathcal{M}_N\}$ of the environment agents. For each $\mathcal{M}_i = (S_i, P_i, \imath_{init,i}, \Pi_i, L_i) \notin \mathcal{M}^0$, we consider a simplified model $\tilde{\mathcal{M}}_i$ that essentially assumes that the $i$th environment agent is stationary (i.e., we take into account its presence but do not consider its full model). Formally, $\tilde{\mathcal{M}}_i = (\{s_i\}, \tilde{P}_i, \tilde{\imath}_{init,i}, \Pi_i, \tilde{L}_i)$ where $s_i \in S_i$ can be chosen arbitrarily, $\tilde{P}_i(s_i, s_i) = 1$, $\tilde{\imath}_{init,i}(s_i) = 1$ and $\tilde{L}_i(s_i) = L_i(s_i)$. Note that the choice of $s_i \in S_i$ may affect the performance of our incremental synthesis algorithm; hence, it should be chosen to be the most likely state of $\mathcal{M}_i$. We let $\tilde{\mathcal{M}}^0 = \{\tilde{\mathcal{M}}_i \mid \mathcal{M}_i \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^0\}$.

The composition of $\mathcal{T}$, all $\mathcal{M}_i \in \mathcal{M}^0$ and all $\tilde{\mathcal{M}}_j \in \tilde{\mathcal{M}}^0$ is then constructed. We let $\mathcal{M}^{\mathcal{M}^0}$ be the MDP that represents this composition. Note that since each $\tilde{\mathcal{M}}_i$ is typically smaller than $\mathcal{M}_i$, $\mathcal{M}^{\mathcal{M}^0}$ is typically much smaller than the composition of $\mathcal{T}, \mathcal{M}_1, \ldots, \mathcal{M}_N$. We identify all the SCCs of $\mathcal{M}^{\mathcal{M}^0}$ and their partial order. Following the steps for synthesizing a control policy described in Section IV, we construct $\mathcal{M}^{\mathcal{M}^0}_p = \mathcal{M}^{\mathcal{M}^0} \otimes \mathcal{A}_\varphi$ where $\mathcal{A}_\varphi = (Q, 2^\Pi, \delta, q_{init}, F)$ is a DFA that recognizes the good prefixes of $\varphi$. We also store the intermediate transition probability function and the intermediate initial state distribution for $\mathcal{M}^{\mathcal{M}^0}_p$ and denote these functions by $\tilde{P}^{\mathcal{M}^0}_p$ and $\tilde{\imath}^{\mathcal{M}^0}_{p,init}$, respectively.

At the end of the initialization period (i.e., the 0th iteration), we obtain a control policy $C^{\mathcal{M}^0}$ that maximizes the probability for $\mathcal{M}^{\mathcal{M}^0}$ to satisfy $\varphi$. $C^{\mathcal{M}^0}$ resolves all nondeterministic choices in $\mathcal{M}^{\mathcal{M}^0}$ and induces a Markov chain, which we denote by $\mathcal{M}^{\mathcal{M}^0}_{C^{\mathcal{M}^0}}$.

Our algorithm then successively adds the full models of the remaining environment agents to the synthesis procedure at each iteration. In the $(k+1)$th iteration, where $k \ge 0$, we consider $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$ for some $\mathcal{M}_l \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^k$. Such $\mathcal{M}_l$ may be picked such that the probability for $\mathcal{M}^{\mathcal{M}^0}_{C^{\mathcal{M}^0}} \| \mathcal{M}_l$ to satisfy $\varphi$ is the minimum among all $\mathcal{M}_i \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^k$. This probability can be efficiently computed using probabilistic verification [3]. (As an MC can be considered a special case of an MDP with exactly one action enabled in each state, we can easily adapt the techniques for computing the probability vector of a product MDP described in Section IV-C to compute the probability that $\mathcal{M}^{\mathcal{M}^0}_{C^{\mathcal{M}^0}} \| \mathcal{M}_l$ satisfies $\varphi$.) We let $\tilde{\mathcal{M}}^{k+1} = \tilde{\mathcal{M}}^k \setminus \{\tilde{\mathcal{M}}_l\}$ and let $\mathcal{M}^{\mathcal{M}^{k+1}}$ be the MDP that represents the composition of $\mathcal{T}$, all $\mathcal{M}_i \in \mathcal{M}^{k+1}$ and all $\tilde{\mathcal{M}}_j \in \tilde{\mathcal{M}}^{k+1}$. Next, we construct $\mathcal{M}^{\mathcal{M}^{k+1}}_p = \mathcal{M}^{\mathcal{M}^{k+1}} \otimes \mathcal{A}_\varphi$ and obtain a control policy $C^{\mathcal{M}^{k+1}}$ that maximizes the probability for $\mathcal{M}^{\mathcal{M}^{k+1}}$ to satisfy $\varphi$. As in the initialization step, during the construction of $\mathcal{M}^{\mathcal{M}^{k+1}}_p$, we store the intermediate transition probability function and the intermediate initial state distribution for $\mathcal{M}^{\mathcal{M}^{k+1}}_p$ and denote these functions by $\tilde{P}^{\mathcal{M}^{k+1}}_p$ and $\tilde{\imath}^{\mathcal{M}^{k+1}}_{p,init}$, respectively.

The process outlined in the previous paragraph terminates at the $K$th iteration where $\mathcal{M}^K = \{\mathcal{M}_1, \ldots, \mathcal{M}_N\}$ or when the computational resource constraints are exceeded. To make this process more efficient, we avoid unnecessary computation and exploit the objects computed in the previous iteration. Consider an arbitrary iteration $k \ge 0$. In Section V-B, we show how $\mathcal{M}^{\mathcal{M}^{k+1}}_p$, $\tilde{P}^{\mathcal{M}^{k+1}}_p$ and $\tilde{\imath}^{\mathcal{M}^{k+1}}_{p,init}$ can be incrementally constructed from $\mathcal{M}^{\mathcal{M}^k}_p$, $\tilde{P}^{\mathcal{M}^k}_p$ and $\tilde{\imath}^{\mathcal{M}^k}_{p,init}$. Hence, we can avoid computing $\mathcal{M}^{\mathcal{M}^{k+1}}$. In addition, as previously discussed in Section IV-C, generating an order of SCCs can be computationally expensive. Hence, we only compute the SCCs and their order for $\mathcal{M}^{\mathcal{M}^0}$ and all $\mathcal{M}_j \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^0$, which are typically small. The incremental construction of the SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$ and their order from those of $\mathcal{M}^{\mathcal{M}^k}$ is considered in Section V-C. (Note that we do not compute $\mathcal{M}^{\mathcal{M}^k}$ but only maintain its SCCs and their order, which are incrementally constructed using the results from the previous iteration.) Finally, Section V-D describes the computation of $C^{\mathcal{M}^k}$, using a method adapted from SCC-based value iteration where we avoid having to identify the SCCs of $\mathcal{M}^{\mathcal{M}^k}_p$ and their order. Instead, we exploit the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and their order, which can be incrementally constructed using the approach described in Section V-C.
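The overall procedure of this subsection can be summarized by the following schematic loop. The `compose`, `synthesize`, `verify` and `stationary` subroutines (standing in for the constructions of Sections IV and V, including the single-state abstraction above) and the `budget_left` callback are hypothetical placeholders passed in as functions; this is a sketch of the control flow, not the paper's implementation.

```python
# Schematic sketch of the anytime loop in Section V-A (helper
# functions are injected; their names/signatures are assumptions).

def incremental_synthesis(T, agents, phi, initial, budget_left,
                          compose, synthesize, verify, stationary):
    """agents: environment MCs; initial: the subset M^0.
    budget_left() returns False once resources are exhausted.
    Returns the sequence of policies C^{M^0}, C^{M^1}, ..."""
    active = list(initial)                        # full models in use
    frozen = {m: stationary(m) for m in agents if m not in initial}
    policies = []
    while True:
        mdp = compose(T, active, list(frozen.values()))   # M^{M^k}
        policies.append(synthesize(mdp, phi))     # maximize Pr(phi)
        if not frozen or not budget_left():
            return policies
        # One plausible selection rule following Section V-A: add the
        # remaining agent with the lowest verified probability of
        # satisfying phi against the induced Markov chain.
        worst = min(frozen, key=lambda m: verify(policies[-1], m, phi))
        active.append(worst)
        del frozen[worst]
```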
B. Incremental Construction of Product MDP

For an iteration $k \ge 0$, let $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$ for some $\mathcal{M}_l \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^k$. In general, one can construct $\mathcal{M}^{\mathcal{M}^{k+1}}_p$ by first computing $\mathcal{M}^{\mathcal{M}^{k+1}}$, which requires taking the composition of a DFTS or an MDP with $N$ MCs, and then constructing $\mathcal{M}^{\mathcal{M}^{k+1}} \otimes \mathcal{A}_\varphi$. To accelerate the process of computing $\mathcal{M}^{\mathcal{M}^{k+1}}_p$, we exploit the presence of $\mathcal{M}^{\mathcal{M}^k}_p$, its intermediate transition probability function $\tilde{P}^{\mathcal{M}^k}_p$ and its intermediate initial state distribution $\tilde{\imath}^{\mathcal{M}^k}_{p,init}$, which were computed in the previous iteration.

First, note that a state $s_p$ of $\mathcal{M}^{\mathcal{M}^k}_p$ is of the form $s_p = \langle s, q \rangle$ where $s = \langle s_0, s_1, \ldots, s_N \rangle \in S_0 \times S_1 \times \ldots \times S_N$ and $q \in Q$. For $s = \langle s_0, s_1, \ldots, s_N \rangle \in S_0 \times S_1 \times \ldots \times S_N$, $i \in \{0, \ldots, N\}$ and $r \in S_i$, we define $s|_{i \leftarrow r} = \langle s_0, \ldots, s_{i-1}, r, s_{i+1}, \ldots, s_N \rangle$, i.e., $s|_{i \leftarrow r}$ is obtained by replacing the $i$th element of $s$ by $r$.

Lemma 1: Consider an arbitrary iteration $k \ge 0$. Let $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$ where $\mathcal{M}_l \in \{\mathcal{M}_1, \ldots, \mathcal{M}_N\} \setminus \mathcal{M}^k$. Suppose $\mathcal{M}^{\mathcal{M}^k}_p = (S^{\mathcal{M}^k}_p, Act^{\mathcal{M}^k}_p, P^{\mathcal{M}^k}_p, \imath^{\mathcal{M}^k}_{p,init}, \Pi^{\mathcal{M}^k}_p, L^{\mathcal{M}^k}_p)$ and $\mathcal{M}_l = (S_l, P_l, \imath_{init,l}, \Pi_l, L_l)$. Assuming that for any $i, j \in \{0, \ldots, N\}$, $\Pi_i \cap \Pi_j = \emptyset$, then $\mathcal{M}^{\mathcal{M}^{k+1}}_p = (S^{\mathcal{M}^{k+1}}_p, Act^{\mathcal{M}^{k+1}}_p, P^{\mathcal{M}^{k+1}}_p, \imath^{\mathcal{M}^{k+1}}_{p,init}, \Pi^{\mathcal{M}^{k+1}}_p, L^{\mathcal{M}^{k+1}}_p)$ where $S^{\mathcal{M}^{k+1}}_p = \{\langle s|_{l \leftarrow r}, q \rangle \mid \langle s, q \rangle \in S^{\mathcal{M}^k}_p \text{ and } r \in S_l\}$, $Act^{\mathcal{M}^{k+1}}_p = Act^{\mathcal{M}^k}_p$, $\Pi^{\mathcal{M}^{k+1}}_p = \Pi^{\mathcal{M}^k}_p$, and for any $s = \langle s_0, \ldots, s_N \rangle$, $s' = \langle s'_0, \ldots, s'_N \rangle \in S_0 \times \ldots \times S_N$ and $q, q' \in Q$:

• $$P^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) = \begin{cases} \tilde{P}^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) & \text{if } q' = \delta(q, L^{\mathcal{M}^{k+1}}_p(\langle s', q'\rangle)), \\ 0 & \text{otherwise}, \end{cases}$$
where the intermediate transition probability function is given by
$$\tilde{P}^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) = P_l(s_l, s'_l)\,\tilde{P}^{\mathcal{M}^k}_p(\langle \tilde{s}, q\rangle, \alpha, \langle \tilde{s}', q'\rangle) \quad (4)$$
for any $\langle \tilde{s}, q\rangle, \langle \tilde{s}', q'\rangle \in S^{\mathcal{M}^k}_p$ such that $\tilde{s}|_{l \leftarrow s_l} = s$ and $\tilde{s}'|_{l \leftarrow s'_l} = s'$,

• $$\imath^{\mathcal{M}^{k+1}}_{p,init}(\langle s, q\rangle) = \begin{cases} \tilde{\imath}^{\mathcal{M}^{k+1}}_{p,init}(\langle s, q\rangle) & \text{if } q = \delta(q_{init}, L^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle)), \\ 0 & \text{otherwise}, \end{cases}$$
where the intermediate initial state distribution is given by
$$\tilde{\imath}^{\mathcal{M}^{k+1}}_{p,init}(\langle s, q\rangle) = \imath_{init,l}(s_l)\,\tilde{\imath}^{\mathcal{M}^k}_{p,init}(\langle \tilde{s}, q\rangle) \quad (5)$$
for any $\langle \tilde{s}, q\rangle \in S^{\mathcal{M}^k}_p$ such that $\tilde{s}|_{l \leftarrow s_l} = s$, and

• $L^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle) = \big(L^{\mathcal{M}^k}_p(\langle \tilde{s}, q\rangle) \setminus L_l(\tilde{s}_l)\big) \cup L_l(s_l)$ for any $\langle \tilde{s}, q\rangle \in S^{\mathcal{M}^k}_p$ such that $\tilde{s}|_{l \leftarrow s_l} = s$.

Proof: The correctness of $S^{\mathcal{M}^{k+1}}_p$, $Act^{\mathcal{M}^{k+1}}_p$, $\Pi^{\mathcal{M}^{k+1}}_p$ and $L^{\mathcal{M}^{k+1}}_p$ is straightforward to verify. Hence, we only provide the proof for the correctness of $P^{\mathcal{M}^{k+1}}_p$ and $\tilde{P}^{\mathcal{M}^{k+1}}_p$. The correctness of $\imath^{\mathcal{M}^{k+1}}_{p,init}$ and $\tilde{\imath}^{\mathcal{M}^{k+1}}_{p,init}$ can be proved in a similar way.

Consider an arbitrary iteration $k \ge 0$ and let $\mathcal{M}^{\mathcal{M}^k} = (S^{\mathcal{M}^k}, Act^{\mathcal{M}^k}, P^{\mathcal{M}^k}, \imath^{\mathcal{M}^k}_{init}, \Pi^{\mathcal{M}^k}, L^{\mathcal{M}^k})$ and $\mathcal{M}^{\mathcal{M}^{k+1}} = (S^{\mathcal{M}^{k+1}}, Act^{\mathcal{M}^{k+1}}, P^{\mathcal{M}^{k+1}}, \imath^{\mathcal{M}^{k+1}}_{init}, \Pi^{\mathcal{M}^{k+1}}, L^{\mathcal{M}^{k+1}})$. It is obvious from the definition of the product MDP that $P^{\mathcal{M}^{k+1}}_p$ is correct as long as $\tilde{P}^{\mathcal{M}^{k+1}}_p$ is correct, i.e., $\tilde{P}^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) = P^{\mathcal{M}^{k+1}}(s, \alpha, s')$ for all $\langle s, q\rangle, \langle s', q'\rangle \in S^{\mathcal{M}^{k+1}}_p$ and $\alpha \in Act^{\mathcal{M}^{k+1}}_p$. Hence, we only need to prove the correctness of $\tilde{P}^{\mathcal{M}^{k+1}}_p$.

Assume that $\tilde{P}^{\mathcal{M}^k}_p$ is correct, i.e., $\tilde{P}^{\mathcal{M}^k}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) = P^{\mathcal{M}^k}(s, \alpha, s')$ for all $\langle s, q\rangle, \langle s', q'\rangle \in S^{\mathcal{M}^k}_p$ and $\alpha \in Act^{\mathcal{M}^k}_p$. Let $l$ be the index such that $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$. Consider arbitrary $\langle s, q\rangle, \langle s', q'\rangle \in S^{\mathcal{M}^{k+1}}_p$ and $\alpha \in Act^{\mathcal{M}^{k+1}}_p$. Suppose $s = \langle s_0, \ldots, s_N\rangle$ and $s' = \langle s'_0, \ldots, s'_N\rangle$. Note that since $\tilde{\mathcal{M}}_l$ only contains one state, there exist exactly one $\langle \tilde{s}, q\rangle \in S^{\mathcal{M}^k}_p$ and exactly one $\langle \tilde{s}', q'\rangle \in S^{\mathcal{M}^k}_p$ such that $\tilde{s}|_{l \leftarrow s_l} = s$ and $\tilde{s}'|_{l \leftarrow s'_l} = s'$. Since $\mathcal{M}^{\mathcal{M}^k}$ is the composition of $\mathcal{T}$, all $\mathcal{M}_i \in \mathcal{M}^k$ and all $\tilde{\mathcal{M}}_j \in \tilde{\mathcal{M}}^k$, and since $\mathcal{M}_l \notin \mathcal{M}^k$ and $\tilde{P}_l(\cdot, \cdot) = 1$, it follows that if $\mathcal{T}$ is a DFTS, then
$$P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}') = \begin{cases} \prod_{i \in \{1,\ldots,N\}\setminus\{l\}} P_i(s_i, s'_i) & \text{if } s_0 \xrightarrow{\alpha} s'_0, \\ 0 & \text{otherwise}, \end{cases}$$
and if $\mathcal{T}$ is an MDP, then
$$P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}') = P_0(s_0, \alpha, s'_0) \prod_{i \in \{1,\ldots,N\}\setminus\{l\}} P_i(s_i, s'_i).$$
Thus, $P^{\mathcal{M}^{k+1}}(s, \alpha, s') = P_l(s_l, s'_l)\,P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}')$. Combining this with (4), we get $\tilde{P}^{\mathcal{M}^{k+1}}_p(\langle s, q\rangle, \alpha, \langle s', q'\rangle) = P_l(s_l, s'_l)\,\tilde{P}^{\mathcal{M}^k}_p(\langle \tilde{s}, q\rangle, \alpha, \langle \tilde{s}', q'\rangle) = P_l(s_l, s'_l)\,P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}') = P^{\mathcal{M}^{k+1}}(s, \alpha, s')$. By definition, we can conclude that $\tilde{P}^{\mathcal{M}^{k+1}}_p$ is correct.
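As a concrete reading of Lemma 1, the sketch below lifts each stored intermediate transition of $\mathcal{M}^{\mathcal{M}^k}_p$ by the transition probabilities of the newly added agent, as in Eq. (4), and filters by the DFA transition function. The tuple-based state encoding and the `delta`/`label` helpers are assumptions made for illustration; `M_l` is a `MarkovChain` container from the earlier sketch.

```python
# Illustrative sketch of Eq. (4): lift the stored intermediate
# transition function of M_p^{M^k} by the new agent M_l.
# Product states are encoded as (s, q), s a tuple of component states.

def replace(s, l, r):
    """s|l<-r : replace the l-th component of tuple s by r."""
    return s[:l] + (r,) + s[l + 1:]

def lift_intermediate(P_tilde_k, M_l, l, delta, label):
    """P_tilde_k[((s, q), a)][(s2, q2)] = probability at iteration k.
    delta(q, props) is the DFA transition function; label(state)
    computes the updated labeling of Lemma 1. Returns the new
    intermediate and final transition functions."""
    P_tilde_next, P_next = {}, {}
    for ((s, q), a), row in P_tilde_k.items():
        for (s2, q2), p in row.items():
            for r, row_l in M_l.P.items():           # r in S_l
                for r2, p_l in row_l.items():        # P_l(r, r2) > 0
                    src = (replace(s, l, r), q)      # s|l<-r
                    dst = (replace(s2, l, r2), q2)   # s2|l<-r2
                    # Eq. (4): multiply by P_l(r, r2).
                    P_tilde_next.setdefault((src, a), {})[dst] = p * p_l
                    # Keep the transition only if the DFA agrees.
                    if q2 == delta(q, label(dst)):
                        P_next.setdefault((src, a), {})[dst] = p * p_l
    return P_tilde_next, P_next
```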
C. Incremental Construction of SCCs

Consider an arbitrary iteration $k \ge 0$. Let $l$ be the index of the environment agent such that $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$. In this section, we first provide a way to incrementally identify all the SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$ from all the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and $\mathcal{M}_l$. We conclude the section with the incremental construction of the partial order over the SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$ from the partial orders defined over the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and $\mathcal{M}_l$.

Lemma 2: Let $C^{\mathcal{M}^k}$ be an SCC of $\mathcal{M}^{\mathcal{M}^k}$ and $C^l$ be an SCC of $\mathcal{M}_l$, where $\mathcal{M}^{k+1} = \mathcal{M}^k \cup \{\mathcal{M}_l\}$. Suppose either of the following conditions holds:
Cond 1: $|C^{\mathcal{M}^k}| = 1$ and the state in $C^{\mathcal{M}^k}$ does not have a self-loop in $\mathcal{M}^{\mathcal{M}^k}$.
Cond 2: $|C^l| = 1$ and the state in $C^l$ does not have a self-loop in $\mathcal{M}_l$.
Then, for any $s \in C^{\mathcal{M}^k}$ and $r \in C^l$, $\{s|_{l \leftarrow r}\}$ is an SCC of $\mathcal{M}^{\mathcal{M}^{k+1}}$. Otherwise, $\{s|_{l \leftarrow r} \mid s \in C^{\mathcal{M}^k}, r \in C^l\}$ is an SCC of $\mathcal{M}^{\mathcal{M}^{k+1}}$.

Proof: First, we consider the case where Cond 1 or Cond 2 holds and consider arbitrary $s \in C^{\mathcal{M}^k}$ and $r \in C^l$. To show that $\{s|_{l \leftarrow r}\}$ is an SCC of $\mathcal{M}^{\mathcal{M}^{k+1}}$, we will show that there is no path from $s|_{l \leftarrow r}$ to itself in $\mathcal{M}^{\mathcal{M}^{k+1}}$. Since Cond 1 or Cond 2 holds, either there is no path from $s$ to itself in $\mathcal{M}^{\mathcal{M}^k}$ or there is no path from $r$ to itself in $\mathcal{M}_l$. Assume, by contradiction, that there is a path from $s|_{l \leftarrow r}$ to itself in $\mathcal{M}^{\mathcal{M}^{k+1}}$. Let this path be $s|_{l \leftarrow r}, s^1, s^2, \ldots, s^n, s|_{l \leftarrow r}$ where, for each $i \in \{1, \ldots, n\}$, $s^i = \langle s^i_0, \ldots, s^i_N \rangle$. From the proof of Lemma 1, we get that $P^{\mathcal{M}^{k+1}}(s|_{l \leftarrow r}, \alpha, s^1) = P_l(r, s^1_l)\,P^{\mathcal{M}^k}(s, \alpha, \tilde{s}^1)$, $P^{\mathcal{M}^{k+1}}(s^n, \alpha, s|_{l \leftarrow r}) = P_l(s^n_l, r)\,P^{\mathcal{M}^k}(\tilde{s}^n, \alpha, s)$ and $P^{\mathcal{M}^{k+1}}(s^i, \alpha, s^{i+1}) = P_l(s^i_l, s^{i+1}_l)\,P^{\mathcal{M}^k}(\tilde{s}^i, \alpha, \tilde{s}^{i+1})$ for all $\alpha \in Act^{\mathcal{M}^{k+1}}$, where for each $i \in \{1, \ldots, n\}$, $\tilde{s}^i \in S^{\mathcal{M}^k}$ is such that $\tilde{s}^i|_{l \leftarrow s^i_l} = s^i$. Since $s|_{l \leftarrow r}, s^1, s^2, \ldots, s^n, s|_{l \leftarrow r}$ is a path in $\mathcal{M}^{\mathcal{M}^{k+1}}$, there exist $\alpha_0, \ldots, \alpha_n \in Act^{\mathcal{M}^{k+1}}$ such that $P^{\mathcal{M}^{k+1}}(s|_{l \leftarrow r}, \alpha_0, s^1) > 0$, $P^{\mathcal{M}^{k+1}}(s^n, \alpha_n, s|_{l \leftarrow r}) > 0$ and $P^{\mathcal{M}^{k+1}}(s^i, \alpha_i, s^{i+1}) > 0$ for all $i \in \{1, \ldots, n-1\}$. Thus, it must be the case that $P_l(r, s^1_l), P_l(s^n_l, r), P_l(s^i_l, s^{i+1}_l) > 0$ and $P^{\mathcal{M}^k}(s, \alpha_0, \tilde{s}^1), P^{\mathcal{M}^k}(\tilde{s}^n, \alpha_n, s), P^{\mathcal{M}^k}(\tilde{s}^i, \alpha_i, \tilde{s}^{i+1}) > 0$ for all $i \in \{1, \ldots, n-1\}$. But then $r, s^1_l, \ldots, s^n_l, r$ is a path in $\mathcal{M}_l$ and $s, \tilde{s}^1, \ldots, \tilde{s}^n, s$ is a path in $\mathcal{M}^{\mathcal{M}^k}$, leading to a contradiction.

Next, consider the case where neither Cond 1 nor Cond 2 holds. To show that $C^{\mathcal{M}^{k+1}} = \{s|_{l \leftarrow r} \mid s \in C^{\mathcal{M}^k}, r \in C^l\}$ is an SCC of $\mathcal{M}^{\mathcal{M}^{k+1}}$, we need to show that for any $s, \tilde{s} \in C^{\mathcal{M}^{k+1}}$ and any $s' \notin C^{\mathcal{M}^{k+1}}$, (1) there is a path in $\mathcal{M}^{\mathcal{M}^{k+1}}$ from $s$ to $\tilde{s}$, and (2) it is not the case that there are paths in $\mathcal{M}^{\mathcal{M}^{k+1}}$ both from $s$ to $s'$ and from $s'$ to $s$. Both of these statements can be proved by contradiction, using the same reasoning as in the proof above for the case where either Cond 1 or Cond 2 holds.

We say that an SCC $C^{\mathcal{M}^{k+1}}$ of $\mathcal{M}^{\mathcal{M}^{k+1}}$ is derived from $\langle C^{\mathcal{M}^k}, C^l \rangle$, where $C^{\mathcal{M}^k}$ is an SCC of $\mathcal{M}^{\mathcal{M}^k}$ and $C^l$ is an SCC of $\mathcal{M}_l$, if $C^{\mathcal{M}^{k+1}}$ is constructed from $C^{\mathcal{M}^k}$ and $C^l$ according to Lemma 2, i.e., $C^{\mathcal{M}^{k+1}} = \{s|_{l \leftarrow r}\}$ for some $s \in C^{\mathcal{M}^k}$ and $r \in C^l$ if Cond 1 or Cond 2 in Lemma 2 holds; otherwise, $C^{\mathcal{M}^{k+1}} = \{s|_{l \leftarrow r} \mid s \in C^{\mathcal{M}^k}, r \in C^l\}$.

Lemma 3: For each SCC $C^{\mathcal{M}^{k+1}}$ of $\mathcal{M}^{\mathcal{M}^{k+1}}$, there exists a unique $\langle C^{\mathcal{M}^k}, C^l \rangle$ from which $C^{\mathcal{M}^{k+1}}$ is derived.

Proof: Similar to Lemma 1, it can be checked that $S^{\mathcal{M}^{k+1}} = \{s|_{l \leftarrow r} \mid s \in S^{\mathcal{M}^k} \text{ and } r \in S_l\}$ is the set of states of $\mathcal{M}^{\mathcal{M}^{k+1}}$. Consider an arbitrary SCC $C^{\mathcal{M}^{k+1}}$ of $\mathcal{M}^{\mathcal{M}^{k+1}}$ and an arbitrary $s = \langle s_0, \ldots, s_N \rangle \in C^{\mathcal{M}^{k+1}}$. By definition, for any SCC $C^{\mathcal{M}^k}$ of $\mathcal{M}^{\mathcal{M}^k}$ and any SCC $C^l$ of $\mathcal{M}_l$, $C^{\mathcal{M}^{k+1}}$ is derived from $\langle C^{\mathcal{M}^k}, C^l \rangle$ only if $s_l \in C^l$ and there exists $s' \in C^{\mathcal{M}^k}$ such that $s'|_{l \leftarrow s_l} = s$. But since $\tilde{\mathcal{M}}_l$ contains exactly one state, there exists a unique $s' \in S^{\mathcal{M}^k}$ such that $s'|_{l \leftarrow s_l} = s$.
Also, from the definition of SCC, there exist a unique SCC $C^{\mathcal{M}^k}$ of $\mathcal{M}^{\mathcal{M}^k}$ and a unique SCC $C^l$ of $\mathcal{M}_l$ such that $s' \in C^{\mathcal{M}^k}$ and $s_l \in C^l$. Thus, it cannot be the case that $C^{\mathcal{M}^{k+1}}$ is derived from $\langle \tilde{C}^{\mathcal{M}^k}, \tilde{C}^l \rangle$ where $\tilde{C}^{\mathcal{M}^k} \ne C^{\mathcal{M}^k}$ or $\tilde{C}^l \ne C^l$. Applying Lemma 2, we get that there exists an SCC $\tilde{C}^{\mathcal{M}^{k+1}}$ of $\mathcal{M}^{\mathcal{M}^{k+1}}$ that is derived from $\langle C^{\mathcal{M}^k}, C^l \rangle$ and contains $s$. Since $s \in C^{\mathcal{M}^{k+1}}$ and $s \in \tilde{C}^{\mathcal{M}^{k+1}}$, from the definition of SCC, it must be the case that $C^{\mathcal{M}^{k+1}} = \tilde{C}^{\mathcal{M}^{k+1}}$; thus, $C^{\mathcal{M}^{k+1}}$ must be derived from $\langle C^{\mathcal{M}^k}, C^l \rangle$.

Lemma 2 and Lemma 3 provide a way to generate all the SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$ from all the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and $\mathcal{M}_l$, as formally stated below.

Corollary 1: The set of all the SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$ is given by $\{C^{\mathcal{M}^{k+1}} \text{ derived from } \langle C^{\mathcal{M}^k}, C^l \rangle \mid C^{\mathcal{M}^k} \text{ is an SCC of } \mathcal{M}^{\mathcal{M}^k} \text{ and } C^l \text{ is an SCC of } \mathcal{M}_l\}$.

Finally, in the following lemma, we provide a necessary condition, based on the partial orders over the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and $\mathcal{M}_l$, for the existence of the partial order between two SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$.

Lemma 4: Let $C^{\mathcal{M}^{k+1}}_1$ and $C^{\mathcal{M}^{k+1}}_2$ be SCCs of $\mathcal{M}^{\mathcal{M}^{k+1}}$. Suppose $C^{\mathcal{M}^{k+1}}_1$ is derived from $\langle C^{\mathcal{M}^k}_1, C^l_1 \rangle$ and $C^{\mathcal{M}^{k+1}}_2$ is derived from $\langle C^{\mathcal{M}^k}_2, C^l_2 \rangle$, where $C^{\mathcal{M}^k}_1$ and $C^{\mathcal{M}^k}_2$ are SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and $C^l_1$ and $C^l_2$ are SCCs of $\mathcal{M}_l$. Then, $C^{\mathcal{M}^{k+1}}_1 \prec_{\mathcal{M}^{\mathcal{M}^{k+1}}} C^{\mathcal{M}^{k+1}}_2$ only if $C^{\mathcal{M}^k}_1 \prec_{\mathcal{M}^{\mathcal{M}^k}} C^{\mathcal{M}^k}_2$ and $C^l_1 \prec_{\mathcal{M}_l} C^l_2$.

Proof: Consider the case where $C^{\mathcal{M}^{k+1}}_1 \prec_{\mathcal{M}^{\mathcal{M}^{k+1}}} C^{\mathcal{M}^{k+1}}_2$. By definition, $Succ(C^{\mathcal{M}^{k+1}}_2) \cap C^{\mathcal{M}^{k+1}}_1 \ne \emptyset$. Consider a state $s' = \langle s'_0, \ldots, s'_N \rangle \in Succ(C^{\mathcal{M}^{k+1}}_2) \cap C^{\mathcal{M}^{k+1}}_1$. Since $s' \in Succ(C^{\mathcal{M}^{k+1}}_2)$, there exist $s = \langle s_0, \ldots, s_N \rangle \in C^{\mathcal{M}^{k+1}}_2$ and $\alpha \in Act^{\mathcal{M}^{k+1}}$ such that $P^{\mathcal{M}^{k+1}}(s, \alpha, s') > 0$. But from the proof of Lemma 1, $P^{\mathcal{M}^{k+1}}(s, \alpha, s') = P_l(s_l, s'_l)\,P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}')$ where $\tilde{s}$ and $\tilde{s}'$ are the unique states in $S^{\mathcal{M}^k}$ such that $\tilde{s}|_{l \leftarrow s_l} = s$ and $\tilde{s}'|_{l \leftarrow s'_l} = s'$. Thus, it must be the case that $P_l(s_l, s'_l) > 0$ and $P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}') > 0$. In addition, since $C^{\mathcal{M}^{k+1}}_1$ is derived from $\langle C^{\mathcal{M}^k}_1, C^l_1 \rangle$ and $C^{\mathcal{M}^{k+1}}_2$ is derived from $\langle C^{\mathcal{M}^k}_2, C^l_2 \rangle$, from Lemma 2 and Lemma 3, it must be the case that $\tilde{s} \in C^{\mathcal{M}^k}_2$, $\tilde{s}' \in C^{\mathcal{M}^k}_1$, $s_l \in C^l_2$ and $s'_l \in C^l_1$. Since $\tilde{s} \in C^{\mathcal{M}^k}_2$, $\tilde{s}' \in C^{\mathcal{M}^k}_1$ and $P^{\mathcal{M}^k}(\tilde{s}, \alpha, \tilde{s}') > 0$, we can conclude that $\tilde{s}' \in Succ(C^{\mathcal{M}^k}_2) \cap C^{\mathcal{M}^k}_1$, and therefore, by definition, $C^{\mathcal{M}^k}_1 \prec_{\mathcal{M}^{\mathcal{M}^k}} C^{\mathcal{M}^k}_2$. Similarly, since $s_l \in C^l_2$, $s'_l \in C^l_1$ and $P_l(s_l, s'_l) > 0$, we can conclude that $s'_l \in Succ(C^l_2) \cap C^l_1$, and therefore, by definition, $C^l_1 \prec_{\mathcal{M}_l} C^l_2$.
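Lemma 2 and Corollary 1 translate directly into the following sketch, which assumes the SCCs are available as collections of state sets and reuses the `replace` helper from the product-MDP sketch; the `trivial_k`/`trivial_l` predicates encode Cond 1 and Cond 2.

```python
# Sketch of Corollary 1: build the SCCs of M^{M^{k+1}} from the SCC
# pairs of M^{M^k} and M_l (Lemma 2).

def replace(s, l, r):
    """s|l<-r (as in the earlier sketch)."""
    return s[:l] + (r,) + s[l + 1:]

def compose_sccs(sccs_k, sccs_l, l, trivial_k, trivial_l):
    """trivial_k(C) is True iff |C| == 1 and its state has no
    self-loop in M^{M^k}; trivial_l likewise for M_l."""
    sccs_next = []
    for C_k in sccs_k:
        for C_l in sccs_l:
            if trivial_k(C_k) or trivial_l(C_l):
                # Cond 1 or Cond 2: every lifted state is its own SCC.
                sccs_next.extend({replace(s, l, r)}
                                 for s in C_k for r in C_l)
            else:
                sccs_next.append({replace(s, l, r)
                                  for s in C_k for r in C_l})
    return sccs_next
```

By Lemma 4, the partial order over the derived SCCs then only needs to be checked for pairs whose parent SCCs are ordered in both $\mathcal{M}^{\mathcal{M}^k}$ and $\mathcal{M}_l$, which is what keeps the incremental order maintenance cheap.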
D. Computation of Probability Vector and Control Policy for $\mathcal{M}^{\mathcal{M}^k}_p$ from SCCs of $\mathcal{M}^{\mathcal{M}^k}$

Consider an arbitrary iteration $k \ge 0$ and the associated product MDP $\mathcal{M}^{\mathcal{M}^k}_p = (S^{\mathcal{M}^k}_p, Act^{\mathcal{M}^k}_p, P^{\mathcal{M}^k}_p, \imath^{\mathcal{M}^k}_{p,init}, \Pi^{\mathcal{M}^k}_p, L^{\mathcal{M}^k}_p)$. As in SCC-based value iteration, we want to generate a partition $\{D^{\mathcal{M}^k}_{p,1}, \ldots, D^{\mathcal{M}^k}_{p,m_k}\}$ of $S^{\mathcal{M}^k}_p$ with a partial order $\prec_{\mathcal{M}^{\mathcal{M}^k}_p}$ such that $D^{\mathcal{M}^k}_{p,j} \prec_{\mathcal{M}^{\mathcal{M}^k}_p} D^{\mathcal{M}^k}_{p,i}$ if $Succ(D^{\mathcal{M}^k}_{p,i}) \cap D^{\mathcal{M}^k}_{p,j} \ne \emptyset$. However, we relax the condition that each $D^{\mathcal{M}^k}_{p,i}$, $i \in \{1, \ldots, m_k\}$, is an SCC of $\mathcal{M}^{\mathcal{M}^k}_p$ and only require that if $D^{\mathcal{M}^k}_{p,i}$ contains a state in an SCC $C^{\mathcal{M}^k}_p$ of $\mathcal{M}^{\mathcal{M}^k}_p$, then it has to contain all the states in $C^{\mathcal{M}^k}_p$. Hence, $D^{\mathcal{M}^k}_{p,i}$ may include all the states in multiple SCCs of $\mathcal{M}^{\mathcal{M}^k}_p$. The following lemmas provide a method for constructing $\{D^{\mathcal{M}^k}_{p,1}, \ldots, D^{\mathcal{M}^k}_{p,m_k}\}$ and their partial order from the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and their partial order, which can be incrementally constructed as described in Section V-C.

Lemma 5: Let $C^{\mathcal{M}^k}_p$ be an SCC of $\mathcal{M}^{\mathcal{M}^k}_p$. Then, there exists a unique SCC $C^{\mathcal{M}^k}$ of $\mathcal{M}^{\mathcal{M}^k}$ such that $C^{\mathcal{M}^k}_p \subseteq C^{\mathcal{M}^k} \times Q$.

Proof: This follows from the definition of the product MDP: for any $s, s' \in S^{\mathcal{M}^k}$ and $q, q' \in Q$, there is a path from $\langle s, q \rangle$ to $\langle s', q' \rangle$ in $\mathcal{M}^{\mathcal{M}^k}_p$ only if there is a path from $s$ to $s'$ in $\mathcal{M}^{\mathcal{M}^k}$.

Lemma 6: Let $C^{\mathcal{M}^k}_p$ and $\tilde{C}^{\mathcal{M}^k}_p$ be SCCs of $\mathcal{M}^{\mathcal{M}^k}_p$. Suppose $C^{\mathcal{M}^k}$ and $\tilde{C}^{\mathcal{M}^k}$ are the unique SCCs of $\mathcal{M}^{\mathcal{M}^k}$ such that $C^{\mathcal{M}^k}_p \subseteq C^{\mathcal{M}^k} \times Q$ and $\tilde{C}^{\mathcal{M}^k}_p \subseteq \tilde{C}^{\mathcal{M}^k} \times Q$. Then, $C^{\mathcal{M}^k}_p \prec_{\mathcal{M}^{\mathcal{M}^k}_p} \tilde{C}^{\mathcal{M}^k}_p$ only if $C^{\mathcal{M}^k} \prec_{\mathcal{M}^{\mathcal{M}^k}} \tilde{C}^{\mathcal{M}^k}$.

Proof: This follows from the definition of the product MDP since, for any $\langle s, q \rangle \in S^{\mathcal{M}^k}_p$, $\langle \tilde{s}, \tilde{q} \rangle \in S^{\mathcal{M}^k}_p$ is a successor of $\langle s, q \rangle$ in $\mathcal{M}^{\mathcal{M}^k}_p$ only if $\tilde{s}$ is a successor of $s$ in $\mathcal{M}^{\mathcal{M}^k}$.

Lemma 7: Let $C^{\mathcal{M}^k}_1, \ldots, C^{\mathcal{M}^k}_{m_k}$ be all the SCCs of $\mathcal{M}^{\mathcal{M}^k}$ and, for each $i \in \{1, \ldots, m_k\}$, let $D^{\mathcal{M}^k}_{p,i} = C^{\mathcal{M}^k}_i \times Q$. Then, $\{D^{\mathcal{M}^k}_{p,1}, \ldots, D^{\mathcal{M}^k}_{p,m_k}\}$ is a partition of $S^{\mathcal{M}^k}_p$. In addition, the following statements hold for all $i, j \in \{1, \ldots, m_k\}$:
• If $D^{\mathcal{M}^k}_{p,i}$ contains a state in an SCC $C^{\mathcal{M}^k}_p$ of $\mathcal{M}^{\mathcal{M}^k}_p$, then it contains all the states in $C^{\mathcal{M}^k}_p$.
• $Succ(D^{\mathcal{M}^k}_{p,i}) \cap D^{\mathcal{M}^k}_{p,j} \ne \emptyset$ only if $C^{\mathcal{M}^k}_j \prec_{\mathcal{M}^{\mathcal{M}^k}} C^{\mathcal{M}^k}_i$.

Proof: Consider arbitrary $i, j \in \{1, \ldots, m_k\}$. It follows directly from Lemma 5 that if $D^{\mathcal{M}^k}_{p,i}$ contains a state in an SCC $C^{\mathcal{M}^k}_p$ of $\mathcal{M}^{\mathcal{M}^k}_p$, then it contains all the states in $C^{\mathcal{M}^k}_p$. Next, consider the case where $Succ(D^{\mathcal{M}^k}_{p,i}) \cap D^{\mathcal{M}^k}_{p,j} \ne \emptyset$. Then, from Lemma 5, there exist SCCs $C^{\mathcal{M}^k}_{p,i} \subseteq D^{\mathcal{M}^k}_{p,i}$ and $C^{\mathcal{M}^k}_{p,j} \subseteq D^{\mathcal{M}^k}_{p,j}$ of $\mathcal{M}^{\mathcal{M}^k}_p$ such that $Succ(C^{\mathcal{M}^k}_{p,i}) \cap C^{\mathcal{M}^k}_{p,j} \ne \emptyset$. Thus, $C^{\mathcal{M}^k}_{p,j} \prec_{\mathcal{M}^{\mathcal{M}^k}_p} C^{\mathcal{M}^k}_{p,i}$. Applying Lemma 6, we get $C^{\mathcal{M}^k}_j \prec_{\mathcal{M}^{\mathcal{M}^k}} C^{\mathcal{M}^k}_i$.

Applying Lemma 7, we generate a partition $\{D^{\mathcal{M}^k}_{p,1}, \ldots, D^{\mathcal{M}^k}_{p,m_k}\}$ of $S^{\mathcal{M}^k}_p$ where, for each $i \in \{1, \ldots, m_k\}$, $D^{\mathcal{M}^k}_{p,i} = C^{\mathcal{M}^k}_i \times Q$ and $C^{\mathcal{M}^k}_1, \ldots, C^{\mathcal{M}^k}_{m_k}$ are all the SCCs of $\mathcal{M}^{\mathcal{M}^k}$. A partial order $\prec_{\mathcal{M}^{\mathcal{M}^k}_p}$ over this partition is defined such that $D^{\mathcal{M}^k}_{p,j} \prec_{\mathcal{M}^{\mathcal{M}^k}_p} D^{\mathcal{M}^k}_{p,i}$ if $C^{\mathcal{M}^k}_j \prec_{\mathcal{M}^{\mathcal{M}^k}} C^{\mathcal{M}^k}_i$. Hence, an order $O^{\mathcal{M}^k}_p$ among $D^{\mathcal{M}^k}_{p,1}, \ldots, D^{\mathcal{M}^k}_{p,m_k}$ can be simply derived from the order of $C^{\mathcal{M}^k}_1, \ldots, C^{\mathcal{M}^k}_{m_k}$, which can be incrementally constructed based on Lemma 4. This order $O^{\mathcal{M}^k}_p$ has the property that the probability values of states in a $D^{\mathcal{M}^k}_{p,j}$ that appears after $D^{\mathcal{M}^k}_{p,i}$ in $O^{\mathcal{M}^k}_p$ cannot affect the probability values of states in $D^{\mathcal{M}^k}_{p,i}$. Hence, we can follow the SCC-based value iteration and process each $D^{\mathcal{M}^k}_{p,i}$ separately, according to the order in $O^{\mathcal{M}^k}_p$, to compute the probability $x_s$ for all $s \in D^{\mathcal{M}^k}_{p,i}$. Finally, we generate a memoryless control policy $C^{\mathcal{M}^k}$ from the probability vector $(x_s)_{s \in S^{\mathcal{M}^k}_p}$ as described at the end of Section IV.
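The computation of this subsection can be organized as in the sketch below: the blocks $D^{\mathcal{M}^k}_{p,i} = C^{\mathcal{M}^k}_i \times Q$ are processed according to $O^{\mathcal{M}^k}_p$, running value iteration locally on each block while treating states of already-processed blocks as converged. The sparse data layout follows the earlier MDP sketch and is an assumption for illustration.

```python
# Sketch of Section V-D: process partition blocks in an order
# consistent with the (incrementally maintained) SCC order, so each
# block's values are final before any later-processed block needs them.

def blockwise_values(blocks, order, P, B_p, eps=1e-8):
    """P[s][a][t] = transition probability of the product MDP.
    blocks[i] = C_i x Q; `order` lists block indices so that a
    block's successors are processed before the block itself."""
    x = {s: (1.0 if s in B_p else 0.0) for b in blocks for s in b}
    for i in order:
        # Local value iteration on block i; states outside the block
        # already hold their converged values.
        delta = 1.0
        while delta > eps:
            delta = 0.0
            for s in blocks[i]:
                if s in B_p or not P.get(s):
                    continue
                best = max(sum(p * x[t] for t, p in row.items())
                           for row in P[s].values())
                delta = max(delta, abs(best - x[s]))
                x[s] = best
    return x
```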
VI. EXPERIMENTAL RESULTS

Consider, once again, the autonomous vehicle problem described in Example 1 and Example 2. Suppose the road is discretized into 5 cells $c_0, \ldots, c_4$ where $c_2$ is the pedestrian crossing area, as shown in Figure 1. The vehicle starts in cell $c_0$ and has to reach cell $c_4$. There are 5 pedestrians, modeled by MCs $\mathcal{M}_1, \ldots, \mathcal{M}_5$, initially at cell $c_1$. The models of the vehicle and the pedestrians are shown in Figure 2. A DFA $\mathcal{A}_\varphi$ that accepts all and only the words in $pref(\varphi)$, where $\varphi = \big(\neg \bigvee_{i \ge 1, j \ge 0} (c^0_j \wedge c^i_j)\big)\,\mathcal{U}\,c^0_4$, is shown in Figure 3.

[Fig. 1. The road and its partition (cells $c_0, \ldots, c_4$) used in the autonomous vehicle example.]

[Fig. 2. The models of the vehicle and pedestrians: (a) the vehicle model $\mathcal{T}$ (states $c_0$, $c_2$, $c_4$; actions $\alpha_1$, $\alpha_2$); (b) the pedestrian models $\mathcal{M}_1, \ldots, \mathcal{M}_4$ (states $c_1$, $c_2$, $c_3$); (c) the pedestrian model $\mathcal{M}_5$ (states $c_1$, $c_2$, $c_3$).]

[Fig. 3. A DFA $\mathcal{A}_\varphi$ with states $q_0$ (initial), $q_1$ and $q_2$ that recognizes the prefixes of $\varphi = \neg col\,\mathcal{U}\,c^0_4$, where $col = \bigvee_{i \ge 1, j \ge 0} (c^0_j \wedge c^i_j)$; $q_1$ is the accepting state.]

First, we apply the LP-based, value iteration and SCC-based value iteration techniques described in Section IV to synthesize a control policy that maximizes the probability that the complete system $\mathcal{M} = \mathcal{T} \| \mathcal{M}_1 \| \mathcal{M}_2 \| \ldots \| \mathcal{M}_5$ satisfies $\varphi$. The time required for each step of computation is summarized in Table I. All the approaches yield a probability of 0.8 that $\mathcal{M}$ satisfies $\varphi$ under the synthesized control policy. The comparison of the total computation time required for these approaches is shown in Figure 4. As discussed in Section IV-C, although the SCC-based value iteration itself takes significantly less computation time than the LP-based technique or value iteration, the time spent identifying the SCCs and their order renders the total computation time of the SCC-based value iteration larger than that of the other two approaches.

[Fig. 4. Comparison of the computation time and the probability for the system to satisfy the specification, computed using different techniques: solving the full problem using the LP-based approach, value iteration and SCC-based value iteration; and successively adding agents, with and without incremental construction of the product MDP and SCCs.]

TABLE I
Time required (in seconds) for computing various objects using different techniques when the full models of all the environment agents are considered.

Technique                  | $\mathcal{M}_p$ | SCCs & order of $\mathcal{M}_p$ | Prob vector | Control policy | Total
LP                         | 156.3           | –                               | 8.8         | 6.8            | 171.9
Value iteration            | 156.3           | –                               | 31.3        | 6.8            | 194.4
SCC-based value iteration  | 156.3           | 71.1                            | 1.9         | 6.8            | 236.1

Next, we apply the incremental technique where we progressively compute a sequence of control policies as more agents are added to the synthesis procedure in each iteration, as described in Section V. We let $\mathcal{M}^0 = \emptyset$, $\mathcal{M}^1 = \{\mathcal{M}_1\}$, $\mathcal{M}^2 = \{\mathcal{M}_1, \mathcal{M}_2\}$, ..., $\mathcal{M}^5 = \{\mathcal{M}_1, \ldots, \mathcal{M}_5\}$, i.e., we successively add the pedestrians $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_5$, one per iteration. We consider 2 cases: (1) no incremental construction of various objects is employed (i.e., $\mathcal{M}^{\mathcal{M}^{k+1}}$ and $\mathcal{M}^{\mathcal{M}^{k+1}}_p$, $k \ge 0$, are computed from scratch in every iteration), and (2) the incremental construction of various objects described in Sections V-B–V-D is applied. For the first case, we apply the LP-based technique to compute the probability vector, as it has been shown to be the fastest technique when applied to this problem, taking into account the required pre-computation, which needs to be done in every iteration. For both cases, 6 control policies $C^{\mathcal{M}^0}, \ldots, C^{\mathcal{M}^5}$ are generated for $\mathcal{M}^{\mathcal{M}^0}, \ldots, \mathcal{M}^{\mathcal{M}^5}$, respectively. For each policy $C^{\mathcal{M}^k}$, we compute the probability $\mathrm{Pr}^{C^{\mathcal{M}^k}}_\mathcal{M}(\varphi)$ that the complete system $\mathcal{M}$ satisfies $\varphi$ under policy $C^{\mathcal{M}^k}$. (Note that $C^{\mathcal{M}^k}$, when applied to $\mathcal{M}$, is only a function of the states of $\mathcal{M}_i \in \mathcal{M}^k$ since it assumes that the other agents $\mathcal{M}_j \notin \mathcal{M}^k$ are stationary.) These probabilities are given by $\mathrm{Pr}^{C^{\mathcal{M}^0}}_\mathcal{M}(\varphi) = 0.08$, $\mathrm{Pr}^{C^{\mathcal{M}^1}}_\mathcal{M}(\varphi) = 0.46$, $\mathrm{Pr}^{C^{\mathcal{M}^2}}_\mathcal{M}(\varphi) = 0.57$, $\mathrm{Pr}^{C^{\mathcal{M}^3}}_\mathcal{M}(\varphi) = 0.63$, $\mathrm{Pr}^{C^{\mathcal{M}^4}}_\mathcal{M}(\varphi) = 0.67$ and $\mathrm{Pr}^{C^{\mathcal{M}^5}}_\mathcal{M}(\varphi) = 0.8$.

The comparison of the cases where the incremental construction of various objects is and is not employed is shown in Figure 4. A jump in the probability occurs each time a new control policy is computed. The time spent during each step of computation is summarized in Table II and Table III for the first and the second case, respectively.
Notice that the time required for identifying the SCCs and their order when the incremental approach is applied is significantly less than when the full models of all the pedestrians are considered in one shot, since $\mathcal{M}^{\mathcal{M}^0}, \mathcal{M}_1, \ldots, \mathcal{M}_5$, each of which contains 3 states, are much smaller than $\mathcal{M}_p$, which contains 2187 states. From Figure 4, our incremental approach is able to obtain an optimal control policy faster than any of the other techniques. This is mainly due to the efficiency of our incremental construction of the SCCs and their order. In addition, we are able to obtain a reasonable solution, with probability 0.67 of satisfying $\varphi$, within 12 seconds, whereas the maximum probability of satisfying $\varphi$ is 0.8, which requires 160 seconds of computation (or 171.9 seconds without employing the incremental approach).

TABLE II
Time required (in seconds) for computing various objects in each iteration when incremental construction is not applied.

Iteration | $\mathcal{M}^{\mathcal{M}^k}$ | $\mathcal{M}^{\mathcal{M}^k}_p$ | Prob vector | Control policy | Total
0         | 0.0064                        | 0.0185                          | 0.0464      | 0.0084         | 0.08
1         | 0.0123                        | 0.0762                          | 0.0203      | 0.0104         | 0.12
2         | 0.0154                        | 0.3383                          | 0.0231      | 0.0296         | 0.41
3         | 0.0357                        | 1.7055                          | 0.0542      | 0.1503         | 1.95
4         | 0.1393                        | 9.1950                          | 0.2155      | 0.7975         | 10.35
5         | 3.1836                        | 152.86                          | 8.2302      | 6.8938         | 171.17

TABLE III
Time required (in seconds) for computing various objects in each iteration when incremental construction is applied.

Iteration | $\mathcal{M}^{\mathcal{M}^0}$ | SCCs & order of $\mathcal{M}^{\mathcal{M}^0}, \mathcal{M}_1, \ldots, \mathcal{M}_5$ | $\mathcal{M}^{\mathcal{M}^k}_p$, partition & order | Prob vector | Control policy | Total
0         | 0.0055                        | 0.0043                                                                               | 0.0203                                             | 0.0112      | 0.0036         | 0.04
1         | –                             | –                                                                                    | 0.0726                                             | 0.0102      | 0.0087         | 0.09
2         | –                             | –                                                                                    | 0.3239                                             | 0.0193      | 0.0282         | 0.37
3         | –                             | –                                                                                    | 1.6036                                             | 0.0567      | 0.1424         | 1.80
4         | –                             | –                                                                                    | 8.6955                                             | 0.1876      | 0.7755         | 9.66
5         | –                             | –                                                                                    | 139.27                                             | 1.6122      | 7.0125         | 147.89

VII. CONCLUSIONS AND FUTURE WORK

An anytime algorithm for synthesizing a control policy for a robot interacting with multiple environment agents, with the objective of maximizing the probability that the robot satisfies a given temporal logic specification, was proposed. Each environment agent is modeled by a Markov chain whereas the robot is modeled by a finite transition system (in the deterministic case) or a Markov decision process (in the stochastic case). The proposed algorithm progressively computes a sequence of control policies, taking into account only a small subset of the environment agents initially and successively adding more agents to the synthesis procedure in each iteration until the constraints on computational resources are hit. An incremental construction of the various objects that need to be computed during the synthesis procedure was also proposed. Experimental results showed that not only do we obtain a reasonable solution much faster than existing approaches, but we are also able to obtain an optimal solution faster than existing approaches.

Future work includes extending the algorithm to handle full LTL specifications. This direction appears to be promising because the remaining step is only to incrementally construct accepting maximal end components of an MDP. We are also examining an effective approach to determine the agent to be added in each iteration. As mentioned in Section V-A, such an agent may be picked based on the result from probabilistic verification, but this comes at the extra cost of adding the verification phase.

REFERENCES

[1] E. A. Emerson, "Temporal and modal logic," Handbook of Theoretical Computer Science (Vol. B): Formal Models and Semantics, pp. 995–1072, 1990.
[2] Z. Manna and A. Pnueli, The Temporal Logic of Reactive and Concurrent Systems. Springer-Verlag, 1992.
[3] C. Baier and J.-P. Katoen, Principles of Model Checking (Representation and Mind Series). The MIT Press, 2008.
[4] G. Fainekos, H. Kress-Gazit, and G. Pappas, "Temporal logic motion planning for mobile robots," in IEEE International Conference on Robotics and Automation, pp. 2020–2025, 2005.
[5] H. Kress-Gazit, G. Fainekos, and G. Pappas, "Where's Waldo? Sensor-based temporal logic motion planning," in IEEE International Conference on Robotics and Automation, pp. 3116–3121, 2007.
[6] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G. Pappas, "Symbolic planning and control of robot motion [grand challenges of robotics]," IEEE Robotics & Automation Magazine, vol. 14, no. 1, pp. 61–70, 2007.
[7] D. Conner, H. Kress-Gazit, H. Choset, A. Rizzi, and G. Pappas, "Valet parking without a valet," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 572–577, 2007.
[8] S. Karaman and E. Frazzoli, "Sampling-based motion planning with deterministic µ-calculus specifications," in Proc. of IEEE Conference on Decision and Control, 2009.
[9] A. Bhatia, L. E. Kavraki, and M. Y. Vardi, "Sampling-based motion planning with temporal goals," in IEEE International Conference on Robotics and Automation (ICRA), pp. 2689–2696, 2010.
[10] X. C. Ding, S. L. Smith, C. Belta, and D. Rus, "LTL control in uncertain environments with probabilistic satisfaction guarantees," in IFAC World Congress, 2011.
[11] A. I. Medina Ayala, S. B. Andersson, and C. Belta, "Temporal logic control in dynamic environments with probabilistic satisfaction guarantees," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3108–3113, 2011.
[12] H. Kress-Gazit, T. Wongpiromsarn, and U. Topcu, "Correct, reactive robot control from abstraction and temporal logic specifications," Special Issue of the IEEE Robotics & Automation Magazine on Formal Methods for Robotics and Automation, vol. 18, pp. 65–74, 2011.
[13] O. Kupferman and M. Y. Vardi, "Model checking of safety properties," Formal Methods in System Design, vol. 19, pp. 291–314, 2001.
[14] T. Latvala, "Efficient model checking of safety properties," in Model Checking Software: 10th International SPIN Workshop, pp. 74–88, Springer, 2003.
[15] F. Ciesinski, C. Baier, M. Größer, and J. Klein, "Reduction techniques for model checking Markov decision processes," in Proceedings of the 2008 Fifth International Conference on Quantitative Evaluation of Systems, pp. 45–54, 2008.
[16] M. Kwiatkowska, D. Parker, and H. Qu, "Incremental quantitative verification for Markov decision processes," in IEEE/IFIP International Conference on Dependable Systems & Networks, pp. 359–370, 2011.
[17] R. Tarjan, "Depth-first search and linear graph algorithms," SIAM Journal on Computing, vol. 1, pp. 146–160, 1972.