Control with Probabilistic Signal Temporal Logic

Chanyeol Yoo and Calin Belta

The authors are with the Department of Mechanical Engineering, Boston University, Boston, MA 02215, USA {chanyeol,cbelta}@bu.edu. This work was partially supported by the NSF under grants NRI-1426907 and CMMI-1400167. This work is currently under review for the 2016 American Control Conference (ACC 2016, submitted on September 30, 2015).

Abstract— Autonomous agents often operate in uncertain environments where their decisions are made based on beliefs over states of targets. We are interested in controller synthesis for complex tasks defined over belief spaces. Designing such controllers is challenging due to computational complexity and the lack of expressivity of existing specification languages. In this paper, we propose a probabilistic extension to signal temporal logic (STL) that expresses tasks over continuous belief spaces. We present an efficient synthesis algorithm to find a control input that maximises the probability of satisfying a given task. We validate our algorithm through simulations of an unmanned aerial vehicle deployed for surveillance and search missions.

I. INTRODUCTION

In recent years, there has been an increased interest in using formal methods in robot motion planning and control [1]–[4]. Temporal logics, such as Linear Temporal Logic (LTL), Computation Tree Logic (CTL), and their probabilistic versions [5], [6], have been shown to be expressive enough to capture a large spectrum of robotic missions. Model checking and synthesis algorithms have been successfully used to generate motion plans and control strategies from such specifications. Most of the current works, however, do not capture the uncertainty that is inherent in real-world applications. Autonomous agents usually operate in uncertain environments with limited and possibly corrupted information.

In this paper, we propose a specification language called probabilistic Signal Temporal Logic (PrSTL), which is a probabilistic extension of an existing temporal logic, called Signal Temporal Logic (STL) [7]. The specifications are interpreted over a belief space (the space of probability distributions over states of an environment) [8], [9] about the locations of a set of targets. We propose a receding horizon control strategy that maximises the probability of satisfying the specification. The procedure involves iterations consisting of observations and belief updates using Bayes' rule. We include illustrative simulation examples involving surveillance and search and rescue.

Contribution and Related Work

Temporal logics have been used widely in robotic task planning [1], [2], [10], [11]. One of the most popular logics is linear temporal logic (LTL), which provides an expressive means to specify complex robotic tasks such as coverage, sequencing, conditions, and avoidance [11]. These tasks can be combined further to form more complex tasks. Given an LTL specification, a provably-correct controller can be automatically generated for finite systems by using existing synthesis algorithms [5]. Examples of robotics applications include [12]–[18]. These works consider static environments and assume that the robots have full knowledge of the environment. Hence, the controllers have to be re-synthesised from scratch if changes are made. Also, the correctness property may be violated if the knowledge of the environment is not accurate.
In order to operate robots in dynamic environments, a fragment of LTL called generalized reactivity (GR(1)) has been used to synthesise reactive controllers [11], [19], [20]. This game-theoretic approach considers non-deterministic changes in the environment and guarantees satisfaction under all allowed environment changes. However, since all robot and environment behaviours are symbolically encoded as part of an LTL formula, any unexpected changes in dynamics cannot be captured during execution, and complex system dynamics cannot be described accurately. This approach also assumes that the robots have full knowledge of the environment at all times, but the assumption is easily violated in practice. Furthermore, this method only considers the worst case. Hence, the solution is often conservative, and a solution may not be found at all in practical cases.

Instead of modelling the behaviour of environments by non-determinism, probabilistic uncertainty is also considered in robotic task planning problems. Two popular temporal logic forms for this purpose are probabilistic computation tree logic (PCTL) [18], [21]–[23] and probabilistic linear temporal logic (pLTL) [14]. These logics are capable of expressing tasks for systems with probabilistic transitions while assuming that the transition results are precisely known and that the regions of interest are static and known in advance. Their semantics is defined over Markov decision processes (MDPs). The objective is to find a control policy that maximises the probability of satisfying a given specification. Extended work considers non-static environments where the behaviour of adversarial environmental states is also modelled by MDPs [20]. Going further, in [24], environmental states are modelled by mixed observable MDPs, in which some internal transitions are not visible from the outside. The solution to this problem considers the beliefs over the internal states of the environment. Partially observable MDPs (POMDPs) allow for a more general abstraction of hidden internal transitions. Even though recent results show that LTL control synthesis over control strategies with finite memory is decidable [25], the solutions are expensive and do not capture environmental changes.

Signal temporal logic (STL) is a time-bounded temporal logic developed to monitor systems with continuous dynamics. Its semantics are defined over continuously valued signals [7]. Satisfaction of an STL specification is determined by evaluating its degree of robustness: a measure of how well a signal satisfies the given specification. To the best of our knowledge, no probabilistic forms of STL exist. In order to specify tasks over an uncertain environment using STL and quantitatively evaluate a signal with respect to a specification, one possible approach is to compute the expected degree of robustness (i.e., the average robustness over an infinite set of signals). However, since computing robustness for temporal operators requires min and max operations, analytical evaluation of the expected robustness is not trivial. Also, such an approach would require the environment to be deterministic, whereas transitions are assumed to be probabilistic. Our interest in this paper is in estimating the true states of targets in an environment. Since the estimates over the target states are given in the form of probability distributions, it makes more sense to evaluate signals in terms of probability, not robustness.
We propose an extension to STL that can express tasks over beliefs of targets in an environment. We then use this extension to evaluate a probabilistic degree of satisfaction.

One of the main challenges in task planning with temporal logic under uncertainty is the computational complexity of solving the synthesis problem. The time complexity of finding an optimal solution for a pLTL specification is doubly exponential in the number of propositions [14]. Synthesis of an STL formula also suffers from complexity blowup. Recent work in synthesis from STL specifications suggests using mixed integer linear programming (MILP) [26] and receding horizon control (RHC) methods. However, since MILP is NP-hard, such algorithms are not scalable when the size of the problem is large (i.e., in the number of constraints, the length of the formula, and the time horizon).

Organisation

The remainder of the paper is organised as follows. Sec. II presents the problem statement and defines the system and sensor models. In Sec. III, we define the proposed probabilistic signal temporal logic (PrSTL). In Sec. IV, we present an efficient synthesis algorithm from a PrSTL formula. We then discuss the complexity of our approach in Sec. V and present simulation examples in Sec. VI. We conclude in Sec. VII.

II. PROBLEM FORMULATION

We consider a discrete-time dynamic agent of the form

    $x_{t+1} = f(x_t, u_t),$    (1)

where $x_t \in \mathbb{R}^n$ is the continuous-valued $n$-dimensional state of the agent at discrete time $t$ and $u_t \in U$ is the control input from a finite set $U$ at discrete time $t$. We assume that the sampling time $\Delta t$ is 1 (i.e., $t \in \{0, 1, 2, \cdots\}$) and that the state of the agent is always fully known.

We define a run $x$ as a sequence of agent states, where the prefix $X_t$ is a sequence of past states $x_k$ (i.e., the past run) and the suffix $\bar{X}$ is a sequence of future states $\bar{x}_k$ (i.e., the future run) given a control sequence $u$. At time $t$, we have $x = \{X_t, \bar{X}\}$ given $u$, where

    $X_t = \{x_0, x_1, \cdots, x_t\}, \quad \bar{X} = f(x_t, u) = \{\bar{x}_{t+1}, \bar{x}_{t+2}, \cdots, \bar{x}_{t+|u|}\}.$

Note that $X_0 = \{x_0\}$.

The agent is assigned a complex task $\psi$ associated with a set of targets. The state of target $i$ evolves as

    $\hat{x}^i_{t+1} = g^i(\hat{x}^i_t, \hat{u}^i_t),$    (2)

where $\hat{x}^i_t \in \mathbb{R}^{n_i}$ is the true $n_i$-dimensional state of the target, $\hat{u}^i_t \in U^i$ is a hidden control input from a finite set $U^i$, $i \in \{1, 2, \cdots, I\}$, and the number of targets $I$ is finite and known in advance. We assume that the state of the agent is independent of the states of the targets. We also assume that the agent knows the model $g^i(\cdot)$, but the exact states of the targets are not precisely known. Instead, the agent maintains a belief of each target at time $t$, defined as $b^i_t \triangleq P(\hat{x}^i_t \mid Z^i_t)$, where $Z^i_t = \{z^i_0, z^i_1, \cdots, z^i_t\}$ is the history of observations made by sensors on the agent and $z^i_t \in \{0, 1\}$ (i.e., 1 if target $i$ is observed at time $t$ and 0 otherwise).

The belief is updated using a state estimator when an observation is made. Assuming that observation $z^i_t$ is independent of the history $Z^i_{t-1}$ given the true state of a target $\hat{x}^i_t$ (i.e., $P(z^i_t, Z^i_{t-1} \mid \hat{x}^i_t) = P(z^i_t \mid \hat{x}^i_t) \cdot P(Z^i_{t-1} \mid \hat{x}^i_t)$), the belief can be updated using Bayes' rule:

    $P(\hat{x}^i_{t+1} \mid Z^i_{t+1}) = \alpha \, P(z^i_{t+1} \mid \hat{x}^i_{t+1}) \, P(\hat{x}^i_t \mid Z^i_t),$    (3)

where $\alpha$ is a normalising constant. The function $P(z^i_t \mid \hat{x}^i_t)$ is the detection likelihood, which is obtained from a sensor model. Under this conditional independence assumption, where observations are independent of each other given the current state, only the current observation is required to update the belief. The no-detection likelihood is the complement of the detection likelihood (i.e., $P(\bar{z}^i_t \mid \hat{x}^i_t) = 1 - P(z^i_t \mid \hat{x}^i_t)$).
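To make the update in (3) concrete, the following sketch maintains a gridded belief over a single target's position and applies one Bayes update. The grid representation, the parameter values, and the function names are illustrative assumptions on our part, not the authors' implementation.

```python
import numpy as np

def bayes_update(belief, detect_lik, z):
    """One Bayes update of a gridded belief, following (3).

    belief:     (H, W) array summing to 1, the prior P(x_hat | Z).
    detect_lik: (H, W) array of P(z = 1 | x_hat) for each grid cell.
    z:          1 if the target was detected at this step, 0 otherwise.
    """
    # Use the detection likelihood or its complement (the no-detection case).
    likelihood = detect_lik if z == 1 else 1.0 - detect_lik
    posterior = likelihood * belief       # unnormalised numerator of Bayes' rule
    return posterior / posterior.sum()    # normalisation plays the role of alpha

# Illustrative use: uniform prior on a 100 x 100 grid and a sensor whose
# detection likelihood peaks near cell (30, 10); a "no detection" observation
# shifts belief mass away from the sensed region.
ys, xs = np.mgrid[0:100, 0:100]
detect_lik = 0.9 * np.exp(-((xs - 30) ** 2 + (ys - 10) ** 2) / 200.0)
belief = np.full((100, 100), 1e-4)
belief = bayes_update(belief, detect_lik, z=0)
```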
The task $\psi$ assigned to the agent is specified using a time-bounded temporal logic over a set of real-valued target beliefs. An example of such a specification is: the agent has to find two targets in 20 time steps while avoiding an obstacle; once all the targets are found, the agent has to come back to base in 10 time steps; the probability of finding each target has to be greater than 50% at all times. This logic allows for computing the probability of satisfaction over the target beliefs given a sequence of agent states. In Sec. III, we define such a specification language formally. In this paper, we address the following controller synthesis problem over a belief space.

Problem 1 (Receding horizon feedback controller synthesis over belief space). Given a specification $\psi$ over a finite time horizon $H$, a past run $X_t$, a system of the form in (1), targets of the form in (2), a state estimation model of the form in (3), and beliefs over the targets $B_t = \{b^i_t \mid i = 1, 2, \cdots, I\}$, compute

    $u^{H-t}_t = \arg\max_{u \in U^{H-t}} \mathrm{Prob}(\{X_t, f(x_t, u)\}, \psi, 0),$    (4)

where $u^{H-t}_t = \{u_t, u_{t+1}, \cdots\}$ is a finite sequence of $H - t$ control inputs and Prob is a function that returns the probability of satisfying a specification $\psi$ at time $t = 0$ given a run $\{X_t, f(x_t, u)\}$.

The problem is solved using a receding horizon control (RHC) framework. RHC is an iterative control technique for solving optimisation problems in which an optimal control input over a fixed finite time horizon is determined at each time step [4], [26]–[29]. At each time $t$, we compute a sequence of control inputs that maximises an objective function over a finite horizon $H$ (i.e., between $t$ and $t + H$), and the first control input is chosen and executed. We repeat this process until the mission is complete. We use RHC to reduce the synthesis complexity and to rapidly react to changes in the belief space.

A. Examples

We present examples using an unmanned aerial vehicle (UAV). Consider a UAV operating at a constant altitude and airspeed $v_a$ that can be described by an equation of the form (1):

    $x_{t+1} = \begin{bmatrix} x_{t+1} \\ y_{t+1} \\ \theta_{t+1} \end{bmatrix} = \begin{bmatrix} v_a \cos\theta_t \\ v_a \sin\theta_t \\ u \end{bmatrix} + x_t,$    (5)

where $\theta_t$ is the heading angle at time $t$ (minutes) in an absolute Cartesian space and $u$ is a control input from a finite set $U$. The UAV is equipped with a noisy forward-facing camera that can detect the presence of a target within its viewing range without any distance or heading information. A target is said to be in view when it is within the effective measuring distance (20 m) and the angle of view (60 deg). The detection likelihood of the camera (i.e., the probability of detecting a target $i$ at time $t$ given a system state $x_t$ and a true state of the target $\hat{x}^i_t$) is

    $P(z^i_t \mid \hat{x}^i_t, x_t) = \begin{cases} \alpha \cdot \exp\!\left(-\frac{\|x_t - \hat{x}^i_t\|^2}{\lambda}\right) & \text{if in view}, \\ 0 & \text{otherwise}, \end{cases}$    (6)

where $\alpha$ and $\lambda$ are parameters of the camera. Note that $P(z^i_t \mid \hat{x}^i_t, x_t) = P(z^i_t \mid \hat{x}^i_t)$ when $P(x_t \mid \hat{x}^i_t) = P(x_t)$ (i.e., the agent state is independent of all the target states).
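The sketch below gives one concrete reading of the UAV model (5) and the camera model (6). The 20 m range and 60-degree angle of view come from the text; the airspeed and the camera parameters $\alpha$ and $\lambda$ are placeholder values, and we assume the 60 degrees is the full cone width.

```python
import numpy as np

V_A = 1.0                            # airspeed v_a (placeholder value)
RANGE_M = 20.0                       # effective measuring distance from the text
HALF_FOV = np.deg2rad(60.0) / 2.0    # half of the 60-degree angle of view (assumed full width)
ALPHA, LAM = 0.9, 200.0              # camera parameters alpha, lambda (placeholders)

def uav_step(state, u):
    """One step of the dynamics (5); state = [x, y, theta], u = heading input."""
    x, y, theta = state
    return np.array([x + V_A * np.cos(theta),
                     y + V_A * np.sin(theta),
                     theta + u])

def detection_likelihood(state, target):
    """P(z = 1 | target state, agent state) from (6): exponential decay with
    squared distance inside the view cone, zero outside."""
    x, y, theta = state
    dx, dy = target[0] - x, target[1] - y
    dist = np.hypot(dx, dy)
    # Bearing to the target relative to the heading, wrapped to [-pi, pi].
    off_axis = (np.arctan2(dy, dx) - theta + np.pi) % (2 * np.pi) - np.pi
    if dist <= RANGE_M and abs(off_axis) <= HALF_FOV:
        return ALPHA * np.exp(-dist ** 2 / LAM)
    return 0.0
```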
Example 1 (Surveillance). The UAV is required to survey three hidden targets $\hat{x}^i_t$, where $i \in \{1, 2, 3\}$. The task is to repeatedly find each of the targets over a certain horizon. The UAV is initially given a probabilistic estimate of where the targets are, and the estimate is updated according to the dynamic model of the targets.

Example 2 (Prioritised search). The UAV is deployed in a search mission to find two suspects $\hat{x}^i_t$, where $i \in \{\text{Tom}, \text{Jerry}\}$, hiding in the mountains. Based on geographic data and other information, the local police have computed rough probabilistic estimates of where the suspects would be. Tom is given higher capture priority; hence, the UAV is commanded to find Tom first and then to find Jerry.

III. PROBABILISTIC SIGNAL TEMPORAL LOGIC (PRSTL)

In this section, we propose a probabilistic extension of signal temporal logic (STL) [7] called probabilistic signal temporal logic (PrSTL). PrSTL is defined with respect to a discrete-time continuous-valued signal $x$ (i.e., a run). For any sequence $s$, $s[k]$ is the suffix from time $k$ (i.e., $s[k] = \{s_{t'} \mid t' \geq k\}$), $s(i)$ is the $i$-th term of the sequence, where $s(0)$ is the first term and $s(\text{last})$ is the last term, and $|s|$ is the cardinality of the sequence. For instance, we have $s^H_t[t+2] = s_{t+2} s_{t+3} \cdots$ and $s^H_t(0) = s_t$. The syntax of PrSTL is defined as

    $\varphi ::= \top \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi_1 \, U_{[t_1,t_2]} \, \varphi_2 \mid P_{\sim\lambda}[\psi]$
    $\psi ::= \mu \mid \neg\psi \mid \psi \wedge \phi \mid F_{[t_1,t_2]}\psi \mid G_{[t_1,t_2]}\psi,$    (7)

where $\top$ is the Boolean constant 'true', $\neg$ is negation ('not'), $\wedge$ is conjunction ('and'), $\sim \in \{<, \leq, \geq, >\}$, $\lambda \in [0, 1]$, $U$ is the 'Until' temporal operator, $F$ is the 'in Future' temporal operator, $G$ is the 'Globally' temporal operator, $t_1, t_2 \in [0, \infty)$ such that $t_2 \geq t_1$, $\mu$ is a predicate over a real-valued function of $x[t]$ (s.t. $\mu := r(x[t])$ with $r : \mathbb{R}^n \to \mathbb{B}$), and $\phi \in \{\varphi, \psi\}$.

In this paper, we define two types of temporal logic formulas: event and instance formulas. An event formula $\psi$ is specified over target beliefs given a run. Thus, satisfaction of an event formula can be specified probabilistically. The probabilistic degree of satisfying an event formula $\psi$ given a run $x$ and beliefs at time $t$ is computed using a function $\mathrm{Prob}(x, \psi, t)$. On the other hand, an instance formula $\varphi$ is defined over a sequence of truth values. Hence, satisfaction can be known deterministically for a given sequence of agent states. We use the $P_{\sim\lambda}[\cdot]$ operator to determine whether the probability of satisfying a given event formula satisfies $\sim\lambda$. For a synthesis problem where the objective is to maximise or minimise the probability, we use the special notations $P_{\max}$ and $P_{\min}$, respectively. The semantics of PrSTL instance formulas are recursively defined as

    $x[t] \models \top, \ \forall t$
    $x[t] \models \neg\varphi \iff x[t] \not\models \varphi$
    $x[t] \models \varphi_1 \wedge \varphi_2 \iff x[t] \models \varphi_1 \text{ and } x[t] \models \varphi_2$
    $x[t] \models \varphi_1 \, U_{[t_1,t_2]} \, \varphi_2 \iff \exists t' \in [t_1, t_2] \text{ s.t. } x[t'] \models \varphi_2 \text{ and } \forall t'' \in [t_1, t'-1],\ x[t''] \models \varphi_1$
    $x[t] \models P_{\sim\lambda}[\psi] \iff \mathrm{Prob}(x, \psi, t) \sim \lambda.$    (8)

The satisfaction of a given event formula is measured probabilistically as

    $\mathrm{Prob}(x, \mu, t) = f^\mu(x_t)$
    $\mathrm{Prob}(x, \neg\psi, t) = 1 - \mathrm{Prob}(x, \psi, t)$
    $\mathrm{Prob}(x, \psi_1 \wedge \psi_2, t) = \mathrm{Prob}(x, \psi_1, t) \cdot \mathrm{Prob}(x, \psi_2, t)$
    $\mathrm{Prob}(x, \psi \wedge \varphi, t) = \begin{cases} \mathrm{Prob}(x, \psi, t) & \text{if } x[t] \models \varphi, \\ 0 & \text{otherwise}, \end{cases}$
    $\mathrm{Prob}(x, G_{[t_1,t_2]}\psi, t) = \prod_{t' \in [t_1,t_2]} \mathrm{Prob}(x, \psi, t')$
    $\mathrm{Prob}(x, F_{[t_1,t_2]}\psi, t) = 1 - \prod_{t' \in [t_1,t_2]} \left(1 - \mathrm{Prob}(x, \psi, t')\right),$    (9)

where $f^\mu : \mathbb{R}^n \times \mathbb{R}^{n_\mu} \to \mathbb{R}$. Given a run $x = \{x_0, \cdots, x_t, \bar{x}_{t+1}, \cdots\}$, $\mathrm{Prob}(x, \mu, k) = z^\mu_k \in \{0, 1\}$ if $k \leq t$; otherwise, $\mathrm{Prob}(x, \mu, k) = P(\hat{x}^\mu_k \mid Z^\mu_k)$.
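The recursion in (9) translates directly into code. The sketch below evaluates event formulas built from predicates, negation, conjunction, and the bounded G and F operators, taking per-time predicate probabilities as input; the class names and the dictionary interface are our own conventions, not the authors'. Note that, as the entries of Tab. I (below) show, the time window of a nested operator is taken relative to the evaluation time $t$, so the code shifts the bounds by $t$.

```python
from dataclasses import dataclass
from math import prod

# Minimal event-formula syntax tree for the semantics in (9).
@dataclass
class Pred:
    name: str

@dataclass
class Not:
    sub: object

@dataclass
class And:
    left: object
    right: object

@dataclass
class G:
    t1: int
    t2: int
    sub: object

@dataclass
class F:
    t1: int
    t2: int
    sub: object

def prob(psi, p, t):
    """Prob(x, psi, t) per (9); p maps a predicate name to its per-time
    satisfaction probabilities, so p[name][k] plays the role of Prob(x, mu, k)."""
    if isinstance(psi, Pred):
        return p[psi.name][t]
    if isinstance(psi, Not):
        return 1.0 - prob(psi.sub, p, t)
    if isinstance(psi, And):
        return prob(psi.left, p, t) * prob(psi.right, p, t)
    if isinstance(psi, G):   # product of the probabilities over the window
        return prod(prob(psi.sub, p, t + k) for k in range(psi.t1, psi.t2 + 1))
    if isinstance(psi, F):   # complement of "fails at every step of the window"
        return 1.0 - prod(1.0 - prob(psi.sub, p, t + k)
                          for k in range(psi.t1, psi.t2 + 1))
    raise TypeError(psi)

# With the predicate probabilities of Tab. I, G_[0,1] F_[0,3] mu at t = 0
# evaluates to 0.988 * 0.976 = 0.964, matching the table.
p = {"mu": [0.8, 0.7, 0.5, 0.6, 0.6, 0.7]}
print(round(prob(G(0, 1, F(0, 3, Pred("mu"))), p, 0), 3))   # 0.964
```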
From the existing operators, additional operators can be derived:

    $\phi_1 \vee \phi_2 = \neg(\neg\phi_1 \wedge \neg\phi_2)$
    $\phi_1 \Rightarrow \phi_2 = \neg\phi_1 \vee \phi_2$
    $F_{[t_1,t_2]}\phi = \top \, U_{[t_1,t_2]} \, \phi$
    $G_{[t_1,t_2]}\phi = \neg F_{[t_1,t_2]} \neg\phi,$    (10)

where $\Rightarrow$ is implication (i.e., if $\phi_1$, then $\phi_2$), $F$ is the temporal operator for 'sometime in the future' (eventually), and $G$ is the temporal operator for 'globally' (always).

Every PrSTL formula has a horizon length, denoted $\mathrm{hrz}(\phi) \in \mathbb{N}_0$ (i.e., a non-negative integer) [30]. The horizon length is the minimum length in time of a signal (i.e., a run of an agent) required to evaluate the signal against a given specification $\phi$. The horizon length can be computed recursively as

    $\mathrm{hrz}(\mu) = 0$
    $\mathrm{hrz}(\neg\phi) = \mathrm{hrz}(\phi)$
    $\mathrm{hrz}(\phi_1 \wedge \phi_2) = \max\{\mathrm{hrz}(\phi_1), \mathrm{hrz}(\phi_2)\}$
    $\mathrm{hrz}(\phi_1 \, U_{[t_1,t_2]} \, \phi_2) = t_2 + \max\{\mathrm{hrz}(\phi_1) - 1, \mathrm{hrz}(\phi_2)\}.$    (11)

Using PrSTL, Examples 1 and 2 can be rewritten as follows.

Example 1 (cont.). The surveillance mission can be rewritten as

    $G_{[0,30]}(F_{[0,40]}\mu_1 \wedge F_{[0,40]}\mu_2 \wedge F_{[0,40]}\mu_3),$    (12)

where $\mu_1$, $\mu_2$ and $\mu_3$ are the predicates for the targets in the area. Over 30 minutes, each target has to be located repeatedly every 40 minutes. The horizon length of the mission is 70.

Example 2 (cont.). The search mission can be rewritten as

    $F_{[0,60]}\mu_{Tom} \wedge G_{[0,60]}(P_{=1}[\mu_{Tom}] \Rightarrow F_{[0,30]}\mu_{Jerry}),$    (13)

where $\mu_{Tom}$ and $\mu_{Jerry}$ are the predicates for Tom and Jerry, respectively. In this mission, Tom has to be found within 60 minutes. Whenever Tom is located, Jerry needs to be found within 30 minutes. The horizon length of the mission is 90.

TABLE I: Evaluations of the formula $G_{[0,1]}F_{[0,3]}\mu$ over trajectories $x$, $x^3$ and $x^5$, where $\psi_F = F_{[0,3]}\mu$ and $\psi_G = G_{[0,1]}\psi_F$. Entries marked * are approximations obtained from the relaxation in (17); N/A marks times beyond the end of the trajectory.

    Time t               |    0      1      2      3      4      5
    ---------------------+---------------------------------------------
    Prob(x, psi,  t)  mu | 0.8    0.7    0.5    0.6    0.6    0.7
    Prob(x, psi_F, t)    | 0.988  0.976  0.976   ...    ...    ...
    Prob(x, psi_G, t)    | 0.964  0.953   ...    ...    ...    ...
    Prob'(x^3, mu,    t) | 0.8    0.7    0.5    0.6    N/A    N/A
    Prob'(x^3, psi_F, t) | 0.988  0.94*  0.8*   0.6*   N/A    N/A
    Prob'(x^3, psi_G, t) | 0.929* 0.752* 0.48*  0.6*   N/A    N/A
    Prob'(x^5, mu,    t) | 0.8    0.7    0.5    0.6    0.6    0.7
    Prob'(x^5, psi_F, t) | 0.988  0.976  0.976  0.952* 0.88*  0.7*
    Prob'(x^5, psi_G, t) | 0.964  0.953  0.929* 0.838* 0.616* 0.7*

IV. RECEDING HORIZON SYNTHESIS WITH FORWARD SEARCH

To solve Problem 1 efficiently, we propose an algorithm using forward search and the RHC method. The algorithm assumes that the time bounds on temporal operators start from zero (i.e., $F_{[0,\tau]}\phi$ and $G_{[0,\tau]}\phi$).

The algorithm works as follows. Given a sequence of past agent states $X_t$ at time $t$, we iteratively apply all the control inputs to generate a set of candidate trajectories $C^t_k$ at the $k$-th iteration. A candidate trajectory is a run that consists of past agent states and future agent states. Starting from $C^t_0 = \{\{X_t\}\}$, we iteratively update the set of candidate trajectories as follows:

    $C^t_{i+1} = \{\{c, f(c(\text{last}), u)\} \mid \forall u \in U \text{ and } \forall c \in C^t_i\}.$    (14)

We repeat the process of generating a new set of candidate trajectories until we reach the end of the horizon length (i.e., $t + i = H$). However, the growth in the number of trajectories is not scalable in practice. Therefore, we introduce a heuristically chosen constant $N$, which is the maximum number of candidate trajectories. Thus, after each iteration, we compute the probabilistic degree of satisfaction using Prob and only maintain the $N$ best trajectories for the next iteration. For a set of candidate trajectories $C^t_i$, we find a new set $\tilde{C}^t_i \subseteq C^t_i$ with $|\tilde{C}^t_i| = N$ such that

    $\mathrm{Prob}(c, \psi, 0) \geq \mathrm{Prob}(c', \psi, 0), \quad \forall c \in \tilde{C}^t_i, \ \forall c' \in C^t_i \setminus \tilde{C}^t_i.$    (15)

We replace $C^t_i$ in (14) with $\tilde{C}^t_i$ to calculate $C^t_{i+1}$. After we compute the next set of trajectories, we check whether there exists only one branch from the initial state. If so, we stop and execute the corresponding control input leading to that branch. Formally, we stop when the condition below is true for a set of candidate trajectories $\tilde{C}^t_i = \{c_1, c_2, \cdots, c_N\}$:

    $c_i(t+1) = c_j(t+1), \quad \forall i, j \leq N.$    (16)

This is because we use the RHC method, in which only the first control input from the sequence is important. Therefore, computing any further is not computationally beneficial.
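The search described above amounts to a beam search over control sequences. The following sketch keeps (14)-(16) together in one loop: expand every kept trajectory with every input, retain the N best by satisfaction probability, and stop early once every survivor agrees on the first input. It is a schematic rendering, with `dynamics` and `score` standing in for $f(\cdot)$ and $\mathrm{Prob}(\cdot, \psi, 0)$ (or the relaxation Prob' introduced below).

```python
def rhc_step(x_past, inputs, dynamics, score, N, steps):
    """One receding-horizon synthesis step: return the first control input
    of the best candidate trajectory found by pruned forward search.

    x_past:   list of past states X_t (most recent last).
    inputs:   finite control set U.
    dynamics: f(state, u) -> next state, as in (1).
    score:    trajectory -> satisfaction probability (Prob or Prob').
    """
    beam = [(list(x_past), [])]   # pairs of (trajectory, control sequence)
    for _ in range(steps):
        # (14): extend every kept trajectory with every control input.
        candidates = [(traj + [dynamics(traj[-1], u)], seq + [u])
                      for traj, seq in beam for u in inputs]
        # (15): keep only the N best-scoring candidates.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        beam = candidates[:N]
        # (16): stop once all survivors share the same first control input.
        if len({seq[0] for _, seq in beam}) == 1:
            break
    # RHC executes only the first input of the best remaining trajectory.
    best_traj, best_seq = max(beam, key=lambda c: score(c[0]))
    return best_seq[0]
```

Each iteration scores at most $N \cdot |U|$ candidates, which is the source of the complexity bound discussed in Sec. V.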
In order to evaluate a given trajectory over a PrSTL formula, the length of the trajectory has to be equal to or greater than the horizon length of the formula, as described in Sec. III. However, the proposed synthesis algorithm requires an evaluation when the length is shorter than the horizon length. Hence, we propose a relaxation to evaluate short trajectories. Given a sequence of states with finite length $x^n$, the approximated probability of satisfaction for a temporal operator is rewritten as

    $\mathrm{Prob}'(x^n, T_{[0,\tau]}\phi, t) = \mathrm{Prob}(x', T_{[0,\min(\tau,n)]}\phi, t),$    (17)

where $T \in \{G, F\}$ and $x^n$ is a prefix of the infinite sequence $x'$. We replace Prob in (15) with Prob'. For a PrSTL formula $G_{[0,1]}F_{[0,3]}\mu$, the satisfaction probabilities over a set of example trajectories ($x$, $x^3$ and $x^5$) are shown in Tab. I, where the starred probabilities are the approximations obtained from (17).

[Fig. 1: Demonstration example of the forward search algorithm with the PrSTL formula $F_{[0,5]}\mu$, $N = 3$ and $|U| = 2$, shown over (a) the 1st, (b) the 2nd and (c) the 3rd iteration. Each node represents a state of the agent, where the number on every node is the probability that $\mu$ is satisfied in that state. The overall relaxed satisfaction probabilities are shown for the resulting candidate trajectories after each iteration.]

In Fig. 1, we illustrate an example with the PrSTL formula $F_{[0,5]}\mu$ over three iterations, where $N = 3$ and $|U| = 2$. After the first iteration, we have two candidate trajectories. At the next iteration, we again apply the control inputs to every candidate trajectory and obtain four new trajectories, as shown in Fig. 1b. As the number of trajectories is greater than the limit, we calculate the relaxed satisfaction probability for each trajectory and remove the least satisfying branch. We repeat the same in the third iteration. As there exists only one branch from the starting state, we terminate the process and execute the corresponding control.

V. DISCUSSIONS

Using forward search and RHC, we gain a significant improvement in efficiency in solving the problem. This is achieved by limiting the number of candidate trajectories. If the number were not limited, the time complexity would be proportional to $|U|^{H-t}$ at time $t$, which is not scalable in practice when the agent operates over a long mission horizon. With the limiting constant $N$, the complexity is reduced to $|\psi| \cdot N \cdot |U| \cdot (H - t)$, where $|\psi|$ is the size of the formula and $H$ is the horizon length. This is because we apply $|U|$ control inputs to $N$ candidate trajectories over $H - t$ iterations, where each newly created trajectory is approximately evaluated $|\psi|$ times. The overall time complexity of the mission is $|\psi| \cdot N \cdot |U| \cdot H^2$. Since we limit the number of candidate trajectories, the completeness of the algorithm in finding the optimal solution with respect to satisfaction probability is not assured. However, our algorithm gains a significant improvement in time complexity in return. In the following section, we show that the algorithm runs fast enough for online synthesis in the presence of a changing belief space.
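To make the comparison concrete under the surveillance mission's numbers ($|U| = 3$, $H = 70$ and $N = 10$ from Sec. VI, with a rough formula size $|\psi| \approx 7$ for (12), our own count), the two bounds work out to roughly:

```latex
% Unpruned forward search: one candidate trajectory per control sequence.
|U|^{H} = 3^{70} \approx 2.5 \times 10^{33}

% Pruned search over the whole mission: an upper bound on formula evaluations.
|\psi| \cdot N \cdot |U| \cdot H^{2} \approx 7 \cdot 10 \cdot 3 \cdot 70^{2} \approx 1.0 \times 10^{6}
```

That is a reduction of roughly 27 orders of magnitude, obtained at the cost of the completeness guarantee.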
VI. CASE STUDIES

In this section, we demonstrate simulated results of the examples presented in Sec. II using the UAV. These examples illustrate how an autonomous agent with a complex task can operate over an uncertain environment. We also show that our algorithm is efficient enough for online synthesis. In both scenarios, we have $N = 10$ and $|U| = 3$ (i.e., straight, left, and right). We report the average clock time for synthesis on a standard desktop with a 3.4 GHz Intel CPU and 16 GB RAM.

A. Surveillance

The simulated result for the surveillance mission (Example 1) is shown in Fig. 2, where the targets are shown in red, blue, and magenta. At each time $t$, the candidate trajectories are shown in black, the trajectory with maximum relaxed satisfaction probability is shown in bold red, the executed trajectory (i.e., the past run) is shown in bold green, and the camera's field of view is shown in yellow. The true locations of the targets are marked in bold. The initial state of the UAV is $[30\,\text{m}, 10\,\text{m}, 70\,\text{deg}]^T$. We assume that the targets are randomly moving at a known maximum velocity. Based on this velocity, the belief changes over time.

[Fig. 2: Resulting trajectories for the surveillance mission (Example 1) at t = 1, 12, 13, 18, 22, 27, 28 and 59 (end). Targets are shown as contours of different colours. Synthesis of a control is based on the current beliefs over the targets.]

The initial control of the UAV is synthesised based on the poorly estimated beliefs of the targets shown in Fig. 2a. The beliefs are updated as the UAV makes observations, and the UAV always uses the most up-to-date beliefs to synthesise a new control input at any given time. Figure 2 shows how the beliefs change over time as observations are made. It also shows how the changing beliefs affect the way the trajectories are generated. The average synthesis time at each $t$ is 2.94 s, where the minimum and maximum are 1.22 s and 6.67 s, respectively. For synthesising the initial control input at time $t = 1$, the number of candidate trajectories generated is 683, which took 4.85 s to compute. However, if we had to solve Problem 1 without limiting the number of candidate trajectories as in (15), the required number of candidate trajectories in synthesis would have been $3^{70}$ ($\approx 2.5 \times 10^{33}$), which is not scalable in practice. To demonstrate the improvement in synthesis time, we ran the simulation to compute the average synthesis time without the limiting constant $N$.
However, no solution was obtained within a practical time (the computation was still running after 5 hours). Although synthesis without the limiting constant $N$ would provide a complete and optimal solution, the approach is not scalable for online synthesis.

B. Prioritised search

[Fig. 3: Resulting trajectories for the prioritised search mission (Example 2) at t = 1, 10, 13, 19, 26, 29, 35 and 37 (end). Blue and red contours represent the beliefs over Tom and Jerry, respectively.]

We demonstrate the simulated result for Example 2 in Fig. 3. The blue and red contours represent the beliefs over the true locations of Tom and Jerry, respectively. As in the previous example, we assume that Tom and Jerry are moving at a known maximum walking velocity. The initial state of the UAV is the same as in the previous example.

Initially, the UAV is assigned a task to locate Tom, whose true location is poorly estimated. Based on the belief over Tom's true location, the UAV heads straight to the region where the probability of finding Tom is the highest (i.e., around [75, 70]). At $t = 10$, Tom is in the view of the UAV, but the UAV fails to detect him due to sensor noise. Through the series of observations, the belief over Tom's location is updated (compare Fig. 3a with 3d), and the more accurate belief is used to synthesise a better control input. At $t = 29$, the UAV finally finds Tom, and it then generates a new control input to find Jerry while updating the beliefs. Note that the belief over Jerry's location at time $t = 29$ is wider than that at time $t = 1$ because Jerry's walking speed is reflected in estimating Jerry's belief. At time $t = 37$, Jerry is found, and the mission ends. The average synthesis time at each time $t$ is 6.25 s, where the minimum and maximum are 1.95 s and 34.74 s, respectively.

VII. CONCLUSIONS

In this paper, we proposed a probabilistic extension to signal temporal logic to specify complex tasks over target beliefs. We also presented an efficient receding horizon synthesis algorithm that maximises the probability of satisfying a specification in this logic. Through simulations using a simple UAV model, we showed that the algorithm can easily adapt to changes in the belief space. This work is an important step towards synthesis of complex tasks over a belief space, but many open problems remain. In the future, we will consider cases where the conditional independence assumption is violated. We will also consider models that include uncertain external disturbances such as wind.

REFERENCES

[1] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G. Pappas, "Symbolic planning and control of robot motion," IEEE Rob. Autom. Mag., vol. 14, no. 1, pp. 61–70, 2007.
[2] H. Kress-Gazit, "Ensuring correct behavior: Formal methods for hardware and software systems [Special Issue]," IEEE Rob. Autom. Mag., vol. 18, no. 3, 2011.
[3] A. Bhatia, M. Maly, L. Kavraki, and M. Vardi, "Motion planning with complex goals," IEEE Rob. Autom. Mag., vol. 18, no. 3, pp. 55–64, 2011.
[4] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning for dynamical systems," in Proc. of IEEE CDC, 2009, pp. 5997–6004.
[5] C. Baier and J.-P. Katoen, Principles of Model Checking. Cambridge: The MIT Press, 2008.
[6] C. Baier, "On algorithmic verification methods for probabilistic systems," Ph.D. dissertation, Universität Mannheim, 1998.
[7] O. Maler and D. Nickovic, "Monitoring temporal properties of continuous signals," in Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.
[8] L. P. Kaelbling and T. Lozano-Perez, "Integrated task and motion planning in belief space," Int. J. Rob. Res., vol. 32, 2013.
[9] J. Nguyen, N. Lawrance, R. Fitch, and S. Sukkarieh, "Energy-constrained motion planning for information gathering with autonomous aerial soaring," in Proc. of IEEE ICRA, 2013, pp. 3825–3831.
[10] X. C. Ding, M. Kloetzer, Y. Chen, and C. Belta, "Automatic deployment of robotic teams," IEEE Rob. Autom. Mag., vol. 18, no. 3, pp. 75–86, 2011.
[11] H. Kress-Gazit, G. Fainekos, and G. Pappas, "Temporal-logic-based reactive mission and motion planning," IEEE Trans. Robot., vol. 25, no. 6, pp. 1370–1381, 2009.
[12] S. L. Smith, J. Tumova, C. Belta, and D. Rus, "Optimal path planning for surveillance with temporal-logic constraints," Int. J. Rob. Res., vol. 30, no. 14, pp. 1695–1708, 2011.
[13] C. Yoo, R. Fitch, and S. Sukkarieh, "Online task planning and control for fuel-constrained aerial robots in wind fields," Int. J. Rob. Res., 2015, to appear.
[14] X. C. Ding, S. Smith, C. Belta, and D. Rus, "LTL control in uncertain environments with probabilistic satisfaction guarantees," in Proc. of IFAC World Congress, 2011, pp. 3515–3520.
[15] M. Kloetzer and C. Belta, "Temporal logic planning and control of robotic swarms by hierarchical abstractions," IEEE Trans. Robot., vol. 23, no. 2, pp. 320–330, 2007.
[16] Y. Chen, J. Tumova, A. Ulusoy, and C. Belta, "Temporal logic robot control based on automata learning of environmental dynamics," Int. J. Rob. Res., vol. 32, no. 5, pp. 547–565, 2013.
[17] A. Ulusoy, S. L. Smith, X. C. Ding, C. Belta, and D. Rus, "Optimality and robustness in multi-robot path planning with temporal logic constraints," Int. J. Rob. Res., vol. 32, no. 8, pp. 889–911, 2013.
[18] M. Svorenova, I. Cerna, and C. Belta, "Optimal control of MDPs with temporal logic constraints," in Proc. of IEEE CDC, 2013, pp. 3938–3943.
[19] N. Piterman, A. Pnueli, and Y. Sa'ar, "Synthesis of reactive(1) designs," in Proc. of VMCAI, 2006.
[20] E. M. Wolff, U. Topcu, and R. M. Murray, "Efficient reactive controller synthesis for a fragment of linear temporal logic," in Proc. of IEEE ICRA, 2013, pp. 5018–5025.
[21] C. Yoo, R. Fitch, and S. Sukkarieh, "Probabilistic temporal logic for motion planning with resource threshold constraints," in Proc. of RSS, 2012.
[22] ——, "Provably-correct stochastic motion planning with safety constraints," in Proc. of IEEE ICRA, 2013, pp. 981–986.
[23] M. Lahijanian, S. B. Andersson, and C. Belta, "Temporal logic motion planning and control with probabilistic satisfaction guarantees," IEEE Trans. Robot., vol. 28, no. 2, pp. 396–409, 2012.
[24] A. Medina Ayala, S. Andersson, and C. Belta, "Temporal logic control in dynamic environments with probabilistic satisfaction guarantees," in Proc. of IEEE IROS, 2011, pp. 3108–3113.
[25] K. Chatterjee, M. Chmelik, and M. Tracol, "What is decidable about partially observable Markov decision processes with omega-regular objectives," CSL, p. 165, 2013.
[26] V. Raman, A. Donzé, M. Maasoumy, R. M. Murray, A. Sangiovanni-Vincentelli, S. Seshia et al., "Model predictive control with signal temporal logic specifications," in Proc. of IEEE CDC, 2014, pp. 81–87.
[27] D. Aksaray, K. Leahy, and C. Belta, "Distributed multi-agent persistent surveillance under temporal logic constraints," in Proc. of IFAC Distributed Estimation and Control of Networked Systems (NecSys), 2015.
[28] A. Jones, M. Schwager, and C. Belta, "A receding horizon algorithm for informative path planning with temporal logic constraints," in Proc. of IEEE ICRA, 2013, pp. 5004–5009.
[29] X. C. Ding, C. Belta, and C. G. Cassandras, "Receding horizon surveillance with temporal logic specifications," in Proc. of IEEE CDC, 2010, pp. 256–261.
[30] A. Dokhanchi, B. Hoxha, and G. Fainekos, "On-line monitoring for temporal logic robustness," in Runtime Verification. Springer, 2014, pp. 231–246.
Examples of robotics applications include [12]–[18]. These works consider static environments and assume that the robots have full knowledge over the environment. Hence, the controllers have to be re-synthesised from scratch if changes are made. Also, the correctness property may be violated if the knowledge over the environment is not accurate. In order to operate robots in dynamic environments, a fragment of LTL called generalized reactivity (GR(1)) has been used to synthesise reactive controllers [11], [19], [20]. This game-theoretic approach considers non-deterministic changes in the environment and guarantees the satisfaction under all allowed environment changes. However, since all robot and environment behaviours are symbolically encoded as part of an LTL formula, any unexpected changes in dynamics cannot be captured during execution and complex system dynamics cannot be described accurately. This ap- proach also assumes that the robots have full knowledge of the environment at all times but the assumption is easily violated in practice. Furthermore, this method only considers the worst case. Hence, the solution is often conservative and may not find a solution at all for practical cases. Instead of modelling the behaviour of environments by non-determinism, probabilistic uncertainty is also considered in robotic task planning problems. Two popular temporal logic forms for this purpose are probabilistic computation tree logic (PCTL) [18], [21]–[23] and probabilistic linear temporal logic (pLTL) [14]. These logics are capable of expressing tasks for systems with probabilistic transitions while assuming that the transition results are precisely known and that the regions of interest are static and known in advance. Their semantics is defined over Markov decision processes (MDPs). The objective is to find a control policy that maximises the probability of satisfying a given specifi- cation. Extended work considers non-static environments where the behaviour of adversarial environmental states are also modelled by MDPs [20]. Even further, in [24], environmental states are modelled by mixed observable MDPs where some internal transitions are not visible from the outside. The solution to this problem considers the beliefs on the inter- nal states of the environment. Partially observable MDPs (POMDPs) allows for more general abstraction of hidden internal transitions. Even though recent results show that LTL control synthesis over control strategies with finite memory is decidable [25], the solutions are expensive and do not capture environmental changes. arXiv:1510.08474v1 [cs.SY] 28 Oct 2015 Signal temporal logic (STL) is a time-bounded temporal logic developed monitor systems with continuous dynamics. Its semantics are defined over continuously valued sig- nals [7]. A satisfaction of an STL specification is determined by evaluating its degree of robustness: a measure of how well a signal satisfies the given specification. To the best of our knowledge, no probabilistic forms of STL exist. In order to specify tasks over an uncertain environment using STL and quantitatively evaluate a signal with respect to a specification, one possible approach is to compute the expected degree of robustness (i.e., average robustness over an infinite set of signals). However, since computing robustness for temporal operators requires min and max operations, analytical evaluation of the expected robustness is not trivial. 
Also, using such an approach would require the environment to be deterministic while transitions are assumed to be probabilistic. Our interest in this paper is in estimating true states of targets in an environment. Since the estimates over the target states are given in the form of probability distributions, it makes more sense to evaluate signals in terms of probability, not robustness. We propose an extension to STL that can express tasks over beliefs of targets in an environment. We then use this extension to evaluate a probabilistic degree of satisfaction. One of the main challenges in task planning with temporal logic under uncertainty is the computational complexity of solving the synthesis problem. The time complexity of finding an optimal solution for a pLTL specification is doubly exponential in the number of propositions [14]. Synthesis of an STL formula also suffers from complexity blowup. Recent work in synthesis from STL specifications suggests using mixed integer linear program (MILP) [26] and receding horizon control (RHC) methods. However, since MILP is NP-hard, such algorithms are not scalable when the size of the problem is large (i.e., the number of constraints, length of formula, and time horizon). Organisation The remainder of the paper is organised as follows. Sec. II presents the problem statement and defines the system and sensor models. In Sec. III, we define the proposed probabilistic signal temporal logic (PrSTL). In Sec. IV, we present an efficient synthesis algorithm from a PrSTL formula. We then discuss the complexity of our approach in Sec. V and present simulation examples in Sec. VI. We conclude in Sec. VII. II. PROBLEM FORMULATION We consider a discrete-time dynamic agent of the form xt+1 = f(xt, ut), (1) where xt ∈Rn is the continuous-valued n-dimensional state of the agent at discrete time t, ut ∈U is the control input from a finite set U at discrete time t. We assume that the sampling time ∆t is 1 (i.e., t ∈{0, 1, 2, · · · }) and the state of the agent is always fully known. We define a run x as a sequence of agent states at time t where the prefix Xt is a sequence of past states xk (i.e., past run) and the suffix ¯X is a sequence of future states ¯xk (i.e., future run) given a control sequence u. At time t, we have x = {Xt, ¯X} given u, where Xt = {x0, x1, · · · , xt}, ¯X = f(xt, u) = {¯xt+1, ¯xt+2, · · · , ¯xt+|u|}. Note that X0 = {x0}. The agent is assigned a complex task ψ associated with a set of targets. The state of target i is in the form ˆxi t+1 = gi(ˆxi t, ˆui t), (2) where the true ni-dimensional state of the target ˆxi t ∈Rni evolves over time, ˆui t ∈Ui is a hidden control input from a finite set Ui, i ∈{1, 2, · · · , I} and the number of targets I is finite and known in advance. We assume that the state of the agent is independent of the states of the targets. We also assume that the agent knows the model gi(·) but the exact states of the targets are not precisely known. Instead, the agent maintains a belief of each target at time t defined as bi t ≜P(ˆxi t | Zi t), where Zi t = {zi 0, zi 1, · · · , zi t} is the history of observations made by sensors on the agent and zi t ∈{0, 1} (i.e., 1 if the target i is observed at time t and 0 otherwise). The belief is updated using a state estimator when an observation is made. 
Assuming that observation zi t is inde- pendent of the history Zi t−1 given the true state of a target ˆxi t (i.e., P(zi t, Zi t−1 | ˆxi t) = P(zi t | ˆxi t) · P(Zi t−1 | ˆxi t)), the belief can be updated using Bayes’ rule: P(ˆxi t+1 | Zi t+1) = α P(zi t+1 | ˆxi t+1) P(ˆxi t | Zi t), (3) where α is a normalising constant. The function P(zi t | ˆxi t) is the detection likelihood which is obtained from a sensor model. Assuming conditional independence where observations are independent of each other given the current state, only the current observation is required to update the belief. The no detection likelihood is the complement of the detection likelihood (i.e., P(¯zi t | ˆxi t) = 1 −P(zi t | ˆxi t)). The task ψ assigned to the agent is specified using a time-bounded temporal logic over a set of real-valued target beliefs. An example of such a specification is the agent has to find two targets in 20 time steps while avoiding an obstacle. Once all the targets are found, the agent has to come back to base in 10 time steps. The probability of finding each target has to be greater than 50% at all time. This logic allows for computing the probability of satisfaction given a sequence of agent states over the target beliefs. In Sec. III, we define such a specification language formally. In this paper, we address the following controller synthesis problem over a belief space. Problem 1 (Receding horizon feedback controller synthesis over belief space). Given a specification ψ over a finite time horizon H, a past run Xt, a system of the form in (1), targets of the form in (2), a state estimation model of the form in (3) and beliefs over the targets Bt = {bi t | i = 1, 2, · · · , I}, compute uH−t t = arg max u∈UH−t Prob({Xt, f(xt, u)}, ψ, 0), (4) where uH−t t = {ut, ut+1, · · · , uH−t} is a finite sequence of control inputs, Prob is a function that returns the probability of satisfying a specification ψ at time t = 0 given a run {Xt, f(xt, u)}. The problem is solved using a receding horizon control (RHC) framework. RHC is an iterative control technique to solve optimisation problems in which an optimal control input over a fixed finite time horizon is determined at each time step [4], [26]–[29]. At each time t, we compute a sequence of control inputs that maximises an objective function over a finite horizon H (i.e., between t and t + H) and the first control input is chosen and executed. We repeat this process until the mission is complete. We use RHC to reduce the synthesis complexity and to rapidly react to changes in belief space. A. Examples We present examples using an unmanned aerial vehicle (UAV). Consider a UAV operating at a constant altitude and airspeed va that can be described by an equation of the form (1) xt+1 =   xt+1 yt+1 θt+1  =   va cos θt va sin θt u  + xt, (5) where θt is the heading angle at time t (minutes) in an absolute Cartesian space and u is a control input from a finite set U. The UAV is equipped with a noisy forward-facing camera that can detect the presence of a target within the viewing range without any distance or heading information. A target is said to be in the view when it is within the effective measuring distance (20m) and the angle of view (60 deg). The detection likelihood of the camera (i.e., the probability of detecting a target i at time t given a system state xt and a true state of the target ˆxi t) is P(zi t | ˆxi t, xt) = ( α · exp(−∥xt−ˆxi t∥2 λ ) if in the view, 0 otherwise, (6) where α and λ are parameters of the camera. 
Note that P(zi t | ˆxi t, xt) = P(zi t | ˆxi t) when P(xt | ˆxi t) = P(xt) (i.e., the agent state is independent of all the target states). Example 1 (Surveillance). The UAV is required to survey three hidden targets ˆxi t where i = {1, 2, 3}. The task is to repeatedly find each of the targets over a certain horizon. The UAV is initially given a probabilistic estimate of the targets (i.e., where they are), and the estimate is updated according to the dynamic model of the targets. Example 2 (Prioritised search). The UAV is deployed in a search mission to find two suspects ˆxi t where i = {Tom, Jerry} hiding in mountains. Based on geographic data, etc, local police has computed rough probabilistic estimates of where the suspects would be. Tom is given higher capture priority, hence the UAV is commanded to find Tom first and then to find Jerry. III. PROBABILISTIC SIGNAL TEMPORAL LOGIC (PRSTL) In this section, we propose a probabilistic extension of signal temporal logic (STL) [7] called probabilistic signal temporal logic (PrSTL). PrSTL is defined with respect to a discrete-time continuous-valued signal x (i.e., a run). For any sequence s, s[k] is the suffix from time k (i.e., s[k] = {st′ | t′ ≥k}), s(i) is i-th term of a sequence s, where s(0) is the first term, s(last) is the last term and |s| is the cardinality of the sequence. For instance, we have sH t [t+2] = st+2st+3 · · · and sH t (0) = st. The syntax of PrSTL is defined as φ ::= ⊤| ¬φ | φ ∧φ | φ1 U[t1,t2]φ2 | P∼λ[ψ] ψ ::= µ | ¬ψ | ψ ∧ϕ | F[t1,t2]ψ | G[t1,t2]ψ, (7) where ⊤is a Boolean constant for ‘true’, ¬ is a negation (‘not’), ∧is a conjunction (‘and’) ∼∈{<, ≤, ≥, >}, λ ∈ [0, 1], U is the ‘Until’ temporal operator, F is the ‘in Future’ temporal operator, G is the ‘Globally’ temporal operator, t1, t2 ∈[0, ∞) such that t2 ≥t1, µ is a predicate over a real valued function of x[t] (s.t. µ := r(x[t]) with r : Rn →B) and ϕ ∈{φ, ψ}. In this paper, we define two types of temporal logic formulas: event and instance formulas. An event formula ψ is specified over target beliefs given a run. Thus, a satisfaction of an event formula can be specified probabilistically. The probabilistic degree of satisfying an event formula ψ given a run x and beliefs at time t is computed using a function Prob(x, ψ, t). On the other hand, an instance formula φ is defined over a sequence of truth values. Hence, a satisfaction can be known deterministically for a given sequence of agent states. We use P∼λ[·] operator to determine if the probability of satisfying a given event for- mula holds true for ∼λ. For a synthesis problem where the objective is to either maximise or minimise the probability, we use special notations Pmax and Pmin respectively. The semantics of PrSTL instance formulas are recursively defined as x[t] |= ⊤,∀t x[t] |= ¬φ ⇐⇒x[t] ̸|= φ x[t] |= φ1 ∧φ2 ⇐⇒x[t] |= φ1 and x[t] |= φ2 x[t] |= φ1 U[t1,t2]φ2 ⇐⇒∃t′ ∈[t1, t2] s.t. xt′ |= φ2 and ∀t′′ ∈[t1, t′ −1], xt′′ |= φ1 x[t] |= P∼λ[ψ] ⇐⇒Prob(x, ψ, t) ∼λ, (8) The satisfaction of a given event formula is measured probabilitically as Prob(x, µ, t) = f µ(xt) Prob(x, ¬ψ, t) = 1 −Prob(x, ψ, t) Prob(x, ψ1 ∧ψ2, t) = Prob(x, ψ1, t) · Prob(x, ψ2, t) Prob(x, ψ ∧φ, t) = ( Prob(x, ψ, t) if x[t] |= φ, 0 otherwise, Prob(x, G[t1,t2]ψ, t) = Y t′∈[t1,t2] Prob(x, ψ, t′) Prob(x, F[t1,t2]ψ, t) = 1 − Y t′∈[t1,t2] (1 −Prob(x, ψ, t′)), (9) where f µ : Rn × Rnµ → R. Given a run x = {x0, · · · , xt, ¯xt+1, · · · }, Prob(x, µ, k) = zµ k ∈{0, 1} if k ≤t. Otherwise, Prob(x, µ, k) = P(ˆxµ k | Zµ k). 
From the existing operators, additional operators can be derived: ϕ1 ∨ϕ2 = ¬(¬ϕ1 ∧¬ϕ2) ϕ1 ⇒ϕ2 = ¬ϕ1 ∨ϕ2 F[t1,t2]ϕ = ⊤U[t1,t2]ϕ G[t1,t2]ϕ = ¬F[t1,t2]¬ϕ, (10) where ⇒is an implication (i.e., if ϕ1, then ϕ2), F is a temporal operator for ’sometime in future’ (eventually), and G is a temporal operator for ‘globally’ (always). Every PrSTL formula has a horizon length denoted as hrz(ϕ) ∈N0 (i.e., non-negative integer) [30]. The horizon length is the minimum length in time of a signal (i.e., a run of an agent) required to evaluate the signal against a given specification ϕ. The horizon length can be computed resursively as hrz(µ) = 0 hrz(¬ϕ) = hrz(ϕ) hrz(ϕ1 ∧ϕ2) = max{hrz(ϕ1), hrz(ϕ2)} hrz(ϕ1U[t1,t2]ϕ2) = t2 + max{hrz(ϕ1) −1, hrz(ϕ2)}. (11) Using PrSTL, Examples 1 and 2 can be re-written as the following. Example 1 (cont.). The surveillance mission can be re- written as G[0,30](F[0,40]µ1 ∧F[0,40]µ2 ∧F[0,40]µ3), (12) where µ1, µ2 and µ3 are the predicates for targets in the area. Over 30 minutes, each target has to be located repeatedly every 40 minutes. The horizon length of the mission is 70. Example 2 (cont.). The search mission can be re-written as F[0,60]µT om ∧G[0,60](P=1[µT om] ⇒F[0,30]µJerry), (13) where µT om and µJerry are the predicates for Tom and Jerry respectively. In this mission, Tom has to be found in 60 minutes. Whenever Tom is located, Jerry needs to be found in 40 minutes. The horizon length of the mission is 90. TABLE I: Evaluations of a formula G[0,1]F[0,3]µ over trajec- tories x, x3 and x5. Approximated numbers are shown in shaded cells. Note that ψF = F[0,3]µ and ψG = G[0,1]ψF. Time t 0 1 2 3 4 5 Prob(x, µ, t) 0.8 0.7 0.5 0.6 0.6 0.7 Prob(x, ψF, t) 0.988 0.976 0.976 · · · · · · · · · Prob(x, ψG, t) 0.964 0.953 · · · · · · · · · · · · Prob′(x3, µ, t) 0.8 0.7 0.5 0.6 N/A N/A Prob′(x3, ψF, t) 0.988 0.94 0.8 0.6 N/A N/A Prob′(x3, ψG, t) 0.929 0.752 0.48 0.6 N/A N/A Prob′(x5, µ, t) 0.8 0.7 0.5 0.6 0.6 0.7 Prob′(x5, ψF, t) 0.988 0.976 0.976 0.952 0.88 0.7 Prob′(x5, ψG, t) 0.964 0.953 0.929 0.838 0.616 0.7 IV. RECEDING HORIZON SYNTHESIS WITH FORWARD SEARCH To solve Problem 1 in an efficient manner, we propose an algorithm using forward search and RHC method. The algorithm assumes that time bounds on temporal operators start from zero (i.e., F[0,τ]ϕ and G[0,τ]ϕ). The algorithm works as follows. Given a sequence of past agent states Xt at time t, we iteratively apply all the control inputs to generate a set of candidate trajecto- ries Ct k at k-th iteration. A candidate trajectory is a run that consists of past agent states and future agent states. Starting from Ct 0 = {{Xt}}, we iteratively update the set of candidate trajectories as follows: Ct i+1 = {{c, f(c(last), u)} | ∀u ∈U and ∀c ∈Ct i}. (14) We repeat the process to generate a new set of candidate trajectories until we reach the end of horizon length (i.e., t+i = H). However, the growth in the number of trajectories is not scalable in practice. Therefore we introduce a heuris- tically chosen constant N which is the maximum number of candidate trajectories. Thus, after each iteration, we compute the probabilistic degree of satisfaction using Prob and only maintain N-best trajectories for the next iteration. For a set of candidate trajectories Ct i, we find a new set ˜Ct i such that ˜Ct i = {c ∈Ct i | Prob(c, ψ, 0) ≤Prob(c′, ψ, 0), ∀c′ ∈C′}, (15) where |C′| = N. We replace Ct i in (14) with ˜Ct i to calculate Ct i+1. After we compute the next set of trajectories, we check if there exists only one branch from the initial state. 
If so, we stop and execute the corresponding control input leading to the branch. Formally speaking, we stop when the condition below is true for a set of candidate trajectories ˜Ct i = {c1, c2, · · · , cN}: ci(t + 1) = cj(t + 1), ∀i, j ≤N. (16) This is because we use RHC method in which only the first control input from the sequence is important. Therefore, computing any further is not computationally beneficial. In order to evaluate a given trajectory over a PrSTL formula, the length of the trajectory has to be equal or greater than the horizon length of the formula as described in 0.8 0.5 Prob' = 0.9 0.9 Prob' = 0.95 (a) 1st iteration 0.7 0.5 0.8 0.9 0.5 0.55 0.6 Prob' = 0.97 Prob' = 0.95 Prob' = 0.9775 Prob' = 0.98 (b) 2nd iteration Prob' = 0.9946 Prob' = 0.9973 0.5 0.8 0.9 0.7 0.5 0.55 0.6 0.7 0.6 0.76 0.88 0.86 0.8 Prob' = 0.991 Prob' = 0.988 Prob' = 0.9972 Prob' = 0.996 (c) 3rd iteration Fig. 1: Demonstration example for the forward search algo- rithm with a PrSTL formula is F[0,5]µ, N = 3 and |U| = 2. Each node represents a state of a agent where the number on every node is the probability that µ is satisfied in the state. The overall relaxed satisfaction probabilities are shown for resulting candidate trajectories after each iteration. Sec. III. However, the proposed synthesis algorithm requires an evaluation when the length is shorter than the horizon length. Hence, we propose a relaxation to evaluate short trajectories. Given a sequence of states with finite length xn, the approximated probability of satisfaction for a temporal operator is re-written as Prob′(xn, T[0,τ]ϕ, t) = Prob(x′, T[0,min(τ,n)]ϕ, t) (17) where T ∈{G, F} and xn is a prefix of the infinite sequence x′. We replace Prob in (15) with Prob′. Suppose we have a PrSTL formula G[0,1]F[0,3]µ, the satisfaction probability over a set of example trajectories (x, x3 and x5) are shown in Tab. I where the probabilities in shades are approximated. In Fig. 1, we illustrate an example of a PrSTL for- mula F[0,5]µ over three iterations where N = 3 and |U| = 2. After the first iteration, we have two candidate trajectories. At the next iteration, we again apply control inputs to every candidate trajectories and obtain four new trajectories as shown in Fig. 1b. As the number of trajectories is greater than the limit, we calculate the relaxed satisfaction probability for each trajectory and remove the least satisfying branch. We repeat the same in the third iteration. As there exists only one branch from the starting state, we terminate the process and execute the corresponding control. V. DISCUSSIONS Using forward search and RHC, we have gained a sig- nificant improvement in efficiency in solving the problem. This is achieved by limiting the number of candidate trajec- tories. If the number were not limited, the time complexity would be proportional to |U|H−t at time t which are not scalable in practice where agent operates over a long mission horizon. With the limiting constant N, the complexity is reduced to |ψ| · |N| · |U| · (H −t) where |ψ| is the size of formula, H is the horizon length. This is because we apply |U|-number of control inputs to |N|-number of candidate trajectories over H −t iterations where each newly created trajectory is approximately evaluated |ψ| times. The overall time complexity of the mission is |ψ| · |N| · |U| · H2. Since we limit the number of candidate trajectories, the completeness of the algorithm in finding the optimal solution with respect to satisfaction probability is not assured. 
To evaluate a given trajectory against a PrSTL formula, the length of the trajectory has to be equal to or greater than the horizon length of the formula, as described in Sec. III. However, the proposed synthesis algorithm requires evaluating trajectories that are shorter than the horizon length. Hence, we propose a relaxation to evaluate short trajectories. Given a sequence of states x^n with finite length, the approximated probability of satisfaction for a temporal operator is rewritten as

    Prob′(x^n, T[0,τ]ϕ, t) = Prob(x′, T[0, min(τ, n−t)]ϕ, t),    (17)

where T ∈ {G, F} and x^n is a prefix of the infinite sequence x′; the time bound is truncated so that the evaluation window never extends past the end of the available prefix. We replace Prob in (15) with Prob′.

Consider the PrSTL formula G[0,1]F[0,3]µ. The satisfaction probabilities over a set of example trajectories (x, x^3 and x^5) are shown in Tab. I.

TABLE I: Evaluations of the formula G[0,1]F[0,3]µ over trajectories x, x^3 and x^5, where ψ_F = F[0,3]µ and ψ_G = G[0,1]ψ_F. Approximated (relaxed) entries are those whose evaluation window is truncated by (17).

    Time t                0      1      2      3      4      5
    Prob(x, µ, t)         0.8    0.7    0.5    0.6    0.6    0.7
    Prob(x, ψ_F, t)       0.988  0.976  0.976  · · ·  · · ·  · · ·
    Prob(x, ψ_G, t)       0.964  0.953  · · ·  · · ·  · · ·  · · ·
    Prob′(x^3, µ, t)      0.8    0.7    0.5    0.6    N/A    N/A
    Prob′(x^3, ψ_F, t)    0.988  0.94   0.8    0.6    N/A    N/A
    Prob′(x^3, ψ_G, t)    0.929  0.752  0.48   0.6    N/A    N/A
    Prob′(x^5, µ, t)      0.8    0.7    0.5    0.6    0.6    0.7
    Prob′(x^5, ψ_F, t)    0.988  0.976  0.976  0.952  0.88   0.7
    Prob′(x^5, ψ_G, t)    0.964  0.953  0.929  0.838  0.616  0.7
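The recursion behind the Prob values is not spelled out in this excerpt; the numbers in Tab. I are consistent with treating satisfaction at different time steps as independent, so that G multiplies probabilities over its window and F multiplies complement probabilities. Under that assumed reading, the relaxed evaluation (17) can be sketched as follows; the function names are ours.

```python
# Relaxed evaluation (17) for the operators in Tab. I. The product forms
# are inferred from the table's numbers (independence across time steps
# assumed); they are our reading, not a quoted definition.

def prob_F(p, tau, t):
    """Prob'(x^n, F[0,tau]mu, t): mu holds at least once in the window."""
    window = p[t : min(t + tau, len(p) - 1) + 1]   # truncated as in (17)
    fail = 1.0
    for pi in window:
        fail *= 1.0 - pi                 # mu fails at every step
    return 1.0 - fail

def prob_G_of_F(p, tau_g, tau_f, t):
    """Prob'(x^n, G[0,tau_g]F[0,tau_f]mu, t) for the formula of Tab. I."""
    end = min(t + tau_g, len(p) - 1)     # truncated G-window
    sat = 1.0
    for k in range(t, end + 1):
        sat *= prob_F(p, tau_f, k)       # F-subformula holds at each step
    return sat

p3 = [0.8, 0.7, 0.5, 0.6]                # predicate probabilities along x^3
print(round(prob_F(p3, 3, 1), 3))        # 0.94,  matching Tab. I
print(round(prob_G_of_F(p3, 1, 3, 0), 3))  # 0.929, matching Tab. I
```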
[Fig. 1: Demonstration of the forward search algorithm for the PrSTL formula F[0,5]µ with N = 3 and |U| = 2. Each node represents an agent state; the number on a node is the probability that µ is satisfied in that state. The relaxed satisfaction probabilities Prob′ of the candidate trajectories are shown after each iteration. Panels: (a) 1st iteration, (b) 2nd iteration, (c) 3rd iteration.]

In Fig. 1, we illustrate the forward search for the PrSTL formula F[0,5]µ over three iterations with N = 3 and |U| = 2. After the first iteration, we have two candidate trajectories. At the next iteration, we again apply the control inputs to every candidate trajectory and obtain four new trajectories, as shown in Fig. 1b. As the number of trajectories exceeds the limit, we compute the relaxed satisfaction probability of each trajectory and remove the least-satisfying branch. We repeat the same process in the third iteration. Since only one branch then leaves the starting state, we terminate the process and execute the corresponding control input.

V. DISCUSSION

Using forward search and RHC, we gain a significant improvement in the efficiency of solving the problem. This is achieved by limiting the number of candidate trajectories. If the number were not limited, the time complexity at time t would be proportional to |U|^{H−t}, which is not scalable in practice when an agent operates over a long mission horizon. With the limiting constant N, the complexity of one synthesis step is reduced to |ψ| · N · |U| · (H − t), where |ψ| is the size of the formula and H is the horizon length: we apply |U| control inputs to each of at most N candidate trajectories over H − t iterations, and each newly created trajectory is evaluated approximately |ψ| times. The overall time complexity over the mission is |ψ| · N · |U| · H².

Since we limit the number of candidate trajectories, the algorithm is not guaranteed to find the optimal solution with respect to satisfaction probability. In return, however, it gains a significant improvement in time complexity. In the following section, we show that the algorithm runs fast enough for online synthesis in the presence of a changing belief space.

VI. CASE STUDIES

In this section, we present simulation results for the examples of Sec. II using the UAV model. These examples illustrate how an autonomous agent with a complex task can operate over an uncertain environment, and show that our algorithm is efficient enough for online synthesis. In both scenarios, N = 10 and |U| = 3 (i.e., straight, left, and right). We report the average wall-clock synthesis time on a standard desktop with a 3.4 GHz Intel CPU and 16 GB of RAM.

A. Surveillance

The simulation result for the surveillance mission (Example 1) is shown in Fig. 2, where the targets are shown in red, blue, and magenta. At each time t, the candidate trajectories are shown in black, the trajectory with the maximum relaxed satisfaction probability in bold red, the executed trajectory (i.e., the past run) in bold green, and the camera's field of view in yellow. The true locations of the targets are marked in bold.

[Fig. 2: Resulting trajectories for the surveillance mission (Example 1) at t = 1, 12, 13, 18, 22, 27, 28, and 59 (end), over a 100 m × 100 m area. Targets are shown as contours of different colours; each control input is synthesised from the current beliefs over the targets.]

The initial state of the UAV is [30 m, 10 m, 70 deg]^T. We assume that the targets move randomly at a known maximum velocity; based on this velocity, the beliefs spread over time. The initial control input is synthesised from the poorly estimated target beliefs shown in Fig. 2a. The beliefs are updated as the UAV makes observations, and the UAV always uses the most up-to-date beliefs to synthesise a new control input at any given time. Figure 2 shows how the beliefs change over time as observations are made, and how the changing beliefs affect the generated trajectories.

The average synthesis time at each t is 2.94 s, with minimum and maximum of 1.22 s and 6.67 s, respectively. For synthesising the initial control input at time t = 1, 683 candidate trajectories were generated, which took 4.85 s to compute. Had we solved Problem 1 without limiting the number of candidate trajectories as in (15), the required number of candidate trajectories would have been 3^70 (≈ 2.5 × 10^33), which is not scalable in practice. To demonstrate the improvement in synthesis time, we also ran the simulation without the limiting constant N; no solution was obtained in practical time (still running after 5 hours). Although synthesis without the limiting constant N would be complete and optimal, the approach does not scale to online synthesis.

B. Prioritised search

The simulation result for Example 2 is shown in Fig. 3, where the blue and red contours represent the beliefs over the true locations of Tom and Jerry, respectively. As in the previous example, we assume that Tom and Jerry move at a known maximum walking velocity, and the initial state of the UAV is the same as before.

[Fig. 3: Resulting trajectories for the prioritised search mission (Example 2) at t = 1, 10, 13, 19, 26, 29, 35, and 37 (end). Blue and red contours represent the beliefs over Tom and Jerry, respectively.]

Initially, the UAV is assigned the task of locating Tom, whose true location is poorly estimated. Based on the belief over Tom's location, the UAV heads straight to the region where the probability of finding Tom is highest (around [75, 70]). At t = 10, Tom is within the UAV's field of view, but the UAV fails to detect him due to sensor noise. Through the series of observations, the belief over Tom's location is updated (compare Fig. 3a with 3d), and the more accurate belief is used to synthesise better control inputs. At t = 29, the UAV finally finds Tom and generates a new control input to find Jerry while continuing to update the beliefs. Note that the belief over Jerry's location at t = 29 is wider than at t = 1 because Jerry's walking speed is reflected in the belief prediction. At t = 37, Jerry is found and the mission ends. The average synthesis time at each t is 6.25 s, with minimum and maximum of 1.95 s and 34.74 s, respectively.
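Both case studies rely on Bayesian belief updates from noisy camera observations, but the sensor model is not detailed in this excerpt. As a rough, assumed illustration only, the sketch below updates a grid belief from a single detection/no-detection event inside the camera footprint; the grid size matches the plotted workspace, while the detection and false-alarm rates and the footprint are placeholders.

```python
import numpy as np

# Rough illustration of the Bayes-rule belief updates behind Figs. 2-3.
# p_d, p_fa and the footprint are placeholder assumptions, not the
# paper's sensor model.

def update_belief(belief, fov_mask, detected, p_d=0.8, p_fa=0.05):
    """One Bayes-rule update of a grid belief over a target's location."""
    if detected:
        likelihood = np.where(fov_mask, p_d, p_fa)
    else:
        likelihood = np.where(fov_mask, 1.0 - p_d, 1.0 - p_fa)
    posterior = likelihood * belief      # unnormalised Bayes numerator
    return posterior / posterior.sum()

belief = np.full((100, 100), 1e-4)       # uniform prior over the area
fov = np.zeros((100, 100), dtype=bool)
fov[60:80, 65:85] = True                 # camera footprint at this step
belief = update_belief(belief, fov, detected=False)
```

Under such a model, a missed detection, like Tom's at t = 10, only discounts the probability mass inside the footprint rather than zeroing it, which is consistent with the beliefs in Figs. 2 and 3 sharpening gradually over repeated observations.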
VII. CONCLUSIONS

In this paper, we proposed a probabilistic extension of signal temporal logic for specifying complex tasks over target beliefs. We also presented an efficient receding horizon synthesis algorithm that maximises the probability of satisfying a specification in this logic. Through simulations with a simple UAV model, we showed that the algorithm adapts readily to changes in the belief space. This work is an important step towards the synthesis of complex tasks over a belief space, but many open problems remain. In future work, we will consider cases where the conditional independence assumption is violated, as well as models that include uncertain external disturbances such as wind.

REFERENCES

[1] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G. Pappas, "Symbolic planning and control of robot motion," IEEE Rob. Autom. Mag., vol. 14, no. 1, pp. 61–70, 2007.
[2] H. Kress-Gazit, "Ensuring correct behavior: Formal methods for hardware and software systems [Special Issue]," IEEE Rob. Autom. Mag., vol. 18, no. 3, 2011.
[3] A. Bhatia, M. Maly, L. Kavraki, and M. Vardi, "Motion planning with complex goals," IEEE Rob. Autom. Mag., vol. 18, no. 3, pp. 55–64, 2011.
[4] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning for dynamical systems," in Proc. of IEEE CDC, 2009, pp. 5997–6004.
[5] C. Baier and J.-P. Katoen, Principles of Model Checking. Cambridge: The MIT Press, 2008.
[6] C. Baier, "On algorithmic verification methods for probabilistic systems," Ph.D. dissertation, Universität Mannheim, 1998.
[7] O. Maler and D. Nickovic, "Monitoring temporal properties of continuous signals," in Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.
[8] L. P. Kaelbling and T. Lozano-Perez, "Integrated task and motion planning in belief space," Int. J. Rob. Res., vol. 32, 2013.
[9] J. Nguyen, N. Lawrance, R. Fitch, and S. Sukkarieh, "Energy-constrained motion planning for information gathering with autonomous aerial soaring," in Proc. of IEEE ICRA, 2013, pp. 3825–3831.
[10] X. C. Ding, M. Kloetzer, Y. Chen, and C. Belta, "Automatic deployment of robotic teams," IEEE Rob. Autom. Mag., vol. 18, no. 3, pp. 75–86, 2011.
[11] H. Kress-Gazit, G. Fainekos, and G. Pappas, "Temporal-logic-based reactive mission and motion planning," IEEE Trans. Robot., vol. 25, no. 6, pp. 1370–1381, 2009.
[12] S. L. Smith, J. Tumova, C. Belta, and D. Rus, "Optimal path planning for surveillance with temporal-logic constraints," Int. J. Rob. Res., vol. 30, no. 14, pp. 1695–1708, 2011.
[13] C. Yoo, R. Fitch, and S. Sukkarieh, "Online task planning and control for fuel-constrained aerial robots in wind fields," Int. J. Rob. Res., 2015, to appear.
[14] X. C. Ding, S. Smith, C. Belta, and D. Rus, "LTL control in uncertain environments with probabilistic satisfaction guarantees," in Proc. of IFAC World Congress, 2011, pp. 3515–3520.
[15] M. Kloetzer and C. Belta, "Temporal logic planning and control of robotic swarms by hierarchical abstractions," IEEE Trans. Robot., vol. 23, no. 2, pp. 320–330, 2007.
[16] Y. Chen, J. Tumova, A. Ulusoy, and C. Belta, "Temporal logic robot control based on automata learning of environmental dynamics," Int. J. Rob. Res., vol. 32, no. 5, pp. 547–565, 2013.
[17] A. Ulusoy, S. L. Smith, X. C. Ding, C. Belta, and D. Rus, "Optimality and robustness in multi-robot path planning with temporal logic constraints," Int. J. Rob. Res., vol. 32, no. 8, pp. 889–911, 2013.
[18] M. Svorenova, I. Cerna, and C. Belta, "Optimal control of MDPs with temporal logic constraints," in Proc. of IEEE CDC, 2013, pp. 3938–3943.
[19] N. Piterman, A. Pnueli, and Y. Sa'ar, "Synthesis of reactive(1) designs," in Proc. of VMCAI, 2006.
[20] E. M. Wolff, U. Topcu, and R. M. Murray, "Efficient reactive controller synthesis for a fragment of linear temporal logic," in Proc. of IEEE ICRA, 2013, pp. 5018–5025.
[21] C. Yoo, R. Fitch, and S. Sukkarieh, "Probabilistic temporal logic for motion planning with resource threshold constraints," in Proc. of RSS, 2012.
[22] ——, "Provably-correct stochastic motion planning with safety constraints," in Proc. of IEEE ICRA, 2013, pp. 981–986.
[23] M. Lahijanian, S. B. Andersson, and C. Belta, "Temporal logic motion planning and control with probabilistic satisfaction guarantees," IEEE Trans. Robot., vol. 28, no. 2, pp. 396–409, 2012.
[24] A. Medina Ayala, S. Andersson, and C. Belta, "Temporal logic control in dynamic environments with probabilistic satisfaction guarantees," in Proc. of IEEE IROS, 2011, pp. 3108–3113.
[25] K. Chatterjee, M. Chmelik, and M. Tracol, "What is decidable about partially observable Markov decision processes with ω-regular objectives," in CSL, 2013, p. 165.
[26] V. Raman, A. Donzé, M. Maasoumy, R. M. Murray, A. Sangiovanni-Vincentelli, and S. Seshia, "Model predictive control with signal temporal logic specifications," in Proc. of IEEE CDC, 2014, pp. 81–87.
[27] D. Aksaray, K. Leahy, and C. Belta, "Distributed multi-agent persistent surveillance under temporal logic constraints," in Proc. of IFAC Distributed Estimation and Control of Networked Systems (NecSys), 2015.
[28] A. Jones, M. Schwager, and C. Belta, "A receding horizon algorithm for informative path planning with temporal logic constraints," in Proc. of IEEE ICRA, 2013, pp. 5004–5009.
[29] X. C. Ding, C. Belta, and C. G. Cassandras, "Receding horizon surveillance with temporal logic specifications," in Proc. of IEEE CDC, 2010, pp. 256–261.
[30] A. Dokhanchi, B. Hoxha, and G. Fainekos, "On-line monitoring for temporal logic robustness," in Runtime Verification. Springer, 2014, pp. 231–246.