arXiv:1611.02343v1 [cs.RO] 7 Nov 2016

Non-Myopic Target Tracking Strategies for State-Dependent Noise

Zhongshun Zhang and Pratap Tokekar

The authors are with the Department of Electrical & Computer Engineering, Virginia Tech, USA. {zszhang,tokekar}@vt.edu. This material is based upon work supported by the National Science Foundation under Grant #1566247.

Abstract— We study the problem of devising a closed-loop strategy to control the position of a robot that is tracking a possibly moving target. The robot is capable of obtaining noisy measurements of the target's position. The key idea in active target tracking is to choose control laws that drive the robot to measurement locations that will reduce the uncertainty in the target's position. The challenge is that the measurement uncertainty is often a function of the (unknown) relative positions of the target and the robot. Consequently, a closed-loop control policy is desired that can map the current estimate of the target's position to an optimal control law for the robot. Our main contribution is to devise a closed-loop control policy for target tracking that plans for a sequence of control actions, instead of acting greedily. We consider scenarios where the measurement noise is a function of the state of the target. We seek to minimize the maximum uncertainty (trace of the posterior covariance matrix) over all possible measurements. We exploit the structural properties of a Kalman Filter to build a policy tree that is orders of magnitude smaller than naive enumeration while still preserving optimality guarantees. We show how to obtain even more computational savings by relaxing the optimality guarantees. The resulting algorithms are evaluated through simulations.

I. INTRODUCTION

Tracking a moving, possibly adversarial target is a fundamental problem in robotics and has long been a subject of study [1]–[6]. Target tracking finds applications in many areas such as surveillance [7], telepresence [8], assisted living [9], and habitat monitoring [10], [11]. Target tracking refers broadly to two classes of problems: (i) estimating the position of the target using noisy sensor measurements; and (ii) actively controlling the sensor position to improve the performance of the estimator. The second problem is known as active target tracking and is the subject of this paper.

The main challenge in active target tracking is that the value of future measurement locations can be a function of the unknown target state. Take, as an example, a simple instance of estimating the unknown position of a stationary target where the measurement noise is a function of the distance between the robot and the target. If the true location of the target were known, the robot would always choose a control sequence that drives it closer to the target. Since, in practice, the true target location is unknown, we cannot determine such a control sequence exactly. A possible strategy in this case would be to plan with respect to the probability distribution of the target. However, the probability distribution itself will evolve as a function of the actual measurement values. This becomes even more challenging if the target is mobile. We use a Kalman Filter (KF) to estimate the state of a moving target with a possibly state-dependent measurement model, where the measurement noise is a function of the distance between the robot and the target.
When planning non-myopically (i.e., for multiple steps into the future), one can enumerate all possible future measurements in the form of a tree. In particular, a minimax tree can be used to find the optimal (in the min-max sense) control policy for actively tracking a target [12]. The size of the minimax tree grows exponentially with the time horizon. The tree can be pruned using α–β pruning [13]. Our main contribution is to show how the properties of a KF can be exploited to prune a larger number of nodes without losing optimality. In doing so, we extend the pruning techniques first proposed by Vitus et al. [14] for linear systems. Using a minimax tree, we generalize these results to state-dependent, time-varying dynamical systems. Our pruning techniques allow us to trade off the size of the tree (equivalently, the computation time) against the performance guarantees of the algorithm. We demonstrate this effect in simulations.

The rest of the paper is organized as follows. We start with the related work in Section II. The problem formulation is presented in Section III. Our main algorithm is presented in Section IV. We validate the algorithm through simulations described in Section VI. Finally, we conclude with a brief discussion of future work in Section VII.

II. RELATED WORK

The target tracking problem has been studied under various settings. Bar-Shalom et al. [1] present many of the commonly-used estimation techniques in target tracking. The five-part survey by Li and Jilkov [2]–[6] covers commonly-used control and maneuvering techniques for active target tracking. In the rest of this section, we discuss the works most closely related to our formulation.

Vitus et al. [14] presented an algorithm that computes the optimal scheduling of measurements for a linear dynamical system. Their formulation does not directly model a target tracking problem. Instead, the goal is to track a linear dynamical system using a set of sensors such that only one sensor can be activated at any time instant. The posterior covariance in estimating a linear system with a Kalman filter depends on the prior covariance and the sensor variance, but not on the actual measurement values (unlike the case in non-linear systems). Thus, one can build a search tree enumerating all possible sensor selections and choose the one that minimizes the final covariance. The main contribution of Vitus et al. was a pruning technique that reduces the size of the tree while still preserving optimality. Atanasov et al. [15] extended this result to active target tracking with a single robot. A major contribution was to show that robot trajectories that are nearby in space can be pruned away (under certain conditions), leading to further computational savings. This was based on a linear system assumption. In this paper, we build on these works and make progress towards generalizing the solution to state-dependent observation systems.

A major bottleneck in planning for state-dependent observation systems is that the future covariance is no longer independent of the actual measurement values, as was the case in linear state-independent systems. The covariance update equations use the linearized models and as such depend on the state estimate and the measurements. Thus, the search tree has to include all possible measurement values, and not just all possible measurement locations. Furthermore, finding an optimal path is no longer sufficient.
Instead, one must find an optimal policy that prescribes the optimal set of actions for all possible measurements. We show how to use a minimax tree to find such an optimal policy while at the same time leveraging the computational savings that hold in the linear case.

III. PROBLEM FORMULATION

We assume that the position of the robot is known accurately using onboard sensors (e.g., GPS, laser, IMU, cameras). The motion model of the robot is given by:

$$X_r(t+1) = f(X_r(t), u(t)) \quad (1)$$

where $X_r(t) = [x_r(t), y_r(t)]^T \in \mathbb{R}^2$ is the position of the robot and $u(t) \in \mathcal{U}$ is the control input at time $t$. $\mathcal{U}$ is a finite space of control inputs. We assume there are $n$ actions available as control inputs at any time: $\mathcal{U}(t) = \{u_1(t), u_2(t), \cdots, u_n(t)\}$. The robot is mounted with a sensor that is capable of obtaining a measurement of the target.

We assume that the target's motion model is given by:

$$X_o(t+1) = C_t X_o(t) + v(t) \quad (2)$$

where $X_o(t) = [x_o(t), y_o(t)]^T$ is the 2-dimensional position of the target and $v(t)$ is Gaussian noise of known covariance. The task of the robot is to track the target using its noisy measurements. The measurements, $Z(t)$, can be a nonlinear function of the states of the target and the robot:

$$Z(t) = H(X_r(t)) X_o(t) + \omega(X_r(t), X_o(t)) \quad (3)$$

The measurement noise, $\omega(t)$, is Gaussian with a variance that depends on the distance between the robot and the target:

$$\omega(X_r(t), X_o(t)) \sim \mathcal{N}\!\left(0,\; \delta_1^2 + \delta_2^2\, d(X_r(t), X_o(t))\right) \quad (4)$$

where

$$d(X_r(t), X_o(t)) = \begin{cases} C, & \|X_r(t) - X_o(t)\|_2 > B \\ \dfrac{C\,\|X_r(t) - X_o(t)\|_2}{B}, & \|X_r(t) - X_o(t)\|_2 \le B \end{cases} \quad (5)$$

When the true distance between the robot and the target is within $B$, we assume that the measurement noise is proportional to the true distance. When the distance is greater than $B$, the variance is assumed to saturate at a constant maximum value of $C$.

The estimated position and the covariance matrix of the target at time $t$, $\hat{X}_o(t)$ and $\hat{\Sigma}_o(t)$, are given by the Kalman Filter. The uncertainty in the estimate of the target's position is measured by the trace of the covariance matrix. The problem considered in this paper can be formally stated as follows.

Problem: Given an initial robot position $X_r(0)$ and an initial target estimate $[\hat{X}_o(0), \hat{\Sigma}_o(0)]$, find a sequence of control laws for the robot, $\sigma = u_0, u_1, \cdots, u_T \in \mathcal{U}^T$, from time $t = 0$ to $t = T$, that minimizes the trace of the covariance of the target's estimate at time $t = T$. That is,

$$\min_{u(t)} \max_{z(t)} \; \operatorname{tr}(\Sigma_T) \quad (6)$$

such that $\Sigma_{t+1} = \rho_{t+1}(\Sigma_t)$ for $t = 0, 1, \cdots, T-1$, where $\rho_t(\cdot)$ is the Kalman Riccati equation [16]. The Riccati equation, $\rho(\cdot)$, maps the current covariance matrix $\hat{\Sigma}_k$, under a new measurement, to the covariance matrix at the next time step:

$$\rho(\hat{\Sigma}_k) = C_k \hat{\Sigma}_k C_k^T - C_k \hat{\Sigma}_k H_k^T \left(H_k \hat{\Sigma}_k H_k^T + \hat{\Sigma}_w\right)^{-1} H_k \hat{\Sigma}_k C_k^T + \Sigma_v \quad (7)$$

where $H_k$ is the matrix of the measurement equation computed around $X_r(k)$ at time $k$, and $\Sigma_w$ is the covariance of the measurement noise given in Equations (3)–(5). The true position of the target is unknown, making it impossible to determine $\Sigma_w$ exactly. Consequently, we use an estimate of $\Sigma_w$ computed at the estimated target position $\hat{X}_o(k)$. Therefore, the optimal solution for the problem defined above will be a closed-loop policy that maps the estimated target position to the optimal control action for the robot.
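To make the update concrete, the sketch below implements the state-dependent noise model of Equations (4)–(5) and one application of the Riccati map (7). It is a minimal illustration, not the authors' code; the numeric defaults for $\delta_1$, $\delta_2$, $B$, and $C$ are placeholder assumptions, and, as discussed above, the noise covariance is evaluated at the estimated target position.

```python
import numpy as np

def noise_variance(x_r, x_o, delta1=0.1, delta2=0.05, B=10.0, C=5.0):
    """State-dependent measurement noise variance, Eqs. (4)-(5): grows
    linearly with the robot-target distance up to B, then saturates at C.
    The default constants are illustrative placeholders."""
    dist = np.linalg.norm(x_r - x_o)
    d = C * dist / B if dist <= B else C
    return delta1**2 + delta2**2 * d

def riccati_update(Sigma, x_r, x_o_hat, C_t, H, Sigma_v):
    """One step of the Kalman Riccati map, Eq. (7). Since the true target
    position is unknown, Sigma_w is evaluated at the estimate x_o_hat."""
    Sigma_w = noise_variance(x_r, x_o_hat) * np.eye(H.shape[0])
    S = H @ Sigma @ H.T + Sigma_w            # innovation covariance
    gain_term = C_t @ Sigma @ H.T @ np.linalg.inv(S) @ H @ Sigma @ C_t.T
    return C_t @ Sigma @ C_t.T - gain_term + Sigma_v
```

Note that the update depends on the target estimate only through $\Sigma_w$, which is precisely why the planner below must branch on possible measurement outcomes.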
IV. CLOSED-LOOP CONTROL POLICY

References [14], [15] solve a similar problem, but for a linear Gaussian system. The linearity assumption makes the Riccati equation independent of the position of the target. Consequently, they show that an open-loop policy can determine the optimal control sequence for the robot. In our case, the optimal control policy for the state-dependent observation model will be an adaptive (closed-loop) control policy. However, this generalization comes at the expense of discretizing the set of possible target measurements. Specifically, we assume that the measurement at any time step is chosen from one of $k$ tuples of candidate measurements. That is, $z(t) \in \{z_1(t), z_2(t), \cdots, z_k(t)\}$. These candidate measurements can be obtained by, for example, sampling from the continuous distribution of the zero-mean sensor noise around the current estimate of the target. For example, we can choose $k$ candidate measurements from within 3 standard deviations of the mean, which contains 99.7% of the possible measurements.

Fig. 1. Obtaining a finite set of candidate measurements by discretizing the Gaussian distribution of the measurement noise.

A. Optimal Decisions: The Minimax Algorithm

Minimizing the maximum trace can be thought of as playing a game against an adversary: the robot chooses the control actions to minimize the trace, whereas the adversary (i.e., nature) chooses the measurement noise to maximize it. By optimizing the min-max trace, the robot determines the best conservative policy.

We can find this optimal strategy by building a minimax tree. This tree enumerates all possible control laws and all possible measurements that the robot can obtain. A node on the $k$th level of the tree stores the position of the robot, $X_r(k)$, the estimated position of the target, $\hat{X}_o(k)$, and the covariance matrix $\hat{\Sigma}_k$. Each node at an odd level has one branch per control action. Each node at an even level has one branch per candidate measurement. The robot's state and the target's estimate are updated appropriately along the control and measurement branches using the state transition equation (1) and the Kalman filter update equation, respectively. The minimax value is computed at the leaf nodes and is equal to the trace of the covariance matrix at that node. These values are propagated upwards to compute the optimal strategy. Figure 2 shows an example.

B. Alpha Pruning

The number of nodes in a naive minimax tree is exponential in the depth of the tree (i.e., in the planning horizon). As a first step in reducing the size of the tree, we implement α pruning [13]. The main idea in α pruning is that once we have explored a part of the tree, we have an upper bound on the optimal minimax value. When exploring a new node, $n_i$, if we find that the minimax value of the subtree rooted at $n_i$ is greater than this upper bound, that subtree does not need to be explored further. This is because an optimal strategy will never pass through $n_i$, since there exists a better control policy in another part of the tree. Note that $n_i$ must be a control node. Measurement nodes cannot be pruned, since the robot has no control over the actual measurement values.
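The following recursive sketch shows how the alpha bound cuts off control branches during tree construction. It is a hypothetical rendering of the idea, not the paper's implementation: the callables `step` (the motion model (1)), `update` (the Kalman/Riccati update), and `candidates` (the $k$ discretized measurements) are assumed to be supplied by the caller.

```python
import numpy as np

def minimax_value(x_r, x_o_hat, Sigma, depth, controls, candidates,
                  step, update, alpha=np.inf):
    """Worst-case cost tr(Sigma_T) of the best policy from this node.
    `alpha` is the best (smallest) worst-case value an ancestor control
    node can already guarantee; a control branch whose running max meets
    it is abandoned (only control branches are ever cut)."""
    if depth == 0:
        return np.trace(Sigma)
    best = np.inf                                   # min over control actions
    for u in controls:
        x_next = step(x_r, u)
        bound = min(alpha, best)
        worst = -np.inf                             # max over measurements
        for z in candidates(x_next, x_o_hat, Sigma):
            x_o_new, Sigma_new = update(x_next, x_o_hat, Sigma, z)
            worst = max(worst, minimax_value(x_next, x_o_new, Sigma_new,
                                             depth - 1, controls, candidates,
                                             step, update, bound))
            if worst >= bound:                      # alpha cut: this control
                break                               # branch is already dominated
        best = min(best, worst)
    return best
```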
The pruning algorithm is based on the general alpha–beta pruning [13] used in adversarial games. Algorithm 1 summarizes the procedure, and Fig. 2 shows a simple example of a policy tree built using alpha pruning.

Algorithm 1: The minimax algorithm

    S_0 ← {(X_r(0), Σ_0)}; S_t ← ∅ for t = 1, ..., T
    Z ← {z_1, z_2, ..., z_k}
    for t = 1 : T do                              ▷ forward pass: grow the tree
        if the current level is a control (min) level then
            for all (X_r(t−1), Σ(t−1)) ∈ S_{t−1} do
                for all u_i ∈ U do
                    X_r(t) ← f(X_r(t−1), u_i)
                    S_t ← S_t ∪ {(X_r(t), Σ(t−1))}
        else if the current level is a measurement (max) level then
            for all (X_r(t), Σ(t−1)) ∈ S_t do
                for all z_i ∈ Z do
                    Σ(t) ← ρ(X_r(t), z_i, Σ(t−1))
                    S_t ← S_t ∪ {(X_r(t), Σ(t))}
    for t = T : 1 do                              ▷ backward pass: propagate values
        if the level is a measurement (max) level then
            V ← max_i tr(Σ_i(t)) over the children in S_t
        else
            V ← min_i over the propagated values of the children in S_t
    return V

Fig. 2. A minimax tree with alpha pruning. The ▽ and △ symbols denote nodes at which we compute the minimum or maximum value of the children; they represent control and measurement nodes, respectively. The value at a leaf node equals tr(Σ_k). The filled ▽ nodes are pruned by alpha pruning.

C. Algebraic Redundancy Pruning

In addition to alpha pruning, we adapt the ideas presented in [14] for linear systems to obtain even further computational savings. If there are multiple nodes at the same level with the same $X_r(t)$ values but different target estimates, we can prune one of the nodes if it is clearly "dominated" by the others. The notion of domination encodes the property that the dominated node will never be a part of an optimal (minimax) policy. Reference [14] formalized the notion of domination by defining an algebraic redundancy constraint. We adapt this result to our notation as follows:

Theorem 1 (Algebraic Redundancy [14]): Let $\mathcal{H} = \{(X_r^i(t), \Sigma_t^i)\}$ be a set of $N$ nodes at the same level of the tree. If there exist non-negative constants $\alpha_1, \alpha_2, \ldots, \alpha_N$ such that

$$\Sigma_t^p \succeq \sum_{i=1}^{N} \alpha_i \Sigma_t^i \quad \text{and} \quad \sum_{i=1}^{N} \alpha_i = 1,$$

then the node $(X_r^p(t), \Sigma_t^p)$ is regarded as algebraically redundant¹ with respect to $\mathcal{H} \setminus \{(X_r^p(t), \Sigma_t^p)\}$, and $(X_r^p(t), \Sigma_t^p)$ and all of its descendants can be pruned without eliminating the optimal solution from the tree.

¹$M \succeq N$ denotes that $M - N$ is positive semi-definite.

They prove that the trace of any successor of $(X_r^p(t), \Sigma^p(t))$ cannot be lower than that of one of the successors of $\mathcal{H} \setminus \{(X_r^p(t), \Sigma^p(t))\}$. Our main insight is that a similar redundancy constraint can be defined for the state-dependent case with suitable additional conditions, as described below.

Theorem 2 (State-Dependent Algebraic Redundancy): Let $\mathcal{H} = \{(X_r^i(t), \hat{X}_o^i(t), \hat{\Sigma}_t^i)\}$ be a set of $N$ nodes at the same level. If there exists a node $A = (X_r^A(t), \hat{X}_o^A(t), \hat{\Sigma}_t^A)$ at the same level such that:

1) the robot states are identical, i.e., $X_r^A(t) = X_r^i(t)$ for all $i$ in $\mathcal{H}$;
2) the least common ancestor of $A$ with any other node in $\mathcal{H}$ is a control (i.e., min) node;
3) there exist non-negative $\alpha_i$ with $\sum_{i=1}^{N} \alpha_i = 1$ such that for any $K$:

$$H_t \Sigma_t^A H_t^T \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma_t^i H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \right] \quad (8)$$

then there exists a node in $\mathcal{H}$, say $B$, such that $\operatorname{tr}(\Sigma_{t+K}^A) \ge \operatorname{tr}(\Sigma_{t+K}^B)$. That is, the node $A$ can be pruned from the minimax tree without eliminating the optimal policy. The proof is presented in the appendix.
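In practice, condition (8) is a positive semi-definiteness test. The sketch below checks it for a given convex combination of nodes; the weights $\alpha_i$ and all matrices are inputs, and finding feasible weights in general is a separate feasibility problem (e.g., a semidefinite program), which is not shown. This is an illustrative sketch under those assumptions, not the paper's code.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """Positive semi-definiteness test via the smallest eigenvalue."""
    M = (M + M.T) / 2.0                     # symmetrize against round-off
    return np.min(np.linalg.eigvalsh(M)) >= -tol

def satisfies_theorem2(Sigma_A, Sigmas, alphas, H, K, delta1, delta2, C):
    """Check condition (8) of Theorem 2 for given convex weights `alphas`."""
    assert np.isclose(sum(alphas), 1.0) and all(a >= 0 for a in alphas)
    a = delta1**2 + delta2**2 * C           # worst-case noise level
    m = H.shape[0]
    rhs = sum(al * (H @ S @ H.T + K * a * np.eye(m))
              for al, S in zip(alphas, Sigmas))
    return is_psd(H @ Sigma_A @ H.T - rhs)
```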
D. Sub-Optimal Pruning Algorithm

Combining alpha pruning with algebraic redundancy pruning, we can eliminate a significant number of branches in the search tree while still guaranteeing optimality. We can further reduce the number of branches at the expense of losing optimality. This is achieved by relaxing the alpha-pruning and algebraic redundancy constraints. We use two parameters, $\epsilon_1 > 0$ and $\epsilon_2 > 0$, as relaxation parameters for alpha pruning and algebraic redundancy pruning, respectively. In each case, we bound the loss in optimality as a function of these parameters.

Specifically, while building the tree, we prune away a node if it satisfies either of the following two conditions. When checking for alpha pruning, we prune a node if its alpha value is greater than or equal to the best minimax value found so far minus $\epsilon_1$. Similarly, we replace the constraint in Theorem 2 with the following:

$$H_t \left(\Sigma_t^A + \epsilon_2 I\right) H_t^T \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma_t^i H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \right] \quad (9)$$

By varying $\epsilon_1$ and $\epsilon_2$, we can vary the number of nodes in the search tree. Next, we bound the resulting loss in the optimality of the algorithm.

V. ERROR ANALYSIS

In this section, we bound the value returned by the relaxed algorithm with respect to the optimal algorithm.

Theorem 3 ($\epsilon_1$ Alpha Pruning): Let $J_{2k}^* = \operatorname{tr}(\hat{\Sigma}_{2k}^*)$ be the optimal minimax value returned by the full enumeration tree. If $J_{2k}^{\epsilon_1} = \operatorname{tr}(\hat{\Sigma}_{2k}^{\epsilon_1})$ is the value returned by the $\epsilon_1$–alpha pruning algorithm, then $0 \le J_{2k}^{\epsilon_1} - J_{2k}^* \le \epsilon_1$.

The proof follows directly from the fact that if a node on the optimal policy, say $n_i$, is pruned away, then the alpha value at $n_i$ is at most the alpha value of some other node, say $n_j$, that is present in the tree, minus $\epsilon_1$. The alpha value of $n_j$ cannot be less than the value returned by the $\epsilon_1$ algorithm. The full proof is given in the appendix. The bound for $\epsilon_2$–algebraic redundancy pruning is more involved.

Theorem 4 ($\epsilon_2$ State-Dependent Algebraic Redundancy): Let $J_{2k}^* = \operatorname{tr}(\hat{\Sigma}_{2k}^*)$ be the optimal minimax value returned by the full enumeration tree of $2k$ levels. If $J_{2k}^{\epsilon_2} = \operatorname{tr}(\hat{\Sigma}_{2k}^{\epsilon_2})$ is the value returned by the $\epsilon_2$–algebraic redundancy pruning algorithm, then $0 \le J_{2k}^{\epsilon_2} - J_{2k}^* \le B_{\epsilon_2}$, where

$$B_{\epsilon_2} = \operatorname{tr}\left\{ \sum_{j=0}^{k} \left[ \prod_{i=k-1}^{j} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) \prod_{i=j}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2 \right\}$$

where $F_i(\Sigma) = C - C K_i(\Sigma) H_i$, $K_i(\Sigma)$ is the Kalman gain given by $K_i(\Sigma) = \Sigma H_i^T (H_i \Sigma H_i^T + \Sigma_w)^{-1}$, and $\Phi_{2k}(\cdot)$ is the application of the Riccati equation $\rho(\cdot)$ over $k$ measurement steps:

$$\Phi_{2k}(\cdot) = \underbrace{\rho_{2(k-1)}(\rho_{2(k-2)}(\ldots \rho_0(\cdot)))}_{k \text{ steps}}$$

By combining the two results, we get

$$0 \le J_{2k}^{\epsilon_1, \epsilon_2} - J_{2k}^* \le \max\{\epsilon_1, B_{\epsilon_2}\}.$$

VI. SIMULATIONS

In this section, we present results from simulations to evaluate our pruning techniques. We carry out three types of evaluations. First, we investigate the savings of our algorithm by comparing the number of nodes in the pruned minimax tree and in the full enumeration tree. Then, we study the effect of varying the $\epsilon_1$ and $\epsilon_2$ parameters on the number of nodes. Finally, we take the control policy given by our algorithm and execute it by drawing actual measurements from a random distribution. This represents a realistic scenario where the measurements are not necessarily adversarial. We demonstrate how our strategy can be used in such a case, and compare the average-case performance with the worst-case performance.

In all simulations, the robot can choose from four actions,

$$\mathcal{U} = \left\{ [+e, 0]^T, [-e, 0]^T, [0, +e]^T, [0, -e]^T \right\}$$

where $e$ is a constant, and the robot's motion model is $X_r(t+1) = X_r(t) + u(t)$. We build the tree using five candidate measurements at each step: $z(t) \in \{z_1(t), z_2(t), \cdots, z_5(t)\}$. The five values are randomly generated with Gaussian noise.
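As an illustration of the measurement discretization used here and in Section IV, the snippet below draws $k$ candidate measurements around the predicted measurement. The clipping to ±3σ and the parameter values are assumptions made for the sketch, not necessarily the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def candidate_measurements(z_pred, sigma, k=5, n_sigma=3.0):
    """Draw k candidate measurements around the predicted measurement z_pred.
    Samples are clipped to +/- n_sigma standard deviations, the region
    containing 99.7% of the probability mass for n_sigma = 3."""
    noise = rng.normal(0.0, sigma, size=(k, np.size(z_pred)))
    noise = np.clip(noise, -n_sigma * sigma, n_sigma * sigma)
    return np.asarray(z_pred) + noise
```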
Fig. 3. A five-level minimax tree with pruning (189 nodes) and full enumeration (505 nodes).

Fig. 4. Comparison of the total number of nodes generated for the minimax tree, with pruning and with brute-force enumeration. Note the log scale.

A. Comparing the Number of Nodes

Fig. 3 and Fig. 4 show the number of nodes left in the minimax tree after pruning, as compared to a full enumeration tree. We prune a node by comparing it to the nodes already explored. More nodes will be pruned if the nodes encountered early are "closer" to the optimal policy. For instance, if the first set of nodes explored happens to correspond to control laws that drive the robot close to the target, then we expect the nodes encountered later to be pruned closer to the root. To provide a fair assessment, we generate the search trees for various true positions of the target and report the average and standard deviation of the number of nodes in Fig. 4.

Fig. 4 shows that our algorithm prunes away orders of magnitude of nodes from the full enumeration tree. For a tree of depth 13, full enumeration requires $8.08 \times 10^7$ nodes, whereas the same optimal solution can be computed using $4.36 \times 10^5$ nodes with our pruning strategy. Even though our algorithm prunes a large percentage of the nodes, the number of nodes still grows exponentially.

By sacrificing optimality, we can prune even more nodes. We evaluate this by varying $\epsilon_1$ and $\epsilon_2$ individually first, and then jointly. As shown in Fig. 5, $\epsilon_1$–alpha pruning is relatively better at reducing the complexity of the minimax tree. This is intuitive, because the $\epsilon_1$–alpha pruning condition compares nearly every pair of nodes at the same depth, whereas $\epsilon_2$–algebraic redundancy pruning requires more conditions (identical robot states) to be satisfied. Nevertheless, Fig. 5 shows that by sacrificing optimality, the number of nodes can be substantially reduced.

Fig. 5. Effect of the $\epsilon_1$ and $\epsilon_2$ relaxation parameters on the number of nodes in the search tree. The baseline case is the optimal solution with alpha pruning and algebraic redundancy pruning, with both parameters set to zero.

B. Online Execution of the Search Tree

So far, we have discussed the problem of building the minimax tree. Once the tree is built, the robot can execute the optimal policy. At the root node, the robot executes the first control action along the optimal minimax path found. Next, the robot obtains a measurement. This measurement may not correspond to the worst-case measurement. Furthermore, the actual value of the measurement may not even be among the $k$ candidate measurements in the tree, so the updated target estimate may not correspond to any node in the tree. Instead, we compare the actual measurement with the candidate set and find the closest match. The corresponding child node becomes the new root node of the tree, and the optimal policy starting at that node is executed, iteratively. Fig. 6 shows the execution in a simple instance.

Fig. 6. Online measurement update with a minimax tree. The actual measurement $z_k$ may not correspond to a measurement node in the tree. In such a case, we choose the "closest" measurement in the tree.
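A sketch of this re-rooting loop is given below; the node accessors (`best_action`, `children`, `candidates`) are hypothetical names introduced for illustration, not an API from the paper.

```python
import numpy as np

def execute_policy(root, horizon, take_measurement):
    """Execute the minimax policy online by re-rooting the tree each step.
    `take_measurement()` returns the actual sensor reading, which is snapped
    to the closest candidate measurement stored in the tree; `children` maps
    actions (at control nodes) or candidate indices (at measurement nodes)
    to child nodes."""
    node = root
    for _ in range(horizon):
        u = node.best_action()              # first action on the minimax path
        node = node.children[u]             # descend the chosen control branch
        z = take_measurement()              # actual (not worst-case) reading
        dists = [np.linalg.norm(np.asarray(z) - np.asarray(zc))
                 for zc in node.candidates]
        node = node.children[int(np.argmin(dists))]   # re-root at best match
    return node
```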
VII. CONCLUSION

We investigated the problem of devising closed-loop control policies for state-dependent target tracking. Unlike state-independent tracking, the value of a candidate control law in state-dependent target tracking is a function of the history of measurements obtained. Consequently, planning over a horizon requires taking possible measurement values into account. In this paper, we focused on minimizing the worst-case uncertainty. Our solution consists of building a minimax search tree to obtain the control policy. A full enumeration tree has size exponential in the number of measurements, control laws, and the planning horizon. Instead, we exploited the structural properties of the Kalman filter to yield a tree with significantly fewer nodes without sacrificing the optimality guarantees. We also showed how two parameters, $\epsilon_1$ and $\epsilon_2$, can be used to obtain even more computational savings at the expense of optimality.

One disadvantage of the generalization is the need to discretize the set of possible future measurements. Our immediate future work is to bound the suboptimality as a function of the number of discrete samples chosen to represent the continuous set of future measurements. Once such a bound is obtained, the user may choose the right trade-off between the computation time and the solution quality desired. A second avenue of future work focuses on extending these results to multi-robot, multi-target scenarios. Our prior work [17] has shown that a greedy assignment of robot trajectories to targets yields provably approximate solutions for one-step planning. We will extend this to planning over a finite horizon using the results presented in this paper.

REFERENCES

[1] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software. John Wiley & Sons, 2004.
[2] X. R. Li and V. P. Jilkov, "Survey of maneuvering target tracking. Part I: Dynamic models," IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, 2003.
[3] ——, "Survey of maneuvering target tracking. Part II: Motion models of ballistic and space targets," IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 1, pp. 96–119, 2010.
[4] ——, "A survey of maneuvering target tracking. Part III: Measurement models," in Proceedings of SPIE, vol. 4473, 2001, p. 424.
[5] ——, "A survey of maneuvering target tracking. Part IV: Decision-based methods," in Proceedings of SPIE, vol. 4728, 2002, p. 512.
[6] ——, "Survey of maneuvering target tracking. Part V: Multiple-model methods," IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1255–1321, 2005.
[7] B. Rao, H. F. Durrant-Whyte, and J. Sheen, "A fully decentralized multi-sensor system for tracking and surveillance," The International Journal of Robotics Research, vol. 12, no. 1, pp. 20–44, 1993.
[8] N. Karnad and V. Isler, "Modeling human motion patterns for multi-robot planning," in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 3161–3166.
[9] M. Montemerlo, J. Pineau, N. Roy, S. Thrun, and V. Verma, "Experiences with a mobile robotic guide for the elderly," in Proceedings of the AAAI National Conference on Artificial Intelligence, 2002, pp. 587–592.
[10] V. Isler, N. Noori, P. Plonski, A. Renzaglia, P. Tokekar, and J. Vander Hook, "Finding and tracking targets in the wild: Algorithms and field deployments," in International Symposium on Safety, Security, and Rescue Robotics, 2015.
[11] P. Tokekar, E. Branson, J. Vander Hook, and V. Isler, "Tracking aquatic invaders: Autonomous robots for invasive fish," IEEE Robotics and Automation Magazine, 2013.
[12] P. Tokekar, J. Vander Hook, and V. Isler, "Active target localization for bearing based robotic telemetry," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.
[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice-Hall, 1995.
[14] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, "On efficient sensor scheduling for linear dynamical systems," Automatica, vol. 48, no. 10, pp. 2482–2493, October 2012.
[15] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, "Information acquisition with sensing robots: Algorithms and error bounds," in Proceedings of IEEE International Conference on Robotics and Automation, 2014, pp. 6447–6454.
[16] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, NJ, 1986.
[17] P. Tokekar, V. Isler, and A. Franchi, "Multi-target visual tracking with aerial robots," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.

APPENDIX

Theorem 5 (Monotonicity of the State-Dependent Riccati Equation): Let two nodes of a linear stochastic system satisfy

$$H \Sigma_t^A H^T + S^A \succeq H \Sigma_t^B H^T + S^B \quad \text{and} \quad \Sigma_t^A \succeq \Sigma_t^B,$$

where $0 \preceq S^A \preceq aI$ and $0 \preceq S^B \preceq aI$. Then, after applying one step of the Riccati map, we have $\rho(\Sigma_t^A) \succeq \rho(\Sigma_t^B)$, where

$$\rho(\Sigma_k) = C_k \Sigma_k C_k^T - C_k \Sigma_k H_k^T \left(H_k \Sigma_k H_k^T + S\right)^{-1} H_k \Sigma_k C_k^T + \Sigma_v \quad (10)$$

and $S$ denotes the corresponding measurement noise covariance ($S^A$ or $S^B$).

Proof: Writing $L(\cdot)$ for the Riccati map with the corresponding noise term,

$$L(\Sigma_k^A) - L(\Sigma_k^B) = C_k \Sigma_k^A C_k^T - C_k \Sigma_k^A H^T \left(H \Sigma_k^A H^T + S^A\right)^{-1} H \Sigma_k^A C_k^T - C_k \Sigma_k^B C_k^T + C_k \Sigma_k^B H^T \left(H \Sigma_k^B H^T + S^B\right)^{-1} H \Sigma_k^B C_k^T \quad (11)$$

We define:

$$K(\Sigma) := -F\Sigma H^T (H\Sigma H^T + S)^{-1}, \qquad F(\Sigma) := F - (F\Sigma H^T)(H\Sigma H^T + S)^{-1} H$$

Note that $F(\Sigma) = F + K(\Sigma)H$ and $K(\Sigma)(F\Sigma H^T)^T = -K(\Sigma)(H\Sigma H^T + S)K(\Sigma)^T$. Then,

$$\begin{aligned}
& L(\Sigma_t^A) - L(\Sigma_t^B) - F(\Sigma_t^A)(\Sigma_t^A - \Sigma_t^B)F^T(\Sigma_t^A) \\
&= K(\Sigma_t^A)F\Sigma_t^A H^T - K(\Sigma_t^B)F\Sigma_t^B H^T - K(\Sigma_t^A)H(\Sigma_t^A - \Sigma_t^B)F^T - F(\Sigma_t^A - \Sigma_t^B)H^T K^T(\Sigma_t^A) - K(\Sigma_t^A)H(\Sigma_t^A - \Sigma_t^B)H^T K^T(\Sigma_t^A) \\
&= K(\Sigma_t^A)F\Sigma_t^A H^T - K(\Sigma_t^B)F\Sigma_t^B H^T - K(\Sigma_t^A)\left[H\Sigma_t^A F^T - H\Sigma_t^B F^T\right] - \left[F\Sigma_t^A H^T - F\Sigma_t^B H^T\right]K^T(\Sigma_t^A) - K(\Sigma_t^A)H(\Sigma_t^A - \Sigma_t^B)H^T K^T(\Sigma_t^A) \\
&= -K(\Sigma_t^B)(F\Sigma_t^B H^T)^T + K(\Sigma_t^A)(F\Sigma_t^B H^T) + (F\Sigma_t^B H^T)^T K^T(\Sigma_t^A) + K(\Sigma_t^A)(H\Sigma_t^B H^T + S^A)K^T(\Sigma_t^A) \\
&= K(\Sigma_t^B)(H\Sigma_t^B H^T + S^B)K^T(\Sigma_t^B) + K(\Sigma_t^A)(H\Sigma_t^B H^T + S^A)K^T(\Sigma_t^A) - K(\Sigma_t^A)(H\Sigma_t^B H^T + S^B)K^T(\Sigma_t^B) - K(\Sigma_t^B)(H\Sigma_t^B H^T + S^B)K^T(\Sigma_t^A) \\
&= (K(\Sigma_t^B) - K(\Sigma_t^A))(H\Sigma_t^B H^T + S^B)(K(\Sigma_t^B) - K(\Sigma_t^A))^T + K(\Sigma_t^A)(S^A - S^B)K^T(\Sigma_t^A)
\end{aligned} \quad (12)$$

Define

$$M = \max\left\{ (F + K(\Sigma_t^A)H)(F + K(\Sigma_t^A)H)^T,\; K(\Sigma_t^A)K^T(\Sigma_t^A) \right\} \quad (13)$$

So we have,

$$\begin{aligned}
L(\Sigma_t^A) - L(\Sigma_t^B) &= F(\Sigma_t^A)(\Sigma_t^A - \Sigma_t^B)F^T(\Sigma_t^A) + K(\Sigma_t^A)(S^A - S^B)K^T(\Sigma_t^A) + (K(\Sigma_t^B) - K(\Sigma_t^A))(H\Sigma_t^B H^T + S^B)(K(\Sigma_t^B) - K(\Sigma_t^A))^T \\
&= (F + K(\Sigma_t^A)H)(\Sigma_t^A - \Sigma_t^B)(F + K(\Sigma_t^A)H)^T + K(\Sigma_t^A)(S^A - S^B)K^T(\Sigma_t^A) + (K(\Sigma_t^B) - K(\Sigma_t^A))(H\Sigma_t^B H^T + S^B)(K(\Sigma_t^B) - K(\Sigma_t^A))^T \\
&\succeq K(\Sigma_t^A)H(\Sigma_t^A - \Sigma_t^B)H^T K^T(\Sigma_t^A) + K(\Sigma_t^A)(S^A - S^B)K^T(\Sigma_t^A) + (K(\Sigma_t^B) - K(\Sigma_t^A))(H\Sigma_t^B H^T + S^B)(K(\Sigma_t^B) - K(\Sigma_t^A))^T \quad \text{(from [13])} \\
&= K(\Sigma_t^A)\left[(H\Sigma_t^A H^T + S^A) - (H\Sigma_t^B H^T + S^B)\right]K^T(\Sigma_t^A) + (K(\Sigma_t^B) - K(\Sigma_t^A))(H\Sigma_t^B H^T + S^B)(K(\Sigma_t^B) - K(\Sigma_t^A))^T \\
&\succeq 0,
\end{aligned} \quad (14)$$

since $H\Sigma_t^A H^T + S^A \succeq H\Sigma_t^B H^T + S^B \succeq 0$. ∎

PROOF OF THEOREM 2

Proof: We first prove the special case where $\mathcal{H}$ consists of only one node, $B$. That is, we have

$$H_t \Sigma_t^A H_t^T \succeq H_t \Sigma_t^B H_t^T + K \cdot aI, \quad \text{where } a = \delta_1^2 + \delta_2^2 C.$$

We wish to show that $\operatorname{tr}(\Sigma_{t+K}^A) \ge \operatorname{tr}(\Sigma_{t+K}^B)$.

1) Base case ($K = 1$): When $K = 1$, $H\Sigma_t^A H^T \succeq H\Sigma_t^B H^T + aI$. From the Kalman Riccati map,

$$\begin{aligned}
\Sigma_{t+1}^A &= \rho(\Sigma_t^A, x_r(t), \hat{x}_o^A(t)) = C\Sigma_t^A C^T - C\Sigma_t^A H_t^T \left(H_t \Sigma_t^A H_t^T + \Sigma_w(x_r(t), \hat{x}_o^A(t))\right)^{-1} H_t \Sigma_t^A C^T + \Sigma_v \\
&\succeq C\Sigma_t^B C^T - C\Sigma_t^B H_t^T \left(H_t \Sigma_t^B H_t^T + \Sigma_w(x_r(t), \hat{x}_o^B(t))\right)^{-1} H_t \Sigma_t^B C^T + \Sigma_v = \Sigma_{t+1}^B
\end{aligned} \quad (15)$$

by applying Theorem 5.

2) Inductive step: We show that if the claim holds for $K = M$, then it also holds for $K = M + 1$. Assume the claim holds for $K = M$. Let $\Sigma^{B'}(t) = \Sigma^B(t) + a \cdot (H^T H)^{-1}$. Based on the condition for $K = M$, we have $\Sigma_{t+M}^A \succeq \Sigma_{t+M}^{B'}$, that is,

$$\Sigma_{t+M}^A \succeq \Sigma_{t+M}^B + a \cdot (H^T H)^{-1}.$$

Similar to step (1), $\Sigma_{t+M+1}^A \succeq \Sigma_{t+M+1}^B$, thereby showing that the claim indeed holds for $K = M + 1$.
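Since the chain of inequalities above is easy to misread, here is a small randomized sanity check of Theorem 5 (illustrative only, not part of the paper). It samples the stronger, easy-to-construct premises $\Sigma^A \succeq \Sigma^B$ and $S^A \succeq S^B$, which together imply the theorem's conditions.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T

def rho(Sigma, C, H, S, Sigma_v):
    """Riccati map of Eq. (10) with measurement noise covariance S."""
    G = H @ Sigma @ H.T + S
    return (C @ Sigma @ C.T
            - C @ Sigma @ H.T @ np.linalg.inv(G) @ H @ Sigma @ C.T + Sigma_v)

n = 2
for _ in range(1000):
    C, H = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    Sigma_v = random_psd(n)
    Sigma_B = random_psd(n)
    Sigma_A = Sigma_B + random_psd(n)       # Sigma_A >= Sigma_B
    S_B = random_psd(n) + 1e-3 * np.eye(n)  # keep the inverse well-defined
    S_A = S_B + random_psd(n)               # implies the premise of Theorem 5
    diff = rho(Sigma_A, C, H, S_A, Sigma_v) - rho(Sigma_B, C, H, S_B, Sigma_v)
    diff = (diff + diff.T) / 2              # symmetrize against round-off
    assert np.min(np.linalg.eigvalsh(diff)) > -1e-8
```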
Since both the base case and the inductive step hold, by mathematical induction the statement holds for all natural numbers $K$.

Next, we extend the proof from comparing two nodes to the case of an arbitrary set of $N$ nodes. Suppose we have

$$H_t \Sigma_t^A H_t^T \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma_t^i H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \right] \quad (16)$$

Without loss of generality, assume $\Sigma^{B'}$ is the minimum covariance matrix (in the positive semi-definite sense) among $\Sigma_t^1, \Sigma_t^2, \ldots, \Sigma_t^N$. Then,

$$H_t \Sigma_t^A H_t^T \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma_t^i H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \right] \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma_t^{B'} H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \right] = H_t \Sigma_t^{B'} H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) I \quad (17)$$

Using the result from the induction above, after $K$ minimax tree steps there always exists a node $B'$ such that

$$\operatorname{tr}(\Sigma_{t+K}^A) \ge \operatorname{tr}(\Sigma_{t+K}^{B'}) \quad (18)$$

Therefore, node $A$ can be pruned without reducing the optimality of the minimax tree. ∎

PROOF OF THEOREM 4

Proof: Suppose that at some level $i$ we prune a node on the optimal policy. We have

$$\operatorname{tr}\left(H \Sigma_{2i}^{\epsilon_2} H^T\right) \le \operatorname{tr}\left(H \left(\Sigma_{2i}^* + \epsilon_2 I\right) H^T\right)$$

From [14], we know that for all $\Sigma, Q \in \mathbb{R}^{n \times n}$ and $\epsilon \ge 0$:

$$\rho_{2i}(\Sigma + \epsilon Q) \preceq \rho_{2i}(\Sigma) + F_i(\Sigma) Q F_i^T(\Sigma)\, \epsilon.$$

Applying this to the composition $\Phi_{2k}$, we get

$$\begin{aligned}
\Phi_{2k}(\Sigma + \epsilon_2 Q) &= \rho_{2(k-1)}\left(\Phi_{2(k-1)}(\Sigma + \epsilon_2 Q)\right) = \rho_{2(k-1)}\left(\rho_{2(k-2)}(\ldots \rho_0(\Sigma + \epsilon_2 Q))\right) \\
&\preceq \Phi_{2k}(\Sigma) + \left[ \prod_{i=k-1}^{0} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) Q \prod_{i=0}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2 + o(\epsilon_2) \\
&\preceq \Phi_{2k}(\Sigma) + \left[ \prod_{i=k-1}^{0} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) Q \prod_{i=0}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2
\end{aligned}$$

Let $\{\hat{\Sigma}_i^*\}_{i=1}^{k}$ be the series of covariance matrices along the optimal minimax trajectory, and let $\{\hat{\Sigma}_i^{\epsilon_2}\}_{i=1}^{k}$ be the sequence of covariance matrices along the optimal trajectory returned by the $\epsilon_2$–algebraic redundancy pruning algorithm. We get

$$\hat{\Sigma}_i^{\epsilon_2} \preceq \hat{\Sigma}_i^* + \epsilon_2 I, \quad \forall i = 1, 2, \ldots, k.$$

By combining the two results, we obtain the desired bound:

$$0 \le J_{2k}^{\epsilon_2} - J_{2k}^* = \operatorname{tr}(\hat{\Sigma}_k^{\epsilon_2}) - \operatorname{tr}(\hat{\Sigma}_k^*) \le \operatorname{tr}\left\{ \sum_{j=0}^{k} \left[ \prod_{i=k-1}^{j} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) \prod_{i=j}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2 \right\} = B_{\epsilon_2} \qquad \blacksquare$$
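As with Theorem 5, the first-order perturbation inequality from [14] used above can be sanity-checked numerically. The sketch below is illustrative only (random instances, an assumed dimension and step size), not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_psd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T

# Check: rho(Sigma + eps*Q) <= rho(Sigma) + eps * F(Sigma) Q F(Sigma)^T, with
# F(Sigma) = C - C K(Sigma) H and K(Sigma) = Sigma H^T (H Sigma H^T + S)^{-1}.
n, eps = 2, 0.1
for _ in range(1000):
    C, H = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    S, Sigma_v = random_psd(n) + np.eye(n), random_psd(n)
    Sigma, Q = random_psd(n), random_psd(n)

    def rho(Sig):
        G = H @ Sig @ H.T + S
        return (C @ Sig @ C.T
                - C @ Sig @ H.T @ np.linalg.inv(G) @ H @ Sig @ C.T + Sigma_v)

    K = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + S)
    F = C - C @ K @ H
    diff = rho(Sigma) + eps * F @ Q @ F.T - rho(Sigma + eps * Q)
    diff = (diff + diff.T) / 2              # symmetrize against round-off
    assert np.min(np.linalg.eigvalsh(diff)) > -1e-8
```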
Our main contribution is to devise a closed-loop control policy for target tracking that plans for a sequence of control actions, instead of acting greedily. We consider scenarios where the noise in measurement is a function of the state of the target. We seek to minimize the maximum uncertainty (trace of the posterior covariance matrix) over all possible measurements. We exploit the structural properties of a Kalman Filter to build a policy tree that is orders of magnitude smaller than naive enumeration while still preserving optimality guarantees. We show how to obtain even more computational savings by relaxing the optimality guarantees. The resulting algorithms are evaluated through simulations. I. INTRODUCTION Tracking a moving, possibly adversarial target is a fun- damental problem in robotics and has long been a subject of study [1]–[6]. Target tracking finds applications in many areas such as surveillance [7], telepresence [8], assisted living [9], and habitat monitoring [10], [11]. Target tracking refers to broadly two classes of problems: (i) estimating the position of the target using noisy sensor measurements; and (ii) actively controlling the sensor position to improve the performance of the estimator. The second problem is distinguished as active target tracking and is the subject of study of this paper. The main challenge in active target tracking is that the value of future measurement locations can be a function of the unknown target state. Take as example, a simple instance of estimating the unknown position of a stationary target where the measurement noise is a function of the distance between the robot and the target. If the true location of the target were known, the robot would always choose a control sequence that drives it closer to the target. Since, in practice, the true target location is unknown, we cannot determine such a control sequence exactly. A possible strategy in this case would be to plan with respect to the probability dis- tribution of the target. However, the probability distribution itself will evolve as a function of the actual measurement The authors are with the Department of Electrical & Computer Engineer- ing, Virginia Tech, USA. {zszhang,tokekar}@vt.edu. This material is based upon work supported by the National Science Foundation under Grant #1566247. values. This becomes even more challenging if the target is mobile. We use a Kalman Filter (KF) to estimate the state of a moving target with a possibly state-dependent measurement model where the measurement noise is a function of the distance between the robot and the target. When planning non-myopically (i.e., for multiple steps in the future), one can enumerate all possible future mea- surements in the form of a tree. In particular, a minimax tree can be used to find the optimal (in the min-max sense) control policy for actively tracking a target [12]. The size of the min max tree grows exponentially with the time horizon. The tree can be pruned using α −β pruning [13]. Our main contribution is to show how the properties of a KF can be exploited to prune a larger number of nodes without losing optimality. In doing so, we extend the pruning techniques first proposed by Vitus et al. [14] for linear systems. Using a min max tree, we generalize these results to a state-dependent, time-variant dynamical systems. Our pruning techniques allow us to trade-off the size of the tree (equivalently, computation time) with the performance guarantees of the algorithm. 
We demonstrate this effect in simulations. The rest of the paper is organized as follows. We start with the related work in Section II. The problem formulation is presented in Section III. Our main algorithm is presented in Section IV. We validate the algorithm through simulations described in Section VI. Finally, we conclude with a brief discussion of future work in Section VII. II. RELATED WORK The target tracking problem has been studied under var- ious settings. Bar-Shalom et al. [1] present many of the commonly-used estimation techniques in target tracking. The five-part survey by Li and Jilkov [2]–[6] covers commonly- used control and maneuvering techniques for active target tracking. In the rest of the section, we discuss works most closely related to our formulation. Vitus et al. [14] presented an algorithm that computes the optimal scheduling of measurements for a linear dynamical system. Their formulation does not directly model a target tracking problem. Instead, the goal is to track a linear dynam- ical system using a set of sensors such that one sensor can be activated at any time instance. The posterior covariance in estimating a linear system in a Kalman filter depends on the prior covariance and sensor variance but not on the actual measurement values (unlike the case in non-linear systems). Thus, one can build a search tree enumerating all possible sensor selections and choosing the one that minimizes the final covariance. The main contribution of Vitus et al. was to present a pruning technique to reduce the size of the tree while still preserving optimality. Atanasov et al. [15] extended this result to active target tracking with a single robot. A major contribution was to show that robot trajectories that are nearby in space can be pruned away (under certain conditions), leading to further computational savings. This was based on a linear system assumption. In this paper, we build on these works and make progress towards generalizing the solution for state- dependent observation systems. A major bottleneck for planning for state-dependent ob- servation systems is that the future covariance is no longer independent of the actual measurement values, as was the case in linear state-independent systems. The covariance up- date equations use the linear models and as such will depend on the state estimate and the measurements. Thus, the search tree will have to include all possible measurement values and not just all possible measurement locations. Furthermore, finding an optimal path is no longer sufficient. Instead one must find an optimal policy that prescribes the optimal set of actions for all possible measurements. We show how to use a min max tree to find such an optimal policy while at the same time leveraging the computational savings that hold for the linear case. III. PROBLEM FORMULATION We assume that the position of the robot is known accurately using onboard sensors (e.g., GPS, laser, IMU, cameras). The motion model of the robot is given by: Xr(t + 1) = f(Xr(t), u(t)) (1) where Xr = [xr(t), yr(t)]T ∈R2 is the position of robot and u(t) ∈U is the control input at time t. U is a finite space of control inputs. We assume there are n actions available as control inputs at any time: U(t) = {u1(t), u2(t), · · · , un(t)}. The robot is mounted with a sensor that is capable of obtaining a measurement of the target. 
We assume that the target’s motion model is given by: Xo(t + 1) = CtXo(t) + v(t) (2) where, Xo(t) = [xo(t), yo(t)]T is the 2-dimensional position of the target and v(t) is Gaussian noise of known covariance. The task of the robot is to track the target using its noisy measurements. The measurements, Z(t), can be a nonlinear function of the states of the target and robot: Z(t) = H(Xr(t))Xo(t) + ω(Xr(t), Xo(t)) (3) The measurement noise, ω(t), is a Gaussian whose variance depends on the distance between the robot and the target: ω(Xr(t), Xo(t)) ∼N 0, δ2 1 + δ2 2d(Xr(t), Xo(t))  (4) where, d(Xr(t), Xo(t)) =    C, ||Xr(t) −Xo(t)||2 > B C||Xr(t) −Xo(t)||2 B , ||Xr(t) −Xo(t)||2 ≤B (5) When the true distance between the robot and target is within B, we assume that measurement noise is proportional to the true distance. When the distance is greater than B), the variance is assumed to be a constant maximum value of C. The estimated position and the covariance matrix of the target at time t, ˆXo(t) and ˆΣo(t), is given by the Kalman Filter. The uncertainty in the estimate of the target’s position is measured by the trace of the covariance matrix. The problem considered in this paper can be formally stated as follows. Problem: Given an initial robot position Xr(0) and an initial target estimate [ ˆXo(0), ˆΣo(0)], find a sequence of control laws for the robot, σ = u0, u1, · · · , uT ∈UT from time t = 0 to t = T to minimize trace of the covariance in the target’s estimate at time t = T . That is, min u(t) max z(t) tr(ΣT ) (6) such that, Σt+1 = ρt+1(Σt), t = 0, 1, · · · , T −1 where ρt(·) is the Kalman Riccati equation [16]. The Riccati equation, ρ(·), maps the current covariance matrix ˆΣk, under a new measurement to the covariance matrix at the next time step, ρi(ˆΣk) =Ck ˆΣkCT k −Ck ˆΣkHT k (Hk ˆΣkHT k + ˆΣw)−1Hk ˆΣkCT k + Σv (7) where Hk is the matrix of the measurement equation com- puted around Xr(k) at time k. Σw is the covariance of the measurement noise given in Equations (3)–(5). The true position of the target is unknown making it impossible to determine Σw exactly. Consequently, we use an estimate of Σw using the estimated target’s position ˆXo(k). Therefore, the optimal solution for the problem defined will be a closed-loop policy that should map the estimated target’s position to the optimal control action for the robot. IV. CLOSED-LOOP CONTROL POLICY References [14], [15] solve a similar problem but for a linear Gaussian system. The linearity assumption makes the Riccati equation independent of the position of the target. Consequently, they show an open loop policy can determine the optimal control sequence for the robot. In our case, the optimal control policy for this state-dependent observation model case will be an adaptive (closed-loop) control policy. However, this generalization comes at the expense of discretization of the set of possible target mea- surements. Specifically, we assume that the measurement at any time step is chosen from one of k tuples of candidate measurements. That is, z(t) ∈{z1(t), z2(t), · · · , zk(t)}. These candidate measurements can be obtained by, for example, sampling from the continuous distribution of zero mean sensor noise around the current estimate of the target. For example, we can choose k candidate measurements from the data within 3 standard deviation of the mean value, which contain 99.7% of the possible measurements. -3 -2 -1 0 1 2 3 0 0.2 0.4 0.6 0.8 1 Zm-2 . . . Zm-1 Zm Z2 Z1 Z3 . . . Fig. 1. 
Obtaining a finite set of candidate measurements by discretizing the Gaussian distribution of the measurement noise. A. Optimal decisions: The minimax algorithm Minimizing the maximum trace can be thought of as playing a game against an adversary: The robot chooses the control actions to minimize the trace whereas the adversary (i.e., nature) chooses measurement noise to maximize the trace. By optimizing the min max trace, the robot determines the best conservative policy. We can find this optimal strategy by building a minimax tree. This tree enumerates all possible control laws and all possible measurements that the robot can obtain. A node on the kth level of the tree stores the position of the robot, Xr(k), the estimated position of the target, ˆXo(k), and the covariance matrix ˆΣk. Each node at an odd level has one branch per control action. Each node at an even level has one branch per candidate measurement. The robot’s state and the target’s estimate are updated appropriately along the control and measurement branches using the state transition equation (1) and the Kalman filter update equation, respectively. The minimax value is computed at the leaf nodes and is equal to the trace of the covariance matrix at that node. These values are propagated upwards to compute the optimal strategy. Figure 2 shows an example. B. Alpha Pruning The number of nodes in a naive minmax tree is exponential in the depth of the tree (i.e., in the planning horizon). As a first step in reducing the size of the tree, we implement α pruning [13]. The main idea in α pruning is that if we have explored a part of the tree, we have an upper bound on the optimal minimax value. When exploring a new node, ni, if we find that the minimax value of the subtree rooted at ni is greater than the upper bound found, that subtree does not need to be explored further. This is because an optimal strategy will never prefer a strategy that passes through ni since there exists a better control policy in another part of the tree. Note that ni must be a control node. Measurement nodes cannot be pruned since the robot has no control over the actual measurement values. The pruning algorithm is based on the general alpha-beta pruning [13] used in adversarial Algorithm 1: The minimax algorithms 1 S0 ←{(Xr(o), Σ0)} , St ←φ for t = 1, ...., T 2 Z = {z1, z2, ..., zk} 3 for t = 1 : T do 4 if NODE STATE (min) 5 for all (Xr(t −1), Σ(t −1)) ∈St−1 do 6 for all ui ∈U 7 Xr(t) ←f(Xr(t −1), ui) 8 St ←St S {(Xr(t), Σ(t −1))} 9 else if NODE STATE (max) 10 for all (Xr(t), Σ(t −1)) ∈St do 11 for all zi ∈Z 12 Σ(t) ←ρ(Xr(t), zi, Σ(t −1)) 13 St ←St S {(Xr(t), Σ(t))} 14 for t = 1 : T do 15 if TERMINAL-TEST (Max) 16 for each St do 17 V ←(max tr(Σi(t)), St(i)) 18 t ←t −1 19 if TERMINAL-TEST (Min) 20 for each St do 21 V ←(minimax tr(Σi(t)), St(i)) 22 t ←t −1 23 return V games. Fig. 2 shows a simple example of policy tree built using alpha pruning. C. Algebraic Redundancy Pruning In addition to alpha pruning, we extend the ideas presented by [14] for linear systems and extend them to get even further computational savings. If there are multiple nodes at the same level with the same Xr(t) values but different target estimates, we can prune one of the nodes if it is clearly “dominated” by the others. The notion of domination encodes the property that that particular node will never be a part of an optimal (minmax) policy. Reference [14] formalized the notion of domination by defining an algebraic redundancy constraint. 
We adapt this result for our notation as follows: Theorem 1 (Algebraic Redundancy [14]): Let H = {(Xj r(t), Σj t)} be a set of n nodes at the same level of the tree. If there exist non-negative constants α1, α2, . . . , αk such that, Σp t ⪰ k X i=1 αjΣj t and k X i=1 αi = 1 then the node (Xp r (t), Σp t ) is regarded as algebraically redun- dant1 with respect to H\{(Xp r (t), Σp t )} and (Xp r (t), Σp t ) and all of its descendants can be pruned without eliminating the optimal solution from the tree. They prove that the trace of any successor of (Xp r (t), Σp(t)) cannot be lower than one of the successors of 1M ⪰N represents that M −N is positive semi-definite. H \ {(Xp r (t), Σp(t))}. Our main insight is that a similar re- dundancy constrained can be defined for the state-dependent case with suitable additional constraints as described below. Theorem 2 (State-dependent Algebraic Redundancy): Let H = {(Xi r(t), ˆ Xi o(t), ˆΣi t)} be a set of N nodes at the same level. If there exists a node A = (XA r (t), ˆXA o (t), ˆΣA t ) at the same level such that: 1) the robot states are identical, i.e., XA r (t) = Xi r(t) for all i in H; 2) the least common ancestor of A with any other node in H is a control (i.e., min) node; 3) there exist non-negative αi such that for any K: HtΣA t HT t ⪰ N X i=1 αi  HtΣi tHT t + K δ2 1 + δ2 2C  (8) where, PN i=1 αi = 1, then there exists a node in H, say B, such that: tr(ΣA t+K) ≥tr(ΣB t+K). That is, the node A can be pruned from the minimax tree without eliminating the optimal policy. The proof is presented in the appendix. D. Sub-optimal Pruning algorithm Combining alpha pruning with linear state-independent algebraic redundancy pruning we can reduce a significant number of branches in the search tree while still guaranteeing optimality. We can further reduce the number of branches at the expense of losing optimality. This can be achieved by relaxing alpha-pruning and algebraic redundancy constraints. We use two parameters ǫ1 > 0 and ǫ2 > 0 as relax- ation parameters for alpha pruning and algebraic redundancy pruning, respectively. In each case, we bound the loss in optimality as a function of the parameters. Specifically, while building the tree, we prune away a node if it satisfies either of the following two conditions. When checking for alpha pruning, we prune a node if its alpha value is greater than or equal to the best minmax value found so far minus ǫ1. Similarly, we replace the constraint in Theorem 2 with the following: Ht(ΣA t + ǫ2)HT t ⪰ N X i=1 αi  HtΣi tHT t + K δ2 1 + δ2 2C  (9) By varying ǫ1 and ǫ2, we can vary the number of nodes in the search tree. Next we bound the resulting loss in the optimality of the algorithm. V. ERROR ANALYSIS In this section, we bound the value returned by the relaxed algorithm with respect to the optimal algorithm. Theorem 3 (ǫ1 Alpha Pruning): Let J∗ 2k = tr(ˆΣ∗ 2k) be the optimal minimax value returned by the full enumeration tree. If Jǫ1 2k = tr(ˆΣǫ1 2k) is the value returned by the ǫ1–alpha pruning algorithm, then 0 ≤Jǫ1 2k −J∗ 2k ≤ǫ1. The proof follows directly from the fact that if a node on the optimal policy, say ni is pruned away, then the alpha value at ni is at most the alpha value of some other node, say nj, that is present in the tree minus ǫ1. The alpha value of nj cannot be less than the value returned by the ǫ1 algorithm. The full proof is given in the appendix. The bound for ǫ2- algebraic redundancy pruning is more complicated. 
Theorem 4 (ǫ2 State-dependent Algebraic Redundancy): Let J∗ 2k = tr(ˆΣ∗ 2k) be the optimal minimax value returned by the full enumeration tree of 2k levels. If Jǫ2 2k = tr(ˆΣǫ2 2k) is the value returned by the ǫ2–algebraic redundancy pruning algorithm, then 0 ≤Jǫ2 2k −J∗ 2k ≤Bǫ2 where, Bǫ2 = tr    k X j=0   jY i=k−1 (Fi(Σ)Φ2i(Σ)) k−1 Y i=j (Fi(Σ)Φ2i(Σ))T  ǫ2    where, Fi(Σ) = C −CKi(Σ)Hi and Ki(Σ) is the Kalman gain given by Ki(Σ) = ΣHT i (HiΣHT i + Σw)−1, and Φ2k(·) is the application of the Riccati equation ρ(·), over k measurement steps: Φ2k(·) = ρ2(k−1)(ρ2(k−2)(. . . ρ0(·))) | {z } k steps ρ(·) . By combining the two results, we get 0 ≤Jǫ1,ǫ2 2k −J∗ 2k ≤max {ǫ1, Bǫ2} . VI. SIMULATIONS In this section, we present results from simulations to evaluate our pruning techniques. We carry out three types of evaluations. First, we investigate the savings of our algorithm by comparing the number of nodes in the pruned minimax tree and the full enumeration tree. Then, we study the effect of varying the ǫ1 and ǫ2 parameters on the number of nodes. Finally, we use the control policy given by our algorithm and execute it by drawing actual measurements from a random distribution. This represents a realistic scenario where the measurements are not necessarily adversarial. We demon- strate how our strategy can be used in such a case, and compare the average case performance with the worst-case performance. In all simulations, the robot can choose from four actions: U = n [+e, 0]T , [−e, 0]T , [0, +e]T , [0, −e]T o where e is a constant. Xr(t + 1) = Xr(t) + u(t) We build the tree using five candidate measurements at each step: z(t) = {z1(t), z2(t), · · · , z5(t)}. The five values are randomly generated with Gaussian noise. Pruned Optimal root (starting point) Control level Measurement level Control level Measurement level Fig. 3. A five level minimax tree with pruning (189 nodes) and full enumeration (505 nodes). 4 6 8 10 12 Depth of the tree 102 104 106 108 log(Total nodes number) With Pruning Brute Force Fig. 4. Comparison of the number of total nodes generated for minimax tree. Note the log scale. A. Comparing the Number of Nodes Fig. 3 and Fig. 4 shows the number of nodes left in the minimax tree after pruning as compared to a full enumer- ation tree. We prune a node by comparing it to the nodes already explored. More nodes will be pruned if initial nodes encountered are “closer” to the optimal policy. For instance, if the first set of nodes explored happen to be control laws that drive the robot close to the target, then we expect the nodes encountered later will be pruned closer to the root. To provide a fair assessment, we generate the search trees for various true positions of the target and report the average and standard deviation of the number of nodes in Fig. 4. Fig. 4 shows that our algorithm prunes orders of magni- tudes of nodes from the full enumeration tree. For a tree with depth 13, it takes 8.08×107 to enumerate all nodes but the same optimal solution can be computed using 4.36×105 nodes with our pruning strategy. Even though our algorithm prunes a large percentage of the nodes, the number of nodes still grows exponentially. By sacrificing optimality, we can prune even more nodes. We evaluate this by varying ǫ1 and ǫ2 individually first, and then jointly. As shown in Fig. 5, ǫ1–alpha pruning is relatively better at reducing the complexity of the minmax tree. This is intuitive because ǫ1-alpha pruning condition compares nearly every pair of nodes at the same depth. 
ǫ2-algebraic redun- dancy pruning, on the other hand, requires more conditions (same robot state) to be satisfied. Nevertheless, Fig. 5 shows that by sacrificing optimality, the number of nodes can be substantially reduced. 1 2 3 4 5 Depth of the tree 0 1 2 3 4 5 Total number of search tree nodes ×105 Fig. 5. Effect of the ǫ1 and ǫ2 relaxation parameters on the number of nodes in the search tree. The baseline case is the optimal solution with alpha pruning and algebraic redundancy with both parameters set to zero. B. Online Execution of the Search Tree So far, we have discussed the problem of building the minimax tree. Once the tree is built, the robot can execute the optimal policy. At the root node, the robot executes the first control action along the optimal minmax path found. Next, the robot obtains a measurement. This measurement may not correspond to the worst-case measurement. Furthermore, the actual value of the measurement may not even be in the k candidate measurements in the tree. The updated target estimate may not correspond to a node in the tree. Instead, we compute the distance between the actual measurement and find the closest match in the candidate set. The corresponding child node is now the new root node of the tree and the optimal policy starting at that node is executed, iteratively. Fig. 6 shows the execution in a simple instance. VII. CONCLUSION We investigated the problem of devising closed-loop con- trol policies for state-dependent target tracking. Unlike state- independent tracking, the value of a candidate control law in state-dependent target tracking is a function of the history of measurements obtained. Consequently, planning over a hori- zon requires taking into account possible measurement val- ues. In this paper, we focused on minimizing the worst-case uncertainty. Our solution consisted of building a min max search tree to obtain the control policy. A full enumeration tree has size exponential in the number of measurements, control laws, and the planning horizon. Instead, we exploited the structural properties of Kalman filter to yield a tree with significantly less number of possible nodes without sacrificing the optimality guarantees. We also showed how two parameters, ǫ1 and ǫ2, can be used to yield even more computational savings at the expense of optimality. One disadvantage of the generalization is the need to discretize the set of possible future measurements. Our immediate future work is to bound the suboptimality as a function of the number of discrete samples chosen to represent the continuous set of future measurements. Once a bound is obtained, the user may choose the correct trade- off between the computation time and the solution quality desired. A second avenue of future work focuses on extend- ing these results to multi-robot, multi-target scenarios. Our prior work [17] has shown a greedy assignment of robot trajectories to targets yield provably approximate solutions for one-step planning. We will extend this to planning over a finite horizon using the results presented in this paper. REFERENCES [1] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applica- tions to Tracking and Navigation: Theory, Algorithms, and Software. John Wiley & Sons, 2004. [2] X. R. Li and V. P. Jilkov, “Survey of maneuvering target tracking. part i. dynamic models,” IEEE Transactions on aerospace and electronic systems, vol. 39, no. 4, pp. 1333–1364, 2003. [3] ——, “Survey of maneuvering target tracking. 
VII. CONCLUSION

We investigated the problem of devising closed-loop control policies for state-dependent target tracking. Unlike state-independent tracking, the value of a candidate control law in state-dependent target tracking is a function of the history of measurements obtained. Consequently, planning over a horizon requires taking into account possible measurement values. In this paper, we focused on minimizing the worst-case uncertainty. Our solution consisted of building a minimax search tree to obtain the control policy. A full enumeration tree has size exponential in the number of measurements, control laws, and the planning horizon. Instead, we exploited the structural properties of the Kalman filter to yield a tree with significantly fewer nodes without sacrificing the optimality guarantees. We also showed how two parameters, $\epsilon_1$ and $\epsilon_2$, can be used to yield even more computational savings at the expense of optimality.

One disadvantage of the generalization is the need to discretize the set of possible future measurements. Our immediate future work is to bound the suboptimality as a function of the number of discrete samples chosen to represent the continuous set of future measurements. Once such a bound is obtained, the user may choose the desired trade-off between computation time and solution quality. A second avenue of future work focuses on extending these results to multi-robot, multi-target scenarios. Our prior work [17] has shown that a greedy assignment of robot trajectories to targets yields provably approximate solutions for one-step planning. We will extend this to planning over a finite horizon using the results presented in this paper.

REFERENCES

[1] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software. John Wiley & Sons, 2004.
[2] X. R. Li and V. P. Jilkov, "Survey of maneuvering target tracking. Part I: Dynamic models," IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, 2003.
[3] ——, "Survey of maneuvering target tracking. Part II: Motion models of ballistic and space targets," IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 1, pp. 96–119, 2010.
[4] ——, "A survey of maneuvering target tracking. Part III: Measurement models," in Proceedings of SPIE, vol. 4473, 2001, p. 424.
[5] ——, "A survey of maneuvering target tracking. Part IV: Decision-based methods," in Proceedings of SPIE, vol. 4728, 2002, p. 512.
[6] ——, "Survey of maneuvering target tracking. Part V: Multiple-model methods," IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1255–1321, 2005.
[7] B. Rao, H. F. Durrant-Whyte, and J. Sheen, "A fully decentralized multi-sensor system for tracking and surveillance," The International Journal of Robotics Research, vol. 12, no. 1, pp. 20–44, 1993.
[8] N. Karnad and V. Isler, "Modeling human motion patterns for multi-robot planning," in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 3161–3166.
[9] M. Montemerlo, J. Pineau, N. Roy, S. Thrun, and V. Verma, "Experiences with a mobile robotic guide for the elderly," in Proceedings of the AAAI National Conference on Artificial Intelligence, 2002, pp. 587–592.
[10] V. Isler, N. Noori, P. Plonski, A. Renzaglia, P. Tokekar, and J. Vander Hook, "Finding and tracking targets in the wild: Algorithms and field deployments," in International Symposium on Safety, Security, and Rescue Robotics, 2015.
[11] P. Tokekar, E. Branson, J. Vander Hook, and V. Isler, "Tracking aquatic invaders: Autonomous robots for invasive fish," IEEE Robotics and Automation Magazine, 2013.
[12] P. Tokekar, J. Vander Hook, and V. Isler, "Active target localization for bearing based robotic telemetry," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.
[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice-Hall, 1995.
[14] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, "On efficient sensor scheduling for linear dynamical systems," Automatica, vol. 48, no. 10, pp. 2482–2493, October 2012.
[15] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, "Information acquisition with sensing robots: Algorithms and error bounds," in Proceedings of IEEE International Conference on Robotics and Automation, 2014, pp. 6447–6454.
[16] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, NJ, 1986, vol. 986.
[17] P. Tokekar, V. Isler, and A. Franchi, "Multi-target visual tracking with aerial robots," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
APPENDIX

Theorem 5 (Monotonicity of the state-dependent Riccati equation): If two nodes of a linear stochastic system satisfy the conditions $H\Sigma^A_t H^T + S^A \succeq H\Sigma^B_t H^T + S^B$ and $\Sigma^A_t \succeq \Sigma^B_t$, then after applying one step of the Riccati map, we have $\rho(\Sigma^A_t) \succeq \rho(\Sigma^B_t)$, where $0 \preceq S^A \preceq aI$, $0 \preceq S^B \preceq aI$, and
$$\rho(\Sigma_k) = C_k \Sigma_k C_k^T - C_k \Sigma_k H_k^T \left(H_k \Sigma_k H_k^T + S^A\right)^{-1} H_k \Sigma_k C_k^T + \Sigma_v. \quad (10)$$

Proof:
$$L(\Sigma^A_k) - L(\Sigma^B_k) = C_k\Sigma^A_k C_k^T - C_k\Sigma^A_k H^T \left(H\Sigma^A_k H^T + S^A\right)^{-1} H\Sigma^A_k C_k^T - C_k\Sigma^B_k C_k^T + C_k\Sigma^B_k H^T \left(H\Sigma^B_k H^T + S^B\right)^{-1} H\Sigma^B_k C_k^T. \quad (11)$$
We define
$$K(\Sigma) := -F\Sigma H^T (H\Sigma H^T + S)^{-1}, \qquad F(\Sigma) := F - (F\Sigma H^T)(H\Sigma H^T + S)^{-1} H.$$
Note that $F(\Sigma) = F + K(\Sigma)H$ and
$$K(\Sigma)(F\Sigma H^T)^T = -K(\Sigma)(H\Sigma H^T + S)K^T(\Sigma).$$
Then,
\begin{align*}
& L(\Sigma^A_t) - L(\Sigma^B_t) - F(\Sigma^A_t)(\Sigma^A_t - \Sigma^B_t)F^T(\Sigma^A_t) \\
&= K(\Sigma^A_t)(F\Sigma^A_t H^T)^T - K(\Sigma^B_t)(F\Sigma^B_t H^T)^T - K(\Sigma^A_t)H(\Sigma^A_t - \Sigma^B_t)F^T - F(\Sigma^A_t - \Sigma^B_t)H^T K^T(\Sigma^A_t) - K(\Sigma^A_t)H(\Sigma^A_t - \Sigma^B_t)H^T K^T(\Sigma^A_t) \\
&= K(\Sigma^A_t)(F\Sigma^A_t H^T)^T - K(\Sigma^B_t)(F\Sigma^B_t H^T)^T - K(\Sigma^A_t)\left[H\Sigma^A_t F^T - H\Sigma^B_t F^T\right] - \left[F\Sigma^A_t H^T - F\Sigma^B_t H^T\right]K^T(\Sigma^A_t) - K(\Sigma^A_t)H(\Sigma^A_t - \Sigma^B_t)H^T K^T(\Sigma^A_t) \\
&= -K(\Sigma^B_t)(F\Sigma^B_t H^T)^T + K(\Sigma^A_t)(F\Sigma^B_t H^T)^T + (F\Sigma^B_t H^T)K^T(\Sigma^A_t) + K(\Sigma^A_t)(H\Sigma^B_t H^T + S^A)K^T(\Sigma^A_t) \\
&= K(\Sigma^B_t)(H\Sigma^B_t H^T + S^B)K^T(\Sigma^B_t) + K(\Sigma^A_t)(H\Sigma^B_t H^T + S^A)K^T(\Sigma^A_t) - K(\Sigma^A_t)(H\Sigma^B_t H^T + S^B)K^T(\Sigma^B_t) - K(\Sigma^B_t)(H\Sigma^B_t H^T + S^B)K^T(\Sigma^A_t) \\
&= \left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)(H\Sigma^B_t H^T + S^B)\left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)^T + K(\Sigma^A_t)(S^A - S^B)K^T(\Sigma^A_t). \quad (12)
\end{align*}
Define
$$M = \max\left\{ (F + K(\Sigma^A_t)H)(F + K(\Sigma^A_t)H)^T,\; K(\Sigma^A_t)K^T(\Sigma^A_t) \right\}. \quad (13)$$
So, we have
\begin{align*}
L(\Sigma^A_t) - L(\Sigma^B_t) &= F(\Sigma^A_t)(\Sigma^A_t - \Sigma^B_t)F^T(\Sigma^A_t) + K(\Sigma^A_t)(S^A - S^B)K^T(\Sigma^A_t) + \left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)(H\Sigma^B_t H^T + S^B)\left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)^T \\
&= (F + K(\Sigma^A_t)H)(\Sigma^A_t - \Sigma^B_t)(F + K(\Sigma^A_t)H)^T + K(\Sigma^A_t)(S^A - S^B)K^T(\Sigma^A_t) + \left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)(H\Sigma^B_t H^T + S^B)\left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)^T.
\end{align*}
From (13), we have
\begin{align*}
L(\Sigma^A_t) - L(\Sigma^B_t) &\succeq K(\Sigma^A_t)H(\Sigma^A_t - \Sigma^B_t)H^T K^T(\Sigma^A_t) + K(\Sigma^A_t)(S^A - S^B)K^T(\Sigma^A_t) + \left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)(H\Sigma^B_t H^T + S^B)\left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)^T \\
&= K(\Sigma^A_t)\left[(H\Sigma^A_t H^T + S^A) - (H\Sigma^B_t H^T + S^B)\right]K^T(\Sigma^A_t) + \left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)(H\Sigma^B_t H^T + S^B)\left(K(\Sigma^B_t) - K(\Sigma^A_t)\right)^T \\
&\succeq 0, \quad \text{since } H\Sigma^A_t H^T + S^A \succeq H\Sigma^B_t H^T + S^B \succeq 0. \quad (14)
\end{align*}
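As a quick numerical sanity check of Theorem 5, consider the sketch below. It is built on our own simplifying assumptions: randomly drawn $C$ and $H$, equal noise terms $S^A = S^B$ so that both premises reduce to $\Sigma^A_t \succeq \Sigma^B_t$, and $\Sigma^A_t$ constructed by adding a positive semi-definite perturbation to $\Sigma^B_t$. It then verifies that one step of the Riccati map (10) preserves the ordering.

```python
import numpy as np

def riccati(Sigma, C, H, S, Sigma_v):
    # One step of the Riccati map rho(.) from Eq. (10).
    G = H @ Sigma @ H.T + S
    return (C @ Sigma @ C.T
            - C @ Sigma @ H.T @ np.linalg.solve(G, H @ Sigma @ C.T)
            + Sigma_v)

rng = np.random.default_rng(0)
n, m = 3, 2
C = rng.standard_normal((n, n))       # hypothetical system matrix
H = rng.standard_normal((m, n))       # hypothetical measurement matrix
S = np.eye(m)                         # S_A = S_B (equal noise terms)
Sigma_v = np.eye(n)                   # process noise covariance
B = rng.standard_normal((n, n))
Sigma_B = B @ B.T + np.eye(n)         # positive definite Sigma_B
P = rng.standard_normal((n, n))
Sigma_A = Sigma_B + P @ P.T           # Sigma_A >= Sigma_B by construction

D = riccati(Sigma_A, C, H, S, Sigma_v) - riccati(Sigma_B, C, H, S, Sigma_v)
D = 0.5 * (D + D.T)                   # symmetrize numerical residue
print(np.linalg.eigvalsh(D).min() >= -1e-9)   # expected: True
```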
PROOF OF THEOREM 2

Proof: We first prove the special case in which the set of explored nodes consists of only one node, B. That is, we have
$$H_t \Sigma^A_t H_t^T \succeq H_t \Sigma^B_t H_t^T + K \cdot aI,$$
where $a = \delta_1^2 + \delta_2^2 C$. We want to prove that $\operatorname{tr}(\Sigma^A_{t+K}) \ge \operatorname{tr}(\Sigma^B_{t+K})$.

1) Base case: we show that the statement holds for $K = 1$. When $K = 1$, $H\Sigma^A_t H^T \succeq H\Sigma^B_t H^T + aI$. From the Kalman Riccati map,
\begin{align*}
\Sigma^A_{t+1} = \rho_i(\Sigma^A_t, x_r(t), \hat{x}_o(t)) = C\Sigma^A_t C^T - C\Sigma^A_t H_t^T \left(H_t\Sigma^A_t H_t^T + \Sigma_{w_i}\left(x_r(t), \hat{x}^A_o(t)\right)\right)^{-1} H_t\Sigma^A_t C^T + \Sigma_v.
\end{align*}
Applying Theorem 5, we have
\begin{align*}
\Sigma^A_{t+1} \succeq C\Sigma^B_t C^T - C\Sigma^B_t H_t^T \left(H_t\Sigma^B_t H_t^T + \Sigma_{w_i}\left(x_r(t), \hat{x}^B_o(t)\right)\right)^{-1} H_t\Sigma^B_t C^T + \Sigma_v = \Sigma^B_{t+1}. \quad (15)
\end{align*}

2) Inductive step: we show that if the claim holds for $K = M$, then it also holds for $K = M + 1$. Assume the claim holds for $K = M$. Let $\Sigma^{B'}_t = \Sigma^B_t + a\,(H^T H)^{-1}$. By the hypothesis for $K = M$, we have $\Sigma^A_{t+M} \succeq \Sigma^{B'}_{t+M}$, that is,
$$\Sigma^A_{t+M} \succeq \Sigma^B_{t+M} + a\,(H^T H)^{-1}.$$
Similar to step 1), $\Sigma^A_{t+M+1} \succeq \Sigma^B_{t+M+1}$, showing that the claim indeed holds for $K = M + 1$. Since both the base case and the inductive step have been established, by mathematical induction the statement holds for all natural numbers $K$.

Next, we extend the proof from comparing two nodes to the case of arbitrary $N$ nodes. Suppose we have
$$H_t \Sigma^A_t H_t^T \succeq \sum_{i=1}^{N} \alpha_i \left( H_t \Sigma^i_t H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) \right). \quad (16)$$
Without loss of generality, we assume $\Sigma^{B'}$ is the minimum covariance matrix (in the positive semi-definite sense) among $\Sigma^1_t, \Sigma^2_t, \ldots, \Sigma^N_t$. Then,
\begin{align*}
H_t \Sigma^A_t H_t^T &\succeq \sum_{i=1}^{N} \alpha_i \left( H_t \Sigma^i_t H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) \right) \succeq \sum_{i=1}^{N} \alpha_i \left[ H_t \Sigma^{B'}_t H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right) \right] = H_t \Sigma^{B'}_t H_t^T + K\left(\delta_1^2 + \delta_2^2 C\right), \quad (17)
\end{align*}
since the weights $\alpha_i$ sum to one. Using the result from the induction above, after $K$ minimax tree steps there always exists a node $B'$ such that
$$\operatorname{tr}(\Sigma^A_{t+K}) \ge \operatorname{tr}(\Sigma^{B'}_{t+K}). \quad (18)$$
Therefore, node A can be pruned without reducing the optimality of the minimax tree.

PROOF OF THEOREM 4

Proof: For some level $i$, suppose that we prune a node on the optimal policy. We have
$$\operatorname{tr}\left(H \Sigma^{\epsilon_2}_{2i} H^T\right) \le \operatorname{tr}\left(H \left(\Sigma^*_{2i} + \epsilon_2 I\right) H^T\right).$$
From [14], we know that for all $\Sigma, Q \in \mathbb{R}^{n \times n}$ and $\epsilon \ge 0$,
$$\rho_{2i}(\Sigma + \epsilon Q) \preceq \rho_{2i}(\Sigma) + F_i(\Sigma)\,Q\,F_i^T(\Sigma)\,\epsilon.$$
Applying this recursively to the composed map, we get
\begin{align*}
\Phi_{2k}(\Sigma + \epsilon_2 Q) &= \rho_{2(k-1)}\left(\Phi_{2(k-1)}(\Sigma + \epsilon_2 Q)\right) = \rho_{2(k-1)}\left(\rho_{2(k-2)}\left(\cdots \rho_0(\Sigma + \epsilon_2 Q)\right)\right) \\
&\preceq \Phi_{2k}(\Sigma) + \left[ \prod_{i=k-1}^{0} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) \, Q \, \prod_{i=0}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2 + o(\epsilon_2) \\
&\preceq \Phi_{2k}(\Sigma) + \left[ \prod_{i=k-1}^{0} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) \, Q \, \prod_{i=0}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2.
\end{align*}
Let $\{\hat\Sigma^*_i\}_{i=1}^{k}$ be the sequence of covariance matrices along the optimal minimax trajectory, and let $\{\hat\Sigma^{\epsilon_2}_i\}_{i=1}^{k}$ be the sequence of covariance matrices along the trajectory returned by the $\epsilon_2$–algebraic redundancy pruning algorithm. We get
$$\hat\Sigma^{\epsilon_2}_i \preceq \hat\Sigma^*_i + \epsilon_2 I, \quad \forall i = 1, 2, \ldots, k.$$
By combining the two results, we obtain the desired bound:
$$0 \le J^{\epsilon_2}_{2k} - J^*_{2k} = \operatorname{tr}(\hat\Sigma^{\epsilon_2}_k) - \operatorname{tr}(\hat\Sigma^*_k) \le \operatorname{tr}\left( \sum_{j=0}^{k} \left[ \prod_{i=k-1}^{j} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right) \prod_{i=j}^{k-1} \left(F_i(\Sigma)\Phi_{2i}(\Sigma)\right)^T \right] \epsilon_2 \right) = B_{\epsilon_2}.$$

Fig. 2. A minimax tree with alpha pruning (panels: the minimax tree before alpha pruning and the resulting pruned tree). ▽ and △ are nodes in which we compute the minimum or maximum value of their children; the value at a leaf node equals $\operatorname{tr}(\Sigma_k)$. ▽ and △ represent control and measurement nodes, respectively. The filled ▽ are pruned by alpha pruning.

Fig. 6. Online measurement update with a minimax tree. After applying the optimal action, the actual measurement $z_k$ may not correspond to a measurement node in the tree; in such a case, we choose the "closest" measurement in the tree and append one new control and measurement level.