Information Acquisition with Sensing Robots: Algorithms and Error Bounds Nikolay Atanasov, Jerome Le Ny, Kostas Daniilidis, and George J. Pappas Abstract—Utilizing the capabilities of configurable sensing systems requires addressing difficult information gathering prob- lems. Near-optimal approaches exist for sensing systems without internal states. However, when it comes to optimizing the trajec- tories of mobile sensors the solutions are often greedy and rarely provide performance guarantees. Notably, under linear Gaussian assumptions, the problem becomes deterministic and can be solved off-line. Approaches based on submodularity have been applied by ignoring the sensor dynamics and greedily selecting informative locations in the environment. This paper presents a non-greedy algorithm with suboptimality guarantees, which does not rely on submodularity and takes the sensor dynamics into account. Our method performs provably better than the widely used greedy one. Coupled with linearization and model predictive control, it can be used to generate adaptive policies for mobile sensors with non-linear sensing models. Applications in gas concentration mapping and target tracking are presented. I. INTRODUCTION Miniaturization, wireless communication, and sensor tech- nology have advanced remarkably in recent years. Au- tonomous vehicles instrumented with various sensors and interconnected by a network are deployed in large numbers, leading to configurable networked sensing systems. In order to utilize their capabilities, some important information gather- ing problems such as environmental monitoring (temperature, humidity, air quality, etc.) [1], [2], [3], surveillance and recon- naissance [4], [5], search and rescue [6], active perception and active SLAM [7], [8], [9] need to be addressed. Sensor management [10] offers a formal methodology to control the degrees of freedom of sensing systems in order to improve the information acquisition process. Much research in the field has been focused on sensors without internal states by studying the problem of sensor scheduling, also referred to as sensor selection. Efficient non-myopic approaches have been proposed [11], [12], [13]. However, when it comes to mobile sensors, which posses internal states, the results are This work was supported by ONR-MURI HUNT grant N00014-08-1-0696 and by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA. N. Atanasov and G. Pappas are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, {atanasov, pappasg}@seas.upenn.edu. J. Le Ny is with the Department of Electrical Engineering, Ecole Polytech- nique de Montreal, ON, Canada, jerome.le-ny@polymtl.ca K. Daniilidis is with the Department of Computer and Infor- mation Science, University of Pennsylvania, Philadelphia, PA 19104, kostas@cis.upenn.edu This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. limited. The main complication is that the evolution of the states depends on the management decisions and affects the measurements in the long run. Concretely, whereas in radar management a sensor can switch instantaneously between targets [14], a feasible and informative path needs to be designed for a sensing robot [15]. Due to this complication, most approaches for mobile sensor management are myopic [16], [17] or use short planning horizons [18], [19]. However, it is precisely the presence of an internal state that makes multi- step optimization important. The behavior of the observed phenomenon needs to be predicted at an early stage to facilitate effective control of the mobile sensor. A key insight is that under linear Gaussian assumptions the problem can be formulated as deterministic optimal control [20]. As a result, informative sensing paths can be planned off-line. Some of the successful approaches rely on a submod- ular function to quantify the informativeness of the sensor paths [15], [21]. The sensing locations are partitioned into independent clusters. Submodularity is used to greedily select informative locations within clusters. An orienteering problem is solved to choose the sequence of clusters to visit. The drawback is that within clusters the movement of the sensor is restricted to a graph, essentially ignoring the sensor dynamics. As a result, the cluster sizes cannot be increased much without affecting the quality of the solution. We address this limitation by considering the sensor dynamics and planning non-greedily. The contribution of this paper is an approach for discrete- time dynamic sensors with linear Gaussian sensing models to track a target with linear dynamics. We present an approximate non-myopic algorithm to decrease the complexity of obtaining an optimal policy, while providing strong performance guar- antees. The generated suboptimal policy performs provably better than the widely used greedy one. Our work can be considered a search-based method for planning in information space. Related work in this area includes [22], [20]. Sampling- based methods have been proposed as well [23], [24]. Just as in traditional planning, they are able to find feasible solutions quickly but provide no finite-time guarantees on optimality. Approaches which do not make linear Gaussian assumptions and use non-linear filters exist as well [25], [26], [27]. They can handle general sensing and target models but are compu- tationally demanding and have no performance guarantees. The rest of the paper is organized as follows. In Section II we formulate the information acquisition problem precisely. In Section III we prove a separation principle, which allows computing the optimal sensor trajectory off-line via forward value iteration. We discuss how to reduce the exponential arXiv:1309.5390v1 [cs.SY] 20 Sep 2013 complexity and develop an approximate algorithm with subop- timality guarantees in Section IV. In Section V we show how to generate adaptive policies for non-linear mobile sensors. Applications in gas distribution mapping and target tracking are presented. All proofs are provided in the Appendix. II. PROBLEM FORMULATION A. Information acquisition with sensing robots Consider a sensor mounted on a robot or vehicle, whose dynamics are governed by the following sensor motion model: xt+1 = f(xt, ut), (1) where x ∈X ∼= Rnx is the sensor state, X is an nx- dimensional state space with metric dX , u ∈U is the control input, and U is a finite space of admissible controls. The state of the vehicle is assumed known. The task of the sensor is to track the evolution of the state of a target, whose dynamics are governed by a linear target motion model: yt+1 = Ayt + wt, wt ∼N(0, W), (2) where y ∈Rny is the target state, A ∈Rny×ny, and wt is a white Gaussian noise with covariance EwtwT τ = Wδ(t −τ). The minimum eigenvalue of W is denoted by λW . The operation of the sensor is governed by the following sensor observation model: zt = H(xt)yt + vt(xt), vt(xt) ∼N(0, V (xt)), (3) where zt ∈Rnz is the measurement signal, H(xt) ∈Rnz×ny, and vt(xt) is a sensor-state-dependent Gaussian noise, whose values at any pair of times are independent. The measurement noise is independent of the target noise wt as well. Note that the observation model is linear in the target state but might depend nonlinearly on the sensor state, which typically reflects the fact that the sensor has a limited field of view. The information available to the sensor at time t to compute the control ut is summarized in the information state: I0 := z0 It := (z0:t, u0:(t−1)) ∈ Rnzt+1 × Ut, t > 0. Problem (Active Information Acquisition). Given an initial sensor pose x0 ∈X, a prior distribution of the target state y1, and a planning horizon T, the task of the sensor is to choose a sequence of functions µ0 : Rnz →U, µt : R(t+1)nz ×Ut →U for t = 1, . . . , T −1, which maximize the mutual information between the final target state yT +1 and the measurement set z1:T : max µ0,...,µT −1 I(yT +1; z1:T | x1:T ) (4) s.t. xt+1 = f(xt, µt(It)), t = 0, . . . , T −1, yt+1 = Ayt + wt, t = 1, . . . , T, zt = H(xt)yt + vt(xt), t = 1, . . . , T. Remark: To simplify notation, we work with autonomous models but all results hold for time-varying ft, At, Wt, and Ht with λW > 0 such that λW Iny ⪯Wt for all t ∈[1, T]. Problem (4) subsumes the static case where sensors do not have internal states. In particular, if f(x, u) := u and we think of the control u as choosing a subset of sensors to be activated, then (4) becomes a sensor scheduling problem. Similarly, if the space of sensor configurations, X, is restricted to a graph and revisiting is not allowed, we get the problem in [15]. Numerous information gathering tasks can be cast in the form of (4). For example, environmental monitoring [20], active target tracking (Sec. V, [25]), and active mapping with gas (Sec. V), stereo [28], laser, or any other sensor, whose operation is captured reasonably by a linearized model. To motivate the discussion consider the methane emission monitoring problem, which was addressed by the Best Service Robotics paper [3] at ICRA 2013. The task is to estimate the gas concentration in a landfill using a remote methane leak detector (RMLD) based on tunable diode laser absorption spectroscopy. The RMLD sensor is mounted on a robotic platform, Gasbot, which follows an exploration path pre- specified by hand. In this paper, we would like to automatically generate the most informative path for the Gasbot. In detail, suppose that state of the Gasbot consists of its 2D position (x1, x2) ∈R2 and the orientation of the RMLD sen- sor θ ∈SO(2) so that x := (x1, x2, θ). At each time period, the Gasbot can move on a grid and choose the orientation of the sensor in 30◦increments: Θ := {−π, −5π/6, . . . , 5π/6}, U := dx dy dθT  dx dy  ∈{0, ±e1, ±e2}, dθ ∈Θ  , where e1 and e2 are the standard basis vectors in R2. The sensor motion model is: xt+1 = x1 t+1 x2 t+1 θt+1  = x1 t + dx x2 t + dy dθ . Given a time horizon T, we would like to choose a control pol- icy for the Gasbot in order to obtain a good map yT +1 ∈Rny of the gas concentration in the landfill. The ith component, yi T +1, represents the estimate of the concentration in parts per million (ppm) in the ith cell of the map. We assume a static methane field so that A = Iny, W = 0. It was experimentally verified in [3] that a good sensor observation model is: zt = H(xt)yt + vt = ny X i=1 li(xt)yi t + vt, vt ∼N(0, V ), where the ith component of H(xt) ∈R1×ny is the distance li(xt) traveled by sensor laser beam in cell yi t for the given sensor pose xt. Solving problem (4) will provide an automatic way to control the Gasbot in order to obtain an accurate map of the methane concentration. B. Notation The sets of n × n symmetric positive definite and semidef- inite matrices are denoted Sn ++ and Sn + respectively. Given a control sequence σ = ut, . . . , uT −1 ∈UT −t at time t, the corresponding sensor trajectory is π(σ) := xt+1, . . . , xT ∈ X T −t. Also, let σi := ut+i, πi+1 := xt+i+1, and π(i + 1) := xt+i+1, . . . , xT ∈X T −t−i for i = 0, . . . , (T −t −1). III. A SEPARATION PRINCIPLE A. Open-loop control is optimal Active Information Acquisition (4) is a stochastic optimal control problem and in general for such problems adaptive (closed-loop) control policies have a significant advantage over non-adaptive (open-loop) ones. However, due to the linearity of the observation model (3) in the target state and the Gaussian-noise assumptions, it can be shown that (4) reduces to a deterministic optimal control problem, for which open- loop policies are optimal. Theorem 1. If the prior distribution of y1 is Gaussian with covariance Σ0 ∈Sny + , there exists an open-loop control sequence σ = u0, . . . , uT −1 ∈UT, which is optimal in (4). Moreover, (4) is equivalent to the following deterministic optimal control problem: min σ∈UT log det(ΣT ) (5) s.t. xt+1 = f(xt, σt), t = 0, . . . , T −1, Σt+1 = ρxt+1(Σt), t = 0, . . . , T −1, where ρx(·) := ρp(ρe x(·)) is the Kalman filter Riccati map: Update: ρe x(Σ) := (Σ−1 + M(x))−1 = Cx(Σ)Σ = Cx(Σ)ΣCT x (Σ) + Kx(Σ)V (x)KT x (Σ) M(x) := H(x)T V (x)−1H(x) (6) Cx(Σ) := I −Kx(Σ)H(x) = (I + ΣMx)−1 Kx(Σ) := ΣH(x)T R−1 x (Σ) Rx(Σ) := H(x)ΣH(x)T + V (x) Predict: ρp(Σ) := AΣAT + W and M(·) ∈Rny×ny is called the sensor information matrix. Remark: Our solution to (5) is applicable to any monotone concave cost function but for clarity we use log det(·). Re- mark: the conditional mean of yt+1 given z0:t can also be computed recursively via the Kalman filter: Update: ξe(y) := y + Kx(Σ)(z −H(x)y) Predict: ξp(y) := Ay but does not play a role in the optimization (5). B. Forward value iteration Separation principles like Thm. 1 have been exploited since the early work on sensor management [29], [30]. The result is significant because the state space of a deterministic control problem is much smaller than the stochastic version (distribu- tions over hidden variables are not needed) and an open-loop policy can be computed off-line. As discussed in [20], the main bottleneck for computations is the large dimension of the state (xt, Σt) and it is beneficial to use forward value iteration (Alg. 1). The advantage is that the set of reachable covariance matrices is built progressively starting from the initial state. As a result, only the reachable subspace is considered instead Algorithm 1 Forward Value Iteration 1: S0 ←{(x0, Σ0)}, St ←∅for t = 1, . . . , T 2: for t = 1 : T do 3: for all (x, Σ) ∈St−1 do 4: for all u ∈U do 5: xt ←f(x, u) 6: St ←St ∪{(xt, ρxt(Σ))} 7: return min {log det(Σ) | (x, Σ) ∈ST } of discretizing the whole space of covariances as required by backward value iteration. The forward value iteration (FVI) is constructing a search tree, Tt, with nodes at stage t ∈[0, T] corresponding to the reachable states (xt, Σt). The algorithm starts from the initial node (x0, Σ0) and uses the controls from U to obtain the set of nodes S1 reachable at time t = 1. In general, starting from node (xt, Σt), there is an edge for each control in U leading to a node (xt+1, Σt+1) obtained from the dynamics in (5). Even though FVI provides a computational advantage to the backward version, the number of nodes in Tt corresponds to the number of sensor trajectories of length t and grows exponentially. Alg. 1 is guaranteed to find the optimal control sequence σ∗but has exponential complexity, O(|U|T ). The other extreme is the greedy policy σg: σg t ∈arg min u∈U  log det ρf(xt,u)(Σt)  , t ∈[0, T −1], (7) which is computationally very efficient, O(|U|T), but no guarantees exist for its performance. We can think of it as a modification of Alg. 1, which discards all but the best (lowest cost) node at level t of the tree Tt. The goal of this paper is to provide an algorithm, with complexity lower than that of FVI and performance better than that of the greedy policy (7). This can be achieved by discarding some but not all of the nodes at level t of Tt. Intuitively, if two sensor trajectories cross, i.e. there are two nodes at level t of Tt with the same vehicle configuration x but different covariances, and one is clearly less informative, i.e. its covariance is dominated by the other one, then it is not necessary to keep it. The following section will make this intuition precise. IV. REDUCED VALUE ITERATION A. Optimality-preserving reductions A notion of redundancy for the nodes in the search tree T is provided by the following definition. Definition 1 (Algebraic redundancy [11]). Let {Σi}K i=1 ⊂Sn + be a finite set. A matrix Σ ∈Sn + is algebraically redundant with respect to {Σi}, if there exist nonnegative constants {αi}K i=1 such that: PK i=1 αi = 1 and Σ ⪰PK i=1 αiΣi. Remark: Definition 1 is a simplified version of the one in [11]. As shown in the proof of Theorem 2, the simplification is possible because the objective function in (5) minimizes the estimation error only at the final stage. The next theorem shows that when several sensor paths cross at time t, i.e. there are several nodes at level t of Tt with the same vehicle state, algebraically redundant ones can be discarded without removing the optimal one. Theorem 2 (Optimal reduction). For t ∈[1, T], let (x, Σ) ∈ St be a node in level t of the search tree Tt. If there exist a set {xi, Σi} ⊆St \ {(x, Σ)} such that x = xi, ∀i and Σ is algebraically redundant with respect to {Σi}, then (x, Σ) can be removed from St without eliminating the optimal trajectory. B. ϵ-Suboptimal reductions At the expense of losing optimality, we can discard even more of the crossing trajectories by using a relaxed notion of algebraic redundancy. Definition 2 (ϵ-Algebraic redundancy [11]). Let ϵ ≥0 and let {Σi}K i=1 ⊂Sn + be a finite set. A matrix Σ ∈Sn + is ϵ- algebraically redundant with respect to {Σi}, if there exist nonnegative constants {αi}K i=1 such that: PK i=1 αi = 1 and Σ + ϵIn ⪰PK i=1 αiΣi. Let π∗ = x∗ 1, . . . , x∗ T ∈ X T be the optimal sensor trajectory in TT with covariance sequence Σ∗ 1, . . . , Σ∗ T and cost V ∗ T := log det(Σ∗ T ). Denote the search tree at time t with all ϵ-algebraically redundant nodes removed by T ϵ t . Let πϵ = xϵ 1, . . . , xϵ T ∈X T be the trajectory obtained by forward value iteration on the reduced tree T ϵ T with a corresponding covariance sequence Σϵ 1, . . . , Σϵ T and cost V ϵ T := log det(Σϵ T ). The following theorem provides an upper bound on the gap, (V ϵ T −V ∗ T ), between the performances of πϵ and π∗. Theorem 3 (ϵ-Suboptimal reduction). Let β∗< ∞be the peak estimation error of the optimal policy, Σ∗ t ⪯β∗Iny, for t ∈[1, T]. Then, 0 ≤V ϵ T −V ∗ T ≤ϵ∆T , (8) where ∆T := ny λW  1 + β2 ∗ λ2 W 1 −ηT −1 ∗  , η∗:= β∗ β∗+ λW < 1. Remark: If the peak estimation error β∗remains bounded as T →∞, i.e. the sensor performs well when using the optimal policy, then ∆T → ny λW  1 + β2 ∗ λ2 W  . In other words, the suboptimality gap of πϵ is bounded regardless of the length T of the planning horizon and grows linearly in ϵ! C. (ϵ, δ)-Suboptimal reductions If the motion of the sensor is restricted to a graph many of the planned trajectories will be crossing and the results from Thm. 3 are very satisfactory. However, if the space of sensor configurations, X, is continuous, depending on the sensor motion model (e.g. differential drive), it is possible that no two sensor states reachable at time t are exactly equal. Then, the reductions from Thm. 3 cannot be applied. To avoid this case, we can relax the notion of trajectory crossing at time t. Definition 3 (Trajectory δ-Crossing). Trajectories π1, π2 ∈ X T δ-cross at time t ∈[1, T] if dX (π1 t , π2 t ) ≤δ for δ ≥0. Instead of discarding ϵ-algebraically redundant trajectories which cross, we can discard those which δ-cross. Let T ϵ,δ t be the reduced tree at time t, with all ϵ-algebraically redundant δ-crossing nodes removed. Some continuity assumptions are necessary in order to provide suboptimality guarantees for searching within T ϵ,δ t . Assumption 1 (Motion Model Continuity). The sensor mo- tion model is Lipschitz continuous in x with Lipschitz constant Lf ≥0 for every fixed u ∈U: dX (f(x1, u), f(x2, u)) ≤LfdX (x1, x2), ∀x1, x2 ∈X. Assumption 2 (Observation Model Continuity). There exists a real constant Lm ≥0 such that: M(x1) ⪯ 1 + LmdX (x1, x2)  M(x2) M(x2) ⪯ 1 + LmdX (x1, x2)  M(x1), ∀x1, x2 ∈X, where M(·) is the sensor information matrix (6). Assumption 1 says that when two sensor configurations are close and the same sequence of controls is applied, then the resulting trajectories will remain close. Assumption 2 says that sensing from similar configurations results in similar informa- tion gain. This gives the intuition that when two trajectories δ-cross their future informativeness will be similar. We make this intuition precise in the next theorem. Let πϵ,δ ∈X T be the sensor trajectory obtained by searching the reduced tree T ϵ,δ T with corresponding cost V ϵ,δ T . The gap, (V ϵ,δ T −V ∗ T ), between the performances of πϵ,δ and π∗is bounded as follows. Theorem 4 ((ϵ, δ)-Suboptimal reduction). Let β∗< ∞be the peak estimation error of the optimal policy, Σ∗ t ⪯β∗Iny, for t ∈[1, T]. Then, 0 ≤V ϵ,δ T −V ∗ T ≤(ζT −1)  V ∗ T −log det(W)  + ϵ∆T , (9) where ζt := t−1 Y τ=1  1 + τ X s=1 Ls fLmδ  ≥1, ∆T := ny λW  1 + β∗ λW T −1 X τ=1 ζT ζτ ηT −τ ∗  , η∗:= β∗ β∗+ λW < 1. Notice that Thm. 3 is a special case of Thm. 4 because if δ = 0, then ζt = 1, ∀t ∈[1, T] and (9) reduces to (8). Then, the suboptimality gap remains bounded regardless of the time horizon and grows linearly with ϵ. If δ > 0, then limT →∞ζT = ∞and the suboptimality gap grows with T and δ. The bound is loose, however, because it uses a worst-case analysis. The worst case happens when a trajectory, which was discarded from T ϵ,δ T , persistently obtains more information than a trajectory, which remains very close in space and is still in the search tree. Even then, if the optimal policy performs well, the term V ∗ T −log det(W) gets smaller as ζT increases and the suboptimality gap remains small. D. (ϵ, δ)-Reduced value iteration The approaches for reducing the search tree, developed in the previous subsections, can be used to significantly decrease the complexity of the FVI algorithm (Alg. 1), while providing suboptimality guarantees (Thm. 4). The (ϵ, δ)-Reduced Value Iteration (RVI) is summarized in Alg. 2. Algorithm 2 (ϵ, δ)-Reduced Value Iteration 1: S0 ←{(x0, Σ0)}, St ←∅for t = 1, . . . , T 2: for t = 1 : T do 3: for all (x, Σ) ∈St−1 do 4: for all u ∈U do 5: xt ←f(x, u) 6: St ←St ∪{(xt, ρxt(Σ))} 7: Sort St in ascending order according to log det(·) 8: S′ t ←St[1] % State with lowest cost 9: for all (x, Σ) ∈St \ St[1] do 10: % Find all nodes in S′ t, which δ-cross x: 11: Q ←{Σ′ | (x′, Σ′) ∈S′ t, dX (x, x′) ≤δ} 12: if isempty(Q) or not( Σ is ϵ-alg. redundant wrt Q ) then 13: S′ t ←S′ t ∪(x, Σ) 14: St ←S′ t 15: return min {log det(Σ) | (x, Σ) ∈ST } The most computationally demanding operation is checking ϵ-algebraic redundancy (Line 12). It is a feasibility problem for a linear matrix inequality (LMI) and off-the-shelf solvers exist [31]. An appealing property of Alg. 2 is that at stage t at least the node with currently lowest cost is retained (Line 8). This guarantees that the control sequence obtained from RVI performs at least as well as the greedy policy (7). V. APPLICATIONS A. Gas distribution mapping and leak localization In this subsection, we apply the (ϵ, δ)-RVI to the methane monitoring problem intorduced in Section II. Since the move- ment of the Gasbot is restricted to a grid, the planned sensor paths will be crossing frequently and we can use δ = 0. The di- mension ny of the target is the number of cells in the gas con- centration map and would typically be very large. Checking algebraic redundancy requires solving an ny-dimensional LMI feasibility problem, which is computationally very demanding. To avoid this, we let ϵ = ∞. This means that when several paths cross at time t, only the most informative one is kept in T ϵ,δ t . Thus, the number of nodes in T ϵ,δ t remains bounded by the number of reachable sensor states. Trajectories of length T = 40 were planned using RVI and the greedy algorithm (GREEDY). The results (Fig. 1) reveal an important drawback of GREEDY. It remains trapped in a local region of relatively high variance and fails to see that there are more interesting regions which should be explored during the limited available time. Fig. 1 also shows that the growth of the search tree is much slower for RVI compared to FVI, while the quality of the RVI solution is better than the greedy one. B. Target localization and tracking In a lot of applications, the linear assumptions (2), (3) are reasonable for the target motion model but very limiting for the sensor observation model. A lot more problems can fit in a framework with the following non-linear observation model: zt = h(xt, yt)+v(xt, yt), v(xt, yt) ∼N(0, V (xt, yt)). (10) In this subsection, we show that our method can be coupled with linearization and model predictive control to generate an adaptive policy for a mobile sensor with the model in (10). To motivate the discussion we consider a target tracking application, in which both the sensor motion and observation models are highly non-linear. Suppose that the sensor is mounted on a vehicle with differential-drive dynamics, which are discretized using a sampling period τ as follows:   x1 t+1 x2 t+1 θt+1  =   x1 t x2 t θt  +                       τv cos(θt + τω/2) τv sin(θt + τω/2) τω   , |τω| < 0.001    v ω(sin(θt + τω) −sin θt) v ω(cos θt −cos(θt + τω)) τω   , else. The vehicle is controlled using the motion primitives U = {(v, ω) | v ∈{0, 1, 2, 3} m/s, ω ∈{0, ±π/2, ±π} rad/s}. The task of the sensor is to track the position (y1, y2) ∈R2 and velocity ( ˙y1, ˙y2) ∈R2 of another vehicle with discretized double integrator dynamics driven by Gaussian noise: yt+1 = I2 τI2 0 I2  yt+wt, wt ∼N  0, q τ 3/3I2 τ 2/2I2 τ 2/2I2 τI2  , where y = [y1, y2, ˙y1, ˙y2]T is the target state and q is a scalar diffusion strength measured in ( m sec2 )2 1 Hz. The sensor takes noisy range and bearing measurments of the target’s position: h(x, y)= r(x, y) α(x, y)  :=  p (y1 −x1)2 + (y2 −x2)2 arctan (y2 −x2)/(y1 −x1)   (11) The target needs to be tracked during a period Tmax in a wooded area (see Fig. 2), which affects the covariance of the measurement noise. The noise in the range measurement grows linearly with the distance between the sensor and the target but trees along the way make the growth faster. The bearing measurement noise increases linearly with the speed of the sensor. Thus, good range measurements require that the sensor is close to the target and not blocked by trees, while good bearing measurements require that the sensor moves slowly. The sensor has a maximum range of 15 meters, after which the noise covariance is infinite. To exploit the independence of the covariance recursion of the Kalman filter from measurement values (Theorem 1) and to employ RVI, the observation model (11) needs to be linearized about a predicted target trajectory during planning. Linearized about an arbitrary target state y ̸= x, the model is: H(x, y):=∇yh(x, y)= 1 r(x, y)  (y1 −x1) (y2 −x2) −sin α(x, y) cos α(x, y)  . The linearization depends on the predicted target trajectory, which in turn depends on the measurements obtained on-line. As a result, it is necessary to re-plan the sensor path on-line x [m] y [m] 2 4 6 8 10 12 14 2 4 6 8 10 12 14 x [m] Gas concentration [ppm] 2 4 6 8 10 12 14 2 4 6 8 10 12 14 10 25 50 100 150 200 300 400 500 600 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 Time Steps Log number of nodes per tree level 1027 103 1 GREEDY (ε,δ)−RVI FVI Fig. 1. Comparison of the sensor trajectories (white) obtained by the greedy algorithm (left) and the reduced value iteration (middle) with ϵ = ∞and δ = 0 after 40 time steps. A typical realization of the methane field is shown. The red lines indicate the orientation of the gas sensor during the execution. On the right, the log number of nodes maintained in the search tree by the two approaches is compared to the complete tree maintained by forward value iteration. after every new measurement. In general, re-planning would be feasible only for a short horizon T < Tmax before the plan is needed. Alg. 3 shows how to use the (ϵ, δ)-RVI and model predictive control to generate an adaptive policy. Algorithm 3 Model predictive control via the (ϵ, δ)-RVI 1: Input: Tmax, x0, ˆy0, ˆΣ0, f, U, H, V, A, W, T, ϵ, δ 2: for t = 0 : Tmax do 3: Predict a target trajectory of length T: ˆyt, . . . , ˆyt+T 4: Linearize the observation model: Hτ(·) ←H(·, ˆyt+τ), τ = 0, .., T 5: % Plan a sensor trajectory of length T 6: σ ←(ϵ, δ)-RVI(xt, ˆyt, ˆΣt, {Hτ}T τ=0, V, f, U, A, W, T) 7: Move the sensor: xt+1 ←f(xt, σ1) 8: Get a new observation: zt+1 ←h(xt+1, yt+1) + vt+1(xt+1) 9: Update the target estimate: (ˆyt+1, ˆΣt+1) ←Filter(ˆyt, ˆΣT , zt+1) We emphasize that the linearized sensing models are utilized merely for determining the next configuration (Line 6), while the target state inference (Line 9) can still be performed with any non-linear filter. Alg. 3 was implemented with Tmax = 100, T = 7, ϵ = 0.1, δ = 1, τ = 0.5, q = 0.2 and 100 Monte- Carlo simulations were carried out. The tracking performance is compared to the greedy solution in Fig. 2. We can see that the greedy policy goes in straight line to keep the speed low, i.e. the bearing noise small, but cannot predict in advance that the range noise will increase as the target goes further away. As a result, the greedy solution is likely to lose the target. VI. CONCLUSION Under linear Gaussian assumptions, information acquisition with mobile sensors can be planned off-line. Most previous ap- proaches are greedy or neglect the sensor dynamics and rarely provide performance guarantees. In this paper, we developed a non-greedy algorithm (RVI), which takes the sensor dynamics into account and has suboptimality guarantees. It provides pa- rameters (ϵ, δ) to control the trade-off between complexity and optimality. Coupled with linearization and model predictive control, RVI can generate an adaptive policy for a non-linear mobile sensor. Our method can also be applied to a multi- sensor problem by stacking all motion and observation models into single centralized versions. Unfortunately, the complexity of this solution is exponential in the number of sensors. Future work will focus on a decentralized method for the multi- sensor, multi-target active information acquisition. In detail, we would like to address cooperative control, distributed target estimation, and noisy self-localization. APPENDIX A: PROOF OF THEOREM 1 Lemma 1. Let X ∼N(µ, Σ) be n-dimensional. Then, its differential entropy is h(X) = n log(2πe) + log det(Σ)  /2. Proof of Thm 1: By definition of mutual information: I(yT +1; z1:T | x1:T ) = h(yT +1 | x1:T )−h(yT +1 | z1:T , x1:T ). Since yT +1 is independent of the sensor path the first term is constant with respect to the optimization in (4). Due to linearity of the observation model (3) in the target state, the distribution of yt+1 given z1:t and x1:t remains Gaussian for t>1. Its covariance Σt can be obtained from the Bayes filter, which due to the linear Gaussian assumptions is equivalent to the Kalman filter. Thus, Σt+1 = ρxt(Σt) for t = 0, . . . , T −1, which is independent of the measurements z1:t. By Lemma 1: h(yT +1 |z1:T , x1:T )=Eˆz1:T h(yT +1 | z1:T = ˆz1:T , x1:T ) (12) = 1 2Eˆz1:T log(2πe)ny+log |ΣT |  = 1 2 log(2πe)ny+log |ΣT |  . Let µ∗={µ∗ 0, .., µ∗ T −1} be optimal in (4) with cost V ∗. Fix a realization ˆz1:T of the measurements and let σ be the open- loop policy induced by µ∗given ˆz1:T with cost V σ. From (12), V ∗is independent of ˆz1:T , hence V ∗= V σ for any ˆz1:T . APPENDIX B: PROOF OF THEOREM 2 Definition 4 (Operator monotone function). Let Σ1, Σ2 ∈Sn +. A function g : Sn + →Sn + is operator monotone if Σ1 ⪯Σ2 implies g(Σ1) ⪯g(Σ2). Definition 5 (Operator concave function). Let {Σi | i = 1, . . . , K} ⊂Sn + be a finite set and let PK i=1 αi = 1 for some real constants αi ≥0. A function g : Sn + →Sn + is operator concave if it satisfies PK i=1 αig(Σi) ⪯g(PK i=1 αiΣi). x [m] y [m] −30 −20 −10 0 10 −50 −40 −30 −20 −10 0 10 20 30 40 50 Target path RVI path GREEDY path 0 20 40 60 80 100 0 2 4 6 8 Position RMSE [m] 0 20 40 60 80 100 0 1 2 3 Velocity RMSE [m/sec] Time Steps (ε,δ)−RVI GREEDY 0 20 40 60 80 100 −20 −15 −10 −5 0 5 Final Cost of Plan Time Steps (ε,δ)−RVI GREEDY Fig. 2. Simulation results from 100 Monte-Carlo runs of the target tracking scenario. A typical realization is shown on the left. The average root-mean-square error (RMSE) of the estimated target position and velocity is shown in the middle. The log det of the predicted target covariance is shown on the right. Definition 6 (t-step Riccati map). For π ∈X T , Σ ∈Sn + let φ0 π(Σ) := Σ, φt π(Σ) := ρπt ◦. . . ◦ρπ1(Σ), t ∈[1, T]. Lemma 2 ([11]). For any t ∈[0, T], the t-step Riccati map is operator monotone and operator concave. Proof of Thm 2: Let σ ∈U T −t be any admissible control sequence. Starting at (x, Σ), by Lemma 2 and Definition 1, there exist nonnegative constants {αi}K i=1 such that φT −t π(σ)(Σ) ⪰φT −t π(σ)  K X i=1 αiΣi  ⪰ K X i=1 αiφT −t π(σ) Σi . Then, from monotonicity and concavity of log det(·): log det  φT −t π(σ)(Σ)  ≥log det  K X i=1 αiφT −t π(σ)(Σi)  ≥ K X i=1 αi log det  φT −t π(σ)(Σi)  ≥log det  φT −t π(σ)(Σi∗)  . The last inequality holds because a convex combination of scalars is lower bounded by the smallest one i∗. APPENDIX C: PROOF OF THEOREM 4 Lemma 3 ([11]). For any π ∈X T , Q ∈Sn +, and ϵ ≥0 the directional derivative of the t-step Riccati map is: gt π(Σ; Q) := d dϵφt π(Σ + ϵQ) ϵ=0 = 1 Y τ=t ACπτ (φτ π(Σ))Q tY τ=1 Cπτ (φτ π(Σ))T AT . Lemma 4. For any t∈[0, T], π∈X T , Σ, Q1, Q2 ∈Sn +, a, b∈R gt π(Σ; aQ1 + bQ2) = agt π(Σ; Q1) + bgt π(Σ; Q2) because a directional derivative is linear in the perturbation. In addition, by operator concavity of the t-step Riccati map: φt π(Σ + ϵQ) ⪯φt π(Σ) + ϵgt π(Σ; Q) Lemma 5. For all t ∈[1, T], π ∈X T , and Q ∈Sn +, if there exists a constant β < ∞such that φt π(Σ) ⪯βIny, then tr gt π(Σ; Q)  ≤βηt tr(Σ−1Q), η := β/(β + λW ) < 1. Proof: We follow the steps of [11, Thm. 5] but use the tighter bound ˆAt ˆΣt ˆAT t ⪯(β−λ− W )In in (A.4), which leads to L(l) t −L(l) t+1 ≥λ− W β L(l) t . Also, since Q ∈Sn + we can decompose it as Q = Pn l=1 λl QqlqT l and let ξ(l)(0) = q λl Qql. Lemma 6. Consider two nodes (x1, Σ) and (x2, Σ) with dX (x1, x2) ≤δ. Let Σ1 and Σ2 be the updated covariance matrices after applying u ∈U at each node. Then, Σ1 ⪰γΣ2 + (1 −γ)W and Σ2 ⪰γΣ1 + (1 −γ)W, where 0 < γ := (1 + LmLfδ)−1 ≤1. Proof: Let Mi := M(f(xi, u)) and ρi(·) := ρf(xi,u)(·) for i = 1, 2. From Assumption 1, dX (f(x1, u), f(x2, u)) ≤Lfδ and from Assumption 2, M1 ⪯(1 + LmLfδ)M2. Then, ρ1(Σ) = A(Σ−1 + M1)−1AT + W ⪰A(Σ−1 + γ−1M2)−1AT + W = γρ2(γ−1Σ) + (1 −γ)W ⪰γρ2(Σ) + (1 −γ)W The second result follows analogously. Lemma 7. For t ∈[1, T], ϵ ≥0, δ ≥0, the reduced tree T ϵ,δ t contains a set of nodes {(xi t, Σi t) | i = 1, . . . , K} such that: dX (x∗ t , xi t) ≤ t−1 X τ=0 Lτ fδ, ∀i, (13) Σ∗ t + ϵ  ΓtIny + t−1 X τ=1 Γτgt−τ π∗(τ+1)(Σ∗ τ; Iny)  (14) ⪰Γt K X i=1 αiΣi t + t−1 X τ=1 Γτ(1 −γτ)φt−1−τ π∗(τ+1)(W), where π∗= x∗ 1, . . . , x∗ T is the optimal trajectory in 5, 0 < γt := (1 + Pt s=1 Ls fLmδ)−1 ≤1, and Γt := Qt−1 s=1 γs. Proof of Lemma 7: We proceed by induction. Base Case: If the optimal node (x∗ 1, Σ∗ 1) is discarded, then by Definitions 2 and 3 T ϵ,δ 1 contains a set of nodes {(xk 1, Σk 1)} such that dX (x∗ 1, xk 1) ≤δ for all k and Σ∗ 1+ϵIny ⪰P k αk 1Σk 1. Hypothesis: Suppose that (13) and (14) hold for some set {(xj t, Σj t) | j = 1, . . . , J} of nodes in T ϵ,δ t . Induction: At time t + 1, there are sets {(xji t+1, Σji t+1)}Kj i=1 in T ϵ,δ t+1 corresponding to each node j from time t and satisfying dX (xj t+1, xji t+1) ≤δ and Σj t+1+ϵIny ⪰PKj i=1 αjiΣji t+1. (15) From Lemma 3 for every τ = 1, . . . , t: g1 π∗(t+1)  Σ∗ t ; gt−τ π∗(τ+1)(Σ∗ τ; Iny)  = gt+1−τ π∗(τ+1)(Σ∗ τ; Iny). From this and Lemma 4: Σ∗ t+1 + ϵ t X τ=1 Γτgt+1−τ π∗ (Σ∗ τ; Iny) = ρπ∗ t+1(Σ∗ t ) + ϵg1 π∗(t+1)  Σ∗ t ; t X τ=1 Γτgt−τ π∗(τ+1)(Σ∗ τ; Iny)  ⪰ρπ∗ t+1  Σ∗ t + ϵ t−1 X τ=1 Γτgt−τ π∗(τ+1)(Σ∗ τ; Iny) + ϵΓtIny  Note that Pt−1 τ=1 Γτ(1 −γτ) + Γt = 1. Thus, the terms (1 − γ1), γ1(1 −γ2), . . . , Γt−1(1 −γt−1), Γt are the coefficients of a convex combination. Using the hypothesis and Lemma 2: ρπ∗ t+1  Σ∗ t + ϵ t−1 X τ=1 Γτgt−τ π∗(τ+1)(Σ∗ τ; Iny) + ϵΓtIny  ⪰ρπ∗ t+1  Γt J X j=1 αjΣj t + t−1 X τ=1 Γτ(1 −γτ)φt−1−τ π∗(τ+1)(W)  ⪰Γt J X j=1 αjρπ∗ t+1(Σj t) + t−1 X τ=1 Γτ(1 −γτ)φt−τ π∗(τ+1)(W). By hypothesis, dX (x∗ t , xj t) ≤Pt−1 τ=0 Lτ fδ, and from Lemma 6: ρπ∗ t+1(Σj t) ⪰γtΣj t+1 + (1 −γt)W. The nodes {(xj t+1, Σj t+1)} might not be in T ϵ,δ t+1 but from (15): ρπ∗ t+1(Σj t) + γtϵIny ⪰γt Kj X i=1 αjiΣji t+1 + (1 −γt)W. Combining the previous results, we have: Σ∗ t+1 + ϵ t X τ=1 Γτgt+1−τ π∗ (Σ∗ τ; Iny) + ϵΓt+1Iny ⪰Γt J X j=1 αj  ρπ∗ t+1(Σj t)+ γtϵIny  + t−1 X τ=1 Γτ(1−γτ)φt−τ π∗(τ+1)(W) | {z } (∗) ⪰Γt J X j=1 αj  γt Kj X i=1 αjiΣji t+1 + (1 −γt)W  + (∗) = Γt+1 J X j=1 Kj X i=1 αjαjiΣji t+1 + t X τ=1 Γτ(1 −γτ)φt−τ π∗(τ+1)(W). Thus, the set SJ j=1 SKj i=1{(xji t+1, Σji t+1)} satisfies (14) at time t+1. It satisfies (13) at t+1 from (15) and Assumption 1. Proof of Thm 4: As defined in Lemma 7 ΓT = ζ−1 T . Define V (·) := log det(·) and G := ΓT Iny+ T −1 X τ=1 ΓτgT −τ π∗(τ+1)(Σ∗ τ; Iny). By monotonicity of V (·) and the result in Lemma 7: V (Σ∗ T +ϵG) ≥V  ΓT K X i=1 αiΣi T + T −1 X τ=1 Γτ(1−γτ)φT −1−τ π∗(τ+1)(W)  for some set of nodes {(xi T , Σi T ) | i = 1, . . . , K} in the reduced tree T ϵ,δ T . By definition, φt π(W) ⪰W for any t, π and PT −1 τ=1 Γτ(1 −γτ) = 1 −ΓT . By concavity of V (·): V (Σ∗ T + ϵG) ≥ΓT K X i=1 αiV (Σi T ) + (1 −ΓT )V (W) (16) ≥ΓT V (Σi∗ T ) + (1 −ΓT )V (W) ≥ΓT V ϵ,δ T + (1 −ΓT )V (W). The second inequality holds because a convex combination of scalars is lower bounded by the smallest one i∗. The last inequality holds because πϵ,δ is the optimal path in the reduced tree. Next, by concavity of log det(·): V (Σ∗ T + ϵG) ≤V (Σ∗ T ) + ϵ d dϵV  Σ∗ T + ϵG  ϵ=0 = V ∗ T + ϵ tr  (Σ∗ T )−1G  ≤V ∗ T + ϵ 1 λW tr(G). (17) From Lemma 5 and since tr((Σ∗ T )−1) ≤ny/λW : tr(G) = ΓT tr(Iny) + T =1 X τ=1 Γτ tr  gT −τ π∗(τ+1)(Σ∗ τ; Iny)  ≤nyΓT + T =1 X τ=1 Γτβ∗ηT −τ ∗ tr((Σ∗ T )−1) ≤ΓT ∆T (18) Finally, by combining (16), (17), and (18) we get: ΓT V ϵ,δ T + (1 −ΓT )V (W) ≤V ∗ T + ϵΓT ∆T 0 ≤ΓT (V ϵ,δ T −V ∗ T ) ≤(1 −ΓT )(V ∗ T −V (W)) + ϵΓT ∆T . Multiplying by ζT = Γ−1 T gives the result in (9). REFERENCES [1] H. Choi, “Adaptive Sampling and Forecasting With Mobile Sensor Networks,” Ph.D. dissertation, MIT, February 2009. [2] N. Leonard, D. Paley, F. Lekien, R. Sepulchre, D. Fratantoni, and R. Davis, “Collective Motion, Sensor Networks, and Ocean Sampling,” Proceedings of the IEEE, vol. 95, no. 1, pp. 48–74, 2007. [3] V. Hernandez Bennetts, A. Lilienthal, A. A. Khaliq, V. Pomareda Sese, and M. Trincavelli, “Towards Real-World Gas Distribution Mapping and Leak Localization Using a Mobile Robot with 3D and Remote Gas Sensing Capabilities,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013. [4] P. E. Rybski, S. A. Stoeter, M. D. Erickson, M. Gini, D. F. Hougen, and N. Papanikolopoulos, “A Team of Robotic Agents for Surveillance,” in Proc. International Conference on Autonomous Agents, 2000, pp. 9–16. [5] A. Hilal, “An Intelligent Sensor Management Framework for Pervasive Surveillance,” Ph.D. dissertation, University of Waterloo, 2013. [6] V. Kumar, D. Rus, and S. Singh, “Robot and Sensor Networks for First Responders,” Pervasive Computing, IEEE, vol. 3, no. 4, pp. 24–33, 2004. [7] R. Sim and N. Roy, “Global A-Optimal Robot Exploration in SLAM,” in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2005, pp. 661–666. [8] V. Karasev, A. Chiuso, and S. Soatto, “Controlled recognition bounds for visual learning and exploration,” in Advances in Neural Information Processing Systems, December 2012. [9] N. Atanasov, B. Sankaran, J. Le Ny, T. Koletschka, G. J. Pappas, and K. Daniilidis, “Hypothesis Testing Framework for Active Object Detection,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013. [10] A. O. Hero III and D. Cochran, “Sensor management: Past, present, and future,” Sensors Journal, IEEE, vol. 11, no. 12, 2011. [11] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, “On efficient sensor scheduling for linear dynamical systems,” Automatica, vol. 48, no. 10, pp. 2482–2493, October 2012. [12] S. Joshi and S. Boyd, “Sensor Selection via Convex Optimization,” Signal Processing, IEEE Transactions on, vol. 57, no. 2, 2009. [13] J. L. Williams, “Information Theoretic Sensor Management,” Ph.D. dissertation, Massachusetts Institute of Technology, February 2007. [14] V. Krishnamurthy and R. Evans, “Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking,” Signal Processing, IEEE Transactions on, vol. 49, no. 12, 2001. [15] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, “Efficient Informa- tive Sensing Using Multiple Robots,” Journal of Artificial Intelligence Research (JAIR), vol. 34, no. 1, pp. 707–755, April 2009. [16] B. Grocholsky, “Information Theoretic Control of Multiple Sensor Platforms,” Ph.D. dissertation, University of Sidney, March 2002. [17] T. Chung, J. Burdick, and R. Murray, “A Decentralized Motion Co- ordination Strategy for Dynamic Target Tracking,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), 2006. [18] C. M. Kreucher, “An Information-based Approach to Sensor Resource Allocation,” Ph.D. dissertation, University of Michigan, 2005. [19] M. Huber, “Probabilistic framework for sensor management,” Ph.D. dissertation, Universit¨at Karlsruhe (TH), 2009. [20] J. Le Ny and G. J. Pappas, “On Trajectory Optimization for Active Sensing in Gaussian Process Models,” in Proc. of the 48th Conference on Decision and Control (CDC), December 2009. [21] A. Singh, A. Krause, and W. Kaiser, “Nonmyopic Adaptive Informa- tive Path Planning for Multiple Robots,” in 21st International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 2009. [22] J. Vander Hook, P. Tokekar, and V. Isler, “Bearing-Only Active Target Localization Strategies for a System of Two Communicating Mobile Robots: Full Technical Report,” University of Minnesota, Tech. Rep. TR-13012, 2013. [23] X. Lan and M. Schwager, “Planning Periodic Persistent Monitoring Trajectories for Sensing Robots in Gaussian Random Fields,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013, pp. 2407–2412. [24] G. A. Hollinger and G. S. Sukhatme, “Sampling-based Motion Planning for Robotic Information Gathering,” in Proc. Robotics: Science and Systems Conference (RSS), 2013. [25] B. Charrow, V. Kumar, and N. Michael, “Approximate Representations for Multi-Robot Control Policies that Maximize Mutual Information,” in Proc. Robotics: Science and Systems Conference (RSS), 2013. [26] G. M. Hoffmann and C. J. Tomlin, “Mobile Sensor Network Control Using Mutual Information Methods and Particle Filters,” Automatic Control, IEEE Transactions on, vol. 55, no. 1, pp. 32–47, 2010. [27] S. M. LaValle, “Sensing and Filtering: A Fresh Perspective Based on Preimages and Information Spaces,” Foundations and Trends in Robotics, vol. 1, no. 4, pp. 253–372, April 2012. [28] C. Freundlich, P. Mordohai, and M. M. Zavlanos, “A Hybrid Control Approach to the Next-Best-View Problem using Stereo Vision,” in Proc. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 2013. [29] M. Athans, “On the Determination of Optimal Costly Measurement Strategies for Linear Stochastic Systems,” Automatica, vol. 8, no. 4, pp. 397–412, 1972. [30] L. Meier, J. Peschon, and R. Dressler, “Optimal Control of Measurement Subsystems,” Automatic Control, IEEE Transactions on, vol. 12, no. 5, pp. 528–536, 1967. [31] MATLAB, Robust Control Toolbox. Natick, Massachusetts, United States: The MathWorks Inc., 2012.