arXiv:1709.08826v2 [math.OC] 14 May 2020

Sensing-Constrained LQG Control

Vasileios Tzoumas,1,2 Luca Carlone,2 George J. Pappas,1 Ali Jadbabaie2

Abstract—Linear-Quadratic-Gaussian (LQG) control is concerned with the design of an optimal controller and estimator for linear Gaussian systems with imperfect state information. Standard LQG assumes that the set of sensor measurements to be fed to the estimator is given. However, in many problems arising in networked systems and robotics, one may not be able to use all the available sensors, due to power or payload constraints, or may be interested in using the smallest subset of sensors that guarantees the attainment of a desired control goal. In this paper, we introduce the sensing-constrained LQG control problem, in which one has to jointly design sensing, estimation, and control, under given constraints on the resources spent for sensing. We focus on the realistic case in which the sensing strategy has to be selected among a finite set of possible sensing modalities. While the computation of the optimal sensing strategy is intractable, we present the first scalable algorithm that computes a near-optimal sensing strategy with provable sub-optimality guarantees. To this end, we show that a separation principle holds, which allows the design of sensing, estimation, and control policies in isolation. We conclude the paper by discussing two applications of sensing-constrained LQG control, namely sensing-constrained formation control and resource-constrained robot navigation.

This paper has been accepted for publication in the IEEE Transactions on Automatic Control. Please cite the paper as: V. Tzoumas, L. Carlone, G. J. Pappas, A. Jadbabaie, "LQG Control and Sensing Co-Design", IEEE Transactions on Automatic Control (TAC), 2020.

I. INTRODUCTION

Traditional approaches to control of systems with partially observable state assume the choice of sensors used to observe the system is given.
The choice of sensors usually results from a preliminary design phase in which an expert designer selects a suitable sensor suite that accommodates estimation requirements (e.g., observability, desired estimation error) and system constraints (e.g., size, cost). Modern control applications, from large networked systems to miniaturized robotic systems, pose serious limitations to the applicability of this traditional paradigm. In large-scale networked systems (e.g., smart grids or robot swarms), in which new nodes are continuously added to and removed from the network, a manual re-design of the sensors becomes cumbersome and expensive, and it is simply not scalable. In miniaturized robot systems, while the set of onboard sensors is fixed, it may be desirable to selectively activate only a subset of the sensors during different phases of operation, in order to minimize power consumption. In both application scenarios, one usually has access to a (possibly large) list of potential sensors, but, due to resource constraints (e.g., cost, power), can only utilize a subset of them. Moreover, the need for online and large-scale sensor selection demands automated approaches that efficiently select a subset of sensors to maximize system performance.

1 The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104 USA (email: {pappagsg, vtzoumas}@seas.upenn.edu).
2 The authors are with the Institute for Data, Systems and Society, and the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (email: {jadbabai, lcarlone, vtzoumas}@mit.edu).
This work was supported in part by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and in part by the AFOSR Complex Networks Program.
Motivated by these applications, in this paper we consider the problem of jointly designing control, estimation, and sensor selection for a system with partially observable state.

Related work. One body of related work is control over band-limited communication channels, which investigates the trade-offs between communication constraints (e.g., data rate, quantization, delays) and control performance (e.g., stability) in networked control systems. Early work provides results on the impact of quantization [1], finite data rates [2], [3], and separation principles for LQG design with communication constraints [4]; more recent work focuses on privacy constraints [5]. We refer the reader to the surveys [6]–[8]. A second set of related work is sensor selection and scheduling, in which one has to select a (possibly time-varying) set of sensors in order to monitor a phenomenon of interest. Related literature includes approaches based on randomized sensor selection [9], dual volume sampling [10], [11], convex relaxations [12], [13], and submodularity [14]–[16]. The third set of related work is information-constrained (or information-regularized) LQG control [17], [18]. Shafieepoorfard and Raginsky [17] study rationally inattentive control laws for LQG control and discuss their effectiveness in stabilizing the system. Tanaka and Mitter [18] consider the co-design of sensing, control, and estimation, propose to augment the standard LQG cost with an information-theoretic regularizer, and derive an elegant solution based on semidefinite programming. The main difference between our proposal and [18] is that we consider the case in which the choice of sensors, rather than being arbitrary, is restricted to a finite set of available sensors.

Contributions. We extend Linear-Quadratic-Gaussian (LQG) control to the case in which, besides designing an optimal controller and estimator, one has to select a set of sensors to be used to observe the system state.
In particular, we formulate the sensing-constrained (finite-horizon) LQG problem as the joint design of an optimal control and estimation policy, as well as the selection of a subset of $k$ out of $N$ available sensors, that minimize the LQG objective, which quantifies tracking performance and control effort. We first leverage a separation principle to show that the design of sensing, control, and estimation can be performed independently. While the computation of the optimal sensing strategy is combinatorial in nature, a key contribution of this paper is to provide the first scalable algorithm that computes a near-optimal sensing strategy with provable sub-optimality guarantees. We motivate the importance of the sensing-constrained LQG problem, and demonstrate the effectiveness of the proposed algorithm in numerical experiments, by considering two application scenarios, namely sensing-constrained formation control and resource-constrained robot navigation, which, due to page limitations, we include in the full version of this paper, located at the authors' websites. All proofs can also be found in the full version of this paper, located at the authors' websites.

Notation. Lowercase letters denote vectors and scalars, and uppercase letters denote matrices. We use calligraphic fonts to denote sets. The identity matrix of size $n$ is denoted by $I_n$ (the dimension is omitted when clear from the context). For a matrix $M$ and a vector $v$ of appropriate dimension, we define $\|v\|_M^2 \triangleq v^T M v$. For matrices $M_1, M_2, \ldots, M_k$, we define $\mathrm{diag}(M_1, M_2, \ldots, M_k)$ as the block-diagonal matrix with diagonal blocks $M_1, M_2, \ldots, M_k$.

II. SENSING-CONSTRAINED LQG CONTROL

In this section we formalize the sensing-constrained LQG control problem considered in this paper. We start by introducing the notions of system, sensors, and control policies.
a) System: We consider a standard discrete-time (possibly time-varying) linear system with additive Gaussian noise:

$$x_{t+1} = A_t x_t + B_t u_t + w_t, \quad t = 1, 2, \ldots, T, \qquad (1)$$

where $x_t \in \mathbb{R}^{n_t}$ represents the state of the system at time $t$, $u_t \in \mathbb{R}^{m_t}$ represents the control action, $w_t$ represents the process noise, and $T$ is a finite time horizon. In addition, we consider the system's initial condition $x_1$ to be a Gaussian random variable with covariance $\Sigma_{1|0}$, and $w_t$ to be a Gaussian random variable with mean zero and covariance $W_t$, such that $w_t$ is independent of $x_1$ and of $w_{t'}$ for all $t' = 1, 2, \ldots, T$, $t' \neq t$.

b) Sensors: We consider the case where we have a (potentially large) set of available sensors, which take noisy linear observations of the system's state. In particular, let $\mathcal{V}$ be a set of indices such that each index $i \in \mathcal{V}$ uniquely identifies a sensor that can be used to observe the state of the system. We consider sensors of the form

$$y_{i,t} = C_{i,t} x_t + v_{i,t}, \quad i \in \mathcal{V}, \qquad (2)$$

where $y_{i,t} \in \mathbb{R}^{p_{i,t}}$ represents the measurement of sensor $i$ at time $t$, and $v_{i,t}$ represents the measurement noise of sensor $i$. We assume $v_{i,t}$ to be a Gaussian random variable with mean zero and positive definite covariance $V_{i,t}$, such that $v_{i,t}$ is independent of $x_1$, of $w_{t'}$ for any $t' \neq t$, and of $v_{i',t'}$ for all $t' \neq t$ and any $i' \in \mathcal{V}$, $i' \neq i$.

In this paper we are interested in the case in which we cannot use all the available sensors, and as a result we need to select a convenient subset of sensors in $\mathcal{V}$ to maximize our control performance (formalized in Problem 1 below).

Definition 1 (Active sensor set and measurement model). Given a set of available sensors $\mathcal{V}$, we say that $\mathcal{S} \subset \mathcal{V}$ is an active sensor set if we can observe the measurements from each sensor $i \in \mathcal{S}$ for all $t = 1, 2, \ldots, T$. Given an active sensor set $\mathcal{S} = \{i_1, i_2, \ldots, i_{|\mathcal{S}|}\}$, we define the following quantities:

$$y_t(\mathcal{S}) \triangleq [y_{i_1,t}^T, y_{i_2,t}^T, \ldots, y_{i_{|\mathcal{S}|},t}^T]^T, \quad C_t(\mathcal{S}) \triangleq [C_{i_1,t}^T, C_{i_2,t}^T, \ldots, C_{i_{|\mathcal{S}|},t}^T]^T, \quad V_t(\mathcal{S}) \triangleq \mathrm{diag}(V_{i_1,t}, V_{i_2,t}, \ldots, V_{i_{|\mathcal{S}|},t}), \qquad (3)$$

which lead to the definition of the measurement model:

$$y_t(\mathcal{S}) = C_t(\mathcal{S}) x_t + v_t(\mathcal{S}), \qquad (4)$$

where $v_t(\mathcal{S})$ is zero-mean Gaussian noise with covariance $V_t(\mathcal{S})$. Despite the availability of a possibly large set of sensors $\mathcal{V}$, our observer will only have access to the measurements produced by the active sensors. The following paragraph formalizes how the choice of the active sensors affects the control policies.

c) Control policies: We consider control policies $u_t$ for all $t = 1, 2, \ldots, T$ that are informed only by the measurements collected by the active sensors:

$$u_t = u_t(\mathcal{S}) = u_t(y_1(\mathcal{S}), y_2(\mathcal{S}), \ldots, y_t(\mathcal{S})), \quad t = 1, 2, \ldots, T.$$

Such policies are called admissible. In this paper, we want to find a small set of active sensors $\mathcal{S}$, and admissible controllers $u_1(\mathcal{S}), u_2(\mathcal{S}), \ldots, u_T(\mathcal{S})$, to solve the following sensing-constrained LQG control problem.

Problem 1 (Sensing-constrained LQG control). Find a sensor set $\mathcal{S} \subset \mathcal{V}$ of cardinality at most $k$ to be active across all times $t = 1, 2, \ldots, T$, and control policies $u_{1:T}(\mathcal{S}) \triangleq \{u_1(\mathcal{S}), u_2(\mathcal{S}), \ldots, u_T(\mathcal{S})\}$, that minimize the LQG cost function:

$$\min_{\mathcal{S} \subseteq \mathcal{V},\, |\mathcal{S}| \leq k,\; u_{1:T}(\mathcal{S})} \; \sum_{t=1}^{T} \mathbb{E}\left[\|x_{t+1}(\mathcal{S})\|^2_{Q_t} + \|u_t(\mathcal{S})\|^2_{R_t}\right], \qquad (5)$$

where the state-cost matrices $Q_1, Q_2, \ldots, Q_T$ are positive semi-definite, the control-cost matrices $R_1, R_2, \ldots, R_T$ are positive definite, and the expectation is taken with respect to the initial condition $x_1$, the process noises $w_1, w_2, \ldots, w_T$, and the measurement noises $v_1(\mathcal{S}), v_2(\mathcal{S}), \ldots, v_T(\mathcal{S})$.
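To make the stacked measurement model of eqs. (3)–(4) concrete, the following sketch builds $C_t(\mathcal{S})$ and $V_t(\mathcal{S})$ for an active set $\mathcal{S}$ and simulates one step of eqs. (1)–(2). The matrices and sensor suite below are a hypothetical toy instance (three time-invariant scalar sensors on a 2-D state), not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: 2-D state, three candidate scalar sensors.
n = 2
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = 0.01 * np.eye(n)                       # process noise covariance W_t
C = {1: np.array([[1.0, 0.0]]),            # sensor i -> C_i (time-invariant here)
     2: np.array([[0.0, 1.0]]),
     3: np.array([[1.0, 1.0]])}
V = {1: 0.5 * np.eye(1), 2: 0.2 * np.eye(1), 3: 1.0 * np.eye(1)}

def stacked_model(S):
    """Return C(S), V(S) of eq. (3): stacked rows, block-diagonal noise."""
    S = sorted(S)
    C_S = np.vstack([C[i] for i in S])
    V_S = np.zeros((len(S), len(S)))       # scalar sensors: blocks are 1x1
    for row, i in enumerate(S):
        V_S[row, row] = V[i][0, 0]
    return C_S, V_S

def step(x, u, S):
    """One step of eqs. (1)-(4): state update and active-sensor measurement."""
    C_S, V_S = stacked_model(S)
    w = rng.multivariate_normal(np.zeros(n), W)
    v = rng.multivariate_normal(np.zeros(len(S)), V_S)
    return A @ x + B @ u + w, C_S @ x + v
```

For vector-valued sensors one would replace the diagonal assembly with a true block-diagonal construction; the scalar case is kept here only to stay short.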
Problem 1 generalizes the imperfect state-information LQG control problem from the case where all sensors in $\mathcal{V}$ are active, and only optimal control policies are to be found [19, Chapter 5], to the case where only a few sensors in $\mathcal{V}$ can be active, and both optimal sensors and control policies are to be found jointly. While we already noticed that admissible control policies depend on the active sensor set $\mathcal{S}$, it is worth noticing that this in turn implies that the state evolution also depends on $\mathcal{S}$; for this reason we write $x_{t+1}(\mathcal{S})$ in eq. (5). The intertwining between control and sensing calls for a joint design strategy. In the following section we focus on the design of a jointly optimal control and sensing solution to Problem 1.

III. JOINT SENSING AND CONTROL DESIGN

In this section we first present a separation principle that decouples sensing, estimation, and control, and allows designing them in cascade (Section III-A). We then present a scalable algorithm for sensing and control design (Section III-B).

Algorithm 1 Joint sensing and control design for Problem 1.
Input: Time horizon $T$, available sensor set $\mathcal{V}$, covariance matrix $\Sigma_{1|0}$ of initial condition $x_1$; for all $t = 1, 2, \ldots, T$, system matrix $A_t$, input matrix $B_t$, LQG cost matrices $Q_t$ and $R_t$, process noise covariance matrix $W_t$; and for all sensors $i \in \mathcal{V}$, measurement matrix $C_{i,t}$ and measurement noise covariance matrix $V_{i,t}$.
Output: Active sensors $\widehat{\mathcal{S}}$, and control matrices $K_1, \ldots, K_T$.
1: $\widehat{\mathcal{S}}$ is returned by Algorithm 2, which finds a (possibly approximate) solution to the optimization problem in eq. (6);
2: $K_1, \ldots, K_T$ are computed using the recursion in eq. (8).

A. Separability of Optimal Sensing and Control Design

We characterize the jointly optimal control and sensing solutions to Problem 1, and prove that they can be found in two separate steps, where first the sensing design is computed, and second the corresponding optimal control design is found.
Theorem 1 (Separability of optimal sensing and control design). Let the sensor set $\mathcal{S}^\star$ and the controllers $u_1^\star, u_2^\star, \ldots, u_T^\star$ be a solution to the sensing-constrained LQG Problem 1. Then $\mathcal{S}^\star$ and $u_1^\star, u_2^\star, \ldots, u_T^\star$ can be computed in cascade as follows:

$$\mathcal{S}^\star \in \arg\min_{\mathcal{S} \subseteq \mathcal{V},\, |\mathcal{S}| \leq k} \sum_{t=1}^{T} \mathrm{tr}[\Theta_t \Sigma_{t|t}(\mathcal{S})], \qquad (6)$$

$$u_t^\star = K_t \hat{x}_t(\mathcal{S}^\star), \quad t = 1, \ldots, T, \qquad (7)$$

where $\hat{x}_t(\mathcal{S})$ is the Kalman estimate of the state $x_t$, i.e., $\hat{x}_t(\mathcal{S}) \triangleq \mathbb{E}(x_t \mid y_1(\mathcal{S}), y_2(\mathcal{S}), \ldots, y_t(\mathcal{S}))$, and $\Sigma_{t|t}(\mathcal{S})$ is $\hat{x}_t(\mathcal{S})$'s error covariance, i.e., $\Sigma_{t|t}(\mathcal{S}) \triangleq \mathbb{E}[(\hat{x}_t(\mathcal{S}) - x_t)(\hat{x}_t(\mathcal{S}) - x_t)^T]$ [19, Appendix E]. In addition, the matrices $\Theta_t$ and $K_t$ are independent of the selected sensor set $\mathcal{S}$, and they are the solution of the backward Riccati recursion

$$S_t = Q_t + N_{t+1}, \quad N_t = A_t^T (S_t^{-1} + B_t R_t^{-1} B_t^T)^{-1} A_t, \quad M_t = B_t^T S_t B_t + R_t, \quad K_t = -M_t^{-1} B_t^T S_t A_t, \quad \Theta_t = K_t^T M_t K_t, \qquad (8)$$

with boundary condition $N_{T+1} = 0$.

Remark 1 (Certainty equivalence principle). The control gain matrices $K_1, K_2, \ldots, K_T$ are the same as the ones that make the controllers $(K_1 x_1, K_2 x_2, \ldots, K_T x_T)$ optimal for the perfect state-information version of Problem 1, where the state $x_t$ is known to the controllers [19, Chapter 4].

Theorem 1 decouples the design of the sensing from the controller design. Moreover, it suggests that once an optimal sensor set $\mathcal{S}^\star$ is found, the optimal controllers are equal to $K_t \hat{x}_t(\mathcal{S}^\star)$, which corresponds to the standard LQG control policy. This should not come as a surprise, since for a given sensing strategy, Problem 1 reduces to standard LQG control. We conclude this section with a remark providing a more intuitive interpretation of the sensor design step in eq. (6).

Algorithm 2 Sensing design for Problem 1.
Input: Time horizon $T$, available sensor set $\mathcal{V}$, covariance matrix $\Sigma_{1|0}$ of the system's initial condition $x_1$; and for any time $t = 1, 2, \ldots, T$ and any sensor $i \in \mathcal{V}$, process noise covariance matrix $W_t$, measurement matrix $C_{i,t}$, and measurement noise covariance matrix $V_{i,t}$.
Output: Sensor set $\widehat{\mathcal{S}}$.
1: Compute $\Theta_1, \Theta_2, \ldots, \Theta_T$ using the recursion in eq. (8);
2: $\widehat{\mathcal{S}} \leftarrow \emptyset$; $i \leftarrow 0$;
3: while $i < k$ do
4:   for all $a \in \mathcal{V} \setminus \widehat{\mathcal{S}}$ do
5:     $\widehat{\mathcal{S}}_a \leftarrow \widehat{\mathcal{S}} \cup \{a\}$; $\Sigma_{1|0}(\widehat{\mathcal{S}}_a) \leftarrow \Sigma_{1|0}$;
6:     for all $t = 1, \ldots, T$ do
7:       $\Sigma_{t|t}(\widehat{\mathcal{S}}_a) \leftarrow$
8:       $[\Sigma_{t|t-1}(\widehat{\mathcal{S}}_a)^{-1} + C_t(\widehat{\mathcal{S}}_a)^T V_t(\widehat{\mathcal{S}}_a)^{-1} C_t(\widehat{\mathcal{S}}_a)]^{-1}$;
9:       $\Sigma_{t+1|t}(\widehat{\mathcal{S}}_a) \leftarrow A_t \Sigma_{t|t}(\widehat{\mathcal{S}}_a) A_t^T + W_t$;
10:    end for
11:    $\mathrm{cost}_a \leftarrow \sum_{t=1}^{T} \mathrm{tr}[\Theta_t \Sigma_{t|t}(\widehat{\mathcal{S}}_a)]$;
12:  end for
13:  $a_i \leftarrow \arg\min_{a \in \mathcal{V} \setminus \widehat{\mathcal{S}}} \mathrm{cost}_a$;
14:  $\widehat{\mathcal{S}} \leftarrow \widehat{\mathcal{S}} \cup \{a_i\}$; $i \leftarrow i + 1$;
15: end while

Remark 2 (Control-aware sensor design). In order to provide more insight into the cost function in eq. (6), we rewrite it as:

$$\sum_{t=1}^{T} \mathrm{tr}[\Theta_t \Sigma_{t|t}(\mathcal{S})] = \sum_{t=1}^{T} \mathbb{E}\left( \mathrm{tr}\left\{ [x_t - \hat{x}_t(\mathcal{S})]^T \Theta_t [x_t - \hat{x}_t(\mathcal{S})] \right\} \right) = \sum_{t=1}^{T} \mathbb{E}\left( \|K_t x_t - K_t \hat{x}_t(\mathcal{S})\|^2_{M_t} \right), \qquad (9)$$

where in the first equality we used the fact that $\Sigma_{t|t}(\mathcal{S}) = \mathbb{E}[(x_t - \hat{x}_t(\mathcal{S}))(x_t - \hat{x}_t(\mathcal{S}))^T]$, and in the second equality we substituted the definition $\Theta_t = K_t^T M_t K_t$ from eq. (8). From eq. (9), it is clear that each term $\mathrm{tr}[\Theta_t \Sigma_{t|t}(\mathcal{S})]$ captures the expected control mismatch between the imperfect state-information controller $u_t(\mathcal{S}) = K_t \hat{x}_t(\mathcal{S})$ (which is only aware of the measurements from the active sensors) and the perfect state-information controller $K_t x_t$. This is an important distinction from the existing sensor selection literature.
In particular, while standard sensor selection attempts to minimize the estimation covariance, for instance by minimizing

$$\sum_{t=1}^{T} \mathrm{tr}[\Sigma_{t|t}(\mathcal{S})] = \sum_{t=1}^{T} \mathbb{E}\left( \|x_t - \hat{x}_t(\mathcal{S})\|^2_2 \right), \qquad (10)$$

the proposed LQG cost formulation attempts to minimize the estimation error only along the state directions that are informative to the perfect state-information controller: for example, the contribution of any component of $x_t - \hat{x}_t(\mathcal{S})$ in the null space of $K_t$ to the total control mismatch in eq. (9) is zero. Hence, in contrast to minimizing the cost function in eq. (10), minimizing the cost function in eq. (9) results in a control-aware sensing design.

B. Scalable Near-optimal Sensing and Control Design

This section proposes a practical design algorithm for Problem 1. The pseudo-code of the algorithm is presented in Algorithm 1. Algorithm 1 follows the result of Theorem 1, and jointly designs sensing and control by first computing an active sensor set (line 1 in Algorithm 1) and then computing the control policy (line 2 in Algorithm 1). We discuss each step of the design process in the rest of this section.

1) Near-optimal sensing design: The optimal sensor design can be computed by solving the optimization problem in eq. (6). The problem is combinatorial in nature, since it requires selecting, out of all the available sensors, the subset of cardinality $k$ that induces the smallest cost. In this section we propose a greedy algorithm, whose pseudo-code is given in Algorithm 2, that computes a (possibly approximate) solution to the problem in eq. (6). Our interest in this greedy algorithm is motivated by the fact that it is scalable (in Section IV we show that its complexity is linear in the number of available sensors) and provably close to the optimal solution of the problem in eq. (6) (we provide sub-optimality bounds in Section IV). Algorithm 2 first computes the matrices $\Theta_t$ ($t = 1, 2, \ldots, T$), which appear in the cost function in eq. (6) (line 1).
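The two steps of the design, the backward Riccati recursion of eq. (8) and the greedy loop of Algorithm 2, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes time-invariant system matrices and scalar-output sensors (so $V_t(\mathcal{S})$ is diagonal), and the function names `control_gains` and `greedy_sensing` are ours.

```python
import numpy as np

def control_gains(A, B, Q, R, T):
    """Backward Riccati recursion of eq. (8); returns gains K_t and weights Theta_t."""
    n = A.shape[0]
    N_next = np.zeros((n, n))                       # boundary condition N_{T+1} = 0
    K, Theta = [None] * T, [None] * T
    for t in reversed(range(T)):
        S = Q + N_next
        N_next = A.T @ np.linalg.inv(np.linalg.inv(S) + B @ np.linalg.inv(R) @ B.T) @ A
        M = B.T @ S @ B + R
        K[t] = -np.linalg.solve(M, B.T @ S @ A)
        Theta[t] = K[t].T @ M @ K[t]
    return K, Theta

def greedy_sensing(A, W, Sigma0, C, V, Theta, k):
    """Algorithm 2 (sketch): greedily add the sensor minimizing
    sum_t tr(Theta_t Sigma_{t|t}(S)), with Sigma_{t|t} from the
    Kalman covariance recursion (lines 6-10 of Algorithm 2)."""
    T = len(Theta)

    def cost(S):
        Sp = Sigma0                                 # Sigma_{1|0}
        total = 0.0
        for t in range(T):
            if S:
                Cs = np.vstack([C[i] for i in sorted(S)])
                Vs = np.diag([V[i] for i in sorted(S)])   # scalar sensors assumed
                Sf = np.linalg.inv(np.linalg.inv(Sp) + Cs.T @ np.linalg.inv(Vs) @ Cs)
            else:
                Sf = Sp                             # no measurement update
            total += np.trace(Theta[t] @ Sf)
            Sp = A @ Sf @ A.T + W                   # Sigma_{t+1|t}
        return total

    S = set()
    for _ in range(k):
        best = min(set(C) - S, key=lambda a: cost(S | {a}))
        S.add(best)
    return S, cost(S)
```

Each greedy round re-runs the covariance recursion once per candidate sensor, which matches the $O(k|\mathcal{V}|T n^{2.4})$ complexity discussed in Section IV.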
Note that these matrices are independent of the choice of sensors. The set of active sensors $\widehat{\mathcal{S}}$ is initialized to the empty set (line 2). The "while loop" in line 3 is executed $k$ times, and at each iteration a sensor is greedily added to the set of active sensors $\widehat{\mathcal{S}}$. In particular, the "for loop" in lines 4-12 computes the estimation covariance resulting from adding a sensor to the current active sensor set, together with the corresponding cost (line 11). Finally, the sensor inducing the smallest cost is selected (line 13) and added to the current set of active sensors (line 14).

2) Control policy design: The optimal control design is computed as in eq. (7), where the control policy matrices $K_1, K_2, \ldots, K_T$ are obtained from the recursion in eq. (8). In the following section we characterize the approximation and running-time performance of Algorithm 1.

IV. PERFORMANCE GUARANTEES FOR JOINT SENSING AND CONTROL DESIGN

We prove that Algorithm 1 is the first scalable algorithm for the joint sensing and control design Problem 1, and that it achieves a value for the LQG cost function in eq. (5) that is finitely close to the optimal. We start by introducing the notion of supermodularity ratio (Section IV-A), which will enable us to bound the sub-optimality gap of Algorithm 1 (Section IV-B).

A. Supermodularity ratio of monotone functions

We define the supermodularity ratio of monotone functions. We start with the notions of monotonicity and supermodularity.

Definition 2 (Monotonicity [20]). Consider any finite ground set $\mathcal{V}$. The set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ is non-increasing if and only if for any $\mathcal{A} \subseteq \mathcal{A}' \subseteq \mathcal{V}$, $f(\mathcal{A}) \geq f(\mathcal{A}')$.

Definition 3 (Supermodularity [20, Proposition 2.1]). Consider any finite ground set $\mathcal{V}$. The set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ is supermodular if and only if for any $\mathcal{A} \subseteq \mathcal{A}' \subseteq \mathcal{V}$ and $x \in \mathcal{V}$, $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\}) \geq f(\mathcal{A}') - f(\mathcal{A}' \cup \{x\})$.
In words, a set function $f$ is supermodular if and only if it satisfies the following intuitive diminishing returns property: for any $x \in \mathcal{V}$, the marginal drop $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})$ diminishes as $\mathcal{A}$ grows; equivalently, for any $\mathcal{A} \subseteq \mathcal{V}$ and $x \in \mathcal{V}$, the marginal drop $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})$ is non-increasing.

Definition 4 (Supermodularity ratio [21, Definition of elemental curvature on p. 5]). Consider any finite ground set $\mathcal{V}$, and a non-increasing set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$. We define the supermodularity ratio of $f$ as

$$\gamma_f = \min_{\mathcal{A} \subseteq \mathcal{V},\; x, x' \in \mathcal{V} \setminus \mathcal{A}} \frac{f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})}{f(\mathcal{A} \cup \{x'\}) - f[(\mathcal{A} \cup \{x'\}) \cup \{x\}]}.$$

In words, the supermodularity ratio of a monotone set function $f$ measures how far $f$ is from being supermodular. In particular, per Definition 4, the supermodularity ratio $\gamma_f$ takes values in $[0, 1]$, and:

• $\gamma_f = 1$ if and only if $f$ is supermodular, since if $\gamma_f = 1$, then Definition 4 implies $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\}) \geq f(\mathcal{A} \cup \{x'\}) - f[(\mathcal{A} \cup \{x'\}) \cup \{x\}]$, i.e., the drop $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})$ is non-increasing as new elements are added to $\mathcal{A}$.

• $\gamma_f < 1$ if and only if $f$ is approximately supermodular, in the sense that if $\gamma_f < 1$, then Definition 4 implies $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\}) \geq \gamma_f \left\{ f(\mathcal{A} \cup \{x'\}) - f[(\mathcal{A} \cup \{x'\}) \cup \{x\}] \right\}$, i.e., the drop $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})$ is approximately non-increasing as new elements are added to $\mathcal{A}$; specifically, the supermodularity ratio $\gamma_f$ captures how much one needs to discount the drop $f(\mathcal{A} \cup \{x'\}) - f[(\mathcal{A} \cup \{x'\}) \cup \{x\}]$ such that $f(\mathcal{A}) - f(\mathcal{A} \cup \{x\})$ remains greater than or equal to it.

We next use the notion of supermodularity ratio (Definition 4) to quantify the sub-optimality gap of Algorithm 1.
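On a small ground set, the minimization in Definition 4 can be brute-forced, which is a handy way to build intuition. The sketch below and its two toy set functions (a modular one, whose ratio is exactly 1, and one with a complementary sensor pair, whose ratio drops below 1) are illustrative constructions of ours, unrelated to the paper's $g(\mathcal{S})$.

```python
import itertools

def supermodularity_ratio(f, ground):
    """Brute-force gamma_f of Definition 4 for a non-increasing set function f:
    min over A subset of ground and distinct x, x' outside A of
    [f(A) - f(A+{x})] / [f(A+{x'}) - f(A+{x'}+{x})].
    Capped at 1, matching the range [0, 1] stated in the text."""
    ratio = 1.0
    for r in range(len(ground) + 1):
        for combo in itertools.combinations(ground, r):
            A = frozenset(combo)
            rest = [e for e in ground if e not in A]
            for x in rest:
                for xp in rest:
                    if xp == x:
                        continue
                    num = f(A) - f(A | {x})
                    den = f(A | {xp}) - f(A | {xp, x})
                    if den > 1e-12:            # skip degenerate zero-drop pairs
                        ratio = min(ratio, num / den)
    return ratio

# Modular (additive) drops: the ratio is exactly 1 (f is supermodular).
weights = {1: 3.0, 2: 1.0, 3: 2.0}
f_modular = lambda A: 10.0 - sum(weights[i] for i in A)
```

For `f_modular`, every numerator and denominator equals the same weight $w_x$, so the ratio is 1; a function where two elements together reduce the cost more than the sum of their individual reductions yields a ratio strictly below 1.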
B. Performance Analysis for Algorithm 1

We quantify Algorithm 1's running time, as well as its approximation performance, using the notion of supermodularity ratio introduced in Section IV-A. We conclude the section by showing that for appropriate LQG cost matrices $Q_1, Q_2, \ldots, Q_T$ and $R_1, R_2, \ldots, R_T$, Algorithm 1 achieves near-optimal approximation performance.

Theorem 2 (Performance of Algorithm 1). For any active sensor set $\mathcal{S} \subseteq \mathcal{V}$ and admissible control policies $u_{1:T}(\mathcal{S}) \triangleq \{u_1(\mathcal{S}), u_2(\mathcal{S}), \ldots, u_T(\mathcal{S})\}$, let $h[\mathcal{S}, u_{1:T}(\mathcal{S})]$ be Problem 1's cost function, i.e., $h[\mathcal{S}, u_{1:T}(\mathcal{S})] \triangleq \sum_{t=1}^{T} \mathbb{E}(\|x_{t+1}(\mathcal{S})\|^2_{Q_t} + \|u_t(\mathcal{S})\|^2_{R_t})$. Further define the following set function and scalar:

$$g(\mathcal{S}) \triangleq \min_{u_{1:T}(\mathcal{S})} h[\mathcal{S}, u_{1:T}(\mathcal{S})], \qquad g^\star \triangleq \min_{\mathcal{S} \subseteq \mathcal{V},\, |\mathcal{S}| \leq k,\; u_{1:T}(\mathcal{S})} h[\mathcal{S}, u_{1:T}(\mathcal{S})]. \qquad (11)$$

The following results hold true:

1) (Approximation quality) Algorithm 1 returns an active sensor set $\widehat{\mathcal{S}} \subset \mathcal{V}$ of cardinality $k$, and gain matrices $K_1, K_2, \ldots, K_T$, such that the cost $h[\widehat{\mathcal{S}}, u_{1:T}(\widehat{\mathcal{S}})]$ attained by the sensor set $\widehat{\mathcal{S}}$ and the corresponding control policies $u_{1:T}(\widehat{\mathcal{S}}) \triangleq \{K_1 \hat{x}_1(\widehat{\mathcal{S}}), \ldots, K_T \hat{x}_T(\widehat{\mathcal{S}})\}$ satisfies

$$\frac{h[\widehat{\mathcal{S}}, u_{1:T}(\widehat{\mathcal{S}})] - g^\star}{g(\emptyset) - g^\star} \leq \exp(-\gamma_g), \qquad (12)$$

where $\gamma_g$ is the supermodularity ratio of $g(\mathcal{S})$ in eq. (11).

2) (Running time) Algorithm 1 runs in $O(k |\mathcal{V}| T n^{2.4})$ time, where $n \triangleq \max_{t = 1, 2, \ldots, T}(n_t)$ is the maximum system size in eq. (1).

Theorem 2 ensures that Algorithm 1 is the first scalable algorithm for the sensing-constrained LQG control Problem 1. In particular, Algorithm 1's running time $O(k |\mathcal{V}| T n^{2.4})$ is linear both in the number of available sensors $|\mathcal{V}|$ and in the sensor set cardinality constraint $k$, as well as linear in the Kalman filter's running time across the time horizon $\{1, 2, \ldots, T\}$. Specifically, the contribution $n^{2.4} T$ in Algorithm 1's running time comes from the computational complexity of using the Kalman filter to compute the state estimation error covariances $\Sigma_{t|t}$ for each $t = 1, 2, \ldots, T$ [19, Appendix E].

Theorem 2 also guarantees that for a non-zero ratio $\gamma_g$, Algorithm 1 achieves a value for Problem 1 that is finitely close to the optimal. In particular, the bound in ineq. (12) improves as $\gamma_g$ increases, since it is decreasing in $\gamma_g$, and is characterized by the following extreme behaviors: for $\gamma_g = 1$, the bound in ineq. (12) is $e^{-1} \simeq 0.37$, which is the minimum for any $\gamma_g \in [0, 1]$, and hence the best bound on Algorithm 1's approximation performance among all $\gamma_g \in [0, 1]$ (ideally, the bound in ineq. (12) would be 0 for $\gamma_g = 1$, in which case Algorithm 1 would be exact, since it would imply $h[\widehat{\mathcal{S}}, u_{1:T}(\widehat{\mathcal{S}})] = g^\star$; however, even for supermodular functions, the best bound one can achieve in the worst case is $e^{-1}$ [22]); for $\gamma_g = 0$, ineq. (12) is uninformative, since it simplifies to $h[\widehat{\mathcal{S}}, u_{1:T}(\widehat{\mathcal{S}})] \leq g(\emptyset) = h[\emptyset, u_{1:T}(\emptyset)]$, which is trivially satisfied.¹

In the remainder of the section, we first prove that if the strict inequality $\sum_{t=1}^{T} \Theta_t \succ 0$ holds, where each $\Theta_t$ is defined as in eq. (8), then the ratio $\gamma_g$ in ineq. (12) is non-zero, and as a result Algorithm 1 achieves near-optimal approximation performance (Theorem 3). Then, we prove that the strict inequality $\sum_{t=1}^{T} \Theta_t \succ 0$ holds in all LQG control problem instances where a zero controller would result in suboptimal system behavior and where, as a result, LQG control design (through solving Problem 1) is necessary to achieve the desired system performance (Theorem 4).

Theorem 3 (Lower bound for the supermodularity ratio $\gamma_g$). Let $\Theta_t$ for all $t = 1, 2, \ldots, T$ be defined as in eq. (8), let $g(\mathcal{S})$ be defined as in eq. (11), and for any sensor $i \in \mathcal{V}$, let $\bar{C}_{i,t}$ be the normalized measurement matrix $V_{i,t}^{-1/2} C_{i,t}$. If $\sum_{t=1}^{T} \Theta_t \succ 0$, the supermodularity ratio $\gamma_g$ is non-zero. In addition, if we consider for simplicity that the Frobenius norm of each $\bar{C}_{i,t}$ is 1, i.e., $\mathrm{tr}(\bar{C}_{i,t} \bar{C}_{i,t}^T) = 1$, and that $\mathrm{tr}[\Sigma_{t|t}(\emptyset)] \leq \lambda^2_{\max}[\Sigma_{t|t}(\emptyset)]$, then $\gamma_g$ is lower bounded as

$$\gamma_g \geq \frac{\lambda_{\min}\left(\sum_{t=1}^{T} \Theta_t\right)}{\lambda_{\max}\left(\sum_{t=1}^{T} \Theta_t\right)} \cdot \frac{\min_{t \in \{1, 2, \ldots, T\}} \lambda^2_{\min}[\Sigma_{t|t}(\mathcal{V})]}{\max_{t \in \{1, 2, \ldots, T\}} \lambda^2_{\max}[\Sigma_{t|t}(\emptyset)]} \cdot \frac{1 + \min_{i \in \mathcal{V},\, t \in \{1, 2, \ldots, T\}} \lambda_{\min}[\bar{C}_{i,t} \Sigma_{t|t}(\mathcal{V}) \bar{C}_{i,t}^T]}{2 + \max_{i \in \mathcal{V},\, t \in \{1, 2, \ldots, T\}} \lambda_{\max}[\bar{C}_{i,t} \Sigma_{t|t}(\emptyset) \bar{C}_{i,t}^T]}. \qquad (13)$$

¹The inequality $h[\widehat{\mathcal{S}}, u_{1:T}(\widehat{\mathcal{S}})] \leq h[\emptyset, u_{1:T}(\emptyset)]$ simply states that a control policy informed by the active sensor set $\mathcal{S}$ has better performance than a policy that does not use any sensor; for a more formal proof we refer the reader to Appendix B.

The supermodularity ratio bound in ineq. (13) suggests two cases under which $\gamma_g$ can increase, and correspondingly the performance bound of Algorithm 1 in ineq. (12) can improve:

a) Case 1, where $\gamma_g$'s bound in ineq. (13) increases: When the fraction $\lambda_{\min}(\sum_{t=1}^{T} \Theta_t)/\lambda_{\max}(\sum_{t=1}^{T} \Theta_t)$ increases to 1, the right-hand side of ineq. (13) increases. Equivalently, the right-hand side of ineq. (13) increases when, on average, all the directions $x_t^{(i)} - \hat{x}_t^{(i)}$ of the estimation error $x_t - \hat{x}_t = (x_t^{(1)} - \hat{x}_t^{(1)}, x_t^{(2)} - \hat{x}_t^{(2)}, \ldots, x_t^{(n_t)} - \hat{x}_t^{(n_t)})$ become equally important in selecting the active sensor set. To see this, consider for example that $\lambda_{\max}(\Theta_t) = \lambda_{\min}(\Theta_t) = \lambda$; then, the cost function in eq. (6) that Algorithm 1 minimizes to select the active sensor set becomes

$$\sum_{t=1}^{T} \mathrm{tr}[\Theta_t \Sigma_{t|t}(\mathcal{S})] = \lambda \sum_{t=1}^{T} \mathbb{E}\left[ \|x_t - \hat{x}_t(\mathcal{S})\|^2_2 \right] = \lambda \sum_{t=1}^{T} \sum_{i=1}^{n_t} \mathbb{E}\left[ |x_t^{(i)} - \hat{x}_t^{(i)}(\mathcal{S})|^2 \right].$$
Overall, it is easier for Algorithm 1 to approximate a solution to Problem 1 as the cost function in eq. (6) approaches the cost function of the standard sensor selection problems, in which one minimizes the total estimation covariance as in eq. (10).

b) Case 2, where $\gamma_g$'s bound in ineq. (13) increases: When either the numerators of the last two fractions on the right-hand side of ineq. (13) increase, or their denominators decrease, the right-hand side of ineq. (13) increases. In particular, the numerators of the last two fractions capture the estimation quality when all available sensors in $\mathcal{V}$ are used, via terms of the form $\lambda_{\min}[\Sigma_{t|t}(\mathcal{V})]$ and $\lambda_{\min}[\bar{C}_{i,t} \Sigma_{t|t}(\mathcal{V}) \bar{C}_{i,t}^T]$. Interestingly, this suggests that the right-hand side of ineq. (13) increases when the available sensors in $\mathcal{V}$ are inefficient in achieving low estimation error, that is, when terms of the form $\lambda_{\min}[\Sigma_{t|t}(\mathcal{V})]$ and $\lambda_{\min}[\bar{C}_{i,t} \Sigma_{t|t}(\mathcal{V}) \bar{C}_{i,t}^T]$ increase. Similarly, the denominators of the last two fractions capture the estimation quality when no sensors are used, via terms of the form $\lambda_{\max}[\Sigma_{t|t}(\emptyset)]$ and $\lambda_{\max}[\bar{C}_{i,t} \Sigma_{t|t}(\emptyset) \bar{C}_{i,t}^T]$. This suggests that the right-hand side of ineq. (13) increases when the measurement noise increases.

We next give a control-level condition equivalent to Theorem 3's condition $\sum_{t=1}^{T} \Theta_t \succ 0$ for a non-zero ratio $\gamma_g$.

Fig. 1. Examples of applications of the proposed sensing-constrained LQG control framework: (a) sensing-constrained formation control and (b) resource-constrained robot navigation.

Theorem 4 (Control-level condition for near-optimal sensor selection).
Consider the LQG problem in which, for any time $t = 1, 2, \ldots, T$, the state $x_t$ is known to each controller $u_t$ and the process noise $w_t$ is zero, i.e., the optimization problem

$$\min_{u_{1:T}} \sum_{t=1}^{T} \left[ \|x_{t+1}\|^2_{Q_t} + \|u_t(x_t)\|^2_{R_t} \right] \Big|_{\Sigma_{t|t} = W_t = 0}. \qquad (14)$$

Let $A_t$ be invertible for all $t = 1, 2, \ldots, T$; the strict inequality $\sum_{t=1}^{T} \Theta_t \succ 0$ holds if and only if for all non-zero initial conditions $x_1$,

$$0 \notin \arg\min_{u_{1:T}} \sum_{t=1}^{T} \left[ \|x_{t+1}\|^2_{Q_t} + \|u_t(x_t)\|^2_{R_t} \right] \Big|_{\Sigma_{t|t} = W_t = 0}.$$

Theorem 4 suggests that Theorem 3's sufficient condition $\sum_{t=1}^{T} \Theta_t \succ 0$ for a non-zero ratio $\gamma_g$ holds if and only if, for any non-zero initial condition $x_1$, the all-zeroes control policy $u_{1:T} = (0, 0, \ldots, 0)$ is suboptimal for the noiseless, perfect state-information LQG problem in eq. (14).

Overall, Algorithm 1 is the first scalable algorithm for Problem 1, and, for the LQG control problem instances of interest, in which a zero controller would result in suboptimal system behavior and LQG control design is therefore necessary to achieve the desired system performance, it achieves close-to-optimal approximation performance.

V. NUMERICAL EXPERIMENTS

We consider two application scenarios for the proposed sensing-constrained LQG control framework: sensing-constrained formation control and resource-constrained robot navigation.
We present a Monte Carlo analysis for both scenarios, which demonstrates that (i) the proposed sensor selection strategy is near-optimal, and in particular the resulting LQG cost (tracking performance) matches the optimal selection in all tested instances for which the optimal selection could be computed via a brute-force approach; (ii) a more naive selection, which attempts to minimize the state estimation covariance [15] (rather than the LQG cost), has degraded LQG tracking performance, often comparable to a random selection; and (iii) in the considered instances, a clever selection of a small subset of sensors can ensure an LQG cost close to the one obtained by using all available sensors, hence providing an effective alternative for control under sensing constraints [23].

VI. SENSING-CONSTRAINED FORMATION CONTROL

Simulation setup. The first application scenario is illustrated in Fig. 1(a). A team of $n$ agents (blue triangles) moves in a 2D scenario. At time $t = 1$, the agents are randomly deployed in a 10m × 10m square, and their objective is to reach a target formation shape (red stars); in the example of Fig. 1(a) the desired formation has a hexagonal shape, while in general, for a formation of $n$ agents, the desired formation is an equilateral polygon with $n$ vertices. Each robot is modeled as a double integrator, with state $x_i = [p_i \; v_i]^T \in \mathbb{R}^4$ ($p_i$ is the 2D position of agent $i$, while $v_i$ is its velocity), and can control its own acceleration $u_i \in \mathbb{R}^2$; the process noise is chosen as the diagonal matrix $W = \mathrm{diag}(10^{-2}, 10^{-2}, 10^{-4}, 10^{-4})$. Each robot $i$ is equipped with a GPS receiver, which can measure the agent's position $p_i$ with covariance $V_{\mathrm{gps},i} = 2 \cdot I_2$. Moreover, the agents are equipped with lidar sensors, allowing each agent $i$ to measure the relative position of another agent $j$ with covariance $V_{\mathrm{lidar},ij} = 0.1 \cdot I_2$. The agents have very limited onboard resources, hence they can only activate a subset of $k$ sensors.
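The formation model above can be assembled as follows. This is our own sketch of the setup: the double-integrator discretization and the sampling time `dt` are assumptions (the text does not state them), while the noise covariances match the values given above.

```python
import numpy as np

def formation_system(n_agents, dt=1.0):
    """Block-diagonal double-integrator dynamics for the formation example.
    State per agent: [p_x, p_y, v_x, v_y]; input per agent: 2D acceleration.
    The sampling time dt is an assumption, not stated in the text."""
    Ai = np.block([[np.eye(2), dt * np.eye(2)],
                   [np.zeros((2, 2)), np.eye(2)]])
    Bi = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])
    Wi = np.diag([1e-2, 1e-2, 1e-4, 1e-4])        # process noise, as in the text
    return (np.kron(np.eye(n_agents), Ai),
            np.kron(np.eye(n_agents), Bi),
            np.kron(np.eye(n_agents), Wi))

def gps_sensor(i, n_agents):
    """GPS on agent i: measures p_i with covariance 2 * I_2."""
    C = np.zeros((2, 4 * n_agents))
    C[:, 4 * i:4 * i + 2] = np.eye(2)
    return C, 2.0 * np.eye(2)

def lidar_sensor(i, j, n_agents):
    """Lidar on agent i: measures p_j - p_i with covariance 0.1 * I_2."""
    C = np.zeros((2, 4 * n_agents))
    C[:, 4 * j:4 * j + 2] = np.eye(2)
    C[:, 4 * i:4 * i + 2] = -np.eye(2)
    return C, 0.1 * np.eye(2)
```

Each sensor is returned as a pair $(C_{i,t}, V_{i,t})$ in the form of eq. (2), so the resulting candidate set can be fed directly to a sensing-design routine such as Algorithm 2.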
Hence, the goal is to select the subset of k sensors, as well as to compute the control policy, that ensure the best tracking performance, as measured by the LQG objective. For our tests, we consider two problem setups. In the first setup, named homogeneous formation control, the LQG weight matrix Q is a block-diagonal matrix with 4 × 4 blocks, with each block i chosen as Q_i = 0.1 · I_4; since each 4 × 4 block of Q weights the tracking error of a robot, in the homogeneous case the tracking error of all agents is equally important. In the second setup, named heterogeneous formation control, the matrix Q is chosen as above, except for one of the agents, say robot 1, for which we choose Q_1 = 10 · I_4; this setup models the case in which each agent has a different role or importance, hence one weights the tracking errors of the agents differently. In both cases the matrix R is chosen to be the identity matrix. The simulation is carried out over T time steps, and T is also chosen as the LQG horizon. Results are averaged over 100 Monte Carlo runs; at each run we randomize the initial estimation covariance Σ_{1|0}.

Compared techniques. We compare five techniques. All techniques use an LQG-based estimator and controller, and they differ only in the selection of the sensors used. The first approach is the optimal sensor selection, denoted as optimal, which attains the minimum of the cost function in eq. (6), and which we compute by enumerating all possible subsets; this brute-force approach is only viable when the number of available sensors is small. The second approach is a pseudo-random sensor selection, denoted as random∗, which selects all the GPS measurements and a random subset of the lidar measurements; note that we do not consider a fully random selection, since in practice this often leads to an unobservable system, hence causing divergence of the LQG cost.
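The two weight choices above can be assembled as block-diagonal matrices; the following sketch is illustrative (the agent count n and the 0-based index of the heavily weighted agent are our own parameters):

```python
import numpy as np

def formation_weights(n, heavy_agent=None):
    """Block-diagonal LQG weight Q with one 4x4 block per agent.

    Homogeneous: every block is 0.1*I_4. Heterogeneous: agent `heavy_agent`
    gets 10*I_4 instead (as in the text, where robot 1 is weighted by 10)."""
    w = 0.1 * np.ones(n)
    if heavy_agent is not None:
        w[heavy_agent] = 10.0
    Q = np.kron(np.diag(w), np.eye(4))   # one 4x4 block per agent
    R = np.eye(2 * n)                    # identity control weight, as in the text
    return Q, R

Q_hom, _ = formation_weights(4)                    # homogeneous setup
Q_het, R = formation_weights(4, heavy_agent=0)     # heterogeneous setup
```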
The third approach, denoted as logdet, selects sensors so as to minimize the average log det of the estimation covariance over the horizon; this approach resembles [15] and is agnostic to the control task. The fourth approach is the proposed sensor selection strategy, described in Algorithm 2, and is denoted as s-LQG. Finally, we also report the LQG performance when all sensors are selected. This approach is denoted as allSensors.

Results. The results of our numerical analysis are reported in Fig. 2. When not specified otherwise, we consider a formation of n = 4 agents, which can only use a total of k = 6 sensors, and a control horizon T = 20. Fig. 2(a) shows the LQG cost attained by the compared techniques for increasing control horizon and for the homogeneous case. We note that, in all tested instances, the proposed approach s-LQG matches the optimal selection optimal, and both approaches are relatively close to allSensors, which selects all the available sensors (all GPS and all lidar measurements). On the other hand, logdet leads to worse tracking performance, and it is often close to the pseudo-random selection random∗. These considerations are confirmed by the heterogeneous setup, shown in Fig. 2(b). In this case the separation between the proposed approach and logdet becomes even larger; the intuition here is that the heterogeneous case rewards the tracking errors at different agents differently, hence while logdet attempts to equally reduce the estimation error across the formation, the proposed approach s-LQG selects sensors in a task-oriented fashion, since the matrices Θ_t for all t = 1, 2, …, T in the cost function in eq. (6) incorporate the LQG weight matrices. Fig. 2(c) shows the LQG cost attained by the compared techniques for increasing number of selected sensors k and for the homogeneous case. We note that for increasing number of sensors all techniques converge to allSensors (the entire ground set is selected).
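The difference between the control-agnostic logdet score and the task-aware score optimized by s-LQG can be made concrete with a small sketch. This is our own simplified rendering, assuming time-invariant dynamics and sensors: the paper's Algorithm 2 greedily optimizes the LQG-weighted cost of eq. (6), whereas the logdet baseline scores a sensor set by the volume of the estimation covariance alone:

```python
import numpy as np

def cov_trajectory(A, W, Sigma0, sensors, T):
    """Filtered covariances over the horizon for a fixed (C_i, V_i) sensor list."""
    info_meas = sum(C.T @ np.linalg.inv(V) @ C for C, V in sensors)
    Sigma_pred, out = Sigma0, []
    for _ in range(T):
        Sigma = np.linalg.inv(np.linalg.inv(Sigma_pred) + info_meas)  # measurement update
        out.append(Sigma)
        Sigma_pred = A @ Sigma @ A.T + W                              # time update
    return out

def logdet_score(A, W, Sigma0, sensors, T):
    """Control-agnostic: average log det of the covariance (the logdet baseline)."""
    return float(np.mean([np.linalg.slogdet(S)[1]
                          for S in cov_trajectory(A, W, Sigma0, sensors, T)]))

def lqg_score(A, W, Sigma0, sensors, T, Thetas):
    """Task-aware: sum_t tr(Theta_t * Sigma_{t|t}), the sensor-dependent part of eq. (6)."""
    return float(sum(np.trace(Th @ S)
                     for Th, S in zip(Thetas, cov_trajectory(A, W, Sigma0, sensors, T))))

def greedy(ground, k, score):
    """Greedily add the sensor index that most decreases `score` (the shape of Algorithm 2)."""
    sel = []
    for _ in range(k):
        rest = [i for i in range(len(ground)) if i not in sel]
        sel.append(min(rest, key=lambda i: score(sel + [i])))
    return set(sel)
```

With a Θ_t that weights only part of the state, the two scores can prefer different sensors, which is exactly the task-oriented behavior discussed above.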
As in the previous case, the proposed approach s-LQG matches the optimal selection optimal. Fig. 2(d) shows the same statistics for the heterogeneous case. We note that in this case logdet is inferior to s-LQG even for small k. Moreover, an interesting fact is that s-LQG matches allSensors already for k = 7, meaning that the LQG performance of the sensing-constrained setup is indistinguishable from the one using all sensors; intuitively, in the heterogeneous case, adding more sensors may have marginal impact on the LQG cost (e.g., if the cost rewards a small tracking error for robot 1, it may be of little value to take a lidar measurement between robots 3 and 4). This further stresses the importance of the proposed framework as a parsimonious way to control a system with minimal resources. Fig. 2(e) and Fig. 2(f) show the LQG cost attained by the compared techniques for increasing number of agents, in the homogeneous and heterogeneous case, respectively. To ensure observability, we consider k = round(1.5 n), i.e., we select a number of sensors 50% larger than the smallest set of sensors that can make the system observable. We note that optimal quickly becomes intractable to compute, hence we omit its values beyond n = 4. In both figures, the main observation is that the separation among the techniques increases with the number of agents, since the set of available sensors quickly grows with n.
Interestingly, in the heterogeneous case s-LQG remains relatively close to allSensors, implying that for the purpose of LQG control, using a cleverly selected small subset of sensors still ensures excellent tracking performance.

Fig. 2. LQG cost for increasing (a)-(b) control horizon T, (c)-(d) number of selected sensors k, and (e)-(f) number of agents n. Statistics are reported for the homogeneous formation control setup (left column), and the heterogeneous setup (right column). Results are averaged over 100 Monte Carlo runs.

VII. RESOURCE-CONSTRAINED ROBOT NAVIGATION

Simulation setup. The second application scenario is illustrated in Fig. 1(b). An unmanned aerial vehicle (UAV) moves in a 3D scenario, starting from a randomly selected initial location. The objective of the UAV is to land; more specifically, it has to reach the position [0, 0, 0] with zero velocity. The UAV is modeled as a double integrator, with state x = [p  v]^T ∈ R^6 (p is the 3D position of the UAV, while v is its velocity), and can control its own acceleration u ∈ R^3; the process noise is chosen as W = I_6. The UAV is equipped with multiple sensors.
It has an on-board GPS receiver, measuring the UAV position p with covariance 2 · I_3, and an altimeter, measuring only the last component of p (the altitude) with standard deviation 0.5m. Moreover, the UAV can use a stereo camera to measure the relative position of ℓ landmarks on the ground; for the sake of the numerical example, we assume the location of each landmark to be known only approximately, and we associate to each landmark an uncertainty covariance (red ellipsoids in Fig. 1(b)), which is randomly generated at the beginning of each run.

Fig. 3. LQG cost for increasing (a) control horizon T, and (b) number of selected sensors k. Statistics are reported for the heterogeneous setup. Results are averaged over 100 Monte Carlo runs.

The UAV has limited on-board resources, hence it can only activate a subset of k sensors. For instance, the resource constraints may be due to the power consumption of the GPS and the altimeter, or to computational constraints that prevent running multiple object-detection algorithms to detect all landmarks on the ground. Similarly to the previous case, we phrase the problem as a sensing-constrained LQG problem, and we use Q = diag([1e-3, 1e-3, 10, 1e-3, 1e-3, 10]) and R = I_3. Note that the structure of Q reflects the fact that during landing we are particularly interested in controlling the vertical position and the vertical velocity (entries with larger weight in Q), while we are less interested in accurately controlling the horizontal position and velocity (assuming a sufficiently large landing site).
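The UAV's sensor suite can be encoded as a ground set of (C_i, V_i) pairs; the following sketch mirrors the setup above, where the number of landmarks, the random seed, and the way the random landmark covariances are generated are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_landmarks = 5                                   # the number of landmarks is illustrative
pos = np.hstack([np.eye(3), np.zeros((3, 3))])    # selects the position p from x = [p; v]

sensors = []
sensors.append(("gps", pos, 2.0 * np.eye(3)))                # GPS: position, covariance 2*I_3
alt = np.zeros((1, 6)); alt[0, 2] = 1.0
sensors.append(("altimeter", alt, np.array([[0.5 ** 2]])))   # altitude, std 0.5 m
for j in range(n_landmarks):
    M = rng.normal(size=(3, 3))
    V_land = 0.1 * np.eye(3) + M @ M.T            # random landmark uncertainty (illustrative)
    sensors.append((f"landmark{j}", pos, V_land))

Q = np.diag([1e-3, 1e-3, 10, 1e-3, 1e-3, 10])    # weight vertical position/velocity, as in the text
R = np.eye(3)
```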
In the following, we present results averaged over 100 Monte Carlo runs; in each run, we randomize the covariances describing the landmark position uncertainty.

Compared techniques. We consider the five techniques discussed in the previous section. As in the formation control case, the pseudo-random selection random∗ always includes the GPS measurement (which alone ensures observability) and a random selection of the other available sensors.

Results. The results of our numerical analysis are reported in Fig. 3. When not specified otherwise, we consider a total of k = 3 sensors to be selected, and a control horizon T = 20. Fig. 3(a) shows the LQG cost attained by the compared techniques for increasing control horizon. For visualization purposes we plot the cost normalized by the horizon, which makes the differences among the techniques more visible. Similarly to the formation control example, s-LQG matches the optimal selection optimal, while logdet and random∗ exhibit suboptimal performance. Fig. 3(b) shows the LQG cost attained by the compared techniques for increasing number of selected sensors k. Clearly, all techniques converge to allSensors for increasing k, but in the regime in which few sensors are used, s-LQG still outperforms the alternative sensor selection schemes, and matches in all cases the optimal selection optimal.

VIII. CONCLUDING REMARKS

In this paper, we introduced the sensing-constrained LQG control Problem 1, which is central in modern control applications that range from large-scale networked systems to miniaturized robotic networks. While the computation of the optimal sensing strategy is intractable, we provided the first scalable algorithm for Problem 1, Algorithm 1, and, under mild conditions on the system and LQG matrices, proved that Algorithm 1 computes a near-optimal sensing strategy with provable sub-optimality guarantees.
To this end, we showed that a separation principle holds, which allows the design of sensing, estimation, and control policies in isolation. We motivated the importance of the sensing-constrained LQG Problem 1, and demonstrated the effectiveness of Algorithm 1, by considering two application scenarios: sensing-constrained formation control, and resource-constrained robot navigation.

APPENDIX A: PRELIMINARY FACTS

This appendix contains a set of lemmata that will be used to support the proofs in this paper (Appendices B–F).

Lemma 1 ([24, Proposition 8.5.5]). Consider two positive definite matrices M_1 and M_2. If M_1 ⪰ M_2, then M_2^{−1} ⪰ M_1^{−1}.

Lemma 2 (Trace inequality [24, Proposition 8.4.13]). Consider a symmetric matrix A, and a positive semi-definite matrix B of appropriate dimension. Then,

λ_min(A) tr(B) ≤ tr(AB) ≤ λ_max(A) tr(B).

Lemma 3 (Woodbury identity [24, Corollary 2.8.8]). Consider matrices A, C, U, and V of appropriate dimensions, such that A, C, and A + UCV are invertible. Then,

(A + UCV)^{−1} = A^{−1} − A^{−1} U (C^{−1} + V A^{−1} U)^{−1} V A^{−1}.

Lemma 4 ([24, Proposition 8.5.12]). Consider two symmetric matrices A_1 and A_2, and a positive semi-definite matrix B. If A_1 ⪰ A_2, then tr(A_1 B) ≥ tr(A_2 B).

Lemma 5 ([19, Appendix E]). For any sensor set S ⊆ V, and for all t = 1, 2, …, T, let x̂_t(S) be the Kalman estimator of the state x_t, and let Σ_{t|t}(S) ≜ E[(x̂_t(S) − x_t)(x̂_t(S) − x_t)^T] be x̂_t(S)'s error covariance. Then, Σ_{t|t}(S) is the solution of the Kalman filtering recursion

Σ_{t|t}(S) = [Σ_{t|t−1}(S)^{−1} + C_t(S)^T V_t(S)^{−1} C_t(S)]^{−1},
Σ_{t+1|t}(S) = A_t Σ_{t|t}(S) A_t^T + W_t,   (15)

with boundary condition Σ_{1|0}(S) = Σ_{1|0}.

Lemma 6. For any sensor set S ⊆ V, let Σ_{1|1}(S) be defined as in eq. (15), and consider two sensor sets S_1, S_2 ⊆ V.
If S_1 ⊆ S_2, then Σ_{1|1}(S_1) ⪰ Σ_{1|1}(S_2).

Proof of Lemma 6: Let D = S_2 \ S_1, and observe that for all t = 1, 2, …, T, the notation in Definition 1 implies

C_t(S_2)^T V_t(S_2)^{−1} C_t(S_2) = ∑_{i∈S_2} C_{i,t}^T V_{i,t}^{−1} C_{i,t}
 = ∑_{i∈S_1} C_{i,t}^T V_{i,t}^{−1} C_{i,t} + ∑_{i∈D} C_{i,t}^T V_{i,t}^{−1} C_{i,t}
 ⪰ ∑_{i∈S_1} C_{i,t}^T V_{i,t}^{−1} C_{i,t} = C_t(S_1)^T V_t(S_1)^{−1} C_t(S_1).   (16)

Therefore, Lemma 1 and ineq. (16) imply

Σ_{1|1}(S_2) = [Σ_{1|0}^{−1} + C_1(S_2)^T V_1(S_2)^{−1} C_1(S_2)]^{−1} ⪯ [Σ_{1|0}^{−1} + C_1(S_1)^T V_1(S_1)^{−1} C_1(S_1)]^{−1} = Σ_{1|1}(S_1).

Lemma 7. Let Σ_{t|t} be defined as in eq. (15) with boundary condition Σ_{1|0}; similarly, let Σ̄_{t|t} be defined as in eq. (15) with boundary condition Σ̄_{1|0}. If Σ_{t|t} ⪯ Σ̄_{t|t}, then Σ_{t+1|t} ⪯ Σ̄_{t+1|t}.

Proof of Lemma 7: From Σ_{t|t} ⪯ Σ̄_{t|t}, it follows that A_t Σ_{t|t} A_t^T ⪯ A_t Σ̄_{t|t} A_t^T. Then, from eq. (15),

Σ_{t+1|t} = A_t Σ_{t|t} A_t^T + W_t ⪯ A_t Σ̄_{t|t} A_t^T + W_t = Σ̄_{t+1|t}.

Lemma 8. Let Σ_{t|t−1} be defined as in eq. (15) with boundary condition Σ_{1|0}; similarly, let Σ̄_{t|t−1} be defined as in eq. (15) with boundary condition Σ̄_{1|0}. If Σ_{t|t−1} ⪯ Σ̄_{t|t−1}, then Σ_{t|t} ⪯ Σ̄_{t|t}.

Proof of Lemma 8: Lemma 1 and the condition Σ_{t|t−1} ⪯ Σ̄_{t|t−1} imply Σ_{t|t−1}^{−1} + C_t^T V_t^{−1} C_t ⪰ Σ̄_{t|t−1}^{−1} + C_t^T V_t^{−1} C_t, which, applying Lemma 1 again, implies that from eq. (15) it is

Σ_{t|t} = (Σ_{t|t−1}^{−1} + C_t^T V_t^{−1} C_t)^{−1} ⪯ (Σ̄_{t|t−1}^{−1} + C_t^T V_t^{−1} C_t)^{−1} = Σ̄_{t|t}.

Corollary 1. Let Σ_{t|t} be defined as in eq. (15) with boundary condition Σ_{1|0}; similarly, let Σ̄_{t|t} be defined as in eq. (15) with boundary condition Σ̄_{1|0}. If Σ_{t|t} ⪯ Σ̄_{t|t}, then Σ_{t+i|t+i} ⪯ Σ̄_{t+i|t+i} for any positive integer i.
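The recursion of eq. (15), which Lemmata 6–8 reason about, is straightforward to render in code. The sketch below is our own minimal Python rendering (assuming time-invariant sensor matrices for brevity); it propagates the filtering covariance in information form and numerically sanity-checks the Loewner-order monotonicity asserted by Lemma 6:

```python
import numpy as np

def filtered_covariances(A_list, W_list, Sigma_10, sensors):
    """Eq. (15): the measurement update in information form, then the time update.

    `sensors` lists the (C_i, V_i) pairs of the active set S; the boundary
    condition is Sigma_{1|0} = Sigma_10."""
    info_meas = sum(C.T @ np.linalg.inv(V) @ C for C, V in sensors)
    Sigma_pred, out = Sigma_10, []
    for A, W in zip(A_list, W_list):
        Sigma = np.linalg.inv(np.linalg.inv(Sigma_pred) + info_meas)  # Sigma_{t|t}
        out.append(Sigma)
        Sigma_pred = A @ Sigma @ A.T + W                              # Sigma_{t+1|t}
    return out

# Lemma 6 sanity check: S1 subset of S2 implies Sigma_{1|1}(S1) - Sigma_{1|1}(S2) is PSD
rng = np.random.default_rng(1)
A_list, W_list = [np.eye(3)] * 2, [0.1 * np.eye(3)] * 2
s1 = [(rng.normal(size=(2, 3)), np.eye(2))]
s2 = s1 + [(rng.normal(size=(1, 3)), np.array([[0.5]]))]
diff = filtered_covariances(A_list, W_list, np.eye(3), s1)[0] \
     - filtered_covariances(A_list, W_list, np.eye(3), s2)[0]
assert np.min(np.linalg.eigvalsh(diff)) >= -1e-9
```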
Proof of Corollary 1: If Σ_{t|t} ⪯ Σ̄_{t|t}, from Lemma 7 we get Σ_{t+1|t} ⪯ Σ̄_{t+1|t}, which, from Lemma 8, implies Σ_{t+1|t+1} ⪯ Σ̄_{t+1|t+1}. Repeating the previous argument another (i − 1) times completes the proof.

Corollary 2. Let Σ_{t|t} be defined as in eq. (15) with boundary condition Σ_{1|0}; similarly, let Σ̄_{t|t} be defined as in eq. (15) with boundary condition Σ̄_{1|0}. If Σ_{t|t} ⪯ Σ̄_{t|t}, then Σ_{t+i|t+i−1} ⪯ Σ̄_{t+i|t+i−1} for any positive integer i.

Proof of Corollary 2: If Σ_{t|t} ⪯ Σ̄_{t|t}, from Corollary 1 we get Σ_{t+i−1|t+i−1} ⪯ Σ̄_{t+i−1|t+i−1}, which, from Lemma 7, implies Σ_{t+i|t+i−1} ⪯ Σ̄_{t+i|t+i−1}.

APPENDIX B: PROOF OF THEOREM 1

The proof of Theorem 1 follows from the following lemma.

Lemma 9. For any active sensor set S ⊆ V, and admissible control policies u_{1:T}(S) ≜ {u_1(S), u_2(S), …, u_T(S)}, let h[S, u_{1:T}(S)] be Problem 1's cost function, i.e.,

h[S, u_{1:T}(S)] ≜ ∑_{t=1}^{T} E( ‖x_{t+1}(S)‖²_{Q_t} + ‖u_t(S)‖²_{R_t} );

further, define the set function

g(S) ≜ min_{u_{1:T}(S)} h[S, u_{1:T}(S)].

Consider any sensor set S ⊆ V, and let u*_{1:T,S} be the vector of control policies (K_1 x̂_{1,S}, K_2 x̂_{2,S}, …, K_T x̂_{T,S}). Then u*_{1:T,S} is an optimal control policy:

u*_{1:T,S} ∈ arg min_{u_{1:T}(S)} h[S, u_{1:T}(S)],   (17)

i.e., g(S) = h[S, u*_{1:T,S}], and in particular, u*_{1:T,S} attains a (sensor-dependent) LQG cost equal to

g(S) = E(‖x_1‖²_{N_1}) + ∑_{t=1}^{T} { tr[Θ_t Σ_{t|t}(S)] + tr(W_t S_t) }.   (18)

Proof of Lemma 9: Let h_t[S, u_{t:T}(S)] be the LQG cost in Problem 1 from time t up to time T, i.e.,

h_t[S, u_{t:T}(S)] ≜ ∑_{k=t}^{T} E( ‖x_{k+1}(S)‖²_{Q_k} + ‖u_k(S)‖²_{R_k} ),

and define g_t(S) ≜ min_{u_{t:T}(S)} h_t[S, u_{t:T}(S)]. Clearly, g_1(S) matches the LQG cost in eq. (18).
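The gains K_t and weights Θ_t, N_t appearing in Lemma 9 (the notation of eq. (8)) can be generated by a backward recursion; the sketch below is our own reconstruction from the derivation that follows, with S_t = Q_t + N_{t+1} and the convention N_{T+1} = 0, and is not code taken from the paper:

```python
import numpy as np

def lqg_backward(A_list, B_list, Q_list, R_list):
    """Backward recursion for the eq. (8) quantities, as used in the proof of Lemma 9:
    S_t = Q_t + N_{t+1},  M_t = B_t^T S_t B_t + R_t,
    K_t = -M_t^{-1} B_t^T S_t A_t,  Theta_t = K_t^T M_t K_t,  N_t = A_t^T S_t A_t - Theta_t."""
    T = len(A_list)
    n = A_list[0].shape[0]
    N = np.zeros((n, n))                       # boundary condition N_{T+1} = 0
    K_all, Theta_all = [None] * T, [None] * T
    for t in reversed(range(T)):
        S = Q_list[t] + N
        M = B_list[t].T @ S @ B_list[t] + R_list[t]
        K = -np.linalg.solve(M, B_list[t].T @ S @ A_list[t])
        Theta = K.T @ M @ K
        N = A_list[t].T @ S @ A_list[t] - Theta
        K_all[t], Theta_all[t] = K, Theta
    return K_all, Theta_all, N                 # the final N is N_1
```

As a quick sanity check on a scalar horizon-one problem (A = B = Q = R = 1), minimizing (x + u)² + u² gives u* = −x/2 and optimal cost x²/2, matching K_1 = −1/2 and N_1 = 1/2 from the recursion.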
We complete the proof inductively: we first prove Lemma 9 for t = T, and then for any other t ∈ {1, 2, …, T − 1}. To this end, we use the following observation: given any sensor set S, and any time t ∈ {1, 2, …, T},

g_t(S) = min_{u_t(S)} [ E( ‖x_{t+1}(S)‖²_{Q_t} + ‖u_t(S)‖²_{R_t} ) + g_{t+1}(S) ],   (19)

with boundary condition g_{T+1}(S) = 0. In particular, eq. (19) holds since

g_t(S) = min_{u_t(S)} { E( ‖x_{t+1}(S)‖²_{Q_t} + ‖u_t(S)‖²_{R_t} ) + min_{u_{t+1:T}(S)} h_{t+1}[S, u_{t+1:T}(S)] },

where one recognizes the second summand to match the definition of g_{t+1}(S).

We first prove Lemma 9 for t = T. From eq. (19), for t = T,

g_T(S) = min_{u_T(S)} [ E( ‖x_{T+1}(S)‖²_{Q_T} + ‖u_T(S)‖²_{R_T} ) ]
 = min_{u_T(S)} [ E( ‖A_T x_T + B_T u_T(S) + w_T‖²_{Q_T} + ‖u_T(S)‖²_{R_T} ) ],   (20)

since x_{T+1}(S) = A_T x_T + B_T u_T(S) + w_T, as per eq. (1); we note that for notational simplicity we henceforth drop the dependency of x_T on S, since x_T is independent of u_T(S), which is the variable under optimization in problem (20). Developing eq. (20), we get

g_T(S) = min_{u_T(S)} [ E( u_T(S)^T B_T^T Q_T B_T u_T(S) + w_T^T Q_T w_T + x_T^T A_T^T Q_T A_T x_T + 2 x_T^T A_T^T Q_T B_T u_T(S) + 2 x_T^T A_T^T Q_T w_T + 2 u_T(S)^T B_T^T Q_T w_T + ‖u_T(S)‖²_{R_T} ) ]
 = min_{u_T(S)} [ E( u_T(S)^T B_T^T Q_T B_T u_T(S) + ‖w_T‖²_{Q_T} + x_T^T A_T^T Q_T A_T x_T + 2 x_T^T A_T^T Q_T B_T u_T(S) + ‖u_T(S)‖²_{R_T} ) ],   (21)

where the latter equality holds since w_T has zero mean and w_T, x_T, and u_T(S) are independent. From eq. (21), rearranging the terms, and using the notation in eq.
(8),

g_T(S) = min_{u_T(S)} [ E( u_T(S)^T (B_T^T Q_T B_T + R_T) u_T(S) + ‖w_T‖²_{Q_T} + x_T^T A_T^T Q_T A_T x_T + 2 x_T^T A_T^T Q_T B_T u_T(S) ) ]
 = min_{u_T(S)} [ E( ‖u_T(S)‖²_{M_T} + ‖w_T‖²_{Q_T} + x_T^T A_T^T Q_T A_T x_T + 2 x_T^T A_T^T Q_T B_T u_T(S) ) ]
 = min_{u_T(S)} [ E( ‖u_T(S)‖²_{M_T} + ‖w_T‖²_{Q_T} + x_T^T A_T^T Q_T A_T x_T − 2 x_T^T (−A_T^T Q_T B_T M_T^{−1}) M_T u_T(S) ) ]
 = min_{u_T(S)} [ E( ‖u_T(S)‖²_{M_T} + ‖w_T‖²_{Q_T} + x_T^T A_T^T Q_T A_T x_T − 2 x_T^T K_T^T M_T u_T(S) ) ]
 = min_{u_T(S)} [ E( ‖u_T(S) − K_T x_T‖²_{M_T} + ‖w_T‖²_{Q_T} + x_T^T (A_T^T Q_T A_T − K_T^T M_T K_T) x_T ) ]
 = min_{u_T(S)} [ E( ‖u_T(S) − K_T x_T‖²_{M_T} + ‖w_T‖²_{Q_T} + x_T^T (A_T^T Q_T A_T − Θ_T) x_T ) ]
 = min_{u_T(S)} [ E( ‖u_T(S) − K_T x_T‖²_{M_T} + ‖w_T‖²_{Q_T} + ‖x_T‖²_{N_T} ) ]
 = min_{u_T(S)} E( ‖u_T(S) − K_T x_T‖²_{M_T} ) + tr(W_T Q_T) + E( ‖x_T‖²_{N_T} ),   (22)

where the latter equality holds since E(‖w_T‖²_{Q_T}) = E[tr(w_T^T Q_T w_T)] = tr(E(w_T w_T^T) Q_T) = tr(W_T Q_T). Now we note that

min_{u_T(S)} E( ‖u_T(S) − K_T x_T‖²_{M_T} ) = E( ‖K_T x̂_T(S) − K_T x_T‖²_{M_T} ) = tr( Θ_T Σ_{T|T}(S) ),   (23)

since x̂_T(S) is the Kalman estimator of the state x_T, i.e., the minimum mean square estimator of x_T, which implies that K_T x̂_T(S) is the minimum mean square estimator of K_T x_T [19, Appendix E]. Substituting eq. (23) back into eq. (22), we get

g_T(S) = E( ‖x_T‖²_{N_T} ) + tr( Θ_T Σ_{T|T}(S) ) + tr(W_T Q_T),

which proves that Lemma 9 holds for t = T.

We now prove that if Lemma 9 holds for t = l + 1, it also holds for t = l. To this end, assume that the claim of Lemma 9, i.e., the expression for g_{l+1}(S), holds for t = l + 1. Using the notation in eq.
(8),

g_l(S) = min_{u_l(S)} [ E( ‖x_{l+1}(S)‖²_{Q_l} + ‖u_l(S)‖²_{R_l} ) + g_{l+1}(S) ]
 = min_{u_l(S)} { E( ‖x_{l+1}(S)‖²_{Q_l} + ‖u_l(S)‖²_{R_l} ) + E( ‖x_{l+1}(S)‖²_{N_{l+1}} ) + ∑_{k=l+1}^{T} [ tr( Θ_k Σ_{k|k}(S) ) + tr(W_k S_k) ] }
 = min_{u_l(S)} { E( ‖x_{l+1}(S)‖²_{S_l} + ‖u_l(S)‖²_{R_l} ) + ∑_{k=l+1}^{T} [ tr( Θ_k Σ_{k|k}(S) ) + tr(W_k S_k) ] }
 = ∑_{k=l+1}^{T} [ tr( Θ_k Σ_{k|k}(S) ) + tr(W_k S_k) ] + min_{u_l(S)} E( ‖x_{l+1}(S)‖²_{S_l} + ‖u_l(S)‖²_{R_l} ),   (24)

where the second equality uses the inductive hypothesis for g_{l+1}(S), and the third groups Q_l + N_{l+1} into S_l. For the last summand in the right-hand side of eq. (24), by following the same steps as in the proof of Lemma 9 for t = T, we have

min_{u_l(S)} E( ‖x_{l+1}(S)‖²_{S_l} + ‖u_l(S)‖²_{R_l} ) = E( ‖x_l‖²_{N_l} ) + tr( Θ_l Σ_{l|l}(S) ) + tr(W_l S_l),   (25)

attained by u_l(S) = K_l x̂_l(S). Therefore, substituting eq. (25) back into eq. (24), we get

g_l(S) = E( ‖x_l‖²_{N_l} ) + ∑_{k=l}^{T} [ tr( Θ_k Σ_{k|k}(S) ) + tr(W_k S_k) ],   (26)

which proves that if Lemma 9 holds for t = l + 1, it also holds for t = l. By induction, Lemma 9 holds for t = 1, and we already observed that g_1(S) matches the original LQG cost in eq. (18), hence concluding the proof.

Proof of Theorem 1: The proof easily follows from Lemma 9. Eq. (6) is a direct consequence of eq. (18), since both E(x_1^T N_1 x_1) = tr(Σ_{1|1} N_1) and ∑_{t=1}^{T} tr(W_t S_t) are independent of the choice of the sensor set S. In addition, eq. (7) directly follows from eq. (17).

APPENDIX C: PROOF OF THEOREM 2

The following result is used in the proof of Theorem 2.

Proposition 1 (Monotonicity of the cost function in eq. (6)). Consider the cost function in eq. (6), namely, for any sensor set S ⊆ V, the set function ∑_{t=1}^{T} tr( Θ_t Σ_{t|t}(S) ). Then, for any sensor sets such that S_1 ⊆ S_2 ⊆ V, it holds that ∑_{t=1}^{T} tr( Θ_t Σ_{t|t}(S_1) ) ≥ ∑_{t=1}^{T} tr( Θ_t Σ_{t|t}(S_2) ).
Proof: Lemma 6 implies Σ_{1|1}(S_1) ⪰ Σ_{1|1}(S_2), and then Corollary 1 implies Σ_{t|t}(S_1) ⪰ Σ_{t|t}(S_2). Finally, for any t = 1, 2, …, T, Lemma 4 implies tr( Θ_t Σ_{t|t}(S_1) ) ≥ tr( Θ_t Σ_{t|t}(S_2) ), since each Θ_t is symmetric.

Proof of part (1) of Theorem 2 (Algorithm 2's approximation quality): Using Proposition 1 and the supermodularity ratio Definition 4, the proof of the upper bound exp(−γ_g) in ineq. (12) follows the same steps as the proof of [25, Theorem 1].

Proof of part (2) of Theorem 2 (Algorithm 1's running time): We compute Algorithm 1's running time by adding the running times of Algorithm 1's lines 1 and 2.

a) Running time of Algorithm 1's line 1: Algorithm 1's line 1 needs O(k|V|T n^{2.4}) time. In particular, Algorithm 1's line 1's running time is the running time of Algorithm 2, which we show next to be O(k|V|T n^{2.4}). To this end, we first compute the running time of Algorithm 2's line 1, and then the running time of Algorithm 2's lines 3–15. Algorithm 2's line 1 needs O(n^{2.4}) time, using the Coppersmith–Winograd algorithm for both matrix inversion and multiplication [26]. Then, Algorithm 2's lines 3–15 are repeated k times, due to the "while" loop between lines 3 and 15. We now need to find the running time of Algorithm 2's lines 4–14; to this end, we first find the running time of Algorithm 2's lines 4–12, and then the running time of Algorithm 2's lines 13 and 14. In more detail, the running time of Algorithm 2's lines 4–12 is O(|V|T n^{2.4}), since Algorithm 2's lines 5–11 are repeated at most |V| times, and Algorithm 2's lines 6–10, as well as line 11, need O(T n^{2.4}) time, using the Coppersmith–Winograd algorithm for both matrix inversion and multiplication [26].
Moreover, Algorithm 2's lines 13 and 14 need O[|V| log(|V|)] time, since line 13 asks for the minimum among at most |V| values of cost(·), which takes O[|V| log(|V|)] time to find, using, e.g., the merge sort algorithm. In sum, Algorithm 2's running time is O[n^{2.4} + k|V|T n^{2.4} + k|V| log(|V|)] = O(k|V|T n^{2.4}).

b) Running time of Algorithm 1's line 2: Algorithm 1's line 2 needs O(n^{2.4}) time, using the Coppersmith–Winograd algorithm for both matrix inversion and multiplication [26].

In sum, Algorithm 1's running time is O(k|V|T n^{2.4} + n^{2.4}) = O(k|V|T n^{2.4}).

APPENDIX D: PROOF OF THEOREM 3

Proof of Theorem 3: We complete the proof by first deriving a lower bound for the numerator of the supermodularity ratio γ_g, and then deriving an upper bound for its denominator. We use the following notation: c ≜ E(x_1^T N_1 x_1) + ∑_{t=1}^{T} tr(W_t S_t), and, for any sensor set S ⊆ V and time t = 1, 2, …, T, f_t(S) ≜ tr( Θ_t Σ_{t|t}(S) ). Then, the cost function g(S) in eq. (11) is written as g(S) = c + ∑_{t=1}^{T} f_t(S), due to eq. (18) in Lemma 9.

a) Lower bound for the numerator of the supermodularity ratio γ_g: Per the supermodularity ratio Definition 4, the numerator of γ_g is of the form

∑_{t=1}^{T} [ f_t(S) − f_t(S ∪ {v}) ],   (27)

for some sensor set S ⊆ V and sensor v ∈ V; to lower bound the sum in (27), we lower bound each f_t(S) − f_t(S ∪ {v}). To this end, from eq. (15) in Lemma 5, observe that

Σ_{t|t}(S ∪ {v}) = [ Σ_{t|t−1}(S ∪ {v})^{−1} + ∑_{i∈S∪{v}} C̄_{i,t}^T C̄_{i,t} ]^{−1}.

Define Ω_t ≜ Σ_{t|t−1}(S)^{−1} + ∑_{i∈S} C̄_{i,t}^T C̄_{i,t} and Ω̄_t ≜ Σ_{t|t−1}(S ∪ {v})^{−1} + ∑_{i∈S} C̄_{i,t}^T C̄_{i,t}; using the Woodbury identity in Lemma 3,

f_t(S ∪ {v}) = tr( Θ_t Ω̄_t^{−1} ) − tr( Θ_t Ω̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} Ω̄_t^{−1} ).
Therefore, for any time t ∈ {1, 2, …, T},

f_t(S) − f_t(S ∪ {v}) = tr( Θ_t Ω_t^{−1} ) − tr( Θ_t Ω̄_t^{−1} ) + tr( Θ_t Ω̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} Ω̄_t^{−1} )
 ≥ tr( Θ_t Ω̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} Ω̄_t^{−1} ),   (28)

where ineq. (28) holds because tr( Θ_t Ω_t^{−1} ) ≥ tr( Θ_t Ω̄_t^{−1} ). In particular, this inequality is implied as follows: Lemma 6 implies Σ_{1|1}(S) ⪰ Σ_{1|1}(S ∪ {v}). Then, Corollary 2 implies Σ_{t|t−1}(S) ⪰ Σ_{t|t−1}(S ∪ {v}), and, as a result, Lemma 1 implies Σ_{t|t−1}(S)^{−1} ⪯ Σ_{t|t−1}(S ∪ {v})^{−1}. Now, this last inequality and the definitions of Ω_t and Ω̄_t imply Ω_t ⪯ Ω̄_t. Next, Lemma 1 implies Ω_t^{−1} ⪰ Ω̄_t^{−1}. As a result, since Θ_t is also a symmetric matrix, Lemma 4 gives the desired inequality tr( Θ_t Ω_t^{−1} ) ≥ tr( Θ_t Ω̄_t^{−1} ).

Continuing from ineq. (28),

f_t(S) − f_t(S ∪ {v}) ≥ tr( C̄_{v,t} Ω̄_t^{−1} Θ_t Ω̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T )^{−1} )
 ≥ λ_min( ( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T )^{−1} ) tr( C̄_{v,t} Ω̄_t^{−1} Θ_t Ω̄_t^{−1} C̄_{v,t}^T ),   (29)

where ineq. (29) holds due to Lemma 2. From ineq. (29),

f_t(S) − f_t(S ∪ {v}) ≥ λ_max^{−1}( I + C̄_{v,t} Ω̄_t^{−1} C̄_{v,t}^T ) tr( C̄_{v,t} Ω̄_t^{−1} Θ_t Ω̄_t^{−1} C̄_{v,t}^T )
 ≥ λ_max^{−1}( I + C̄_{v,t} Σ_{t|t}(∅) C̄_{v,t}^T ) tr( C̄_{v,t} Ω̄_t^{−1} Θ_t Ω̄_t^{−1} C̄_{v,t}^T )
 = λ_max^{−1}( I + C̄_{v,t} Σ_{t|t}(∅) C̄_{v,t}^T ) tr( Θ_t Ω̄_t^{−1} C̄_{v,t}^T C̄_{v,t} Ω̄_t^{−1} ),   (30)

where we used Ω̄_t^{−1} ⪯ Σ_{t|t}(∅), which holds because of the following: the definition of Ω̄_t implies Ω̄_t ⪰ Σ_{t|t−1}(S ∪ {v})^{−1}, and, as a result, from Lemma 1 we get Ω̄_t^{−1} ⪯ Σ_{t|t−1}(S ∪ {v}).
In addition, Corollary 2 and the fact that Σ_{1|1}(S ∪ {v}) ⪯ Σ_{1|1}(∅), which holds due to Lemma 6, imply Σ_{t|t−1}(S ∪ {v}) ⪯ Σ_{t|t−1}(∅). Finally, from eq. (15) in Lemma 5, it is Σ_{t|t−1}(∅) = Σ_{t|t}(∅). Overall, the desired inequality Ω̄_t^{−1} ⪯ Σ_{t|t}(∅) holds.

Consider a time t′ ∈ {1, 2, …, T} such that, for any time t ∈ {1, 2, …, T}, it is Ω̄_{t′}^{−1} C̄_{v,t′}^T C̄_{v,t′} Ω̄_{t′}^{−1} ⪯ Ω̄_t^{−1} C̄_{v,t}^T C̄_{v,t} Ω̄_t^{−1}, and let Φ be the matrix Ω̄_{t′}^{−1} C̄_{v,t′}^T C̄_{v,t′} Ω̄_{t′}^{−1}; similarly, let l be min_{t∈{1,2,…,T}, v∈V} λ_max^{−1}( I + C̄_{v,t} Σ_{t|t}(∅) C̄_{v,t}^T ). Summing ineq. (30) across all times t ∈ {1, 2, …, T}, and using Lemmata 4 and 2,

g(S) − g(S ∪ {v}) ≥ l ∑_{t=1}^{T} tr( Θ_t Ω̄_t^{−1} C̄_{v,t}^T C̄_{v,t} Ω̄_t^{−1} ) ≥ l ∑_{t=1}^{T} tr( Θ_t Φ ) = l tr( Φ ∑_{t=1}^{T} Θ_t ) ≥ l λ_min( ∑_{t=1}^{T} Θ_t ) tr(Φ) > 0,

which is non-zero because ∑_{t=1}^{T} Θ_t ≻ 0 and Φ is a non-zero positive semi-definite matrix. Finally, we lower bound tr(Φ), using Lemma 2:

tr(Φ) = tr( Ω̄_{t′}^{−1} C̄_{v,t′}^T C̄_{v,t′} Ω̄_{t′}^{−1} ) = tr( Ω̄_{t′}^{−2} C̄_{v,t′}^T C̄_{v,t′} ) ≥ λ_min( Ω̄_{t′}^{−2} ) tr( C̄_{v,t′}^T C̄_{v,t′} ) = λ²_min( Ω̄_{t′}^{−1} ) tr( C̄_{v,t′}^T C̄_{v,t′} ) ≥ λ²_min( Σ_{t′|t′}(V) ) tr( C̄_{v,t′}^T C̄_{v,t′} ),   (31)

where ineq. (31) holds because Ω̄_{t′}^{−1} ⪰ Σ_{t′|t′}(V). In particular, the inequality Ω̄_{t′}^{−1} ⪰ Σ_{t′|t′}(S ∪ {v}) is derived by applying Lemma 1 to the inequality Ω̄_{t′} ⪯ Ω̄_{t′} + C̄_{v,t′}^T C̄_{v,t′} = Σ_{t′|t′}(S ∪ {v})^{−1}, where the equality holds by the definition of Ω̄_{t′}. In addition, due to Lemma 6, it is Σ_{1|1}(S ∪ {v}) ⪰ Σ_{1|1}(V), and, as a result, from Corollary 1, it also is Σ_{t′|t′}(S ∪ {v}) ⪰ Σ_{t′|t′}(V). Overall, the desired inequality Ω̄_{t′}^{−1} ⪰ Σ_{t′|t′}(V) holds.
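The eigenvalue-trace bounds of Lemma 2, invoked at nearly every step of the chains above, are easy to sanity-check numerically (the matrices and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                      # a symmetric matrix A
B0 = rng.normal(size=(4, 4))
B = B0 @ B0.T                          # a positive semi-definite matrix B

lam = np.linalg.eigvalsh(A)            # eigenvalues in ascending order
val = np.trace(A @ B)
# Lemma 2: lambda_min(A) tr(B) <= tr(AB) <= lambda_max(A) tr(B)
assert lam[0] * np.trace(B) - 1e-9 <= val <= lam[-1] * np.trace(B) + 1e-9
```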
b) Upper bound for the denominator of the supermodularity ratio γ_g: Per the supermodularity ratio Definition 4, the denominator of γ_g is of the form

∑_{t=1}^{T} [ f_t(S′) − f_t(S′ ∪ {v}) ],

for some sensor set S′ ⊆ V and sensor v ∈ V; to upper bound it, from eq. (15) in Lemma 5 of Appendix A, observe that

Σ_{t|t}(S′ ∪ {v}) = [ Σ_{t|t−1}(S′ ∪ {v})^{−1} + ∑_{i∈S′∪{v}} C̄_{i,t}^T C̄_{i,t} ]^{−1},

and let H_t ≜ Σ_{t|t−1}(S′)^{−1} + ∑_{i∈S′} C̄_{i,t}^T C̄_{i,t} and H̄_t ≜ Σ_{t|t−1}(S′ ∪ {v})^{−1} + ∑_{i∈S′} C̄_{i,t}^T C̄_{i,t}; using the Woodbury identity in Lemma 3,

f_t(S′ ∪ {v}) = tr( Θ_t H̄_t^{−1} ) − tr( Θ_t H̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} H̄_t^{−1} ).

Therefore,

∑_{t=1}^{T} [ f_t(S′) − f_t(S′ ∪ {v}) ]
 = ∑_{t=1}^{T} [ tr( Θ_t H_t^{−1} ) − tr( Θ_t H̄_t^{−1} ) + tr( Θ_t H̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} H̄_t^{−1} ) ]
 ≤ ∑_{t=1}^{T} [ tr( Θ_t H_t^{−1} ) + tr( Θ_t H̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} H̄_t^{−1} ) ],   (32)

where ineq. (32) holds since tr( Θ_t H̄_t^{−1} ) is non-negative. In ineq. (32), the second term in the sum is upper bounded as follows, using Lemma 2:

tr( Θ_t H̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} C̄_{v,t} H̄_t^{−1} )
 = tr( C̄_{v,t} H̄_t^{−1} Θ_t H̄_t^{−1} C̄_{v,t}^T ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} )
 ≤ tr( C̄_{v,t} H̄_t^{−1} Θ_t H̄_t^{−1} C̄_{v,t}^T ) λ_max[ ( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )^{−1} ]
 = tr( C̄_{v,t} H̄_t^{−1} Θ_t H̄_t^{−1} C̄_{v,t}^T ) λ_min^{−1}( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T )
 ≤ tr( C̄_{v,t} H̄_t^{−1} Θ_t H̄_t^{−1} C̄_{v,t}^T ) λ_min^{−1}( I + C̄_{v,t} Σ_{t|t}(V) C̄_{v,t}^T ),   (33)

since λ_min( I + C̄_{v,t} H̄_t^{−1} C̄_{v,t}^T ) ≥ λ_min( I + C̄_{v,t} Σ_{t|t}(V) C̄_{v,t}^T ), because H̄_t^{−1} ⪰ Σ_{t|t}(V).
In particular, the inequality H̄_t^{−1} ⪰ Σ_{t|t}(V) is derived as follows: first, it is H̄_t ⪯ H̄_t + C̄_{v,t}^T C̄_{v,t} = Σ_{t|t}(S′ ∪ {v})^{−1}, where the equality holds by the definition of H̄_t, and now Lemma 1 implies H̄_t^{−1} ⪰ Σ_{t|t}(S′ ∪ {v}). In addition, Σ_{t|t}(S′ ∪ {v}) ⪰ Σ_{t|t}(V) is implied by Corollary 1, since Lemma 6 implies Σ_{1|1}(S′ ∪ {v}) ⪰ Σ_{1|1}(V). Overall, the desired inequality H̄_t^{−1} ⪰ Σ_{t|t}(V) holds.

Let l′ ≜ max_{t∈{1,2,…,T}, v∈V} λ_min^{−1}( I + C̄_{v,t} Σ_{t|t}(V) C̄_{v,t}^T ). From ineqs. (32) and (33),

∑_{t=1}^{T} [ f_t(S′) − f_t(S′ ∪ {v}) ] ≤ ∑_{t=1}^{T} [ tr( Θ_t H_t^{−1} ) + l′ tr( Θ_t H̄_t^{−1} C̄_{v,t}^T C̄_{v,t} H̄_t^{−1} ) ].   (34)

Consider times t′ ∈ {1, 2, …, T} and t″ ∈ {1, 2, …, T} such that, for any time t ∈ {1, 2, …, T}, it is H_{t′}^{−1} ⪰ H_t^{−1} and H̄_{t″}^{−1} C̄_{v,t″}^T C̄_{v,t″} H̄_{t″}^{−1} ⪰ H̄_t^{−1} C̄_{v,t}^T C̄_{v,t} H̄_t^{−1}, and let Ξ ≜ H_{t′}^{−1} and Φ′ ≜ H̄_{t″}^{−1} C̄_{v,t″}^T C̄_{v,t″} H̄_{t″}^{−1}. From ineq. (34) and Lemma 4,

∑_{t=1}^{T} [ f_t(S′) − f_t(S′ ∪ {v}) ] ≤ ∑_{t=1}^{T} [ tr( Θ_t Ξ ) + l′ tr( Θ_t Φ′ ) ] ≤ tr( Ξ ∑_{t=1}^{T} Θ_t ) + l′ tr( Φ′ ∑_{t=1}^{T} Θ_t ) ≤ ( tr(Ξ) + l′ tr(Φ′) ) λ_max( ∑_{t=1}^{T} Θ_t ).   (35)

Finally, we upper bound tr(Ξ) + l′ tr(Φ′) in ineq. (35), using Lemma 2:

tr(Ξ) + l′ tr(Φ′) ≤ tr( H_{t′}^{−1} ) + l′ λ²_max( H̄_{t″}^{−1} ) tr( C̄_{v,t″}^T C̄_{v,t″} )   (36)
 ≤ tr( Σ_{t′|t′}(∅) ) + l′ λ²_max( Σ_{t″|t″}(∅) ) tr( C̄_{v,t″}^T C̄_{v,t″} ),   (37)

where ineq. (37) holds because H_{t′}^{−1} ⪯ Σ_{t′|t′}(∅), and, similarly, H̄_{t″}^{−1} ⪯ Σ_{t″|t″}(∅).
In particular, the inequality $H_{t'}^{-1} \preceq \Sigma_{t'|t'}(\emptyset)$ is implied as follows: first, by the definition of $H_{t'}$, it is $H_{t'}^{-1} = \Sigma_{t'|t'}(\mathcal{S}')$; and then Corollary 1 and the fact that $\Sigma_{1|1}(\mathcal{S}') \preceq \Sigma_{1|1}(\emptyset)$, which holds due to Lemma 6, imply $\Sigma_{t'|t'}(\mathcal{S}') \preceq \Sigma_{t'|t'}(\emptyset)$. In addition, the inequality $\bar{H}_{t''}^{-1} \preceq \Sigma_{t''|t''}(\emptyset)$ is implied as follows: first, by the definition of $\bar{H}_{t''}$, it is $\bar{H}_{t''} \succeq \Sigma_{t''|t''-1}^{-1}(\mathcal{S}' \cup \{v\})$, and as a result Lemma 1 implies $\bar{H}_{t''}^{-1} \preceq \Sigma_{t''|t''-1}(\mathcal{S}' \cup \{v\})$. Moreover, Corollary 2 and the fact that $\Sigma_{1|1}(\mathcal{S}' \cup \{v\}) \preceq \Sigma_{1|1}(\emptyset)$, which holds due to Lemma 6, imply $\Sigma_{t''|t''-1}(\mathcal{S}' \cup \{v\}) \preceq \Sigma_{t''|t''-1}(\emptyset)$. Finally, from eq. (15) in Lemma 5, it is $\Sigma_{t''|t''-1}(\emptyset) = \Sigma_{t''|t''}(\emptyset)$. Overall, the desired inequality $\bar{H}_{t''}^{-1} \preceq \Sigma_{t''|t''}(\emptyset)$ holds.

APPENDIX E: PROOF OF THEOREM 4

Lemma 10 (System-level condition for near-optimal sensor selection). Let $N_1$ be defined as in eq. (8). The control policy $u_{1:T} \triangleq (0, 0, \ldots, 0)$ is suboptimal for the LQG problem in eq. (14) for all non-zero initial conditions $x_1$ if and only if
$$\sum_{t=1}^{T} A_1^\top \cdots A_t^\top Q_t A_t \cdots A_1 \succ N_1. \quad (38)$$

Proof of Lemma 10: For any initial condition $x_1$, eq. (18) in Lemma 9 implies for the noiseless perfect state information LQG problem in eq. (14):
$$\min_{u_{1:T}} \sum_{t=1}^{T} \big[\|x_{t+1}\|_{Q_t}^2 + \|u_t(x_t)\|_{R_t}^2\big]\Big|_{\Sigma_{t|t} = W_t = 0} = x_1^\top N_1 x_1, \quad (39)$$
since $\mathbb{E}(\|x_1\|_{N_1}^2) = x_1^\top N_1 x_1$, because $x_1$ is known ($\Sigma_{1|1} = 0$), and $\Sigma_{t|t}$ and $W_t$ are zero. In addition, for $u_{1:T} = (0, 0, \ldots, 0)$, the objective function in the noiseless perfect state information LQG problem in eq. (14) is
$$\sum_{t=1}^{T} \big[\|x_{t+1}\|_{Q_t}^2 + \|u_t(x_t)\|_{R_t}^2\big]\Big|_{\Sigma_{t|t} = W_t = 0} = \sum_{t=1}^{T} x_{t+1}^\top Q_t x_{t+1} = x_1^\top \sum_{t=1}^{T} A_1^\top A_2^\top \cdots A_t^\top Q_t A_t A_{t-1} \cdots A_1\, x_1, \quad (40)$$
since $x_{t+1} = A_t x_t = A_t A_{t-1} x_{t-1} = \ldots = A_t A_{t-1} \cdots A_1 x_1$ when all $u_1, u_2, \ldots, u_T$ are zero. From eqs. (39) and (40), the inequality
$$x_1^\top N_1 x_1 < x_1^\top \sum_{t=1}^{T} A_1^\top A_2^\top \cdots A_t^\top Q_t A_t A_{t-1} \cdots A_1\, x_1$$
holds for any non-zero $x_1$ if and only if
$$N_1 \prec \sum_{t=1}^{T} A_1^\top \cdots A_t^\top Q_t A_t A_{t-1} \cdots A_1.$$

Lemma 11. For any $t = 1, 2, \ldots, T$,
$$\Theta_t = A_t^\top S_t A_t + Q_{t-1} - S_{t-1}.$$

Proof of Lemma 11: Using the Woodbury identity in Lemma 3, and the notation in eq. (8),
$$N_t = A_t^\top (S_t^{-1} + B_t R_t^{-1} B_t^\top)^{-1} A_t = A_t^\top (S_t - S_t B_t M_t^{-1} B_t^\top S_t) A_t = A_t^\top S_t A_t - \Theta_t.$$
The latter gives $\Theta_t = A_t^\top S_t A_t - N_t$. In addition, from eq. (8), $-N_t = Q_{t-1} - S_{t-1}$, since $S_t = Q_t + N_{t+1}$.

Lemma 12. $\sum_{t=1}^{T} A_1^\top A_2^\top \cdots A_t^\top Q_t A_t A_{t-1} \cdots A_1 \succ N_1$ if and only if
$$\sum_{t=1}^{T} A_1^\top A_2^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} A_{t-2} \cdots A_1 \succ 0.$$

Proof of Lemma 12: For $i = t-1, t-2, \ldots, 1$, we pre- and post-multiply the identity in Lemma 11 with $A_i^\top$ and $A_i$, respectively:
$$\begin{aligned}
& \Theta_t = A_t^\top S_t A_t + Q_{t-1} - S_{t-1} \\
\Rightarrow\ & A_{t-1}^\top \Theta_t A_{t-1} = A_{t-1}^\top A_t^\top S_t A_t A_{t-1} + A_{t-1}^\top Q_{t-1} A_{t-1} - A_{t-1}^\top S_{t-1} A_{t-1} \\
\Rightarrow\ & A_{t-1}^\top \Theta_t A_{t-1} = A_{t-1}^\top A_t^\top S_t A_t A_{t-1} + A_{t-1}^\top Q_{t-1} A_{t-1} - \Theta_{t-1} + Q_{t-2} - S_{t-2} \\
\Rightarrow\ & \Theta_{t-1} + A_{t-1}^\top \Theta_t A_{t-1} = A_{t-1}^\top A_t^\top S_t A_t A_{t-1} + A_{t-1}^\top Q_{t-1} A_{t-1} + Q_{t-2} - S_{t-2} \\
\Rightarrow\ & \ldots \\
\Rightarrow\ & \Theta_2 + A_2^\top \Theta_3 A_2 + \ldots + A_2^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} \cdots A_2 \\
& \quad = A_2^\top \cdots A_t^\top S_t A_t \cdots A_2 + A_2^\top \cdots A_{t-1}^\top Q_{t-1} A_{t-1} \cdots A_2 + \ldots + A_2^\top Q_2 A_2 + Q_1 - S_1 \\
\Rightarrow\ & \Theta_1 + A_1^\top \Theta_2 A_1 + \ldots + A_1^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} \cdots A_1 \\
& \quad = A_1^\top \cdots A_t^\top S_t A_t \cdots A_1 + A_1^\top \cdots A_{t-1}^\top Q_{t-1} A_{t-1} \cdots A_1 + \ldots + A_1^\top Q_1 A_1 - N_1 \\
\Rightarrow\ & \sum_{t=1}^{T} A_1^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} \cdots A_1 = \sum_{t=1}^{T} A_1^\top \cdots A_t^\top Q_t A_t \cdots A_1 - N_1. \quad (41)
\end{aligned}$$
The last equality in eq. (41) implies Lemma 12.

Lemma 13.
Consider, for any $t = 1, 2, \ldots, T$, that $A_t$ is invertible. $\sum_{t=1}^{T} A_1^\top A_2^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} A_{t-2} \cdots A_1 \succ 0$ if and only if
$$\sum_{t=1}^{T} \Theta_t \succ 0.$$

Proof of Lemma 13: Let $U_t = A_{t-1} A_{t-2} \cdots A_1$. We first prove that, for any non-zero vector $z$, if $\sum_{t=1}^{T} U_t^\top \Theta_t U_t \succ 0$, then $\sum_{t=1}^{T} z^\top \Theta_t z > 0$. In particular, since $U_t$ is invertible (because $A_t$ is, for any $t \in \{1, 2, \ldots, T\}$),
$$\sum_{t=1}^{T} z^\top \Theta_t z = \sum_{t=1}^{T} z^\top U_t^{-\top} U_t^\top \Theta_t U_t U_t^{-1} z = \sum_{t=1}^{T} \mathrm{tr}\big(\phi_t \phi_t^\top U_t^\top \Theta_t U_t\big), \quad (42)$$
where we let $\phi_t = U_t^{-1} z$. Consider a time $t'$ such that for any time $t \in \{1, 2, \ldots, T\}$, $\phi_{t'} \phi_{t'}^\top \preceq \phi_t \phi_t^\top$. From eq. (42), using Lemmata 4 and 2,
$$\sum_{t=1}^{T} z^\top \Theta_t z \geq \sum_{t=1}^{T} \mathrm{tr}\big(\phi_{t'} \phi_{t'}^\top U_t^\top \Theta_t U_t\big) \geq \mathrm{tr}\Big(\phi_{t'} \phi_{t'}^\top \sum_{t=1}^{T} U_t^\top \Theta_t U_t\Big) \geq \mathrm{tr}(\phi_{t'} \phi_{t'}^\top)\, \lambda_{\min}\Big(\sum_{t=1}^{T} U_t^\top \Theta_t U_t\Big) = \|\phi_{t'}\|_2^2\, \lambda_{\min}\Big(\sum_{t=1}^{T} U_t^\top \Theta_t U_t\Big) > 0.$$
We finally prove that, for any non-zero vector $z$, if $\sum_{t=1}^{T} \Theta_t \succ 0$, then $\sum_{t=1}^{T} z^\top A_1^\top \cdots A_{t-1}^\top \Theta_t A_{t-1} \cdots A_1 z > 0$. In particular,
$$\sum_{t=1}^{T} z^\top U_t^\top \Theta_t U_t z = \sum_{t=1}^{T} \mathrm{tr}\big(\xi_t^\top \Theta_t \xi_t\big), \quad (43)$$
where we let $\xi_t = U_t z$. Consider a time $t'$ such that for any time $t \in \{1, 2, \ldots, T\}$, $\xi_{t'} \xi_{t'}^\top \preceq \xi_t \xi_t^\top$. From eq. (43), using Lemmata 4 and 2,
$$\sum_{t=1}^{T} \mathrm{tr}\big(\xi_t^\top \Theta_t \xi_t\big) \geq \mathrm{tr}\Big(\xi_{t'} \xi_{t'}^\top \sum_{t=1}^{T} \Theta_t\Big) \geq \mathrm{tr}(\xi_{t'} \xi_{t'}^\top)\, \lambda_{\min}\Big(\sum_{t=1}^{T} \Theta_t\Big) = \|\xi_{t'}\|_2^2\, \lambda_{\min}\Big(\sum_{t=1}^{T} \Theta_t\Big) > 0.$$

Proof of Theorem 4: Theorem 4 follows from the sequential application of Lemmata 10, 12, and 13.
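The backward-recursion identities behind Lemmata 11 and 12 lend themselves to a quick numerical check. The sketch below assumes the recursion of eq. (8) as it can be reconstructed from the proof of Lemma 11, namely $S_t = Q_t + N_{t+1}$ with $N_{T+1} = 0$, $M_t = R_t + B_t^\top S_t B_t$, $\Theta_t = A_t^\top S_t B_t M_t^{-1} B_t^\top S_t A_t$, and $N_t = A_t^\top S_t A_t - \Theta_t$; these definitions, and the toy system matrices, are assumptions for illustration only, not taken from the paper.

```python
# Numerical check (assumption-laden sketch, not the paper's code): verify
# Lemma 11,  Theta_t = A_t^T S_t A_t + Q_{t-1} - S_{t-1}  (for t >= 2),
# and eq. (41),
#   sum_t U_t^T Theta_t U_t = sum_t (A_t...A_1)^T Q_t (A_t...A_1) - N_1,
# with U_t = A_{t-1}...A_1, on a fixed 2-state toy system with horizon T = 3.
# Recursion assumed (reconstructed from the proof of Lemma 11):
#   S_t = Q_t + N_{t+1}, N_{T+1} = 0, M_t = R_t + B_t^T S_t B_t,
#   Theta_t = A_t^T S_t B_t M_t^{-1} B_t^T S_t A_t, N_t = A_t^T S_t A_t - Theta_t.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def maxdiff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

T = 3
I2 = [[1.0, 0.0], [0.0, 1.0]]
A = {1: [[1.0, 0.2], [0.0, 0.9]], 2: [[0.8, 0.1], [0.3, 1.1]], 3: [[1.2, 0.0], [0.1, 0.7]]}
B = {t: I2 for t in (1, 2, 3)}
Q = {1: [[2.0, 0.0], [0.0, 1.0]], 2: [[1.0, 0.5], [0.5, 1.0]], 3: [[1.5, 0.0], [0.0, 0.5]]}
R = {t: I2 for t in (1, 2, 3)}

# backward recursion for S_t, Theta_t, N_t
S, N, Theta = {}, {T + 1: [[0.0, 0.0], [0.0, 0.0]]}, {}
for t in range(T, 0, -1):
    S[t] = add(Q[t], N[t + 1])
    M = add(R[t], matmul(transpose(B[t]), matmul(S[t], B[t])))
    SBMBS = matmul(matmul(S[t], B[t]), matmul(inv2(M), matmul(transpose(B[t]), S[t])))
    Theta[t] = matmul(transpose(A[t]), matmul(SBMBS, A[t]))
    N[t] = sub(matmul(transpose(A[t]), matmul(S[t], A[t])), Theta[t])

# Lemma 11 for t >= 2: Theta_t = A_t^T S_t A_t + Q_{t-1} - S_{t-1}
err11 = max(
    maxdiff(Theta[t],
            add(sub(matmul(transpose(A[t]), matmul(S[t], A[t])), S[t - 1]), Q[t - 1]))
    for t in (2, 3))

# eq. (41): sum_t U_t^T Theta_t U_t  vs  sum_t V_t^T Q_t V_t - N_1
U = {1: I2}
for t in (2, 3):
    U[t] = matmul(A[t - 1], U[t - 1])      # U_t = A_{t-1} ... A_1
V = {t: matmul(A[t], U[t]) for t in (1, 2, 3)}  # V_t = A_t ... A_1

lhs = [[0.0, 0.0], [0.0, 0.0]]
rhs = [[0.0, 0.0], [0.0, 0.0]]
for t in (1, 2, 3):
    lhs = add(lhs, matmul(transpose(U[t]), matmul(Theta[t], U[t])))
    rhs = add(rhs, matmul(transpose(V[t]), matmul(Q[t], V[t])))
err41 = maxdiff(lhs, sub(rhs, N[1]))

assert err11 < 1e-9 and err41 < 1e-9
```

The telescoping in the proof of Lemma 12 is exactly what makes `lhs` and `rhs - N[1]` agree: each $V_{t-1}^\top S_{t-1} V_{t-1}$ term cancels, leaving only the $Q_t$ terms and $-N_1$.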