Near-Optimal Sensor Scheduling for Batch State Estimation: Complexity, Algorithms, and Limits

Vasileios Tzoumas¹, Ali Jadbabaie², George J. Pappas¹

Abstract — In this paper, we focus on batch state estimation for linear systems. This problem is important in applications such as environmental field estimation, robotic navigation, and target tracking. Its difficulty lies in the fact that limited operational resources among the sensors, e.g., shared communication bandwidth or battery power, constrain the number of sensors that can be active at each measurement step. As a result, sensor scheduling algorithms must be employed. Notwithstanding, current sensor scheduling algorithms for batch state estimation scale poorly with the system size and the time horizon. In addition, current sensor scheduling algorithms for Kalman filtering, although they scale better, provide no performance guarantees or approximation bounds for the minimization of the batch state estimation error. In this paper, one of our main contributions is to provide an algorithm that enjoys both the estimation accuracy of the batch state scheduling algorithms and the low time complexity of the Kalman filtering scheduling algorithms. In particular: 1) our algorithm is near-optimal: it achieves a solution up to a multiplicative factor $1/2$ from the optimal solution, and this factor is close to the best approximation factor $1/e$ one can achieve in polynomial time for this problem; 2) our algorithm has (polynomial) time complexity that is not only lower than that of the current algorithms for batch state estimation; it is also lower than, or similar to, that of the current algorithms for Kalman filtering. We achieve these results by proving two properties for our batch state estimation error metric, which quantifies the square error of the minimum variance linear estimator of the batch state vector: a) it is supermodular in the choice of the sensors; b) it has a sparsity pattern (it involves matrices that are block tri-diagonal) that facilitates its evaluation at each sensor set.

I. INTRODUCTION

Search and rescue [1], environmental field estimation [2], robotic navigation [3], and target tracking [4] are only a few of the challenging information gathering problems that employ the monitoring capabilities of sensor networks [5]. In particular, all these problems face the following three main challenges:

• they involve systems whose evolution is largely unknown, corrupted with noisy inputs [4], and sensors with limited sensing capabilities, corrupted by measurement noise [6];
• they involve systems that change over time [2] and, as a result, necessitate both spatial and temporal deployment of sensors in the environment;

¹ The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104-6228 USA (email: {vtzoumas, pappasg}@seas.upenn.edu).
² The author is with the Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139 USA (email: jadbabai@mit.edu).
This work was supported in part by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, in part by the AFOSR Complex Networks Program, and in part by AFOSR MURI CHASE.
At the same time:

• they involve operational constraints, such as limited bandwidth and battery life, which limit the number of sensors that can be simultaneously used (i.e., be switched on) in the information gathering process [7].

As a result of these challenges, researchers have focused on the following question: "How do we select, at each measurement step, only a few sensors so as to minimize the estimation error despite the above challenges?" The effort to answer this question resulted in the problem of sensor scheduling [7]: in particular, sensor scheduling offers a formal methodology to use at each measurement time only a few sensors and obtain an optimal trade-off between the estimation accuracy and the usage of the limited operational resource (e.g., the shared bandwidth). Clearly, sensor scheduling is a combinatorial problem of exponential complexity [5]. In this paper, we focus on the following instance of this problem:

Problem 1 (Sensor Scheduling for Minimum Variance Batch State Estimation): Consider a time-invariant linear system, whose state at time $t_k$ is denoted as $x(t_k)$, a set of $m$ sensors, and a fixed set of $K$ measurement times $t_1, t_2, \ldots, t_K$. In addition, consider that at each $t_k$ at most $r_k$ sensors can be used, where $r_k \leq m$. At each $t_k$, select a set of $r_k$ sensors so as to minimize the square estimation error of the minimum variance linear estimator of the batch state vector $(x(t_1), x(t_2), \ldots, x(t_K))$.

There are two classes of sensor scheduling algorithms that trade off between the estimation accuracy of the batch state vector and their time complexity: those for Kalman filtering, and those for batch state estimation. In more detail:

Kalman filtering algorithms: These algorithms sacrifice estimation accuracy for reduced time complexity. The reason is that they are sequential algorithms: at each $t_k$, they select the sensors so as to minimize the square estimation error of the minimum variance linear estimator of $x(t_k)$ (given the measurements up to $t_k$). Therefore, their objective is to minimize the sum of the square estimation errors of $x(t_k)$ across the measurement times $t_k$ [8]. However, this sum is only an upper bound to the square estimation error of the batch state vector $(x(t_1), x(t_2), \ldots, x(t_K))$. Thus, the Kalman filtering algorithms lose estimation accuracy with respect to the batch state estimation algorithms.

Batch state estimation algorithms: These algorithms sacrifice time complexity for estimation accuracy. The reason is that they perform global optimization, in accordance with Problem 1. As a result, however, they lose in time complexity with respect to the Kalman filtering algorithms.

Notwithstanding, in several recent robotic applications, batch estimation algorithms have proven competitive in their time complexity with their filtering counterparts [9], [10]. The reason is that sparsity patterns emerge in these applications, which reduce the time complexity of the batch estimation algorithms to an order similar to that of the filtering algorithms [11]. Thereby, the following question on Problem 1 arises:

Question 1: "Is there an algorithm for Problem 1 that enjoys both the estimation accuracy of the batch state algorithms and the low time complexity of the Kalman filtering algorithms?"
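To make Problem 1's exponential scale concrete, the number of candidate schedules is $\prod_{k=1}^K \binom{m}{r_k}$: one $r_k$-subset of the $m$ sensors per measurement time. The following minimal Python sketch evaluates this count; the numeric values of $m$, $K$, and $r_k$ are our own illustrative assumptions, not parameters from the paper.

```python
from math import comb

m, K = 20, 10          # assumed: 20 sensors, 10 measurement times
r = [5] * K            # assumed: at most 5 active sensors per time

# Number of candidate schedules: one r_k-subset of [m] per measurement time.
num_schedules = 1
for r_k in r:
    num_schedules *= comb(m, r_k)

print(f"{num_schedules:.3e} candidate schedules")  # ~8.0e41 for these values
```

Even for this modest instance, exhaustive search is hopeless, which motivates the approximation algorithm developed below.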
Literature review on sensor scheduling algorithms for batch state estimation: The most relevant paper on Problem 1 is [12], where an algorithm based on convex relaxation is provided. This algorithm scales poorly with the system's size and the number of measurement times. In addition, it provides no approximation performance guarantees.

Literature review on sensor scheduling algorithms for Kalman filtering: Several papers in this category have focused on myopic algorithms [13]; such algorithms, however, often perform poorly [14]. Other papers have focused on algorithms that use: tree pruning [15], convex optimization [16], quadratic programming [17], or submodular function maximization [18], [19]. Nevertheless, these algorithms provide no performance guarantees on the batch state estimation error, or have time complexity that scales poorly with the system's size and the number of measurement times [15], [17]. To reduce the time complexity of these algorithms, papers have also proposed periodic sensor schedules [8].

Contributions: We now present our contributions:

1) We prove that Problem 1 is NP-hard.
2) We provide an algorithm for Problem 1 (Algorithm 1) that answers Question 1 positively, for two reasons:
i) Algorithm 1 is near-optimal: it achieves a solution that is up to a multiplicative factor $1/2$ from the optimal solution. In addition, this multiplicative factor is close to the factor $1/e$, which we prove to be the best approximation factor one can achieve in polynomial time for Problem 1 in the worst case.
ii) Algorithm 1 has (polynomial) time complexity that is not only lower than that of the state-of-the-art scheduling algorithms for batch state estimation; it is also lower than, or similar to, that of the state-of-the-art scheduling algorithms for Kalman filtering. For example, it has similar complexity to the state-of-the-art periodic scheduling algorithm in [8] (in particular, lower for $K$ large enough), and lower than the complexity of the algorithm in [16]. Overall, in response to Question 1, Algorithm 1 enjoys both the higher estimation accuracy of the batch state estimation approach (compared to the Kalman filtering approach, which only approximates the batch state estimation error with an upper bound) and the low time complexity of the Kalman filtering approach.
3) We prove limits on the minimization of the square error of the minimum variance estimator of $(x(t_1), x(t_2), \ldots, x(t_K))$ with respect to the scheduled sensors. For example, we prove that the number $r_k$ of used sensors at each measurement time must increase linearly with the system size for fixed estimation error and number of measurement times $K$; this is a fundamental limit, especially for large-scale systems.

Our technical contributions: We achieve our aforementioned contributions by proving the following two results:

a) Supermodularity in Problem 1: We prove that our estimation metric, which quantifies the square error of the minimum variance estimator of $(x(t_1), x(t_2), \ldots, x(t_K))$, is a supermodular function in the choice of the used sensors. This result becomes important when we compare it to results on multi-step Kalman filtering showing that the corresponding estimation metric in that case is neither supermodular nor submodular [18], [19].¹ In addition, this supermodularity result cannot be reduced to the batch estimation problem in [21]. The main reasons are two: i) we consider sensors that can measure any linear combination of the elements of $x(t_k)$, in contrast to [21], where each sensor measures directly only one element of $x(t_k)$.
Nonetheless, the latter assumption is usually infeasible in dynamical systems [22]; ii) our error metric is relevant to estimation problems for dynamical systems and different from the submodular information gain considered in [21].

b) Sparsity in Problem 1: We identify a sparsity pattern in our error metric that facilitates its evaluation at each sensor set. In particular, we prove that the error covariance of the minimum variance linear estimator of the batch state vector is block tri-diagonal.

We organize the rest of the paper as follows: In Section II, we formally present Problem 1. In Section III, we present our main results in three subsections: in Section III-A, we prove that our sensor scheduling problem is NP-hard. In Section III-B, we derive a near-optimal approximation algorithm, and compare its time complexity with that of previous sensor scheduling algorithms. In Section III-C, we prove limits on the minimization of the batch state estimation error with respect to the used sensors. Section IV concludes the paper with our future work. All proofs are found in the Appendix.²

¹ The observation of [19] is also important as it disproves previous results in the literature [20].
² Standard notation is presented in this footnote: We denote the set of natural numbers $\{1, 2, \ldots\}$ as $\mathbb{N}$, the set of real numbers as $\mathbb{R}$, and the set $\{1, 2, \ldots, n\}$ as $[n]$ ($n \in \mathbb{N}$). The empty set is denoted as $\emptyset$. Given a set $\mathcal{X}$, $|\mathcal{X}|$ is its cardinality. Matrices are represented by capital letters and vectors by lower-case letters. We write $A \in \mathcal{X}^{n_1 \times n_2}$ ($n_1, n_2 \in \mathbb{N}$) to denote a matrix of $n_1$ rows and $n_2$ columns whose elements take values in $\mathcal{X}$. Moreover, for a matrix $A$, $A^\top$ is its transpose, and $[A]_{ij}$ is its element at the $i$-th row and $j$-th column. In addition, $\|A\|_2 \equiv \sqrt{\lambda_{\max}(A^\top A)}$ is its spectral norm, and $\det(A)$ its determinant. Furthermore, if $A$ is positive semi-definite or positive definite, we write $A \succeq 0$ and $A \succ 0$, respectively. $I$ is the identity matrix; its dimension is inferred from the context. Similarly for the zero matrix $0$. Finally, for a random variable $x \in \mathbb{R}^n$, $\mathbb{E}(x)$ is its expected value, and $\mathbb{C}(x)$ its covariance.

II. PROBLEM FORMULATION

In the following paragraphs, we present our sensor scheduling problem for batch state estimation. To this end, we first build our system and measurement framework. Then, we define our sensor scheduling framework and, finally, present our sensor scheduling problem. We start in more detail with the system model:

System Model: We consider the linear time-invariant system:

$$\dot{x}(t) = Ax(t) + Bu(t) + Fw(t), \quad t \geq t_0, \tag{1}$$

where $t_0$ is the initial time, $x(t) \in \mathbb{R}^n$ ($n \in \mathbb{N}$) the state vector, $\dot{x}(t)$ the time derivative of $x(t)$, $u(t)$ the exogenous input, and $w(t)$ the process noise. The system matrices $A$, $B$ and $F$ are of appropriate dimensions. We consider that $u(t)$, $A$, $B$ and $F$ are known. Our main assumption on $w(t)$ is found in Assumption 1, which is presented after our measurement model.

Remark 1: Our results extend to continuous and discrete time-variant systems, as explained in detail in Section III (Corollaries 1 and 2).

We introduce the measurement model:

Measurement Model: We consider $m$ sensors:

$$z_i(t) = C_i x(t) + v_i(t), \quad i \in [m], \tag{2}$$

where $z_i(t)$ is the measurement taken by sensor $i$ at time $t$, $C_i \in \mathbb{R}^{d_i \times n}$ ($d_i \in \mathbb{N}$) is sensor $i$'s measurement matrix, and $v_i(t)$ is its measurement noise.
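To fix ideas, the following Python sketch simulates the models (1) and (2) at equally spaced measurement times. All numeric values, the zero exogenous input, and the Euler-style process-noise injection are our own illustrative assumptions; the sketch is a crude discretization, not the paper's estimation machinery.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Assumed toy dimensions: n states, m scalar sensors, K measurement times.
n, m, K = 4, 6, 5
A = rng.normal(scale=0.3, size=(n, n))            # system matrix (illustrative)
F = np.eye(n)                                      # process-noise input matrix
C = [rng.normal(size=(1, n)) for _ in range(m)]    # C_i: sensor i's matrix

dt = 0.1                                           # assumed spacing t_{k+1} - t_k
Phi = expm(A * dt)                                 # state transition over dt

x = rng.normal(size=n)                             # x(t_0); u(t) = 0 assumed
for k in range(K):
    w = 0.1 * rng.normal(size=n)                   # crude process-noise sample
    x = Phi @ x + F @ w                            # propagate state to t_{k+1}
    used = [0, 2, 4]                               # indices of the used sensors
    # Stack z_i(t_k) = C_i x(t_k) + v_i(t_k) for the used sensors i:
    y = np.concatenate([C[i] @ x + 0.1 * rng.normal(size=1) for i in used])
```

The `used` list plays the role of the sensor set $\mathcal{S}_k$ introduced in the scheduling model that follows.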
We make the following assumption on $x(t_0)$, $w(t)$ and $v_i(t)$:

Assumption 1: For all $t, t' \geq t_0$, $t \neq t'$, and all $i \in [m]$: $x(t_0)$, $w(t)$, $w(t')$, $v_i(t)$ and $v_i(t')$ are uncorrelated; in addition, $x(t_0)$, $w(t)$ and $v_i(t)$ have positive definite covariance.

We now introduce the sensor scheduling model:

Sensor Scheduling Model: The $m$ sensors in (2) are used at $K$ scheduled measurement times $\{t_1, t_2, \ldots, t_K\}$. Specifically, at each $t_k$ only $r_k$ of these $m$ sensors are used ($r_k \leq m$), resulting in the batch measurement vector $y(t_k)$:

$$y(t_k) = S(t_k) z(t_k), \quad k \in [K], \tag{3}$$

where $z(t_k) \equiv (z_1^\top(t_k), z_2^\top(t_k), \ldots, z_m^\top(t_k))^\top$, and $S(t_k)$ is the sensor selection matrix: it is a block matrix, composed of matrices $[S(t_k)]_{ij}$ ($i \in [r_k]$, $j \in [m]$) such that $[S(t_k)]_{ij} = I$ if sensor $j$ is used at $t_k$, and $[S(t_k)]_{ij} = 0$ otherwise. We consider that each sensor can be used at most once at each $t_k$; as a result, for each $i$ there is exactly one $j$ such that $[S(t_k)]_{ij} = I$, while for each $j$ there is at most one $i$ such that $[S(t_k)]_{ij} = I$.

We now present the sensor scheduling problem we study in this paper. To this end, we use two notations:

Notation: First, we set $\mathcal{S}_k \equiv \{j : \text{there exists } i \in [r_k] \text{ such that } [S(t_k)]_{ij} = I\}$; that is, $\mathcal{S}_k$ is the set of indices that correspond to used sensors at $t_k$. Second, we set $\mathcal{S}_{1:K} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$.

Problem 1 (Sensor Scheduling for Minimum Variance Batch State Estimation): Given a set of measurement times $t_1, t_2, \ldots, t_K$, select at each $t_k$ a subset of $r_k$ sensors to use, out of the $m$ sensors in (2), so as to minimize the $\log\det$ of the error covariance of the minimum variance linear estimator of $x_{1:K} \equiv (x(t_1), x(t_2), \ldots, x(t_K))$. In mathematical notation:

$$\begin{aligned} \underset{\mathcal{S}_k \subseteq [m],\; k \in [K]}{\text{minimize}} \quad & \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) \\ \text{subject to} \quad & |\mathcal{S}_k| \leq r_k, \quad k \in [K], \end{aligned}$$

where $\hat{x}_{1:K}$ is the minimum variance linear estimator of $x_{1:K}$, and $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$ its error covariance given $\mathcal{S}_{1:K}$.

Two remarks follow on the definition of Problem 1. In the first remark we explain why we focus on $\hat{x}_{1:K}$, and in the second why we focus on $\log\det(\Sigma(\hat{x}_{1:K}))$.

Notation: For notational simplicity, we use $\Sigma(\hat{x}_{1:K})$ and $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$ interchangeably.

Remark 2: We focus on the minimum variance linear estimator $\hat{x}_{1:K}$ because of its optimality: it minimizes among all linear estimators of $x_{1:K}$ the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$, where the expectation is taken with respect to $y(t_1), y(t_2), \ldots, y(t_K)$ [6]. Because $\hat{x}_{1:K}$ is also unbiased (that is, $\mathbb{E}(\hat{x}_{1:K}) = x_{1:K}$, where the expectation is taken with respect to $y(t_1), y(t_2), \ldots, y(t_K)$), we equivalently say that $\hat{x}_{1:K}$ is the minimum variance estimator of $x_{1:K}$. We compute the error covariance of $\hat{x}_{1:K}$ in Appendix I.

Remark 3: We focus on the estimation error metric $\log\det(\Sigma(\hat{x}_{1:K}))$ because when it is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized.
To quantify this statement, we note that this error metric is related to the $\eta$-confidence ellipsoid of $x_{1:K} - \hat{x}_{1:K}$ [16]: specifically, the $\eta$-confidence ellipsoid is the minimum volume ellipsoid that contains $x_{1:K} - \hat{x}_{1:K}$ with probability $\eta$; that is, it is the ellipsoid $\mathcal{E}_\epsilon \equiv \{x : x^\top \Sigma(\hat{x}_{1:K}) x \leq \epsilon\}$, where $\epsilon$ is the quantity $F^{-1}_{\chi^2_{n(k+1)}}(\eta)$, and $F_{\chi^2_{n(k+1)}}$ is the cumulative distribution function of a $\chi$-squared random variable with $n(k+1)$ degrees of freedom [23]. Thus, its volume

$$\mathrm{vol}(\mathcal{E}_\epsilon) \equiv \frac{(\epsilon\pi)^{n(k+1)/2}}{\Gamma(n(k+1)/2 + 1)} \det\!\left(\Sigma(\hat{x}_{1:K})^{1/2}\right), \tag{4}$$

where $\Gamma(\cdot)$ denotes the Gamma function [23], quantifies the estimation error of the optimal estimator $\hat{x}_{1:K}$. Therefore, by taking the logarithm of (4), we see that when $\log\det(\Sigma(\hat{x}_{1:K}))$ is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized.
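The logarithm of (4) is affine in $\log\det(\Sigma(\hat{x}_{1:K}))$, which is why minimizing the $\log\det$ metric minimizes the confidence-ellipsoid volume. The short Python sketch below renders this numerically; it assumes, for illustration, that the degrees of freedom equal the dimension of $\Sigma$, and the example covariance is arbitrary.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

def log_volume_confidence_ellipsoid(Sigma: np.ndarray, eta: float = 0.95) -> float:
    """Log-volume of the eta-confidence ellipsoid, as in (4)."""
    d = Sigma.shape[0]                     # assumed: degrees of freedom = dim(Sigma)
    eps = chi2.ppf(eta, df=d)              # eps = F^{-1}_{chi^2_d}(eta)
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "Sigma must be positive definite"
    # log vol = (d/2) log(eps*pi) - log Gamma(d/2 + 1) + (1/2) log det(Sigma)
    return 0.5 * d * np.log(eps * np.pi) - gammaln(d / 2 + 1) + 0.5 * logdet

# The only schedule-dependent term is (1/2) log det(Sigma): minimizing
# log det(Sigma) minimizes the ellipsoid volume.
Sigma = np.diag([0.5, 1.0, 2.0])           # illustrative error covariance
print(log_volume_confidence_ellipsoid(Sigma))
```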
III. MAIN RESULTS

Our main results are presented in three sections:

• In Section III-A, we prove that Problem 1 is NP-hard.
• In Section III-B, we derive a provably near-optimal approximation algorithm for Problem 1. In addition, we emphasize its time complexity and compare it to that of existing sensor scheduling algorithms in two categories: batch state estimation, and Kalman filtering.
• In Section III-C, we prove limits on the optimization of the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ with respect to the scheduled sensors.

Algorithm 1 Approximation algorithm for Problem 1.
Input: Number of measurement times $K$, scheduling constraints $r_1, r_2, \ldots, r_K$, estimation error function $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})): \mathcal{S}_k \subseteq [m], k \in [K] \mapsto \mathbb{R}$
Output: Sensor sets $(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$ that approximate the solution to Problem 1, as quantified in Theorem 2
$k \leftarrow 1$, $\mathcal{S}_{1:0} \leftarrow \emptyset$
while $k \leq K$ do
1. Apply Algorithm 2 to
$$\min_{\mathcal{S} \subseteq [m]} \{\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S})) : |\mathcal{S}| \leq r_k\} \tag{5}$$
2. Denote as $\mathcal{S}_k$ the solution Algorithm 2 returns
3. $\mathcal{S}_{1:k} \leftarrow (\mathcal{S}_{1:k-1}, \mathcal{S}_k)$
4. $k \leftarrow k + 1$
end while

A. Computational Complexity of Sensor Scheduling for Batch State Estimation

In this section, we characterize the computational complexity of Problem 1. In particular, we prove:

Theorem 1: The problem of sensor scheduling for minimum variance batch state estimation (Problem 1) is NP-hard.

Proof: The proof is found in Appendix II. Here, we note that the proof is completed by finding an instance of Problem 1 that is equivalent to the NP-hard minimal observability problem introduced in [24], [25].

Due to Theorem 1, for a polynomial time solution of Problem 1 we need to appeal to approximation algorithms. To this end, in Section III-B, we provide an efficient, provably near-optimal approximation algorithm:

B. Algorithm for Sensor Scheduling for Minimum Variance Batch State Estimation

We propose Algorithm 1 for Problem 1 (Algorithm 1 uses Algorithm 2 as a subroutine); with the following theorem, we quantify its approximation performance and time complexity.

Theorem 2: The theorem has two parts:

1) Approximation performance of Algorithm 1: Algorithm 1 returns sensor sets $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K$ that:
• satisfy all the feasibility constraints of Problem 1: $|\mathcal{S}_k| \leq r_k$, $k \in [K]$;
• achieve an error value $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$, where $\mathcal{S}_{1:K} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$, such that:

$$\frac{\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) - OPT}{MAX - OPT} \leq \frac{1}{2}, \tag{6}$$

where $OPT$ is the (optimal) value of Problem 1, and $MAX$ is the maximum (worst) value of Problem 1 ($MAX \equiv \max_{\mathcal{S}'_{1:K}} \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K}))$).

2) Time complexity of Algorithm 1: Algorithm 1 has time complexity of order $O(n^{2.4} K \sum_{k=1}^K r_k^2)$.

Theorem 2 extends to continuous and discrete time-variant systems as follows:

Corollary 1: Consider the time-variant version of (1):

$$\dot{x}(t) = A(t)x(t) + B(t)u(t) + F(t)w(t), \quad t \geq t_0. \tag{7}$$

1) Part 1 of Theorem 2 holds.
2) Part 2 of Theorem 2 holds if the time complexity of computing each transition matrix $\Phi(t_{k+1}, t_k)$ [22], where $k \in [K-1]$, is $O(n^3)$.³

Corollary 2: Consider the discrete time version of (7):

$$x[k+1] = A_k x[k] + B_k u[k] + F_k w[k], \quad k \geq k_0. \tag{8}$$

Similarly, consider the discrete time counterparts of the sensor model (2), Assumption 1, and the sensor scheduling model (3).
1) Part 1 of Theorem 2 holds.
2) Part 2 of Theorem 2 holds if $A_k$ in (8) is full rank for all $k \in [K]$.

We follow up with several remarks on Theorem 2:

Remark 4: (Approximation quality of Algorithm 1) Theorem 2 quantifies the worst-case performance of Algorithm 1 across all values of Problem 1's parameters. The reason is that the right-hand side of (6) is constant. In particular, (6) guarantees that for any instance of Problem 1, the distance of the approximate value $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ from $OPT$ is at most $1/2$ the distance of the worst (maximum) value $MAX$ from $OPT$. In addition, this approximation factor is close to the optimal approximation factor $1/e \cong 0.37$ one can achieve in the worst case for Problem 1 in polynomial time [26]; the reason is twofold: first, as we comment in the next paragraph, we prove that Problem 1 involves the minimization of a non-increasing and supermodular function [27], and second, as we proved in Section III-A, Problem 1 is in the worst case equivalent to the minimal controllability problem introduced in [24], which cannot be approximated in polynomial time with a factor better than $1/e$ [28].

Remark 5: (Supermodularity of $\log\det(\Sigma(\hat{x}_{1:K}))$) In the proof of Theorem 2 (Appendix III), we show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a non-increasing and supermodular function with respect to the sequence of selected sensors. Specifically, the proof of (6) follows by combining these two results with results on the maximization of submodular functions over matroid constraints [29]; we present these three derivations in Appendices III-A, III-B, and III-C, respectively.

We continue with our third remark on Theorem 2:

Remark 6: (Time complexity of Algorithm 1) Algorithm 1's time complexity breaks down into two parts: the first part is the number of evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$ required by the algorithm, and the second part is the time complexity of each such evaluation. In particular, Algorithm 1 requires at most $r_k^2$ evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$ at each $t_k$. Therefore, Algorithm 1 achieves a time complexity that is only linear in $K$ with respect to the total number of evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$. The reason is that $\sum_{k=1}^K r_k^2 \leq \max_{k \in [K]}(r_k^2)\, K$.
In addition, for $w(t)$ zero mean and white Gaussian —as commonly assumed in the literature of sensor scheduling— the time complexity of each such evaluation is at most linear in $K$: the reason is that this $w(t)$ agrees with Assumption 1, in which case we prove that the time complexity of each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$ is $O(n^{2.4} K)$ (linear in $K$).⁴

³ The matrices $\Phi(t_{k+1}, t_k)$, where $k \in [K-1]$, are used in the computation of $\Sigma(\hat{x}_{1:K})$ (cf. proof of Theorem 2 in Appendix III).
⁴ We can also speed up Algorithm 1 by implementing in Algorithm 2 the method of lazy evaluations [30]: this method avoids, in Step 2 of Algorithm 2, the computation of $\rho_i(\mathcal{S}^{t-1})$ for unnecessary choices of $i$.

Algorithm 2 Single step greedy algorithm (subroutine in Algorithm 1).
Input: Current iteration $k$ (corresponds to $t_k$), selected sensor sets $(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{k-1})$ up to the current iteration, scheduling constraint $r_k$, estimation error function $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})): \mathcal{S}_k \subseteq [m], k \in [K] \mapsto \mathbb{R}$
Output: Sensor set $\mathcal{S}_k$ that approximates the solution to Problem 1 at $t_k$
$\mathcal{S}^0 \leftarrow \emptyset$, $\mathcal{X}^0 \leftarrow [m]$, and $t \leftarrow 1$
Iteration $t$:
1. If $\mathcal{X}^{t-1} = \emptyset$, return $\mathcal{S}^{t-1}$
2. Select $i(t) \in \mathcal{X}^{t-1}$ for which $\rho_{i(t)}(\mathcal{S}^{t-1}) = \max_{i \in \mathcal{X}^{t-1}} \rho_i(\mathcal{S}^{t-1})$, with ties settled arbitrarily, where:
$$\rho_i(\mathcal{S}^{t-1}) \equiv \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S}^{t-1})) - \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S}^{t-1} \cup \{i\}))$$
and $\mathcal{S}_{1:k-1} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{k-1})$
3.a. If $|\mathcal{S}^{t-1} \cup \{i(t)\}| > r_k$: $\mathcal{X}^{t-1} \leftarrow \mathcal{X}^{t-1} \setminus \{i(t)\}$, and go to Step 1
3.b. If $|\mathcal{S}^{t-1} \cup \{i(t)\}| \leq r_k$: $\mathcal{S}^t \leftarrow \mathcal{S}^{t-1} \cup \{i(t)\}$ and $\mathcal{X}^t \leftarrow \mathcal{X}^{t-1} \setminus \{i(t)\}$
4. $t \leftarrow t + 1$ and continue

Remark 7: (Sparsity of $\Sigma(\hat{x}_{1:K})$) We state the three properties of $\log\det(\Sigma(\hat{x}_{1:K}))$ that we prove in order to obtain the time complexity of Algorithm 1. The first two properties were mentioned in Remark 5: the monotonicity and supermodularity of $\log\det(\Sigma(\hat{x}_{1:K}))$. These two properties ensure that Algorithm 1 requires at most $r_k^2$ evaluations at each $t_k$. The third property, which follows, is responsible for the low time complexity of each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$:

• $\Sigma(\hat{x}_{1:K})$ is the sum of two $nK \times nK$ sparse matrices: the first matrix is block diagonal, and the second one is block tri-diagonal. As a result, given that both of these matrices are known, each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$ has time complexity $O(n^{2.4} K)$, linear in $K$ (using the results in [31] —cf. Theorem 2 therein).

We show in Appendix III-C that after we include, at each evaluation step of $\log\det(\Sigma(\hat{x}_{1:K}))$, the complexity of computing the two sparse matrices in $\Sigma(\hat{x}_{1:K})$, the total time complexity of Algorithm 1 is as given in Theorem 2.
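The following Python sketch mirrors the structure of Algorithms 1 and 2. It assumes an abstract oracle `logdet_error(schedule)` that evaluates $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ for a partial schedule; the oracle name and the naive (non-lazy) marginal-gain evaluation are our own simplifications of the pseudocode above.

```python
from typing import Callable, List, Set

def greedy_single_step(schedule: List[Set[int]], r_k: int, m: int,
                       logdet_error: Callable[[List[Set[int]]], float]) -> Set[int]:
    """Algorithm 2 (sketch): greedily build the sensor set S_k for time t_k."""
    S: Set[int] = set()
    candidates = set(range(m))
    while candidates and len(S) < r_k:
        base = logdet_error(schedule + [S])
        # rho_i(S) = error(S) - error(S U {i}); pick the largest marginal decrease.
        gains = {i: base - logdet_error(schedule + [S | {i}]) for i in candidates}
        best = max(gains, key=gains.get)
        S.add(best)
        candidates.remove(best)
    return S

def sensor_schedule(K: int, r: List[int], m: int,
                    logdet_error: Callable[[List[Set[int]]], float]) -> List[Set[int]]:
    """Algorithm 1 (sketch): apply the single-step greedy subroutine at t_1..t_K."""
    schedule: List[Set[int]] = []
    for k in range(K):
        schedule.append(greedy_single_step(schedule, r[k], m, logdet_error))
    return schedule
```

In a full implementation, `logdet_error` would exploit the block tri-diagonal sparsity of Remark 7 (and lazy evaluations, per footnote 4) to reach the complexity stated in Theorem 2.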
Remark 8: (Comparison of Algorithm 1's time complexity to that of existing scheduling algorithms) We do the comparison for two cases: batch state estimation, and Kalman filtering. In particular, we show that the time complexity of our algorithm is lower than that of existing sensor scheduling algorithms for batch state estimation, and of similar order to, or lower than, that of existing algorithms for Kalman filtering.

Comparison with algorithms for batch state estimation: In [12], Problem 1 is considered, and a semi-definite programming (SDP) algorithm is proposed; its time complexity is of the order $O(\max_{k \in [K]}(r_k)\, K (nK)^{3.5} + \max_{k \in [K]}(r_k^2)\, K^2 (nK)^{2.5})$ [32]. Clearly, this time complexity is higher than that of Algorithm 1, whose complexity is $O(\max_{k \in [K]}(r_k)^2 K^2 n^{2.4})$. In addition, the algorithm presented in [12] provides no worst-case approximation guarantee such as (6), in contrast to Algorithm 1.

Comparison with algorithms for Kalman filtering: We do the comparison in two steps: first, we consider algorithms based on the maximization of submodular functions, and second, algorithms based on convex relaxation techniques or the alternating direction method of multipliers (ADMM):

• Algorithms based on the maximization of submodular functions: In [19], an algorithm is provided that is valid for a restricted class of linear systems; its time complexity is $O(\max_{k \in [K]}(r_k)\, m n^2 K + n^{2.4} K)$. This time complexity is of similar order to that of Algorithm 1, whose complexity is of the order $O(\max_{k \in [K]}(r_k)^2 K^2 n^{2.4})$, since $\max_{k \in [K]}(r_k) < m$. Specifically, we observe in Algorithm 1's time complexity the additional multiplicative factor $K$ (linear in $K$); this difference stems from the fact that Algorithm 1 offers a near-optimal guarantee over the whole time horizon $(t_1, t_2, \ldots, t_K)$, whereas the algorithm in [19] offers a near-optimal guarantee only for the last time step $t_K$. In addition, Algorithm 1 holds for any linear continuous time-invariant system (no restrictions are necessary), in contrast to the algorithm in [19], and it holds for any discrete time-variant system where $A_k$ in (8) is full rank; the latter assumption is one of the four restrictive conditions in [19] (Theorem 13).

• Algorithms based on convex relaxation techniques or ADMM: In [16], the authors assume a single sensor ($r_k = 1$ across $t_k$), and their objective is to achieve a minimal estimation error by minimizing the number of times this sensor will be used over the horizon $t_1, t_2, \ldots, t_K$. The time complexity of the proposed algorithm is $O(n^{2.5} K^2 + n^{3.5} K)$. This time complexity is higher than that of Algorithm 1, whose complexity for $r_k = 1$ is of the order $O(n^{2.4} K^2)$. In [8], the authors employ ADMM techniques to solve a periodic sensor scheduling problem. They consider a zero mean and white Gaussian $w(t)$. The time complexity of the proposed algorithm is $O((nK)^3 + \max_{k \in [K]}(r_k)\, n^2 K^3 + \max_{k \in [K]}(r_k)^2\, n K^3)$. This time complexity is of similar order to that of Algorithm 1, whose complexity in this case is $O(\max_{k \in [K]}(r_k^2)\, n^{2.4} K^2)$, since $\max_{k \in [K]}(r_k) \leq K$; in particular, for $K > n^{0.4} \max_{k \in [K]}(r_k)$, Algorithm 1 has lower time complexity.⁵

⁵ More algorithms exist in the literature that also use convex relaxation [33] or randomization techniques [34] and have similar time complexity to Algorithm 1. They achieve this complexity using additional approximation methods: e.g., they optimize instead an upper bound to the involved estimation error metric.

With the above remarks we conclude: Algorithm 1 enjoys both the estimation accuracy of the batch state scheduling algorithms and the low time complexity of the Kalman filtering scheduling algorithms, since:

• Algorithm 1 offers a near-optimal worst-case approximation guarantee for the batch state estimation error. This estimation error is only approximated by the Kalman filtering sensor scheduling algorithms: the reason is that they aim instead to minimize the sum of each of the estimation errors of $x(t_k)$ (across $t_k$).
However, this sum only upper bounds the batch state estimation error.
• Algorithm 1 has time complexity lower than the state-of-the-art batch estimation algorithms, and at the same time, lower than, or similar to, the time complexity of the corresponding Kalman filtering algorithms.

In addition: Algorithm 1's approximation guarantee holds for any linear system (continuous or discrete time). Moreover, Algorithm 1's time complexity guarantee holds for any continuous time system, and for discrete time systems where $A_k$ in (8) is full rank across $k$. The proof of Theorem 2 can be found in Appendix III.

C. Limits on Sensor Scheduling for Minimum Variance Batch State Estimation

In this section, we derive two trade-offs between three important parameters of our sensor scheduling problem:⁶

• the number of measurement times $(t_1, t_2, \ldots, t_K)$;
• the number $r_k$ of sensors that can be used at each $t_k$;
• the value of the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$.

The first of the two trade-offs is captured in the next theorem:

Theorem 3: Let $\sigma_w^{(-1)} \equiv \max_{i \in [nK]} [\mathbb{C}(x_{1:K})^{-1}]_{ii}$ and $\sigma_v^{(-1)} \equiv \|\mathbb{C}(v_{1:K})^{-1}\|_2$. Also, let $C_{1:K}$ be the block diagonal matrix whose $K$ diagonal blocks are each equal to $C$, where $C$ is the matrix $[C_1^\top, C_2^\top, \ldots, C_m^\top]^\top$. For the variance of the error of the minimum variance estimator $\hat{x}_{1:K}$:

$$\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2) \geq \frac{n}{\sigma_v^{(-1)} \max_{k \in [K]}(r_k)\, \|C_{1:K}\|_2^2 + \sigma_w^{(-1)}/K}. \tag{9}$$

The lower bound in (9) decreases as the number of sensors used for scheduling $r_k$ increases or the number of measurement times $K$ increases, and increases as the system's size increases. Since these qualitative relationships were expected, the importance of this theorem lies in the quantification of these relationships (which also includes the dependence on the noise parameters $\sigma_w^{(-1)}$ and $\sigma_v^{(-1)}$): for example, (9) decreases only inversely proportionally to the number of sensors used for scheduling; that is, increasing the number $r_k$ so as to reduce the variance of the error of the minimum variance estimator is ineffective, a fundamental limit. In addition, this bound increases linearly with the system's size; this is another limit for large-scale systems.

⁶ We recall from Section II that the objective of Problem 1 is related to $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ in that when $\log\det(\Sigma(\hat{x}_{1:K}))$ is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized.

Similar results are proved in [35] for the steady state error covariance of scalar systems in the case where the number of sensors goes to infinity. In more detail, the authors in [35] account for different types of multi-access schemes, as well as for fading channels between the sensors and the fusion centre that combines the sensor measurements.

The next corollary presents our last trade-off:

Corollary 3: Consider that the desired value for $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ is $\alpha$. Any set of scheduled sensors at $t_1, t_2, \ldots, t_K$ that achieves this error satisfies:

$$\max_{k \in [K]}(r_k) \geq \frac{n/\alpha - \sigma_w^{(-1)}/K}{\sigma_v^{(-1)} \|C_{1:K}\|_2^2}. \tag{10}$$

Eq. (10) implies that the number of sensors used for scheduling at each $t_k$ increases as the error of the minimum variance estimator or the number of measurement times $K$ decreases. More importantly, it quantifies that this number increases linearly with the system's size for fixed error variance. This is again a fundamental limit, meaningful for large-scale systems.
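To illustrate how (9) and (10) scale, the short sketch below evaluates both bounds; every numeric value (system size, horizon, noise quantities, and the spectral norm of $C_{1:K}$) is an assumption of ours chosen only for illustration.

```python
import numpy as np

# Illustrative parameters (all assumed): system size n, horizon K, sensors
# per step r_max, noise quantities sigma_w^{(-1)} and sigma_v^{(-1)}, and
# the squared spectral norm of C_{1:K}.
n, K, r_max = 100, 10, 5
sigma_w_inv, sigma_v_inv, normC_sq = 1.0, 2.0, 4.0

# Lower bound (9) on E(||x_{1:K} - xhat_{1:K}||_2^2):
err_lb = n / (sigma_v_inv * r_max * normC_sq + sigma_w_inv / K)
print(f"error lower bound: {err_lb:.2f}")          # ~2.49 here

# Bound (10): sensors per step needed for a desired error alpha.
alpha = 2.0
r_lb = (n / alpha - sigma_w_inv / K) / (sigma_v_inv * normC_sq)
print(f"required max_k r_k >= {r_lb:.1f}")         # grows linearly with n
```

Doubling $n$ while holding $\alpha$ fixed doubles the required $r_{\max}$, which is exactly the linear-in-$n$ limit the corollary describes.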
IV. FUTURE WORK

We work on extending the results of this paper to largely unknown systems, under the presence of non-linear measurements. The first of these extensions allows systems whose evolution is captured by, e.g., Gaussian processes or random networks (the former example is a widely used assumption for motion models; cf. [10] and references therein). The second of these extensions allows complex measurement environments, such as camera-sensor environments, that can enable the application of our results in domains such as robotics and the automotive sector.

V. ACKNOWLEDGEMENTS

Vasileios Tzoumas thanks Nikolay A. Atanasov for bringing paper [10] to his attention.

APPENDIX I
CLOSED FORMULA FOR THE ERROR COVARIANCE OF $\hat{x}_{1:K}$

We compute the error covariance of $\hat{x}_{1:K}$: Denote as $\mathcal{S}_{1:K}$ the block diagonal matrix with diagonal elements the sensor selection matrices $S(t_1), S(t_2), \ldots, S(t_K)$. Moreover, denote as $C$ the matrix $[C_1^\top, C_2^\top, \ldots, C_m^\top]^\top$. Finally, denote $y_{1:K} \equiv (y(t_1)^\top, y(t_2)^\top, \ldots, y(t_K)^\top)^\top$, $w_{1:K} \equiv (w(t_1)^\top, w(t_2)^\top, \ldots, w(t_K)^\top)^\top$, and $v_{1:K} \equiv (v(t_1)^\top, v(t_2)^\top, \ldots, v(t_K)^\top)^\top$, where $v(t_k) \equiv (v_1(t_k)^\top, v_2(t_k)^\top, \ldots, v_m(t_k)^\top)^\top$. Then, from (1), (2) and (3):

$$y_{1:K} = O_{1:K}\, x_{1:K} + \mathcal{S}_{1:K}\, v_{1:K}, \tag{11}$$

where $O_{1:K}$ is the $\sum_{k=1}^K r_k \times nK$ block diagonal matrix with diagonal elements the matrices $S(t_1)C, S(t_2)C, \ldots, S(t_K)C$. The estimator $\hat{x}_{1:K}$ has the error covariance $\Sigma(\hat{x}_{1:K}) = \mathbb{E}((x_{1:K} - \hat{x}_{1:K})(x_{1:K} - \hat{x}_{1:K})^\top)$ [6]:

$$\Sigma(\hat{x}_{1:K}) = \mathbb{C}(x_{1:K}) - \mathbb{C}(x_{1:K})\, O_{1:K}^\top\, \Xi\, O_{1:K}\, \mathbb{C}(x_{1:K}), \tag{12}$$

where $\Xi \equiv (O_{1:K}\, \mathbb{C}(x_{1:K})\, O_{1:K}^\top + \mathcal{S}_{1:K}\, \mathbb{C}(v_{1:K})\, \mathcal{S}_{1:K}^\top)^{-1}$. We simplify (12) in the following lemma:

Lemma 1: The error covariance of $\hat{x}_{1:K}$ has the equivalent form:

$$\Sigma(\hat{x}_{1:K}) = \left( \sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, U^{(ki)} + \mathbb{C}(x_{1:K})^{-1} \right)^{-1}, \tag{13}$$

where $s_i(t_k)$ is a zero-one function, equal to $1$ if and only if sensor $i$ is used at $t_k$, and $U^{(ki)}$ is the block diagonal matrix $C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} I^{(ki)} C_{1:K}$; $C_{1:K}$ is the block diagonal matrix whose $K$ diagonal blocks are each equal to $C$, and $I^{(ki)}$ is the block diagonal matrix with $mK$ diagonal blocks such that the $((k-1)m + i)$-th block is the $d_i \times d_i$ identity matrix $I$, and the rest of the blocks are equal to zero.

Proof: Let $\Phi \equiv (\mathcal{S}_{1:K}\, \mathbb{C}(v_{1:K})\, \mathcal{S}_{1:K}^\top)^{-1}$; then,

$$\Sigma(\hat{x}_{1:K}) = (O_{1:K}^\top \Phi\, O_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1}, \tag{14}$$

where we deduce (14) from (12) using the Woodbury matrix identity (Corollary 2.8.8 in [36]). In addition, because $\mathcal{S}_{1:K}$ is block diagonal, and $\mathbb{C}(v_{1:K})$ is block diagonal as well (per Assumption 1), $\Phi = \mathcal{S}_{1:K}\, \mathbb{C}(v_{1:K})^{-1}\, \mathcal{S}_{1:K}^\top$. Moreover, $O_{1:K} = \mathcal{S}_{1:K} C_{1:K}$. Then,

$$\Sigma(\hat{x}_{1:K}) = (C_{1:K}^\top D\, \mathbb{C}(v_{1:K})^{-1} D\, C_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1}, \tag{15}$$

where $D = \mathcal{S}_{1:K}^\top \mathcal{S}_{1:K}$. Now, since $D$ and $\mathbb{C}(v_{1:K})^{-1}$ are block diagonal, $\mathbb{C}(v_{1:K})^{-1} D = D\, \mathbb{C}(v_{1:K})^{-1}$. Furthermore, the definition of $\mathcal{S}_{1:K}$ implies $D^2 = D$. As a result,

$$\Sigma(\hat{x}_{1:K}) = (C_{1:K}^\top D\, \mathbb{C}(v_{1:K})^{-1} C_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1}. \tag{16}$$

Moreover, we observe: $D = \sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, I^{(ki)}$. In addition, for any $k \in [K]$ and $i \in [m]$: $C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} C_{1:K} = C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} I^{(ki)} C_{1:K}$, which is the reverse of the step we used before for $D$. Using the last observation in (16), the proof is complete.
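As a numerical sanity check on the Woodbury step from (12) to (14), the following numpy snippet builds a small random instance (dimensions and the random covariances are our own illustrative assumptions) and confirms that the covariance form and the information form of $\Sigma(\hat{x}_{1:K})$ coincide:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(d: int) -> np.ndarray:
    """Random symmetric positive definite matrix."""
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)

N, p = 12, 5                 # assumed: N = nK batch-state dim, p measurements
Cx = random_spd(N)           # C(x_{1:K}): prior covariance of the batch state
R = random_spd(p)            # S_{1:K} C(v_{1:K}) S_{1:K}^T: selected noise cov.
O = rng.normal(size=(p, N))  # O_{1:K} = S_{1:K} C_{1:K}

# Eq. (12): covariance form.
Xi = np.linalg.inv(O @ Cx @ O.T + R)
Sigma_12 = Cx - Cx @ O.T @ Xi @ O @ Cx

# Eq. (14): information form, via the Woodbury identity.
Sigma_14 = np.linalg.inv(O.T @ np.linalg.inv(R) @ O + np.linalg.inv(Cx))

assert np.allclose(Sigma_12, Sigma_14)  # the two expressions coincide
```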
APPENDIX II
PROOF OF THEOREM 1

Proof: We present an instance of Problem 1 that is equivalent to the NP-hard minimal observability problem introduced in [24], [25], which is defined as follows:

Definition (Minimal Observability Problem): Consider the linear time-invariant system:

$$\dot{x}(t) = Ax(t), \quad y_i(t) = s_i e_i^\top x(t), \quad i \in [n], \tag{17}$$

where $x(t) \in \mathbb{R}^n$, $e_i$ is the vector with the $i$-th entry equal to $1$ and the rest equal to $0$, and $s_i$ is either zero or one; the minimal observability problem follows: select $s_1, s_2, \ldots, s_n$ such that

$$s_1 + s_2 + \ldots + s_n \leq r, \quad \text{(17) is observable}, \tag{18}$$

where $r \leq n$. The minimal observability problem is NP-hard when $A$ is chosen as in the proof of Theorem 1 of [24]. We denote this $A$ as $A_{NP\text{-}h}$.

Problem 1 is equivalent to the NP-hard minimal observability problem for the following instance: $K = 1$, $m = n$, $w(t) = 0$ and $v_i(t) = 0$, $C_i = e_i^\top$, $r_1 = r$, and $A(t) = A_{NP\text{-}h}$. This observation concludes the proof.

APPENDIX III
PROOF OF THEOREM 2

We prove Theorem 2 in three steps: we first show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a non-increasing function in the choice of the sensors; we then show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a supermodular function in the choice of the sensors; finally, we prove Theorem 2 by combining the aforementioned two results with results on the maximization of submodular functions over matroid constraints [29].

Notation: We recall that any collection $(x_1, x_2, \ldots, x_k)$ is denoted as $x_{1:k}$ ($k \in \mathbb{N}$).

A. Monotonicity in Sensor Scheduling for Minimum Variance Batch State Estimation

We first provide two notations, and then the definition of non-increasing and non-decreasing set functions. Afterwards, we present the main result of this subsection.

Notation: Given $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$ and $\mathcal{A}_i, \mathcal{B}_i \subseteq \mathcal{X}_i$, we write $\mathcal{A}_{1:K} \preceq \mathcal{B}_{1:K}$ to denote that for all $i \in [K]$, $\mathcal{A}_i \subseteq \mathcal{B}_i$ ($\mathcal{A}_i$ is a subset of $\mathcal{B}_i$). Moreover, we denote that $\mathcal{A}_i \subseteq \mathcal{X}_i$ for all $i \in [K]$ as $\mathcal{A}_{1:K} \in \mathcal{X}_{1:K}$.

Definition 1: Consider $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$. A function $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is non-decreasing if and only if for all $\mathcal{A}, \mathcal{B} \in \mathcal{X}_{1:K}$ such that $\mathcal{A} \preceq \mathcal{B}$, $h(\mathcal{A}) \leq h(\mathcal{B})$; $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is non-increasing if $-h$ is non-decreasing.

The main result of this subsection follows:

Proposition 1: For any finite $K \in \mathbb{N}$, consider $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$. The estimation error metric $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})): \mathcal{M}_{1:K} \mapsto \mathbb{R}$ is a non-increasing function in the choice of the sensors $\mathcal{S}_{1:K}$.

Proof: Consider $\mathcal{S}_{1:K} \preceq \mathcal{S}'_{1:K}$. From (13) (Lemma 1 in Appendix I) and Theorem 8.4.9 in [36], $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K}) \preceq \Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$, since $U^{(ki)} \succeq 0$ and $\mathbb{C}(x_{1:K}) \succ 0$ (i.e., $\mathbb{C}(x_{1:K})^{-1} \succ 0$). As a result, $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K})) \leq \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$, and the proof is complete.

We next show that $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ is a supermodular function with respect to the selected sensors $\mathcal{S}_{1:K}$.

B. Submodularity in Sensor Scheduling for Minimum Variance Batch State Estimation

We first provide a notation, and then the definition of submodular and supermodular set functions. Afterwards, we present the main result of this subsection.

Notation: Given $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$ and $\mathcal{A}_{1:K}, \mathcal{B}_{1:K} \in \mathcal{X}_{1:K}$, we write $\mathcal{A}_{1:K} \uplus \mathcal{B}_{1:K}$ to denote the element-wise union, i.e., the collection whose $i$-th set is $\mathcal{A}_i \cup \mathcal{B}_i$ for all $i \in [K]$.
Definition 2: Consider $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$. A function $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is submodular if and only if for all $\mathcal{A}, \mathcal{B}, \mathcal{C} \in \mathcal{X}_{1:K}$ such that $\mathcal{A} \preceq \mathcal{B}$, $h(\mathcal{A} \uplus \mathcal{C}) - h(\mathcal{A}) \geq h(\mathcal{B} \uplus \mathcal{C}) - h(\mathcal{B})$; $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is supermodular if $-h$ is submodular.

According to Definition 2, set submodularity is a diminishing returns property: a function $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is set submodular if and only if for all $\mathcal{C} \in \mathcal{X}_{1:K}$, the function $h_\mathcal{C}: \mathcal{X}_{1:K} \mapsto \mathbb{R}$, defined for all $\mathcal{A} \in \mathcal{X}_{1:K}$ as $h_\mathcal{C}(\mathcal{A}) \equiv h(\mathcal{A} \uplus \mathcal{C}) - h(\mathcal{A})$, is non-increasing.

The main result of this subsection follows:

Proposition 2: For any finite $K \in \mathbb{N}$, consider $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$; the estimation error metric $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})): \mathcal{M}_{1:K} \mapsto \mathbb{R}$ is a set supermodular function in the choice of the sensors $\mathcal{S}_{1:K}$.

Proof: Denote $\sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, U^{(ki)}$ in (13) (Lemma 1 in Appendix I) as $U(\mathcal{S}_{1:K})$, $-\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ as $h(\mathcal{S}_{1:K})$, and $h(\mathcal{S}_{1:K} \cup \{a\}) - h(\mathcal{S}_{1:K})$ as $h_a(\mathcal{S}_{1:K})$, for any $a \in \bigcup_{k=1}^K \mathcal{M}_k$. To prove that $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ is supermodular, it suffices to prove that $h(\mathcal{S}_{1:K})$ is submodular. In particular, $h$ is submodular if and only if $h_a$ is non-increasing, for any $a \in \bigcup_{k=1}^K \mathcal{M}_k$. To this end, we follow the proof of Theorem 6 in [37]: first, observe that:

$$\begin{aligned} h_a(\mathcal{S}_{1:K}) &= \log\det(U(\mathcal{S}_{1:K} \cup \{a\}) + \mathbb{C}(x_{1:K})^{-1}) - \log\det(U(\mathcal{S}_{1:K}) + \mathbb{C}(x_{1:K})^{-1}) \\ &= \log\det(U(\mathcal{S}_{1:K}) + U(\{a\}) + \mathbb{C}(x_{1:K})^{-1}) - \log\det(U(\mathcal{S}_{1:K}) + \mathbb{C}(x_{1:K})^{-1}). \end{aligned}$$

For $\mathcal{S}_{1:K} \preceq \mathcal{S}'_{1:K}$ and $t \in [0, 1]$, define $O(t) \equiv \mathbb{C}(x_{1:K})^{-1} + U(\mathcal{S}_{1:K}) + t\,(U(\mathcal{S}'_{1:K}) - U(\mathcal{S}_{1:K}))$ and $g(t) \equiv \log\det(O(t) + U(\{a\})) - \log\det(O(t))$. Then, $g(0) = h_a(\mathcal{S}_{1:K})$ and $g(1) = h_a(\mathcal{S}'_{1:K})$. Moreover, since

$$\frac{d \log\det(O(t))}{dt} = \mathrm{tr}\!\left(O(t)^{-1} \frac{dO(t)}{dt}\right)$$

(eq. (43) in [38]), it holds that

$$\dot{g}(t) = \mathrm{tr}\!\left[ \left( (O(t) + U(\{a\}))^{-1} - O(t)^{-1} \right) L \right],$$

where $L \equiv U(\mathcal{S}'_{1:K}) - U(\mathcal{S}_{1:K})$. From Proposition 8.5.5 in [36], $(O(t) + U(\{a\}))^{-1} \preceq O(t)^{-1}$ (note $O(t) \succ 0$, since $\mathbb{C}(x_{1:K})^{-1} \succ 0$, $U(\mathcal{S}_{1:K}) \succeq 0$, and $U(\mathcal{S}'_{1:K}) \succeq U(\mathcal{S}_{1:K})$). Moreover, from Corollary 8.3.6 in [36], all the eigenvalues of $((O(t) + U(\{a\}))^{-1} - O(t)^{-1}) L$ are non-positive. As a result, $\dot{g}(t) \leq 0$, and

$$h_a(\mathcal{S}'_{1:K}) = g(1) = g(0) + \int_0^1 \dot{g}(t)\, dt \leq g(0) = h_a(\mathcal{S}_{1:K}).$$

Therefore, $h_a$ is non-increasing, and the proof is complete.

Proposition 2 implies that as we increase at each $t_k$ the number of sensors used, the marginal improvement we get in the estimation error of $x_{1:K}$ diminishes.
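The diminishing-returns property of Proposition 2 can be spot-checked numerically. The sketch below (random data, a single measurement time, scalar sensors with unit noise, and a diagonal prior are all our own simplifying assumptions) verifies that the marginal decrease of the $\log\det$ error from adding a sensor shrinks as the base set grows:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 8
Cx_inv = np.linalg.inv(np.eye(n) * 2.0)           # C(x)^{-1}, assumed prior
Ci = [rng.normal(size=(1, n)) for _ in range(m)]  # scalar sensors C_i

def logdet_error(S: set) -> float:
    # log det Sigma = -log det( sum_{i in S} C_i^T C_i + C(x)^{-1} ), unit noise.
    info = Cx_inv + sum(Ci[i].T @ Ci[i] for i in S)
    return -np.linalg.slogdet(info)[1]

A, B, a = {0}, {0, 1, 2}, 3                       # A is a subset of B; a is new
gain_A = logdet_error(A) - logdet_error(A | {a})
gain_B = logdet_error(B) - logdet_error(B | {a})
assert gain_A >= gain_B                           # diminishing returns holds
```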
We are now ready for the proof of Theorem 2.

C. Proof of Theorem 2

We first provide the definition of a matroid, and then continue with the main proof:

Definition 3: Consider a finite set $\mathcal{X}$ and a collection $\mathcal{C}$ of subsets of $\mathcal{X}$. $(\mathcal{X}, \mathcal{C})$ is:
• an independence system if and only if:
– $\emptyset \in \mathcal{C}$, where $\emptyset$ denotes the empty set;
– for all $\mathcal{X}' \subseteq \mathcal{X}'' \subseteq \mathcal{X}$, if $\mathcal{X}'' \in \mathcal{C}$, then $\mathcal{X}' \in \mathcal{C}$;
• a matroid if and only if, in addition to the previous two properties:
– for all $\mathcal{X}', \mathcal{X}'' \in \mathcal{C}$ with $|\mathcal{X}'| < |\mathcal{X}''|$, there exists $x \notin \mathcal{X}'$ with $x \in \mathcal{X}''$ such that $\mathcal{X}' \cup \{x\} \in \mathcal{C}$.

Proof: [of Part 1 of Theorem 2] We use the following result from the literature on the maximization of submodular functions over matroid constraints:

Lemma 2 (Ref. [29]): Consider $K$ independence systems $\{(\mathcal{X}_k, \mathcal{C}_k)\}_{k \in [K]}$, each the intersection of at most $P$ matroids,⁷ and a submodular and non-decreasing function $h: \mathcal{X}_{1:K} \mapsto \mathbb{R}$. There exists a polynomial time greedy algorithm that returns an (approximate) solution $\mathcal{S}_{1:K}$ to:

$$\begin{aligned} \underset{\mathcal{S}_{1:K} \in \mathcal{X}_{1:K}}{\text{maximize}} \quad & h(\mathcal{S}_{1:K}) \\ \text{subject to} \quad & \mathcal{S}_k \cap \mathcal{X}_k \in \mathcal{C}_k, \quad k \in [K], \end{aligned} \tag{19}$$

that satisfies:

$$\frac{h(\mathcal{O}) - h(\mathcal{S}_{1:K})}{h(\mathcal{O}) - h(\emptyset)} \leq \frac{P}{1 + P}, \tag{20}$$

where $\mathcal{O}$ is an (optimal) solution to (19).

In particular, we prove:

Lemma 3: Problem 1 is an instance of (19) with $P = 1$.

Proof: We identify the instance of $\{\mathcal{X}_k, \mathcal{C}_k\}_{k \in [K]}$ and $h$, respectively, that translate (19) to Problem 1: Given $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$, first consider $\mathcal{X}_k = \mathcal{M}_k$ and $\mathcal{C}_k = \{\mathcal{S} \mid \mathcal{S} \subseteq \mathcal{M}_k, |\mathcal{S}| \leq r_k\}$: $(\mathcal{X}_k, \mathcal{C}_k)$ satisfies the first two points in part 1 of Definition 3, and as a result is an independence system. Moreover, by its definition, $\mathcal{S}_k \cap \mathcal{X}_k \in \mathcal{C}_k$ if and only if $|\mathcal{S}_k| \leq r_k$. Second, for all $\mathcal{S}_{1:K} \in \mathcal{X}_{1:K}$, consider $h(\mathcal{S}_{1:K}) = -\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$. From Propositions 1 and 2, $h(\mathcal{S}_{1:K})$ is set submodular and non-decreasing.

In addition to Lemma 3, the independence system $(\mathcal{X}_k, \mathcal{C}_k)$, where $\mathcal{X}_k = \mathcal{M}_k$ and $\mathcal{C}_k = \{\mathcal{S} \mid \mathcal{S} \subseteq \mathcal{M}_k, |\mathcal{S}| \leq r_k\}$, also satisfies the point in part 2 of Definition 3; thereby, it is also a matroid and, as a result, $P$, as defined in Lemma 2, is equal to $1$. This observation, along with Lemmas 2 and 3, completes the proof of (6), since the adaptation to Problem 1 of the greedy algorithm in [29] (Theorem 4.1) results in Algorithm 1.

Proof: [of Part 2 of Theorem 2] In Lemma 1 in Appendix I, we prove that $\Sigma(\hat{x}_{1:K})$ is the sum of two matrices: the first matrix is a block diagonal matrix, and the second one is the inverse of the covariance of $x_{1:K}$, $\mathbb{C}(x_{1:K})$. The block diagonal matrix is computed in $O(n^{2.4} K)$ time. Moreover, by extending the result in [10] (Theorem 1), we get that $\mathbb{C}(x_{1:K})^{-1}$ is a block tri-diagonal matrix, described by the $K - 1$ transition matrices $\Phi(t_{k+1}, t_k)$ [22], where $k \in [K-1]$, and $K$ identity matrices. For continuous time systems, the time complexity of computing all the block elements of $\mathbb{C}(x_{1:K})^{-1}$ is $O(n^3 K)$ [39]; for discrete time systems, it is $O(n^{2.4} K)$ [22]. This computation of $\mathbb{C}(x_{1:K})^{-1}$ is made only once. Finally, from Theorem 2 in [31], we can now compute $\det(\Sigma(\hat{x}_{1:K}))$ in $O(n^{2.4} K)$ time, since $\Sigma(\hat{x}_{1:K})$ is block tri-diagonal. Therefore, the overall time complexity of Algorithm 1 is $O(n^3 K) + O(2 n^{2.4} K \sum_{k=1}^K r_k^2) = O(n^{2.4} K \sum_{k=1}^K r_k^2)$ for $K$ large, since $\mathbb{C}(x_{1:K})^{-1}$ is computed only once, and Algorithm 1 requires at most $\sum_{k=1}^K r_k^2$ evaluations of $\Sigma(\hat{x}_{1:K})$. The proof is complete.

⁷ Any independence system can be expressed as the intersection of a finite number of matroids [27].

APPENDIX IV
PROOF OF THEOREM 3

Proof: Since the arithmetic mean of a finite set of positive numbers is at least as large as their harmonic mean,

$$\mathrm{tr}(\Sigma(\hat{x}_{1:K})) \geq \frac{(nK)^2}{\mathrm{tr}\!\left(\sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, U^{(ki)} + \mathbb{C}(x_{1:K})^{-1}\right)}.$$

In the denominator, we have for the second term: $\mathrm{tr}(\mathbb{C}(x_{1:K})^{-1}) \leq nK\sigma_w^{(-1)}$; and for the first term:

$$\mathrm{tr}\!\left(\sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, U^{(ki)}\right) = \sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, \mathrm{tr}(U^{(ki)}),$$

where $\mathrm{tr}(U^{(ki)}) \leq nK \|C_{1:K}\|_2^2\, \|\mathbb{C}(v_{1:K})^{-1}\|_2$, since $\|I^{(ki)}\|_2 = 1$. Then, we get the upper bound:

$$\mathrm{tr}\!\left(\sum_{k=1}^K \sum_{i=1}^m s_i(t_k)\, U^{(ki)}\right) \leq nK\sigma_v^{(-1)} \|C_{1:K}\|_2^2 \sum_{k=1}^K r_k \leq nK^2 \sigma_v^{(-1)} \|C_{1:K}\|_2^2 \max_{k \in [K]}(r_k).$$
Overall, denoting $\max_{k \in [K]}(r_k)$ as $r_M$:

$$\mathrm{tr}(\Sigma(\hat{x}_{1:K})) \geq \frac{(nK)^2}{nK^2 \sigma_v^{(-1)} r_M \|C_{1:K}\|_2^2 + nK\sigma_w^{(-1)}} = \frac{n}{\sigma_v^{(-1)} r_M \|C_{1:K}\|_2^2 + \sigma_w^{(-1)}/K},$$

and the proof is complete.

REFERENCES

[1] V. Kumar, D. Rus, and S. Singh, "Robot and sensor networks for first responders," IEEE Pervasive Computing, vol. 3, no. 4, pp. 24–33, 2004.
[2] R. Nowak, U. Mitra, and R. Willett, "Estimating inhomogeneous fields using wireless sensor networks," IEEE Journal on Selected Areas in Communications, vol. 22, no. 6, pp. 999–1006, 2004.
[3] M. P. Vitus, "Sensor placement for improved robotic navigation," Robotics: Science and Systems VI, p. 217, 2011.
[4] E. Masazade, M. Fardad, and P. K. Varshney, "Sparsity-promoting extended Kalman filtering for target tracking in wireless sensor networks," IEEE Signal Processing Letters, vol. 19, no. 12, pp. 845–848, 2012.
[5] H. Rowaihy, S. Eswaran, M. Johnson, D. Verma, A. Bar-Noy, T. Brown, and T. La Porta, "A survey of sensor selection schemes in wireless sensor networks," in Defense and Security Symposium. International Society for Optics and Photonics, 2007, pp. 65621A–65621A.
[6] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Prentice Hall, Upper Saddle River, NJ, 2000, vol. 1.
[7] A. O. Hero III and D. Cochran, "Sensor management: Past, present, and future," IEEE Sensors Journal, vol. 11, no. 12, pp. 3064–3075, 2011.
[8] S. Liu, M. Fardad, E. Masazade, and P. K. Varshney, "Optimal periodic sensor scheduling in networks of dynamical systems," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3055–3068, 2014.
[9] M. Kaess, A. Ranganathan, and F. Dellaert, "iSAM: Incremental smoothing and mapping," IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, 2008.
[10] S. Anderson, T. D. Barfoot, C. H. Tong, and S. Särkkä, "Batch nonlinear continuous-time trajectory estimation as exactly sparse Gaussian process regression," Autonomous Robots, vol. 39, no. 3, pp. 221–238, 2015.
[11] H. Strasdat, J. Montiel, and A. J. Davison, "Real-time monocular SLAM: Why filter?" in 2010 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2010, pp. 2657–2664.
[12] V. Roy, A. Simonetto, and G. Leus, "Spatio-temporal sensor management for environmental field estimation," Signal Processing, vol. 128, pp. 369–381, 2016.
[13] M. Shamaiah, S. Banerjee, and H. Vikalo, "Greedy sensor selection: Leveraging submodularity," in 49th IEEE Conference on Decision and Control (CDC), 2010, pp. 2572–2577.
[14] J. Liu, D. Petrovic, and F. Zhao, "Multi-step information-directed sensor querying in distributed sensor networks," in Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP '03). 2003 IEEE International Conference on, vol. 5. IEEE, 2003, pp. V–145.
[15] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, "On efficient sensor scheduling for linear dynamical systems," Automatica, vol. 48, no. 10, pp. 2482–2493, 2012.
[16] S. Joshi and S. Boyd, "Sensor selection via convex optimization," IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
[17] Y. Mo, R. Ambrosino, and B. Sinopoli, "Sensor selection strategies for state estimation in energy constrained wireless sensor networks," Automatica, vol. 47, no. 7, pp. 1330–1338, 2011.
[18] H. Zhang, R. Ayoub, and S. Sundaram, "Sensor selection for optimal filtering of linear dynamical systems: Complexity and approximation," in IEEE Conference on Decision and Control (CDC), 2015.
[19] S. T. Jawaid and S. L. Smith, "Submodularity and greedy algorithms in sensor scheduling for linear dynamical systems," Automatica, vol. 61, pp. 282–288, 2015.
[20] M. F. Huber, A. Kuwertz, F. Sawo, and U. D. Hanebeck, "Distributed greedy sensor scheduling for model-based reconstruction of space-time continuous physical phenomena," in Information Fusion, 2009. FUSION '09. 12th International Conference on. IEEE, 2009, pp. 102–109.
[21] A. Krause, A. Singh, and C. Guestrin, "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies," The Journal of Machine Learning Research, vol. 9, pp. 235–284, 2008.
[22] C.-T. Chen, Linear System Theory and Design, 3rd ed. New York, NY, USA: Oxford University Press, Inc., 1998.
[23] S. Venkatesh, The Theory of Probability: Explorations and Applications. Cambridge University Press, 2012.
[24] A. Olshevsky, "Minimal controllability problems," IEEE Transactions on Control of Network Systems, vol. 1, no. 3, pp. 249–258, 2014.
[25] S. Pequito, S. Kar, and A. Aguiar, "A framework for structural input/output and control configuration selection in large-scale systems," IEEE Transactions on Automatic Control, vol. PP, no. 99, pp. 1–1, 2015.
[26] J. Vondrák, "Submodularity and curvature: the optimal algorithm," RIMS Kokyuroku Bessatsu B, vol. 23, pp. 253–266, 2010.
[27] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. New York, NY, USA: Wiley-Interscience, 1988.
[28] U. Feige, "A threshold of ln n for approximating set cover," Journal of the ACM, vol. 45, no. 4, pp. 634–652, Jul. 1998.
[29] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, An Analysis of Approximations for Maximizing Submodular Set Functions—II. Springer, 1978.
[30] M. Minoux, "Accelerated greedy algorithms for maximizing submodular set functions," in Optimization Techniques. Springer, 1978, pp. 234–243.
[31] L. G. Molinari, "Determinants of block tridiagonal matrices," Linear Algebra and its Applications, vol. 429, no. 8, pp. 2221–2226, 2008.
[32] A. Nemirovski, "Interior point polynomial time methods in convex programming."
[33] J. E. Weimer, B. Sinopoli, and B. H. Krogh, "A relaxation approach to dynamic sensor selection in large-scale wireless networks," in 28th International Conference on Distributed Computing Systems Workshops (ICDCS), 2008, pp. 501–506.
[34] V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray, "On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage," Automatica, vol. 42, no. 2, pp. 251–260, 2006.
[35] A. S. Leong, S. Dey, and J. S. Evans, "Asymptotics and power allocation for state estimation over fading channels," IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 1, pp. 611–633, January 2011.
[36] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2009.
[37] T. H. Summers, F. L. Cortesi, and J. Lygeros, "On submodularity and controllability in complex dynamical networks," IEEE Transactions on Control of Network Systems, 2015, in press.
[38] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2012.
[39] C. Moler and C. Van Loan, "Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later," SIAM Review, vol. 45, no. 1, pp. 3–49, 2003.