Near-Optimal Sensor Scheduling for Batch State Estimation: Complexity, Algorithms, and Limits

Vasileios Tzoumas¹, Ali Jadbabaie², George J. Pappas¹

Abstract— In this paper, we focus on batch state estimation for linear systems. This problem is important in applications such as environmental field estimation, robotic navigation, and target tracking. Its difficulty lies in the fact that limited operational resources among the sensors, e.g., shared communication bandwidth or battery power, constrain the number of sensors that can be active at each measurement step. As a result, sensor scheduling algorithms must be employed. Notwithstanding, current sensor scheduling algorithms for batch state estimation scale poorly with the system size and the time horizon. In addition, current sensor scheduling algorithms for Kalman filtering, although they scale better, provide no performance guarantees or approximation bounds for the minimization of the batch state estimation error. In this paper, one of our main contributions is to provide an algorithm that enjoys both the estimation accuracy of the batch state scheduling algorithms and the low time complexity of the Kalman filtering scheduling algorithms. In particular: 1) our algorithm is near-optimal: it achieves a solution up to a multiplicative factor 1/2 from the optimal solution, and this factor is close to the best approximation factor $1/e$ one can achieve in polynomial time for this problem; 2) our algorithm has (polynomial) time complexity that is not only lower than that of the current algorithms for batch state estimation; it is also lower than, or similar to, that of the current algorithms for Kalman filtering. We achieve these results by proving two properties for our batch state estimation error metric, which quantifies the square error of the minimum variance linear estimator of the batch state vector: a) it is supermodular in the choice of the sensors; b) it has a sparsity pattern (it involves matrices that are block tri-diagonal) that facilitates its evaluation at each sensor set.

I. INTRODUCTION

Search and rescue [1], environmental field estimation [2], robotic navigation [3], and target tracking [4] are only a few of the challenging information gathering problems that employ the monitoring capabilities of sensor networks [5]. In particular, all these problems face the following three main challenges:
• they involve systems whose evolution is largely unknown, corrupted with noisy inputs [4], and sensors with limited sensing capabilities, corrupted with measurement noise [6];
• they involve systems that change over time [2], and as a result, necessitate both spatial and temporal deployment of sensors in the environment;

¹The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104-6228 USA (email: {vtzoumas, pappasg}@seas.upenn.edu).
²The author is with the Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139 USA (email: jadbabai@mit.edu).
This work was supported in part by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, in part by the AFOSR Complex Networks Program, and in part by AFOSR MURI CHASE.
At the same time:
• they involve operational constraints, such as limited bandwidth and battery life, which limit the number of sensors that can be simultaneously used (i.e., be switched on) in the information gathering process [7].

As a result of these challenges, researchers have focused on the following question: “How do we select, at each measurement step, only a few sensors so as to minimize the estimation error despite the above challenges?” The effort to answer this question resulted in the problem of sensor scheduling [7]: in particular, sensor scheduling offers a formal methodology to use at each measurement time only a few sensors and obtain an optimal trade-off between the estimation accuracy and the usage of the limited operational resource (e.g., the shared bandwidth). Clearly, sensor scheduling is a combinatorial problem of exponential complexity [5]. In this paper, we focus on the following instance of this problem:

Problem 1 (Sensor Scheduling for Minimum Variance Batch State Estimation): Consider a time-invariant linear system, whose state at time $t_k$ is denoted as $x(t_k)$, a set of $m$ sensors, and a fixed set of $K$ measurement times $t_1, t_2, \ldots, t_K$. In addition, consider that at each $t_k$ at most $r_k$ sensors can be used, where $r_k \le m$. At each $t_k$, select a set of $r_k$ sensors so as to minimize the square estimation error of the minimum variance linear estimator of the batch state vector $(x(t_1), x(t_2), \ldots, x(t_K))$.

There are two classes of sensor scheduling algorithms that trade off between the estimation accuracy of the batch state vector and their time complexity: those for Kalman filtering, and those for batch state estimation. In more detail:

Kalman filtering algorithms: These algorithms sacrifice estimation accuracy for reduced time complexity. The reason is that they are sequential algorithms: at each $t_k$, they select the sensors so as to minimize the square estimation error of the minimum variance linear estimator of $x(t_k)$ (given the measurements up to $t_k$). Therefore, their objective is to minimize the sum of the square estimation errors of $x(t_k)$ across the measurement times $t_k$ [8]. However, this sum is only an upper bound on the square estimation error of the batch state vector $(x(t_1), x(t_2), \ldots, x(t_K))$. Thus, the Kalman filtering algorithms lose in estimation accuracy compared to the batch state estimation algorithms.

Batch state estimation algorithms: These algorithms sacrifice time complexity for estimation accuracy. The reason is that they perform global optimization, in accordance with Problem 1. As a result, they lose in time complexity compared to the Kalman filtering algorithms.

Notwithstanding, in several recent robotic applications, batch estimation algorithms have proven competitive in their time complexity with their filtering counterparts [9], [10]. The reason is that sparsity patterns emerge in these applications, which reduce the time complexity of the batch estimation algorithms to an order similar to that of the filtering algorithms [11]. Thereby, the following question on Problem 1 arises:

Question 1: “Is there an algorithm for Problem 1 that enjoys both the estimation accuracy of the batch state algorithms and the low time complexity of the Kalman filtering algorithms?”

Literature review on sensor scheduling algorithms for batch state estimation: The most relevant paper on Problem 1 is [12], where an algorithm based on convex relaxation is provided. This algorithm scales poorly with the system’s size and number of measurement times. In addition, it provides no approximation performance guarantees.
Literature review on sensor scheduling algorithms for Kalman filtering: Several papers in this category have focused on myopic algorithms [13]; such algorithms, however, often perform poorly [14]. Other papers have focused on algorithms that use tree pruning [15], convex optimization [16], quadratic programming [17], or submodular function maximization [18], [19]. Nevertheless, these algorithms provide no performance guarantees on the batch state estimation error, or have time complexity that scales poorly with the system’s size and number of measurement times [15], [17]. To reduce the time complexity of these algorithms, papers have also proposed periodic sensor schedules [8].

Contributions: We now present our contributions:
1) We prove that Problem 1 is NP-hard.
2) We provide an algorithm for Problem 1 (Algorithm 1) that answers Question 1 positively. The reasons are two: i) Algorithm 1 is near-optimal: it achieves a solution that is up to a multiplicative factor 1/2 from the optimal solution. In addition, this multiplicative factor is close to the factor $1/e$, which we prove to be the best approximation factor one can achieve in polynomial time for Problem 1 in the worst case. ii) Algorithm 1 has (polynomial) time complexity that is not only lower than that of the state-of-the-art scheduling algorithms for batch state estimation; it is also lower than, or similar to, that of the state-of-the-art scheduling algorithms for Kalman filtering. For example, it has similar complexity to the state-of-the-art periodic scheduling algorithm in [8] (in particular, lower for $K$ large enough), and lower complexity than the algorithm in [16]. Overall, in response to Question 1, Algorithm 1 enjoys both the higher estimation accuracy of the batch state estimation approach (compared to the Kalman filtering approach, which only approximates the batch state estimation error with an upper bound) and the low time complexity of the Kalman filtering approach.
3) We prove limits on the minimization of the square error of the minimum variance estimator of $(x(t_1), x(t_2), \ldots, x(t_K))$ with respect to the scheduled sensors. For example, we prove that the number $r_k$ of used sensors at each measurement time must increase linearly with the system size for fixed estimation error and number of measurement times $K$; this is a fundamental limit, especially for large-scale systems.

Our technical contributions: We achieve our aforementioned contributions by proving the following two results:

a) Supermodularity in Problem 1: We prove that our estimation metric, which quantifies the square error of the minimum variance estimator of $(x(t_1), x(t_2), \ldots, x(t_K))$, is a supermodular function in the choice of the used sensors. This result becomes important when we compare it to results on multi-step Kalman filtering, which show that the corresponding estimation metric in that case is neither supermodular nor submodular [18], [19].¹ In addition, this supermodularity result cannot be reduced to the batch estimation problem in [21]. The main reasons are two: i) we consider sensors that can measure any linear combination of the elements of $x(t_k)$, in contrast to [21], where each sensor measures directly only one element of $x(t_k)$; the latter assumption is usually infeasible in dynamical systems [22]; ii) our error metric is relevant to estimation problems for dynamical systems and different from the submodular information gain considered in [21].
b) Sparsity in Problem 1: We identify a sparsity pattern in our error metric that facilitates the latter’s evaluation at each sensor set. In particular, we prove that the inverse of the error covariance of the minimum variance linear estimator of the batch state vector is block tri-diagonal.

We organize the rest of the paper as follows: In Section II, we present Problem 1 formally. In Section III, we present our main results in three subsections: in Section III-A, we prove that our sensor scheduling problem is NP-hard; in Section III-B, we derive a near-optimal approximation algorithm and compare its time complexity with that of previous sensor scheduling algorithms; in Section III-C, we prove limits on the minimization of the batch state estimation error with respect to the used sensors. Section IV concludes the paper with our future work. All proofs are found in the Appendix.²

¹The observation of [19] is also important as it disproves previous results in the literature [20].
²Standard notation is presented in this footnote: We denote the set of natural numbers $\{1, 2, \ldots\}$ as $\mathbb{N}$, the set of real numbers as $\mathbb{R}$, and the set $\{1, 2, \ldots, n\}$ as $[n]$ ($n \in \mathbb{N}$). The empty set is denoted as $\emptyset$. Given a set $\mathcal{X}$, $|\mathcal{X}|$ is its cardinality. Matrices are represented by capital letters and vectors by lower-case letters. We write $A \in \mathcal{X}^{n_1 \times n_2}$ ($n_1, n_2 \in \mathbb{N}$) to denote a matrix of $n_1$ rows and $n_2$ columns whose elements take values in $\mathcal{X}$. Moreover, for a matrix $A$, $A^\top$ is its transpose, and $[A]_{ij}$ is its element at the $i$-th row and $j$-th column. In addition, $\|A\|_2 \equiv \sqrt{\lambda_{\max}(A^\top A)}$ is its spectral norm, and $\det(A)$ its determinant. Furthermore, if $A$ is positive semi-definite or positive definite, we write $A \succeq 0$ and $A \succ 0$, respectively. $I$ is the identity matrix; its dimension is inferred from the context. Similarly for the zero matrix $0$. Finally, for a random variable $x \in \mathbb{R}^n$, $\mathbb{E}(x)$ is its expected value, and $\mathbb{C}(x)$ its covariance.

II. PROBLEM FORMULATION

In the following paragraphs, we present our sensor scheduling problem for batch state estimation. To this end, we first build our system and measurement framework. Then, we define our sensor scheduling framework and, finally, present our sensor scheduling problem. We start in more detail with the system model:

System Model: We consider the linear time-invariant system:

$\dot{x}(t) = Ax(t) + Bu(t) + Fw(t), \quad t \ge t_0,$  (1)

where $t_0$ is the initial time, $x(t) \in \mathbb{R}^n$ ($n \in \mathbb{N}$) the state vector, $\dot{x}(t)$ the time derivative of $x(t)$, $u(t)$ the exogenous input, and $w(t)$ the process noise. The system matrices $A$, $B$ and $F$ are of appropriate dimensions. We consider that $u(t)$, $A$, $B$ and $F$ are known. Our main assumption on $w(t)$ is found in Assumption 1, which is presented after our measurement model.

Remark 1: Our results extend to continuous and discrete time-variant systems, as explained in detail in Section III (Corollaries 1 and 2).

We introduce the measurement model:

Measurement Model: We consider $m$ sensors:

$z_i(t) = C_i x(t) + v_i(t), \quad i \in [m],$  (2)

where $z_i(t)$ is the measurement taken by sensor $i$ at time $t$, $C_i \in \mathbb{R}^{d_i \times n}$ ($d_i \in \mathbb{N}$) is sensor $i$’s measurement matrix, and $v_i(t)$ is its measurement noise. We make the following assumption on $x(t_0)$, $w(t)$ and $v_i(t)$:

Assumption 1: For all $t, t' \ge t_0$, $t \ne t'$, and all $i \in [m]$: $x(t_0)$, $w(t)$, $w(t')$, $v_i(t)$ and $v_i(t')$ are uncorrelated; in addition, $x(t_0)$, $w(t)$ and $v_i(t)$ have positive definite covariance.
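To make the framework concrete, the following minimal Python sketch simulates a discretized instance of (1)-(2). It is an illustration only: the zero-order-hold discretization, the Gaussian noise, and all numerical values are our assumptions, not requirements of the model (Assumption 1 only asks for uncorrelated noise with positive definite covariance), and the exogenous input $u(t)$ is omitted for brevity.

```python
# Minimal sketch of the system (1) and measurement model (2), assuming a
# zero-order-hold discretization and Gaussian noise sampled once per step.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, m, dt = 4, 3, 0.1                       # state dimension, sensors, step size
A = 0.3 * rng.standard_normal((n, n))      # system matrix (hypothetical values)
F = np.eye(n)
Ad = expm(A * dt)                          # state transition over one step
C = [rng.standard_normal((1, n)) for _ in range(m)]  # C_i, one row per sensor

x = rng.standard_normal(n)                 # x(t_0), positive definite covariance
for k in range(5):
    w = 0.1 * rng.standard_normal(n)       # process noise sample w(t_k)
    x = Ad @ x + F @ w                     # u(t) omitted for brevity
    z = [C[i] @ x + 0.1 * rng.standard_normal(1) for i in range(m)]  # z_i(t_k)
```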
We now introduce the sensor scheduling model:

Sensor Scheduling Model: The $m$ sensors in (2) are used at $K$ scheduled measurement times $\{t_1, t_2, \ldots, t_K\}$. Specifically, at each $t_k$ only $r_k$ of these $m$ sensors are used ($r_k \le m$), resulting in the batch measurement vector $y(t_k)$:

$y(t_k) = S(t_k) z(t_k), \quad k \in [K],$  (3)

where $z(t_k) \equiv (z_1^\top(t_k), z_2^\top(t_k), \ldots, z_m^\top(t_k))^\top$, and $S(t_k)$ is the sensor selection matrix: it is a block matrix, composed of matrices $[S(t_k)]_{ij}$ ($i \in [r_k]$, $j \in [m]$) such that $[S(t_k)]_{ij} = I$ if sensor $j$ is used at $t_k$, and $[S(t_k)]_{ij} = 0$ otherwise. We consider that each sensor can be used at most once at each $t_k$, and as a result, for each $i$ there is exactly one $j$ such that $[S(t_k)]_{ij} = I$, while for each $j$ there is at most one $i$ such that $[S(t_k)]_{ij} = I$.

We now present the sensor scheduling problem we study in this paper. To this end, we use two notations:

Notation: First, we set $\mathcal{S}_k \equiv \{j : \text{there exists } i \in [r_k] \text{ such that } [S(t_k)]_{ij} = I\}$; that is, $\mathcal{S}_k$ is the set of indices that correspond to used sensors at $t_k$. Second, we set $\mathcal{S}_{1:K} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$.

Problem 1 (Sensor Scheduling for Minimum Variance Batch State Estimation): Given a set of measurement times $t_1, t_2, \ldots, t_K$, select at each $t_k$ a subset of $r_k$ sensors, out of the $m$ sensors in (2), so as to minimize the $\log\det$ of the error covariance of the minimum variance linear estimator of $x_{1:K} \equiv (x(t_1), x(t_2), \ldots, x(t_K))$. In mathematical notation:

$\min_{\mathcal{S}_k \subseteq [m],\, k \in [K]} \; \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) \quad \text{subject to} \quad |\mathcal{S}_k| \le r_k, \; k \in [K],$

where $\hat{x}_{1:K}$ is the minimum variance linear estimator of $x_{1:K}$, and $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$ its error covariance given $\mathcal{S}_{1:K}$.

Two remarks follow on the definition of Problem 1. In the first remark we explain why we focus on $\hat{x}_{1:K}$, and in the second why we focus on $\log\det(\Sigma(\hat{x}_{1:K}))$.

Notation: For notational simplicity, we use $\Sigma(\hat{x}_{1:K})$ and $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$ interchangeably.

Remark 2: We focus on the minimum variance linear estimator $\hat{x}_{1:K}$ because of its optimality: it minimizes among all linear estimators of $x_{1:K}$ the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$, where the expectation is taken with respect to $y(t_1), y(t_2), \ldots, y(t_K)$ [6]. Because $\hat{x}_{1:K}$ is also unbiased (that is, $\mathbb{E}(\hat{x}_{1:K}) = x_{1:K}$, where the expectation is taken with respect to $y(t_1), y(t_2), \ldots, y(t_K)$), we equivalently say that $\hat{x}_{1:K}$ is the minimum variance estimator of $x_{1:K}$. We compute the error covariance of $\hat{x}_{1:K}$ in Appendix I.

Remark 3: We focus on the estimation error metric $\log\det(\Sigma(\hat{x}_{1:K}))$ because when it is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized. To quantify this statement, we note that this error metric is related to the $\eta$-confidence ellipsoid of $x_{1:K} - \hat{x}_{1:K}$ [16]: specifically, the $\eta$-confidence ellipsoid is the minimum volume ellipsoid that contains $x_{1:K} - \hat{x}_{1:K}$ with probability $\eta$, that is, $\mathcal{E}_\epsilon \equiv \{x : x^\top \Sigma(\hat{x}_{1:K})^{-1} x \le \epsilon\}$, where $\epsilon$ is the quantity $F^{-1}_{\chi^2_{nK}}(\eta)$, and $F_{\chi^2_{nK}}$ the cumulative distribution function of a $\chi$-squared random variable with $nK$ degrees of freedom [23]. Thus, its volume

$\mathrm{vol}(\mathcal{E}_\epsilon) \equiv \dfrac{(\epsilon\pi)^{nK/2}}{\Gamma(nK/2 + 1)} \det\big(\Sigma(\hat{x}_{1:K})\big)^{1/2},$  (4)

where $\Gamma(\cdot)$ denotes the Gamma function [23], quantifies the estimation error of the optimal estimator $\hat{x}_{1:K}$. Therefore, by taking the logarithm of (4), we validate that when $\log\det(\Sigma(\hat{x}_{1:K}))$ is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized.
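The following Python sketch illustrates Remark 3 numerically: for a random positive definite matrix (a hypothetical stand-in for $\Sigma(\hat{x}_{1:K})$), shrinking $\log\det(\Sigma)$ shrinks the volume (4) of the $\eta$-confidence ellipsoid. The helper name `log_ellipsoid_volume` is ours.

```python
# Numeric illustration of (4): a smaller log det(Sigma) yields a smaller
# eta-confidence ellipsoid volume. The matrices here are hypothetical
# stand-ins for the batch error covariance Sigma(x_hat_{1:K}).
import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

def log_ellipsoid_volume(Sigma, eta=0.95):
    d = Sigma.shape[0]                     # dimension nK of x_{1:K}
    eps = chi2.ppf(eta, df=d)              # eps = F^{-1}_{chi^2_d}(eta)
    _, logdet = np.linalg.slogdet(Sigma)
    # log of (eps*pi)^{d/2} / Gamma(d/2 + 1) * det(Sigma)^{1/2}
    return 0.5 * d * np.log(eps * np.pi) - gammaln(d / 2 + 1) + 0.5 * logdet

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
Sigma_big = M @ M.T + np.eye(6)            # some error covariance
Sigma_small = 0.5 * Sigma_big              # strictly smaller log det
assert log_ellipsoid_volume(Sigma_small) < log_ellipsoid_volume(Sigma_big)
```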
III. MAIN RESULTS

Our main results are presented in three sections:
• In Section III-A, we prove that Problem 1 is NP-hard.
• In Section III-B, we derive a provably near-optimal approximation algorithm for Problem 1. In addition, we discuss its time complexity and compare it to that of existing sensor scheduling algorithms from two categories: batch state estimation, and Kalman filtering.
• In Section III-C, we prove limits on the optimization of the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ with respect to the scheduled sensors.

Algorithm 1 Approximation algorithm for Problem 1.
Input: Number of measurement times $K$, scheduling constraints $r_1, r_2, \ldots, r_K$, estimation error function $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) : \mathcal{S}_k \subseteq [m], k \in [K] \mapsto \mathbb{R}$
Output: Sensor sets $(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$ that approximate the solution to Problem 1, as quantified in Theorem 2
$k \leftarrow 1$, $\mathcal{S}_{1:0} \leftarrow \emptyset$
while $k \le K$ do
 1. Apply Algorithm 2 to $\min_{\mathcal{S} \subseteq [m]} \{\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S})) : |\mathcal{S}| \le r_k\}$  (5)
 2. Denote as $\mathcal{S}_k$ the solution Algorithm 2 returns
 3. $\mathcal{S}_{1:k} \leftarrow (\mathcal{S}_{1:k-1}, \mathcal{S}_k)$
 4. $k \leftarrow k + 1$
end while

A. Computational Complexity of Sensor Scheduling for Batch State Estimation

In this section, we characterize the computational complexity of Problem 1. In particular, we prove:

Theorem 1: The problem of sensor scheduling for minimum variance batch state estimation (Problem 1) is NP-hard.

Proof: The proof is found in Appendix II. Here, we note that the proof is completed by finding an instance of Problem 1 that is equivalent to the NP-hard minimal observability problem introduced in [24], [25].

Due to Theorem 1, for a polynomial time solution of Problem 1 we need to appeal to approximation algorithms. To this end, in Section III-B, we provide an efficient, provably near-optimal approximation algorithm:

B. Algorithm for Sensor Scheduling for Minimum Variance Batch State Estimation

We propose Algorithm 1 for Problem 1 (Algorithm 1 uses Algorithm 2 as a subroutine); with the following theorem, we quantify its approximation performance and time complexity.

Theorem 2: The theorem has two parts:
1) Approximation performance of Algorithm 1: Algorithm 1 returns sensor sets $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K$ that:
• satisfy all the feasibility constraints of Problem 1: $|\mathcal{S}_k| \le r_k$, $k \in [K]$;
• achieve an error value $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$, where $\mathcal{S}_{1:K} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_K)$, such that:

$\dfrac{\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) - OPT}{MAX - OPT} \le \dfrac{1}{2},$  (6)

where $OPT$ is the (optimal) value of Problem 1, and $MAX$ is the maximum (worst) value of Problem 1 ($MAX \equiv \max_{\mathcal{S}'_{1:K}} \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K}))$).
2) Time complexity of Algorithm 1: Algorithm 1 has time complexity of order $O(n^{2.4} K \sum_{k=1}^K r_k^2)$.

Theorem 2 extends to continuous and discrete time-variant systems as follows:

Corollary 1: Consider the time-variant version of (1):

$\dot{x}(t) = A(t)x(t) + B(t)u(t) + F(t)w(t), \quad t \ge t_0.$  (7)

1) Part 1 of Theorem 2 holds.
2) Part 2 of Theorem 2 holds if the time complexity of computing each transition matrix $\Phi(t_{k+1}, t_k)$ [22], where $k \in [K-1]$, is $O(n^3)$.³

Corollary 2: Consider the discrete time version of (7):

$x[k+1] = A_k x[k] + B_k u[k] + F_k w[k], \quad k \ge k_0.$  (8)

Similarly, consider the discrete time counterparts of the sensor model (2), Assumption 1, and the sensor scheduling model (3).
1) Part 1 of Theorem 2 holds.
2) Part 2 of Theorem 2 holds if $A_k$ in (8) is full rank for all $k \in [K]$.

We follow up with several remarks on Theorem 2:

Remark 4 (Approximation quality of Algorithm 1): Theorem 2 quantifies the worst-case performance of Algorithm 1 across all values of Problem 1’s parameters. The reason is that the right-hand side of (6) is constant. In particular, (6) guarantees that for any instance of Problem 1, the distance of the approximate value $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ from $OPT$ is at most 1/2 the distance of the worst (maximum) value $MAX$ from $OPT$. In addition, this approximation factor is close to the optimal approximation factor $1/e \cong 0.37$ one can achieve in the worst case for Problem 1 in polynomial time [26]; the reason is twofold: first, as we comment in Remark 5, we prove that Problem 1 involves the minimization of a non-increasing and supermodular function [27], and second, as we proved in Section III-A, Problem 1 is in the worst case equivalent to the minimal observability problem introduced in [24], which cannot be approximated in polynomial time with a factor better than $1/e$ [28].
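For intuition, here is a compact Python sketch of the greedy scheme behind Algorithms 1 and 2: the outer loop ranges over the measurement times, and the inner loop adds, one sensor at a time, the sensor with the largest marginal decrease of the cost. The oracle `cost`, which stands for an evaluation of $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$, is left abstract; this is a simplified rendering under our own naming, not a verbatim transcription of the pseudocode above.

```python
# Sketch of the greedy scheme in Algorithms 1-2: at each measurement time t_k,
# repeatedly add the sensor with the largest marginal decrease of the cost
# until r_k sensors are chosen. Assumes r_k <= m for all k.
from typing import Callable, List, Set

def schedule(m: int, r: List[int],
             cost: Callable[[List[Set[int]]], float]) -> List[Set[int]]:
    K = len(r)
    S: List[Set[int]] = [set() for _ in range(K)]   # S_1, ..., S_K
    for k in range(K):                              # outer loop (Algorithm 1)
        while len(S[k]) < r[k]:                     # inner greedy (Algorithm 2)
            base = cost(S)
            gains = {}                              # marginal gain rho_i of each
            for i in range(m):                      # unused sensor i at t_k
                if i not in S[k]:
                    S[k].add(i)
                    gains[i] = base - cost(S)
                    S[k].remove(i)
            best = max(gains, key=gains.get)        # ties settled arbitrarily
            S[k].add(best)
    return S
```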
Remark 5 (Supermodularity of $\log\det(\Sigma(\hat{x}_{1:K}))$): In the proof of Theorem 2 (Appendix III), we show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a non-increasing and supermodular function with respect to the sequence of selected sensors. Specifically, the proof of (6) follows by combining these two results with results on the maximization of submodular functions over matroid constraints [29]; we present these three derivations in Appendices III-A, III-B, and III-C, respectively.

We continue with our third remark on Theorem 2:

Remark 6 (Time complexity of Algorithm 1): Algorithm 1’s time complexity breaks down into two parts: the first part is the number of evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$ required by the algorithm, and the second part is the time complexity of each such evaluation. In particular, Algorithm 1 requires at most $r_k^2$ evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$ at each $t_k$. Therefore, Algorithm 1 achieves a time complexity that is only linear in $K$ with respect to the total number of evaluations of $\log\det(\Sigma(\hat{x}_{1:K}))$; the reason is that $\sum_{k=1}^K r_k^2 \le \max_{k \in [K]}(r_k^2) K$. In addition, for $w(t)$ zero mean and white Gaussian (as commonly assumed in the sensor scheduling literature), the time complexity of each such evaluation is at most linear in $K$: the reason is that this $w(t)$ agrees with Assumption 1, in which case we prove that the time complexity of each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$ is $O(n^{2.4}K)$ (linear in $K$).⁴

³The matrices $\Phi(t_{k+1}, t_k)$, where $k \in [K-1]$, are used in the computation of $\Sigma(\hat{x}_{1:K})$ (cf. proof of Theorem 2 in Appendix III).
⁴We can also speed up Algorithm 1 by implementing in Algorithm 2 the method of lazy evaluations [30]: this method avoids, in Step 2 of Algorithm 2, the computation of $\rho_i(\mathcal{S}^{t-1})$ for unnecessary choices of $i$.

Algorithm 2 Single step greedy algorithm (subroutine in Algorithm 1).
Input: Current iteration $k$ (corresponds to $t_k$), selected sensor sets $(\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{k-1})$ up to the current iteration, scheduling constraint $r_k$, estimation error function $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) : \mathcal{S}_k \subseteq [m], k \in [K] \mapsto \mathbb{R}$
Output: Sensor set $\mathcal{S}_k$ that approximates the solution to Problem 1 at $t_k$
$\mathcal{S}^0 \leftarrow \emptyset$, $\mathcal{X}^0 \leftarrow [m]$, and $t \leftarrow 1$
Iteration $t$:
1. If $\mathcal{X}^{t-1} = \emptyset$, return $\mathcal{S}^{t-1}$
2. Select $i(t) \in \mathcal{X}^{t-1}$ for which $\rho_{i(t)}(\mathcal{S}^{t-1}) = \max_{i \in \mathcal{X}^{t-1}} \rho_i(\mathcal{S}^{t-1})$, with ties settled arbitrarily, where
 $\rho_i(\mathcal{S}^{t-1}) \equiv \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S}^{t-1})) - \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:k-1}, \mathcal{S}^{t-1} \cup \{i\}))$
 and $\mathcal{S}_{1:k-1} \equiv (\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{k-1})$
3.a. If $|\mathcal{S}^{t-1} \cup \{i(t)\}| > r_k$: $\mathcal{X}^{t-1} \leftarrow \mathcal{X}^{t-1} \setminus \{i(t)\}$, and go to Step 1
3.b. If $|\mathcal{S}^{t-1} \cup \{i(t)\}| \le r_k$: $\mathcal{S}^t \leftarrow \mathcal{S}^{t-1} \cup \{i(t)\}$ and $\mathcal{X}^t \leftarrow \mathcal{X}^{t-1} \setminus \{i(t)\}$
4. $t \leftarrow t + 1$ and continue

Remark 7 (Sparsity of $\Sigma(\hat{x}_{1:K})$): We state the three properties of $\log\det(\Sigma(\hat{x}_{1:K}))$ we prove to obtain the time complexity of Algorithm 1. The first two properties were mentioned in Remark 5: the monotonicity and supermodularity of $\log\det(\Sigma(\hat{x}_{1:K}))$. These two properties are responsible for Algorithm 1 requiring at most $r_k^2$ evaluations at each $t_k$. The third property, which follows, is responsible for the low time complexity of each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$:
• $\Sigma(\hat{x}_{1:K})^{-1}$ is the sum of two $nK \times nK$ sparse matrices: the first matrix is block diagonal, and the second one is block tri-diagonal; hence $\log\det(\Sigma(\hat{x}_{1:K}))$ is minus the $\log\det$ of this sum. As a result, given that both of these matrices are known, each evaluation of $\log\det(\Sigma(\hat{x}_{1:K}))$ has time complexity $O(n^{2.4}K)$, linear in $K$ (using the results in [31]; cf. Theorem 2 therein).

We show in Appendix III-C that after we include, at each evaluation step of $\log\det(\Sigma(\hat{x}_{1:K}))$, the complexity to compute the two sparse matrices in $\Sigma(\hat{x}_{1:K})^{-1}$, the total time complexity of Algorithm 1 is as given in Theorem 2.
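The bullet above is what makes each evaluation cheap. As a sketch of why block tri-diagonal structure helps, the following Python function computes the log-determinant of a block tri-diagonal matrix with a standard Schur-complement recursion (one of the identities surveyed in [31]); it assumes the successive Schur complements are invertible, which holds in particular for positive definite matrices such as $\Sigma(\hat{x}_{1:K})^{-1}$.

```python
import numpy as np

def logdet_block_tridiag(D, L, U):
    """log|det| of the block tri-diagonal matrix with diagonal blocks D[k],
    sub-diagonal blocks L[k], and super-diagonal blocks U[k], via the Schur
    recursion S_1 = D_1, S_k = D_k - L_{k-1} S_{k-1}^{-1} U_{k-1}, and
    det = prod_k det(S_k). Cost: K solves of n x n systems, linear in K."""
    S = D[0]
    total = np.linalg.slogdet(S)[1]
    for k in range(1, len(D)):
        S = D[k] - L[k - 1] @ np.linalg.solve(S, U[k - 1])
        total += np.linalg.slogdet(S)[1]
    return total

# Sanity check on a random, diagonally dominant (hence invertible) instance.
rng = np.random.default_rng(2)
n, K = 3, 4
D = [5.0 * np.eye(n) for _ in range(K)]
U = [0.2 * rng.standard_normal((n, n)) for _ in range(K - 1)]
L = [Uk.T for Uk in U]                         # symmetric, as for a covariance
P = np.zeros((n * K, n * K))
for k in range(K):
    P[k*n:(k+1)*n, k*n:(k+1)*n] = D[k]
for k in range(K - 1):
    P[(k+1)*n:(k+2)*n, k*n:(k+1)*n] = L[k]
    P[k*n:(k+1)*n, (k+1)*n:(k+2)*n] = U[k]
assert np.isclose(logdet_block_tridiag(D, L, U), np.linalg.slogdet(P)[1])
```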
Our final remark on Theorem 2 follows:

Remark 8 (Comparison of Algorithm 1’s time complexity to that of existing scheduling algorithms): We make the comparison for two cases: batch state estimation, and Kalman filtering. In particular, we show that the time complexity of our algorithm is lower than that of existing sensor scheduling algorithms for batch state estimation, and of similar order to, or lower than, that of existing algorithms for Kalman filtering.

Comparison with algorithms for batch state estimation: In [12], Problem 1 is considered, and a semi-definite programming (SDP) algorithm is proposed; its time complexity is of the order $O(\max_{k \in [K]}(r_k) K (nK)^{3.5} + \max_{k \in [K]}(r_k^2) K^2 (nK)^{2.5})$ [32]. Clearly, this time complexity is higher than that of Algorithm 1, whose complexity is $O(\max_{k \in [K]}(r_k)^2 K^2 n^{2.4})$. In addition, the algorithm presented in [12] provides no worst-case approximation guarantee of the form (6), in contrast to Algorithm 1, which provides (6).

Comparison with algorithms for Kalman filtering: We make the comparison in two steps: first, we consider algorithms based on the maximization of submodular functions, and second, algorithms based on convex relaxation techniques or the alternating direction method of multipliers (ADMM):

• Algorithms based on the maximization of submodular functions: In [19], an algorithm is provided that is valid for a restricted class of linear systems; its time complexity is $O(\max_{k \in [K]}(r_k)\, m n^2 K + n^{2.4} K)$. This time complexity is of similar order to that of Algorithm 1, whose complexity is of the order $O(\max_{k \in [K]}(r_k)^2\, n^{2.4} K^2)$, since $\max_{k \in [K]}(r_k) < m$. Specifically, we observe in Algorithm 1’s time complexity the additional multiplicative factor $K$ (linear in $K$); this difference emanates from the fact that Algorithm 1 offers a near-optimal guarantee over the whole time horizon $(t_1, t_2, \ldots, t_K)$, whereas the algorithm in [19] offers a near-optimal guarantee only for the last time step $t_K$. In addition, Algorithm 1 holds for any linear continuous time-invariant system (no restrictions are necessary), in contrast to the algorithm in [19], and it holds for any discrete time-variant system where $A_k$ in (8) is full rank; the latter assumption is one of the four restrictive conditions in [19] (Theorem 13).

• Algorithms based on convex relaxation techniques or ADMM: In [16], the authors assume a single sensor ($r_k = 1$ across $t_k$), and their objective is to achieve a minimal estimation error by minimizing the number of times this sensor will be used over the horizon $t_1, t_2, \ldots, t_K$. The time complexity of the proposed algorithm is $O(n^{2.5}K^2 + n^{3.5}K)$. This time complexity is higher than that of Algorithm 1, whose complexity for $r_k = 1$ is of the order $O(n^{2.4}K^2)$. In [8], the authors employ ADMM techniques to solve a periodic sensor scheduling problem; they consider a zero mean and white Gaussian $w(t)$. The time complexity of the proposed algorithm is $O((nK)^3 + \max_{k \in [K]}(r_k)\, n^2 K^3 + \max_{k \in [K]}(r_k)^2\, n K^3)$. This time complexity is of similar order to that of Algorithm 1, whose complexity in this case is $O(\max_{k \in [K]}(r_k^2)\, n^{2.4} K^2)$, since $\max_{k \in [K]}(r_k) \le K$; in particular, for $K > n^{0.4} \max_{k \in [K]}(r_k)$, Algorithm 1 has lower time complexity.⁵
⁵More algorithms exist in the literature that also use convex relaxation [33] or randomization techniques [34] and have time complexity similar to Algorithm 1’s. They achieve this complexity using additional approximation methods: e.g., they optimize an upper bound to the involved estimation error metric instead.

With the above remarks we conclude: Algorithm 1 enjoys both the estimation accuracy of the batch state scheduling algorithms and the low time complexity of the Kalman filtering scheduling algorithms, since:
• Algorithm 1 offers a near-optimal worst-case approximation guarantee for the batch state estimation error. This estimation error is only approximated by the Kalman filtering sensor scheduling algorithms: the reason is that they aim instead to minimize the sum of the estimation errors of $x(t_k)$ (across $t_k$). However, this sum only upper bounds the batch state estimation error.
• Algorithm 1 has time complexity lower than the state-of-the-art batch estimation algorithms, and at the same time, lower than, or similar to, the time complexity of the corresponding Kalman filtering algorithms.

In addition: Algorithm 1’s approximation guarantee holds for any linear system (continuous or discrete time). Moreover, Algorithm 1’s time complexity guarantee holds for any continuous time system, and for discrete time systems where $A_k$ in (8) is full rank across $k$. The proof of Theorem 2 can be found in Appendix III.

C. Limits on Sensor Scheduling for Minimum Variance Batch State Estimation

In this section, we derive two trade-offs between three important parameters of our sensor scheduling problem:⁶
• the number of measurement times $(t_1, t_2, \ldots, t_K)$;
• the number $r_k$ of sensors that can be used at each $t_k$;
• the value of the estimation error $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$.

⁶We recall from Section II that the objective of Problem 1 is related to $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ in that when $\log\det(\Sigma(\hat{x}_{1:K}))$ is minimized, the probability that the estimation error $\|x_{1:K} - \hat{x}_{1:K}\|_2^2$ is small is maximized.

The first of the two trade-offs is captured in the next theorem:

Theorem 3: Let $\sigma_w^{(-1)} \equiv \max_{i \in [nK]} [\mathbb{C}(x_{1:K})^{-1}]_{ii}$ and $\sigma_v^{(-1)} \equiv \|\mathbb{C}(v_{1:K})^{-1}\|_2$. Also, let $C_{1:K}$ be the block diagonal matrix whose $K$ diagonal blocks are each equal to $C$, where $C$ is the matrix $[C_1^\top, C_2^\top, \ldots, C_m^\top]^\top$. For the variance of the error of the minimum variance estimator $\hat{x}_{1:K}$:

$\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2) \ge \dfrac{n}{\sigma_v^{(-1)} \max_{k \in [K]}(r_k)\, \|C_{1:K}\|_2^2 + \sigma_w^{(-1)}/K}.$  (9)

The lower bound in (9) decreases as the number $r_k$ of sensors used for scheduling increases or the number of measurement times $K$ increases, and increases as the system’s size increases. Since these qualitative relationships were expected, the importance of this theorem lies in the quantification of these relationships (which also includes the dependence on the noise parameters $\sigma_w^{(-1)}$ and $\sigma_v^{(-1)}$): for example, (9) decreases only inversely proportionally to the number of sensors used for scheduling; that is, increasing the number $r_k$ so as to reduce the variance of the error of the minimum variance estimator is ineffective, a fundamental limit. In addition, this bound increases linearly with the system’s size; this is another limit for large-scale systems.
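To read (9) quantitatively, the following Python snippet evaluates the lower bound for hypothetical parameter values (all numbers are our assumptions, chosen only for illustration); note how slowly the bound falls as $\max_k r_k$ grows, which is exactly the limit discussed above.

```python
# Numeric reading of the lower bound (9): for hypothetical noise and sensor
# parameters, the bound falls only inversely in max_k r_k and grows linearly
# in the system size n.
n, K = 100, 10                 # system size, measurement times (hypothetical)
sigma_w_inv = 2.0              # max diagonal entry of C(x_{1:K})^{-1}
sigma_v_inv = 1.0              # spectral norm of C(v_{1:K})^{-1}
norm_C_sq = 4.0                # ||C_{1:K}||_2^2

for r_max in (1, 5, 25):
    bound = n / (sigma_v_inv * r_max * norm_C_sq + sigma_w_inv / K)
    print(f"r_max={r_max:2d}: E||x - x_hat||^2 >= {bound:.2f}")
```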
Similar results are proved in [35] for the steady state error covariance of scalar systems in the case where the number of sensors goes to infinity. In more detail, the authors in [35] account for different types of multi-access schemes, as well as for fading channels between the sensors and the fusion centre that combines the sensor measurements.

The next corollary presents our last trade-off:

Corollary 3: Consider that the desired value for $\mathbb{E}(\|x_{1:K} - \hat{x}_{1:K}\|_2^2)$ is $\alpha$. Any set of scheduled sensors at $t_1, t_2, \ldots, t_K$ that achieves this error satisfies:

$\max_{k \in [K]}(r_k) \ge \dfrac{n/\alpha - \sigma_w^{(-1)}/K}{\sigma_v^{(-1)} \|C_{1:K}\|_2^2}.$  (10)

Eq. (10) implies that the number of sensors used for scheduling at each $t_k$ increases as the error of the minimum variance estimator or the number of measurement times $K$ decreases. More importantly, it quantifies that this number increases linearly with the system’s size for fixed error variance. This is again a fundamental limit, meaningful for large-scale systems.

IV. FUTURE WORK

We work on extending the results of this paper to largely unknown systems, under the presence of non-linear measurements. The first of these extensions allows systems whose evolution is captured by, e.g., Gaussian processes or random networks (the former example is a widely used assumption for motion models; cf. [10] and references therein). The second of these extensions allows complex measurement environments, such as camera-sensor environments, which can enable the application of our results in domains such as robotics and the automotive sector.

V. ACKNOWLEDGEMENTS

Vasileios Tzoumas thanks Nikolay A. Atanasov for bringing paper [10] to his attention.

APPENDIX I: CLOSED FORMULA FOR THE ERROR COVARIANCE OF $\hat{x}_{1:K}$

We compute the error covariance of $\hat{x}_{1:K}$: Denote as $S_{1:K}$ the block diagonal matrix with diagonal elements the sensor selection matrices $S(t_1), S(t_2), \ldots, S(t_K)$. Moreover, denote as $C$ the matrix $[C_1^\top, C_2^\top, \ldots, C_m^\top]^\top$. Finally, denote $y_{1:K} \equiv (y(t_1)^\top, y(t_2)^\top, \ldots, y(t_K)^\top)^\top$, $w_{1:K} \equiv (w(t_1)^\top, w(t_2)^\top, \ldots, w(t_K)^\top)^\top$, and $v_{1:K} \equiv (v(t_1)^\top, v(t_2)^\top, \ldots, v(t_K)^\top)^\top$, where $v(t_k) \equiv (v_1(t_k)^\top, v_2(t_k)^\top, \ldots, v_m(t_k)^\top)^\top$. Then, from (1), (2) and (3):

$y_{1:K} = O_{1:K} x_{1:K} + S_{1:K} v_{1:K},$  (11)

where $O_{1:K}$ is the $\sum_{k=1}^K r_k \times nK$ block diagonal matrix with diagonal elements the matrices $S(t_1)C, S(t_2)C, \ldots, S(t_K)C$. The estimator $\hat{x}_{1:K}$ has the error covariance $\Sigma(\hat{x}_{1:K}) = \mathbb{E}((x_{1:K} - \hat{x}_{1:K})(x_{1:K} - \hat{x}_{1:K})^\top)$ [6]:

$\Sigma(\hat{x}_{1:K}) = \mathbb{C}(x_{1:K}) - \mathbb{C}(x_{1:K}) O_{1:K}^\top \Xi\, O_{1:K} \mathbb{C}(x_{1:K}),$  (12)

where $\Xi \equiv (O_{1:K} \mathbb{C}(x_{1:K}) O_{1:K}^\top + S_{1:K} \mathbb{C}(v_{1:K}) S_{1:K}^\top)^{-1}$. We simplify (12) in the following lemma:

Lemma 1: The error covariance of $\hat{x}_{1:K}$ has the equivalent form:

$\Sigma(\hat{x}_{1:K}) = \left( \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) U^{(ki)} + \mathbb{C}(x_{1:K})^{-1} \right)^{-1},$  (13)

where $s_i(t_k)$ is a zero-one function, equal to 1 if and only if sensor $i$ is used at $t_k$, and $U^{(ki)}$ is the block diagonal matrix $C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} I^{(ki)} C_{1:K}$; $C_{1:K}$ is the block diagonal matrix whose $K$ diagonal blocks are each equal to $C$, and $I^{(ki)}$ is the block diagonal matrix with $mK$ diagonal blocks such that the $((k-1)m + i)$-th block is the $d_i \times d_i$ identity matrix $I$, and the rest of the blocks are zero.

Proof: Let $\Phi \equiv (S_{1:K} \mathbb{C}(v_{1:K}) S_{1:K}^\top)^{-1}$; then,

$\Sigma(\hat{x}_{1:K}) = (O_{1:K}^\top \Phi\, O_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1},$  (14)

where we deduce (14) from (12) using the Woodbury matrix identity (Corollary 2.8.8 in [36]). In addition, because $S_{1:K}$ is block diagonal, and $\mathbb{C}(v_{1:K})$ is block diagonal as well (per Assumption 1), $\Phi = S_{1:K} \mathbb{C}(v_{1:K})^{-1} S_{1:K}^\top$. Moreover, $O_{1:K} = S_{1:K} C_{1:K}$. Then,

$\Sigma(\hat{x}_{1:K}) = (C_{1:K}^\top D\, \mathbb{C}(v_{1:K})^{-1} D\, C_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1},$  (15)

where $D \equiv S_{1:K}^\top S_{1:K}$. Now, since $D$ and $\mathbb{C}(v_{1:K})^{-1}$ are block diagonal, $\mathbb{C}(v_{1:K})^{-1} D = D\, \mathbb{C}(v_{1:K})^{-1}$. Furthermore, the definition of $S_{1:K}$ implies $D^2 = D$. As a result,

$\Sigma(\hat{x}_{1:K}) = (C_{1:K}^\top D\, \mathbb{C}(v_{1:K})^{-1} C_{1:K} + \mathbb{C}(x_{1:K})^{-1})^{-1}.$  (16)

Moreover, we observe: $D = \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) I^{(ki)}$. In addition, for any $k \in [K]$ and $i \in [m]$: $C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} C_{1:K} = C_{1:K}^\top I^{(ki)} \mathbb{C}(v_{1:K})^{-1} I^{(ki)} C_{1:K}$, which is the reverse of the step we used before for $D$. Using the last observation in (16), the proof is complete.
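The identity (13) is easy to check numerically. The following Python sketch builds a small random instance (scalar sensors, i.e., $d_i = 1$, and a diagonal $\mathbb{C}(v_{1:K})$, consistent with Assumption 1; all values hypothetical) and verifies that (12) and (13) coincide.

```python
# Sanity check of Lemma 1: the information form (13) matches the direct
# expression (12) on a random instance with scalar sensors (d_i = 1).
import numpy as np

rng = np.random.default_rng(3)
n, m, K = 3, 4, 2
C = rng.standard_normal((m, n))                  # stacked rows C_1, ..., C_m
C1K = np.kron(np.eye(K), C)                      # block diagonal C_{1:K}
M = rng.standard_normal((n * K, n * K))
Cx = M @ M.T + n * K * np.eye(n * K)             # C(x_{1:K}), positive definite
Cv = np.diag(rng.uniform(0.5, 2.0, m * K))       # C(v_{1:K}), diagonal here

Ssets = [{0, 2}, {1}]                            # S_1, S_2 (hypothetical choice)
rows = [k * m + i for k in range(K) for i in sorted(Ssets[k])]
S1K = np.eye(m * K)[rows]                        # selection matrix S_{1:K}

O = S1K @ C1K                                    # O_{1:K} = S_{1:K} C_{1:K}
Xi = np.linalg.inv(O @ Cx @ O.T + S1K @ Cv @ S1K.T)
Sigma12 = Cx - Cx @ O.T @ Xi @ O @ Cx            # direct expression (12)

info = np.linalg.inv(Cx)                         # C(x_{1:K})^{-1}
for k in range(K):
    for i in Ssets[k]:
        Iki = np.zeros((m * K, m * K))
        Iki[k * m + i, k * m + i] = 1.0          # I^{(ki)} for d_i = 1
        info += C1K.T @ Iki @ np.linalg.inv(Cv) @ Iki @ C1K  # add U^{(ki)}
Sigma13 = np.linalg.inv(info)                    # information form (13)
assert np.allclose(Sigma12, Sigma13)
```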
APPENDIX II: PROOF OF THEOREM 1

Proof: We present an instance of Problem 1 that is equivalent to the NP-hard minimal observability problem introduced in [24], [25], which is defined as follows:

Definition (Minimal Observability Problem): Consider the linear time-invariant system:

$\dot{x}(t) = Ax(t), \quad y_i(t) = s_i e_i^\top x(t), \quad i \in [n],$  (17)

where $x(t) \in \mathbb{R}^n$, $e_i$ is the vector with the $i$-th entry equal to 1 and the rest equal to 0, and $s_i$ is either zero or one; the minimal observability problem follows: select $s_1, s_2, \ldots, s_n$ such that

$s_1 + s_2 + \ldots + s_n \le r, \quad \text{(17) is observable},$  (18)

where $r \le n$. The minimal observability problem is NP-hard when $A$ is chosen as in the proof of Theorem 1 of [24]. We denote this $A$ as $A_{\text{NP-h}}$.

Problem 1 is equivalent to the NP-hard minimal observability problem for the following instance: $K = 1$, $m = n$, $w(t) = 0$ and $v_i(t) = 0$, $C_i = e_i^\top$, $r_1 = r$, and $A = A_{\text{NP-h}}$. This observation concludes the proof.

APPENDIX III: PROOF OF THEOREM 2

We prove Theorem 2 in three steps: we first show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a non-increasing function in the choice of the sensors; we then show that $\log\det(\Sigma(\hat{x}_{1:K}))$ is a supermodular function in the choice of the sensors; finally, we prove Theorem 2 by combining the aforementioned two results with results on the maximization of submodular functions over matroid constraints [29].

Notation: We recall that any collection $(x_1, x_2, \ldots, x_k)$ is denoted as $x_{1:k}$ ($k \in \mathbb{N}$).

A. Monotonicity in Sensor Scheduling for Minimum Variance Batch State Estimation

We first provide two notations, and then the definition of non-increasing and non-decreasing set functions. Afterwards, we present the main result of this subsection.

Notation: Given $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$ and $\mathcal{A}_i, \mathcal{B}_i \in \mathcal{X}_i$, we write $\mathcal{A}_{1:K} \preceq \mathcal{B}_{1:K}$ to denote that for all $i \in [K]$, $\mathcal{A}_i \subseteq \mathcal{B}_i$ ($\mathcal{A}_i$ is a subset of $\mathcal{B}_i$). Moreover, we denote that $\mathcal{A}_i \in \mathcal{X}_i$ for all $i \in [K]$ as $\mathcal{A}_{1:K} \in \mathcal{X}_{1:K}$.

Definition 1: Consider $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$. A function $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is non-decreasing if and only if for all $\mathcal{A}, \mathcal{B} \in \mathcal{X}_{1:K}$ such that $\mathcal{A} \preceq \mathcal{B}$, $h(\mathcal{A}) \le h(\mathcal{B})$; $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is non-increasing if $-h$ is non-decreasing.

The main result of this subsection follows:

Proposition 1: For any finite $K \in \mathbb{N}$, consider $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$. The estimation error metric $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) : \mathcal{M}_{1:K} \mapsto \mathbb{R}$ is a non-increasing function in the choice of the sensors $\mathcal{S}_{1:K}$.

Proof: Consider $\mathcal{S}_{1:K} \preceq \mathcal{S}'_{1:K}$. From (13) (Lemma 1 in Appendix I) and Theorem 8.4.9 in [36], $\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K}) \preceq \Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})$, since $U^{(ki)} \succeq 0$ and $\mathbb{C}(x_{1:K}) \succ 0$ (i.e., $\mathbb{C}(x_{1:K})^{-1} \succ 0$). As a result, $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}'_{1:K})) \le \log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$, and the proof is complete.

We next show that $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ is a supermodular function with respect to the selected sensors $\mathcal{S}_{1:K}$.
B. Submodularity in Sensor Scheduling for Minimum Variance Batch State Estimation

We first provide a notation, and then the definition of submodular and supermodular set functions. Afterwards, we present the main result of this subsection.

Notation: Given $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$ and $\mathcal{A}_{1:K}, \mathcal{B}_{1:K} \in \mathcal{X}_{1:K}$, we write $\mathcal{A}_{1:K} \uplus \mathcal{B}_{1:K}$ to denote the collection whose $i$-th element, for all $i \in [K]$, is $\mathcal{A}_i \cup \mathcal{B}_i$ ($\mathcal{A}_i$ union $\mathcal{B}_i$).

Definition 2: Consider $K$ disjoint finite sets $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_K$. A function $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is submodular if and only if for all $\mathcal{A}, \mathcal{B}, \mathcal{C} \in \mathcal{X}_{1:K}$ such that $\mathcal{A} \preceq \mathcal{B}$, $h(\mathcal{A} \uplus \mathcal{C}) - h(\mathcal{A}) \ge h(\mathcal{B} \uplus \mathcal{C}) - h(\mathcal{B})$; $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is supermodular if $-h$ is submodular.

According to Definition 2, set submodularity is a diminishing returns property: a function $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ is set submodular if and only if for all $\mathcal{C} \in \mathcal{X}_{1:K}$, the function $h_{\mathcal{C}} : \mathcal{X}_{1:K} \mapsto \mathbb{R}$ defined for all $\mathcal{A} \in \mathcal{X}_{1:K}$ as $h_{\mathcal{C}}(\mathcal{A}) \equiv h(\mathcal{A} \uplus \mathcal{C}) - h(\mathcal{A})$ is non-increasing.

The main result of this subsection follows:

Proposition 2: For any finite $K \in \mathbb{N}$, consider $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$; the estimation error metric $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K})) : \mathcal{M}_{1:K} \mapsto \mathbb{R}$ is a set supermodular function in the choice of the sensors $\mathcal{S}_{1:K}$.

Proof: Denote $\sum_{k=1}^K \sum_{i=1}^m s_i(t_k) U^{(ki)}$ in (13) (Lemma 1 in Appendix I) as $U(\mathcal{S}_{1:K})$, $-\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ as $h(\mathcal{S}_{1:K})$, and $h(\mathcal{S}_{1:K} \cup \{a\}) - h(\mathcal{S}_{1:K})$ as $h_a(\mathcal{S}_{1:K})$, for any $a \in \bigcup_{k=1}^K \mathcal{M}_k$. To prove that $\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$ is supermodular, it suffices to prove that $h(\mathcal{S}_{1:K})$ is submodular. In particular, $h$ is submodular if and only if $h_a$ is non-increasing, for any $a \in \bigcup_{k=1}^K \mathcal{M}_k$. To this end, we follow the proof of Theorem 6 in [37]: first, observe that

$h_a(\mathcal{S}_{1:K}) = \log\det(U(\mathcal{S}_{1:K} \cup \{a\}) + \mathbb{C}(x_{1:K})^{-1}) - \log\det(U(\mathcal{S}_{1:K}) + \mathbb{C}(x_{1:K})^{-1})$
$\qquad\quad\;\; = \log\det(U(\mathcal{S}_{1:K}) + U(\{a\}) + \mathbb{C}(x_{1:K})^{-1}) - \log\det(U(\mathcal{S}_{1:K}) + \mathbb{C}(x_{1:K})^{-1}).$

For $\mathcal{S}_{1:K} \preceq \mathcal{S}'_{1:K}$ and $t \in [0, 1]$, define $O(t) \equiv \mathbb{C}(x_{1:K})^{-1} + U(\mathcal{S}_{1:K}) + t(U(\mathcal{S}'_{1:K}) - U(\mathcal{S}_{1:K}))$ and $g(t) \equiv \log\det(O(t) + U(\{a\})) - \log\det(O(t))$. Then, $g(0) = h_a(\mathcal{S}_{1:K})$ and $g(1) = h_a(\mathcal{S}'_{1:K})$. Moreover, since

$\dfrac{d \log\det(O(t))}{dt} = \operatorname{tr}\left(O(t)^{-1} \dfrac{dO(t)}{dt}\right)$

(eq. (43) in [38]), we have

$\dot{g}(t) = \operatorname{tr}\big( ((O(t) + U(\{a\}))^{-1} - O(t)^{-1}) L \big),$

where $L \equiv U(\mathcal{S}'_{1:K}) - U(\mathcal{S}_{1:K})$. From Proposition 8.5.5 in [36], $(O(t) + U(\{a\}))^{-1} \preceq O(t)^{-1}$ ($O(t) \succ 0$, since $\mathbb{C}(x_{1:K})^{-1} \succ 0$, $U(\mathcal{S}_{1:K}) \succeq 0$, and $U(\mathcal{S}'_{1:K}) \succeq U(\mathcal{S}_{1:K})$). Moreover, from Corollary 8.3.6 in [36], all the eigenvalues of $((O(t) + U(\{a\}))^{-1} - O(t)^{-1}) L$ are non-positive. As a result, $\dot{g}(t) \le 0$, and $h_a(\mathcal{S}'_{1:K}) = g(1) = g(0) + \int_0^1 \dot{g}(t)\, dt \le g(0) = h_a(\mathcal{S}_{1:K})$. Therefore, $h_a$ is non-increasing, and the proof is complete.

Proposition 2 implies that as we increase at each $t_k$ the number of sensors used, the marginal improvement we get in the estimation error of $x_{1:K}$ diminishes; a numerical illustration of Propositions 1 and 2 follows.
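The following Python sketch is the numerical illustration referred to above: on a small random instance it checks that adding a sensor never increases $\log\det(\Sigma(\hat{x}_{1:K}))$ (Proposition 1) and that the marginal decrease a fixed sensor brings is smaller for a larger selected set (Proposition 2). The instance and the helper `logdet_Sigma` are ours; scalar sensors ($d_i = 1$) are assumed so that each $U^{(ki)}$ is a rank-one update.

```python
# Numeric check of Propositions 1 and 2 on a random instance.
import numpy as np

rng = np.random.default_rng(4)
n, m, K = 3, 5, 2
C = rng.standard_normal((m, n))
C1K = np.kron(np.eye(K), C)                          # block diagonal C_{1:K}
M = rng.standard_normal((n * K, n * K))
Cx_inv = np.linalg.inv(M @ M.T + n * K * np.eye(n * K))   # C(x_{1:K})^{-1}
cv_inv = rng.uniform(0.5, 2.0, m * K)                # diag of C(v_{1:K})^{-1}

def logdet_Sigma(Ssets):
    """log det(Sigma(x_hat_{1:K} | S_{1:K})) via the information form (13)."""
    info = Cx_inv.copy()
    for k in range(K):
        for i in Ssets[k]:
            c = C1K[k * m + i]                       # row of C_{1:K}, d_i = 1
            info += cv_inv[k * m + i] * np.outer(c, c)    # add U^{(ki)}
    return -np.linalg.slogdet(info)[1]               # log det(Sigma)

A = [{0}, set()]                                     # a selection S_{1:K}
B = [{0, 1}, {2}]                                    # a superset of A
gain_A = logdet_Sigma(A) - logdet_Sigma([{0, 3}, set()])       # add sensor 3 at t_1
gain_B = logdet_Sigma(B) - logdet_Sigma([{0, 1, 3}, {2}])      # same addition to B
assert gain_A >= 0 and gain_B >= 0                   # monotonicity (Prop. 1)
assert gain_A >= gain_B                              # diminishing returns (Prop. 2)
```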
We are now ready for the proof of Theorem 2.

C. Proof of Theorem 2

We first provide the definition of a matroid, and then continue with the main proof:

Definition 3: Consider a finite set $\mathcal{X}$ and a collection $\mathcal{C}$ of subsets of $\mathcal{X}$. $(\mathcal{X}, \mathcal{C})$ is:
• an independence system if and only if:
 – $\emptyset \in \mathcal{C}$, where $\emptyset$ denotes the empty set;
 – for all $\mathcal{X}' \subseteq \mathcal{X}'' \subseteq \mathcal{X}$, if $\mathcal{X}'' \in \mathcal{C}$, then $\mathcal{X}' \in \mathcal{C}$;
• a matroid if and only if, in addition to the previous two properties:
 – for all $\mathcal{X}', \mathcal{X}'' \in \mathcal{C}$ where $|\mathcal{X}'| < |\mathcal{X}''|$, there exists $x \notin \mathcal{X}'$ and $x \in \mathcal{X}''$ such that $\mathcal{X}' \cup \{x\} \in \mathcal{C}$.

Proof: [of Part 1 of Theorem 2] We use the next result from the literature on the maximization of submodular functions over matroid constraints:

Lemma 2 (Ref. [29]): Consider $K$ independence systems $\{(\mathcal{X}_k, \mathcal{C}_k)\}_{k \in [K]}$, each the intersection of at most $P$ matroids,⁷ and a submodular and non-decreasing function $h : \mathcal{X}_{1:K} \mapsto \mathbb{R}$. There exists a polynomial time greedy algorithm that returns an (approximate) solution $\mathcal{S}_{1:K}$ to:

$\max_{\mathcal{S}_{1:K} \preceq \mathcal{X}_{1:K}} \; h(\mathcal{S}_{1:K}) \quad \text{subject to} \quad \mathcal{S}_k \cap \mathcal{X}_k \in \mathcal{C}_k, \; k \in [K],$  (19)

that satisfies:

$\dfrac{h(\mathcal{O}) - h(\mathcal{S}_{1:K})}{h(\mathcal{O}) - h(\emptyset)} \le \dfrac{P}{1 + P},$  (20)

where $\mathcal{O}$ is an (optimal) solution to (19).

In particular, we prove:

Lemma 3: Problem 1 is an instance of (19) with $P = 1$.

Proof: We identify the instances of $\{\mathcal{X}_k, \mathcal{C}_k\}_{k \in [K]}$ and $h$, respectively, that translate (19) to Problem 1: Given $K$ distinct copies of $[m]$, denoted as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$, first consider $\mathcal{X}_k = \mathcal{M}_k$ and $\mathcal{C}_k = \{\mathcal{S} \mid \mathcal{S} \subseteq \mathcal{M}_k, |\mathcal{S}| \le r_k\}$: $(\mathcal{X}_k, \mathcal{C}_k)$ satisfies the first two points in part 1 of Definition 3, and as a result is an independence system. Moreover, by its definition, $\mathcal{S}_k \cap \mathcal{X}_k \in \mathcal{C}_k$ if and only if $|\mathcal{S}_k| \le r_k$. Second, for all $\mathcal{S}_{1:K} \preceq \mathcal{X}_{1:K}$, consider $h(\mathcal{S}_{1:K}) = -\log\det(\Sigma(\hat{x}_{1:K} \mid \mathcal{S}_{1:K}))$. From Propositions 1 and 2, $h(\mathcal{S}_{1:K})$ is set submodular and non-decreasing.

In addition to Lemma 3, the independence system $(\mathcal{X}_k, \mathcal{C}_k)$, where $\mathcal{X}_k = \mathcal{M}_k$ and $\mathcal{C}_k = \{\mathcal{S} \mid \mathcal{S} \subseteq \mathcal{M}_k, |\mathcal{S}| \le r_k\}$, also satisfies the point in part 2 of Definition 3; thereby, it is also a matroid, and as a result $P$, as defined in Lemma 2, is equal to 1. This observation, along with Lemmas 2 and 3, completes the proof of (6), since the adaptation to Problem 1 of the greedy algorithm in [29] (Theorem 4.1) results in Algorithm 1.

Proof: [of Part 2 of Theorem 2] In Lemma 1 in Appendix I, we prove that $\Sigma(\hat{x}_{1:K})$ is the inverse of the sum of two matrices: the first matrix is block diagonal, and the second one is the inverse of the covariance of $x_{1:K}$, $\mathbb{C}(x_{1:K})$. The block diagonal matrix is computed in $O(n^{2.4}K)$ time. Moreover, by extending the result in [10] (Theorem 1), we get that $\mathbb{C}(x_{1:K})^{-1}$ is a block tri-diagonal matrix, described by the $K-1$ transition matrices $\Phi(t_{k+1}, t_k)$ [22], where $k \in [K-1]$, and $K$ identity matrices. For continuous time systems, the time complexity to compute all the block elements of $\mathbb{C}(x_{1:K})^{-1}$ is $O(n^3 K)$ [39]; for discrete time systems, it is $O(n^{2.4}K)$ [22]. This computation of $\mathbb{C}(x_{1:K})^{-1}$ is made only once. Finally, from Theorem 2 in [31], we can now compute $\det(\Sigma(\hat{x}_{1:K}))$ in $O(n^{2.4}K)$ time, since $\Sigma(\hat{x}_{1:K})^{-1}$ is block tri-diagonal. Therefore, the overall time complexity of Algorithm 1 is $O(n^3 K) + O(2n^{2.4}K \sum_{k=1}^K r_k^2) = O(n^{2.4}K \sum_{k=1}^K r_k^2)$ for $K$ large, since $\mathbb{C}(x_{1:K})^{-1}$ is computed only once, and Algorithm 1 requires at most $\sum_{k=1}^K r_k^2$ evaluations of $\Sigma(\hat{x}_{1:K})$. The proof is complete.

⁷Any independence system can be expressed as the intersection of a finite number of matroids [27].

APPENDIX IV: PROOF OF THEOREM 3

Proof: Since the arithmetic mean of a finite set of positive numbers is at least as large as their harmonic mean,

$\operatorname{tr}(\Sigma(\hat{x}_{1:K})) \ge \dfrac{(nK)^2}{\operatorname{tr}\left( \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) U^{(ki)} + \mathbb{C}(x_{1:K})^{-1} \right)}.$

In the denominator, we have for the second term: $\operatorname{tr}(\mathbb{C}(x_{1:K})^{-1}) \le nK\sigma_w^{(-1)}$; and for the first term:

$\operatorname{tr}\left( \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) U^{(ki)} \right) = \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) \operatorname{tr}(U^{(ki)}),$

where $\operatorname{tr}(U^{(ki)}) \le nK \|C_{1:K}\|_2^2 \|\mathbb{C}(v_{1:K})^{-1}\|_2$, since $\|I^{(ki)}\|_2 = 1$. Then, we get the upper bound:

$\operatorname{tr}\left( \sum_{k=1}^K \sum_{i=1}^m s_i(t_k) U^{(ki)} \right) \le nK\sigma_v^{(-1)} \|C_{1:K}\|_2^2 \sum_{k=1}^K r_k \le nK^2 \sigma_v^{(-1)} \|C_{1:K}\|_2^2 \max_{k \in [K]}(r_k).$

Overall (denoting $\max_{k \in [K]}(r_k)$ as $r_M$):

$\operatorname{tr}(\Sigma(\hat{x}_{1:K})) \ge \dfrac{(nK)^2}{nK^2\sigma_v^{(-1)} r_M \|C_{1:K}\|_2^2 + nK\sigma_w^{(-1)}} = \dfrac{n}{\sigma_v^{(-1)} r_M \|C_{1:K}\|_2^2 + \sigma_w^{(-1)}/K},$

and the proof is complete.

REFERENCES
[1] V. Kumar, D. Rus, and S. Singh, “Robot and sensor networks for first responders,” IEEE Pervasive Computing, vol. 3, no. 4, pp. 24–33, 2004.
[2] R. Nowak, U. Mitra, and R. Willett, “Estimating inhomogeneous fields using wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 22, no. 6, pp. 999–1006, 2004.
[3] M. P. Vitus, “Sensor placement for improved robotic navigation,” Robotics: Science and Systems VI, p. 217, 2011.
[4] E. Masazade, M. Fardad, and P. K. Varshney, “Sparsity-promoting extended Kalman filtering for target tracking in wireless sensor networks,” IEEE Signal Processing Letters, vol. 19, no. 12, pp. 845–848, 2012.
[5] H. Rowaihy, S. Eswaran, M. Johnson, D. Verma, A. Bar-Noy, T. Brown, and T. La Porta, “A survey of sensor selection schemes in wireless sensor networks,” in Defense and Security Symposium. International Society for Optics and Photonics, 2007, pp. 65621A–65621A.
[6] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Upper Saddle River, NJ: Prentice Hall, 2000, vol. 1.
[7] A. O. Hero III and D. Cochran, “Sensor management: Past, present, and future,” IEEE Sensors Journal, vol. 11, no. 12, pp. 3064–3075, 2011.
[8] S. Liu, M. Fardad, E. Masazade, and P. K. Varshney, “Optimal periodic sensor scheduling in networks of dynamical systems,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3055–3068, 2014.
[9] M. Kaess, A. Ranganathan, and F. Dellaert, “iSAM: Incremental smoothing and mapping,” IEEE Transactions on Robotics, vol. 24, no. 6, pp. 1365–1378, 2008.
[10] S. Anderson, T. D. Barfoot, C. H. Tong, and S. Särkkä, “Batch nonlinear continuous-time trajectory estimation as exactly sparse Gaussian process regression,” Autonomous Robots, vol. 39, no. 3, pp. 221–238, 2015.
[11] H. Strasdat, J. Montiel, and A. J. Davison, “Real-time monocular SLAM: Why filter?” in 2010 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2010, pp. 2657–2664.
[12] V. Roy, A. Simonetto, and G. Leus, “Spatio-temporal sensor management for environmental field estimation,” Signal Processing, vol. 128, pp. 369–381, 2016.
[13] M. Shamaiah, S. Banerjee, and H. Vikalo, “Greedy sensor selection: Leveraging submodularity,” in 49th IEEE Conference on Decision and Control (CDC), 2010, pp. 2572–2577.
[14] J. Liu, D. Petrovic, and F. Zhao, “Multi-step information-directed sensor querying in distributed sensor networks,” in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), vol. 5. IEEE, 2003, pp. V-145.
[15] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, “On efficient sensor scheduling for linear dynamical systems,” Automatica, vol. 48, no. 10, pp. 2482–2493, 2012.
[16] S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
[17] Y. Mo, R. Ambrosino, and B. Sinopoli, “Sensor selection strategies for state estimation in energy constrained wireless sensor networks,” Automatica, vol. 47, no. 7, pp. 1330–1338, 2011.
[18] H. Zhang, R. Ayoub, and S. Sundaram, “Sensor selection for optimal filtering of linear dynamical systems: Complexity and approximation,” in IEEE Conference on Decision and Control (CDC), 2015.
[19] S. T. Jawaid and S. L. Smith, “Submodularity and greedy algorithms in sensor scheduling for linear dynamical systems,” Automatica, vol. 61, pp. 282–288, 2015.
[20] M. F. Huber, A. Kuwertz, F. Sawo, and U. D. Hanebeck, “Distributed greedy sensor scheduling for model-based reconstruction of space-time continuous physical phenomena,” in 12th International Conference on Information Fusion (FUSION ’09). IEEE, 2009, pp. 102–109.
[21] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies,” The Journal of Machine Learning Research, vol. 9, pp. 235–284, 2008.
[22] C.-T. Chen, Linear System Theory and Design, 3rd ed. New York, NY, USA: Oxford University Press, 1998.
[23] S. Venkatesh, The Theory of Probability: Explorations and Applications. Cambridge University Press, 2012.
[24] A. Olshevsky, “Minimal controllability problems,” IEEE Transactions on Control of Network Systems, vol. 1, no. 3, pp. 249–258, 2014.
[25] S. Pequito, S. Kar, and A. Aguiar, “A framework for structural input/output and control configuration selection in large-scale systems,” IEEE Transactions on Automatic Control, vol. PP, no. 99, pp. 1–1, 2015.
[26] J. Vondrák, “Submodularity and curvature: the optimal algorithm,” RIMS Kokyuroku Bessatsu B, vol. 23, pp. 253–266, 2010.
[27] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. New York, NY, USA: Wiley-Interscience, 1988.
[28] U. Feige, “A threshold of ln n for approximating set cover,” Journal of the ACM, vol. 45, no. 4, pp. 634–652, Jul. 1998.
[29] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, An Analysis of Approximations for Maximizing Submodular Set Functions – II. Springer, 1978.
[30] M. Minoux, “Accelerated greedy algorithms for maximizing submodular set functions,” in Optimization Techniques. Springer, 1978, pp. 234–243.
[31] L. G. Molinari, “Determinants of block tridiagonal matrices,” Linear Algebra and its Applications, vol. 429, no. 8, pp. 2221–2226, 2008.
[32] A. Nemirovski, “Interior point polynomial time methods in convex programming.”
[33] J. E. Weimer, B. Sinopoli, and B. H. Krogh, “A relaxation approach to dynamic sensor selection in large-scale wireless networks,” in 28th International Conference on Distributed Computing Systems Workshops (ICDCS), 2008, pp. 501–506.
[34] V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray, “On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage,” Automatica, vol. 42, no. 2, pp. 251–260, 2006.
[35] A. S. Leong, S. Dey, and J. S. Evans, “Asymptotics and power allocation for state estimation over fading channels,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 1, pp. 611–633, January 2011.
[36] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2009.
[37] T. H. Summers, F. L. Cortesi, and J. Lygeros, “On submodularity and controllability in complex dynamical networks,” IEEE Transactions on Control of Network Systems, 2015, in press.
[38] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2012.
[39] C. Moler and C. Van Loan, “Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later,” SIAM Review, vol. 45, no. 1, pp. 3–49, 2003.