Joint Estimation and Localization in Sensor Networks

Nikolay Atanasov, Roberto Tron, Victor M. Preciado, and George J. Pappas

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This work was supported by ONR-HUNT grant N00014-08-1-0696 and by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA. N. Atanasov, V. Preciado, and G. Pappas are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, {atanasov, preciado, pappasg}@seas.upenn.edu. R. Tron is with the Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, tron@seas.upenn.edu.

Abstract—This paper addresses the problem of collaborative tracking of dynamic targets in wireless sensor networks. A novel distributed linear estimator, which is a version of a distributed Kalman filter, is derived. We prove that the filter is mean-square consistent in the case of static target estimation. When large sensor networks are deployed, it is common that the sensors do not have good knowledge of their locations, which affects the target estimation procedure. Unlike most existing approaches for target tracking, we investigate the performance of our filter when the sensor poses need to be estimated by an auxiliary localization procedure. The sensors are localized via a distributed Jacobi algorithm from noisy relative measurements. We prove strong convergence guarantees for the localization method and, in turn, for the joint localization and target estimation approach. The performance of our algorithms is demonstrated in simulation on environmental monitoring and target tracking tasks.

I. INTRODUCTION

A central problem in networked sensing systems is the estimation and tracking of the states of dynamic phenomena of interest that evolve in the sensing field. Potential applications include environmental monitoring [1], [2], surveillance and reconnaissance [3], [4], and social networks [5]. In most situations, individual sensors receive partially informative measurements which are insufficient to estimate the target state in isolation. The sensors need to engage in information exchange with one another and solve a distributed estimation problem. To complicate matters, it is often the case that the sensors need to know their own locations with respect to a common reference frame in order to utilize the target measurements meaningfully. Hence, in general, the sensors face a joint localization and estimation problem. Virtually all existing work in distributed target estimation assumes implicitly that the localization problem is solved, while the literature on localization does not consider the effect of the residual errors on a common estimation task. The goal of this paper is to show that the two problems can be solved jointly and that, with simple measurement models, the resulting estimates have strong convergence guarantees.

Assumptions and contributions. We assume that the sensors obtain linear Gaussian measurements of the target state and repeated sequential measurements of their relative positions along the edges of a graph. Our contributions are as follows:

• We derive a distributed linear estimator for tracking dynamic targets. We prove that the filter is mean-square consistent in the case of a static target.
• We provide a distributed algorithm for sensor localization from sequential relative measurements and prove its mean-square and strong consistency.
• We prove mean-square consistency of the joint localization and target estimation procedure.

Related work. Our target estimation algorithm was inspired by Rahnama Rad and Tahbaz-Salehi [6], who propose an algorithm for distributed static parameter estimation using nonlinear sensing models. We specialize their model to heterogeneous sensors with linear Gaussian observations, show stronger convergence results (mean-square consistency instead of weak consistency), and then generalize the solution to dynamic targets. Our filter is similar to the Kalman-Consensus filter [7], [8] and the filter proposed by Khan et al. [9], [10]. Khan et al. [9] show that a dynamic target can be tracked with bounded error if the norm of the target system matrix is less than the network tracking capacity. Shahrampour et al. [10] quantify the estimation performance using a global loss function and show that the asymptotic estimation error depends on its decomposition. Kar et al. [11] study distributed static parameter estimation with nonlinear observation models and noisy inter-sensor communication. Related work also includes [12], which combines the Jacobi over-relaxation method with dynamic consensus to compute distributed weighted least squares.

Our localization algorithm follows the lines of the Jacobi algorithm, first proposed for localization in sensor networks by Barooah and Hespanha [13], [14]. In contrast with their approach, we consider repeated relative measurements and show strong convergence guarantees for the resulting sequential localization algorithm. Other work in sensor network localization considers nonlinear and less informative measurement models than those used in this paper. For instance, [15], [16], [17], [18] address the problem of localization using range-only measurements, which is challenging because a graph with specified edge lengths can have several embeddings in the plane. Khan et al. [19] introduce a distributed localization (DILOC) algorithm, which uses the barycentric coordinates of a node with respect to its neighbors, and show its convergence via a Markov chain. Diao et al. [20] relax the assumption of DILOC that all nodes must be inside the convex hull of the anchors. Localization has also been considered in the context of camera networks [21].

Paper organization. The joint localization and estimation problem is formulated precisely in Sec. II. The distributed linear estimator for target tracking is derived in Sec. III assuming known sensor locations. A distributed Jacobi algorithm is introduced in Sec. IV to localize the sensors using relative measurements when the true locations are unknown. Mean-square and strong consistency are proven. In Sec. V, we show that the error of the target estimator, when combined with the localization procedure, remains arbitrarily small. All proofs are provided in the Appendix.

II. PROBLEM FORMULATION

Consider a static sensor network composed of $n$ sensors with configurations $\{x_1, \ldots, x_n\} \subset \mathcal{X} \cong \mathbb{R}^d$. The configuration of a sensor may include its position, orientation, and other operational parameters, but we will refer to it, informally, as the sensor's location. The communication network interconnecting the sensors is represented by an undirected graph $G = (V, E)$ with vertices $V := \{1, \ldots, n\}$ corresponding to the sensors and $|E| = m$ edges.
An edge $(j, i) \in E$ from sensor $j$ to sensor $i$ exists if they can communicate. The set of nodes (neighbors) connected to sensor $i$ is denoted by $N_i$. The task of the sensors is to estimate and track the state $y(t) \in \mathcal{Y} \cong \mathbb{R}^{d_y}$ of a phenomenon of interest (target), where $\mathcal{Y}$ is a convex set. The target evolves according to the following target motion model:

$$y(t+1) = F y(t) + \eta(t), \qquad \eta(t) \sim \mathcal{N}(0, W), \tag{1}$$

where $\eta(t)$ is the process noise, whose values at any pair of times are independent. Sensor $i$, depending on its location $x_i$, can obtain a measurement $z_i(t)$ of the target state $y(t)$ at time $t$ according to the following sensor observation model:

$$z_i(t) = H_i(x_i)\, y(t) + v_i(t, x_i), \qquad v_i(t, x_i) \sim \mathcal{N}(0, V_i(x_i)), \tag{2}$$

where $v_i(t, x_i)$ is a sensor-state-dependent measurement noise specific to sensor $i$, which is independent at any pair of times and across different sensors. The measurement noise is independent of the target noise $\eta(t)$ as well. The signals $z_i(t)$ observed by a single sensor, although potentially informative, do not reveal the target state completely, i.e., each sensor faces a local identification problem. We assume, however, that the target is observable if one has access to the signals received by all sensors.

The sensors need to know their locations in order to use the signals $z_i(t)$ to estimate the target's state. However, when large sensor networks are deployed, it is common that the sensors do not have good knowledge of their positions but instead only a rough estimate (prior). We suppose that each sensor has access to noisy relative measurements of the positions of its neighbors¹, which can be used to localize the sensors. In particular, at time $t$ sensor $i$ receives the following noisy relative configuration measurement from its neighbor $j$:

$$s_{ij}(t) = x_j - x_i + \epsilon_{ij}(t), \qquad \epsilon_{ij}(t) \sim \mathcal{N}(0, E_{ij}), \tag{3}$$

where $\epsilon_{ij}(t)$ is a measurement noise, which is independent at any pair of times and across sensor pairs. The relative measurement noises are independent of the target measurement and motion noises too. Since there is a translation ambiguity in the measurements (3), we assume that all sensors agree to localize themselves in the reference frame of sensor 1. The location estimates can then be used in place of the unknown sensor positions during the target estimation procedure. The joint localization and estimation problem is summarized below.

¹The graphs describing the communication and the relative measurement topologies might be different in practice. However, we assume that they are the same in order to simplify the presentation.

Problem (Joint Estimation and Localization). The task of each sensor $i$ is to construct estimators $\hat{x}_i(t)$ and $\hat{y}_i(t)$ of its own location $x_i$ and of the target state $y$ in a distributed manner, i.e., using information only from its neighbors and the measurements $\{s_{ij}(t) \mid j \in N_i\}$ and $\{z_i(t)\}$.

To illustrate the results, we use two scenarios which fit our models throughout the paper. The first is an environmental monitoring problem in which a sensor network of remote methane leak detectors (RMLD), based on tunable diode laser absorption spectroscopy, is deployed to estimate the methane concentration in a landfill. The methane field is assumed static (i.e., $F = I_{d_y}$, $W = 0$) and can be modeled by discretizing the environment into cells and representing the gas concentration with a discrete Gaussian random field $y \in \mathbb{R}^{d_y}$ (see Fig. 5). It was verified experimentally in [22] that the RMLD sensors fit the linear model in (2).
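Before turning to the second scenario, the following minimal Python sketch makes the generic models (1)–(3) concrete by simulating one step of the target motion, one target observation per sensor, and one relative measurement along an edge. All dimensions, matrices, and noise levels are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): a 2-D static target and 3 sensors.
dy, n = 2, 3
F, W = np.eye(dy), np.zeros((dy, dy))   # static target: F = I, W = 0, as in the methane scenario
y = np.array([1.0, -2.0])               # true target state

# Observation model (2): z_i = H_i(x_i) y + v_i, with v_i ~ N(0, V_i(x_i)).
H = [rng.standard_normal((1, dy)) for _ in range(n)]   # each sensor sees a 1-D projection of y
V = [0.5 * np.eye(1) for _ in range(n)]

def target_step(y):
    """Target motion model (1)."""
    return F @ y + rng.multivariate_normal(np.zeros(dy), W)

def observe(i, y):
    """Sensor observation model (2) for sensor i."""
    return H[i] @ y + rng.multivariate_normal(np.zeros(1), V[i])

def relative_measurement(x_i, x_j, E_ij):
    """Relative configuration measurement (3) received by sensor i from neighbor j."""
    return x_j - x_i + rng.multivariate_normal(np.zeros(x_i.size), E_ij)

z1 = observe(0, target_step(y))                                       # one target measurement
s12 = relative_measurement(np.zeros(2), np.ones(2), 0.1 * np.eye(2))  # one relative measurement
```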
Second, we consider tracking a swarm of mobile vehicles via a sensor network using range and bearing measurements (see Fig. 1). The position $(y_j^1, y_j^2) \in \mathbb{R}^2$ and velocity $(\dot{y}_j^1, \dot{y}_j^2) \in \mathbb{R}^2$ of the $j$-th target have discretized double-integrator dynamics driven by Gaussian noise:

$$y_j(t+1) = \begin{bmatrix} I_2 & \tau I_2 \\ 0 & I_2 \end{bmatrix} y_j(t) + \eta_j(t), \qquad W := q \begin{bmatrix} \frac{\tau^3}{3} I_2 & \frac{\tau^2}{2} I_2 \\ \frac{\tau^2}{2} I_2 & \tau I_2 \end{bmatrix},$$

where $y_j = [y_j^1, y_j^2, \dot{y}_j^1, \dot{y}_j^2]^T$ is the $j$-th target state, $\tau$ is the sampling period in seconds, and $q$ is a diffusion strength measured in $(\mathrm{m/sec^2})^2\,\mathrm{Hz}^{-1}$. Each sensor in the network takes noisy range and bearing measurements of the target's position:

$$z_{ij}(t) = \begin{bmatrix} \sqrt{(y_j^1 - x_i^1)^2 + (y_j^2 - x_i^2)^2} \\ \arctan\big((y_j^2 - x_i^2)/(y_j^1 - x_i^1)\big) \end{bmatrix} + v(t, x_i, y_j), \tag{4}$$

where $x_i := (x_i^1, x_i^2) \in \mathbb{R}^2$ is the sensor's location and the noise $v$ grows linearly with the distance between the sensor and the target. The observation model is nonlinear in this case, so we resort to linearization in order to apply our framework.

III. DISTRIBUTED TARGET ESTIMATION

We begin with the task of estimating and tracking the dynamic state of a target via the sensor network. For now, we assume that the sensors know their positions and concentrate on the estimation task. We specialize the general parameter estimation scheme of Rahnama Rad and Tahbaz-Salehi [6] to linear Gaussian observation models such as (2). We show that the resulting distributed linear filter is mean-square consistent² when the target is stationary. This result is stronger than the weak consistency² shown in the general non-Gaussian case in [6, Thm. 1].

²A distributed estimator of a parameter $y$ is weakly consistent if all estimates $\hat{y}_i(t)$ converge in probability to $y$, i.e., $\lim_{t\to\infty} \mathbb{P}(\|\hat{y}_i(t) - y\| \geq \epsilon) = 0$ for any $\epsilon > 0$ and all $i$. It is mean-square consistent if all estimates converge in $L^2$ to $y$, i.e., $\lim_{t\to\infty} \mathbb{E}[\|\hat{y}_i(t) - y\|^2] = 0$ for all $i$.

[Fig. 1: A realization of the target tracking scenario in which a sensor network with 40 nodes tracks 10 mobile targets via range and bearing measurements.]

[Fig. 2: Initial and final (after 20 steps) node locations estimated by the distributed localization algorithm on a randomly generated graph with 300 nodes and 1288 edges.]

Suppose for now that the target is stationary, i.e., $y := y(0) = y(1) = \cdots$. To introduce the estimation scheme from [6], suppose also that, instead of the linear Gaussian measurements in (2), the sensor measurements $z_i(t)$ are drawn from a general distribution with conditional probability density function (pdf) $l_i(\cdot \mid y)$. As before, the signals observed by sensor $i$ are i.i.d. over time and independent from the observations of all other sensors. In order to aggregate the information provided to it over time, either through observations or communication with neighbors, each sensor $i$ holds and updates a pdf $p_{i,t}$ over the target state space $\mathcal{Y}$. Consider the following distributed estimation algorithm:

$$p_{i,t+1}(y) = \xi_{i,t}\, l_i(z_i(t+1) \mid y) \prod_{j \in N_i \cup \{i\}} p_{j,t}(y)^{\kappa_{ij}}, \qquad \hat{y}_i(t) \in \arg\max_{y \in \mathcal{Y}} p_{i,t}(y), \tag{5}$$

where $\xi_{i,t}$ is a normalization constant ensuring that $p_{i,t+1}$ is a proper pdf, and $\kappa_{ij} > 0$ are weights such that $\sum_{j \in N_i \cup \{i\}} \kappa_{ij} = 1$. The update is the same as the standard Bayes rule, with the exception that sensor $i$ does not just use its own prior but a geometric average of its neighbors' priors. Given a connected graph, the authors of [6] show that (5) is weakly consistent under broad assumptions on the observation models $l_i$.
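To make the update (5) concrete, here is a minimal sketch that applies one round of it at a single sensor on a discretized one-dimensional state space. The grid, the Gaussian priors, and the Gaussian likelihood are illustrative assumptions chosen only so the example is self-contained.

```python
import numpy as np

Y = np.linspace(-5.0, 5.0, 201)          # discretized state space (an assumption)
dY = Y[1] - Y[0]

def update(priors, kappa, z, likelihood):
    """One application of (5): weighted geometric mean of the pdfs of j in N_i ∪ {i}
    (weights kappa sum to one), multiplied by the local likelihood l_i(z | y), then normalized."""
    log_p = sum(k * np.log(p + 1e-300) for k, p in zip(kappa, priors))
    p = np.exp(log_p - log_p.max()) * likelihood(z, Y)
    p /= p.sum() * dY                     # normalization constant xi_{i,t}
    y_hat = Y[np.argmax(p)]               # MAP estimate in (5)
    return p, y_hat

gauss = lambda m, s: np.exp(-0.5 * ((Y - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
likelihood = lambda z, Y: np.exp(-0.5 * ((z - Y) / 0.5) ** 2)   # l_i(z | y), up to a constant

p_new, y_hat = update([gauss(0.0, 2.0), gauss(1.0, 1.0)], [0.5, 0.5], 0.8, likelihood)
```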
Next, we specialize the estimator in (5) to the linear Gaussian measurement model in (2). Let $\mathcal{G}(\omega, \Omega)$ denote a Gaussian distribution (in information space) with mean $\Omega^{-1}\omega$ and covariance matrix $\Omega^{-1}$. The quantities $\omega$ and $\Omega$ are conventionally called the information vector and information matrix, respectively. Suppose that the pdfs $p_{i,t}$ of all sensors $i \in V$ at time $t$ are those of Gaussian distributions $\mathcal{G}(\omega_{i,t}, \Omega_{i,t})$. We claim that the posteriors resulting from applying the update in (5) remain Gaussian.

Lemma 1 ([23, Thm. 2]). Let $Y_i \sim \mathcal{G}(\omega_i, \Omega_i)$ for $i = 1, \ldots, n$ be a collection of random Gaussian vectors with associated weights $\kappa_i$. The weighted geometric mean, $\prod_{i=1}^n p_i^{\kappa_i}$, of their pdfs $p_i$ is proportional to the pdf of a random vector with distribution $\mathcal{G}\big(\sum_{i=1}^n \kappa_i \omega_i, \sum_{i=1}^n \kappa_i \Omega_i\big)$.

Lemma 2 ([23, Thm. 2]). Let $Y \sim \mathcal{G}(\omega, \Omega)$ and $V \sim \mathcal{G}(0, V^{-1})$ be random vectors. Consider the linear transformation $Z = HY + V$. The conditional distribution of $Y \mid Z = z$ is proportional to $\mathcal{G}(\omega + H^T V^{-1} z, \Omega + H^T V^{-1} H)$.

Lemma 1 says that if the sensor priors are Gaussian $\mathcal{G}(\omega_{i,t}, \Omega_{i,t})$, then after applying the geometric averaging in (5) the resulting distribution will still be Gaussian, and its information vector and information matrix will be weighted averages of the prior ones. Lemma 2 says that after applying Bayes rule the distribution will remain Gaussian. Combining the two allows us to derive the following linear Gaussian version of the estimator in (5):

$$\omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \omega_{j,t} + H_i^T V_i^{-1} z_i(t), \qquad \Omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \Omega_{j,t} + H_i^T V_i^{-1} H_i, \tag{6}$$

where $H_i := H_i(x_i)$ and $V_i := V_i(x_i)$. The estimate of sensor $i$ at time $t$ of the true target state $y$ is:

$$\hat{y}_i(t) := \Omega_{i,t}^{-1} \omega_{i,t}. \tag{7}$$

In this linear Gaussian case, we prove a strong result about the quality of the estimates.

Theorem 1. Suppose that the communication graph $G$ is connected and the matrix $[H_1^T \ \cdots \ H_n^T]^T$ has rank $d_y$. Then, the estimates (7) of all sensors converge in mean square to $y$, i.e., $\lim_{t\to\infty} \mathbb{E}\big[\|\hat{y}_i(t) - y\|_2^2\big] = 0$ for all $i$.

The estimation procedure in (6), (7) can be extended to track a dynamic target as in (1) by adding a local prediction step, the same as that of the Kalman filter, at each sensor. The distributed linear filter is summarized in Alg. 1, and Thm. 1 guarantees its mean-square consistency for stationary targets. Its performance on dynamic targets was studied in the target tracking scenario introduced in Sec. II, and the results are presented in Fig. 1 and Fig. 3.

[Fig. 3: Root mean squared error (RMSE) of the estimated target position and velocity obtained from averaging 50 simulated runs of the distributed linear estimator in the target tracking scenario (Fig. 1). The error increases because, as targets move away from the sensor network, the covariance of the measurement noise grows linearly with distance. The errors of node 1 (blue) are lower because it was always placed at the origin and thus close to the starting target positions.]

[Fig. 4: Root mean squared error (RMSE) of the location estimates obtained from averaging 50 simulated runs of the distributed localization algorithm with randomly generated graphs with 300 nodes (e.g., Fig. 2).]

Algorithm 1 Distributed Linear Estimator
Input: Prior $(\omega_{i,t}, \Omega_{i,t})$, messages $(\omega_{j,t}, \Omega_{j,t})$ for all $j \in N_i$, and measurement $z_i(t)$
Output: $(\omega_{i,t+1}, \Omega_{i,t+1})$
Update Step:
    $\omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \omega_{j,t} + H_i^T V_i^{-1} z_i(t)$
    $\Omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \Omega_{j,t} + H_i^T V_i^{-1} H_i$
    $\hat{y}_i(t+1) = \Omega_{i,t+1}^{-1} \omega_{i,t+1}$
Prediction Step:
    $\Omega_{i,t+1} \leftarrow (F \Omega_{i,t+1}^{-1} F^T + W)^{-1}$
    $\omega_{i,t+1} \leftarrow \Omega_{i,t+1} F \hat{y}_i(t+1)$
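The following NumPy sketch implements the update and prediction steps of Alg. 1 for a single sensor. The message-passing machinery is left abstract: `neighbor_msgs` is assumed to hold the pairs $(\omega_{j,t}, \Omega_{j,t})$ received from $j \in N_i \cup \{i\}$, and `kappa` the corresponding weights summing to one.

```python
import numpy as np

def update_step(neighbor_msgs, kappa, z_i, H_i, V_i):
    """Update step of Alg. 1: geometric averaging in information space plus a local
    Bayes update; returns the new information pair and the estimate (7)."""
    V_inv = np.linalg.inv(V_i)
    omega = sum(kappa[j] * w for j, (w, _) in neighbor_msgs.items()) + H_i.T @ V_inv @ z_i
    Omega = sum(kappa[j] * O for j, (_, O) in neighbor_msgs.items()) + H_i.T @ V_inv @ H_i
    y_hat = np.linalg.solve(Omega, omega)        # estimate (7)
    return omega, Omega, y_hat

def prediction_step(Omega, y_hat, F, W):
    """Prediction step of Alg. 1: a local Kalman-filter time update in information form."""
    Omega_pred = np.linalg.inv(F @ np.linalg.inv(Omega) @ F.T + W)
    omega_pred = Omega_pred @ F @ y_hat
    return omega_pred, Omega_pred
```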
IV. LOCALIZATION FROM RELATIVE MEASUREMENTS

Target tracking via the distributed estimator in Alg. 1 requires knowledge of the true sensor locations. As mentioned earlier, this is typically not the case, especially when large sensor networks are deployed. This section describes a method for localization from the relative measurements (3), whose strong convergence guarantees can be used to analyze the convergence of a joint localization and estimation procedure.

The relative measurements received by all sensors at time $t$ can be written in matrix form as follows:

$$s(t) = (B \otimes I_d)^T x + \epsilon(t),$$

where $B \in \mathbb{R}^{n \times m}$ is the incidence matrix of the communication graph $G$. All sensors agree to localize relative to node 1 and know that $x_1 = 0$. Let $\tilde{B} \in \mathbb{R}^{(n-1) \times m}$ be the incidence matrix with the row corresponding to sensor 1 removed. Further, define $E := \mathbb{E}[\epsilon(t)\epsilon(t)^T] = \mathrm{diag}(E_1, \ldots, E_m)$, where $\{E_k\}$ is an enumeration of the noise covariances associated with the edges of $G$. Given $t$ rounds of measurements, the least squares estimate of $x$ leads to the classical Best Linear Unbiased Estimator (BLUE), given by:

$$\hat{x}(t) := \Big( (\tilde{B} \otimes I_d) E^{-1} (\tilde{B} \otimes I_d)^T \Big)^{-1} (\tilde{B} \otimes I_d) E^{-1}\, \frac{1}{t} \sum_{\tau=0}^{t-1} s(\tau), \tag{8}$$

where the inverse of $(\tilde{B} \otimes I_d) E^{-1} (\tilde{B} \otimes I_d)^T$ exists as long as the graph $G$ is connected [13]. Among all linear estimators of $x$, BLUE has the smallest estimation error variance [24].

The computation in (8) can be distributed via a Jacobi algorithm for solving a linear system as follows. Each sensor maintains an estimate $\hat{x}_i(t)$ of its own state at time $t$ and a history of the averaged measurements, $\sigma_i(t) := \frac{1}{t+1}\sum_{\tau=0}^{t} \sum_{j \in N_i} E_{ij}^{-1} s_{ij}(\tau)$, received up to time $t$. Given prior estimates $(\hat{x}_i(t), \sigma_i(t))$, the update of the distributed Jacobi algorithm at sensor $i$ is:

$$\hat{x}_i(t+1) = \Big(\sum_{j \in N_i} E_{ij}^{-1}\Big)^{-1} \Big(\sum_{j \in N_i} E_{ij}^{-1} \hat{x}_j(t) - \sigma_i(t)\Big), \qquad \sigma_i(t+1) = \frac{1}{t+1}\Big(t\, \sigma_i(t) + \sum_{j \in N_i} E_{ij}^{-1} s_{ij}(t)\Big). \tag{9}$$

Barooah and Hespanha [13], [14] show that, with a single round of relative measurements, the Jacobi algorithm provides an unbiased estimate of $x$. Here, we incorporate repeated sequential measurements and prove a much stronger performance guarantee; a minimal sketch of the update (9) closes this section.

Theorem 2. Suppose that the communication graph $G$ is connected. Then, the estimates $\hat{x}_i(t)$ of the sensor configurations in (9) are mean-square and strongly consistent estimators of the true sensor states, i.e.:

$$\lim_{t\to\infty} \mathbb{E}\big[\|\hat{x}_i(t) - x_i\|_2^2\big] = 0, \qquad \mathbb{P}\Big(\lim_{t\to\infty} \|\hat{x}_i(t) - x_i\|_2 = 0\Big) = 1, \quad \forall i.$$

The performance of our distributed localization algorithm was analyzed on randomly generated graphs with 300 nodes. The location priors were chosen from a normal distribution with a standard deviation of 5 meters around the true node positions. An instance of the localization task is illustrated in Fig. 2, while the estimation error is shown in Fig. 4.
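Below is a minimal sketch of the distributed Jacobi update (9) at sensor $i$. The containers (`x_hat`, `Einv`, `s_new`) are illustrative assumptions about how a node might store its neighbors' estimates, the edge noise covariances, and the fresh measurements.

```python
import numpy as np

def jacobi_step(i, t, x_hat, sigma_i, neighbors, Einv, s_new):
    """One round of (9) at sensor i: a block-Jacobi sweep for the BLUE normal equations,
    plus the running average of the matrix-weighted relative measurements."""
    D_i = sum(Einv[i, j] for j in neighbors)                      # matrix-weighted degree block
    rhs = sum(Einv[i, j] @ x_hat[j] for j in neighbors) - sigma_i
    x_next = np.linalg.solve(D_i, rhs)                            # new location estimate
    sigma_next = (t * sigma_i + sum(Einv[i, j] @ s_new[j] for j in neighbors)) / (t + 1)
    return x_next, sigma_next
```

A node only needs its neighbors' current estimates and its own averaged measurements, which is what makes the centralized BLUE computation (8) distributable.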
V. JOINT LOCALIZATION AND ESTIMATION

Having derived separate estimators for the sensor locations and the target state, we are ready to return to the original problem of joint localization and estimation. At time $t$, the location estimates $\{\hat{x}_i(t)\}$ in (9) can be used in the target estimator (6), (7) instead of the true sensor positions. It is important to analyze the evolution of the coupled estimation procedure because it is not clear that the convergence result in Thm. 1 will continue to hold. Define the sensor information matrix

$$M_i(x) := H_i(x)^T V_i(x)^{-1} H_i(x).$$

In analogy with the centralized Kalman filter, the sensor information matrix captures the amount of information added to the inverse of the covariance matrix during an update step of the Riccati map. From this point of view, it is natural to describe sensor properties in terms of the sensor information matrix. A regularity assumption, which stipulates that nearby sensing locations provide similar information gain, is necessary.

[Fig. 5: Methane emission monitoring via a sensor network. The true (unknown) sensor locations (red dots), the sensing range (red circle), and a typical realization of the methane field are shown on the left. The root mean squared error (RMSE) of the location estimates and of the field estimates obtained from averaging 50 simulated runs of the joint localization and estimation algorithm with continuous sensor observation models are shown in the two middle plots. In an additional experiment, the sensors were placed on the boundaries of the cells of the discretized field. As the observation model for each sensor was defined in terms of the proximal environment cells, this made the model discontinuous. The rightmost plot illustrates that the field estimation error does not vanish when discontinuities are present.]

Assumption (Observation Model Continuity). The sensor information matrices $M_i(x)$ are bounded³ continuous functions of $x$ for all $i$.

³There exists a constant $q$ such that $\|M_i(x)\| \leq q < \infty$ for all $i$ and $x$.

The following theorem ensures that the target state estimator retains its convergence properties when used jointly with the distributed localization procedure.

Theorem 3. Let $\{\hat{x}_i(t)\}$ be strongly consistent estimators of the sensor configurations, i.e., $\hat{x}_i(t) \xrightarrow{a.s.} x_i$ for all $i$. Suppose that the communication graph $G$ is connected and the matrix $[H_1(x_1)^T \ \cdots \ H_n(x_n)^T]^T$ has rank $d_y$. Let $\delta > 0$ be arbitrary. If each sensor $i$ updates its target estimate $(\omega_{i,t}, \Omega_{i,t})$ as follows:

$$\omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \omega_{j,t} + \hat{H}_{i,t}^T \hat{V}_{i,t}^{-1} z_i(t), \qquad \Omega_{i,t+1} = \sum_{j \in N_i \cup \{i\}} \kappa_{ij}\, \Omega_{j,t} + \hat{H}_{i,t}^T \hat{V}_{i,t}^{-1} \hat{H}_{i,t},$$
$$\hat{y}_i(t+1) = \big(\Omega_{i,t+1} + (t+1)\,\delta I_{d_y}\big)^{-1} \omega_{i,t+1}, \tag{10}$$

where $\hat{H}_{i,t} := H_i(\hat{x}_i(t))$ and $\hat{V}_{i,t} := V_i(\hat{x}_i(t))$, then the asymptotic mean-square error of the target estimates is $O(\delta^2)$:

$$\lim_{t\to\infty} \mathbb{E}\big[\|\hat{y}_i(t) - y\|_2^2\big] = \delta^2 y^T \Big(\sum_{j=1}^n \pi_j M_j(x_j) + \delta I\Big)^{-2} y,$$

for all $i$, where $y$ is the true target state and $x_j$ is the true position of sensor $j$.

The combined procedure specified by (9) and (10) provides a mean-square consistent way to estimate the sensor locations and the target state jointly. The performance of the joint algorithm was evaluated on the methane concentration estimation problem, and the results are summarized in Fig. 5.
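A sketch of the regularized joint update (10) is shown below; it differs from the update step of Alg. 1 only in evaluating the observation model at the location estimate $\hat{x}_i(t)$ and in the $(t+1)\delta I$ regularization of the inverted matrix. `H_fun` and `V_fun` stand for the maps $x \mapsto H_i(x)$ and $x \mapsto V_i(x)$, whose concrete forms are application-specific assumptions.

```python
import numpy as np

def joint_update(neighbor_msgs, kappa, z_i, x_hat_i, t, delta, H_fun, V_fun):
    """Target update (10) with the location estimate x_hat_i in place of the true x_i.
    delta > 0 keeps the inverted matrix well conditioned at the price of an O(delta^2)
    asymptotic mean-square error (Thm. 3)."""
    H_i, V_i = H_fun(x_hat_i), V_fun(x_hat_i)
    V_inv = np.linalg.inv(V_i)
    omega = sum(kappa[j] * w for j, (w, _) in neighbor_msgs.items()) + H_i.T @ V_inv @ z_i
    Omega = sum(kappa[j] * O for j, (_, O) in neighbor_msgs.items()) + H_i.T @ V_inv @ H_i
    d = Omega.shape[0]
    y_hat = np.linalg.solve(Omega + (t + 1) * delta * np.eye(d), omega)
    return omega, Omega, y_hat
```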
VI. CONCLUSION

This paper studied the problem of joint target tracking and node localization in sensor networks. A distributed linear estimator for tracking dynamic targets was derived. It was proven that the filter is mean-square consistent when estimating static states. Next, a distributed Jacobi algorithm was proposed for localization, and its mean-square and almost sure consistency were shown. Finally, the combined localization and target estimation procedure was shown to have arbitrarily small asymptotic estimation error. Future work will focus on strengthening the result in Thm. 3 to mean-square consistency and relaxing the assumption of a strongly consistent localization procedure. Studying the relationship between our distributed linear estimator, the Kalman-Consensus filter [8], and the filter proposed by Khan et al. [9] is of interest as well.

APPENDIX A: PROOF OF THEOREM 1

Define the following:

$$\omega_t := \begin{bmatrix} \omega_{1t}^T & \cdots & \omega_{nt}^T \end{bmatrix}^T, \quad \Omega_t := \begin{bmatrix} \Omega_{1t}^T & \cdots & \Omega_{nt}^T \end{bmatrix}^T, \quad M_i := H_i(x_i)^T V_i^{-1}(x_i) H_i(x_i),$$
$$M := \begin{bmatrix} M_1^T & \cdots & M_n^T \end{bmatrix}^T, \quad \zeta(t) := \begin{bmatrix} \big(H_1^T V_1^{-1} v_1(t)\big)^T & \cdots & \big(H_n^T V_n^{-1} v_n(t)\big)^T \end{bmatrix}^T.$$

The update equations of the filter (6) in matrix form are:

$$\omega_{t+1} = (K \otimes I_{d_y})\,\omega_t + M y + \zeta(t), \qquad \Omega_{t+1} = (K \otimes I_{d_y})\,\Omega_t + M, \tag{11}$$

where $K = [\kappa_{ij}]$, with $\kappa_{ij} = 0$ if $j \notin N_i \cup \{i\}$, is a stochastic matrix. The solutions of the linear systems are:

$$\omega_t = (K \otimes I_{d_y})^t \omega_0 + \sum_{\tau=0}^{t-1} (K \otimes I_{d_y})^{t-1-\tau}\big(M y + \zeta(\tau)\big), \qquad \Omega_t = (K \otimes I_{d_y})^t \Omega_0 + \sum_{\tau=0}^{t-1} (K \otimes I_{d_y})^{t-1-\tau} M.$$

Looking at the $i$-th components again, we have:

$$\frac{\omega_{it}}{t+1} = \frac{1}{t+1}\sum_{j=1}^n [K^t]_{ij}\,\omega_{j0} + \frac{1}{t+1}\sum_{\tau=0}^{t-1}\sum_{j=1}^n [K^{t-\tau-1}]_{ij}\big(M_j y + H_j^T V_j^{-1} v_j(\tau)\big),$$
$$\frac{\Omega_{it}}{t+1} = \frac{1}{t+1}\sum_{j=1}^n [K^t]_{ij}\,\Omega_{j0} + \frac{1}{t+1}\sum_{\tau=0}^{t-1}\sum_{j=1}^n [K^{t-\tau-1}]_{ij}\, M_j.$$

Define the following to simplify the notation:

$$g_{it} := \frac{1}{t+1}\sum_{j=1}^n [K^t]_{ij}\,\omega_{j0}, \quad G_{it} := \frac{1}{t+1}\sum_{j=1}^n [K^t]_{ij}\,\Omega_{j0}, \quad \varphi_{it} := \frac{1}{t+1}\sum_{\tau=0}^{t-1}\sum_{j=1}^n [K^{t-\tau-1}]_{ij}\, H_j^T V_j^{-1} v_j(\tau),$$
$$C_{it} := \frac{1}{t+1}\sum_{\tau=0}^{t-1}\sum_{j=1}^n [K^{t-\tau-1}]_{ij}\, M_j, \quad b_{it} := g_{it} - G_{it}\, y, \quad B_{it} := \frac{1}{t+1}\Omega_{it}. \tag{12}$$

With the shorthand notation:

$$\frac{\omega_{it}}{t+1} = g_{it} + \varphi_{it} + C_{it}\, y, \qquad B_{it} = \frac{\Omega_{it}}{t+1} = G_{it} + C_{it}, \tag{13}$$

where $\varphi_{it}$ is the only random quantity. Its mean is zero because the measurement noise has zero mean, while its covariance is:

$$\mathbb{E}[\varphi_{it}\varphi_{it}^T] = \frac{1}{(t+1)^2}\,\mathbb{E}\Big[\Big(\sum_{\tau=0}^{t-1}\sum_{j=1}^n [K^{t-\tau-1}]_{ij}\, H_j^T V_j^{-1} v_j(\tau)\Big)\Big(\sum_{s=0}^{t-1}\sum_{\eta=1}^n [K^{t-s-1}]_{i\eta}\, H_\eta^T V_\eta^{-1} v_\eta(s)\Big)^T\Big]$$
$$= \frac{1}{(t+1)^2}\sum_{j=1}^n\sum_{\tau=0}^{t-1}\big([K^{t-\tau-1}]_{ij}\big)^2\, H_j^T V_j^{-1}\,\mathbb{E}[v_j(\tau)v_j(\tau)^T]\, V_j^{-1} H_j = \frac{1}{(t+1)^2}\sum_{j=1}^n\sum_{\tau=0}^{t-1}\big([K^{t-\tau-1}]_{ij}\big)^2 M_j \preceq \frac{1}{t+1} C_{it}, \tag{14}$$

where the second equality uses the fact that $v_j(\tau)$ and $v_\eta(s)$ are independent unless the indices coincide, i.e., $\mathbb{E}[v_j(\tau)v_\eta(s)^T] = \delta_{\tau s}\delta_{j\eta} V_j$. The Löwner ordering inequality in the last step uses that $0 \leq [K^{t-\tau-1}]_{ij} \leq 1$ and $M_j \succeq 0$.

Since $G$ is connected, $K$ corresponds to the transition matrix of an aperiodic irreducible Markov chain with a unique stationary distribution $\pi$, so that $K^t \to \mathbf{1}\pi^T$ with $\pi_j > 0$. This implies that, as $t \to \infty$, the numerators of $g_{it}$ and $G_{it}$ remain bounded, and therefore $g_{it} \to 0$ and $G_{it} \to 0$. Since Cesàro means preserve convergent sequences and their limits:

$$\frac{1}{t+1}\sum_{\tau=0}^{t-1}[K^{t-\tau-1}]_{ij} \to \pi_j, \quad \forall i,$$

which implies that $C_{it} \to \sum_{j=1}^n \pi_j M_j$. The full-rank assumption on $[H_1^T \ \cdots \ H_n^T]^T$ and $\pi_j > 0$ guarantee that $\sum_{j=1}^n \pi_j M_j$ is positive definite. Finally, consider the mean squared error:

$$\mathbb{E}\big[(\hat{y}_i(t)-y)^T(\hat{y}_i(t)-y)\big] = \mathbb{E}\Big\|\Big(\frac{\Omega_{it}}{t+1}\Big)^{-1}\frac{\omega_{it}}{t+1} - \Big(\frac{\Omega_{it}}{t+1}\Big)^{-1}\frac{\Omega_{it}}{t+1}\,y\Big\|_2^2 = \mathbb{E}\big\|B_{it}^{-1}\big(g_{it} + C_{it}\,y + \varphi_{it} - (G_{it}+C_{it})\,y\big)\big\|_2^2$$
$$= \mathbb{E}\|B_{it}^{-1}(b_{it}+\varphi_{it})\|_2^2 = \mathbb{E}\big[b_{it}^T B_{it}^{-T} B_{it}^{-1} b_{it} + 2\, b_{it}^T B_{it}^{-T} B_{it}^{-1}\varphi_{it} + \varphi_{it}^T B_{it}^{-T} B_{it}^{-1}\varphi_{it}\big]$$
$$\overset{(a)}{=} b_{it}^T B_{it}^{-T} B_{it}^{-1} b_{it} + \mathrm{tr}\big(B_{it}^{-1}\,\mathbb{E}[\varphi_{it}\varphi_{it}^T]\, B_{it}^{-T}\big) \overset{(b)}{\leq} b_{it}^T B_{it}^{-T} B_{it}^{-1} b_{it} + \frac{1}{t+1}\,\mathrm{tr}\big(B_{it}^{-1} C_{it} B_{it}^{-T}\big) \to 0,$$

where (a) holds because the first term is deterministic, while the cross term contains $\mathbb{E}[\varphi_{it}] = 0$. Inequality (b) follows from (14). In the final step, as shown before, $B_{it}^{-1} \to \big(\sum_{j=1}^n \pi_j M_j\big)^{-1}$ and $C_{it} \to \sum_{j=1}^n \pi_j M_j \succ 0$ remain bounded, while $b_{it} \to 0$ and $1/(t+1) \to 0$.
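The two limits that drive this proof, $K^t \to \mathbf{1}\pi^T$ and $C_{it} \to \sum_j \pi_j M_j$, are easy to check numerically. The sketch below does so on an illustrative 3-sensor chain with hand-picked weights and information matrices (all assumptions, chosen only to make the example self-contained).

```python
import numpy as np

# Row-stochastic weights for a 3-node chain with self-loops (primitive, so K^t -> 1 pi^T).
K = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2)]  # sensor information matrices

pi = np.linalg.matrix_power(K, 200)[0]       # any row of K^t approximates pi

# Cesàro average C_it = (1/(t+1)) sum_tau [K^{t-tau-1}]_{1j} M_j for sensor i = 1.
t, C, P = 500, np.zeros((2, 2)), np.eye(3)
for _ in range(t):                           # P runs over K^0, K^1, ..., K^{t-1}
    C += sum(P[0, j] * M[j] for j in range(3))
    P = P @ K
C /= t + 1

print(np.allclose(C, sum(p * Mj for p, Mj in zip(pi, M)), atol=1e-2))  # True
```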
APPENDIX B: PROOF OF THEOREM 2

Define the generalized (matrix-weighted) degree matrix $D \in \mathbb{R}^{nd \times nd}$ of the graph $G$ as a block-diagonal matrix with $D_{ii} := \sum_{j \in N_i} E_{ij}^{-1}$. Since $E_{ij} \succ 0$ for all $\{i,j\} \in E$, the generalized degree matrix is positive definite, $D \succ 0$. Define also the generalized adjacency matrix $A \in \mathbb{R}^{nd \times nd}$ as follows:

$$A_{ij} := \begin{cases} E_{ij}^{-1} & \text{if } \{i,j\} \in E, \\ 0 & \text{else.} \end{cases}$$

The generalized Laplacian and the generalized signless Laplacian of $G$ are defined as $L := D - A$ and $|L| := D + A$, respectively. Further, let $R := (B \otimes I_d)^T \in \mathbb{R}^{md \times nd}$ and define the block-diagonal matrix $E \in \mathbb{R}^{md \times md}$ with blocks $E_{ij}$ for $\{i,j\} \in E$. It is straightforward to verify that $L = R^T E^{-1} R \succeq 0$ and $|L| = (|B| \otimes I_d) E^{-1} (|B| \otimes I_d)^T \succeq 0$, where $|B| \in \mathbb{R}^{n \times m}$ is the signless incidence matrix of $G$.

Let $\tilde{B} \in \mathbb{R}^{(n-1) \times m}$ and $\tilde{R} := (\tilde{B} \otimes I_d)^T \in \mathbb{R}^{md \times (n-1)d}$ be the matrices resulting after removing the row corresponding to sensor 1 from $B$. Similarly, let $\tilde{D}, \tilde{A}, \tilde{L}, |\tilde{L}| \in \mathbb{R}^{(n-1)d \times (n-1)d}$ denote the generalized degree, adjacency, Laplacian, and signless Laplacian matrices with the rows and columns corresponding to sensor 1 removed. Thm. 2.2.1 in [14] shows that $\tilde{L} \succ 0$ provided that $G$ is connected. The same approach can be used to show that $|\tilde{L}| \succ 0$.

Let $\tilde{x} \in \mathbb{R}^{(n-1)d}$ be the locations of sensors $2, \ldots, n$ in the reference frame of sensor 1, and let $\hat{x}(t) \in \mathbb{R}^{(n-1)d}$ be their estimates at time $t$ obtained from (9). The update in (9) can be written in matrix form as follows:

$$\tilde{D}\,\hat{x}(t+1) = \tilde{A}\,\hat{x}(t) + \tilde{R}^T E^{-1}\Big(\tilde{R}\,\tilde{x} + \frac{1}{t+1}\sum_{\tau=0}^{t}\epsilon(\tau)\Big). \tag{15}$$

Define the estimation error at time $t$ as $e(t) := \tilde{x} - \hat{x}(t)$ and let $u(t) := \frac{1}{t+1}\sum_{\tau=0}^{t}\epsilon(\tau)$. The dynamics of the error state can be obtained from (15):

$$e(t+1) = \tilde{x} - \tilde{D}^{-1}\tilde{A}\,\hat{x}(t) - \tilde{D}^{-1}\tilde{L}\,\tilde{x} - \tilde{D}^{-1}\tilde{R}^T E^{-1} u(t) = \tilde{x} - \tilde{D}^{-1}\tilde{A}\,\hat{x}(t) - \tilde{D}^{-1}\big(\tilde{D}-\tilde{A}\big)\tilde{x} - \tilde{D}^{-1}\tilde{R}^T E^{-1} u(t)$$
$$= \tilde{D}^{-1}\tilde{A}\, e(t) - \tilde{D}^{-1}\tilde{R}^T E^{-1} u(t).$$

The error dynamics are governed by a stochastic linear time-invariant system, whose internal stability depends on the eigenvalues of $\tilde{D}^{-1}\tilde{A}$. To show that the error dynamics are stable, we resort to the following lemma.

Lemma 3 ([25, Lemma 4.2]). Let $L = D - A \in \mathbb{C}^{n \times n}$ be such that $D + D^* \succ 0$ and $L_\theta = D + D^* - (e^{i\theta}A + e^{-i\theta}A^*) \succ 0$ for all $\theta \in \mathbb{R}$. Then $\rho(D^{-1}A) < 1$.

Consider $\tilde{L}_\theta := 2(\tilde{D} - \cos(\theta)\tilde{A})$. If $\cos\theta = 0$, then $\tilde{L}_\theta = 2\tilde{D} \succ 0$. If $\cos\theta \in (0, 1]$, then $\tilde{L}_\theta \succeq 2\cos\theta\, \tilde{L} \succ 0$. Finally, if $\cos\theta \in [-1, 0)$, then $\tilde{L}_\theta \succeq 2|\cos\theta|\, |\tilde{L}| \succ 0$. Thus, we can conclude that $\rho(\tilde{D}^{-1}\tilde{A}) < 1$. The proof of the theorem is completed by the following lemma, with $F := \tilde{D}^{-1}\tilde{A}$ and $G := -\tilde{D}^{-1}\tilde{R}^T E^{-1}$.

Lemma 4. Consider the discrete-time stochastic linear time-invariant system:

$$e(t+1) = F e(t) + G\,\frac{1}{t+1}\sum_{\tau=0}^{t}\epsilon(\tau), \tag{16}$$

driven by Gaussian noise $\epsilon(\tau) \sim \mathcal{N}(0, E)$, which is independent at any pair of times. If the spectral radius of $F$ satisfies $\rho(F) < 1$, then $e(t) \xrightarrow{a.s.,\, L^2} 0$.

Proof. By the strong law of large numbers [26, Thm. 2.4.1], $u(t) := \frac{1}{t+1}\sum_{\tau=0}^{t}\epsilon(\tau)$ converges to 0 almost surely. Let $\Omega$ be the set with measure 1 on which $u(t)$ converges, so that for any $\gamma > 0$ there exists $T \in \mathbb{N}$ such that $\|u(t)\| \leq \gamma$ for all $t \geq T$. For realizations in $\Omega$, the solution to (16) with initial time $T$ is:

$$e(t) = F^{t-T} e(T) + \sum_{\tau=T}^{t-1} F^{t-\tau-1} G\, u(\tau).$$

Then,

$$\|e(t)\| \leq \|F^{t-T} e(T)\| + \sum_{\tau=T}^{t-1}\|F^{t-\tau-1}\|\,\|G\|\,\gamma.$$

Taking the limit over $t$ and using that $F$ is stable, we have

$$\lim_{t\to\infty}\|e(t)\| \leq \Big(\sum_{\tau=0}^{\infty}\|F^\tau\|\Big)\|G\|\,\gamma.$$

Since $\rho(F) < 1$, the system is internally (uniformly) exponentially stable, which is equivalent to $\sum_{\tau=0}^{\infty}\|F^\tau\| \leq \beta$ for some finite constant $\beta$ [27, Ch. 22]. Thus, $\lim_{t\to\infty}\|e(t)\| \leq \beta\|G\|\gamma$, which can be made arbitrarily small by the choice of $\gamma$. We conclude that $e(t) \to 0$ on $\Omega$ and consequently $e(t) \xrightarrow{a.s.} 0$.

Next, we show convergence in $L^2$. First, consider the propagation of the cross term $C(t) := (t+1)\,\mathbb{E}[e(t)u(t)^T]$. Note that $\mathbb{E}[u(t)] = 0$ and $\mathbb{E}[u(t)u(t)^T] = \frac{E}{t+1}$. Using the fact that $\epsilon(t+1)$ is independent of $e(t)$ and $u(t)$, we have

$$C(t+1) = \mathbb{E}\Big[\big(F e(t) + G u(t)\big)\big((t+1)\,u(t) + \epsilon(t+1)\big)^T\Big] = F C(t) + (t+1)\, G\,\mathbb{E}[u(t)u(t)^T] = F C(t) + G E.$$
The solution of the above linear time-invariant system is $C(t) = F^t C(0) + \sum_{\tau=0}^{t-1} F^{t-\tau-1} G E$, and since $F$ is stable:

$$\lim_{t\to\infty}\mathbb{E}[e(t)u(t)^T] = \lim_{t\to\infty}\frac{1}{t+1}\sum_{\tau=0}^{t-1} F^{\tau} G E = 0.$$

Now, consider the second moment of the error:

$$\Sigma(t+1) := \mathbb{E}[e(t+1)e(t+1)^T] = F\Sigma(t)F^T + F\,\mathbb{E}[e(t)u(t)^T]\,G^T + G\,\mathbb{E}[u(t)e(t)^T]\,F^T + \frac{1}{t+1}\, G E G^T = F\Sigma(t)F^T + Q(t),$$

where

$$Q(t) := \frac{1}{t+1}\big(F C(t) G^T + G C(t)^T F^T + G E G^T\big).$$

As shown above, $Q(t) \to 0$ as $t \to \infty$, i.e., for any $\delta > 0$ there exists $T' \in \mathbb{N}$ such that $\|Q(t)\| \leq \delta$ for all $t \geq T'$. With initial time $T'$,

$$\Sigma(t) = F^{t-T'}\Sigma(T')(F^T)^{t-T'} + \sum_{\tau=T'}^{t-1} F^{t-\tau-1} Q(\tau) (F^T)^{t-\tau-1}$$

for $t \geq T'$. Then:

$$\|\Sigma(t)\| \leq \|F^{t-T'}\|^2\,\|\Sigma(T')\| + \sum_{\tau=0}^{t-T'-1}\|F^\tau\|^2\,\delta \leq \alpha^2\mu^{2(t-T')} + \delta\alpha^2\sum_{\tau=0}^{t-T'-1}\mu^{2\tau},$$

where the existence of the constants $\alpha > 0$ and $0 \leq \mu < 1$ is guaranteed by the stability of $F$. We conclude that $\lim_{t\to\infty}\|\Sigma(t)\| \leq \frac{\delta\alpha^2}{1-\mu^2}$, which can be made arbitrarily small by the choice of $\delta$. In other words, $e(t) \xrightarrow{L^2} 0$.

APPENDIX C: PROOF OF THEOREM 3

We use the same notation and follow the same steps as in the proof of Thm. 1, except that now the terms $H_i, V_i, M_i, M, \zeta(t), \varphi_{it}, C_{it}, B_{it}$ are time-varying and stochastic because they depend on the location estimates $\hat{x}_i(t)$. To emphasize this, we denote them by $\hat{H}_{it}, \hat{V}_{it}, \hat{M}_{it}, \hat{M}_t, \hat{\zeta}(t), \hat{\varphi}_{it}, \hat{C}_{it}, \hat{B}_{it}$, where, for example, $\hat{M}_{it} := M_i(\hat{x}_i(t))$. The same linear systems (11) describe the evolution of $\omega_t$ and $\Omega_t$, except that they are now stochastic, and (13) becomes:

$$\frac{\omega_{it}}{t+1} = g_{it} + \hat{C}_{it}\, y + \hat{\varphi}_{it}, \qquad \hat{B}_{it} := \frac{\Omega_{it}}{t+1} = G_{it} + \hat{C}_{it}.$$

We still have that $K^t \to \mathbf{1}\pi^T$ with $\pi_j > 0$. Also, $g_{it}$, $G_{it}$, and $b_{it}$ are still deterministic and converge to zero as $t \to \infty$. The following observations are necessary to conclude that $\hat{C}_{it}$ still converges to $\sum_{j=1}^n \pi_j M_j$.

Lemma 5. If $\hat{x}_i(t) \xrightarrow{a.s.} x_i$, then $\hat{M}_{it} \xrightarrow{a.s.,\, L^2} M_i$.

Proof. Almost sure convergence follows from the continuity of $M_i(\cdot)$ and the continuous mapping theorem [26, Thm. 3.2.4]. $L^2$-convergence follows from the boundedness of $M_i(\cdot)$ and the dominated convergence theorem [26, Thm. 1.6.7].

Lemma 6. If $a_t \to a$ and $b_t \to b$, then $\frac{1}{t}\sum_{\tau=0}^{t-1} a_{t-\tau} b_\tau \to ab$.

Proof. The convergence of $a_t$ implies its boundedness, $|a_t| \leq q < \infty$. Then, notice that $ab = \frac{1}{t}\sum_{\tau=0}^{t-1} ab$ and

$$\Big|\frac{1}{t}\sum_{\tau=0}^{t-1} a_{t-\tau} b_\tau - ab\Big| = \Big|\frac{1}{t}\sum_{\tau=0}^{t-1}\big(a_{t-\tau}(b_\tau - b) + (a_{t-\tau}-a)\,b\big)\Big| \leq \Big|\frac{1}{t}\sum_{\tau=0}^{t-1} a_{t-\tau}(b_\tau - b)\Big| + \Big|\frac{1}{t}\sum_{\tau=0}^{t-1}(a_{t-\tau}-a)\,b\Big|$$
$$\leq q\,\frac{1}{t}\sum_{\tau=0}^{t-1}\big|b_\tau - b\big| + \frac{1}{t}\sum_{\tau=1}^{t}\big|a_\tau - a\big|\,|b|,$$

where both terms converge to zero since Cesàro means preserve convergent sequences and their limits.

Combining Lemma 5, $[K^t]_{ij} \to \pi_j$, and Lemma 6, we have:

$$\frac{1}{t+1}\sum_{\tau=0}^{t-1}[K^{t-\tau-1}]_{ij}\,\hat{M}_{j\tau} \xrightarrow{a.s.} \big[\mathbf{1}\pi^T\big]_{ij} M_j = \pi_j M_j.$$

Moreover, $0 \leq [K^t]_{ij} \leq 1$ and the boundedness of $\hat{M}_{jt}$ imply, by the bounded convergence theorem [26, Thm. 1.6.7], that the sequence above converges in $L^2$ as well:

$$\hat{C}_{it} \xrightarrow{a.s.,\, L^2} \sum_{j=1}^n \pi_j M_j \succ 0. \tag{17}$$

In turn, (17) guarantees that:

$$\hat{B}_{it}^{-2} = \big(G_{it} + \hat{C}_{it}\big)^{-2} \xrightarrow{a.s.} \Big(\sum_{j=1}^n \pi_j M_j\Big)^{-2}, \tag{18}$$

but this is not enough to ensure that $\mathbb{E}\big[\hat{B}_{it}^{-2}\big]$ remains bounded as $t \to \infty$. The parameter $\delta > 0$ is needed to guarantee the boundedness. In particular, define $\hat{B}_{it}(\delta) := \hat{B}_{it} + \delta I_{d_y}$. Then

$$\hat{B}_{it}(\delta)^{-2} = \big(G_{it} + \hat{C}_{it} + \delta I_{d_y}\big)^{-2} \prec \frac{1}{\delta^2} I_{d_y},$$

and by the bounded convergence theorem and (18):

$$\hat{B}_{it}(\delta)^{-2} \xrightarrow{a.s.,\, L^1} \Big(\sum_{j=1}^n \pi_j M_j + \delta I_{d_y}\Big)^{-2}, \tag{19}$$

so that $\lim_{t\to\infty}\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big] < \infty$. From (17) and the boundedness of $\hat{B}_{it}(\delta)^{-1}$ and $\hat{C}_{it}$, we also have:

$$\hat{B}_{it}(\delta)^{-1}\hat{C}_{it}\hat{B}_{it}(\delta)^{-T} \xrightarrow{a.s.,\, L^2} \Big(\sum_{j=1}^n \pi_j M_j + \delta I_{d_y}\Big)^{-1}\Big(\sum_{j=1}^n \pi_j M_j\Big)\Big(\sum_{j=1}^n \pi_j M_j + \delta I_{d_y}\Big)^{-T}. \tag{20}$$

Since $\hat{H}_{it}$ and $\hat{V}_{it}$ depend solely on $\hat{x}_i(t)$, they are independent of $v_i(t)$. Because $\mathbb{E}[v_j(\tau)] = 0$, we have $\mathbb{E}\big[\hat{H}_{j\tau}^T\hat{V}_{j\tau}^{-1}v_j(\tau)\big] = 0$ and, as before, $\mathbb{E}[\hat{\varphi}_{it}] = 0$.
Since $\hat{B}_{it}(\delta)$ is independent of $v_i(t)$ as well, $\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\hat{\varphi}_{it}\big] = 0$, and a result equivalent to (14) holds:

$$\mathbb{E}\big[\hat{B}_{it}(\delta)^{-1}\hat{\varphi}_{it}\hat{\varphi}_{it}^T\hat{B}_{it}(\delta)^{-T}\big] = \mathbb{E}\Big[\hat{B}_{it}(\delta)^{-1}\Big(\frac{1}{(t+1)^2}\sum_{j=1}^n\sum_{\tau=0}^{t-1}\big([K^{t-\tau-1}]_{ij}\big)^2\,\hat{M}_{j\tau}\Big)\hat{B}_{it}(\delta)^{-T}\Big] \preceq \frac{1}{t+1}\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-1}\hat{C}_{it}\hat{B}_{it}(\delta)^{-T}\big]. \tag{21}$$

Finally, we can consider the mean squared error:

$$\mathbb{E}\big[\|\hat{y}_i(t) - y\|_2^2\big] = \mathbb{E}\Big\|\hat{B}_{it}(\delta)^{-1}\frac{\omega_{it}}{t+1} - \hat{B}_{it}(\delta)^{-1}\hat{B}_{it}(\delta)\,y\Big\|_2^2 = \mathbb{E}\big\|\hat{B}_{it}(\delta)^{-1}\big(g_{it} + \hat{C}_{it}\,y + \hat{\varphi}_{it} - (G_{it} + \hat{C}_{it} + \delta I_{d_y})\,y\big)\big\|_2^2$$
$$= \mathbb{E}\big\|\hat{B}_{it}(\delta)^{-1}\big(b_{it} + \hat{\varphi}_{it} - \delta y\big)\big\|_2^2$$
$$= \mathbb{E}\big[b_{it}^T\hat{B}_{it}(\delta)^{-2}b_{it} + \hat{\varphi}_{it}^T\hat{B}_{it}(\delta)^{-2}\hat{\varphi}_{it} + \delta^2 y^T\hat{B}_{it}(\delta)^{-2}y + 2\,b_{it}^T\hat{B}_{it}(\delta)^{-2}\hat{\varphi}_{it} - 2\delta\, y^T\hat{B}_{it}(\delta)^{-2}\hat{\varphi}_{it} - 2\delta\, b_{it}^T\hat{B}_{it}(\delta)^{-2}y\big]$$
$$= b_{it}^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,b_{it} + \mathrm{tr}\Big(\mathbb{E}\big[\hat{B}_{it}(\delta)^{-1}\hat{\varphi}_{it}\hat{\varphi}_{it}^T\hat{B}_{it}(\delta)^{-T}\big]\Big) + \delta^2 y^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,y - 2\delta\, b_{it}^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,y$$
$$\overset{(21)}{\leq} b_{it}^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,b_{it} - 2\delta\, b_{it}^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,y + \delta^2 y^T\,\mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big]\,y + \frac{1}{t+1}\,\mathrm{tr}\Big(\mathbb{E}\big[\hat{B}_{it}(\delta)^{-1}\hat{C}_{it}\hat{B}_{it}(\delta)^{-T}\big]\Big)$$
$$\to \delta^2 y^T\Big(\sum_{j=1}^n \pi_j M_j + \delta I_{d_y}\Big)^{-2} y.$$

In the final step, the first two terms go to zero because $b_{it} \to 0$ and $\lim_t \mathbb{E}\big[\hat{B}_{it}(\delta)^{-2}\big] < \infty$ from (19); the third term converges in view of (19) again, while the last term goes to zero because the trace is bounded in the limit in view of (20).

REFERENCES

[1] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, "Wireless Sensor Networks for Habitat Monitoring," in Int. Workshop on Wireless Sensor Networks and Applications, 2002.
[2] N. Leonard, D. Paley, F. Lekien, R. Sepulchre, D. Fratantoni, and R. Davis, "Collective Motion, Sensor Networks, and Ocean Sampling," Proceedings of the IEEE, vol. 95, no. 1, 2007.
[3] T. Yan, T. He, and J. Stankovic, "Differentiated Surveillance for Sensor Networks," in Int. Conf. on Embedded Networked Sensor Systems, 2003.
[4] J. Lu and T. Suda, "Differentiated Surveillance for Static and Random Mobile Sensor Networks," IEEE Trans. on Wireless Communications, vol. 7, no. 11, 2008.
[5] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, "Non-Bayesian Social Learning," Games and Economic Behavior, vol. 76, no. 1, 2012.
[6] K. Rahnama Rad and A. Tahbaz-Salehi, "Distributed Parameter Estimation in Networks," in Conf. on Decision and Control (CDC), 2010.
[7] R. Olfati-Saber, "Distributed Kalman Filtering for Sensor Networks," in Conf. on Decision and Control (CDC), 2007.
[8] ——, "Kalman-Consensus Filter: Optimality, Stability, and Performance," in Joint Conf. on Decision and Control (CDC) and Chinese Control Conference, 2009.
[9] U. Khan, S. Kar, A. Jadbabaie, and J. Moura, "On connectivity, observability, and stability in distributed estimation," in Conf. on Decision and Control (CDC), 2010.
[10] S. Shahrampour, A. Rakhlin, and A. Jadbabaie, "Online Learning of Dynamic Parameters in Social Networks," in Advances in Neural Information Processing Systems (NIPS), 2013.
[11] S. Kar, J. Moura, and K. Ramanan, "Distributed Parameter Estimation in Sensor Networks: Nonlinear Observation Models and Imperfect Communication," IEEE Trans. on Information Theory, vol. 58, no. 6, 2012.
[12] J. Cortés, "Distributed Kriged Kalman filter for spatial estimation," IEEE Trans. on Automatic Control (TAC), vol. 54, no. 12, 2009.
[13] P. Barooah and J. Hespanha, "Estimation on Graphs from Relative Measurements," IEEE Control Systems, vol. 27, no. 4, 2007.
[14] P. Barooah, "Estimation and Control with Relative Measurements: Algorithms and Scaling Laws," Ph.D. dissertation, University of California Santa Barbara, 2007.
[15] J. Aspnes, T. Eren, D. Goldenberg, A. Morse, W. Whiteley, Y. Yang, B. Anderson, and P. Belhumeur, "A Theory of Network Localization," IEEE Trans. on Mobile Computing, vol. 5, no. 12, 2006.
[16] D. Moore, J. Leonard, D. Rus, and S. Teller, "Robust Distributed Network Localization with Noisy Range Measurements," in Int. Conf. on Embedded Networked Sensor Systems, 2004.
[17] A. So and Y. Ye, "Theory of semidefinite programming for Sensor Network Localization," Mathematical Programming, vol. 109, no. 2-3, 2007.
[18] N. Priyantha, H. Balakrishnan, E. Demaine, and S. Teller, "Anchor-Free Distributed Localization in Sensor Networks," in Int. Conf. on Embedded Networked Sensor Systems, 2003.
[19] U. Khan, S. Kar, and J. Moura, "Distributed Sensor Localization in Random Environments Using Minimal Number of Anchor Nodes," IEEE Trans. on Signal Processing, vol. 57, no. 5, 2009.
[20] Y. Diao, Z. Lin, M. Fu, and H. Zhang, "A New Distributed Localization Method for Sensor Networks," in Asian Control Conf. (ASCC), 2013.
[21] R. Tron and R. Vidal, "Distributed Image-Based 3-D Localization in Camera Sensor Networks," in Joint Conf. on Decision and Control (CDC) and Chinese Control Conference, 2009.
[22] V. Hernandez Bennetts, A. Lilienthal, A. Khaliq, V. Pomareda Sese, and M. Trincavelli, "Towards Real-World Gas Distribution Mapping and Leak Localization Using a Mobile Robot with 3D and Remote Gas Sensing Capabilities," in Int. Conf. on Robotics and Automation, 2013.
[23] A. Barker, D. Brown, and W. Martin, "Bayesian estimation and the Kalman filter," Computers and Mathematics with Applications, vol. 30, no. 10, 1995.
[24] J. Mendel, Lessons in Estimation Theory for Signal Processing, Communications, and Control. Prentice-Hall, 1995.
[25] L. Elsner and V. Mehrmann, "Convergence of block iterative methods for linear systems arising in the numerical solution of Euler equations," Numerische Mathematik, vol. 59, no. 1, 1991.
[26] R. Durrett, Probability: Theory and Examples, 4th ed. Cambridge University Press, 2010.
[27] W. Rugh, Linear System Theory, 2nd ed. Prentice-Hall, 1996.