Gaussian Process-Based Decentralized Data Fusion and Active Sensing for Mobility-on-Demand System Jie Chen, Kian Hsiang Low, and Colin Keng-Yan Tan Department of Computer Science, National University of Singapore, Republic of Singapore {chenjie, lowkh, ctank}@comp.nus.edu.sg Abstract—Mobility-on-demand (MoD) systems have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility in densely populated cities. In this paper, we enhance the capability of a MoD system by deploying robotic shared vehicles that can autonomously cruise the streets to be hailed by users. A key challenge to managing the MoD system effectively is that of real-time, fine-grained mobility demand sensing and prediction. This paper presents a novel decentralized data fusion and active sensing algorithm for real- time, fine-grained mobility demand sensing and prediction with a fleet of autonomous robotic vehicles in a MoD system. Our Gaussian process (GP)-based decentralized data fusion algorithm can achieve a fine balance between predictive power and time efficiency. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the GP model: The computation of such a sparse approximate GP model can thus be distributed among the MoD vehicles, hence achieving efficient and scalable demand prediction. Though our decentralized active sensing strategy is devised to gather the most informative demand data for demand prediction, it can achieve a dual effect of fleet rebalancing to service the mobility demands. Empirical evaluation on real-world mobility demand data shows that our proposed algorithm can achieve a better balance between predictive accuracy and time efficiency than state-of-the-art algorithms. I. INTRODUCTION Private automobiles are becoming unsustainable personal mobility solutions in densely populated urban cities because the addition of parking and road spaces cannot keep pace with their escalating numbers due to limited urban land. For exam- ple, Hong Kong and Singapore have, respectively, experienced 27.6% and 37% increase in private vehicles from 2003 to 2011 [2]. However, their road networks have only expanded less than 10% in size. Without implementing sustainable measures, traffic congestions and delays will grow more severe and frequent, especially during peak hours. Mobility-on-demand (MoD) systems [20] (e.g., V´elib sys- tem of over 20000 shared bicycles in Paris, experimental car- sharing systems described in [21]) have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility, specifically, to tackle the problems of low vehicle utilization rate and parking space caused by private automobiles. Conventionally, a MoD system provides stacks and racks of light electric vehicles distributed throughout a city: When a user wants to go somewhere, he simply walks to the nearest rack, swipes a card to pick up a vehicle, drives it to the rack nearest to his destination, and drops it off. In this paper, we enhance the capability of a MoD system by deploying robotic shared vehicles (e.g., General Motors Chevrolet EN-V 2.0 prototype [1]) that can autonomously drive and cruise the streets of a densely populated urban city to be hailed by users (like taxis) instead of just waiting at the racks to be picked up. Compared to the conventional MoD system, the fleet of autonomous robotic vehicles provides greater accessibility to users who can be picked up and dropped off at any location in the road network. As a result, it can service regions of high mobility demand but with poor coverage of stacks and racks due to limited space for their installation. The key factors in the success of a MoD system are the costs to the users and system latencies, which can be minimized by managing the MoD system effectively. To achieve this, two main technical challenges need to be addressed [19]: (a) Real- time, fine-grained mobility demand sensing and prediction, and (b) real-time active fleet management to balance vehi- cle supply and demand and satisfy latency requirements at sustainable operating costs. Existing works on load balancing for MoD systems [21], dynamic traffic assignment problems [22], dynamic one-to-one pickup and delivery problems [3], and location recommendation and dispatch for cruising taxis [5, 8, 12, 29] have tackled variants of the second challenge by assuming the necessary input of mobility demand information to be perfectly or accurately known using prior knowledge or offline processing of historic data. Such an assumption does not hold for densely populated urban cities because their mobility demand patterns are often subject to short- term random fluctuations and perturbations, in particular, due to frequent special events (e.g., storewide sales, exhibitions), unpredictable weather conditions, or emergencies (e.g., break- downs in public transport services). So, in order for the active fleet management strategies to perform well, they require accu- rate, fine-grained information of the spatiotemporally varying mobility demand patterns in real time, which is the desired outcome of addressing the first challenge. To the best of our knowledge, there is little progress in the algorithmic development of the first challenge, which will be the focus of our work in this paper. Given that the autonomous robotic vehicles are used directly as mobile probes to sample real-time demand data (e.g., pickup counts of different regions), how can the vacant ones actively cruise a road network to gather and assimilate the most informative demand data for predicting the mobility demand pattern? To solve this problem, a centralized approach to data arXiv:1306.1491v1 [cs.RO] 2 Jun 2013 fusion and active sensing [4, 11, 14, 15, 17, 18] is poorly suited because it suffers from a single point of failure and incurs huge communication, space, and time overheads with large data and fleet. Hence, we propose a novel decentralized data fusion and active sensing algorithm for real-time, fine- grained mobility demand sensing and prediction with a fleet of autonomous robotic vehicles in a MoD system. The specific contributions of our work include: • Modeling and predicting the spatiotemporally varying mo- bility demand pattern using a rich class of Bayesian non- parametric models called the log-Gaussian process (ℓGP) (Section II); • Developing a novel Gaussian process-based decentral- ized data fusion (GP-DDF+) algorithm (Section III) that achieves a fine balance between predictive power and time efficiency, and theoretically guaranteeing its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be distributed among the MoD vehicles, hence achieving efficient and scalable demand prediction; • Exploiting a decentralized active sensing (DAS) strategy (Section IV) to gather the most informative demand data for predicting the mobility demand pattern: Interestingly, we can analytically show that DAS exhibits a cruising behavior of simultaneously exploring demand hotspots and sparsely sampled regions that have higher likelihood of picking up users, hence achieving a dual effect of fleet rebalancing to service the mobility demands; • Empirically evaluating the predictive accuracy, time effi- ciency, scalability, and performance of servicing mobility demands (i.e., average cruising length of vehicles, average waiting time of users, total number of pickups) of our proposed algorithm on a real-world mobility demand pat- tern over the central business district of Singapore during evening hours (Section VI). II. MODELING A MOBILITY DEMAND PATTERN The service area in an urban city can be represented as a directed graph G ≜(V, E) where V denotes a set of all regions generated by gridding the service area, and E ⊆V ×V denotes a set of edges such that there is an edge (s, s′) from s ∈V to s′ ∈V iff at least one road segment in the road network starts in s and ends in s′. Each region s ∈V is associated with a p-dimensional feature vector xs representing its context information (e.g., location, time, precipitation), and a measurement ys quantifying its mobility demand1. Since it is often impractical in terms of sensing resource cost to determine the actual mobility demand of a region, a common practice is to use the pickup count of the region as a surrogate measure. To elaborate, the user pickups made by vacant MoD 1The service area is represented as a grid of regions instead of a network of road segments like in [6] because we observe less smoothly-varying, noisier demand measurements (hence, lower spatial correlation) for the latter in our real-world data (Section VI) since many road segments do not permit stopping of vehicles. vehicles cruising in a region contribute to its pickup count. Since we do not assume a data center to be available to keep track of the pickup count, a fully distributed gossip-based protocol [10] is utilized to aggregate these pickup information from the vehicles in the region that are connected via an ad hoc wireless communication network. Consequently, any vehicle entering the region can access its pickup count simply by joining its ad hoc network. As observed in [5, 12] and our real-world data (Fig. 1a), a mobility demand pattern over a large service area in an urban city is typically characterized by spatiotemporally correlated demand measurements and contains a few small-scale hotspots exhibiting extreme measurements and much higher spatiotem- poral variability than the rest of the demand pattern. That is, if the measurements are put together into a 1D sample frequency distribution, a positive skew results. We like to consider using a rich class of Bayesian nonparametric models called Gaussian process (GP) [24] to model the demand pattern. But, the GP covariance structure is sensitive to strong positive skewness and easily destabilized by a few extreme measurements [27]. In practice, this can cause reconstructed patterns to display large hotspots centered about a few extreme measurements and predictive variances to be unrealistically small in hotspots [9], which are undesirable. So, if the GP is used to model a demand pattern directly, it may not predict well. To resolve this, a standard statistical practice is to take the log of the measurements (i.e., zs = log ys) to remove skewness and extremity, and use the GP to model the demand pattern in the log-scale instead. A. Gaussian Process (GP) Each region s ∈V is associated with a realized (random) log-measurement zs (Zs) if s is sampled/observed (unob- served). Let {Zs}s∈V denote a GP, that is, every finite subset of {Zs}s∈V has a multivariate Gaussian distribution. The GP is fully specified by its prior mean µs ≜E[Zs] and covariance σss′ ≜cov(Zs, Zs′) for all s, s′ ∈V , the latter of which is defined by the widely-used squared exponential covariance function: σss′ ≜σ2 s exp −1 2 p X i=1 [xs]i −[xs′]i ℓi 2! + σ2 nδss′ where [xs]i ([xs′]i) is the i-th component of the feature vector xs (xs′), the hyperparameters σ2 n, σ2 s, ℓ1, . . . , ℓp are, respec- tively, noise and signal variances and length-scales that can be learned using maximum likelihood estimation, and δss′ is a Kronecker delta that is 1 if s = s′ and 0 otherwise. Given a set D ⊂V of observed regions and a column vector zD of corresponding log-measurements, the GP can be used to predict the log-measurements of any set S ⊂V of unobserved regions with the following Gaussian posterior mean vector and covariance matrix: µS|D ≜µS + ΣSDΣ−1 DD(zD −µD) (1) ΣSS|D ≜ΣSS −ΣSDΣ−1 DDΣDS (2) where µS(µD) is a column vector with mean components µs for all s ∈S(s ∈D), ΣSD(ΣDD) is a covariance matrix with covariance components σss′ for all s ∈S, s′ ∈D(s, s′ ∈D), and ΣSD is the transpose of ΣDS. B. Log-Gaussian Process (ℓGP) Demand measurements may not be observed in some re- gions because vacant MoD vehicles did not cruise into them. Since our ultimate interest is to predict them in the original scale, GP’s predicted log-measurements of these unobserved regions must be transformed back unbiasedly. To achieve this, we utilize a widely-used variant of GP in geostatistics called the ℓGP that can model the demand pattern in the original scale. Let {Ys}s∈V denote a ℓGP: If Zs ≜log Ys, then {Zs}s∈V is a GP. So, Ys = exp{Zs} denotes the original random demand measurement of unobserved region s and is predicted using the log-Gaussian posterior mean (i.e., best unbiased predictor) bµs|D ≜exp(µs|D + Σss|D/2) (3) where µs|D and Σss|D are simply the Gaussian posterior mean (1) and variance (2) of GP, respectively. The uncertainty of predicting the measurements of any set S ⊂V of unobserved regions can be quantified by the following log-Gaussian poste- rior joint entropy, which will be exploited by our DAS strategy (Section IV): H[YS|YD] ≜1 2 log(2πe)|S| ΣSS|D + µS|D · 1 (4) where µS|D and ΣSS|D are the Gaussian posterior mean vector (1) and covariance matrix (2) of GP, respectively. III. DECENTRALIZED DEMAND DATA FUSION The demand data are gathered in a distributed manner by the vacant MoD vehicles cruising the service area and have to be assimilated in order to predict the mobility demand pattern. A straightforward approach to data fusion is to fully communicate all the data to every vehicle, each of which then performs the same exact ℓGP prediction (3) separately. This approach, which we call full Gaussian process (FGP) [14, 15], unfortunately cannot scale well and be performed in real time due to its cubic time complexity in the size of the data. Alternatively, the work of [6] has recently proposed a Gaussian process-based decentralized data fusion (GP-DDF) approach to efficient and scalable approximate GP and ℓGP prediction. Their key idea is as follows: Each vehicle sum- marizes all its local data, based on a common prior support set U ⊂V , into a local summary (Definition 1) to be exchanged with every other vehicle. Then, it assimilates the local summaries received from the other vehicles into a global summary (Definition 2), which is then exploited for predicting the demands of unobserved regions. Though GP-DDF scales very well with large data, it can predict poorly due to (a) loss of information caused by summarizing the measurements and correlation structure of the original data; and (b) sparse coverage of the hotspots (i.e., with higher spatiotemporal variability) by the support set. We propose a novel decentralized data fusion algorithm called GP-DDF+ that combines the best of both worlds, that is, the predictive power of FGP and efficiency of GP-DDF. GP- DDF+ is based on the intuition that a vehicle can exploit its local data to improve the demand predictions for unobserved regions “close” to its data (in the correlation sense). At the same time, GP-DDF+ can preserve the efficiency of GP-DDF by exploiting its idea of summarizing information, specifically, into the local and global summaries, as reproduced below: Definition 1 (Local Summary): Given a common support set U ⊂V known to all K vehicles, a set Dk ⊂V of observed regions and a column vector zDk of corresponding demand measurements local to vehicle k, its local summary is defined as a tuple ( ˙zk U, ˙Σk UU) where ˙zk B ≜ΣBDkΣ−1 DkDk|U(zDk −µDk) (5) ˙Σk BB′ ≜ΣBDkΣ−1 DkDk|UΣDkB′ (6) such that ΣDkDk|U is defined in a similar manner as (2). Definition 2 (Global Summary): Given a common support set U ⊂V known to all K vehicles and the local summary ( ˙zk U, ˙Σk UU) of every vehicle k = 1, . . . , K, the global summary is defined as a tuple (¨zU, ¨ΣUU) where ¨zU ≜ K X k=1 ˙zk U (7) ¨ΣUU ≜ΣUU + K X k=1 ˙Σk UU . (8) Using GP-DDF [6], each vehicle exploits the global summary to compute a globally consistent predictive Gaussian distribu- tion of the log-measurements of any set of unobserved regions. The resulting predictive Gaussian mean and variance can then be plugged into (3) to obtain the log-Gaussian posterior mean for predicting the demand of any unobserved region in the original scale. To improve the predictive power of GP-DDF, we develop the following novel GP-DDF+ algorithm that is further augmented by local information. Definition 3 (GP-DDF+ k ): Given a common support set U ⊂V known to all K vehicles, the global summary (¨zU, ¨ΣUU), the local summary ( ˙zk U, ˙Σk UU), a set Dk ⊂V of observed regions and a column vector zDk of corresponding measurements local to vehicle k, its GP-DDF+ k algorithm computes a predictive Gaussian distribution N(µk S, Σ k SS) of the demand measurements of any set S ⊂V of unobserved regions where µk S ≜ µk s  s∈S and Σ k SS ≜ σk ss′  s,s′∈S such that µk s ≜µs +  γk sU ¨Σ−1 UU ¨zU −ΣsUΣ−1 UU ˙zk U  + ˙zk s (9) σk ss′ ≜σss′ −  γk sUΣ−1 UUΣUs′ −ΣsUΣ−1 UU ˙Σk Us′ −γk sU ¨Σ−1 UUγk Us′  −˙Σk ss′ (10) and γk sU ≜ΣsU + ΣsUΣ−1 UU ˙Σk UU −˙Σk sU . (11) Remark 1. Both the predictive Gaussian mean µk s (9) and co- variance σk ss′(10) of GP-DDF+ k exploit summary information (i.e., bracketed term) derived from global and local summaries and local information (i.e., last term) derived from local data. Remark 2. The predictive Gaussian mean µk s (9) and variance σk ss (10) can be plugged into (3) to obtain the log-Gaussian posterior mean for predicting the demand of any unobserved region in the original scale. Remark 3. Since different vehicles exploit different local data, their GP-DDF+ k algorithms provide inconsistent predictions of the mobility demand pattern. It is often desirable to achieve a globally consistent demand prediction among all vehicles. To do this, each unobserved region is simply assigned to the vehicle that predicts its demand best, which can be performed in a decentralized way: Definition 4 (Assignment Function): An assignment func- tion τ : V 7→{1 . . . K} is defined as τ(s) ≜arg min k∈{1...K} σk ss (12) for all s ∈S where the predictive variance σk ss is defined in (10). From now on, let τs ≜τ(s) for notational simplicity. Using the assignment function τ, each vehicle can now com- pute a globally consistent predictive Gaussian distribution, as detailed in Theorem 1A below: Theorem 1 (GP-DDF+): Let a common support set U ⊂ V and a common assignment function τ be known to all K vehicles. A. The GP-DDF+ algorithm of each vehicle computes a globally consistent predictive Gaussian distribution N(µS, ΣSS) of the demand measurements of any set S ⊂V of unobserved regions where µS ≜(µτs s )s∈S (9) and ΣSS ≜(σss′)s,s′∈S such that σss′ ≜ ( στs ss′ if τs = τs′, Σss′|U + γτs sU ¨Σ−1 UUγτs′ Us′ otherwise , (13) and γτs′ Us′ is the transpose of γτs′ s′U. B. Let N(µPIC S|D, ΣPIC SS|D) be the predictive Gaussian distribution computed by the centralized sparse par- tially independent conditional (PIC) approximation of GP model [25] where µPIC S|D ≜  µPIC s|D  s∈S and ΣPIC SS|D ≜  σPIC ss′|D  s,s′∈S such that µ PIC s|D ≜µs + eΓsD (ΓDD + Λ)−1 (zD −µD) (14) σ PIC ss′|D ≜σss′ −eΓsD (ΓDD + Λ)−1 eΓDs′ (15) and eΓDs′ is the transpose of eΓs′D such that ΓBB′ ≜ΣBUΣ−1 UUΣUB′ (16) eΓsD ≜(eΓs¯s)¯s∈D (17) eΓs¯s ≜  σs¯s if τs = τ¯s, Γs¯s otherwise , (18) and Λ is a block-diagonal matrix constructed from the K diagonal blocks of ΣDD|U, each of which is a matrix ΣDkDk|U for k = 1, . . . , K where D = SK k=1 Dk, and let τ¯s ≜k for all ¯s ∈Dk. Then, µs = µPIC s|D and σss′ = σPIC ss′|D for all s, s′ ∈S. The proof of Theorem 1B is given in Appendix A. Remark 1. In Theorem 1A, if τs = τs′ = k, then vehicle k can compute µτs s (9) and σss′ (10) locally and send them to the other vehicles that request them. Otherwise, τs ̸= τs′ and vehicle k has to request |U|-sized vectors γτs sU and γτs′ s′U from the respective vehicles τs and τs′ to compute σss′ (13). Remark 2. The equivalence result of Theorem 1B implies that the computational load of the centralized PIC approxi- mation of GP can be distributed among K vehicles, hence improving the time efficiency of demand prediction. Sup- posing |S| ≤|U| and |S| ≤|D|/K for simplicity, the O |D|((|D|/K)2 + |U|2)  time incurred by PIC can be re- duced to O (|D|/K)3 + |U|3 + |U|2K  time of running GP- DDF+ on each of the K vehicles. Hence, GP-DDF+ scales better with increasing size |D| of data. Remark 3. The equivalence result also sheds some light on an important property of GP-DDF+ based on the struc- ture of PIC: It is assumed that ZD1 S S1, ..., ZDK S SK are conditionally independent given the support set U. As compared to GP-DDF that assumes conditional independence of ZD1, . . . , ZDK, ZS1, . . . , ZSK, GP-DDF+ can predict ZS better since it imposes a weaker conditional independence as- sumption. Experimental results on real-world mobility demand data (Section VI) also show that GP-DDF+ achieves predictive accuracy comparable to FGP and significantly better than GP- DDF, thus justifying the practicality of such an assumption for predicting a mobility demand pattern. IV. DECENTRALIZED ACTIVE DEMAND SENSING Suppose that there are K vacant MoD vehicles in the fleet actively cruising the service area and each vehicle k ∈ {1 . . . K} has observed a set Dk ⊂V of regions. In the active demand sensing problem, all vehicles have to jointly select the most informative walks w∗ 1, . . . w∗ K of length H each along which demand data will be sampled: (w∗ 1, . . . , w∗ K) ≜arg max (w1,...,wK) H h YSK k=1 Swk YSK k=1 Dk i (19) where Swk denotes the set of unobserved regions to be visited by the walk wk. To ease notation, let w ≜(w1 . . . wK) and Sw = SK k=1 Swk (similarly, for w∗and Sw∗). Then, each vehicle k executes its walk w∗ k while observing the demands of regions Sw∗ k, and updates its location and stored data. To derive the most informative joint walk w∗, the posterior entropy (19) of every possible joint walk w has to be evaluated. Such a centralized strategy cannot be performed in real time due to the following two issues: (a) It relies on all the demand data that are gathered in a distributed manner by the vehicles, thus incurring huge time and communication overheads with large data, and (b) it involves evaluating a prohibitively large number of joint walks (i.e., exponential in the fleet size). The first issue can be alleviated by approximating the log- Gaussian posterior joint entropy using the decentralized GP- DDF+ algorithm (Theorem 1A), thus distributing its com- putational load among all vehicles. Then, the active demand sensing problem (19) is approximated by w∗= arg max w H [YSw] (20) H [YSw] ≜1 2 log(2πe)|Sw| ΣSwSw + µSw · 1 . (21) To obtain H [YSw] (21), ΣSwSw|D and µSw|D in H[YSw|YD] ((4) & (19)) are replaced by ΣSwSw and µSw defined in Theorem 1A, respectively. To address the second issue, a partially decentralized active sensing strategy proposed by [6] partitions the vehicles into several small groups such that each group of vehicles selects its joint walk independently. This partitioning heuristic performs poorly when the largest group formed still contains many vehicles. In our work, this is indeed the case because many vehicles tend to cluster within hotspots, as explained later. To scale well in the fleet size, we therefore adopt a fully decentralized active sensing (DAS) strategy by assuming that the joint walk w∗ 1 . . . w∗ K is derived by selecting the locally optimal walk of each vehicle k: w∗ k = arg max wk H h YSwk i (22) where H h YSwk i is defined in the same way as (21). Then, each vehicle can select its locally optimal walk independently of the other vehicles, thus significantly reducing the search space of joint walks. A consequence of such an assumption is that, without coordinating their walks, the vehicles may select suboptimal joint walks (e.g., two vehicles’ locally optimal walks are highly correlated). In practice, this assumption becomes less restrictive when the size |D| of data increases to potentially reduce the degree of violation of conditional independence of YSw1 , . . . , YSwK . More importantly, it can be observed from (21) and (22) that the cruising behavior of DAS trades off between exploring sparsely sampled regions with high predictive uncertainty (i.e., by maximizing the log-determinant of Gaussian poste- rior covariance matrix ΣSwk Swk term) and hotspots (i.e., by maximizing the Gaussian posterior mean vector µSwk term). As a result, it redistributes vacant MoD vehicles to regions with high likelihood of picking up users. Hence, besides gathering the most informative data for predicting the mobility demand pattern, DAS is able to achieve a dual effect of fleet rebalancing to service mobility demands. V. TIME AND COMMUNICATION ANALYSIS In this section, we analyze the time and communication overheads of our proposed GP-DDF+ coupled with DAS algorithm (Algo. 1) and compare them with that of both FGP (Section II) and GP-DDF [6] coupled with DAS algorithms (Section IV). Algorithm 1: GP-DDF++DAS(U, K, H, k, Dk, zDk) while true do /* Data fusion (Section III) */ Construct local summary by (5) & (6) Exchange local summary with every vehicle i ̸= k Construct global summary by (7) & (8) Construct assignment function by (12) Predict demand measurements of unobserved regions by (9) & (13) /* Active Sensing (Section IV) */ Compute local maximum-entropy walk w∗ k by (22) Execute walk w∗ k and observe its demand measurements Yw∗ k Update local information Dk and yDk A. Time Complexity Firstly, each vehicle k has to evaluate ΣDkDk|U in O |U|3 + |U|(|D|/K)2 time and invert it in O (|D|/K)3 time. After that, the data fusion component constructs the local summary in O |U|2|D|/K + |U|(|D|/K)2 time by (5) and (6), and subsequently the global summary in O |U|2K  time by (7) and (8). To construct the assignment function for any unobserved set S ⊂V , vehicle k first computes |S| number of γk sU for all unobserved regions s ∈S in O |S||U|2 + |S|(|D|/K)2 time by (11). Then, after inverting ¨ΣUU in O(|U|3), the predictive means and variances for all s ∈S are computed in O |S||U|2 + |S|(|D|/K)2 time by (9) and (13), respectively. Let ∆≜δH denote the number of possible walks of length H where δ is the maximum out-degree of graph G. In the active sensing component, to obtain the locally optimal walk, the log- Gaussian posterior entropies (22) of all possible walks are derived from (9) and (13), respectively, in O ∆H|U|2 and O ∆(H|U|)2 time. We assume |S| ≤ δ∆where S denotes the set S wk Swk of regions covered by any vehicle k’s all possible walks of length H. Then, the time complexity for our GP-DDF+ coupled with DAS algorithm is O (|D|/K)3+|U|3+|U|2K+∆(H3+(H|U|)2+(|D|/K)2)  . In contrast, the time incurred by FGP and GP-DDF coupled with DAS algorithms are, respectively, O |D|3 + ∆(H3 + (H|D|)2)  and O (|D|/K)3 + |U|3 + |U|2K + ∆(H3 + (H|U|)2)  . It can be observed that our GP-DDF+ coupled with DAS algorithm can scale better with large size |D| of data and fleet size K than FGP coupled with DAS algorithm, and its increased computational load, as compared to GP-DDF coupled with DAS algorithm, is well distributed among K vehicles. B. Communication complexity In each iteration, each vehicle of the system running our GP-DDF+ coupled with DAS algorithm has to broadcast a O(|U|2)-sized local summary for constructing the global summary, exchange O(∆) scalar values for constructing the assignment function, and request O(∆) number of O(|U|)- sized γk sU components for evaluating the entropies of all possible local walks. In contrast, FGP coupled with DAS algo- rithm needs to broadcast O(|D|/K)-sized message comprising all its local data to handle communication failure, and GP- DDF coupled with DAS algorithm only needs to broadcast a O(|U|2)-sized local summary. VI. EXPERIMENTS AND DISCUSSION This section evaluates the performance of our proposed algorithm using a real-world taxi trajectory dataset taken from the central business district of Singapore between 9:30 p.m. and 10 p.m. on August 2, 2010. The service area is gridded into 50 × 100 regions such that 2506 regions are included into the dataset as the remaining regions contain no road segment for cruising vehicles to access. The maximum out- degree δ of graph G over these regions is 8. The feature vector of each region is specified by its corresponding location. In any region, the demand (supply) measurement is obtained by counting the number of pickups (taxis cruising by) from all historic taxi trajectories generated by a major taxi company in a 30-minute time slot. After processing the taxi trajectories, the historic demand and supply distributions are obtained, as shown in Fig. 1. Then, a number C of users are randomly 10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 35 40 45 50 10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 35 40 45 50 (a) Demand (b) Supply Fig. 1. Historic demand and supply distributions. distributed over the service area with their locations drawn from the demand distribution (Fig. 1a). Similarly, a fleet of K vacant MoD vehicles are initialized at locations drawn from the supply distribution (Fig. 1b). In our simulation, when a vehicle enters a region with users, it picks up one of them randomly. Then, the MoD system removes this vehicle from the fleet of vacant cruising vehicles and introduce a new vacant vehicle drawn from the supply distribution. Similarly, a new user appears at a random location drawn from the demand distribution. The MoD system operates for L time steps and each vehicle plans a walk of length 4 at each time step, with all vehicles running a data fusion algorithm coupled with our DAS strategy. We will compare the performance of our GP-DDF+ algorithm with that of FGP and GP-DDF algorithms when coupled with our DAS strategy. The experiments are conducted on a Linux system with Intel® Xeon® CPU E5520 at 2.27 GHz. A. Performance Metrics The tested algorithms are evaluated with two sets of per- formance metrics. The performance of sensing and predicting mobility demands is evaluated using (a) root mean square error (RMSE) q |V |−1 P s∈V ys −bµs|D 2 where ys is the demand measurement and D is the set of regions observed by the MoD vehicles, and (b) incurred time of the algorithms. The performance of servicing mobility demands is eval- uated by comparing the Kullback-Leibler divergence (KLD) P s∈V Pc(s) log (Pc(s)/Pd(s)) between the fleet distribution Pc of vacant MoD vehicles controlled by the tested algorithms and historic demand distribution Pd (i.e., lower KLD implies better balance between fleet and demand), average cruising length of MoD vehicles, average waiting time of users, and total number of pickups resulting from the tested algorithms. B. Results and Analysis For notational simplicity, we will use GP-DDF+, FGP, and GP-DDF to represent the algorithms of their corresponding data fusion components coupled with DAS strategy in this subsection. 1) Performance: The MoD system comprises K = 20 vehicles running three tested algorithms for L = 960 time steps in a service area with C = 200 users. All results are obtained by averaging over 40 random instances. The performance of MoD systems in sensing and predicting mobility demands is illustrated in Figs. 2a-2b. Fig. 2a shows that the demand data collected by MoD vehicles using GP- DDF+ can achieve predictive accuracy comparable to that of using FGP and significantly better than that of using GP- DDF. This indicates that exploiting the local data of vehicles for predicting demands of nearby unobserved regions can improve the prediction of the mobility demand pattern. Fig. 2b shows the average incurred time of each vehicle using three algorithms. GP-DDF+ is significantly more time-efficient (i.e., one order of magnitude) than FGP, and only slightly less time- efficient than GP-DDF. This can be explained by the time analysis in Section V. The above results indicate that GP- DDF+ is more practical for real-world deployment due to a better balance between predictive accuracy and time efficiency. The performance of MoD systems in servicing the mobility demands is illustrated in Figs. 2c-2f. Fig. 2c shows that a MoD system using GP-DDF+ can achieve better fleet rebalancing of vehicles to service mobility demands than GP-DDF, but worse rebalancing than FGP. This implies that a better prediction of the underlying mobility demand pattern (Fig. 2a) can lead to better fleet rebalancing. Note that KLD (i.e., imbalance be- tween mobility demand and fleet) increases over time because we assume that when a vehicle picks up a user, its local data is removed from the fleet of cruising vehicles, and a new vehicle is introduced at a random location that may be distant from a demand hotspot, hence worsening the imbalance between demand and fleet. It can also be observed that an algorithm generating a better balance between fleet and demand will also perform better in servicing the mobility demands, that 200 400 600 800 1000 4.5 5 5.5 6 6.5 7 Length of walk RMSE FGP GP−DDF GP−DDF+ 200 400 600 800 1000 0 2 4 6 8 10 12 Length of walk Incurred time (s) FGP GP−DDF GP−DDF+ 200 400 600 800 1000 0.8 1 1.2 1.4 1.6 1.8 2 Length of walk KLD FGP GP−DDF GP−DDF+ 0 20 40 60 80 100 120 FGP GP−DDF GP−DDF+ Avg. length of walks 0 100 200 300 400 500 600 FGP GP−DDF GP−DDF+ Avg. waiting time 0 50 100 150 200 250 300 350 400 FGP GP−DDF GP−DDF+ Total no. of pickups (a) Accuracy (b) Efficiency (c) Balance (d) Vehicles (e) Users (f) Pickups Fig. 2. Performance of MoD systems in sensing, predicting, and servicing mobility demands. is, shorter average cruising trajectories of vehicles (Fig. 2d), shorter average waiting time of users (Fig. 2e), and larger total number of pickups (Fig. 2f). These observations imply that exploiting an active sensing strategy to collect the most informative demand data for predicting the mobility pattern achieves a dual effect of improving performance in servicing the mobility demands since these vehicles have higher chance of picking up users in demand hotspots or sparsely sampled regions (Section IV). 2) Scalability: We vary the number K = 10, 20, 30 of vehicles in the MoD system, and keep the total length of walks of all the vehicles to be the same, that is, these vehicles will walk for L = 960, 480, 320 steps, respectively. All three algorithms are tested in a service area with C = 600 users. All results are obtained by averaging over 40 random instances. From Figs. 3a-3c, it can be observed that all three algorithms can improve their prediction accuracy with an increasing number of vehicles in the MoD system because more vehicles indicate less walks when the total length of walks are the same, thus suffering less from the myopic planning (H = 4) and gathering more informative demand data. Figs. 3d-3f show that, with more MoD vehicles, GP-DDF+ and GP-DDF incur less time, while FGP incurs more time. This is because the computational load in decentralized data fusion algorithms are distributed among all vehicles, thus reducing the incurred time with more vehicles. 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 (a) FGP (b) GP-DDF (c) GP-DDF+ 2000 4000 6000 8000 10000 0 0.5 1 1.5 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 2000 4000 6000 8000 10000 0 0.02 0.04 0.06 0.08 0.1 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 2000 4000 6000 8000 10000 0 0.1 0.2 0.3 0.4 0.5 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 (d) FGP (e) GP-DDF (f) GP-DDF+ Fig. 3. Scalability of MoD systems in sensing and predicting mobility demands. Figs. 4a-4c show that all three algorithms can achieve better balance between mobility demand and fleet with larger number of vehicles. It can also be observed that all three algorithms can improve the performance of servicing the mobility demand with more vehicles, that is, shorter average cruising trajectories of vehicles (Fig. 4d), shorter average waiting time of users (Fig. 4e), and larger total number of pickups (Fig. 4f). This is because MoD vehicles can collect more informative demand data with larger number of vehicles sampling demand hotspots or sparsely sampled regions, which are the regions with higher chance of picking up users than the rest of the service area. 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 (a) FGP (b) GP-DDF (c) GP-DDF+ 0 5 10 15 20 25 30 35 FGP GP−DDF GP−DDF+ Avg. cruising length K=10 K=20 K=30 0 50 100 150 200 250 FGP GP−DDF GP−DDF+ Avg. waiting time K=10 K=20 K=30 0 100 200 300 400 500 600 FGP GP−DDF GP−DDF+ No. of pickups K=10 K=20 K=30 (d) Vehicles (e) Users (f) Pickups Fig. 4. Scalability of MoD systems in servicing mobility demands. The above results indicate that more vehicles in MoD sys- tem result in better accuracy in predicting the mobility demand pattern, and achieve a dual effect of better performance in servicing mobility demands. VII. CONCLUSION This paper describes a novel GP-based decentralized data fusion (GP-DDF+) and active sensing (DAS) algorithm for real-time, fine-grained mobility demand sensing and predic- tion with a fleet of autonomous robotic vehicles in a MoD system. We have analytically and empirically demonstrated that GP-DDF+ can achieve a better balance between predictive accuracy and time efficiency than the state-of-the-art GP-DDF [6] and FGP [14, 15]. The practical applicability of GP-DDF+ is not restricted to mobility demand prediction; it can be used in other urban and natural environmental sensing applications like monitoring of traffic, ocean and freshwater phenomena [4, 7, 13, 16, 17, 18, 23, 26, 28]. We have also analytically and empirically shown that even though DAS is devised to gather the most informative demand data for predicting the mobility demand pattern, it can achieve a dual effect of fleet rebalancing to service the mobility demands. For our future work, we will relax the assumption of all-to-all communication such that each sensor may only be able to communicate locally with its neighbors and develop a distributed data fusion approach to efficient and scalable approximate GP and ℓGP prediction. ACKNOWLEDGMENTS This work was supported by Singapore-MIT Alliance Re- search & Technology Subaward Agreements No. 28 R-252- 000-502-592 & No. 33 R-252-000-509-592. REFERENCES [1] GM Shows Chevrolet EN-V 2.0 Mobility Concept Vehicle. General Motors Co. (http://media.gm.com/ media/us/en/gm/news.detail.html/content/Pages/news/us/ en/2012/Apr/0423 EN-V 2 Rendering.html), 2012. [2] Hong Kong in Figures. Census and Statistics De- partment, Hong Kong Special Administrative Region (http://www.censtatd.gov.hk); Singapore Land Transport: Statistics in Brief. Land Transport Authority of Singapore (http://www.lta.gov.sg), 2012. [3] G. Berbeglia, J.-F. Cordeau, and G. Laporte. Dynamic pickup and delivery problems. Eur. J. Oper. Res., 202 (1):8–15, 2010. [4] N. Cao, K. H. Low, and J. M. Dolan. Multi-robot infor- mative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proc. AAMAS, pages 7–14, 2013. [5] H.-W. Chang, Y.-C. Tai, and J. Y.-J. Hsu. Context- aware taxi demand hotspots prediction. Int. J. Business Intelligence and Data Mining, 5(1):3–18, 2010. [6] J. Chen, K. H. Low, C. K.-Y. Tan, A. Oran, P. Jaillet, J. M. Dolan, and G. S. Sukhatme. Decentralized data fusion and active sensing with mobile sensors for mod- eling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pages 163–173, 2012. [7] J. M. Dolan, G. Podnar, S. Stancliff, K. H. Low, A. Elfes, J. Higinbotham, J. C. Hosler, T. A. Moisan, and J. Moisan. Cooperative aquatic sensing using the telesupervised adaptive ocean sensor fleet. In Proc. SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, 2009. [8] Y. Ge, H. Xiong, A. Tuzhilin, K. Xiao, M. Gruteser, and M. Pazzani. An energy-efficient mobile recommender system. In Proc. KDD, pages 899–908, 2010. [9] M. E. Hohn. Geostatistics and Petroleum Geology. Springer, 2nd edition, 1998. [10] M. Jelasity, A. Montresor, and O. Babaoglu. Gossip- based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219–252, 2005. [11] A. Krause, A. Singh, and C. Guestrin. Near-optimal sen- sor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res., 9:235–284, 2008. [12] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang. Prediction of urban human mobility using large-scale taxi traces and its applications. Front. Comput. Sci., 6(1):111–121, 2012. [13] K. H. Low, G. J. Gordon, J. M. Dolan, and P. Khosla. Adaptive sampling for multi-robot wide-area exploration. In Proc. IEEE ICRA, pages 755–760, 2007. [14] K. H. Low, J. M. Dolan, and P. Khosla. Adaptive multi-robot wide-area exploration and mapping. In Proc. AAMAS, pages 23–30, 2008. [15] K. H. Low, J. M. Dolan, and P. Khosla. Information- theoretic approach to efficient adaptive path planning for mobile robotic environmental sensing. In Proc. ICAPS, pages 233–240, 2009. [16] K. H. Low, G. Podnar, S. Stancliff, J. M. Dolan, and A. Elfes. Robot boats as a mobile aquatic sensor network. In Proc. IPSN-09 Workshop on Sensor Networks for Earth and Space Science Applications, 2009. [17] K. H. Low, J. M. Dolan, and P. Khosla. Active Markov information-theoretic path planning for robotic environ- mental sensing. In Proc. AAMAS, pages 753–760, 2011. [18] K. H. Low, J. Chen, J. M. Dolan, S. Chien, and D. R. Thompson. Decentralized active robotic exploration and mapping for probabilistic field classification in environ- mental sensing. In Proc. AAMAS, pages 105–112, 2012. [19] W. J. Mitchell. Mobility on demand: Future of transporta- tion in cities. Technical Report, MIT Media Laboratory, Cambridge, MA, 2008. [20] W. J. Mitchell, C. E. Borroni-Bird, and L. D. Burns. Reinventing the Automobile: Personal Urban Mobility for the 21st Century. MIT Press, Cambridge, MA, 2010. [21] M. Pavone, S. L. Smith, E. Frazzoli, and D. Rus. Robotic load balancing for mobility-on-demand systems. Int. J. Robot. Res., 31(7):839–854, 2012. [22] S. Peeta and A. K. Ziliaskopoulos. Foundations of dynamic traffic assignment: The past, the present and the future. Networks and Spatial Economics, 1:233–265, 2001. [23] G. Podnar, J. M. Dolan, K. H. Low, and A. Elfes. Telesupervised remote surface water quality sensing. In Proc. IEEE Aerospace Conference, 2010. [24] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [25] E. Snelson. Local and global sparse Gaussian process approximations. In Proc. AISTATS, 2007. [26] D. R. Thompson, N. Cabrol, M. Furlong, C. Hardgrove, K. H. Low, J. Moersch, and D. Wettergreen. Adaptive sampling of time series for remote exploration. In Proc. IEEE ICRA, 2013. [27] R. Webster and M. Oliver. Geostatistics for Environmen- tal Scientists. John Wiley & Sons, Inc., NY, 2007. [28] J. Yu, K. H. Low, A. Oran, and P. Jaillet. Hierarchical Bayesian nonparametric approach to modeling and learn- ing the wisdom of crowds of urban traffic route planning agents. In Proc. IAT, pages 478–485, 2012. [29] N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie. T-Finder: A recommender system for finding passengers and vacant taxis. IEEE Trans. Knowl. Data Eng., 2012. APPENDIX A. Proof of Theorem 1B (ΓDD + Λ)−1 = ΣDUΣ−1 UUΣUD + Λ −1 = Λ−1 −Λ−1ΣDU ΣUU + ΣUDΛ−1ΣDU −1 ΣUDΛ−1 = Λ−1 −Λ−1ΣDU ¨Σ−1 UUΣUDΛ−1 . (23) The first equality is by the definition of ΓDD (16). The second equality follows from matrix inversion lemma. The last equality is due to ΣUU + ΣUDΛ−1ΣDU = ΣUU + K X k=1 ΣUDkΣ−1 DkDk|UΣDkU = ΣUU + K X k=1 ˙Σk UU = ¨ΣUU . (24) The first equality is by the definition of Λ (Theorem 1B). The second and last equalities are due to (6) and (8), respectively. We will first derive the expressions of four components useful for completing the proof later. eΓsDΛ−1(zD −µD) = X i̸=k ΓsDiΣ−1 DiDi|U(zDi −µDi) + ΣsDkΣ−1 DkDk|U(zDk −µDk) = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|U(zDi −µDi)  + ˙zk s = ΣsUΣ−1 UU X i̸=k ˙zi U + ˙zk s = ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s . (25) The first two equalities expand the first component using the definition of Λ (Theorem 1B), (5), (16), (17), and (18). The last two equalities exploit (5) and (7). eΓsDΛ−1ΣDU = X i̸=k ΓsDiΣ−1 DiDi|UΣDiU + ΣsDkΣ−1 DkDk|UΣDkU = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|UΣDiU  + ΣsDkΣ−1 DkDk|UΣDkU = ΣsUΣ−1 UU X i̸=k ˙Σi UU + ˙Σk sU = ΣsUΣ−1 UU  ¨ΣUU −˙Σk UU −ΣUU  + ˙Σk sU = ΣsUΣ−1 UU ¨ΣUU −γk sU . (26) The first two equalities expand the second component by the same trick as that in (25). The third and fourth equalities exploit (6) and (8), respectively. The last equality is due to (11). Let αsU ≜ΣsUΣ−1 UU and its transpose is αUs. By using similar tricks in (25) and (26), we can derive the expressions of the remaining two components. If τs = τs′ = k, then eΓsDΛ−1eΓDs′ = X i̸=k ΓsDiΣ−1 DiDi|UΓDis′ + ΣsDkΣ−1 DkDk|UΣDks′ = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|UΣDiU  Σ−1 UUΣUs′ + ˙Σk ss′ = αsU X i̸=k  ˙Σi UU  αUs′ + ˙Σk ss′ = αsU  ¨ΣUU −˙Σk UU −ΣUU  αUs′ + ˙Σk ss′ = αsU ¨ΣUUαUs′ −αsUγk Us′ −αsU ˙Σk Us′ + ˙Σk ss′ . (27) If τs = i and τs′ = j such that i ̸= j, then eΓsDΛ−1eΓDs′ = X k̸=i,j ΓsDkΣ−1 DkDk|UΓDks′ + ΣsDiΣ−1 DiDi|UΓDis′ + ΓsDjΣ−1 DjDj|UΣDjs′ = ΣsUΣ−1 UU X k̸=i,j  ΣUDkΣ−1 DkDk|UΣDkU  Σ−1 UUΣUs′ + ΣsDiΣ−1 DiDi|UΣDiUΣ−1 UUΣUs′ + ΣsUΣ−1 UUΣUDjΣ−1 DjDj|UΣDjs′ = αsU  ¨ΣUU −˙Σi UU −˙Σj UU −ΣUU  αUs′ + ˙Σi sUαUs′ + αsU ˙Σj Us′ = αsU  ¨ΣUU + ΣUU  αUs′ −αsU  ˙Σi UU + ΣUU  αUs′ −αsU  ˙Σj UU + ΣUU  αUs′ + ˙Σi sUαUs′ + αsU ˙Σj Us′ = αsU  ¨ΣUU + ΣUU  αUs′ −  αsU ˙Σi UU + αsUΣUU −˙Σi sU  αUs′ −αsU  ˙Σj UUαUs′ + ΣUUαUs′ −˙Σj Us′  = αsU  ¨ΣUU + ΣUU  αUs′ −γi sUαUs′ −αsUγj Us′ = αsU ¨ΣUUαUs′ + ΣsUΣ−1 UUΣUs′ −γi sUαUs′ −αsUγj Us′ (28) If τs = k, then µPIC s|D = µs + eΓsD (ΓDD + Λ)−1 (zD −µD) = µs + eΓsDΛ−1(zD −µD) −eΓsDΛ−1ΣDU ¨Σ−1 UUΣUDΛ−1(zD −µD) = µs + eΓsDΛ−1(zD −µD) −eΓsDΛ−1ΣDU ¨Σ−1 UU ¨zU = µs + ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s −eΓsDΛ−1ΣDU ¨Σ−1 UU ¨zU = µs + ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s −  ΣsUΣ−1 UU ¨ΣUU −γk sU  ¨Σ−1 UU ¨zU = µs +  γk sU ¨Σ−1 UU ¨zU −ΣsUΣ−1 UU ˙zk U  + ˙zk s = µk s . The first equality is by definition (14). The second equality is due to (23). The third equality is due to the definition of global summary (7). The fourth and fifth equalities are due to (25) and (26), respectively. If τs = τs′ = k, then σPIC ss′|D = σss′ −eΓsD (ΓDD + Λ)−1 eΓDs′ = σss′ −eΓsDΛ−1eΓDs′ + eΓsDΛ−1ΣDU ¨Σ−1 UUΣUDΛ−1eΓDs′ = σss′ −eΓsDΛ−1eΓDs′ +  αsU ¨ΣUU −γk sU  ¨Σ−1 UU  ¨ΣUUαUs′ −γk Us′  = σss′ −αsU ¨ΣUUαUs′ + αsUγk Us′ + αsU ˙Σk Us′ −˙Σk ss′ + αsU ¨ΣUUαUs′ −αsUγk Us′ −γk sUαUs′ + γk sU ¨Σ−1 UUγk Us′ = σss′ −  γk sUαUs′ −αsU ˙Σk Us′ −γk sU ¨Σ−1 UUγk Us′  −˙Σk ss′ = σk ss′ = σss′ . The first equality is by definition (15). The second equality is due to (23). The third equality is due to (26). The fourth equality is due to (27). The last two equalities are by definition (13). If τs = i and τs′ = j such that i ̸= j, then σPIC ss′|D = σss′ −eΓsDΛ−1eΓDs′ + αsU ¨ΣUUαUs′ −αsUγj Us′ −γi sUαUs′ + γi sU ¨Σ−1 UUγj Us′ = σss′ −  αsU ¨ΣUUαUs′ + ΣsUΣ−1 UUΣUs′ −γi sUαUs′ −αsUγj Us′  + αsU ¨ΣUUαUs′ −αsUγj Us′ −γi sUαUs′ + γi sU ¨Σ−1 UUγj Us′ = σss′ −ΣsUΣ−1 UUΣUs′ + γi sU ¨Σ−1 UUγj Us′ = Σss′|U + γi sU ¨Σ−1 UUγj Us′ = σss′ . The first equality is obtained using a similar trick as when τs = τs′ = k. The second equality is due to (28). The second last equality is by the definition of posterior covariance in GP model (2). The last equality is by definition (13). Gaussian Process-Based Decentralized Data Fusion and Active Sensing for Mobility-on-Demand System Jie Chen, Kian Hsiang Low, and Colin Keng-Yan Tan Department of Computer Science, National University of Singapore, Republic of Singapore {chenjie, lowkh, ctank}@comp.nus.edu.sg Abstract—Mobility-on-demand (MoD) systems have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility in densely populated cities. In this paper, we enhance the capability of a MoD system by deploying robotic shared vehicles that can autonomously cruise the streets to be hailed by users. A key challenge to managing the MoD system effectively is that of real-time, fine-grained mobility demand sensing and prediction. This paper presents a novel decentralized data fusion and active sensing algorithm for real- time, fine-grained mobility demand sensing and prediction with a fleet of autonomous robotic vehicles in a MoD system. Our Gaussian process (GP)-based decentralized data fusion algorithm can achieve a fine balance between predictive power and time efficiency. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the GP model: The computation of such a sparse approximate GP model can thus be distributed among the MoD vehicles, hence achieving efficient and scalable demand prediction. Though our decentralized active sensing strategy is devised to gather the most informative demand data for demand prediction, it can achieve a dual effect of fleet rebalancing to service the mobility demands. Empirical evaluation on real-world mobility demand data shows that our proposed algorithm can achieve a better balance between predictive accuracy and time efficiency than state-of-the-art algorithms. I. INTRODUCTION Private automobiles are becoming unsustainable personal mobility solutions in densely populated urban cities because the addition of parking and road spaces cannot keep pace with their escalating numbers due to limited urban land. For exam- ple, Hong Kong and Singapore have, respectively, experienced 27.6% and 37% increase in private vehicles from 2003 to 2011 [2]. However, their road networks have only expanded less than 10% in size. Without implementing sustainable measures, traffic congestions and delays will grow more severe and frequent, especially during peak hours. Mobility-on-demand (MoD) systems [20] (e.g., V´elib sys- tem of over 20000 shared bicycles in Paris, experimental car- sharing systems described in [21]) have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility, specifically, to tackle the problems of low vehicle utilization rate and parking space caused by private automobiles. Conventionally, a MoD system provides stacks and racks of light electric vehicles distributed throughout a city: When a user wants to go somewhere, he simply walks to the nearest rack, swipes a card to pick up a vehicle, drives it to the rack nearest to his destination, and drops it off. In this paper, we enhance the capability of a MoD system by deploying robotic shared vehicles (e.g., General Motors Chevrolet EN-V 2.0 prototype [1]) that can autonomously drive and cruise the streets of a densely populated urban city to be hailed by users (like taxis) instead of just waiting at the racks to be picked up. Compared to the conventional MoD system, the fleet of autonomous robotic vehicles provides greater accessibility to users who can be picked up and dropped off at any location in the road network. As a result, it can service regions of high mobility demand but with poor coverage of stacks and racks due to limited space for their installation. The key factors in the success of a MoD system are the costs to the users and system latencies, which can be minimized by managing the MoD system effectively. To achieve this, two main technical challenges need to be addressed [19]: (a) Real- time, fine-grained mobility demand sensing and prediction, and (b) real-time active fleet management to balance vehi- cle supply and demand and satisfy latency requirements at sustainable operating costs. Existing works on load balancing for MoD systems [21], dynamic traffic assignment problems [22], dynamic one-to-one pickup and delivery problems [3], and location recommendation and dispatch for cruising taxis [5, 8, 12, 29] have tackled variants of the second challenge by assuming the necessary input of mobility demand information to be perfectly or accurately known using prior knowledge or offline processing of historic data. Such an assumption does not hold for densely populated urban cities because their mobility demand patterns are often subject to short- term random fluctuations and perturbations, in particular, due to frequent special events (e.g., storewide sales, exhibitions), unpredictable weather conditions, or emergencies (e.g., break- downs in public transport services). So, in order for the active fleet management strategies to perform well, they require accu- rate, fine-grained information of the spatiotemporally varying mobility demand patterns in real time, which is the desired outcome of addressing the first challenge. To the best of our knowledge, there is little progress in the algorithmic development of the first challenge, which will be the focus of our work in this paper. Given that the autonomous robotic vehicles are used directly as mobile probes to sample real-time demand data (e.g., pickup counts of different regions), how can the vacant ones actively cruise a road network to gather and assimilate the most informative demand data for predicting the mobility demand pattern? To solve this problem, a centralized approach to data arXiv:1306.1491v1 [cs.RO] 2 Jun 2013 fusion and active sensing [4, 11, 14, 15, 17, 18] is poorly suited because it suffers from a single point of failure and incurs huge communication, space, and time overheads with large data and fleet. Hence, we propose a novel decentralized data fusion and active sensing algorithm for real-time, fine- grained mobility demand sensing and prediction with a fleet of autonomous robotic vehicles in a MoD system. The specific contributions of our work include: • Modeling and predicting the spatiotemporally varying mo- bility demand pattern using a rich class of Bayesian non- parametric models called the log-Gaussian process (ℓGP) (Section II); • Developing a novel Gaussian process-based decentral- ized data fusion (GP-DDF+) algorithm (Section III) that achieves a fine balance between predictive power and time efficiency, and theoretically guaranteeing its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be distributed among the MoD vehicles, hence achieving efficient and scalable demand prediction; • Exploiting a decentralized active sensing (DAS) strategy (Section IV) to gather the most informative demand data for predicting the mobility demand pattern: Interestingly, we can analytically show that DAS exhibits a cruising behavior of simultaneously exploring demand hotspots and sparsely sampled regions that have higher likelihood of picking up users, hence achieving a dual effect of fleet rebalancing to service the mobility demands; • Empirically evaluating the predictive accuracy, time effi- ciency, scalability, and performance of servicing mobility demands (i.e., average cruising length of vehicles, average waiting time of users, total number of pickups) of our proposed algorithm on a real-world mobility demand pat- tern over the central business district of Singapore during evening hours (Section VI). II. MODELING A MOBILITY DEMAND PATTERN The service area in an urban city can be represented as a directed graph G ≜(V, E) where V denotes a set of all regions generated by gridding the service area, and E ⊆V ×V denotes a set of edges such that there is an edge (s, s′) from s ∈V to s′ ∈V iff at least one road segment in the road network starts in s and ends in s′. Each region s ∈V is associated with a p-dimensional feature vector xs representing its context information (e.g., location, time, precipitation), and a measurement ys quantifying its mobility demand1. Since it is often impractical in terms of sensing resource cost to determine the actual mobility demand of a region, a common practice is to use the pickup count of the region as a surrogate measure. To elaborate, the user pickups made by vacant MoD 1The service area is represented as a grid of regions instead of a network of road segments like in [6] because we observe less smoothly-varying, noisier demand measurements (hence, lower spatial correlation) for the latter in our real-world data (Section VI) since many road segments do not permit stopping of vehicles. vehicles cruising in a region contribute to its pickup count. Since we do not assume a data center to be available to keep track of the pickup count, a fully distributed gossip-based protocol [10] is utilized to aggregate these pickup information from the vehicles in the region that are connected via an ad hoc wireless communication network. Consequently, any vehicle entering the region can access its pickup count simply by joining its ad hoc network. As observed in [5, 12] and our real-world data (Fig. 1a), a mobility demand pattern over a large service area in an urban city is typically characterized by spatiotemporally correlated demand measurements and contains a few small-scale hotspots exhibiting extreme measurements and much higher spatiotem- poral variability than the rest of the demand pattern. That is, if the measurements are put together into a 1D sample frequency distribution, a positive skew results. We like to consider using a rich class of Bayesian nonparametric models called Gaussian process (GP) [24] to model the demand pattern. But, the GP covariance structure is sensitive to strong positive skewness and easily destabilized by a few extreme measurements [27]. In practice, this can cause reconstructed patterns to display large hotspots centered about a few extreme measurements and predictive variances to be unrealistically small in hotspots [9], which are undesirable. So, if the GP is used to model a demand pattern directly, it may not predict well. To resolve this, a standard statistical practice is to take the log of the measurements (i.e., zs = log ys) to remove skewness and extremity, and use the GP to model the demand pattern in the log-scale instead. A. Gaussian Process (GP) Each region s ∈V is associated with a realized (random) log-measurement zs (Zs) if s is sampled/observed (unob- served). Let {Zs}s∈V denote a GP, that is, every finite subset of {Zs}s∈V has a multivariate Gaussian distribution. The GP is fully specified by its prior mean µs ≜E[Zs] and covariance σss′ ≜cov(Zs, Zs′) for all s, s′ ∈V , the latter of which is defined by the widely-used squared exponential covariance function: σss′ ≜σ2 s exp −1 2 p X i=1 [xs]i −[xs′]i ℓi 2! + σ2 nδss′ where [xs]i ([xs′]i) is the i-th component of the feature vector xs (xs′), the hyperparameters σ2 n, σ2 s, ℓ1, . . . , ℓp are, respec- tively, noise and signal variances and length-scales that can be learned using maximum likelihood estimation, and δss′ is a Kronecker delta that is 1 if s = s′ and 0 otherwise. Given a set D ⊂V of observed regions and a column vector zD of corresponding log-measurements, the GP can be used to predict the log-measurements of any set S ⊂V of unobserved regions with the following Gaussian posterior mean vector and covariance matrix: µS|D ≜µS + ΣSDΣ−1 DD(zD −µD) (1) ΣSS|D ≜ΣSS −ΣSDΣ−1 DDΣDS (2) where µS(µD) is a column vector with mean components µs for all s ∈S(s ∈D), ΣSD(ΣDD) is a covariance matrix with covariance components σss′ for all s ∈S, s′ ∈D(s, s′ ∈D), and ΣSD is the transpose of ΣDS. B. Log-Gaussian Process (ℓGP) Demand measurements may not be observed in some re- gions because vacant MoD vehicles did not cruise into them. Since our ultimate interest is to predict them in the original scale, GP’s predicted log-measurements of these unobserved regions must be transformed back unbiasedly. To achieve this, we utilize a widely-used variant of GP in geostatistics called the ℓGP that can model the demand pattern in the original scale. Let {Ys}s∈V denote a ℓGP: If Zs ≜log Ys, then {Zs}s∈V is a GP. So, Ys = exp{Zs} denotes the original random demand measurement of unobserved region s and is predicted using the log-Gaussian posterior mean (i.e., best unbiased predictor) bµs|D ≜exp(µs|D + Σss|D/2) (3) where µs|D and Σss|D are simply the Gaussian posterior mean (1) and variance (2) of GP, respectively. The uncertainty of predicting the measurements of any set S ⊂V of unobserved regions can be quantified by the following log-Gaussian poste- rior joint entropy, which will be exploited by our DAS strategy (Section IV): H[YS|YD] ≜1 2 log(2πe)|S| ΣSS|D + µS|D · 1 (4) where µS|D and ΣSS|D are the Gaussian posterior mean vector (1) and covariance matrix (2) of GP, respectively. III. DECENTRALIZED DEMAND DATA FUSION The demand data are gathered in a distributed manner by the vacant MoD vehicles cruising the service area and have to be assimilated in order to predict the mobility demand pattern. A straightforward approach to data fusion is to fully communicate all the data to every vehicle, each of which then performs the same exact ℓGP prediction (3) separately. This approach, which we call full Gaussian process (FGP) [14, 15], unfortunately cannot scale well and be performed in real time due to its cubic time complexity in the size of the data. Alternatively, the work of [6] has recently proposed a Gaussian process-based decentralized data fusion (GP-DDF) approach to efficient and scalable approximate GP and ℓGP prediction. Their key idea is as follows: Each vehicle sum- marizes all its local data, based on a common prior support set U ⊂V , into a local summary (Definition 1) to be exchanged with every other vehicle. Then, it assimilates the local summaries received from the other vehicles into a global summary (Definition 2), which is then exploited for predicting the demands of unobserved regions. Though GP-DDF scales very well with large data, it can predict poorly due to (a) loss of information caused by summarizing the measurements and correlation structure of the original data; and (b) sparse coverage of the hotspots (i.e., with higher spatiotemporal variability) by the support set. We propose a novel decentralized data fusion algorithm called GP-DDF+ that combines the best of both worlds, that is, the predictive power of FGP and efficiency of GP-DDF. GP- DDF+ is based on the intuition that a vehicle can exploit its local data to improve the demand predictions for unobserved regions “close” to its data (in the correlation sense). At the same time, GP-DDF+ can preserve the efficiency of GP-DDF by exploiting its idea of summarizing information, specifically, into the local and global summaries, as reproduced below: Definition 1 (Local Summary): Given a common support set U ⊂V known to all K vehicles, a set Dk ⊂V of observed regions and a column vector zDk of corresponding demand measurements local to vehicle k, its local summary is defined as a tuple ( ˙zk U, ˙Σk UU) where ˙zk B ≜ΣBDkΣ−1 DkDk|U(zDk −µDk) (5) ˙Σk BB′ ≜ΣBDkΣ−1 DkDk|UΣDkB′ (6) such that ΣDkDk|U is defined in a similar manner as (2). Definition 2 (Global Summary): Given a common support set U ⊂V known to all K vehicles and the local summary ( ˙zk U, ˙Σk UU) of every vehicle k = 1, . . . , K, the global summary is defined as a tuple (¨zU, ¨ΣUU) where ¨zU ≜ K X k=1 ˙zk U (7) ¨ΣUU ≜ΣUU + K X k=1 ˙Σk UU . (8) Using GP-DDF [6], each vehicle exploits the global summary to compute a globally consistent predictive Gaussian distribu- tion of the log-measurements of any set of unobserved regions. The resulting predictive Gaussian mean and variance can then be plugged into (3) to obtain the log-Gaussian posterior mean for predicting the demand of any unobserved region in the original scale. To improve the predictive power of GP-DDF, we develop the following novel GP-DDF+ algorithm that is further augmented by local information. Definition 3 (GP-DDF+ k ): Given a common support set U ⊂V known to all K vehicles, the global summary (¨zU, ¨ΣUU), the local summary ( ˙zk U, ˙Σk UU), a set Dk ⊂V of observed regions and a column vector zDk of corresponding measurements local to vehicle k, its GP-DDF+ k algorithm computes a predictive Gaussian distribution N(µk S, Σ k SS) of the demand measurements of any set S ⊂V of unobserved regions where µk S ≜ µk s  s∈S and Σ k SS ≜ σk ss′  s,s′∈S such that µk s ≜µs +  γk sU ¨Σ−1 UU ¨zU −ΣsUΣ−1 UU ˙zk U  + ˙zk s (9) σk ss′ ≜σss′ −  γk sUΣ−1 UUΣUs′ −ΣsUΣ−1 UU ˙Σk Us′ −γk sU ¨Σ−1 UUγk Us′  −˙Σk ss′ (10) and γk sU ≜ΣsU + ΣsUΣ−1 UU ˙Σk UU −˙Σk sU . (11) Remark 1. Both the predictive Gaussian mean µk s (9) and co- variance σk ss′(10) of GP-DDF+ k exploit summary information (i.e., bracketed term) derived from global and local summaries and local information (i.e., last term) derived from local data. Remark 2. The predictive Gaussian mean µk s (9) and variance σk ss (10) can be plugged into (3) to obtain the log-Gaussian posterior mean for predicting the demand of any unobserved region in the original scale. Remark 3. Since different vehicles exploit different local data, their GP-DDF+ k algorithms provide inconsistent predictions of the mobility demand pattern. It is often desirable to achieve a globally consistent demand prediction among all vehicles. To do this, each unobserved region is simply assigned to the vehicle that predicts its demand best, which can be performed in a decentralized way: Definition 4 (Assignment Function): An assignment func- tion τ : V 7→{1 . . . K} is defined as τ(s) ≜arg min k∈{1...K} σk ss (12) for all s ∈S where the predictive variance σk ss is defined in (10). From now on, let τs ≜τ(s) for notational simplicity. Using the assignment function τ, each vehicle can now com- pute a globally consistent predictive Gaussian distribution, as detailed in Theorem 1A below: Theorem 1 (GP-DDF+): Let a common support set U ⊂ V and a common assignment function τ be known to all K vehicles. A. The GP-DDF+ algorithm of each vehicle computes a globally consistent predictive Gaussian distribution N(µS, ΣSS) of the demand measurements of any set S ⊂V of unobserved regions where µS ≜(µτs s )s∈S (9) and ΣSS ≜(σss′)s,s′∈S such that σss′ ≜ ( στs ss′ if τs = τs′, Σss′|U + γτs sU ¨Σ−1 UUγτs′ Us′ otherwise , (13) and γτs′ Us′ is the transpose of γτs′ s′U. B. Let N(µPIC S|D, ΣPIC SS|D) be the predictive Gaussian distribution computed by the centralized sparse par- tially independent conditional (PIC) approximation of GP model [25] where µPIC S|D ≜  µPIC s|D  s∈S and ΣPIC SS|D ≜  σPIC ss′|D  s,s′∈S such that µ PIC s|D ≜µs + eΓsD (ΓDD + Λ)−1 (zD −µD) (14) σ PIC ss′|D ≜σss′ −eΓsD (ΓDD + Λ)−1 eΓDs′ (15) and eΓDs′ is the transpose of eΓs′D such that ΓBB′ ≜ΣBUΣ−1 UUΣUB′ (16) eΓsD ≜(eΓs¯s)¯s∈D (17) eΓs¯s ≜  σs¯s if τs = τ¯s, Γs¯s otherwise , (18) and Λ is a block-diagonal matrix constructed from the K diagonal blocks of ΣDD|U, each of which is a matrix ΣDkDk|U for k = 1, . . . , K where D = SK k=1 Dk, and let τ¯s ≜k for all ¯s ∈Dk. Then, µs = µPIC s|D and σss′ = σPIC ss′|D for all s, s′ ∈S. The proof of Theorem 1B is given in Appendix A. Remark 1. In Theorem 1A, if τs = τs′ = k, then vehicle k can compute µτs s (9) and σss′ (10) locally and send them to the other vehicles that request them. Otherwise, τs ̸= τs′ and vehicle k has to request |U|-sized vectors γτs sU and γτs′ s′U from the respective vehicles τs and τs′ to compute σss′ (13). Remark 2. The equivalence result of Theorem 1B implies that the computational load of the centralized PIC approxi- mation of GP can be distributed among K vehicles, hence improving the time efficiency of demand prediction. Sup- posing |S| ≤|U| and |S| ≤|D|/K for simplicity, the O |D|((|D|/K)2 + |U|2)  time incurred by PIC can be re- duced to O (|D|/K)3 + |U|3 + |U|2K  time of running GP- DDF+ on each of the K vehicles. Hence, GP-DDF+ scales better with increasing size |D| of data. Remark 3. The equivalence result also sheds some light on an important property of GP-DDF+ based on the struc- ture of PIC: It is assumed that ZD1 S S1, ..., ZDK S SK are conditionally independent given the support set U. As compared to GP-DDF that assumes conditional independence of ZD1, . . . , ZDK, ZS1, . . . , ZSK, GP-DDF+ can predict ZS better since it imposes a weaker conditional independence as- sumption. Experimental results on real-world mobility demand data (Section VI) also show that GP-DDF+ achieves predictive accuracy comparable to FGP and significantly better than GP- DDF, thus justifying the practicality of such an assumption for predicting a mobility demand pattern. IV. DECENTRALIZED ACTIVE DEMAND SENSING Suppose that there are K vacant MoD vehicles in the fleet actively cruising the service area and each vehicle k ∈ {1 . . . K} has observed a set Dk ⊂V of regions. In the active demand sensing problem, all vehicles have to jointly select the most informative walks w∗ 1, . . . w∗ K of length H each along which demand data will be sampled: (w∗ 1, . . . , w∗ K) ≜arg max (w1,...,wK) H h YSK k=1 Swk YSK k=1 Dk i (19) where Swk denotes the set of unobserved regions to be visited by the walk wk. To ease notation, let w ≜(w1 . . . wK) and Sw = SK k=1 Swk (similarly, for w∗and Sw∗). Then, each vehicle k executes its walk w∗ k while observing the demands of regions Sw∗ k, and updates its location and stored data. To derive the most informative joint walk w∗, the posterior entropy (19) of every possible joint walk w has to be evaluated. Such a centralized strategy cannot be performed in real time due to the following two issues: (a) It relies on all the demand data that are gathered in a distributed manner by the vehicles, thus incurring huge time and communication overheads with large data, and (b) it involves evaluating a prohibitively large number of joint walks (i.e., exponential in the fleet size). The first issue can be alleviated by approximating the log- Gaussian posterior joint entropy using the decentralized GP- DDF+ algorithm (Theorem 1A), thus distributing its com- putational load among all vehicles. Then, the active demand sensing problem (19) is approximated by w∗= arg max w H [YSw] (20) H [YSw] ≜1 2 log(2πe)|Sw| ΣSwSw + µSw · 1 . (21) To obtain H [YSw] (21), ΣSwSw|D and µSw|D in H[YSw|YD] ((4) & (19)) are replaced by ΣSwSw and µSw defined in Theorem 1A, respectively. To address the second issue, a partially decentralized active sensing strategy proposed by [6] partitions the vehicles into several small groups such that each group of vehicles selects its joint walk independently. This partitioning heuristic performs poorly when the largest group formed still contains many vehicles. In our work, this is indeed the case because many vehicles tend to cluster within hotspots, as explained later. To scale well in the fleet size, we therefore adopt a fully decentralized active sensing (DAS) strategy by assuming that the joint walk w∗ 1 . . . w∗ K is derived by selecting the locally optimal walk of each vehicle k: w∗ k = arg max wk H h YSwk i (22) where H h YSwk i is defined in the same way as (21). Then, each vehicle can select its locally optimal walk independently of the other vehicles, thus significantly reducing the search space of joint walks. A consequence of such an assumption is that, without coordinating their walks, the vehicles may select suboptimal joint walks (e.g., two vehicles’ locally optimal walks are highly correlated). In practice, this assumption becomes less restrictive when the size |D| of data increases to potentially reduce the degree of violation of conditional independence of YSw1 , . . . , YSwK . More importantly, it can be observed from (21) and (22) that the cruising behavior of DAS trades off between exploring sparsely sampled regions with high predictive uncertainty (i.e., by maximizing the log-determinant of Gaussian poste- rior covariance matrix ΣSwk Swk term) and hotspots (i.e., by maximizing the Gaussian posterior mean vector µSwk term). As a result, it redistributes vacant MoD vehicles to regions with high likelihood of picking up users. Hence, besides gathering the most informative data for predicting the mobility demand pattern, DAS is able to achieve a dual effect of fleet rebalancing to service mobility demands. V. TIME AND COMMUNICATION ANALYSIS In this section, we analyze the time and communication overheads of our proposed GP-DDF+ coupled with DAS algorithm (Algo. 1) and compare them with that of both FGP (Section II) and GP-DDF [6] coupled with DAS algorithms (Section IV). Algorithm 1: GP-DDF++DAS(U, K, H, k, Dk, zDk) while true do /* Data fusion (Section III) */ Construct local summary by (5) & (6) Exchange local summary with every vehicle i ̸= k Construct global summary by (7) & (8) Construct assignment function by (12) Predict demand measurements of unobserved regions by (9) & (13) /* Active Sensing (Section IV) */ Compute local maximum-entropy walk w∗ k by (22) Execute walk w∗ k and observe its demand measurements Yw∗ k Update local information Dk and yDk A. Time Complexity Firstly, each vehicle k has to evaluate ΣDkDk|U in O |U|3 + |U|(|D|/K)2 time and invert it in O (|D|/K)3 time. After that, the data fusion component constructs the local summary in O |U|2|D|/K + |U|(|D|/K)2 time by (5) and (6), and subsequently the global summary in O |U|2K  time by (7) and (8). To construct the assignment function for any unobserved set S ⊂V , vehicle k first computes |S| number of γk sU for all unobserved regions s ∈S in O |S||U|2 + |S|(|D|/K)2 time by (11). Then, after inverting ¨ΣUU in O(|U|3), the predictive means and variances for all s ∈S are computed in O |S||U|2 + |S|(|D|/K)2 time by (9) and (13), respectively. Let ∆≜δH denote the number of possible walks of length H where δ is the maximum out-degree of graph G. In the active sensing component, to obtain the locally optimal walk, the log- Gaussian posterior entropies (22) of all possible walks are derived from (9) and (13), respectively, in O ∆H|U|2 and O ∆(H|U|)2 time. We assume |S| ≤ δ∆where S denotes the set S wk Swk of regions covered by any vehicle k’s all possible walks of length H. Then, the time complexity for our GP-DDF+ coupled with DAS algorithm is O (|D|/K)3+|U|3+|U|2K+∆(H3+(H|U|)2+(|D|/K)2)  . In contrast, the time incurred by FGP and GP-DDF coupled with DAS algorithms are, respectively, O |D|3 + ∆(H3 + (H|D|)2)  and O (|D|/K)3 + |U|3 + |U|2K + ∆(H3 + (H|U|)2)  . It can be observed that our GP-DDF+ coupled with DAS algorithm can scale better with large size |D| of data and fleet size K than FGP coupled with DAS algorithm, and its increased computational load, as compared to GP-DDF coupled with DAS algorithm, is well distributed among K vehicles. B. Communication complexity In each iteration, each vehicle of the system running our GP-DDF+ coupled with DAS algorithm has to broadcast a O(|U|2)-sized local summary for constructing the global summary, exchange O(∆) scalar values for constructing the assignment function, and request O(∆) number of O(|U|)- sized γk sU components for evaluating the entropies of all possible local walks. In contrast, FGP coupled with DAS algo- rithm needs to broadcast O(|D|/K)-sized message comprising all its local data to handle communication failure, and GP- DDF coupled with DAS algorithm only needs to broadcast a O(|U|2)-sized local summary. VI. EXPERIMENTS AND DISCUSSION This section evaluates the performance of our proposed algorithm using a real-world taxi trajectory dataset taken from the central business district of Singapore between 9:30 p.m. and 10 p.m. on August 2, 2010. The service area is gridded into 50 × 100 regions such that 2506 regions are included into the dataset as the remaining regions contain no road segment for cruising vehicles to access. The maximum out- degree δ of graph G over these regions is 8. The feature vector of each region is specified by its corresponding location. In any region, the demand (supply) measurement is obtained by counting the number of pickups (taxis cruising by) from all historic taxi trajectories generated by a major taxi company in a 30-minute time slot. After processing the taxi trajectories, the historic demand and supply distributions are obtained, as shown in Fig. 1. Then, a number C of users are randomly 10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 35 40 45 50 10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 35 40 45 50 (a) Demand (b) Supply Fig. 1. Historic demand and supply distributions. distributed over the service area with their locations drawn from the demand distribution (Fig. 1a). Similarly, a fleet of K vacant MoD vehicles are initialized at locations drawn from the supply distribution (Fig. 1b). In our simulation, when a vehicle enters a region with users, it picks up one of them randomly. Then, the MoD system removes this vehicle from the fleet of vacant cruising vehicles and introduce a new vacant vehicle drawn from the supply distribution. Similarly, a new user appears at a random location drawn from the demand distribution. The MoD system operates for L time steps and each vehicle plans a walk of length 4 at each time step, with all vehicles running a data fusion algorithm coupled with our DAS strategy. We will compare the performance of our GP-DDF+ algorithm with that of FGP and GP-DDF algorithms when coupled with our DAS strategy. The experiments are conducted on a Linux system with Intel® Xeon® CPU E5520 at 2.27 GHz. A. Performance Metrics The tested algorithms are evaluated with two sets of per- formance metrics. The performance of sensing and predicting mobility demands is evaluated using (a) root mean square error (RMSE) q |V |−1 P s∈V ys −bµs|D 2 where ys is the demand measurement and D is the set of regions observed by the MoD vehicles, and (b) incurred time of the algorithms. The performance of servicing mobility demands is eval- uated by comparing the Kullback-Leibler divergence (KLD) P s∈V Pc(s) log (Pc(s)/Pd(s)) between the fleet distribution Pc of vacant MoD vehicles controlled by the tested algorithms and historic demand distribution Pd (i.e., lower KLD implies better balance between fleet and demand), average cruising length of MoD vehicles, average waiting time of users, and total number of pickups resulting from the tested algorithms. B. Results and Analysis For notational simplicity, we will use GP-DDF+, FGP, and GP-DDF to represent the algorithms of their corresponding data fusion components coupled with DAS strategy in this subsection. 1) Performance: The MoD system comprises K = 20 vehicles running three tested algorithms for L = 960 time steps in a service area with C = 200 users. All results are obtained by averaging over 40 random instances. The performance of MoD systems in sensing and predicting mobility demands is illustrated in Figs. 2a-2b. Fig. 2a shows that the demand data collected by MoD vehicles using GP- DDF+ can achieve predictive accuracy comparable to that of using FGP and significantly better than that of using GP- DDF. This indicates that exploiting the local data of vehicles for predicting demands of nearby unobserved regions can improve the prediction of the mobility demand pattern. Fig. 2b shows the average incurred time of each vehicle using three algorithms. GP-DDF+ is significantly more time-efficient (i.e., one order of magnitude) than FGP, and only slightly less time- efficient than GP-DDF. This can be explained by the time analysis in Section V. The above results indicate that GP- DDF+ is more practical for real-world deployment due to a better balance between predictive accuracy and time efficiency. The performance of MoD systems in servicing the mobility demands is illustrated in Figs. 2c-2f. Fig. 2c shows that a MoD system using GP-DDF+ can achieve better fleet rebalancing of vehicles to service mobility demands than GP-DDF, but worse rebalancing than FGP. This implies that a better prediction of the underlying mobility demand pattern (Fig. 2a) can lead to better fleet rebalancing. Note that KLD (i.e., imbalance be- tween mobility demand and fleet) increases over time because we assume that when a vehicle picks up a user, its local data is removed from the fleet of cruising vehicles, and a new vehicle is introduced at a random location that may be distant from a demand hotspot, hence worsening the imbalance between demand and fleet. It can also be observed that an algorithm generating a better balance between fleet and demand will also perform better in servicing the mobility demands, that 200 400 600 800 1000 4.5 5 5.5 6 6.5 7 Length of walk RMSE FGP GP−DDF GP−DDF+ 200 400 600 800 1000 0 2 4 6 8 10 12 Length of walk Incurred time (s) FGP GP−DDF GP−DDF+ 200 400 600 800 1000 0.8 1 1.2 1.4 1.6 1.8 2 Length of walk KLD FGP GP−DDF GP−DDF+ 0 20 40 60 80 100 120 FGP GP−DDF GP−DDF+ Avg. length of walks 0 100 200 300 400 500 600 FGP GP−DDF GP−DDF+ Avg. waiting time 0 50 100 150 200 250 300 350 400 FGP GP−DDF GP−DDF+ Total no. of pickups (a) Accuracy (b) Efficiency (c) Balance (d) Vehicles (e) Users (f) Pickups Fig. 2. Performance of MoD systems in sensing, predicting, and servicing mobility demands. is, shorter average cruising trajectories of vehicles (Fig. 2d), shorter average waiting time of users (Fig. 2e), and larger total number of pickups (Fig. 2f). These observations imply that exploiting an active sensing strategy to collect the most informative demand data for predicting the mobility pattern achieves a dual effect of improving performance in servicing the mobility demands since these vehicles have higher chance of picking up users in demand hotspots or sparsely sampled regions (Section IV). 2) Scalability: We vary the number K = 10, 20, 30 of vehicles in the MoD system, and keep the total length of walks of all the vehicles to be the same, that is, these vehicles will walk for L = 960, 480, 320 steps, respectively. All three algorithms are tested in a service area with C = 600 users. All results are obtained by averaging over 40 random instances. From Figs. 3a-3c, it can be observed that all three algorithms can improve their prediction accuracy with an increasing number of vehicles in the MoD system because more vehicles indicate less walks when the total length of walks are the same, thus suffering less from the myopic planning (H = 4) and gathering more informative demand data. Figs. 3d-3f show that, with more MoD vehicles, GP-DDF+ and GP-DDF incur less time, while FGP incurs more time. This is because the computational load in decentralized data fusion algorithms are distributed among all vehicles, thus reducing the incurred time with more vehicles. 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 2000 4000 6000 8000 10000 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 Total length of walks of all vehicles RMSE K=10 K=20 K=30 (a) FGP (b) GP-DDF (c) GP-DDF+ 2000 4000 6000 8000 10000 0 0.5 1 1.5 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 2000 4000 6000 8000 10000 0 0.02 0.04 0.06 0.08 0.1 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 2000 4000 6000 8000 10000 0 0.1 0.2 0.3 0.4 0.5 Total length of walks of all vehicles Incurred time (s) K=10 K=20 K=30 (d) FGP (e) GP-DDF (f) GP-DDF+ Fig. 3. Scalability of MoD systems in sensing and predicting mobility demands. Figs. 4a-4c show that all three algorithms can achieve better balance between mobility demand and fleet with larger number of vehicles. It can also be observed that all three algorithms can improve the performance of servicing the mobility demand with more vehicles, that is, shorter average cruising trajectories of vehicles (Fig. 4d), shorter average waiting time of users (Fig. 4e), and larger total number of pickups (Fig. 4f). This is because MoD vehicles can collect more informative demand data with larger number of vehicles sampling demand hotspots or sparsely sampled regions, which are the regions with higher chance of picking up users than the rest of the service area. 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 2000 4000 6000 8000 10000 0.6 0.8 1 1.2 1.4 1.6 Total length of walks of all vehicles KLD K=10 K=20 K=30 (a) FGP (b) GP-DDF (c) GP-DDF+ 0 5 10 15 20 25 30 35 FGP GP−DDF GP−DDF+ Avg. cruising length K=10 K=20 K=30 0 50 100 150 200 250 FGP GP−DDF GP−DDF+ Avg. waiting time K=10 K=20 K=30 0 100 200 300 400 500 600 FGP GP−DDF GP−DDF+ No. of pickups K=10 K=20 K=30 (d) Vehicles (e) Users (f) Pickups Fig. 4. Scalability of MoD systems in servicing mobility demands. The above results indicate that more vehicles in MoD sys- tem result in better accuracy in predicting the mobility demand pattern, and achieve a dual effect of better performance in servicing mobility demands. VII. CONCLUSION This paper describes a novel GP-based decentralized data fusion (GP-DDF+) and active sensing (DAS) algorithm for real-time, fine-grained mobility demand sensing and predic- tion with a fleet of autonomous robotic vehicles in a MoD system. We have analytically and empirically demonstrated that GP-DDF+ can achieve a better balance between predictive accuracy and time efficiency than the state-of-the-art GP-DDF [6] and FGP [14, 15]. The practical applicability of GP-DDF+ is not restricted to mobility demand prediction; it can be used in other urban and natural environmental sensing applications like monitoring of traffic, ocean and freshwater phenomena [4, 7, 13, 16, 17, 18, 23, 26, 28]. We have also analytically and empirically shown that even though DAS is devised to gather the most informative demand data for predicting the mobility demand pattern, it can achieve a dual effect of fleet rebalancing to service the mobility demands. For our future work, we will relax the assumption of all-to-all communication such that each sensor may only be able to communicate locally with its neighbors and develop a distributed data fusion approach to efficient and scalable approximate GP and ℓGP prediction. ACKNOWLEDGMENTS This work was supported by Singapore-MIT Alliance Re- search & Technology Subaward Agreements No. 28 R-252- 000-502-592 & No. 33 R-252-000-509-592. REFERENCES [1] GM Shows Chevrolet EN-V 2.0 Mobility Concept Vehicle. General Motors Co. (http://media.gm.com/ media/us/en/gm/news.detail.html/content/Pages/news/us/ en/2012/Apr/0423 EN-V 2 Rendering.html), 2012. [2] Hong Kong in Figures. Census and Statistics De- partment, Hong Kong Special Administrative Region (http://www.censtatd.gov.hk); Singapore Land Transport: Statistics in Brief. Land Transport Authority of Singapore (http://www.lta.gov.sg), 2012. [3] G. Berbeglia, J.-F. Cordeau, and G. Laporte. Dynamic pickup and delivery problems. Eur. J. Oper. Res., 202 (1):8–15, 2010. [4] N. Cao, K. H. Low, and J. M. Dolan. Multi-robot infor- mative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proc. AAMAS, pages 7–14, 2013. [5] H.-W. Chang, Y.-C. Tai, and J. Y.-J. Hsu. Context- aware taxi demand hotspots prediction. Int. J. Business Intelligence and Data Mining, 5(1):3–18, 2010. [6] J. Chen, K. H. Low, C. K.-Y. Tan, A. Oran, P. Jaillet, J. M. Dolan, and G. S. Sukhatme. Decentralized data fusion and active sensing with mobile sensors for mod- eling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pages 163–173, 2012. [7] J. M. Dolan, G. Podnar, S. Stancliff, K. H. Low, A. Elfes, J. Higinbotham, J. C. Hosler, T. A. Moisan, and J. Moisan. Cooperative aquatic sensing using the telesupervised adaptive ocean sensor fleet. In Proc. SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, 2009. [8] Y. Ge, H. Xiong, A. Tuzhilin, K. Xiao, M. Gruteser, and M. Pazzani. An energy-efficient mobile recommender system. In Proc. KDD, pages 899–908, 2010. [9] M. E. Hohn. Geostatistics and Petroleum Geology. Springer, 2nd edition, 1998. [10] M. Jelasity, A. Montresor, and O. Babaoglu. Gossip- based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(3):219–252, 2005. [11] A. Krause, A. Singh, and C. Guestrin. Near-optimal sen- sor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res., 9:235–284, 2008. [12] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang. Prediction of urban human mobility using large-scale taxi traces and its applications. Front. Comput. Sci., 6(1):111–121, 2012. [13] K. H. Low, G. J. Gordon, J. M. Dolan, and P. Khosla. Adaptive sampling for multi-robot wide-area exploration. In Proc. IEEE ICRA, pages 755–760, 2007. [14] K. H. Low, J. M. Dolan, and P. Khosla. Adaptive multi-robot wide-area exploration and mapping. In Proc. AAMAS, pages 23–30, 2008. [15] K. H. Low, J. M. Dolan, and P. Khosla. Information- theoretic approach to efficient adaptive path planning for mobile robotic environmental sensing. In Proc. ICAPS, pages 233–240, 2009. [16] K. H. Low, G. Podnar, S. Stancliff, J. M. Dolan, and A. Elfes. Robot boats as a mobile aquatic sensor network. In Proc. IPSN-09 Workshop on Sensor Networks for Earth and Space Science Applications, 2009. [17] K. H. Low, J. M. Dolan, and P. Khosla. Active Markov information-theoretic path planning for robotic environ- mental sensing. In Proc. AAMAS, pages 753–760, 2011. [18] K. H. Low, J. Chen, J. M. Dolan, S. Chien, and D. R. Thompson. Decentralized active robotic exploration and mapping for probabilistic field classification in environ- mental sensing. In Proc. AAMAS, pages 105–112, 2012. [19] W. J. Mitchell. Mobility on demand: Future of transporta- tion in cities. Technical Report, MIT Media Laboratory, Cambridge, MA, 2008. [20] W. J. Mitchell, C. E. Borroni-Bird, and L. D. Burns. Reinventing the Automobile: Personal Urban Mobility for the 21st Century. MIT Press, Cambridge, MA, 2010. [21] M. Pavone, S. L. Smith, E. Frazzoli, and D. Rus. Robotic load balancing for mobility-on-demand systems. Int. J. Robot. Res., 31(7):839–854, 2012. [22] S. Peeta and A. K. Ziliaskopoulos. Foundations of dynamic traffic assignment: The past, the present and the future. Networks and Spatial Economics, 1:233–265, 2001. [23] G. Podnar, J. M. Dolan, K. H. Low, and A. Elfes. Telesupervised remote surface water quality sensing. In Proc. IEEE Aerospace Conference, 2010. [24] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [25] E. Snelson. Local and global sparse Gaussian process approximations. In Proc. AISTATS, 2007. [26] D. R. Thompson, N. Cabrol, M. Furlong, C. Hardgrove, K. H. Low, J. Moersch, and D. Wettergreen. Adaptive sampling of time series for remote exploration. In Proc. IEEE ICRA, 2013. [27] R. Webster and M. Oliver. Geostatistics for Environmen- tal Scientists. John Wiley & Sons, Inc., NY, 2007. [28] J. Yu, K. H. Low, A. Oran, and P. Jaillet. Hierarchical Bayesian nonparametric approach to modeling and learn- ing the wisdom of crowds of urban traffic route planning agents. In Proc. IAT, pages 478–485, 2012. [29] N. J. Yuan, Y. Zheng, L. Zhang, and X. Xie. T-Finder: A recommender system for finding passengers and vacant taxis. IEEE Trans. Knowl. Data Eng., 2012. APPENDIX A. Proof of Theorem 1B (ΓDD + Λ)−1 = ΣDUΣ−1 UUΣUD + Λ −1 = Λ−1 −Λ−1ΣDU ΣUU + ΣUDΛ−1ΣDU −1 ΣUDΛ−1 = Λ−1 −Λ−1ΣDU ¨Σ−1 UUΣUDΛ−1 . (23) The first equality is by the definition of ΓDD (16). The second equality follows from matrix inversion lemma. The last equality is due to ΣUU + ΣUDΛ−1ΣDU = ΣUU + K X k=1 ΣUDkΣ−1 DkDk|UΣDkU = ΣUU + K X k=1 ˙Σk UU = ¨ΣUU . (24) The first equality is by the definition of Λ (Theorem 1B). The second and last equalities are due to (6) and (8), respectively. We will first derive the expressions of four components useful for completing the proof later. eΓsDΛ−1(zD −µD) = X i̸=k ΓsDiΣ−1 DiDi|U(zDi −µDi) + ΣsDkΣ−1 DkDk|U(zDk −µDk) = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|U(zDi −µDi)  + ˙zk s = ΣsUΣ−1 UU X i̸=k ˙zi U + ˙zk s = ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s . (25) The first two equalities expand the first component using the definition of Λ (Theorem 1B), (5), (16), (17), and (18). The last two equalities exploit (5) and (7). eΓsDΛ−1ΣDU = X i̸=k ΓsDiΣ−1 DiDi|UΣDiU + ΣsDkΣ−1 DkDk|UΣDkU = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|UΣDiU  + ΣsDkΣ−1 DkDk|UΣDkU = ΣsUΣ−1 UU X i̸=k ˙Σi UU + ˙Σk sU = ΣsUΣ−1 UU  ¨ΣUU −˙Σk UU −ΣUU  + ˙Σk sU = ΣsUΣ−1 UU ¨ΣUU −γk sU . (26) The first two equalities expand the second component by the same trick as that in (25). The third and fourth equalities exploit (6) and (8), respectively. The last equality is due to (11). Let αsU ≜ΣsUΣ−1 UU and its transpose is αUs. By using similar tricks in (25) and (26), we can derive the expressions of the remaining two components. If τs = τs′ = k, then eΓsDΛ−1eΓDs′ = X i̸=k ΓsDiΣ−1 DiDi|UΓDis′ + ΣsDkΣ−1 DkDk|UΣDks′ = ΣsUΣ−1 UU X i̸=k  ΣUDiΣ−1 DiDi|UΣDiU  Σ−1 UUΣUs′ + ˙Σk ss′ = αsU X i̸=k  ˙Σi UU  αUs′ + ˙Σk ss′ = αsU  ¨ΣUU −˙Σk UU −ΣUU  αUs′ + ˙Σk ss′ = αsU ¨ΣUUαUs′ −αsUγk Us′ −αsU ˙Σk Us′ + ˙Σk ss′ . (27) If τs = i and τs′ = j such that i ̸= j, then eΓsDΛ−1eΓDs′ = X k̸=i,j ΓsDkΣ−1 DkDk|UΓDks′ + ΣsDiΣ−1 DiDi|UΓDis′ + ΓsDjΣ−1 DjDj|UΣDjs′ = ΣsUΣ−1 UU X k̸=i,j  ΣUDkΣ−1 DkDk|UΣDkU  Σ−1 UUΣUs′ + ΣsDiΣ−1 DiDi|UΣDiUΣ−1 UUΣUs′ + ΣsUΣ−1 UUΣUDjΣ−1 DjDj|UΣDjs′ = αsU  ¨ΣUU −˙Σi UU −˙Σj UU −ΣUU  αUs′ + ˙Σi sUαUs′ + αsU ˙Σj Us′ = αsU  ¨ΣUU + ΣUU  αUs′ −αsU  ˙Σi UU + ΣUU  αUs′ −αsU  ˙Σj UU + ΣUU  αUs′ + ˙Σi sUαUs′ + αsU ˙Σj Us′ = αsU  ¨ΣUU + ΣUU  αUs′ −  αsU ˙Σi UU + αsUΣUU −˙Σi sU  αUs′ −αsU  ˙Σj UUαUs′ + ΣUUαUs′ −˙Σj Us′  = αsU  ¨ΣUU + ΣUU  αUs′ −γi sUαUs′ −αsUγj Us′ = αsU ¨ΣUUαUs′ + ΣsUΣ−1 UUΣUs′ −γi sUαUs′ −αsUγj Us′ (28) If τs = k, then µPIC s|D = µs + eΓsD (ΓDD + Λ)−1 (zD −µD) = µs + eΓsDΛ−1(zD −µD) −eΓsDΛ−1ΣDU ¨Σ−1 UUΣUDΛ−1(zD −µD) = µs + eΓsDΛ−1(zD −µD) −eΓsDΛ−1ΣDU ¨Σ−1 UU ¨zU = µs + ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s −eΓsDΛ−1ΣDU ¨Σ−1 UU ¨zU = µs + ΣsUΣ−1 UU(¨zU −˙zk U) + ˙zk s −  ΣsUΣ−1 UU ¨ΣUU −γk sU  ¨Σ−1 UU ¨zU = µs +  γk sU ¨Σ−1 UU ¨zU −ΣsUΣ−1 UU ˙zk U  + ˙zk s = µk s . The first equality is by definition (14). The second equality is due to (23). The third equality is due to the definition of global summary (7). The fourth and fifth equalities are due to (25) and (26), respectively. If τs = τs′ = k, then σPIC ss′|D = σss′ −eΓsD (ΓDD + Λ)−1 eΓDs′ = σss′ −eΓsDΛ−1eΓDs′ + eΓsDΛ−1ΣDU ¨Σ−1 UUΣUDΛ−1eΓDs′ = σss′ −eΓsDΛ−1eΓDs′ +  αsU ¨ΣUU −γk sU  ¨Σ−1 UU  ¨ΣUUαUs′ −γk Us′  = σss′ −αsU ¨ΣUUαUs′ + αsUγk Us′ + αsU ˙Σk Us′ −˙Σk ss′ + αsU ¨ΣUUαUs′ −αsUγk Us′ −γk sUαUs′ + γk sU ¨Σ−1 UUγk Us′ = σss′ −  γk sUαUs′ −αsU ˙Σk Us′ −γk sU ¨Σ−1 UUγk Us′  −˙Σk ss′ = σk ss′ = σss′ . The first equality is by definition (15). The second equality is due to (23). The third equality is due to (26). The fourth equality is due to (27). The last two equalities are by definition (13). If τs = i and τs′ = j such that i ̸= j, then σPIC ss′|D = σss′ −eΓsDΛ−1eΓDs′ + αsU ¨ΣUUαUs′ −αsUγj Us′ −γi sUαUs′ + γi sU ¨Σ−1 UUγj Us′ = σss′ −  αsU ¨ΣUUαUs′ + ΣsUΣ−1 UUΣUs′ −γi sUαUs′ −αsUγj Us′  + αsU ¨ΣUUαUs′ −αsUγj Us′ −γi sUαUs′ + γi sU ¨Σ−1 UUγj Us′ = σss′ −ΣsUΣ−1 UUΣUs′ + γi sU ¨Σ−1 UUγj Us′ = Σss′|U + γi sU ¨Σ−1 UUγj Us′ = σss′ . The first equality is obtained using a similar trick as when τs = τs′ = k. The second equality is due to (28). The second last equality is by the definition of posterior covariance in GP model (2). The last equality is by definition (13).