arXiv:1306.1591v1 [cs.AI] 7 Jun 2013 1 Autonomous search for a diffusive source in an unknown environment Branko Ristic, Alex Skvortsov, Andrew Walker† Abstract—The paper presents an approach to olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas) in an environment with unknown map of randomly placed and shaped obstacles. The measurements of tracer concentration are sporadic, noisy and without directional information. The search domain is discretised and modelled by a finite two-dimensional lattice. The links is the lattice represent the traversable paths for emitted particles and for the searcher. A missing link in the lattice indicates a blocked paths, due to the walls or obstacles. The searcher must simultaneously estimate the source parameters, the map of the search domain and its own location within the map. The solution is formulated in the sequential Bayesian framework and implemented as a Rao-Blackwellised particle filter with information-driven motion control. The numerical results demonstrate the concept and its performance. Index Terms—Olfactory search, Bayesian inference, mapping and localisation, Rao-Blackwellised particle filter, observer con- trol, information gain. I. INTRODUCTION The search for an emitting source of particles, chemicals, odour, or radiation, based on sporadic clues or intermittent measurements, has attracted a great deal of interest lately. The topic is important for search and rescue operations with the goal to localise dangerous pollutants, such as chemical leaks and radioactive sources. In biology, the search is studied to model animal bahaviour in search for food or mates [1]–[3]. Bio-inspired search for underwater sources of pollution have been studied in [4]–[6]. A robot for gas/odour plume tracking guided by the increase in the concentration gradient has been proposed in [7]. “Infotaxis” [8] is a search strategy based on entropy-reduction maximisation which has been developed in the context of finding a weak source in a turbulent flow (e.g. drug or leak emitting chemicals, for a comprehensive review see [9]). Information-gain driven search for radioactive point sources has been studied in [10]. In all these applications the search domain is either open (without obstacles) or a precise map of the search domain (with obstacles) is available. In this paper we focus on autonomous olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas, heat, moisture) in a domain with randomly placed and shaped obstacles (forbidden areas), whose structure (the map) is unknown. The problem is of importance for example in localisation of dangerous leaks in collapsed buildings, inside tunnels or mines. The searcher senses in a probabilistic manner both the structure of the search domain (e.g. the presence or †The authors are with Defence Science and Technology Organisa- tion, Australia. Corresponding author: B. Ristic, DSTO, 506 Lorimer Street, Melbourne, VIC 3207, Australia; Tel: +61 3 9696 7688; Email: branko.ristic@dsto.defence.gov.au absence of obstacles, walls, blocked passages) and the level of concentration of tracer particles. The objective of the search is to localise and report the coordinates of the source in a shortest possible time. This is not a trivial task for several reasons. First, the emission rate of the source is typically unknown. Furthermore, the measurements of tracer particle concentration are sporadic, noisy and without directional information. Since the structure (map) of the search domain is unknown, the searcher needs to explore the domain and create its map. The searcher motion is fully autonomous: it senses the environment and after processing this uncertain information sequentially makes decisions on where to move next in order to collect new measurements. Its motion control, however, is not fully reliable as it may occasionally fail to execute correctly. The probabilistic model of searcher motion is assumed known. In the paper we restrict to the search in a two-dimensional domain. The coordinates of the searcher initial position, as well as the border of the search area (relative to the initial position) are given as input parameters. In order to fulfil its mission, the searcher has to find the source and report its coordinates relative to its initial position. This in turn requires simultaneous estimation at three levels: (1) estimation of source parameters (its location in 2D and its release rate); (2) estimation of the map of the search area and (3) estimation of the searcher position within the estimated map. Estimation at levels (2) and (3) has been studied extensively in robotics under the term grid-based simultaneous localisation and mapping (SLAM) [11]. The primary mission in all SLAM publications is an accurate mapping of the area. The primary mission of our searcher, however, is to localise the source, while SLAM is only a necessary component of the solution. The only related work which deals with olfactory search in an unknown structured environment is [12]. While this paper presents a plethora of experimental results, the algorithms are based on heuristics. Our approach, however, is theoretically sound in the sense that its mathematical models are precisely defined, estimation is carried out in the sequential Bayesian framework and the searcher motion control is driven by information gain. The search domain is discretised, as for example in [4], and modelled by a finite two-dimensional lattice. With sufficiently fine resolution of the lattice, the emitting source can be considered to be in one of the nodes of the lattice. The links (bonds, edges) of the lattice represent the traversable paths for emitted particles (tracer) and for the searcher. Missing links in the lattice indicate blocked paths due to the walls or obstacles. This is a very general model applicable to searches at various scales, from inside buildings and tunnels, to within cells of living organisms [2]. The percentage of missing links in the 2 lattice is assumed to be above the percolation threshold pc (for the adopted lattice structure pc = 1/2 [13], [14]) so that long-range connectivity is satisfied [13]. Using the absorbing Markov chains technique [15], we can compute exactly the mean concentration level in any node of the lattice, that is at any point of the search domain with obstacles. Since the structure (map) of the search domain is unknown, the searcher must rely on a theoretical model of concentration measurement which is independent of the this map. Such a model is derived in the paper in the analytic form and used in the search. The search itself consist of algorithms for sequential esti- mation and motion control. We adopt the framework of op- timal sequential Bayesian estimation with information-driven motion control. Implementation is carried out using a Rao- Blackwellised particle filter. The paper is organised as follows. Mathematical models of measurements and searcher motion are described in Sec.II. The olfactory search problem is formulated and its concep- tual solution provided in Sec.III. Full technical details of the proposed search algorithm are presented in Sec.IV, with numerical results given in Sec.V. Finally, conclusions of this study are summarised in Sec. VI. II. MODELLING A. Model of environment The concentration of a tracer at any point of the search domain is governed by the diffusive equation, which in the steady state reduces to the Laplace equation [16]: D0 ∆θ = A0 δ(x −X,y −Y ). (1) Here D0 is the diffusion coefficient of tracer in the envi- ronment, ∆is the Laplace operator, θ is the mean (time- averaged) tracer concentration, δ is the Dirac delta function, A0 is the release-rate of the tracer source, and X, Y are the coordinates of the source in a two-dimensional Cartesian coordinate system. For convenience we adopt a circular search domain of radius R0, centred at the origin of the coordi- nate system, that is for every point inside the search area, r = p x2 + y2 ≤R0. Assuming that the tracer source is undetectable outside the search domain, we can impose the absorbing boundary condition θ(r = R0) = 0. The traditional approach to the computation of the tracer concentration θ at every point of the search domain, is via analytical or numerical solution of (1). This, however, is a non-trivial task when the search domain is a structure of complex topology (due to obstacles, compartments walls, random openings, etc). We therefore adopt an alternative approach, where the continuous model of the tracer diffusion process is replaced with a random walk on a square lattice, adopted as a discrete model of the search area. Discretisation is illustrated in Fig.1 for a search area centred at the origin of the coordinate system, with the radius R0 = 9. The length of each link (edge, bond) in the lattice determines the resolution of discretisation, and in this example is adopted as a unit length. The source, assumed to be located at one of the nodes of the lattice, is emitting particles which travel through the lattice according to the −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 x y Figure 1. Search area discretisation: the complete grid, with the length of each link equal 1. The centre of the search area is in (0, 0), its radius is R0 = 9. random walk model [17]. The obstacles in the search domain (the regions through which the tracer cannot pass) are simply modelled as missing links (or clusters of missing links) in the square lattice. Fig.2 shows an example of such a model: this incomplete lattice is obtained by removing fraction p = 0.35 of the links in the complete lattice shown in Fig.1. Note that all nodes in the incomplete lattice (grid) are connected. On average this will be the case if the fraction of missing links in the incomplete grid of Fig.2 is below the percolation threshold pc; above the percolation threshold (p > pc) the lattice becomes fragmented. The framework of percolation theory enables analytical description of statistical properties of such a lattice [13], [14]. −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 x y Figure 2. A model of search area with obstacles: the missing links of the complete graph of Fig.1 represent blocked passages (due to the walls, closed doors, etc) for moving particles. This incomplete grid is obtained by removing fraction p = 0.35 of the links from the complete graph. 3 B. Model of tracer distribution This section explains how to compute the mean concen- tration of tracer particles in each node of the incomplete grid (such as the one shown in Fig.2) which represents a discretised model of the search area with obstacles. For a given incomplete grid, the mean concentration can be computed using the absorbing Markov chain technique [15]. Neglecting the spatial approximation of the search domain (due to discretisation) and under the assumption that the distribution of particles has reached the steady state, the absorbing Markov chain technique provides an exact solution for the quantity of source material at each location. We can regard the random walk of tracer particles through the incomplete grid (e.g. Fig.2) as a Markov chain whose states are the nodes of the grid. The Markov chain is specified by the transition matrix T; each element of this matrix is the probability of transition from state si to state sj (i.e. a particle move from node i to node j): Tij = P{sj|si}. How to construct T given the incomplete grid? First note that we distinguish two types of states in this Markov chain: absorbing states (corresponding to the nodes on the boundary of the grid) and transient states. For an absorbing state si, Tii = 1 and Tij = 0, if j ̸= i. Suppose a transient state si corresponds to node i in the incomplete grid, which has connections (links) with nodes j1, . . . , jm, where for a square grid m ≤4. Then Tij1 = · · · = Tijm = 1/m and Tij = 0 for j /∈{j1, . . . , jm}. Suppose there are r absorbing states and t transient states. If we order the states so that the absorbing states come first (before the transient states), then the transition matrix takes the canonical form: T =  I 0 R Q  , (2) where I is r × r identity matrix, Q is the t × t matrix that describes transitions between transient states, R is a t × r matrix that describes the transitions from transient to absorbing states and 0 is an r×t matrix of zeros. The fundamental matrix of the absorbing Markov chain [15], F = (I −Q)−1, (3) represents the expected number of visits to a transient state sj starting from a transient state si (before being absorbed). This matrix will be used in simulations to compute the mean particle concentration in any node of the incomplete grid. Suppose an emitting source is placed at node i, which is not on the boundary. The source is releasing tracer particles at a constant rate A0. Then the expected concentration of tracer particles in any other node j of the incomplete grid (which is not on the boundary) is given by θj = A0 · Fij. The concentration scales linearly with the release rate A0 as a direct consequence of the linearity of Laplace equation (1). Fig.3 shows the mean concentration of tracer particles for the search area modelled by incomplete grid of Fig.2, with the source placed at (X, Y ) = (0, 7) and with A0 = 12. Notice how the concentration depends on the distance from the source and the structure of the grid, plotted in Fig.2. −10 −5 0 5 −10 −5 0 5 0 5 10 15 20 25 30 Figure 3. Mean concentration of tracer particles for the search area modelled by incomplete graph of Fig.2 with source placed at (X, Y ) = (0, 7) with A0 = 12 (darker cells indicate higher concentration) C. Sensor models and motion model Two types of measurements are collected by the searcher. Sensor 1 measures the concentration of tracer particles as a count of particles received during the sampling time. Assum- ing the so-called ’dilution’ limit (limit of low concentrations) the tracer fluctuations follow the Poisson distribution [8], that is a concentration measurement at node j of the grid is a random sample drawn from n ∼P(n; λ) = λn n! e−λ (4) where λ = θj = A0 ·Fij. The Poisson distribution mimics the intermittency of concentration measurements [8]. The searcher sequentially estimates the source parameters without knowing the map of the search area. Hence the mea- surement model based on the mean concentration λ = A0·Fij cannot be used in estimation (recall that matrix F is formed based on the structure of the incomplete grid). Assuming that the fraction of missing links in the incomplete grid is smaller than the percolation threshold pc, the expected concentration of tracer particles in any node j of the incomplete grid can be computed approximately using the property of conformal invariance of the Laplace equation (see Appendix for details). Suppose the source of release rate A0 is placed at a node of the grid, positioned at coordinates (X, Y ). Then the mean (time and ensemble averaged) concentration at node j, positioned at (xj, yj) can be approximated as: ⟨θ⟩j ≈−A 2 log(R2) (5) where A = A0/fc, (fc is a constant, 0 < fc < 1, which depends on the fraction of missing links in the incomplete grid, see Appendix), and R2 = R2 0 (xj −X)2 + (yj −Y )2 (xjY −yjX)2 + (R2 0 −xjX −yjY )2 . (6) Note that this model is independent of the structure of the incomplete grid. In summary, estimation will be carried out 4 using Sensor 1 measurement model based on (4), where mean λ = ⟨θ⟩j is approximated by (5), (6). The actual concentration measurements will be simulated according to (4), but with λ = θj = A0 · Fij. The searcher moves and explores the search domain in order to find the source. The source parameter estimation is carried out using the map-independent measurement model (5), which does not require discretisation of the search domain on a square lattice (as in Fig.1). Nevertheless, we keep discretisation for the searcher in order to model its motion paths and to facilitate its grid-based SLAM functionality. Thus we assume that the searcher travels within the search area along the paths represented by the links of the incomplete grid as in Fig.2. As it travels, it stops at the nodes along its path to sense the environment, i.e. to collect measurements. Sensor 2 is a simple binary detector of the presence or absence of the links (paths) visible from the node in which the searcher is currently placed. It reports on the presence/absence of the primary and secondary neighbouring links. A link in a grid of Fig.2, is defined by a quadruple (x1, y1, x2, y2), where (x1, y2) and (x2, y2) are the coordinates of the nodes it connects. In order to explain what we mean by primary and secondary links, consider for example the node at location (−3, −4) indicated by ’o’ in Fig.2. The primary observable links from this node are the connecting links towards East, West, up and down from (−3, −4), i.e. ℓ1 = (−3, −4, −2, −4), ℓ2 = (−3, −4, −4, −4), ℓ3 = (−3, −4, −3, −3), and ℓ4 = (−3, −4, −3, −5), i.e. . The sta- tus of link ℓ, m(ℓ), takes values from {0, 1}, where m(ℓ) = 1 means that link ℓexists and m(ℓ) = 0 is the opposite. According to Fig.2, m(ℓ1) = 1, m(ℓ2) = 1, m(ℓ3) = 0, m(ℓ4) = 1. The secondary observable links from the node at (−3, −4) in Fig.2 represent second neighbouring links in direction of East, West, up and down from (−1, −1), that is ℓ5 = (−2, −4, −1, −4), ℓ6 = (−4, −4, −5, −4), ℓ7 = (−3, −3, −3, −2), and ℓ8 = (−3, −5, −3, −6). According to Fig.2, m(ℓ5) = 1, m(ℓ6) = 0, m(ℓ7) = 0, m(ℓ8) = 1. Let an observation (supplied by sensor 2) about the presence or absence of a link ℓ, be a binary value z(ℓ) ∈{0, 1}, where z(ℓ) = 0 means link ℓis absent and z(ℓ) = 1 is the opposite. The performance od sensor 2 can be described by two detection matrices, one for the primary links, the other for secondary links. Each detection matrix Π has a form Π = P(z = 0|m = 0) P(z = 0|m = 1) P(z = 1|m = 0) P(z = 1|m = 1)  . (7) where P(z = 1|m = 1) = pd and P(z = 1|m = 0) = pfa are the probability of correct detection and the probability of false detection pfa, respectively. The columns of matrix Π add up to 1, and hence (7) can be written as: Π = 1 −pfa 1 −pd pfa pd  . (8) Suppose the searcher is in node i at discrete-time k −1. Let the set of admissible controls vectors for the next move be defined as Uk = {·, ↑, →, ↓, ←}, meaning that the searcher can stay where it is, or move one unit length up, right, down or left. After processing measurements from its sensors, the searcher decides to choose control u∗ k ∈Uk and hence to be at time k at node j. This control, however, is executed correctly only with probability 1−pe. Due to control noise or unmodeled exogenous effects [11], with probability pe the searcher will actually execute control u′ k ∈Uk \ {u∗ k}. III. THE PROBLEM AND ITS CONCEPTUAL SOLUTION The searcher has at its disposal the probabilistic models of sensor measurements and dynamic models. Prior knowledge also includes: (1) the coordinates of its initial position; (2) the length of each link in the square lattice; (3) the boundary of the circular search area (defined by its centre and radius). The described prior translates into knowledge of the complete grid such as the one shown in Fig.1. The searcher cannot move outside the complete grid. The objective of the searcher is to estimate in the shortest possible time the coordinates of the emitting source, as well as the partial map describing the path from its starting (entry) point to the estimated location of the source. This partial map is important, for example, in order to guide the rescue team to the source or if the searcher has to retreat to its starting position. A. Sequential Bayesian estimation The described problem can be cast in the sequential Bayesian estimation framework as a nonlinear filtering prob- lem. Let us first define the state vector, which consists of three parts: 1) The coordinates of the searcher position at discrete-time k = 1, 2, . . . are denoted by pk = [xk yk]⊺. 2) The status (presence/absence) of each link in the com- plete grid (such as the one shown in Fig.1). The status of link ℓj, where j = 1, . . . , L, and L is the total number of links in the complete grid, is m(ℓj) = mj ∈{0, 1}. The notation P(mj = 1) refers to the probability that the link is present. The map at time k is fully specified by vector mk = [m1,k, . . . , mL,k]⊺. The time index is introduced because we allow the map of the search area to occasionally change, e.g. an open door can close. The assumption is that the statuses of links are mutually independent, i.e. mj,k is independent from mi,k for i ̸= j. 3) The parameter vector of the source is denoted by s = [X Y A]⊺. The complete state vector is then defined as yk = [p⊺ k m⊺ k s⊺]⊺. Dynamics of the state yk is described by two transitional densities: p(mk|mk−1) specifies the evolution of the map over time, while p(pk|pk−1, uk) characterises the searcher motion model. The observation models of the searcher are specified by two likelihood functions: g1(nk|pk, mk, s) characterises sensor 1, which provides the count of particles nk at pk from the source in state s through the map mk; g2(zk|pk, mk) refers to sensor 2 and describes the observation zk of the status of the links in mk visible from the searcher in location pk. Let us denote observations and controls at time k by a vector 5 ζk = [nk z⊺ k]⊺. Finally, the prior probability density function (PDF) of the state is denoted by p(y0). The goal in the sequential Bayesian framework is to es- timate any subset or property of the sequence of states y0:k := (y0, . . . , yk) given observation sequence ζ1:k := (ζ1, . . . , ζk) and the control sequence u1:k := (u1, . . . , uk), which is completely specified by the joint posterior distribu- tion p(y0:k|ζ1:k, u1:k). This posterior satisfies the following recursion: p(y0:k|ζ1:k, u1:k) ∝ g(ζk|yk)p(yk|yk−1, uk)p(y0:k−1|ζ1:k−1, u1:k−1) (9) where p(yk|yk−1, uk) = p(mk|mk−1) p(pk|pk−1, uk) (10) is the transitional density, and g(ζk|yk) = g1(nk|pk, mk, s) g2(zk|pk, mk) (11) is the measurement likelihood function. In general it is impossible to solve analytically the re- cursive equation (9). Instead we will formulate a numerical approximation using the sequential Monte Carlo method [18]. Before going into details, notice that factorization expressed by (10) and (11) imposes a structure which can be conve- niently represented by a dynamic Bayesian network (DBN) [19] shown in Fig.4. The circles in Fig.4 represent random variables: white circles are hidden variables; gray circles are observed variables. Arrows indicate dependencies. The arrows indicated by dashed lines are explained next. k u k x k z m s n k k k uk-1 k-1 x k-1 z m s n k-1 k-1 k-1 T i m e k+1 uk+1 xk+1 z m s n k+1 k+1 k+1 Figure 4. The dynamic Bayesian network representing the dependency between the random variables which feature in the described inference problem Count measurement nk depends on the map mk, hence the likelihood of count measurement is formulated as g1(nk|pk, mk, s). The searcher, however, does not know the map (it estimates it only partially as it travels through the search area) and hence we have introduced the approximate measurement model expressed by (4)-(6). The searcher will therefore process count observations nk using the likeli- hood function which is independent of mk, and denoted by ˜g1(nk|pk, s), rather than g1(nk|pk, mk, s). We indicate this fact by drawing the arrow from mk to nk in Fig.4 by a dashed line. The computation of the posterior PDF for a structured com- plex system such as the one shown in Fig.4 can be factorised and consequently made computationally and statistically more efficient. Technical details will be given in Sec.IV. B. Information driven motion control After processing the measurements received at time k −1, the searcher needs to select the next control vector uk which will change its position to pk ∼p(pk|pk−1, uk). The problem of selecting uk can be formulated as a partially- observed Markov decision process [20], whose elements are: (1) the set of admissible control vectors Uk; (2) the current information state, expressed by the predicted PDF p(yk|ζ1:k−1, u1:k−1, uk), where uk ∈Uk; (3) the reward function associated with each control uk ∈Uk. In the paper we adopt motion control based on a single step ahead strategy; this myopic approach is suboptimal in the presence of randomly missing links, but is computationally easier to implement and faster to run. The control vector is then selected as: uk = arg max v∈Uk E{D(v, p(yk|ζ1:k−1, u1:k−1, v), ζk(v))} (12) where D(u, p, ζ) is the reward function. Note that the reward depends on future measurement ζk = [nk z⊺ k]⊺, which would be acquired after control u ∈Uk had been applied. Since the decision has to be made before the actual control is applied, the expectation E is taken in (12) with respect to the prior measurements PDF. Considering that the primary mission of the search is source localisation (map estimation is of secondary impor- tance), the reward function at time k is adopted as the information gain between: (1) the predicted PDF over the state subspace (s, pk) and (2) the updated PDF over (s, pk), using the count measurement nk. The two distributions are denoted π0(s, pk|uk) = p(s|n1:k−1, u1:k)p(pk|pk−1, uk) and π1(s, pk|nk, uk) = ξ ˜g1(nk|pk, s) π0(s, pk|uk), respectively, where ξ is a normalisation constant. The information gain be- tween the two distributions is measured using the Bhattacharya distance [21]. The reward function is thus adopted as: D(uk) = −2 log Z p π1(s, pk|nk, uk) · π0(s, pk|uk) ds dpk (13) where we dropped unnecessary arguments in notation for D. IV. THE SEARCH ALGORITHM The proposed search algorithm, formulated as a DBN with observer control, can be implemented efficiently as a Rao- Blackwellised particle filter (RBPF) [22] with sensor control. The idea of the RBPF is as follows. Suppose it is possible to divide the components of the hidden state vector yk into two 6 groups, αk and βk, such that the following two conditions are satisfied: C-1: p(yk|yk−1, uk) = p(αk|βk−1:k, αk−1)·p(βk|βk−1, uk) C-2: the conditional posterior distribution p(αk| β0:k, ζ1:k, u1:k) is analytically tractable. Then we need only to estimate the posterior p(β0:k|ζ1:k, u1:k), meaning that we reduced the dimension of the space in which Monte Carlo estimation needs to be carried out, from dim(yk) to dim(βk). As argued in [22], this potentially improves computational and statistical efficiency of the particle filter. In the described DBN, shown in Fig,4, in order to satisfy conditions C-1 and C-2, the state vector yk can be partitioned as follows: αk = [m⊺ k A]⊺ (14) βk = [p⊺ k X Y ]⊺. (15) We are interested only in the filtering posterior density, which can now be factorised as follows: p(αk, β0:k|ζ1:k, u1:k) = p(αk|β0:k, ζ1:k, u1:k) · p(β0:k|ζ1:k, u1:k) (16) The PDF p(β0:k|ζ1:k, u1:k) is approximated by a random sample {β(i) 0:k}N i=1. Subsequently one can compute analytically (for each sample β(i) 0:k): p(αk|β(i) 0:k, n1:k, z1:k, u1:k) = p(mk|z1:k, β(i) 0:k) · p(A|n1:k, β(i) 0:k), (17) where • p(mk|z1:k, β(i) 0:k) = qk is a vector of probabilities of existence for each link in the random grid and • p(A|n1:k, β(i) 0:k) is approximated by a Gamma distribution with shape parameter ηk and scale parameter θk, i.e. G (A; ηk, θk). Hence each particle corresponds to a set:  β(i) 1:k, qk, ηk, θk  (18) where qk, ηk, θk are the sufficient statistics of p(αk|β(i) 0:k, n1:k, z1:k, u1:k). Keep in mind that qk, ηk, θk depend on a particular sequence (particle) β(i) 0:k. A. Recursive formulae for sufficient statistics Let us first discuss the analytic recursive formula for the computation of qk, following the ideas of the grid-based SLAM [11]. Note that qk =p(mk|z1:k, β(i) 0:k) = g2(zk|mk, β(i) k ) p(mk|z1:k−1, β(i) 0:k−1) P mk g2(zk|mk, β(i) k ) p(mk|z1:k−1, β(i) 0:k−1) (19) where p(mk|z1:k−1, β(i) 0:k−1) = X mk−1 p(mk|mk−1) p(mk−1|z1:k−1, β(i) 0:k−1) (20) The update of probability vector qk is then carried out as follows. Recall from (15) that particle β(i) k specifies the location of the searcher at time k, p(i) k = [x(i) k y(i) k ]⊺. Each component of vector zk is then an observation of existence of a primary or a secondary link from location p(i) k . Let qj,k−1 be a component of vector qk−1, denoting the posterior probability that link ℓj exists at time k −1, i.e. qj,k−1 = p(mj,k−1|z1:k−1, β(i) 0:k−1). Recall also that since the link sta- tuses are assumed independent, then qk−1 = QL j=1 qj,k−1. According to (20), link j existence probability is predicted as: qj,k|k−1 = p(mj,k = 1|mj,k−1 = 0)(1 −qj,k−1)+ p(mj,k = 1|mj.k−1 = 1)qj,k−1 (21) Let z be a component of vector zk which refers to link ℓj, according to the current position of the searcher, p(i) k . Then based on (19) we update the link j existence probability as: qj,k =        pd qj,k|k−1 pd qj,k|k−1+pfa(1−qj,k|k−1) if z = 1 (1−pd) qj,k|k−1 (1−pd) qj,k|k−1+(1−pfa)(1−qj,k|k−1) if z = 0 (22) where pd and pfa, introduced in (8), are the elements of the appropriate detection Π matrix (primary or secondary) of (8). Equations (19)-(22) can be summarised as: qk = ψ(qk−1, β(i) k , zk) (23) Let us describe next the analytic recursion for the update of the parameters ηk and θk of (18). At time k −1, the posterior of emission rate A is modeled by a gamma distribution: A|n1:k−1, β(i) 0:k−1 ∼G (A; ηk−1, θk−1)) . (24) Sensor 1 provides at time k a count measurement nk, which plays the key role in the update of parameters ηk−1 and θk−1. Recall that the likelihood function of this measurement, ˜g1(nk|β(i) k , A) is a Poisson distribution with parameter (mean) λ(i) k−1, rather than A. Fortunately, λ(i) k−1 is linearly related to emission rate A, that is λ(i) k = A · c(β(i) k ) where the constant c(β(i) k ) is always positive and given by c(i) k = −1 2  2 log R0+ log (x(i) k −X(i))2 + (y(i) k −Y (i))2 (x(i) k Y (i) −y(i) k X(i))2 + (R2 0 −x(i) k X(i) −y(i) k Y (i))2  . (25) with X(i) and Y (i), according to (15), being the components of particle β(i) k . In the proposed algorithm for the update of parameters ηk−1 and θk−1 we use the following two properties of Gamma distribution: 7 1) Scaling property [23]: if X ∼G(η, θ) then for any c > 0, cX ∼G(η, cθ). 2) Gamma distribution is the conjugate prior of Poisson distributions [24]: if λ ∼G(η, θ) is a prior distribution and n is a sample from the Poisson distribution with parameter λ, then the posterior is λ ∼G(η + n, θ/(1 + θ)). Given β(i) k we can compute constant c(i) k of (25) and express the prior distribution of λ(i) k−1 as: λ(i) k−1|n1:k−1, β(i) 0:k ∼G  λ; ηk−1, c(i) k · θk−1  (26) Using measurement nk and the conjugate prior property, the posterior distribution is: λ(i) k |n1:k, β(i) 0:k ∼G λ; ηk−1 + nk, c(i) k θk−1 1 + c(i) k θk−1 ! (27) Since we are after the updated parameters of Gamma distribu- tion of A (rather than λ(i) k ), again using the scaling property we have: A|n1:k, β(i) 0:k ∼G A; ηk−1 + nk, θk−1 1 + c(i) k θk−1 ! (28) We summarise from (24) and (28) the analytic expressions for the update of ηk and θk : ηk = ηk−1 + nk (29) θk = θk−1 1 + c(i) k θk−1 (30) B. Importance weights Recursive estimation of p(β0:k|ζ1:k, u1:k) is implemented using a particle filter. If we use the transitional prior as the proposal distribution. i.e. q(β0:k|ζ1:k, u1:k−1) = p(βk|βk−1, uk) p(β0:k−1|ζ1:k−1, u1:k−1) (31) the importance weights can be computed recursively as follows [22]: wk ∝p(ζk|ζ1:k−1, β0:k) (32) For our problem expression (32) can be evaluated as: wk ∝ Z p(ζk, αk|ζ1:k−1, β0:k) dαk (33) = Z ˜g1(nk|A, βk) p(A|n1:k−1, β0:k−1) dA × X mk g2(zk|mk, pk) p(mk|z1:k−1, β0:k−1) (34) where p(A|n1:k−1, β0:k−1) is given by (24) and p(mk|z1:k−1, β0:k−1) = qk|k−1 by (20), i.e. qk|k−1 = X mk−1 p(mk|mk−1) qk−1. The components of vector qk|k−1, i.e. qj,k|k−1, were specified by (21). The integral which features in (34) can also be computed analytically. This integral equals: I = Z ˜g1(nk|A, βk) p(A|n1:k−1, β0:k−1) dA (35) = Z P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1)dA (36) where P(n; λ) is the Poisson distribution introduced in (4). Recall the explanation presented in Sec.IV-A about the update of the parameters of the Gamma distribution, summarised by (26)-(28). Effectively we have shown there that: G A; ηk−1 + nk, θk−1 1 + c(i) k θk−1 ! = P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1) R P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1)dA (37) where the integral in the denominator is I, see (36). Hence, the integral can be expressed as: I = P(nk|λk = c(βk) · A) G(A; ηk−1, θk−1) G(A; ηk−1 + nk, θk−1/(1 + c(βk)θk−1)) (38) and is computed for an arbitrary chosen value of A > 0. Based on (34), let us summarise the expression for an unnormalised importance weight as: ˜wk = ϕ(βk, qk−1, ηk−1, θk−1, nk, zk) (39) Importance weights determine in a probabilistic manner which particles will survive (and possibly multiply) during the resam- pling step of the RBPF. C. Information gain Suppose the posterior distribution at time k −1, p(yk−1|ζ1:k−1, u1:k−1), is approximated by a set of particles: Yk−1 = n β(i) k−1, q(i) k−1, η(i) k−1, θ(i) k−1 oN i=1 , (40) where random sample β(i) k−1 consists of the searcher position p(i) k−1 = [x(i) k−1 y(i) k−1]⊺and the position of the source p(i) s = [X(i) Y (i)]⊺, see (15). The weights of the particles in Yk−1 are equal, because sensor control is carried out after resampling, i.e. w(i) k−1 = 1/N, i = 1, . . . , N. The question is how to compute the information gain (13) for each u ∈Uk, based on particles Yk−1. We adopt the ideal measurement approximation for this, that is, in hypothesizing the future count measurement (resulting from action u), we assume: (1) action u will be carried out correctly, that is the transitional density p(pk|pk−1, uk) will be replaced by deter- ministic mapping: pk = pk−1 + uk, and (2) the measurement count will be equal to the mean of ˜g1(nk|A, βk), that is λk (rounded off to the nearest integer). Since we are after the expected value of the gain, that is E{D(u)}, we will create an ensemble of “future ideal measurements” {n(j) k }M j=1. Expectation is then approximated by a sample mean, i.e. E{D(u)} ≈1 M M X j=1 D(j)(u) 8 where D(j)(u) was computed using n(j) k . The ensemble of “future ideal measurements” {n(j) k }M j=1 is created as follows. For each action u, choose at random a set of M particle indices ij ∈{1, . . . , N}, j = 1, . . . , M. Action u is then expected to move the searcher to location p(ij) k = p(ij) k−1+u. Then a “future ideal measurement” is n(j) k = ⌊A(ij) · c(ij)⌉, where c(ij) as a function of p(ij) k , p(ij) s was defined by (25), A(ij) ∼G(A; ηk−1, θk−1), and ⌊·⌉denotes the nearest integer function. It remains to explain how to compute the gain D(j)(u) based on n(j) k . Distribution π0(s, pk|uk), which features in (13), can be approximated using the particle set Yk−1 as follows: πo(s, pk, u) ≈G(A; ηk−1, θk−1) N X i=1 w(i) k−1δ(ps−p(i) s , pk−p(i) k ) (41) where p(i) k ∼p(pk|p(i) k−1, u). The updated distribution is π1(s, pk|u, n(j) k ) = ˜g1(n(j) k |pk, ps, A)πo(s, pk|u) R ˜g1(n(j) k |pk, ps, A)πo(s, pk|u)dsdpk . (42) Substitution of (41) and (42) into (13) leads to: D(j)(u) ≈−2 log PN i=1 w(i) k−1J (i)(n(j) k ) hPN i=1 w(i) k−1I(i)(n(j) k ) i1/2 (43) where I(i)(nk) is computed via (38) and J (i)(nk) = Z h P(nk; λ(i) k = c(β(i) k )A) i1/2 × G(A; η(i) k−1, θ(i) k−1)dA (44) Integral (44) can be evaluated numerically. D. Implementation The pseudo-code of one cycle of the search algorithm is presented in Algorithm 1. The input consists of the particle set Yk−1, defined by (40). Selection of the control vector uk (line 2 of Algorithm 1) is described in Algorithm 2. Explanation of the steps in Algorithm 1 are described first. Estimation of the state vector via the RBPF is carried out in lines 4-18. According to (15), random vector β(i) k−1 consists of p(i) k−1, X(i) and Y (i). Since the source location, (X(i), Y (i)), is static, only the component p(i) k−1 is propagated to future time k in line 6. In line 7, equation (39) is applied to compute the unnormalised weights of each particle. The map, represented by the probability of existence of each link, is updated in line 8, based on the expression (23). The parameters of Gamma distribution are update in lines 9-11. The weights assigned to each quadruple (β(i) k , q(i) k , η(i) k , θ(i) k ) are normalised in line 14. Resampling of particles is carried out in lines 15-18. The particles for source position p(i) s are not restricted to the grid nodes and after the resampling step, their diversity is improved by regularisation [25]. Finally, the output is the particle set Yk. The selection of a control vector, described by Algorithm 2, starts with postulating the set Uk in line 2. For every u ∈Uk, Algorithm 1 The searcher algorithm 1: Input: Yk−1 2: Run Algorithm 2 to select the control vector uk 3: Apply control uk and collect measurements zk, nk 4: Yk = ∅; Yk = ∅ 5: for i = 1, . . . , N do 6: Draw p(i) k ∼p(pk|p(i) k−1, uk) 7: ˜w(i) k = ϕ(β(i) k , q(i) k−1, η(i) k−1, θ(i) k−1, nk, zk) 8: q(i) k = ψ(q(i) k−1, β(i) k , zk) 9: η(i) k = η(i) k−1 + nk 10: Compute constant c(i) k as a function of β(i) k using (25) 11: θ(i) k = θ(i) k−1/(1 + c(i) k θ(i) k−1) 12: Yk = Yk ∪{(β(i) k , q(i) k , η(i) k , θ(i) k )} 13: end for 14: w(i) k = ˜w(i) k / PN j=1 ˜w(j) k , for i = 1, . . . , N 15: for i = 1, . . . , N do 16: Select index ji ∈{1, . . . , N} with probability w(i) k 17: Yk = Yk ∪{(β(ji) k , q(ji) k , η(ji) k , θ(ji) k )} 18: end for 19: Output: Yk the algorithm anticipates j = 1, . . . , M future measurements n(j) k (line 9) and accordingly computes a sample of the reward D(j)(u) (line 14). The expected reward is then a sample mean (line 16). Finally the optimal one-step ahead control is selected in line 18. It has been observed in simulations that one step ahead control can sometimes lead to situations where the observer position switches eternally between two or three nodes of the lattice. In order to overcome this problem, we adopt a heuristic as follows: if a node has been visited in the last 10 search steps more than 3 times, the next motion control vector is selected at random. While a multi-step ahead searcher control would be preferable than adopted heuristic, it would also be computationally more demanding. Multi-step ahead searcher control remains to be explored in future work. Algorithm 2 Selection of control vector 1: Input: Yk−1 2: Create the set of admissible controls Uk = {·, ↑, →, ↓, ←} 3: for every u ∈Uk do 4: for j = 1, . . . , M do 5: Choose at random particle index ij ∈{1, . . . , N} 6: p (ij) k = p (ij) k−1 + u; 7: Compute c (ij ) k using p (ij) k and p (ij) s via (25) 8: Adopt A(ij) = η (ij) k−1 · θ (ij) k−1 9: n(j) k = ⌊A(ij) · c (ij) k ⌉ 10: for i = 1, . . . , N do 11: Compute I(i)(n(j) k ) via (38) 12: Compute J (i)(n(j) k ) via (44) 13: end for 14: Compute D(j)(u) using (43) 15: end for 16: Estimate E{D(u)} as a sample mean of {D(j)(u)}M j=1 17: end for 18: Select control vector uk ∈Uk using (12) 9 V. NUMERICAL RESULTS A. An illustrative run We applied the described search algorithm to the search area modelled by the random grid shown in Fig.2. Prior knowledge available to the searcher is illustrated by Fig.1: the radius of the search area is R0 = 9; the centre is c = (0, 0) and the total number of potential links in the complete grid modelling the search area is L = 572. The parameters of the emitting source to be estimated are: X = 0, Y = 7 and A0 = 12. The searcher initial position is p0 = (9, −4). Dynamic model p(mk|mk−1) is is a 2×2 transitional prob- ability matrix with diagonal and off-diagonal elements 0.999 and 0.001, respectively1. Dynamic model p(pk|pk−1, uk) can be expressed as: p(pk|pk−1, uk) =(1 −pe)δ(pk −pk−1 + uk)+ X v∈Uk\uk pe |Uk| −1δ(pk −pk−1 + v) (45) where in simulations we used the value pe = 0.04. The parameters of detection matrices, which define the likelihood function g2(zk|pk, mk), are as follows: for primary observable links, pd = 1 and pfa = 0; for secondary observable links pd = 0.8 and pfa = 0.1. The RBPF used N = 4000 particles with M = 400 samples used in the averaging of information gain. The particle set Y0 at initial time is created as follows: p(i) 0 = p0, for all i = 1, . . . , N particles; the source location vector is drawn from a uniform distribution over a circle with centre c and radius R0, i.e. p(i) s ∼UCircle(c,R0)(ps); link existence probabilities are set to qj,0 = 0.5, for all j = 1, . . . , L links; finally, the parameters of initial Gamma distribution G(A; η0, θ0) were selected as η0 = 15 and θ0 = 1. We terminate the search algorithm when the searcher steps on the source. At this point we compare the true source location with the current estimate of the posterior distribution of the searcher position, approximated by particles {p(i) k }N i=1. If the true source position is contained in the support defined by {p(i) k }N i=1, the search is considered successful. Fig. 5 illustrates a typical run of the search algorithm. The true path of the searcher on this run is shown in Fig.5.(a). It took the searcher 53 time steps to reach the source. During the search, the motion control vector failed to execute correctly on two occasions. The final estimate of the map (i.e of existing links of the square lattice) is shown in Fig.5.(b). This figure shows only the links whose probability of existence is higher than 0.6. The blue circles in Fig.5.(b) indicate the posterior distribution of the searcher final position. Its true position, which is the same as the source position, is included in the support of this posterior, meaning that the search was successful. Moreover, on this occasion, the MAP estimate of the searcher final position coincides with the truth. Fig.5.(c) shows the measured values of the count number nk along the path. As we discussed in introduction, the measurements 1The structure of the search domain must be stable (recall that the count measurement model is valid for a steady-state), hence the changes in the statuses of links are very rare. Table I THE AVERAGE PERFORMANCE OF THE SEARCH ALGORITHM: DIFFERENT SOURCE LOCATIONS, A0 = 12 Source Shortest Number of Success location path length search steps rate [%] (0, 7) 20 42.1 94 (0, 1) 14 34.0 95 (2, −5) 8 28.8 99 are sporadic, especially in the beginning, when the distance between the searcher and the source is large: among the first ten count measurements, only three indicated a non-zero tracer concentration. An avi video file, illustrating a single run of the algorithm, is supplied with the submission of this paper. B. Monte Carlo runs The average performance of the search algorithm in the described scenario has been assessed via Monte Carlo runs. If the search on a particular run was successful, its corresponding search time is used in averaging. A run is declared unsuccess- ful if the source has not been found after k = 100 discrete- time steps. We also keep the statistics on the success rate of the search. The results obtained via averaging over 100 Monte Carlo runs are presented in Table I for three different locations of the source, i.e. (X, Y ) = (0, 7), (0, 1), (2, −5). The three locations correspond to the shortest path distances (from the searcher initial position p0 = (9, −4) to the source) of 20, 14 and 8 unit lengths, respectively. All other parameters were the same as described above for the illustrative run. As expected, the results in Table I indicate that the search is quicker and more reliable (i.e. with a higher success rate) for a source which is closer to the searcher initial position. Table II presents the results for a source at location (0, 7) but with three different values of the source release-rate, i.e. A0 = 8, 12, 16. The results indicate that the search is quicker for a source characterised by a higher release rate. The expla- nation of this trend is as follows. Initially, when the searcher is far from the source, its measurements of tracer concentration are very small, typically zero, hence uninformative. During this phase of the search, the searcher effectively moves according to a ‘diffusive’ (or random walk) model, which is slower that the so-called ’ballistic’ movement associated with information driven search [2]. The random walk phase is longer for a weaker source, which contributes to the overall longer search time in this case. As a specific numerical example, we have also validated that a purely random search never manages to find the source at (0, 7) in the given time-frame of 100 discrete-time steps. VI. SUMMARY The paper considers a very difficult problem of autonomous search for a diffusive point source of tracer in an environment whose structure is unknown. Sequential estimation and motion control are carried out in highly uncertain circumstances with the state space including, in addition to the source parameters, 10 −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 (a) −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 (b) 0 5 10 15 20 25 30 35 40 45 50 55 0 5 10 15 20 25 30 Discrete−time index k Count measurements nk (c) Figure 5. An illustration of a single run of the search algorithm: (a) The true path of the searcher (blue circles); (b) The final estimate of the map (existing links) and the searcher position; (c) measured counts nk over time Table II THE AVERAGE PERFORMANCE OF THE SEARCH ALGORITHM: DIFFERENT SOURCE RELEASE-RATES, SOURCE LOCATION 0, 7) Release Number of Success rate A0 search steps rate [%] 8 49.5 78 12 42.1 94 16 38.2 93 the map of the search area and the searcher position within the map. The paper develops mathematical models of mea- surements, it formulates the sequential Bayesian solution (in the form of a Rao-Blackwellised particle filter) and proposes an information driven motion control of the searcher. The numerical results demonstrate the concept, indicating high success-rates in comparison with random walk. There are many areas for further research and improvements of the concepts introduces in this paper. One direction is to explore the potential benefits of analytical results available from the percolation theory in carrying out olfactory search. Another is to investigate more efficient particle filters for source parameter estimation (being a deterministic part of the state space). Finally, the search strategies based on multiple steps ahead (rather than myopic search) are expected to improve the performance in a domain with obstacles. APPENDIX An approximate model of mean concentration, independent of the grid structure, was introduced in Sec.II-C. This model is a solution of the Laplace equation (1) for a circular search area in the absence of obstacles, with a boundary condition θ(r = R0) = 0, but using different values of parameters. More specifically, the obstacles in the search area are incor- porated in this model via homogenization (volume/ensemble averaging) of the diffusion equation (similar to the effective media approach [26], [27]), so that (1) is replaced with D ∆⟨θ⟩= A0δ(x −X, y −Y ), (46) where D is the re-scaled diffusivity that accounts for such a homogenization, and ⟨θ⟩is the time/ensemble averaged tracer concentration. The new (often called effective) diffusivity D is related to ’unobstructed’ diffusivity D0 of (1) via the formula D = fcD0. The scaling parameter 0 ≤fc ≤1 (known as tortuosity [14], [28]) describes the effect of obstacles (their shape and packing density [14], [27]). According to (46), the decrease of the effective diffusivity of the tracer due to the presence of obstacles has the same effect as an appropriate increase of source release-rate (i.e. A = A0/fc), with unchanged diffusivity in (46) (i.e. D = D0), where pa- rameters D0, A0 correspond to their values in an unobstructed space, see (1). We arrive at a conclusion that the effect of obstacles can be approximately incorporated with imprecision (overestimation) of the source release-rate, without any effect on the source position. Since the main goal of the search algorithm is to find the source, then such inaccuracy in release- rate estimation becomes irrelevant for the performance of the algorithm. This means that as the first approximation for 11 adopted measurements model we can still use (46) with known diffusivity D = D0 and some unknown A. Estimation of A0 (if required) can be implemented retrospectively based on a theoretical model for fc [28]). For the lattice models an expression for fc can be derived analytically by employing the framework of percolation theory, resulting in the expression fc = (1 −p/pc)α, where pc = 1/2 (percolation threshold on square lattice), p is the fraction of missing links in the incomplete square lattice and α = 1.30 [13], [14]. If the number of missing links is small, we can adopt approximation fc ≈1 and A ≈A0. In line with the above comments we will use (46) as a foundation for the measurements model that is independent of the structure of the search domain. The solution of (46) for a tracer source located at the center of circle (X = Y = 0), is given by [29]: ⟨θ⟩= A 2 log[(zz∗)/R2 0], (47) where z = x + iy is the complex coordinate and z∗is its complex-conjugate. To find the solution for configurations other than the circular domain with the source in the centre, we employ the property of conformal invariance of the Laplace equation [29]. We illustrate this method with a source placed inside the circular domain, but away from its centre (that is at coordinates (X, Y ) s.t. √ X2 + Y 2 < R0). If we can find a conformal transformation ω(z) that maps an arbitrary position of the source (X, Y ) back to the center of the circular domain, then we can still use the solution (47), but with the substitution z →ω(z). Therefore, for an arbitrary position of the source inside the search area √ X2 + Y 2 < R0 we can write ⟨θ⟩= (κ/2) log[(ww∗)/R2 0]. (48) The required conformal transformation is the well-known M¨obius map (see [29]) w(z) = R0(z −Z) ZZ∗−R2 0 , (49) where Z = X + iY . After straightforward calculations we arrive at the solution given by (5) and (6). We point out that the model is not restricted to a circular search area. According to the theory of analytical functions, a conformal mapping to the circle always exists for arbitrary simply connected domain, and therefore can be computed analytically or numerically [29]. REFERENCES [1] G. M. Viswanathan, V. Afanasyev, S. V. Buldyrev, E. J. Murphy, P. A. Prince, and H. E. Satnley, “Levy fligh search patterns of wandering albatrosses,” Nature, vol. 381, pp. 413–415, 1996. [2] O. B´enichou, C. Loverdo, M. Moreau, and R. Voituriez, “Intermittent search strategies,” Rev. Mod. Phys., vol. 83, pp. 81–129, Mar 2011. [3] A. M. Hein and S. A. McKinley, “Sensing and decision-making in random search,” PNAS, July 2012. [4] J. A. Farrell, S. Pang, and W. Li, “Plume mapping via hidden Markov models,” IEEE Trans. Systems, Man, Cybernetics - Part B: Cybernetics, vol. 33, no. 6, pp. 850–863, 2003. [5] W. Li, J. A. Farrell, S. Pang, and R. M. Arrieta, “Moth-inspired chemical plume tracking on an autonomous underwater vehicle,” IEEE Trans Robotics, vol. 22, no. 2, pp. 292–307, 2006. [6] J. Oyekan and H. Hu, “A novel bio-controller for localizing pollution sources in a medium peclet environment,” Journal of Bionic Engineer- ing, vol. 7, no. 4, pp. 345–353, 2010. [7] H. Ishida, G. Nakayama, T. Nakamoto, and T. Morizumi, “Controlling a gas/odor plume tracking robot based on transient responses of gas sensors,” IEEE Sensors Journals, vol. 5, no. 3, pp. 537–545, 2005. [8] M. Vergassola, E. Villermaux, and B. I. Shraiman, “‘infotaxis’ as a strategy for searching without gradients,” Nature, vol. 445, no. 25, pp. 406–409, 2007. [9] G. L. Iacono, “A comparison of different searching strategies to locate sources of odor in turbulent flows,” Adaptive Behavior, vol. 18, no. 2, pp. 155–170, 2010. [10] B. Ristic, M. Morelande, and A. Gunatilaka, “Information driven search for point sources of gamma radiation,” Signal Processing, vol. 90, pp. 1225–1239, 2010. [11] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005. [12] A. Marjovi and L. Marques, “Multi-robot olfactory search in structured environment,” Robotics and Autonomous Systems, vol. 59, pp. 867–881, 2011. [13] D. ben Avraham and S. Havlin, Diffusion and reaction in fractals and disordered systems. Cambridge University Press, 2000. [14] S. Torquato, Heterogeneous materials. Springer, 2002. [15] J. G. Kemeny and J. L. Snell, Finite Markov chains. New York: Van Nostrand, 1960. [16] A. P. S. Selvadurai, Partial differential equations in Mechanics 1. Springer, 2000. [17] R. Burioni and D. Cassi, “Random walks on graphs: ideas, techniques and results,” Journal of Physics A: Mathematical and General, vol. 38, no. 8, p. R45, 2005. [18] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice. Springer, 2001. [19] T. Dean and K. Kanazawa, “A model for reasoning about persistence and causation,” Computational intelligence, vol. 5, no. 2, pp. 142–150, 1989. [20] E. Chong, C. Kreucher, and A. Hero, “POMDP approximation using simulation and heuristics,” in Foundations and Applications of Sensor Management, A. Hero, D. Castanon, D. Cochran, and K. Kastella, Eds. Springer, 2008, ch. 8. [21] T. Kailath, “The divergence and Bhattacharyya distance measures in signal selection,” IEEE Trans. Communic. Technology, vol. 15, no. 1, pp. 52–60, 1967. [22] A. Doucet, N. de Freitas, K. P. Murphy, and S. J. Russell, “Rao- Blackwellised particle filtering for dynamic Bayesian networks,” in Pro- ceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 2000, pp. 176–183. [23] “Gamma-distribution,” in Encyclopedia of mathematics, M. Hazewinkel, Ed. Springer, 2001. [24] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis, 2nd ed. CRC Press, 2003. [25] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman filter: Particle filters for tracking applications. Artech House, 2004. [26] C. Nicholson, “Diffusion and related transport mechanisms in brain tissue,” Reports on progress in physics, vol. 64, pp. 815–884, 2001. [27] I. L. Novak, P. Kraikivski, and B. M. Slepchenko, “Diffusion in cytoplasm: Effects of excluded volume due to internal membranes and cytoskeletal structures,” Biophysical journal, vol. 97, pp. 758–767, 2009. [28] L. Pisani, “Simple expression for the tortuosity of porous media,” Transport in Porous Media, vol. 88, no. 2, pp. 193–203, 2011. [29] A. Prosperetti, Advanced mathematics for applications. Cambridge University Press, 2011. arXiv:1306.1591v1 [cs.AI] 7 Jun 2013 1 Autonomous search for a diffusive source in an unknown environment Branko Ristic, Alex Skvortsov, Andrew Walker† Abstract—The paper presents an approach to olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas) in an environment with unknown map of randomly placed and shaped obstacles. The measurements of tracer concentration are sporadic, noisy and without directional information. The search domain is discretised and modelled by a finite two-dimensional lattice. The links is the lattice represent the traversable paths for emitted particles and for the searcher. A missing link in the lattice indicates a blocked paths, due to the walls or obstacles. The searcher must simultaneously estimate the source parameters, the map of the search domain and its own location within the map. The solution is formulated in the sequential Bayesian framework and implemented as a Rao-Blackwellised particle filter with information-driven motion control. The numerical results demonstrate the concept and its performance. Index Terms—Olfactory search, Bayesian inference, mapping and localisation, Rao-Blackwellised particle filter, observer con- trol, information gain. I. INTRODUCTION The search for an emitting source of particles, chemicals, odour, or radiation, based on sporadic clues or intermittent measurements, has attracted a great deal of interest lately. The topic is important for search and rescue operations with the goal to localise dangerous pollutants, such as chemical leaks and radioactive sources. In biology, the search is studied to model animal bahaviour in search for food or mates [1]–[3]. Bio-inspired search for underwater sources of pollution have been studied in [4]–[6]. A robot for gas/odour plume tracking guided by the increase in the concentration gradient has been proposed in [7]. “Infotaxis” [8] is a search strategy based on entropy-reduction maximisation which has been developed in the context of finding a weak source in a turbulent flow (e.g. drug or leak emitting chemicals, for a comprehensive review see [9]). Information-gain driven search for radioactive point sources has been studied in [10]. In all these applications the search domain is either open (without obstacles) or a precise map of the search domain (with obstacles) is available. In this paper we focus on autonomous olfactory search for a diffusive emitting source of tracer (e.g. aerosol, gas, heat, moisture) in a domain with randomly placed and shaped obstacles (forbidden areas), whose structure (the map) is unknown. The problem is of importance for example in localisation of dangerous leaks in collapsed buildings, inside tunnels or mines. The searcher senses in a probabilistic manner both the structure of the search domain (e.g. the presence or †The authors are with Defence Science and Technology Organisa- tion, Australia. Corresponding author: B. Ristic, DSTO, 506 Lorimer Street, Melbourne, VIC 3207, Australia; Tel: +61 3 9696 7688; Email: branko.ristic@dsto.defence.gov.au absence of obstacles, walls, blocked passages) and the level of concentration of tracer particles. The objective of the search is to localise and report the coordinates of the source in a shortest possible time. This is not a trivial task for several reasons. First, the emission rate of the source is typically unknown. Furthermore, the measurements of tracer particle concentration are sporadic, noisy and without directional information. Since the structure (map) of the search domain is unknown, the searcher needs to explore the domain and create its map. The searcher motion is fully autonomous: it senses the environment and after processing this uncertain information sequentially makes decisions on where to move next in order to collect new measurements. Its motion control, however, is not fully reliable as it may occasionally fail to execute correctly. The probabilistic model of searcher motion is assumed known. In the paper we restrict to the search in a two-dimensional domain. The coordinates of the searcher initial position, as well as the border of the search area (relative to the initial position) are given as input parameters. In order to fulfil its mission, the searcher has to find the source and report its coordinates relative to its initial position. This in turn requires simultaneous estimation at three levels: (1) estimation of source parameters (its location in 2D and its release rate); (2) estimation of the map of the search area and (3) estimation of the searcher position within the estimated map. Estimation at levels (2) and (3) has been studied extensively in robotics under the term grid-based simultaneous localisation and mapping (SLAM) [11]. The primary mission in all SLAM publications is an accurate mapping of the area. The primary mission of our searcher, however, is to localise the source, while SLAM is only a necessary component of the solution. The only related work which deals with olfactory search in an unknown structured environment is [12]. While this paper presents a plethora of experimental results, the algorithms are based on heuristics. Our approach, however, is theoretically sound in the sense that its mathematical models are precisely defined, estimation is carried out in the sequential Bayesian framework and the searcher motion control is driven by information gain. The search domain is discretised, as for example in [4], and modelled by a finite two-dimensional lattice. With sufficiently fine resolution of the lattice, the emitting source can be considered to be in one of the nodes of the lattice. The links (bonds, edges) of the lattice represent the traversable paths for emitted particles (tracer) and for the searcher. Missing links in the lattice indicate blocked paths due to the walls or obstacles. This is a very general model applicable to searches at various scales, from inside buildings and tunnels, to within cells of living organisms [2]. The percentage of missing links in the 2 lattice is assumed to be above the percolation threshold pc (for the adopted lattice structure pc = 1/2 [13], [14]) so that long-range connectivity is satisfied [13]. Using the absorbing Markov chains technique [15], we can compute exactly the mean concentration level in any node of the lattice, that is at any point of the search domain with obstacles. Since the structure (map) of the search domain is unknown, the searcher must rely on a theoretical model of concentration measurement which is independent of the this map. Such a model is derived in the paper in the analytic form and used in the search. The search itself consist of algorithms for sequential esti- mation and motion control. We adopt the framework of op- timal sequential Bayesian estimation with information-driven motion control. Implementation is carried out using a Rao- Blackwellised particle filter. The paper is organised as follows. Mathematical models of measurements and searcher motion are described in Sec.II. The olfactory search problem is formulated and its concep- tual solution provided in Sec.III. Full technical details of the proposed search algorithm are presented in Sec.IV, with numerical results given in Sec.V. Finally, conclusions of this study are summarised in Sec. VI. II. MODELLING A. Model of environment The concentration of a tracer at any point of the search domain is governed by the diffusive equation, which in the steady state reduces to the Laplace equation [16]: D0 ∆θ = A0 δ(x −X,y −Y ). (1) Here D0 is the diffusion coefficient of tracer in the envi- ronment, ∆is the Laplace operator, θ is the mean (time- averaged) tracer concentration, δ is the Dirac delta function, A0 is the release-rate of the tracer source, and X, Y are the coordinates of the source in a two-dimensional Cartesian coordinate system. For convenience we adopt a circular search domain of radius R0, centred at the origin of the coordi- nate system, that is for every point inside the search area, r = p x2 + y2 ≤R0. Assuming that the tracer source is undetectable outside the search domain, we can impose the absorbing boundary condition θ(r = R0) = 0. The traditional approach to the computation of the tracer concentration θ at every point of the search domain, is via analytical or numerical solution of (1). This, however, is a non-trivial task when the search domain is a structure of complex topology (due to obstacles, compartments walls, random openings, etc). We therefore adopt an alternative approach, where the continuous model of the tracer diffusion process is replaced with a random walk on a square lattice, adopted as a discrete model of the search area. Discretisation is illustrated in Fig.1 for a search area centred at the origin of the coordinate system, with the radius R0 = 9. The length of each link (edge, bond) in the lattice determines the resolution of discretisation, and in this example is adopted as a unit length. The source, assumed to be located at one of the nodes of the lattice, is emitting particles which travel through the lattice according to the −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 x y Figure 1. Search area discretisation: the complete grid, with the length of each link equal 1. The centre of the search area is in (0, 0), its radius is R0 = 9. random walk model [17]. The obstacles in the search domain (the regions through which the tracer cannot pass) are simply modelled as missing links (or clusters of missing links) in the square lattice. Fig.2 shows an example of such a model: this incomplete lattice is obtained by removing fraction p = 0.35 of the links in the complete lattice shown in Fig.1. Note that all nodes in the incomplete lattice (grid) are connected. On average this will be the case if the fraction of missing links in the incomplete grid of Fig.2 is below the percolation threshold pc; above the percolation threshold (p > pc) the lattice becomes fragmented. The framework of percolation theory enables analytical description of statistical properties of such a lattice [13], [14]. −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 x y Figure 2. A model of search area with obstacles: the missing links of the complete graph of Fig.1 represent blocked passages (due to the walls, closed doors, etc) for moving particles. This incomplete grid is obtained by removing fraction p = 0.35 of the links from the complete graph. 3 B. Model of tracer distribution This section explains how to compute the mean concen- tration of tracer particles in each node of the incomplete grid (such as the one shown in Fig.2) which represents a discretised model of the search area with obstacles. For a given incomplete grid, the mean concentration can be computed using the absorbing Markov chain technique [15]. Neglecting the spatial approximation of the search domain (due to discretisation) and under the assumption that the distribution of particles has reached the steady state, the absorbing Markov chain technique provides an exact solution for the quantity of source material at each location. We can regard the random walk of tracer particles through the incomplete grid (e.g. Fig.2) as a Markov chain whose states are the nodes of the grid. The Markov chain is specified by the transition matrix T; each element of this matrix is the probability of transition from state si to state sj (i.e. a particle move from node i to node j): Tij = P{sj|si}. How to construct T given the incomplete grid? First note that we distinguish two types of states in this Markov chain: absorbing states (corresponding to the nodes on the boundary of the grid) and transient states. For an absorbing state si, Tii = 1 and Tij = 0, if j ̸= i. Suppose a transient state si corresponds to node i in the incomplete grid, which has connections (links) with nodes j1, . . . , jm, where for a square grid m ≤4. Then Tij1 = · · · = Tijm = 1/m and Tij = 0 for j /∈{j1, . . . , jm}. Suppose there are r absorbing states and t transient states. If we order the states so that the absorbing states come first (before the transient states), then the transition matrix takes the canonical form: T =  I 0 R Q  , (2) where I is r × r identity matrix, Q is the t × t matrix that describes transitions between transient states, R is a t × r matrix that describes the transitions from transient to absorbing states and 0 is an r×t matrix of zeros. The fundamental matrix of the absorbing Markov chain [15], F = (I −Q)−1, (3) represents the expected number of visits to a transient state sj starting from a transient state si (before being absorbed). This matrix will be used in simulations to compute the mean particle concentration in any node of the incomplete grid. Suppose an emitting source is placed at node i, which is not on the boundary. The source is releasing tracer particles at a constant rate A0. Then the expected concentration of tracer particles in any other node j of the incomplete grid (which is not on the boundary) is given by θj = A0 · Fij. The concentration scales linearly with the release rate A0 as a direct consequence of the linearity of Laplace equation (1). Fig.3 shows the mean concentration of tracer particles for the search area modelled by incomplete grid of Fig.2, with the source placed at (X, Y ) = (0, 7) and with A0 = 12. Notice how the concentration depends on the distance from the source and the structure of the grid, plotted in Fig.2. −10 −5 0 5 −10 −5 0 5 0 5 10 15 20 25 30 Figure 3. Mean concentration of tracer particles for the search area modelled by incomplete graph of Fig.2 with source placed at (X, Y ) = (0, 7) with A0 = 12 (darker cells indicate higher concentration) C. Sensor models and motion model Two types of measurements are collected by the searcher. Sensor 1 measures the concentration of tracer particles as a count of particles received during the sampling time. Assum- ing the so-called ’dilution’ limit (limit of low concentrations) the tracer fluctuations follow the Poisson distribution [8], that is a concentration measurement at node j of the grid is a random sample drawn from n ∼P(n; λ) = λn n! e−λ (4) where λ = θj = A0 ·Fij. The Poisson distribution mimics the intermittency of concentration measurements [8]. The searcher sequentially estimates the source parameters without knowing the map of the search area. Hence the mea- surement model based on the mean concentration λ = A0·Fij cannot be used in estimation (recall that matrix F is formed based on the structure of the incomplete grid). Assuming that the fraction of missing links in the incomplete grid is smaller than the percolation threshold pc, the expected concentration of tracer particles in any node j of the incomplete grid can be computed approximately using the property of conformal invariance of the Laplace equation (see Appendix for details). Suppose the source of release rate A0 is placed at a node of the grid, positioned at coordinates (X, Y ). Then the mean (time and ensemble averaged) concentration at node j, positioned at (xj, yj) can be approximated as: ⟨θ⟩j ≈−A 2 log(R2) (5) where A = A0/fc, (fc is a constant, 0 < fc < 1, which depends on the fraction of missing links in the incomplete grid, see Appendix), and R2 = R2 0 (xj −X)2 + (yj −Y )2 (xjY −yjX)2 + (R2 0 −xjX −yjY )2 . (6) Note that this model is independent of the structure of the incomplete grid. In summary, estimation will be carried out 4 using Sensor 1 measurement model based on (4), where mean λ = ⟨θ⟩j is approximated by (5), (6). The actual concentration measurements will be simulated according to (4), but with λ = θj = A0 · Fij. The searcher moves and explores the search domain in order to find the source. The source parameter estimation is carried out using the map-independent measurement model (5), which does not require discretisation of the search domain on a square lattice (as in Fig.1). Nevertheless, we keep discretisation for the searcher in order to model its motion paths and to facilitate its grid-based SLAM functionality. Thus we assume that the searcher travels within the search area along the paths represented by the links of the incomplete grid as in Fig.2. As it travels, it stops at the nodes along its path to sense the environment, i.e. to collect measurements. Sensor 2 is a simple binary detector of the presence or absence of the links (paths) visible from the node in which the searcher is currently placed. It reports on the presence/absence of the primary and secondary neighbouring links. A link in a grid of Fig.2, is defined by a quadruple (x1, y1, x2, y2), where (x1, y2) and (x2, y2) are the coordinates of the nodes it connects. In order to explain what we mean by primary and secondary links, consider for example the node at location (−3, −4) indicated by ’o’ in Fig.2. The primary observable links from this node are the connecting links towards East, West, up and down from (−3, −4), i.e. ℓ1 = (−3, −4, −2, −4), ℓ2 = (−3, −4, −4, −4), ℓ3 = (−3, −4, −3, −3), and ℓ4 = (−3, −4, −3, −5), i.e. . The sta- tus of link ℓ, m(ℓ), takes values from {0, 1}, where m(ℓ) = 1 means that link ℓexists and m(ℓ) = 0 is the opposite. According to Fig.2, m(ℓ1) = 1, m(ℓ2) = 1, m(ℓ3) = 0, m(ℓ4) = 1. The secondary observable links from the node at (−3, −4) in Fig.2 represent second neighbouring links in direction of East, West, up and down from (−1, −1), that is ℓ5 = (−2, −4, −1, −4), ℓ6 = (−4, −4, −5, −4), ℓ7 = (−3, −3, −3, −2), and ℓ8 = (−3, −5, −3, −6). According to Fig.2, m(ℓ5) = 1, m(ℓ6) = 0, m(ℓ7) = 0, m(ℓ8) = 1. Let an observation (supplied by sensor 2) about the presence or absence of a link ℓ, be a binary value z(ℓ) ∈{0, 1}, where z(ℓ) = 0 means link ℓis absent and z(ℓ) = 1 is the opposite. The performance od sensor 2 can be described by two detection matrices, one for the primary links, the other for secondary links. Each detection matrix Π has a form Π = P(z = 0|m = 0) P(z = 0|m = 1) P(z = 1|m = 0) P(z = 1|m = 1)  . (7) where P(z = 1|m = 1) = pd and P(z = 1|m = 0) = pfa are the probability of correct detection and the probability of false detection pfa, respectively. The columns of matrix Π add up to 1, and hence (7) can be written as: Π = 1 −pfa 1 −pd pfa pd  . (8) Suppose the searcher is in node i at discrete-time k −1. Let the set of admissible controls vectors for the next move be defined as Uk = {·, ↑, →, ↓, ←}, meaning that the searcher can stay where it is, or move one unit length up, right, down or left. After processing measurements from its sensors, the searcher decides to choose control u∗ k ∈Uk and hence to be at time k at node j. This control, however, is executed correctly only with probability 1−pe. Due to control noise or unmodeled exogenous effects [11], with probability pe the searcher will actually execute control u′ k ∈Uk \ {u∗ k}. III. THE PROBLEM AND ITS CONCEPTUAL SOLUTION The searcher has at its disposal the probabilistic models of sensor measurements and dynamic models. Prior knowledge also includes: (1) the coordinates of its initial position; (2) the length of each link in the square lattice; (3) the boundary of the circular search area (defined by its centre and radius). The described prior translates into knowledge of the complete grid such as the one shown in Fig.1. The searcher cannot move outside the complete grid. The objective of the searcher is to estimate in the shortest possible time the coordinates of the emitting source, as well as the partial map describing the path from its starting (entry) point to the estimated location of the source. This partial map is important, for example, in order to guide the rescue team to the source or if the searcher has to retreat to its starting position. A. Sequential Bayesian estimation The described problem can be cast in the sequential Bayesian estimation framework as a nonlinear filtering prob- lem. Let us first define the state vector, which consists of three parts: 1) The coordinates of the searcher position at discrete-time k = 1, 2, . . . are denoted by pk = [xk yk]⊺. 2) The status (presence/absence) of each link in the com- plete grid (such as the one shown in Fig.1). The status of link ℓj, where j = 1, . . . , L, and L is the total number of links in the complete grid, is m(ℓj) = mj ∈{0, 1}. The notation P(mj = 1) refers to the probability that the link is present. The map at time k is fully specified by vector mk = [m1,k, . . . , mL,k]⊺. The time index is introduced because we allow the map of the search area to occasionally change, e.g. an open door can close. The assumption is that the statuses of links are mutually independent, i.e. mj,k is independent from mi,k for i ̸= j. 3) The parameter vector of the source is denoted by s = [X Y A]⊺. The complete state vector is then defined as yk = [p⊺ k m⊺ k s⊺]⊺. Dynamics of the state yk is described by two transitional densities: p(mk|mk−1) specifies the evolution of the map over time, while p(pk|pk−1, uk) characterises the searcher motion model. The observation models of the searcher are specified by two likelihood functions: g1(nk|pk, mk, s) characterises sensor 1, which provides the count of particles nk at pk from the source in state s through the map mk; g2(zk|pk, mk) refers to sensor 2 and describes the observation zk of the status of the links in mk visible from the searcher in location pk. Let us denote observations and controls at time k by a vector 5 ζk = [nk z⊺ k]⊺. Finally, the prior probability density function (PDF) of the state is denoted by p(y0). The goal in the sequential Bayesian framework is to es- timate any subset or property of the sequence of states y0:k := (y0, . . . , yk) given observation sequence ζ1:k := (ζ1, . . . , ζk) and the control sequence u1:k := (u1, . . . , uk), which is completely specified by the joint posterior distribu- tion p(y0:k|ζ1:k, u1:k). This posterior satisfies the following recursion: p(y0:k|ζ1:k, u1:k) ∝ g(ζk|yk)p(yk|yk−1, uk)p(y0:k−1|ζ1:k−1, u1:k−1) (9) where p(yk|yk−1, uk) = p(mk|mk−1) p(pk|pk−1, uk) (10) is the transitional density, and g(ζk|yk) = g1(nk|pk, mk, s) g2(zk|pk, mk) (11) is the measurement likelihood function. In general it is impossible to solve analytically the re- cursive equation (9). Instead we will formulate a numerical approximation using the sequential Monte Carlo method [18]. Before going into details, notice that factorization expressed by (10) and (11) imposes a structure which can be conve- niently represented by a dynamic Bayesian network (DBN) [19] shown in Fig.4. The circles in Fig.4 represent random variables: white circles are hidden variables; gray circles are observed variables. Arrows indicate dependencies. The arrows indicated by dashed lines are explained next. k u k x k z m s n k k k uk-1 k-1 x k-1 z m s n k-1 k-1 k-1 T i m e k+1 uk+1 xk+1 z m s n k+1 k+1 k+1 Figure 4. The dynamic Bayesian network representing the dependency between the random variables which feature in the described inference problem Count measurement nk depends on the map mk, hence the likelihood of count measurement is formulated as g1(nk|pk, mk, s). The searcher, however, does not know the map (it estimates it only partially as it travels through the search area) and hence we have introduced the approximate measurement model expressed by (4)-(6). The searcher will therefore process count observations nk using the likeli- hood function which is independent of mk, and denoted by ˜g1(nk|pk, s), rather than g1(nk|pk, mk, s). We indicate this fact by drawing the arrow from mk to nk in Fig.4 by a dashed line. The computation of the posterior PDF for a structured com- plex system such as the one shown in Fig.4 can be factorised and consequently made computationally and statistically more efficient. Technical details will be given in Sec.IV. B. Information driven motion control After processing the measurements received at time k −1, the searcher needs to select the next control vector uk which will change its position to pk ∼p(pk|pk−1, uk). The problem of selecting uk can be formulated as a partially- observed Markov decision process [20], whose elements are: (1) the set of admissible control vectors Uk; (2) the current information state, expressed by the predicted PDF p(yk|ζ1:k−1, u1:k−1, uk), where uk ∈Uk; (3) the reward function associated with each control uk ∈Uk. In the paper we adopt motion control based on a single step ahead strategy; this myopic approach is suboptimal in the presence of randomly missing links, but is computationally easier to implement and faster to run. The control vector is then selected as: uk = arg max v∈Uk E{D(v, p(yk|ζ1:k−1, u1:k−1, v), ζk(v))} (12) where D(u, p, ζ) is the reward function. Note that the reward depends on future measurement ζk = [nk z⊺ k]⊺, which would be acquired after control u ∈Uk had been applied. Since the decision has to be made before the actual control is applied, the expectation E is taken in (12) with respect to the prior measurements PDF. Considering that the primary mission of the search is source localisation (map estimation is of secondary impor- tance), the reward function at time k is adopted as the information gain between: (1) the predicted PDF over the state subspace (s, pk) and (2) the updated PDF over (s, pk), using the count measurement nk. The two distributions are denoted π0(s, pk|uk) = p(s|n1:k−1, u1:k)p(pk|pk−1, uk) and π1(s, pk|nk, uk) = ξ ˜g1(nk|pk, s) π0(s, pk|uk), respectively, where ξ is a normalisation constant. The information gain be- tween the two distributions is measured using the Bhattacharya distance [21]. The reward function is thus adopted as: D(uk) = −2 log Z p π1(s, pk|nk, uk) · π0(s, pk|uk) ds dpk (13) where we dropped unnecessary arguments in notation for D. IV. THE SEARCH ALGORITHM The proposed search algorithm, formulated as a DBN with observer control, can be implemented efficiently as a Rao- Blackwellised particle filter (RBPF) [22] with sensor control. The idea of the RBPF is as follows. Suppose it is possible to divide the components of the hidden state vector yk into two 6 groups, αk and βk, such that the following two conditions are satisfied: C-1: p(yk|yk−1, uk) = p(αk|βk−1:k, αk−1)·p(βk|βk−1, uk) C-2: the conditional posterior distribution p(αk| β0:k, ζ1:k, u1:k) is analytically tractable. Then we need only to estimate the posterior p(β0:k|ζ1:k, u1:k), meaning that we reduced the dimension of the space in which Monte Carlo estimation needs to be carried out, from dim(yk) to dim(βk). As argued in [22], this potentially improves computational and statistical efficiency of the particle filter. In the described DBN, shown in Fig,4, in order to satisfy conditions C-1 and C-2, the state vector yk can be partitioned as follows: αk = [m⊺ k A]⊺ (14) βk = [p⊺ k X Y ]⊺. (15) We are interested only in the filtering posterior density, which can now be factorised as follows: p(αk, β0:k|ζ1:k, u1:k) = p(αk|β0:k, ζ1:k, u1:k) · p(β0:k|ζ1:k, u1:k) (16) The PDF p(β0:k|ζ1:k, u1:k) is approximated by a random sample {β(i) 0:k}N i=1. Subsequently one can compute analytically (for each sample β(i) 0:k): p(αk|β(i) 0:k, n1:k, z1:k, u1:k) = p(mk|z1:k, β(i) 0:k) · p(A|n1:k, β(i) 0:k), (17) where • p(mk|z1:k, β(i) 0:k) = qk is a vector of probabilities of existence for each link in the random grid and • p(A|n1:k, β(i) 0:k) is approximated by a Gamma distribution with shape parameter ηk and scale parameter θk, i.e. G (A; ηk, θk). Hence each particle corresponds to a set:  β(i) 1:k, qk, ηk, θk  (18) where qk, ηk, θk are the sufficient statistics of p(αk|β(i) 0:k, n1:k, z1:k, u1:k). Keep in mind that qk, ηk, θk depend on a particular sequence (particle) β(i) 0:k. A. Recursive formulae for sufficient statistics Let us first discuss the analytic recursive formula for the computation of qk, following the ideas of the grid-based SLAM [11]. Note that qk =p(mk|z1:k, β(i) 0:k) = g2(zk|mk, β(i) k ) p(mk|z1:k−1, β(i) 0:k−1) P mk g2(zk|mk, β(i) k ) p(mk|z1:k−1, β(i) 0:k−1) (19) where p(mk|z1:k−1, β(i) 0:k−1) = X mk−1 p(mk|mk−1) p(mk−1|z1:k−1, β(i) 0:k−1) (20) The update of probability vector qk is then carried out as follows. Recall from (15) that particle β(i) k specifies the location of the searcher at time k, p(i) k = [x(i) k y(i) k ]⊺. Each component of vector zk is then an observation of existence of a primary or a secondary link from location p(i) k . Let qj,k−1 be a component of vector qk−1, denoting the posterior probability that link ℓj exists at time k −1, i.e. qj,k−1 = p(mj,k−1|z1:k−1, β(i) 0:k−1). Recall also that since the link sta- tuses are assumed independent, then qk−1 = QL j=1 qj,k−1. According to (20), link j existence probability is predicted as: qj,k|k−1 = p(mj,k = 1|mj,k−1 = 0)(1 −qj,k−1)+ p(mj,k = 1|mj.k−1 = 1)qj,k−1 (21) Let z be a component of vector zk which refers to link ℓj, according to the current position of the searcher, p(i) k . Then based on (19) we update the link j existence probability as: qj,k =        pd qj,k|k−1 pd qj,k|k−1+pfa(1−qj,k|k−1) if z = 1 (1−pd) qj,k|k−1 (1−pd) qj,k|k−1+(1−pfa)(1−qj,k|k−1) if z = 0 (22) where pd and pfa, introduced in (8), are the elements of the appropriate detection Π matrix (primary or secondary) of (8). Equations (19)-(22) can be summarised as: qk = ψ(qk−1, β(i) k , zk) (23) Let us describe next the analytic recursion for the update of the parameters ηk and θk of (18). At time k −1, the posterior of emission rate A is modeled by a gamma distribution: A|n1:k−1, β(i) 0:k−1 ∼G (A; ηk−1, θk−1)) . (24) Sensor 1 provides at time k a count measurement nk, which plays the key role in the update of parameters ηk−1 and θk−1. Recall that the likelihood function of this measurement, ˜g1(nk|β(i) k , A) is a Poisson distribution with parameter (mean) λ(i) k−1, rather than A. Fortunately, λ(i) k−1 is linearly related to emission rate A, that is λ(i) k = A · c(β(i) k ) where the constant c(β(i) k ) is always positive and given by c(i) k = −1 2  2 log R0+ log (x(i) k −X(i))2 + (y(i) k −Y (i))2 (x(i) k Y (i) −y(i) k X(i))2 + (R2 0 −x(i) k X(i) −y(i) k Y (i))2  . (25) with X(i) and Y (i), according to (15), being the components of particle β(i) k . In the proposed algorithm for the update of parameters ηk−1 and θk−1 we use the following two properties of Gamma distribution: 7 1) Scaling property [23]: if X ∼G(η, θ) then for any c > 0, cX ∼G(η, cθ). 2) Gamma distribution is the conjugate prior of Poisson distributions [24]: if λ ∼G(η, θ) is a prior distribution and n is a sample from the Poisson distribution with parameter λ, then the posterior is λ ∼G(η + n, θ/(1 + θ)). Given β(i) k we can compute constant c(i) k of (25) and express the prior distribution of λ(i) k−1 as: λ(i) k−1|n1:k−1, β(i) 0:k ∼G  λ; ηk−1, c(i) k · θk−1  (26) Using measurement nk and the conjugate prior property, the posterior distribution is: λ(i) k |n1:k, β(i) 0:k ∼G λ; ηk−1 + nk, c(i) k θk−1 1 + c(i) k θk−1 ! (27) Since we are after the updated parameters of Gamma distribu- tion of A (rather than λ(i) k ), again using the scaling property we have: A|n1:k, β(i) 0:k ∼G A; ηk−1 + nk, θk−1 1 + c(i) k θk−1 ! (28) We summarise from (24) and (28) the analytic expressions for the update of ηk and θk : ηk = ηk−1 + nk (29) θk = θk−1 1 + c(i) k θk−1 (30) B. Importance weights Recursive estimation of p(β0:k|ζ1:k, u1:k) is implemented using a particle filter. If we use the transitional prior as the proposal distribution. i.e. q(β0:k|ζ1:k, u1:k−1) = p(βk|βk−1, uk) p(β0:k−1|ζ1:k−1, u1:k−1) (31) the importance weights can be computed recursively as follows [22]: wk ∝p(ζk|ζ1:k−1, β0:k) (32) For our problem expression (32) can be evaluated as: wk ∝ Z p(ζk, αk|ζ1:k−1, β0:k) dαk (33) = Z ˜g1(nk|A, βk) p(A|n1:k−1, β0:k−1) dA × X mk g2(zk|mk, pk) p(mk|z1:k−1, β0:k−1) (34) where p(A|n1:k−1, β0:k−1) is given by (24) and p(mk|z1:k−1, β0:k−1) = qk|k−1 by (20), i.e. qk|k−1 = X mk−1 p(mk|mk−1) qk−1. The components of vector qk|k−1, i.e. qj,k|k−1, were specified by (21). The integral which features in (34) can also be computed analytically. This integral equals: I = Z ˜g1(nk|A, βk) p(A|n1:k−1, β0:k−1) dA (35) = Z P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1)dA (36) where P(n; λ) is the Poisson distribution introduced in (4). Recall the explanation presented in Sec.IV-A about the update of the parameters of the Gamma distribution, summarised by (26)-(28). Effectively we have shown there that: G A; ηk−1 + nk, θk−1 1 + c(i) k θk−1 ! = P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1) R P(nk; λk = c(βk) · A) G(A; ηk−1, θk−1)dA (37) where the integral in the denominator is I, see (36). Hence, the integral can be expressed as: I = P(nk|λk = c(βk) · A) G(A; ηk−1, θk−1) G(A; ηk−1 + nk, θk−1/(1 + c(βk)θk−1)) (38) and is computed for an arbitrary chosen value of A > 0. Based on (34), let us summarise the expression for an unnormalised importance weight as: ˜wk = ϕ(βk, qk−1, ηk−1, θk−1, nk, zk) (39) Importance weights determine in a probabilistic manner which particles will survive (and possibly multiply) during the resam- pling step of the RBPF. C. Information gain Suppose the posterior distribution at time k −1, p(yk−1|ζ1:k−1, u1:k−1), is approximated by a set of particles: Yk−1 = n β(i) k−1, q(i) k−1, η(i) k−1, θ(i) k−1 oN i=1 , (40) where random sample β(i) k−1 consists of the searcher position p(i) k−1 = [x(i) k−1 y(i) k−1]⊺and the position of the source p(i) s = [X(i) Y (i)]⊺, see (15). The weights of the particles in Yk−1 are equal, because sensor control is carried out after resampling, i.e. w(i) k−1 = 1/N, i = 1, . . . , N. The question is how to compute the information gain (13) for each u ∈Uk, based on particles Yk−1. We adopt the ideal measurement approximation for this, that is, in hypothesizing the future count measurement (resulting from action u), we assume: (1) action u will be carried out correctly, that is the transitional density p(pk|pk−1, uk) will be replaced by deter- ministic mapping: pk = pk−1 + uk, and (2) the measurement count will be equal to the mean of ˜g1(nk|A, βk), that is λk (rounded off to the nearest integer). Since we are after the expected value of the gain, that is E{D(u)}, we will create an ensemble of “future ideal measurements” {n(j) k }M j=1. Expectation is then approximated by a sample mean, i.e. E{D(u)} ≈1 M M X j=1 D(j)(u) 8 where D(j)(u) was computed using n(j) k . The ensemble of “future ideal measurements” {n(j) k }M j=1 is created as follows. For each action u, choose at random a set of M particle indices ij ∈{1, . . . , N}, j = 1, . . . , M. Action u is then expected to move the searcher to location p(ij) k = p(ij) k−1+u. Then a “future ideal measurement” is n(j) k = ⌊A(ij) · c(ij)⌉, where c(ij) as a function of p(ij) k , p(ij) s was defined by (25), A(ij) ∼G(A; ηk−1, θk−1), and ⌊·⌉denotes the nearest integer function. It remains to explain how to compute the gain D(j)(u) based on n(j) k . Distribution π0(s, pk|uk), which features in (13), can be approximated using the particle set Yk−1 as follows: πo(s, pk, u) ≈G(A; ηk−1, θk−1) N X i=1 w(i) k−1δ(ps−p(i) s , pk−p(i) k ) (41) where p(i) k ∼p(pk|p(i) k−1, u). The updated distribution is π1(s, pk|u, n(j) k ) = ˜g1(n(j) k |pk, ps, A)πo(s, pk|u) R ˜g1(n(j) k |pk, ps, A)πo(s, pk|u)dsdpk . (42) Substitution of (41) and (42) into (13) leads to: D(j)(u) ≈−2 log PN i=1 w(i) k−1J (i)(n(j) k ) hPN i=1 w(i) k−1I(i)(n(j) k ) i1/2 (43) where I(i)(nk) is computed via (38) and J (i)(nk) = Z h P(nk; λ(i) k = c(β(i) k )A) i1/2 × G(A; η(i) k−1, θ(i) k−1)dA (44) Integral (44) can be evaluated numerically. D. Implementation The pseudo-code of one cycle of the search algorithm is presented in Algorithm 1. The input consists of the particle set Yk−1, defined by (40). Selection of the control vector uk (line 2 of Algorithm 1) is described in Algorithm 2. Explanation of the steps in Algorithm 1 are described first. Estimation of the state vector via the RBPF is carried out in lines 4-18. According to (15), random vector β(i) k−1 consists of p(i) k−1, X(i) and Y (i). Since the source location, (X(i), Y (i)), is static, only the component p(i) k−1 is propagated to future time k in line 6. In line 7, equation (39) is applied to compute the unnormalised weights of each particle. The map, represented by the probability of existence of each link, is updated in line 8, based on the expression (23). The parameters of Gamma distribution are update in lines 9-11. The weights assigned to each quadruple (β(i) k , q(i) k , η(i) k , θ(i) k ) are normalised in line 14. Resampling of particles is carried out in lines 15-18. The particles for source position p(i) s are not restricted to the grid nodes and after the resampling step, their diversity is improved by regularisation [25]. Finally, the output is the particle set Yk. The selection of a control vector, described by Algorithm 2, starts with postulating the set Uk in line 2. For every u ∈Uk, Algorithm 1 The searcher algorithm 1: Input: Yk−1 2: Run Algorithm 2 to select the control vector uk 3: Apply control uk and collect measurements zk, nk 4: Yk = ∅; Yk = ∅ 5: for i = 1, . . . , N do 6: Draw p(i) k ∼p(pk|p(i) k−1, uk) 7: ˜w(i) k = ϕ(β(i) k , q(i) k−1, η(i) k−1, θ(i) k−1, nk, zk) 8: q(i) k = ψ(q(i) k−1, β(i) k , zk) 9: η(i) k = η(i) k−1 + nk 10: Compute constant c(i) k as a function of β(i) k using (25) 11: θ(i) k = θ(i) k−1/(1 + c(i) k θ(i) k−1) 12: Yk = Yk ∪{(β(i) k , q(i) k , η(i) k , θ(i) k )} 13: end for 14: w(i) k = ˜w(i) k / PN j=1 ˜w(j) k , for i = 1, . . . , N 15: for i = 1, . . . , N do 16: Select index ji ∈{1, . . . , N} with probability w(i) k 17: Yk = Yk ∪{(β(ji) k , q(ji) k , η(ji) k , θ(ji) k )} 18: end for 19: Output: Yk the algorithm anticipates j = 1, . . . , M future measurements n(j) k (line 9) and accordingly computes a sample of the reward D(j)(u) (line 14). The expected reward is then a sample mean (line 16). Finally the optimal one-step ahead control is selected in line 18. It has been observed in simulations that one step ahead control can sometimes lead to situations where the observer position switches eternally between two or three nodes of the lattice. In order to overcome this problem, we adopt a heuristic as follows: if a node has been visited in the last 10 search steps more than 3 times, the next motion control vector is selected at random. While a multi-step ahead searcher control would be preferable than adopted heuristic, it would also be computationally more demanding. Multi-step ahead searcher control remains to be explored in future work. Algorithm 2 Selection of control vector 1: Input: Yk−1 2: Create the set of admissible controls Uk = {·, ↑, →, ↓, ←} 3: for every u ∈Uk do 4: for j = 1, . . . , M do 5: Choose at random particle index ij ∈{1, . . . , N} 6: p (ij) k = p (ij) k−1 + u; 7: Compute c (ij ) k using p (ij) k and p (ij) s via (25) 8: Adopt A(ij) = η (ij) k−1 · θ (ij) k−1 9: n(j) k = ⌊A(ij) · c (ij) k ⌉ 10: for i = 1, . . . , N do 11: Compute I(i)(n(j) k ) via (38) 12: Compute J (i)(n(j) k ) via (44) 13: end for 14: Compute D(j)(u) using (43) 15: end for 16: Estimate E{D(u)} as a sample mean of {D(j)(u)}M j=1 17: end for 18: Select control vector uk ∈Uk using (12) 9 V. NUMERICAL RESULTS A. An illustrative run We applied the described search algorithm to the search area modelled by the random grid shown in Fig.2. Prior knowledge available to the searcher is illustrated by Fig.1: the radius of the search area is R0 = 9; the centre is c = (0, 0) and the total number of potential links in the complete grid modelling the search area is L = 572. The parameters of the emitting source to be estimated are: X = 0, Y = 7 and A0 = 12. The searcher initial position is p0 = (9, −4). Dynamic model p(mk|mk−1) is is a 2×2 transitional prob- ability matrix with diagonal and off-diagonal elements 0.999 and 0.001, respectively1. Dynamic model p(pk|pk−1, uk) can be expressed as: p(pk|pk−1, uk) =(1 −pe)δ(pk −pk−1 + uk)+ X v∈Uk\uk pe |Uk| −1δ(pk −pk−1 + v) (45) where in simulations we used the value pe = 0.04. The parameters of detection matrices, which define the likelihood function g2(zk|pk, mk), are as follows: for primary observable links, pd = 1 and pfa = 0; for secondary observable links pd = 0.8 and pfa = 0.1. The RBPF used N = 4000 particles with M = 400 samples used in the averaging of information gain. The particle set Y0 at initial time is created as follows: p(i) 0 = p0, for all i = 1, . . . , N particles; the source location vector is drawn from a uniform distribution over a circle with centre c and radius R0, i.e. p(i) s ∼UCircle(c,R0)(ps); link existence probabilities are set to qj,0 = 0.5, for all j = 1, . . . , L links; finally, the parameters of initial Gamma distribution G(A; η0, θ0) were selected as η0 = 15 and θ0 = 1. We terminate the search algorithm when the searcher steps on the source. At this point we compare the true source location with the current estimate of the posterior distribution of the searcher position, approximated by particles {p(i) k }N i=1. If the true source position is contained in the support defined by {p(i) k }N i=1, the search is considered successful. Fig. 5 illustrates a typical run of the search algorithm. The true path of the searcher on this run is shown in Fig.5.(a). It took the searcher 53 time steps to reach the source. During the search, the motion control vector failed to execute correctly on two occasions. The final estimate of the map (i.e of existing links of the square lattice) is shown in Fig.5.(b). This figure shows only the links whose probability of existence is higher than 0.6. The blue circles in Fig.5.(b) indicate the posterior distribution of the searcher final position. Its true position, which is the same as the source position, is included in the support of this posterior, meaning that the search was successful. Moreover, on this occasion, the MAP estimate of the searcher final position coincides with the truth. Fig.5.(c) shows the measured values of the count number nk along the path. As we discussed in introduction, the measurements 1The structure of the search domain must be stable (recall that the count measurement model is valid for a steady-state), hence the changes in the statuses of links are very rare. Table I THE AVERAGE PERFORMANCE OF THE SEARCH ALGORITHM: DIFFERENT SOURCE LOCATIONS, A0 = 12 Source Shortest Number of Success location path length search steps rate [%] (0, 7) 20 42.1 94 (0, 1) 14 34.0 95 (2, −5) 8 28.8 99 are sporadic, especially in the beginning, when the distance between the searcher and the source is large: among the first ten count measurements, only three indicated a non-zero tracer concentration. An avi video file, illustrating a single run of the algorithm, is supplied with the submission of this paper. B. Monte Carlo runs The average performance of the search algorithm in the described scenario has been assessed via Monte Carlo runs. If the search on a particular run was successful, its corresponding search time is used in averaging. A run is declared unsuccess- ful if the source has not been found after k = 100 discrete- time steps. We also keep the statistics on the success rate of the search. The results obtained via averaging over 100 Monte Carlo runs are presented in Table I for three different locations of the source, i.e. (X, Y ) = (0, 7), (0, 1), (2, −5). The three locations correspond to the shortest path distances (from the searcher initial position p0 = (9, −4) to the source) of 20, 14 and 8 unit lengths, respectively. All other parameters were the same as described above for the illustrative run. As expected, the results in Table I indicate that the search is quicker and more reliable (i.e. with a higher success rate) for a source which is closer to the searcher initial position. Table II presents the results for a source at location (0, 7) but with three different values of the source release-rate, i.e. A0 = 8, 12, 16. The results indicate that the search is quicker for a source characterised by a higher release rate. The expla- nation of this trend is as follows. Initially, when the searcher is far from the source, its measurements of tracer concentration are very small, typically zero, hence uninformative. During this phase of the search, the searcher effectively moves according to a ‘diffusive’ (or random walk) model, which is slower that the so-called ’ballistic’ movement associated with information driven search [2]. The random walk phase is longer for a weaker source, which contributes to the overall longer search time in this case. As a specific numerical example, we have also validated that a purely random search never manages to find the source at (0, 7) in the given time-frame of 100 discrete-time steps. VI. SUMMARY The paper considers a very difficult problem of autonomous search for a diffusive point source of tracer in an environment whose structure is unknown. Sequential estimation and motion control are carried out in highly uncertain circumstances with the state space including, in addition to the source parameters, 10 −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 (a) −10 −5 0 5 10 −10 −8 −6 −4 −2 0 2 4 6 8 10 (b) 0 5 10 15 20 25 30 35 40 45 50 55 0 5 10 15 20 25 30 Discrete−time index k Count measurements nk (c) Figure 5. An illustration of a single run of the search algorithm: (a) The true path of the searcher (blue circles); (b) The final estimate of the map (existing links) and the searcher position; (c) measured counts nk over time Table II THE AVERAGE PERFORMANCE OF THE SEARCH ALGORITHM: DIFFERENT SOURCE RELEASE-RATES, SOURCE LOCATION 0, 7) Release Number of Success rate A0 search steps rate [%] 8 49.5 78 12 42.1 94 16 38.2 93 the map of the search area and the searcher position within the map. The paper develops mathematical models of mea- surements, it formulates the sequential Bayesian solution (in the form of a Rao-Blackwellised particle filter) and proposes an information driven motion control of the searcher. The numerical results demonstrate the concept, indicating high success-rates in comparison with random walk. There are many areas for further research and improvements of the concepts introduces in this paper. One direction is to explore the potential benefits of analytical results available from the percolation theory in carrying out olfactory search. Another is to investigate more efficient particle filters for source parameter estimation (being a deterministic part of the state space). Finally, the search strategies based on multiple steps ahead (rather than myopic search) are expected to improve the performance in a domain with obstacles. APPENDIX An approximate model of mean concentration, independent of the grid structure, was introduced in Sec.II-C. This model is a solution of the Laplace equation (1) for a circular search area in the absence of obstacles, with a boundary condition θ(r = R0) = 0, but using different values of parameters. More specifically, the obstacles in the search area are incor- porated in this model via homogenization (volume/ensemble averaging) of the diffusion equation (similar to the effective media approach [26], [27]), so that (1) is replaced with D ∆⟨θ⟩= A0δ(x −X, y −Y ), (46) where D is the re-scaled diffusivity that accounts for such a homogenization, and ⟨θ⟩is the time/ensemble averaged tracer concentration. The new (often called effective) diffusivity D is related to ’unobstructed’ diffusivity D0 of (1) via the formula D = fcD0. The scaling parameter 0 ≤fc ≤1 (known as tortuosity [14], [28]) describes the effect of obstacles (their shape and packing density [14], [27]). According to (46), the decrease of the effective diffusivity of the tracer due to the presence of obstacles has the same effect as an appropriate increase of source release-rate (i.e. A = A0/fc), with unchanged diffusivity in (46) (i.e. D = D0), where pa- rameters D0, A0 correspond to their values in an unobstructed space, see (1). We arrive at a conclusion that the effect of obstacles can be approximately incorporated with imprecision (overestimation) of the source release-rate, without any effect on the source position. Since the main goal of the search algorithm is to find the source, then such inaccuracy in release- rate estimation becomes irrelevant for the performance of the algorithm. This means that as the first approximation for 11 adopted measurements model we can still use (46) with known diffusivity D = D0 and some unknown A. Estimation of A0 (if required) can be implemented retrospectively based on a theoretical model for fc [28]). For the lattice models an expression for fc can be derived analytically by employing the framework of percolation theory, resulting in the expression fc = (1 −p/pc)α, where pc = 1/2 (percolation threshold on square lattice), p is the fraction of missing links in the incomplete square lattice and α = 1.30 [13], [14]. If the number of missing links is small, we can adopt approximation fc ≈1 and A ≈A0. In line with the above comments we will use (46) as a foundation for the measurements model that is independent of the structure of the search domain. The solution of (46) for a tracer source located at the center of circle (X = Y = 0), is given by [29]: ⟨θ⟩= A 2 log[(zz∗)/R2 0], (47) where z = x + iy is the complex coordinate and z∗is its complex-conjugate. To find the solution for configurations other than the circular domain with the source in the centre, we employ the property of conformal invariance of the Laplace equation [29]. We illustrate this method with a source placed inside the circular domain, but away from its centre (that is at coordinates (X, Y ) s.t. √ X2 + Y 2 < R0). If we can find a conformal transformation ω(z) that maps an arbitrary position of the source (X, Y ) back to the center of the circular domain, then we can still use the solution (47), but with the substitution z →ω(z). Therefore, for an arbitrary position of the source inside the search area √ X2 + Y 2 < R0 we can write ⟨θ⟩= (κ/2) log[(ww∗)/R2 0]. (48) The required conformal transformation is the well-known M¨obius map (see [29]) w(z) = R0(z −Z) ZZ∗−R2 0 , (49) where Z = X + iY . After straightforward calculations we arrive at the solution given by (5) and (6). We point out that the model is not restricted to a circular search area. According to the theory of analytical functions, a conformal mapping to the circle always exists for arbitrary simply connected domain, and therefore can be computed analytically or numerically [29]. REFERENCES [1] G. M. Viswanathan, V. Afanasyev, S. V. Buldyrev, E. J. Murphy, P. A. Prince, and H. E. Satnley, “Levy fligh search patterns of wandering albatrosses,” Nature, vol. 381, pp. 413–415, 1996. [2] O. B´enichou, C. Loverdo, M. Moreau, and R. Voituriez, “Intermittent search strategies,” Rev. Mod. Phys., vol. 83, pp. 81–129, Mar 2011. [3] A. M. Hein and S. A. McKinley, “Sensing and decision-making in random search,” PNAS, July 2012. [4] J. A. Farrell, S. Pang, and W. Li, “Plume mapping via hidden Markov models,” IEEE Trans. Systems, Man, Cybernetics - Part B: Cybernetics, vol. 33, no. 6, pp. 850–863, 2003. [5] W. Li, J. A. Farrell, S. Pang, and R. M. Arrieta, “Moth-inspired chemical plume tracking on an autonomous underwater vehicle,” IEEE Trans Robotics, vol. 22, no. 2, pp. 292–307, 2006. [6] J. Oyekan and H. Hu, “A novel bio-controller for localizing pollution sources in a medium peclet environment,” Journal of Bionic Engineer- ing, vol. 7, no. 4, pp. 345–353, 2010. [7] H. Ishida, G. Nakayama, T. Nakamoto, and T. Morizumi, “Controlling a gas/odor plume tracking robot based on transient responses of gas sensors,” IEEE Sensors Journals, vol. 5, no. 3, pp. 537–545, 2005. [8] M. Vergassola, E. Villermaux, and B. I. Shraiman, “‘infotaxis’ as a strategy for searching without gradients,” Nature, vol. 445, no. 25, pp. 406–409, 2007. [9] G. L. Iacono, “A comparison of different searching strategies to locate sources of odor in turbulent flows,” Adaptive Behavior, vol. 18, no. 2, pp. 155–170, 2010. [10] B. Ristic, M. Morelande, and A. Gunatilaka, “Information driven search for point sources of gamma radiation,” Signal Processing, vol. 90, pp. 1225–1239, 2010. [11] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005. [12] A. Marjovi and L. Marques, “Multi-robot olfactory search in structured environment,” Robotics and Autonomous Systems, vol. 59, pp. 867–881, 2011. [13] D. ben Avraham and S. Havlin, Diffusion and reaction in fractals and disordered systems. Cambridge University Press, 2000. [14] S. Torquato, Heterogeneous materials. Springer, 2002. [15] J. G. Kemeny and J. L. Snell, Finite Markov chains. New York: Van Nostrand, 1960. [16] A. P. S. Selvadurai, Partial differential equations in Mechanics 1. Springer, 2000. [17] R. Burioni and D. Cassi, “Random walks on graphs: ideas, techniques and results,” Journal of Physics A: Mathematical and General, vol. 38, no. 8, p. R45, 2005. [18] A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice. Springer, 2001. [19] T. Dean and K. Kanazawa, “A model for reasoning about persistence and causation,” Computational intelligence, vol. 5, no. 2, pp. 142–150, 1989. [20] E. Chong, C. Kreucher, and A. Hero, “POMDP approximation using simulation and heuristics,” in Foundations and Applications of Sensor Management, A. Hero, D. Castanon, D. Cochran, and K. Kastella, Eds. Springer, 2008, ch. 8. [21] T. Kailath, “The divergence and Bhattacharyya distance measures in signal selection,” IEEE Trans. Communic. Technology, vol. 15, no. 1, pp. 52–60, 1967. [22] A. Doucet, N. de Freitas, K. P. Murphy, and S. J. Russell, “Rao- Blackwellised particle filtering for dynamic Bayesian networks,” in Pro- ceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 2000, pp. 176–183. [23] “Gamma-distribution,” in Encyclopedia of mathematics, M. Hazewinkel, Ed. Springer, 2001. [24] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis, 2nd ed. CRC Press, 2003. [25] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman filter: Particle filters for tracking applications. Artech House, 2004. [26] C. Nicholson, “Diffusion and related transport mechanisms in brain tissue,” Reports on progress in physics, vol. 64, pp. 815–884, 2001. [27] I. L. Novak, P. Kraikivski, and B. M. Slepchenko, “Diffusion in cytoplasm: Effects of excluded volume due to internal membranes and cytoskeletal structures,” Biophysical journal, vol. 97, pp. 758–767, 2009. [28] L. Pisani, “Simple expression for the tortuosity of porous media,” Transport in Porous Media, vol. 88, no. 2, pp. 193–203, 2011. [29] A. Prosperetti, Advanced mathematics for applications. Cambridge University Press, 2011.