arXiv:1610.05882v2 [cs.RO] 19 Oct 2021

Cognitive Indoor Positioning and Tracking using Multipath Channel Information

Erik Leitinger, Paul Meissner, and Klaus Witrisal

Abstract: This paper presents a robust and accurate positioning system that adapts its behavior to the surrounding environment, mimicking the capability of the visual brain to filter out clutter and focus attention on activity and relevant information. Especially in indoor environments, which are characterized by harsh multipath propagation, robust positioning is still hard to achieve under the constraint of reasonable infrastructural needs. In such environments it is essential to separate relevant from irrelevant information and to attain an appropriate uncertainty model for the measurements that are used for positioning.

Index Terms: Cognitive dynamic systems, Cramér-Rao bounds, localization, simultaneous localization and mapping, radio channel models

I. INTRODUCTION

A. Motivation and State of the Art

For radio-based positioning in indoor environments, which are characterized by harsh multipath propagation, achieving the needed level of accuracy robustly (see footnote 1) under the constraint of reasonable infrastructural needs remains elusive. In such environments it is essential to separate relevant from irrelevant information and to attain an appropriate uncertainty model for the measurements that are used for positioning. To approach this objective more closely, the four basic principles of human cognition, namely the perception-action cycle (PAC), memory, attention, and intelligence [1], are implemented in the positioning system, as schematically illustrated in Fig. 2. To incorporate all these principles, the concepts of multipath-assisted indoor navigation and tracking (MINT) [2]–[5] are intertwined with the principles of cognitive dynamic systems (CDS) that were developed in [6]–[10].
Evidently, a perceptive system has to reason with measurements under uncertainty [11], i.e., it has to treat the gained information probabilistically [12], [13], but it also has to deliberately take actions on the environment and thereby influence the measurements so that its reasoning favors relevant over irrelevant information. Hence, cognitive processing of measurement data for positioning seems to be a natural choice to overcome such severe impairments.

MINT exploits specular multipath components (MPCs) that can be associated with the local geometry, as illustrated in Fig. 1. MPCs can be interpreted as signals originating from additional virtual sources, so-called virtual anchors (VAs). These VAs are mirror images of a physical anchor w.r.t. the flat surfaces, as illustrated in Fig. 1 [2], [14], [15]. This additional position-related information can be extracted from the radio signals. For a proper consideration of uncertainties in the floor plan and to account for the stochastic nature of the radio signals, a geometry-based probabilistic environment model (GPEM) and a geometry-based stochastic channel model (GSCM) were introduced in [16]–[19], extending MINT to a simultaneous localization and mapping (SLAM) approach. Such a system acquires and adapts information about its surrounding environment online and is able to continuously build up a consistent memory in a Bayesian sense.

Footnote 1: We define robustness as the percentage of cases in which a system can achieve its given potential accuracy.

The idea of combining MINT with a CDS is to gain control over the observed environment information in order to (i) provide as much position-related information as possible to the Bayesian state estimator, achieving the highest level of reliability/robustness in position estimation, (ii) improve the separation between relevant and irrelevant information, and (iii) build up a consistent environment and action memory.
By actively planning the next control actions on the environment using the Bayesian memory, in the sense of waveform adaptation [6], [20]–[22] or mobile agent motor control [23], [24], the relevant information return contained in the signals can be maximized. The information-flow coupling between the perceptor-actor system and the surrounding environment is given by the PAC, which plays the key role in gathering relevant environment information [1], [10].

The core feedback loop of the cognitive dynamic system, the perception-action cycle, resembles the idea of optimally choosing future measurements based on a physical model while reasoning under uncertainty. This principle has been explored by the physics community under the term Bayesian experimental design [25]. This decision-theoretic process gives a mathematical justification for selecting the appropriate optimality criterion under uncertainty that maximizes the utility function of the posterior probability density function, such that the new model information of the acquired measurements can be predicted. Information-theoretic measures such as the conditional entropy [26], the mutual information [26], or the determinant of the Fisher information matrix [27], [28] are suitable utility functions for this process. The active selection of measurement parameters has a lot in common with cognitive perception and control at the lowest layer. However, it lacks an explicit description of a layered memory structure that, in combination with algorithmic attention, leads to an “intelligent” behavior of the overall system.

II. MINT CONCEPTS

In this section we review basic elements of MINT [3], [29], starting with the signal model, then discussing the estimation
of the MPC parameters, and finally introducing position-related information that is of main importance for a proper weighting of the MPC-VA relations in the Bayesian tracking filter. All not-geometrically-modeled propagation effects in the signals, so-called diffuse multipath (DM) [30], constitute interference to the useful position-related information.

A. Geometry-based Stochastic Signal Model (GSCM)

Our signal model is the following. During time step $n$, a baseband radio signal $s(t)$ is transmitted from the $j$-th physical anchor located at position $\mathbf{a}_1^{(j)} \in \mathbb{R}^2$, $j \in \{1, \dots, J\} = \mathcal{J}$, to a mobile agent at position $\mathbf{p}_n \in \mathbb{R}^2$. The corresponding received signal is given as [3]

$$ r_n^{(j)}(t) = \sum_{k=1}^{K_n^{(j)}} \alpha_{k,n}^{(j)}\, s\big(t - \tau_{k,n}^{(j)}\big) + s(t) * \nu_n^{(j)}(t) + w(t). \tag{1} $$

Here, the first term describes the contributions from $K_n^{(j)}$ specular MPCs with complex amplitudes $\alpha_{k,n}^{(j)}$ and delays $\tau_{k,n}^{(j)}$, where $k \in \{1, \dots, K_n^{(j)}\} = \mathcal{K}_n^{(j)}$. The delays $\tau_{k,n}^{(j)}$ correspond to the distances between the agent and the $j$-th physical anchor (for $k = 1$) or the VAs of the $j$-th physical anchor (for $k \in \{2, \dots, K_n^{(j)}\}$). Thus, $\tau_{k,n}^{(j)} = \|\mathbf{p}_n - \mathbf{a}_k^{(j)}\| / c$, where $\mathbf{a}_k^{(j)} \in \mathbb{R}^2$ is the position of the respective (physical or virtual) anchor and $c$ is the speed of light. The energy of the transmitted signal $s(t)$ is assumed to be normalized to one. The second term in (1) denotes the convolution of $s(t)$ with the diffuse multipath (DM) $\nu_n^{(j)}(t)$, which is modeled as a non-stationary zero-mean Gaussian random process.
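The geometric part of (1) can be illustrated with a small numerical sketch. It mirrors an anchor across a single wall to obtain a first-order VA and converts the resulting distances into delays via $\tau = \|\mathbf{p} - \mathbf{a}\| / c$. The wall parametrization (a point on the wall plus a normal vector), the function names, and all coordinates are assumptions made for illustration; they are not taken from the paper.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def mirror_point(anchor, wall_point, wall_normal):
    """Mirror a 2-D point across the line through wall_point with the given normal."""
    nx, ny = wall_normal
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    # Signed distance of the anchor from the wall along the unit normal.
    d = (anchor[0] - wall_point[0]) * nx + (anchor[1] - wall_point[1]) * ny
    return (anchor[0] - 2.0 * d * nx, anchor[1] - 2.0 * d * ny)


def mpc_delay(agent, source):
    """Delay of the path from a (physical or virtual) anchor to the agent, in seconds."""
    return math.dist(agent, source) / C_LIGHT


# Hypothetical geometry: wall along x = 0, anchor and agent in the half-plane x > 0.
anchor = (1.0, 2.0)
va = mirror_point(anchor, wall_point=(0.0, 0.0), wall_normal=(1.0, 0.0))
agent = (3.0, 2.0)
delays = [mpc_delay(agent, a) for a in (anchor, va)]
```

In this toy geometry the reflected path is longer than the direct one, so the second delay is larger; higher-order VAs would follow by mirroring a VA across a further wall.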
Considering uncorrelated scattering along the delay axis $\tau$, the auto-correlation function of $\nu_n^{(j)}(t)$ is given by $\mathbb{E}_{\nu}\{\nu_n^{(j)}(\tau)\,[\nu_n^{(j)}(u)]^{*}\} = S_{\nu,n}^{(j)}(\tau)\,\delta(\tau - u)$, where $S_{\nu,n}^{(j)}(\tau)$ represents the power delay profile of the DM. The DM process $\nu_n^{(j)}(t)$ is assumed to be quasi-stationary in the spatial domain, which means that $S_{\nu,n}^{(j)}(\tau)$ does not change in the vicinity of $\mathbf{p}_n$ [31]. Note that the DM component interferes with the useful position-related information. The last term in (1), $w(t)$, is additive white Gaussian noise with double-sided power spectral density $N_0/2$.

B. MPC Parameter Estimation

The delays of the MPCs at agent position $\mathbf{p}_n$ are estimated from the received signals using a sparse Bayesian channel estimator [32]. The algorithm estimates up to a predefined maximum number $M$ of MPCs, yielding estimated delays $\hat{\tau}_{m,n}^{(j)}$ and the corresponding complex amplitudes $\hat{\alpha}_{m,n}^{(j)}$, with $m \in \{1, \dots, M_n^{(j)}\} = \mathcal{M}_n^{(j)}$. The estimated delays are scaled by the speed of light $c$ and used as noisy distance measurements $z_{m,n}^{(j)} = c\,\hat{\tau}_{m,n}^{(j)}$ in the proposed multipath-assisted SLAM algorithm. Furthermore, in a real-world MINT system, the amplitude estimates $\hat{\alpha}_{m,n}^{(j)}$ (after being associated with the $k$-th anchor) are fed into a higher-level, non-Bayesian algorithm that determines the signal-to-interference-plus-noise power ratio (SINR) between the useful specular MPC and the DM plus noise. This SINR is related to the range standard deviation $\sigma_{m,n}^{(j)}$ (see [29], [33] for details). Note that an extension to additional parameters besides the delay (and the corresponding amplitude), for example the angle-of-arrival and angle-of-departure of the MPCs, is straightforward.

C.
Position and Range Uncertainty

As a performance measure and lower bound on the position error we use the Cramér-Rao lower bound (CRLB), defined by the inequality $\mathbb{E}\{\|\mathbf{p} - \hat{\mathbf{p}}\|^2\} \geq \mathrm{tr}\{\boldsymbol{\mathcal{I}}_{\mathbf{p}}^{-1}\}$, where $\boldsymbol{\mathcal{I}}_{\mathbf{p}}$ is the equivalent Fisher information matrix (EFIM) [3], [34], [35] for the position vector and $\mathrm{tr}\{\cdot\}$ is the trace operator. Assuming no path overlap between MPCs, the EFIM is formulated for a set of anchors in a canonical form by [3]

$$ \boldsymbol{\mathcal{I}}_{\mathbf{p},n} = \frac{8\pi^2\beta^2}{c^2} \sum_{j=1}^{J} \sum_{k=1}^{K_n^{(j)}} \mathrm{SINR}_{k,n}^{(j)}\, \boldsymbol{\mathcal{I}}_{r}\big(\phi_{k,n}^{(j)}\big), \tag{2} $$

where $\beta$ denotes the effective (root mean square) bandwidth of $s(t)$ and $\boldsymbol{\mathcal{I}}_{r}(\phi_{k,n}^{(j)})$ is the ranging direction matrix, which is a rank-one matrix with an eigenvector in direction $\phi_{k,n}^{(j)}$ from the agent to the $k$-th VA. The signal-to-interference-plus-noise ratios (SINRs) are given by the ratio of the energy of the deterministic MPCs to that of the interfering DM plus noise,

$$ \mathrm{SINR}_{k,n}^{(j)} = \frac{|\alpha_{k,n}^{(j)}|^2}{N_0 + T_p\, S_{\nu,n}^{(j)}\big(\tau_{k,n}^{(j)}\big)}. \tag{3} $$

The corresponding MPC range uncertainties $\big(\sigma_{k,n}^{(j)}\big)^2 = \mathrm{var}\{z_{k,n}^{(j)}\}$ for already associated VAs are bounded as

$$ \big(\sigma_{k,n}^{(j)}\big)^2 \geq \left( \frac{8\pi^2\beta^2}{c^2}\, \mathrm{SINR}_{k,n}^{(j)} \right)^{-1}. \tag{4} $$

D. Geometry-based Probabilistic Environment Model (GPEM)

Fig. 1 illustrates the probabilistic geometric environment model. A signal exchanged between an anchor at position $\mathbf{a}_1^{(j)}$ and an agent at $\mathbf{p}^{(m)}$ contains specular reflections at the room walls, indicated by the black lines (see footnote 2). These reflections can be modeled geometrically using the VAs $\mathbf{a}_k^{(j)}$ with $k = 1, \dots, K^{(j)}$, which are mirror images of the $j$-th anchor w.r.t. the walls [2], [14], [15]. The number of VAs per anchor $j$ is denoted by $K^{(j)}$. The VAs of all anchors are collected in $\mathcal{A}_n = \{\mathcal{A}_n^{(j)}\}_{j=1}^{J}$, where $\mathcal{A}_n^{(j)} = \{\mathbf{a}_{k,n}^{(j)}\}_{k=1}^{K_n^{(j)}}$.
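Equations (2) and (4) can be evaluated numerically for a two-dimensional scenario. The sketch below assumes that the ranging direction matrix has the usual rank-one form $\mathbf{u}\mathbf{u}^T$ with $\mathbf{u} = [\cos\phi, \sin\phi]^T$, which matches the description in the text but is not spelled out there; the function names and the example bandwidth and SINR values are likewise illustrative assumptions.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def range_std_bound(sinr, beta_hz):
    """Lower bound on the range standard deviation from (4), in meters."""
    return C_LIGHT / (math.sqrt(8.0) * math.pi * beta_hz * math.sqrt(sinr))


def efim_2d(angles_rad, sinrs, beta_hz):
    """Entries (i11, i12, i22) of the symmetric 2x2 EFIM in (2).

    Assumes the ranging direction matrix is u u^T with u = [cos(phi), sin(phi)].
    """
    scale = 8.0 * math.pi ** 2 * beta_hz ** 2 / C_LIGHT ** 2
    i11 = i12 = i22 = 0.0
    for phi, sinr in zip(angles_rad, sinrs):
        ux, uy = math.cos(phi), math.sin(phi)
        i11 += scale * sinr * ux * ux
        i12 += scale * sinr * ux * uy
        i22 += scale * sinr * uy * uy
    return i11, i12, i22


def position_error_bound(i11, i12, i22):
    """Square root of tr(EFIM^-1), using the closed-form 2x2 inverse."""
    det = i11 * i22 - i12 * i12
    return math.sqrt((i11 + i22) / det)


# Hypothetical example: two MPCs arriving from orthogonal directions.
beta = 1.0e9          # effective bandwidth in Hz (assumed value)
sinrs = [100.0, 100.0]
peb = position_error_bound(*efim_2d([0.0, math.pi / 2.0], sinrs, beta))
```

With two orthogonal directions and equal SINRs the EFIM is a scaled identity, so the position error bound equals $\sqrt{2}$ times the per-path range bound; a single MPC alone would leave the EFIM singular, which is why at least two distinct directions are needed in this sketch.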
To be able to cope with uncertainties in the floor plan, the deterministic geometric model of the VA positions $\mathbf{a}_k^{(j)}$ of the $j$-th anchor is extended to a probabilistic one, as shown in Fig. 1. The VA positions and the agent position $\mathbf{p}^{(m)}$ are represented by a joint PDF $p\big(\mathbf{p}^{(m)}, \mathbf{a}_1^{(j)}, \mathbf{a}_2^{(j)}, \dots, \mathbf{a}_{K_n^{(j)}}^{(j)}\big)$. If the position of the $j$-th anchor is assumed to be known exactly, the joint PDF reduces to $p\big(\mathbf{p}^{(m)}, \mathbf{a}_2^{(j)}, \dots, \mathbf{a}_{K_n^{(j)}}^{(j)}\big)$. The joint PDF of the agent and the VA positions is represented by a multivariate Gaussian RV; Fig. 1 shows the marginal distributions of the agent $p(\mathbf{p}^{(m)})$ (dashed black ellipses) and of the VA positions $p(\mathbf{a}_k^{(j)})$ (red ellipses). The marginal distribution $p(\mathbf{a}_{\mathrm{false}})$ (dashed red ellipse) defines a wrongly detected VA at position $\mathbf{a}_{\mathrm{false}}$. The anchor position $\mathbf{a}_1^{(j)}$ is assumed to be known perfectly.

Footnote 2: Since the radio channel is reciprocal, the assignment of transmitter and receiver roles to anchors and agents is arbitrary and this choice can be made according to the application scenario.

[Figure 1: floor plan showing the anchor $\mathbf{a}_1^{(j)}$, the VAs $\mathbf{a}_2^{(j)}$, $\mathbf{a}_3^{(j)}$, $\mathbf{a}_4^{(j)}$, the falsely detected VA $\mathbf{a}_{\mathrm{false}}$, and the agent $\mathbf{p}$.]

Fig. 1. Illustration of the VAs for the $j$-th anchor and an agent with PDFs $p(\mathbf{a}_k^{(j)})$ and $p(\mathbf{p}^{(m)})$, respectively. The VA at position $\mathbf{a}_{\mathrm{false}}$ represents a falsely detected VA.

Uncertainty in the floor plan does not just mean that the VA positions are uncertain and thus described by RVs, but also that floor plan information may be incorrect, inconsistent, or entirely missing. Positioning and tracking algorithms based on VAs therefore have to take this lack of knowledge into account.

E. Probabilistic Data Association (PDA)

The state of the agent $\mathbf{x}_n = [\mathbf{p}_n^T, \mathbf{v}_n^T]^T$, where $\mathbf{v}_n$ is the velocity, evolves over time instances $n$ according to the state transition probability density function (PDF) $p(\mathbf{x}_n | \mathbf{x}_{n-1})$. From each VA in $\mathcal{A}_n^{(j)}$ and the predicted agent position, a set of expected MPC distances $\mathcal{D}_n^{(j)}$ at time step $n$ is computed. The MPC distances described in Section II-B are subject to data association uncertainty, i.e., it is not known which measurement in $\mathbf{z}_n^{(j)}$ originated from which VA $k$ of the $j$-th anchor, and it is also possible that a measurement $z_{m,n}^{(j)}$ did not originate from any VA (false alarm, clutter) or that a VA did not give rise to any measurement (missed detection). The probability that a VA is detected is denoted by $P_d$. Possible associations at time instance $n$ are described by the $K_n^{(j)}$-dimensional random vector $\mathbf{b}_n^{(j)} = \big[b_{1,n}^{(j)} \cdots b_{K_n^{(j)},n}^{(j)}\big]^T$, whose $k$-th entry is defined as [5], [17]–[19], [36], [37]

$$ b_{k,n}^{(j)} = \begin{cases} m \in \{1, \dots, M\}, & \mathbf{a}_k^{(j)} \text{ generates measurement } z_{m,n}^{(j)}, \\ 0, & \mathbf{a}_k^{(j)} \text{ did not give rise to any measurement.} \end{cases} $$

We also define $\mathbf{b}_n = \big[\mathbf{b}_n^{(1)T} \cdots \mathbf{b}_n^{(J)T}\big]^T$. The number of false alarms is modeled as Poisson distributed with mean $\mu$, and the distribution of each false alarm measurement is described by the PDF $f_{\mathrm{FA}}\big(z_{m,n}^{(j)}\big)$ [38], [39], which enters the likelihood for the event that a measurement corresponds to a false alarm.

The statistical dependence of the distance measurement vectors $\mathbf{z}_n = \big[\mathbf{z}_n^{(1)}, \cdots, \mathbf{z}_n^{(J)}\big]^T$ on the agent state vector $\mathbf{x}_n$ and the association vector $\mathbf{b}_n$ is described by the global likelihood function $f(\mathbf{z}_n | \mathbf{x}_n, \mathbf{b}_n)$. Under commonly used assumptions about the statistics of the measurements [2], [38], [39], the global likelihood function at time instance $n$ factors as

$$ f(\mathbf{z}_n | \mathbf{x}_n, \mathbf{b}_n) = \prod_{j=1}^{J} \left( \prod_{m=1}^{M} f_{\mathrm{FA}}\big(z_{m,n}^{(j)}\big) \right) \prod_{k \in \mathcal{Q}(\mathbf{x}_n, \mathbf{b}_n^{(j)})} \frac{f\big(z_{b_{k,n}^{(j)},n}^{(j)} \,\big|\, \mathbf{x}_n; \mathbf{a}_k^{(j)}, \sigma_{k,n}^{(j)}\big)}{f_{\mathrm{FA}}\big(z_{b_{k,n}^{(j)},n}^{(j)}\big)}, $$

where $\mathcal{Q}(\mathbf{x}_n, \mathbf{b}_n^{(j)}) \triangleq \{k \in \{1, \dots, K_n^{(j)}\} : b_{k,n}^{(j)} \neq 0\}$. The local likelihood function $f\big(z_{m,n}^{(j)} | \mathbf{x}_n; \mathbf{a}_k^{(j)}, \sigma_{k,n}^{(j)}\big)$ is related to a noisy measurement of the distance to VA $\mathbf{a}_k^{(j)}$ at agent position $\mathbf{p}_n$, which is modeled as

$$ z_{k,n}^{(j)} = \|\mathbf{p}_n - \mathbf{a}_k^{(j)}\| + v_{k,n}^{(j)}, $$

where $v_{k,n}^{(j)}$ is a zero-mean Gaussian random variable with standard deviation $\sigma_{k,n}^{(j)}$ as described in (4). Based on the factorized likelihood model, a probabilistic data association algorithm is used to compute the associations between the expected delays to the VAs and the estimated MPCs using belief propagation as described in [5], [17]–[19], [36], [37].
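To make the roles of $P_d$, $\mu$, and $f_{\mathrm{FA}}$ concrete, the following sketch computes association probabilities for a single VA in isolation. It is a deliberate simplification and not the belief propagation algorithm of [5], [17]–[19]: it ignores the constraint that a measurement can be used by at most one VA. The hypothesis weights $1 - P_d$ for a missed detection and $P_d\, f(z_m | \cdot) / (\mu f_{\mathrm{FA}}(z_m))$ for measurement $m$ follow the common PDA convention and are an assumption here, as are the uniform false alarm PDF over $[0, d_{\max}]$ and all names and numbers.

```python
import math


def gaussian_pdf(z, mean, sigma):
    """Gaussian PDF of a range measurement, cf. the local likelihood function."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def associate_single_va(measurements, agent, va, sigma, p_detect, mu_fa, d_max):
    """Return (most probable b, probabilities) for one VA; b = 0 means missed detection.

    Assumed hypothesis weights (standard PDA convention, not taken from the paper):
      b = 0: 1 - P_d
      b = m: P_d * f(z_m | x; a, sigma) / (mu * f_FA(z_m))
    with a uniform false alarm PDF f_FA = 1 / d_max.
    """
    expected = math.dist(agent, va)
    f_fa = 1.0 / d_max
    weights = [1.0 - p_detect]
    for z in measurements:
        weights.append(p_detect * gaussian_pdf(z, expected, sigma) / (mu_fa * f_fa))
    total = sum(weights)
    probs = [w / total for w in weights]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical scene: the expected distance to the VA is 5 m.
best, probs = associate_single_va(
    measurements=[2.0, 5.02, 9.0], agent=(0.0, 0.0), va=(3.0, 4.0),
    sigma=0.05, p_detect=0.9, mu_fa=1.0, d_max=20.0,
)
```

Here the second measurement lies well within the range uncertainty of the expected distance and dominates the other hypotheses; if no measurement is close to the expected distance, the missed-detection hypothesis $b = 0$ wins instead.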
The most probable MPC-to-anchor associations are obtained by means of an approximation of the maximum a posteriori (MAP) detector [40]

$$ \hat{b}_{k,n}^{(j)\,\mathrm{MAP}} \triangleq \arg\max_{b_{k,n}^{(j)} \in \{1, \dots, M\}} p\big(b_{k,n}^{(j)} \,\big|\, \mathbf{z}\big). \tag{5} $$

After the PDA has been applied for all anchors, the following union sets are defined:

• The set of associated discovered (and optionally a priori known) VAs $\mathcal{A}_{n,\mathrm{ass}} = \bigcup_j \mathcal{A}_{n,\mathrm{ass}}^{(j)}$.
• The corresponding set of associated measurements $\mathcal{Z}_{n,\mathrm{ass}} = \bigcup_j \mathcal{Z}_{n,\mathrm{ass}}^{(j)}$.
• The set of remaining measurements $\bar{\mathcal{Z}}_{n,\mathrm{ass}} = \bigcup_j \bar{\mathcal{Z}}_{n,\mathrm{ass}}^{(j)}$, which are not associated with VAs of $\mathcal{A}_{n,\mathrm{ass}}$.

F. MINT-SLAM

In its most generic form, and using the Markovian assumption, the prediction equation for the VAs $\mathcal{A}_n$ and the agent state $\mathbf{x}_n = [\mathbf{p}_n^T, \mathbf{v}_n^T]^T$ can be written as

$$ p(\mathbf{x}_n, \mathcal{A}_n | \mathcal{Z}_{1:n-1}) = \int p(\mathbf{x}_{n-1}, \mathcal{A}_{n-1} | \mathcal{Z}_{1:n-1})\, p(\mathbf{x}_n | \mathbf{x}_{n-1})\, p(\mathcal{A}_n | \mathcal{A}_{n-1})\, \mathrm{d}\{\mathbf{x}_{n-1}, \mathcal{A}_{n-1}\}, \tag{6} $$

where $p(\mathbf{x}_n | \mathbf{x}_{n-1})$ and $p(\mathcal{A}_n | \mathcal{A}_{n-1})$ are the state transition PDFs of the agent and the VAs, respectively. The latter can be represented by an identity function. The update equation is then

$$ p(\mathbf{x}_n, \mathcal{A}_n | \mathcal{Z}_{1:n}) = \frac{p(\mathcal{Z}_n | \mathbf{x}_n, \mathcal{A}_n)\, p(\mathbf{x}_n, \mathcal{A}_n | \mathcal{Z}_{1:n-1})}{p(\mathcal{Z}_n | \mathcal{Z}_{1:n-1})}, \tag{7} $$

where $p(\mathcal{Z}_n | \mathbf{x}_n, \mathcal{A}_n)$ is the likelihood function of the current measurements. Assuming that the agent moves along a path
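according to a linear Gaussian constant-velocity motion model, the state-space model is defined as

$$ \mathbf{x}_n = \mathbf{F}\mathbf{x}_{n-1} + \mathbf{G}\mathbf{n}_{a,n} = \begin{bmatrix} 1 & 0 & \Delta T & 0 \\ 0 & 1 & 0 & \Delta T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{x}_{n-1} + \begin{bmatrix} \frac{\Delta T^2}{2} & 0 \\ 0 & \frac{\Delta T^2}{2} \\ \Delta T & 0 \\ 0 & \Delta T \end{bmatrix} \mathbf{n}_{a,n}, \tag{8} $$

where $\Delta T$ is the discrete-time update interval. The driving acceleration noise term $\mathbf{n}_{a,n}$ is zero-mean and circularly symmetric with variance $\sigma_a^2$, and models motion changes that deviate from the constant-velocity assumption. The transformed noise covariance matrix is given as $\mathbf{R}_a = \sigma_a^2 \mathbf{G}\mathbf{G}^T$. The entire state space of $\mathbf{x}_n$ and the associated VAs $\mathcal{A}_{n,\mathrm{ass}}$ described in (6) is formulated as [4], [16]

$$ \tilde{\mathbf{x}}_n = \begin{bmatrix} \mathbf{F} & \mathbf{0}_{4 \times 2K_n} \\ \mathbf{0}_{2K_n \times 4} & \mathbf{I}_{2K_n \times 2K_n} \end{bmatrix} \tilde{\mathbf{x}}_{n-1} + \begin{bmatrix} \mathbf{G} \\ \mathbf{0}_{2K_n \times 2} \end{bmatrix} \mathbf{n}_{a,n}, \tag{9} $$

where $\tilde{\mathbf{x}}_n = [\mathbf{x}_n^T, \mathbf{a}_{2,n}^T, \dots, \mathbf{a}_{K_n,n}^T]^T$ represents the stacked state vector with $\{\mathbf{a}_{k,n}^{(j)}\} \in \mathcal{A}_{n,\mathrm{ass}}$. The covariance matrix of the state vector consists of the agent covariance matrix $\mathbf{C}_{\mathbf{x}_n}$, the cross-covariances $\mathbf{C}_{\mathbf{x}_n, \mathbf{a}_{k,n}}$ between the agent state $\mathbf{x}_n$ and the VAs at positions $\mathbf{a}_{k,n}$, the cross-covariances between VAs $\mathbf{C}_{\mathbf{a}_{k,n}, \mathbf{a}_{k',n}}$ with $k \neq k'$, and the covariances of the VAs $\mathbf{C}_{\mathbf{a}_{k,n}}$. The measurement model is defined as

$$ \mathbf{z}_n = \tilde{\mathbf{h}}_n(\tilde{\mathbf{x}}_n) + \tilde{\mathbf{n}}_{z,n}, \tag{10} $$

where $\mathbf{z}_n$ is defined via (5), with the corresponding stacked measurement noise vector $\tilde{\mathbf{n}}_{z,n}$. The measurement model $\tilde{\mathbf{h}}_n$ contains all distance equations $\|\mathbf{a}_{n,k}^{(j)} - \mathbf{p}_n\|$ $\forall\, \mathbf{a}_{n,k}^{(j)} \in \mathcal{A}_{n,\mathrm{ass}}$ needed to update the agent and the VAs, respectively. An unscented Kalman filter (UKF) is used as the Bayesian state estimator [4]. The measurement covariance matrix is written as

$$ \mathbf{R}_n = \mathrm{diag}\big\{ \mathrm{var}\{ z_{k,n}^{(j)} \} \big\} \quad \forall\, k, j : \mathbf{a}_{k,n}^{(j)} \in \mathcal{A}_{n,\mathrm{ass}}, \tag{11} $$

where the range variances are defined by (4).

III.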
COGNITIVE POSITIONING SYSTEM

The basic building blocks of a CDS, namely the perception-action cycle (PAC), the cognitive perceptor (CP), the information feedback, and the cognitive controller (CC), are depicted in Fig. 2. All of these blocks are reciprocally coupled and form a hierarchical structure that makes it possible to interpret the environmental observables on different abstraction layers.

A. Multipath-assisted Positioning as CDS

Figure 2 illustrates the block diagram of a cognitive localization and tracking system with a three-layered structure:

• First Layer: Defines (i) the direct Bayesian state estimation $p(\mathbf{x}_n | \mathcal{Z}_n, \mathbf{c}_n)$ at the CP, which holds the agent position and its velocity, and (ii) the cognitive control parameters $\mathbf{c}_n$ at the CC, based on the feedback information of the Bayesian state-space filter.
• Second Layer: Represents (i) at the CP, the memory for the GPEM, described by the VAs with marginal PDF $p(\mathcal{A}_n^{(j)} | \mathcal{Z}_n^{(j)}, \mathbf{c}_n)$, and the memory for the GSCM, described by the $\mathrm{SINR}_{k,n}^{(j)}$ of the MPCs, and (ii) at the CC, the memory of VA-specific waveform parameters on which the cognitive control is based.
• Third Layer: Represents the highest layer and differs from the two layers below in that it defines the application driven by the cognitive localization/tracking system. The CP memory of applications holds abstract parameters or structures of the specified application, and the CC enables the motor control for realizing higher goal planning [41].

The first and second layers describe the signal and information processing of the model parameters of the surrounding physical environment and the radio channel. The third layer, on the other hand, holds higher goal parameters, i.e., motor-control inputs to fulfill navigation goals, that are based on the physically related parameters [41]–[43].

B.
Feedback Information

The system is able to adapt its behavior to the environment online, i.e., perceptual attention is given, through the following principles:

• At the CP side, the GSCM and GPEM memories are updated using the received signal $r_n(t, \mathbf{c}_n)$ with waveform parameters chosen by the CC.
• In the actual sensing cycle, the CC uses the control parameters $\mathbf{c}_n$ to put attention on the potential set of VAs and their parameters memorized in the GSCM and GPEM. These model parameters are shown at the CP side of Fig. 2.

Now the question is, “How to control the environment information flow through the received signal and put cognitive attention on the relevant features in the following sensing cycle?” The answer lies in the CC and in the feedback and feedforward information between the perceptor and the controller, as illustrated in Fig. 2. The control parameter vector $\mathbf{c}_{n+1}$ of the next sensing cycle is chosen in order to gain the most “valuable” position-related information from the new set of measurements $\tilde{\mathcal{Z}}_{n+l}$ using the predicted posterior $p(\mathbf{x}_{n+l}, \mathcal{A}_{n+l} | \tilde{\mathcal{Z}}_{n+l}, \mathbf{b}_{n+l}, \mathbf{c}_{n+l})$, where the predicted measurements $\tilde{\mathcal{Z}}_{n+l}$ depend on the chosen signal model and $l = 0, \dots, l_{\mathrm{future}}$ spans the future horizon. This goal can be reached by minimizing an expected cost-to-go function, yielding

$$ \mathbf{c}_{n+1} = \arg\min_{\mathbf{c}_n} C\big( p(\mathbf{x}_{n+l}, \mathcal{A}_{n+l} | \tilde{\mathcal{Z}}_{n+l}, \mathbf{b}_{n+l}, \mathbf{c}_{n+l}) \big), \tag{12} $$

where $C(\cdot)$ is the expected cost-to-go function for optimal control [25], [43] of the environmental information contained in $\tilde{\mathcal{Z}}_{n+l}$. The expected cost-to-go function is based on an information-theoretic measure that should depend on the environment parameters, such as the VA-specific $\mathrm{SINR}_{k,n}^{(j)}$, and serves as feedback information in the CDS. In general, estimation and control problems have to deal with probabilistic states and observations.
As a consequence, the control also has to be probabilistic, i.e., the cost function or utility must handle uncertainties. Since the uncertainty of the state is described by a PDF, a measure of the informativeness of measurements has to be defined on the posterior state distribution. Two commonly used information measures of an RV are the entropy [44] and the Fisher information [28].

[Figure 2: block diagram. Cognitive controller (policy $\pi_n$: waveform selection; learning and planning), cognitive perceptor (Bayesian state filter $\tilde{\mathbf{x}}_n$, $\tilde{\mathbf{C}}_n$; Bayesian VA memory $\{\mathbf{a}_{k,n}, \mathbf{C}_{\mathbf{a}_{k,n}}\}$, $\{\mathbf{C}_{\mathbf{x}_n,\mathbf{a}_{k,n}}, \mathbf{C}_{\mathbf{a}_{k,n},\mathbf{a}_{k',n}}\}$, $\{\mathrm{SINR}_{k,n}\}$; short-term memory $z^{-1}$), and environment, coupled through feedforward and feedback information within the perception-action cycle.]

Fig. 2. Block diagram of the cognitive indoor positioning and tracking system that uses multipath channel information.

C. Information Measures for Feedback

1) Fisher Information: The Fisher information matrix (FIM) of an RV $\mathbf{r}$, dependent on the deterministic parameter $\mathbf{p}$, can also be used as a measure of information. Using the log-likelihood function $\ln f(\mathbf{r}; \mathbf{p})$, it is defined as

$$ \boldsymbol{\mathcal{I}}_{\mathbf{p}} = \mathbb{E}_{\mathbf{r};\mathbf{p}} \left\{ \left[ \frac{\partial}{\partial \mathbf{p}} \ln f(\mathbf{r}; \mathbf{p}) \right] \left[ \frac{\partial}{\partial \mathbf{p}} \ln f(\mathbf{r}; \mathbf{p}) \right]^T \right\}. \tag{13} $$

2) Entropy: For a continuous-valued vector RV $\mathbf{p} \in \mathbb{R}^L$ (in the following sections $\mathbf{p}$ represents the agent position), the conditional entropy is given as [26]

$$ h(\mathbf{p}) \doteq -\mathbb{E}_{\mathbf{p}}\{\ln p(\mathbf{p})\} = -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{p}) \ln p(\mathbf{p})\, \mathrm{d}\mathbf{p}. \tag{14} $$

The entropy is directly related to the uncertainty of the corresponding RV. For a multivariate Gaussian RV $\mathcal{N}(\mathbf{m}_{\mathbf{p}}, \mathbf{C}_{\mathbf{p}})$ this means that the entropy is directly related to the covariance matrix $\mathbf{C}_{\mathbf{p}}$, yielding

$$ h(\mathbf{p}) = \frac{1}{2} \ln\big( (2\pi e)^L \det \mathbf{C}_{\mathbf{p}} \big), \tag{15} $$

where $\det(\cdot)$ denotes the determinant of a matrix. The determinant of the covariance matrix $\mathbf{C}_{\mathbf{p}}$ is a measure of the “volume” of uncertainty of $\mathbf{p}$.
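As a worked instance of (15) for the two-dimensional position case ($L = 2$), the sketch below evaluates the Gaussian entropy from a $2 \times 2$ covariance matrix. The function name and the example covariances are illustrative assumptions.

```python
import math


def gaussian_entropy_2d(cov):
    """Entropy (in nats) of a 2-D Gaussian with covariance cov, following (15) with L = 2."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)


# Hypothetical position covariances in square meters.
broad = gaussian_entropy_2d(((1.0, 0.0), (0.0, 1.0)))    # 1 m standard deviation per axis
tight = gaussian_entropy_2d(((0.01, 0.0), (0.0, 0.01)))  # 10 cm standard deviation per axis
```

Shrinking the covariance lowers the entropy, and for continuous RVs the value can become negative, so only entropy differences between candidate actions are meaningful as feedback.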
The more compact the volume is, the smaller is the entropy $h(\mathbf{p})$ and, consequently, the more informative is the distribution $p(\mathbf{p})$.

The inverse of the FIM is a lower bound on the covariance of an estimator $\hat{\mathbf{p}}$ of the deterministic parameter $\mathbf{p}$, i.e., $\mathbf{C}_{\hat{\mathbf{p}}} \succeq \boldsymbol{\mathcal{I}}_{\mathbf{p}}^{-1}$ [28]. Looking at the entropy of the estimator's distribution $\mathcal{N}(\hat{\mathbf{p}}, \mathbf{C}_{\hat{\mathbf{p}}})$, the explicit relationship between the FIM $\boldsymbol{\mathcal{I}}_{\mathbf{p}}$ of $\mathbf{r}$ (dependent on $\mathbf{p}$) and the entropy $h(\hat{\mathbf{p}})$ is given as

$$ h(\hat{\mathbf{p}}) = \frac{1}{2} \log\big( (2\pi e)^L \det(\mathbf{C}_{\hat{\mathbf{p}}}) \big) \geq \frac{1}{2} \log\big( (2\pi e)^L \det(\boldsymbol{\mathcal{I}}_{\mathbf{p}}^{-1}) \big). \tag{16} $$

As the relationship in (16) shows, one can connect the FIM of a parameter vector with the entropy, resulting in a scalar measure of information that is valuable for choosing optimal waveform parameters, as needed for a cognitive positioning system. As shown in Section II-C, the FIM $\boldsymbol{\mathcal{I}}_{\mathbf{p}}$ on the position of the agent $\mathbf{p}$ contains the environment and signal parameters, e.g., the VA positions and the corresponding SINRs. With this, a direct relationship between the environment, the feedback information, and the control of the sensing is given, closing the PAC (Fig. 2). In the same manner, the system can also be extended to information-based control of the agent
state to increase the informativeness of the measurements [23], [24], [42].

IV. COGNITIVE MINT

A. Cognitive Controller: Reinforcement Learning (RL)

As already stated in Section III, the control parameters should be chosen in order to optimize the expected cost-to-go function $C(\cdot)$ of the predicted posterior PDF as defined in (12). In general, the expected cost-to-go function for a Bayesian state-space filter can be written as

$$ C\big( p(\mathbf{x}_{n+1}, \mathcal{A}_{n+1} | \tilde{r}_{n+1}(t, \mathbf{c}_n)) \big) = \bar{g}\big( \boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n) \big), \tag{17} $$

where $\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)$ is the predicted posterior state-estimation error, which depends on the control parameters, and $\bar{g}(\cdot)$ defines the cost-to-go function of the transmitter. The conditional entropy was discussed as a possible information measure for the feedback; thus a possible cost-to-go function $\bar{g}(\cdot)$ of the transmitter is the conditional entropy of the predicted posterior state-estimation error $\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)$, given as $\bar{g}\big(\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)\big) = h\big(\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)\big)$ [26], [45]. This entropy, conditioned on the control parameter vector $\mathbf{c}_n$, is directly coupled with the posterior covariance matrix of the Bayesian tracking filter, which is lower bounded by the inverse of the EFIM in (2). The entropy of the predicted posterior state-estimation error (assuming a Gaussian approximation) is given as

$$ h\big( \boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n) \big) \propto \det\big( \tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n) \big), \tag{18} $$

where $\tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n)$, which is lower bounded by the inverse of the FIM $\boldsymbol{\mathcal{I}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n)$, is the predicted covariance matrix of the state vector as described in Section II-F, provided by the Bayesian state-space filter (UKF) and dependent on the control parameter vector $\mathbf{c}_n$. Thus, the entropy in (18) is directly coupled with the position-related information that is contained in the measurement noise covariance matrix $\mathbf{R}_n$ described by (11).
How the introduced algorithm uses the state-space and measurement model equations of the Bayesian state-space estimator is described in more detail in Sections IV-B2 and IV-B3.

For readability of the following derivations of the control optimization algorithm, the cost-to-go of the CC (18) is rewritten as $\bar{g}\big(\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)\big) = h(\mathbf{x}_{n+1}, \mathbf{c}_n)$ with $\mathbf{c}_n \in \mathcal{C}$, where $\mathcal{C}$ is the space of cognitive actions with size $|\mathcal{C}|$, which represents the waveform library in our case. Consequently, the next set of waveform parameters has to be chosen in order to minimize the cost-to-go of the next posterior entropy. As elaborated in [46], dynamic programming represents an optimal solution for such problems, but unfortunately it is based on the assumption that the state to be controlled is “perfectly” perceivable. Hence, methods have been introduced that are capable of handling imperfect state information [47], with the drawback that they are computationally complex. In [6], [45], approximate dynamic programming was used for optimal control. There, the trace of the posterior covariance matrix was used as cost-to-go function to reduce the computational complexity. The policy for control parameter selection in the transmitter at time instance $n$ seeks the set of waveform parameters for which the cost-to-go function $\bar{g}\big(\boldsymbol{\epsilon}_{n+1|n+1}(\mathbf{c}_n)\big) \approx \mathrm{tr}\big[\tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n)\big]$ is minimized over a rolling future horizon of $l_{\mathrm{future}}$ predicted states.

In practice, it is difficult to construct all state transition probabilities from one state to another that are conditioned on the selected actions, including the cost incurred as a result of each transition. RL (see footnote 3) [48] represents an approximation of dynamic programming [46], [47] for solving such optimal control and future planning tasks with high computational efficiency.
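The trace-based policy described above can be sketched as a one-step greedy selection over a small waveform library. This is only an illustration under strong assumptions: a linearized information-form update of a $2 \times 2$ position covariance stands in for the UKF, the range variances are taken at the bound (4), there is no rolling horizon and no RL, and the library entries (bandwidths and per-VA SINRs) are hypothetical numbers.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def range_variance(sinr, beta_hz):
    """Range variance at the bound (4), in square meters."""
    return C_LIGHT ** 2 / (8.0 * math.pi ** 2 * beta_hz ** 2 * sinr)


def inv2(m):
    """Inverse of a 2x2 matrix given as nested sequences."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def predicted_covariance(prior_cov, angles_rad, sinrs, beta_hz):
    """Linearized information-form update: P^-1 + sum_k u_k u_k^T / sigma_k^2."""
    info = [list(row) for row in inv2(prior_cov)]
    for phi, sinr in zip(angles_rad, sinrs):
        w = 1.0 / range_variance(sinr, beta_hz)
        ux, uy = math.cos(phi), math.sin(phi)
        info[0][0] += w * ux * ux
        info[0][1] += w * ux * uy
        info[1][0] += w * ux * uy
        info[1][1] += w * uy * uy
    return inv2(info)


def select_waveform(prior_cov, angles_rad, library):
    """Pick the library entry with the smallest trace of the predicted covariance."""
    costs = {}
    for name, (beta_hz, sinrs) in library.items():
        cov = predicted_covariance(prior_cov, angles_rad, sinrs, beta_hz)
        costs[name] = cov[0][0] + cov[1][1]
    return min(costs, key=costs.get), costs


# Hypothetical setup: two VAs seen under orthogonal angles, two candidate waveforms.
prior = ((0.04, 0.0), (0.0, 0.04))
angles = [0.0, math.pi / 2.0]
library = {
    "wide_low_sinr": (2.0e9, [2.0, 2.0]),
    "narrow_high_sinr": (0.5e9, [100.0, 100.0]),
}
best, costs = select_waveform(prior, angles, library)
```

In this example the narrower waveform is selected because its higher per-VA SINR outweighs its smaller bandwidth in (4); the point of the sketch is that the choice depends on the memorized SINRs, not on bandwidth alone.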
In the RL literature the cost-to-go function is termed the value-to-go function $J_n(\mathbf{c}_n)$, which is updated online for every PAC based on the immediate rewards $r_n$. The immediate reward $r_n$ is a measure of the “quality” of an action $\mathbf{c}_n$ taken on the environment. Using the Markovian assumption and following [8], it is given by

$$ r_n = g_n\big( h(\mathbf{x}_{n-1}, \mathbf{c}_{n-1}) - h(\mathbf{x}_n, \mathbf{c}_n) \big), \tag{19} $$

where $h(\mathbf{x}_n, \mathbf{c}_n) \propto \det\big(\mathbf{C}_{\mathbf{x}_n}(\mathbf{c}_n)\big)$ and $g_n(\cdot)$ is an arbitrary scalar operator that in its most general form could also depend on the time instance $n$ [8]. A reasonable function for the reward is the scaled change in the posterior entropy from one PAC to the next, i.e.,

$$ r_n = \mathrm{sign}\big( \Delta h(\mathbf{x}_n, \mathbf{c}_n) \big)\, \big| \log\big( |\Delta h(\mathbf{x}_n, \mathbf{c}_n)| \big) \big|, \tag{20} $$

where $\Delta h(\mathbf{x}_n, \mathbf{c}_n)$ denotes the entropy difference in the argument of (19). A positive reward favors the current action $\mathbf{c}_n$ as the future action $\mathbf{c}_{n+1}$; conversely, a negative one leads to a penalty for this action. As described in [8], the cognitive RL algorithm has to find the optimal future action $\mathbf{c}_{n+1}$ for the next PAC based on the immediate reward $r_n$ and the learned value-to-go function $J_n(\mathbf{c}_n)$. For computing the expected costs of future actions, as is done in dynamic programming, RL divides the computation of the value-to-go function into two parts: (i) the learning phase, which incorporates the actual measured reward into the value-to-go function based on actions $\mathbf{c}_n$ and $\mathbf{c}_{n-1}$, and (ii) the planning phase, which incorporates predicted future rewards into the value-to-go function. Whereas for learning a “real” reward is perceived from the environment, for planning only model-based predicted rewards are perceived from the internal perceptor memory using the feedforward link. A faster convergence to the optimal control policy can be achieved in this way.

B.
Learning and Planning: Algorithm  The   value-to-go function   that   is   used   in   the   cognitive controller is defined as [8]  J n ( c ) =   E π n  { r n   +   γr n +1   +   γ 2 r n +2   +   · · · | c n   =   c }   ,   (21) where   r n   with   c   ∈   C   is the actual reward,   r n + l   are the predicted future rewards that are based on the GPEM and GSCM parameter that are used by the Bayesian filter,  0   < γ   ≤   1   is the discount factor for future rewards based on action   c n   ∈ C   and the expected value is calculated using the cognitive policy  π n ( c ′ ,   c ) =   P   [ c n +1   =   c ′ | c n   =   c ]   ,   c ,   c ′   ∈ C ,   (22)  3 RL represents an intermediate learning procedure that lies between super- vised and unsupervised learning as stated in [45].
where $P[\cdot|\cdot]$ defines a conditional PMF that describes the transition probabilities of all actions $c \in \mathcal{C}$ over time instances $n$. Following the derivations in [8], the value-to-go function can be reformulated in an incremental recursive manner, yielding

$$J_n(c) \leftarrow J_n(c) + \alpha \Big[ R(c) + \gamma \sum_{c'} \pi_n(c, c') J_n(c') - J_n(c) \Big], \qquad (23)$$

where $R(c) = E_{\pi_n}\{r_n \mid c_n = c\}\ \forall c \in \mathcal{C}$ denotes the expected immediate reward and $\alpha > 0$ is the learning rate. The algorithm for updating the value-to-go function can be found in the Appendix of [4]. The incremental recursive update in (23) means that, for all actions $c \in \mathcal{C}$, the value-to-go function is updated using the expected immediate reward and the policy $\pi_n(c, c')$ for all these actions.

1) Learning from Applied Actions: With the value of the immediate reward $r_n$, a new value is learned for the value-to-go function for the currently selected action $c_n$ using $J_n(c_n) \leftarrow (1 - \alpha) J_n(c_n) + \alpha R(c_n)$ of (23). This accounts for the “real” physical action on the environment. Since only one parameter set can be chosen as an action per PAC, it would take at least $|\mathcal{C}|T$ seconds to apply all actions to the environment and collect the corresponding immediate rewards, where $T$ is the time period of a PAC. Unfortunately, this results in a poor convergence rate of the algorithm and unacceptable behavior in time-variant environments. A possible remedy is the planning of future actions based on the state-space and measurement model of the Bayesian state estimator.

2) Planning for Improving Convergence Behavior: Planning is defined as predicting expected future rewards using the state and measurement model of the Bayesian state-space filter to improve the convergence rate of the RL algorithm. As depicted in Fig. 2, the feedforward link is used to connect the controller with the perceptor. The feedforward information is a hypothesized future action, which is selected for a future planning stage. Inspecting (23), one can observe that for every action $c \in \mathcal{M}$, where $\mathcal{M} \subset \mathcal{C}$ is a subset of $\mathcal{C}$ depending on the actual policy $\pi_n$, the predicted posterior covariance matrices $\tilde{C}_{n+l}(c)$ and the corresponding predicted future rewards $r_{n+l}$ are computed with decreasing discount factor $\gamma^l$ for $l = 1, \dots, l_{\mathrm{future}}$, where $l_{\mathrm{future}}$ is the future horizon. The predicted covariance matrices $\tilde{C}_{n+l}(c)$ for a specific future action $c$ are computed using the state-space model (e.g. (9)) and the measurement model (e.g. (10)) of the Bayesian state-space estimator and the corresponding GPEM and GSCM parameters stored in the perceptor's memory, as shown in Fig. 2. After the planning process is finished, the value-to-go function is updated for all actions $c \in \mathcal{M}$. Finally, the actual PAC is closed by updating the policy to $\pi_{n+1}$ using the value-to-go function $J_{n+1}$ and choosing the new action, i.e. the waveform parameters, for the next PAC according to this new policy. This means that the value-to-go function $J_n(c_n)$ and the policy $\pi_n$ are updated iteratively from one another, from one PAC to the next, with one important detail that is discussed below.

a) Explore/exploit trade-off: Both the planning process and the choice of new actions are based on the policy. In planning, the action subset $\mathcal{M}$ is defined by sampling from the policy $\pi_n$, and new actions are selected based on the updated policy $\pi_{n+1}$. Hence, the policy is responsible for the explore/exploit trade-off in the action space.
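The learning and planning updates of (23), together with the reward in (20), can be sketched as follows. This is a simplified single-step illustration under stated assumptions: the policy is stored as a row-stochastic matrix with `policy[c, c2]` read as the probability of choosing `c2` after `c`, the predicted rewards `R_pred` are supplied directly instead of being derived from predicted covariance matrices, and a zero entropy change is mapped to a zero reward because the logarithm is undefined there. All function names are hypothetical.

```python
import numpy as np

def reward(h_prev, h_curr):
    """Reward in the spirit of (20): sign(dh) * |log|dh||, dh = h_prev - h_curr."""
    dh = h_prev - h_curr
    if dh == 0.0:
        return 0.0                                 # assumption: no change, no reward
    return float(np.sign(dh) * abs(np.log(abs(dh))))

def learn(J, c_n, r_n, alpha):
    """Learning step for the applied action: J(c_n) <- (1 - alpha) J(c_n) + alpha r_n."""
    J_new = J.copy()
    J_new[c_n] = (1.0 - alpha) * J[c_n] + alpha * r_n
    return J_new

def plan(J, policy, R_pred, subset, alpha, gamma):
    """Planning step: apply the update (23) to every action in the subset M."""
    J_new = J.copy()
    for c in subset:
        target = R_pred[c] + gamma * policy[c] @ J  # expected discounted value
        J_new[c] = J[c] + alpha * (target - J[c])
    return J_new

# Toy example with three actions (hypothetical numbers).
r = reward(1e-8, 1e-9)                             # entropy decreased: positive reward
J = learn(np.zeros(3), c_n=1, r_n=2.0, alpha=0.5)
policy = np.full((3, 3), 1.0 / 3.0)                # uniform starting policy
J_planned = plan(J, policy, R_pred=np.ones(3), subset=[0, 2], alpha=0.5, gamma=0.9)
print(r, J, J_planned)
```

Only the applied action is touched by `learn`, while `plan` refreshes a whole subset of actions from model-based predictions, which is the mechanism the text credits for the improved convergence rate.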
A widely used method for balancing the exploration of new actions against the exploitation of the already learned value-to-go function $J_n(c_n)$ is the $\epsilon$-greedy strategy: with a small probability $\epsilon$ a random action is selected, representing pure exploration, and with probability $1 - \epsilon$ the action is chosen according to the maximum of the value-to-go function, representing pure exploitation. The random new action and the actions in the subset $\mathcal{M}$ can be drawn either from a uniform distribution over the action space $\mathcal{A}$ or from the policy $\pi_n$. The policy is computed using the Boltzmann distribution

$$\pi_{n+l}(c) = \frac{\pi_{n+l-1}(c)\,\exp\{\Delta J_{n+l}(c)/\tau\}}{\sum_{c'} \pi_{n+l-1}(c')\,\exp\{\Delta J_{n+l}(c')/\tau\}},$$

where $\tau$ defines the exploration degree and is referred to as the system temperature [49], and $\Delta J_{n+l}(c) = J_{n+l}(c) - J_{n-1+l}(c)$. The cognitive action is selected according to

$$c_n = \begin{cases} \text{random action} \sim \pi_{n+1} \in \mathcal{C} & \text{if } \xi < \epsilon \\ \arg\max_{c \in \mathcal{C}} J_n(c) & \text{otherwise,} \end{cases} \qquad (24)$$

where $0 \le \xi \le 1$ is a uniform random number drawn at each time step $n$. As stated above, the new action $c_{n+1}$ is selected according to (24) and applied to the environment so that the next PAC can start. The important concept of attention, at the perceptor as well as at the actuator side of the cognitive dynamic system, can be justified as follows:

• Perceptual attention: is given by the fact that the environment-dependent parameters, i.e. the marginal PDFs of the VAs $p(a_{k,n})$ and their multipath-channel-dependent reliability measures $\mathrm{SINR}_{k,n}$, are learned and updated online, so that the perceptual Bayesian state-space filter focuses its attention on the relevant position-related information in the received signal.

• Control attention: is given by the fact that the policy $\pi_n$, which is learned over time, and the corresponding subset of actions $\mathcal{M}$ put the focus on the “more relevant” actions. These actions in turn focus on the relevant position-related information in the received signal.

3) Waveform Library: The general form of the waveform library contains the control parameters $c_n = \{T^{j}_{\mathrm{p},n}, f^{j}_{\mathrm{c},n}\}_{j=1}^{J}$, i.e. a pulse duration and a carrier frequency for the $j$-th anchor. Hence, the VA-specific MPC parameters are estimated using specific sub-bands of the radio channel spectrum defined by the parameter pair $T^{j}_{\mathrm{p},n}$ and $f^{j}_{\mathrm{c},n}$, which in turn is chosen in an “optimal” manner. Optimal in this case means that the position-related information contained in the MPC parameters is maximized at the agent position $p_n$ (see (2)). Equations (3) and (2), which describe the parameters $\widetilde{\mathrm{SINR}}^{j}_{k,n}$, show the relation between the pulse parameter pair $T^{j}_{\mathrm{p},n}$ and $f^{j}_{\mathrm{c},n}$ and the position-related information contained in the channel. The pulse duration $T^{j}_{\mathrm{p},n}$ scales the amount of DM and is inversely proportional to the effective root mean square bandwidth $\beta$. The relation to $f^{j}_{\mathrm{c},n}$ is not that obvious, since
it describes the frequency dependency of the environment parameters and thus of the GSCM parameters, such as the complex amplitudes of the MPCs and the DM PDP. The set of selected VAs should lead to the highest overall SINR values (and accordingly the smallest range variances $\mathrm{var}\{\hat{d}^{j}_{k,n}\}$) and the smallest possible GDOP$^{4}$, i.e. a geometrically optimal constellation of VA positions, which is reflected by the ranging direction matrix $\mathbf{I}_r(\phi^{j}_{k,n})$. In a cognitive sense, this means that the actions $c \in \mathcal{A}$ are chosen to reduce the posterior entropy over time under quasi-stationary environment conditions.

4 The GDOP is the ratio between the position variance and the range variance [50]. For positioning, a small value indicates a high level of confidence that high precision can be reached. Hence, the GDOP indicates a “good” geometry for positioning, i.e. a good geometric placement of the anchors.

Fig. 3. Scenario for probabilistic MINT using cognitive sensing in the presence of additional DM interference (axes: x [m], y [m]). The anchors are at the positions $a^{(1)}_1$ and $a^{(2)}_1$. The black line represents the agent trajectory and the red part of the line indicates the agent positions where the DM interference is activated.

V. RESULTS

A. Measurement Setup

For the evaluation of this positioning approach, we use the seminar room scenario of the MeasureMINT database [51]. The measurements allow for 5 trajectories consisting of 1000 agent positions with a 1 cm spacing, as shown in Fig. 3. At each position, UWB measurements are available of the channel between the agent and the two anchors at the positions $a^{(1)}_1 = [0.5, 7]^T$ and $a^{(2)}_1 = [5.2, 3.2]^T$. The measurements have been performed using an M-sequence correlative channel sounder developed by Ilmsens. This sounder provides measurements over approximately the FCC frequency range, from 3 to 10 GHz. On the anchor and agent sides, dipole-like antennas made of Euro-cent coins have been used. They have an approximately uniform radiation pattern in the azimuth plane and zeroes in the directions of floor and ceiling.

Fig. 4. Performance CDF of the cognitive MINT algorithm using a smaller restricted set of VAs (axes: position error P(p) [m], CDF; curves: conventional MINT; cognitive MINT, win = 1; cognitive MINT, win = 5). Visibilities of the VAs are computed using the SINR instead of optical ray-tracing.

The chosen initial pulse duration is $T_{\mathrm{p}} = 0.5$ ns (corresponding to a bandwidth of 2 GHz) and the center frequency is $f_{\mathrm{c}} = 7$ GHz. The VAs for the anchors at the positions $a^{(1)}_1$ and $a^{(2)}_1$ were computed a priori up to order 2. The past window of agent positions for the SINR estimation is again chosen to be $w_{\mathrm{past}} = 40$. For all simulations, 30 Monte Carlo runs were conducted.

B. Initial Experiment Setup

For the sake of simplicity, we reduce the control parameters to just the carrier frequency $c_n = f_{\mathrm{c},n}$ for each PAC for all anchors, and we fix the pulse duration $T_{\mathrm{p}}$. This means that the cognitive MINT system adaptively finds, from PAC to PAC, the carrier frequency $f_{\mathrm{c},n}$ that yields the highest reward from the environment by maximizing the position-related information. Starting from the initial value $f_{\mathrm{c},1} = 7$ GHz (which represents the center of the measured bandwidth), the carrier frequency is adapted over time using the posterior entropy in (18). The finite space of cognitive actions $\mathcal{C}$ contains the discrete frequency values bounded by the measured bandwidth, i.e. $f_{\mathrm{c},n,i} \in \mathcal{C}$, where $i = 1, \dots, |\mathcal{C}|$. The frequency spacing between the frequency bins is equidistant, $\Delta f_{\mathrm{c}} = f_{\mathrm{c},n,i+1} - f_{\mathrm{c},n,i}$.
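To make the discrete action space concrete, the following sketch builds an equidistant carrier-frequency grid and applies the Boltzmann policy update and the $\epsilon$-greedy selection of (24) from Section IV-B. The grid centering, the small grid size, the temperature, and all function names are illustrative assumptions; the paper only states that the grid is equidistant and bounded by the measured bandwidth.

```python
import numpy as np

def frequency_grid(f_center, delta_f, n_actions):
    """Equidistant carrier frequencies centered on f_center (assumed placement)."""
    offsets = (np.arange(n_actions) - (n_actions - 1) / 2.0) * delta_f
    return f_center + offsets

def boltzmann_update(policy, delta_J, tau):
    """Reweight the previous policy by exp(delta_J / tau) and renormalize."""
    # Subtracting the maximum only improves numerical stability; it cancels out.
    w = policy * np.exp((delta_J - np.max(delta_J)) / tau)
    return w / w.sum()

def epsilon_greedy(J, policy, eps, rng):
    """With probability eps sample from the policy, otherwise take argmax of J."""
    if rng.random() < eps:
        return int(rng.choice(len(J), p=policy))   # exploration
    return int(np.argmax(J))                       # exploitation

rng = np.random.default_rng(0)
grid = frequency_grid(7.0e9, 50.0e6, 5)            # hypothetical 5-bin grid
policy = np.full(5, 1.0 / 5.0)                     # uniform starting policy
J_old = np.zeros(5)
J_new = np.array([0.0, 0.0, 1.0, 0.0, 0.0])        # hypothetical value-to-go
policy = boltzmann_update(policy, J_new - J_old, tau=1.0)
idx = epsilon_greedy(J_new, policy, eps=0.1, rng=rng)
print(grid[idx] / 1e9, policy)
```

A larger temperature `tau` flattens the updated policy and therefore favors exploration, whereas a small `tau` concentrates the probability mass on actions whose value-to-go has recently increased.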
For the experiments, we have chosen $\Delta f_{\mathrm{c}} = 50$ MHz, considering the large signal bandwidth of 2 GHz. The starting policy is defined as a uniform distribution $\pi_1(c', c) = \mathcal{U}(f_{\mathrm{c},n,1}, f_{\mathrm{c},n,|\mathcal{C}|})$ and the value-to-go function is initialized to $J_1(c) = 0\ \forall c$. The size of the planning subspace is $|\mathcal{M}| = 20$; the size of $\mathcal{C}$ is $|\mathcal{C}| = 40$.

C. Discussion of Performance Results

1) Conventional MINT: Fig. 4 shows the overall position error CDF for “conventional” MINT (which assumes perfect floor plan knowledge) with and without cognitive waveform adaptation. To show the advantage of the cognitive MINT algorithm, a restricted set of VAs is chosen and the visibilities of the VAs are computed using the SINR instead of optical ray-tracing. As the CDF of “conventional” MINT indicates (blue line with circle markers), the tracking algorithm tends to diverge since too little position-related information is available. The black and the red lines show the overall position error CDF for cognitive MINT for a future horizon window of $l = 1$ and $l = 5$, respectively. As one can observe, the performance increases significantly due to the cognitive waveform adaptation. This means that the cognitive MINT algorithm is able to increase the amount of position-related information by changing the sensing spectrum via the carrier frequency $f_{\mathrm{c},n,i} \in \mathcal{A}$ to bands that carry more geometry-dependent information in the MPCs. Another interesting observation from Fig. 4 is that an increase of the planning horizon results in increased performance, confirming the correct functionality of the cognitive algorithm.

Fig. 5. Performance CDF of cognitive probabilistic MINT using a smaller restricted set of VAs (axes: position error P(p) [m], CDF; curves: prob. MINT; cog. prob. MINT, win = 1; cog. prob. MINT, win = 5). For probabilistic MINT, the visibilities of the VAs are always computed using the SINR.

2) Probabilistic MINT: Fig. 5 shows the overall position error CDF for probabilistic MINT with and without cognitive waveform adaptation. Uncertainties in the floor plan and wrong associations can be handled robustly due to the probabilistic treatment of the VAs, and thus none of the individual trajectory runs diverges. Because probabilistic MINT already achieves high accuracy and robustness, cognitive sensing leads to only a minor additional performance gain for this scenario. It is suspected that for lower bandwidths the performance gain induced by cognitive probabilistic MINT should be much more distinct.

3) Probabilistic MINT with Additional DM Interference: In the last setup, we additionally added synthetic DM interference filtered at a carrier frequency $f_{\mathrm{c}} = 7$ GHz with a bandwidth of 2 GHz.
The DM parameters are chosen according to [52] except for the DM power. The experiments were conducted with three levels of DM power, $\Omega_1 = 1.1615 \cdot 10^{-9}$, $\Omega_1 = 5.8076 \cdot 10^{-9}$ and $\Omega_1 = 1.1615 \cdot 10^{-8}$. Fig. 3 illustrates the scenario used for the experiment. The black line represents the agent trajectory and the red part of it indicates the agent positions where the DM interference is activated. Fig. 6 shows the signals exchanged between the agent and Anchors 1 and 2 for one sample position. The “clean” signals are shown in Fig. 6a, the noisy signal for a DM power of $\Omega_1 = 1.1615 \cdot 10^{-9}$ in Fig. 6b. Looking at Fig. 6b, it is quite obvious that this level of DM already represents severe interference. The justification for using such an interference noise model lies in the fact that it can describe many kinds of measurement modeling mismatches, e.g. if the anisotropy of the antenna pattern for different angles of arrival is not considered.

Fig. 6. Signals exchanged between the agent and Anchors 1 and 2 for an example agent position (axes: path delay [m] versus $|r^{(1)}_n(t)|$ and $|r^{(2)}_n(t)|$). The gray lines represent the estimated delays of the MPCs. (a) Clean signals. (b) Noisy signal with DM power of $\Omega_1 = 1.1615 \cdot 10^{-8}$.

Fig. 7. Mean carrier frequency for DM power $\Omega_1 = 1.1615 \cdot 10^{-8}$ (axes: time index $n$, $f$ [GHz]; curves: initial $f_{\mathrm{c},0}$; DM band at $f_{\mathrm{c},0}$; mean $f_{\mathrm{c},n}$; examples of $f_{\mathrm{c},n}$). The black line denotes the initial carrier frequency $f_{\mathrm{c},1}$ and the blue one the mean of the cognitively adapted carrier frequency $f_{\mathrm{c},n}$. The blue dashed lines show a few example realizations of cognitively adapted carrier frequencies along different trajectories and for different Monte Carlo runs.

Fig. 7 illustrates the mean values of the cognitively adapted carrier frequency along one of the trajectories at DM power $\Omega_1 = 1.1615 \cdot 10^{-8}$. The mean is computed using the 30 Monte Carlo simulations of the experiment. The black line denotes the initial carrier frequency $f_{\mathrm{c},1}$ and the blue one the mean of the cognitively adapted carrier frequency $f_{\mathrm{c},n}$. The blue dashed lines show a few example realizations of cognitively adapted carrier frequencies along different trajectories and for different Monte Carlo runs. The figure shows quite clearly that the cognitive probabilistic MINT algorithm avoids carrier frequencies $f_{\mathrm{c},n}$ near the carrier of the DM at almost all agent positions where additional DM interference is present.

Fig. 8. Mean entropy of probabilistic MINT and cognitive probabilistic MINT over time instances $n$ for DM power $\Omega_1 = 1.1615 \cdot 10^{-8}$ (axes: time index $n$, entropy in units of $10^{-8}$; curves: mean prob. MINT; example runs prob. MINT; mean cog. prob. MINT; example runs cog. prob. MINT; DM disturbance). The red and black dashed lines show a few example entropy realizations along different trajectories and for different Monte Carlo runs.

Fig. 8 shows the corresponding mean entropy values of probabilistic MINT (red line with diamond markers) and cognitive probabilistic MINT (black line with triangle markers) over time instances $n$ for DM power $\Omega_1 = 1.1615 \cdot 10^{-8}$. The red and black dashed lines show a few example entropy realizations along different trajectories and for different Monte Carlo runs. Before the noise disturbance starts, the entropy of the probabilistic MINT algorithm is almost the same as that of the cognitive probabilistic MINT algorithm. At the moment the disturbance is introduced, the entropy of the posterior increases. The cognitive probabilistic MINT algorithm then starts to change its carrier frequency $f_{\mathrm{c},n}$ (as shown in Fig. 7) until the entropy is again reduced.
This leads to an almost constant or even decreasing entropy, even in the presence of a tremendous noise level (black line with triangle markers in Fig. 8). In contrast, the probabilistic MINT algorithm without cognitive waveform adaptation starts to diverge after the disturbance is introduced and is not able to recover. This is indicated by the rapid increase of the entropy and its stagnation at a large value, shown in Fig. 8 by the red line with diamond markers. This result is confirmed by the performance CDF of the agent position error shown in Fig. 9. This comparison between probabilistic MINT and cognitive probabilistic MINT illustrates the powerful property of the cognitive algorithm to separate relevant from irrelevant information by adapting the control parameter $f_{\mathrm{c},n}$ to avoid the noisy frequency band of the signal. The probabilistic MINT algorithm without waveform adaptation tends to diverge under such harsh conditions, as depicted by the CDFs drawn with solid lines. In contrast to this, the cognitive MINT algorithm overcomes these impairments, leading again to robust behavior, as depicted by the CDFs drawn with dashed lines.

Fig. 9. Performance CDF of the cognitive probabilistic MINT algorithm when a disturbance is introduced at three different noise levels along a certain part of the trajectory (axes: position error P(p) [m], CDF; curves: prob. MINT and cognitive prob. MINT for noise 1, noise 2 and noise 3). Noise 1 corresponds to DM with power $\Omega_1 = 1.1615 \cdot 10^{-9}$, Noise 2 to DM with power $\Omega_1 = 5.8076 \cdot 10^{-9}$ and Noise 3 to DM with power $\Omega_1 = 1.1615 \cdot 10^{-8}$.

REFERENCES
[1] J. M. Fuster, Cortex and Mind - Unifying Cognition. Oxford University Press, 2003.
[2] P. Meissner, “Multipath-Assisted Indoor Positioning,” Ph.D.
dissertation, Graz University of Technology, 2014.
[3] E. Leitinger, P. Meissner, C. Rudisser, G. Dumphart, and K. Witrisal, “Evaluation of Position-Related Information in Multipath Components for Indoor Positioning,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 11, pp. 2313–2328, Nov. 2015.
[4] E. Leitinger, “Cognitive Indoor Positioning and Tracking using Multipath Channel Information,” Ph.D. dissertation, Graz University of Technology, 2016.
[5] E. Leitinger, M. F., P. Meissner, K. Witrisal, and F. Hlawatsch, “Belief Propagation based Joint Probabilistic Data Association for Multipath-Assisted Indoor Navigation and Tracking,” in 2016 International Conference on Localization and GNSS (ICL-GNSS), June 2016.
[6] S. Haykin, Y. Xue, and P. Setoodeh, “Cognitive Radar: Step Toward Bridging the Gap Between Neuroscience and Engineering,” Proceedings of the IEEE, vol. 100, no. 11, pp. 3102–3130, Nov. 2012.
[7] S. Haykin, M. Fatemi, P. Setoodeh, and Y. Xue, “Cognitive Control,” Proceedings of the IEEE, vol. 100, no. 12, pp. 3156–3169, Dec. 2012.
[8] M. Fatemi and S. Haykin, “Cognitive Control: Theory and Application,” Access, IEEE, vol. 2, pp. 698–710, 2014.
[9] A. Amiri and S. Haykin, “Improved Sparse Coding Under the Influence of Perceptual Attention,” Neural Comput., vol. 26, no. 2, pp. 377–420, Feb. 2014.
[10] S. Haykin and J. Fuster, “On Cognitive Dynamic Systems: Cognitive Neuroscience and Engineering Learning From Each Other,” Proceedings of the IEEE, vol. 102, no. 4, pp. 608–628, April 2014.
[11] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
[12] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences. New York, NY, USA: Cambridge University Press, 2005.
[13] D. S. Sivia and J. Skilling, Data analysis: a Bayesian tutorial, ser. Oxford science publications.
Oxford, New York: Oxford University Press, 2006.
[14] J. Borish, “Extension of the Image Model to arbitrary Polyhedra,” The Journal of the Acoustical Society of America, March 1984.
[15] J. Kunisch and J. Pamp, “An Ultra-Wideband space-variant Multipath Indoor Radio Channel Model,” in Ultra Wideband Systems and Technologies, 2003 IEEE Conference on, Nov. 2003, pp. 290–294.
[16] E. Leitinger, P. Meissner, M. Lafer, and K. Witrisal, “Simultaneous Localization and Mapping using Multipath Channel Information,” in 2015 IEEE International Conference on Communications Workshops (ICC), London, UK, June 2015, pp. 754–760.
[17] E. Leitinger, F. Meyer, F. Tufvesson, and K. Witrisal, “Factor graph based simultaneous localization and mapping using multipath channel information,” in Proc. IEEE ICCW-17, Paris, France, May 2017, pp. 652–658.
[18] E. Leitinger, F. Meyer, F. Hlawatsch, K. Witrisal, F. Tufvesson, and M. Z. Win, “A belief propagation algorithm for multipath-based SLAM,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5613–5629, Dec. 2019.
[19] E. Leitinger, S. Grebien, and K. Witrisal, “Multipath-based SLAM exploiting AoA and amplitude information,” in Proc. IEEE ICCW-19, Shanghai, China, May 2019, pp. 1–7.
[20] D. Kershaw and R. Evans, “Optimal waveform selection for tracking systems,” Information Theory, IEEE Transactions on, vol. 40, no. 5, pp. 1536–1550, Sep. 1994.
[21] S. Haykin, A. Zia, Y. Xue, and I. Arasaratnam, “Control Theoretic Approach to Tracking Radar: First step towards cognition,” Digital Signal Processing, vol. 21, no. 5, pp. 576–585, 2011.
[22] K. Bell, C. Baker, G. Smith, J. Johnson, and M. Rangaswamy, “Cognitive radar framework for target detection and tracking,” Selected Topics in Signal Processing, IEEE Journal of, vol. 9, no. 8, pp. 1427–1439, Dec. 2015.
[23] G. Hoffmann and C. Tomlin, “Mobile Sensor Network Control Using Mutual Information Methods and Particle Filters,” Automatic Control, IEEE Transactions on, vol. 55, no. 1, pp. 32–47, Jan. 2010.
[24] B. J. Julian, M. Angermann, M. Schwager, and D. Rus, “Distributed Robotic Sensor Networks: An Information-theoretic Approach,” Int. J. Rob. Res., vol. 31, no. 10, pp. 1134–1154, Sep. 2012. [Online]. Available: http://dx.doi.org/10.1177/0278364912452675
[25] K. Chaloner and I. Verdinelli, “Bayesian Experimental Design: A Review,” Statistical Science, vol. 10, no. 3, pp. 273–304, 1995. [Online]. Available: http://www.jstor.org/stable/2246015
[26] T. M. Cover and J. A.
Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[27] H. L. Van Trees, Detection, Estimation and Modulation, Part I. Wiley Press, 1968.
[28] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall Signal Processing Series, 1993.
[29] P. Meissner, E. Leitinger, and K. Witrisal, “UWB for robust indoor tracking: Weighting of multipath components for efficient estimation,” Wireless Communications Letters, IEEE, vol. 3, no. 5, pp. 501–504, Oct. 2014.
[30] N. Michelusi, U. Mitra, A. Molisch, and M. Zorzi, “UWB Sparse/Diffuse Channels, Part I: Channel Models and Bayesian Estimators,” Signal Processing, IEEE Transactions on, vol. 60, no. 10, pp. 5307–5319, 2012.
[31] A. Molisch, “Ultra-Wide-Band Propagation Channels,” Proceedings of the IEEE, vol. 97, no. 2, pp. 353–371, Feb. 2009.
[32] S. Grebien, E. Leitinger, K. Witrisal, and B. H. Fleury, “Super-resolution channel estimation including the dense multipath component — A sparse variational Bayesian approach,” 2021, in preparation.
[33] K. Witrisal, P. Meissner, E. Leitinger, Y. Shen, C. Gustafson, F. Tufvesson, K. Haneda, D. Dardari, A. F. Molisch, A. Conti, and M. Z. Win, “High-Accuracy Localization for Assisted Living: 5G systems will turn multipath channels from foe to friend,” IEEE Signal Processing Magazine, vol. 33, no. 2, pp. 59–70, March 2016.
[34] Y. Shen and M. Win, “Fundamental Limits of Wideband Localization; Part I: A General Framework,” Information Theory, IEEE Transactions on, vol. 56, no. 10, pp. 4956–4980, Oct. 2010.
[35] Y. Shen, H. Wymeersch, and M. Win, “Fundamental Limits of Wideband Localization; Part II: Cooperative Networks,” Information Theory, IEEE Transactions on, vol. 56, no. 10, pp. 4981–5000, Oct. 2010.
[36] F. Meyer, T. Kropfreiter, J. L. Williams, R. Lau, F. Hlawatsch, P. Braca, and M. Z.
Win, “Message passing algorithms for scalable multitarget tracking,” Proc. IEEE, vol. 106, no. 2, pp. 221–259, Feb. 2018.
[37] F. Meyer, P. Braca, P. Willett, and F. Hlawatsch, “Scalable Multitarget Tracking using Multiple Sensors: A belief propagation approach,” in Information Fusion (Fusion), 2015 18th International Conference on, July 2015, pp. 1778–1785.
[38] Y. Bar-Shalom and X.-R. Li, Multitarget-Multisensor Tracking: Principles and Techniques. Storrs, CT: Yaakov Bar-Shalom, 1995.
[39] J. Vermaak, S. J. Godsill, and P. Perez, “Monte Carlo filtering for multi target tracking and data association,” vol. 41, no. 1, pp. 309–332, Jan. 2005.
[40] S. Kay, Fundamentals of Statistical Signal Processing: Detection Theory. Prentice Hall Signal Processing Series, 1998.
[41] H. Wymeersch, “The Impact of Cooperative Localization on Achieving higher-level Goals,” in Communications Workshops (ICC), 2013 IEEE International Conference on, June 2013, pp. 1–5.
[42] F. Meyer, H. Wymeersch, M. Frohle, and F. Hlawatsch, “Distributed Estimation With Information-Seeking Control in Agent Networks,” Selected Areas in Communications, IEEE Journal on, vol. 33, no. 11, pp. 2439–2456, Nov. 2015.
[43] B. Grocholsky, “Information-Theoretic Control of Multiple Sensor Platforms,” Ph.D. dissertation, Department of Aerospace, Mechatronic and Mechanical Engineering, 2002.
[44] C. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, The, vol. 27, no. 4, pp. 623–656, Oct. 1948.
[45] S. Haykin, Cognitive Dynamic Systems: Perception-action Cycle, Radar and Radio. New York, NY, USA: Cambridge University Press, 2012.
[46] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
[47] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2000.
[48] R. S. Sutton and A. G.
Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
[49] A. Lazaric, M. Restelli, and A. Bonarini, “Reinforcement learning in continuous action spaces through sequential monte carlo methods,” in Advances in Neural Information Processing Systems, 2007.
[50] Z. Sahinoglu, S. Gezici, and I. Guvenc, Ultra-wideband Positioning Systems – Theoretical Limits, Ranging Algorithms and Protocols. Cambridge University Press, 2008.
[51] P. Meissner, E. Leitinger, M. Lafer, and K. Witrisal, “MeasureMINT UWB database,” 2013, publicly available database of UWB indoor channel measurements. [Online]. Available: www.spsc.tugraz.at/tools/UWBmeasurements
[52] J. Karedal, S. Wyne, P. Almers, F. Tufvesson, and A. Molisch, “A Measurement-Based Statistical Model for Industrial Ultra-Wideband Channels,” Wireless Communications, IEEE Transactions on, vol. 6, no. 8, pp. 3028–3037, Aug. 2007.