arXiv:1610.05882v2  [cs.RO]  19 Oct 2021
Cognitive Indoor Positioning and Tracking using
Multipath Channel Information
Erik Leitinger, Paul Meissner, and Klaus Witrisal
Abstract—This paper presents a robust and accurate positioning system that adapts its behavior to the surrounding environment, mimicking the capability of the visual brain to filter out clutter and to focus attention on activity and relevant information. Especially in indoor environments, which are characterized by harsh multipath propagation, robust positioning is still hard to achieve under the constraint of reasonable infrastructural needs. In such environments it is essential to separate relevant from irrelevant information and to attain an appropriate uncertainty model for the measurements that are used for positioning.
Index Terms—Cognitive dynamic systems, Cramér-Rao bounds, localization, simultaneous localization and mapping, radio channel models
I. INTRODUCTION
A. Motivation and State of the Art
For radio-based positioning in indoor environments, which are characterized by harsh multipath propagation, it is still elusive to achieve the needed level of accuracy robustly¹ under the constraint of reasonable infrastructural needs. In such environments it is essential to separate relevant from irrelevant information and to attain an appropriate uncertainty model for the measurements that are used for positioning.
To approach this objective more closely, the four basic principles of human cognition, namely the perception-action cycle (PAC), memory, attention, and intelligence [1], are implemented in the positioning system as schematically illustrated in Fig. 2. To incorporate all these principles, the concepts of multipath-assisted indoor navigation and tracking (MINT) [2]–[5] are intertwined with the principles of cognitive dynamic systems (CDS) that were developed in [6]–[10].
Evidently, a perceptive system has to reason with measurements under uncertainty [11], i.e., it has to treat the gained information probabilistically [12], [13]. It also has to deliberately take actions on the environment and thereby influence the measurements in favor of relevant rather than irrelevant information. Hence, cognitive processing of measurement data for positioning seems to be a natural choice to overcome such severe impairments.
MINT exploits specular multipath components (MPCs) that can be associated with the local geometry, as illustrated in Fig. 1. MPCs can be interpreted as signals originating from additional virtual sources, so-called virtual anchors (VAs). These VAs are mirror images of a physical anchor w.r.t. the flat surfaces, as illustrated in Fig. 1 [2], [14], [15]. This additional position-related information can be extracted from the radio signals. For a proper consideration of uncertainties in the floor plan and to account for the stochastic nature of the radio signals, a geometry-based probabilistic environment model (GPEM) and a geometry-based stochastic channel model (GSCM) were introduced in [16]–[19], extending MINT to a simultaneous localization and mapping (SLAM) approach. Such a system acquires and adapts information about its surrounding environment online and is able to continuously build up a consistent memory in a Bayesian sense.

¹We define robustness as the percentage of cases in which a system can achieve its given potential accuracy.
The idea of combining MINT with a CDS is to gain control over the observed environment information in order to (i) provide as much position-related information to the Bayesian state estimator as possible for achieving the highest level of reliability/robustness in position estimation, (ii) improve the separation between relevant and irrelevant information, and (iii) build up a consistent environment and action memory. By actively planning the next control actions on the environment using the Bayesian memory, in the sense of waveform adaptation [6], [20]–[22] or mobile agent motor control [23], [24], the relevant information return contained in the signals can be maximized. The information-flow coupling between the perceptor-actor system and the surrounding environment is given by the PAC, which plays the key role when it comes to gathering relevant environment information [1], [10].
The core feedback loop of the cognitive dynamic system, the perception-action cycle, resembles the idea of optimally choosing future measurements based on a physical model while reasoning under uncertainty. This principle has been explored by the physics community under the term Bayesian experimental design [25]. This decision-theoretic process gives a mathematical justification for selecting, under uncertainty, the optimality criterion that maximizes the utility function of the posterior probability density function, such that new model information from the acquired measurements can be predicted. Information-theoretic measures such as the conditional entropy [26], the mutual information [26], or the determinant of the Fisher information matrix [27], [28] are suitable utility functions for this process. The active selection of measurement parameters has a lot in common with cognitive perception and control at the lowest layer. However, it lacks an explicit description of a layered memory structure that, in combination with algorithmic attention, leads to an “intelligent” behavior of the overall system.
II. MINT CONCEPTS
In this section we review basic elements of MINT [3], [29], starting with the signal model, then discussing the estimation of the MPC parameters, and finally introducing the position-related information that is of main importance for a proper weighting of the MPC-VA relations in the Bayesian tracking filter. All propagation effects in the signals that are not modeled geometrically, so-called diffuse multipath (DM) [30], constitute interference to the useful position-related information.
A. Geometry-based Stochastic Signal Model (GSCM)
Our signal model is the following. During time step $n$, a baseband radio signal $s(t)$ is transmitted from the $j$-th physical anchor located at position $\mathbf{a}_1^{(j)} \in \mathbb{R}^2$, $j \in \{1, \dots, J\} = \mathcal{J}$, to a mobile agent at position $\mathbf{p}_n \in \mathbb{R}^2$. The corresponding received signal is given as [3]

$$r_n^{(j)}(t) = \sum_{k=1}^{K_n^{(j)}} \alpha_{k,n}^{(j)}\, s\big(t - \tau_{k,n}^{(j)}\big) + s(t) * \nu_n^{(j)}(t) + w(t). \tag{1}$$
Here, the first term describes the contributions from $K_n^{(j)}$ specular MPCs with complex amplitudes $\alpha_{k,n}^{(j)}$ and delays $\tau_{k,n}^{(j)}$, where $k \in \{1, \dots, K_n^{(j)}\} = \mathcal{K}_n^{(j)}$. The delays $\tau_{k,n}^{(j)}$ correspond to the distances between the agent and the $j$-th physical anchor (for $k = 1$) or the VAs of the $j$-th physical anchor (for $k \in \{2, \dots, K_n^{(j)}\}$). Thus, $\tau_{k,n}^{(j)} = \|\mathbf{p}_n - \mathbf{a}_k^{(j)}\|/c$, where $\mathbf{a}_k^{(j)} \in \mathbb{R}^2$ is the position of the respective (physical or virtual) anchor and $c$ is the speed of light. The energy of the transmitted signal $s(t)$ is assumed to be normalized to one. The second term in (1) denotes the convolution of $s(t)$ with the DM $\nu_n^{(j)}(t)$, which is modeled as a non-stationary zero-mean Gaussian random process. Considering uncorrelated scattering along the delay axis $\tau$, the auto-correlation function of $\nu_n^{(j)}(t)$ is given by $\mathbb{E}_\nu\{\nu_n^{(j)}(\tau)\,\nu_n^{(j)*}(u)\} = S_{\nu,n}^{(j)}(\tau)\,\delta(\tau - u)$, where $S_{\nu,n}^{(j)}(\tau)$ represents the power delay profile of DM. The DM process $\nu_n^{(j)}(t)$ is assumed to be quasi-stationary in the spatial domain, which means that $S_{\nu,n}^{(j)}(\tau)$ does not change in the vicinity of $\mathbf{p}_n$ [31]. Note that the DM component interferes with the useful position-related information. The last term in (1), $w(t)$, is additive white Gaussian noise with double-sided power spectral density $N_0/2$.
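To make the three terms of (1) concrete, the following Python/NumPy sketch samples $r_n^{(j)}(t)$ for one anchor. It rests on assumptions that are not part of the model above: a Gaussian-shaped unit-energy pulse for $s(t)$, an exponentially decaying DM power delay profile, and a simple discretization in which the DM and the noise are drawn as independent circular Gaussian samples on the time grid. The helper names (`pulse`, `received_signal`) and all numerical values are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def pulse(t, tp):
    """Hypothetical unit-energy Gaussian pulse s(t) centered at t = 0 (tp: width in s)."""
    dt = t[1] - t[0]
    p = np.exp(-0.5 * (t / tp) ** 2)
    return p / np.sqrt(np.sum(p**2) * dt)


def received_signal(t, p, anchors, alphas, tp, dm_scale=0.0, dm_decay=10e-9, n0=0.0, seed=0):
    """Sample r(t) of (1) on the uniform time grid t (all DM/noise settings are assumptions)."""
    rng = np.random.default_rng(seed)
    dt = t[1] - t[0]
    r = np.zeros(t.size, dtype=complex)
    # First term: one scaled, delayed pulse per (physical or virtual) anchor a_k
    for a_k, alpha_k in zip(anchors, alphas):
        tau_k = np.linalg.norm(p - a_k) / C          # tau_k = ||p - a_k|| / c
        r += alpha_k * pulse(t - tau_k, tp)
    # Second term: s(t) convolved with DM nu(t), white in delay with an exponential PDP
    if dm_scale > 0:
        pdp = dm_scale * np.exp(-t / dm_decay)       # assumed S_nu(tau)
        nu = np.sqrt(pdp / (2 * dt)) * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
        s = pulse(np.arange(-4 * tp, 4 * tp + dt / 2, dt), tp)
        r += np.convolve(nu, s, mode="same") * dt
    # Third term: circular complex AWGN with per-sample variance n0 / dt (assumed discretization)
    if n0 > 0:
        r += np.sqrt(n0 / (2 * dt)) * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
    return r


# Usage: anchor at the origin and one VA mirrored at a wall x = -2 m (hypothetical geometry)
t = np.arange(0.0, 50e-9, 0.1e-9)
p = np.array([3.0, 0.0])
anchors = [np.array([0.0, 0.0]), np.array([-4.0, 0.0])]
r = received_signal(t, p, anchors, alphas=[1.0, 0.5], tp=0.5e-9)
```

With `dm_scale=0` and `n0=0` only the specular term remains, so the magnitude of the sampled signal peaks at the LoS delay $\|\mathbf{p}_n - \mathbf{a}_1^{(j)}\|/c$; enabling the DM term adds the delay-dispersed interference that (3) later accounts for.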
B. MPC Parameter Estimation
The delays of the MPCs at agent position $\mathbf{p}_n$ are estimated from the received signals using a sparse Bayesian channel estimator [32]. The algorithm estimates up to a predefined maximum number $M$ of MPCs, yielding estimated delays $\hat{\tau}_{m,n}^{(j)}$ and corresponding complex amplitudes $\hat{\alpha}_{m,n}^{(j)}$, with $m \in \{1, \dots, M_n^{(j)}\} = \mathcal{M}_n^{(j)}$. The estimated delays are scaled by the speed of light $c$ and used as noisy distance measurements $z_{m,n}^{(j)} = c\,\hat{\tau}_{m,n}^{(j)}$ in the proposed multipath-assisted SLAM algorithm. Furthermore, in a real-world MINT system, the amplitude estimates $\hat{\alpha}_{m,n}^{(j)}$ (after being associated with the $k$-th physical or virtual anchor) are fed into a higher-level, non-Bayesian algorithm that determines the signal-to-interference-plus-noise ratio (SINR) between the useful specular MPC and the DM plus noise. This SINR is related to the range standard deviation $\sigma_{m,n}^{(j)}$ (see [29], [33] for details). Note that an extension to additional parameters besides the delay (and the corresponding amplitude), for example the angle-of-arrival and angle-of-departure of the MPCs, is straightforward.
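The two quantities produced in this step can be sketched as follows. Converting delays to distances is exactly $z = c\hat{\tau}$. For the SINR, the sketch does not reproduce the estimator of [29], [33]; it assumes a simple method-of-moments estimate over a window of amplitude estimates of one associated MPC, treating the complex sample mean as the deterministic specular part and the sample variance as the DM-plus-noise interference. The function names and the windowing are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def distances_from_delays(tau_hat):
    """Noisy distance measurements z = c * tau_hat (Section II-B)."""
    return C * np.asarray(tau_hat, dtype=float)


def sinr_moment_estimate(alpha_hat):
    """Hypothetical moment-based SINR estimate for one associated MPC.

    alpha_hat holds complex amplitude estimates of the same MPC over a window of
    recent agent positions. The mean is taken as the specular part and the variance
    as DM-plus-noise interference (an assumption, not the estimator of [29], [33]).
    """
    a = np.asarray(alpha_hat, dtype=complex)
    power_specular = np.abs(a.mean()) ** 2
    power_interference = np.var(a)   # mean of |a - mean(a)|^2 for complex input
    return power_specular / power_interference


z = distances_from_delays([10e-9, 23.4e-9])        # distances in m
sinr = sinr_moment_estimate([1.1, 0.9, 1.1, 0.9])  # linear SINR, here 100 (20 dB)
```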
C. Position and Range Uncertainty
As a performance measure and lower bound on the position error we use the Cramér-Rao lower bound (CRLB), defined by the inequality $\mathbb{E}\{\|\mathbf{p} - \hat{\mathbf{p}}\|^2\} \geq \mathrm{tr}\{\mathcal{I}_{\mathbf{p}}^{-1}\}$, where $\mathcal{I}_{\mathbf{p}}$ is the equivalent Fisher information matrix (EFIM) [3], [34], [35] for the position vector and $\mathrm{tr}\{\cdot\}$ is the trace operator. Assuming no path overlap between MPCs, the EFIM is formulated for a set of anchors in canonical form as [3]

$$\mathcal{I}_{\mathbf{p},n} = \frac{8\pi^2\beta^2}{c^2} \sum_{j=1}^{J} \sum_{k=1}^{K_n^{(j)}} \mathrm{SINR}_{k,n}^{(j)}\, \mathcal{I}_r\big(\phi_{k,n}^{(j)}\big), \tag{2}$$

where $\beta$ denotes the effective (root mean square) bandwidth of $s(t)$ and $\mathcal{I}_r(\phi_{k,n}^{(j)})$ is the ranging direction matrix, a rank-one matrix with an eigenvector in direction $\phi_{k,n}^{(j)}$ from the agent to the $k$-th VA. The SINRs are given by the ratio between the energy of the deterministic MPCs and the interfering DM plus noise,

$$\mathrm{SINR}_{k,n}^{(j)} = \frac{|\alpha_{k,n}^{(j)}|^2}{N_0 + T_p\, S_{\nu,n}^{(j)}\big(\tau_{k,n}^{(j)}\big)}, \tag{3}$$

where $T_p$ denotes the effective pulse duration of $s(t)$. The corresponding MPC range variances $\sigma_{k,n}^{(j)2} = \mathrm{var}\{z_{k,n}^{(j)}\}$ of already associated VAs are bounded as

$$\sigma_{k,n}^{(j)2} \geq \left(\frac{8\pi^2\beta^2}{c^2}\, \mathrm{SINR}_{k,n}^{(j)}\right)^{-1}. \tag{4}$$
D. Geometry-based Probabilistic Environment Model (GPEM)
Fig. 1 illustrates the probabilistic geometric environment model. A signal exchanged between an anchor at position $\mathbf{a}_1^{(j)}$ and an agent at $\mathbf{p}^{(m)}$ contains specular reflections at the room walls, indicated by the black lines². These reflections can be modeled geometrically using the VAs $\mathbf{a}_k^{(j)}$ with $k = 1, \dots, K^{(j)}$, which are mirror images of the $j$-th anchor w.r.t. the walls [2], [14], [15]. The number of VAs per anchor $j$ is defined as $K^{(j)}$. The VAs of all anchors are comprised in $\mathcal{A}_n = \{\mathcal{A}_n^{(j)}\}_{j=1}^{J}$, where $\mathcal{A}_n^{(j)} = \{\mathbf{a}_{k,n}^{(j)}\}_{k=1}^{K_n^{(j)}}$. To be able to cope with uncertainties in the floor plan, the deterministic geometric model of the VA positions $\mathbf{a}_k^{(j)}$ of the $j$-th anchor is extended to a probabilistic one, as shown in Fig. 1. The VA positions and the agent position $\mathbf{p}^{(m)}$ are represented by a joint PDF $p\big(\mathbf{p}^{(m)}, \mathbf{a}_1^{(j)}, \mathbf{a}_2^{(j)}, \dots, \mathbf{a}_{K_n^{(j)}}^{(j)}\big)$. If the position of the $j$-th anchor is assumed to be known exactly, the joint PDF reduces to $p\big(\mathbf{p}^{(m)}, \mathbf{a}_2^{(j)}, \dots, \mathbf{a}_{K_n^{(j)}}^{(j)}\big)$.
²Since the radio channel is reciprocal, the assignment of transmitter and receiver roles to anchors and agents is arbitrary, and this choice can be made according to the application scenario.

[Fig. 1: room geometry with the physical anchor $\mathbf{a}_1^{(j)}$, the VAs $\mathbf{a}_2^{(j)}$, $\mathbf{a}_3^{(j)}$, $\mathbf{a}_4^{(j)}$, a falsely detected VA $\mathbf{a}_{\mathrm{false}}$, and the agent $\mathbf{p}$.]

Fig. 1. Illustration of the VAs of the $j$-th anchor and of an agent with PDFs $p(\mathbf{a}_k^{(j)})$ and $p(\mathbf{p}^{(m)})$, respectively. The VA at position $\mathbf{a}_{\mathrm{false}}$ represents a falsely detected VA.

The joint PDF of the agent and the VA positions is represented by a multivariate Gaussian RV, where the figure shows the marginal distributions of the agent $p(\mathbf{p}^{(m)})$ (dashed black ellipses) and of the VA positions $p(\mathbf{a}_k^{(j)})$ (red ellipses). The marginal distribution $p(\mathbf{a}_{\mathrm{false}})$ (dashed red ellipse) defines a wrongly detected VA at position $\mathbf{a}_{\mathrm{false}}$. The anchor position $\mathbf{a}_1^{(j)}$ is assumed to be known perfectly. Uncertainty in the floor plan does not just mean that the VA positions are uncertain and thus described by RVs, but also that floor-plan information may be incorrect, inconsistent, or entirely missing. This means that positioning and tracking algorithms based on VAs have to account for this lack of knowledge.
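The deterministic core of this model, the mirroring of an anchor at a flat wall, can be sketched as follows. The wall is treated as an infinite line through two points, so the visibility of the reflection (whether the reflection point lies on the actual wall segment) is not checked; applying `mirror` to a VA again yields higher-order VAs. The function names and wall coordinates are hypothetical.

```python
import numpy as np


def mirror(a, q1, q2):
    """Mirror image of point a w.r.t. the infinite line (wall) through q1 and q2."""
    a, q1, q2 = (np.asarray(v, dtype=float) for v in (a, q1, q2))
    d = (q2 - q1) / np.linalg.norm(q2 - q1)   # unit vector along the wall
    n = np.array([-d[1], d[0]])               # unit normal to the wall
    return a - 2.0 * np.dot(a - q1, n) * n    # reflect the offset along the normal


def first_order_vas(anchor, walls):
    """First-order VAs: one mirror image of the physical anchor per wall."""
    return [mirror(anchor, q1, q2) for q1, q2 in walls]


# Usage: anchor at (1, 2), walls along x = 0 and y = 3 (hypothetical floor plan)
walls = [((0.0, 0.0), (0.0, 1.0)), ((0.0, 3.0), (1.0, 3.0))]
vas = first_order_vas((1.0, 2.0), walls)     # mirror images (-1, 2) and (1, 4)
second_order = mirror(vas[0], *walls[1])     # double reflection: (-1, 4)
```

Because mirroring is an affine map, a Gaussian uncertainty on the anchor or on the wall parameters would stay Gaussian (to first order) after the reflection, which fits the Gaussian representation of the VA positions described above.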
E. Probabilistic Data Association (PDA)
The state of the agent $\mathbf{x}_n = [\mathbf{p}_n^{\mathsf{T}}, \mathbf{v}_n^{\mathsf{T}}]^{\mathsf{T}}$, where $\mathbf{v}_n$ is the velocity, evolves according to the state transition probability density function (PDF) $p(\mathbf{x}_n|\mathbf{x}_{n-1})$ over time instances $n$. From each VA in $\mathcal{A}_n^{(j)}$ and the predicted agent position, a set of expected MPC distances $\mathcal{D}_n^{(j)}$ at time step $n$ is computed. The MPC distances described in Section II-B are subject to data association uncertainty, i.e., it is not known which measurement in $\mathbf{z}_n^{(j)}$ originated from which VA $k$ of the $j$-th anchor, and it is also possible that a measurement $z_{m,n}^{(j)}$ did not originate from any VA (false alarm, clutter) or that a VA did not give rise to any measurement (missed detection). The probability that a VA is detected is denoted by $P_d$. Possible associations at time instance $n$ are described by the $K_n^{(j)}$-dimensional random vector $\mathbf{b}_n^{(j)} = [b_{1,n}^{(j)} \cdots b_{K_n^{(j)},n}^{(j)}]^{\mathsf{T}}$, whose $k$-th entry is defined as [5], [17]–[19], [36], [37]

$$b_{k,n}^{(j)} = \begin{cases} m \in \{1, \dots, M\}, & \mathbf{a}_k^{(j)} \text{ generates measurement } z_{m,n}^{(j)}, \\ 0, & \mathbf{a}_k^{(j)} \text{ did not give rise to any measurement.} \end{cases}$$
We also define $\mathbf{b}_n = [\mathbf{b}_n^{(1)\mathsf{T}} \cdots \mathbf{b}_n^{(J)\mathsf{T}}]^{\mathsf{T}}$. The number of false alarms is modeled as Poisson distributed with mean arrival rate $\mu$, and each false alarm measurement is distributed according to the (e.g., uniform) PDF $f_{\mathrm{FA}}(z_{m,n}^{(j)})$ [38], [39], which gives the likelihood that a measurement corresponds to a false alarm.
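The statistical dependence of the distance measurement vectors $\mathbf{z}_n = [\mathbf{z}_n^{(1)\mathsf{T}}, \dots, \mathbf{z}_n^{(J)\mathsf{T}}]^{\mathsf{T}}$ on the agent state vector $\mathbf{x}_n$ and the association vector $\mathbf{b}_n$ is described by the global likelihood function $f(\mathbf{z}_n|\mathbf{x}_n, \mathbf{b}_n)$. Under commonly used assumptions about the statistics of the measurements [2], [38], [39], the global likelihood function at time instance $n$ factors as

$$f(\mathbf{z}_n|\mathbf{x}_n, \mathbf{b}_n) = \prod_{j=1}^{J} \Bigg[ \bigg( \prod_{m=1}^{M} f_{\mathrm{FA}}\big(z_{m,n}^{(j)}\big) \bigg) \prod_{k \in \mathcal{Q}(\mathbf{x}_n, \mathbf{b}_n^{(j)})} \frac{f\big(z_{b_{k,n}^{(j)},n}^{(j)} \,\big|\, \mathbf{x}_n; \mathbf{a}_k^{(j)}, \sigma_{k,n}^{(j)}\big)}{f_{\mathrm{FA}}\big(z_{b_{k,n}^{(j)},n}^{(j)}\big)} \Bigg],$$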
where $\mathcal{Q}(\mathbf{x}_n, \mathbf{b}_n^{(j)}) \triangleq \{k \in \{1, \dots, K_n^{(j)}\} : b_{k,n}^{(j)} \neq 0\}$. The local likelihood function $f(z_{m,n}^{(j)}|\mathbf{x}_n; \mathbf{a}_k^{(j)}, \sigma_{k,n}^{(j)})$ is related to a noisy measurement of the distance to VA $\mathbf{a}_k^{(j)}$ at agent position $\mathbf{p}_n$, which is modeled as

$$z_{k,n}^{(j)} = \|\mathbf{p}_n - \mathbf{a}_k^{(j)}\| + v_{k,n}^{(j)},$$

where $v_{k,n}^{(j)}$ is a zero-mean Gaussian random variable with standard deviation $\sigma_{k,n}^{(j)}$ as described in (4). Based on the factorized likelihood model, a probabilistic data association algorithm computes the associations between the expected distances to the VAs and the estimated MPCs using belief propagation, as described in [5], [17]–[19], [36], [37]. The most probable MPC-to-anchor associations are obtained by means of an approximation of the maximum a posteriori (MAP) detector [40]

$$\hat{b}_{k,n}^{(j)\mathrm{MAP}} \triangleq \arg\max_{b_{k,n}^{(j)} \in \{0, 1, \dots, M\}} p\big(b_{k,n}^{(j)} \,\big|\, \mathbf{z}_n\big). \tag{5}$$
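To give intuition for the association probabilities in (5), the sketch below computes them for a single VA in isolation, using the classical single-target PDA form $p(b_k = 0) \propto 1 - P_d$ and $p(b_k = m) \propto P_d\, f(z_m)/(\mu f_{\mathrm{FA}}(z_m))$. This is a simplification, not the BP-based algorithm of [5], [17]–[19], [36], [37]: it ignores the constraint that a measurement can be generated by at most one VA, treats the predicted distance as exact, and assumes a uniform $f_{\mathrm{FA}}$ on $[0, z_{\max}]$. All names and numbers are hypothetical.

```python
import numpy as np


def gaussian_pdf(z, mean, std):
    """Local likelihood f(z | x; a, sigma) of a Gaussian range measurement."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (np.sqrt(2 * np.pi) * std)


def association_probabilities(z, expected_dist, std, p_d, mu, z_max):
    """Single-VA PDA probabilities p(b_k = m | z) for m = 0..M (simplified sketch).

    Assumes Poisson false alarms with mean mu and a uniform f_FA on [0, z_max];
    unlike the BP-based PDA, each VA is treated independently here.
    """
    z = np.asarray(z, dtype=float)
    f_fa = 1.0 / z_max
    weights = np.empty(z.size + 1)
    weights[0] = 1.0 - p_d                                         # missed detection (b_k = 0)
    weights[1:] = p_d * gaussian_pdf(z, expected_dist, std) / (mu * f_fa)
    return weights / weights.sum()


def map_association(probs):
    """MAP decision as in (5): 0 means 'no measurement', m >= 1 indexes z_m."""
    return int(np.argmax(probs))


z = [2.0, 5.03, 9.0]                       # measured distances in m
probs = association_probabilities(z, expected_dist=5.0, std=0.05, p_d=0.9, mu=2.0, z_max=20.0)
b_map = map_association(probs)             # 2, i.e., the VA is associated with z_2 = 5.03 m
```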
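After the PDA has been applied for all anchors, the following union sets are defined:
• The set of associated discovered (and optionally a-priori known) VAs $\mathcal{A}_{n,\mathrm{ass}} = \bigcup_j \mathcal{A}_{n,\mathrm{ass}}^{(j)}$.
• The corresponding set of associated measurements $\mathcal{Z}_{n,\mathrm{ass}} = \bigcup_j \mathcal{Z}_{n,\mathrm{ass}}^{(j)}$.
• The set of remaining measurements $\bar{\mathcal{Z}}_{n,\mathrm{ass}} = \bigcup_j \bar{\mathcal{Z}}_{n,\mathrm{ass}}^{(j)}$, which are not associated with VAs of $\mathcal{A}_{n,\mathrm{ass}}$.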
F. MINT-SLAM
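In the most generic form, the prediction equation for the VAs $\mathcal{A}_n$ and the agent state $\mathbf{x}_n = [\mathbf{p}_n^{\mathsf{T}}, \mathbf{v}_n^{\mathsf{T}}]^{\mathsf{T}}$ can be written, using the Markovian assumption, as

$$p(\mathbf{x}_n, \mathcal{A}_n|\mathcal{Z}_{1:n-1}) = \int p(\mathbf{x}_{n-1}, \mathcal{A}_{n-1}|\mathcal{Z}_{1:n-1})\, p(\mathbf{x}_n|\mathbf{x}_{n-1})\, p(\mathcal{A}_n|\mathcal{A}_{n-1})\, \mathrm{d}\{\mathbf{x}_{n-1}, \mathcal{A}_{n-1}\}, \tag{6}$$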
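where $p(\mathbf{x}_n|\mathbf{x}_{n-1})$ and $p(\mathcal{A}_n|\mathcal{A}_{n-1})$ are the state transition probability density functions of the agent and the VAs, respectively. The latter can be represented by an identity mapping, since the VAs are static. The update equation is then

$$p(\mathbf{x}_n, \mathcal{A}_n|\mathcal{Z}_{1:n}) = \frac{p(\mathcal{Z}_n|\mathbf{x}_n, \mathcal{A}_n)\, p(\mathbf{x}_n, \mathcal{A}_n|\mathcal{Z}_{1:n-1})}{p(\mathcal{Z}_n|\mathcal{Z}_{1:n-1})}, \tag{7}$$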
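where $p(\mathcal{Z}_n|\mathbf{x}_n, \mathcal{A}_n)$ is the likelihood function of the current measurements. Assuming that the agent moves along a path according to a linear Gaussian constant-velocity motion model, the state-space model is defined as

$$\mathbf{x}_n = \mathbf{F}\mathbf{x}_{n-1} + \mathbf{G}\mathbf{n}_{a,n} = \begin{bmatrix} 1 & 0 & \Delta T & 0 \\ 0 & 1 & 0 & \Delta T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{x}_{n-1} + \begin{bmatrix} \frac{\Delta T^2}{2} & 0 \\ 0 & \frac{\Delta T^2}{2} \\ \Delta T & 0 \\ 0 & \Delta T \end{bmatrix} \mathbf{n}_{a,n}, \tag{8}$$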
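where $\Delta T$ is the update interval (sampling period). The driving acceleration noise term $\mathbf{n}_{a,n}$ is zero-mean and circularly symmetric with variance $\sigma_a^2$, and models motion changes that deviate from the constant-velocity assumption. The transformed noise covariance matrix is given as $\mathbf{R}_a = \sigma_a^2\mathbf{G}\mathbf{G}^{\mathsf{T}}$. The joint state-space model of $\mathbf{x}_n$ and the associated VAs $\mathcal{A}_{n,\mathrm{ass}}$, cf. (6), is formulated as [4], [16]

$$\tilde{\mathbf{x}}_n = \begin{bmatrix} \mathbf{F} & \mathbf{0}_{4\times 2K_n} \\ \mathbf{0}_{2K_n\times 4} & \mathbf{I}_{2K_n\times 2K_n} \end{bmatrix} \tilde{\mathbf{x}}_{n-1} + \begin{bmatrix} \mathbf{G} \\ \mathbf{0}_{2K_n\times 2} \end{bmatrix} \mathbf{n}_{a,n}, \tag{9}$$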
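where $\tilde{\mathbf{x}}_n = [\mathbf{x}_n^{\mathsf{T}}, \mathbf{a}_{2,n}^{\mathsf{T}}, \dots, \mathbf{a}_{K_n,n}^{\mathsf{T}}]^{\mathsf{T}}$ represents the stacked state vector with $\{\mathbf{a}_{k,n}^{(j)}\} \in \mathcal{A}_{n,\mathrm{ass}}$. The covariance matrix of the state vector consists of the agent covariance matrix $\mathbf{C}_{\mathbf{x}_n}$, the cross-covariances $\mathbf{C}_{\mathbf{x}_n,\mathbf{a}_{k,n}}$ between the agent state $\mathbf{x}_n$ and the VAs at positions $\mathbf{a}_{k,n}$, the cross-covariances $\mathbf{C}_{\mathbf{a}_{k,n},\mathbf{a}_{k',n}}$ between VAs with $k \neq k'$, and the covariances $\mathbf{C}_{\mathbf{a}_{k,n}}$ of the VAs.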
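Equations (8) and (9) translate directly into a Kalman-type prediction step. The sketch below builds $\mathbf{F}$, $\mathbf{G}$, and the block matrices of (9) for a given number of associated VAs and propagates a stacked mean and covariance; the process noise of the stacked state is taken as $\sigma_a^2\tilde{\mathbf{G}}\tilde{\mathbf{G}}^{\mathsf{T}}$, the stacked counterpart of $\mathbf{R}_a$. Function names and numbers are hypothetical, and the state ordering $[p_x, p_y, v_x, v_y, \ldots]$ is an assumption.

```python
import numpy as np


def cv_model(dt):
    """F and G of the constant-velocity model (8) for x = [p_x, p_y, v_x, v_y]^T."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    G = np.array([[dt**2 / 2, 0.0],
                  [0.0, dt**2 / 2],
                  [dt, 0.0],
                  [0.0, dt]])
    return F, G


def augmented_model(dt, num_vas):
    """Block matrices of (9): the agent follows (8) while the 2D VA positions stay static."""
    F, G = cv_model(dt)
    n_va = 2 * num_vas
    F_aug = np.block([[F, np.zeros((4, n_va))],
                      [np.zeros((n_va, 4)), np.eye(n_va)]])
    G_aug = np.vstack([G, np.zeros((n_va, 2))])
    return F_aug, G_aug


def predict(x, P, dt, sigma_a, num_vas):
    """Prediction of the stacked mean and covariance, with process noise sigma_a^2 G G^T."""
    F_aug, G_aug = augmented_model(dt, num_vas)
    Q = sigma_a**2 * G_aug @ G_aug.T
    return F_aug @ x, F_aug @ P @ F_aug.T + Q


x0 = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 4.0])        # [p, v, one VA position]
x_pred, P_pred = predict(x0, np.eye(6), dt=0.1, sigma_a=1.0, num_vas=1)
```

The VA block of the covariance is left unchanged by the prediction, so VA uncertainty can only shrink through measurement updates, whereas the agent uncertainty grows with $\Delta T$ and $\sigma_a$.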
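The measurement model is defined as

$$\mathbf{z}_n = \tilde{\mathbf{h}}_n(\tilde{\mathbf{x}}_n) + \tilde{\mathbf{n}}_{z,n}, \tag{10}$$

where $\mathbf{z}_n$ stacks the measurements associated via (5), i.e., $\mathcal{Z}_{n,\mathrm{ass}}$, and $\tilde{\mathbf{n}}_{z,n}$ is the corresponding stacked measurement noise vector. The measurement model $\tilde{\mathbf{h}}_n$ contains all distance equations $\|\mathbf{a}_{k,n}^{(j)} - \mathbf{p}_n\|$, $\forall\, \mathbf{a}_{k,n}^{(j)} \in \mathcal{A}_{n,\mathrm{ass}}$, to update both the agent and the VAs. A UKF is used as the Bayesian state estimator [4]. The measurement covariance matrix is written as

$$\mathbf{R}_n = \mathrm{diag}\big\{\mathrm{var}\{z_{k,n}^{(j)}\}\big\} \quad \forall k, j : \mathbf{a}_{k,n}^{(j)} \in \mathcal{A}_{n,\mathrm{ass}}, \tag{11}$$

where the range variances are defined by (4).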
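A measurement update for the stacked state of (9)–(11) can be sketched as follows. The paper uses a UKF [4]; for brevity, this sketch swaps in an extended Kalman filter (EKF), which linearizes the distance equations with their analytic Jacobian instead of propagating sigma points. The state layout $[\mathbf{p}^{\mathsf{T}}, \mathbf{v}^{\mathsf{T}}]^{\mathsf{T}}$ followed by the $K$ associated VA positions, as well as all names and values, are assumptions.

```python
import numpy as np


def h_ranges(x, num_vas):
    """Stacked measurement model of (10): distances ||a_k - p|| for all associated VAs."""
    p = x[:2]
    vas = x[4:4 + 2 * num_vas].reshape(num_vas, 2)
    return np.linalg.norm(vas - p, axis=1)


def h_jacobian(x, num_vas):
    """Jacobian of h_ranges w.r.t. the stacked state [p, v, a_1, ..., a_K]."""
    p = x[:2]
    vas = x[4:4 + 2 * num_vas].reshape(num_vas, 2)
    H = np.zeros((num_vas, x.size))
    for k, a in enumerate(vas):
        u = (p - a) / np.linalg.norm(p - a)   # unit vector from VA to agent
        H[k, 0:2] = u                         # derivative w.r.t. the agent position
        H[k, 4 + 2 * k:6 + 2 * k] = -u        # derivative w.r.t. the VA position
    return H


def ekf_update(x, P, z, range_vars, num_vas):
    """Joint agent/VA update (EKF stand-in for the UKF) with R_n = diag(range_vars) as in (11)."""
    H = h_jacobian(x, num_vas)
    R = np.diag(range_vars)
    S = H @ P @ H.T + R                       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x + K @ (z - h_ranges(x, num_vas))
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new


x = np.array([0.0, 0.0, 0.5, 0.0, 3.0, 0.0, 0.0, 4.0])   # agent at origin, two VAs
P = 0.1 * np.eye(8)
z = np.array([3.02, 3.97])                                 # measured distances in m
x_upd, P_upd = ekf_update(x, P, z, range_vars=[0.01**2, 0.01**2], num_vas=2)
```

Because the same distance residual updates both $\mathbf{p}_n$ and $\mathbf{a}_{k,n}$, the update builds up the agent-VA cross-covariances listed above, which is what makes the approach a SLAM filter.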
III. COGNITIVE POSITIONING SYSTEM
The basic building blocks of a CDS, namely the perception-action cycle (PAC), the cognitive perceptor (CP), the information feedback, and the cognitive controller (CC), are depicted in Fig. 2. All of these blocks are reciprocally coupled and form a hierarchical structure that enables the system to interpret the environmental observables on different abstraction layers.
A. Multipath-assisted Positioning as CDS
Figure 2 illustrates the block diagram of a cognitive localization and tracking system with a three-layer structure:
• First Layer: Defines (i) the direct Bayesian state estimation $p(\mathbf{x}_n|\mathcal{Z}_n, \mathbf{c}_n)$ at the CP, which holds the agent position and its velocity, and (ii) the cognitive control parameters $\mathbf{c}_n$ at the CC, based on the feedback information of the Bayesian state-space filter.
• Second Layer: Represents (i) the memory for the GPEM, described by the VAs with marginal PDF $p(\mathcal{A}_n^{(j)}|\mathcal{Z}_n^{(j)}, \mathbf{c}_n)$, and the memory for the GSCM, described by the $\mathrm{SINR}_{k,n}^{(j)}$ of the MPCs, at the CP, and (ii) the memory of VA-specific waveform parameters at the CC, on which the cognitive control is based.
• Third Layer: Represents the highest layer and differs from the two layers below in that it defines the application driven by the cognitive localization/tracking system. The CP memory of applications holds abstract parameters or structures of the specified application, and the CC enables the motor control for realizing higher-level goal planning [41].
The first and second layers describe the signal and information processing of the model parameters of the surrounding physical environment and the radio channel. The third layer, on the other hand, holds higher-level goal parameters, i.e., motor-control inputs to fulfill navigation goals, which are based on the physical parameters [41]–[43].
B. Feedback Information
The system is able to adapt its behavior to the environment online, i.e., to direct perceptual attention, through the following principles:
• At the CP side, the GSCM and GPEM memories are updated using the received signal $r_n(t, \mathbf{c}_n)$ with waveform parameters chosen by the CC.
• In the current sensing cycle, the CC directs attention, through the control parameters $\mathbf{c}_n$, to the potential set of VAs and their parameters memorized in the GSCM and GPEM. These model parameters are shown at the CP side of Fig. 2.
Now the question is, “How can the flow of environment information through the received signal be controlled, and cognitive attention be put on the relevant features in the following sensing cycle?” The answer lies in the CC and in the feedback and feedforward information between the perceptor and the controller, as illustrated in Fig. 2.
The control parameter vector $\mathbf{c}_{n+1}$ of the next sensing cycle is chosen in order to gain the most “valuable” position-related information from the new set of measurements $\tilde{\mathcal{Z}}_{n+l}$, using the predicted posterior $p(\mathbf{x}_{n+l}, \mathcal{A}_{n+l}|\tilde{\mathcal{Z}}_{n+l}, \mathbf{b}_{n+l}, \mathbf{c}_{n+l})$ and the predicted received signals $\tilde{\mathcal{Z}}_{n+l}$, which depend on the chosen signal model, with $l = 0, \dots, l_{\mathrm{future}}$ as future horizon. This goal can be reached by minimizing an expected cost-to-go function, yielding

$$\mathbf{c}_{n+1} = \arg\min_{\mathbf{c}_n} \mathcal{C}\big(p(\mathbf{x}_{n+l}, \mathcal{A}_{n+l}|\tilde{\mathcal{Z}}_{n+l}, \mathbf{b}_{n+l}, \mathbf{c}_{n+l})\big), \tag{12}$$

where $\mathcal{C}(\cdot)$ is the expected cost-to-go function for optimal control [25], [43] of the environmental information contained in $\tilde{\mathcal{Z}}_{n+l}$. The expected cost-to-go function is based on an information-theoretic measure that should depend on the environment parameters, like the VA-specific $\mathrm{SINR}_{k,n}^{(j)}$, and serves as feedback information in the CDS.
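As a minimal illustration of (12), the sketch below restricts the horizon to a single step ($l_{\mathrm{future}} = 0$) and the action space to a small, hypothetical waveform library, in which each waveform is summarized by an effective bandwidth and the per-VA SINRs it is predicted to produce. The predicted posterior covariance of the agent position is approximated in information form as $(\mathbf{C}_{\mathrm{pred}}^{-1} + \mathcal{I}_{\mathbf{p}})^{-1}$ with $\mathcal{I}_{\mathbf{p}}$ from (2), and the cost is the Gaussian entropy up to a constant. All names and values are assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def efim(p, vas, sinrs, beta):
    """Position EFIM as in (2) for one candidate waveform (no path overlap assumed)."""
    J = np.zeros((2, 2))
    for a, s in zip(vas, sinrs):
        e = (p - a) / np.linalg.norm(p - a)
        J += s * np.outer(e, e)
    return 8 * np.pi**2 * beta**2 / C**2 * J


def select_waveform(p_pred, C_pred, vas, library):
    """Greedy one-step version of (12): pick the waveform minimizing the predicted posterior entropy.

    library maps a waveform name to (beta, per-VA SINRs); all entries are hypothetical.
    """
    costs = {}
    for name, (beta, sinrs) in library.items():
        C_post = np.linalg.inv(np.linalg.inv(C_pred) + efim(p_pred, vas, sinrs, beta))
        costs[name] = 0.5 * np.linalg.slogdet(C_post)[1]   # entropy up to a constant
    return min(costs, key=costs.get), costs


vas = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
library = {"A": (5e8, [100.0, 1.0]), "B": (5e8, [30.0, 30.0])}
best, costs = select_waveform(np.zeros(2), 0.01 * np.eye(2), vas, library)   # best == "B"
```

Waveform B is chosen here although A delivers more total SINR, because B spreads the information over two orthogonal ranging directions; this is the kind of environment-dependent trade-off the feedback in the CDS is meant to capture.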
In general, estimation and control problems have to deal with probabilistic states and observations. As a consequence, the control also has to be probabilistic, i.e., the cost function or utility must handle uncertainties. Since the uncertainty of the state is captured by a PDF, a measure of the informativeness of measurements has to be defined on the posterior state distribution. Two commonly used information measures of an RV are the entropy [44] and the Fisher information [28].

[Fig. 2 block diagram: the cognitive controller (policy $\pi_n$: waveform selection; learning and planning with $J_n(a)$ and $\pi_n(a, a')$) and the cognitive perceptor (Bayesian state filter with $\tilde{\mathbf{x}}_n$, $\tilde{\mathbf{C}}_n$; Bayesian VA memory with $\{\mathbf{a}_{k,n}, \mathbf{C}_{\mathbf{a}_{k,n}}\}$, $\{\mathbf{C}_{\mathbf{x}_n,\mathbf{a}_{k,n}}, \mathbf{C}_{\mathbf{a}_{k,n},\mathbf{a}_{k',n}}\}$, and $\{\mathrm{SINR}_{k,n}\}$; short-term memory $z^{-1}$) interact with the environment through the perception-action cycle via the signals $s_n(t, a_n)$ and $r_n(t, a_n)$, and are linked by feedback and feedforward information; further labels: $p(\mathbf{x}_n)$, $p(\mathcal{A}_{n,\mathrm{ass}})$, $h(\mathbf{x}_n, a_n)$.]

Fig. 2. Block diagram of the cognitive indoor positioning and tracking system that uses multipath channel information.
C. Information Measures for Feedback
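1) Fisher Information: The Fisher information matrix (FIM) of an RV $\mathbf{r}$ that depends on the deterministic parameter $\mathbf{p}$ can be used as a measure of information. Using the log-likelihood function $\ln f(\mathbf{r}; \mathbf{p})$, it is defined as

$$\mathcal{I}_{\mathbf{p}} = \mathbb{E}_{\mathbf{r};\mathbf{p}}\left\{ \left(\frac{\partial}{\partial \mathbf{p}} \ln f(\mathbf{r}; \mathbf{p})\right) \left(\frac{\partial}{\partial \mathbf{p}} \ln f(\mathbf{r}; \mathbf{p})\right)^{\mathsf{T}} \right\}. \tag{13}$$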
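Definition (13) can be checked numerically for a single range measurement. Assuming $r \sim \mathcal{N}(\|\mathbf{p} - \mathbf{a}\|, \sigma^2)$, the score is $(r - \|\mathbf{p} - \mathbf{a}\|)\,\mathbf{u}/\sigma^2$ with the unit vector $\mathbf{u} = (\mathbf{p} - \mathbf{a})/\|\mathbf{p} - \mathbf{a}\|$, so the FIM is the rank-one matrix $\mathbf{u}\mathbf{u}^{\mathsf{T}}/\sigma^2$, the single-path analogue of a summand in (2) when (4) holds with equality. The sketch compares a Monte Carlo average of the score outer product with this closed form; the anchor position, agent position, and $\sigma$ are hypothetical.

```python
import numpy as np


def score(r, p, a, sigma):
    """Gradient of ln f(r; p) w.r.t. p for Gaussian ranges r ~ N(||p - a||, sigma^2), one row per sample."""
    d = np.linalg.norm(p - a)
    u = (p - a) / d
    return ((r - d) / sigma**2)[:, None] * u[None, :]


def fim_monte_carlo(p, a, sigma, num_samples=200_000, seed=0):
    """Sample-mean approximation of (13): E{score score^T}."""
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(p - a) + sigma * rng.standard_normal(num_samples)
    g = score(r, p, a, sigma)
    return g.T @ g / num_samples


def fim_analytic(p, a, sigma):
    """Closed-form FIM u u^T / sigma^2 of a single range measurement."""
    u = (p - a) / np.linalg.norm(p - a)
    return np.outer(u, u) / sigma**2


p, a, sigma = np.array([1.0, 1.0]), np.array([4.0, 5.0]), 0.1
fim_mc = fim_monte_carlo(p, a, sigma)        # approx. [[36, 48], [48, 64]]
fim_cf = fim_analytic(p, a, sigma)           # exactly [[36, 48], [48, 64]]
```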
2) Entropy: For a continuous-valued vector RV $\mathbf{p} \in \mathbb{R}^L$ (in the following sections, $\mathbf{p}$ represents the agent position), the (differential) entropy is given as [26]

$$h(\mathbf{p}) \triangleq -\mathbb{E}_{\mathbf{p}}\{\ln p(\mathbf{p})\} = -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{p}) \ln p(\mathbf{p})\, \mathrm{d}\mathbf{p}. \tag{14}$$

The entropy is directly related to the uncertainty of the corresponding RV. For a multivariate Gaussian RV $\mathcal{N}(\mathbf{m}_{\mathbf{p}}, \mathbf{C}_{\mathbf{p}})$, the entropy depends only on the covariance matrix $\mathbf{C}_{\mathbf{p}}$, yielding

$$h(\mathbf{p}) = \frac{1}{2} \ln\big((2\pi e)^L \det \mathbf{C}_{\mathbf{p}}\big), \tag{15}$$

where $\det(\cdot)$ denotes the determinant of a matrix. The determinant of the covariance matrix $\mathbf{C}_{\mathbf{p}}$ is a measure of the “volume” of uncertainty of $\mathbf{p}$. The more compact this volume, the smaller the entropy $h(\mathbf{p})$ and, consequently, the more informative the distribution $p(\mathbf{p})$.
The inverse of the FIM is a lower bound on the covariance $\mathbf{C}_{\hat{\mathbf{p}}} \succeq \mathcal{I}_{\mathbf{p}}^{-1}$ of an unbiased estimator $\hat{\mathbf{p}}$ of the deterministic parameter $\mathbf{p}$ [28]. Looking at the entropy of the estimator's distribution $\mathcal{N}(\hat{\mathbf{p}}, \mathbf{C}_{\hat{\mathbf{p}}})$, the explicit relationship between the FIM $\mathcal{I}_{\mathbf{p}}$ of $\mathbf{r}$ (dependent on $\mathbf{p}$) and the entropy $h(\hat{\mathbf{p}})$ is given as

$$h(\hat{\mathbf{p}}) = \frac{1}{2} \ln\big((2\pi e)^L \det \mathbf{C}_{\hat{\mathbf{p}}}\big) \geq -\frac{1}{2} \ln\big((2\pi e)^{-L} \det \mathcal{I}_{\mathbf{p}}\big). \tag{16}$$

As (16) shows, the FIM of a parameter vector can be connected with the entropy, resulting in a scalar measure of information that is valuable for choosing optimal waveform parameters, as needed for a cognitive positioning system. As shown in Section II-C, the FIM $\mathcal{I}_{\mathbf{p}}$ on the agent position $\mathbf{p}$ contains the environment and signal parameters, e.g., the VA positions and the corresponding SINRs. With this, a direct relationship between the environment, the feedback information, and the control of the sensing is established, closing the PAC (Fig. 2). In the same manner, the system can be extended to information-based control of the agent state to increase the informativeness of the measurements [23], [24], [42].
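Both (15) and (16) reduce to log-determinants, which the sketch below evaluates with `numpy.linalg.slogdet` for numerical stability. It checks that a covariance equal to the CRLB $\mathcal{I}_{\mathbf{p}}^{-1}$ attains the bound in (16), that any larger covariance has a larger entropy, and that halving every standard deviation in two dimensions lowers the entropy by $\ln 4$. The FIM values are hypothetical.

```python
import numpy as np


def gaussian_entropy(C):
    """Entropy (15) in nats of an L-dimensional Gaussian with covariance C."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    sign, logdet = np.linalg.slogdet(C)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (C.shape[0] * np.log(2 * np.pi * np.e) + logdet)


def entropy_lower_bound(fim):
    """Right-hand side of (16): -0.5 * ln((2 pi e)^(-L) det FIM)."""
    fim = np.atleast_2d(np.asarray(fim, dtype=float))
    sign, logdet = np.linalg.slogdet(fim)
    if sign <= 0:
        raise ValueError("FIM must be positive definite (position not identifiable otherwise)")
    return -0.5 * (-fim.shape[0] * np.log(2 * np.pi * np.e) + logdet)


fim = np.array([[200.0, 50.0], [50.0, 100.0]])       # hypothetical position FIM in 1/m^2
crlb = np.linalg.inv(fim)
h_bound = entropy_lower_bound(fim)
h_efficient = gaussian_entropy(crlb)                   # equals h_bound
h_worse = gaussian_entropy(crlb + 1e-3 * np.eye(2))    # strictly larger than h_bound
```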
IV. COGNITIVE MINT
A. Cognitive Controller: Reinforcement Learning (RL)
As already stated in Section III, the control parameters should be chosen in order to optimize the expected cost-to-go function $\mathcal{C}(\cdot)$ of the predicted posterior PDF, as defined in (12). In general, the expected cost-to-go function for a Bayesian state-space filter can be written as

$$\mathcal{C}\big(p(\mathbf{x}_{n+1}, \mathcal{A}_{n+1}|\tilde{r}_{n+1}(t, \mathbf{c}_n))\big) = \bar{g}\big(\epsilon_{n+1|n+1}(\mathbf{c}_n)\big), \tag{17}$$

where $\epsilon_{n+1|n+1}(\mathbf{c}_n)$ is the predicted posterior state-estimation error, which depends on the control parameters, and $\bar{g}(\cdot)$ defines the cost-to-go function of the transmitter. The entropy was discussed as a possible information measure for the feedback; thus, a possible cost-to-go function $\bar{g}(\cdot)$ of the transmitter is the conditional entropy of the predicted posterior state-estimation error $\epsilon_{n+1|n+1}(\mathbf{c}_n)$, given as $\bar{g}(\epsilon_{n+1|n+1}(\mathbf{c}_n)) = h(\epsilon_{n+1|n+1}(\mathbf{c}_n))$ [26], [45]. This entropy, conditioned on the control parameter vector $\mathbf{c}_n$, is directly coupled with the posterior covariance matrix of the Bayesian tracking filter, which is lower bounded by the inverse of the EFIM in (2). Assuming a Gaussian approximation, the entropy of the predicted posterior state-estimation error is given as

$$h\big(\epsilon_{n+1|n+1}(\mathbf{c}_n)\big) = \frac{1}{2} \ln \det \tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n) + \mathrm{const}, \tag{18}$$

where $\tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n)$ is the predicted state covariance matrix (see Section II-F), provided by the Bayesian state-space filter (UKF) and dependent on the control parameter vector $\mathbf{c}_n$. Thus, the entropy in (18) is directly coupled with the position-related information contained in the measurement noise covariance matrix $\mathbf{R}_{z,n}$ described by (11). How the introduced algorithm uses the state-space and measurement model equations of the Bayesian state-space estimator is described in more detail in Sections IV-B2 and IV-B3.
For readability of the following derivations of the control optimization algorithm, the cost-to-go of the CC (18) is rewritten as $\bar{g}(\epsilon_{n+1|n+1}(\mathbf{c}_n)) = h(\mathbf{x}_{n+1}, \mathbf{c}_n)$ with $\mathbf{c}_n \in \mathcal{A}$, where $\mathcal{A}$ is the space of cognitive actions of size $|\mathcal{A}|$, which in our case represents the waveform library. Consequently, the next set of waveform parameters has to be chosen such that the cost-to-go, i.e., the next posterior entropy, is minimized. As elaborated in [46], dynamic programming provides an optimal solution for such problems, but unfortunately it is based on the assumption that the state to be controlled is “perfectly” perceivable. Hence, methods have been introduced that are capable of handling imperfect state information [47], with the drawback that they are computationally complex. In [6], [45], approximate dynamic programming was used for optimal control. There, the trace of the posterior covariance matrix was used as cost-to-go function to reduce the computational complexity. The policy for control parameter selection in the transmitter at time instance $n$ seeks the set of waveform parameters for which the cost-to-go function $\bar{g}(\epsilon_{n+1|n+1}(\mathbf{c}_n)) \approx \mathrm{tr}[\tilde{\mathbf{C}}_{\mathbf{x}_{n+1}}(\mathbf{c}_n)]$ is minimized for a rolling future horizon of $l_{\mathrm{future}}$ predicted states. In practice, it is difficult to construct all state transition probabilities from one state to another conditioned on the selected actions, including the cost incurred as a result of each transition. RL³ [48] represents an approximation of dynamic programming [46], [47] for solving such optimal control and future planning tasks with high computational efficiency. In the RL literature, the cost-to-go function is termed value-to-go function $J_n(\mathbf{c}_n)$, which is updated online in every PAC based on the immediate rewards $r_n$. The immediate reward $r_n$ is a measure of the “quality” of an action $\mathbf{c}_n$ taken on the environment. Using the Markovian assumption and following [8], it is given by

$$r_n = g_n\big(h(\mathbf{x}_{n-1}, \mathbf{c}_{n-1}) - h(\mathbf{x}_n, \mathbf{c}_n)\big), \tag{19}$$

where $h(\mathbf{x}_n, \mathbf{c}_n)$ increases monotonically with $\det \mathbf{C}_{\mathbf{x}_n}(\mathbf{c}_n)$ and $g_n(\cdot)$ is an arbitrary scalar operator that, in its most general form, could also depend on the time instance $n$ [8]. A reasonable choice for the reward is the scaled change in the posterior entropy from one PAC to the next, i.e.,

$$r_n = \mathrm{sign}\big(\Delta h(\mathbf{x}_n, \mathbf{c}_n)\big) \log\big(|\Delta h(\mathbf{x}_n, \mathbf{c}_n)|\big). \tag{20}$$

A positive reward favors the current action $\mathbf{c}_n$ for the future action $\mathbf{c}_{n+1}$; conversely, a negative one penalizes this action. As described in [8], the cognitive RL algorithm has to find the optimal future action $\mathbf{c}_{n+1}$ for the next PAC based on the immediate reward $r_n$ and the learned value-to-go function $J_n(\mathbf{c}_n)$.
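Taking $\Delta h(\mathbf{x}_n, \mathbf{c}_n)$ as the entropy difference that appears inside $g_n(\cdot)$ in (19) (an interpretation, since $\Delta h$ is not defined explicitly), the reward (20) can be sketched as follows. As written, (20) maps an entropy reduction with $|\Delta h| < 1$ to a negative value; the sketch implements the formula literally, and the case $\Delta h = 0$, where the logarithm is undefined, is mapped to zero by assumption.

```python
import math


def entropy_change(h_prev, h_curr):
    """Delta h: previous minus current posterior entropy, as inside g_n in (19)."""
    return h_prev - h_curr


def reward(h_prev, h_curr):
    """Immediate reward r_n = sign(Delta h) * log|Delta h| as in (20)."""
    dh = entropy_change(h_prev, h_curr)
    if dh == 0.0:
        return 0.0                       # assumption: log(0) is undefined, so no reward
    sign = 1.0 if dh > 0 else -1.0
    return sign * math.log(abs(dh))


r_gain = reward(5.0, 5.0 - math.e)       # entropy dropped by e nats -> reward 1.0 (up to rounding)
r_loss = reward(1.0, 1.0 + math.e**2)    # entropy grew by e^2 nats -> reward -2.0 (up to rounding)
```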
To compute the expected costs of future actions, as done in dynamic programming, RL divides the computation of the value-to-go function into two parts: (i) the learning phase, which incorporates the actually measured reward into the value-to-go function based on the actions $\mathbf{c}_n$ and $\mathbf{c}_{n-1}$, and (ii) the planning phase, which incorporates predicted future rewards into the value-to-go function. Whereas for learning a “real” reward is perceived from the environment, for planning only model-based predicted rewards are obtained from the internal perceptor memory via the feedforward link. In this way, a faster convergence to the optimal control policy can be achieved.
B. Learning and Planning: Algorithm
The value-to-go function that is used in the cognitive controller is defined as [8]

$$J_n(\mathbf{c}) = \mathbb{E}_{\pi_n}\big\{r_n + \gamma r_{n+1} + \gamma^2 r_{n+2} + \cdots \,\big|\, \mathbf{c}_n = \mathbf{c}\big\}, \tag{21}$$

where $r_n$ with $\mathbf{c} \in \mathcal{C}$ is the actual reward, $r_{n+l}$ are the predicted future rewards, which are based on the GPEM and GSCM parameters used by the Bayesian filter, $0 < \gamma \leq 1$ is the discount factor for future rewards based on action $\mathbf{c}_n \in \mathcal{C}$, and the expected value is calculated using the cognitive policy

$$\pi_n(\mathbf{c}, \mathbf{c}') = P[\mathbf{c}_{n+1} = \mathbf{c}' \,|\, \mathbf{c}_n = \mathbf{c}], \quad \mathbf{c}, \mathbf{c}' \in \mathcal{C}, \tag{22}$$

where $P[\cdot|\cdot]$ denotes a conditional PMF that describes the transition probabilities of all actions $\mathbf{c} \in \mathcal{C}$ over time instances $n$.

³RL represents an intermediate learning procedure that lies between supervised and unsupervised learning, as stated in [45].
Following the derivations in [8], the value-to-go function can be reformulated in an incremental recursive manner, yielding

$$J_n(\mathbf{c}) \leftarrow J_n(\mathbf{c}) + \alpha\Big[R(\mathbf{c}) + \gamma \sum_{\mathbf{c}'} \pi_n(\mathbf{c}, \mathbf{c}')\, J_n(\mathbf{c}') - J_n(\mathbf{c})\Big], \tag{23}$$

where $R(\mathbf{c}) = \mathbb{E}_{\pi_n}\{r_n | \mathbf{c}_n = \mathbf{c}\}$ $\forall \mathbf{c} \in \mathcal{C}$ denotes the expected immediate reward and $\alpha > 0$ is the learning rate. The algorithm for updating the value-to-go function can be found in the Appendix of [4]. The incremental recursive update in (23) means that, for all actions $\mathbf{c} \in \mathcal{C}$, the value-to-go function is updated using the expected immediate reward and the policy $\pi_n(\mathbf{c}, \mathbf{c}')$.
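One sweep of (23) over a finite action set is a matrix-vector operation when the policy is stored as a row-stochastic matrix with entries $\pi_n(\mathbf{c}, \mathbf{c}')$. The sketch below updates all actions synchronously from the old values (one possible reading of "for all actions"); for $0 < \alpha \leq 1$ and $\gamma < 1$, repeated sweeps with fixed $R$ and $\pi_n$ converge to the solution of $\mathbf{J} = \mathbf{R} + \gamma\boldsymbol{\Pi}\mathbf{J}$. The reward and policy values are hypothetical.

```python
import numpy as np


def update_value_to_go(J, R, policy, alpha, gamma):
    """One synchronous sweep of (23) over all actions.

    J[c]         current value-to-go of action c
    R[c]         expected immediate reward of action c
    policy[c, d] probability of taking action d next when the current action is c
    """
    J = np.asarray(J, dtype=float)
    target = np.asarray(R, dtype=float) + gamma * policy @ J
    return J + alpha * (target - J)


R = np.array([1.0, 0.0])                  # hypothetical expected rewards of two waveforms
policy = np.full((2, 2), 0.5)             # uniform policy
J = np.zeros(2)
for _ in range(500):
    J = update_value_to_go(J, R, policy, alpha=0.5, gamma=0.9)
```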
1) Learning from Applied Actions: With the value of the immediate reward $r_n$, a new value of the value-to-go function is learned for the currently selected action $\mathbf{c}_n$ using $J_n(\mathbf{c}_n) \leftarrow (1 - \alpha) J_n(\mathbf{c}_n) + \alpha R(\mathbf{c}_n)$, i.e., (23) without the look-ahead term. This accounts for the “real” physical action on the environment. Since only one parameter set can be chosen as an action per PAC, it would take at least $|\mathcal{C}|T$ seconds to apply all actions to the environment and collect the corresponding immediate rewards, where $T$ is the time period of a PAC. Unfortunately, this results in a poor convergence rate of the algorithm and unacceptable behavior in time-variant environments. A possible remedy is the planning of future actions based on the state-space and measurement models of the Bayesian state estimator.
2) Planning for Improving Convergence Behavior: Planning
is defined as predicting expected future rewards using
the state and measurement models of the Bayesian state space
filter in order to improve the convergence rate of the RL algorithm.
As depicted in Fig. 2, the feedforward link connects
the controller with the perceptor. The feedforward information
is a hypothesized future action, which is selected for a future
planning stage. Inspecting (23), one can observe that for every
action $c \in \mathcal{M}$, where $\mathcal{M} \subset \mathcal{C}$ is a subset of $\mathcal{C}$ that depends
on the current policy $\pi_n$, the predicted posterior covariance
matrices $\tilde{\mathbf{C}}_{n+l}(c)$ and the corresponding predicted future rewards
$r_{n+l}$ are computed with the decreasing discount factor $\gamma^l$ for
$l = 1, \dots, l_{\mathrm{future}}$, where $l_{\mathrm{future}}$ is
the future horizon. The predicted covariance matrices $\tilde{\mathbf{C}}_{n+l}(c)$
for a specific future action $c$ are computed using the state space model
(e.g., (9)) and the measurement model (e.g., (10)) of the Bayesian
state space estimator, together with the corresponding GPEM and GSCM
parameters stored in the perceptor's memory, as shown in
Fig. 2. After the planning process is finished, the value-to-go
function is updated for all actions $c \in \mathcal{M}$. Finally, the
current PAC is closed by updating the policy to $\pi_{n+1}$ using the
value-to-go function $J_{n+1}$ and choosing the new action, i.e., the
waveform parameters, for the next PAC according to this new
policy. Thus, the value-to-go function $J_n$ and
the policy $\pi_n$ are updated alternately from
one PAC to the next, with one important detail that is
discussed below.
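The planning step can be sketched for a linear Gaussian model as follows. This is a minimal illustration under explicit assumptions: the state space and measurement models (9) and (10) are replaced by generic matrices `F`, `Q`, `H`, `R`; `meas_model(c)` is a hypothetical function returning an action-dependent measurement model; and the reward $r_{n+l}$ is assumed to be the negative differential entropy of the predicted Gaussian posterior, which is one plausible reading of an entropy-based reward but not necessarily the exact definition in (18).

```python
import numpy as np


def gaussian_entropy(P):
    """Differential entropy 0.5 * log det(2*pi*e*P) of a Gaussian with covariance P."""
    d = P.shape[0]
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def predicted_posterior_cov(P, F, Q, H, R):
    """One Kalman covariance step: prediction, then measurement update in information form."""
    P_pred = F @ P @ F.T + Q
    info = np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H
    return np.linalg.inv(info)


def sample_planning_subset(policy, m, rng):
    """Draw |M| = m distinct actions with probabilities given by the policy vector."""
    return rng.choice(len(policy), size=m, replace=False, p=policy)


def plan_returns(P_n, F, Q, meas_model, subset, gamma, l_future):
    """Discounted predicted return sum_{l=1}^{l_future} gamma**l * r_{n+l}(c) for each c in M."""
    returns = {}
    for c in subset:
        H, R = meas_model(int(c))           # hypothetical action-dependent measurement model
        P, total = P_n, 0.0
        for l in range(1, l_future + 1):
            P = predicted_posterior_cov(P, F, Q, H, R)   # predicted C_{n+l}(c), action held fixed
            total += gamma**l * (-gaussian_entropy(P))   # assumed reward: negative entropy
        returns[int(c)] = total
    return returns


# Toy example: 2D state, three hypothetical actions with different measurement noise
rng = np.random.default_rng(0)
F, Q, P_n = np.eye(2), 0.01 * np.eye(2), np.eye(2)
noise_var = [0.5, 0.05, 0.2]


def meas_model(c):
    return np.eye(2), noise_var[c] * np.eye(2)


M = sample_planning_subset(np.full(3, 1.0 / 3.0), 3, rng)
ret = plan_returns(P_n, F, Q, meas_model, M, gamma=0.9, l_future=5)
best = max(ret, key=ret.get)  # action 1 has the smallest noise, hence the highest return
```

The planned returns could then enter (23) as the expected rewards of the actions in $\mathcal{M}$; how exactly they are combined is specified in [4] and is not reproduced here.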
a) Explore/Exploit trade-off: Both the planning process
and the choice of new actions are based on the policy. In planning,
the action subset $\mathcal{M}$ is defined by sampling from
the policy $\pi_n$, and new actions are selected based on the
updated policy $\pi_{n+1}$. Hence, the policy is responsible for
the explore/exploit trade-off in the action space. A widely
used method for balancing the exploration of new actions against
the exploitation of the already learned value-to-go function $J_n$ is
the $\epsilon$-greedy strategy: with a small probability
$\epsilon$ a random action is selected, representing pure exploration,
and with probability $1-\epsilon$ the action that maximizes
the value-to-go function is chosen, representing
pure exploitation. Both the randomly selected new action and
the actions in the subset $\mathcal{M}$ can be drawn either from a
uniform distribution over the action space $\mathcal{A}$ or from the policy
$\pi_n$. The policy is computed using the Boltzmann distribution
$$\pi_{n+l}(c) = \frac{\pi_{n+l-1}(c)\, \exp\{\Delta J_{n+l}(c)/\tau\}}{\sum_{c'} \pi_{n+l-1}(c')\, \exp\{\Delta J_{n+l}(c')/\tau\}},$$
where $\tau$ defines the degree of exploration and is referred to
as the system temperature [49], and $\Delta J_{n+l}(c) = J_{n+l}(c) - J_{n-1+l}(c)$. The cognitive action is selected according to
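$$c_n = \begin{cases} \text{random action} \sim \pi_{n+1} \in \mathcal{C}, & \text{if } \xi < \epsilon, \\ \arg\max_{c \in \mathcal{C}} J_n(c), & \text{otherwise}, \end{cases} \qquad (24)$$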
where $0 \le \xi \le 1$ is a uniform random number drawn at
each time step $n$. As stated above, the new action $c_{n+1}$ is
selected according to (24) and applied to the environment so
that the next PAC can start. The important concept of attention
on both the perceptor and the actuator side of the cognitive
dynamic system is reflected as follows:
• Perceptual attention: The environment-dependent parameters,
i.e., the marginal PDFs of the VA $p(\mathbf{a}_{k,n})$ and their
multipath-channel-dependent reliability measures $\mathrm{SINR}_{k,n}$,
are learned and updated online, so that the perceptual
Bayesian state space filter puts its attention on the relevant
position-related information in the received signal.
• Control attention: The policy $\pi_n$ learned over time and
the corresponding action subset $\mathcal{M}$ put the focus on the
"more relevant" actions. These actions in turn focus on the
relevant position-related information in the received signal.
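The Boltzmann policy update and the $\epsilon$-greedy rule (24) can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical function names; the policy is treated as a probability vector over the finite action set $\mathcal{C}$, and $\Delta J$ is passed in precomputed.

```python
import numpy as np


def boltzmann_update(policy_prev, delta_J, tau):
    """pi_{n+l}(c) proportional to pi_{n+l-1}(c) * exp(Delta J_{n+l}(c) / tau)."""
    delta_J = np.asarray(delta_J, dtype=float)
    # Subtracting max(delta_J) scales all weights by the same factor, which cancels in
    # the normalisation and avoids overflow of exp for small temperatures tau.
    w = np.asarray(policy_prev, dtype=float) * np.exp((delta_J - delta_J.max()) / tau)
    return w / w.sum()


def select_action(J, policy, epsilon, rng):
    """epsilon-greedy action selection as in (24)."""
    xi = rng.uniform()  # xi ~ U[0, 1), drawn anew at every time step
    if xi < epsilon:
        # Exploration: random action drawn from the (updated) policy
        return int(rng.choice(len(policy), p=policy))
    # Exploitation: action with the largest value-to-go
    return int(np.argmax(J))


rng = np.random.default_rng(1)
pi = boltzmann_update(np.full(4, 0.25), [0.0, np.log(2.0), 0.0, 0.0], tau=1.0)
# pi == [0.2, 0.4, 0.2, 0.2]: the action whose value-to-go grew gains probability mass
```

A large temperature $\tau$ flattens the update (more exploration), whereas a small $\tau$ concentrates the policy on actions whose value-to-go increased.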
3) Waveform Library: The general form of the waveform
library contains the control parameters $c_n = \{T^{j}_{p,n}, f^{j}_{c,n}\}_{j=1}^{J}$,
consisting of the pulse durations and carrier frequencies for the $J$ anchors,
indexed by $j$. Hence, the VA-specific MPC parameters are estimated
using specific sub-bands of the radio channel spectrum
defined by the parameter pair $T^{j}_{p,n}$ and $f^{j}_{c,n}$, which in turn is
chosen in an "optimal" manner. Optimal in this case means
that the position-related information contained in the
MPC parameters is maximized at the agent position $\mathbf{p}_n$ (see (2)).
Equations (2) and (3), which describe the parameters
$\widehat{\mathrm{SINR}}^{j}_{k,n}$, show the relation between the pulse parameter pair
$T^{j}_{p,n}$, $f^{j}_{c,n}$ and the position-related information contained in
the channel. The pulse duration $T^{j}_{p,n}$ scales the amount of DM
and is inversely proportional to the effective root mean square
bandwidth $\beta$. The relation to $f^{j}_{c,n}$ is less obvious, since
[Fig. 3 plot: y [m] vs. x [m]]
Fig. 3. Scenario for probabilistic MINT using cognitive sensing in presence
of additional DM interference. The anchors are at the positions $\mathbf{a}^{(1)}_1$ and
$\mathbf{a}^{(2)}_1$. The black line represents the agent trajectory and the red part of the
line indicates the agent positions, where the DM interference is activated.
it describes the frequency dependency of the environment
parameters and thus of the GSCM parameters, such as the complex
amplitudes of the MPC and the DM PDP. The set of selected
VA should lead to the highest overall SINR values (and
accordingly the smallest range variances $\mathrm{var}\{\hat{d}^{j}_{k,n}\}$) and the
smallest possible GDOP⁴, i.e., a geometrically optimal constellation
of VA positions, which is reflected by the ranging direction
matrix $\mathbf{J}_{\mathrm{r}}(\phi^{j}_{k,n})$. In a cognitive sense, this means that the
actions $a \in \mathcal{A}$ are chosen to reduce the posterior entropy over
time under quasi-stationary environment conditions.
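To illustrate how the constellation of VA directions enters the GDOP, the following NumPy sketch assumes a 2D setup in which each VA contributes a ranging direction matrix $\mathbf{J}_{\mathrm{r}}(\phi) = \mathbf{e}(\phi)\mathbf{e}(\phi)^T$ with $\mathbf{e}(\phi) = [\cos\phi, \sin\phi]^T$, and that the range measurements are independent. The function names are hypothetical. Under these assumptions, the position covariance is the inverse of $\sum_k \mathbf{J}_{\mathrm{r}}(\phi_k)/\mathrm{var}\{\hat{d}_k\}$, and the GDOP is taken as the square root of its trace for unit range variances, so that its square is the ratio of position variance to range variance as in footnote 4.

```python
import numpy as np


def ranging_direction_matrix(phi):
    """J_r(phi) = e(phi) e(phi)^T with unit vector e(phi) = [cos(phi), sin(phi)]^T (2D assumption)."""
    e = np.array([np.cos(phi), np.sin(phi)])
    return np.outer(e, e)


def position_covariance(phis, range_vars):
    """Inverse of the position information sum_k J_r(phi_k) / var_k (independent ranges)."""
    info = sum(ranging_direction_matrix(p) / v for p, v in zip(phis, range_vars))
    return np.linalg.inv(info)


def gdop(phis):
    """sqrt(trace(P)) for unit range variances, i.e., position std. per range std."""
    P = position_covariance(phis, np.ones(len(phis)))
    return float(np.sqrt(np.trace(P)))


g_orthogonal = gdop([0.0, np.pi / 2])                  # sqrt(2), about 1.414
g_three = gdop([0.0, 2 * np.pi / 3, 4 * np.pi / 3])    # sqrt(4/3), about 1.155
g_collinear = gdop([0.0, 0.1])                         # nearly collinear: much larger
```

Passing per-VA range variances (e.g., derived from the SINR values) to `position_covariance` shows how a few reliable VA with well-spread directions can outweigh many poorly placed ones.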
V. RESULTS
A. Measurement Setup
For the evaluation of this positioning approach, we use
the seminar room scenario of the MeasureMINT database
[51]. The measurements allow for 5 trajectories consisting
of 1000 agent positions with a spacing of 1 cm, as shown in
Fig. 3. At each position, UWB measurements of the channel
between the agent and the two anchors at the positions
$\mathbf{a}^{(1)}_1 = [0.5, 7]^T$ and $\mathbf{a}^{(2)}_1 = [5.2, 3.2]^T$ are available. The
measurements were performed using an M-sequence
correlative channel sounder developed by Ilmsens. This sounder
provides measurements over approximately the FCC frequency
range, from 3 to 10 GHz. On the anchor and agent sides,
dipole-like antennas made of Euro-cent coins were used. They
⁴The GDOP is the ratio between the position variance and the range variance [50].
For positioning, a small value indicates a high level of confidence that high
precision can be reached. Hence, the GDOP indicates a "good" geometry for
positioning, i.e., a good geometric placement of the anchors.
[Fig. 4 plot: CDF vs. P(p) [m]; curves: conventional MINT, cognitive MINT (win = 1), cognitive MINT (win = 5)]
Fig. 4. Performance CDF of the cognitive MINT algorithm using a smaller
restricted set of VA. Visibilities of VA are computed using the SINR instead
of optical ray-tracing.
have an approximately uniform radiation pattern in the azimuth
plane and nulls in the directions of the floor and ceiling.
The chosen initial pulse duration is $T_p = 0.5$ ns (corresponding
to a bandwidth of 2 GHz), and the center frequency
is $f_c = 7$ GHz. The VA for the anchors at the positions $\mathbf{a}^{(1)}_1$ and
$\mathbf{a}^{(2)}_1$ were computed a priori up to order 2. The past window
of agent positions for the SINR estimation is again chosen to
be $w_{\mathrm{past}} = 40$. For all simulations, 30 Monte Carlo runs were
conducted.
B. Initial Experiment Setup
For the sake of simplicity, we reduce the control parameters
to just the carrier frequency $c_n = f_{c,n}$ for each PAC, shared by all
anchors, and we fix the pulse duration $T_p$. This means that the
cognitive MINT system adaptively finds, from PAC to PAC, the carrier frequency
$f_{c,n}$ that yields the highest reward from the
environment by maximizing the position-related information.
Starting from the initial value $f_{c,1} = 7$ GHz (which represents
the center of the measured bandwidth), the carrier frequency
is adapted over time using the posterior entropy in (18).
The finite space of cognitive actions $\mathcal{C}$ contains discrete
frequency values bounded by the measured bandwidth, i.e.,
$f_{c,n,i} \in \mathcal{C}$, where $i = 1, \dots, |\mathcal{C}|$. The frequency bins are equidistant
with spacing $\Delta f_c = f_{c,n,i+1} - f_{c,n,i}$. For the experiments, we have chosen $\Delta f_c = 50$ MHz,
considering the large signal bandwidth of 2 GHz. The starting
policy is defined as a uniform distribution $\pi_1(c', c) =
\mathcal{U}(f_{c,n,1}, f_{c,n,|\mathcal{C}|})$, and the value-to-go function is initialized as
$J_1(c) = 0$ $\forall c$. The size of the planning subset is $|\mathcal{M}| = 20$;
the size of $\mathcal{C}$ is $|\mathcal{C}| = 40$.
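The initial setup of the action space can be written down directly. In the following sketch, $\Delta f_c$, $|\mathcal{C}|$, $|\mathcal{M}|$, $T_p$ and the initialization of $\pi_1$ and $J_1$ are taken from the text, whereas the lowest carrier frequency `F_MIN` is a hypothetical value (the text does not state where the 40 bins are placed) chosen so that the grid contains $f_{c,1} = 7$ GHz. Note that 40 bins with 50 MHz spacing span 1.95 GHz.

```python
import numpy as np

T_P = 0.5e-9      # pulse duration T_p (from the text)
DELTA_FC = 50e6   # carrier spacing Delta f_c (from the text)
N_ACTIONS = 40    # |C| (from the text)
M_SIZE = 20       # |M| (from the text)
F_MIN = 6.0e9     # lowest carrier frequency in C: hypothetical, not stated in the text

bandwidth = 1.0 / T_P                                # about 2 GHz for T_p = 0.5 ns
carriers = F_MIN + DELTA_FC * np.arange(N_ACTIONS)   # f_{c,n,i}, i = 1, ..., |C|
policy_1 = np.full(N_ACTIONS, 1.0 / N_ACTIONS)       # uniform starting policy pi_1
J_1 = np.zeros(N_ACTIONS)                            # J_1(c) = 0 for all c
i_start = int(np.argmin(np.abs(carriers - 7.0e9)))   # action closest to f_{c,1} = 7 GHz

# Planning subset M drawn from the current policy without replacement
rng = np.random.default_rng(0)
M = rng.choice(N_ACTIONS, size=M_SIZE, replace=False, p=policy_1)
```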
C. Discussion of Performance Results
1) Conventional MINT: Fig. 4 shows the overall position
error CDF for “conventional” MINT (which assumes perfect
floor plan knowledge) with and without cognitive waveform
adaptation. To show the advantage of the cognitive MINT
algorithm, a restricted set of VA is chosen, and the visibilities
of the VA are computed using the SINR instead of optical
ray-tracing. As the CDF of “conventional” MINT indicates
(blue line with circle marker), the tracking algorithm tends to
[Fig. 5 plot: CDF vs. P(p) [m]; curves: prob. MINT, cog. prob. MINT (win = 1), cog. prob. MINT (win = 5)]
Fig. 5. Performance CDF of cognitive probabilistic MINT using a smaller
restricted set of VA. For probabilistic MINT, the visibilities of VA are always
computed using the SINR.
diverge since too little position-related information is available.
The black and the red lines show the overall position error
CDF of cognitive MINT for a future horizon of
$l_{\mathrm{future}} = 1$ and $l_{\mathrm{future}} = 5$, respectively. As one can observe, the
performance is significantly increased by the cognitive waveform
adaptation. This means that the cognitive MINT algorithm is
able to increase the amount of position-related information
by shifting the sensing spectrum, via the carrier frequency
$f_{c,n,i} \in \mathcal{A}$, to bands that carry more geometry-dependent
information in the MPC. Another interesting observation in
Fig. 4 is that increasing the planning horizon further improves
performance, which is consistent with the intended functionality of
the cognitive algorithm.
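For reference, an overall position error CDF such as those in Figs. 4, 5 and 9 can be computed as an empirical CDF. The sketch below assumes, without confirmation from the text, that the Euclidean errors of all positions and Monte Carlo runs are pooled; the function names and the tiny data set are hypothetical.

```python
import numpy as np


def position_errors(p_est, p_true):
    """Euclidean position error per estimate (last axis holds the x/y coordinates)."""
    return np.linalg.norm(np.asarray(p_est, float) - np.asarray(p_true, float), axis=-1)


def empirical_cdf(errors):
    """Sorted pooled errors x and the fraction F of errors that are <= each x."""
    x = np.sort(np.ravel(errors))
    F = np.arange(1, x.size + 1) / x.size
    return x, F


# Hypothetical example: 2 Monte Carlo runs x 2 positions, 2D coordinates in metres
p_true = np.zeros((2, 2, 2))
p_est = np.array([[[0.03, 0.0], [0.0, 0.01]], [[0.0, 0.02], [0.06, 0.08]]])
x, F = empirical_cdf(position_errors(p_est, p_true))
```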
2) Probabilistic MINT: Fig. 5 shows the overall position
error CDF for probabilistic MINT with and without cognitive
waveform adaptation. Uncertainties in the floor plan
and wrong associations can be handled robustly owing to the
probabilistic treatment of VA, and thus none of the individual
trajectory runs diverges. The already high accuracy
and robustness of probabilistic MINT are the reasons why
cognitive sensing leads to only a minor additional performance
gain in this scenario. We expect the performance gain of
cognitive probabilistic MINT to be much more pronounced for
lower bandwidths.
3) Probabilistic MINT with additional DM Interference:
In the last setup, we additionally add synthetic DM
interference filtered at a carrier frequency of $f_c = 7$ GHz with a
bandwidth of 2 GHz. The DM parameters are chosen according
to [52], except for the DM power. The experiments were
conducted with three levels of DM power: $\Omega_1 = 1.1615\cdot 10^{-9}$,
$\Omega_1 = 5.8076\cdot 10^{-9}$, and $\Omega_1 = 1.1615\cdot 10^{-8}$.
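Relative to the lowest level, the three DM power settings correspond to factors of approximately 1, 5 and 10:

$$\frac{5.8076\cdot 10^{-9}}{1.1615\cdot 10^{-9}} \approx 5 \;(\approx 7.0\,\mathrm{dB}), \qquad \frac{1.1615\cdot 10^{-8}}{1.1615\cdot 10^{-9}} = 10 \;(10\,\mathrm{dB}).$$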
Fig. 3 illustrates the scenario used for the experiment. The
black line represents the agent trajectory, and its red part
indicates the agent positions where the DM interference
is activated. Fig. 6 shows the signals exchanged between the
agent and Anchors 1 and 2 for one sample position. The
"clean" signals are shown in Fig. 6a, and the noisy signal for a
DM power of $\Omega_1 = 1.1615\cdot 10^{-9}$ in Fig. 6b. Fig. 6b makes
clear that this DM level already represents severe
interference. The justification for using such
an interference noise model lies in the fact that it can describe
[Fig. 6 plots: $|r^{(1)}_n(t)|$ and $|r^{(2)}_n(t)|$ vs. path delay [m]; (a) Clean signals; (b) Noisy signal with DM power of $\Omega_1 = 1.1615\cdot 10^{-8}$]
Fig. 6. Signals exchanged between agent and Anchors 1 and 2 for an example
agent position. The gray lines represent the estimated delays of the MPC.
Fig. 6a shows the "clean" signal and Fig. 6b the noisy signal.
[Fig. 7 plot: f [GHz] vs. time index n; curves: initial $f_{c,0}$, DM band at $f_{c,0}$, mean $f_{c,n}$, examples of $f_{c,n}$]
Fig. 7. Mean carrier frequency for DM power $\Omega_1 = 1.1615\cdot 10^{-8}$. The black
line denotes the initial carrier frequency $f_{c,1}$ and the blue one the mean
of the cognitively adapted carrier frequency $f_{c,n}$. The blue dashed lines show
a few example realizations of cognitively adapted carrier frequencies along
different trajectories and for different Monte Carlo runs.
many kinds of measurement modeling mismatches, e.g., if the
anisotropy of the antenna pattern for different angles of arrival
is not considered.
Fig. 7 illustrates the mean values of the cognitively adapted
carrier frequency along one of the trajectories at DM power
[Fig. 8 plot: Entropy ($\times 10^{-8}$) vs. time index n; curves: mean prob. MINT, example runs prob. MINT, mean cog. prob. MINT, example runs cog. prob. MINT; DM disturbance marked]
Fig. 8. Mean entropy of probabilistic MINT and cognitive probabilistic MINT
over time instances $n$ for DM power $\Omega_1 = 1.1615\cdot 10^{-8}$. The red and
black dashed lines show a few example entropy realizations along different
trajectories and for different Monte Carlo runs.
$\Omega_1 = 1.1615\cdot 10^{-8}$. The mean is computed over the 30
Monte Carlo runs of the experiment. The black line
denotes the initial carrier frequency $f_{c,1}$ and the blue one the
mean of the cognitively adapted carrier frequency $f_{c,n}$. The blue dashed
lines show a few example realizations of cognitively adapted
carrier frequencies along different trajectories and for different
Monte Carlo runs. The figure clearly shows that, at almost all
agent positions where additional DM interference is present, the
cognitive probabilistic MINT algorithm avoids carrier frequencies
$f_{c,n}$ close to the DM carrier frequency.
Fig. 8 shows the corresponding mean entropy values of probabilistic
MINT (red line with diamond markers) and cognitive
probabilistic MINT (black line with triangle markers) over
time instances $n$ for DM power $\Omega_1 = 1.1615\cdot 10^{-8}$. The
red and black dashed lines show a few example entropy
realizations along different trajectories and for different Monte
Carlo runs. Before the noise disturbance starts, the entropy of
the probabilistic MINT algorithm is almost the same as that of
the cognitive probabilistic MINT algorithm. As soon as
the disturbance is introduced, the entropy of the posterior
increases. The cognitive probabilistic MINT algorithm then
starts to change its carrier frequency $f_{c,n}$ (as shown in Fig. 7)
until the entropy is reduced again. This leads to an almost
constant or even decreasing entropy, even in the presence of
a very high noise level (black line with triangle markers in
Fig. 8). In contrast, the probabilistic MINT algorithm
without cognitive waveform adaptation starts to diverge after
the disturbance is introduced and is not able to recover. This is
indicated by the rapid increase of the entropy and its stagnation
at a large value, shown in Fig. 8 by the red line with diamond
markers.
This result is confirmed by the performance CDF
of the agent position error shown in Fig. 9. This comparison
between probabilistic MINT and cognitive probabilistic MINT
illustrates the ability of the cognitive algorithm to
separate relevant from irrelevant information by adapting
the control parameter $f_{c,n}$ so as to avoid the noisy frequency band
of the signal. The probabilistic MINT algorithm without
waveform adaptation tends to diverge under such harsh conditions,
as shown by the CDFs drawn with solid lines. In contrast to this,
[Fig. 9 plot: CDF vs. P(p) [m]; curves: prob. MINT and cognitive prob. MINT for noise 1, noise 2, and noise 3]
Fig. 9. Performance CDF of the cognitive probabilistic MINT algorithm when
a disturbance is introduced at three different noise levels along a certain part of
the trajectory. Noise 1 corresponds to DM with power $\Omega_1 = 1.1615\cdot 10^{-9}$, Noise
2 with power $\Omega_1 = 5.8076\cdot 10^{-9}$, and Noise 3 with power $\Omega_1 = 1.1615\cdot 10^{-8}$.
the cognitive MINT algorithm overcomes these impairments,
again leading to robust behavior, as shown by the CDFs drawn
with dashed lines.
REFERENCES
[1] J. M. Fuster, Cortex and Mind - Unifying Cognition. Oxford University Press, 2003.
[2] P. Meissner, “Multipath-Assisted Indoor Positioning,” Ph.D. dissertation,
Graz University of Technology, 2014.
[3] E. Leitinger, P. Meissner, C. Rudisser, G. Dumphart, and K. Witrisal,
“Evaluation of Position-Related Information in Multipath Components
for Indoor Positioning,” IEEE Journal on Selected Areas in Communi-
cations, vol. 33, no. 11, pp. 2313–2328, Nov 2015.
[4] E. Leitinger, “Cognitive Indoor Positioning and Tracking using Mul-
tipath Channel Information,” Ph.D. dissertation, Graz University of
Technology, 2016.
[5] E. Leitinger, M. F., P. Meissner, K. Witrisal, and F. Hlawatsch, “Belief
Propagation based Joint Probabilistic Data Association for Multipath-
Assisted Indoor Navigation and Tracking,” in 2016 International Con-
ference on Localization and GNSS (ICL-GNSS), June 2016.
[6] S. Haykin, Y. Xue, and P. Setoodeh, “Cognitive Radar: Step Toward
Bridging the Gap Between Neuroscience and Engineering,” Proceedings
of the IEEE, vol. 100, no. 11, pp. 3102–3130, Nov. 2012.
[7] S. Haykin, M. Fatemi, P. Setoodeh, and Y. Xue, “Cognitive Control,”
Proceedings of the IEEE, vol. 100, no. 12, pp. 3156–3169, Dec. 2012.
[8] M. Fatemi and S. Haykin, “Cognitive Control: Theory and Application,”
Access, IEEE, vol. 2, pp. 698–710, 2014.
[9] A. Amiri and S. Haykin, “Improved Sparse Coding Under the Inﬂuence
of Perceptual Attention,” Neural Comput., vol. 26, no. 2, pp. 377–420,
Feb. 2014.
[10] S. Haykin and J. Fuster, “On Cognitive Dynamic Systems: Cognitive
Neuroscience and Engineering Learning From Each Other,” Proceedings
of the IEEE, vol. 102, no. 4, pp. 608–628, April 2014.
[11] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
[12] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences.
New York, NY, USA: Cambridge University Press, 2005.
[13] D. S. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial, ser.
Oxford Science Publications. Oxford, New York: Oxford University Press, 2006.
[14] J. Borish, “Extension of the Image Model to arbitrary Polyhedra,” The
Journal of the Acoustical Society of America, March 1984.
[15] J. Kunisch and J. Pamp, “An Ultra-Wideband space-variant Multipath
Indoor Radio Channel Model,” in Ultra Wideband Systems and Tech-
nologies, 2003 IEEE Conference on, Nov 2003, pp. 290–294.
[16] E. Leitinger, P. Meissner, M. Lafer, and K. Witrisal, “Simultaneous
Localization and Mapping using Multipath Channel Information,” in
2015 IEEE International Conference on Communications Workshops
(ICC), London, UK, June 2015, pp. 754–760.
[17] E. Leitinger, F. Meyer, F. Tufvesson, and K. Witrisal, “Factor graph
based simultaneous localization and mapping using multipath channel
information,” in Proc. IEEE ICCW-17, Paris, France, May 2017, pp.
652–658.
[18] E. Leitinger, F. Meyer, F. Hlawatsch, K. Witrisal, F. Tufvesson, and
M. Z. Win, “A belief propagation algorithm for multipath-based SLAM,”
IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5613–5629, Dec.
2019.
[19] E. Leitinger, S. Grebien, and K. Witrisal, “Multipath-based SLAM
exploiting AoA and amplitude information,” in Proc. IEEE ICCW-19,
Shanghai, China, May 2019, pp. 1–7.
[20] D. Kershaw and R. Evans, “Optimal waveform selection for tracking
systems,” Information Theory, IEEE Transactions on, vol. 40, no. 5, pp.
1536–1550, Sep. 1994.
[21] S. Haykin, A. Zia, Y. Xue, and I. Arasaratnam, “Control Theoretic
Approach to Tracking Radar: First step towards cognition,” Digital
Signal Processing, vol. 21, no. 5, pp. 576–585, 2011.
[22] K. Bell, C. Baker, G. Smith, J. Johnson, and M. Rangaswamy, “Cognitive
radar framework for target detection and tracking,” Selected Topics in
Signal Processing, IEEE Journal of, vol. 9, no. 8, pp. 1427–1439, Dec
2015.
[23] G. Hoffmann and C. Tomlin, “Mobile Sensor Network Control Using
Mutual Information Methods and Particle Filters,” Automatic Control,
IEEE Transactions on, vol. 55, no. 1, pp. 32–47, Jan 2010.
[24] B. J. Julian, M. Angermann, M. Schwager, and D. Rus, “Distributed
Robotic Sensor Networks: An Information-theoretic Approach,” Int.
J. Rob. Res., vol. 31, no. 10, pp. 1134–1154, Sep. 2012. [Online].
Available: http://dx.doi.org/10.1177/0278364912452675
[25] K. Chaloner and I. Verdinelli, “Bayesian Experimental Design: A
Review,” Statistical Science, vol. 10, no. 3, pp. 273–304, 1995.
[Online]. Available: http://www.jstor.org/stable/2246015
[26] T. M. Cover and J. A. Thomas, Elements of Information Theory
(Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[27] H. L. Van Trees, Detection, Estimation and Modulation, Part I. Wiley Press, 1968.
[28] S. Kay, Fundamentals of Statistical Signal Processing: Estimation
Theory. Prentice Hall Signal Processing Series, 1993.
[29] P. Meissner, E. Leitinger, and K. Witrisal, “UWB for robust indoor
tracking: Weighting of multipath components for efﬁcient estimation,”
Wireless Communications Letters, IEEE, vol. 3, no. 5, pp. 501–504, Oct
2014.
[30] N. Michelusi, U. Mitra, A. Molisch, and M. Zorzi, “UWB Sparse/Diffuse
Channels, Part I: Channel Models and Bayesian Estimators,” Signal
Processing, IEEE Transactions on, vol. 60, no. 10, pp. 5307–5319, 2012.
[31] A. Molisch, “Ultra-Wide-Band Propagation Channels,” Proceedings of
the IEEE, vol. 97, no. 2, pp. 353–371, Feb. 2009.
[32] S. Grebien, E. Leitinger, K. Witrisal, and B. H. Fleury, “Super-resolution
channel estimation including the dense multipath component — A sparse
variational Bayesian approach,” 2021, in preparation.
[33] K. Witrisal, P. Meissner, E. Leitinger, Y. Shen, C. Gustafson, F. Tufves-
son, K. Haneda, D. Dardari, A. F. Molisch, A. Conti, and M. Z.
Win, “High-Accuracy Localization for Assisted Living: 5G systems will
turn multipath channels from foe to friend,” IEEE Signal Processing
Magazine, vol. 33, no. 2, pp. 59–70, March 2016.
[34] Y. Shen and M. Win, “Fundamental Limits of Wideband Localization;
Part I: A General Framework,” Information Theory, IEEE Transactions
on, vol. 56, no. 10, pp. 4956–4980, Oct. 2010.
[35] Y. Shen, H. Wymeersch, and M. Win, “Fundamental Limits of Wideband
Localization; Part II: Cooperative Networks,” Information Theory, IEEE
Transactions on, vol. 56, no. 10, pp. 4981–5000, Oct. 2010.
[36] F. Meyer, T. Kropfreiter, J. L. Williams, R. Lau, F. Hlawatsch, P. Braca,
and M. Z. Win, “Message passing algorithms for scalable multitarget
tracking,” Proc. IEEE, vol. 106, no. 2, pp. 221–259, Feb. 2018.
[37] F. Meyer, P. Braca, P. Willett, and F. Hlawatsch, “Scalable Multitarget
Tracking using Multiple Sensors: A belief propagation approach,” in
Information Fusion (Fusion), 2015 18th International Conference on,
July 2015, pp. 1778–1785.
[38] Y. Bar-Shalom and X.-R. Li, Multitarget-Multisensor Tracking: Principles
and Techniques. Storrs, CT: Yaakov Bar-Shalom, 1995.
[39] J. Vermaak, S. J. Godsill, and P. Perez, “Monte Carlo ﬁltering for multi
target tracking and data association,” vol. 41, no. 1, pp. 309–332, Jan.
2005.
[40] S. Kay, Fundamentals of Statistical Signal Processing: Detection Theory.
Prentice Hall Signal Processing Series, 1998.
[41] H. Wymeersch, “The Impact of Cooperative Localization on Achieving
higher-level Goals,” in Communications Workshops (ICC), 2013 IEEE
International Conference on, June 2013, pp. 1–5.
[42] F. Meyer, H. Wymeersch, M. Frohle, and F. Hlawatsch, “Distributed
Estimation With Information-Seeking Control in Agent Networks,”
Selected Areas in Communications, IEEE Journal on, vol. 33, no. 11,
pp. 2439–2456, Nov 2015.
[43] B. Grocholsky, “Information-Theoretic Control
of Multiple Sensor Platforms,” Ph.D. dissertation, Department of
Aerospace, Mechatronic and Mechanical Engineering, 2002.
[44] C. Shannon, “A Mathematical Theory of Communication,” Bell System
Technical Journal, The, vol. 27, no. 4, pp. 623–656, Oct 1948.
[45] S. Haykin, Cognitive Dynamic Systems: Perception-action Cycle, Radar
and Radio. New York, NY, USA: Cambridge University Press, 2012.
[46] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
[47] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2000.
[48] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning,
1st ed. Cambridge, MA, USA: MIT Press, 1998.
[49] A. Lazaric, M. Restelli, and A. Bonarini, “Reinforcement learning in
continuous action spaces through sequential monte carlo methods,” in
Advances in Neural Information Processing Systems, 2007.
[50] Z. Sahinoglu, S. Gezici, and I. Guvenc, Ultra-wideband Positioning
Systems – Theoretical Limits, Ranging Algorithms and Protocols. Cam-
bridge University Press, 2008.
[51] P. Meissner, E. Leitinger, M. Lafer, and K. Witrisal, “MeasureMINT
UWB database,” www.spsc.tugraz.at/tools/UWBmeasurements, 2013,
Publicly available database of UWB indoor channel measurements.
[Online]. Available: www.spsc.tugraz.at/tools/UWBmeasurements
[52] J. Karedal, S. Wyne, P. Almers, F. Tufvesson, and A. Molisch, “A
Measurement-Based Statistical Model for Industrial Ultra-Wideband
Channels,” Wireless Communications, IEEE Transactions on, vol. 6,
no. 8, pp. 3028–3037, Aug. 2007.