Intrinsic ﬁltering on Lie groups with applications to attitude
estimation
Axel Barrau, Silv`ere Bonnabel∗
September 15, 2014
Abstract
This paper proposes a probabilistic approach to the prob-
lem of intrinsic ﬁltering of a system on a matrix Lie group
with invariance properties. The problem of an invariant
continuous-time model with discrete-time measurements
is cast into a rigorous stochastic and geometric frame-
work. Building upon the theory of continuous-time in-
variant observers, we introduce a class of simple ﬁlters
and study their properties (without addressing the opti-
mal ﬁltering problem). We show that, akin to the Kalman
ﬁlter for linear systems, the error equation is a Markov
chain that does not depend on the state estimate. Thus,
when the ﬁlter’s gains are held ﬁxed, the noisy error’s dis-
tribution is proved to converge to a stationary distribution,
under some convergence properties of the ﬁlter with noise
turned off. We also introduce two novel tools of engineer-
ing interest: the discrete-time Invariant Extended Kalman
Filter, for which the trusted covariance matrix is shown
to converge, and the Invariant Ensemble Kalman Filter.
The methods are applied to attitude estimation, allowing
to derive novel theoretical results in this ﬁeld, and illus-
trated through simulations on synthetic data.
1
Introduction
The problem of probabilistic ﬁltering aims at computing
the conditional expectation ˆx(t) of the state x(t) of a sys-
tem driven by process and observation noises conditioned
∗A.
Barrau
and
S.
Bonnabel
are
with
MINES
ParisTech,
PSL
Research
University,
Centre
for
robotics,
60
Bd
St
Michel
75006
Paris,
France.
[axel.barrau,silvere.bonnabel]@mines-paristech.fr
on the past inputs and outputs. Filtering is a branch of
stochastic process theory that has been strongly driven
by applications in control and signal processing. In the
particular case where the system is linear and corrupted
by independent white Gaussian process and observation
noises, the celebrated Kalman ﬁlter [20] solves the ﬁlter-
ing problem. Yet, when the system is non-linear there
is no general method to derive efﬁcient ﬁlters, and their
design has encountered important difﬁculties in practice.
Indeed, general ﬁltering equations describing the evolu-
tion of the state conditioned on past outputs can also be
derived, see e.g., [10] and references therein. However,
the equations are not closed, and in most cases the con-
ditional densities can not be computed by solving a ﬁnite
dimensional equation.
In engineering applications the most popular solution
is the extended Kalman ﬁlter (EKF), which does not pos-
sess general theoretical properties, especially in a stochas-
tic context, and is sometimes prone to divergence. The
EKF amounts to linearize the system around the esti-
mated trajectory, and build a Kalman ﬁlter for the linear
model, which can in turn be implemented on the non-
linear model. Despite the fact that it is subject to sev-
eral problems both theoretical and practical, one merit of
the EKF is to convey an estimation of the mean state ˆx(t)
together with an approximation of the covariance matrix
P(t) of the error, yielding a (trusted) ellipsoidal credible
set that reﬂects the level of uncertainty in the estimation.
In order to capture more closely the distribution, several
rather computationally extensive methods have recently
attracted growing interest such as particle ﬁlters and un-
scented Kalman ﬁlters (UKF) [19].
In this paper we consider a general ﬁltering problem on
Lie groups in a stochastic setting, and a class of simple
arXiv:1310.2539v2  [cs.SY]  12 Sep 2014
recursive ﬁnite dimensional ﬁlters, having a similar struc-
ture as the EKF, is introduced. Such recursively deﬁned
ﬁlters allow straightforward digital computer implemen-
tation. Besides, as the Kalman ﬁlter, the corresponding
algorithms will be readily implementable with very little
understanding of the theory leading to their development,
as illustrated by the concrete examples of sections 4.2,
4.3 and 5.3, which can be read independently of the rest
of the paper. As concerns their theoretical properties, they
are motivated by the three following desirable properties
a simple recursive ﬁlter should ideally have: 1- forget-
ting of the initial condition, 2- being able to capture the
extent of uncertainty in the estimate, 3- being efﬁcient,
that is, minimizing the error’s dispersion within a given
class. In the ﬁltering problem, the estimation error is ini-
tially distributed according to a prior distribution provided
by the practitioner. The rationale for the ﬁrst property is
that, as the ﬁlter acquires information, the prior informa-
tion should be given less (eventually no) importance by
the ﬁlter. As the user generally possesses very little in-
formation on the system’s state initially, the prior can be
very far from the truth. Forgetting the initial condition
allows the (subjective Bayesian) prior distribution to be
as informative as noninformative without impacting the
ﬁnal estimates. Moreover, it corresponds to convergence
properties of the ﬁlter, and thus deals with the ability to re-
cover from catastrophic failures. In robotics, this leads to
robustness to erroneous beliefs due for instance to percep-
tual ambiguity. The second property deals with the cen-
tral motivation behind stochastic approaches. Indeed, all
deterministic observers, no matter how strong properties
they may have, have a common fundamental weakness: in
the presence of sensor and model uncertainties, they pro-
vide no indication of the extent of uncertainty conveyed
by the estimate. Being able to compute credible sets for
the estimate may be needed in order to adapt control deci-
sions to the situation, for instance to guarantee an absence
of collision. The two ﬁrst properties are general and ap-
ply to any class of sensible ﬁlters. The third property ad-
dresses the issue of gain tuning. When working with a
class of (suboptimal) ﬁlters, the gains can be tuned over a
training set in order to minimize the level of uncertainty
conveyed by the estimate.
Several approaches to ﬁltering for systems possessing
a geometric structure have been developed in previous lit-
erature. For stochastic processes on Riemannian mani-
folds [18, 14] some results have been derived, see e.g.,
[29]. The speciﬁc situation where the process evolves in
a vector space but the observations belong to a manifold
has also been considered, see e.g. [33, 13] and more re-
cently [36]. For systems on Lie groups powerful tools to
study the ﬁltering equations (such as harmonic analysis
[45, 32, 24, 43]) have been used, notably in the case of bi-
linear systems [44] and estimation of the initial condition
of a Brownian motion in [15]. An extension of Gaussian
distributions has also been developed and recently used
for Bayesian ﬁltering on Lie groups in [46] and also in
[7] which devises a general adaptation of the EKF to Lie
groups, without exploiting the geometrical properties of
the speciﬁc invariant case such as in the present paper. A
somewhat different but related approach to ﬁltering con-
sists of ﬁnding the path that best ﬁts the data in a deter-
ministic setting. It is thus related to optimal control theory
where geometric methods have long played an important
role [9]. A certain class of least squares problems on the
Euclidian group has been tackled in [17]. This approach
has also been used to ﬁlter inertial data recently in [34],
[26].
Although the paper is concerned with a general the-
ory on (matrix) Lie groups, the several derived ﬁlters will
systematically be applied to a benchmark and motivating
ﬁltering example on the special orthogonal group SO(3).
The problem of attitude estimation consists of estimating
the orientation of a rigid body, which proves to be a cen-
tral task in aeronautics, in order to stabilize or guide the
system, or in mobile robotics, where orientation is a part
of the robot’s localization. When the system is equipped
with gyroscopes, the delivered signals can be integrated
over time to compute an orientation increment. However,
the initial orientation may not be known, and even if it is,
the gyroscopes are subject to noise and drift, so that pure
integration may quickly result in a very erroneous atti-
tude. Thus the estimations are regularly checked against
some vectors’ orientation, such as the earth magnetic ﬁeld
measured by magnetometers, or the earth gravity vector
measured by accelerometers under quasi-stationary ﬂight
assumptions. The explosion over the last decade of low-
cost noisy sensors integrated in unmanned aerial vehicles
(UAV’s), and the limited computational power on-board,
have been a driving force for the development of simple
ﬁlters on SO(3). Work on the deterministic problem dates
back to [37, 30] and has received much attention recently,
see [42, 27, 26, 4, 28, 38] to cite a few, and some related
issues have been addressed as well, like angular velocity
estimation [40] or attitude control [37, 39]. The beauty of
the intrinsic problem formulation on SO(3) and the prop-
erties stemming from it have been exploited in various
papers in the deterministic case, see e.g. [26, 21, 5]. The
problem has also been cast into a minimum energy ﬁl-
tering framework in [25]. In the stochastic framework, a
general review of the classical techniques based on Gaus-
sian assumptions is proposed in [27].
A ﬁrst attempt
to systematically exploit the invariance properties to de-
sign stochastic ﬁlters on SO(3) appears in the preliminary
work [1].
The contributions and the organization of the present
paper are as follows. In Section 2, the problem of stochas-
tic ﬁltering on Lie groups with a continuous-time noisy
model and discrete-time observations is rigorously posed.
From a theoretical viewpoint, the choice of discrete-time
measurements allows to circumvent some technical difﬁ-
culties and to cast the problem into a perfectly rigorous
framework. As this work is mainly guided by applica-
tions to aerospace, this choice is relevant as the evolution
equations of the orientation generally rely on the integra-
tion of noisy inertial measurements that are delivered at
a high rate, whereas the observations may be delivered
at low(er) rate (GPS, vision). In order to derive proper
ﬁlters, the problem is then transformed into a complete
discrete-time model. The discretization is not straightfor-
ward because unlike the linear case, difﬁculties arise from
non-commutativity.
Guided by the theory of symmetry-preserving ob-
servers that has been exclusively concerned with the
continuous-time and deterministic case so far, we propose
in Section 3 a class of natural ﬁlters. They appear as a
mere transposition of linear observers (such as the linear
Kalman ﬁlter) to the group, but where the addition has
been replaced with the group multiplication law (due to
non-commutativity, there is always some freedom in the
transposition as left and right multiplications are differ-
ent operations). We prove the proposed ﬁlters retain the
invariances of the problem, and that the discrete-time er-
ror’s evolution is independent of the system’s trajectory,
inheriting the properties of the deterministic continuous-
time case [5] (or merely the linear case). Thus, the gains
can be computed off-line, and the proposed ﬁlters require
very few on-board computational resources. This is one
of their main advantages over current ﬁlters for the atti-
tude estimation problems considered in simulations.
The proposed class is very broad and the tuning issue
is far from trivial. Sections 4 and 5 explore two different
routes, depending on if we are interested in the asymp-
totic or transitory phase. In Section 4, we propose to hold
the gains ﬁxed over time. As a result, the error equation
becomes a homogeneous Markov chain. We ﬁrst prove
some new results about global convergence in a determin-
istic and discrete-time framework. Then, building upon
the homogeneity property of the error, we prove that if
the ﬁlter with noise turned off admits almost global con-
vergence properties, the error with noise turned on con-
verges to a stationary distribution. Mathematically this is
a very strong and to some extent surprising result. From a
practical viewpoint, the gains can be tuned numerically to
minimize the asymptotic error’s dispersion. This allows
to “learn” sensible gains for very general types of noises.
The theory is applied to two examples and gives conver-
gence guarantees in each case. First an attitude estimation
problem using two vector measurements and a gyroscope
having isotropic noise (Section 4.2), then the construc-
tion of an artiﬁcial horizon with optimal gains, for a non-
Gaussian noise model (Section 4.3). Each application is
a contribution in itself and can be implemented without
reading the whole paper.
In Section 5, we propose to optimize the convergence
during the transitory phase using Gaussian approxima-
tions. We ﬁrst introduce a method of engineering inter-
est: the Invariant Extended Kalman Filter (IEKF) in dis-
crete time. The IEKF was introduced in continuous time
in [3] and developped and applied to localization prob-
lems in [6] and very recently in [2]. A discrete time ver-
sion on SO(3) was proposed in the preliminary paper [1].
The idea consists of linearizing the invariant error for the
class of ﬁlters introduced in Section 3 around a ﬁxed point
(the identity element of the matrix group), build a Kalman
ﬁlter for the linear model and implement it on the non-
linear model. As the linearizations always occur around
the same point, the linearized model is time-invariant and
thus the Kalman gains, as well as the Kalman covariance
matrix, are proved to converge to ﬁxed values. When on-
board storage and computational resources are very lim-
ited, this advantageously allows to replace the gain with
its asymptotic value. The IEKF is compared to the well-
known MEKF [22, 12] and UKF [19, 11] on the attitude
estimation problem, and simulations illustrate some con-
vergence properties that the latter lack. In the case where
the error equation is totally autonomous, we introduce
a new method based on off-line simulations, the IEnKF,
which outperforms the other ﬁlters in case of large noises
by capturing very accurately the error’s dispersion.
2
Problem setting
2.1
Considered continuous-time model
Consider a state variable χt taking values in a matrix
Lie group G with neutral element Id, and the following
continuous-time model with discrete measurements:
d
dt χt = (υt +wt)χt + χtωt
(1)
Yn = h(χtn,Vn)
(2)
where υt and ωt are inputs taking their values in the Lie
algebra g (i.e. the tangent space to the identity element of
G), wt is a continuous white Langevin noise with diffu-
sion matrix Σt, whose precise deﬁnition will be discussed
below in Subsection 2.3. (Yn)n⩾0 are the discrete-time ob-
servations, belonging to some measurable space Y and
Vn is a noise taking values in Rp for an integer p > 0. We
further make the following additional assumption:
Assumption 1 (left-right equivariance)
The output
map h is left-right equivariant, i.e. there exists a left action
of G on Y such that we have the equality in law:
∀g,χ ∈G, h(χg,Vn) L= g−1h(χ,Vn)
(3)
and h(χg,0) = g−1h(χ,0)
(4)
The reader who is not familiar with stochastic calcu-
lus and Lie groups can view χt as a rotation matrix, and
replace the latter property with the following more restric-
tive assumption:
Assumption 1 bis
For any n ⩾0 there exists a vector b
such that h(χ,Vn)=χ−1(b+Vn).
The assumptions could seem restrictive but are veriﬁed
in practice in various cases as shown by the following ex-
amples, which will provide the reader with a more con-
crete picture than the formalism of Lie groups.
2.2
Examples
2.2.1
Attitude estimation on ﬂat earth
Our motivating example for the model (1)-(2) is the atti-
tude estimation of a rigid body assuming the earth is ﬂat,
and observing two vectors:
d
dt Rt = Rt(ωt +wω
t )×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
(5)
where as usual Rt ∈SO(3) represents the rotation that
maps the body frame to the earth-ﬁxed frame, and where
ωt ∈R3 is the instantaneous rotation vector, and wω
t ∈R3
is a continuous Gaussian white noise representing the gy-
roscopes’ noise. We have let (x)× ∈R3×3 denote the skew
matrix associated with the cross product with a three di-
mensional vector, i.e., for a,x ∈R3, we have (x)×a =
x × a. (Yn)n≥0 is a sequence of discrete noisy measure-
ments of two vectors b1,b2 of the earth ﬁxed frame ver-
ifying b1 × b2 ̸= 0, and V 1
n and V 2
n are sequences of in-
dependent isotropic Gaussian white noises. Note that the
noise wω
t is deﬁned, as the input ωt, in the body frame (in
other words it is multiplied on the left). Thus equations
(5) do not match (1)-(2) which correspond to a noise de-
ﬁned in the earth-ﬁxed frame. This can be remedied in
the particular case where the gyroscope noise is isotropic,
a restrictive yet relevant assumption in practice.
Deﬁnition 1. A Langevin noise wt of g is said isotropic if
˜wt := gwtg−1 L= wt for any g ∈G.
Note that for a noise taking values in so(3) this deﬁni-
tion corresponds to the physical intuition of an isotropic
noise of R3. With this additional assumption the equation
can be rewritten:
d
dt Rt = ( ˜wω
t )×Rt +Rt(ωt)×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
which corresponds to (1)-(2).
Remark 1. If the gyrometer noise is not isotropic the new
noise ˜wω
t is related to Rt by ˜wω
t = Rtwω
t RT
t . Depending
on the degree of anisotropy this can prevent the use of
methods based on the autonomy of the error (see 4) but
not of the discrete-time IEKF (see 5.1).
2.2.2
Attitude estimation on a round rotating Earth
Another interesting case is attitude estimation on SO(3)
using an observation of the vertical direction (given by an
accelerometer) and taking into account the rotation of the
earth. Whereas the ﬂat earth assumption perfectly suits
low-cost gyroscopes, precise gyroscopes can measure the
complete attitude by taking into account the earth’s ro-
tation. We deﬁne a geographic frame with axis North-
west-up. The attitude Rt is the transition matrix from the
body reference to the geographic reference. The earth in-
stantaneous rotation vector υ ∈R3 and the gravity vector
g ∈R3 are expressed in the geographic reference. Both
are constant in this frame. The gyroscope gives a con-
tinuous rotation speed ωt, disturbed by an isotropic con-
tinuous white noise wω
t . The equation of the considered
system reads:
d
dt Rt = (υ)×Rt +Rt(ωt +wω
t )×,
Yn = RT
tn(g+Vn)
As wω
t is supposed isotropic the equation can be rewritten:
d
dt Rt = (υ + ˜wω
t )×Rt +Rt(ωt)×,
Yn = RT
tn(g+Vn),
which corresponds to (1)-(2).
2.2.3
Attitude and velocity estimation
We give here an example on a larger group. Consider the
attitude Rt ∈SO(3) and speed vt ∈R3 of an aircraft evolv-
ing on a ﬂat earth, equipped with a gyroscope and an ac-
celerometer. The gyroscopes gives continuous increments
ψt ∈R3 with isotropic noise wψ
t ∈R3, and the accelerom-
eters gives continuous increments at ∈R3 with isotropic
noise wa
t ∈R3. The aircraft has noisy speed measure-
ments in the earth reference frame Yn = vtn +Vn, where
Vn is a supposed to be a Gaussian isotropic white noise.
The equations read:
d
dt Rt = Rt(ψt +wψ
t )×,
d
dt vt = Rt(at +wa
t )+g,
where Rt is the rotation mapping the body frame to the
earth-ﬁxed frame, vt is the velocity expressed in the earth
ﬁxed frame, and g is the earth gravity vector, supposed to
be (locally) constant. Using the matrix Lie group SE(3)
we introduce:
At =

RT
t
RT
t vt
0
1

; ωt =

0
g
0
0

υt =

−(ψt)×
at
0
0

; wt =

(wψ
t )×
wa
t
0
0

The problem can then be rewritten under the form (1)-(2):
d
dt At = (υt +wt)At +Atωt,
Ytn = A−1
tn

Vn
1

,
2.3
Interpretation of Langevin noises on Lie
Groups
This present subsection provides some mathematical con-
siderations about white noises on Lie groups. It can be
skipped by the uninterested reader who is directly referred
to the model (6).
Equation (1) is actually a Langevin equation that suf-
fers from poly-interpretability because of its non-linearity,
and its meaning must be clariﬁed in the rigorous frame-
work of stochastic calculus. Stratonovich stochastic dif-
ferential equations on a Lie group can be intrinsically de-
ﬁned as in e.g. [23]. A somewhat simpler (but equiva-
lent) approach consists of using the natural embedding of
the matrix Lie group G in a matrix space and to under-
stand the equation in the sense of Stratonovich, see e.g.
[35]. The mathematical reasons stem from the fact that
the resulting stochastic process is well-deﬁned on the Lie
group, whereas this is not the case when opting for an Ito
interpretation as underlined by the following easily prov-
able proposition.
Proposition 1. If the stochastic differential equation (1)
is taken in the sense of Itˆo, for G = SO(3) embedded in
R3×3, an application of the Itˆo formula to χT
t χt shows
that the solution almost surely leaves the submanifold G
for any time t > 0.
Besides, the physical reasons for this stem from the fact
that the sensors’ noise are never completely white, and
for colored noise the Stratonovich interpretation provides
a better approximation to the true solution than Ito’s, as
advocated by the result of Wong and Zakai [47].
2.4
Exact discretization of the considered
model
To treat rigorously the problem of integrating discrete
measurements we need to discretize the continuous model
with the same time step as the measurements’. Unlike the
general case of non-linear estimation, the exact discrete-
time dynamics corresponding to Equation (1) can be ob-
tained, as proved by the following result:
Theorem 1. Let χn = χtn.
Then the discrete system
(χn,Yn) satisﬁes the following equations:
χn+1 = ϒnWnχnΩn,
Yn = h(χn,Vn),
(6)
where Wn is a random variable with values in G and
whose law depends on the values taken by υt for t ∈
[tn,tn+1] and on the law of wt for t ∈[tn,tn+1], and ϒn and
Ωn are elements of G which only depend on the values
taken respectively by υt and ωt for t ∈[tn,tn+1].
Proof. For n ∈N consider the value of the process χt is
known until time tn. Let ϒt and Ωt be the solutions of the
following equations:
ϒtn = Id, d
dt ϒt = υtϒt
and Ωtn = Id, d
dt Ωt = Ωtωt
Let Wt be the solution of the following (Stratonovich)
stochastic differential equation:
Wtn = Id,
d
dtWt = ϒ−1
t
wtϒtWt
Note that ϒ−1
t
wtϒt being in g, Wt is ensured to stay in
G. Deﬁne the process χt|tn = ϒtWtχtnΩt. We will show
that for t > tn the processes χt and χt|tn verify the same
stochastic differential equation. Indeed:
d
dt χt|tn = ( d
dt ϒt)WtχtΩt +ϒt( d
dtWt)χtΩt +ϒtWtχt( d
dt Ω)
= υtϒtWtχtΩt +wtϒtWtχtΩt +ϒtWtχtΩtωt
= (υt +wt)χt|tn + χt|tnωt
Thus the two processes have the same law at time tn+1,
i.e. χn+1 and Wtn+1ϒtn+1χtnΩtn+1 have the same law. Let-
ting ϒn = ϒtn+1, Ωn = Ωtn+1 and Wn = Wtn+1 we obtain the
result.
Remark 2. In many practical situations (for instance ex-
amples 2.2.1 and 2.2.2), the Langevin noise wt is isotropic
and we have thus ϒ−1
t
wtϒt
L= wt. Note that, the variable
Wt depends also only on the law of wt for t ∈[tn,tn+1].
In the sequel, for mathematical reasons (the equa-
tions do not suffer from poly-interpretability), tutorial rea-
sons (the framework of diffusion processes on Lie groups
needs not be known), and practical reasons (any ﬁlter
must be implemented in discrete time), we will system-
atically consider the discrete-time model (6). Moreover,
the noise Wn will be a general random variable in G, not
necessary a solution of the stochastic differential equation
d
dtWt = ϒ−1
t
wtϒtWt.
3
A class of discrete-time intrinsic
ﬁlters
3.1
Preliminary: Kalman ﬁlter for linear
stationary systems with drift
Consider the particular case of a linear system in RN (with
arbitrary N) of the form d
dt Xt = bt +wt, wt being a white
noise and bt a known deterministic input. Assume we
have discrete measurements of the form Yn = HXtn +Vn ∈
Rp, Vn being a Gaussian noise in Rp. A straightforward
discretization yields:
Xn+1 = Xn +Bn +Wn,
Yn = HXn +Vn,
(7)
With Bn =
R tn+1
tn
bsds and Wn =
R tn+1
tn
wsds is the value of a
Brownian motion in RN. The optimal ﬁlter in this case is
the celebrated linear Kalman ﬁlter whose equations are:
ˆX′
n+1 = ˆXn +Bn,
ˆXn+1 = ˆX′
n+1 +Kn+1(Yn+1 −H ˆX′
n+1)
(8)
The ﬁrst equation is the so-called prediction step, and the
second one the correction step.
Let e′
n = ˆX′
n −Xn and
en = ˆXn −Xn denote the predicted and corrected state es-
timation errors, we get:
e′
n+1 = en −Wn,
en+1 = e′
n+1 −Kn+1(He′
n+1 −Vn+1)
(9)
The error equation is autonomous and can be optimized
independently from the dynamical inputs Bn’s. As a con-
sequence, the optimal gains Kn can be computed off-line.
It is a well-known result that they (as well as the error
distribution) converge under observability and detectabil-
ity assumptions, so all computations can be done off-line
and only the ﬁrst values need be stored, in which case
only the equations (8) must be computed on-line, requir-
ing extremely restricted computational power on board.
Mimicking the form of equations (8) on the Lie group G
where the addition is naturally replaced with the group
multiplication law, leads to a similar result, as shown in
the next section.
3.2
Proposed intrinsic ﬁlters
Inspired by the linear case and the theory of continuous-
time symmetry-preserving observers on Lie groups [5] we
propose to mimic the prediction and update steps (8) re-
placing the addition with the group multiplication yield-
ing the class of ﬁlters deﬁned by:
ˆχ′
n+1 = ϒn ˆχnΩn,
(10)
ˆχn+1 = Kn+1( ˆχ′n+1Yn+1) ˆχ′n+1,
(11)
where Kn+1(•) can be any function of Y →G, ensuring
K(h(Id,0)) = Id . There are two ways to understand the
links between (8) and (10)-(11). First, by deﬁning a an
estimation error on the group G by χn+1 ˆχ−1
n+1 (see below),
and by using the left-right equivariance hypothesis, which
allows to interpret ˆχ′n+1Yn+1 = h(χn+1 ˆχ′−1
n+1,Vn+1) as a
group equivalent to H(X′
n+1 −ˆXn+1)+Vn+1. Secondly, by
viewing the linear case of Section 3.1 as a speciﬁc case of
the proposed approach through the following proposition
whose proof has been moved to the Appendix.
Proposition 2. There exists an isomorphic representation
of RN as a matrix Lie group such that (7) takes the canoni-
cal form (1)-(2), and (8) becomes the invariant ﬁlter (10)-
(11).
We now deﬁne the invariant output errors, which are
a transposition of the linear error to our multiplicative
group:
ηn = χn ˆχ−1
n ,
η′
n = χn ˆχ′−1
n
(12)
We have the following striking property, that is similar to
the linear case:
Theorem 2. The error variables ηn and η′
n are Markov
processes, and are independent of the inputs Ωn’s.
Proof. The equations followed by η′
n and ηn read:
η′
n+1 = χn+1 ˆχ′−1
n+1 = ϒnWnχnΩnΩ−1
n
ˆχ−1
n ϒ−1
n = ϒnWnηnϒ−1
n
(13)
and:
ηn+1 = χn+1 ˆχ′−1
n+1Kn+1( ˆχ′n+1Yn+1)−1
= η′
n+1Kn+1( ˆχ′n+1h(χn+1,Vn+1))−1
L= η′
n+1Kn+1(h(η′
n+1,Vn+1))−1,
(14)
thanks to the equivariance property of the output.
The most important consequence of this property is that
if the inputs υt are known in advance, or are ﬁxed, as it
is the case in Examples 2.2.1 and 2.2.2, the gain func-
tions Kn can be optimized off-line, independently of the
trajectory followed by the system. In any case, numerous
choices are possible to tune the gains Kn and the remain-
ing sections are all devoted to various types of methods to
tackle this problem.
4
Fixed gains ﬁlters
In certain cases, one can build an (almost) globally con-
vergent observer for the associated deterministic system,
i.e., with noise turned off, by using a family of constant
gain function Kn(•) ≡K(•). If the ﬁlter with noise turned
off has the desirable property of forgetting its initial con-
dition, convergence to a single point is impossible to re-
tain, because of the unpredictability of the noises that
“steer” the system, but convergence of the distribution can
be expected, assuming as in the linear case that ϒn is inde-
pendent of n. Indeed in this section we prove that, when
noise is turned on, the error forgets its initial distribution
under mild conditions. The results are illustrated by an
attitude estimation example for which we propose an in-
trinsic ﬁlter having strong convergence properties.
It should be noted that in practice, the convergence to
an invariant distribution allows in turn to pick the most
desirable gain K(•) among the family based on a per-
formance criterion, such as convergence speed, or ﬁlter’s
precision (that is, ensuring low error’s dispersion). This
fact will be illustrated by an artiﬁcial horizon example of
Subsection 4.3.
4.1
Convergence results
Here the left-hand inputs ϒn are assumed ﬁxed. This, to-
gether with constant gains Kn, makes the error sequence
ηn a homogeneous Markov chain. Thus, under appropri-
ate technical conditions, the chain has a unique stationary
distribution and the sequence converges in distribution to
this invariant distribution. Let d denote a right-invariant
distance on the group G. We propose the following as-
sumptions:
1. Conﬁnement of the error: there exists a compact set
C such that ∀n ∈N,ηn ∈C a.s. for any η0 ∈C
2. Diffusivity: the process noise has a continuous part
with respect to Haar measure, with density positive
and uniformly bounded from zero in a ball of radius
α > 0 around Id.
3. Reasonable
output
noise:
∀g
∈
G,
P[gK(h(g,Vn))−1 ∈Bo(gK(h(g,0))−1, α
2 )] > ε′
for some ε′ > 0.
The second assumption implies, and can in fact be re-
placed with, the more general technical assumption that
there exists ε > 0 such that for any subset U of the ball
Bo(Id,α) we have P(Wn ⊂U) > εµ(U) for all n ≥0
where µ denotes the Haar measure. Those noise proper-
ties are relatively painless to verify, whereas the conﬁne-
ment property although stronger is automatically veriﬁed
whenever G is compact, e.g. G = SO(3). Intuitively, the
last two assumptions guarantee the error process is well
approximated by its dynamics with noise turned off, fol-
lowed by a small diffusion. In the theory of Harris chains,
the latter diffusion step is a key element to allow proba-
bility laws to mix at each step and eventually forget their
initial distribution.
Theorem 3. For constant left-hand inputs ϒn ≡ϒ, con-
sider the ﬁlter:
ˆχ′
n+1 = ϒ ˆχnΩn
(15)
ˆχn+1 = K( ˆχ′n+1Yn+1) ˆχ′n+1
(16)
Suppose that Assumptions 1)-2)-3) are veriﬁed, where the
compact set satisﬁes C = cl(Co), cl denoting the closure
and o the interior. When noises are turned off, the error
equation (13)-(14) becomes:
γ′
n+1 = ϒγnϒ−1,
γn+1=γ′
n+1Kn+1(h(γ′
n+1,0))−1,
(17)
Suppose that for any γ0 ∈C, except on a set of null Haar
measure, γn converges to Id. Then there exists a unique
stationary distribution π on G such that for any prior law
µ0 of the error η0 supported by C, the law (µn)n≥0 of
(ηn)n≥0 satisﬁes the total variation (T.V.) norm conver-
gence property:
lim
n
∥µn −π∥T.V. →0
Corollary 1. When the group G is compact, The conver-
gence results of Theorem 3 hold globally, i.e. without the
conﬁnement assumption 1).
Theorem 4. Under the assumptions of Theorem 3, as-
suming only h(γn,0) →0 instead of almost global con-
vergence of (γn)n≥0, that G is compact, the set K = {g ∈
G,h(g,0) = h(Id,0)} connected and h(ϒ,0) = h(Id,0), the
results of Theorem 3 are still valid. Moreover, if Wn is
isotropic, we have π(˜ϒV) = π(V) for any ˜ϒ ∈K commut-
ing with ϒ (˜ϒϒ = ϒ˜ϒ).
The proofs of the results above have all been moved to
the Appendix.
4.2
Application to attitude estimation
Consider the attitude estimation example of Subsection
2.2.1. In a deterministic and continuous time setting, al-
most globally converging observers have been proposed
in several papers (see e.g. [37, 26]) and have since been
analyzed and extended in a number of papers. In order
to apply the previously developed theory to this example,
the challenge is twofold. First, the deterministic observer
must be adapted to the discrete time and proved to be al-
most globally convergent. Then, the corresponding ﬁlter
must be proved to satisfy the assumptions of the theorems
above in the presence of noise. In discrete time, the sys-
tem equations read:
Rn+1 =WnRnΩn,
Yn =(RT
n b1 +V 1
n ,RT
n b2 +V 2
n )
with the notations introduced in 2.2.1. We propose the
following ﬁlter on SO(3):
ˆR′
n+1 = ˆRnΩn,
ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1,
with K(y1,y2) = exp(k1(y1 ×b1)+k2(y2 ×b2)),
k1 > 0, k2 > 0, k1 +k2 ⩽1
(18)
Proposition 3. With noise turned off, the discrete invari-
ant observer (18) is almost globally convergent, that is,
the error converges to Id for any initial condition except
one.
The proof has been moved to the Appendix, only the
main idea is given here.
For the continuous-time de-
terministic problem it is known, and easily seen, that
E : γ →k1||γTb1 −b1||2 +k2||γTb2 −b2||2 is a Lyapunov
function, allowing to prove almost global convergence of
the corresponding observer. In the discrete time determin-
istic case, the function above remains a Lyapunov func-
tion for the sequence (γn)n≥0, allowing to derive Propo-
sition 3. This is not trivial to prove, and stems from the
more general following novel result:
Proposition 4. Consider a Lie group G equipped with a
left-invariant metric ⟨.,.⟩, and a left-invariant determinis-
tic discrete equation on G of the form:
γn+1 = γn exp(−k(γn))
Assume there exists a C2 function E : G →R≥0 with
bounded sublevel sets, a global minimum at Id, and a con-
tinuous and strictly positive function u : G →R>0 such
that: ∀x ∈G,k(x) = u(x)[x−1.gradE(x)]. If the condition
∀x ∈G,| ∂k
∂x| ⩽1 (for the operator norm) is veriﬁed, for
any initial value γ0 such that Id is the only critical point
of E in the sublevel set {x ∈G | E(x) ≤E(γ0)} we have:
γn →
n→∞Id
The proof has been moved to the Appendix. Note that,
the latter property is closely related to Lemma 2 of [41].
Using Theorem 3 and Proposition 3 we ﬁnally directly
get:
Theorem 5. The distribution of the error variable of the
invariant ﬁlter (18) converges for the T.V. norm to an
asymptotic distribution, which does not depend on its ini-
tial distribution.
4.3
Learning robust gains: application to
the design of an artiﬁcial horizon
As proved in 4.1, under appropriate conditions the error
variable is a converging Markov chain whose asymptotic
law depends on the gain function but not on the trajectory
followed by the system (which is a major difference with
most nonlinear ﬁlters, such as the EKF). Hence a ﬁxed
gain can be asymptotically optimized off-line, leading to
a very low numerical cost of the on-line update.
A classical aerospace problem is the design of an artiﬁ-
cial horizon using an inertial measurement unit (IMU). An
estimation of the vertical is maintained using the observa-
tions of the accelerometer (which senses the body accel-
eration minus the gravity vector, expressed in the body
frame) and the stationary ﬂight assumption according to
which the body’s linear velocity is constant. The prob-
lem is that this approximation is not valid in dynamical
phases (take-off, landing, atmospheric turbulence), which
are precisely when the artiﬁcial horizon is most needed.
The problem is generally stated as follows:
d
dt Rt = Rt(ωt +wt)×,
Yn = RT
tng+Vn +Nn,
where Rt is the attitude of the aircraft (the rotation from
body-frame coordinates to inertial coordinates), ωt is the
continuous-time gyroscope increment and Yn is the obser-
vation of the accelerometer. The sensor noises wt and Vn
can be considered as Gaussian, and Nn represents ﬂuctu-
ations due to accelerations of the aircraft that we propose
to model as follows: Nn is null with high probability but
when non-zero it can take large values. The Nn’s are as-
sumed to be independent as usually.
4.3.1
Convergence results
Consider the following class of ﬁlters:
ˆR′
n+1 = ˆRnΩn,
ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1,
with K(y) = exp( fk,λ(y))
(19)
where fk,λ(x) = k.min(angle(x,g),λ) x×g
||x×g|| if x × g ̸= 0
and fk,λ(x) = 0 otherwise.
The rationale for the gain tuning is as follows: if the
accelerometer measures a value y, we consider the small-
est rotation giving to g the same direction as y.
Con-
serving the same axis, the angle of this rotation is thresh-
olded (hence the parameter λ) to give less weight to out-
liers (without purely rejecting them, otherwise the ﬁlter
couldn’t converge when initialized too far).
Then we
choose as a gain function a rotation by a fraction of the
obtained angle. We begin with the following preliminary
result:
Lemma 1. For any 0 < k ⩽1 and 0 < λ ⩽π the out-
put error ∥Yn −ˆRT
n g∥associated to the observer (19) with
noise turned off converges to 0.
Proof. Let us consider the error evolution when the noise
is turned off. It writes γn+1 = γn exp(−fk,λ(γ−1
n g)), thus
we have γ−1
n+1g = exp( fk,λ(γ−1
n g))γ−1
n g. As for any n ∈N,
fk,λ(γ−1
n g) is orthogonal to g and γ−1
n g, γ−1
n+1g stays in
the plane spanned by g and γ−1
0 g, as well as the whole
sequence (γ−1
n g)n⩾0. Let φn = angle(γ−1
n g,g). The dy-
namics of (φn) writes: φn+1 = φn −k.min(λ,φn). Thus φn
goes to 0, i.e: γ−1
n g →
n→∞g, i.e. the observation error goes
to 0.
The following result is a mere consequence of Lemma
1 and Theorem 4.
Proposition 5. The error variable associated to the ﬁlter
deﬁned by (19) converges to a stationary distribution for
the T.V. norm, which does not depend on its initial distri-
bution.
4.3.2
Numerical asymptotic gain optimization
To each couple (k,λ) we can associate an asymptotic er-
ror dispersion (computed in the Lie Algebra) associated
to the corresponding stationary distribution, and try to
minimize it.
As all computations are to be done off-
line, the statistics of all distributions can be computed
using particle methods.
Table 1 gives the parameters
of the model used in the following numerical experi-
ment.
Figure 1 displays the Root Mean Square Error
RMSE =
p
E(η∞g−g), computed over a grid for the pa-
rameters (k,λ). The minimum is obtained for k = 0.1202
and λ = 0.0029. If we compare it to a MEKF, we observe
a huge difference. For our asymptotic invariant ﬁlter we
get RMSE = 8.02 × 10−4. For the MEKF, the observa-
tion noise matrix giving the best results leads to the value
RMSE = 4.3 × 10−3. This result is not surprising due to
the fact that the outliers signiﬁcantly pollute the estimates
of the Kalman ﬁlter (see Fig. 1). This illustrates the fact
that when the noise is highly non-Gaussian, an asymptotic
gain with some optimality properties can still be found.
5
Gaussian ﬁlters
The present section focuses more on the transitory phase,
and the gain is computed at each step based on Gaussian
noise approximations and linearizations. Beyond the lo-
cal optimization underlying the gain computation, the two
following ﬁlters have the merit to readily convey an ap-
proximation of the extent of uncertainty carried by the
estimations. We ﬁrst introduce the discrete-time invari-
ant extended Kalman ﬁlter (IEKF), which is an EKF, but
based on the linearization of an invariant state error. Then
we introduce the IEnKF, which computes the empirical
covariance matrix of the error using particles. As all com-
putations can be done off-line, it is easily implementable,
as long as the gains can be stored, and is shown to con-
vey a better indication of the error’s dispersion for large
noises. Simulations illustrate the results.
5.1
The discrete-time Invariant EKF
In the present section we introduce the discrete-time
IEKF. Indeed, as in the standard EKF theory, the idea is to
linearize the error equation (13)-(14) assuming the noises
Figure 1:
Artiﬁcial horizon example. Top plot: off-line
optimization of the parameters of the asymptotic invari-
ant ﬁlter (axis x →λ ; axis y →k ; axis z →Root Mean
Square Error). The highest values are not displayed to
improve visualization. The performance is optimal for
k = 0.1202,λ = 0.0029.
Bottom plot: error evolution
of three artiﬁcial horizons : the invariant ﬁlter with op-
timized gain functions (black), a MEKF ignoring outliers
(red), and a MEKF where the observation noise covari-
ance matrix has been numerically adjusted to minimize
the RMSE (dashed line). We see that the MEKF cannot
ﬁlter the outliers and at the same time be efﬁcient in the
absence of outliers, contrarily to the proposed invariant
ﬁlter.
and the state error are small enough, use Kalman equa-
tions to tune the gains on this linear system, and imple-
ment the gain on the nonlinear model. As in the heuristic
theory of the Invariant EKF in continuous time [3] [6] the
gains and the error covariance matrix are proved to con-
verge to a ﬁxed value. In particular this allows (after some
time) to advantageously replace the gain by its constant ﬁ-
nal value leading to a numerically very cheap asymptotic
version of the IEKF.
Algorithm 1 Invariant Extended Kalman Filter
Returns at each time step ˆχn, Pn such that χn ≈
exp(ξn) ˆχn, where ξn is Gaussian and var(ξn) = Pn. ˆχ0
and P0 are inputs.
Hξ,HV deﬁned by:
h(exp(ξ ′
n),Vn) = h(Id,0)+Hξξ ′
n+HVVn+O(∥ξ ′
n∥2)+O(∥Vn∥2)
Set n = 0.
loop
Compute the value Mtn+1 solving the equation
Mtn = 0,
d
dt Mt = Var(wt)+adυtMtadT
υt
The process noise covariance Qw
n = Var(Adϒnwn) is
equal to Mtn+1
 if wn is isotropic or if υt ≡0, merely
set Qw
n = Var(wn)(tn+1 −tn)

,
QV
n+1 = Var(Vn+1),
Pn+1|n = AdϒnPnAdT
ϒn +Qw
n ,
Sn+1 = HVQV
n+1HT
V +HξPn+1|nHT
ξ ,
Ln+1 = Pn+1|nHT
ξ S−1
n+1,
Pn+1 = (I −Ln+1Hξ)Pn+1|n,
ˆχ′
n+1 = ϒn ˆχnΩn,
ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1
n ←n+1
end loop
5.1.1
Linearization of the equations and IEKF for-
mulas
We consider here the error equations (13)-(14). Assuming
errors and noises are small, we introduce their projection
in the tangent space at the identity Id (the so-called Lie
algebra g of the group) using the matrix exponential map
exp(.) (see e.g. [31]). As the matrix vector space g can be
identiﬁed with Rdimg using the linear mapping (.)m (see
Table 2) we can assume exp : Rdimg →G. This function is
deﬁned in Table 2 for usual Lie groups of mechanics. Its
inverse function will be denoted by exp−1. We thus deﬁne
the following quantities of Rdimg:
ξn = exp−1(ηn),
ξ ′
n = exp−1(η′
n),
wn = exp−1(Wn)
The simplest way to design the gain is to use a function
which is linear in Rdimg and then mapped to G through
the exponential:
Kn : y →exp[Ln(y−h(Id,0))]
Equations (13) and (14) mapped to Rdimg become:
ξ ′
n+1 = exp−1(exp(Adϒnwn)exp(Adϒnξn)),
ξn+1 = exp−1(exp(ξ ′
n+1)exp[−Ln(h(exp(ξ ′
n+1,Vn+1))−h(Id,0))]),
where we have introduced the adjoint matrix Ad:
Deﬁnition 2. The following property of the adjoint matrix
Ad can be used as a deﬁnition:
∀g ∈G,∀u ∈Rdimg, exp(Adgu) = gexp(u)g−1
The differential of ξ →Adexp(ξ) at zero is denoted ξ →
adξ.
To tune the gains Ln through the Kalman theory we
need to evaluate at each time step, using Pn = Var(ξn) ∈
Rdimg×dimg, the following quantities:
Pn+1|n = Var(ξ ′
n+1),
Sn+1 = Var(Yn+1)
Our approach consists of using a coarse ﬁrst-order de-
velopment of the propagation and observation functions.
To do so we use the Baker-Campbell-Hausdorff formula
and keep only ﬁrst order terms in ξ ′
n+1, wn and Vn+1, the
crossed terms being also neglected:
ξ ′
n+1 = Adϒnwn +Adϒnξn
h(exp(ξ ′
n+1),Vn+1) = h(Id,0)+Hξξ ′
n+1 +HVVn+1
ξn+1 = ξ ′
n+1 −Ln+1(Hξξ ′
n+1 +HVVn+1)
The computation of the gains is then straightforward using
the Kalman theory in Rdimg, which is a vector space. It is
described in Algorithm 1.
5.1.2
Convergence of the gains
The main beneﬁt of the ﬁlter is with respect to its conver-
gence properties. Indeed, under very mild conditions, the
covariance matrix Pn and the ﬁlter’s gain Kn are proved to
converge to ﬁxed values [20]. The practical consequences
are at least twofold: 1-the error covariance converges to
a ﬁxed value and is thus much easier to interpret by the
user than a matrix whose entries keep on changing (see
Fig. 2) 2-due to computational limitations on-board, the
covariance may be approximated by its asymptotic value,
leading to an asymptotic version of the IEKF being nu-
merically very cheap.
Theorem 6. If the noise matrices Qw
n = Var(wn), QV
n =
Var(Vn) and the left inputs ϒn are ﬁxed, the pair
(Adϒn,Hξ) is observable and Qw
n has full rank, then Pn
and Kn converge to ﬁxed values P∞and K∞, as in the lin-
ear time-invariant case.
5.2
The discrete-time Invariant EnKF
When the error equation is independent of the inputs, the
exact density of the error variable can be sampled off-
line. This allows to radically improve the precision of
the quantities involved in the Kalman gains computation
of Section 5.1.1. We propose here the Invariant Ensemble
Kalman ﬁlter (IEnKF) described in Algorithm 2. The idea
is to compute recursively through Monte-Carlo simula-
tions a sampling of the error density and to use it, instead
of linearizations, to evaluate precisely the innovation and
error covariance matrices used to compute the gains in the
Kalman ﬁlter. The procedure is described in Algorithm 2.
5.3
Simulations
We will display here the results of Algorithms 1 and 2
for the attitude estimation problem described in Section
2.2.1:
d
dt Rt = Rt(ωt +wω
t )×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
where Rt ∈SO(3) represents the rotation that maps the
body frame to the earth-ﬁxed frame, and ωt is the instan-
taneous angular rotation vector which is assumed to be
measured by noisy gyroscopes. (Yn)n≥0 is a sequence of
Algorithm 2 IEnKF
Deﬁne H by h(exp(ξ),0) = h(Id,0)+Hξ +O(∥ξ∥2)
Sample M particles (ηi
0)1⩽i⩽M following the prior error
density
for n = 0 to N −1 do
for i = 1 to M do
η′i
n+1 = W i
nϒηi
nϒ−1,
yi
n+1 = h(η′i
n+1,V i
n+1),
end for
Pn+1|n = 1
M ∑M
i=1 exp−1(η′i
n+1)exp−1(η′i
n+1)T
Sn+1 = 1
M ∑M
i=1 yi
n+1yi
n+1
T,
Ln+1 = Pn+1|nHTS−1
n+1
for i = 1 to M do
ηi
n+1 = η′i
n+1 exp(−Ln+1[yi
n+1 −h(Id,0)])
end for
Store the gain matrix Ln+1
end for
The prior estimation ˆχ0 is an input.
for n = 0 to N-1 do
ˆχ′
n+1 = ϒ ˆχnΩn,
ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1
end for
discrete noisy measurements of the vectors b1 and b2 (for
instance the gravity and the magnetic ﬁeld), V 1
n and V 2
n are
sequences of independent isotropic white noises. A sim-
ulation has been performed using the parameters given in
Table 3. In this case the IEKF equations are described by
Algorithm 3. As the state-of-the-art methods for this par-
ticular problem are the Multiplicative Extended Kalman
Filter (MEKF, see e.g. [12]) and the USQUE ﬁlter [11]
(a quaternion implementation of the Unscented Kalman
Filter), the four methods are compared. The evolution of
the gains is displayed on Fig. 2 and shows the interest of
the invariant approach. The error variable is expressed in
the Lie algebra and its projection to the ﬁrst axis is given
on Fig. 3 for each method (the projections on other axes
being very similar, they are not displayed due to space
limitations). All perform satisfactorily, but some differ-
ences are worthy of note. First we can see that the MEKF
and the IEKF yield comparable performances but only the
gains of the IEKF converge. Moreover, the linearizations
lead the MEKF and the IEKF to fail capturing accurately
the error dispersion through a 3σ-envelope. To this re-
spect, the USQUE performs better than the two latter ﬁl-
ters but still does not succeed in capturing the uncertainty
very accurately. On the other hand, the envelope provided
by the EnKF is very satisfying. This result is not surpris-
ing: this envelope has been computed using a sampling of
the true density of the error, there is thus no reason why
it should not be a valid approximation as long as there are
sufﬁciently many particles. This method is thus to be pre-
ferred to the other ones when the user is willing and able
to perform extensive simulations, and has the capacity to
store the gains over the whole trajectory.
Algorithm 3 Invariant Extended Kalman Filter on SO(3)
Returns at each time step
ˆRn, Pn such that Rn ≈
exp(ξn) ˆRn, where ξn is a centered Gaussian and
var(ξn) = Pn.
ˆR0 and P0 are inputs.
Hξ =

(b1)×
(b2)×

, HV = I6,
QV
=

Var(V 1)
0
0
Var(V 2)

,
Qw
=
R tn+1
tn
Var(wω
s )ds.
Set n = 0.
loop
Let Ωn be the solution at tn+1 of Ttn = I3,
d
dt Tt =
(ωt)×.
Pn+1|n = Pn +Qw,
Sn+1 = HVQVHT
V +HξPn+1|nHT
ξ ,
Ln+1 = Pn+1|nHT
ξ S−1
n+1,
Pn+1 = (I −Ln+1Hξ)Pn+1|n,
ˆR′
n+1 = ˆRnΩn
∆χ = Ln+1[
 ˆR′n+1Y 1
n+1
ˆR′n+1Y 2
n+1

−

b1
b2

],
ˆRn+1 = exp(∆χ) ˆR′n+1 (see Table 2)
n ←n+1
end loop
6
Conclusion
This paper has introduced a proper stochastic framework
for a class of ﬁltering problems on Lie groups. A wide
class of discrete-time ﬁlters with strong properties has
Figure 2:
Evolution over time of the coefﬁcients of the
gain matrix for MEKF, IEKF, USQUE and IEnKF. The
coefﬁcients of the MEKF and USQUE have an erratic
evolution, whereas those of the IEKF are convergent, al-
lowing to save much computation power using asymptotic
values. Moreover, in this case, only 4 of the 18 coefﬁ-
cients are not equal to zero for the IEKF, allowing sparse
implementation.
been proposed, along with several methods to tune the
gains. When the gain is held ﬁxed, the estimation error
has the remarkable property to provably converge in dis-
tribution. Another route is to focus more on the transi-
tory phase, and to seek to minimize the dispersion of the
estimation error at each step based on linearizations and
Gaussian noise approximations. This has lead to an in-
variant version of the celebrated EKF, having the property
that the error covariance matrix converges asymptotically,
as well as to an invariant version of the EnKF which has
been shown to convey an accurate indication of the level
of uncertainty carried by the estimate.
The results of the paper open the door to a rich vari-
ety of ﬁlters on Lie groups. Indeed, all the classical gain
tuning approaches can still be considered, and a method
based on the unscented transform for instance, complet-
ing the work [11] seems an interesting idea. In a situation
where many observations are available at each time step,
an information ﬁlter could also be derived mimicking the
IEKF. Besides, we believe that the cases where the gains
of the IEKF are sparse should be given special attention,
Figure 3:
Evolution of the error variable, projected on
the ﬁrst axis, for 1000 simulations using MEKF, IEKF,
USQUE and IEnKF. The MEKF and the IEKF give sim-
ilar results. The USQUE is doing slightly better but is
outperformed by the IEnKF. Only the latter captures the
error dispersion satisfactorily, as the 3σ envelope contains
99 % of the simulated trajectories.
by studying when this property occurs and if signiﬁcant
computation power and memory can be saved. The possi-
bilities offered by off-line simulations deserve also to be
deepened. The computation of exact conﬁdence sets can
be explored, establishing a link with the approach of set-
valued observers proposed in [8] and [38]. Concerning
the asymptotic method introduced here, the optimization
has been performed quite naively as efﬁcient density sim-
ulation is beyond the scope of the paper, but there may
certainly be some room for improvements using Markov
Chain Monte-Carlo (MCMC) schemes to speed up con-
vergence. In the future we will concentrate on proving
the convergence in distribution of the IEKF, that can be
conjectured at this stage, but does not seem easy.
A
Proof of Proposition 2
For any i ∈N, let Mi(R) denote the set of square matri-
ces of size i. Consider the isomorphisms φX : X ∈RN →
IdN
0N,1
XT
1

∈MN+1(R) and φY : Y ∈Rp →

HT
Y T

∈
MN+1,p. Letting χn = φX(Xn), Y ∗
n = φY(Yn), V ∗
n = φY(Vn),
W ∗
n = φX(Wn), Ωn = φX(Bn) and ϒn = IdN+1 equations (7)
become χn+1 = ϒnW ∗
n χnΩn and Y ∗
n = χ−1
n V ∗
n . Using the
inverse isomorphisms φ −1
X
and φ −1
Y , the invariant update
ˆχn+1 = Kn( ˆχ′
n+1Y ∗
n+1) ˆχn+1 corresponds in Rn to (7).
B
Proofs of the results of Section 4.1
We ﬁrst introduce here the basic notions about Harris
Chains that we will need, to prove the stochastic conver-
gence properties of the invariant ﬁlters.
Deﬁnition 3. (First hitting time) Let S be a measurable
space and (Xn)n⩾0 a Markov chain taking values in S. The
ﬁrst hitting time τC of a subset C ⊂S is deﬁned by τC =
infn⩾0 {Xn ∈C}.
Deﬁnition 4. (Recurrent aperiodic Harris chain) Let S be
a measurable space and (Xn)n⩾0 a Markov chain taking
values in S. (Xn)n⩾0 is said to be a recurrent aperiodic
Harris chain if there exist two sets A,B ⊂S satisfying the
following properties:
i For any initial state X0 the ﬁrst hitting time of A is a.s.
ﬁnite.
ii There exists a probability measure ρ on B, and ε > 0
such that if x ∈A and D ⊂B then P(X1 ∈D|X0 = x) >
ερ(D).
iii There exists an integer N ⩾0 such that: ∀x ∈A,∀n ⩾
N,P(Xn ∈A|X0 = x) > 0.
The somewhat technical property ii means that any
given area of B can be reached from each point of A with
non-vanishing probability.
Theorem 7. [Harris, 1956] A recurrent aperiodic Harris
chain admits a unique stationary distribution ρ∞and the
density of the state Xn converges to ρ∞in T.V. norm for
any distribution of X0.
Proof. See [16], Theorems 6.5 and 6.8
In other words, if for any initialization we can ensure
that the process will come back with probability 1 to a
central area A where mixing occurs, we have a conver-
gence property. The following technical result is a cor-
nerstone to demonstrate the theorems of Subsection 4.1.
Lemma 2. Let G be a locally compact Lie group pro-
vided with a right-invariant distance d, and C ⊂G be a
measurable compact set endowed with the σ-algebra Σ.
Consider a homogeneous Markov chain (ηn)n⩾0 deﬁned
by the relation
η0 ∈C,
ηn+1 = qz(ηn)
where z ∈Z is some random variable belonging to a
measurable space. Let 0 denote a speciﬁc point of Z . Let
Q : (C,Σ) →R+, denote the transition kernel of the chain,
that is Q(x,V) = P(qz(ηn) ∈V | ηn = x). We assume that
(a) there exist real numbers α,ε > 0 such that for any x ∈
C and V ⊂Bo(q0(x),α), the latter denoting the open
ball center q0(x) and radius α, we have Q(x,V) =
P(qz(ηn) ∈V | ηn = x) ⩾εµG(V) where µG is the
right-invariant Haar measure.
(b) q0 admits a ﬁxed point x0 ∈C, i.e. q0(x0) = x0.
Let U0 = {x0}. Deﬁne the sets (Un)n⩾0 recursively by
U′
n = {x ∈G,d(x,Un) < α
2 } ∩C and Un+1 = q−1
0 (U′
n). If
there exists an integer N ⩾0 such that UN = C, then the
p.d.f. of ηn converges for the T.V. norm and its limit does
not depend on the initialization.
More prosaically, z denotes the cumulated effect of the
model and observation noise, and it is assumed that 1- the
noiseless algorithm has some global convergence proper-
ties and 2- there are sufﬁciently many small enough noises
so that there is an ε chance for ηn to jump from x to any
neighborhood of the noiseless iterate q0(x) with a multi-
plicative factor corresponding to the neighborhood’s size.
It then deﬁnes a sequence of sets, by picking a ﬁxed point
of q0, dilating it, and taking its pre-image. Dilatation and
inversion are then reiterated until the whole set C is cov-
ered. In this case, forgetting of the initial distribution is
ensured.
Proof. We will demonstrate the property trough three in-
termediate results:
1. There exists a sequence ε1,··· ,εN such that: P(η1 ∈
Un|η0 ∈Un+1) > εn > 0.
2. The ﬁrst hitting time of U1 is a.s. ﬁnite.
3. (ηn)n≥0 is an aperiodic recurrent Harris chain with
A = U1 and B = U′
0.
The conclusion will then immediately follow from The-
orem 7.
We give ﬁrst a qualitative explanation of the
approach.
We have by assumption a sequence of sets
(Un)n⩾0. Intermediate result 1) states that starting from
a set Un+1 the chain has a non-vanishing chance to jump
to the “smaller” set Un at the next time step. Interme-
diate result 2) states that starting from Un+1, because of
the non-vanishing property 1), the chain cannot avoid Un
forever and eventually reaches Un, Un−1 and ﬁnally U1.
Intermediate result 3) brings out the fact that, A = U′
0 be-
ing totally included in the ball of radius α centered on
any of its points, the chain can go from any given point
of its pre-image B = U1 to any area of A with controlled
probability, which is the last property needed to obtain a
recurrent aperiodic Harris chain.
1. Consider the closure U′n of U′
n. For any x ∈U′n the set
Bo(x,α)∩Un is open and non-empty (as d(x,Un) ⩽
α
2 by deﬁnition of U′n) so µG(Bo(x,α) ∩Un) > 0.
The function f : x →µG(Bo(x,α)∩Un) > 0 is con-
tinuous (since µG is a regular and positive mea-
sure) on the compact set U′n, so it admits a mini-
mum mn > 0.
We get: P(η1 ∈Un|η0 ∈Un+1) ≥
P(η1 ∈Bo(q0(η0),α) ∩Un|η0 ∈Un+1) = P(η1 ∈
Bo(q0(η0),α) ∩Un|q0(η0) ∈U′
n) ≥εmn using the
fact that, by assumption (a), Q(x,V) ≥εµG(V) for
x ∈C and V ⊂Bo(q0(x),α). We set εn = εmn to
prove intermediate result 1).
2. We will prove by descending induction on n the
property Pn : P(τn < ∞) = 1.
PN is obvious as
UN =C. Now assume Pn+1 is true for n ⩾1. Due to
homogeneity, we can construct a strictly increasing
sequence of stopping times (νp)p⩾0 such that: ∀p ⩾
0,ηνp ∈Un+1. But P(ηνp+1 ∈Un|ηνp ∈Un+1) > εn
by intermediate result 1). Thus letting Tk denote the
event {∀i ⩽k,ηi /∈Un} = {k < τn}, we have for any
n ⩾0: P(Tνp+1) = P(Tνp)P(Tνp+1|Tνp) ⩽P(Tνp)(1−
εn). Thus P(Tνp+1) ⩽P(Tνp+1) ⩽P(Tνp)(1−εn). A
quick induction and a standard application of Borel-
Cantelli Lemma prove there exists a.s. a rank p such
that Tνp is false. In other words, Pn is true. In par-
ticular, P1 is true: the ﬁrst hitting time of U1 is a.s.
ﬁnite.
3. Deﬁne A =U1 and B =U′
0 = Bo(x0, α
2 )∩C . For any
V ⊂B and x ⊂A we have V ⊂Bo(q0(x),α), thus
Q(x,V) ⩾εµG(V) by assumption (a). Thus (ηn)n⩾0
veriﬁes property ii) of Deﬁnition 4. As A = U1 the
intermediate result 2) shows that (ηn)n⩾0 veriﬁes the
property i) of Deﬁnition 4. As there exists a.s. a
rank k such that ηk ∈U1 by intermediate result 2),
there exists an integer M such that P(ηM ∈U1) > 0.
As U1 ∩U′
0 ⊂Bo(q(x),α) for any x ∈U1 (by deﬁni-
tion of U′
0 and U1) Assumption (a) gives immediately
P(ηM+1 ∈U1∩U′
0) > P(ηM ∈U1)εµG(U1∩U′
0). Us-
ing only the deﬁnition of U′
0 and U1, and Assump-
tion (b), we see easily that both contain x0. Thus the
set U′
0 ∩U1 is non-empty and open. We have then
µG(U1 ∩U′
0) > 0 and P(ηM+1 ∈U1 ∩U′
0) > 0. By
an obvious induction on m, P(ηm ∈U1 ∩U′
0) > 0 for
any m > M. As U1 ∩U′
0 ⊂A we get P(ηm ∈A) > 0
for any m > M and the property iii) of Deﬁnition 4
is also veriﬁed. We ﬁnally obtain that (ηn)n⩾0 is an
aperiodic recurrent Harris chain.
Using Theorem 7 we can conclude that the density of ηn
converges in T.V. norm to its unique equilibrium distribu-
tion for any initialization.
Building upon the previous results, the proof of Theo-
rem 3 is as follows. The Markov chain deﬁned by equa-
tions (13) and (14) with ﬁxed ϒ can be written under the
form η′
n+1 = qz(η′
n) = q1,z ◦q2,z(η′
n) with z = (Wn,Vn) ∈
G×Rp, q1,z(x) = ϒWnxϒ−1 and q2,z(x) = xK(h(x,Vn))−1.
Let 0 = (Id,0). We will show that (η′
n)n⩾0 has all the
properties required in Lemma 2.
Let α,ε,ε′ be deﬁned as in 4.1.
Let α′ such as
Bo(Id,α′) ⊂ϒBo(Id, α
2 )ϒ−1 and ε′′ = ε′|Adϒ−1|ε, where
|Adϒ−1| is the determinant of the adjoint operator on g.
For any x ∈C and V ⊂Bo(q0(x),α′) = Bo(Id,α′)q0(x)
we have:
P(qz(x) ∈V) ⩾P(qz(x) ∈V,q2,z(x) ∈Bo(q2,0(x), α
2 ))
⩾P(q2,z(x) ∈Bo(q2,0(x), α
2 ))×
P(ϒWnq2,z(x)ϒ−1 ∈V|q2,z(x) ∈Bo(q2,0(x), α
2 ))
i.e.
P(qz(x)
∈
V)
⩾
ε′P(Wn
∈
ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α
2 )).
As-
suming
q2,z(x)
∈
Bo(q2,0(x), α
2 )
we
have:
ϒ−1Vϒq2,z(x)−1
⊂
ϒ−1Bo(Id,α′)q0(x)ϒq2,z(x)−1
⊂
Bo(Id, α
2 )ϒ−1q0(x)ϒq2,z(x)−1
⊂
Bo(Id, α
2 )q2,0(x)q2,z(x)−1 ⊂Bo(Id, α
2 + α
2 ).
As Wn
is independent from the other variables we obtain:
P(Wn ∈ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α
2 )) ⩾
εµG(ϒ−1Vϒq2,z(x)−1) = εµG(ϒ−1Vϒ) = εµG(V)|Adϒ−1|
And ﬁnally:
P(qz(x) ∈V) ⩾ε′|Adϒ−1|εµ(V) = ε′′µ(V)
As q0 has Id as a ﬁx point we only have to ver-
ify that the sets Un as deﬁned in Lemma 2 eventually
cover the whole set C.
It sufﬁces to consider for any
n ⩾0 the set Dn = {x ∈C,∀k ⩾n,qk
0(x) ∈U′
0}.
As
we have qn
0(x) →Id almost-everywhere on C we get:
µG(∪n⩾0Dn) = µG(C). As the sequence of sets (Dn)n⩾0
increases we have: µG(Dn) −→
n→∞µG(C).
We introduce
here the quantity vmin = minx∈C µG(Bo(x, α′
2 ) ∩C) (the
property C = cl(Co) and the regularity of µG ensure that
we have vmin > 0). Let N ∈N be such that µG(DN) >
µG(C)−vmin. We have then: ∀y ∈C,d(y,DN) < α′
2 (oth-
erwise we would have µG(DN) ⩽µG(C) −vmin).
As
we have DN ⊂UN we obtain ∀y ∈C,d(y,DN) < α′
2 thus
U′
N = C and UN+1 = q−1
0 (U′
N) = q−1
0 (C) = C. So all the
conditions of Lemma 2 hold and we can conclude con-
vergence in T.V. norm to a stationary distribution which
doesn’t depend on the prior, provided that its support lies
in C. Theorem 3 is proved.
Corollary 1 follows directly from the case C = G.
Theorem 4 can be proved as follows. Let C = G. Let
K = {x ∈G,h(x,0) = h(Id,0)}. Note that the left-right
equivariance assumption ensures that K is a subgroup of
G.
First we show that there exists an integer N1 such
that K ⊂UN1. As we have ∀x ∈K q0(x) = ϒxϒ−1 the se-
quence of sets Qn = ϒ−nUnϒn∩K is growing and we have:
{x ∈K, d(x,Qn) < α
2 } ⊂Qn+1.The set Q∞= ∪∞
n=1Qn is
open in K as an union of open sets, but we have ∀x ∈
Q∞,Bo(x, α
2 )∩K ⊂Q∞thus ∀x ∈K \Q∞,Bo(x, α
4 )∩K ⊂
K \ Q∞. This implies that Q∞and K \ Q∞are both open
in K. As K is connected (see assumptions of Theorem 4)
we obtain Q∞= K. As K is compact (as a closed sub-
set of a compact) the open cover ∪∞
i=1Qn has a ﬁnite sub-
cover and there exists an integer N1 such that QN1 = K,
i.e. ϒN1Kϒ−N1 = UN1. As K is a subgroup of G (see
above) containing ϒ we obtain K ⊂UN1. Now we have
to prove that the sets (Un)n⩾0 eventually cover the whole
set C. For any x ∈G we have h(qn
0(x),0) →h(Id,0). Thus
there exists a rank n such that ∀k ⩾n,qk
0(x) ∈U′
N (oth-
erwise we could extract a subsequence from (qn
0(x))n⩾0
which stays at a distance > α
2 from K, as G is com-
pact we could extract again a convergent subsequence and
its limit q∞
0 (x) would be outside K, thus we would have
h(qn
0(x)) ̸→h(Id,0)). Here we can deﬁne as in the proof
of Theorem 3 the sets Dn = {x ∈C,∀k ⩾n,qk
0(x) ∈U′
N1}
(note that U′
1 has been replaced by U′
N1 in this deﬁnition
of Dn). As for almost any x ∈G there exists a rank n such
that x ∈Dn we have µG(Dn) →µG(C) and as in the proof
of Theorem 3 the sets Un eventually cover the set C.
The second part of the property (invariance to left mul-
tiplication) is easier. Consider the error process ζn = ˜ϒηn.
The propagation and update steps read:
ζ ′
n+1 = ˜ϒη′
n+1 = ˜ϒϒWnηnϒ−1 = ϒ˜ϒWnηnϒ−1 L= ϒWn ˜ϒηnϒ−1
L= ϒWnζϒ−1
ζn+1 = ˜ϒη′
n+1Kn+1(h(η′
n+1,Vn+1))−1
L= ˜ϒη′
n+1Kn+1(h(˜ϒη′
n+1,Vn+1))−1
L= ζ ′
n+1Kn+1(h(ζ ′,Vn+1))−1
where we have used the property:
h(η′
n+1,Vn+1) L=
η′−1
n+1h(Id,Vn+1) L= η′−1
n+1h(˜ϒ,Vn+1) L= h(˜ϒη′
n+1,Vn+1). We
see that the law of the error process is invariant under left
multiplication by ˜ϒ, thus the asymptotic distribution π in-
herits this property.
C
Proofs of the results of Section 4.2
C.1
Proof of Proposition 4
We deﬁne the continuous process ˜γt : R →G as follows:
∀n ∈N, ˜γn = γn
∀n ∈N,∀t ∈[n,n+1[, d
dt ˜γt = −˜γtk( ˜γn)
We obtain immediately that ˜γt is continuous. Besides, for
any n ∈N and t ∈]n,n+1[ one has:
| d
dt ⟨k( ˜γt),k( ˜γn)⟩| = |⟨d
dt k( ˜γt),k( ˜γn)⟩|
⩽| d
dt k( ˜γt)||k( ˜γn)|
⩽| d
dt ˜γt||k( ˜γn)| (due to |∂k
∂x| ⩽1 )
⩽|k( ˜γn)|2 (as d
dt ˜γt = −˜γtk( ˜γn) )
Thus:
⟨k( ˜γt),k( ˜γn)⟩⩾⟨k( ˜γn),k( ˜γn)⟩−(t −n)|k( ˜γn)|2 ⩾0
proving in the Lie algebra that ⟨k( ˜γt),k( ˜γn)⟩⩾0.
As
u
is
a
positive
function
we
have
thus
u( ˜γt)−1⟨˜γtk( ˜γt),−˜γtk( ˜γn)⟩⩽0
immediately
proving
⟨gradE( ˜γt), d
dt ˜γt⟩⩽0.
The latter result being true for
every t ∈R\N and the function E( ˜γt) being continuous,
it decreases and thus converges on R≥0.
As we have
supposed the sublevel sets of E are bounded (and closed
as E is continuous), ˜γt is stuck in a compact set.
Let
γ∞an adherence value of ˜γn.
By continuity of E we
have E( ˜γ∞) = E( ˜γ∞exp(−k( ˜γ∞)).
Thus the function
t →E( ˜γ∞exp(−tk( ˜γ∞))) is decreasing on [0,1] and has
the same value at 0 and 1. Thus it is constant, its deriva-
tive at 0 is null proving γ∞= Id. ( ˜γn)n⩾0 being conﬁned
in a compact set and having Id as unique adherence value
we ﬁnally get γn = ˜γn →Id.
C.2
Proof of Proposition 3
Let γn = Rn ˆRT
n be the invariant error. Its equation reduces
to:
γn+1 = γn.exp(−k1(γT
n b1)×b1 −k2(γT
n b2)×b2)
Let k(γ) = k1(γTb1) × b1 + k2(γTb2) × b2 and E : γ →
k1||γTb1 −b1||2 + k2||γTb2 −b2||2.
To apply Proposi-
tion 4 we will ﬁrst verify that 1) ∀γ ∈SO(3),γ.k(γ) =
gradE(γ) 2) | ∂k
∂γ | ⩽1.
To prove 1) consider the
dynamics in ∈SO(3) deﬁned by
d
dt γt = γt(ψt)× for
some rotation vector path ψt.
We have
d
dt E(γt) =
k1(γT
t b1 −b1)T d
dt γT
t b1 + k2(γT
t b2 −b2)T d
dt γT
t b2.
Using
triple product equalities this is equal to ⟨k1(γT
t b1)×b1 +
k2(γT
t b2)×b2,ψt⟩= ⟨k(γt),ψt⟩= ⟨γt.k(γt), d
dt γt⟩.
Thus
γ.k(γ) = gradE(γ). To prove 2) we analogously see that
d
dt k(γt) = −[k1(b1)×(γT
t b1)× +k2(b2)×(γT
t b2)×]ψt. Thus
| d
dt k(γt)| ⩽(k1 +k2)|ψt| and ﬁnally | ∂k
∂γ | ⩽1. Now, except
if initially γ0 is the rotation of axis b1×b2 and angle π, the
function E is strictly decreasing, and Id is the only point
in the sublevel set {γ ∈SO(3) | E(γ) ≤E(γ0)} such that
gradE(γ) = 0. Applying Proposition 4 allows to prove
Proposition 3.
References
[1] S. Bonnabel A. Barrau. Intrinsic ﬁltering on SO(3)
with discrete-time observations.
In IEEE Confer-
ence on Decision and Control, 2013.
[2] M. Barczyk and A. F. Lynch. Invariant observer de-
sign for a helicopter uav aided inertial navigation
system. Control Systems Technology, IEEE Trans-
actions on, 21(3):791–806, 2013.
[3] S. Bonnabel. Left-invariant extended Kalman ﬁlter
and attitude estimation. In Decision and Control,
2007 46th IEEE Conference on, pages 1027–1032.
IEEE, 2007.
[4] S.
Bonnabel,
Ph.
Martin,
and
P.
Rouchon.
Symmetry-preserving observers.
Automatic Con-
trol, IEEE Transactions on, 53(11):2514–2526,
2008.
[5] S.
Bonnabel,
Ph.
Martin,
and
P.
Rouchon.
Non-linear symmetry-preserving observers on Lie
groups.
IEEE Trans. on Automatic Control,
54(7):1709 – 1713, 2009.
[6] S. Bonnabel, Ph. Martin, and E. Salaun. Invariant
extended Kalman ﬁlter: theory and application to a
velocity-aided attitude estimation problem. In De-
cision and Control, 2009 held jointly with the 2009
28th Chinese Control Conference. CDC/CCC 2009.
Proceedings of the 48th IEEE Conference on, pages
1297–1304. IEEE, 2009.
[7] G.
Bourmaud,
R.
M´egret,
A.
Giremus,
Y. Berthoumieu, et al. Discrete extended Kalman
ﬁlter on Lie groups.
In 21st European Signal
Processing Conference 2013 (EUSIPCO 2013),
2013.
[8] S. Br´as, P. Rosa, C. Silvestre, and P. Oliveira. Global
attitude and gyro bias estimation based on set-valued
observers. Systems & Control Letters, 62(10):937–
942, 2013.
[9] R. W. Brockett. Lie algebras and Lie groups in con-
trol theory. In Geometric methods in system theory,
pages 43–82. Springer, 1973.
[10] D.L. Burkholder, ´E. Pardoux, A.S. Sznitman, and
P.L. Hennequin.
´Ecole d’´et´e de probabilit´es de
Saint-Flour XIX, 1989. Springer-Verlag, 1991.
[11] J. L. Crassidis and F Landis Markley. Unscented
ﬁltering for spacecraft attitude estimation. Journal
of guidance, control, and dynamics, 26(4):536–542,
2003.
[12] J. L. Crassidis, F. Landis Markley, and Yang Cheng.
A survey of nonlinear attitude estimation meth-
ods. Journal of Guidance, Control,and Dynamics,
30(1):12–28, 2007.
[13] T.E. Duncan.
Some ﬁltering results in Riemann
manifolds.
Information and Control, 35(3):182–
195, 1977.
[14] T.E. Duncan.
Stochastic systems in Riemannian
manifolds. Journal of Optimization theory and ap-
plications, 27(3):399–426, 1979.
[15] T.E. Duncan. An estimation problem in compact Lie
groups. Systems & Control Letters, 10(4):257–263,
1988.
[16] R. Durrett. Probability: theory and examples, vol-
ume 3. Cambridge university press, 2010.
[17] Y. Han and F.C. Park. Least squares tracking on the
Euclidean group. Automatic Control, IEEE Trans-
actions on, 46(7):1127–1132, 2001.
[18] K. Itˆo. Stochastic differential equations in a differ-
entiable manifold. Nagoya Mathematical Journal,
1:35–47, 1950.
[19] S.J. Julier and J.K. Uhlmann.
A new exten-
sion of the Kalman Filter to Nonlinear Systems.
Proc. of AeroSense :
The 11th Int. Symp. on
Aerospace/Defence Sensing, Simulation and Con-
trols, 1997.
[20] R. Kalman and R. Bucy. New results in linear ﬁlter-
ing and prediction theory. Basic Eng., Trans. ASME,
Ser. D,, 83(3):95–108, 1961.
[21] Ch. Lageman, J. Trumpf, and R. Mahony. Gradient-
like observers for invariant dynamics on a Lie
group. Automatic Control, IEEE Transactions on,
55(2):367–377, 2010.
[22] E. J. Lefferts, F Landis Markley, and M. D. Shus-
ter. Kalman ﬁltering for spacecraft attitude estima-
tion. Journal of Guidance, Control, and Dynamics,
5(5):417–429, 1982.
[23] M. Liao. L´evy processes in Lie groups. Cambridge
University Press, 2004.
[24] JT-H Lo. Optimal estimation for the satellite atti-
tude using star tracker measurements. Automatica,
22(4):477–482, 1986.
[25] J. Trumpf M. Zamani and R. Mahony. Near-optimal
deterministic ﬁltering on the rotation group. IEEE
Trans. on Automatic Control, 56:6:1411 – 1414,
2011.
[26] R. Mahony, T. Hamel, and J-M Pﬂimlin.
Non-
linear complementary ﬁlters on the special orthog-
onal group.
IEEE-Trans. on Automatic Control,
53(5):1203–1218, 2008.
[27] F Landis Markley, J. L. Crassidis, and Y. Cheng.
Nonlinear attitude ﬁltering methods. In AIAA Guid-
ance, Navigation, and Control Conference, 2005.
[28] Ph. Martin and E. Sala¨un. Invariant observers for
attitude and heading estimation from low-cost iner-
tial and magnetic sensors. In 46th IEEE Conf Decis
Contr, pages 1039–1045, 2007.
[29] S.K. Ng and P.E. Caines. Nonlinear ﬁltering in Rie-
mannian manifolds. IMA journal of mathematical
control and information, 2(1):25–36, 1985.
[30] H. Nijmeijer and T. I. Fossen. New directions in non-
linear observer design, volume 244. Springer, 1999.
[31] P. J. Olver. Classical Invariant Theory. Cambridge
University Press, 1999.
[32] W. Park, Y. Liu, Y. Zhou, M. Moses, G. S.
Chirikjian, et al. Kinematic state estimation and mo-
tion planning for stochastic nonholonomic systems
using the exponential map.
Robotica, 26(4):419–
434, 2008.
[33] M. Pontier and J. Szpirglas. Filtering on manifolds.
In Stochastic Modelling and Filtering, pages 147–
160. Springer, 1987.
[34] A. Saccon, J. Trumpf, R. Mahony, and A. P. Aguiar.
Second-order-optimal ﬁlters on lie groups.
[35] S. Said. Estimation and ﬁltering of processes in ma-
trix Lie groups. PhD thesis, Institut National Poly-
technique de Grenoble-INPG, 2009.
[36] S. Said and J. H. Manton. On ﬁltering with observa-
tion in a manifold: Reduction to a classical ﬁltering
problem. SIAM Journal on Control and Optimiza-
tion, 51(1):767–783, 2013.
[37] S. Salcudean.
A globally convergent angular ve-
locity observer for rigid body motion.
Automatic
Control, IEEE Transactions on, 36(12):1493–1497,
1991.
[38] A. K. Sanyal, T. Lee, M. Leok, and N. H. Mc-
Clamroch. Global optimal attitude estimation using
uncertainty ellipsoids. Systems & Control Letters,
57(3):236–245, 2008.
[39] A.
Tayebi,
S.
McGilvray,
A.
Roberts,
and
M. Moallem. Attitude estimation and stabilization
of a rigid body using low-cost sensors. In Decision
and Control, 2007 46th IEEE Conference on, pages
6424–6429. IEEE, 2007.
[40] J. Thienel and Robert M Sanner. A coupled non-
linear spacecraft attitude controller and observer
with an unknown constant gyro bias and gyro
noise.
Automatic Control, IEEE Transactions on,
48(11):2011–2015, 2003.
[41] R. Tron, B. Afsari, and R. Vidal. Riemannian con-
sensus for manifolds with bounded curvature. 2012.
[42] J.F. Vasconcelos, R. Cunha, C. Silvestre, and
P. Oliveira. A nonlinear position and attitude ob-
server on SE(3) using landmark measurements. Sys-
tems Control Letters, 59:155–166, 2010.
[43] A. S. Willsky.
Some estimation problems on Lie
groups. In D.Q. Mayne and R.W. Brockett, editors,
Geometric Methods in System Theory, volume 3 of
NATO Advanced Study Institutes Series, pages 305–
314. Springer Netherlands, 1973.
[44] A. S. Willsky and S. I. Marcus. Estimation for bilin-
ear stochastic systems. In Variable Structure Sys-
tems with Application to Economics and Biology,
pages 116–137. Springer, 1975.
[45] A.S. Willsky.
Dynamical Systems Deﬁned on
Groups: Structural Properties and Estimation. PhD
thesis, MIT Dept. of Aeronautics and Astronautics,
May 1973.
[46] K. C. Wolfe, M. Mashner, and G. S. Chirikjian.
Bayesian fusion on Lie groups. Journal of Algebraic
Statistics, 2(1):75–97, 2011.
[47] E. Wong and M. Zakai. Riemann-Stieltjes approxi-
mations of stochastic integrals. Probability Theory
and Related Fields, 12(2):87–97, 1969.
Table 1: Artiﬁcial horizon: experiment parameters
Standard Deviation of the model noise
0.01 degree = 1.75×10−4 rad
Standard Deviation of the regular observations
0.1 degree = 1.75×10−3 rad
Standard Deviation of the outliers
30 degrees
Probability of the outliers to occur
0.01
Table 2: Useful formulas for some classical matrix Lie groups
G
SO(3) (Rdimg = R3)
(Rdimg = R6)
Embedding of G
R ∈M3(R),RTR = I3
 
R
v
0
1
!
,R ∈SO(3),u ∈R3
Embedding of g
A3 : ψ ∈M3(R),ψT = −ψ
 
ψ
u
0
0
!
,ψ ∈A3,u ∈R3
(.)m
ξ →(ξ)×
 
ξ
u
!
→
 
(ξ)×
u
0
0
!
x →Adx
R →R
 
R
T
!
→
 
R
0
(T)×R
R
!
ξ →adξ
ξ →(ξ)×
 
ξ
u
!
→
 
(ξ)×
0
(u)×
(ξ)×
!
exp
exp(x) = I3 + sin(||x||)
||x||
(x)×
exp(x) = I4 +(x)m + [1−cos(||x||)]
||x||2
(x)2
×
+
1
||x||2 (1−cos|x|)(x)2
m
+
1
||x||3 (||x||−sin||x||)(x)3
m
Table 3: Observation of two vectors: parameters
Parameter
b1
b2
QV1
QV2
Value
(1,0,0)
(0,1,0)
0,08732I3
0,08732I3
Parameter
P0
Qw
N
Simulations
Value
0.52362I3
0,017452I3
50
1000
Intrinsic ﬁltering on Lie groups with applications to attitude
estimation
Axel Barrau, Silv`ere Bonnabel∗
September 15, 2014
Abstract
This paper proposes a probabilistic approach to the prob-
lem of intrinsic ﬁltering of a system on a matrix Lie group
with invariance properties. The problem of an invariant
continuous-time model with discrete-time measurements
is cast into a rigorous stochastic and geometric frame-
work. Building upon the theory of continuous-time in-
variant observers, we introduce a class of simple ﬁlters
and study their properties (without addressing the opti-
mal ﬁltering problem). We show that, akin to the Kalman
ﬁlter for linear systems, the error equation is a Markov
chain that does not depend on the state estimate. Thus,
when the ﬁlter’s gains are held ﬁxed, the noisy error’s dis-
tribution is proved to converge to a stationary distribution,
under some convergence properties of the ﬁlter with noise
turned off. We also introduce two novel tools of engineer-
ing interest: the discrete-time Invariant Extended Kalman
Filter, for which the trusted covariance matrix is shown
to converge, and the Invariant Ensemble Kalman Filter.
The methods are applied to attitude estimation, allowing
to derive novel theoretical results in this ﬁeld, and illus-
trated through simulations on synthetic data.
1
Introduction
The problem of probabilistic ﬁltering aims at computing
the conditional expectation ˆx(t) of the state x(t) of a sys-
tem driven by process and observation noises conditioned
∗A.
Barrau
and
S.
Bonnabel
are
with
MINES
ParisTech,
PSL
Research
University,
Centre
for
robotics,
60
Bd
St
Michel
75006
Paris,
France.
[axel.barrau,silvere.bonnabel]@mines-paristech.fr
on the past inputs and outputs. Filtering is a branch of
stochastic process theory that has been strongly driven
by applications in control and signal processing. In the
particular case where the system is linear and corrupted
by independent white Gaussian process and observation
noises, the celebrated Kalman ﬁlter [20] solves the ﬁlter-
ing problem. Yet, when the system is non-linear there
is no general method to derive efﬁcient ﬁlters, and their
design has encountered important difﬁculties in practice.
Indeed, general ﬁltering equations describing the evolu-
tion of the state conditioned on past outputs can also be
derived, see e.g., [10] and references therein. However,
the equations are not closed, and in most cases the con-
ditional densities can not be computed by solving a ﬁnite
dimensional equation.
In engineering applications the most popular solution
is the extended Kalman ﬁlter (EKF), which does not pos-
sess general theoretical properties, especially in a stochas-
tic context, and is sometimes prone to divergence. The
EKF amounts to linearize the system around the esti-
mated trajectory, and build a Kalman ﬁlter for the linear
model, which can in turn be implemented on the non-
linear model. Despite the fact that it is subject to sev-
eral problems both theoretical and practical, one merit of
the EKF is to convey an estimation of the mean state ˆx(t)
together with an approximation of the covariance matrix
P(t) of the error, yielding a (trusted) ellipsoidal credible
set that reﬂects the level of uncertainty in the estimation.
In order to capture more closely the distribution, several
rather computationally extensive methods have recently
attracted growing interest such as particle ﬁlters and un-
scented Kalman ﬁlters (UKF) [19].
In this paper we consider a general ﬁltering problem on
Lie groups in a stochastic setting, and a class of simple
arXiv:1310.2539v2  [cs.SY]  12 Sep 2014
recursive ﬁnite dimensional ﬁlters, having a similar struc-
ture as the EKF, is introduced. Such recursively deﬁned
ﬁlters allow straightforward digital computer implemen-
tation. Besides, as the Kalman ﬁlter, the corresponding
algorithms will be readily implementable with very little
understanding of the theory leading to their development,
as illustrated by the concrete examples of sections 4.2,
4.3 and 5.3, which can be read independently of the rest
of the paper. As concerns their theoretical properties, they
are motivated by the three following desirable properties
a simple recursive ﬁlter should ideally have: 1- forget-
ting of the initial condition, 2- being able to capture the
extent of uncertainty in the estimate, 3- being efﬁcient,
that is, minimizing the error’s dispersion within a given
class. In the ﬁltering problem, the estimation error is ini-
tially distributed according to a prior distribution provided
by the practitioner. The rationale for the ﬁrst property is
that, as the ﬁlter acquires information, the prior informa-
tion should be given less (eventually no) importance by
the ﬁlter. As the user generally possesses very little in-
formation on the system’s state initially, the prior can be
very far from the truth. Forgetting the initial condition
allows the (subjective Bayesian) prior distribution to be
as informative as noninformative without impacting the
ﬁnal estimates. Moreover, it corresponds to convergence
properties of the ﬁlter, and thus deals with the ability to re-
cover from catastrophic failures. In robotics, this leads to
robustness to erroneous beliefs due for instance to percep-
tual ambiguity. The second property deals with the cen-
tral motivation behind stochastic approaches. Indeed, all
deterministic observers, no matter how strong properties
they may have, have a common fundamental weakness: in
the presence of sensor and model uncertainties, they pro-
vide no indication of the extent of uncertainty conveyed
by the estimate. Being able to compute credible sets for
the estimate may be needed in order to adapt control deci-
sions to the situation, for instance to guarantee an absence
of collision. The two ﬁrst properties are general and ap-
ply to any class of sensible ﬁlters. The third property ad-
dresses the issue of gain tuning. When working with a
class of (suboptimal) ﬁlters, the gains can be tuned over a
training set in order to minimize the level of uncertainty
conveyed by the estimate.
Several approaches to ﬁltering for systems possessing
a geometric structure have been developed in previous lit-
erature. For stochastic processes on Riemannian mani-
folds [18, 14] some results have been derived, see e.g.,
[29]. The speciﬁc situation where the process evolves in
a vector space but the observations belong to a manifold
has also been considered, see e.g. [33, 13] and more re-
cently [36]. For systems on Lie groups powerful tools to
study the ﬁltering equations (such as harmonic analysis
[45, 32, 24, 43]) have been used, notably in the case of bi-
linear systems [44] and estimation of the initial condition
of a Brownian motion in [15]. An extension of Gaussian
distributions has also been developed and recently used
for Bayesian ﬁltering on Lie groups in [46] and also in
[7] which devises a general adaptation of the EKF to Lie
groups, without exploiting the geometrical properties of
the speciﬁc invariant case such as in the present paper. A
somewhat different but related approach to ﬁltering con-
sists of ﬁnding the path that best ﬁts the data in a deter-
ministic setting. It is thus related to optimal control theory
where geometric methods have long played an important
role [9]. A certain class of least squares problems on the
Euclidian group has been tackled in [17]. This approach
has also been used to ﬁlter inertial data recently in [34],
[26].
Although the paper is concerned with a general the-
ory on (matrix) Lie groups, the several derived ﬁlters will
systematically be applied to a benchmark and motivating
ﬁltering example on the special orthogonal group SO(3).
The problem of attitude estimation consists of estimating
the orientation of a rigid body, which proves to be a cen-
tral task in aeronautics, in order to stabilize or guide the
system, or in mobile robotics, where orientation is a part
of the robot’s localization. When the system is equipped
with gyroscopes, the delivered signals can be integrated
over time to compute an orientation increment. However,
the initial orientation may not be known, and even if it is,
the gyroscopes are subject to noise and drift, so that pure
integration may quickly result in a very erroneous atti-
tude. Thus the estimations are regularly checked against
some vectors’ orientation, such as the earth magnetic ﬁeld
measured by magnetometers, or the earth gravity vector
measured by accelerometers under quasi-stationary ﬂight
assumptions. The explosion over the last decade of low-
cost noisy sensors integrated in unmanned aerial vehicles
(UAV’s), and the limited computational power on-board,
have been a driving force for the development of simple
ﬁlters on SO(3). Work on the deterministic problem dates
back to [37, 30] and has received much attention recently,
see [42, 27, 26, 4, 28, 38] to cite a few, and some related
issues have been addressed as well, like angular velocity
estimation [40] or attitude control [37, 39]. The beauty of
the intrinsic problem formulation on SO(3) and the prop-
erties stemming from it have been exploited in various
papers in the deterministic case, see e.g. [26, 21, 5]. The
problem has also been cast into a minimum energy ﬁl-
tering framework in [25]. In the stochastic framework, a
general review of the classical techniques based on Gaus-
sian assumptions is proposed in [27].
A ﬁrst attempt
to systematically exploit the invariance properties to de-
sign stochastic ﬁlters on SO(3) appears in the preliminary
work [1].
The contributions and the organization of the present
paper are as follows. In Section 2, the problem of stochas-
tic ﬁltering on Lie groups with a continuous-time noisy
model and discrete-time observations is rigorously posed.
From a theoretical viewpoint, the choice of discrete-time
measurements allows to circumvent some technical difﬁ-
culties and to cast the problem into a perfectly rigorous
framework. As this work is mainly guided by applica-
tions to aerospace, this choice is relevant as the evolution
equations of the orientation generally rely on the integra-
tion of noisy inertial measurements that are delivered at
a high rate, whereas the observations may be delivered
at low(er) rate (GPS, vision). In order to derive proper
ﬁlters, the problem is then transformed into a complete
discrete-time model. The discretization is not straightfor-
ward because unlike the linear case, difﬁculties arise from
non-commutativity.
Guided by the theory of symmetry-preserving ob-
servers that has been exclusively concerned with the
continuous-time and deterministic case so far, we propose
in Section 3 a class of natural ﬁlters. They appear as a
mere transposition of linear observers (such as the linear
Kalman ﬁlter) to the group, but where the addition has
been replaced with the group multiplication law (due to
non-commutativity, there is always some freedom in the
transposition as left and right multiplications are differ-
ent operations). We prove the proposed ﬁlters retain the
invariances of the problem, and that the discrete-time er-
ror’s evolution is independent of the system’s trajectory,
inheriting the properties of the deterministic continuous-
time case [5] (or merely the linear case). Thus, the gains
can be computed off-line, and the proposed ﬁlters require
very few on-board computational resources. This is one
of their main advantages over current ﬁlters for the atti-
tude estimation problems considered in simulations.
The proposed class is very broad and the tuning issue
is far from trivial. Sections 4 and 5 explore two different
routes, depending on if we are interested in the asymp-
totic or transitory phase. In Section 4, we propose to hold
the gains ﬁxed over time. As a result, the error equation
becomes a homogeneous Markov chain. We ﬁrst prove
some new results about global convergence in a determin-
istic and discrete-time framework. Then, building upon
the homogeneity property of the error, we prove that if
the ﬁlter with noise turned off admits almost global con-
vergence properties, the error with noise turned on con-
verges to a stationary distribution. Mathematically this is
a very strong and to some extent surprising result. From a
practical viewpoint, the gains can be tuned numerically to
minimize the asymptotic error’s dispersion. This allows
to “learn” sensible gains for very general types of noises.
The theory is applied to two examples and gives conver-
gence guarantees in each case. First an attitude estimation
problem using two vector measurements and a gyroscope
having isotropic noise (Section 4.2), then the construc-
tion of an artiﬁcial horizon with optimal gains, for a non-
Gaussian noise model (Section 4.3). Each application is
a contribution in itself and can be implemented without
reading the whole paper.
In Section 5, we propose to optimize the convergence
during the transitory phase using Gaussian approxima-
tions. We ﬁrst introduce a method of engineering inter-
est: the Invariant Extended Kalman Filter (IEKF) in dis-
crete time. The IEKF was introduced in continuous time
in [3] and developped and applied to localization prob-
lems in [6] and very recently in [2]. A discrete time ver-
sion on SO(3) was proposed in the preliminary paper [1].
The idea consists of linearizing the invariant error for the
class of ﬁlters introduced in Section 3 around a ﬁxed point
(the identity element of the matrix group), build a Kalman
ﬁlter for the linear model and implement it on the non-
linear model. As the linearizations always occur around
the same point, the linearized model is time-invariant and
thus the Kalman gains, as well as the Kalman covariance
matrix, are proved to converge to ﬁxed values. When on-
board storage and computational resources are very lim-
ited, this advantageously allows to replace the gain with
its asymptotic value. The IEKF is compared to the well-
known MEKF [22, 12] and UKF [19, 11] on the attitude
estimation problem, and simulations illustrate some con-
vergence properties that the latter lack. In the case where
the error equation is totally autonomous, we introduce
a new method based on off-line simulations, the IEnKF,
which outperforms the other ﬁlters in case of large noises
by capturing very accurately the error’s dispersion.
2
Problem setting
2.1
Considered continuous-time model
Consider a state variable χt taking values in a matrix
Lie group G with neutral element Id, and the following
continuous-time model with discrete measurements:
d
dt χt = (υt +wt)χt + χtωt
(1)
Yn = h(χtn,Vn)
(2)
where υt and ωt are inputs taking their values in the Lie
algebra g (i.e. the tangent space to the identity element of
G), wt is a continuous white Langevin noise with diffu-
sion matrix Σt, whose precise deﬁnition will be discussed
below in Subsection 2.3. (Yn)n⩾0 are the discrete-time ob-
servations, belonging to some measurable space Y and
Vn is a noise taking values in Rp for an integer p > 0. We
further make the following additional assumption:
Assumption 1 (left-right equivariance)
The output
map h is left-right equivariant, i.e. there exists a left action
of G on Y such that we have the equality in law:
∀g,χ ∈G, h(χg,Vn) L= g−1h(χ,Vn)
(3)
and h(χg,0) = g−1h(χ,0)
(4)
The reader who is not familiar with stochastic calcu-
lus and Lie groups can view χt as a rotation matrix, and
replace the latter property with the following more restric-
tive assumption:
Assumption 1 bis
For any n ⩾0 there exists a vector b
such that h(χ,Vn)=χ−1(b+Vn).
The assumptions could seem restrictive but are veriﬁed
in practice in various cases as shown by the following ex-
amples, which will provide the reader with a more con-
crete picture than the formalism of Lie groups.
2.2
Examples
2.2.1
Attitude estimation on ﬂat earth
Our motivating example for the model (1)-(2) is the atti-
tude estimation of a rigid body assuming the earth is ﬂat,
and observing two vectors:
d
dt Rt = Rt(ωt +wω
t )×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
(5)
where as usual Rt ∈SO(3) represents the rotation that
maps the body frame to the earth-ﬁxed frame, and where
ωt ∈R3 is the instantaneous rotation vector, and wω
t ∈R3
is a continuous Gaussian white noise representing the gy-
roscopes’ noise. We have let (x)× ∈R3×3 denote the skew
matrix associated with the cross product with a three di-
mensional vector, i.e., for a,x ∈R3, we have (x)×a =
x × a. (Yn)n≥0 is a sequence of discrete noisy measure-
ments of two vectors b1,b2 of the earth ﬁxed frame ver-
ifying b1 × b2 ̸= 0, and V 1
n and V 2
n are sequences of in-
dependent isotropic Gaussian white noises. Note that the
noise wω
t is deﬁned, as the input ωt, in the body frame (in
other words it is multiplied on the left). Thus equations
(5) do not match (1)-(2) which correspond to a noise de-
ﬁned in the earth-ﬁxed frame. This can be remedied in
the particular case where the gyroscope noise is isotropic,
a restrictive yet relevant assumption in practice.
Deﬁnition 1. A Langevin noise wt of g is said isotropic if
˜wt := gwtg−1 L= wt for any g ∈G.
Note that for a noise taking values in so(3) this deﬁni-
tion corresponds to the physical intuition of an isotropic
noise of R3. With this additional assumption the equation
can be rewritten:
d
dt Rt = ( ˜wω
t )×Rt +Rt(ωt)×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
which corresponds to (1)-(2).
Remark 1. If the gyrometer noise is not isotropic the new
noise ˜wω
t is related to Rt by ˜wω
t = Rtwω
t RT
t . Depending
on the degree of anisotropy this can prevent the use of
methods based on the autonomy of the error (see 4) but
not of the discrete-time IEKF (see 5.1).
2.2.2
Attitude estimation on a round rotating Earth
Another interesting case is attitude estimation on SO(3)
using an observation of the vertical direction (given by an
accelerometer) and taking into account the rotation of the
earth. Whereas the ﬂat earth assumption perfectly suits
low-cost gyroscopes, precise gyroscopes can measure the
complete attitude by taking into account the earth’s ro-
tation. We deﬁne a geographic frame with axis North-
west-up. The attitude Rt is the transition matrix from the
body reference to the geographic reference. The earth in-
stantaneous rotation vector υ ∈R3 and the gravity vector
g ∈R3 are expressed in the geographic reference. Both
are constant in this frame. The gyroscope gives a con-
tinuous rotation speed ωt, disturbed by an isotropic con-
tinuous white noise wω
t . The equation of the considered
system reads:
d
dt Rt = (υ)×Rt +Rt(ωt +wω
t )×,
Yn = RT
tn(g+Vn)
As wω
t is supposed isotropic the equation can be rewritten:
d
dt Rt = (υ + ˜wω
t )×Rt +Rt(ωt)×,
Yn = RT
tn(g+Vn),
which corresponds to (1)-(2).
2.2.3
Attitude and velocity estimation
We give here an example on a larger group. Consider the
attitude Rt ∈SO(3) and speed vt ∈R3 of an aircraft evolv-
ing on a ﬂat earth, equipped with a gyroscope and an ac-
celerometer. The gyroscopes gives continuous increments
ψt ∈R3 with isotropic noise wψ
t ∈R3, and the accelerom-
eters gives continuous increments at ∈R3 with isotropic
noise wa
t ∈R3. The aircraft has noisy speed measure-
ments in the earth reference frame Yn = vtn +Vn, where
Vn is a supposed to be a Gaussian isotropic white noise.
The equations read:
d
dt Rt = Rt(ψt +wψ
t )×,
d
dt vt = Rt(at +wa
t )+g,
where Rt is the rotation mapping the body frame to the
earth-ﬁxed frame, vt is the velocity expressed in the earth
ﬁxed frame, and g is the earth gravity vector, supposed to
be (locally) constant. Using the matrix Lie group SE(3)
we introduce:
At =

RT
t
RT
t vt
0
1

; ωt =

0
g
0
0

υt =

−(ψt)×
at
0
0

; wt =

(wψ
t )×
wa
t
0
0

The problem can then be rewritten under the form (1)-(2):
d
dt At = (υt +wt)At +Atωt,
Ytn = A−1
tn

Vn
1

,
2.3
Interpretation of Langevin noises on Lie
Groups
This present subsection provides some mathematical con-
siderations about white noises on Lie groups. It can be
skipped by the uninterested reader who is directly referred
to the model (6).
Equation (1) is actually a Langevin equation that suf-
fers from poly-interpretability because of its non-linearity,
and its meaning must be clariﬁed in the rigorous frame-
work of stochastic calculus. Stratonovich stochastic dif-
ferential equations on a Lie group can be intrinsically de-
ﬁned as in e.g. [23]. A somewhat simpler (but equiva-
lent) approach consists of using the natural embedding of
the matrix Lie group G in a matrix space and to under-
stand the equation in the sense of Stratonovich, see e.g.
[35]. The mathematical reasons stem from the fact that
the resulting stochastic process is well-deﬁned on the Lie
group, whereas this is not the case when opting for an Ito
interpretation as underlined by the following easily prov-
able proposition.
Proposition 1. If the stochastic differential equation (1)
is taken in the sense of Itˆo, for G = SO(3) embedded in
R3×3, an application of the Itˆo formula to χT
t χt shows
that the solution almost surely leaves the submanifold G
for any time t > 0.
Besides, the physical reasons for this stem from the fact
that the sensors’ noise are never completely white, and
for colored noise the Stratonovich interpretation provides
a better approximation to the true solution than Ito’s, as
advocated by the result of Wong and Zakai [47].
2.4
Exact discretization of the considered
model
To treat rigorously the problem of integrating discrete
measurements we need to discretize the continuous model
with the same time step as the measurements’. Unlike the
general case of non-linear estimation, the exact discrete-
time dynamics corresponding to Equation (1) can be ob-
tained, as proved by the following result:
Theorem 1. Let χn = χtn.
Then the discrete system
(χn,Yn) satisﬁes the following equations:
χn+1 = ϒnWnχnΩn,
Yn = h(χn,Vn),
(6)
where Wn is a random variable with values in G and
whose law depends on the values taken by υt for t ∈
[tn,tn+1] and on the law of wt for t ∈[tn,tn+1], and ϒn and
Ωn are elements of G which only depend on the values
taken respectively by υt and ωt for t ∈[tn,tn+1].
Proof. For n ∈N consider the value of the process χt is
known until time tn. Let ϒt and Ωt be the solutions of the
following equations:
ϒtn = Id, d
dt ϒt = υtϒt
and Ωtn = Id, d
dt Ωt = Ωtωt
Let Wt be the solution of the following (Stratonovich)
stochastic differential equation:
Wtn = Id,
d
dtWt = ϒ−1
t
wtϒtWt
Note that ϒ−1
t
wtϒt being in g, Wt is ensured to stay in
G. Deﬁne the process χt|tn = ϒtWtχtnΩt. We will show
that for t > tn the processes χt and χt|tn verify the same
stochastic differential equation. Indeed:
d
dt χt|tn = ( d
dt ϒt)WtχtΩt +ϒt( d
dtWt)χtΩt +ϒtWtχt( d
dt Ω)
= υtϒtWtχtΩt +wtϒtWtχtΩt +ϒtWtχtΩtωt
= (υt +wt)χt|tn + χt|tnωt
Thus the two processes have the same law at time tn+1,
i.e. χn+1 and Wtn+1ϒtn+1χtnΩtn+1 have the same law. Let-
ting ϒn = ϒtn+1, Ωn = Ωtn+1 and Wn = Wtn+1 we obtain the
result.
Remark 2. In many practical situations (for instance ex-
amples 2.2.1 and 2.2.2), the Langevin noise wt is isotropic
and we have thus ϒ−1
t
wtϒt
L= wt. Note that, the variable
Wt depends also only on the law of wt for t ∈[tn,tn+1].
In the sequel, for mathematical reasons (the equa-
tions do not suffer from poly-interpretability), tutorial rea-
sons (the framework of diffusion processes on Lie groups
needs not be known), and practical reasons (any ﬁlter
must be implemented in discrete time), we will system-
atically consider the discrete-time model (6). Moreover,
the noise Wn will be a general random variable in G, not
necessary a solution of the stochastic differential equation
d
dtWt = ϒ−1
t
wtϒtWt.
3
A class of discrete-time intrinsic
ﬁlters
3.1
Preliminary: Kalman ﬁlter for linear
stationary systems with drift
Consider the particular case of a linear system in RN (with
arbitrary N) of the form d
dt Xt = bt +wt, wt being a white
noise and bt a known deterministic input. Assume we
have discrete measurements of the form Yn = HXtn +Vn ∈
Rp, Vn being a Gaussian noise in Rp. A straightforward
discretization yields:
Xn+1 = Xn +Bn +Wn,
Yn = HXn +Vn,
(7)
With Bn =
R tn+1
tn
bsds and Wn =
R tn+1
tn
wsds is the value of a
Brownian motion in RN. The optimal ﬁlter in this case is
the celebrated linear Kalman ﬁlter whose equations are:
ˆX′
n+1 = ˆXn +Bn,
ˆXn+1 = ˆX′
n+1 +Kn+1(Yn+1 −H ˆX′
n+1)
(8)
The ﬁrst equation is the so-called prediction step, and the
second one the correction step.
Let e′
n = ˆX′
n −Xn and
en = ˆXn −Xn denote the predicted and corrected state es-
timation errors, we get:
e′
n+1 = en −Wn,
en+1 = e′
n+1 −Kn+1(He′
n+1 −Vn+1)
(9)
The error equation is autonomous and can be optimized
independently from the dynamical inputs Bn’s. As a con-
sequence, the optimal gains Kn can be computed off-line.
It is a well-known result that they (as well as the error
distribution) converge under observability and detectabil-
ity assumptions, so all computations can be done off-line
and only the ﬁrst values need be stored, in which case
only the equations (8) must be computed on-line, requir-
ing extremely restricted computational power on board.
Mimicking the form of equations (8) on the Lie group G
where the addition is naturally replaced with the group
multiplication law, leads to a similar result, as shown in
the next section.
3.2
Proposed intrinsic ﬁlters
Inspired by the linear case and the theory of continuous-
time symmetry-preserving observers on Lie groups [5] we
propose to mimic the prediction and update steps (8) re-
placing the addition with the group multiplication yield-
ing the class of ﬁlters deﬁned by:
ˆχ′
n+1 = ϒn ˆχnΩn,
(10)
ˆχn+1 = Kn+1( ˆχ′n+1Yn+1) ˆχ′n+1,
(11)
where Kn+1(•) can be any function of Y →G, ensuring
K(h(Id,0)) = Id . There are two ways to understand the
links between (8) and (10)-(11). First, by deﬁning a an
estimation error on the group G by χn+1 ˆχ−1
n+1 (see below),
and by using the left-right equivariance hypothesis, which
allows to interpret ˆχ′n+1Yn+1 = h(χn+1 ˆχ′−1
n+1,Vn+1) as a
group equivalent to H(X′
n+1 −ˆXn+1)+Vn+1. Secondly, by
viewing the linear case of Section 3.1 as a speciﬁc case of
the proposed approach through the following proposition
whose proof has been moved to the Appendix.
Proposition 2. There exists an isomorphic representation
of RN as a matrix Lie group such that (7) takes the canoni-
cal form (1)-(2), and (8) becomes the invariant ﬁlter (10)-
(11).
We now deﬁne the invariant output errors, which are
a transposition of the linear error to our multiplicative
group:
ηn = χn ˆχ−1
n ,
η′
n = χn ˆχ′−1
n
(12)
We have the following striking property, that is similar to
the linear case:
Theorem 2. The error variables ηn and η′
n are Markov
processes, and are independent of the inputs Ωn’s.
Proof. The equations followed by η′
n and ηn read:
η′
n+1 = χn+1 ˆχ′−1
n+1 = ϒnWnχnΩnΩ−1
n
ˆχ−1
n ϒ−1
n = ϒnWnηnϒ−1
n
(13)
and:
ηn+1 = χn+1 ˆχ′−1
n+1Kn+1( ˆχ′n+1Yn+1)−1
= η′
n+1Kn+1( ˆχ′n+1h(χn+1,Vn+1))−1
L= η′
n+1Kn+1(h(η′
n+1,Vn+1))−1,
(14)
thanks to the equivariance property of the output.
The most important consequence of this property is that
if the inputs υt are known in advance, or are ﬁxed, as it
is the case in Examples 2.2.1 and 2.2.2, the gain func-
tions Kn can be optimized off-line, independently of the
trajectory followed by the system. In any case, numerous
choices are possible to tune the gains Kn and the remain-
ing sections are all devoted to various types of methods to
tackle this problem.
4
Fixed gains ﬁlters
In certain cases, one can build an (almost) globally con-
vergent observer for the associated deterministic system,
i.e., with noise turned off, by using a family of constant
gain function Kn(•) ≡K(•). If the ﬁlter with noise turned
off has the desirable property of forgetting its initial con-
dition, convergence to a single point is impossible to re-
tain, because of the unpredictability of the noises that
“steer” the system, but convergence of the distribution can
be expected, assuming as in the linear case that ϒn is inde-
pendent of n. Indeed in this section we prove that, when
noise is turned on, the error forgets its initial distribution
under mild conditions. The results are illustrated by an
attitude estimation example for which we propose an in-
trinsic ﬁlter having strong convergence properties.
It should be noted that in practice, the convergence to
an invariant distribution allows in turn to pick the most
desirable gain K(•) among the family based on a per-
formance criterion, such as convergence speed, or ﬁlter’s
precision (that is, ensuring low error’s dispersion). This
fact will be illustrated by an artiﬁcial horizon example of
Subsection 4.3.
4.1
Convergence results
Here the left-hand inputs ϒn are assumed ﬁxed. This, to-
gether with constant gains Kn, makes the error sequence
ηn a homogeneous Markov chain. Thus, under appropri-
ate technical conditions, the chain has a unique stationary
distribution and the sequence converges in distribution to
this invariant distribution. Let d denote a right-invariant
distance on the group G. We propose the following as-
sumptions:
1. Conﬁnement of the error: there exists a compact set
C such that ∀n ∈N,ηn ∈C a.s. for any η0 ∈C
2. Diffusivity: the process noise has a continuous part
with respect to Haar measure, with density positive
and uniformly bounded from zero in a ball of radius
α > 0 around Id.
3. Reasonable
output
noise:
∀g
∈
G,
P[gK(h(g,Vn))−1 ∈Bo(gK(h(g,0))−1, α
2 )] > ε′
for some ε′ > 0.
The second assumption implies, and can in fact be re-
placed with, the more general technical assumption that
there exists ε > 0 such that for any subset U of the ball
Bo(Id,α) we have P(Wn ⊂U) > εµ(U) for all n ≥0
where µ denotes the Haar measure. Those noise proper-
ties are relatively painless to verify, whereas the conﬁne-
ment property although stronger is automatically veriﬁed
whenever G is compact, e.g. G = SO(3). Intuitively, the
last two assumptions guarantee the error process is well
approximated by its dynamics with noise turned off, fol-
lowed by a small diffusion. In the theory of Harris chains,
the latter diffusion step is a key element to allow proba-
bility laws to mix at each step and eventually forget their
initial distribution.
Theorem 3. For constant left-hand inputs ϒn ≡ϒ, con-
sider the ﬁlter:
ˆχ′
n+1 = ϒ ˆχnΩn
(15)
ˆχn+1 = K( ˆχ′n+1Yn+1) ˆχ′n+1
(16)
Suppose that Assumptions 1)-2)-3) are veriﬁed, where the
compact set satisﬁes C = cl(Co), cl denoting the closure
and o the interior. When noises are turned off, the error
equation (13)-(14) becomes:
γ′
n+1 = ϒγnϒ−1,
γn+1=γ′
n+1Kn+1(h(γ′
n+1,0))−1,
(17)
Suppose that for any γ0 ∈C, except on a set of null Haar
measure, γn converges to Id. Then there exists a unique
stationary distribution π on G such that for any prior law
µ0 of the error η0 supported by C, the law (µn)n≥0 of
(ηn)n≥0 satisﬁes the total variation (T.V.) norm conver-
gence property:
lim
n
∥µn −π∥T.V. →0
Corollary 1. When the group G is compact, The conver-
gence results of Theorem 3 hold globally, i.e. without the
conﬁnement assumption 1).
Theorem 4. Under the assumptions of Theorem 3, as-
suming only h(γn,0) →0 instead of almost global con-
vergence of (γn)n≥0, that G is compact, the set K = {g ∈
G,h(g,0) = h(Id,0)} connected and h(ϒ,0) = h(Id,0), the
results of Theorem 3 are still valid. Moreover, if Wn is
isotropic, we have π(˜ϒV) = π(V) for any ˜ϒ ∈K commut-
ing with ϒ (˜ϒϒ = ϒ˜ϒ).
The proofs of the results above have all been moved to
the Appendix.
4.2
Application to attitude estimation
Consider the attitude estimation example of Subsection
2.2.1. In a deterministic and continuous time setting, al-
most globally converging observers have been proposed
in several papers (see e.g. [37, 26]) and have since been
analyzed and extended in a number of papers. In order
to apply the previously developed theory to this example,
the challenge is twofold. First, the deterministic observer
must be adapted to the discrete time and proved to be al-
most globally convergent. Then, the corresponding ﬁlter
must be proved to satisfy the assumptions of the theorems
above in the presence of noise. In discrete time, the sys-
tem equations read:
Rn+1 =WnRnΩn,
Yn =(RT
n b1 +V 1
n ,RT
n b2 +V 2
n )
with the notations introduced in 2.2.1. We propose the
following ﬁlter on SO(3):
ˆR′
n+1 = ˆRnΩn,
ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1,
with K(y1,y2) = exp(k1(y1 ×b1)+k2(y2 ×b2)),
k1 > 0, k2 > 0, k1 +k2 ⩽1
(18)
Proposition 3. With noise turned off, the discrete invari-
ant observer (18) is almost globally convergent, that is,
the error converges to Id for any initial condition except
one.
The proof has been moved to the Appendix, only the
main idea is given here.
For the continuous-time de-
terministic problem it is known, and easily seen, that
E : γ →k1||γTb1 −b1||2 +k2||γTb2 −b2||2 is a Lyapunov
function, allowing to prove almost global convergence of
the corresponding observer. In the discrete time determin-
istic case, the function above remains a Lyapunov func-
tion for the sequence (γn)n≥0, allowing to derive Propo-
sition 3. This is not trivial to prove, and stems from the
more general following novel result:
Proposition 4. Consider a Lie group G equipped with a
left-invariant metric ⟨.,.⟩, and a left-invariant determinis-
tic discrete equation on G of the form:
γn+1 = γn exp(−k(γn))
Assume there exists a C2 function E : G →R≥0 with
bounded sublevel sets, a global minimum at Id, and a con-
tinuous and strictly positive function u : G →R>0 such
that: ∀x ∈G,k(x) = u(x)[x−1.gradE(x)]. If the condition
∀x ∈G,| ∂k
∂x| ⩽1 (for the operator norm) is veriﬁed, for
any initial value γ0 such that Id is the only critical point
of E in the sublevel set {x ∈G | E(x) ≤E(γ0)} we have:
γn →
n→∞Id
The proof has been moved to the Appendix. Note that,
the latter property is closely related to Lemma 2 of [41].
Using Theorem 3 and Proposition 3 we ﬁnally directly
get:
Theorem 5. The distribution of the error variable of the
invariant ﬁlter (18) converges for the T.V. norm to an
asymptotic distribution, which does not depend on its ini-
tial distribution.
4.3
Learning robust gains: application to
the design of an artiﬁcial horizon
As proved in 4.1, under appropriate conditions the error
variable is a converging Markov chain whose asymptotic
law depends on the gain function but not on the trajectory
followed by the system (which is a major difference with
most nonlinear ﬁlters, such as the EKF). Hence a ﬁxed
gain can be asymptotically optimized off-line, leading to
a very low numerical cost of the on-line update.
A classical aerospace problem is the design of an artiﬁ-
cial horizon using an inertial measurement unit (IMU). An
estimation of the vertical is maintained using the observa-
tions of the accelerometer (which senses the body accel-
eration minus the gravity vector, expressed in the body
frame) and the stationary ﬂight assumption according to
which the body’s linear velocity is constant. The prob-
lem is that this approximation is not valid in dynamical
phases (take-off, landing, atmospheric turbulence), which
are precisely when the artiﬁcial horizon is most needed.
The problem is generally stated as follows:
d
dt Rt = Rt(ωt +wt)×,
Yn = RT
tng+Vn +Nn,
where Rt is the attitude of the aircraft (the rotation from
body-frame coordinates to inertial coordinates), ωt is the
continuous-time gyroscope increment and Yn is the obser-
vation of the accelerometer. The sensor noises wt and Vn
can be considered as Gaussian, and Nn represents ﬂuctu-
ations due to accelerations of the aircraft that we propose
to model as follows: Nn is null with high probability but
when non-zero it can take large values. The Nn’s are as-
sumed to be independent as usually.
4.3.1
Convergence results
Consider the following class of ﬁlters:
ˆR′
n+1 = ˆRnΩn,
ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1,
with K(y) = exp( fk,λ(y))
(19)
where fk,λ(x) = k.min(angle(x,g),λ) x×g
||x×g|| if x × g ̸= 0
and fk,λ(x) = 0 otherwise.
The rationale for the gain tuning is as follows: if the
accelerometer measures a value y, we consider the small-
est rotation giving to g the same direction as y.
Con-
serving the same axis, the angle of this rotation is thresh-
olded (hence the parameter λ) to give less weight to out-
liers (without purely rejecting them, otherwise the ﬁlter
couldn’t converge when initialized too far).
Then we
choose as a gain function a rotation by a fraction of the
obtained angle. We begin with the following preliminary
result:
Lemma 1. For any 0 < k ⩽1 and 0 < λ ⩽π the out-
put error ∥Yn −ˆRT
n g∥associated to the observer (19) with
noise turned off converges to 0.
Proof. Let us consider the error evolution when the noise
is turned off. It writes γn+1 = γn exp(−fk,λ(γ−1
n g)), thus
we have γ−1
n+1g = exp( fk,λ(γ−1
n g))γ−1
n g. As for any n ∈N,
fk,λ(γ−1
n g) is orthogonal to g and γ−1
n g, γ−1
n+1g stays in
the plane spanned by g and γ−1
0 g, as well as the whole
sequence (γ−1
n g)n⩾0. Let φn = angle(γ−1
n g,g). The dy-
namics of (φn) writes: φn+1 = φn −k.min(λ,φn). Thus φn
goes to 0, i.e: γ−1
n g →
n→∞g, i.e. the observation error goes
to 0.
The following result is a mere consequence of Lemma
1 and Theorem 4.
Proposition 5. The error variable associated to the ﬁlter
deﬁned by (19) converges to a stationary distribution for
the T.V. norm, which does not depend on its initial distri-
bution.
4.3.2
Numerical asymptotic gain optimization
To each couple (k,λ) we can associate an asymptotic er-
ror dispersion (computed in the Lie Algebra) associated
to the corresponding stationary distribution, and try to
minimize it.
As all computations are to be done off-
line, the statistics of all distributions can be computed
using particle methods.
Table 1 gives the parameters
of the model used in the following numerical experi-
ment.
Figure 1 displays the Root Mean Square Error
RMSE =
p
E(η∞g−g), computed over a grid for the pa-
rameters (k,λ). The minimum is obtained for k = 0.1202
and λ = 0.0029. If we compare it to a MEKF, we observe
a huge difference. For our asymptotic invariant ﬁlter we
get RMSE = 8.02 × 10−4. For the MEKF, the observa-
tion noise matrix giving the best results leads to the value
RMSE = 4.3 × 10−3. This result is not surprising due to
the fact that the outliers signiﬁcantly pollute the estimates
of the Kalman ﬁlter (see Fig. 1). This illustrates the fact
that when the noise is highly non-Gaussian, an asymptotic
gain with some optimality properties can still be found.
5
Gaussian ﬁlters
The present section focuses more on the transitory phase,
and the gain is computed at each step based on Gaussian
noise approximations and linearizations. Beyond the lo-
cal optimization underlying the gain computation, the two
following ﬁlters have the merit to readily convey an ap-
proximation of the extent of uncertainty carried by the
estimations. We ﬁrst introduce the discrete-time invari-
ant extended Kalman ﬁlter (IEKF), which is an EKF, but
based on the linearization of an invariant state error. Then
we introduce the IEnKF, which computes the empirical
covariance matrix of the error using particles. As all com-
putations can be done off-line, it is easily implementable,
as long as the gains can be stored, and is shown to con-
vey a better indication of the error’s dispersion for large
noises. Simulations illustrate the results.
5.1
The discrete-time Invariant EKF
In the present section we introduce the discrete-time
IEKF. Indeed, as in the standard EKF theory, the idea is to
linearize the error equation (13)-(14) assuming the noises
Figure 1:
Artiﬁcial horizon example. Top plot: off-line
optimization of the parameters of the asymptotic invari-
ant ﬁlter (axis x →λ ; axis y →k ; axis z →Root Mean
Square Error). The highest values are not displayed to
improve visualization. The performance is optimal for
k = 0.1202,λ = 0.0029.
Bottom plot: error evolution
of three artiﬁcial horizons : the invariant ﬁlter with op-
timized gain functions (black), a MEKF ignoring outliers
(red), and a MEKF where the observation noise covari-
ance matrix has been numerically adjusted to minimize
the RMSE (dashed line). We see that the MEKF cannot
ﬁlter the outliers and at the same time be efﬁcient in the
absence of outliers, contrarily to the proposed invariant
ﬁlter.
and the state error are small enough, use Kalman equa-
tions to tune the gains on this linear system, and imple-
ment the gain on the nonlinear model. As in the heuristic
theory of the Invariant EKF in continuous time [3] [6] the
gains and the error covariance matrix are proved to con-
verge to a ﬁxed value. In particular this allows (after some
time) to advantageously replace the gain by its constant ﬁ-
nal value leading to a numerically very cheap asymptotic
version of the IEKF.
Algorithm 1 Invariant Extended Kalman Filter
Returns at each time step ˆχn, Pn such that χn ≈
exp(ξn) ˆχn, where ξn is Gaussian and var(ξn) = Pn. ˆχ0
and P0 are inputs.
Hξ,HV deﬁned by:
h(exp(ξ ′
n),Vn) = h(Id,0)+Hξξ ′
n+HVVn+O(∥ξ ′
n∥2)+O(∥Vn∥2)
Set n = 0.
loop
Compute the value Mtn+1 solving the equation
Mtn = 0,
d
dt Mt = Var(wt)+adυtMtadT
υt
The process noise covariance Qw
n = Var(Adϒnwn) is
equal to Mtn+1
 if wn is isotropic or if υt ≡0, merely
set Qw
n = Var(wn)(tn+1 −tn)

,
QV
n+1 = Var(Vn+1),
Pn+1|n = AdϒnPnAdT
ϒn +Qw
n ,
Sn+1 = HVQV
n+1HT
V +HξPn+1|nHT
ξ ,
Ln+1 = Pn+1|nHT
ξ S−1
n+1,
Pn+1 = (I −Ln+1Hξ)Pn+1|n,
ˆχ′
n+1 = ϒn ˆχnΩn,
ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1
n ←n+1
end loop
5.1.1
Linearization of the equations and IEKF for-
mulas
We consider here the error equations (13)-(14). Assuming
errors and noises are small, we introduce their projection
in the tangent space at the identity Id (the so-called Lie
algebra g of the group) using the matrix exponential map
exp(.) (see e.g. [31]). As the matrix vector space g can be
identiﬁed with Rdimg using the linear mapping (.)m (see
Table 2) we can assume exp : Rdimg →G. This function is
deﬁned in Table 2 for usual Lie groups of mechanics. Its
inverse function will be denoted by exp−1. We thus deﬁne
the following quantities of Rdimg:
ξn = exp−1(ηn),
ξ ′
n = exp−1(η′
n),
wn = exp−1(Wn)
The simplest way to design the gain is to use a function
which is linear in Rdimg and then mapped to G through
the exponential:
Kn : y →exp[Ln(y−h(Id,0))]
Equations (13) and (14) mapped to Rdimg become:
ξ ′
n+1 = exp−1(exp(Adϒnwn)exp(Adϒnξn)),
ξn+1 = exp−1(exp(ξ ′
n+1)exp[−Ln(h(exp(ξ ′
n+1,Vn+1))−h(Id,0))]),
where we have introduced the adjoint matrix Ad:
Deﬁnition 2. The following property of the adjoint matrix
Ad can be used as a deﬁnition:
∀g ∈G,∀u ∈Rdimg, exp(Adgu) = gexp(u)g−1
The differential of ξ →Adexp(ξ) at zero is denoted ξ →
adξ.
To tune the gains Ln through the Kalman theory we
need to evaluate at each time step, using Pn = Var(ξn) ∈
Rdimg×dimg, the following quantities:
Pn+1|n = Var(ξ ′
n+1),
Sn+1 = Var(Yn+1)
Our approach consists of using a coarse ﬁrst-order de-
velopment of the propagation and observation functions.
To do so we use the Baker-Campbell-Hausdorff formula
and keep only ﬁrst order terms in ξ ′
n+1, wn and Vn+1, the
crossed terms being also neglected:
ξ ′
n+1 = Adϒnwn +Adϒnξn
h(exp(ξ ′
n+1),Vn+1) = h(Id,0)+Hξξ ′
n+1 +HVVn+1
ξn+1 = ξ ′
n+1 −Ln+1(Hξξ ′
n+1 +HVVn+1)
The computation of the gains is then straightforward using
the Kalman theory in Rdimg, which is a vector space. It is
described in Algorithm 1.
5.1.2
Convergence of the gains
The main beneﬁt of the ﬁlter is with respect to its conver-
gence properties. Indeed, under very mild conditions, the
covariance matrix Pn and the ﬁlter’s gain Kn are proved to
converge to ﬁxed values [20]. The practical consequences
are at least twofold: 1-the error covariance converges to
a ﬁxed value and is thus much easier to interpret by the
user than a matrix whose entries keep on changing (see
Fig. 2) 2-due to computational limitations on-board, the
covariance may be approximated by its asymptotic value,
leading to an asymptotic version of the IEKF being nu-
merically very cheap.
Theorem 6. If the noise matrices Qw
n = Var(wn), QV
n =
Var(Vn) and the left inputs ϒn are ﬁxed, the pair
(Adϒn,Hξ) is observable and Qw
n has full rank, then Pn
and Kn converge to ﬁxed values P∞and K∞, as in the lin-
ear time-invariant case.
5.2
The discrete-time Invariant EnKF
When the error equation is independent of the inputs, the
exact density of the error variable can be sampled off-
line. This allows to radically improve the precision of
the quantities involved in the Kalman gains computation
of Section 5.1.1. We propose here the Invariant Ensemble
Kalman ﬁlter (IEnKF) described in Algorithm 2. The idea
is to compute recursively through Monte-Carlo simula-
tions a sampling of the error density and to use it, instead
of linearizations, to evaluate precisely the innovation and
error covariance matrices used to compute the gains in the
Kalman ﬁlter. The procedure is described in Algorithm 2.
5.3
Simulations
We will display here the results of Algorithms 1 and 2
for the attitude estimation problem described in Section
2.2.1:
d
dt Rt = Rt(ωt +wω
t )×,
Yn = (RT
tnb1 +V 1
n ,RT
tnb2 +V 2
n ),
where Rt ∈SO(3) represents the rotation that maps the
body frame to the earth-ﬁxed frame, and ωt is the instan-
taneous angular rotation vector which is assumed to be
measured by noisy gyroscopes. (Yn)n≥0 is a sequence of
Algorithm 2 IEnKF
Deﬁne H by h(exp(ξ),0) = h(Id,0)+Hξ +O(∥ξ∥2)
Sample M particles (ηi
0)1⩽i⩽M following the prior error
density
for n = 0 to N −1 do
for i = 1 to M do
η′i
n+1 = W i
nϒηi
nϒ−1,
yi
n+1 = h(η′i
n+1,V i
n+1),
end for
Pn+1|n = 1
M ∑M
i=1 exp−1(η′i
n+1)exp−1(η′i
n+1)T
Sn+1 = 1
M ∑M
i=1 yi
n+1yi
n+1
T,
Ln+1 = Pn+1|nHTS−1
n+1
for i = 1 to M do
ηi
n+1 = η′i
n+1 exp(−Ln+1[yi
n+1 −h(Id,0)])
end for
Store the gain matrix Ln+1
end for
The prior estimation ˆχ0 is an input.
for n = 0 to N-1 do
ˆχ′
n+1 = ϒ ˆχnΩn,
ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1
end for
discrete noisy measurements of the vectors b1 and b2 (for
instance the gravity and the magnetic ﬁeld), V 1
n and V 2
n are
sequences of independent isotropic white noises. A sim-
ulation has been performed using the parameters given in
Table 3. In this case the IEKF equations are described by
Algorithm 3. As the state-of-the-art methods for this par-
ticular problem are the Multiplicative Extended Kalman
Filter (MEKF, see e.g. [12]) and the USQUE ﬁlter [11]
(a quaternion implementation of the Unscented Kalman
Filter), the four methods are compared. The evolution of
the gains is displayed on Fig. 2 and shows the interest of
the invariant approach. The error variable is expressed in
the Lie algebra and its projection to the ﬁrst axis is given
on Fig. 3 for each method (the projections on other axes
being very similar, they are not displayed due to space
limitations). All perform satisfactorily, but some differ-
ences are worthy of note. First we can see that the MEKF
and the IEKF yield comparable performances but only the
gains of the IEKF converge. Moreover, the linearizations
lead the MEKF and the IEKF to fail capturing accurately
the error dispersion through a 3σ-envelope. To this re-
spect, the USQUE performs better than the two latter ﬁl-
ters but still does not succeed in capturing the uncertainty
very accurately. On the other hand, the envelope provided
by the EnKF is very satisfying. This result is not surpris-
ing: this envelope has been computed using a sampling of
the true density of the error, there is thus no reason why
it should not be a valid approximation as long as there are
sufﬁciently many particles. This method is thus to be pre-
ferred to the other ones when the user is willing and able
to perform extensive simulations, and has the capacity to
store the gains over the whole trajectory.
Algorithm 3 Invariant Extended Kalman Filter on SO(3)
Returns at each time step
ˆRn, Pn such that Rn ≈
exp(ξn) ˆRn, where ξn is a centered Gaussian and
var(ξn) = Pn.
ˆR0 and P0 are inputs.
Hξ =

(b1)×
(b2)×

, HV = I6,
QV
=

Var(V 1)
0
0
Var(V 2)

,
Qw
=
R tn+1
tn
Var(wω
s )ds.
Set n = 0.
loop
Let Ωn be the solution at tn+1 of Ttn = I3,
d
dt Tt =
(ωt)×.
Pn+1|n = Pn +Qw,
Sn+1 = HVQVHT
V +HξPn+1|nHT
ξ ,
Ln+1 = Pn+1|nHT
ξ S−1
n+1,
Pn+1 = (I −Ln+1Hξ)Pn+1|n,
ˆR′
n+1 = ˆRnΩn
∆χ = Ln+1[
 ˆR′n+1Y 1
n+1
ˆR′n+1Y 2
n+1

−

b1
b2

],
ˆRn+1 = exp(∆χ) ˆR′n+1 (see Table 2)
n ←n+1
end loop
6
Conclusion
This paper has introduced a proper stochastic framework
for a class of ﬁltering problems on Lie groups. A wide
class of discrete-time ﬁlters with strong properties has
Figure 2:
Evolution over time of the coefﬁcients of the
gain matrix for MEKF, IEKF, USQUE and IEnKF. The
coefﬁcients of the MEKF and USQUE have an erratic
evolution, whereas those of the IEKF are convergent, al-
lowing to save much computation power using asymptotic
values. Moreover, in this case, only 4 of the 18 coefﬁ-
cients are not equal to zero for the IEKF, allowing sparse
implementation.
been proposed, along with several methods to tune the
gains. When the gain is held ﬁxed, the estimation error
has the remarkable property to provably converge in dis-
tribution. Another route is to focus more on the transi-
tory phase, and to seek to minimize the dispersion of the
estimation error at each step based on linearizations and
Gaussian noise approximations. This has lead to an in-
variant version of the celebrated EKF, having the property
that the error covariance matrix converges asymptotically,
as well as to an invariant version of the EnKF which has
been shown to convey an accurate indication of the level
of uncertainty carried by the estimate.
The results of the paper open the door to a rich vari-
ety of ﬁlters on Lie groups. Indeed, all the classical gain
tuning approaches can still be considered, and a method
based on the unscented transform for instance, complet-
ing the work [11] seems an interesting idea. In a situation
where many observations are available at each time step,
an information ﬁlter could also be derived mimicking the
IEKF. Besides, we believe that the cases where the gains
of the IEKF are sparse should be given special attention,
Figure 3:
Evolution of the error variable, projected on
the ﬁrst axis, for 1000 simulations using MEKF, IEKF,
USQUE and IEnKF. The MEKF and the IEKF give sim-
ilar results. The USQUE is doing slightly better but is
outperformed by the IEnKF. Only the latter captures the
error dispersion satisfactorily, as the 3σ envelope contains
99 % of the simulated trajectories.
by studying when this property occurs and if signiﬁcant
computation power and memory can be saved. The possi-
bilities offered by off-line simulations deserve also to be
deepened. The computation of exact conﬁdence sets can
be explored, establishing a link with the approach of set-
valued observers proposed in [8] and [38]. Concerning
the asymptotic method introduced here, the optimization
has been performed quite naively as efﬁcient density sim-
ulation is beyond the scope of the paper, but there may
certainly be some room for improvements using Markov
Chain Monte-Carlo (MCMC) schemes to speed up con-
vergence. In the future we will concentrate on proving
the convergence in distribution of the IEKF, that can be
conjectured at this stage, but does not seem easy.
A
Proof of Proposition 2
For any i ∈N, let Mi(R) denote the set of square matri-
ces of size i. Consider the isomorphisms φX : X ∈RN →
IdN
0N,1
XT
1

∈MN+1(R) and φY : Y ∈Rp →

HT
Y T

∈
MN+1,p. Letting χn = φX(Xn), Y ∗
n = φY(Yn), V ∗
n = φY(Vn),
W ∗
n = φX(Wn), Ωn = φX(Bn) and ϒn = IdN+1 equations (7)
become χn+1 = ϒnW ∗
n χnΩn and Y ∗
n = χ−1
n V ∗
n . Using the
inverse isomorphisms φ −1
X
and φ −1
Y , the invariant update
ˆχn+1 = Kn( ˆχ′
n+1Y ∗
n+1) ˆχn+1 corresponds in Rn to (7).
B
Proofs of the results of Section 4.1
We ﬁrst introduce here the basic notions about Harris
Chains that we will need, to prove the stochastic conver-
gence properties of the invariant ﬁlters.
Deﬁnition 3. (First hitting time) Let S be a measurable
space and (Xn)n⩾0 a Markov chain taking values in S. The
ﬁrst hitting time τC of a subset C ⊂S is deﬁned by τC =
infn⩾0 {Xn ∈C}.
Deﬁnition 4. (Recurrent aperiodic Harris chain) Let S be
a measurable space and (Xn)n⩾0 a Markov chain taking
values in S. (Xn)n⩾0 is said to be a recurrent aperiodic
Harris chain if there exist two sets A,B ⊂S satisfying the
following properties:
i For any initial state X0 the ﬁrst hitting time of A is a.s.
ﬁnite.
ii There exists a probability measure ρ on B, and ε > 0
such that if x ∈A and D ⊂B then P(X1 ∈D|X0 = x) >
ερ(D).
iii There exists an integer N ⩾0 such that: ∀x ∈A,∀n ⩾
N,P(Xn ∈A|X0 = x) > 0.
The somewhat technical property ii means that any
given area of B can be reached from each point of A with
non-vanishing probability.
Theorem 7. [Harris, 1956] A recurrent aperiodic Harris
chain admits a unique stationary distribution ρ∞and the
density of the state Xn converges to ρ∞in T.V. norm for
any distribution of X0.
Proof. See [16], Theorems 6.5 and 6.8
In other words, if for any initialization we can ensure
that the process will come back with probability 1 to a
central area A where mixing occurs, we have a conver-
gence property. The following technical result is a cor-
nerstone to demonstrate the theorems of Subsection 4.1.
Lemma 2. Let G be a locally compact Lie group pro-
vided with a right-invariant distance d, and C ⊂G be a
measurable compact set endowed with the σ-algebra Σ.
Consider a homogeneous Markov chain (ηn)n⩾0 deﬁned
by the relation
η0 ∈C,
ηn+1 = qz(ηn)
where z ∈Z is some random variable belonging to a
measurable space. Let 0 denote a speciﬁc point of Z . Let
Q : (C,Σ) →R+, denote the transition kernel of the chain,
that is Q(x,V) = P(qz(ηn) ∈V | ηn = x). We assume that
(a) there exist real numbers α,ε > 0 such that for any x ∈
C and V ⊂Bo(q0(x),α), the latter denoting the open
ball center q0(x) and radius α, we have Q(x,V) =
P(qz(ηn) ∈V | ηn = x) ⩾εµG(V) where µG is the
right-invariant Haar measure.
(b) q0 admits a ﬁxed point x0 ∈C, i.e. q0(x0) = x0.
Let U0 = {x0}. Deﬁne the sets (Un)n⩾0 recursively by
U′
n = {x ∈G,d(x,Un) < α
2 } ∩C and Un+1 = q−1
0 (U′
n). If
there exists an integer N ⩾0 such that UN = C, then the
p.d.f. of ηn converges for the T.V. norm and its limit does
not depend on the initialization.
More prosaically, z denotes the cumulated effect of the
model and observation noise, and it is assumed that 1- the
noiseless algorithm has some global convergence proper-
ties and 2- there are sufﬁciently many small enough noises
so that there is an ε chance for ηn to jump from x to any
neighborhood of the noiseless iterate q0(x) with a multi-
plicative factor corresponding to the neighborhood’s size.
It then deﬁnes a sequence of sets, by picking a ﬁxed point
of q0, dilating it, and taking its pre-image. Dilatation and
inversion are then reiterated until the whole set C is cov-
ered. In this case, forgetting of the initial distribution is
ensured.
Proof. We will demonstrate the property trough three in-
termediate results:
1. There exists a sequence ε1,··· ,εN such that: P(η1 ∈
Un|η0 ∈Un+1) > εn > 0.
2. The ﬁrst hitting time of U1 is a.s. ﬁnite.
3. (ηn)n≥0 is an aperiodic recurrent Harris chain with
A = U1 and B = U′
0.
The conclusion will then immediately follow from The-
orem 7.
We give ﬁrst a qualitative explanation of the
approach.
We have by assumption a sequence of sets
(Un)n⩾0. Intermediate result 1) states that starting from
a set Un+1 the chain has a non-vanishing chance to jump
to the “smaller” set Un at the next time step. Interme-
diate result 2) states that starting from Un+1, because of
the non-vanishing property 1), the chain cannot avoid Un
forever and eventually reaches Un, Un−1 and ﬁnally U1.
Intermediate result 3) brings out the fact that, A = U′
0 be-
ing totally included in the ball of radius α centered on
any of its points, the chain can go from any given point
of its pre-image B = U1 to any area of A with controlled
probability, which is the last property needed to obtain a
recurrent aperiodic Harris chain.
1. Consider the closure U′n of U′
n. For any x ∈U′n the set
Bo(x,α)∩Un is open and non-empty (as d(x,Un) ⩽
α
2 by deﬁnition of U′n) so µG(Bo(x,α) ∩Un) > 0.
The function f : x →µG(Bo(x,α)∩Un) > 0 is con-
tinuous (since µG is a regular and positive mea-
sure) on the compact set U′n, so it admits a mini-
mum mn > 0.
We get: P(η1 ∈Un|η0 ∈Un+1) ≥
P(η1 ∈Bo(q0(η0),α) ∩Un|η0 ∈Un+1) = P(η1 ∈
Bo(q0(η0),α) ∩Un|q0(η0) ∈U′
n) ≥εmn using the
fact that, by assumption (a), Q(x,V) ≥εµG(V) for
x ∈C and V ⊂Bo(q0(x),α). We set εn = εmn to
prove intermediate result 1).
2. We will prove by descending induction on n the
property Pn : P(τn < ∞) = 1.
PN is obvious as
UN =C. Now assume Pn+1 is true for n ⩾1. Due to
homogeneity, we can construct a strictly increasing
sequence of stopping times (νp)p⩾0 such that: ∀p ⩾
0,ηνp ∈Un+1. But P(ηνp+1 ∈Un|ηνp ∈Un+1) > εn
by intermediate result 1). Thus letting Tk denote the
event {∀i ⩽k,ηi /∈Un} = {k < τn}, we have for any
n ⩾0: P(Tνp+1) = P(Tνp)P(Tνp+1|Tνp) ⩽P(Tνp)(1−
εn). Thus P(Tνp+1) ⩽P(Tνp+1) ⩽P(Tνp)(1−εn). A
quick induction and a standard application of Borel-
Cantelli Lemma prove there exists a.s. a rank p such
that Tνp is false. In other words, Pn is true. In par-
ticular, P1 is true: the ﬁrst hitting time of U1 is a.s.
ﬁnite.
3. Deﬁne A =U1 and B =U′
0 = Bo(x0, α
2 )∩C . For any
V ⊂B and x ⊂A we have V ⊂Bo(q0(x),α), thus
Q(x,V) ⩾εµG(V) by assumption (a). Thus (ηn)n⩾0
veriﬁes property ii) of Deﬁnition 4. As A = U1 the
intermediate result 2) shows that (ηn)n⩾0 veriﬁes the
property i) of Deﬁnition 4. As there exists a.s. a
rank k such that ηk ∈U1 by intermediate result 2),
there exists an integer M such that P(ηM ∈U1) > 0.
As U1 ∩U′
0 ⊂Bo(q(x),α) for any x ∈U1 (by deﬁni-
tion of U′
0 and U1) Assumption (a) gives immediately
P(ηM+1 ∈U1∩U′
0) > P(ηM ∈U1)εµG(U1∩U′
0). Us-
ing only the deﬁnition of U′
0 and U1, and Assump-
tion (b), we see easily that both contain x0. Thus the
set U′
0 ∩U1 is non-empty and open. We have then
µG(U1 ∩U′
0) > 0 and P(ηM+1 ∈U1 ∩U′
0) > 0. By
an obvious induction on m, P(ηm ∈U1 ∩U′
0) > 0 for
any m > M. As U1 ∩U′
0 ⊂A we get P(ηm ∈A) > 0
for any m > M and the property iii) of Deﬁnition 4
is also veriﬁed. We ﬁnally obtain that (ηn)n⩾0 is an
aperiodic recurrent Harris chain.
Using Theorem 7 we can conclude that the density of ηn
converges in T.V. norm to its unique equilibrium distribu-
tion for any initialization.
Building upon the previous results, the proof of Theo-
rem 3 is as follows. The Markov chain deﬁned by equa-
tions (13) and (14) with ﬁxed ϒ can be written under the
form η′
n+1 = qz(η′
n) = q1,z ◦q2,z(η′
n) with z = (Wn,Vn) ∈
G×Rp, q1,z(x) = ϒWnxϒ−1 and q2,z(x) = xK(h(x,Vn))−1.
Let 0 = (Id,0). We will show that (η′
n)n⩾0 has all the
properties required in Lemma 2.
Let α,ε,ε′ be deﬁned as in 4.1.
Let α′ such as
Bo(Id,α′) ⊂ϒBo(Id, α
2 )ϒ−1 and ε′′ = ε′|Adϒ−1|ε, where
|Adϒ−1| is the determinant of the adjoint operator on g.
For any x ∈C and V ⊂Bo(q0(x),α′) = Bo(Id,α′)q0(x)
we have:
P(qz(x) ∈V) ⩾P(qz(x) ∈V,q2,z(x) ∈Bo(q2,0(x), α
2 ))
⩾P(q2,z(x) ∈Bo(q2,0(x), α
2 ))×
P(ϒWnq2,z(x)ϒ−1 ∈V|q2,z(x) ∈Bo(q2,0(x), α
2 ))
i.e.
P(qz(x)
∈
V)
⩾
ε′P(Wn
∈
ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α
2 )).
As-
suming
q2,z(x)
∈
Bo(q2,0(x), α
2 )
we
have:
ϒ−1Vϒq2,z(x)−1
⊂
ϒ−1Bo(Id,α′)q0(x)ϒq2,z(x)−1
⊂
Bo(Id, α
2 )ϒ−1q0(x)ϒq2,z(x)−1
⊂
Bo(Id, α
2 )q2,0(x)q2,z(x)−1 ⊂Bo(Id, α
2 + α
2 ).
As Wn
is independent from the other variables we obtain:
P(Wn ∈ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α
2 )) ⩾
εµG(ϒ−1Vϒq2,z(x)−1) = εµG(ϒ−1Vϒ) = εµG(V)|Adϒ−1|
And ﬁnally:
P(qz(x) ∈V) ⩾ε′|Adϒ−1|εµ(V) = ε′′µ(V)
As q0 has Id as a ﬁx point we only have to ver-
ify that the sets Un as deﬁned in Lemma 2 eventually
cover the whole set C.
It sufﬁces to consider for any
n ⩾0 the set Dn = {x ∈C,∀k ⩾n,qk
0(x) ∈U′
0}.
As
we have qn
0(x) →Id almost-everywhere on C we get:
µG(∪n⩾0Dn) = µG(C). As the sequence of sets (Dn)n⩾0
increases we have: µG(Dn) −→
n→∞µG(C).
We introduce
here the quantity vmin = minx∈C µG(Bo(x, α′
2 ) ∩C) (the
property C = cl(Co) and the regularity of µG ensure that
we have vmin > 0). Let N ∈N be such that µG(DN) >
µG(C)−vmin. We have then: ∀y ∈C,d(y,DN) < α′
2 (oth-
erwise we would have µG(DN) ⩽µG(C) −vmin).
As
we have DN ⊂UN we obtain ∀y ∈C,d(y,DN) < α′
2 thus
U′
N = C and UN+1 = q−1
0 (U′
N) = q−1
0 (C) = C. So all the
conditions of Lemma 2 hold and we can conclude con-
vergence in T.V. norm to a stationary distribution which
doesn’t depend on the prior, provided that its support lies
in C. Theorem 3 is proved.
Corollary 1 follows directly from the case C = G.
Theorem 4 can be proved as follows. Let C = G. Let
K = {x ∈G,h(x,0) = h(Id,0)}. Note that the left-right
equivariance assumption ensures that K is a subgroup of
G.
First we show that there exists an integer N1 such
that K ⊂UN1. As we have ∀x ∈K q0(x) = ϒxϒ−1 the se-
quence of sets Qn = ϒ−nUnϒn∩K is growing and we have:
{x ∈K, d(x,Qn) < α
2 } ⊂Qn+1.The set Q∞= ∪∞
n=1Qn is
open in K as an union of open sets, but we have ∀x ∈
Q∞,Bo(x, α
2 )∩K ⊂Q∞thus ∀x ∈K \Q∞,Bo(x, α
4 )∩K ⊂
K \ Q∞. This implies that Q∞and K \ Q∞are both open
in K. As K is connected (see assumptions of Theorem 4)
we obtain Q∞= K. As K is compact (as a closed sub-
set of a compact) the open cover ∪∞
i=1Qn has a ﬁnite sub-
cover and there exists an integer N1 such that QN1 = K,
i.e. ϒN1Kϒ−N1 = UN1. As K is a subgroup of G (see
above) containing ϒ we obtain K ⊂UN1. Now we have
to prove that the sets (Un)n⩾0 eventually cover the whole
set C. For any x ∈G we have h(qn
0(x),0) →h(Id,0). Thus
there exists a rank n such that ∀k ⩾n,qk
0(x) ∈U′
N (oth-
erwise we could extract a subsequence from (qn
0(x))n⩾0
which stays at a distance > α
2 from K, as G is com-
pact we could extract again a convergent subsequence and
its limit q∞
0 (x) would be outside K, thus we would have
h(qn
0(x)) ̸→h(Id,0)). Here we can deﬁne as in the proof
of Theorem 3 the sets Dn = {x ∈C,∀k ⩾n,qk
0(x) ∈U′
N1}
(note that U′
1 has been replaced by U′
N1 in this deﬁnition
of Dn). As for almost any x ∈G there exists a rank n such
that x ∈Dn we have µG(Dn) →µG(C) and as in the proof
of Theorem 3 the sets Un eventually cover the set C.
The second part of the property (invariance to left mul-
tiplication) is easier. Consider the error process ζn = ˜ϒηn.
The propagation and update steps read:
ζ ′
n+1 = ˜ϒη′
n+1 = ˜ϒϒWnηnϒ−1 = ϒ˜ϒWnηnϒ−1 L= ϒWn ˜ϒηnϒ−1
L= ϒWnζϒ−1
ζn+1 = ˜ϒη′
n+1Kn+1(h(η′
n+1,Vn+1))−1
L= ˜ϒη′
n+1Kn+1(h(˜ϒη′
n+1,Vn+1))−1
L= ζ ′
n+1Kn+1(h(ζ ′,Vn+1))−1
where we have used the property:
h(η′
n+1,Vn+1) L=
η′−1
n+1h(Id,Vn+1) L= η′−1
n+1h(˜ϒ,Vn+1) L= h(˜ϒη′
n+1,Vn+1). We
see that the law of the error process is invariant under left
multiplication by ˜ϒ, thus the asymptotic distribution π in-
herits this property.
C
Proofs of the results of Section 4.2
C.1
Proof of Proposition 4
We deﬁne the continuous process ˜γt : R →G as follows:
∀n ∈N, ˜γn = γn
∀n ∈N,∀t ∈[n,n+1[, d
dt ˜γt = −˜γtk( ˜γn)
We obtain immediately that ˜γt is continuous. Besides, for
any n ∈N and t ∈]n,n+1[ one has:
| d
dt ⟨k( ˜γt),k( ˜γn)⟩| = |⟨d
dt k( ˜γt),k( ˜γn)⟩|
⩽| d
dt k( ˜γt)||k( ˜γn)|
⩽| d
dt ˜γt||k( ˜γn)| (due to |∂k
∂x| ⩽1 )
⩽|k( ˜γn)|2 (as d
dt ˜γt = −˜γtk( ˜γn) )
Thus:
⟨k( ˜γt),k( ˜γn)⟩⩾⟨k( ˜γn),k( ˜γn)⟩−(t −n)|k( ˜γn)|2 ⩾0
proving in the Lie algebra that ⟨k( ˜γt),k( ˜γn)⟩⩾0.
As
u
is
a
positive
function
we
have
thus
u( ˜γt)−1⟨˜γtk( ˜γt),−˜γtk( ˜γn)⟩⩽0
immediately
proving
⟨gradE( ˜γt), d
dt ˜γt⟩⩽0.
The latter result being true for
every t ∈R\N and the function E( ˜γt) being continuous,
it decreases and thus converges on R≥0.
As we have
supposed the sublevel sets of E are bounded (and closed
as E is continuous), ˜γt is stuck in a compact set.
Let
γ∞an adherence value of ˜γn.
By continuity of E we
have E( ˜γ∞) = E( ˜γ∞exp(−k( ˜γ∞)).
Thus the function
t →E( ˜γ∞exp(−tk( ˜γ∞))) is decreasing on [0,1] and has
the same value at 0 and 1. Thus it is constant, its deriva-
tive at 0 is null proving γ∞= Id. ( ˜γn)n⩾0 being conﬁned
in a compact set and having Id as unique adherence value
we ﬁnally get γn = ˜γn →Id.
C.2
Proof of Proposition 3
Let γn = Rn ˆRT
n be the invariant error. Its equation reduces
to:
γn+1 = γn.exp(−k1(γT
n b1)×b1 −k2(γT
n b2)×b2)
Let k(γ) = k1(γTb1) × b1 + k2(γTb2) × b2 and E : γ →
k1||γTb1 −b1||2 + k2||γTb2 −b2||2.
To apply Proposi-
tion 4 we will ﬁrst verify that 1) ∀γ ∈SO(3),γ.k(γ) =
gradE(γ) 2) | ∂k
∂γ | ⩽1.
To prove 1) consider the
dynamics in ∈SO(3) deﬁned by
d
dt γt = γt(ψt)× for
some rotation vector path ψt.
We have
d
dt E(γt) =
k1(γT
t b1 −b1)T d
dt γT
t b1 + k2(γT
t b2 −b2)T d
dt γT
t b2.
Using
triple product equalities this is equal to ⟨k1(γT
t b1)×b1 +
k2(γT
t b2)×b2,ψt⟩= ⟨k(γt),ψt⟩= ⟨γt.k(γt), d
dt γt⟩.
Thus
γ.k(γ) = gradE(γ). To prove 2) we analogously see that
d
dt k(γt) = −[k1(b1)×(γT
t b1)× +k2(b2)×(γT
t b2)×]ψt. Thus
| d
dt k(γt)| ⩽(k1 +k2)|ψt| and ﬁnally | ∂k
∂γ | ⩽1. Now, except
if initially γ0 is the rotation of axis b1×b2 and angle π, the
function E is strictly decreasing, and Id is the only point
in the sublevel set {γ ∈SO(3) | E(γ) ≤E(γ0)} such that
gradE(γ) = 0. Applying Proposition 4 allows to prove
Proposition 3.
References
[1] S. Bonnabel A. Barrau. Intrinsic ﬁltering on SO(3)
with discrete-time observations.
In IEEE Confer-
ence on Decision and Control, 2013.
[2] M. Barczyk and A. F. Lynch. Invariant observer de-
sign for a helicopter uav aided inertial navigation
system. Control Systems Technology, IEEE Trans-
actions on, 21(3):791–806, 2013.
[3] S. Bonnabel. Left-invariant extended Kalman ﬁlter
and attitude estimation. In Decision and Control,
2007 46th IEEE Conference on, pages 1027–1032.
IEEE, 2007.
[4] S.
Bonnabel,
Ph.
Martin,
and
P.
Rouchon.
Symmetry-preserving observers.
Automatic Con-
trol, IEEE Transactions on, 53(11):2514–2526,
2008.
[5] S.
Bonnabel,
Ph.
Martin,
and
P.
Rouchon.
Non-linear symmetry-preserving observers on Lie
groups.
IEEE Trans. on Automatic Control,
54(7):1709 – 1713, 2009.
[6] S. Bonnabel, Ph. Martin, and E. Salaun. Invariant
extended Kalman ﬁlter: theory and application to a
velocity-aided attitude estimation problem. In De-
cision and Control, 2009 held jointly with the 2009
28th Chinese Control Conference. CDC/CCC 2009.
Proceedings of the 48th IEEE Conference on, pages
1297–1304. IEEE, 2009.
[7] G.
Bourmaud,
R.
M´egret,
A.
Giremus,
Y. Berthoumieu, et al. Discrete extended Kalman
ﬁlter on Lie groups.
In 21st European Signal
Processing Conference 2013 (EUSIPCO 2013),
2013.
[8] S. Br´as, P. Rosa, C. Silvestre, and P. Oliveira. Global
attitude and gyro bias estimation based on set-valued
observers. Systems & Control Letters, 62(10):937–
942, 2013.
[9] R. W. Brockett. Lie algebras and Lie groups in con-
trol theory. In Geometric methods in system theory,
pages 43–82. Springer, 1973.
[10] D.L. Burkholder, ´E. Pardoux, A.S. Sznitman, and
P.L. Hennequin.
´Ecole d’´et´e de probabilit´es de
Saint-Flour XIX, 1989. Springer-Verlag, 1991.
[11] J. L. Crassidis and F Landis Markley. Unscented
ﬁltering for spacecraft attitude estimation. Journal
of guidance, control, and dynamics, 26(4):536–542,
2003.
[12] J. L. Crassidis, F. Landis Markley, and Yang Cheng.
A survey of nonlinear attitude estimation meth-
ods. Journal of Guidance, Control,and Dynamics,
30(1):12–28, 2007.
[13] T.E. Duncan.
Some ﬁltering results in Riemann
manifolds.
Information and Control, 35(3):182–
195, 1977.
[14] T.E. Duncan.
Stochastic systems in Riemannian
manifolds. Journal of Optimization theory and ap-
plications, 27(3):399–426, 1979.
[15] T.E. Duncan. An estimation problem in compact Lie
groups. Systems & Control Letters, 10(4):257–263,
1988.
[16] R. Durrett. Probability: theory and examples, vol-
ume 3. Cambridge university press, 2010.
[17] Y. Han and F.C. Park. Least squares tracking on the
Euclidean group. Automatic Control, IEEE Trans-
actions on, 46(7):1127–1132, 2001.
[18] K. Itˆo. Stochastic differential equations in a differ-
entiable manifold. Nagoya Mathematical Journal,
1:35–47, 1950.
[19] S.J. Julier and J.K. Uhlmann.
A new exten-
sion of the Kalman Filter to Nonlinear Systems.
Proc. of AeroSense :
The 11th Int. Symp. on
Aerospace/Defence Sensing, Simulation and Con-
trols, 1997.
[20] R. Kalman and R. Bucy. New results in linear ﬁlter-
ing and prediction theory. Basic Eng., Trans. ASME,
Ser. D,, 83(3):95–108, 1961.
[21] Ch. Lageman, J. Trumpf, and R. Mahony. Gradient-
like observers for invariant dynamics on a Lie
group. Automatic Control, IEEE Transactions on,
55(2):367–377, 2010.
[22] E. J. Lefferts, F Landis Markley, and M. D. Shus-
ter. Kalman ﬁltering for spacecraft attitude estima-
tion. Journal of Guidance, Control, and Dynamics,
5(5):417–429, 1982.
[23] M. Liao. L´evy processes in Lie groups. Cambridge
University Press, 2004.
[24] JT-H Lo. Optimal estimation for the satellite atti-
tude using star tracker measurements. Automatica,
22(4):477–482, 1986.
[25] J. Trumpf M. Zamani and R. Mahony. Near-optimal
deterministic ﬁltering on the rotation group. IEEE
Trans. on Automatic Control, 56:6:1411 – 1414,
2011.
[26] R. Mahony, T. Hamel, and J-M Pﬂimlin.
Non-
linear complementary ﬁlters on the special orthog-
onal group.
IEEE-Trans. on Automatic Control,
53(5):1203–1218, 2008.
[27] F Landis Markley, J. L. Crassidis, and Y. Cheng.
Nonlinear attitude ﬁltering methods. In AIAA Guid-
ance, Navigation, and Control Conference, 2005.
[28] Ph. Martin and E. Sala¨un. Invariant observers for
attitude and heading estimation from low-cost iner-
tial and magnetic sensors. In 46th IEEE Conf Decis
Contr, pages 1039–1045, 2007.
[29] S.K. Ng and P.E. Caines. Nonlinear ﬁltering in Rie-
mannian manifolds. IMA journal of mathematical
control and information, 2(1):25–36, 1985.
[30] H. Nijmeijer and T. I. Fossen. New directions in non-
linear observer design, volume 244. Springer, 1999.
[31] P. J. Olver. Classical Invariant Theory. Cambridge
University Press, 1999.
[32] W. Park, Y. Liu, Y. Zhou, M. Moses, G. S.
Chirikjian, et al. Kinematic state estimation and mo-
tion planning for stochastic nonholonomic systems
using the exponential map.
Robotica, 26(4):419–
434, 2008.
[33] M. Pontier and J. Szpirglas. Filtering on manifolds.
In Stochastic Modelling and Filtering, pages 147–
160. Springer, 1987.
[34] A. Saccon, J. Trumpf, R. Mahony, and A. P. Aguiar.
Second-order-optimal ﬁlters on lie groups.
[35] S. Said. Estimation and ﬁltering of processes in ma-
trix Lie groups. PhD thesis, Institut National Poly-
technique de Grenoble-INPG, 2009.
[36] S. Said and J. H. Manton. On ﬁltering with observa-
tion in a manifold: Reduction to a classical ﬁltering
problem. SIAM Journal on Control and Optimiza-
tion, 51(1):767–783, 2013.
[37] S. Salcudean.
A globally convergent angular ve-
locity observer for rigid body motion.
Automatic
Control, IEEE Transactions on, 36(12):1493–1497,
1991.
[38] A. K. Sanyal, T. Lee, M. Leok, and N. H. Mc-
Clamroch. Global optimal attitude estimation using
uncertainty ellipsoids. Systems & Control Letters,
57(3):236–245, 2008.
[39] A.
Tayebi,
S.
McGilvray,
A.
Roberts,
and
M. Moallem. Attitude estimation and stabilization
of a rigid body using low-cost sensors. In Decision
and Control, 2007 46th IEEE Conference on, pages
6424–6429. IEEE, 2007.
[40] J. Thienel and Robert M Sanner. A coupled non-
linear spacecraft attitude controller and observer
with an unknown constant gyro bias and gyro
noise.
Automatic Control, IEEE Transactions on,
48(11):2011–2015, 2003.
[41] R. Tron, B. Afsari, and R. Vidal. Riemannian con-
sensus for manifolds with bounded curvature. 2012.
[42] J.F. Vasconcelos, R. Cunha, C. Silvestre, and
P. Oliveira. A nonlinear position and attitude ob-
server on SE(3) using landmark measurements. Sys-
tems Control Letters, 59:155–166, 2010.
[43] A. S. Willsky.
Some estimation problems on Lie
groups. In D.Q. Mayne and R.W. Brockett, editors,
Geometric Methods in System Theory, volume 3 of
NATO Advanced Study Institutes Series, pages 305–
314. Springer Netherlands, 1973.
[44] A. S. Willsky and S. I. Marcus. Estimation for bilin-
ear stochastic systems. In Variable Structure Sys-
tems with Application to Economics and Biology,
pages 116–137. Springer, 1975.
[45] A.S. Willsky.
Dynamical Systems Deﬁned on
Groups: Structural Properties and Estimation. PhD
thesis, MIT Dept. of Aeronautics and Astronautics,
May 1973.
[46] K. C. Wolfe, M. Mashner, and G. S. Chirikjian.
Bayesian fusion on Lie groups. Journal of Algebraic
Statistics, 2(1):75–97, 2011.
[47] E. Wong and M. Zakai. Riemann-Stieltjes approxi-
mations of stochastic integrals. Probability Theory
and Related Fields, 12(2):87–97, 1969.
Table 1: Artiﬁcial horizon: experiment parameters
Standard Deviation of the model noise
0.01 degree = 1.75×10−4 rad
Standard Deviation of the regular observations
0.1 degree = 1.75×10−3 rad
Standard Deviation of the outliers
30 degrees
Probability of the outliers to occur
0.01
Table 2: Useful formulas for some classical matrix Lie groups
G
SO(3) (Rdimg = R3)
(Rdimg = R6)
Embedding of G
R ∈M3(R),RTR = I3
 
R
v
0
1
!
,R ∈SO(3),u ∈R3
Embedding of g
A3 : ψ ∈M3(R),ψT = −ψ
 
ψ
u
0
0
!
,ψ ∈A3,u ∈R3
(.)m
ξ →(ξ)×
 
ξ
u
!
→
 
(ξ)×
u
0
0
!
x →Adx
R →R
 
R
T
!
→
 
R
0
(T)×R
R
!
ξ →adξ
ξ →(ξ)×
 
ξ
u
!
→
 
(ξ)×
0
(u)×
(ξ)×
!
exp
exp(x) = I3 + sin(||x||)
||x||
(x)×
exp(x) = I4 +(x)m + [1−cos(||x||)]
||x||2
(x)2
×
+
1
||x||2 (1−cos|x|)(x)2
m
+
1
||x||3 (||x||−sin||x||)(x)3
m
Table 3: Observation of two vectors: parameters
Parameter
b1
b2
QV1
QV2
Value
(1,0,0)
(0,1,0)
0,08732I3
0,08732I3
Parameter
P0
Qw
N
Simulations
Value
0.52362I3
0,017452I3
50
1000