Intrinsic filtering on Lie groups with applications to attitude estimation Axel Barrau, Silv`ere Bonnabel∗ September 15, 2014 Abstract This paper proposes a probabilistic approach to the prob- lem of intrinsic filtering of a system on a matrix Lie group with invariance properties. The problem of an invariant continuous-time model with discrete-time measurements is cast into a rigorous stochastic and geometric frame- work. Building upon the theory of continuous-time in- variant observers, we introduce a class of simple filters and study their properties (without addressing the opti- mal filtering problem). We show that, akin to the Kalman filter for linear systems, the error equation is a Markov chain that does not depend on the state estimate. Thus, when the filter’s gains are held fixed, the noisy error’s dis- tribution is proved to converge to a stationary distribution, under some convergence properties of the filter with noise turned off. We also introduce two novel tools of engineer- ing interest: the discrete-time Invariant Extended Kalman Filter, for which the trusted covariance matrix is shown to converge, and the Invariant Ensemble Kalman Filter. The methods are applied to attitude estimation, allowing to derive novel theoretical results in this field, and illus- trated through simulations on synthetic data. 1 Introduction The problem of probabilistic filtering aims at computing the conditional expectation ˆx(t) of the state x(t) of a sys- tem driven by process and observation noises conditioned ∗A. Barrau and S. Bonnabel are with MINES ParisTech, PSL Research University, Centre for robotics, 60 Bd St Michel 75006 Paris, France. [axel.barrau,silvere.bonnabel]@mines-paristech.fr on the past inputs and outputs. Filtering is a branch of stochastic process theory that has been strongly driven by applications in control and signal processing. In the particular case where the system is linear and corrupted by independent white Gaussian process and observation noises, the celebrated Kalman filter [20] solves the filter- ing problem. Yet, when the system is non-linear there is no general method to derive efficient filters, and their design has encountered important difficulties in practice. Indeed, general filtering equations describing the evolu- tion of the state conditioned on past outputs can also be derived, see e.g., [10] and references therein. However, the equations are not closed, and in most cases the con- ditional densities can not be computed by solving a finite dimensional equation. In engineering applications the most popular solution is the extended Kalman filter (EKF), which does not pos- sess general theoretical properties, especially in a stochas- tic context, and is sometimes prone to divergence. The EKF amounts to linearize the system around the esti- mated trajectory, and build a Kalman filter for the linear model, which can in turn be implemented on the non- linear model. Despite the fact that it is subject to sev- eral problems both theoretical and practical, one merit of the EKF is to convey an estimation of the mean state ˆx(t) together with an approximation of the covariance matrix P(t) of the error, yielding a (trusted) ellipsoidal credible set that reflects the level of uncertainty in the estimation. In order to capture more closely the distribution, several rather computationally extensive methods have recently attracted growing interest such as particle filters and un- scented Kalman filters (UKF) [19]. In this paper we consider a general filtering problem on Lie groups in a stochastic setting, and a class of simple arXiv:1310.2539v2 [cs.SY] 12 Sep 2014 recursive finite dimensional filters, having a similar struc- ture as the EKF, is introduced. Such recursively defined filters allow straightforward digital computer implemen- tation. Besides, as the Kalman filter, the corresponding algorithms will be readily implementable with very little understanding of the theory leading to their development, as illustrated by the concrete examples of sections 4.2, 4.3 and 5.3, which can be read independently of the rest of the paper. As concerns their theoretical properties, they are motivated by the three following desirable properties a simple recursive filter should ideally have: 1- forget- ting of the initial condition, 2- being able to capture the extent of uncertainty in the estimate, 3- being efficient, that is, minimizing the error’s dispersion within a given class. In the filtering problem, the estimation error is ini- tially distributed according to a prior distribution provided by the practitioner. The rationale for the first property is that, as the filter acquires information, the prior informa- tion should be given less (eventually no) importance by the filter. As the user generally possesses very little in- formation on the system’s state initially, the prior can be very far from the truth. Forgetting the initial condition allows the (subjective Bayesian) prior distribution to be as informative as noninformative without impacting the final estimates. Moreover, it corresponds to convergence properties of the filter, and thus deals with the ability to re- cover from catastrophic failures. In robotics, this leads to robustness to erroneous beliefs due for instance to percep- tual ambiguity. The second property deals with the cen- tral motivation behind stochastic approaches. Indeed, all deterministic observers, no matter how strong properties they may have, have a common fundamental weakness: in the presence of sensor and model uncertainties, they pro- vide no indication of the extent of uncertainty conveyed by the estimate. Being able to compute credible sets for the estimate may be needed in order to adapt control deci- sions to the situation, for instance to guarantee an absence of collision. The two first properties are general and ap- ply to any class of sensible filters. The third property ad- dresses the issue of gain tuning. When working with a class of (suboptimal) filters, the gains can be tuned over a training set in order to minimize the level of uncertainty conveyed by the estimate. Several approaches to filtering for systems possessing a geometric structure have been developed in previous lit- erature. For stochastic processes on Riemannian mani- folds [18, 14] some results have been derived, see e.g., [29]. The specific situation where the process evolves in a vector space but the observations belong to a manifold has also been considered, see e.g. [33, 13] and more re- cently [36]. For systems on Lie groups powerful tools to study the filtering equations (such as harmonic analysis [45, 32, 24, 43]) have been used, notably in the case of bi- linear systems [44] and estimation of the initial condition of a Brownian motion in [15]. An extension of Gaussian distributions has also been developed and recently used for Bayesian filtering on Lie groups in [46] and also in [7] which devises a general adaptation of the EKF to Lie groups, without exploiting the geometrical properties of the specific invariant case such as in the present paper. A somewhat different but related approach to filtering con- sists of finding the path that best fits the data in a deter- ministic setting. It is thus related to optimal control theory where geometric methods have long played an important role [9]. A certain class of least squares problems on the Euclidian group has been tackled in [17]. This approach has also been used to filter inertial data recently in [34], [26]. Although the paper is concerned with a general the- ory on (matrix) Lie groups, the several derived filters will systematically be applied to a benchmark and motivating filtering example on the special orthogonal group SO(3). The problem of attitude estimation consists of estimating the orientation of a rigid body, which proves to be a cen- tral task in aeronautics, in order to stabilize or guide the system, or in mobile robotics, where orientation is a part of the robot’s localization. When the system is equipped with gyroscopes, the delivered signals can be integrated over time to compute an orientation increment. However, the initial orientation may not be known, and even if it is, the gyroscopes are subject to noise and drift, so that pure integration may quickly result in a very erroneous atti- tude. Thus the estimations are regularly checked against some vectors’ orientation, such as the earth magnetic field measured by magnetometers, or the earth gravity vector measured by accelerometers under quasi-stationary flight assumptions. The explosion over the last decade of low- cost noisy sensors integrated in unmanned aerial vehicles (UAV’s), and the limited computational power on-board, have been a driving force for the development of simple filters on SO(3). Work on the deterministic problem dates back to [37, 30] and has received much attention recently, see [42, 27, 26, 4, 28, 38] to cite a few, and some related issues have been addressed as well, like angular velocity estimation [40] or attitude control [37, 39]. The beauty of the intrinsic problem formulation on SO(3) and the prop- erties stemming from it have been exploited in various papers in the deterministic case, see e.g. [26, 21, 5]. The problem has also been cast into a minimum energy fil- tering framework in [25]. In the stochastic framework, a general review of the classical techniques based on Gaus- sian assumptions is proposed in [27]. A first attempt to systematically exploit the invariance properties to de- sign stochastic filters on SO(3) appears in the preliminary work [1]. The contributions and the organization of the present paper are as follows. In Section 2, the problem of stochas- tic filtering on Lie groups with a continuous-time noisy model and discrete-time observations is rigorously posed. From a theoretical viewpoint, the choice of discrete-time measurements allows to circumvent some technical diffi- culties and to cast the problem into a perfectly rigorous framework. As this work is mainly guided by applica- tions to aerospace, this choice is relevant as the evolution equations of the orientation generally rely on the integra- tion of noisy inertial measurements that are delivered at a high rate, whereas the observations may be delivered at low(er) rate (GPS, vision). In order to derive proper filters, the problem is then transformed into a complete discrete-time model. The discretization is not straightfor- ward because unlike the linear case, difficulties arise from non-commutativity. Guided by the theory of symmetry-preserving ob- servers that has been exclusively concerned with the continuous-time and deterministic case so far, we propose in Section 3 a class of natural filters. They appear as a mere transposition of linear observers (such as the linear Kalman filter) to the group, but where the addition has been replaced with the group multiplication law (due to non-commutativity, there is always some freedom in the transposition as left and right multiplications are differ- ent operations). We prove the proposed filters retain the invariances of the problem, and that the discrete-time er- ror’s evolution is independent of the system’s trajectory, inheriting the properties of the deterministic continuous- time case [5] (or merely the linear case). Thus, the gains can be computed off-line, and the proposed filters require very few on-board computational resources. This is one of their main advantages over current filters for the atti- tude estimation problems considered in simulations. The proposed class is very broad and the tuning issue is far from trivial. Sections 4 and 5 explore two different routes, depending on if we are interested in the asymp- totic or transitory phase. In Section 4, we propose to hold the gains fixed over time. As a result, the error equation becomes a homogeneous Markov chain. We first prove some new results about global convergence in a determin- istic and discrete-time framework. Then, building upon the homogeneity property of the error, we prove that if the filter with noise turned off admits almost global con- vergence properties, the error with noise turned on con- verges to a stationary distribution. Mathematically this is a very strong and to some extent surprising result. From a practical viewpoint, the gains can be tuned numerically to minimize the asymptotic error’s dispersion. This allows to “learn” sensible gains for very general types of noises. The theory is applied to two examples and gives conver- gence guarantees in each case. First an attitude estimation problem using two vector measurements and a gyroscope having isotropic noise (Section 4.2), then the construc- tion of an artificial horizon with optimal gains, for a non- Gaussian noise model (Section 4.3). Each application is a contribution in itself and can be implemented without reading the whole paper. In Section 5, we propose to optimize the convergence during the transitory phase using Gaussian approxima- tions. We first introduce a method of engineering inter- est: the Invariant Extended Kalman Filter (IEKF) in dis- crete time. The IEKF was introduced in continuous time in [3] and developped and applied to localization prob- lems in [6] and very recently in [2]. A discrete time ver- sion on SO(3) was proposed in the preliminary paper [1]. The idea consists of linearizing the invariant error for the class of filters introduced in Section 3 around a fixed point (the identity element of the matrix group), build a Kalman filter for the linear model and implement it on the non- linear model. As the linearizations always occur around the same point, the linearized model is time-invariant and thus the Kalman gains, as well as the Kalman covariance matrix, are proved to converge to fixed values. When on- board storage and computational resources are very lim- ited, this advantageously allows to replace the gain with its asymptotic value. The IEKF is compared to the well- known MEKF [22, 12] and UKF [19, 11] on the attitude estimation problem, and simulations illustrate some con- vergence properties that the latter lack. In the case where the error equation is totally autonomous, we introduce a new method based on off-line simulations, the IEnKF, which outperforms the other filters in case of large noises by capturing very accurately the error’s dispersion. 2 Problem setting 2.1 Considered continuous-time model Consider a state variable χt taking values in a matrix Lie group G with neutral element Id, and the following continuous-time model with discrete measurements: d dt χt = (υt +wt)χt + χtωt (1) Yn = h(χtn,Vn) (2) where υt and ωt are inputs taking their values in the Lie algebra g (i.e. the tangent space to the identity element of G), wt is a continuous white Langevin noise with diffu- sion matrix Σt, whose precise definition will be discussed below in Subsection 2.3. (Yn)n⩾0 are the discrete-time ob- servations, belonging to some measurable space Y and Vn is a noise taking values in Rp for an integer p > 0. We further make the following additional assumption: Assumption 1 (left-right equivariance) The output map h is left-right equivariant, i.e. there exists a left action of G on Y such that we have the equality in law: ∀g,χ ∈G, h(χg,Vn) L= g−1h(χ,Vn) (3) and h(χg,0) = g−1h(χ,0) (4) The reader who is not familiar with stochastic calcu- lus and Lie groups can view χt as a rotation matrix, and replace the latter property with the following more restric- tive assumption: Assumption 1 bis For any n ⩾0 there exists a vector b such that h(χ,Vn)=χ−1(b+Vn). The assumptions could seem restrictive but are verified in practice in various cases as shown by the following ex- amples, which will provide the reader with a more con- crete picture than the formalism of Lie groups. 2.2 Examples 2.2.1 Attitude estimation on flat earth Our motivating example for the model (1)-(2) is the atti- tude estimation of a rigid body assuming the earth is flat, and observing two vectors: d dt Rt = Rt(ωt +wω t )×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), (5) where as usual Rt ∈SO(3) represents the rotation that maps the body frame to the earth-fixed frame, and where ωt ∈R3 is the instantaneous rotation vector, and wω t ∈R3 is a continuous Gaussian white noise representing the gy- roscopes’ noise. We have let (x)× ∈R3×3 denote the skew matrix associated with the cross product with a three di- mensional vector, i.e., for a,x ∈R3, we have (x)×a = x × a. (Yn)n≥0 is a sequence of discrete noisy measure- ments of two vectors b1,b2 of the earth fixed frame ver- ifying b1 × b2 ̸= 0, and V 1 n and V 2 n are sequences of in- dependent isotropic Gaussian white noises. Note that the noise wω t is defined, as the input ωt, in the body frame (in other words it is multiplied on the left). Thus equations (5) do not match (1)-(2) which correspond to a noise de- fined in the earth-fixed frame. This can be remedied in the particular case where the gyroscope noise is isotropic, a restrictive yet relevant assumption in practice. Definition 1. A Langevin noise wt of g is said isotropic if ˜wt := gwtg−1 L= wt for any g ∈G. Note that for a noise taking values in so(3) this defini- tion corresponds to the physical intuition of an isotropic noise of R3. With this additional assumption the equation can be rewritten: d dt Rt = ( ˜wω t )×Rt +Rt(ωt)×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), which corresponds to (1)-(2). Remark 1. If the gyrometer noise is not isotropic the new noise ˜wω t is related to Rt by ˜wω t = Rtwω t RT t . Depending on the degree of anisotropy this can prevent the use of methods based on the autonomy of the error (see 4) but not of the discrete-time IEKF (see 5.1). 2.2.2 Attitude estimation on a round rotating Earth Another interesting case is attitude estimation on SO(3) using an observation of the vertical direction (given by an accelerometer) and taking into account the rotation of the earth. Whereas the flat earth assumption perfectly suits low-cost gyroscopes, precise gyroscopes can measure the complete attitude by taking into account the earth’s ro- tation. We define a geographic frame with axis North- west-up. The attitude Rt is the transition matrix from the body reference to the geographic reference. The earth in- stantaneous rotation vector υ ∈R3 and the gravity vector g ∈R3 are expressed in the geographic reference. Both are constant in this frame. The gyroscope gives a con- tinuous rotation speed ωt, disturbed by an isotropic con- tinuous white noise wω t . The equation of the considered system reads: d dt Rt = (υ)×Rt +Rt(ωt +wω t )×, Yn = RT tn(g+Vn) As wω t is supposed isotropic the equation can be rewritten: d dt Rt = (υ + ˜wω t )×Rt +Rt(ωt)×, Yn = RT tn(g+Vn), which corresponds to (1)-(2). 2.2.3 Attitude and velocity estimation We give here an example on a larger group. Consider the attitude Rt ∈SO(3) and speed vt ∈R3 of an aircraft evolv- ing on a flat earth, equipped with a gyroscope and an ac- celerometer. The gyroscopes gives continuous increments ψt ∈R3 with isotropic noise wψ t ∈R3, and the accelerom- eters gives continuous increments at ∈R3 with isotropic noise wa t ∈R3. The aircraft has noisy speed measure- ments in the earth reference frame Yn = vtn +Vn, where Vn is a supposed to be a Gaussian isotropic white noise. The equations read: d dt Rt = Rt(ψt +wψ t )×, d dt vt = Rt(at +wa t )+g, where Rt is the rotation mapping the body frame to the earth-fixed frame, vt is the velocity expressed in the earth fixed frame, and g is the earth gravity vector, supposed to be (locally) constant. Using the matrix Lie group SE(3) we introduce: At =  RT t RT t vt 0 1  ; ωt =  0 g 0 0  υt =  −(ψt)× at 0 0  ; wt =  (wψ t )× wa t 0 0  The problem can then be rewritten under the form (1)-(2): d dt At = (υt +wt)At +Atωt, Ytn = A−1 tn  Vn 1  , 2.3 Interpretation of Langevin noises on Lie Groups This present subsection provides some mathematical con- siderations about white noises on Lie groups. It can be skipped by the uninterested reader who is directly referred to the model (6). Equation (1) is actually a Langevin equation that suf- fers from poly-interpretability because of its non-linearity, and its meaning must be clarified in the rigorous frame- work of stochastic calculus. Stratonovich stochastic dif- ferential equations on a Lie group can be intrinsically de- fined as in e.g. [23]. A somewhat simpler (but equiva- lent) approach consists of using the natural embedding of the matrix Lie group G in a matrix space and to under- stand the equation in the sense of Stratonovich, see e.g. [35]. The mathematical reasons stem from the fact that the resulting stochastic process is well-defined on the Lie group, whereas this is not the case when opting for an Ito interpretation as underlined by the following easily prov- able proposition. Proposition 1. If the stochastic differential equation (1) is taken in the sense of Itˆo, for G = SO(3) embedded in R3×3, an application of the Itˆo formula to χT t χt shows that the solution almost surely leaves the submanifold G for any time t > 0. Besides, the physical reasons for this stem from the fact that the sensors’ noise are never completely white, and for colored noise the Stratonovich interpretation provides a better approximation to the true solution than Ito’s, as advocated by the result of Wong and Zakai [47]. 2.4 Exact discretization of the considered model To treat rigorously the problem of integrating discrete measurements we need to discretize the continuous model with the same time step as the measurements’. Unlike the general case of non-linear estimation, the exact discrete- time dynamics corresponding to Equation (1) can be ob- tained, as proved by the following result: Theorem 1. Let χn = χtn. Then the discrete system (χn,Yn) satisfies the following equations: χn+1 = ϒnWnχnΩn, Yn = h(χn,Vn), (6) where Wn is a random variable with values in G and whose law depends on the values taken by υt for t ∈ [tn,tn+1] and on the law of wt for t ∈[tn,tn+1], and ϒn and Ωn are elements of G which only depend on the values taken respectively by υt and ωt for t ∈[tn,tn+1]. Proof. For n ∈N consider the value of the process χt is known until time tn. Let ϒt and Ωt be the solutions of the following equations: ϒtn = Id, d dt ϒt = υtϒt and Ωtn = Id, d dt Ωt = Ωtωt Let Wt be the solution of the following (Stratonovich) stochastic differential equation: Wtn = Id, d dtWt = ϒ−1 t wtϒtWt Note that ϒ−1 t wtϒt being in g, Wt is ensured to stay in G. Define the process χt|tn = ϒtWtχtnΩt. We will show that for t > tn the processes χt and χt|tn verify the same stochastic differential equation. Indeed: d dt χt|tn = ( d dt ϒt)WtχtΩt +ϒt( d dtWt)χtΩt +ϒtWtχt( d dt Ω) = υtϒtWtχtΩt +wtϒtWtχtΩt +ϒtWtχtΩtωt = (υt +wt)χt|tn + χt|tnωt Thus the two processes have the same law at time tn+1, i.e. χn+1 and Wtn+1ϒtn+1χtnΩtn+1 have the same law. Let- ting ϒn = ϒtn+1, Ωn = Ωtn+1 and Wn = Wtn+1 we obtain the result. Remark 2. In many practical situations (for instance ex- amples 2.2.1 and 2.2.2), the Langevin noise wt is isotropic and we have thus ϒ−1 t wtϒt L= wt. Note that, the variable Wt depends also only on the law of wt for t ∈[tn,tn+1]. In the sequel, for mathematical reasons (the equa- tions do not suffer from poly-interpretability), tutorial rea- sons (the framework of diffusion processes on Lie groups needs not be known), and practical reasons (any filter must be implemented in discrete time), we will system- atically consider the discrete-time model (6). Moreover, the noise Wn will be a general random variable in G, not necessary a solution of the stochastic differential equation d dtWt = ϒ−1 t wtϒtWt. 3 A class of discrete-time intrinsic filters 3.1 Preliminary: Kalman filter for linear stationary systems with drift Consider the particular case of a linear system in RN (with arbitrary N) of the form d dt Xt = bt +wt, wt being a white noise and bt a known deterministic input. Assume we have discrete measurements of the form Yn = HXtn +Vn ∈ Rp, Vn being a Gaussian noise in Rp. A straightforward discretization yields: Xn+1 = Xn +Bn +Wn, Yn = HXn +Vn, (7) With Bn = R tn+1 tn bsds and Wn = R tn+1 tn wsds is the value of a Brownian motion in RN. The optimal filter in this case is the celebrated linear Kalman filter whose equations are: ˆX′ n+1 = ˆXn +Bn, ˆXn+1 = ˆX′ n+1 +Kn+1(Yn+1 −H ˆX′ n+1) (8) The first equation is the so-called prediction step, and the second one the correction step. Let e′ n = ˆX′ n −Xn and en = ˆXn −Xn denote the predicted and corrected state es- timation errors, we get: e′ n+1 = en −Wn, en+1 = e′ n+1 −Kn+1(He′ n+1 −Vn+1) (9) The error equation is autonomous and can be optimized independently from the dynamical inputs Bn’s. As a con- sequence, the optimal gains Kn can be computed off-line. It is a well-known result that they (as well as the error distribution) converge under observability and detectabil- ity assumptions, so all computations can be done off-line and only the first values need be stored, in which case only the equations (8) must be computed on-line, requir- ing extremely restricted computational power on board. Mimicking the form of equations (8) on the Lie group G where the addition is naturally replaced with the group multiplication law, leads to a similar result, as shown in the next section. 3.2 Proposed intrinsic filters Inspired by the linear case and the theory of continuous- time symmetry-preserving observers on Lie groups [5] we propose to mimic the prediction and update steps (8) re- placing the addition with the group multiplication yield- ing the class of filters defined by: ˆχ′ n+1 = ϒn ˆχnΩn, (10) ˆχn+1 = Kn+1( ˆχ′n+1Yn+1) ˆχ′n+1, (11) where Kn+1(•) can be any function of Y →G, ensuring K(h(Id,0)) = Id . There are two ways to understand the links between (8) and (10)-(11). First, by defining a an estimation error on the group G by χn+1 ˆχ−1 n+1 (see below), and by using the left-right equivariance hypothesis, which allows to interpret ˆχ′n+1Yn+1 = h(χn+1 ˆχ′−1 n+1,Vn+1) as a group equivalent to H(X′ n+1 −ˆXn+1)+Vn+1. Secondly, by viewing the linear case of Section 3.1 as a specific case of the proposed approach through the following proposition whose proof has been moved to the Appendix. Proposition 2. There exists an isomorphic representation of RN as a matrix Lie group such that (7) takes the canoni- cal form (1)-(2), and (8) becomes the invariant filter (10)- (11). We now define the invariant output errors, which are a transposition of the linear error to our multiplicative group: ηn = χn ˆχ−1 n , η′ n = χn ˆχ′−1 n (12) We have the following striking property, that is similar to the linear case: Theorem 2. The error variables ηn and η′ n are Markov processes, and are independent of the inputs Ωn’s. Proof. The equations followed by η′ n and ηn read: η′ n+1 = χn+1 ˆχ′−1 n+1 = ϒnWnχnΩnΩ−1 n ˆχ−1 n ϒ−1 n = ϒnWnηnϒ−1 n (13) and: ηn+1 = χn+1 ˆχ′−1 n+1Kn+1( ˆχ′n+1Yn+1)−1 = η′ n+1Kn+1( ˆχ′n+1h(χn+1,Vn+1))−1 L= η′ n+1Kn+1(h(η′ n+1,Vn+1))−1, (14) thanks to the equivariance property of the output. The most important consequence of this property is that if the inputs υt are known in advance, or are fixed, as it is the case in Examples 2.2.1 and 2.2.2, the gain func- tions Kn can be optimized off-line, independently of the trajectory followed by the system. In any case, numerous choices are possible to tune the gains Kn and the remain- ing sections are all devoted to various types of methods to tackle this problem. 4 Fixed gains filters In certain cases, one can build an (almost) globally con- vergent observer for the associated deterministic system, i.e., with noise turned off, by using a family of constant gain function Kn(•) ≡K(•). If the filter with noise turned off has the desirable property of forgetting its initial con- dition, convergence to a single point is impossible to re- tain, because of the unpredictability of the noises that “steer” the system, but convergence of the distribution can be expected, assuming as in the linear case that ϒn is inde- pendent of n. Indeed in this section we prove that, when noise is turned on, the error forgets its initial distribution under mild conditions. The results are illustrated by an attitude estimation example for which we propose an in- trinsic filter having strong convergence properties. It should be noted that in practice, the convergence to an invariant distribution allows in turn to pick the most desirable gain K(•) among the family based on a per- formance criterion, such as convergence speed, or filter’s precision (that is, ensuring low error’s dispersion). This fact will be illustrated by an artificial horizon example of Subsection 4.3. 4.1 Convergence results Here the left-hand inputs ϒn are assumed fixed. This, to- gether with constant gains Kn, makes the error sequence ηn a homogeneous Markov chain. Thus, under appropri- ate technical conditions, the chain has a unique stationary distribution and the sequence converges in distribution to this invariant distribution. Let d denote a right-invariant distance on the group G. We propose the following as- sumptions: 1. Confinement of the error: there exists a compact set C such that ∀n ∈N,ηn ∈C a.s. for any η0 ∈C 2. Diffusivity: the process noise has a continuous part with respect to Haar measure, with density positive and uniformly bounded from zero in a ball of radius α > 0 around Id. 3. Reasonable output noise: ∀g ∈ G, P[gK(h(g,Vn))−1 ∈Bo(gK(h(g,0))−1, α 2 )] > ε′ for some ε′ > 0. The second assumption implies, and can in fact be re- placed with, the more general technical assumption that there exists ε > 0 such that for any subset U of the ball Bo(Id,α) we have P(Wn ⊂U) > εµ(U) for all n ≥0 where µ denotes the Haar measure. Those noise proper- ties are relatively painless to verify, whereas the confine- ment property although stronger is automatically verified whenever G is compact, e.g. G = SO(3). Intuitively, the last two assumptions guarantee the error process is well approximated by its dynamics with noise turned off, fol- lowed by a small diffusion. In the theory of Harris chains, the latter diffusion step is a key element to allow proba- bility laws to mix at each step and eventually forget their initial distribution. Theorem 3. For constant left-hand inputs ϒn ≡ϒ, con- sider the filter: ˆχ′ n+1 = ϒ ˆχnΩn (15) ˆχn+1 = K( ˆχ′n+1Yn+1) ˆχ′n+1 (16) Suppose that Assumptions 1)-2)-3) are verified, where the compact set satisfies C = cl(Co), cl denoting the closure and o the interior. When noises are turned off, the error equation (13)-(14) becomes: γ′ n+1 = ϒγnϒ−1, γn+1=γ′ n+1Kn+1(h(γ′ n+1,0))−1, (17) Suppose that for any γ0 ∈C, except on a set of null Haar measure, γn converges to Id. Then there exists a unique stationary distribution π on G such that for any prior law µ0 of the error η0 supported by C, the law (µn)n≥0 of (ηn)n≥0 satisfies the total variation (T.V.) norm conver- gence property: lim n ∥µn −π∥T.V. →0 Corollary 1. When the group G is compact, The conver- gence results of Theorem 3 hold globally, i.e. without the confinement assumption 1). Theorem 4. Under the assumptions of Theorem 3, as- suming only h(γn,0) →0 instead of almost global con- vergence of (γn)n≥0, that G is compact, the set K = {g ∈ G,h(g,0) = h(Id,0)} connected and h(ϒ,0) = h(Id,0), the results of Theorem 3 are still valid. Moreover, if Wn is isotropic, we have π(˜ϒV) = π(V) for any ˜ϒ ∈K commut- ing with ϒ (˜ϒϒ = ϒ˜ϒ). The proofs of the results above have all been moved to the Appendix. 4.2 Application to attitude estimation Consider the attitude estimation example of Subsection 2.2.1. In a deterministic and continuous time setting, al- most globally converging observers have been proposed in several papers (see e.g. [37, 26]) and have since been analyzed and extended in a number of papers. In order to apply the previously developed theory to this example, the challenge is twofold. First, the deterministic observer must be adapted to the discrete time and proved to be al- most globally convergent. Then, the corresponding filter must be proved to satisfy the assumptions of the theorems above in the presence of noise. In discrete time, the sys- tem equations read: Rn+1 =WnRnΩn, Yn =(RT n b1 +V 1 n ,RT n b2 +V 2 n ) with the notations introduced in 2.2.1. We propose the following filter on SO(3): ˆR′ n+1 = ˆRnΩn, ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1, with K(y1,y2) = exp(k1(y1 ×b1)+k2(y2 ×b2)), k1 > 0, k2 > 0, k1 +k2 ⩽1 (18) Proposition 3. With noise turned off, the discrete invari- ant observer (18) is almost globally convergent, that is, the error converges to Id for any initial condition except one. The proof has been moved to the Appendix, only the main idea is given here. For the continuous-time de- terministic problem it is known, and easily seen, that E : γ →k1||γTb1 −b1||2 +k2||γTb2 −b2||2 is a Lyapunov function, allowing to prove almost global convergence of the corresponding observer. In the discrete time determin- istic case, the function above remains a Lyapunov func- tion for the sequence (γn)n≥0, allowing to derive Propo- sition 3. This is not trivial to prove, and stems from the more general following novel result: Proposition 4. Consider a Lie group G equipped with a left-invariant metric ⟨.,.⟩, and a left-invariant determinis- tic discrete equation on G of the form: γn+1 = γn exp(−k(γn)) Assume there exists a C2 function E : G →R≥0 with bounded sublevel sets, a global minimum at Id, and a con- tinuous and strictly positive function u : G →R>0 such that: ∀x ∈G,k(x) = u(x)[x−1.gradE(x)]. If the condition ∀x ∈G,| ∂k ∂x| ⩽1 (for the operator norm) is verified, for any initial value γ0 such that Id is the only critical point of E in the sublevel set {x ∈G | E(x) ≤E(γ0)} we have: γn → n→∞Id The proof has been moved to the Appendix. Note that, the latter property is closely related to Lemma 2 of [41]. Using Theorem 3 and Proposition 3 we finally directly get: Theorem 5. The distribution of the error variable of the invariant filter (18) converges for the T.V. norm to an asymptotic distribution, which does not depend on its ini- tial distribution. 4.3 Learning robust gains: application to the design of an artificial horizon As proved in 4.1, under appropriate conditions the error variable is a converging Markov chain whose asymptotic law depends on the gain function but not on the trajectory followed by the system (which is a major difference with most nonlinear filters, such as the EKF). Hence a fixed gain can be asymptotically optimized off-line, leading to a very low numerical cost of the on-line update. A classical aerospace problem is the design of an artifi- cial horizon using an inertial measurement unit (IMU). An estimation of the vertical is maintained using the observa- tions of the accelerometer (which senses the body accel- eration minus the gravity vector, expressed in the body frame) and the stationary flight assumption according to which the body’s linear velocity is constant. The prob- lem is that this approximation is not valid in dynamical phases (take-off, landing, atmospheric turbulence), which are precisely when the artificial horizon is most needed. The problem is generally stated as follows: d dt Rt = Rt(ωt +wt)×, Yn = RT tng+Vn +Nn, where Rt is the attitude of the aircraft (the rotation from body-frame coordinates to inertial coordinates), ωt is the continuous-time gyroscope increment and Yn is the obser- vation of the accelerometer. The sensor noises wt and Vn can be considered as Gaussian, and Nn represents fluctu- ations due to accelerations of the aircraft that we propose to model as follows: Nn is null with high probability but when non-zero it can take large values. The Nn’s are as- sumed to be independent as usually. 4.3.1 Convergence results Consider the following class of filters: ˆR′ n+1 = ˆRnΩn, ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1, with K(y) = exp( fk,λ(y)) (19) where fk,λ(x) = k.min(angle(x,g),λ) x×g ||x×g|| if x × g ̸= 0 and fk,λ(x) = 0 otherwise. The rationale for the gain tuning is as follows: if the accelerometer measures a value y, we consider the small- est rotation giving to g the same direction as y. Con- serving the same axis, the angle of this rotation is thresh- olded (hence the parameter λ) to give less weight to out- liers (without purely rejecting them, otherwise the filter couldn’t converge when initialized too far). Then we choose as a gain function a rotation by a fraction of the obtained angle. We begin with the following preliminary result: Lemma 1. For any 0 < k ⩽1 and 0 < λ ⩽π the out- put error ∥Yn −ˆRT n g∥associated to the observer (19) with noise turned off converges to 0. Proof. Let us consider the error evolution when the noise is turned off. It writes γn+1 = γn exp(−fk,λ(γ−1 n g)), thus we have γ−1 n+1g = exp( fk,λ(γ−1 n g))γ−1 n g. As for any n ∈N, fk,λ(γ−1 n g) is orthogonal to g and γ−1 n g, γ−1 n+1g stays in the plane spanned by g and γ−1 0 g, as well as the whole sequence (γ−1 n g)n⩾0. Let φn = angle(γ−1 n g,g). The dy- namics of (φn) writes: φn+1 = φn −k.min(λ,φn). Thus φn goes to 0, i.e: γ−1 n g → n→∞g, i.e. the observation error goes to 0. The following result is a mere consequence of Lemma 1 and Theorem 4. Proposition 5. The error variable associated to the filter defined by (19) converges to a stationary distribution for the T.V. norm, which does not depend on its initial distri- bution. 4.3.2 Numerical asymptotic gain optimization To each couple (k,λ) we can associate an asymptotic er- ror dispersion (computed in the Lie Algebra) associated to the corresponding stationary distribution, and try to minimize it. As all computations are to be done off- line, the statistics of all distributions can be computed using particle methods. Table 1 gives the parameters of the model used in the following numerical experi- ment. Figure 1 displays the Root Mean Square Error RMSE = p E(η∞g−g), computed over a grid for the pa- rameters (k,λ). The minimum is obtained for k = 0.1202 and λ = 0.0029. If we compare it to a MEKF, we observe a huge difference. For our asymptotic invariant filter we get RMSE = 8.02 × 10−4. For the MEKF, the observa- tion noise matrix giving the best results leads to the value RMSE = 4.3 × 10−3. This result is not surprising due to the fact that the outliers significantly pollute the estimates of the Kalman filter (see Fig. 1). This illustrates the fact that when the noise is highly non-Gaussian, an asymptotic gain with some optimality properties can still be found. 5 Gaussian filters The present section focuses more on the transitory phase, and the gain is computed at each step based on Gaussian noise approximations and linearizations. Beyond the lo- cal optimization underlying the gain computation, the two following filters have the merit to readily convey an ap- proximation of the extent of uncertainty carried by the estimations. We first introduce the discrete-time invari- ant extended Kalman filter (IEKF), which is an EKF, but based on the linearization of an invariant state error. Then we introduce the IEnKF, which computes the empirical covariance matrix of the error using particles. As all com- putations can be done off-line, it is easily implementable, as long as the gains can be stored, and is shown to con- vey a better indication of the error’s dispersion for large noises. Simulations illustrate the results. 5.1 The discrete-time Invariant EKF In the present section we introduce the discrete-time IEKF. Indeed, as in the standard EKF theory, the idea is to linearize the error equation (13)-(14) assuming the noises Figure 1: Artificial horizon example. Top plot: off-line optimization of the parameters of the asymptotic invari- ant filter (axis x →λ ; axis y →k ; axis z →Root Mean Square Error). The highest values are not displayed to improve visualization. The performance is optimal for k = 0.1202,λ = 0.0029. Bottom plot: error evolution of three artificial horizons : the invariant filter with op- timized gain functions (black), a MEKF ignoring outliers (red), and a MEKF where the observation noise covari- ance matrix has been numerically adjusted to minimize the RMSE (dashed line). We see that the MEKF cannot filter the outliers and at the same time be efficient in the absence of outliers, contrarily to the proposed invariant filter. and the state error are small enough, use Kalman equa- tions to tune the gains on this linear system, and imple- ment the gain on the nonlinear model. As in the heuristic theory of the Invariant EKF in continuous time [3] [6] the gains and the error covariance matrix are proved to con- verge to a fixed value. In particular this allows (after some time) to advantageously replace the gain by its constant fi- nal value leading to a numerically very cheap asymptotic version of the IEKF. Algorithm 1 Invariant Extended Kalman Filter Returns at each time step ˆχn, Pn such that χn ≈ exp(ξn) ˆχn, where ξn is Gaussian and var(ξn) = Pn. ˆχ0 and P0 are inputs. Hξ,HV defined by: h(exp(ξ ′ n),Vn) = h(Id,0)+Hξξ ′ n+HVVn+O(∥ξ ′ n∥2)+O(∥Vn∥2) Set n = 0. loop Compute the value Mtn+1 solving the equation Mtn = 0, d dt Mt = Var(wt)+adυtMtadT υt The process noise covariance Qw n = Var(Adϒnwn) is equal to Mtn+1 if wn is isotropic or if υt ≡0, merely set Qw n = Var(wn)(tn+1 −tn)  , QV n+1 = Var(Vn+1), Pn+1|n = AdϒnPnAdT ϒn +Qw n , Sn+1 = HVQV n+1HT V +HξPn+1|nHT ξ , Ln+1 = Pn+1|nHT ξ S−1 n+1, Pn+1 = (I −Ln+1Hξ)Pn+1|n, ˆχ′ n+1 = ϒn ˆχnΩn, ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1 n ←n+1 end loop 5.1.1 Linearization of the equations and IEKF for- mulas We consider here the error equations (13)-(14). Assuming errors and noises are small, we introduce their projection in the tangent space at the identity Id (the so-called Lie algebra g of the group) using the matrix exponential map exp(.) (see e.g. [31]). As the matrix vector space g can be identified with Rdimg using the linear mapping (.)m (see Table 2) we can assume exp : Rdimg →G. This function is defined in Table 2 for usual Lie groups of mechanics. Its inverse function will be denoted by exp−1. We thus define the following quantities of Rdimg: ξn = exp−1(ηn), ξ ′ n = exp−1(η′ n), wn = exp−1(Wn) The simplest way to design the gain is to use a function which is linear in Rdimg and then mapped to G through the exponential: Kn : y →exp[Ln(y−h(Id,0))] Equations (13) and (14) mapped to Rdimg become: ξ ′ n+1 = exp−1(exp(Adϒnwn)exp(Adϒnξn)), ξn+1 = exp−1(exp(ξ ′ n+1)exp[−Ln(h(exp(ξ ′ n+1,Vn+1))−h(Id,0))]), where we have introduced the adjoint matrix Ad: Definition 2. The following property of the adjoint matrix Ad can be used as a definition: ∀g ∈G,∀u ∈Rdimg, exp(Adgu) = gexp(u)g−1 The differential of ξ →Adexp(ξ) at zero is denoted ξ → adξ. To tune the gains Ln through the Kalman theory we need to evaluate at each time step, using Pn = Var(ξn) ∈ Rdimg×dimg, the following quantities: Pn+1|n = Var(ξ ′ n+1), Sn+1 = Var(Yn+1) Our approach consists of using a coarse first-order de- velopment of the propagation and observation functions. To do so we use the Baker-Campbell-Hausdorff formula and keep only first order terms in ξ ′ n+1, wn and Vn+1, the crossed terms being also neglected: ξ ′ n+1 = Adϒnwn +Adϒnξn h(exp(ξ ′ n+1),Vn+1) = h(Id,0)+Hξξ ′ n+1 +HVVn+1 ξn+1 = ξ ′ n+1 −Ln+1(Hξξ ′ n+1 +HVVn+1) The computation of the gains is then straightforward using the Kalman theory in Rdimg, which is a vector space. It is described in Algorithm 1. 5.1.2 Convergence of the gains The main benefit of the filter is with respect to its conver- gence properties. Indeed, under very mild conditions, the covariance matrix Pn and the filter’s gain Kn are proved to converge to fixed values [20]. The practical consequences are at least twofold: 1-the error covariance converges to a fixed value and is thus much easier to interpret by the user than a matrix whose entries keep on changing (see Fig. 2) 2-due to computational limitations on-board, the covariance may be approximated by its asymptotic value, leading to an asymptotic version of the IEKF being nu- merically very cheap. Theorem 6. If the noise matrices Qw n = Var(wn), QV n = Var(Vn) and the left inputs ϒn are fixed, the pair (Adϒn,Hξ) is observable and Qw n has full rank, then Pn and Kn converge to fixed values P∞and K∞, as in the lin- ear time-invariant case. 5.2 The discrete-time Invariant EnKF When the error equation is independent of the inputs, the exact density of the error variable can be sampled off- line. This allows to radically improve the precision of the quantities involved in the Kalman gains computation of Section 5.1.1. We propose here the Invariant Ensemble Kalman filter (IEnKF) described in Algorithm 2. The idea is to compute recursively through Monte-Carlo simula- tions a sampling of the error density and to use it, instead of linearizations, to evaluate precisely the innovation and error covariance matrices used to compute the gains in the Kalman filter. The procedure is described in Algorithm 2. 5.3 Simulations We will display here the results of Algorithms 1 and 2 for the attitude estimation problem described in Section 2.2.1: d dt Rt = Rt(ωt +wω t )×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), where Rt ∈SO(3) represents the rotation that maps the body frame to the earth-fixed frame, and ωt is the instan- taneous angular rotation vector which is assumed to be measured by noisy gyroscopes. (Yn)n≥0 is a sequence of Algorithm 2 IEnKF Define H by h(exp(ξ),0) = h(Id,0)+Hξ +O(∥ξ∥2) Sample M particles (ηi 0)1⩽i⩽M following the prior error density for n = 0 to N −1 do for i = 1 to M do η′i n+1 = W i nϒηi nϒ−1, yi n+1 = h(η′i n+1,V i n+1), end for Pn+1|n = 1 M ∑M i=1 exp−1(η′i n+1)exp−1(η′i n+1)T Sn+1 = 1 M ∑M i=1 yi n+1yi n+1 T, Ln+1 = Pn+1|nHTS−1 n+1 for i = 1 to M do ηi n+1 = η′i n+1 exp(−Ln+1[yi n+1 −h(Id,0)]) end for Store the gain matrix Ln+1 end for The prior estimation ˆχ0 is an input. for n = 0 to N-1 do ˆχ′ n+1 = ϒ ˆχnΩn, ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1 end for discrete noisy measurements of the vectors b1 and b2 (for instance the gravity and the magnetic field), V 1 n and V 2 n are sequences of independent isotropic white noises. A sim- ulation has been performed using the parameters given in Table 3. In this case the IEKF equations are described by Algorithm 3. As the state-of-the-art methods for this par- ticular problem are the Multiplicative Extended Kalman Filter (MEKF, see e.g. [12]) and the USQUE filter [11] (a quaternion implementation of the Unscented Kalman Filter), the four methods are compared. The evolution of the gains is displayed on Fig. 2 and shows the interest of the invariant approach. The error variable is expressed in the Lie algebra and its projection to the first axis is given on Fig. 3 for each method (the projections on other axes being very similar, they are not displayed due to space limitations). All perform satisfactorily, but some differ- ences are worthy of note. First we can see that the MEKF and the IEKF yield comparable performances but only the gains of the IEKF converge. Moreover, the linearizations lead the MEKF and the IEKF to fail capturing accurately the error dispersion through a 3σ-envelope. To this re- spect, the USQUE performs better than the two latter fil- ters but still does not succeed in capturing the uncertainty very accurately. On the other hand, the envelope provided by the EnKF is very satisfying. This result is not surpris- ing: this envelope has been computed using a sampling of the true density of the error, there is thus no reason why it should not be a valid approximation as long as there are sufficiently many particles. This method is thus to be pre- ferred to the other ones when the user is willing and able to perform extensive simulations, and has the capacity to store the gains over the whole trajectory. Algorithm 3 Invariant Extended Kalman Filter on SO(3) Returns at each time step ˆRn, Pn such that Rn ≈ exp(ξn) ˆRn, where ξn is a centered Gaussian and var(ξn) = Pn. ˆR0 and P0 are inputs. Hξ =  (b1)× (b2)×  , HV = I6, QV =  Var(V 1) 0 0 Var(V 2)  , Qw = R tn+1 tn Var(wω s )ds. Set n = 0. loop Let Ωn be the solution at tn+1 of Ttn = I3, d dt Tt = (ωt)×. Pn+1|n = Pn +Qw, Sn+1 = HVQVHT V +HξPn+1|nHT ξ , Ln+1 = Pn+1|nHT ξ S−1 n+1, Pn+1 = (I −Ln+1Hξ)Pn+1|n, ˆR′ n+1 = ˆRnΩn ∆χ = Ln+1[  ˆR′n+1Y 1 n+1 ˆR′n+1Y 2 n+1  −  b1 b2  ], ˆRn+1 = exp(∆χ) ˆR′n+1 (see Table 2) n ←n+1 end loop 6 Conclusion This paper has introduced a proper stochastic framework for a class of filtering problems on Lie groups. A wide class of discrete-time filters with strong properties has Figure 2: Evolution over time of the coefficients of the gain matrix for MEKF, IEKF, USQUE and IEnKF. The coefficients of the MEKF and USQUE have an erratic evolution, whereas those of the IEKF are convergent, al- lowing to save much computation power using asymptotic values. Moreover, in this case, only 4 of the 18 coeffi- cients are not equal to zero for the IEKF, allowing sparse implementation. been proposed, along with several methods to tune the gains. When the gain is held fixed, the estimation error has the remarkable property to provably converge in dis- tribution. Another route is to focus more on the transi- tory phase, and to seek to minimize the dispersion of the estimation error at each step based on linearizations and Gaussian noise approximations. This has lead to an in- variant version of the celebrated EKF, having the property that the error covariance matrix converges asymptotically, as well as to an invariant version of the EnKF which has been shown to convey an accurate indication of the level of uncertainty carried by the estimate. The results of the paper open the door to a rich vari- ety of filters on Lie groups. Indeed, all the classical gain tuning approaches can still be considered, and a method based on the unscented transform for instance, complet- ing the work [11] seems an interesting idea. In a situation where many observations are available at each time step, an information filter could also be derived mimicking the IEKF. Besides, we believe that the cases where the gains of the IEKF are sparse should be given special attention, Figure 3: Evolution of the error variable, projected on the first axis, for 1000 simulations using MEKF, IEKF, USQUE and IEnKF. The MEKF and the IEKF give sim- ilar results. The USQUE is doing slightly better but is outperformed by the IEnKF. Only the latter captures the error dispersion satisfactorily, as the 3σ envelope contains 99 % of the simulated trajectories. by studying when this property occurs and if significant computation power and memory can be saved. The possi- bilities offered by off-line simulations deserve also to be deepened. The computation of exact confidence sets can be explored, establishing a link with the approach of set- valued observers proposed in [8] and [38]. Concerning the asymptotic method introduced here, the optimization has been performed quite naively as efficient density sim- ulation is beyond the scope of the paper, but there may certainly be some room for improvements using Markov Chain Monte-Carlo (MCMC) schemes to speed up con- vergence. In the future we will concentrate on proving the convergence in distribution of the IEKF, that can be conjectured at this stage, but does not seem easy. A Proof of Proposition 2 For any i ∈N, let Mi(R) denote the set of square matri- ces of size i. Consider the isomorphisms φX : X ∈RN → IdN 0N,1 XT 1  ∈MN+1(R) and φY : Y ∈Rp →  HT Y T  ∈ MN+1,p. Letting χn = φX(Xn), Y ∗ n = φY(Yn), V ∗ n = φY(Vn), W ∗ n = φX(Wn), Ωn = φX(Bn) and ϒn = IdN+1 equations (7) become χn+1 = ϒnW ∗ n χnΩn and Y ∗ n = χ−1 n V ∗ n . Using the inverse isomorphisms φ −1 X and φ −1 Y , the invariant update ˆχn+1 = Kn( ˆχ′ n+1Y ∗ n+1) ˆχn+1 corresponds in Rn to (7). B Proofs of the results of Section 4.1 We first introduce here the basic notions about Harris Chains that we will need, to prove the stochastic conver- gence properties of the invariant filters. Definition 3. (First hitting time) Let S be a measurable space and (Xn)n⩾0 a Markov chain taking values in S. The first hitting time τC of a subset C ⊂S is defined by τC = infn⩾0 {Xn ∈C}. Definition 4. (Recurrent aperiodic Harris chain) Let S be a measurable space and (Xn)n⩾0 a Markov chain taking values in S. (Xn)n⩾0 is said to be a recurrent aperiodic Harris chain if there exist two sets A,B ⊂S satisfying the following properties: i For any initial state X0 the first hitting time of A is a.s. finite. ii There exists a probability measure ρ on B, and ε > 0 such that if x ∈A and D ⊂B then P(X1 ∈D|X0 = x) > ερ(D). iii There exists an integer N ⩾0 such that: ∀x ∈A,∀n ⩾ N,P(Xn ∈A|X0 = x) > 0. The somewhat technical property ii means that any given area of B can be reached from each point of A with non-vanishing probability. Theorem 7. [Harris, 1956] A recurrent aperiodic Harris chain admits a unique stationary distribution ρ∞and the density of the state Xn converges to ρ∞in T.V. norm for any distribution of X0. Proof. See [16], Theorems 6.5 and 6.8 In other words, if for any initialization we can ensure that the process will come back with probability 1 to a central area A where mixing occurs, we have a conver- gence property. The following technical result is a cor- nerstone to demonstrate the theorems of Subsection 4.1. Lemma 2. Let G be a locally compact Lie group pro- vided with a right-invariant distance d, and C ⊂G be a measurable compact set endowed with the σ-algebra Σ. Consider a homogeneous Markov chain (ηn)n⩾0 defined by the relation η0 ∈C, ηn+1 = qz(ηn) where z ∈Z is some random variable belonging to a measurable space. Let 0 denote a specific point of Z . Let Q : (C,Σ) →R+, denote the transition kernel of the chain, that is Q(x,V) = P(qz(ηn) ∈V | ηn = x). We assume that (a) there exist real numbers α,ε > 0 such that for any x ∈ C and V ⊂Bo(q0(x),α), the latter denoting the open ball center q0(x) and radius α, we have Q(x,V) = P(qz(ηn) ∈V | ηn = x) ⩾εµG(V) where µG is the right-invariant Haar measure. (b) q0 admits a fixed point x0 ∈C, i.e. q0(x0) = x0. Let U0 = {x0}. Define the sets (Un)n⩾0 recursively by U′ n = {x ∈G,d(x,Un) < α 2 } ∩C and Un+1 = q−1 0 (U′ n). If there exists an integer N ⩾0 such that UN = C, then the p.d.f. of ηn converges for the T.V. norm and its limit does not depend on the initialization. More prosaically, z denotes the cumulated effect of the model and observation noise, and it is assumed that 1- the noiseless algorithm has some global convergence proper- ties and 2- there are sufficiently many small enough noises so that there is an ε chance for ηn to jump from x to any neighborhood of the noiseless iterate q0(x) with a multi- plicative factor corresponding to the neighborhood’s size. It then defines a sequence of sets, by picking a fixed point of q0, dilating it, and taking its pre-image. Dilatation and inversion are then reiterated until the whole set C is cov- ered. In this case, forgetting of the initial distribution is ensured. Proof. We will demonstrate the property trough three in- termediate results: 1. There exists a sequence ε1,··· ,εN such that: P(η1 ∈ Un|η0 ∈Un+1) > εn > 0. 2. The first hitting time of U1 is a.s. finite. 3. (ηn)n≥0 is an aperiodic recurrent Harris chain with A = U1 and B = U′ 0. The conclusion will then immediately follow from The- orem 7. We give first a qualitative explanation of the approach. We have by assumption a sequence of sets (Un)n⩾0. Intermediate result 1) states that starting from a set Un+1 the chain has a non-vanishing chance to jump to the “smaller” set Un at the next time step. Interme- diate result 2) states that starting from Un+1, because of the non-vanishing property 1), the chain cannot avoid Un forever and eventually reaches Un, Un−1 and finally U1. Intermediate result 3) brings out the fact that, A = U′ 0 be- ing totally included in the ball of radius α centered on any of its points, the chain can go from any given point of its pre-image B = U1 to any area of A with controlled probability, which is the last property needed to obtain a recurrent aperiodic Harris chain. 1. Consider the closure U′n of U′ n. For any x ∈U′n the set Bo(x,α)∩Un is open and non-empty (as d(x,Un) ⩽ α 2 by definition of U′n) so µG(Bo(x,α) ∩Un) > 0. The function f : x →µG(Bo(x,α)∩Un) > 0 is con- tinuous (since µG is a regular and positive mea- sure) on the compact set U′n, so it admits a mini- mum mn > 0. We get: P(η1 ∈Un|η0 ∈Un+1) ≥ P(η1 ∈Bo(q0(η0),α) ∩Un|η0 ∈Un+1) = P(η1 ∈ Bo(q0(η0),α) ∩Un|q0(η0) ∈U′ n) ≥εmn using the fact that, by assumption (a), Q(x,V) ≥εµG(V) for x ∈C and V ⊂Bo(q0(x),α). We set εn = εmn to prove intermediate result 1). 2. We will prove by descending induction on n the property Pn : P(τn < ∞) = 1. PN is obvious as UN =C. Now assume Pn+1 is true for n ⩾1. Due to homogeneity, we can construct a strictly increasing sequence of stopping times (νp)p⩾0 such that: ∀p ⩾ 0,ηνp ∈Un+1. But P(ηνp+1 ∈Un|ηνp ∈Un+1) > εn by intermediate result 1). Thus letting Tk denote the event {∀i ⩽k,ηi /∈Un} = {k < τn}, we have for any n ⩾0: P(Tνp+1) = P(Tνp)P(Tνp+1|Tνp) ⩽P(Tνp)(1− εn). Thus P(Tνp+1) ⩽P(Tνp+1) ⩽P(Tνp)(1−εn). A quick induction and a standard application of Borel- Cantelli Lemma prove there exists a.s. a rank p such that Tνp is false. In other words, Pn is true. In par- ticular, P1 is true: the first hitting time of U1 is a.s. finite. 3. Define A =U1 and B =U′ 0 = Bo(x0, α 2 )∩C . For any V ⊂B and x ⊂A we have V ⊂Bo(q0(x),α), thus Q(x,V) ⩾εµG(V) by assumption (a). Thus (ηn)n⩾0 verifies property ii) of Definition 4. As A = U1 the intermediate result 2) shows that (ηn)n⩾0 verifies the property i) of Definition 4. As there exists a.s. a rank k such that ηk ∈U1 by intermediate result 2), there exists an integer M such that P(ηM ∈U1) > 0. As U1 ∩U′ 0 ⊂Bo(q(x),α) for any x ∈U1 (by defini- tion of U′ 0 and U1) Assumption (a) gives immediately P(ηM+1 ∈U1∩U′ 0) > P(ηM ∈U1)εµG(U1∩U′ 0). Us- ing only the definition of U′ 0 and U1, and Assump- tion (b), we see easily that both contain x0. Thus the set U′ 0 ∩U1 is non-empty and open. We have then µG(U1 ∩U′ 0) > 0 and P(ηM+1 ∈U1 ∩U′ 0) > 0. By an obvious induction on m, P(ηm ∈U1 ∩U′ 0) > 0 for any m > M. As U1 ∩U′ 0 ⊂A we get P(ηm ∈A) > 0 for any m > M and the property iii) of Definition 4 is also verified. We finally obtain that (ηn)n⩾0 is an aperiodic recurrent Harris chain. Using Theorem 7 we can conclude that the density of ηn converges in T.V. norm to its unique equilibrium distribu- tion for any initialization. Building upon the previous results, the proof of Theo- rem 3 is as follows. The Markov chain defined by equa- tions (13) and (14) with fixed ϒ can be written under the form η′ n+1 = qz(η′ n) = q1,z ◦q2,z(η′ n) with z = (Wn,Vn) ∈ G×Rp, q1,z(x) = ϒWnxϒ−1 and q2,z(x) = xK(h(x,Vn))−1. Let 0 = (Id,0). We will show that (η′ n)n⩾0 has all the properties required in Lemma 2. Let α,ε,ε′ be defined as in 4.1. Let α′ such as Bo(Id,α′) ⊂ϒBo(Id, α 2 )ϒ−1 and ε′′ = ε′|Adϒ−1|ε, where |Adϒ−1| is the determinant of the adjoint operator on g. For any x ∈C and V ⊂Bo(q0(x),α′) = Bo(Id,α′)q0(x) we have: P(qz(x) ∈V) ⩾P(qz(x) ∈V,q2,z(x) ∈Bo(q2,0(x), α 2 )) ⩾P(q2,z(x) ∈Bo(q2,0(x), α 2 ))× P(ϒWnq2,z(x)ϒ−1 ∈V|q2,z(x) ∈Bo(q2,0(x), α 2 )) i.e. P(qz(x) ∈ V) ⩾ ε′P(Wn ∈ ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α 2 )). As- suming q2,z(x) ∈ Bo(q2,0(x), α 2 ) we have: ϒ−1Vϒq2,z(x)−1 ⊂ ϒ−1Bo(Id,α′)q0(x)ϒq2,z(x)−1 ⊂ Bo(Id, α 2 )ϒ−1q0(x)ϒq2,z(x)−1 ⊂ Bo(Id, α 2 )q2,0(x)q2,z(x)−1 ⊂Bo(Id, α 2 + α 2 ). As Wn is independent from the other variables we obtain: P(Wn ∈ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α 2 )) ⩾ εµG(ϒ−1Vϒq2,z(x)−1) = εµG(ϒ−1Vϒ) = εµG(V)|Adϒ−1| And finally: P(qz(x) ∈V) ⩾ε′|Adϒ−1|εµ(V) = ε′′µ(V) As q0 has Id as a fix point we only have to ver- ify that the sets Un as defined in Lemma 2 eventually cover the whole set C. It suffices to consider for any n ⩾0 the set Dn = {x ∈C,∀k ⩾n,qk 0(x) ∈U′ 0}. As we have qn 0(x) →Id almost-everywhere on C we get: µG(∪n⩾0Dn) = µG(C). As the sequence of sets (Dn)n⩾0 increases we have: µG(Dn) −→ n→∞µG(C). We introduce here the quantity vmin = minx∈C µG(Bo(x, α′ 2 ) ∩C) (the property C = cl(Co) and the regularity of µG ensure that we have vmin > 0). Let N ∈N be such that µG(DN) > µG(C)−vmin. We have then: ∀y ∈C,d(y,DN) < α′ 2 (oth- erwise we would have µG(DN) ⩽µG(C) −vmin). As we have DN ⊂UN we obtain ∀y ∈C,d(y,DN) < α′ 2 thus U′ N = C and UN+1 = q−1 0 (U′ N) = q−1 0 (C) = C. So all the conditions of Lemma 2 hold and we can conclude con- vergence in T.V. norm to a stationary distribution which doesn’t depend on the prior, provided that its support lies in C. Theorem 3 is proved. Corollary 1 follows directly from the case C = G. Theorem 4 can be proved as follows. Let C = G. Let K = {x ∈G,h(x,0) = h(Id,0)}. Note that the left-right equivariance assumption ensures that K is a subgroup of G. First we show that there exists an integer N1 such that K ⊂UN1. As we have ∀x ∈K q0(x) = ϒxϒ−1 the se- quence of sets Qn = ϒ−nUnϒn∩K is growing and we have: {x ∈K, d(x,Qn) < α 2 } ⊂Qn+1.The set Q∞= ∪∞ n=1Qn is open in K as an union of open sets, but we have ∀x ∈ Q∞,Bo(x, α 2 )∩K ⊂Q∞thus ∀x ∈K \Q∞,Bo(x, α 4 )∩K ⊂ K \ Q∞. This implies that Q∞and K \ Q∞are both open in K. As K is connected (see assumptions of Theorem 4) we obtain Q∞= K. As K is compact (as a closed sub- set of a compact) the open cover ∪∞ i=1Qn has a finite sub- cover and there exists an integer N1 such that QN1 = K, i.e. ϒN1Kϒ−N1 = UN1. As K is a subgroup of G (see above) containing ϒ we obtain K ⊂UN1. Now we have to prove that the sets (Un)n⩾0 eventually cover the whole set C. For any x ∈G we have h(qn 0(x),0) →h(Id,0). Thus there exists a rank n such that ∀k ⩾n,qk 0(x) ∈U′ N (oth- erwise we could extract a subsequence from (qn 0(x))n⩾0 which stays at a distance > α 2 from K, as G is com- pact we could extract again a convergent subsequence and its limit q∞ 0 (x) would be outside K, thus we would have h(qn 0(x)) ̸→h(Id,0)). Here we can define as in the proof of Theorem 3 the sets Dn = {x ∈C,∀k ⩾n,qk 0(x) ∈U′ N1} (note that U′ 1 has been replaced by U′ N1 in this definition of Dn). As for almost any x ∈G there exists a rank n such that x ∈Dn we have µG(Dn) →µG(C) and as in the proof of Theorem 3 the sets Un eventually cover the set C. The second part of the property (invariance to left mul- tiplication) is easier. Consider the error process ζn = ˜ϒηn. The propagation and update steps read: ζ ′ n+1 = ˜ϒη′ n+1 = ˜ϒϒWnηnϒ−1 = ϒ˜ϒWnηnϒ−1 L= ϒWn ˜ϒηnϒ−1 L= ϒWnζϒ−1 ζn+1 = ˜ϒη′ n+1Kn+1(h(η′ n+1,Vn+1))−1 L= ˜ϒη′ n+1Kn+1(h(˜ϒη′ n+1,Vn+1))−1 L= ζ ′ n+1Kn+1(h(ζ ′,Vn+1))−1 where we have used the property: h(η′ n+1,Vn+1) L= η′−1 n+1h(Id,Vn+1) L= η′−1 n+1h(˜ϒ,Vn+1) L= h(˜ϒη′ n+1,Vn+1). We see that the law of the error process is invariant under left multiplication by ˜ϒ, thus the asymptotic distribution π in- herits this property. C Proofs of the results of Section 4.2 C.1 Proof of Proposition 4 We define the continuous process ˜γt : R →G as follows: ∀n ∈N, ˜γn = γn ∀n ∈N,∀t ∈[n,n+1[, d dt ˜γt = −˜γtk( ˜γn) We obtain immediately that ˜γt is continuous. Besides, for any n ∈N and t ∈]n,n+1[ one has: | d dt ⟨k( ˜γt),k( ˜γn)⟩| = |⟨d dt k( ˜γt),k( ˜γn)⟩| ⩽| d dt k( ˜γt)||k( ˜γn)| ⩽| d dt ˜γt||k( ˜γn)| (due to |∂k ∂x| ⩽1 ) ⩽|k( ˜γn)|2 (as d dt ˜γt = −˜γtk( ˜γn) ) Thus: ⟨k( ˜γt),k( ˜γn)⟩⩾⟨k( ˜γn),k( ˜γn)⟩−(t −n)|k( ˜γn)|2 ⩾0 proving in the Lie algebra that ⟨k( ˜γt),k( ˜γn)⟩⩾0. As u is a positive function we have thus u( ˜γt)−1⟨˜γtk( ˜γt),−˜γtk( ˜γn)⟩⩽0 immediately proving ⟨gradE( ˜γt), d dt ˜γt⟩⩽0. The latter result being true for every t ∈R\N and the function E( ˜γt) being continuous, it decreases and thus converges on R≥0. As we have supposed the sublevel sets of E are bounded (and closed as E is continuous), ˜γt is stuck in a compact set. Let γ∞an adherence value of ˜γn. By continuity of E we have E( ˜γ∞) = E( ˜γ∞exp(−k( ˜γ∞)). Thus the function t →E( ˜γ∞exp(−tk( ˜γ∞))) is decreasing on [0,1] and has the same value at 0 and 1. Thus it is constant, its deriva- tive at 0 is null proving γ∞= Id. ( ˜γn)n⩾0 being confined in a compact set and having Id as unique adherence value we finally get γn = ˜γn →Id. C.2 Proof of Proposition 3 Let γn = Rn ˆRT n be the invariant error. Its equation reduces to: γn+1 = γn.exp(−k1(γT n b1)×b1 −k2(γT n b2)×b2) Let k(γ) = k1(γTb1) × b1 + k2(γTb2) × b2 and E : γ → k1||γTb1 −b1||2 + k2||γTb2 −b2||2. To apply Proposi- tion 4 we will first verify that 1) ∀γ ∈SO(3),γ.k(γ) = gradE(γ) 2) | ∂k ∂γ | ⩽1. To prove 1) consider the dynamics in ∈SO(3) defined by d dt γt = γt(ψt)× for some rotation vector path ψt. We have d dt E(γt) = k1(γT t b1 −b1)T d dt γT t b1 + k2(γT t b2 −b2)T d dt γT t b2. Using triple product equalities this is equal to ⟨k1(γT t b1)×b1 + k2(γT t b2)×b2,ψt⟩= ⟨k(γt),ψt⟩= ⟨γt.k(γt), d dt γt⟩. Thus γ.k(γ) = gradE(γ). To prove 2) we analogously see that d dt k(γt) = −[k1(b1)×(γT t b1)× +k2(b2)×(γT t b2)×]ψt. Thus | d dt k(γt)| ⩽(k1 +k2)|ψt| and finally | ∂k ∂γ | ⩽1. Now, except if initially γ0 is the rotation of axis b1×b2 and angle π, the function E is strictly decreasing, and Id is the only point in the sublevel set {γ ∈SO(3) | E(γ) ≤E(γ0)} such that gradE(γ) = 0. Applying Proposition 4 allows to prove Proposition 3. References [1] S. Bonnabel A. Barrau. Intrinsic filtering on SO(3) with discrete-time observations. In IEEE Confer- ence on Decision and Control, 2013. [2] M. Barczyk and A. F. Lynch. Invariant observer de- sign for a helicopter uav aided inertial navigation system. Control Systems Technology, IEEE Trans- actions on, 21(3):791–806, 2013. [3] S. Bonnabel. Left-invariant extended Kalman filter and attitude estimation. In Decision and Control, 2007 46th IEEE Conference on, pages 1027–1032. IEEE, 2007. [4] S. Bonnabel, Ph. Martin, and P. Rouchon. Symmetry-preserving observers. Automatic Con- trol, IEEE Transactions on, 53(11):2514–2526, 2008. [5] S. Bonnabel, Ph. Martin, and P. Rouchon. Non-linear symmetry-preserving observers on Lie groups. IEEE Trans. on Automatic Control, 54(7):1709 – 1713, 2009. [6] S. Bonnabel, Ph. Martin, and E. Salaun. Invariant extended Kalman filter: theory and application to a velocity-aided attitude estimation problem. In De- cision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on, pages 1297–1304. IEEE, 2009. [7] G. Bourmaud, R. M´egret, A. Giremus, Y. Berthoumieu, et al. Discrete extended Kalman filter on Lie groups. In 21st European Signal Processing Conference 2013 (EUSIPCO 2013), 2013. [8] S. Br´as, P. Rosa, C. Silvestre, and P. Oliveira. Global attitude and gyro bias estimation based on set-valued observers. Systems & Control Letters, 62(10):937– 942, 2013. [9] R. W. Brockett. Lie algebras and Lie groups in con- trol theory. In Geometric methods in system theory, pages 43–82. Springer, 1973. [10] D.L. Burkholder, ´E. Pardoux, A.S. Sznitman, and P.L. Hennequin. ´Ecole d’´et´e de probabilit´es de Saint-Flour XIX, 1989. Springer-Verlag, 1991. [11] J. L. Crassidis and F Landis Markley. Unscented filtering for spacecraft attitude estimation. Journal of guidance, control, and dynamics, 26(4):536–542, 2003. [12] J. L. Crassidis, F. Landis Markley, and Yang Cheng. A survey of nonlinear attitude estimation meth- ods. Journal of Guidance, Control,and Dynamics, 30(1):12–28, 2007. [13] T.E. Duncan. Some filtering results in Riemann manifolds. Information and Control, 35(3):182– 195, 1977. [14] T.E. Duncan. Stochastic systems in Riemannian manifolds. Journal of Optimization theory and ap- plications, 27(3):399–426, 1979. [15] T.E. Duncan. An estimation problem in compact Lie groups. Systems & Control Letters, 10(4):257–263, 1988. [16] R. Durrett. Probability: theory and examples, vol- ume 3. Cambridge university press, 2010. [17] Y. Han and F.C. Park. Least squares tracking on the Euclidean group. Automatic Control, IEEE Trans- actions on, 46(7):1127–1132, 2001. [18] K. Itˆo. Stochastic differential equations in a differ- entiable manifold. Nagoya Mathematical Journal, 1:35–47, 1950. [19] S.J. Julier and J.K. Uhlmann. A new exten- sion of the Kalman Filter to Nonlinear Systems. Proc. of AeroSense : The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Con- trols, 1997. [20] R. Kalman and R. Bucy. New results in linear filter- ing and prediction theory. Basic Eng., Trans. ASME, Ser. D,, 83(3):95–108, 1961. [21] Ch. Lageman, J. Trumpf, and R. Mahony. Gradient- like observers for invariant dynamics on a Lie group. Automatic Control, IEEE Transactions on, 55(2):367–377, 2010. [22] E. J. Lefferts, F Landis Markley, and M. D. Shus- ter. Kalman filtering for spacecraft attitude estima- tion. Journal of Guidance, Control, and Dynamics, 5(5):417–429, 1982. [23] M. Liao. L´evy processes in Lie groups. Cambridge University Press, 2004. [24] JT-H Lo. Optimal estimation for the satellite atti- tude using star tracker measurements. Automatica, 22(4):477–482, 1986. [25] J. Trumpf M. Zamani and R. Mahony. Near-optimal deterministic filtering on the rotation group. IEEE Trans. on Automatic Control, 56:6:1411 – 1414, 2011. [26] R. Mahony, T. Hamel, and J-M Pflimlin. Non- linear complementary filters on the special orthog- onal group. IEEE-Trans. on Automatic Control, 53(5):1203–1218, 2008. [27] F Landis Markley, J. L. Crassidis, and Y. Cheng. Nonlinear attitude filtering methods. In AIAA Guid- ance, Navigation, and Control Conference, 2005. [28] Ph. Martin and E. Sala¨un. Invariant observers for attitude and heading estimation from low-cost iner- tial and magnetic sensors. In 46th IEEE Conf Decis Contr, pages 1039–1045, 2007. [29] S.K. Ng and P.E. Caines. Nonlinear filtering in Rie- mannian manifolds. IMA journal of mathematical control and information, 2(1):25–36, 1985. [30] H. Nijmeijer and T. I. Fossen. New directions in non- linear observer design, volume 244. Springer, 1999. [31] P. J. Olver. Classical Invariant Theory. Cambridge University Press, 1999. [32] W. Park, Y. Liu, Y. Zhou, M. Moses, G. S. Chirikjian, et al. Kinematic state estimation and mo- tion planning for stochastic nonholonomic systems using the exponential map. Robotica, 26(4):419– 434, 2008. [33] M. Pontier and J. Szpirglas. Filtering on manifolds. In Stochastic Modelling and Filtering, pages 147– 160. Springer, 1987. [34] A. Saccon, J. Trumpf, R. Mahony, and A. P. Aguiar. Second-order-optimal filters on lie groups. [35] S. Said. Estimation and filtering of processes in ma- trix Lie groups. PhD thesis, Institut National Poly- technique de Grenoble-INPG, 2009. [36] S. Said and J. H. Manton. On filtering with observa- tion in a manifold: Reduction to a classical filtering problem. SIAM Journal on Control and Optimiza- tion, 51(1):767–783, 2013. [37] S. Salcudean. A globally convergent angular ve- locity observer for rigid body motion. Automatic Control, IEEE Transactions on, 36(12):1493–1497, 1991. [38] A. K. Sanyal, T. Lee, M. Leok, and N. H. Mc- Clamroch. Global optimal attitude estimation using uncertainty ellipsoids. Systems & Control Letters, 57(3):236–245, 2008. [39] A. Tayebi, S. McGilvray, A. Roberts, and M. Moallem. Attitude estimation and stabilization of a rigid body using low-cost sensors. In Decision and Control, 2007 46th IEEE Conference on, pages 6424–6429. IEEE, 2007. [40] J. Thienel and Robert M Sanner. A coupled non- linear spacecraft attitude controller and observer with an unknown constant gyro bias and gyro noise. Automatic Control, IEEE Transactions on, 48(11):2011–2015, 2003. [41] R. Tron, B. Afsari, and R. Vidal. Riemannian con- sensus for manifolds with bounded curvature. 2012. [42] J.F. Vasconcelos, R. Cunha, C. Silvestre, and P. Oliveira. A nonlinear position and attitude ob- server on SE(3) using landmark measurements. Sys- tems Control Letters, 59:155–166, 2010. [43] A. S. Willsky. Some estimation problems on Lie groups. In D.Q. Mayne and R.W. Brockett, editors, Geometric Methods in System Theory, volume 3 of NATO Advanced Study Institutes Series, pages 305– 314. Springer Netherlands, 1973. [44] A. S. Willsky and S. I. Marcus. Estimation for bilin- ear stochastic systems. In Variable Structure Sys- tems with Application to Economics and Biology, pages 116–137. Springer, 1975. [45] A.S. Willsky. Dynamical Systems Defined on Groups: Structural Properties and Estimation. PhD thesis, MIT Dept. of Aeronautics and Astronautics, May 1973. [46] K. C. Wolfe, M. Mashner, and G. S. Chirikjian. Bayesian fusion on Lie groups. Journal of Algebraic Statistics, 2(1):75–97, 2011. [47] E. Wong and M. Zakai. Riemann-Stieltjes approxi- mations of stochastic integrals. Probability Theory and Related Fields, 12(2):87–97, 1969. Table 1: Artificial horizon: experiment parameters Standard Deviation of the model noise 0.01 degree = 1.75×10−4 rad Standard Deviation of the regular observations 0.1 degree = 1.75×10−3 rad Standard Deviation of the outliers 30 degrees Probability of the outliers to occur 0.01 Table 2: Useful formulas for some classical matrix Lie groups G SO(3) (Rdimg = R3) (Rdimg = R6) Embedding of G R ∈M3(R),RTR = I3 R v 0 1 ! ,R ∈SO(3),u ∈R3 Embedding of g A3 : ψ ∈M3(R),ψT = −ψ ψ u 0 0 ! ,ψ ∈A3,u ∈R3 (.)m ξ →(ξ)× ξ u ! → (ξ)× u 0 0 ! x →Adx R →R R T ! → R 0 (T)×R R ! ξ →adξ ξ →(ξ)× ξ u ! → (ξ)× 0 (u)× (ξ)× ! exp exp(x) = I3 + sin(||x||) ||x|| (x)× exp(x) = I4 +(x)m + [1−cos(||x||)] ||x||2 (x)2 × + 1 ||x||2 (1−cos|x|)(x)2 m + 1 ||x||3 (||x||−sin||x||)(x)3 m Table 3: Observation of two vectors: parameters Parameter b1 b2 QV1 QV2 Value (1,0,0) (0,1,0) 0,08732I3 0,08732I3 Parameter P0 Qw N Simulations Value 0.52362I3 0,017452I3 50 1000 Intrinsic filtering on Lie groups with applications to attitude estimation Axel Barrau, Silv`ere Bonnabel∗ September 15, 2014 Abstract This paper proposes a probabilistic approach to the prob- lem of intrinsic filtering of a system on a matrix Lie group with invariance properties. The problem of an invariant continuous-time model with discrete-time measurements is cast into a rigorous stochastic and geometric frame- work. Building upon the theory of continuous-time in- variant observers, we introduce a class of simple filters and study their properties (without addressing the opti- mal filtering problem). We show that, akin to the Kalman filter for linear systems, the error equation is a Markov chain that does not depend on the state estimate. Thus, when the filter’s gains are held fixed, the noisy error’s dis- tribution is proved to converge to a stationary distribution, under some convergence properties of the filter with noise turned off. We also introduce two novel tools of engineer- ing interest: the discrete-time Invariant Extended Kalman Filter, for which the trusted covariance matrix is shown to converge, and the Invariant Ensemble Kalman Filter. The methods are applied to attitude estimation, allowing to derive novel theoretical results in this field, and illus- trated through simulations on synthetic data. 1 Introduction The problem of probabilistic filtering aims at computing the conditional expectation ˆx(t) of the state x(t) of a sys- tem driven by process and observation noises conditioned ∗A. Barrau and S. Bonnabel are with MINES ParisTech, PSL Research University, Centre for robotics, 60 Bd St Michel 75006 Paris, France. [axel.barrau,silvere.bonnabel]@mines-paristech.fr on the past inputs and outputs. Filtering is a branch of stochastic process theory that has been strongly driven by applications in control and signal processing. In the particular case where the system is linear and corrupted by independent white Gaussian process and observation noises, the celebrated Kalman filter [20] solves the filter- ing problem. Yet, when the system is non-linear there is no general method to derive efficient filters, and their design has encountered important difficulties in practice. Indeed, general filtering equations describing the evolu- tion of the state conditioned on past outputs can also be derived, see e.g., [10] and references therein. However, the equations are not closed, and in most cases the con- ditional densities can not be computed by solving a finite dimensional equation. In engineering applications the most popular solution is the extended Kalman filter (EKF), which does not pos- sess general theoretical properties, especially in a stochas- tic context, and is sometimes prone to divergence. The EKF amounts to linearize the system around the esti- mated trajectory, and build a Kalman filter for the linear model, which can in turn be implemented on the non- linear model. Despite the fact that it is subject to sev- eral problems both theoretical and practical, one merit of the EKF is to convey an estimation of the mean state ˆx(t) together with an approximation of the covariance matrix P(t) of the error, yielding a (trusted) ellipsoidal credible set that reflects the level of uncertainty in the estimation. In order to capture more closely the distribution, several rather computationally extensive methods have recently attracted growing interest such as particle filters and un- scented Kalman filters (UKF) [19]. In this paper we consider a general filtering problem on Lie groups in a stochastic setting, and a class of simple arXiv:1310.2539v2 [cs.SY] 12 Sep 2014 recursive finite dimensional filters, having a similar struc- ture as the EKF, is introduced. Such recursively defined filters allow straightforward digital computer implemen- tation. Besides, as the Kalman filter, the corresponding algorithms will be readily implementable with very little understanding of the theory leading to their development, as illustrated by the concrete examples of sections 4.2, 4.3 and 5.3, which can be read independently of the rest of the paper. As concerns their theoretical properties, they are motivated by the three following desirable properties a simple recursive filter should ideally have: 1- forget- ting of the initial condition, 2- being able to capture the extent of uncertainty in the estimate, 3- being efficient, that is, minimizing the error’s dispersion within a given class. In the filtering problem, the estimation error is ini- tially distributed according to a prior distribution provided by the practitioner. The rationale for the first property is that, as the filter acquires information, the prior informa- tion should be given less (eventually no) importance by the filter. As the user generally possesses very little in- formation on the system’s state initially, the prior can be very far from the truth. Forgetting the initial condition allows the (subjective Bayesian) prior distribution to be as informative as noninformative without impacting the final estimates. Moreover, it corresponds to convergence properties of the filter, and thus deals with the ability to re- cover from catastrophic failures. In robotics, this leads to robustness to erroneous beliefs due for instance to percep- tual ambiguity. The second property deals with the cen- tral motivation behind stochastic approaches. Indeed, all deterministic observers, no matter how strong properties they may have, have a common fundamental weakness: in the presence of sensor and model uncertainties, they pro- vide no indication of the extent of uncertainty conveyed by the estimate. Being able to compute credible sets for the estimate may be needed in order to adapt control deci- sions to the situation, for instance to guarantee an absence of collision. The two first properties are general and ap- ply to any class of sensible filters. The third property ad- dresses the issue of gain tuning. When working with a class of (suboptimal) filters, the gains can be tuned over a training set in order to minimize the level of uncertainty conveyed by the estimate. Several approaches to filtering for systems possessing a geometric structure have been developed in previous lit- erature. For stochastic processes on Riemannian mani- folds [18, 14] some results have been derived, see e.g., [29]. The specific situation where the process evolves in a vector space but the observations belong to a manifold has also been considered, see e.g. [33, 13] and more re- cently [36]. For systems on Lie groups powerful tools to study the filtering equations (such as harmonic analysis [45, 32, 24, 43]) have been used, notably in the case of bi- linear systems [44] and estimation of the initial condition of a Brownian motion in [15]. An extension of Gaussian distributions has also been developed and recently used for Bayesian filtering on Lie groups in [46] and also in [7] which devises a general adaptation of the EKF to Lie groups, without exploiting the geometrical properties of the specific invariant case such as in the present paper. A somewhat different but related approach to filtering con- sists of finding the path that best fits the data in a deter- ministic setting. It is thus related to optimal control theory where geometric methods have long played an important role [9]. A certain class of least squares problems on the Euclidian group has been tackled in [17]. This approach has also been used to filter inertial data recently in [34], [26]. Although the paper is concerned with a general the- ory on (matrix) Lie groups, the several derived filters will systematically be applied to a benchmark and motivating filtering example on the special orthogonal group SO(3). The problem of attitude estimation consists of estimating the orientation of a rigid body, which proves to be a cen- tral task in aeronautics, in order to stabilize or guide the system, or in mobile robotics, where orientation is a part of the robot’s localization. When the system is equipped with gyroscopes, the delivered signals can be integrated over time to compute an orientation increment. However, the initial orientation may not be known, and even if it is, the gyroscopes are subject to noise and drift, so that pure integration may quickly result in a very erroneous atti- tude. Thus the estimations are regularly checked against some vectors’ orientation, such as the earth magnetic field measured by magnetometers, or the earth gravity vector measured by accelerometers under quasi-stationary flight assumptions. The explosion over the last decade of low- cost noisy sensors integrated in unmanned aerial vehicles (UAV’s), and the limited computational power on-board, have been a driving force for the development of simple filters on SO(3). Work on the deterministic problem dates back to [37, 30] and has received much attention recently, see [42, 27, 26, 4, 28, 38] to cite a few, and some related issues have been addressed as well, like angular velocity estimation [40] or attitude control [37, 39]. The beauty of the intrinsic problem formulation on SO(3) and the prop- erties stemming from it have been exploited in various papers in the deterministic case, see e.g. [26, 21, 5]. The problem has also been cast into a minimum energy fil- tering framework in [25]. In the stochastic framework, a general review of the classical techniques based on Gaus- sian assumptions is proposed in [27]. A first attempt to systematically exploit the invariance properties to de- sign stochastic filters on SO(3) appears in the preliminary work [1]. The contributions and the organization of the present paper are as follows. In Section 2, the problem of stochas- tic filtering on Lie groups with a continuous-time noisy model and discrete-time observations is rigorously posed. From a theoretical viewpoint, the choice of discrete-time measurements allows to circumvent some technical diffi- culties and to cast the problem into a perfectly rigorous framework. As this work is mainly guided by applica- tions to aerospace, this choice is relevant as the evolution equations of the orientation generally rely on the integra- tion of noisy inertial measurements that are delivered at a high rate, whereas the observations may be delivered at low(er) rate (GPS, vision). In order to derive proper filters, the problem is then transformed into a complete discrete-time model. The discretization is not straightfor- ward because unlike the linear case, difficulties arise from non-commutativity. Guided by the theory of symmetry-preserving ob- servers that has been exclusively concerned with the continuous-time and deterministic case so far, we propose in Section 3 a class of natural filters. They appear as a mere transposition of linear observers (such as the linear Kalman filter) to the group, but where the addition has been replaced with the group multiplication law (due to non-commutativity, there is always some freedom in the transposition as left and right multiplications are differ- ent operations). We prove the proposed filters retain the invariances of the problem, and that the discrete-time er- ror’s evolution is independent of the system’s trajectory, inheriting the properties of the deterministic continuous- time case [5] (or merely the linear case). Thus, the gains can be computed off-line, and the proposed filters require very few on-board computational resources. This is one of their main advantages over current filters for the atti- tude estimation problems considered in simulations. The proposed class is very broad and the tuning issue is far from trivial. Sections 4 and 5 explore two different routes, depending on if we are interested in the asymp- totic or transitory phase. In Section 4, we propose to hold the gains fixed over time. As a result, the error equation becomes a homogeneous Markov chain. We first prove some new results about global convergence in a determin- istic and discrete-time framework. Then, building upon the homogeneity property of the error, we prove that if the filter with noise turned off admits almost global con- vergence properties, the error with noise turned on con- verges to a stationary distribution. Mathematically this is a very strong and to some extent surprising result. From a practical viewpoint, the gains can be tuned numerically to minimize the asymptotic error’s dispersion. This allows to “learn” sensible gains for very general types of noises. The theory is applied to two examples and gives conver- gence guarantees in each case. First an attitude estimation problem using two vector measurements and a gyroscope having isotropic noise (Section 4.2), then the construc- tion of an artificial horizon with optimal gains, for a non- Gaussian noise model (Section 4.3). Each application is a contribution in itself and can be implemented without reading the whole paper. In Section 5, we propose to optimize the convergence during the transitory phase using Gaussian approxima- tions. We first introduce a method of engineering inter- est: the Invariant Extended Kalman Filter (IEKF) in dis- crete time. The IEKF was introduced in continuous time in [3] and developped and applied to localization prob- lems in [6] and very recently in [2]. A discrete time ver- sion on SO(3) was proposed in the preliminary paper [1]. The idea consists of linearizing the invariant error for the class of filters introduced in Section 3 around a fixed point (the identity element of the matrix group), build a Kalman filter for the linear model and implement it on the non- linear model. As the linearizations always occur around the same point, the linearized model is time-invariant and thus the Kalman gains, as well as the Kalman covariance matrix, are proved to converge to fixed values. When on- board storage and computational resources are very lim- ited, this advantageously allows to replace the gain with its asymptotic value. The IEKF is compared to the well- known MEKF [22, 12] and UKF [19, 11] on the attitude estimation problem, and simulations illustrate some con- vergence properties that the latter lack. In the case where the error equation is totally autonomous, we introduce a new method based on off-line simulations, the IEnKF, which outperforms the other filters in case of large noises by capturing very accurately the error’s dispersion. 2 Problem setting 2.1 Considered continuous-time model Consider a state variable χt taking values in a matrix Lie group G with neutral element Id, and the following continuous-time model with discrete measurements: d dt χt = (υt +wt)χt + χtωt (1) Yn = h(χtn,Vn) (2) where υt and ωt are inputs taking their values in the Lie algebra g (i.e. the tangent space to the identity element of G), wt is a continuous white Langevin noise with diffu- sion matrix Σt, whose precise definition will be discussed below in Subsection 2.3. (Yn)n⩾0 are the discrete-time ob- servations, belonging to some measurable space Y and Vn is a noise taking values in Rp for an integer p > 0. We further make the following additional assumption: Assumption 1 (left-right equivariance) The output map h is left-right equivariant, i.e. there exists a left action of G on Y such that we have the equality in law: ∀g,χ ∈G, h(χg,Vn) L= g−1h(χ,Vn) (3) and h(χg,0) = g−1h(χ,0) (4) The reader who is not familiar with stochastic calcu- lus and Lie groups can view χt as a rotation matrix, and replace the latter property with the following more restric- tive assumption: Assumption 1 bis For any n ⩾0 there exists a vector b such that h(χ,Vn)=χ−1(b+Vn). The assumptions could seem restrictive but are verified in practice in various cases as shown by the following ex- amples, which will provide the reader with a more con- crete picture than the formalism of Lie groups. 2.2 Examples 2.2.1 Attitude estimation on flat earth Our motivating example for the model (1)-(2) is the atti- tude estimation of a rigid body assuming the earth is flat, and observing two vectors: d dt Rt = Rt(ωt +wω t )×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), (5) where as usual Rt ∈SO(3) represents the rotation that maps the body frame to the earth-fixed frame, and where ωt ∈R3 is the instantaneous rotation vector, and wω t ∈R3 is a continuous Gaussian white noise representing the gy- roscopes’ noise. We have let (x)× ∈R3×3 denote the skew matrix associated with the cross product with a three di- mensional vector, i.e., for a,x ∈R3, we have (x)×a = x × a. (Yn)n≥0 is a sequence of discrete noisy measure- ments of two vectors b1,b2 of the earth fixed frame ver- ifying b1 × b2 ̸= 0, and V 1 n and V 2 n are sequences of in- dependent isotropic Gaussian white noises. Note that the noise wω t is defined, as the input ωt, in the body frame (in other words it is multiplied on the left). Thus equations (5) do not match (1)-(2) which correspond to a noise de- fined in the earth-fixed frame. This can be remedied in the particular case where the gyroscope noise is isotropic, a restrictive yet relevant assumption in practice. Definition 1. A Langevin noise wt of g is said isotropic if ˜wt := gwtg−1 L= wt for any g ∈G. Note that for a noise taking values in so(3) this defini- tion corresponds to the physical intuition of an isotropic noise of R3. With this additional assumption the equation can be rewritten: d dt Rt = ( ˜wω t )×Rt +Rt(ωt)×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), which corresponds to (1)-(2). Remark 1. If the gyrometer noise is not isotropic the new noise ˜wω t is related to Rt by ˜wω t = Rtwω t RT t . Depending on the degree of anisotropy this can prevent the use of methods based on the autonomy of the error (see 4) but not of the discrete-time IEKF (see 5.1). 2.2.2 Attitude estimation on a round rotating Earth Another interesting case is attitude estimation on SO(3) using an observation of the vertical direction (given by an accelerometer) and taking into account the rotation of the earth. Whereas the flat earth assumption perfectly suits low-cost gyroscopes, precise gyroscopes can measure the complete attitude by taking into account the earth’s ro- tation. We define a geographic frame with axis North- west-up. The attitude Rt is the transition matrix from the body reference to the geographic reference. The earth in- stantaneous rotation vector υ ∈R3 and the gravity vector g ∈R3 are expressed in the geographic reference. Both are constant in this frame. The gyroscope gives a con- tinuous rotation speed ωt, disturbed by an isotropic con- tinuous white noise wω t . The equation of the considered system reads: d dt Rt = (υ)×Rt +Rt(ωt +wω t )×, Yn = RT tn(g+Vn) As wω t is supposed isotropic the equation can be rewritten: d dt Rt = (υ + ˜wω t )×Rt +Rt(ωt)×, Yn = RT tn(g+Vn), which corresponds to (1)-(2). 2.2.3 Attitude and velocity estimation We give here an example on a larger group. Consider the attitude Rt ∈SO(3) and speed vt ∈R3 of an aircraft evolv- ing on a flat earth, equipped with a gyroscope and an ac- celerometer. The gyroscopes gives continuous increments ψt ∈R3 with isotropic noise wψ t ∈R3, and the accelerom- eters gives continuous increments at ∈R3 with isotropic noise wa t ∈R3. The aircraft has noisy speed measure- ments in the earth reference frame Yn = vtn +Vn, where Vn is a supposed to be a Gaussian isotropic white noise. The equations read: d dt Rt = Rt(ψt +wψ t )×, d dt vt = Rt(at +wa t )+g, where Rt is the rotation mapping the body frame to the earth-fixed frame, vt is the velocity expressed in the earth fixed frame, and g is the earth gravity vector, supposed to be (locally) constant. Using the matrix Lie group SE(3) we introduce: At =  RT t RT t vt 0 1  ; ωt =  0 g 0 0  υt =  −(ψt)× at 0 0  ; wt =  (wψ t )× wa t 0 0  The problem can then be rewritten under the form (1)-(2): d dt At = (υt +wt)At +Atωt, Ytn = A−1 tn  Vn 1  , 2.3 Interpretation of Langevin noises on Lie Groups This present subsection provides some mathematical con- siderations about white noises on Lie groups. It can be skipped by the uninterested reader who is directly referred to the model (6). Equation (1) is actually a Langevin equation that suf- fers from poly-interpretability because of its non-linearity, and its meaning must be clarified in the rigorous frame- work of stochastic calculus. Stratonovich stochastic dif- ferential equations on a Lie group can be intrinsically de- fined as in e.g. [23]. A somewhat simpler (but equiva- lent) approach consists of using the natural embedding of the matrix Lie group G in a matrix space and to under- stand the equation in the sense of Stratonovich, see e.g. [35]. The mathematical reasons stem from the fact that the resulting stochastic process is well-defined on the Lie group, whereas this is not the case when opting for an Ito interpretation as underlined by the following easily prov- able proposition. Proposition 1. If the stochastic differential equation (1) is taken in the sense of Itˆo, for G = SO(3) embedded in R3×3, an application of the Itˆo formula to χT t χt shows that the solution almost surely leaves the submanifold G for any time t > 0. Besides, the physical reasons for this stem from the fact that the sensors’ noise are never completely white, and for colored noise the Stratonovich interpretation provides a better approximation to the true solution than Ito’s, as advocated by the result of Wong and Zakai [47]. 2.4 Exact discretization of the considered model To treat rigorously the problem of integrating discrete measurements we need to discretize the continuous model with the same time step as the measurements’. Unlike the general case of non-linear estimation, the exact discrete- time dynamics corresponding to Equation (1) can be ob- tained, as proved by the following result: Theorem 1. Let χn = χtn. Then the discrete system (χn,Yn) satisfies the following equations: χn+1 = ϒnWnχnΩn, Yn = h(χn,Vn), (6) where Wn is a random variable with values in G and whose law depends on the values taken by υt for t ∈ [tn,tn+1] and on the law of wt for t ∈[tn,tn+1], and ϒn and Ωn are elements of G which only depend on the values taken respectively by υt and ωt for t ∈[tn,tn+1]. Proof. For n ∈N consider the value of the process χt is known until time tn. Let ϒt and Ωt be the solutions of the following equations: ϒtn = Id, d dt ϒt = υtϒt and Ωtn = Id, d dt Ωt = Ωtωt Let Wt be the solution of the following (Stratonovich) stochastic differential equation: Wtn = Id, d dtWt = ϒ−1 t wtϒtWt Note that ϒ−1 t wtϒt being in g, Wt is ensured to stay in G. Define the process χt|tn = ϒtWtχtnΩt. We will show that for t > tn the processes χt and χt|tn verify the same stochastic differential equation. Indeed: d dt χt|tn = ( d dt ϒt)WtχtΩt +ϒt( d dtWt)χtΩt +ϒtWtχt( d dt Ω) = υtϒtWtχtΩt +wtϒtWtχtΩt +ϒtWtχtΩtωt = (υt +wt)χt|tn + χt|tnωt Thus the two processes have the same law at time tn+1, i.e. χn+1 and Wtn+1ϒtn+1χtnΩtn+1 have the same law. Let- ting ϒn = ϒtn+1, Ωn = Ωtn+1 and Wn = Wtn+1 we obtain the result. Remark 2. In many practical situations (for instance ex- amples 2.2.1 and 2.2.2), the Langevin noise wt is isotropic and we have thus ϒ−1 t wtϒt L= wt. Note that, the variable Wt depends also only on the law of wt for t ∈[tn,tn+1]. In the sequel, for mathematical reasons (the equa- tions do not suffer from poly-interpretability), tutorial rea- sons (the framework of diffusion processes on Lie groups needs not be known), and practical reasons (any filter must be implemented in discrete time), we will system- atically consider the discrete-time model (6). Moreover, the noise Wn will be a general random variable in G, not necessary a solution of the stochastic differential equation d dtWt = ϒ−1 t wtϒtWt. 3 A class of discrete-time intrinsic filters 3.1 Preliminary: Kalman filter for linear stationary systems with drift Consider the particular case of a linear system in RN (with arbitrary N) of the form d dt Xt = bt +wt, wt being a white noise and bt a known deterministic input. Assume we have discrete measurements of the form Yn = HXtn +Vn ∈ Rp, Vn being a Gaussian noise in Rp. A straightforward discretization yields: Xn+1 = Xn +Bn +Wn, Yn = HXn +Vn, (7) With Bn = R tn+1 tn bsds and Wn = R tn+1 tn wsds is the value of a Brownian motion in RN. The optimal filter in this case is the celebrated linear Kalman filter whose equations are: ˆX′ n+1 = ˆXn +Bn, ˆXn+1 = ˆX′ n+1 +Kn+1(Yn+1 −H ˆX′ n+1) (8) The first equation is the so-called prediction step, and the second one the correction step. Let e′ n = ˆX′ n −Xn and en = ˆXn −Xn denote the predicted and corrected state es- timation errors, we get: e′ n+1 = en −Wn, en+1 = e′ n+1 −Kn+1(He′ n+1 −Vn+1) (9) The error equation is autonomous and can be optimized independently from the dynamical inputs Bn’s. As a con- sequence, the optimal gains Kn can be computed off-line. It is a well-known result that they (as well as the error distribution) converge under observability and detectabil- ity assumptions, so all computations can be done off-line and only the first values need be stored, in which case only the equations (8) must be computed on-line, requir- ing extremely restricted computational power on board. Mimicking the form of equations (8) on the Lie group G where the addition is naturally replaced with the group multiplication law, leads to a similar result, as shown in the next section. 3.2 Proposed intrinsic filters Inspired by the linear case and the theory of continuous- time symmetry-preserving observers on Lie groups [5] we propose to mimic the prediction and update steps (8) re- placing the addition with the group multiplication yield- ing the class of filters defined by: ˆχ′ n+1 = ϒn ˆχnΩn, (10) ˆχn+1 = Kn+1( ˆχ′n+1Yn+1) ˆχ′n+1, (11) where Kn+1(•) can be any function of Y →G, ensuring K(h(Id,0)) = Id . There are two ways to understand the links between (8) and (10)-(11). First, by defining a an estimation error on the group G by χn+1 ˆχ−1 n+1 (see below), and by using the left-right equivariance hypothesis, which allows to interpret ˆχ′n+1Yn+1 = h(χn+1 ˆχ′−1 n+1,Vn+1) as a group equivalent to H(X′ n+1 −ˆXn+1)+Vn+1. Secondly, by viewing the linear case of Section 3.1 as a specific case of the proposed approach through the following proposition whose proof has been moved to the Appendix. Proposition 2. There exists an isomorphic representation of RN as a matrix Lie group such that (7) takes the canoni- cal form (1)-(2), and (8) becomes the invariant filter (10)- (11). We now define the invariant output errors, which are a transposition of the linear error to our multiplicative group: ηn = χn ˆχ−1 n , η′ n = χn ˆχ′−1 n (12) We have the following striking property, that is similar to the linear case: Theorem 2. The error variables ηn and η′ n are Markov processes, and are independent of the inputs Ωn’s. Proof. The equations followed by η′ n and ηn read: η′ n+1 = χn+1 ˆχ′−1 n+1 = ϒnWnχnΩnΩ−1 n ˆχ−1 n ϒ−1 n = ϒnWnηnϒ−1 n (13) and: ηn+1 = χn+1 ˆχ′−1 n+1Kn+1( ˆχ′n+1Yn+1)−1 = η′ n+1Kn+1( ˆχ′n+1h(χn+1,Vn+1))−1 L= η′ n+1Kn+1(h(η′ n+1,Vn+1))−1, (14) thanks to the equivariance property of the output. The most important consequence of this property is that if the inputs υt are known in advance, or are fixed, as it is the case in Examples 2.2.1 and 2.2.2, the gain func- tions Kn can be optimized off-line, independently of the trajectory followed by the system. In any case, numerous choices are possible to tune the gains Kn and the remain- ing sections are all devoted to various types of methods to tackle this problem. 4 Fixed gains filters In certain cases, one can build an (almost) globally con- vergent observer for the associated deterministic system, i.e., with noise turned off, by using a family of constant gain function Kn(•) ≡K(•). If the filter with noise turned off has the desirable property of forgetting its initial con- dition, convergence to a single point is impossible to re- tain, because of the unpredictability of the noises that “steer” the system, but convergence of the distribution can be expected, assuming as in the linear case that ϒn is inde- pendent of n. Indeed in this section we prove that, when noise is turned on, the error forgets its initial distribution under mild conditions. The results are illustrated by an attitude estimation example for which we propose an in- trinsic filter having strong convergence properties. It should be noted that in practice, the convergence to an invariant distribution allows in turn to pick the most desirable gain K(•) among the family based on a per- formance criterion, such as convergence speed, or filter’s precision (that is, ensuring low error’s dispersion). This fact will be illustrated by an artificial horizon example of Subsection 4.3. 4.1 Convergence results Here the left-hand inputs ϒn are assumed fixed. This, to- gether with constant gains Kn, makes the error sequence ηn a homogeneous Markov chain. Thus, under appropri- ate technical conditions, the chain has a unique stationary distribution and the sequence converges in distribution to this invariant distribution. Let d denote a right-invariant distance on the group G. We propose the following as- sumptions: 1. Confinement of the error: there exists a compact set C such that ∀n ∈N,ηn ∈C a.s. for any η0 ∈C 2. Diffusivity: the process noise has a continuous part with respect to Haar measure, with density positive and uniformly bounded from zero in a ball of radius α > 0 around Id. 3. Reasonable output noise: ∀g ∈ G, P[gK(h(g,Vn))−1 ∈Bo(gK(h(g,0))−1, α 2 )] > ε′ for some ε′ > 0. The second assumption implies, and can in fact be re- placed with, the more general technical assumption that there exists ε > 0 such that for any subset U of the ball Bo(Id,α) we have P(Wn ⊂U) > εµ(U) for all n ≥0 where µ denotes the Haar measure. Those noise proper- ties are relatively painless to verify, whereas the confine- ment property although stronger is automatically verified whenever G is compact, e.g. G = SO(3). Intuitively, the last two assumptions guarantee the error process is well approximated by its dynamics with noise turned off, fol- lowed by a small diffusion. In the theory of Harris chains, the latter diffusion step is a key element to allow proba- bility laws to mix at each step and eventually forget their initial distribution. Theorem 3. For constant left-hand inputs ϒn ≡ϒ, con- sider the filter: ˆχ′ n+1 = ϒ ˆχnΩn (15) ˆχn+1 = K( ˆχ′n+1Yn+1) ˆχ′n+1 (16) Suppose that Assumptions 1)-2)-3) are verified, where the compact set satisfies C = cl(Co), cl denoting the closure and o the interior. When noises are turned off, the error equation (13)-(14) becomes: γ′ n+1 = ϒγnϒ−1, γn+1=γ′ n+1Kn+1(h(γ′ n+1,0))−1, (17) Suppose that for any γ0 ∈C, except on a set of null Haar measure, γn converges to Id. Then there exists a unique stationary distribution π on G such that for any prior law µ0 of the error η0 supported by C, the law (µn)n≥0 of (ηn)n≥0 satisfies the total variation (T.V.) norm conver- gence property: lim n ∥µn −π∥T.V. →0 Corollary 1. When the group G is compact, The conver- gence results of Theorem 3 hold globally, i.e. without the confinement assumption 1). Theorem 4. Under the assumptions of Theorem 3, as- suming only h(γn,0) →0 instead of almost global con- vergence of (γn)n≥0, that G is compact, the set K = {g ∈ G,h(g,0) = h(Id,0)} connected and h(ϒ,0) = h(Id,0), the results of Theorem 3 are still valid. Moreover, if Wn is isotropic, we have π(˜ϒV) = π(V) for any ˜ϒ ∈K commut- ing with ϒ (˜ϒϒ = ϒ˜ϒ). The proofs of the results above have all been moved to the Appendix. 4.2 Application to attitude estimation Consider the attitude estimation example of Subsection 2.2.1. In a deterministic and continuous time setting, al- most globally converging observers have been proposed in several papers (see e.g. [37, 26]) and have since been analyzed and extended in a number of papers. In order to apply the previously developed theory to this example, the challenge is twofold. First, the deterministic observer must be adapted to the discrete time and proved to be al- most globally convergent. Then, the corresponding filter must be proved to satisfy the assumptions of the theorems above in the presence of noise. In discrete time, the sys- tem equations read: Rn+1 =WnRnΩn, Yn =(RT n b1 +V 1 n ,RT n b2 +V 2 n ) with the notations introduced in 2.2.1. We propose the following filter on SO(3): ˆR′ n+1 = ˆRnΩn, ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1, with K(y1,y2) = exp(k1(y1 ×b1)+k2(y2 ×b2)), k1 > 0, k2 > 0, k1 +k2 ⩽1 (18) Proposition 3. With noise turned off, the discrete invari- ant observer (18) is almost globally convergent, that is, the error converges to Id for any initial condition except one. The proof has been moved to the Appendix, only the main idea is given here. For the continuous-time de- terministic problem it is known, and easily seen, that E : γ →k1||γTb1 −b1||2 +k2||γTb2 −b2||2 is a Lyapunov function, allowing to prove almost global convergence of the corresponding observer. In the discrete time determin- istic case, the function above remains a Lyapunov func- tion for the sequence (γn)n≥0, allowing to derive Propo- sition 3. This is not trivial to prove, and stems from the more general following novel result: Proposition 4. Consider a Lie group G equipped with a left-invariant metric ⟨.,.⟩, and a left-invariant determinis- tic discrete equation on G of the form: γn+1 = γn exp(−k(γn)) Assume there exists a C2 function E : G →R≥0 with bounded sublevel sets, a global minimum at Id, and a con- tinuous and strictly positive function u : G →R>0 such that: ∀x ∈G,k(x) = u(x)[x−1.gradE(x)]. If the condition ∀x ∈G,| ∂k ∂x| ⩽1 (for the operator norm) is verified, for any initial value γ0 such that Id is the only critical point of E in the sublevel set {x ∈G | E(x) ≤E(γ0)} we have: γn → n→∞Id The proof has been moved to the Appendix. Note that, the latter property is closely related to Lemma 2 of [41]. Using Theorem 3 and Proposition 3 we finally directly get: Theorem 5. The distribution of the error variable of the invariant filter (18) converges for the T.V. norm to an asymptotic distribution, which does not depend on its ini- tial distribution. 4.3 Learning robust gains: application to the design of an artificial horizon As proved in 4.1, under appropriate conditions the error variable is a converging Markov chain whose asymptotic law depends on the gain function but not on the trajectory followed by the system (which is a major difference with most nonlinear filters, such as the EKF). Hence a fixed gain can be asymptotically optimized off-line, leading to a very low numerical cost of the on-line update. A classical aerospace problem is the design of an artifi- cial horizon using an inertial measurement unit (IMU). An estimation of the vertical is maintained using the observa- tions of the accelerometer (which senses the body accel- eration minus the gravity vector, expressed in the body frame) and the stationary flight assumption according to which the body’s linear velocity is constant. The prob- lem is that this approximation is not valid in dynamical phases (take-off, landing, atmospheric turbulence), which are precisely when the artificial horizon is most needed. The problem is generally stated as follows: d dt Rt = Rt(ωt +wt)×, Yn = RT tng+Vn +Nn, where Rt is the attitude of the aircraft (the rotation from body-frame coordinates to inertial coordinates), ωt is the continuous-time gyroscope increment and Yn is the obser- vation of the accelerometer. The sensor noises wt and Vn can be considered as Gaussian, and Nn represents fluctu- ations due to accelerations of the aircraft that we propose to model as follows: Nn is null with high probability but when non-zero it can take large values. The Nn’s are as- sumed to be independent as usually. 4.3.1 Convergence results Consider the following class of filters: ˆR′ n+1 = ˆRnΩn, ˆRn+1 = K( ˆR′n+1Yn+1) ˆR′n+1, with K(y) = exp( fk,λ(y)) (19) where fk,λ(x) = k.min(angle(x,g),λ) x×g ||x×g|| if x × g ̸= 0 and fk,λ(x) = 0 otherwise. The rationale for the gain tuning is as follows: if the accelerometer measures a value y, we consider the small- est rotation giving to g the same direction as y. Con- serving the same axis, the angle of this rotation is thresh- olded (hence the parameter λ) to give less weight to out- liers (without purely rejecting them, otherwise the filter couldn’t converge when initialized too far). Then we choose as a gain function a rotation by a fraction of the obtained angle. We begin with the following preliminary result: Lemma 1. For any 0 < k ⩽1 and 0 < λ ⩽π the out- put error ∥Yn −ˆRT n g∥associated to the observer (19) with noise turned off converges to 0. Proof. Let us consider the error evolution when the noise is turned off. It writes γn+1 = γn exp(−fk,λ(γ−1 n g)), thus we have γ−1 n+1g = exp( fk,λ(γ−1 n g))γ−1 n g. As for any n ∈N, fk,λ(γ−1 n g) is orthogonal to g and γ−1 n g, γ−1 n+1g stays in the plane spanned by g and γ−1 0 g, as well as the whole sequence (γ−1 n g)n⩾0. Let φn = angle(γ−1 n g,g). The dy- namics of (φn) writes: φn+1 = φn −k.min(λ,φn). Thus φn goes to 0, i.e: γ−1 n g → n→∞g, i.e. the observation error goes to 0. The following result is a mere consequence of Lemma 1 and Theorem 4. Proposition 5. The error variable associated to the filter defined by (19) converges to a stationary distribution for the T.V. norm, which does not depend on its initial distri- bution. 4.3.2 Numerical asymptotic gain optimization To each couple (k,λ) we can associate an asymptotic er- ror dispersion (computed in the Lie Algebra) associated to the corresponding stationary distribution, and try to minimize it. As all computations are to be done off- line, the statistics of all distributions can be computed using particle methods. Table 1 gives the parameters of the model used in the following numerical experi- ment. Figure 1 displays the Root Mean Square Error RMSE = p E(η∞g−g), computed over a grid for the pa- rameters (k,λ). The minimum is obtained for k = 0.1202 and λ = 0.0029. If we compare it to a MEKF, we observe a huge difference. For our asymptotic invariant filter we get RMSE = 8.02 × 10−4. For the MEKF, the observa- tion noise matrix giving the best results leads to the value RMSE = 4.3 × 10−3. This result is not surprising due to the fact that the outliers significantly pollute the estimates of the Kalman filter (see Fig. 1). This illustrates the fact that when the noise is highly non-Gaussian, an asymptotic gain with some optimality properties can still be found. 5 Gaussian filters The present section focuses more on the transitory phase, and the gain is computed at each step based on Gaussian noise approximations and linearizations. Beyond the lo- cal optimization underlying the gain computation, the two following filters have the merit to readily convey an ap- proximation of the extent of uncertainty carried by the estimations. We first introduce the discrete-time invari- ant extended Kalman filter (IEKF), which is an EKF, but based on the linearization of an invariant state error. Then we introduce the IEnKF, which computes the empirical covariance matrix of the error using particles. As all com- putations can be done off-line, it is easily implementable, as long as the gains can be stored, and is shown to con- vey a better indication of the error’s dispersion for large noises. Simulations illustrate the results. 5.1 The discrete-time Invariant EKF In the present section we introduce the discrete-time IEKF. Indeed, as in the standard EKF theory, the idea is to linearize the error equation (13)-(14) assuming the noises Figure 1: Artificial horizon example. Top plot: off-line optimization of the parameters of the asymptotic invari- ant filter (axis x →λ ; axis y →k ; axis z →Root Mean Square Error). The highest values are not displayed to improve visualization. The performance is optimal for k = 0.1202,λ = 0.0029. Bottom plot: error evolution of three artificial horizons : the invariant filter with op- timized gain functions (black), a MEKF ignoring outliers (red), and a MEKF where the observation noise covari- ance matrix has been numerically adjusted to minimize the RMSE (dashed line). We see that the MEKF cannot filter the outliers and at the same time be efficient in the absence of outliers, contrarily to the proposed invariant filter. and the state error are small enough, use Kalman equa- tions to tune the gains on this linear system, and imple- ment the gain on the nonlinear model. As in the heuristic theory of the Invariant EKF in continuous time [3] [6] the gains and the error covariance matrix are proved to con- verge to a fixed value. In particular this allows (after some time) to advantageously replace the gain by its constant fi- nal value leading to a numerically very cheap asymptotic version of the IEKF. Algorithm 1 Invariant Extended Kalman Filter Returns at each time step ˆχn, Pn such that χn ≈ exp(ξn) ˆχn, where ξn is Gaussian and var(ξn) = Pn. ˆχ0 and P0 are inputs. Hξ,HV defined by: h(exp(ξ ′ n),Vn) = h(Id,0)+Hξξ ′ n+HVVn+O(∥ξ ′ n∥2)+O(∥Vn∥2) Set n = 0. loop Compute the value Mtn+1 solving the equation Mtn = 0, d dt Mt = Var(wt)+adυtMtadT υt The process noise covariance Qw n = Var(Adϒnwn) is equal to Mtn+1 if wn is isotropic or if υt ≡0, merely set Qw n = Var(wn)(tn+1 −tn)  , QV n+1 = Var(Vn+1), Pn+1|n = AdϒnPnAdT ϒn +Qw n , Sn+1 = HVQV n+1HT V +HξPn+1|nHT ξ , Ln+1 = Pn+1|nHT ξ S−1 n+1, Pn+1 = (I −Ln+1Hξ)Pn+1|n, ˆχ′ n+1 = ϒn ˆχnΩn, ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1 n ←n+1 end loop 5.1.1 Linearization of the equations and IEKF for- mulas We consider here the error equations (13)-(14). Assuming errors and noises are small, we introduce their projection in the tangent space at the identity Id (the so-called Lie algebra g of the group) using the matrix exponential map exp(.) (see e.g. [31]). As the matrix vector space g can be identified with Rdimg using the linear mapping (.)m (see Table 2) we can assume exp : Rdimg →G. This function is defined in Table 2 for usual Lie groups of mechanics. Its inverse function will be denoted by exp−1. We thus define the following quantities of Rdimg: ξn = exp−1(ηn), ξ ′ n = exp−1(η′ n), wn = exp−1(Wn) The simplest way to design the gain is to use a function which is linear in Rdimg and then mapped to G through the exponential: Kn : y →exp[Ln(y−h(Id,0))] Equations (13) and (14) mapped to Rdimg become: ξ ′ n+1 = exp−1(exp(Adϒnwn)exp(Adϒnξn)), ξn+1 = exp−1(exp(ξ ′ n+1)exp[−Ln(h(exp(ξ ′ n+1,Vn+1))−h(Id,0))]), where we have introduced the adjoint matrix Ad: Definition 2. The following property of the adjoint matrix Ad can be used as a definition: ∀g ∈G,∀u ∈Rdimg, exp(Adgu) = gexp(u)g−1 The differential of ξ →Adexp(ξ) at zero is denoted ξ → adξ. To tune the gains Ln through the Kalman theory we need to evaluate at each time step, using Pn = Var(ξn) ∈ Rdimg×dimg, the following quantities: Pn+1|n = Var(ξ ′ n+1), Sn+1 = Var(Yn+1) Our approach consists of using a coarse first-order de- velopment of the propagation and observation functions. To do so we use the Baker-Campbell-Hausdorff formula and keep only first order terms in ξ ′ n+1, wn and Vn+1, the crossed terms being also neglected: ξ ′ n+1 = Adϒnwn +Adϒnξn h(exp(ξ ′ n+1),Vn+1) = h(Id,0)+Hξξ ′ n+1 +HVVn+1 ξn+1 = ξ ′ n+1 −Ln+1(Hξξ ′ n+1 +HVVn+1) The computation of the gains is then straightforward using the Kalman theory in Rdimg, which is a vector space. It is described in Algorithm 1. 5.1.2 Convergence of the gains The main benefit of the filter is with respect to its conver- gence properties. Indeed, under very mild conditions, the covariance matrix Pn and the filter’s gain Kn are proved to converge to fixed values [20]. The practical consequences are at least twofold: 1-the error covariance converges to a fixed value and is thus much easier to interpret by the user than a matrix whose entries keep on changing (see Fig. 2) 2-due to computational limitations on-board, the covariance may be approximated by its asymptotic value, leading to an asymptotic version of the IEKF being nu- merically very cheap. Theorem 6. If the noise matrices Qw n = Var(wn), QV n = Var(Vn) and the left inputs ϒn are fixed, the pair (Adϒn,Hξ) is observable and Qw n has full rank, then Pn and Kn converge to fixed values P∞and K∞, as in the lin- ear time-invariant case. 5.2 The discrete-time Invariant EnKF When the error equation is independent of the inputs, the exact density of the error variable can be sampled off- line. This allows to radically improve the precision of the quantities involved in the Kalman gains computation of Section 5.1.1. We propose here the Invariant Ensemble Kalman filter (IEnKF) described in Algorithm 2. The idea is to compute recursively through Monte-Carlo simula- tions a sampling of the error density and to use it, instead of linearizations, to evaluate precisely the innovation and error covariance matrices used to compute the gains in the Kalman filter. The procedure is described in Algorithm 2. 5.3 Simulations We will display here the results of Algorithms 1 and 2 for the attitude estimation problem described in Section 2.2.1: d dt Rt = Rt(ωt +wω t )×, Yn = (RT tnb1 +V 1 n ,RT tnb2 +V 2 n ), where Rt ∈SO(3) represents the rotation that maps the body frame to the earth-fixed frame, and ωt is the instan- taneous angular rotation vector which is assumed to be measured by noisy gyroscopes. (Yn)n≥0 is a sequence of Algorithm 2 IEnKF Define H by h(exp(ξ),0) = h(Id,0)+Hξ +O(∥ξ∥2) Sample M particles (ηi 0)1⩽i⩽M following the prior error density for n = 0 to N −1 do for i = 1 to M do η′i n+1 = W i nϒηi nϒ−1, yi n+1 = h(η′i n+1,V i n+1), end for Pn+1|n = 1 M ∑M i=1 exp−1(η′i n+1)exp−1(η′i n+1)T Sn+1 = 1 M ∑M i=1 yi n+1yi n+1 T, Ln+1 = Pn+1|nHTS−1 n+1 for i = 1 to M do ηi n+1 = η′i n+1 exp(−Ln+1[yi n+1 −h(Id,0)]) end for Store the gain matrix Ln+1 end for The prior estimation ˆχ0 is an input. for n = 0 to N-1 do ˆχ′ n+1 = ϒ ˆχnΩn, ˆχn+1 = exp(Ln+1[ ˆχ′n+1Yn+1 −h(Id,0)]) ˆχ′n+1 end for discrete noisy measurements of the vectors b1 and b2 (for instance the gravity and the magnetic field), V 1 n and V 2 n are sequences of independent isotropic white noises. A sim- ulation has been performed using the parameters given in Table 3. In this case the IEKF equations are described by Algorithm 3. As the state-of-the-art methods for this par- ticular problem are the Multiplicative Extended Kalman Filter (MEKF, see e.g. [12]) and the USQUE filter [11] (a quaternion implementation of the Unscented Kalman Filter), the four methods are compared. The evolution of the gains is displayed on Fig. 2 and shows the interest of the invariant approach. The error variable is expressed in the Lie algebra and its projection to the first axis is given on Fig. 3 for each method (the projections on other axes being very similar, they are not displayed due to space limitations). All perform satisfactorily, but some differ- ences are worthy of note. First we can see that the MEKF and the IEKF yield comparable performances but only the gains of the IEKF converge. Moreover, the linearizations lead the MEKF and the IEKF to fail capturing accurately the error dispersion through a 3σ-envelope. To this re- spect, the USQUE performs better than the two latter fil- ters but still does not succeed in capturing the uncertainty very accurately. On the other hand, the envelope provided by the EnKF is very satisfying. This result is not surpris- ing: this envelope has been computed using a sampling of the true density of the error, there is thus no reason why it should not be a valid approximation as long as there are sufficiently many particles. This method is thus to be pre- ferred to the other ones when the user is willing and able to perform extensive simulations, and has the capacity to store the gains over the whole trajectory. Algorithm 3 Invariant Extended Kalman Filter on SO(3) Returns at each time step ˆRn, Pn such that Rn ≈ exp(ξn) ˆRn, where ξn is a centered Gaussian and var(ξn) = Pn. ˆR0 and P0 are inputs. Hξ =  (b1)× (b2)×  , HV = I6, QV =  Var(V 1) 0 0 Var(V 2)  , Qw = R tn+1 tn Var(wω s )ds. Set n = 0. loop Let Ωn be the solution at tn+1 of Ttn = I3, d dt Tt = (ωt)×. Pn+1|n = Pn +Qw, Sn+1 = HVQVHT V +HξPn+1|nHT ξ , Ln+1 = Pn+1|nHT ξ S−1 n+1, Pn+1 = (I −Ln+1Hξ)Pn+1|n, ˆR′ n+1 = ˆRnΩn ∆χ = Ln+1[  ˆR′n+1Y 1 n+1 ˆR′n+1Y 2 n+1  −  b1 b2  ], ˆRn+1 = exp(∆χ) ˆR′n+1 (see Table 2) n ←n+1 end loop 6 Conclusion This paper has introduced a proper stochastic framework for a class of filtering problems on Lie groups. A wide class of discrete-time filters with strong properties has Figure 2: Evolution over time of the coefficients of the gain matrix for MEKF, IEKF, USQUE and IEnKF. The coefficients of the MEKF and USQUE have an erratic evolution, whereas those of the IEKF are convergent, al- lowing to save much computation power using asymptotic values. Moreover, in this case, only 4 of the 18 coeffi- cients are not equal to zero for the IEKF, allowing sparse implementation. been proposed, along with several methods to tune the gains. When the gain is held fixed, the estimation error has the remarkable property to provably converge in dis- tribution. Another route is to focus more on the transi- tory phase, and to seek to minimize the dispersion of the estimation error at each step based on linearizations and Gaussian noise approximations. This has lead to an in- variant version of the celebrated EKF, having the property that the error covariance matrix converges asymptotically, as well as to an invariant version of the EnKF which has been shown to convey an accurate indication of the level of uncertainty carried by the estimate. The results of the paper open the door to a rich vari- ety of filters on Lie groups. Indeed, all the classical gain tuning approaches can still be considered, and a method based on the unscented transform for instance, complet- ing the work [11] seems an interesting idea. In a situation where many observations are available at each time step, an information filter could also be derived mimicking the IEKF. Besides, we believe that the cases where the gains of the IEKF are sparse should be given special attention, Figure 3: Evolution of the error variable, projected on the first axis, for 1000 simulations using MEKF, IEKF, USQUE and IEnKF. The MEKF and the IEKF give sim- ilar results. The USQUE is doing slightly better but is outperformed by the IEnKF. Only the latter captures the error dispersion satisfactorily, as the 3σ envelope contains 99 % of the simulated trajectories. by studying when this property occurs and if significant computation power and memory can be saved. The possi- bilities offered by off-line simulations deserve also to be deepened. The computation of exact confidence sets can be explored, establishing a link with the approach of set- valued observers proposed in [8] and [38]. Concerning the asymptotic method introduced here, the optimization has been performed quite naively as efficient density sim- ulation is beyond the scope of the paper, but there may certainly be some room for improvements using Markov Chain Monte-Carlo (MCMC) schemes to speed up con- vergence. In the future we will concentrate on proving the convergence in distribution of the IEKF, that can be conjectured at this stage, but does not seem easy. A Proof of Proposition 2 For any i ∈N, let Mi(R) denote the set of square matri- ces of size i. Consider the isomorphisms φX : X ∈RN → IdN 0N,1 XT 1  ∈MN+1(R) and φY : Y ∈Rp →  HT Y T  ∈ MN+1,p. Letting χn = φX(Xn), Y ∗ n = φY(Yn), V ∗ n = φY(Vn), W ∗ n = φX(Wn), Ωn = φX(Bn) and ϒn = IdN+1 equations (7) become χn+1 = ϒnW ∗ n χnΩn and Y ∗ n = χ−1 n V ∗ n . Using the inverse isomorphisms φ −1 X and φ −1 Y , the invariant update ˆχn+1 = Kn( ˆχ′ n+1Y ∗ n+1) ˆχn+1 corresponds in Rn to (7). B Proofs of the results of Section 4.1 We first introduce here the basic notions about Harris Chains that we will need, to prove the stochastic conver- gence properties of the invariant filters. Definition 3. (First hitting time) Let S be a measurable space and (Xn)n⩾0 a Markov chain taking values in S. The first hitting time τC of a subset C ⊂S is defined by τC = infn⩾0 {Xn ∈C}. Definition 4. (Recurrent aperiodic Harris chain) Let S be a measurable space and (Xn)n⩾0 a Markov chain taking values in S. (Xn)n⩾0 is said to be a recurrent aperiodic Harris chain if there exist two sets A,B ⊂S satisfying the following properties: i For any initial state X0 the first hitting time of A is a.s. finite. ii There exists a probability measure ρ on B, and ε > 0 such that if x ∈A and D ⊂B then P(X1 ∈D|X0 = x) > ερ(D). iii There exists an integer N ⩾0 such that: ∀x ∈A,∀n ⩾ N,P(Xn ∈A|X0 = x) > 0. The somewhat technical property ii means that any given area of B can be reached from each point of A with non-vanishing probability. Theorem 7. [Harris, 1956] A recurrent aperiodic Harris chain admits a unique stationary distribution ρ∞and the density of the state Xn converges to ρ∞in T.V. norm for any distribution of X0. Proof. See [16], Theorems 6.5 and 6.8 In other words, if for any initialization we can ensure that the process will come back with probability 1 to a central area A where mixing occurs, we have a conver- gence property. The following technical result is a cor- nerstone to demonstrate the theorems of Subsection 4.1. Lemma 2. Let G be a locally compact Lie group pro- vided with a right-invariant distance d, and C ⊂G be a measurable compact set endowed with the σ-algebra Σ. Consider a homogeneous Markov chain (ηn)n⩾0 defined by the relation η0 ∈C, ηn+1 = qz(ηn) where z ∈Z is some random variable belonging to a measurable space. Let 0 denote a specific point of Z . Let Q : (C,Σ) →R+, denote the transition kernel of the chain, that is Q(x,V) = P(qz(ηn) ∈V | ηn = x). We assume that (a) there exist real numbers α,ε > 0 such that for any x ∈ C and V ⊂Bo(q0(x),α), the latter denoting the open ball center q0(x) and radius α, we have Q(x,V) = P(qz(ηn) ∈V | ηn = x) ⩾εµG(V) where µG is the right-invariant Haar measure. (b) q0 admits a fixed point x0 ∈C, i.e. q0(x0) = x0. Let U0 = {x0}. Define the sets (Un)n⩾0 recursively by U′ n = {x ∈G,d(x,Un) < α 2 } ∩C and Un+1 = q−1 0 (U′ n). If there exists an integer N ⩾0 such that UN = C, then the p.d.f. of ηn converges for the T.V. norm and its limit does not depend on the initialization. More prosaically, z denotes the cumulated effect of the model and observation noise, and it is assumed that 1- the noiseless algorithm has some global convergence proper- ties and 2- there are sufficiently many small enough noises so that there is an ε chance for ηn to jump from x to any neighborhood of the noiseless iterate q0(x) with a multi- plicative factor corresponding to the neighborhood’s size. It then defines a sequence of sets, by picking a fixed point of q0, dilating it, and taking its pre-image. Dilatation and inversion are then reiterated until the whole set C is cov- ered. In this case, forgetting of the initial distribution is ensured. Proof. We will demonstrate the property trough three in- termediate results: 1. There exists a sequence ε1,··· ,εN such that: P(η1 ∈ Un|η0 ∈Un+1) > εn > 0. 2. The first hitting time of U1 is a.s. finite. 3. (ηn)n≥0 is an aperiodic recurrent Harris chain with A = U1 and B = U′ 0. The conclusion will then immediately follow from The- orem 7. We give first a qualitative explanation of the approach. We have by assumption a sequence of sets (Un)n⩾0. Intermediate result 1) states that starting from a set Un+1 the chain has a non-vanishing chance to jump to the “smaller” set Un at the next time step. Interme- diate result 2) states that starting from Un+1, because of the non-vanishing property 1), the chain cannot avoid Un forever and eventually reaches Un, Un−1 and finally U1. Intermediate result 3) brings out the fact that, A = U′ 0 be- ing totally included in the ball of radius α centered on any of its points, the chain can go from any given point of its pre-image B = U1 to any area of A with controlled probability, which is the last property needed to obtain a recurrent aperiodic Harris chain. 1. Consider the closure U′n of U′ n. For any x ∈U′n the set Bo(x,α)∩Un is open and non-empty (as d(x,Un) ⩽ α 2 by definition of U′n) so µG(Bo(x,α) ∩Un) > 0. The function f : x →µG(Bo(x,α)∩Un) > 0 is con- tinuous (since µG is a regular and positive mea- sure) on the compact set U′n, so it admits a mini- mum mn > 0. We get: P(η1 ∈Un|η0 ∈Un+1) ≥ P(η1 ∈Bo(q0(η0),α) ∩Un|η0 ∈Un+1) = P(η1 ∈ Bo(q0(η0),α) ∩Un|q0(η0) ∈U′ n) ≥εmn using the fact that, by assumption (a), Q(x,V) ≥εµG(V) for x ∈C and V ⊂Bo(q0(x),α). We set εn = εmn to prove intermediate result 1). 2. We will prove by descending induction on n the property Pn : P(τn < ∞) = 1. PN is obvious as UN =C. Now assume Pn+1 is true for n ⩾1. Due to homogeneity, we can construct a strictly increasing sequence of stopping times (νp)p⩾0 such that: ∀p ⩾ 0,ηνp ∈Un+1. But P(ηνp+1 ∈Un|ηνp ∈Un+1) > εn by intermediate result 1). Thus letting Tk denote the event {∀i ⩽k,ηi /∈Un} = {k < τn}, we have for any n ⩾0: P(Tνp+1) = P(Tνp)P(Tνp+1|Tνp) ⩽P(Tνp)(1− εn). Thus P(Tνp+1) ⩽P(Tνp+1) ⩽P(Tνp)(1−εn). A quick induction and a standard application of Borel- Cantelli Lemma prove there exists a.s. a rank p such that Tνp is false. In other words, Pn is true. In par- ticular, P1 is true: the first hitting time of U1 is a.s. finite. 3. Define A =U1 and B =U′ 0 = Bo(x0, α 2 )∩C . For any V ⊂B and x ⊂A we have V ⊂Bo(q0(x),α), thus Q(x,V) ⩾εµG(V) by assumption (a). Thus (ηn)n⩾0 verifies property ii) of Definition 4. As A = U1 the intermediate result 2) shows that (ηn)n⩾0 verifies the property i) of Definition 4. As there exists a.s. a rank k such that ηk ∈U1 by intermediate result 2), there exists an integer M such that P(ηM ∈U1) > 0. As U1 ∩U′ 0 ⊂Bo(q(x),α) for any x ∈U1 (by defini- tion of U′ 0 and U1) Assumption (a) gives immediately P(ηM+1 ∈U1∩U′ 0) > P(ηM ∈U1)εµG(U1∩U′ 0). Us- ing only the definition of U′ 0 and U1, and Assump- tion (b), we see easily that both contain x0. Thus the set U′ 0 ∩U1 is non-empty and open. We have then µG(U1 ∩U′ 0) > 0 and P(ηM+1 ∈U1 ∩U′ 0) > 0. By an obvious induction on m, P(ηm ∈U1 ∩U′ 0) > 0 for any m > M. As U1 ∩U′ 0 ⊂A we get P(ηm ∈A) > 0 for any m > M and the property iii) of Definition 4 is also verified. We finally obtain that (ηn)n⩾0 is an aperiodic recurrent Harris chain. Using Theorem 7 we can conclude that the density of ηn converges in T.V. norm to its unique equilibrium distribu- tion for any initialization. Building upon the previous results, the proof of Theo- rem 3 is as follows. The Markov chain defined by equa- tions (13) and (14) with fixed ϒ can be written under the form η′ n+1 = qz(η′ n) = q1,z ◦q2,z(η′ n) with z = (Wn,Vn) ∈ G×Rp, q1,z(x) = ϒWnxϒ−1 and q2,z(x) = xK(h(x,Vn))−1. Let 0 = (Id,0). We will show that (η′ n)n⩾0 has all the properties required in Lemma 2. Let α,ε,ε′ be defined as in 4.1. Let α′ such as Bo(Id,α′) ⊂ϒBo(Id, α 2 )ϒ−1 and ε′′ = ε′|Adϒ−1|ε, where |Adϒ−1| is the determinant of the adjoint operator on g. For any x ∈C and V ⊂Bo(q0(x),α′) = Bo(Id,α′)q0(x) we have: P(qz(x) ∈V) ⩾P(qz(x) ∈V,q2,z(x) ∈Bo(q2,0(x), α 2 )) ⩾P(q2,z(x) ∈Bo(q2,0(x), α 2 ))× P(ϒWnq2,z(x)ϒ−1 ∈V|q2,z(x) ∈Bo(q2,0(x), α 2 )) i.e. P(qz(x) ∈ V) ⩾ ε′P(Wn ∈ ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α 2 )). As- suming q2,z(x) ∈ Bo(q2,0(x), α 2 ) we have: ϒ−1Vϒq2,z(x)−1 ⊂ ϒ−1Bo(Id,α′)q0(x)ϒq2,z(x)−1 ⊂ Bo(Id, α 2 )ϒ−1q0(x)ϒq2,z(x)−1 ⊂ Bo(Id, α 2 )q2,0(x)q2,z(x)−1 ⊂Bo(Id, α 2 + α 2 ). As Wn is independent from the other variables we obtain: P(Wn ∈ϒ−1Vϒq2,z(x)−1|q2,z(x) ∈Bo(q2,0(x), α 2 )) ⩾ εµG(ϒ−1Vϒq2,z(x)−1) = εµG(ϒ−1Vϒ) = εµG(V)|Adϒ−1| And finally: P(qz(x) ∈V) ⩾ε′|Adϒ−1|εµ(V) = ε′′µ(V) As q0 has Id as a fix point we only have to ver- ify that the sets Un as defined in Lemma 2 eventually cover the whole set C. It suffices to consider for any n ⩾0 the set Dn = {x ∈C,∀k ⩾n,qk 0(x) ∈U′ 0}. As we have qn 0(x) →Id almost-everywhere on C we get: µG(∪n⩾0Dn) = µG(C). As the sequence of sets (Dn)n⩾0 increases we have: µG(Dn) −→ n→∞µG(C). We introduce here the quantity vmin = minx∈C µG(Bo(x, α′ 2 ) ∩C) (the property C = cl(Co) and the regularity of µG ensure that we have vmin > 0). Let N ∈N be such that µG(DN) > µG(C)−vmin. We have then: ∀y ∈C,d(y,DN) < α′ 2 (oth- erwise we would have µG(DN) ⩽µG(C) −vmin). As we have DN ⊂UN we obtain ∀y ∈C,d(y,DN) < α′ 2 thus U′ N = C and UN+1 = q−1 0 (U′ N) = q−1 0 (C) = C. So all the conditions of Lemma 2 hold and we can conclude con- vergence in T.V. norm to a stationary distribution which doesn’t depend on the prior, provided that its support lies in C. Theorem 3 is proved. Corollary 1 follows directly from the case C = G. Theorem 4 can be proved as follows. Let C = G. Let K = {x ∈G,h(x,0) = h(Id,0)}. Note that the left-right equivariance assumption ensures that K is a subgroup of G. First we show that there exists an integer N1 such that K ⊂UN1. As we have ∀x ∈K q0(x) = ϒxϒ−1 the se- quence of sets Qn = ϒ−nUnϒn∩K is growing and we have: {x ∈K, d(x,Qn) < α 2 } ⊂Qn+1.The set Q∞= ∪∞ n=1Qn is open in K as an union of open sets, but we have ∀x ∈ Q∞,Bo(x, α 2 )∩K ⊂Q∞thus ∀x ∈K \Q∞,Bo(x, α 4 )∩K ⊂ K \ Q∞. This implies that Q∞and K \ Q∞are both open in K. As K is connected (see assumptions of Theorem 4) we obtain Q∞= K. As K is compact (as a closed sub- set of a compact) the open cover ∪∞ i=1Qn has a finite sub- cover and there exists an integer N1 such that QN1 = K, i.e. ϒN1Kϒ−N1 = UN1. As K is a subgroup of G (see above) containing ϒ we obtain K ⊂UN1. Now we have to prove that the sets (Un)n⩾0 eventually cover the whole set C. For any x ∈G we have h(qn 0(x),0) →h(Id,0). Thus there exists a rank n such that ∀k ⩾n,qk 0(x) ∈U′ N (oth- erwise we could extract a subsequence from (qn 0(x))n⩾0 which stays at a distance > α 2 from K, as G is com- pact we could extract again a convergent subsequence and its limit q∞ 0 (x) would be outside K, thus we would have h(qn 0(x)) ̸→h(Id,0)). Here we can define as in the proof of Theorem 3 the sets Dn = {x ∈C,∀k ⩾n,qk 0(x) ∈U′ N1} (note that U′ 1 has been replaced by U′ N1 in this definition of Dn). As for almost any x ∈G there exists a rank n such that x ∈Dn we have µG(Dn) →µG(C) and as in the proof of Theorem 3 the sets Un eventually cover the set C. The second part of the property (invariance to left mul- tiplication) is easier. Consider the error process ζn = ˜ϒηn. The propagation and update steps read: ζ ′ n+1 = ˜ϒη′ n+1 = ˜ϒϒWnηnϒ−1 = ϒ˜ϒWnηnϒ−1 L= ϒWn ˜ϒηnϒ−1 L= ϒWnζϒ−1 ζn+1 = ˜ϒη′ n+1Kn+1(h(η′ n+1,Vn+1))−1 L= ˜ϒη′ n+1Kn+1(h(˜ϒη′ n+1,Vn+1))−1 L= ζ ′ n+1Kn+1(h(ζ ′,Vn+1))−1 where we have used the property: h(η′ n+1,Vn+1) L= η′−1 n+1h(Id,Vn+1) L= η′−1 n+1h(˜ϒ,Vn+1) L= h(˜ϒη′ n+1,Vn+1). We see that the law of the error process is invariant under left multiplication by ˜ϒ, thus the asymptotic distribution π in- herits this property. C Proofs of the results of Section 4.2 C.1 Proof of Proposition 4 We define the continuous process ˜γt : R →G as follows: ∀n ∈N, ˜γn = γn ∀n ∈N,∀t ∈[n,n+1[, d dt ˜γt = −˜γtk( ˜γn) We obtain immediately that ˜γt is continuous. Besides, for any n ∈N and t ∈]n,n+1[ one has: | d dt ⟨k( ˜γt),k( ˜γn)⟩| = |⟨d dt k( ˜γt),k( ˜γn)⟩| ⩽| d dt k( ˜γt)||k( ˜γn)| ⩽| d dt ˜γt||k( ˜γn)| (due to |∂k ∂x| ⩽1 ) ⩽|k( ˜γn)|2 (as d dt ˜γt = −˜γtk( ˜γn) ) Thus: ⟨k( ˜γt),k( ˜γn)⟩⩾⟨k( ˜γn),k( ˜γn)⟩−(t −n)|k( ˜γn)|2 ⩾0 proving in the Lie algebra that ⟨k( ˜γt),k( ˜γn)⟩⩾0. As u is a positive function we have thus u( ˜γt)−1⟨˜γtk( ˜γt),−˜γtk( ˜γn)⟩⩽0 immediately proving ⟨gradE( ˜γt), d dt ˜γt⟩⩽0. The latter result being true for every t ∈R\N and the function E( ˜γt) being continuous, it decreases and thus converges on R≥0. As we have supposed the sublevel sets of E are bounded (and closed as E is continuous), ˜γt is stuck in a compact set. Let γ∞an adherence value of ˜γn. By continuity of E we have E( ˜γ∞) = E( ˜γ∞exp(−k( ˜γ∞)). Thus the function t →E( ˜γ∞exp(−tk( ˜γ∞))) is decreasing on [0,1] and has the same value at 0 and 1. Thus it is constant, its deriva- tive at 0 is null proving γ∞= Id. ( ˜γn)n⩾0 being confined in a compact set and having Id as unique adherence value we finally get γn = ˜γn →Id. C.2 Proof of Proposition 3 Let γn = Rn ˆRT n be the invariant error. Its equation reduces to: γn+1 = γn.exp(−k1(γT n b1)×b1 −k2(γT n b2)×b2) Let k(γ) = k1(γTb1) × b1 + k2(γTb2) × b2 and E : γ → k1||γTb1 −b1||2 + k2||γTb2 −b2||2. To apply Proposi- tion 4 we will first verify that 1) ∀γ ∈SO(3),γ.k(γ) = gradE(γ) 2) | ∂k ∂γ | ⩽1. To prove 1) consider the dynamics in ∈SO(3) defined by d dt γt = γt(ψt)× for some rotation vector path ψt. We have d dt E(γt) = k1(γT t b1 −b1)T d dt γT t b1 + k2(γT t b2 −b2)T d dt γT t b2. Using triple product equalities this is equal to ⟨k1(γT t b1)×b1 + k2(γT t b2)×b2,ψt⟩= ⟨k(γt),ψt⟩= ⟨γt.k(γt), d dt γt⟩. Thus γ.k(γ) = gradE(γ). To prove 2) we analogously see that d dt k(γt) = −[k1(b1)×(γT t b1)× +k2(b2)×(γT t b2)×]ψt. Thus | d dt k(γt)| ⩽(k1 +k2)|ψt| and finally | ∂k ∂γ | ⩽1. Now, except if initially γ0 is the rotation of axis b1×b2 and angle π, the function E is strictly decreasing, and Id is the only point in the sublevel set {γ ∈SO(3) | E(γ) ≤E(γ0)} such that gradE(γ) = 0. Applying Proposition 4 allows to prove Proposition 3. References [1] S. Bonnabel A. Barrau. Intrinsic filtering on SO(3) with discrete-time observations. In IEEE Confer- ence on Decision and Control, 2013. [2] M. Barczyk and A. F. Lynch. Invariant observer de- sign for a helicopter uav aided inertial navigation system. Control Systems Technology, IEEE Trans- actions on, 21(3):791–806, 2013. [3] S. Bonnabel. Left-invariant extended Kalman filter and attitude estimation. In Decision and Control, 2007 46th IEEE Conference on, pages 1027–1032. IEEE, 2007. [4] S. Bonnabel, Ph. Martin, and P. Rouchon. Symmetry-preserving observers. Automatic Con- trol, IEEE Transactions on, 53(11):2514–2526, 2008. [5] S. Bonnabel, Ph. Martin, and P. Rouchon. Non-linear symmetry-preserving observers on Lie groups. IEEE Trans. on Automatic Control, 54(7):1709 – 1713, 2009. [6] S. Bonnabel, Ph. Martin, and E. Salaun. Invariant extended Kalman filter: theory and application to a velocity-aided attitude estimation problem. In De- cision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on, pages 1297–1304. IEEE, 2009. [7] G. Bourmaud, R. M´egret, A. Giremus, Y. Berthoumieu, et al. Discrete extended Kalman filter on Lie groups. In 21st European Signal Processing Conference 2013 (EUSIPCO 2013), 2013. [8] S. Br´as, P. Rosa, C. Silvestre, and P. Oliveira. Global attitude and gyro bias estimation based on set-valued observers. Systems & Control Letters, 62(10):937– 942, 2013. [9] R. W. Brockett. Lie algebras and Lie groups in con- trol theory. In Geometric methods in system theory, pages 43–82. Springer, 1973. [10] D.L. Burkholder, ´E. Pardoux, A.S. Sznitman, and P.L. Hennequin. ´Ecole d’´et´e de probabilit´es de Saint-Flour XIX, 1989. Springer-Verlag, 1991. [11] J. L. Crassidis and F Landis Markley. Unscented filtering for spacecraft attitude estimation. Journal of guidance, control, and dynamics, 26(4):536–542, 2003. [12] J. L. Crassidis, F. Landis Markley, and Yang Cheng. A survey of nonlinear attitude estimation meth- ods. Journal of Guidance, Control,and Dynamics, 30(1):12–28, 2007. [13] T.E. Duncan. Some filtering results in Riemann manifolds. Information and Control, 35(3):182– 195, 1977. [14] T.E. Duncan. Stochastic systems in Riemannian manifolds. Journal of Optimization theory and ap- plications, 27(3):399–426, 1979. [15] T.E. Duncan. An estimation problem in compact Lie groups. Systems & Control Letters, 10(4):257–263, 1988. [16] R. Durrett. Probability: theory and examples, vol- ume 3. Cambridge university press, 2010. [17] Y. Han and F.C. Park. Least squares tracking on the Euclidean group. Automatic Control, IEEE Trans- actions on, 46(7):1127–1132, 2001. [18] K. Itˆo. Stochastic differential equations in a differ- entiable manifold. Nagoya Mathematical Journal, 1:35–47, 1950. [19] S.J. Julier and J.K. Uhlmann. A new exten- sion of the Kalman Filter to Nonlinear Systems. Proc. of AeroSense : The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Con- trols, 1997. [20] R. Kalman and R. Bucy. New results in linear filter- ing and prediction theory. Basic Eng., Trans. ASME, Ser. D,, 83(3):95–108, 1961. [21] Ch. Lageman, J. Trumpf, and R. Mahony. Gradient- like observers for invariant dynamics on a Lie group. Automatic Control, IEEE Transactions on, 55(2):367–377, 2010. [22] E. J. Lefferts, F Landis Markley, and M. D. Shus- ter. Kalman filtering for spacecraft attitude estima- tion. Journal of Guidance, Control, and Dynamics, 5(5):417–429, 1982. [23] M. Liao. L´evy processes in Lie groups. Cambridge University Press, 2004. [24] JT-H Lo. Optimal estimation for the satellite atti- tude using star tracker measurements. Automatica, 22(4):477–482, 1986. [25] J. Trumpf M. Zamani and R. Mahony. Near-optimal deterministic filtering on the rotation group. IEEE Trans. on Automatic Control, 56:6:1411 – 1414, 2011. [26] R. Mahony, T. Hamel, and J-M Pflimlin. Non- linear complementary filters on the special orthog- onal group. IEEE-Trans. on Automatic Control, 53(5):1203–1218, 2008. [27] F Landis Markley, J. L. Crassidis, and Y. Cheng. Nonlinear attitude filtering methods. In AIAA Guid- ance, Navigation, and Control Conference, 2005. [28] Ph. Martin and E. Sala¨un. Invariant observers for attitude and heading estimation from low-cost iner- tial and magnetic sensors. In 46th IEEE Conf Decis Contr, pages 1039–1045, 2007. [29] S.K. Ng and P.E. Caines. Nonlinear filtering in Rie- mannian manifolds. IMA journal of mathematical control and information, 2(1):25–36, 1985. [30] H. Nijmeijer and T. I. Fossen. New directions in non- linear observer design, volume 244. Springer, 1999. [31] P. J. Olver. Classical Invariant Theory. Cambridge University Press, 1999. [32] W. Park, Y. Liu, Y. Zhou, M. Moses, G. S. Chirikjian, et al. Kinematic state estimation and mo- tion planning for stochastic nonholonomic systems using the exponential map. Robotica, 26(4):419– 434, 2008. [33] M. Pontier and J. Szpirglas. Filtering on manifolds. In Stochastic Modelling and Filtering, pages 147– 160. Springer, 1987. [34] A. Saccon, J. Trumpf, R. Mahony, and A. P. Aguiar. Second-order-optimal filters on lie groups. [35] S. Said. Estimation and filtering of processes in ma- trix Lie groups. PhD thesis, Institut National Poly- technique de Grenoble-INPG, 2009. [36] S. Said and J. H. Manton. On filtering with observa- tion in a manifold: Reduction to a classical filtering problem. SIAM Journal on Control and Optimiza- tion, 51(1):767–783, 2013. [37] S. Salcudean. A globally convergent angular ve- locity observer for rigid body motion. Automatic Control, IEEE Transactions on, 36(12):1493–1497, 1991. [38] A. K. Sanyal, T. Lee, M. Leok, and N. H. Mc- Clamroch. Global optimal attitude estimation using uncertainty ellipsoids. Systems & Control Letters, 57(3):236–245, 2008. [39] A. Tayebi, S. McGilvray, A. Roberts, and M. Moallem. Attitude estimation and stabilization of a rigid body using low-cost sensors. In Decision and Control, 2007 46th IEEE Conference on, pages 6424–6429. IEEE, 2007. [40] J. Thienel and Robert M Sanner. A coupled non- linear spacecraft attitude controller and observer with an unknown constant gyro bias and gyro noise. Automatic Control, IEEE Transactions on, 48(11):2011–2015, 2003. [41] R. Tron, B. Afsari, and R. Vidal. Riemannian con- sensus for manifolds with bounded curvature. 2012. [42] J.F. Vasconcelos, R. Cunha, C. Silvestre, and P. Oliveira. A nonlinear position and attitude ob- server on SE(3) using landmark measurements. Sys- tems Control Letters, 59:155–166, 2010. [43] A. S. Willsky. Some estimation problems on Lie groups. In D.Q. Mayne and R.W. Brockett, editors, Geometric Methods in System Theory, volume 3 of NATO Advanced Study Institutes Series, pages 305– 314. Springer Netherlands, 1973. [44] A. S. Willsky and S. I. Marcus. Estimation for bilin- ear stochastic systems. In Variable Structure Sys- tems with Application to Economics and Biology, pages 116–137. Springer, 1975. [45] A.S. Willsky. Dynamical Systems Defined on Groups: Structural Properties and Estimation. PhD thesis, MIT Dept. of Aeronautics and Astronautics, May 1973. [46] K. C. Wolfe, M. Mashner, and G. S. Chirikjian. Bayesian fusion on Lie groups. Journal of Algebraic Statistics, 2(1):75–97, 2011. [47] E. Wong and M. Zakai. Riemann-Stieltjes approxi- mations of stochastic integrals. Probability Theory and Related Fields, 12(2):87–97, 1969. Table 1: Artificial horizon: experiment parameters Standard Deviation of the model noise 0.01 degree = 1.75×10−4 rad Standard Deviation of the regular observations 0.1 degree = 1.75×10−3 rad Standard Deviation of the outliers 30 degrees Probability of the outliers to occur 0.01 Table 2: Useful formulas for some classical matrix Lie groups G SO(3) (Rdimg = R3) (Rdimg = R6) Embedding of G R ∈M3(R),RTR = I3 R v 0 1 ! ,R ∈SO(3),u ∈R3 Embedding of g A3 : ψ ∈M3(R),ψT = −ψ ψ u 0 0 ! ,ψ ∈A3,u ∈R3 (.)m ξ →(ξ)× ξ u ! → (ξ)× u 0 0 ! x →Adx R →R R T ! → R 0 (T)×R R ! ξ →adξ ξ →(ξ)× ξ u ! → (ξ)× 0 (u)× (ξ)× ! exp exp(x) = I3 + sin(||x||) ||x|| (x)× exp(x) = I4 +(x)m + [1−cos(||x||)] ||x||2 (x)2 × + 1 ||x||2 (1−cos|x|)(x)2 m + 1 ||x||3 (||x||−sin||x||)(x)3 m Table 3: Observation of two vectors: parameters Parameter b1 b2 QV1 QV2 Value (1,0,0) (0,1,0) 0,08732I3 0,08732I3 Parameter P0 Qw N Simulations Value 0.52362I3 0,017452I3 50 1000