arXiv:1204.0067v1 [cs.RO] 31 Mar 2012

Estimating Rigid Transformation Between Two Range Maps Using Expectation Maximization Algorithm

Shuqing Zeng †

Abstract — We address the problem of estimating a rigid transformation between two point sets, which is a key module in target tracking systems using Light Detection And Ranging (LiDAR). A fast implementation of the Expectation-Maximization (EM) algorithm is presented whose complexity is $O(N)$, with $N$ the number of scan points.

I. INTRODUCTION

Rigid registration of two sets of points sampled from a surface has been widely investigated (e.g., [1], [4]–[6], [9]) in the computer vision literature. Generally, these methods are designed to tackle range maps with dense points in non-realtime applications. In [2], [8] scans are matched using iterative closest line (ICL), a variant of the "normal-distance" form of the ICP algorithm [1] originally proposed in the computer vision community by [3]. However, the convergence of this approach is sensitive to errors in the estimated normal directions [10].

[Fig. 1. Illustration of the proposed algorithm: model points $m_j$, scan points $s_k$, edge weights $w(s_k, m_j)$, and the rigid transformation $T$.]

Fig. 1 illustrates the concept. The light green circles denote the contour of a target $\mathcal{M}$. The red circles are the projection of $\mathcal{M}$ under a rigid transformation $T$, denoted $\tilde{\mathcal{M}}$. Let $\mathcal{S}$ be the current range image, shown as upper triangles. We propose an Expectation-Maximization (EM) algorithm [4], [7] to find the rigid transformation such that the projected range image best matches the current image. Each point $m_j \in \tilde{\mathcal{M}}$ is treated as the center of a Parzen window. There is an edge between $s_k \in \mathcal{S}$ and $m_j$ if $s_k$ lies in the window. The weight of the edge $(s_k, m_j)$ is based on the proximity of the two vertices: the larger the weight of an edge, the thicker the line drawn in the figure, and the stronger the force that pulls the point $m_j$ toward $s_k$ through $T$.
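The proximity-based edge weights of Fig. 1 can be sketched concretely. The following is an illustrative 2-D example only, assuming an isotropic Gaussian falloff with a hypothetical bandwidth `sigma` and window radius `W`; the weights actually used by the algorithm are derived in Section II:

```python
import math

def edge_weight(s, m, sigma=0.5, W=1.0):
    """Proximity weight of the edge (s, m): Gaussian in the distance,
    zero outside the Parzen window of radius W (no edge)."""
    d = math.hypot(s[0] - m[0], s[1] - m[1])
    if d >= W:
        return 0.0  # s lies outside m's window: no edge in the graph
    return math.exp(-d * d / (2.0 * sigma ** 2))

# A nearby pair gets a heavy edge; a distant pair gets none.
w_near = edge_weight((0.1, 0.0), (0.0, 0.0))
w_far = edge_weight((2.0, 0.0), (0.0, 0.0))
```

Heavier edges pull the corresponding model point more strongly toward the scan point during the transformation estimate.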
This document describes a fast implementation of the expectation-maximization (EM) algorithm [7] to locally match $\mathcal{M}$ and $\mathcal{S}$. By exploiting the sparsity of the local matching matrix, the implementation scales linearly with the number of points.

† Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, e-mail: zengshuq@msu.edu.

II. ALGORITHM DERIVATION

This section is devoted to the problem of estimating the rigid transformation $T$ using the EM algorithm, given a scan map $\mathcal{S}$ and a contour model $\mathcal{M}$.

We construct a bipartite graph $B = (\mathcal{M}, \mathcal{S}, E_B)$ between the vertex sets $\mathcal{M}$ and $\mathcal{S}$, with $E_B$ the set of edges. Let $m \in \mathcal{M}$ and $s \in \mathcal{S}$. An edge exists between the points $s$ and $m$ if and only if $\|s - m\| < W$, with $W$ a distance threshold. By $N(s) \equiv \{m \mid (s, m) \in E_B\}$ we denote the neighborhood of $s$. Scan points are indexed using a lookup hash table with $W/2$ resolution. Finding the points $m$ near a point $s$ within the radius $W$ involves searching the three-by-three block of grid cells around the cell containing $s$. Since a hash table is used and $|N(s)|$ is bounded, constructing the graph $B$ is an $O(N)$ operation, with $N$ the number of points in a scan.

Let $s_j \in \mathcal{S}$ be one of the $n_S$ scan points, and $m_k \in \mathcal{M}$ be one of the $n_M$ points from the model. We denote by $T$ a rigid transformation from the model frame to the new scan frame, with parameter vector $y$. If $s_j$ is a measurement of $m_k$ (i.e., $(s_j, m_k) \in E_B$) with a known noise model, we write the density function as $p(s_j \mid m_k, y) = p(s_j \mid T(m_k, y))$. In the case of additive, centered Gaussian noise with precision matrix $\Gamma$,

$p(s_j \mid m_k, T) = c \exp\left(-\frac{\|s_j - T(m_k, y)\|^2_\Gamma}{2}\right)$

where the Mahalanobis norm is defined as $\|x\|^2_\Gamma \equiv x^T \Gamma x$.

We use a binary matrix $A$ to represent the correspondences between scan and model points: the entry $A_{jk} = 1$ if $s_j$ matches $m_k$, and $0$ otherwise. Assume each scan point $s_j$ corresponds to at most one model point.
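The grid-hash neighborhood search described above can be sketched with a dictionary-based spatial hash. This is a minimal 2-D sketch, not the author's code: for simplicity it hashes the model points and uses a cell size of $W$ (rather than the paper's $W/2$ resolution), so the three-by-three cell search is guaranteed to cover the radius-$W$ disk; all names are illustrative.

```python
import math
from collections import defaultdict

def build_bipartite_graph(scan, model, W):
    """Return E_B as {j: [k, ...]}: the model-point neighbors of each
    scan point within distance W.  Model points are hashed into W-sized
    grid cells, so with bounded cell occupancy the construction is O(N)."""
    cell = lambda p: (int(math.floor(p[0] / W)), int(math.floor(p[1] / W)))
    grid = defaultdict(list)
    for k, m in enumerate(model):
        grid[cell(m)].append(k)
    E_B = {}
    for j, s in enumerate(scan):
        cx, cy = cell(s)
        nbrs = []
        # Search the 3x3 block of cells around s; with W-sized cells this
        # block contains every candidate within distance W of s.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for k in grid.get((cx + dx, cy + dy), ()):
                    m = model[k]
                    if math.hypot(s[0] - m[0], s[1] - m[1]) < W:
                        nbrs.append(k)
        E_B[j] = nbrs
    return E_B
```

Because each hash lookup is constant time and each neighborhood $|N(s)|$ is bounded, the loop over scan points gives the stated linear complexity.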
We have, for every scan point index $j$,

$\sum_k A_{jk} = \begin{cases} 1 & \text{if } N(s_j) \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$

For the above equation, we note that in the case $N(s_j) = \emptyset$ the point $s_j$ is an outlier; otherwise, the correspondence of $s_j$ to the points $m_k$ can be treated as a categorical distribution.

In order to apply the EM procedure we use a random matching matrix $\mathcal{A}$ with each element a binary random variable. Each eligible matching matrix $A$ has a probability $p(A) \equiv p(\mathcal{A} = A)$. One can verify that $\bar{A}_{jk} = E\{\mathcal{A}_{jk}\} = P(\mathcal{A}_{jk} = 1)$, and that the following constraint holds:

$\sum_k \bar{A}_{jk} = \begin{cases} 1 & \text{if } N(s_j) \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$

Consider the distribution of $A_j$, the $j$-th row of $A$, which is the distribution of assigning the scan point $s_j$ to the model points $m_k$:

$p(A_j) = \prod_{m_k \in N(s_j)} (\bar{A}_{jk})^{A_{jk}}$

Assuming the scan points are independent, we can write

$p(A) = \prod_{s_j \in \mathcal{S}} \prod_{m_k \in N(s_j)} (\bar{A}_{jk})^{A_{jk}} = \prod_{(s_j, m_k) \in E_B} (\bar{A}_{jk})^{A_{jk}}$  (1)

An example of $p(A)$ is the noninformative prior probability of the matches, i.e., the probability that a given scan point is a measurement of a given model point in the absence of measurement information:

$\bar{A}_{jk} = \pi_{jk} = \begin{cases} \frac{1}{|N(s_j)|} & \text{if } N(s_j) \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$

The joint probability of the scan point $s_j$ and the corresponding assignment $A_j$ can be expressed as

$p(s_j, A_j \mid \mathcal{M}, y) = \prod_{m_k \in N(s_j)} (\pi_{jk}\, p(s_j \mid m_k, y))^{A_{jk}}$

Provided that the scan points are conditionally independent, the overall joint probability is the product over the rows of $A$:

$p(\mathcal{S}, A \mid \mathcal{M}, y) = \prod_{(s_j, m_k) \in E_B} (\pi_{jk}\, p(s_j \mid m_k, y))^{A_{jk}}$  (2)

and the logarithm of the marginal distribution can be written as

$ML(T) = \log p(\mathcal{S} \mid \mathcal{M}, y) = \log\left(\sum_A p(\mathcal{S}, A \mid \mathcal{M}, y)\right)$  (3)

Unfortunately, Eq. (3) has no closed-form solution, and there is no robust and efficient algorithm to directly maximize it with respect to the parameter $y$. Noticing that Eq.
(3) only involves the logarithm of a sum, we can treat the matching matrix $A$ as latent variables and apply the EM algorithm to iteratively estimate $y$. Assuming that after the $n$-th iteration the current estimate of $y$ is $y_n$, we compute an updated estimate such that $ML$ is monotonically increasing, i.e.,

$\Delta(y \mid y_n) = ML(y) - ML(y_n) > 0$

Namely, we want to maximize the difference $\Delta(y \mid y_n)$. We are now ready to state two propositions, whose proofs are relegated to the Appendix.

Proposition 2.1: $\Delta(y \mid y_n) = E_{\mathcal{A} \mid \mathcal{S}, \mathcal{M}, y_n}\{\log(p(\mathcal{S}, \mathcal{A} \mid \mathcal{M}, y))\}$

Proposition 2.2: Given the transformation estimate $y_n$, scan points $\mathcal{S}$, and model points $\mathcal{M}$, the posterior of the matching matrix $\mathcal{A}$ can be written as

$p(A \mid \mathcal{S}, \mathcal{M}, y_n) = \prod_{j,k \,:\, (s_j, m_k) \in E_B} (\hat{A}_{jk})^{A_{jk}}$  (4)

where

$E\{\mathcal{A}\}_{jk} = \hat{A}_{jk} = \begin{cases} \frac{\pi_{jk}\, p(s_j \mid m_k, y_n)}{\sum_k \pi_{jk}\, p(s_j \mid m_k, y_n)} & \text{if } N(s_j) \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$  (5)

Therefore, we have the following EM algorithm to compute the $y$ that maximizes the likelihood defined in Eq. (3). We assume there exists an edge in the graph $B$ between $s_j$ and $m_k$ in the following derivation.

• E-step: Given the previous estimate $y_n$, update $\hat{A}_{jk}$ using Eq. (5). The conditional expectation is computed as

$\Delta(y \mid y_n) = E_{\mathcal{A} \mid \mathcal{S}, \mathcal{M}, y_n}\{\log p(\mathcal{S}, \mathcal{A} \mid \mathcal{M}, y)\} = E\left\{\log \prod_{j,k} (\pi_{jk}\, p(s_j \mid m_k, y))^{\mathcal{A}_{jk}}\right\} = \sum_{j,k} E\{\mathcal{A}_{jk}\} (\log p(s_j \mid m_k, y) + \log \pi_{jk}) = -\frac{1}{2} \sum_{j,k} \hat{A}_{jk} \|s_j - T(m_k, y)\|^2_\Gamma + \text{const.}$  (6)

where $E$ is short for $E_{\mathcal{A} \mid \mathcal{S}, \mathcal{M}, y_n}$, and const. collects the terms irrelevant to $y$.

• M-step: Compute the $y$ that maximizes Eq. (6), i.e., that minimizes the weighted least-squares expression in it.

The above EM procedure is repeated until convergence, i.e., until the difference in log-likelihood between two iterations, $\Delta(y \mid y_n)$, is less than a small threshold. The complexity of the above computation for a target in each iteration is $O(|E_B|)$. Since the number of neighbors of each $s_j$ is bounded, the complexity reduces to $O(|\mathcal{S}|)$.
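The E- and M-steps above can be sketched as follows for the 2-D case with $\Gamma = I$. This is an illustrative sketch, not the author's implementation: the responsibilities follow Eq. (5) with the noninformative prior $\pi_{jk} = 1/|N(s_j)|$ (which cancels row-wise in the normalization), while the M-step uses a standard closed-form weighted Procrustes fit for the 2-D rotation angle and translation, a solver the paper does not spell out.

```python
import math

def em_rigid_2d(scan, model, E_B, iters=5):
    """EM estimation of a 2-D rigid transform T(m) = R m + t mapping
    the model onto the scan.  E_B[j] lists the model neighbors of scan
    point j (the bipartite graph); Gamma = I is assumed."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iters):
        c, s_ = math.cos(theta), math.sin(theta)
        T = lambda m: (c * m[0] - s_ * m[1] + tx, s_ * m[0] + c * m[1] + ty)
        # E-step: responsibilities A_hat[j][k] per Eq. (5); the uniform
        # prior pi_jk = 1/|N(s_j)| cancels in the row normalization.
        pairs = []  # (weight, m_k, s_j)
        for j, ks in E_B.items():
            if not ks:
                continue  # outlier scan point: no neighbors, no edges
            sj = scan[j]
            lik = []
            for k in ks:
                p = T(model[k])
                d2 = (sj[0] - p[0]) ** 2 + (sj[1] - p[1]) ** 2
                lik.append(math.exp(-0.5 * d2))
            Z = sum(lik) or 1.0
            for k, L in zip(ks, lik):
                pairs.append((L / Z, model[k], sj))
        # M-step: closed-form weighted 2-D Procrustes fit minimizing
        # sum_w ||s - (R m + t)||^2 over the rotation angle and t.
        Wsum = sum(w for w, _, _ in pairs)
        if Wsum == 0.0:
            break
        mx = sum(w * m[0] for w, m, _ in pairs) / Wsum
        my = sum(w * m[1] for w, m, _ in pairs) / Wsum
        sx = sum(w * s[0] for w, _, s in pairs) / Wsum
        sy = sum(w * s[1] for w, _, s in pairs) / Wsum
        num = den = 0.0
        for w, m, sp in pairs:
            a, b = m[0] - mx, m[1] - my
            u, v = sp[0] - sx, sp[1] - sy
            num += w * (a * v - b * u)
            den += w * (a * u + b * v)
        theta = math.atan2(num, den)
        c, s_ = math.cos(theta), math.sin(theta)
        tx, ty = sx - (c * mx - s_ * my), sy - (s_ * mx + c * my)
    return theta, tx, ty
```

In practice the loop would terminate when the change in the weighted least-squares objective of Eq. (6) falls below a threshold, rather than after a fixed number of iterations.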
Experimental results show that only 4–5 iterations are needed for the EM procedure to converge. Consequently, the overall complexity for all of the tracked objects is $O(N)$, with $N$ the number of scan points.

The following proposition shows how to compute the covariance matrix of the transformation parameters $y$.

Proposition 2.3: Given $y$, the covariance matrix $R$ is

$R = \frac{1}{n_P} \sum_{(s_j, m_k) \in E_B} \hat{A}_{jk} (s_j - T(m_k, y))(s_j - T(m_k, y))^T$  (7)

where $n_P$ is the number of nonzero rows of the matrix $\hat{A}$.

III. PROOF OF PROPOSITIONS

A. Proof of Proposition 2.1

$\Delta(y \mid y_n) = ML(y) - ML(y_n)$
$= \log\left(\sum_A p(\mathcal{S}, A \mid \mathcal{M}, y)\right) - \log p(\mathcal{S} \mid \mathcal{M}, y_n)$
$= \log\left(\sum_A p(\mathcal{S} \mid A, \mathcal{M}, y)\, p(A \mid \mathcal{M}, y)\right) - \log p(\mathcal{S} \mid \mathcal{M}, y_n)$
$= \log\left(\sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n)\, \frac{p(\mathcal{S} \mid A, \mathcal{M}, y)\, p(A \mid \mathcal{M}, y)}{p(A \mid \mathcal{S}, \mathcal{M}, y_n)}\right) - \log p(\mathcal{S} \mid \mathcal{M}, y_n)$
$\geq \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log\left(\frac{p(\mathcal{S} \mid A, \mathcal{M}, y)\, p(A \mid \mathcal{M}, y)}{p(A \mid \mathcal{S}, \mathcal{M}, y_n)}\right) - \log p(\mathcal{S} \mid \mathcal{M}, y_n)$  (8)
$= \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log\left(\frac{p(\mathcal{S} \mid A, \mathcal{M}, y)\, p(A \mid \mathcal{M}, y)}{p(A \mid \mathcal{S}, \mathcal{M}, y_n)\, p(\mathcal{S} \mid \mathcal{M}, y_n)}\right)$

where Jensen's inequality and the concavity of the logarithm function are applied in deriving Eq. (8). Since we are maximizing $\Delta(y \mid y_n)$ with respect to $y$, we can drop the terms that are irrelevant to $y$, thus

$\Delta(y \mid y_n) = \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log(p(\mathcal{S} \mid A, \mathcal{M}, y)\, p(A \mid \mathcal{M}, y))$
$= \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log\left(\frac{p(\mathcal{S}, A, y \mid \mathcal{M})}{p(A, y \mid \mathcal{M})}\, \frac{p(A, y \mid \mathcal{M})}{p(y \mid \mathcal{M})}\right)$
$= \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log\left(\frac{p(\mathcal{S}, A, y \mid \mathcal{M})}{p(y \mid \mathcal{M})}\right)$
$= \sum_A p(A \mid \mathcal{S}, \mathcal{M}, y_n) \log(p(\mathcal{S}, A \mid \mathcal{M}, y))$
$= E_{\mathcal{A} \mid \mathcal{S}, \mathcal{M}, y_n}\{\log(p(\mathcal{S}, \mathcal{A} \mid \mathcal{M}, y))\}$  (9)

B. Proof of Proposition 2.2

If $N(s_j) \neq \emptyset$, the marginal PDF of the $j$-th row of $A$ is $p(s_j \mid \mathcal{M}, y) = \sum_k \pi_{jk}\, p(s_j \mid m_k, y)$. We assume there exists an edge in the graph $B$ between the scan and model points $s_j$ and $m_k$, and that the scan points are independent of each other.
One can verify that

$p(\mathcal{S} \mid \mathcal{M}, y_n) = \prod_j \left(\sum_k \pi_{jk}\, p(s_j \mid m_k, y_n)\right)$

Using Bayes' theorem, we have

$p(A \mid \mathcal{S}, \mathcal{M}, y_n) = \frac{p(\mathcal{S}, A \mid \mathcal{M}, y_n)}{p(\mathcal{S} \mid \mathcal{M}, y_n)} = \frac{\prod_{j,k} (\pi_{jk}\, p(s_j \mid m_k, y_n))^{A_{jk}}}{\prod_j \left(\sum_k \pi_{jk}\, p(s_j \mid m_k, y_n)\right)} = \frac{\prod_{j,k} (\pi_{jk}\, p(s_j \mid m_k, y_n))^{A_{jk}}}{\prod_{j,k} \left(\sum_k \pi_{jk}\, p(s_j \mid m_k, y_n)\right)^{A_{jk}}} = \prod_{j,k} \left(\frac{\pi_{jk}\, p(s_j \mid m_k, y_n)}{\sum_k \pi_{jk}\, p(s_j \mid m_k, y_n)}\right)^{A_{jk}}$

Comparing with Eq. (4), Eq. (5) holds.

C. Proof of Proposition 2.3

We treat the precision matrix $\Gamma$ as representing the uncertainty of the unknown transformation parameter $y$. We use a maximum likelihood approach, which amounts to minimizing the negative of the log-likelihood in Eq. (6) with respect to $\Gamma$, given a transformation and a set of matches with probabilities:

$\frac{\partial}{\partial \Gamma} \sum_{(s_j, m_k) \in E_B} \hat{A}_{jk} \left(\frac{\|s_j - T(m_k, y)\|^2_\Gamma}{2} + \frac{\log|\Gamma^{-1}|}{2}\right) = \frac{1}{2} \sum_{(s_j, m_k) \in E_B} \hat{A}_{jk} (s_j - T(m_k, y))(s_j - T(m_k, y))^T - \frac{n_P}{2} \Gamma^{-1} = 0$

where $n_P$ is the number of nonzero rows of the matrix $\hat{A}$. Thereby the covariance matrix $R$ is computed as

$R = \Gamma^{-1} = \frac{1}{n_P} \sum_{(s_j, m_k) \in E_B} \hat{A}_{jk} (s_j - T(m_k, y))(s_j - T(m_k, y))^T$

REFERENCES

[1] P. Besl and N. McKay. A method for registration of 3-D shapes. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
[2] A. Censi. An ICP variant using a point-to-line metric. In IEEE International Conference on Robotics and Automation, pages 19–25, New York, NY, 2008.
[3] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. Image and Vision Computing, 10(3):145–155, 1992.
[4] S. Granger and X. Pennec. Multi-scale EM-ICP: A fast and robust approach for surface registration. In ECCV, pages 418–432, 2002.
[5] A. Jagannathan and E. Miller. Unstructured point cloud matching within graph-theoretic and thermodynamic frameworks. In CVPR, pages 1008–1015, 2005.
[6] A. Makadia, A. Patterson, and K. Daniilidis. Fully automatic registration of 3D point clouds.
In CVPR, pages 1297–1304, 2006.
[7] G. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, New York, second edition, 2008.
[8] E. Olson. Real-time correlative scan matching. In IEEE International Conference on Robotics and Automation, pages 4387–4393, Kobe, Japan, 2009.
[9] G. Sharp, S. Lee, and D. Wehe. ICP registration using invariant features. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(1):90–102, 2002.
[10] C. Stewart. Uncertainty-driven, point-based image registration. In N. Paragios, Y. Chen, and O. Faugeras, editors, Handbook of Mathematical Models in Computer Vision, chapter 14, pages 221–235. Springer, 2006.