10 ANGUELOV ET AL. UAI2002 Learning Hierarchical Object Maps Of Non-Stationary Environments With Mobile Robots Dragomir Anguelov* Rahul Biswas* Daphne Koller* Benson Limketkai* Sebastian Thrun t *Computer Science Department Stanford University Stanford, CA 94305 Abstract Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an al gorithm for learning object models of non-stationary objects found in office-type environments. Our al gorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical repre sentation, which links individual objects with generic shape templates of object classes. We derive an ap proximate EM algorithm for learning shape parame ters at both levels of the hierarchy, using local occu pancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of ob jects and object templates in the environment. Ex perimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office en vironments. The approach outperforms a previously developed non-hierarchical algorithm that models ob jects but lacks class templates. 1 Introduction Building environmental maps with mobile robots is a key prerequisite of truly autonomous robots [19]. State-of-the art algorithms focus predominantly on building maps in static environments [20]. Common map representations range from lists of landrnarks [3, 9, 21], fine-grained grids of numerical occupancy values [6, 15], collections of point obstacles [11], or sets of polygons [12]. These representa tions are appropriate for mobile robot navigation in static environments. Real environments, however, consist of objects. For ex ample, office environments possess chairs, doors, recycling bins, etc. Many of these objects are non-stationary, that is, their locations may change over time. This observation motivates research on a new generation of mapping algo rithms, which represent environments as collections of ob jects. At a minimum, such object models would enable a robot to track changes in the environment. For example, a cleaning robot entering an office at night might realize that t School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 a recycling bin has moved from one location to another. It might do so without the need to learn a model of this recy cling bin from scratch, as would be necessary with existing robot mapping techniques [20]. Object representations offer a second, important advan tage, which is due to the fact that many office environ ments possess large collections of objects of the same type. For example, most office chairs are instances of the same generic chair and therefore look alike, as do most doors, recycling bins, and so on. As these examples suggest, at tributes of objects are shared by entire classes of objects, and understanding the nature of object classes is of signifi cant interest to mobile robotics. In particular, algorithms that learn properties of object classes would be able to transfer learned parameters (e.g., appearance, motion pa rameters) from one object to another in the same class. Such ability to generalize would have a profound impact on the accuracy of object models, and the speed at which such models can be acquired. If, for example, a cleaning robot enters a room it has never visited before, it might realize that a specific object in the room possesses the same vi sual appearance of other objects seen in other rooms (e.g., chairs). The robot would then be able to acquire a map of this object much faster. It would also enable the robot to predict properties of this newly seen object, such as the fact that a chair is non-stationary-without ever seeing this specific object move. In previous work, we developed an algorithm that has successfully been demonstrated to learn shape models of non-stationary objects [2]. This approach works by com paring occupancy grid maps acquired at different points in time. A straightforward segmentation algorithm was developed that extracts object footprints from occupancy grid maps. It uses these footprints to learn shape models of objects in the environment, represented by occupancy grid maps. This algorithm is related to work on learn ing generative object models in computer vision and med ical imaging. Frey and Jojic [7] describe an unsupervised approach which infers a set of object templates and their transformations from a set of camera images. Leventon et a!. [I OJ describe an alternative shape representation based on geodesic active contours. They show how to learn ob- UAI2002 ANGUELOV ET AL. 1 1 (a) (b) Figure I: (a) Generative hierarchical model of environ ments with non-stationary objects. (b) Representation as a graphical model. ject shape priors using the representation and how to use the object priors for tissue segmentation in tomography scans. This paper goes one step further by proposing an al gorithm that identifies classes of objects, in addition to learning plain object models. In particular, our approach learns shape models of individual object classes, from mul tiple occurrences of objects of the same type. By learning shape models of object types-in addition to shape models of individual objects--our approach is able to generalize across different object models that belong to the same ob ject class. This approach follows the hierarchical Bayesian framework (see [1, 8, 13]). We show that our approach leads to significantly more accurate models in real-world environments with multiple objects of the same type. The specific learning algorithm proposed here is an in stance of the popular EM algorithm [14]. We develop a closed-form solution for learning at both levels of the hi erarchy, which simultaneously identifies object models and shape templates for entire object classes. On top of this, we propose a Bayesian procedure for determining the appro priate number of object models, and object class models. We tested our algorithm on data gathered by a physical robot, which was equipped with a laser range finder. Our results suggest that our approach succeeds in learning ac curate shape and class models. A systematic comparison with our previous, non-hierarchical approach [2] illustrates that the use of class models yields significantly better re sults, both in terms of predictive power (as measured by the log-likelihood over testing data) and in terms of con vergence properties (measured by the number of times each algorithm is trapped in a local maximum of poor quality). 2 The Generative Hierarchical Model We begin with a description of the hierarchical model. The object level generalizes the approach of [2] to maps with continuous occupancy. The central innovation is the intro duction of a template level. 2.1 The Object Hierarchy Our object hierarchy (Figure Ia) is composed of two levels, the object template level at the top, and the physical object level at the bottom. The object template level consists of a set of M shape templates, denoted
[j])2 L-a, The summation over a k in calculating the expectations a � ] t n is necessary because of the mutual exclusion con straint described above, namely that no object can be seen twice in the same map. The summation is exponential in the number of observed objects K t-however, Kt is rarely larger than 10. If s umm ing over Gt (because of its expo nential domain) becomes too costly, efficient (and provably polynomial) sampling schemes can be applied for approxi mating the desired expectations [4, 16]. 4.2 Model M-Step Our M-step first generates a new hierarchical model II,
:� ";..,. .J: ·-� l.. . �. . . • -J .. . ,. "'c:r r, ,... . .... . . . ' .. ·r \• �'�· If.' r r .. ,. .. . . r;_..,' ' Figure 4: Nine maps from the Robotics Lab dataset. The number of objects present varies. sum of all objects ever seen: The number of class templates M is upper-bounded by the number of objects N. Our approach applies a Bayesian prior for selecting the right N and M, effectively transforming the learning prob lem into a maximum a posteriori (MAP) estimation prob lem. At both levels, we use an exponential prior, which in log-form penalizes the log-likelihood in proportion to the number of objects N and object templates M: Ea,iJ[logp(J.L, a, ,8 I 0,