Joint Belief and Intent Prediction for Collision Avoidance in Autonomous Vehicles Alan J. Hamlet University of Florida Gainesville, FL, USA AJHamlet@ufl.edu Carl Crane University of Florida Gainesville, FL, USA Carl.Crane@gmail.com ABSTRACT This paper describes a novel method for allowing an au- tonomous ground vehicle to predict the intent of other agents in an urban environment. This method, termed the cognitive driving framework, models both the intent and the potentially false beliefs of an obstacle vehicle. By modeling the relationships between these variables as a dynamic Bayesian network, filtering can be performed to calculate the intent of the obstacle vehicle as well as its belief about the environment. This joint knowledge can be exploited to plan safer and more efficient trajectories when navigating in an urban environment. Simulation results are presented that demonstrate the ability of the proposed method to calculate the intent of obstacle vehicles as an autonomous vehicle navigates a road intersection such that preventative maneuvers can be taken to avoid imminent collisions. Keywords: Non-linear filtering, autonomous vehicles, pre- diction, multi-agent systems 1. Introduction The potential saftey and conveinence benefits that au- tonomous vehicles can provide to our society are myriad. The World Health Organization reported that in 2010, 1.24 million people died due to road vehicle accidents [2]. In addition to the potential of reducing this massive loss of life, autonomous vehicles have shown promise in increasing vehicle efficiency and convenience for drivers [6, 8, 10]. The vast majority of current autonomous vehicle archi- tectures employ a reactionary response to changes in the environment. These systems require very frequent and rapid re-planning in order to avoid dynamic obstacles. Another intuitive approach is to have the autonomous vehicle predict where the dynamic obstacles are going to be in order to plan a path. One popular approach to making this prediction is to assume the dynamic obstacle continues to move in a straight line at its current velocity, as is done in the ’velocity obstacle’ literature [3, 13]. This approach does not take into account the control decisions made by the dynamic obstacle that affect its trajectory, as is the case for pedestrians and other vehicles. Some research has begun to incorporate the intentions of the dynamic obstacle in order to more intelligently predict its future position. Some methods used to predict intent are hidden Markov models [14, 11], Markov decision processes [5], and Gaussian processes or mixture models [9, 7]. These methods attempt to model trajectories and classify the dynamic obstacles’ motion according to the corresponding intent. While this body of research is a step toward realizing more intelligent vehicles that truly understand their environment, it fails to consider how the obstacles’ understanding of the environment will affect its future state. In this research, both the intent and the belief of a dynamic obstacle are considered when modeling the future states of the obstacle. This is beneficial for situations in which a dynamic obstacle, e.g. a pedestrian or another vehicle, may have an incorrect belief about the environment. For example, an obstacle vehicle trying to merge into traffic may believe it has more space than it actually does or it may not see an oncoming vehicle due to occlusions or driver error. In these situations, just knowing the driver’s intent does not suffice since for the same intent she may yield or begin to merge depending on her belief. In this paper, the dynamics of a multiple-vehicle system are modeled as a dynamic Bayesian network. Obstacle vehicles’ actions are dependent on both their intent and their belief of the surrounding environment. This idea is similar to that proposed in [4], but in this paper the problem is not formulated as a Markov decision process as to avoid discretization of the state space. This is required in order to achieve the resolution necessary for the autonomous vehicle domain. Inference is performed over the network using a particle filter to jointly estimate the vehicle’s intent and belief. Simulation results show that this method of joint inference allows an autonomous vehicle to predict a collision with enough time to take evasive action. The remainder of this paper is structured as follows. In section 2, an overview of the cognitive driving framework is given. The manner of representing the system state and dynamics is described. In section 3, the process for formulating the problem as inference over a dynamic Bayesian network is explained. The structure of the DBN is detailed and the method of performing joint inference over the network using particle filtering is discussed. Next, in section 4, simulation results demonstrating the accuracy of the proposed method are presented. Finally concluding remarks are given in section 5. 2. The Cognitive Driving Framework This section provides an overview of the cognitive driving framework by describing how the state of an intersection environment with multiple vehicles is represented as well as the form of the system dynamics. 2015 Florida Conference on Recent Advances in Robotics, FCRAR 2015 Melbourne, Florida, May 14 - 15, 2015 arXiv:1504.00060v1 [cs.RO] 31 Mar 2015 In the cognitive driving framework, or CDF, the system consists of two vehicles, the obstacle vehicle and the ego vehicle, in a known environment. The joint state of the two vehicles is called the system pose and is represented as S t = [ 1 x t 2 x t ] (1) where a superscript 1 denotes the obstacle vehicle and a superscript 2 denotes the ego vehicle. In this paper, the term ’ego vehicle’ refers to the vehicle running the CDF which is trying to predict the intent of the obstacle vehicle. In order to provide a general algorithm, the system dynamics are assumed to be nonlinear and of the form S t +1 = [ 1 x t +1 2 x t +1 ] = [ f ( 1 x t , 1 u t , 1 ν t ) f ( 2 x t , 2 u t , 2 ν t ) ] (2) where i u t is the control input and i ν t is the process noise for vehicle i at time t . The controller for the autonomous vehicle running the CDF (the ego vehicle) is assumed to be of the form 2 u t = h ( 2 x t , 1 x t , 2 I ) (3) where the arguments to the nonlinear function h() are the vehicle’s own state, the state of the obstacle vehicle, and the intent of the ego vehicle, respectively, at time t . The intent variable, i I , represents the current behavior the vehicle is trying to execute (e.g. turn left or go straight through intersection). The control for the obstacle vehicle is modeled similarly as 1 u t = h ( 1 x t , B t , 1 I ) (4) The difference here is that the obstacle vehicle is not assumed to have exact knowledge of the ego vehicle’s state. Instead, the obstacle vehicle’s controller operates on the assumed state of the ego vehicle, the belief , B t . It should be noted that in this context the belief is simply a point, not a distribution or density as sometimes used in the literature. If the ego vehicle has not been observed by the obstacle vehicle, then B t = ∅ . The obstacle vehicle’s belief updates according to the equations B t +1 = g ( B t , O t +1 ) (5) O t = k ( S t , β, e t ) (6) where O t is the obstacle vehicle’s observation at time t , and β is a parameter that represents the probability of the obstacle vehicle observing the ego vehicle at any given discrete time step. The observation noise, e t , is normally distributed with a mean of zero. Given B t and O t +1 , B t +1 updates deterministically. The observation model, k() , determines from the system pose if the ego vehicle is in the obstacle vehicle’s isovist : the volume of space with line of sight visibility from the obstacle vehicle’s pose. If the ego vehicle is occluded by other vehicles or buildings, it will not be in the obstacle vehicle’s isovist, and thus O t = ∅ . If the ego vehicle is in the obstacle vehicle’s isovist, then the obstacle vehicle will make a noisy observation of the ego vehicle’s pose with probability β . The goal of the cognitive driving framework is to allow the ego vehicle to predict the future states of the obstacle vehicle using this model in order to prevent collisions. This is done by performing online inference on the obstacle vehicle’s belief and intent, B t and I t . The following section details the procedure for performing this joint inference. Figure 1. The structure of the DBN used in the cognitive driving framework. 3. Joint Belief-Intent Inference In this section, the model outlined in the previous section is formulated as a dynamic Bayesian network, or DBN. The relationships between the variables in the network are described and how inference is performed over the network using a particle filter is explained. 3.1 Dynamic Bayesian Network The cognitive driving framework uses a dynamic Bayesian network to capture the dependencies between the random variables in the CDF system dynamics. A Bayesian network is a directed acyclic probabilistic graphical model that is used to represent a set of random variables and their conditional dependencies. A dynamic Bayesian network is a Bayesian network which relates the variables to each other over sequential time steps. Sometimes dynamic Bayesian networks are called two-time-slice Bayesian networks be- cause at any point in time t , the value of a variable in the network can be calculated from the prior value (at time t-1 ) and the independent variables. Kalman filter models and hidden Markov models are special cases of DBN’s, but, in general, DBN’s allow the hidden state of the system to be factored into separate variables so the structure of the dependencies between the variables can be exploited. The structure of the DBN used in the CDF is depicted in figure 1. This graphical model reflects the dependencies given by the equations in section 2. The gray nodes in the graph denote the variable known by the ego vehicle, the system pose, S t , as given in equation 1. In some contexts, because the value of this variable is provided to the ego vehicle by its sensors, it is called the observation. In this work the observation, O t , refers to the obstacle vehicle’s noisy measurement of the ego vehicle’s pose, 2 x t . Between time-slices, the variables in the DBN flow tem- porally from left to right and within a time-slice they flow (more-or-less) from top to bottom. The system pose affects the obstacle vehicle’s observation which in turn determines the obstacle vehicle’s belief. The obstacle vehicle’s intent 2015 Florida Conference on Recent Advances in Robotics, FCRAR 2015 Melbourne, Florida, May 14 - 15, 2015 Figure 2. Simplified model of the CDF using the joint DBN state variable X t . and belief of the system pose inform the obstacle vehicle’s action. The joint actions of the two vehicles result stochas- tically in the next system pose. Without loss of generality, the intent of the obstacle vehicle is assumed to be constant throughout an episode. 3.2 Filtering Now that the two-vehicle system dynamics are represented as a DBN, a method of filtering needs to be implemented in order to perform online inference of the obstacle vehicle’s belief and intent. By combining equations 2-6 we can represent the DBN state and dynamics, repectively, as X t = [ S t , B t , I ] T (7) X t +1 = F ( X t ) (8) This condenses the DBN in figure 1 to that shown in figure 2, which is the typical representation for filtering problems. The variable Y t represents the measurement, which in this study is the system pose, S t . As shown in section 2, the sytem dynamics are highly non- linear. Instead of linearizing the dynamics at the expense of accuracy of the estimation, a non-linear Monte-Carlo based filtering method was employed. Monte-Carlo (MC) methods are ideally suited for the current application due to their ability to model highly non-linear systems with multi-modal, non-Gaussian distributions. In this study, the DBN state is composed of both continuous and discrete variables, representing discrete intention hypotheses making traditional linearization methods such as the Extended or Unscented Kalman Filter unsuitable for this application. Particle filters are sequential Monte-Carlo methods that maintain an estimate of the posterior distribution of the system state as a set of particles. This non-parametric representation is capable of representing arbitrarily complex distributions as long as a large enough particle set is used. Each particle is initialized according to the a priori distribution and is propagated through the noisy system dynamics. The particles are then resampled according to the particles’ importance weights. The importance weight of a particle is proportional to the likelihood of the particle generating the measurement, Y t = S t . In this study, the likelihood is represented as a Gaussian distribution centered around the measured system pose. The weight of each particle is proportional to the probability of the system pose of the particle given the measured system pose, as shown in the equation below. w [ m ] t ∝ P ( X [ m ] t | S t ) ∼ N ( S t , Σ ) (9) Figure 3. A close up image of the simulated intersection environment with the ego vehicle (red) trying to predict the intent of the obstacle vehicle (blue). Σ is the variance of the Gaussian likelihood function. A superscript [ m ] denotes that the variable corresponds to the m th particle. The weights are normalized such that they sum to one. Once the importance weights are calculated, the particles are resampled in order to move the posterior distribution toward the region of the state space that matches the measurement. The technique of stratified or low-variance sampling is employed to reduce computational complexity [12]. The resulting set of particles is an approximation to the actual state of the system. As the number of particles, N , approaches infinity, the particle set converges to the true distribution of the state. 4. Experimental Results In order to demonstrate the ability of the CDF to perform joint inference on the intent and belief of an obstacle vehicle, a simulated experiment is conducted. This section describes the simulation set up and presents the results showing the CDF’s ability to detect and prevent vehicle collisions in a road intersection navigation scenario. 4.1 Simulation The CDF was tested using a simulated T-intersection scenario as depicted in figure 3. The autonomous vehicle running the CDF (the ego vehicle, in red) has the right of way. An obstacle vehicle (blue) is stopped at a stop sign at the T-intersection and can either turn left into the ego vehicle’s lane or turn right. It is desirable for the ego vehicle to predict not just the intention of the obstacle vehicle to turn left, but whether the obstacle vehicle is going to turn left in front of the ego vehicle or if it is going to yield. The simulation uses a bicycle kinematic model for the vehicles as described in [9]. The pose for a vehicle from equation 1 is given by the four dimensional vector x t =     x t y t θ t v t     (10) 2015 Florida Conference on Recent Advances in Robotics, FCRAR 2015 Melbourne, Florida, May 14 - 15, 2015 Prediction Scenario Cutoff Yield Right Total Cutoff 417 0 0 417 Yield 13 334 0 347 Right 0 0 236 236 1000 Figure 4. Simulation results using the cognitive driving framework. Each row corresponds to the scenario given in the leftmost column while the columns the prediction given by the CDF. where x t and y t are the position of the rear axle in the ground plane, θ t is the vehicle’s orientation, and v t is the speed of the vehicle, all at time t . The superscript indicating which vehicle the state corresponds to has been left off here for clarity. The dynamics from equation 2 are then given by x t +1 = f ( x t , u t , ν t ) =     x t + v t · ∆ t · cos θ t y t + v t · ∆ t · sin θ t θ t + v t · ∆ t l · tan ( 2 u t + 2 ν t ) v t + ( 1 u t + 1 ν t ) · ∆ t     (11) where the elements of the two dimensional control input are acceleration, 1 u t , and steering angle, 2 u t . The two components of the process noise, 1 ν t and 2 ν t , are both zero mean Gaussian noise affecting the realization of the controller’s commanded acceleration and steering angle. The parameter l is the wheelbase of the vehicle. In these simulations, a time step, ∆ t , of 0.1 seconds is used. The controller used in this simulation has a different path following controller for each intent, I , that is coupled with a hand tuned finite state machine that determines whether the vehicle should yield to the other vehicle or if it is clear to proceed. As shown in equations 3 and 4, the ego vehicle determines its control input based on the known positions of both vehicles, while the obstacle vehicle only has access to its own position and a noisy estimate of the ego vehicle’s position. 4.2 Results and Analysis Experiments were performed to compare the CDF to a purely reactionary planner. The simulation described in the previous section was run with the obstacle vehicle randomly choosing to turn left with a probability of 0.75 or right with a probability of 0.25. The parameter β in equation 6 was set to 0.05. This selection of β corresponds to about a 40 percent chance of the obstacle vehicle observing the ego vehicle within the first second of simulation. The ego vehicle has the right-of-way and is attempting to drive through the intersection at a speed of about 30 miles per hour (48 km/h ). During the first experiment, the ego vehicle is using the CDF to jointly estimate the obstacle vehicle’s belief and intent. If the obstacle vehicle’s predicted intent is to turn left and its belief is that the intersection is clear, then a collision is predicted to be imminent. The ego vehicle then brakes at the maximum rate in an attempt to avoid the collision. The maximum rate of deceleration used in the experiments was 16 f t/s 2 , which is reasonable for a passenger vehicle traveling on a road surface with a moderate coefficient of friction [1]. The table in figure 4 details the simulation results. The simulation was run for a total of 1000 episodes and the Collisions Collisions Collisions Imminent Occured Avoided CDF 417 9 98% Reactive 404 213 47% Figure 5. Comparison of simulation results for the CDF and a purely reactionary planner. obstacle vehicle cutoff the ego vehicle a total of 417 times. In this context, ’cutoff’ means that if the ego vehicle were to keep its speed constant and not take preventative measures, a collision would result. While 13 scenarios where falsely classified as imminent collisions, 100% of the true imminent collisions were recognized by the CDF. These false positives due to the obstacle vehicle observing the ego vehicle and braking at the last second. In the next experiment, the same simulation was run except a reactionary planner was used in place of the CDF. If any part of the obstacle vehicle entered the ego vehicle’s lane, the ego vehicle performed an emergency braking maneuver. The planner was run at a rate of 10 Hz. After a total of 1000 episodes, the ego vehicle was cutoff 404 times. The table in figure 5 compares the performance of the reactive planner to that of the CDF. The CDF was able to avoid 98% of the imminent collisions caused by the obstacle vehicle, while the reactive planner was only able to avoid 47% of the imminent collisions. 5. Conclusions This paper presented the cognitive driving framework, a method for joint inference of the intent and belief of an obstacle vehicle in an intersection navigation scenario. The goal of the CDF is to allow an autonomous vehicle to predict when a potentially hazardous situation is about to occur early enough to allow the autonomous vehicle to take evasive action to prevent a collision. The formulation of the problem as a dynamic Bayesian network was presented. A non-linear filtering method was proposed using a particle filter to estimate the posterior distribution of the state of the DBN. Finally, the accuracy of the estimation method was demonstrated by simulating an intersection navigation scenario where an obstacle vehicle cuts off the autonomous vehicle. The simulation results show that the CDF is able to predict and prevent 98% of imminent collisions. For comparison, the same simulation was run using a purely reactionary planner, resulting in only 47% of imminent collisions being avoided. References [1] Drive and stay alive, inc. http://www. driveandstayalive.com/Info%20Section/ stopping-distances.htm . Accessed: 2015-03-30. [2] Global health observatory (gho) data. http: //www.who.int/gho/road_safety/mortality/ traffic_deaths_number/en/ . Accessed: 2015-03- 15. [3] J. Alonso-Mora, A. Breitenmoser, P. Beardsley, and R. Siegwart. Reciprocal collision avoidance for multiple car-like robots. In Robotics and Automation (ICRA), 2012 IEEE International Conference on , pages 360– 366. IEEE, 2012. 2015 Florida Conference on Recent Advances in Robotics, FCRAR 2015 Melbourne, Florida, May 14 - 15, 2015 [4] C. L. Baker, R. R. Saxe, and J. B. Tenenbaum. Bayesian theory of mind: Modeling joint belief-desire attribution. In Proceedings of the thirty-second annual conference of the cognitive science society , pages 2469–2474, 2011. [5] T. Bandyopadhyay, C. Z. Jie, D. Hsu, M. H. Ang Jr, D. Rus, and E. Frazzoli. Intention-aware pedestrian avoidance. In Experimental Robotics , pages 963–977. Springer, 2013. [6] R. Bishop. Intelligent vehicle applications worldwide. Intelligent Systems and their Applications, IEEE , 15(1):78–81, 2000. [7] D. Ellis, E. Sommerlade, and I. Reid. Modelling pedestrian trajectory patterns with gaussian processes. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on , pages 1229–1234. IEEE, 2009. [8] T. C. Folsom. Social ramifications of autonomous urban land vehicles. In IEEE International Symposium on Technology and Society , 2011. [9] F. Havlak and M. Campbell. Discrete and continuous, probabilistic anticipation for autonomous robots in urban environments. Robotics, IEEE Transactions on , 30(2):461–474, 2014. [10] Z. Juan, J. Wu, and M. McDonald. Socio-economic impact assessment of intelligent transport systems. Tsinghua Science & Technology , 11(3):339–350, 2006. [11] R. Kelley, A. Tavakkoli, C. King, M. Nicolescu, M. Nicolescu, and G. Bebis. Understanding human intentions via hidden markov models in autonomous mobile robots. In Proceedings of the 3rd ACM/IEEE international conference on Human robot interaction , pages 367–374. ACM, 2008. [12] G. Kitagawa. Monte carlo filter and smoother for non-gaussian nonlinear state space models. Journal of computational and graphical statistics , 5(1):1–25, 1996. [13] B. Kluge and E. Prassler. Recursive agent modeling with probabilistic velocity obstacles for mobile robot navigation among humans. In Autonomous Navigation in Dynamic Environments , pages 121–134. Springer, 2007. [14] D. Vasquez, T. Fraichard, and C. Laugier. Incremental learning of statistical motion patterns with growing hidden markov models. Intelligent Transportation Systems, IEEE Transactions on , 10(3):403–416, 2009. 2015 Florida Conference on Recent Advances in Robotics, FCRAR 2015 Melbourne, Florida, May 14 - 15, 2015