Identifying Driver Interactions via Conditional Behavior Prediction

Ekaterina Tolstaya1, Reza Mahjourian2, Carlton Downey2, Balakrishnan Varadarajan2, Benjamin Sapp2, Dragomir Anguelov2

1 General Robotics Automation Sensing and Perception (GRASP) Laboratory at the University of Pennsylvania, eig@seas.upenn.edu. 2 Waymo, rezama@waymo.com. The first author acknowledges support from the NSF Graduate Research Fellowships Program.

Abstract—Interactive driving scenarios, such as lane changes, merges and unprotected turns, are some of the most challenging situations for autonomous driving. Planning in interactive scenarios requires accurately modeling the reactions of other agents to different future actions of the ego agent. We develop end-to-end models for conditional behavior prediction (CBP) that take as input a query future trajectory for an ego agent, and predict distributions over future trajectories for other agents conditioned on the query. Leveraging such a model, we develop a general-purpose agent interactivity score derived from probabilistic first principles. The interactivity score allows us to find interesting interactive scenarios for training and evaluating behavior prediction models. We further demonstrate that the proposed score is effective for agent prioritization under computational budget constraints.

Fig. 1. A conditional behavior prediction model describes how one agent's predicted future trajectory can shift due to the actions of other agents.

I. INTRODUCTION

Behavior prediction is a core component of real-world systems involving human-robot interaction. This task is particularly challenging due to the high degree of uncertainty in the future—the intent of human actors is unobserved, and multiple interacting agents may continually influence one another.

We are particularly interested in the high-impact application of Autonomous Vehicles (AV), in which a robot may wish to pose behavior prediction queries of the form "If I take action X, what will agent B do?", as shown in Figure 1. We assert that this type of conditional inference is important and fundamental for making planning decisions in an interactive driving environment. In this paper, we focus on probabilistic models of future behavior that can condition on possible future action sequences (i.e., trajectories) of other agents. We call this task conditional behavior prediction (CBP).

In the literature, there is a family of behavior prediction models for which the conditioning capability comes naturally: those that employ step-wise, iterative sampling ("roll-outs") for multiple agents in a scene, e.g. [1], [2], [3], [4]. In such models, it is possible to control the action sequences for a subset of agents, so that the roll-out of the others will take them into account. While flexible, these sample-based models have significant disadvantages for real-world applications: sample-based inference is risky to employ in a safety-critical system, iterative errors can compound [5], it is difficult to control sample diversity [6], [7], and attempting to jointly model all agents is often computationally intractable. Some past work has focused on tightly-coupled robot-human interaction in limited driving game environments: [4] iteratively conditions on generated human actions in a CVAE framework; [8] formulates the interaction problem as a 2-player game with a human reward learned via inverse reinforcement learning.

As an alternative to sample-based models, there is a long line of work on single-shot, passive behavior prediction [9], [10], [11], [12], [13], [14], [15]. These models are compelling due to their tractability and practical parametric output distributions, and have become the popular choice in AV systems and associated benchmarks [15]. However, these models ignore the fact that the AV ego-agent will take actions in the future, which may cause a critical reaction by another agent. Using such models makes decision-making challenging: because the models do not condition on any explicit ego actions, they must implicitly account for all possible ego actions (or ignore interactions altogether). In practice, interaction modeling has been handled by aggregating neighboring agents' observed states via max-pooling, transformer layers, or graph neural network architectures [1], [16], [17], [18].
In this paper, we propose a single-shot, conditional behavior prediction model. Our CBP model is a powerful, end-to-end trained deep neural network, which takes into account static and dynamic scene elements—road lanes, agent state histories (vehicle, pedestrian and cyclist), traffic light information, etc. From these inputs, we predict a diverse set of future outcomes, represented as Gaussian Mixture distributions, where each mixture component corresponds to a future state sequence (i.e., a trajectory with uncertainty). We train these models to be capable of conditional inference by selectively adding future trajectory information for some agents as additional inputs to the model. We use large datasets of logged driving data and train models via maximum likelihood to output either conditional or passive (marginal) predictions for any subset of agents in a scene. The recently proposed WIMP model [19] is also a single-shot conditional inference model; ours differs in that we condition on generic trajectories for any subset of agents in a fully probabilistic framework.

The notion of interactivity is a key concept for this problem, and a key contribution of this paper is to formalize this notion and obtain a simple and practical interactivity score. Now that we are equipped with a probabilistic model for conditional future distributions, we can quantify a notion of interactivity as follows. We quantify the degree of influence one agent has on another as the KL-divergence between (a) the agent's future distribution conditioned on the other's future and (b) its marginal distribution. We then take an expectation over all possible conditioned futures for the other agent to get a final interactivity score. This results in a simple, agent-symmetric computation in the form of mutual information between the two agents' futures. In contrast, past work has hand-designed models of surprise or discomfort for motion planning [20], [21], [22], [23]. Entropy and mutual information have previously been used in AV applications as a measure of uncertainty to predict collisions [24].
In real-world driving, the interactivity score can be used to anticipate driver interactions. When processing data offline, we demonstrate the use of the interactivity score for mining interactive scenarios that are potentially unsafe, since the target agent's expectations are being violated. Furthermore, we demonstrate the benefits of the interactivity score for prioritizing agents for behavior prediction and planning. In contrast to previous work that built a special-purpose model trained directly for the task of prioritization [23], which was derived from an implicitly-defined side-channel output of a blackbox planner, we provide a formulation that is independent of a specific planner definition and, consequently, more generally applicable.

Our contributions can be summarized as follows: (1) we provide a novel, principled information-theoretic definition of interactivity, which applies to any multi-agent interaction application; (2) we develop a first-of-its-kind, single-shot, deep neural network for probabilistic conditional behavior prediction; and (3) we show our interactivity score improves state-of-the-art model performance in several settings.

II. DEFINING AGENT INTERACTIVITY

We define an agent trajectory S as a fixed-length, time-discretized sequence of agent states up to a finite time horizon. All quantities in this work consider a pair of agents A and B. Without loss of generality, we consider A to be the query agent whose plan for the future can potentially affect B, the target agent. The future trajectories of A and B are random variables SA and SB. The marginal probability of a particular realization of agent B's trajectory sB is given by p(SB = sB), also indicated by the shorthand p(sB). The conditional distribution of agent B's future trajectory given a realization of agent A's trajectory sA is given by p(SB = sB | SA = sA), indicated by the shorthand p(sB|sA).

Even in highly interactive scenarios, agents may behave as expected by other agents and not exert any influence on one another. We define a surprising interaction as one in which the target agent experiences a change in their behavior due to the query agent's observed trajectory. When we have access to ground-truth future trajectories, we can quantify interactions by estimating the change in log-likelihood of the target's ground-truth future sB:

    \Delta_{LL} := \log p(s^B \mid s^A) - \log p(s^B).    (1)

A large change in the log-likelihood indicates a situation in which the likelihood of the target agent's trajectory changes significantly as a result of the query agent's action. If the target's trajectory sB becomes more likely given the query agent's trajectory sA, then ∆_LL will be positive. If it becomes less likely, then ∆_LL will be negative. If there is no change, then ∆_LL will be zero.

A query agent may need to estimate the impact of a planned future trajectory sA on the target agent B. Since we don't have access to the ground-truth future for the target agent, we can quantify the potential for a surprising interaction by estimating the shift in the distribution of the target agent's trajectory. More specifically, we use the KL-divergence between the conditional and marginal distributions for the target's predicted future trajectory SB to quantify the degree of influence exerted on B by a trajectory sA:

    D_{KL}\big( p(S^B \mid s^A) \,\|\, p(S^B) \big) = \int_{s^B} p(s^B \mid s^A) \log \frac{p(s^B \mid s^A)}{p(s^B)}.    (2)

For example, in Fig. 1, if the query agent decides to change lanes in front of the target agent, the target agent will have to slow down. In this case, the KL-divergence will reflect a significant change in the target agent's expected behavior as a result of the query agent's planned lane change. In the absence of a particular plan for the query or target agent, we can consider the set of all possible actions for the query agent and compute the expectation of the degree of influence over all those possible actions. This expectation is defined as the mutual information between the two agents' future trajectories SA and SB, and is computed as:

    I(S^A, S^B) = \int_{s^A} p(s^A) \, D_{KL}\big( p(S^B \mid s^A) \,\|\, p(S^B) \big).    (3)

Mutual information expresses the dependence between two random variables. It is non-negative, I(SA, SB) ≥ 0, and symmetric, I(SA, SB) = I(SB, SA) [25]. We use this quantity as the interactivity score between agents A and B. For example, if the target agent is driving closely behind the query agent, we expect their interactivity score to be high because the target agent is likely to respond immediately to any actions, such as deceleration or acceleration, from the query agent.
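To make Eqs. (1)–(3) concrete, below is a minimal Monte Carlo sketch in Python. The helpers cond_logpdf, marg_logpdf, sample_query, and sample_target_given are hypothetical placeholders for whatever model supplies p(sB|sA) and p(sB); Section III describes the GMM-based model used in this paper.

    import numpy as np

    def delta_ll(sB, sA, cond_logpdf, marg_logpdf):
        # Eq. (1): change in log-likelihood of the target's observed future sB
        # when conditioning on the query agent's observed future sA.
        return cond_logpdf(sB, sA) - marg_logpdf(sB)

    def kl_influence(sA, sample_target_given, cond_logpdf, marg_logpdf, n=16):
        # Eq. (2): Monte Carlo estimate of KL(p(SB | sA) || p(SB)) using
        # samples sB ~ p(. | sA).
        samples = [sample_target_given(sA) for _ in range(n)]
        return float(np.mean([cond_logpdf(sB, sA) - marg_logpdf(sB)
                              for sB in samples]))

    def interactivity(sample_query, sample_target_given, cond_logpdf,
                      marg_logpdf, n_query=8, n_target=16):
        # Eq. (3): mutual information I(SA; SB) as the expectation of Eq. (2)
        # over possible query futures sA ~ p(SA).
        queries = [sample_query() for _ in range(n_query)]
        return float(np.mean([kl_influence(sA, sample_target_given,
                                           cond_logpdf, marg_logpdf, n_target)
                              for sA in queries]))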
III. METHOD

In the previous section, we developed a measure of interactivity between a pair of agents. In this section, we discuss training a conditional behavior prediction model that can estimate the distributions p(sB) and p(sB|sA). We describe the internals of this model and its losses, and then discuss the process for computing the interactivity score by sampling from the predicted distributions.

Let x denote observations from the scene, including past trajectories of all agents, and context information such as lane semantics. Let t denote a discrete time step, and let s_t denote the state of an agent at time t. The realization of the future trajectory s = {s_1, ..., s_T} is a sequence of states for t ∈ {1, ..., T}, a fixed horizon.

A CBP model predicts p(SB | SA = sA, x), the distribution of future trajectories for B conditioned on sA. The CBP model receives as input a realization of the future trajectory of the query agent, sA = [sA_1, ..., sA_T], which we refer to as agent A's plan, or the conditional query. Following the approach of MultiPath [9], the model predicts a set of K trajectories for agent B, µB = {µBk}_{k=1..K}, where each trajectory is a sequence of states µBk = {µBk_1, ..., µBk_T}, capturing K potentially-different intents for agent B. The model predicts uncertainty over the K intents as a softmax distribution πBk(x, sA). The model also predicts Gaussian uncertainty over the positions of the trajectory waypoints as:

    \phi^{Bk}_t(s^B_t \mid x, s^A) = \mathcal{N}\big( s^B_t \mid \mu^{Bk}_t(x, s^A), \Sigma^{Bk}_t(x, s^A) \big).    (4)

This yields the full conditional distribution p(ŜB | sA, x) as a Gaussian Mixture Model (GMM) with mixture weights fixed over all time steps of the same trajectory:

    p(\hat{S}^B = s^B \mid x, s^A) = \sum_{k=1}^{K} \pi^{Bk}(x, s^A) \prod_{t=1}^{T} \phi^{Bk}_t(s^B_t \mid x, s^A).    (5)

The Gaussian parameters µBk_t and ΣBk_t are directly predicted by a deep neural network (DNN). The softmax distribution is computed as

    \pi^{Bk}(x, s^A) = \frac{\exp f^B_k(x, s^A)}{\sum_i \exp f^B_i(x, s^A)},

where f^B_k are logit values also output by the DNN.

The computation of the interactivity score also requires the estimation of marginal distributions, p(SB|x), which are not conditioned on any future plan for A. We train a single model which can produce both marginal and conditional predictions, in order to have comparable quantities without any uncertainty due to model variance. Marginal predictions, p(S̃B|x), are provided by turning off inputs from the conditional query encoder in the model. We adopt the shorthands πBk(x, ∅), φBk(x, ∅) to describe this operation, which gives us the marginal distribution as

    p(\tilde{S}^B = s^B \mid x) = \sum_{k=1}^{K} \pi^{Bk}(x, \emptyset) \prod_{t=1}^{T} \phi^{Bk}_t(s^B_t \mid x, \emptyset).    (6)

Given the conditional and marginal predictions of the CBP model, we can now compute the mutual information. Directly computing the mutual information between the future states of two agents via Eq. (3) is intractable between the GMM distributions (Eq. (6)). We estimate the outer expectation via importance sampling. Rather than drawing N samples from the marginal distribution, we use the most likely 6 modes of the marginal distribution's GMM in Eq. (6), as in standard motion forecasting metrics [15], with s^A_k ∈ {µ^{kA}(x)}_{k=1..6}:

    I(S^A, S^B \mid x) \approx \frac{1}{M} \sum_{k=1}^{6} \sum_{m} p(s^A_k \mid x) \log \frac{p(\hat{S}^B = s^B_m \mid s^A_k, x)}{p(\tilde{S}^B = s^B_m \mid x)},    (7)

where the marginal and conditional probabilities are evaluated via Eqs. (5) and (6). The use of other, more efficient approaches for estimating the KL divergence is left to future work [26].
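As an illustration, the following sketch evaluates the trajectory GMMs of Eqs. (5)–(6) and forms the estimate of Eq. (7). It assumes diagonal waypoint covariances, and cond_gmm_B and sample_targets are hypothetical hooks into a trained CBP model; this is a sketch of the computation, not the production implementation.

    import numpy as np

    def traj_gmm_logpdf(s, pi, mu, var):
        # Log-likelihood of one trajectory s, shape (T, 2), under a trajectory
        # GMM as in Eqs. (5)-(6). pi: (K,) mode weights; mu, var: (K, T, 2).
        # Diagonal per-waypoint covariances are an assumption of this sketch.
        per_mode = -0.5 * (((s - mu) ** 2) / var
                           + np.log(2 * np.pi * var)).sum(axis=(1, 2))
        return np.logaddexp.reduce(np.log(pi) + per_mode)

    def interactivity_estimate(top_modes_A, cond_gmm_B, marg_gmm_B, sample_targets):
        # Eq. (7): outer sum over the 6 most likely modes of A's marginal GMM
        # (with weights p(sA_k | x)); inner average over M candidate futures
        # sB_m of the log-ratio between B's conditional and marginal likelihoods.
        total, M = 0.0, 1
        for p_sA, sA in top_modes_A:            # [(p(sA_k | x), sA_k)], k = 1..6
            pi_c, mu_c, var_c = cond_gmm_B(sA)  # conditional GMM given plan sA
            futures = sample_targets(sA)        # M candidate futures for agent B
            M = len(futures)
            for sB in futures:
                total += p_sA * (traj_gmm_logpdf(sB, pi_c, mu_c, var_c)
                                 - traj_gmm_logpdf(sB, *marg_gmm_B))
        return total / M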
To train the model for conditional prediction, we set the conditional query/plan input to agent A's ground-truth future trajectory from the training dataset. We learn to predict the distribution parameters f^B_k(x, sA), µBk_t(x, sA), and ΣBk_t(x, sA) via supervised learning with the negative log-likelihood loss,

    L(\theta) = \sum_{m=1}^{M} \sum_{k=1}^{K} \mathbb{1}(k = k^B_m) \Big[ \log \pi^{Bk}(x_m, s^A_m; \theta) + \sum_{t=1}^{T} \log \mathcal{N}\big( s^B_t \mid \mu^{Bk}_t, \Sigma^{Bk}_t; x_m, s^A_m; \theta \big) \Big],    (8)

where k^B_m is the index of the mode of the distribution that has the closest endpoint to the given ground-truth trajectory, k^B_m = \arg\min_k \sum_{t=1}^{T} \| s^B_t - \mu^{Bk}_t \|_2.

Above, we describe how to produce predictions for a single agent B. However, for increased efficiency, our model produces predictions for multiple agents in parallel. To encourage the model to maintain the fundamental physical property that agents cannot occupy the same future location in space-time, we include an additional loss function:

    L_O(\theta) = \sum_{i} \sum_{j} \pi^{Ai} \pi^{Bj} \max_t \exp\big( -\| \mu^{Ai}_t - \mu^{Bj}_t \|_2^2 / \alpha \big),    (9)

where {(π^{Ai}, µ^{Ai})}_{i=1..K} and {(π^{Bj}, µ^{Bj})}_{j=1..K} are the modes and probabilities of the future trajectory distributions for agents A and B.
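For illustration, here is a compact sketch of the two training terms, Eqs. (8) and (9), for a single example and a single agent pair. The diagonal covariances and array layout are assumptions of the sketch, not details of the paper's implementation.

    import numpy as np

    def cbp_nll_loss(s_gt, pi, mu, var):
        # Eq. (8), one example, hard mode assignment: select the mode closest to
        # the ground-truth trajectory (argmin of summed waypoint distance), then
        # penalize its mixture weight and per-step Gaussian log-likelihoods.
        # s_gt: (T, 2); pi: (K,); mu, var: (K, T, 2), diagonal covariance assumed.
        k_star = np.argmin(np.linalg.norm(s_gt - mu, axis=-1).sum(axis=-1))
        gauss_ll = -0.5 * (((s_gt - mu[k_star]) ** 2) / var[k_star]
                           + np.log(2 * np.pi * var[k_star])).sum()
        return -(np.log(pi[k_star]) + gauss_ll)

    def overlap_loss(pi_A, mu_A, pi_B, mu_B, alpha=1.0):
        # Eq. (9): penalize pairs of modes from agents A and B whose waypoints
        # come close at any shared time step.
        d2 = ((mu_A[:, None, :, :] - mu_B[None, :, :, :]) ** 2).sum(-1)  # (K, K, T)
        closeness = np.exp(-d2 / alpha).max(axis=-1)                     # max over t
        return (pi_A[:, None] * pi_B[None, :] * closeness).sum()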
IV. EXPERIMENTS

A. Data

We collected a large, in-house dataset of real-world driving from urban and suburban environments, using a vehicle equipped with an industry-grade sensor and perception stack, which provides us with tracked objects. In total, the training set has 1.9 billion vehicle agents that we learn to model, from 19 million unique scenarios, comprising 18 years of continuous driving data. The models receive 2 seconds of history and predict 15 seconds of future behavior for all agents in the scene, including the AV. The states of the agents are recorded at 5 Hz. Features describing the past states of the agents include (x, y, z) position, velocity vector, acceleration vector, orientation θ, and angular velocity. There are also binary attributes indicating whether the vehicle is signaling to turn left or right, and whether it is parked. The lane markings and boundaries are represented by 500 points sampled around the current location of each predicted vehicle to balance memory requirements.

At training time, we select one agent uniformly at random from the vehicles in the scene to be the query agent. For 95% of the samples, the query agent's future ground-truth is fed to the model as the conditional query input. For the other 5%, no conditional query is provided, leading to marginal behavior prediction, with the split chosen through cross-validation.

For every scene, the model predicts future behaviors for up to the 20 closest vehicles. Other agents (vehicles, pedestrians, and cyclists) are still used in the agent state feature encoder, but the model does not predict futures for them. We observed that prediction performance beyond 20 agents degrades rapidly due to sensor limitations.

TABLE I: Comparison of CBP models on an evaluation dataset containing over 8 million agent pairs. Metrics are computed and averaged over all (query agent, target agent) pairs possible in every scene. The mean error is computed only over predictions for the target agent and does not include predictions for the query agent. The standard error of the mean is also reported.

    Method                  | wADE6_CBP(B) ↓  | minADE6_CBP(B) ↓
    Non-conditional         | 3.486 ± 0.0017  | 1.207 ± 0.00062
    Early fusion (encoder)  | 3.142 ± 0.0016  | 1.170 ± 0.00061
    Late fusion (GNN)       | 3.469 ± 0.0017  | 1.209 ± 0.00063
    Early and late fusion   | 3.160 ± 0.0016  | 1.172 ± 0.00067

Fig. 2. The architecture of the conditional behavior prediction model.

B. Model Architecture

The architecture is composed of an input encoder stage, a trajectory decoder stage, and a GNN-based trajectory refinement stage, shown in Fig. 2. The encoder stage is composed of a road lane encoder which uses an architecture similar to VectorNet [14], and a track history encoder which uses a 64-dimensional LSTM applied to 5 time steps of past state observations comprising 1 second of history. The results of the above two encoders are concatenated and passed into a decoder which outputs a sequence of (x, y) points via predicted polynomial coefficients for K = 287 trajectory modes [9]. In our experiments, we use a tenth-degree polynomial. The resulting trajectories are further refined using a GNN [27], [17]. The GNN uses an attention-based aggregation function that combines relative agent positions as edge features to form messages passed to each node [28]. We apply one message update, which passes trajectory information between neighboring agents, and then re-apply trajectory decoding. This process can refine the agents' trajectory distributions with awareness of their neighbors' distributions. Further details of this state-of-the-art model architecture are currently under anonymous review.

V. RESULTS

A. Metrics

Given a labeled example (x, sA, sB), the weighted Average Distance Error (wADE) over the most likely 6 modes of the conditional prediction of agent B's future trajectory, given the query agent A's future trajectory, is:

    \mathrm{wADE}^{6}_{\mathrm{CBP}}(B) = \frac{1}{T} \sum_{t} \sum_{k=1}^{6} \pi^{Bk}(s^A, x) \, \| s^B_t - \mu^{Bk}_t \|_2,    (10)

where µBk_t = µBk_t(sA, x) is the kth mode for the predicted position of agent B at time t, with its respective probability πBk(sA, x). Likewise, we can compute the wADE_BP metric using µBk_t = µBk_t(x, ∅) and πBk(x, ∅). Computing their difference, ∆_wADE(B) = wADE_BP(B) − wADE_CBP(B), quantifies the reduction in B's error due to conditioning on A. Another established metric for behavior prediction is the minimum Average Distance Error (minADE), defined for conditional models over the most likely 6 modes as:

    \mathrm{minADE}^{6}_{\mathrm{CBP}}(B) = \min_{1 \le k \le 6} \frac{1}{T} \sum_{t} \| s^B_t - \mu^{Bk}_t(s^A, x) \|_2.    (11)

To obtain a low minADE value, the model needs to accurately predict the ground-truth future as one of its predicted intents. On the other hand, the wADE metric is more suitable for evaluating multi-modal distributions and can reflect shifts in the distribution of intent probabilities. Therefore, we use wADE as the main metric in the following results. Furthermore, ∆_wADE is closely related to the definition of ∆_LL in Eq. (1), but since it is weighted by the distance error, it is less sensitive to prediction errors for nearly-stationary vehicles.
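The metrics above reduce to a few lines of code. The following sketch assumes mode probabilities pi of shape (K,), mode waypoints mu of shape (K, T, 2), and a ground-truth trajectory s_gt of shape (T, 2), with the top 6 modes selected by probability:

    import numpy as np

    def wade6(s_gt, pi, mu):
        # Eq. (10): distance error weighted by the probability of each of the
        # 6 most likely modes, averaged over time.
        top = np.argsort(pi)[::-1][:6]
        d = np.linalg.norm(s_gt - mu[top], axis=-1)        # (6, T)
        return (pi[top][:, None] * d).sum(axis=0).mean()

    def min_ade6(s_gt, pi, mu):
        # Eq. (11): average displacement of the best of the 6 most likely modes.
        top = np.argsort(pi)[::-1][:6]
        return np.linalg.norm(s_gt - mu[top], axis=-1).mean(axis=-1).min()

    def delta_wade(s_gt, cond, marg):
        # Reduction in B's error due to conditioning on A:
        # delta_wADE(B) = wADE_BP(B) - wADE_CBP(B), with cond/marg = (pi, mu).
        return wade6(s_gt, *marg) - wade6(s_gt, *cond)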
B. Conditional Behavior Prediction

Comparing accuracy between marginal and conditional predictions from the trained model shows a 10% improvement for conditional predictions, as seen in Table I. This is clear confirmation that our model is leveraging future information to improve predictive power, as expected. The early fusion conditional encoder receives the conditional query at an earlier stage in the model, whereas the late fusion setup feeds the query to the GNN only at the final prediction stage. As the results show, the early-fusion variant significantly outperforms late fusion.

C. Evaluation on Argoverse

Our model is competitive with the state of the art on the popular Argoverse benchmark dataset [15]. On the validation dataset, we achieve a minADE6 of 0.7488, which is near the state of the art in recent work: 0.71 by Liang et al. [29], 0.728 by TNT [30], and 0.75 by WIMP [19]. By conditioning on the sensor vehicle, the CBP model reduces minADE6 by 0.8% to 0.7409, consistent with our more exhaustive studies on the internal dataset.

D. Distribution of Interactivity Scores

Figure 3 shows the histogram of interactivity scores between all agent pairs in the evaluation dataset. The incidence of interactions in most datasets is rare, so the interactivity score may be a good tool to automatically mine a dataset for interactive examples.

Fig. 3. Histogram of interactivity score (mutual information) between 8,919,306 pairs of agents in the validation dataset.

E. Interactivity Score Predicts Surprise

The mutual information score allows us to discover scenarios with a potential for surprising interactions, where the ground-truth future of the query agent causes a target agent to change its behavior. Using the ground-truth future trajectories of agents sA and sB, we can quantify how query agent A affected target agent B in reality by comparing the prediction error between the conditional and marginal (non-conditional) models. A large, positive ∆_wADE indicates that providing the query agent's future significantly improves the prediction accuracy for the target agent.

Fig. 4. (a) The mutual information between the target and query agents and (b) the KL divergence for the target agent given the ground-truth query trajectory both correlate with the incidence of surprising interactions, as measured by the target's ∆_wADE(B). Shaded regions are between the 10th & 90th, 20th & 80th, 30th & 70th, and 40th & 60th percentiles.

Figure 4a shows that there is a strong correlation between high values of mutual information and high values of ∆_wADE. In other words, agents with high interactivity scores are more likely to exert influence on one another. Note that the interactivity score does not use any future information, while ∆_wADE does.

Also, in percentiles with high mutual information, there is a high occurrence of examples where the conditional prediction errors are much lower than the marginal prediction errors. These are scenes where the behavior of the query agent has significantly affected the target agent. Such examples are not present in the lower mutual information percentiles.

On the other hand, Figure 4b shows a decrease in average ∆_wADE for the percentiles with the highest mutual information. Upon inspection of a portion of scenes in the top percentiles, we observe many examples where the agent pair are positioned very close to each other and can exert influence on one another; however, since they are almost stationary and ∆_wADE is sensitive to distance, the impact of influence on ∆_wADE is small. We also observe that a high KL divergence for the target agent given the ground-truth query trajectory strongly correlates with high values of ∆_wADE. Given the future trajectory of the query agent, we can predict surprising interactions even more accurately than without future trajectories for either agent.

Figure 7 shows two examples of pairs of interacting agents discovered in the evaluation set by filtering by high mutual information and high ∆_wADE. In the first example, one vehicle yields to another in a turn. While in the marginal prediction there is a high probability for the target agent to cross the intersection, the conditional prediction shows the target agent yielding. In the second example, the target agent slows down behind a query agent which is braking.
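The offline mining described above amounts to thresholding (or taking top percentiles of) two precomputed quantities per agent pair. A minimal sketch, assuming a hypothetical list of per-pair records:

    def mine_interactive_pairs(pairs, mi_threshold, dwade_threshold):
        # Keep (query, target) pairs whose precomputed interactivity score and
        # delta-wADE are both high. `pairs` is a hypothetical list of dicts;
        # the thresholds could also be chosen as dataset percentiles.
        return [p for p in pairs
                if p["mutual_information"] > mi_threshold
                and p["delta_wade"] > dwade_threshold]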
Fig. 7. Two examples of interacting agents found by sorting examples by mutual information and ∆_wADE. The marginal (left) and conditional predictions (right) are shown with the query in solid green, and predictions in dashed cyan lines. (a) Query agent turns left and target agent yields. (b) Target agent slows down behind the query agent.

F. Selecting Salient Agents

This section demonstrates using the interactivity score to predict which vehicles are salient for planning for the autonomous vehicle. We predict the trajectory of the AV both in the original scene, and in a modified scene where some agents have been removed. We show that agents with high interactivity with the AV are more likely to affect its behavior, compared to agents that are simply closer to it. In the dataset, we typically have 10 to 32 cars in a scene, but in practice, very few of these cars are actually relevant for planning for the AV, so they could potentially be excluded from high-fidelity behavior predictions on-board the vehicle.

In the first experiment, we compute the mutual information between the autonomous vehicle and every other agent. We choose the top N agents with the largest mutual information values. Then, we remove all others from the scene, and use only the top N agents' states to predict the trajectory of the AV. We compare this approach to selecting the top N agents closest in distance to the AV in the scene. This is a common heuristic used for identifying relevant vehicles in the scene.

Figure 5a shows that mutual information can identify more relevant agents for planning up to N = 4. For larger numbers of agents, the distance heuristic outperforms mutual information as an agent selection mechanism. In practice, for agent prioritization onboard an AV, mutual information could be combined with other heuristics, such as distance.

In the second experiment, we do not remove the pruned agents from the scene, but remove them from the set of agents whose behaviors are to be predicted by the model. In this case, the pruned agents are visible to the model as scene context. Figure 5b compares using the interactivity score vs. a distance heuristic in this task. As the results show, moving less interactive agents to scene context actually improves predictions for the autonomous vehicle, as long as at least the 3 most interactive agents are kept in the prediction set.

One potential explanation for this result is that reducing the prediction set of the model provides an attention mechanism for the prediction of the autonomous vehicle that emphasizes the potential future trajectories of certain agents over others. In particular, the message-passing mechanism in the GNN can focus only on the relevant neighbors for the AV.

Figure 6 visualizes pruning agents by mutual information vs. pruning by distance in the same scene. We see that the mutual information selects vehicles that are behind and ahead of the AV in the same lane, in addition to a few vehicles further ahead in neighboring lanes. The distance metric, on the other hand, selects vehicles that are multiple lanes away and are not likely to interact with the AV.
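A minimal sketch of this prioritization scheme, assuming the per-agent interactivity scores with the AV have already been computed (the distance baseline simply replaces the score with negative distance to the AV):

    import numpy as np

    def select_agents_for_prediction(agents, interactivity_with_av, n_keep):
        # Keep the N agents with the highest interactivity score with the AV;
        # the rest remain visible to the model only as scene context, as in the
        # second experiment above.
        order = np.argsort(interactivity_with_av)[::-1]
        keep = [agents[i] for i in order[:n_keep]]
        context_only = [agents[i] for i in order[n_keep:]]
        return keep, context_only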
Fig. 5. The interactivity score allows pruning agents that are not relevant for planning for the AV. The bars show the average BP error over the pruned scene minus the BP error over the original scene. Note that there are no conditional predictions in this experiment. The error bars indicate the standard error of the mean. (a) Other agents are removed from the scene. (b) Other agents are used only as context; their behavior is not predicted.

Fig. 6. The inset shows the non-pruned scene with the AV (pink) and other cars (blue). On the left, agents with a low interactivity score with the AV are pruned. On the right, agents are pruned based on distance to the AV.

G. Challenges and Future Work

Fig. 8 shows an example where our metrics have selected a pair of vehicles slowing down in parallel lanes at an intersection. These agents are reacting to a change in traffic light state, rather than to one another. The CBP model cannot differentiate between correlation and causation of two agents' trajectories. Before using a trajectory as a query, one can compute the marginal likelihood of the query, p(sB|x), to determine whether sB is a likely query for which the model can accurately provide counterfactual predictions.

Fig. 8. An example in which the query and target agents slow down in parallel lanes as a result of a traffic light change. The marginal (left) and conditional predictions (right) are shown with the query in solid green.

The interactivity score can be evaluated very efficiently by pre-computing the embedding of the roadgraph, which is the most expensive part of the architecture in practice, and batching the different queries to evaluate them in parallel. We could also consider using our interactivity score as a reward signal in cooperative multi-agent reinforcement learning, similar to the notion of influence introduced in [31].

REFERENCES

[1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, "Social LSTM: Human trajectory prediction in crowded spaces," in CVPR, 2016.
[2] C. Tang and R. R. Salakhutdinov, "Multiple futures prediction," in NeurIPS, 2019.
[3] N. Rhinehart, R. McAllister, K. Kitani, and S. Levine, "PRECOG: Prediction conditioned on goals in visual multi-agent settings," in Intl. Conf. on Computer Vision, 2019.
[4] E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone, "Multimodal probabilistic model-based planning for human-robot interaction," in IEEE Intl. Conf. on Robotics and Automation, 2018, pp. 1–9.
[5] S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in ICML, 2011.
[6] N. Rhinehart, K. Kitani, and P. Vernaza, "R2P2: A reparameterized pushforward policy for diverse, precise generative path forecasting," in ECCV, 2018.
[7] Y. Yuan and K. Kitani, "Diverse trajectory forecasting with determinantal point processes," in ICLR, 2020.
[8] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, "Planning for autonomous cars that leverage effects on human actions," in Robotics: Science and Systems, 2016.
[9] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, "MultiPath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction," in Conf. on Robot Learning, 2019.
[10] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. S. Torr, and M. Chandraker, "DESIRE: Distant future prediction in dynamic scenes with interacting agents," in CVPR, 2017.
[11] W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, "End-to-end interpretable neural motion planner," in CVPR, 2019.
[12] S. Casas, W. Luo, and R. Urtasun, "IntentNet: Learning to predict intention from raw sensor data," in Conf. on Robot Learning, 2018.
[13] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, "Social GAN: Socially acceptable trajectories with generative adversarial networks," in CVPR, 2018.
[14] J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, and C. Schmid, "VectorNet: Encoding HD maps and agent dynamics from vectorized representation," in CVPR, 2020.
[15] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, et al., "Argoverse: 3D tracking and forecasting with rich maps," in CVPR, 2019.
[16] J. Mercat, T. Gilles, N. Zoghby, G. Sandou, D. Beauvois, and G. Gil, "Multi-head attention for joint multi-modal vehicle motion forecasting," in IEEE Intl. Conf. on Robotics and Automation, 2020.
[17] S. Casas, C. Gulino, R. Liao, and R. Urtasun, "SpAGNN: Spatially-aware graph neural networks for relational behavior forecasting from sensor data," in IEEE Intl. Conf. on Robotics and Automation, 2020.
[18] K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, and A. Gaidon, "It is not the journey but the destination: Endpoint conditioned trajectory prediction," arXiv:2004.02025, 2020.
[19] S. Khandelwal, W. Qi, J. Singh, A. Hartnett, and D. Ramanan, "What-if motion prediction for autonomous driving," arXiv, 2020.
[20] A. K. Pandey and R. Alami, "A framework towards a socially aware mobile robot motion in human-centered dynamic environment," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, 2010.
[21] L. Scandolo and T. Fraichard, "An anthropomorphic navigation scheme for dynamic scenarios," in IEEE Intl. Conf. on Robotics and Automation, 2011.
[22] E. A. Sisbot, L. F. Marin-Urias, R. Alami, and T. Simeon, "A human aware mobile robot motion planner," IEEE Transactions on Robotics, 2007.
[23] K. S. Refaat, K. Ding, N. Ponomareva, and S. Ross, "Agent prioritization for autonomous navigation," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, 2019.
[24] R. Michelmore, M. Kwiatkowska, and Y. Gal, "Evaluating uncertainty quantification in end-to-end autonomous driving control," arXiv preprint arXiv:1811.06817, 2018.
[25] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, 1948.
[26] J. R. Hershey and P. A. Olsen, "Approximating the Kullback-Leibler divergence between Gaussian mixture models," in Intl. Conf. on Acoustics, Speech and Signal Processing, 2007.
[27] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu, "Relational inductive biases, deep learning, and graph networks," arXiv preprint arXiv:1806.01261, 2018.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in NeurIPS, 2017.
[29] M. Liang, B. Yang, R. Hu, Y. Chen, R. Liao, S. Feng, and R. Urtasun, "Learning lane graph representations for motion forecasting," arXiv preprint arXiv:2007.13732, 2020.
[30] H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen, Y. Shen, Y. Chai, C. Schmid, et al., "TNT: Target-driven trajectory prediction," arXiv preprint arXiv:2008.08294, 2020.
[31] N. Jaques, A. Lazaridou, E. Hughes, C. Gulcehre, P. Ortega, D. Strouse, J. Z. Leibo, and N. De Freitas, "Social influence as intrinsic motivation for multi-agent deep reinforcement learning," in ICML, 2019.