MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

Yuning Chai∗  Benjamin Sapp∗  Mayank Bansal  Dragomir Anguelov
Waymo LLC
{chaiy,bensapp}@waymo.com

∗ Equal contribution
3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan.

Abstract: Predicting human behavior is a difficult and crucial task required for motion planning. It is challenging in large part due to the highly uncertain and multi-modal set of possible outcomes in real-world domains such as autonomous driving. Beyond single MAP trajectory prediction [1, 2], obtaining an accurate probability distribution of the future is an area of active interest [3, 4]. We present MultiPath, which leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution. At inference, our model predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step. Our model is efficient, requiring only one forward inference pass to obtain multi-modal future distributions, and the output is parametric, allowing compact communication and analytical probabilistic queries. We show on several datasets that our model achieves more accurate predictions, and compared to sampling baselines, does so with an order of magnitude fewer trajectories.

1 Introduction

We focus on the problem of predicting future agent states, which is a crucial task for robot planning in real-world environments. We are particularly interested in addressing this problem for self-driving vehicles, an application with a potentially enormous societal impact. Importantly, predicting the future of other agents in this domain is vital for safe, comfortable and efficient operation. For example, it is important to know whether to yield to a vehicle if it is going to cut in front of our robot, or when would be the best time to merge into traffic. Such future prediction requires an understanding of the static and dynamic world context: road semantics (e.g., lane connectivity, stop lines), traffic light information, and past observations of other agents, as depicted in Fig. 1.

A fundamental aspect of future state prediction is that it is inherently stochastic, as agents cannot know each other's motivations. When driving, we can never really be sure what other drivers will do next, and it is important to consider multiple outcomes and their likelihoods.

We seek a model of the future that can provide both (1) a weighted, parsimonious set of discrete trajectories that covers the space of likely outcomes and (2) a closed-form evaluation of the likelihood of any trajectory. These two attributes enable efficient reasoning in crucial planning use-cases, for example, human-like reactions to discrete trajectory hypotheses (e.g., yielding, following), and probabilistic queries such as the expected risk of collision in a space-time region.

Both of these attributes present modeling challenges. Models which try to achieve diversity and coverage often suffer from mode collapse during training [4, 5, 6], while tractable probabilistic inference is difficult because the space of possible trajectories grows exponentially over time.

Our MultiPath model addresses these issues with a key insight: it employs a fixed set of trajectory anchors as the basis of our modeling. This lets us factor stochastic uncertainty hierarchically: First, intent uncertainty captures the uncertainty of what an agent intends to do and is encoded as a distribution over the set of anchor trajectories. Second, given an intent, control uncertainty represents our uncertainty over how the agent might achieve it. We assume control uncertainty is normally distributed at each future time step [7], parameterized such that the mean corresponds to a context-specific offset from the anchor state, with the associated covariance capturing the unimodal aleatoric uncertainty [8]. Fig. 1 illustrates a typical scenario where there are 3 likely intents given the scene context, with control mean offset refinements respecting the road geometry, and control uncertainty intuitively growing over time.
Figure 1: MultiPath estimates the distribution over future trajectories per agent in a scene, as follows: 1) Based on a top-down scene representation, the Scene CNN extracts mid-level features that encode the state of individual agents and their interactions. 2) For each agent in the scene, we crop an agent-centric view of the mid-level feature representation and predict the probabilities over the fixed set of K predefined anchor trajectories. 3) For each anchor, the model regresses offsets from the anchor states and uncertainty distributions for each future time step.

Our trajectory anchors are modes found in our training data in state-sequence space via unsupervised learning. These anchors provide templates for coarse-granularity futures for an agent and might correspond to semantic concepts like “change lanes” or “slow down” (although, to be clear, we don't use any semantic concepts in our modeling).

Our complete model predicts a Gaussian mixture model (GMM) at each time step, with the mixture weights (intent distribution) fixed over time. Given such a parametric distribution model, we can directly evaluate the likelihood of any future trajectory and also have a simple way to obtain a compact, diverse weighted set of trajectory samples: the MAP sample from each anchor-intent.
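As a concrete illustration of this weighted trajectory set, the following minimal NumPy sketch (ours, not the authors' code; array names and shapes are assumptions) forms one MAP trajectory per anchor by adding the predicted mean offsets to the anchor waypoints, and attaches the predicted anchor probability as its weight.

```python
import numpy as np

def weighted_map_trajectory_set(anchors, offsets, anchor_probs):
    """Compact weighted trajectory set: one MAP trajectory per anchor.

    Assumed (illustrative) shapes:
      anchors:      (K, T, 2) fixed anchor waypoints
      offsets:      (K, T, 2) predicted per-waypoint mean offsets
      anchor_probs: (K,)      predicted intent distribution over anchors
    Returns a list of (weight, trajectory) pairs, most likely intent first.
    """
    map_trajectories = anchors + offsets           # mode of each per-anchor Gaussian
    order = np.argsort(-anchor_probs)              # sort intents by probability
    return [(float(anchor_probs[k]), map_trajectories[k]) for k in order]
```

Because the mixture weights are fixed over time, the weight attached to each trajectory is simply the anchor probability; no sampling is involved.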
Our model contrasts with popular past approaches which either provide only a single MAP trajectory [1, 2, 9, 10, 11] or an unweighted set of samples via a generative model [3, 4, 6, 12, 13, 14, 15]. There are a number of downsides to sample-based methods when it comes to real-world applications such as self-driving vehicles: (1) non-determinism in a safety-critical system, (2) a poor handle on approximation error (e.g., “how many samples must I draw to know the chance the pedestrian will jaywalk?”), and (3) no easy way to perform probabilistic inference for relevant queries, such as computing expectations over a space-time region.

We demonstrate empirically that our model emits distributions which predict the observed outcomes better on synthetic and real-world prediction datasets: we achieve higher likelihood than a model which emits unimodal parametric distributions, showing the importance of multiple anchors in real-world data. We also compare to sampling-based methods by using our weighted set of MAP trajectories per anchor, which describe the future better with far fewer samples on sample-set metrics.

2 Related work

We broadly categorize previous approaches to predicting future trajectory distributions into two classes of models: deterministic and stochastic. Deterministic models predict a single most-likely trajectory per agent, usually via supervised regression [1, 2, 9, 10, 11, 16].

Stochastic models incorporate random sampling during training and inference to capture future non-determinism. The seminal motion forecasting work of Kitani et al. [14] casts this as a Markov decision process and learns a 1-step policy, as does follow-on work focusing on egocentric video and pedestrians [15, 17]. To encourage sample diversity and coverage, R2P2 [4] proposes a symmetric KL loss between the predicted and data distributions. Several works explore the use of conditional variational autoencoders (CVAEs) and GANs to generate samples [3, 6, 13, 18, 19]. One drawback of such non-deterministic approaches is that they can make reproducing and analyzing results in a larger system difficult.

Like us, a few previous works directly model probability distributions, either parametric [6, 12, 20] or in the form of probabilistic state-space occupancy grids (POGs) [6, 11]. While extremely flexible, POGs require state-space-dense storage to describe the distribution rather than just a few parameters, and it is not obvious how best to extract trajectory samples from POG space-time volumes.

Our method is influenced heavily by the concept of predefined anchors, which have a rich history in machine learning applications as a way to handle multi-modal problems, starting with classic semi-parametric methods such as locally-weighted logistic regression, radial basis SVMs and Gaussian mixture models [5]. In the computer vision literature, anchors have been used effectively for detection [21] and human-pose estimation [22]. Like ours, these approaches predict the likelihood of anchors and also predict continuous refinements of state conditioned on these anchors (e.g., box corners, joint locations or vehicle positions).

3 Method

Given observations x in the form of past trajectories of all agents in a scene and possibly additional contextual information (e.g., lane semantics, traffic light states), MultiPath seeks to provide (1) a parametric distribution over future trajectories s, p(s|x), and (2) a compact weighted set of explicit trajectories which summarizes this distribution well.

Let t denote a discrete time step, and let s_t denote the state of an agent at time t; the future trajectory s = [s_1, ..., s_T] is a sequence of states from t = 1 to a fixed time horizon T. We also refer to a state in a trajectory as a waypoint.

We factorize the notion of uncertainty into independent quantities. Intent uncertainty models uncertainty about the agent's latent coarse-scale intent or desired goal; for example, in a driving context, uncertainty about which lane the agent is attempting to reach. Conditioned on intent, there is still control uncertainty, which describes the uncertainty over the sequence of states the agent will follow to satisfy its intent. Both intent and control uncertainty depend on the past observations of static and dynamic world context x.

We model a discrete set of intents as a set of K anchor trajectories A = {a^k}_{k=1}^{K}, where each anchor trajectory is a sequence of states a^k = [a^k_1, ..., a^k_T], assumed given for now. We model uncertainty over this discrete set of intents with a softmax distribution:

\pi(a^k \mid x) = \frac{\exp f_k(x)}{\sum_i \exp f_i(x)},

where f_k(x) : \mathbb{R}^{d(x)} \mapsto \mathbb{R} is the output of a deep neural network.

We make the simplifying assumption that uncertainty is unimodal given intent, and model control uncertainty as a Gaussian distribution dependent on each waypoint state of an anchor trajectory:

\phi(s^k_t \mid a^k, x) = \mathcal{N}(s^k_t \mid a^k_t + \mu^k_t(x), \Sigma^k_t(x))    (1)

The Gaussian parameters µ^k_t and Σ^k_t are directly predicted by our model as a function of x for each time step of each anchor trajectory a^k. Note that in the Gaussian mean, a^k_t + µ^k_t, the term µ^k_t represents a scene-specific offset from the anchor state a^k_t; it can be thought of as modeling a scene-specific residual or error term on top of the prior anchor distribution. This allows the model to refine the static anchor trajectories to the current context, with variations coming from, e.g., specific road geometry, traffic light state, or interactions with other agents.

The time-step distributions are assumed to be conditionally independent given an anchor, i.e., we write φ(s_t | ·) instead of φ(s_t | ·, s_{1:t−1}). This modeling assumption allows us to predict all time steps jointly with a single inference pass, making our model simple to train and efficient to evaluate. If desired, it is straightforward to add a conditional next-time-step dependency to our model using a recurrent structure (RNN).
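To make this parameterization concrete, here is a minimal NumPy sketch (ours, not the authors' implementation; names, shapes, and the diagonal-covariance simplification of Σ^k_t are assumptions for illustration). It converts raw network outputs into the quantities above: anchor logits f_k(x) become the softmax intent distribution π(a^k|x), and per-anchor, per-time-step offsets and log standard deviations define the Gaussian of Eq. 1, whose log-density sums over time steps under the conditional-independence assumption.

```python
import numpy as np

def anchor_distribution(anchor_logits):
    """Softmax over anchor logits f_k(x): returns pi(a^k | x), shape (K,)."""
    z = anchor_logits - anchor_logits.max()        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def per_anchor_log_likelihood(traj, anchors, offsets, log_sigmas):
    """Per-anchor log-density of a trajectory under Eq. 1 (diagonal-covariance sketch).

    Assumed (illustrative) shapes:
      traj:       (T, 2)    future trajectory s_1, ..., s_T
      anchors:    (K, T, 2) anchor waypoints a^k_t
      offsets:    (K, T, 2) predicted mean offsets mu^k_t(x)
      log_sigmas: (K, T, 2) predicted per-axis log standard deviations
                            (a diagonal stand-in for Sigma^k_t(x))
    Returns shape (K,): sum over t of log N(s_t | a^k_t + mu^k_t, Sigma^k_t),
    exploiting the conditional independence of time steps given the anchor.
    """
    mean = anchors + offsets                       # a^k_t + mu^k_t(x)
    var = np.exp(2.0 * log_sigmas)
    log_prob = -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (traj - mean) ** 2 / var
    return log_prob.sum(axis=(1, 2))               # sum over time steps and coordinates
```

Weighting these per-anchor terms by π(a^k|x) and summing over anchors yields the full trajectory distribution derived next; the same two quantities also appear in the negative log-likelihood training loss below.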
To obtain a distribution over the entire state space, we marginalize over agent intent:

p(s \mid x) = \sum_{k=1}^{K} \pi(a^k \mid x) \prod_{t=1}^{T} \phi(s_t \mid a^k, x)    (2)

Note that this yields a Gaussian mixture model distribution, with mixture weights fixed over all time steps. This is a natural choice to model both types of uncertainty: it has rich representational power, a closed-form partition function, and is also compact. It is easy to evaluate this distribution on a discretely sampled grid to obtain a probabilistic occupancy grid, more cheaply and with fewer parameters than a native occupancy grid formulation [6, 11].

Obtaining anchor trajectories. Our distribution is parameterized by the anchor trajectories A. As noted by [6, 5], directly learning a mixture suffers from issues of mode collapse. As is common practice in other domains such as object detection [23] and human pose estimation [22], we estimate our anchors a priori and fix them before learning the rest of our parameters. In practice, we used the k-means algorithm as a simple approximation to obtain A, with the following squared distance between trajectories: d(u, v) = \sum_{t=1}^{T} \lVert M_u u_t - M_v v_t \rVert_2^2, where M_u, M_v are affine transformation matrices which put trajectories into a canonical rotation- and translation-invariant agent-centric coordinate frame. In Sec. 4, on some datasets, k-means leads to highly redundant clusters due to prior distributions that are heavily skewed to a few common modes. To address this, we employ a simpler approach to obtain A by uniformly sampling trajectory space.

Learning. We train our model via imitation learning by fitting our parameters to maximize the log-likelihood of recorded driving trajectories. Let our data be of the form {(x^m, ŝ^m)}_{m=1}^{M}. We learn to predict the distribution parameters π(a^k|x), µ(x)^k_t and Σ(x)^k_t as outputs of a deep neural network parameterized by weights θ, with the following negative log-likelihood loss built upon Equation 2:

\ell(\theta) = -\sum_{m=1}^{M} \sum_{k=1}^{K} \mathbb{1}(k = \hat{k}^m) \Big[ \log \pi(a^k \mid x^m; \theta) + \sum_{t=1}^{T} \log \mathcal{N}(s^k_t \mid a^k_t + \mu^k_t, \Sigma^k_t; x^m; \theta) \Big]    (3)

This is a time-sequence extension of standard GMM likelihood fitting [5]. The notation 1(·) is the indicator function, and k̂^m is the index of the anchor most closely matching the ground-truth trajectory ŝ^m, measured as ℓ2-norm distance in state-sequence space. This hard assignment of ground-truth anchors sidesteps the intractability of direct GMM likelihood fitting, avoids resorting to an expectation-maximization procedure, and gives practitioners control over the design of the anchors (see our choice below). One could just as easily employ a soft assignment to anchors (e.g., proportional to the distance of the anchor to the ground-truth trajectory).

Inferring a diverse weighted set of test-time trajectories. Our model allows us to eschew standard sampling techniques at test time and obtain a weighted set of K trajectories without any additional computation: we take the MAP trajectory estimate from each of our K anchor modes, and consider the distribution over anchors π(a^k|x) as the sample weights (i.e., importance sampling). When metrics and applications call for a set of top κ