HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps LuMi1,2,†,∗,HangZhao1,∗,CharlieNash3,XiaohanJin1,JiyangGao1,ChenSun4, CordeliaSchmid4,NirShavit2,YuningChai1,DragomirAnguelov1 1Waymo,2MIT,3DeepMind,4Google Abstract High Definition (HD) maps are maps with precise def- initions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an au- tonomousdrivingsystem,includingmotionforecastingand planning. However, thereareonlyasmallamountofreal- world road topologies and geometries, which significantly limits our ability to test out the self-driving stack to gen- eralize onto new unseen scenarios. To address this issue, weintroduceanewchallengingtasktogenerateHDmaps. In this work, we explore several autoregressive models us- ingdifferentdatarepresentations,includingsequence,plain graph,andhierarchicalgraph. WeproposeHDMapGen,a hierarchicalgraphgenerationmodelcapableofproducing high-qualityanddiverseHDmapsthroughacoarse-to-fine Figure1:Left:ahierarchicalmapgraphgeneratedfromHDMap- Gen. One traffic light controlled lane is shown as an example; approach. ExperimentsontheArgoversedatasetandanin- Right:amapwithcorrespondingintersectionsandlanesrendered house dataset show that HDMapGen significantly outper- fromthehierarchicalmapgraph. formsbaselinemethods. Additionally,wedemonstratethat HDMapGenachieveshighscalabilityandefficiency. aresenttocapturethescene;then,thesensordataarepro- cessedandstitchedtogethertoobtainmapimagery;finally, humanspecialistsannotateontopoftheimagerytoprovide 1.Introduction vectorized representations of the world geometry with se- mantic attributes. On the other hand, we aim to produce High Definition maps (HD maps) are electronic maps syntheticHDmapsinadata-drivenway,mainlyduetotwo with precise depictions of the physical roads, usually with reasons: (1)buildingHDmapsfromtherealworldispro- anaccuracyofcentimeters,togetherwithrichsemanticsof hibitively expensive; (2) the small number of real-world traffic rules, such as one-way-street, stop, yield, etc. HD maps prevent us from testing the generalization capability maps sit at the core of autonomous driving applications of self-driving stacks in simulation, such as motion fore- as they provide a strong prior for the self-driving robots castingandmotionplanning. to localize themselves in the 3D space [4, 34], predict Existingmethodsinmodelingcitiesandmapsaremostly other vehicles’ motions [6, 40] and maneuver themselves relying on procedural modeling, and hand-crafted genera- around [8, 20]. Furthermore, HD maps are critical build- tionrules[30],andarethereforenotflexibleandadaptable ing blocks of city modeling and simulation. Simulating tonewscenarios. Thereisnopreviousattempttogenerate novelcityenvironmentsfindsawiderangeofapplications HDmapsusingmoderndeepgenerativemodelstothebest ingamedesignandurbanplanning. ofourknowledge. Themostrelatedworksarethosewhich Practically, HD maps are constructed according to a generatecitylayouts[10]. ComparedtoHDmaps,citylay- strictmappingprocedure:first,afleetofvehicleswithmap- outs are not suitable for autonomous driving applications pingsensorsuites(containingLiDAR,Radar, andcamera) since they only contain coarse locations of the roads (with aresolutionofroughly10meters)andlackdetailssuchas †WorkdoneduringinternshipatWaymo. ∗Correspondingto:lumi@mit.edu,zhaohang0124@gmail.com lanesoftheroadsortrafficlights. 1202 nuJ 82 ]VC.sc[ 1v08841.6012:viXraUnique attributes of HD maps pose new challenges to 2.2.GraphGenerativeModels modeling them: (1) HD maps are composed of physical Our model architecture is inspired by the rapid roadelementswithgeometricfeatures,e.g.markedstraight progress in the area of graph generation spearheaded lanes and hypothesized turning lanes; (2) Lane maps con- by seminal works such as graph recurrent neural net- tainrichsemanticattributes,e.g.thedirectionoflanes,the works (GRNN) [17], graph generative adversarial net- associationbetweentrafficlightsandlanes. Byaddressing works (GraphGAN) [36], variational graph auto-encoder thesechallenges,ourmajorcontributionsinthisworkareas (VGAE) [19], and graph recurrent attention networks follows: (GRAN) [24]. More recently, Cao et al. proposed Mol- GAN [11], and Samanta et al. proposed NEVAE [31] to • Weposeanewimportantandchallengingproblemto generate small molecule graphs. These methods all focus generateHDlanemapsinadata-drivenway. ongeneratinggraphtopologies,suchasadjacencymatrices. In comparison, HD lane maps are graphs that come with • We perform a systematic exploration of modern au- spatialcoordinatesandgeometricfeatures,whichposesan- toregressivegenerativemodelswithdifferentdatarep- otherlayerofcomplexitytothegenerationproblem. resentations and propose HDMapGen, a hierarchical graphgenerativemodelthatlargelyoutperformsother 2.3.GeometricDataGeneration baselines. Given the geometric nature of maps, another line of related work is geometric data generation. To generate • We evaluate our model on the maps of the public Ar- 2D sketch drawings, Ha et al. proposed SketchRNN [16], goversedatasetandanin-housedataset,coveringcities an RNN model with a VAE structure to sequentially pro- ofMiami,Pittsburg,andSanFrancisco. Resultsshow duce sketch strokes following human drawing sequences. thatourmodelproducesmapswithhighfidelity,diver- To assemble furniture from primitive parts, GRASS [21] sity,scalability,andefficiency. and StructureNet [26] adopted several variations of auto- encodingmodels.Morerecently,Nashetal.proposedPoly- 2.RelatedWork Gen [27], an autoregressive transformer model that gener- ates 3D furniture meshes. Compared to human sketches, 2.1.StreetMapModelingandGeneration furniture,andmolecules,graphsoflanemapsusuallycon- sistofmorenodesandedges. Insteadoftreatingthewhole Citystreetmodelingandgenerationisanimportantcom- mapasasequence,ourmethodgeneratesamapwithatwo- ponent of computer-aided urban design. Most classical levelhierarchy,greatlyimprovingthegenerationquality. worksrelyonproceduralmodelingmethods[1,2,37,5,9, 12, 13, 28, 30, 38]. One of the more well-known methods istheL-system[30],whichgeneratesroadnetworksfroma 3.Method sequenceofinstructionsdefinedbyhand-craftedproduction InSection3.1,weexploredifferentdatarepresentations rules. To impose more control over the generated results, togenerateHDmapsanddemonstratetheefficiencyofus- laterworksproposedtosamplefromproceduralmodelsac- inghierarchicalgraph. InSection3.2,weintroduceourau- cording to constraints or likelihood functions [33]. How- toregressive graph generative model HDMapGen that uses ever,thesemethodsarestillconstrainedbytherules,there- ahierarchicalgraphasdatarepresentation. fore are not flexible in terms of generating maps with dif- ferentcitystyles. 3.1.DataRepresentation Inrecentyears,deeplearningmethodshavebeenapplied tomapreconstructionandencoding. Severalworks[3,18, Toprovidedetailedroadinformationtotheautonomous 22,25]attemptedtoextractandreconstructroadtopologies vehicle,HDmapsusedinapplicationssuchasautonomous fromoverheadimages. VectorNet[14]andLaneGCN[23] driving typically contain many essential components, in- proposed to efficiently encode roads using graph atten- cluding lanes, boundaries, crosswalks, traffic lights, stop tionnetworksandgraphconvolutions,replacingtraditional signs,etc.Inthiswork,wefocusongeneratinglanes,which rendering-based models. Chu et al. propose Neural Turtle areacorecomponentofHDmaps. Alaneisusuallyrepre- Graphics(NTG)[10],agenerativemodeltoproduceroads sented geometrically by its central line as a curve, and the iteratively. They use an encoder-decoder RNN model that curveisstoredasapolylinewithasufficientamountofcon- encodes incoming paths into a node and decodes outgoing trolpointsonittoreconstructthecurvature. Ourgoalisto nodesandedges. ThegoalofNTGistogeneratecity-level generate these control points of central lane lines. Mean- road layouts. In comparison, we focus on high definition while, each lane has a unique ID. Its predecessor and suc- roadandlanegeneration,whichismuchmorechallenging. cessorlaneIDsarealsoprovided. Moreover,thelanesalsoFigure2:DifferentdatarepresentationsofHDmaps.Sequencerepresentationleadstoabigdiagonaladjacencymatrix,anditalsosuffers fromrevisitingtheintersectionpointsmultipletimes; plaingraphrepresentationresultsinabigsparseadjacencymatrix; ourproposed hierarchicalgraphrepresentationgivesasmalleranddenseradjacencymatrix(oftheglobalgraph),whichreducesgenerationdifficulty andimprovesefficiency. have semantic attributes such as whether any traffic light sequence generation. However, the method is still ineffi- controlsitornot. cient due to a large number of control points to guarantee Forlanemapgeneration, therearevariouswaystorep- ahigh-resolutionmap. Togenerateanedgebetweenevery resentthelaneobjects. Weexplorethreedifferentdatarep- pair of control points, it resorts to a very large adjacency resentations: a sequence, a plain graph, and a hierarchical matrix. graph,asshowninFigure2. Wewilllatershowthatthehi- HierarchicalGraph(HDMapGen). Here,weproposean erarchical graph representation that HDMapGen utilizes is efficientandscalablealternativetouseahierarchicalgraph themostefficientandscalable. Moreover,itvastlyoutper- torepresentthemap.Underthisapproach,wefirstconstruct formsotherrepresentationmethods. aglobalgraphwithkeypointsasitsnodes. Keypointsare Sequence. Whileassumingalllines(lanesinourcase)are defined as the endpoints of lanes or the intersection points connected,astraightforwardrepresentationistosortallthe ofmultiplelanes. Thentheedgeoftheglobalgraphrepre- lines in a map and treat the entire map as one sequence of sents whether a lane exists between two key points. Since points. This approach was adopted by SketchRNN [16]. keypointsareasmallsubset(30%)oforiginalpoints,which Each point in the sequence has an offset distance in the x require a much smaller adjacency matrix. In the second and y direction (∆x,∆y) from the previous point, and a step, we represent each lane’s curvatures details with a lo- statevariableq ∈ {1,2,3}. Thestateq = 1indicatesthat calgraph. Eachlaneisconstructedwithauniquelydefined thenextpointisstartinganewlaneobject;thestateq = 2 sequence of control points between two key points. Given identifies the next point to continue in the same lane ob- this fixed topology, we could directly predict the coordi- ject, andthestateq = 3indicatesthattheentiremapgen- nates of the local nodes. Moreover, we also predict a cor- eration has completed. One of the major disadvantages of respondingnodemasktoallowvariationsofthenumberof this method is that there is no perfect way to pre-define a control points during generation. As demonstrated in Fig- sequential ordering of these points. Secondly, to cover an ure 2, using a hierarchical graph has a better performance intersection point of multiple lanes requires revisiting that topreservethenaturalhierarchicalstructureinformationof point more than once. Under this representation, an inter- the HD map. Moreover, it also enables a smaller size of section point will be represented as multiple independent theadjacencymatrixandhigherefficiencythanusingplain pointsinthesequence. graphandsequence. PlainGraph.Thesecondbaselineistouseaplaingraphto 3.2.AutoregressiveModeling representamap. Thisstrategywasadoptedinrecentwork on road layout generation NTG [10]. Under this represen- Autoregressive modeling has achieved remarkable per- tation, a plain graph contains all control points as nodes formance in many generation tasks[15, 27], especially and all line connections between control points as edges. for sequential data such as speech [29] and text [32]. Moreover, each node has a node attribute of 2D coordi- Morerecently,severalstudiessuchasGraphRNN[39]and nates (x,y). The points inside a lane object have a degree GRAN [24] also took an autoregressive approach for the of two, while the intersection points have a degree greater graphgeneration. Inthiswork, wealsouseautoregressive thantwo. Thisrepresentationsolvestherevisitingissueof modelingforHDmapgeneration.Figure3:HDMapGenpipeline.Ateachstep,globalgraphgenerator(top)producesanewnodewithitscoordinatesanditsconnectionsto existingnodes,whichisarowandacolumnintheadjacencymatrix;localgraphgenerator(bottom)furtherdecodescoordinatesoflocal nodesbetweenthetwoconnectedglobalnodes.WealsodemonstratetheglobalgraphandthelocalgraphgeneratedfromHDMapGen. Our proposed HDMapGen is a two-level hierarchical GlobalGraphGeneration. Toapplyautoregressivemod- graph generative model, as shown in Figure 3. Firstly, the eling,wefirstlyfactorizetheprobabilityofgeneratingadja- global graph generation process outputs a graph with both cencymatrixLandcoordinatesC oftheglobalgraphas topological structures represented by adjacency matrix L, andthegeometricfeaturesrepresentedbynodes’spatialco- (cid:89)T P(L,C)= P(L ,C |L ,C ,··· ,L ,C ), ordinatesC. Eachnodeoftheglobalgraphisgeneratedau- t t 1 1 t−1 t−1 toregressively. Inourtask,wefirstmodeltheglobalgraph t=1 as an undirected graph and later assign the lane direction where P(L ,C | L ,C ,··· ,L ,C ) defines that at t t 1 1 t−1 t−1 information as the sequential order in the local graph. At thecurrentstept,theconditionalprobabilityofgenerating thecurrentstept,theglobalgraphdecoderpredictstheC t everynewnode’sattributesC anditsedgesL withnodes t t of this new node at t and all corresponding edges (lanes) generatedfromallprevioussteps1tot−1. L betweenpreviouslygeneratednodessinasingleshot, t,s Furthermore, sincetheprocessofgeneratingL andC t t wheres ∈ [1,t−1]. Then,itconstructsanewrowL and t ateachsteptarenotindependent,weexplorethreevariants column (due to the symmetry for the undirected graph) of ofglobalgraphdecodertostudytheperformancewithdif- theadjacencymatrixL. Then,weapplywithalocalgraph ferentpriority ofgenerating L or C . We usecoordinate- t t decoder, which outputs the curvature details in each lane first to refer to generating node coordinate first and then object L . The local graph outputs for each global edge t,s topology, defined as P(L ,C ) = P(L | C )P(C ), as t t t t t L aretwosequenceswithafixedlengthW ofcoordinates t,s showninFigure3.Andweusetopology-firsttorefertofirst {(C ) }andvalidmask{(M ) }foreachlocalnodej, j t,s j t,s generating graph topology and then generate node coordi- wherej ∈[1,W]. Themaskdefineswhichnodeisvalidto nates,definedasP(L ,C ) = P(C | L )P(L ). Another t t t t t allowavariationofthenumberofnodesineachlocalgraph. simplified strategy ignores the dependency between nodes Thelocalgraphdecoderisalsogeneratingthelocalgraphs andedgesisnamedasindependent,definedasP(L ,C )= t t foralllaneobjectsL inasingleshot,whichenablesvery t,s P(C )P(L ). MoredetailsareinSupplementary. t t fast and efficient running. We further add another single- Inspired by GRAN [24], which has state-of-the-art per- shotsemanticattributedecodertopredictsemanticfeatures formanceinsamplequalityandtimeefficiency,wealsoin- ofallgeneratededgesL . Weintroducethedetailsofthe t,s corporatearecurrentgraphattentionnetworkfortheglobal keycomponentsofHDMapGeninthefollowingtext. graph generation. The graph attention module [35] per-forms the node state update with the attentive message timizedwiththeminimizationofmeansquareerror(MSE) passing, enabling a better representation of global infor- forcoordinateoutput{(C ) },andwiththeminimization j t,s mation. In the model, we firstly define an initial node ofBCEforthevalidmask{(M ) }. j t,s state E0 before the message passing at each step t as SemanticAttributeGeneration. Likethelocalgraphgen- s E s0 = W LL s + W CC s + b,s ∈ [1,t − 1], where W L, eration,wepredictanadditionalattribute,trafficlightsfea- W C andbareparametersoftheMLPencoderstakingtopol- tureforthegeneratededge.GivenalaneL t,sbetweennode ogyandgeometryinputs. Thenforallnodesincludingnew t and node s, we predict whether the lane is controlled by node t and nodes s already generated, we perform multi- trafficlightswith anadditionaledgefeaturedecoder using ple runs of message passing. For each run r, the message MLP.Wetrainthisedgefeaturedecoderforbinaryclassifi- mr between node i and its neighbor node k ∈ N(i) is cationwiththeminimizationofBCE. i,k mr = f(Er −Er), where N(i) are the neighbor nodes i,k i k of i. And a binary mask is defined as B i = 0 to indi- 4.Experiments cate if node i is already generated, or defined as B = 1 i if under construction at step t. The node state Er is fur- In this section, we demonstrate the efficacy of our pro- i ther concatenated with this mask B into E˜r = [Er,B ]. posed HDMapGen model to generate high-quality maps. i i i i And the attention weights ar for edge L is defined as We explore three autoregressive generative models for se- ik i,k ar = Sigmoid(g(E˜r −E˜k)). Andthenodestateupdate quence,plaingraph,andhierarchicalgraphdatarepresenta- ik i i isthencalculatedasEr+1 =GRU(Er,(cid:80) ar mr ). tions and evaluate generation quality, diversity, scalability, i i k∈N(i) ik ik andefficiency. In this experiment, we use MLPs for both f and g. The GNN model has 7 layers and a propagation number of 1. 4.1.DatasetandImplementationDetails After the message passing is completed, we take the fi- nal node states E t and E s after message passing, and use Datasets. We evaluate the performance of our models on model MLP(E t −E s) to decode them into a mixture of twodatasetsacrossthreecitiesintheUnitedStates. Oneis Bernoulli distribution for topology outputs L t. And an- thepublicArgoversedataset[7],whichcoversMiami(total other model MLP(E t) is applied to generate a 2D Gaus- sizeof204kilometersoflanes)andPittsburg(totalsizeof sianmixturemodel(GMM)forcoordinateoutputsC t. The 86 kilometers). We randomly sample 12000 maps with a entire model is optimized with the minimization of neg- field-of-view(FoV)of200m×200masthetrainingdataset ative log-likelihood (NLL) for coordinate outputs and bi- forMiamiand5000mapswiththesameFoVforPittsburg. nary cross-entropy (BCE) for topology outputs. And for We also evaluate an in-house dataset, which covers maps the GMM, we further add a temperature term τ to control fromthecityofSanFrancisco.Wetrainon6000mapswith the diversity (variance) during the sampling. The original aFoVof120m×120m. standard deviation σ and σ in GMM are modified into x y Map Preprocessing. The key components of HD maps, σ τ andσ τ. Wefollowthecommonteacher-forcingstrat- x y thecentrallineofdrivablelanes, areusuallyover-sampled egyforautoregressivemodeltrainingbyprovidingground toguaranteeasufficientresolution. However,graphneural truthofL ,C ,··· ,L ,C asinputswhenpredicting 1 1 t−1 t−1 networkislimitedinscalability.Soinapre-processingstep, P(L ,C ),whichavoidsperformingthereparameterization t t weremoveredundantcontrolpointswithasmallvariation trickforthesamplingprocessduringthebackpropagation. of curvature in each lane. This step enables us to remove LocalGraphGeneration. Asthelocalgraphtorepresent 70% of points while not compromise the map quality. We thecurvaturedetailsoflanesalwayshasauniquetopology then define a hierarchical spatial graph based on the pre- as a sequence of control points, we use a padding vector processed vector map. We use Depth-First-Search (DFS) withafixedmaximumlengthW torepresentthesequence. toconstructtheglobalgraph’sadjacencymatrixanddefine We model it as a sequence of coordinate output {(C ) } thegenerationorderforautoregressivemodeling. Thestart j t,s and a corresponding valid mask {(M ) }, where j ∈ nodeisrandomlyselected. Thenwedefinethesequenceof j t,s [1,W]. Thentheconditionalprobabilityofthelocalgraph control points between global nodes as a local graph. The during generation is defined as P({(C ) },{(M ) } | sequentialorderisthesameasthelanedirection. j t,s j t,s L ,C ,C ,E ,E ). Thevalidmaskenablesavariationin Implementation Details. HDMapGen uses graph atten- t,s t s t s the number of nodes in each graph. For the straight lines tion neural network as the core part to perform the at- which have been filtered with redundant control points af- tentive message passing for graph inputs. We use multi- ter map preprocessing, the valid mask {(M ) } is all 0s, layer-perceptrons(MLPs)forallotherencodinganddecod- j t,s while for the lanes at the corner which usually have mul- ing steps. The graph node coordinates are normalized to tiple control points remained to guarantee the smoothness, [−1,1]. ThemodelistrainedonasingleTeslaV100GPU {(M ) }islikelytohavemorevaluesof1.Andweusean with the Adam optimizer, where we use an initial learning j t,s MLPmodeltogeneratethelocalgraph.Themodelsareop- rateof0.0001withamomentumof0.9.4.2.Baselines Wecomparethequalityofourhierarchicalgraphgenera- tivemodelHDMapGenwithasequencegenerativemodelin SketchRNN[16]andaplaingraphgenerativemodelPlain- GenderivedfromtheglobalgenerationstepofHDMapGen. SketchRNN. Wemapallcontrolpointsofthelaneobjects inside each HD map into a sequence for a sequence gen- Figure 4: The diversity of the global graph from HDMapGen is erativemodel. Eachsequentialdatapointhasapairof2D improvedastemperatureτ increases,buttherealismsuffers. coordinatesandastatevariabletodefineeachline’sconti- nuity. Wechooseanascendingorderofspatialcoordinates to define the sequential order. We explore conditional and unconditional variants for this model. For the conditional generation,atargetmapisprovidedasbothinputandtarget asavariationalauto-encoderduringtraining. Meanwhile,a targetmapisstillprovidedasinputduringsampling.Foran unconditional generation, a decoder-only model is trained togeneratethetargetmaps. Figure 5: Semantic attributes of traffic lights and local graph of PlainGen. Theplaingraphgenerativemodelhasthesame lanescontrolledbytrafficlightgeneratedfromHDMapGen. implementation as the global graph generative model in HDMapGen, and it takes the entire plain graph as inputs. topology-first)generatethehighestqualitymapsandvastly Everycontrolpointinthemapistakenasaglobalnodein outperform other methods. They capture the typical fea- theplaingraph. Acorrespondingedgeisconstructedtode- tures of HD maps, including patterns like the overall lay- fine whether these two nodes are connected. We use DFS outs, the crossroads, parallel lanes, etc. Moreover, they to construct the adjacency matrix and define the order of are capable of generating maps with different city styles. generation. Andweexploretwovariantsofcoordinate-first For the maps generated from HDMapGen with the inde- andtopology-firstforthismodel. pendent model, the problematic crossing of lanes happens frequently,asthemodelfailstotakethedependenceofco- 4.3.QualitativeAnalysis ordinatesandtopologyintoaccount. ForPlainGen,wefind GlobalGraphGeneration. Wefirstshowtheglobalgraph the model to completely fail for this task, as it is too chal- from HDMapGen in Figure 4. We could see with only a lenging to generate a graph with a much large number of small set of global nodes could represent a typical pattern nodesandedgesinthissetting, comparedtothehierarchi- in HD maps to include two parallel crossroads. We also cal graph. For SketchRNN, we find the model can learn demonstratethatthediversityofmodeloutputscouldbeim- the overall geometric patterns. However, there are a large provedbyincreasingthetemperatureτ. Whilethediversity number of problematic cut-offs or dead-ends occurring in and realism trade-off still exist during generation, we find thegeneratedlanes. Thisisbecause, duringthesequential an empirical bound of 0.2 for τ to guarantee high-quality datageneration,itispossibletostopgeneratingaconsecu- samplesinourexperiments. tivepointatanystepandthenre-startatanyarbitraryloca- Local Graph Generation. The local graph represents a tion. Thispropertyisnotanissueforasketchdrawingtask. sequence of control points that reveal the curvature details However,forHDmapgeneration,themodelneedstoguar- ofeachlaneobject. Forlanes,thedensityofcontrolpoints anteethecontinuityconstraintbetweengeneratedpoints. usuallyincreasesatthecornerofeachlane. InFigure5,we 4.4.QuantitativeAnalysis show our model is capable of generating such features to havesmoothcurvesatthecornerofeachlane. 4.4.1 Metrics Semantic Attribute Generation. As shown in Figure 5, thetrafficlightsandthetrafficlightcontrolledlanesgener- Wedesignfourmetricstoquantifythegenerationresults: atedfromHDMapGenaremostlylocatedonthecrossroads Topology fidelity. We use maximum mean discrepancy andturnroads. Thegeneratedsemanticattributesarecon- (MMD) [17] to quantify the similarity with real maps on sistentwiththereal-worldurbanscenes. graphstatisticsusingGaussiankernelswiththefirstWasser- Comparison with Baselines. In Figure 6, we show the steindistance. Thetopologystatisticsweapplyhereisthe qualitativecomparisonbetweenourmodelsandotherbase- degreedistributionsandspectrumofgraphLaplacian. lines. Theresultsshowthattheproposedhierarchicalgraph Geometryfidelity. Weusegeometryfeatures, thelength, generative model HDMapGen (both coordinate-first and andorientationoflanestoquantifythesimilaritywithrealFigure6: HDMapswithdifferentcitystylesfromhierarchicalgraphgenerativemodelsHDMapGen(coordinate-first,topology-firstand independent)outperformplaingraphgenerativemodelPlainGen,andsequencegenerativemodelsketchRNN. maps. Then we compute the Fre´chet Distance to measure mance in the spectrum of graph Laplacian. As we evalu- twonormaldistributionsusingthesefeatures. atetopologyfeaturesbytransformingoutputsofallmodels Urban planning. We use the common urban planning intotheplaingraphlevel,whichmightpreservetheintrinsic features, connectivity to represent node degree, node den- topologypatternsforoutputsfromPlainGen. sitywithinaregion,reachasthenumberofaccessiblelane Diversity. WeuseametricquantifiedbyChamferdistance within a region to evaluate the transportation plausibility, toquantifythemap-wisediversityoftheglobalgraphgen- and convenience as the Dijkstra shortest path length for erated from HDMapGen. As shown in Table 2, we evalu- nodepairsofourgeneratedmapscomparedwithrealmaps atethediversityontwolevels,oneisthenoveltycompared [10]. WealsouseFre´chetDistancetomeasuretwonormal to real map ground truth, the other is the internal diversity distributionsofthesefeatures. amongoutputsamples. Itshowsthatresultsgeneratedwith As shown in Table 1, HDMapGen (both coordinate- variedtemperaturesarenovelcomparedwithrealmaps.For firstandtopology-first)outperformthebaselinesonmostof internaldiversity, wecouldseetheChamferdistancedras- the metrics. The results are consistent with the qualitative ticallyincreasesasthetemperaturebecomeslarger. analysis: SketchRNNhasapoorperformanceintopology- 4.5.AblationStudies relatedfeatures,suchasthedegree. Sinceproblematiccut- offsordead-endsoflanesfrequentlyhappeninSketchRNN, Wefurtherconductablationstudiesontheimpactofus- sothegeneratedmapshaveaverydifferentdistributionof ing different dependence and generation priority on three nodedegreesfromrealmaps. PlainGenhasabetterperfor- variants of HDMapGen models coordinate-first, topology-Measurement UrbanPlanning GeometryFidelity TopologyFidelity Metrics Fre´chetDistance MMD Features Conne. Densi. Reach. Conve. Len. Orien. Deg. Spec. Methods 100 101 101 101 10−1 101 10−1 10−1 conditional 0.50 21.20 13.43 49.4 2.04 1.77 1.03 4.94 SketchRNN unconditional 0.44 41.12 29.83 49.5 1.64 1.22 0.83 4.74 topology-first 0.54 5.18 14.57 22.6 1.85 1.54 0.17 0.31 PlainGen coordinate-first 0.39 4.30 4.81 12.0 1.64 0.46 0.12 0.29 independent 0.31 4.38 4.22 11.2 1.49 0.52 0.06 0.54 HDMapGen topology-first 0.26 4.06 4.47 9.9 1.49 0.37 0.05 0.63 coordinate-first 0.17 4.79 4.74 11.8 1.49 0.53 0.07 0.90 Table1:Measurementsofurbanplanning,geometryfidelityandtopologyfidelityonHDMapGenandbaselinesPlainGenandSketchRNN. Urbanplaning(featuresofconnectivity,density,reachandconvenience)andGeometry(featuresofthelengthandorientationoflanes)are quantifiedwithaFre´chetDistancemetric. Topologyfidelity(featuresofnodedegreeandspectrumofgraphLaplacian)isquantifiedwith aMMDmetric.Forallmetrics,lowerisbetter.ResultsareevaluatedonArgoversedataset. Temperature 0.1 0.2 0.3 0.4 0.5 Diversity GT 11.6 11.9 11.1 11.6 13.0 Diversity Output 2.75 4.83 5.83 6.62 8.34 Table2:DiversityquantifiedbyChamferdistance(scalingby104) forglobalgraphgeneratedfromHDMapGenusingdifferenttem- peraturesτ.Noveltyiscomparedtogroundtruthoramongoutputs internally.Resultsareevaluatedonourin-housedataset. firstandindependence. Figure7:Theglobalnodegenerationorder(frombluetored)and WeshowtheablationresultsinTable3. Forcoordinate- thescalabilityofourhierarchicalgraphgenerativemodelHDMap- first model, BCE, which quantifies the topology predic- GentogenerateHDmapswithaFoVof400m×400m. tion performance, is the best optimized. As after coordi- natesaregeneratedasaprior,thenthetopologyprediction smaller number of nodes to represent a global graph, and could be better optimized. Meanwhile, for topology-first (2)utilizingGRANwithstate-of-the-arttime-efficiencyfor model,NLL,whichquantifiesthegeometrypredictionper- global graph generation and a single-shot decoder for the formance,isthebestoptimizedwiththegeneratedtopology localgraphgeneration. asprior. Incontrast,independencemodelistheworstasit fails to model the dependence of coordinate and topology Model HDMapGen PlainGen SketchRNN duringgeneration. Time[s] 0.20 0.89 2.28 Metrics coordinate-first topology-first independent Table 4: The generation time of an HD map with a FoV of NLL -5.610 -6.142 -4.490 200m×200monArgoversedatasetusingHDMapGen,PlainGen, BCE 0.001 0.050 0.053 andSketchRNN.HDMapGenisclearlythefastest. Table3:NegativeLoglikelihood(NLL)andbinarycross-entropy 5.Conclusion (BCE) on the generated global graph from three variants of HDMapGen including coordinate-first, topology-first and inde- Inthiswork, weintroduceanovelandchallengingtask pendent.ResultsareevaluatedonArgoversedataset. to generate HD maps in a data-driven way. We performed 4.6.ScalabilityAnalysis asystematicexplorationforautoregressivegenerativemod- elswithdifferentdatarepresentations.Ourproposedhierar- We further experiment HDMapGen with a FoV of chical graph generative model HDMapGen largely outper- 400m×400m. AsshowninFigure7,HDMapGenconsis- formsotherbaselines. Wefurtherdemonstratedtheadvan- tentlyachievespromisingresultsforlargegraphgeneration, tagesofHDMapGeningenerationquality,diversity,scala- demonstratingitsscalability. bility,andefficiencyonreal-worlddatasets. 4.7.LatencyAnalysis 6.Acknowledgement WeshowalatencycomparisoninTable4. HDMapGen achieves a speed-up of ten-fold compared to SketchRNN. WewouldthankBenjaminSappandTianxingHeforin- HDMapGen is much more efficient due to (1) using a sightfulcommentsandsuggestions.References In International Conference on Machine Learning, pages 1242–1250.PMLR,2014. [1] Esri: Cityengine. https://www.esri.com/en-us/ [16] David Ha and Douglas Eck. A neural representation of arcgis/products/esri-cityengine. sketchdrawings. arXivpreprintarXiv:1704.03477,2017. [2] Daniel G Aliaga, Carlos A Vanegas, and Bedrich Benes. [17] Ehsan Hajiramezanali, Arman Hasanzadeh, Krishna Interactive example-based urban layout synthesis. In SIG- Narayanan, Nick Duffield, Mingyuan Zhou, and Xiaoning GRAPHAsia,2008. Qian. Variational graph recurrent neural networks. In [3] FavyenBastani,SongtaoHe,SofianeAbbar,MohammadAl- Advances in neural information processing systems, pages izadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, 10701–10711,2019. andDavidDeWitt.Roadtracer:Automaticextractionofroad [18] Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu networksfromaerialimages. InCVPR,2018. Wu, Jack Fan, and Raquel Urtasun. Dagmapper: Learn- [4] SvenBauer,YasaminAlkhorshid,andGerdWanielik.Using ingtomapbydiscoveringlanetopology. InProceedingsof high-definitionmaps forprecise urbanvehicle localization. theIEEE/CVFInternationalConferenceonComputerVision In 2016 IEEE 19th International Conference on Intelligent (ICCV),October2019. TransportationSystems(ITSC),pages492–497.IEEE,2016. [19] ThomasNKipfandMaxWelling. Variationalgraphauto- [5] JanBenes,AlexanderWilkie,andJaroslavKriva´nek.Proce- encoders. arXivpreprintarXiv:1611.07308,2016. duralmodellingofurbanroadnetworks.ComputerGraphics [20] StevenMLaValle. Rapidly-exploringrandomtrees: Anew Forum,2014. toolforpathplanning. 1998. [6] YuningChai,BenjaminSapp,MayankBansal,andDragomir [21] Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Anguelov. Multipath: Multipleprobabilisticanchortrajec- Zhang, and Leonidas Guibas. Grass: Generative recursive toryhypothesesforbehaviorprediction. InCoRL,2019. autoencoders for shape structures. ACM Transactions on [7] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jag- Graphics(TOG),36(4):1–14,2017. jeetSingh,SlawomirBak,AndrewHartnett,DeWang,Pe- [22] Zuoyue Li, Jan Dirk Wegner, and Aure´lien Lucchi. ter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proceed- Polymapper: Extracting city maps using polygons. ingsoftheIEEEConferenceonComputerVisionandPattern arXiv:1812.01497,2018. Recognition,pages8748–8757,2019. [23] Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, [8] AnningChen,ArvindRamanandan,andJayAFarrell.High- Song Feng, and Raquel Urtasun. Learning lane graph precision lane-level road map building for vehicle naviga- representations for motion forecasting. arXiv preprint tion. InIEEE/IONposition,locationandnavigationsympo- arXiv:2007.13732,2020. sium,pages1035–1042.IEEE,2010. [24] Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will [9] GuoningChen,GregoryEsch,PeterWonka,PascalMu¨ller, Hamilton,DavidKDuvenaud,RaquelUrtasun,andRichard andEugeneZhang. Interactiveproceduralstreetmodeling. Zemel. Efficient graph generation with graph recurrent at- TOG,2008. tention networks. In Advances in Neural Information Pro- cessingSystems,pages4255–4265,2019. [10] Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina,XinkaiWei,Ming-YuLiu,AntonioTorralba,and [25] Gelle´rt Ma´ttyus, Wenjie Luo, and Raquel Urtasun. Deep- SanjaFidler. Neuralturtlegraphicsformodelingcityroad roadmapper: Extracting road topology from aerial images. layouts. In Proceedings of the IEEE International Confer- InICCV,2017. enceonComputerVision,pages4522–4530,2019. [26] Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, [11] NicolaDeCaoandThomasKipf. Molgan:Animplicitgen- NiloyMitra,andLeonidasJGuibas.Structurenet:Hierarchi- erative model for small molecular graphs. arXiv preprint calgraphnetworksfor3dshapegeneration. arXivpreprint arXiv:1805.11973,2018. arXiv:1908.00575,2019. [12] Arnaud Emilien, Ulysse Vimont, Marie-Paule Cani, Pierre [27] Charlie Nash, Yaroslav Ganin, SM Eslami, and Peter W Poulin, and Bedrich Benes. Worldbrush - interactive Battaglia. Polygen: Anautoregressivegenerativemodelof example-basedsynthesisofproceduralvirtualworlds. TOG, 3dmeshes. arXivpreprintarXiv:2002.10880,2020. 2015. [28] G Nishida, I Garcia-Dorado, and D G Aliaga. Example- [13] EricGalin,AdrienPeytavie,E´ricGue´rin,andBedrichBenes. drivenproceduralurbanroads. ComputerGraphicsForum, Authoringhierarchicalroadnetworks. ComputerGraphics 2015. Forum,2011. [29] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen [14] Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Anguelov, Congcong Li, and Cordelia Schmid. Vectornet: AndrewSenior,andKorayKavukcuoglu.Wavenet:Agener- Encodinghdmapsandagentdynamicsfromvectorizedrep- ativemodelforrawaudio.arXivpreprintarXiv:1609.03499, resentation. In Proceedings of the IEEE/CVF Conference 2016. onComputerVisionandPatternRecognition,pages11525– [30] YoavIHParishandPascalMu¨ller. Proceduralmodelingof 11533,2020. cities.InProceedingsofthe28thannualconferenceonCom- [15] Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blun- puter graphics and interactive techniques, pages 301–308, dell, and Daan Wierstra. Deep autoregressive networks. 2001.[31] Bidisha Samanta, Abir De, Gourhari Jana, Vicenc¸ Go´mez, Pratim Chattaraj, Niloy Ganguly, and Manuel Gomez- Rodriguez. Nevae: Adeepgenerativemodelformolecular graphs. JournalofMachineLearningResearch,21(114):1– 33,2020. [32] IlyaSutskever,OriolVinyals,andQuocVLe. Sequenceto sequencelearningwithneuralnetworks.InAdvancesinneu- ralinformationprocessingsystems,pages3104–3112,2014. [33] JerryOTalton,YuLou,SteveLesser,JaredDuke,Radom´ır Mech,andVladlenKoltun.Metropolisproceduralmodeling. TOG,2011. [34] Sebastian Thrun. Simultaneous localization and mapping. In Robotics and cognitive approaches to spatial mapping, pages13–41.Springer,2007. [35] Petar Velicˇkovic´, Guillem Cucurull, Arantxa Casanova, AdrianaRomero,PietroLio,andYoshuaBengio. Graphat- tentionnetworks. arXivpreprintarXiv:1710.10903,2017. [36] HongweiWang,JiaWang,JialinWang,MiaoZhao,Weinan Zhang,FuzhengZhang,XingXie,andMinyiGuo. Graph- gan:Graphrepresentationlearningwithgenerativeadversar- ialnets. arXivpreprintarXiv:1711.08267,2017. [37] Benjamin Watson, Pascal Mu¨ller, Oleg Veryovka, Andy Fuller, Peter Wonka, and Chris Sexton. Procedural urban modelinginpractice. IEEEComputerGraphicsandAppli- cations,28(3):18–26,2008. [38] Yong-Liang Yang, Jun Wang, Etienne Vouga, and Peter Wonka. Urban pattern: Layout design by hierarchical do- main splitting. ACM Transactions on Graphics (TOG), 32(6):1–12,2013. [39] JiaxuanYou,RexYing,XiangRen,WilliamHamilton,and JureLeskovec. Graphrnn: Generatingrealisticgraphswith deepauto-regressivemodels. InICML,2018. [40] Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp,BalakrishnanVaradarajan,YueShen,YiShen,Yuning Chai, CordeliaSchmid, etal. Tnt: Target-driventrajectory prediction. arXivpreprintarXiv:2008.08294,2020.Inthesupplementarymaterials,wedescribethestatistics C.1.GlobalGraphGeneration: TwoVariants of HD map datasets in Section A; more qualitative results In Figure 11, we show the neural network structures of ofgeneratedmapsinSectionB;anddetailsofHDMapGen the other two variants of HDMapGen using topology-first modelsinSectionC. orindependentforglobalgraphgeneration. Topology-first:Atthestept,themodelfirstlygeneratedthe A.DatasetStatistics topologyL togeneratetheconnectionsofnodetwithpre- t viouslygeneratednodes. Andthentakingthenewtopology Inthissection,weintroducethestatisticsoftheHDmap informationL asadditionalinputstotheGRANmodelto t datasets in Table 5. In the plain graph setting, we have a generatethespatialcoordinatesC ofnodet. t maximum of 250 nodes (control points) for the Argoverse Independent: Atthestept,themodelistrainedtogenerate dataset and 498 nodes for our In-house dataset. After rep- C and L simultaneously, while not consider any depen- t t resenting the map data as hierarchical graphs, we convert dencebetweenthesetwovariables. 30% of the original control points into global graph nodes and 70% of which into local graph nodes. For the local C.2.Baselines graph generation, as we allow a variable number of local In this work, we implement two baselines nodesineachlane,wesetamaximumlengthofW witha SketchRNN [16] and PlainGen. For SketchRNN which nodevaliditymask. W issetto8fortheArgoversedataset uses a sequence generative model, we use “layer norm” and20fortheIn-housedataset. model as our encoder and decoder model. We also apply different temperatures to fine-tune a better output. For B.QualitativeResults a plain graph generative model, we use PlainGen, a model derived from our global graph generation step of B.1.GlobalGraphDiversity HDMapGen. NoticethatNTG [10]whichisdesignedfora morecoarseroadlayoutgeneration,alsousesaplaingraph InFigure8,wedemonstratemoregeneratedglobalgraph generative model. However, since the model has not been resultsasthetemperaturechangesduringthediversitycon- open-sourcedyet,wearenotabletoperformacomparison trol.WeshowthatmorenovelHDmappatterns(threeinter- withNTGonthishigh-definitionmapgenerationtask. sectionsormoreparallellanes)aregeneratedwhenalarge temperatureτ isapplied,yetthequality-diversitytrade-off stillexists. B.2.HDMapGenGenerationQuality We show more results from HDMapGen, generated maps with a FoV of 200m×200m in Figure 9, and gen- erated maps with a FoV of 400m × 400m in Figure 10. The average number of global nodes in the smaller maps is43, andtheaveragenumberofglobaledgesis50, while forthelargermaps, theaveragenumberofglobalnodesis 136,andtheaveragenumberofglobaledgesis175,which meansmorethanfourtimesofnodes. Whileourproposed HDMapGenstillachievespromisingresultsforlargegraph generation,whichdemonstratesitslargescalability. Notice that an issue with our current results is the un- smoothness of the local graph in some generated sam- ples. However,allofourdemonstratedresultsarenotpost- processedorappliedwithanyheuristicthresholdingorfil- tering during generation. One can expect a better version withadditionalstepsasdescribedabove. C.ModelDetails Inthissection,weintroducemoredetailsofbaselineand twovariantsofHDMapGenmodels.GraphType PlainGraph GlobalGraph LocalGraph Component #Nodes #Edges #Nodes #Edges #Nodes Dataset City Max Mean Max Mean Noedge/Edge Max Mean Max Mean Noedge/Edge Max Argoverse MIA 250 147 267 151 164 112 43 138 50 40 8 Argoverse PIT 250 178 265 185 187 111 51 134 64 44 8 In-house SF 498 164 537 166 188 100 47 135 52 48 20 Table5: GraphstatisticforArgoverseDataset(MiamiandPittsburg)andin-houseDataset(SanFrancisco): numberofnodes,numberof edges,theratioofnoedgesv.s.edges(sparsitylevel)inaplaingraphorahierarchicalgraph(includingbothglobalgraphandlocalgraph). Figure8:DiversitycontrolforglobalgraphgeneratedfromHDMapGen.Outputsgeneratedwithdifferenttemperaturesτ.Figure9:HierarchicalgraphwithaFoVof200m×200mgeneratedfromHDMapGen. Figure10:HierarchicalgraphwithaFoVof400m×400mgeneratedfromHDMapGen.Figure11:TwovariantsofHDMapGen:topology-firstandindependent.Themodelsconsiderdifferentdependenceandprioritytogenerate coordinatesandtopologyateachstep.