SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving

Zhenpei Yang1*, Yuning Chai2, Dragomir Anguelov2, Yin Zhou2, Pei Sun2, Dumitru Erhan3, Sean Rafferty2, Henrik Kretzschmar2
1UT Austin, 2Waymo, 3Google Brain

Abstract

Autonomous driving system development is critically dependent on the ability to replay complex and diverse traffic scenarios in simulation. In such scenarios, the ability to accurately simulate the vehicle sensors such as cameras, lidar, or radar is essential. However, current sensor simulators leverage gaming engines such as Unreal or Unity, requiring manual creation of environments, objects, and material properties. Such approaches have limited scalability and fail to produce realistic approximations of camera, lidar, and radar data without significant additional work.

In this paper, we present a simple yet effective approach to generate realistic scenario sensor data, based only on a limited amount of lidar and camera data collected by an autonomous vehicle. Our approach uses texture-mapped surfels to efficiently reconstruct the scene from an initial vehicle pass or set of passes, preserving rich information about object 3D geometry and appearance, as well as the scene conditions. We then leverage a SurfelGAN network to reconstruct realistic camera images for novel positions and orientations of the self-driving vehicle and moving objects in the scene. We demonstrate our approach on the Waymo Open Dataset and show that it can synthesize realistic camera data for simulated scenarios. We also create a novel dataset that contains cases in which two self-driving vehicles observe the same scene at the same time. We use this dataset to provide additional evaluation and demonstrate the usefulness of our SurfelGAN model.

1. Introduction

Recent advances in deep learning have inspired breakthroughs in multiple areas related to autonomous driving, such as perception [15, 26], prediction [4, 6], and planning [12]. These recent trends only underscore the increasingly significant role of data-driven system development. One aspect is that deep learning networks benefit from large training datasets. Another is that autonomous driving system evaluation requires the ability to realistically replay a large set of diverse and complex scenarios in simulation, capturing sensor properties, seasons, time of day, and weather. Developing simulators that support the levels of realism required for autonomous system evaluation is a challenging task. There are many ways to design simulators, including simulating mid-level object representations [4, 11]. However, mid-level representations omit subtle perceptual cues that are important for scene understanding, such as pedestrian gestures and blinking lights on vehicles. Furthermore, as end-to-end models that combine perception, prediction, and sometimes even control become an increasingly popular direction of research, we are faced with the need to faithfully simulate the sensor data, which is the input to such models during scenario replay.

Frameworks for autonomous driving that support realistic sensor simulation are traditionally built on top of gaming engines such as Unreal or Unity [11]. The environment and its object models are created and arranged manually, to approximate real-world scenes of interest. In order to enable realistic LiDAR and radar modeling, material properties often need to be manually specified as well. The overall process is time-consuming and effort-intensive. Furthermore, simple ray-casting or ray-tracing techniques are often insufficient to generate realistic camera, LiDAR, or radar data for a specific self-driving system, and additional work is required to adapt the simulated sensor statistics to the real sensors.

In this work, we propose a simple yet effective data-driven approach for creating realistic scenario sensor data. Our approach relies on camera and LiDAR data collected during a single pass, or several passes, of an autonomous vehicle through a scene of interest. We use this data to reconstruct the scene using a texture-mapped surfel representation. This representation is simple and computationally efficient to create, and it preserves rich information about the 3D geometry, semantics, and appearance of all objects in the scene. Given the surfel reconstruction, we can render the scene for novel poses of the self-driving vehicle (SDV) and the other scenario agents.
The rendered reconstruction for these novel views may have some missing parts due to occlusion differences between the initial and the new scene configuration. It can also have visual quality artifacts due to the limited fidelity of the surfel reconstruction. We address this gap by applying a GAN network [14] to the rendered surfel views to produce the final high-quality image reconstructions. An overview of our proposed system is illustrated in Fig. 1.

* Work done as an intern at Waymo. Correspondence: yzp@utexas.edu

arXiv:2005.03844v2 [cs.CV] 25 Jun 2020

Figure 1. Overview of our proposed system. a) The goal of this work is the generation of camera images for autonomous driving simulation. When provided with a novel trajectory of the self-driving vehicle in simulation, the system generates realistic visual sensor data that is useful for downstream modules such as an object detector, a behavior predictor, or a motion planner. At a high level, the method consists of two steps: b) First, we scan the target environment and reconstruct a scene consisting of rich textured surfels. c) Surfels are rendered at the camera pose of the novel trajectory, alongside semantic and instance segmentation masks. Through a GAN [14], we generate realistic-looking camera images.

Our work makes the following contributions: 1) We describe a pipeline that builds a detailed reconstruction of a dynamic scene from real-world sensor data. This representation allows us to render novel views in the scene, corresponding to deviations of the SDV and the other agents in the environment from their initially captured trajectories (Sec. 3.1). 2) We propose a GAN architecture that takes in the rendered surfel views and synthesizes images with quality and statistics approaching that of real images (Tab. 1). 3) We build the first dataset for reliably evaluating the task of novel view synthesis for autonomous driving, which contains cases in which two self-driving vehicles observe the same scene at the same time. We use this dataset to provide additional evaluation and demonstrate the usefulness of our SurfelGAN model.

2. Related Work

Simulated Environments for Driving Agents. There have been many efforts towards building simulated environments for various tasks [5, 11, 40, 41, 42]. Much work has focused on indoor environments [5, 40, 42] based on public indoor datasets such as SUNCG [35] or Matterport3D [7]. In contrast to indoor settings, where the environment is relatively simple and easy to model, simulators for autonomous driving face significant challenges in modeling the complicated and dynamic scenarios of real-world scenes. TORCS [41] is one of the first simulation environments that supports multi-agent racing, but it is not tailored for real-world autonomous driving research and development. DeepGTAV [1] provides a plugin that transforms the Grand Theft Auto gaming environment into a vision-based self-driving car research environment. CARLA [11] is a popular open-source simulation engine that supports the training and testing of SDVs. All these simulators rely on the manual creation of synthetic environments, which is a formidable and laborious process. In CARLA [11], the 3D model of the environment, which includes buildings, roads, vegetation, vehicles, and pedestrians, is manually created. The simulator provides one town with 2.9 km of drivable roads for training and another town with 1.4 km of drivable roads for testing. In contrast, our system is easily extendable to new scenes that are driven by an SDV. Furthermore, because the environment we build is a high-quality reconstruction based on the vehicle sensors, it naturally closes the domain gap between synthetic and real content that is present in most traditional simulation environments. Similar to this work, AADS [23] utilizes real sensor data to synthesize novel views. The major difference is that we reconstruct the 3D environment, while AADS uses purely image-based novel view synthesis. Reconstructing the 3D environment gives us the freedom to synthesize novel views that could not be easily captured in the real world. Moreover, once our environment is built, we no longer need to store the images or query the nearest K views at synthesis time, which saves time during deployment.

Learning on Synthetic Data. Besides enabling end-to-end training and evaluation of agents, a simulated environment can also provide a large amount of data for training deep neural networks. [32] uses a synthetic scene to generate a large amount of fully labeled training data for urban scene segmentation. [19] generates images containing novel placements of dynamic objects to boost the performance of object detection.

Geometric Reconstruction and 3D Representations. A typical approach for 3D reconstruction of outdoor environments is to use structure from motion [37, 39] or multi-view stereo [13] to recover a dense 3D point cloud from image collections, and then optionally use Poisson reconstruction [21] to obtain a mesh representation. Such a paradigm is most suitable when we have multiple images covering the same area from different perspectives, which is not always true in our case. Thanks to the rapid advancement of LiDAR technology, we have accurate depth information to complement the camera image data. Our approach leverages the traditional surfel representation [31], augmented with fine-grained image textures, which not only greatly simplifies the 3D reconstruction process but also effectively models object appearance and color with high fidelity. Truncated Signed Distance Functions [10] and their most recent variants [29] are also promising alternatives to surfel-based modeling. Recent work by Aliev et al. [2], which augments 3D point clouds with a learnable neural descriptor for rendering purposes, has also shown promising results; however, it assumes a static environment and is not applicable in practice to outdoor driving scenarios, which usually contain tens of millions of points.

GAN-based Image Translation. Generative Adversarial Networks (GANs) [14] have attracted broad interest in both academia and industry. While [14] aims to synthesize realistic images directly, [18] targets the conditional image synthesis setting. Subsequent research [3, 20, 30, 44] has made great strides in improving the quality of images generated by GAN methods; we refer the readers to [9] for an overview. [38] proposes a model trained on Cityscapes [8] that can convert videos of semantic segmentation masks into videos of realistic images. Their requirement of accurate per-pixel semantic annotations of the scenes of interest may be difficult to satisfy. In contrast, our approach requires only accurate 3D bounding boxes for the moving objects in the scene, which can be more cost-effective to obtain by human annotation. We believe even this requirement can be further relaxed by replacing the ground-truth 3D boxes with 3D boxes produced by running a high-quality offline 3D perception pipeline on the SDV sensor data. Finally, in our work we also address the traditional challenge of GAN evaluation by proposing two new metrics that are suitable for the task of novel view synthesis.

3. Approach

In this section, we describe the key innovations of this work: 1) texture-enhanced surfel scene reconstruction and 2) SurfelGAN-based image synthesis applied to the novel rendered scene views. Their combination enables the creation of a realistic data-driven sensor simulation environment.

3.1. Surfel Scene Reconstruction

Enhanced Surfel Map. A good scene reconstruction model enables the faithful preservation of the sensor information while remaining efficient in terms of computation and storage. Towards this goal, we propose a novel texture-enhanced surfel map representation. Surfels are compact, easy to reconstruct, and, because of their fixed size, easy to texture and compress. Below we describe our approach, which preserves more fine-grained detail than traditional surfel map representations [31].

We discretize the scene into a 3D voxel grid of fixed size and process the LiDAR scans in the order they are captured. For each voxel, we construct a surfel disk by estimating the mean coordinate and the surfel normal, based on all the LiDAR points in that voxel. The surfel disk radius is defined as √3 v, where v denotes the voxel size. For the LiDAR points binned in a voxel, we also have the corresponding colors from the camera image, which we can use to estimate the surfel color. Note that traditional surfel maps suffer from a trade-off between geometry consistency and fine-grained detail: a large voxel size gives better geometry consistency but fewer details, while a small voxel size yields finer details but less stable geometry. Therefore, we take an alternative approach that aims to achieve both good geometry consistency and rich texture detail. Specifically, we discretize each surfel disk into a k × k grid centered on its point centroid, as illustrated in subfigure b) of Fig. 1. Each grid center is assigned an independent color to encode higher-resolution texture details.

Since each surfel may have a different appearance across different frames, due to variations in the lighting conditions and changes in relative pose (distance and view angle), we propose to enhance the surfel representation by creating a codebook of such k × k grids at n various distances. For each grid bin, we determine its color from the first observation, which we found to be important for obtaining a smooth rendered image. During the rendering stage, we determine which k × k patch to use based on the camera pose. The final rendering is shown in Fig. 2. We can see that the baseline surfel map introduces many artifacts at object boundaries and yields non-smooth coloring at non-boundary areas. In contrast, our texture-enhanced surfel map eliminates many of these artifacts and leads to vivid-looking images. In our experiments, we use v = 0.2 m, k = 5, and n = 10.

Figure 2. Visualization of different scene modeling strategies. Top row: surfel baseline; center row: our Texture-Enhanced Surfel Map (also referred to as surfel rendering in the rest of the paper); bottom row: real camera image.

Handling Dynamic Objects. We consider vehicles as rigid dynamic objects and reconstruct a separate model for each. For simplicity, we leverage the high-quality 3D bounding box annotations from the Waymo Open Dataset [36] to accumulate the LiDAR points from multiple scans for each object of interest. We apply the Iterative Closest Point (ICP) [33] algorithm to refine the point cloud registration, producing a dense point cloud that allows an accurate, enhanced surfel reconstruction for each vehicle. Please see Sec. A for reconstructed examples. Our approach does not strictly require 3D box ground truth; we can also leverage state-of-the-art vehicle detection and tracking algorithms [27, 34] to obtain initial estimates for ICP. However, we leave this experiment for future work.

When simulating the environment, the reconstructed vehicle models can be placed in any location of choice. In the case of pedestrians, which are deformable objects, we reconstruct a separate surfel model for each LiDAR scan, and we allow placement of the reconstructed pedestrian anywhere in the scene for that scan. We leave the task of accurate deformable model reconstruction from multiple scans to future work.

3.2. Image Synthesis via SurfelGAN

While the surfel scene reconstruction provides a rich representation of the environment, it produces surfel-based renderings that have a non-negligible realism gap when compared to real images, due to incomplete reconstruction and imperfect geometry and texturing (see Fig. 2). Our SurfelGAN model is explicitly designed to address this issue.

SurfelGAN is a generative model that converts surfel image renderings to realistic-looking images. We treat semantic and instance segmentation maps as additional rendered image channels. For the sake of simplicity, we omit their explicit mention in the rest of this section.

Let the generator G^{S→I}_{θ_S} be an encoder-decoder model with learnable parameters θ_S. Given pairs of surfel renderings S_p and images I_p, a supervised loss can be applied to train the generator. We call a SurfelGAN model that is trained solely with supervised learning SurfelGAN-S. Additionally, we can add an adversarial loss from a real image discriminator D^I_{φ_I}. A SurfelGAN trained with this additional loss is named SurfelGAN-SA.

However, paired training data between surfel renderings and real images is very limited, whereas unpaired data is easy to obtain. We leverage unpaired data for two purposes: improving the generalization of the discriminator by training with more unlabeled examples, and regularizing the generator by enforcing cycle consistency. Let the reverse generator G^{I→S}_{θ_I} be another encoder-decoder model that has the same architecture as G^{S→I}_{θ_S}, except with more output channels for the semantic and instance maps. Then any surfel rendering, paired S_p or unpaired S_u, can be translated to a real image and translated back to a surfel rendering, where a cycle consistency loss can be applied. The same applies to any paired I_p or unpaired I_u real image as well. Finally, we add a surfel rendering discriminator D^S_{φ_S} that judges generated surfel images. We call SurfelGANs trained with this additional cycle consistency SurfelGAN-SAC. An intuitive overview of the training strategy is shown in Fig. 3, while Sec. 4 contains a detailed description of our paired and unpaired data. We optimize the following objective:

  max_{φ_S, φ_I} min_{θ_S, θ_I}  L_r(G^{S→I}_{θ_S}, S_p, I_p) + λ_1 L_r(G^{I→S}_{θ_I}, I_p, S_p)
      + λ_2 L_a(G^{S→I}_{θ_S}, D^I_{φ_I}, S_{p,u}) + λ_3 L_a(G^{I→S}_{θ_I}, D^S_{φ_S}, I_{p,u})
      + λ_4 L_c(G^{S→I}_{θ_S}, G^{I→S}_{θ_I}, S_{p,u}) + λ_5 L_c(G^{I→S}_{θ_I}, G^{S→I}_{θ_S}, I_{p,u}),    (1)

where L_r, L_a, and L_c denote the supervised reconstruction, adversarial, and cycle consistency losses, respectively. We use the hinged Wasserstein loss for adversarial training [24, 28, 43] in our experiments, as it helps to stabilize training. We use an ℓ1 loss as the reconstruction and cycle-consistency loss for renderings and images, and a cross-entropy loss for the semantic and instance maps.
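The hinged Wasserstein adversarial loss referenced above takes a standard form in the GAN literature [24, 28, 43]. The following is a minimal sketch of one common formulation, written by us for illustration (pure Python over lists of discriminator scores, rather than the authors' actual tensor code):

```python
def hinge_d_loss(real_scores, fake_scores):
    """Discriminator hinge loss: penalize real scores below +1 and
    fake scores above -1 (a common form of the hinged Wasserstein loss)."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def hinge_g_loss(fake_scores):
    """Generator adversarial loss: push the discriminator's scores
    on generated samples upward."""
    return -sum(fake_scores) / len(fake_scores)
```

In SurfelGAN-SA/SAC this kind of loss would be applied once per discriminator (D^I on generated images, D^S on generated surfel renderings), weighted by λ_2 and λ_3 in Eq. (1).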
Distance Weighted Loss. Due to the limited coverage of the surfel map, our surfel rendering contains large unknown regions. The uncertainty in those regions is much higher than that of the regions with surfel information. In addition, the distance between the camera and the surfel introduces another source of uncertainty. Therefore, we use a distance-weighted loss to stabilize our GAN training. Specifically, during data pre-processing, we generate a distance map that records the nearest distance to the observed region, and we then use this distance information as a weighting coefficient to modulate our reconstruction loss.

Figure 3. (Best viewed in color) SurfelGAN training paradigm. The training setup has two symmetric encoder-decoder generators mapping from surfel renderings to real images, G^{S→I}, and vice versa, G^{I→S}. Additionally, there are two discriminators, D^S and D^I, which specialize in the surfel and the real domain, respectively. The losses are shown as colored arrows. Green: supervised reconstruction loss. Red: adversarial loss. Blue/Yellow: cycle-consistency losses. When training with paired data, e.g. WOD-TRAIN, the surfel renderings translate to real images, and we can apply a one-directional supervised reconstruction loss only (SurfelGAN-S) or add an additional adversarial loss (SurfelGAN-SA). When training with unpaired data, we can start either from the surfel renderings (e.g. WOD-TRAIN-NV) or from the real images (e.g. Internal Camera Dataset), use one of the encoder-decoder networks to get to the other domain and back, and then apply a cycle consistency loss (SurfelGAN-SAC). The encoder-decoder networks consist of 8 convolutional and 8 deconvolutional layers. The discriminators consist of 5 convolutional layers. All networks operate on 256×256 sized input.

Training Details. We use the Adam [22] optimizer for training. We set the initial learning rate to 2e-4 for both the generator and the discriminator, and we set β_1 = 0.5 and β_2 = 0.9. We use batch normalization [17] after the ReLU activation. We set λ_1 = 1, λ_2 = λ_3 = 0.001, and λ_4 = λ_5 = 0.1 in all of our experiments. The total training time of our network is 3 days on one Nvidia Titan V100 GPU with batch size 8.

4. Experimental Results

We base our experiments mainly on the Waymo Open Dataset [36], but we also collected two additional datasets in order to obtain a higher-quality model and to enable a more extensive evaluation.

Waymo Open Dataset (WOD) [36]. The dataset consists of 798 training (WOD-TRAIN) and 202 validation (WOD-EVAL) sequences. Each sequence contains 20 seconds of camera and LiDAR data captured at 10 Hz, as well as fully annotated 3D bounding boxes for vehicles, pedestrians, and cyclists. The LiDAR data covers a full 360 degrees around the agent, while five cameras capture the frontal 180 degrees. After reconstructing the surfel scenes, we can render the surfel images at the same poses as the original camera images, hence generating surfel-image-to-camera-image pairs that can be used for paired training and evaluation. Since during the reconstruction process we know the category of each surfel, we can easily derive both semantic and instance segmentation masks, by first rendering an index map that associates each pixel with a surfel index and then determining the semantic class or instance number through a look-up table.

We derive another dataset from WOD, which we call the Waymo Open Dataset-Novel View (WOD-TRAIN-NV and WOD-EVAL-NV). We again start from reconstructed surfel scenes, but we now render surfel images from novel camera poses perturbed from the existing camera poses. The perturbation consists of applying a random translation and a random yaw angle perturbation to the camera-mounted vehicle. We use the annotated 3D bounding boxes to ensure that the perturbed vehicle does not intersect with other objects in the scene. We generate one new surfel image rendering for each frame in the original dataset. Note that although this dataset comes for free, i.e. we could generate any number of testing frames, it does not have corresponding camera images. Therefore, it can only be used for unpaired training and for only some types of evaluation.

Internal Camera Image Dataset. We collected an additional 9.8k short sequences (100 frames each), similar to the WOD images. These unannotated images are used for unpaired training on real images.
Dual-Camera-Pose Dataset (DCP). Finally, we built a unique dataset tailored to measuring the realism of our model. The dataset contains scenarios in which two vehicles observe the same scene at the same time. Specifically, we find the intervals in which two vehicles are within 20 m of each other. We use the sensor data from the first vehicle to reconstruct the scene and render the surfel image at the exact pose of the second vehicle. After filtering out cases in which the scene reconstruction is too incomplete, we obtain around 1k pairs, for which we can directly measure the pixel-wise accuracy of the generated image.

Figure 4. Qualitative comparison between different SurfelGAN variants and the baseline on WOD-EVAL under different weather conditions.

                      |         WOD-TRAIN-NV          |           WOD-EVAL            |          WOD-EVAL-NV
                      | AP@50↑ AP@75↑  AP↑     Rec↑   | AP@50  AP@75   AP      Rec    | AP@50  AP@75   AP      Rec
Surfel (baseline)     | 0.444  0.168   0.211   0.342  | 0.521  0.168   0.239   0.371  | 0.462  0.154   0.213   0.348
SurfelGAN-S (ours)    | 0.508  0.177   0.236   0.359  | 0.576  0.164   0.252   0.341  | 0.514  0.159   0.230   0.358
SurfelGAN-SA (ours)   | 0.554  0.200   0.259   0.382  | 0.610  0.174   0.266   0.394  | 0.567  0.180   0.257   0.387
SurfelGAN-SAC (ours)  | 0.564  0.200   0.263   0.385  | 0.620  0.181   0.272   0.400  | 0.570  0.181   0.258   0.388
Real (upper bound)    |   -      -       -       -    | 0.619  0.198   0.281   0.424  |   -      -       -       -

Table 1. Realism w.r.t. an off-the-shelf vehicle object detector. We generated images using the proposed SurfelGAN and ran inference on them using an off-the-shelf object detector. We report the standard COCO object detection metrics [25], including variants of the average precision (AP) and recall at 100 (Rec). Surfel is the surfel rendering that is the input to SurfelGAN. SurfelGAN is the proposed model. The S variant is trained with paired supervised learning only. The SA variant adds the adversarial loss, and the SAC variant makes use of additional unpaired data and applies a cyclic adversarial loss. Real is the real image captured by cameras, which is only available in WOD-EVAL; it serves as an upper bound on the detector's quality. As shown above, SurfelGAN significantly improves over the baseline and reaches quality metric values similar to those of the real images.
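The detector-realism metrics in Table 1 rest on IoU-based matching between detections and ground-truth boxes. As a minimal sketch, the 2D IoU test underlying an AP@50-style metric looks as follows (our illustration only; the paper uses the standard COCO evaluation [25], which additionally handles confidence ranking, precision-recall interpolation, and per-threshold averaging):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_at_iou(dets, gts, thresh=0.5):
    """Greedily match detections (assumed sorted by confidence) to
    ground-truth boxes at the given IoU threshold; each ground-truth box
    may be matched at most once. Returns the number of true positives."""
    unmatched = list(gts)
    tp = 0
    for d in dets:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    return tp
```

AP@50 and AP@75 in Table 1 correspond to running such matching at IoU thresholds of 0.5 and 0.75, respectively, over ranked detections.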
4.1. Model Variants and Baseline

Most experiments were performed on three variants of our proposed model. Supervised (S): we train the surfel-rendering-to-image model in a supervised way by minimizing an ℓ1 loss between the generated image and the ground-truth real image. This type of training requires paired data; hence, it is only possible to train on WOD-TRAIN. Supervised + Adversarial (SA): we still only consider WOD-TRAIN as the training data; however, we add an adversarial loss alongside the ℓ1 loss. Supervised + Adversarial + Cycle (SAC): in this variant, we also use WOD-TRAIN-NV and the Internal Camera Image Dataset. Since these two sets are unpaired, the supervised loss does not apply. We propose to use a cycle-consistency loss in addition to the adversarial loss, as discussed in Sec. 3.2.

Perturbation   AP@50   AP@75   AP
d <= 1.0       0.574   0.174   0.257
1.0
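The three variants differ only in which terms of Eq. (1) are active. A minimal sketch of combining precomputed per-term scalar losses with the λ weights reported in the training details (λ_1 = 1, λ_2 = λ_3 = 0.001, λ_4 = λ_5 = 0.1) is shown below; which terms are enabled per variant follows our reading of Secs. 3.2 and 4.1, and the sketch operates on scalars rather than network outputs:

```python
# Lambda weights from the paper's training details.
L1, L2, L3, L4, L5 = 1.0, 0.001, 0.001, 0.1, 0.1

def generator_objective(t, variant="SAC"):
    """Combine precomputed scalar loss terms t (a dict) per Eq. (1).
    S uses only the supervised term; SA adds the image adversarial term;
    SAC additionally enables the reverse generator, the surfel
    discriminator, and both cycle-consistency terms (our assumption)."""
    total = t["rec_s2i"]                  # L_r(G^{S->I}, S_p, I_p)
    if variant in ("SA", "SAC"):
        total += L2 * t["adv_img"]        # lambda_2 * L_a(G^{S->I}, D^I)
    if variant == "SAC":
        total += L1 * t["rec_i2s"]        # lambda_1 * L_r(G^{I->S}, I_p, S_p)
        total += L3 * t["adv_surf"]       # lambda_3 * L_a(G^{I->S}, D^S)
        total += L4 * t["cyc_surf"]       # lambda_4 * cycle loss on renderings
        total += L5 * t["cyc_img"]        # lambda_5 * cycle loss on images
    return total
```

With the small λ_2 and λ_3 values above, the adversarial terms act as a light regularizer on top of the dominant supervised reconstruction term.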