A Versatile Approach to Evaluating and Testing Automated Vehicles based on Kernel Methods Zhiyuan Huang1, Yaohui Guo2, Henry Lam3, and Ding Zhao2,4,∗ Abstract—Evaluation and validation of complicated control systems are crucial to guarantee usability and safety. Usually, failure happens in some very rarely encountered situations, but once triggered, the consequence is disastrous. Accelerated Evaluation is a methodology that efficiently tests those rarely- occurring yet critical failures via smartly-sampled test cases. The distribution used in sampling is pivotal to the performance of the method, but building a suitable distribution requires case- by-case analysis. This paper proposes a versatile approach for constructing sampling distribution using kernel method. The approach uses statistical learning tools to approximate the critical event sets and constructs distributions based on the unique properties of Gaussian distributions. We applied the method to evaluate the automated vehicles. Numerical experiments show proposed approach can robustly identify the rare failures and significantly reduce the evaluation time. I. INTRODUCTION The auto companies have been competing to get their automated vehicles (AVs) ready on road for years, yet there is still none available in the market. Partly, this is due to the challenging task of robustly testing and guaranteeing the safety of an AV before its release. Companies have been trying different methods such as road test [1], [2], computer simulation test [3] and human-vehicle interaction test [4], [5], yet providing safety certificate for an AV system is still open for solving [1]. Assisting the endeavors of solving this problem, the U.S Department of Transportation has released a new AV policy: A Vision for Safety 2.0 [6]. This official document standardizes the required safety features of an autonomous vehicle, providing guidance and clearer pathways for the various stakeholders aiming to certify the safety of their AV systems. However, even with this newly published official guideline, the testing standard remains unclear while the AV target release is quickly approaching. Thus, an effective and efficient testing method for an autonomous vehicle is an urgent need under this background. Traditional vehicle safety tests are based on crash databases collected from crashes or dangerous scenarios, such as the CSD and GIDAS crash databases [7]. However, the informa- tion logged in these databases is limited so that it is difficult to reconstruct and analyze the dangerous scenarios. As a result, *This work was funded by the Mobility Transformation Center at the University of Michigan with grant No. N021552. 1Department of Industrial and Operations Engineering, University of Michi- gan, Ann Arbor, MI 48109, USA. 2Robotics Institute, University of Michigan, Ann Arbor, MI 48109, USA. 3Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027, USA. 4Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA. ∗Corresponding author. E-mail: zhaoding@umich.edu (Z. D.) the Naturalistic-Field Operational Test (N-FOT) has been ac- tively used instead in recent auto safety evaluation. An N-FOT logs naturalistic driving data such as the drivers maneuver, driving environment, and vehicle dynamics, providing detailed data for reconstructing the desired traffic scenario. Among these N-FOT records are the 100-Car Naturalistic Driving Study [8], the Safety Pilot Model Deployment (SPMD) [9], and Strategic Highway Research Program (SHRPII) [10]. To evaluate the safety of an AV algorithm, an N-FOT based safety evaluation tests the performance of driving policy under a certain sampled environment drawn from the databases. The recent research includes modeling driver failure into safety evaluation [11], [12] and analyzing the contributing factors in dangerous scenarios [13], [14], [15]. In these researches, the safety is evaluated under certain conditions, from which it is then generalized under various naturalistic driving conditions. While sampling from the N-FOT data is a safe method to imitate the reality, it turns out that such approach is deemed inefficient. According to National Highway Safety Administra- tion [16], the fatality rate is 1.08 per 100 million vehicle miles, which means it requires approximately 100 million miles data to generate a fatal crash in a simulation under natural conditions. This enormous simulation effort is indeed costly and time-consuming. Thus, an accelerated method is necessary to attain an effective and efficient auto safety evaluation. Accelerated Evaluation (AE) procedure is proposed in [17] to serve this notion of accelerated framework, and is further studied in [18], [19], [20]. This approach utilizes statistical models to represent the traffic environment and estimates the rate of safety-critical events in the test scenarios. Obtaining an unbiased estimator for these rates is accelerated by the use of Importance Sampling (IS), alleviating the needs for exten- sive simulation replications. The efficiency of this approach depends primarily on the choice of accelerated distribution, which needs to be constructed based on specific characteristics of the problem of interest. This paper extends the study of AE by proposing a new scheme for constructing accelerated distribution. The scheme merges supervised learning methods with rare event simulation and utilizes the properties of Gaussian distribution in IS theory. The proposed approach provides an alternative method for constructing accelerated distribution for AV test scenarios under Gaussian Mixture Models (GMM). The structure of the paper is as the following: Section II reviews the AE procedure and emphasizes the concepts relevant to the proposed approach. Section III presents the base knowledge and shows the procedure of the proposed ap- proach. Section IV showcases the performance of the proposed approach based on two problem instances. arXiv:1710.00283v1 [cs.LG] 1 Oct 2017 II. REVIEW OF ACCELERATED EVALUATION In this section, the AE procedure is reviewed in detail in the following order. Section II-A gives the high-level overview of the AE procedure, followed by Section II-B in which the general problem setting for the approach is described. To provide adequate knowledge regarding the statistical methods used, the IS theory, that supports the validity of AE procedure, will be concisely presented in Section II-C. A. Procedure For Accelerated Evaluation In evaluating the performance of an AV system, the AE pro- cedure suggests to examine the AV under various different test scenarios. The safety level of a vehicle is assessed by the rate of safety-critical events in the test scenarios. In this case, the goal of the AE is modeled as a probability estimation problem (described in section II-B). The AE procedure consists of four major steps [19]: • Model the behaviors of the traffic environment repre- sented by f(x) (original distribution) as the major dis- turbance to the AV using large-scale naturalistic driving data. • Skew the disturbance statistics from f(x) to modified statistics f ∗(x) (accelerated distribution) to generate more frequent and intense interactions between AVs and the traffic environment. • Conduct accelerated tests with f ∗(x). • Use the IS technique to skew back the results to under- stand real-world behavior and safety benefits. Note that the key part in the AE procedure is the use of IS technique, whose theory will be reviewed in Section II-C. The proposed approach provides an alternative in construct- ing the accelerated distribution in Step 2. In the following sec- tions, we focus on the construction of accelerated distribution. B. Evaluation Problem The AE evaluates testing vehicles by individually estimating the rate of safety-critical events under various different test scenarios. Here, we show the setting of the problem and the notations that we use in the following sections. In a test scenario, we assume that the uncertain environment traffic is represented by a vector X ∈X and is regarded as a stochastic model. Note that X is the domain of X. The distribution model f(X) for X is estimated from a sufficiently large set of naturalistic driving data. The safety- critical events occurs with certain values of X. We represent the set as ε = {X|X leads to a safety-critical event} and use an indicator function Iε(x) = ( 1 x ∈ε 0 otherwise. (1) to denote the response of a certain vector x. Iε(x) is usually complicated, but can be evaluated by on-track experiments or computer simulations. Generally, the safety-critical events are very rare (smaller than 10−6). For this reason, using crude Monte Carlo method to estimate the objective is time-consuming due to large sample size needed to observe a critical event, and even much larger samples to achieve statistical confidence. Therefore, we need to improve the efficiency of the evaluation. C. Importance Sampling IS [21], [22] is a technique to reduce the variance in sim- ulation, while providing unbiased estimation. This technique is crucial to the efficiency and unbiasedness of the AE. Consider a random vector X ∈X with distribution f(X) and a rare event set ε ⊂Ωon sample space Ω. Our goal is to estimate the probability of the rare event P(X ∈ε) = Ef[Iε(X)] = Z xf(x)dx, (2) where the event indicator function is defined as (1). Given samples X1, ..., XN independently and identically generated from f(x), the crude Monte Carlo computes the sample mean of Iε(x) ˆP(X ∈ε) = 1 N N X n=1 Iε(Xn). (3) The IS technique is derived from a change of sampling distribution as the following [21]: E[Iε(X)] = Z Iε(x)f(x)dx = Z Iε(x) f(x) f ∗(x)f ∗(x)dx, (4) where f ∗is a distribution density function that has the same support with f. Note that E∗[Iε(x) f(x) f ∗(x)] = Z Iε(x) f(x) f ∗(x)f ∗(x)dx, (5) where E∗denotes the expectation with regard to f ∗. Therefore by taking the sample mean of the expectation, ˆP(X ∈ε) = 1 N N X n=1 Iε(Xn) f(x) f ∗(x)f(x)dx (6) gives an unbiased estimator of the above expectation [21], where Xi’s are generated from f ∗. By appropriately selecting f ∗, the evaluation procedure obtains an estimation with smaller variance. We refer to f ∗as the accelerated distribution in our procedure. The construction of a “good” accelerated distribution f ∗is discussed in section III. III. CONSTRUCTING ACCELERATED DISTRIBUTION VIA CLASSIFICATION METHODS As discussed in [20], GMM is a powerful distribution model for representing traffic environment in AE. In this section, we propose a new approach to construct accelerated distributions for test scenarios under GMM setting. In Section III-A, we review the existing strategies for con- structing efficient accelerated distribution for GMM. Section III-B shows the key idea of our approach. Section III-C presents the procedure of the proposed approach. Fig. 1. The procedure of the proposed approach. Fig. 2. Scheme of constructing accelerated distribution for Gaussian Mixture Model with convex rare event set. A. Known Strategy for Efficient Accelerated Distribution When the random vector X follows Gaussian distribution with mean µ and covariance matrix Σ and the rare event set ε satisfies the convexity assumption, there is a simple scheme that obtains an efficient accelerated distribution [23], [24]. For a convex rare event set ε, we define the dominating point of ε on φ(x; µ, Σ) to be a∗= arg max a∈ε φ(a; µ, Σ), (7) where φ(x; µ, Σ) is the density function for Gaussian distri- bution with mean µ and covariance matrix Σ. By shifting the mean µ of the Gaussian distribution to a∗, we obtain an accelerated distribution that provides an efficient estimator for P(x ∈ε) [24]. Now we consider the GMM with the density f(x) = k X i=1 piφ(x; µi, Σi). (8) where k is the number of mixture components, pi is the proportion of the ith component, µi and Σi are the mean and covariance for the ith component. We use a∗ i to denote the dominating point with regard to the ith component and critical event set ε. An efficient accelerated distribution is then given by: f ∗(x) = k X i=1 piφ(x; ai, Σi). (9) Fig. 2 illustrates this scheme. Fig. 3. The procedure of constructing an accelerated distribution f∗. In practice, the critical event sets for automated vehicle evaluation are generally non-convex. Therefore, it is hard to apply this approach directly. B. Constructing Accelerated Distribution via Kernel Method We propose a procedure to construct an accelerated dis- tribution for GMM as illustrated in Fig. 3. The procedure utilizes the scheme in section III-A and uses kernel method to transform the data on a higher dimensional space. The Classification is used to obtain the linear boundaries for the critical event set. The following intuition motivates our approach: Suppose we have a feature function (see Appendix B) φ : X →Rm, where X ⊂Rn is the domain for X and n < m. We assume that there is a subspace on the feature space that is a linear transformation of the original space. (For instance, the feature function for polynomial kernel φ(x) = (x2, √ 2x, 1) satisfies the assumption.) We further assume that on the feature space, the critical event set has a linear boundary β′x + b = 0 and ˜f is a Gaussian mixture distribution that well fit the the model on the feature space. Thus, the scheme in section III-A is directly applicable. Let ˜f ∗be the accelerated distribution constructed on the feature space - we note that it is still a GMM. If the feature space contains the a linear transformation of the original space, we can easily obtain an accelerated distribution f ∗on the original space by linearly transforming the marginal distribution of ˜f ∗. Since the marginal of a Gaussian distribution is still a Gaussian distribution [25], f ∗ is still a Gaussian mixture distribution. To carry out the procedure described above, we need to find a feature function φ(x), the boundary β′x + b = 0 and the model ˜f. The selection of φ(x) depends on the shape of the critical event set, which is problem dependent. To obtain β′x+b = 0, we use classification methods, e.g. support vector machine (SVM), to learn for the linear boundaries. The learned critical event set is represented by ε = {x|β′x + b ≥0}. If f is a Gaussian distribution, the true distribution on the feature space is generally not Gaussian. In the proposed pro- cedure, we use a GMM ˜f to approximate the true distribution on the feature space. C. A Sequential Procedure for the Accelerated Evaluation To summarize the procedure of the proposed approach in section III-B, we present the step-by-step description and elab- orated discussion how to apply the approach in a sequential procedure. Suppose we have a GMM f(x) for random vector X and we are interested in E[Iε(X)]. The procedure is as follows: 1) Construct a training set with n samples, {(Xi, Yi)}n i=1, where Yi = Iε(Xi). We suggest to select Xi’s using space filling designs. 2) Select a feature function φ(x) and transform the training set as {(φ(Xi), Yi)}n i=1. We suggest to use the feature function for polynomial kernel and use a small param- eter d (d = 2 or d = 3). 3) Use a classification method with linear boundary on the training set {(φ(Xi), Yi)}n i=1 and obtain the linear boundary β′x + b = 0. We use SVM (see Appendix A) in the experiments. 4) Generate samples {Xj}m j=1 from the model f and trans- form the samples as {φ(Xj)}m j=1. 5) Select K as the number of mixture components. Use GMM with K mixtures to fit the samples {φ(Xj)}m j=1 and obtain the approximation model ˜f. The discussion about the selection of K refers to section IV-C 6) For each component in ˜f, find the dominating point using (7) and construct ˜f ∗using (9). 7) Adjust the marginal distribution of ˜f ∗and obtain f ∗. 8) Generate sample Xi’s from the distribution f ∗and use (6) to estimate the objective. 9) (Additional step) As new returns Iε(Xi) of Xi’s are obtained in step 8, update {(Xi, Yi)}n i=1 and go back to step 1. Note that if we use step 9, the procedure sequentially update the training set and the information of the critical event set. When the experiment Iε(x) is very time-consuming and expensive, we suggest to start with small sample size and add step 9 in the procedure. In the experiments in section IV, we do not use step 9. IV. NUMERICAL EXPERIMENTS In this section, we present simulation results to illustrate the procedure of the proposed approach and to show the applicability of the approach on AV evaluation. In section IV-A, we apply the proposed method on a simple problem to provide an intuition of the mechanism. Section IV-B shows the performance of the method in an AV test case. A. Simple Example Problem We use a simple example to discuss the implementation of the proposed method. The example is set to be a low dimensional problem for illustration purposes. Despite being Fig. 4. 1,000 samples from the distribution model of X with critical events labeled. Fig. 5. 1,000 samples from uniform distribution on the domain of X with critical events labeled. simple, the problem setting maintains some similarity with AV test scenarios. Assume we have a 2-dimensional random vector X ∈X, where X ∈R. Here we have X = {(x, y)|0 ≤x ≤5, 0 ≤y ≤ 5}. Suppose the distribution model f(X) of X is known and we have f(X) to be a multivariate Gaussian distribution with mean (1, 1) and identity covariance matrix I2. The critical event set is defined as ε = {(x, y)|x2 + y2 ≤0.22 or (x − 5)2 + (y −5)2 ≤1.52 or (x −3)2 + (y −5)2 ≤0.72 or (x − 5)2 +(y −3)2 ≤0.52}. The indicator function Iε(x) = 1(x ∈ ε) shows whether a vector x is in the critical event set. Our objective is to estimate the probability of the critical event, i.e. P(X ∈ε) = E[Iε(x)]. Fig. 4 shows 1,000 samples generated from the distribution model of X. We could observe that distribution concentrates on the area with relatively small value on both axes (consider that the upper bound for both coordinates is 5). Now, we start to explore the domain of the X and to learn the critical event set. We use 1,000 samples randomly generated using a uniform distribution on the domain and label these samples with regard to the critical events. Fig. 5 shows the samples and the return of the indicator function Iε(x). Let us denote these samples as Xn and the returns as Yn. It is obvious that the two types of events cannot be linearly classified and the critical event set is obviously not convex. Therefore if we want to construct an accelerated distribution for estimating the objective probability, we cannot directly use the scheme mentioned in section III-A. Fig. 6. The classification boundary obtained from linear support vector machine on the original space. Fig. 7. The estimation of the probability P(X ∈ε). We follow the procedure described in section III-C to construct an accelerated distribution for the problem. Here we use a polynomial kernel which maps vectors X = (x, y) in the original space to the feature space through the transformation φ(X) = φ(x, y) = (x2, y2, xy, x, y). We then use linear SVM to classify the transformed samples φ(Xn) with regard to the returns Yn. A linear boundary on the feature space is obtained as β′x + b = 0. In Fig. 6, we plot this boundary on the original space. We should note that the classification captures the property of the critical set. Next, we want to approximate the distribution model on the feature space. We generate extra 20,000 samples from the original distribution f(X), and transform the new generated samples with φ(X). To investigate the effect of the accuracy of the approximation model, we use two cases with different number of mixture for comparison, i.e. K1 = 20, K2 = 3. We fit Gaussian mixture distribution with K1 and K2 components on the transformed samples respectively. For each component in the mixture distribution, we find the dominant point and construct a sampling distribution on the feature space. Finally, we use the marginal distribution on the original space as the accelerated distribution. The simulation performance is shown in Fig. 7 and 8. In the figures, we use “AE Ki” to denote the proposed approach with Ki components in the approximation mixture distribution and “Crude MC” to denote the crude Monte Carlo approach. In Fig. 7, we could observe that the proposed approach attains stability within 2,000 samples. The crude Monte Carlo Fig. 8. The relative half width with 95% confidence of the estimation for P(X ∈ε). is obviously oscillating. This shows that the proposed approach provides better estimation using smaller sample size. This argument is further confirmed by Fig. 8. In Fig. 8, we plot the 95% relative half-width, which is given by w = α0.95 ˆσ √ N ˆP , (10) where α0.95 denotes the 0.95 quantile of the normal distribu- tion, N denotes the number of samples used in the estimation, ˆσ denotes the standard deviation of the sampled objective, and ˆP denotes the estimation of the objective. The 95% relative half-width for the crude Monte Carlo is much larger than the proposed approaches. To reach the same level of 95% relative half-width as the proposed approaches, the crude Monte Carlo requires roughly 100 times more samples. From the performance of this simple example, we shall note that the proposed approach provides good accelerated distributions. By comparing the performance with K1 and K2, we conclude that a more accurate approximation model on the feature space leads to a more efficient accelerated distribution. B. Lane Change Scenario The lane change model for AE is proposed by [17]. The setting is illustrated in Fig. 9, which shows that a leading human driving vehicle is conducting a lane change in front of an automated vehicle. Our objective is to evaluate the rate of safety-critical events in this scenario, where safety-critical events includes conflict, collision, injury, etc. In this example, we are interested in the rate of collision during the lane change procedure. We assume that the initial status of the two vehicles is captured by a vector X = (v, ˙R, R−1) ∈X, which consists of three important parameters: v, the velocity of the leading vehicle, ˙R, the relative range of the two vehicles, and R−1, the inverse of the range between the two vehicles. We assume that X is stochastic and the distribution of X, f(X), is a known GMM. Given X, the lane change procedure is supposed to be deterministic with regard to the testing vehicle. We use ε ⊂X to denote the critical event set and our objective is represented by P(X ∈ε). In this example, we use the vehicle control model shown in Fig. 9 as the testing vehicle. The model consists of Adaptive Fig. 9. The setting of the lane change scenario and the related parameters. Fig. 10. The estimation of the probability P(X ∈ε). Cruise Control (ACC) and Autonomous Emergency Braking (AEB) [26]. (In the figure, TTC is defined as TTC = R/ ˙R. TTCAEB is a threshold that triggers the AEB system.) We apply the proposed approach to obtain accelerated distributions. We construct a training set with 20, 000 samples by using a grid on the domain (v and ˙R are bounded; R−1 is unbounded, we sample between the minimum and maximum data in a naturalistic driving data). In this case, we use a poly- nomial kernel with feature function φ(X) = φ(v, ˙R, R−1) = (v2, ˙R2, (R−1)2, v ˙R, vR−1, ˙RR−1, v, ˙R, R−1) to map the model onto a higher dimensional space. Again, we use differ- ent value, K1 = 8 and K2 = 20, for the number of mixtures of the approximation model in the feature space. Fig. 10 presents the estimation results of the objective, P(X ∈ε). We observe that the estimation converges as we increase the number of samples. In this case, using different value for the number of mixtures for the approximation model does not result in significant difference in the resulting estimation. To illustrate the efficiency of the proposed method, we will compare the performance of the proposed method with crude Monte Carlo. Since the probability we are estimating is very small, running crude Monte Carlo would require a huge computing time. Here we use the probability we estimated to approximate the number of samples required for the crude Monte Carlo. Using the formula for the standard deviation of the crude Monte Carlo estimation ˆP(x ∈ε) by std( ˆP(x ∈ε)) = s ˆP(x ∈ε)(1 −ˆP(x ∈ε)) n , (11) TABLE I COMPARISON BETWEEN THE PROPOSED APPROACH AND CRUDE MONTE CARLO. Standard Deviation Required Samples Proposed AE 2 × 10−7 2 × 105 Crude MC 5 × 10−5 1.2 × 1010 we found that the crude Monte Carlo method requires roughly 6 × 104 times more samples to reach the sample confidence level as the proposed approach. Table I summarizes the comparison of the proposed AE with crude Monte Carlo. Column 2 shows the approximated standard deviation of the estimators. Column 3 presents the approximated number of the required samples to obtain an estimation with a 95% confidence relative half-width less than 0.4. Note that the proposed AE uses an extra of 2×104 samples to learn the critical event set and construct the accelerated distribution. These samples are much smaller than the scale of the required samples for estimation. C. Discussion on Implementation In the implementation of the proposed approach, there are some open choices in the procedure. We know from the experiment that the efficiency of the approach is largely influenced by these choices. Here, we discuss how they are related to the efficiency of the method. To obtain an efficient accelerated distribution, we want to achieve two tasks: a) the classification must learn the property of the critical event set, at least a rough bound needs to be found; b) the approximation on the feature space needs to roughly capture the shape of the transformed distribution. We discuss these two tasks separately. In the classification step, we need to construct a train set and choose a kernel function. Since the collection of train data requires test experiment, the size of the training set need to be balanced. In practice, to select a reasonable number requires prior knowledge of the test scenario. When such knowledge is not available, we suggest to start with a small size of data that includes some critical events and then follow the sequential procedure described in section III-C. Given the number of data in the training set, we can use a deterministic design (e.g. use a grid) or a random design (e.g. uniformly sample), as long as samples fill the domain well. The kernel functions to select need to have an explicit fea- ture function. We suggest to use polynomial kernel. A feature space with higher dimension would achieve better accuracy for the critical event learning, however, to approximate the distribution model on the higher dimension is generally harder. For this reason, we want the dimension of the feature space to be as small as possible. For approximating the distribution model on the feature space, we need to choose the number of mixture components. Although a larger number of components always provides a better fitting, the efficiency of the constructed accelerated distribution might not always improve. We suggest to use smaller number of components for less computing efforts. For high dimensional feature space, regularization should be applied in fitting the approximation model to make the fitting algorithm converge in fewer iterations. The selection of regularization parameter is suggested to choose through cross validation [27]. APPENDIX A. Classification Methods for Critical Event Set Learning Here we review SVM as an example of classification methods that suits the proposed approach. Please refer to [27] for more details. SVM is a popular algorithm for classification and regres- sion. Here we briefly introduce the SVM with hard margin for binary classification. Denote the training dataset as (xi, ti), i = 1, 2, 3, ..., n, where xi is the feature vector and ti ∈{+1, −1} is the label; suppose the data xi is linear separable. The SVM returns the linear classifier of the form: y(x) = β′x + b where β′ is the coefficient of the hyper plane in the feature space; b is the bias; y(xi) > 0 for ti = 1; y(xi) < 0 for ti = −1. SVM wants to find the hyperplane to separate the points while maximizing the minimum distance between the hyperplane and the nearest points xi from either side. This can be formulated with the constraints as arg max β,b { 1 ∥β∥arg min n ti(β′xi + b)} ti(β′xi + b) > 0, i = 1, 2, 3, ..., n A common solution to SVM is to use the Lagrange Multiplier, which can be found in [27]. Note that other classification methods, e.g. logistic regres- sion, also provide a linear boundary and are compatible with kernel method. These methods also fits the proposed approach. B. Kernel Method for Critical Event Set Learning The kernel is a generalized inner product, computing the inner product of two vectors in a higher dimensional space without explicitly define the feature function. Here we intro- duce the feature function, and more detail on kernel method can be found in [28] .The feature function is defined as φ : X →Rm, where X ⊂Rn is the domain for input xi, i = 1, 2, 3, ..., n, and n < m. A benefit of introducing feature space is non-separable data can be separable in some feature spaces. For instance, consider two sets on a 2-D plane: S1 = {(x, y) | x2 + y2 < 3} ⊂R2 and S2 = {(x, y) | x2 + y2 > 3} ⊂R2. There is no line can separate S1 from S2 since S1 is surrounded by S2. But if we map the point (x, y) on R2 to R3 by a feature function φ(x, y) = (x, y, x2 + y2), the mapped sets are linearly separable. Denoting the original sets S1, S2 in the feature space as Φ(S1) = {φ(x) | x ∈S1} and Φ(S2) = {φ(x) | x ∈S2}, then Φ(S1) and Φ(S2) are linearly separable by the plane z = 3 in R3. REFERENCES [1] L. Wardle, “Self-driving ubers in pittsburgh: What’s changed after 1 year on the roads?,” Sep 2017. [2] P. Newman, “Lyft to test autonomous cars in san francisco,” Sep 2017. [3] A. C. Madrigal, “Inside waymo’s secret world for training self-driving cars,” Aug 2017. [4] S. Petters, “Ford tests communication signals for cavs.” [5] A. Toor and T. Warren, “Domino’s and ford will test self-driving pizza delivery cars in michigan,” Aug 2017. [6] S. James, “U.s. dot releases new automated driving systems guidance,” Sep 2017. [7] D. Zhao, Y. Guo, and Y. J. Jia, “Trafficnet: An open naturalistic driving scenario library,” arXiv preprint arXiv:1708.01872, 2017. [8] V. L. Neale, T. A. Dingus, S. G. Klauer, J. Sudweeks, and M. Goodman, “An overview of the 100-car naturalistic study and findings,” National Highway Traffic Safety Administration, Paper, no. 05-0400, 2005. [9] U. D. of Transportation, “Safety pilot model deployment data.” [10] V. Punzo, M. T. Borzacchiello, and B. Ciuffo, “On the assessment of vehicle trajectory data accuracy and application to the next generation simulation (ngsim) program data,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 6, pp. 1243–1262, 2011. [11] C. Chai, Y. D. Wong, and X. Wang, “Safety evaluation of driver cognitive failures and driving errors on right-turn filtering movement at signalized road intersections based on fuzzy cellular automata (fca) model,” Accident Analysis and Prevention, vol. 104, pp. 156 – 164, 2017. [12] J. Kovaceva, A. B´alint, and H. Fagerlind, “Contributing factors to car crashes related to driver inattention for different age groups,” in International Conference on Driver Distraction and Inattention, 4th, 2015, Sydney, New South Wales, Australia, no. 15257, 2015. [13] T. A. Dingus, F. Guo, S. Lee, J. F. Antin, M. Perez, M. Buchanan- King, and J. Hankey, “Driver crash risk factors and prevalence evaluation using naturalistic driving data,” Proceedings of the National Academy of Sciences, vol. 113, no. 10, pp. 2636–2641, 2016. [14] F. Bella, A. Calvi, and F. D’Amico, “Analysis of driver speeds under night driving conditions using a driving simulator,” Journal of safety research, vol. 49, pp. 45–e1, 2014. [15] X. Qu, Y. Kuang, E. Oh, and S. Jin, “Safety evaluation for expressways: a comparative study for macroscopic and microscopic indicators,” Traffic injury prevention, vol. 15, no. 1, pp. 89–93, 2014. [16] T. S. Facts, “Data (2014),” National Highway Traffic Safety Administra- tion, 2014. [17] D. Zhao, H. Lam, H. Peng, S. Bao, D. J. LeBlanc, K. Nobukawa, and C. S. Pan, “Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques,” IEEE transactions on intelligent transportation systems, vol. 18, no. 3, pp. 595–607, 2017. [18] D. Zhao, X. Huang, H. Peng, H. Lam, and D. J. LeBlanc, “Accelerated evaluation of automated vehicles in car-following maneuvers,” IEEE Transactions on Intelligent Transportation Systems, 2017. [19] Z. Huang, D. Zhao, H. Lam, and D. J. LeBlanc, “Accelerated evaluation of automated vehicles using piecewise mixture models,” arXiv preprint arXiv:1701.08915, 2017. [20] Z. Huang, H. Lam, and D. Zhao, “An accelerated testing approach for automated vehicles with background traffic described by joint distributions,” arXiv preprint arXiv:1707.04896, 2017. [21] S. Asmussen and P. W. Glynn, Stochastic simulation: algorithms and analysis, vol. 57. Springer Science & Business Media, 2007. [22] S. Ross, Simulation. Academic Press, 2013. [23] J. S. Sadowsky et al., “On monte carlo estimation of large deviations probabilities,” The Annals of Applied Probability, vol. 6, no. 2, pp. 399– 422, 1996. [24] J. S. Sadowsky and J. A. Bucklew, “On large deviations theory and asymptotically efficient monte carlo estimation,” IEEE transactions on Information Theory, vol. 36, no. 3, pp. 579–588, 1990. [25] A. Papoulis and S. U. Pillai, Probability, random variables, and stochas- tic processes. Tata McGraw-Hill Education, 2002. [26] A. G. Ulsoy, H. Peng, and M. C¸ akmakci, Automotive control systems. Cambridge University Press, 2012. [27] C. M. Bishop, Pattern recognition and machine learning. springer, 2006. [28] K. P. Murphy, Machine learning: a probabilistic perspective. MIT press, 2012.