A Probabilistic Representation for Dynamic Movement Primitives

Franziska Meier 1,2 and Stefan Schaal 1,2
1 CLMC Lab, University of Southern California, Los Angeles, USA
2 Autonomous Motion Department, MPI for Intelligent Systems, Tübingen, Germany

* This research was supported in part by National Science Foundation grants IIS-1205249, IIS-1017134, EECS-0926052, the Office of Naval Research, the Okawa Foundation, and the Max-Planck-Society.

Abstract

Dynamic Movement Primitives (DMPs) have successfully been used to realize imitation learning, trial-and-error learning, reinforcement learning, movement recognition and segmentation, and control. Because of this they have become a popular representation for motor primitives. In this work, we show how DMPs can be reformulated as a probabilistic linear dynamical system with control inputs. Through this probabilistic representation of DMPs, algorithms such as Kalman filtering and smoothing are directly applicable to perform inference on proprioceptive sensor measurements during execution. We show that inference in this probabilistic model automatically leads to a feedback term that modulates the execution of a DMP online. Furthermore, we show how inference allows us to measure the likelihood that we are successfully executing a given motion primitive. In this context, we present initial results of using the probabilistic model to detect execution failures on a simulated movement primitive dataset.

Introduction

One of the main challenges towards autonomous robots remains autonomous motion generation. A key observation has been that in certain environments, such as households, the tasks that need to be executed tend to contain very repetitive behaviors (Tenorth, Bandouch, and Beetz 2009). Thus, the idea of identifying motion primitives to form building blocks for motion generation has become very popular (Ijspeert et al. 2013; Paraschos et al. 2013). A popular and effective way to learn these motion primitives is through imitation learning (Schaal 1999). Part of the research in this area is concerned with motion representation, and a variety of options have been proposed (Ijspeert et al. 2013; Khansari-Zadeh and Billard 2010; Dragan et al. 2015; Wilson and Bobick 1999; Paraschos et al. 2013).

Recently, however, there has been interest in going beyond pure motion representation. In the context of closing action-perception loops, motor skill representations that allow closing that loop are attractive. The ability to feed back sensory information during the execution of a motion primitive promises a less error-prone motion execution framework (Pastor et al. 2011). Furthermore, the ability to associate a memory of how a motor skill should feel during execution can greatly enhance robustness in execution. This concept of enhancing a motor skill representation with a sensory memory has been termed associative skill memories (ASMs) (Pastor et al. 2012).

The idea of associative skill memories lies in leveraging the repeatability of motor skills to associate experienced sensor information with the motion primitive. This makes it possible to adapt a previously learned motor skill when it is executed again and deviates from the previously experienced sensor information (Pastor et al. 2011). Thus far, existing approaches (Pastor et al. 2011; Gams et al. 2014) use the concept of coupling terms to incorporate sensor feedback into DMPs, which typically involves setting task-dependent gains to modulate the effect of the feedback term.
In this work we endow Dynamic Movement Primitives with a probabilistic representation that maintains their original functionality but allows for uncertainty propagation. Inference in this probabilistic DMP model is realized through Kalman filtering and smoothing. An interesting aspect of probabilistic DMPs is that – given a reference signal to track – sensor feedback is automatically taken into account when executing a desired behavior. As a result, the feedback term is an intrinsic part of this probabilistic formulation. As in Kalman filtering, the feedback term is scaled by an uncertainty-based gain matrix. Besides this insight, we also highlight the benefits of combining a probabilistic motion primitive representation with a strong structural prior, which in this work is given by the DMP framework. We demonstrate this advantage by showing how the probabilistic model can be used to perform failure detection when executing a motion primitive.

This paper is organized as follows: We start by reviewing the DMP framework and related probabilistic motion representations in the background section. In the main section we then propose a graphical model representation for DMPs, detail the learning procedure and illustrate its usage. Finally, we discuss how failure detection can be realized with probabilistic DMPs and evaluate the approach in that context.

Background

Dynamic Movement Primitives (DMPs) encode a desired movement trajectory in terms of the attractor dynamics of nonlinear differential equations (Ijspeert et al. 2013). For a 1-DOF system, the equations are given as

\frac{1}{\tau}\dot{z} = \alpha_z(\beta_z(g - p) - z) + s f(x), \qquad \frac{1}{\tau}\dot{p} = z    (1)

such that p, \dot{p}, \ddot{p} = \dot{z} are position, velocity, and acceleration of the movement trajectory, where

f(x) = \frac{\sum_{i=1}^{N} \psi_i w_i x}{\sum_{i=1}^{N} \psi_i}, \quad \text{with} \quad \psi_i = \exp(-h_i (x - c_i)^2)

with

\frac{1}{\tau}\dot{x} = -\alpha_x x \quad \text{and} \quad s = \frac{g - p_0}{g_{\text{fit}} - p_{0,\text{fit}}} = \frac{g - p_0}{\Delta g}.

In general, it is assumed that the duration \tau and goal position g are known. Thus, given \tau and g, the DMP is parametrized through the weights w = (w_1, \ldots, w_N)^T, which are learned to represent the shape of any smooth movement. During this fitting process, the scaling variable s is set to one, and the value of \Delta g is stored as a constant for the DMP.
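To make the roles of the canonical system, the forcing term, and the scaling variable s concrete, the following is a minimal Python sketch of integrating Equation 1 with Euler steps. It assumes the weights w, basis centers c, and widths h have already been fit; the gain values and all names (e.g. integrate_dmp) are illustrative choices of ours, not taken from the paper.

import numpy as np

def integrate_dmp(w, c, h, g, p0, delta_g, tau, alpha_z=25.0, beta_z=6.25,
                  alpha_x=8.0, dt=0.001, T=1000):
    """Euler integration of the 1-DOF DMP of Equation 1 (illustrative sketch)."""
    x, z, p = 1.0, 0.0, p0                  # canonical phase, scaled velocity, position
    s = (g - p0) / delta_g                  # amplitude scaling s = (g - p0) / delta_g
    traj = np.zeros((T, 3))                 # columns: position, velocity, acceleration
    for t in range(T):
        psi = np.exp(-h * (x - c) ** 2)     # basis activations psi_i
        f = np.dot(psi, w) * x / (np.sum(psi) + 1e-10)   # forcing term f(x)
        z_dot = tau * (alpha_z * (beta_z * (g - p) - z) + s * f)
        p_dot = tau * z
        x_dot = -tau * alpha_x * x          # canonical system
        z, p, x = z + z_dot * dt, p + p_dot * dt, x + x_dot * dt
        traj[t] = [p, p_dot, tau * z_dot]   # acceleration follows from p_dot = tau * z
    return traj

Since fitting stores \Delta g, calling such a routine with a new goal or start position rescales the learned shape via s, which is the usual DMP generalization behavior.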
Probabilistic Motion Primitive Representations

The benefits of taking a probabilistic approach to motion representation have been discussed by a variety of authors (Toussaint 2009; Rückert et al. 2013; Meier et al. 2011; Calinon et al. 2012; Paraschos et al. 2013; Khansari-Zadeh and Billard 2011). The use of probabilistic models varies, however. For instance, the approaches presented in (Calinon et al. 2012; Khansari-Zadeh and Billard 2011) take a dynamical systems view that utilizes statistical methods to encode the variability of the motion. In contrast, (Toussaint 2009; Rückert et al. 2013) take a trajectory optimization approach using a probabilistic planning system.

Here, as outlined in the introduction, we take the dynamical systems view, as a first step towards an implementation of associative skill memories. In previous work (Meier et al. 2011) we have shown how to reformulate the DMP equations into a linear dynamical system (Bishop 2006). Inference and learning in this linear dynamical system was formulated as a Kalman filtering/smoothing approach. This Kalman filter view of DMPs allowed us to perform online movement recognition (Meier et al. 2011) and segmentation of complex motor skills into underlying primitives (Meier, Theodorou, and Schaal 2012).

Finally, compared to previous work on probabilistic motion primitives, such as (Calinon et al. 2012; Paraschos et al. 2013), the representation presented here explicitly represents the dynamical system structure as a dynamic graphical model and adds the possibility of considering sensor feedback as part of the inference process.

Probabilistic Dynamic Movement Primitives

In this section we introduce a new probabilistic Dynamic Movement Primitive model. The goal of this work is to provide a probabilistic model that can replace the standard non-probabilistic representation without loss of functionality. Thus, we aim at deriving a graphical model that explicitly maintains positions, velocities and accelerations, such that a rollout of that model creates a full desired trajectory.

Deriving the Probabilistic DMP Model

We start out by deriving the new formulation and then show how learning of the new probabilistic DMPs is performed. The transformation system of a 1-DOF DMP can be discretized via Euler discretization, resulting in

\dot{z}_t = \tau(\alpha_z(\beta_z(g - p_{t-1}) - z_{t-1}) + f)    (2)
z_t = \dot{z}_t \Delta t + z_{t-1}    (3)
\ddot{p}_t = \tau \dot{z}_t    (4)
\dot{p}_t = \tau z_t    (5)
p_t = \dot{p}_t \Delta t + p_{t-1}    (6)

where \Delta t is the integration step size, and p_t, \dot{p}_t and \ddot{p}_t are position, velocity and acceleration at time step t. By plugging Equations 2 and 3 into Equations 4 and 5 and setting \dot{z}_t = \frac{1}{\tau}\ddot{p}_t and z_t = \frac{1}{\tau}\dot{p}_t, this can be reduced to

\ddot{p}_t = \tau^2\left(\alpha_z\left(\beta_z(g - p_{t-1}) - \tfrac{1}{\tau}\dot{p}_{t-1}\right) + f\right)    (7)
\dot{p}_t = \ddot{p}_t \Delta t + \dot{p}_{t-1}    (8)
p_t = \dot{p}_t \Delta t + p_{t-1}    (9)

By collecting p_t, \dot{p}_t and \ddot{p}_t into the state s_t = (\ddot{p}_t\ \dot{p}_t\ p_t)^T, we can summarize this linear system of equations as

s_t = A s_{t-1} + B u_t    (10)

where the control input is given as u_t = \alpha_z\beta_z g + s f_t, and the transition matrix A and control matrix B are given as

A = \begin{pmatrix} 0 & -\alpha_z\tau & -\alpha_z\beta_z\tau^2 \\ \Delta t & 1 & 0 \\ 0 & \Delta t & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}

We would like to account for two sources of uncertainty: transition noise, modeling any uncertainty in transitioning from one state to the next; and observation noise, modeling noisy sensory measurements. Next we show how to incorporate both of these to arrive at the full probabilistic model.

A standard approach to modeling transition noise would be to assume additive, zero-mean Gaussian noise, e.g.

s_t = A s_{t-1} + B u_{t-1} + \epsilon    (11)

with \epsilon \sim \mathcal{N}(\epsilon \mid 0, Q), which would create a time-independent transition uncertainty. Here, however, we would like to model the transition uncertainty as a function of how certain we are about the non-linear forcing term f. Recall that f represents the shape of the motion primitive and is typically trained via imitation learning. Assuming that we have several demonstrations to learn from, we can estimate not only the mean forcing term f from the demonstrations, but also its variance, using Bayesian regression. Thus, for now, we assume that we have a predictive distribution over f_t, the non-linear forcing term at time step t,

f_t \sim \mathcal{N}(f_t \mid \mu_{f_t}, \sigma^2_{f_t})    (12)

with mean \mu_{f_t} and variance \sigma^2_{f_t}. The details of deriving this distribution are given in the next subsection. Assuming f_t to be drawn from a Gaussian distribution automatically implies that the hidden state s_t is also Gaussian distributed,

s_t \sim \mathcal{N}(s_t \mid A s_{t-1} + B(\alpha_z\beta_z g + s\mu_{f_t}), Q_t)    (13)

where Q_t = s^2 B \sigma^2_{f_t} B^T. Thus, if the variance of the distribution over f_t is time-dependent, so is the state transition noise.
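As a concrete illustration of Equations 10 and 13, the sketch below assembles A, B, and the phase-dependent transition covariance Q_t. The predictive mean \mu_{f_t} and variance \sigma^2_{f_t} of the forcing term are assumed to be supplied by the Bayesian regression described in the learning subsection; the helper names are hypothetical.

import numpy as np

def build_transition_matrices(alpha_z, beta_z, tau, dt):
    """A and B of Equation 10 for the state s_t = (pdd_t, pd_t, p_t)^T."""
    A = np.array([[0.0, -alpha_z * tau, -alpha_z * beta_z * tau ** 2],
                  [dt,   1.0,            0.0],
                  [0.0,  dt,             1.0]])
    B = np.array([[1.0], [0.0], [0.0]])
    return A, B

def transition_distribution(A, B, s_prev, g, s_scale, mu_ft, sigma2_ft,
                            alpha_z, beta_z):
    """Mean and covariance of p(s_t | s_{t-1}) as in Equation 13."""
    u_t = alpha_z * beta_z * g + s_scale * mu_ft        # control input u_t
    mean = A @ s_prev + (B * u_t).flatten()             # A s_{t-1} + B u_t
    Q_t = (s_scale ** 2) * sigma2_ft * (B @ B.T)        # Q_t = s^2 B sigma_ft^2 B^T
    return mean, Q_t

Because \sigma^2_{f_t} changes with the phase of the movement, Q_t varies over time, which is exactly the time-dependent transition noise discussed above.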
Finally, we also want to be able to include noisy sensor measurements in our model. Thus, we assume that we receive observations o_t that are a function of the hidden state, corrupted by zero-mean Gaussian noise:

o_t \sim h(s_t) + \mathcal{N}(v \mid 0, R)    (14)

A simple example is feedback on the actual position of the system. In this case, the observation function is h(s_t) = C s_t with observation matrix C = (0\ 0\ 1).

Putting it all together, our probabilistic formulation takes the form of a controlled linear dynamical system with time-dependent transition noise and time-independent observation noise:

s_t = A s_{t-1} + B(\alpha_z\beta_z g + s\mu_{f_t}) + \epsilon_t
o_t = h(s_t) + v

with \epsilon_t \sim \mathcal{N}(\epsilon_t \mid 0, s^2 B \sigma^2_{f_t} B^T) and v \sim \mathcal{N}(v \mid 0, R). Note that, for clarity, we have derived the probabilistic model for a specific DMP variant (Ijspeert et al. 2013). However, the same procedure can be followed to arrive at a dynamic graphical model representation for other variants of motion primitives, such as (Pastor et al. 2009).

Learning Probabilistic DMPs

Probabilistic Dynamic Movement Primitives can be learned through imitation learning, similar to regular DMPs. Given K demonstrations P^k_{demo} of the same motion primitive, it is possible to estimate the distribution over the non-linear forcing term f via Bayesian regression (Bishop 2006).

The standard approach to estimating the noise covariance R of a linear dynamical system is based on the expectation-maximization (EM) procedure. The EM algorithm iterates between estimating the posterior over the hidden states of the motion primitive and maximizing the expected complete-data log-likelihood with respect to the parameter of interest. The complete-data log-likelihood for K demonstrations is given by

\ln p(P, S \mid \tau, g) = \sum_{k=1}^{K} \left[ \sum_{t=1}^{T} \ln\mathcal{N}(o^k_t \mid C s^k_t, R) + \ln\mathcal{N}(s^k_1 \mid 0, Q_0) \right] + \sum_{k=1}^{K} \sum_{t=2}^{T} \ln\mathcal{N}(s^k_t \mid A s^k_{t-1} + B u^k_{t-1}, Q_t)    (15)

1: procedure ROLLOUT(w, Q, R)
2:     for t = 1 : T do
3:         \mu_{p,t} = A \mu_{u,t-1} + B u_{t-1}
4:         V_{p,t} = A V_{u,t-1} A^T + Q
5:     end for
6: end procedure

Figure 1: Probabilistic DMP: rollout

1: procedure EXECUTEANDMONITOR(w, Q, R)
2:     for t = 1 : T do
3:         \mu_{p,t} = A \mu_{u,t-1} + B u_{t-1}
4:         V_{p,t} = A V_{u,t-1} A^T + Q
5:         S = C V_{p,t} C^T + R
6:         K = V_{p,t} C^T S^{-1}
7:         \mu_{u,t} = \mu_{p,t} + K (o_t - C \mu_{p,t})
8:         V_{u,t} = V_{p,t} - K C V_{p,t}
9:     end for
10:    return loglik
11: end procedure

Figure 2: Probabilistic DMP: executing and tracking a reference signal

Taking the expectation of the complete-data log-likelihood with respect to the posterior p(S \mid P, \theta^{old}) defines the function

Q(\theta, \theta^{old}) = E_{S \mid \theta^{old}}[\ln p(P, S \mid \theta)],

which we maximize with respect to the parameters \theta. The update for R can now be derived by setting the derivative of this function to zero and solving for R, which can be done analytically. This completes the learning of a probabilistic dynamic movement primitive, where the parameters required to fully describe the probabilistic representation are \theta_{primitive} = \{\mu_w, \Sigma_w, \alpha, \beta, R\}.

Executing a probabilistic DMP

Dynamic Movement Primitives are typically used to generate desired trajectories that a controller is expected to track. Note that the hidden state of the probabilistic DMP is given by s_t = (\ddot{p}_t\ \dot{p}_t\ p_t)^T. Pure feedforward trajectory generation is achieved by initializing the linear dynamical system with the task parameters (goal, start and duration of the motion) and then simply unrolling the probabilistic model (see Algorithm 1); this generates exactly the same desired trajectory as a standard DMP, including uncertainty estimates.
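The feedforward rollout of Figure 1 can be sketched as follows, propagating both the mean and the covariance of the hidden state. The per-step control inputs u_t and covariances Q_t are assumed to be precomputed from the learned forcing-term distribution, for example with the hypothetical helpers sketched earlier.

import numpy as np

def rollout(A, B, u, Q, mu0, V0):
    """Feedforward rollout (Figure 1): propagate mean and covariance of the
    hidden state s_t = (pdd, pd, p)^T without using any observations."""
    T = len(u)
    means = np.zeros((T, 3))
    covs = np.zeros((T, 3, 3))
    mu, V = mu0, V0
    for t in range(T):
        mu = A @ mu + (B * u[t]).flatten()   # mu_{p,t} = A mu_{u,t-1} + B u_{t-1}
        V = A @ V @ A.T + Q[t]               # V_{p,t} = A V_{u,t-1} A^T + Q_t
        means[t], covs[t] = mu, V
    return means, covs

The means correspond to the desired accelerations, velocities and positions of a standard DMP rollout, while the covariances carry the uncertainty estimates mentioned above.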
As discussed above, we can also consider noisy sensor feedback when executing a probabilistic DMP. Instead of simply forward predicting the hidden state, inference is performed to estimate the hidden state distribution. For linear dynamical systems this inference process is widely known as Kalman filtering. In order to do so, we need to formulate how the hidden state s_t generates the chosen sensor measurements o_t. This is done by defining the observation function h(s_t) that creates the reference signal we want to track by transforming the hidden state. Assuming we observe the actual position of the system, this leads to the following inference steps: At time step t, the system feedforward predicts the new desired (hidden) state \mu_{p,t},

\mu_{p,t} = A \mu_{t-1} + B u_{t-1}    (16)

then the Kalman innovation update adds a term proportional to how much the reference signal deviates from the observed sensor feedback,

\mu_t = \mu_{p,t} + K(o_t - h(s_t)).    (17)

Thus, at time step t+1 the desired behavior of the DMP is modulated online:

\mu_{p,t+1} = A(\mu_{p,t} + K(o_t - h(s_t))) + B u_t    (18)
            = A \mu_{p,t} + B u_t + A K(o_t - h(s_t))    (19)

Note the similarity of this update to the online modulation performed in (Pastor et al. 2011). To summarize, the execution of a probabilistic DMP with noisy measurement observations is performed via Kalman filtering (see Algorithm 2) and automatically leads to online adaptation of desired behaviors to account for disturbances.

Summary

In summary, the main characteristics of our probabilistic DMP representation are:

• It is a probabilistic model that keeps accelerations as part of the hidden state. This allows probabilistic DMPs to be executed just like regular DMPs, where a rollout of the hidden states produces desired accelerations, velocities and positions.

• The non-linear forcing term is modeled probabilistically with phase-dependent variability, creating a phase-dependent transition covariance in the linear dynamical system view.

• A reference signal can be tracked as part of the probabilistic model, which creates a principled way of modulating the desired behavior online when the sensor feedback deviates from the reference signal.

Failure Detection with Probabilistic DMPs

Besides the insight presented above, other benefits of this probabilistic formulation exist. For instance, assuming we have learned a probabilistic representation {\mu_w, \Sigma_w, \alpha, \beta, R} for a motion primitive, we can perform online failure detection: While executing the motion primitive using Algorithm 2, we can utilize the probabilistic formulation to continuously monitor how likely it is that this motion primitive is generating the measured actual state of the system. We illustrate this application in Figure 3.

Figure 3: Failure detection illustration: The (left) and (middle) plots show the mean trajectory of a learned primitive with the variance at each time step. Additionally, (left) the green trajectory illustrates a non-perturbed observed trajectory that slightly differs from the mean; (middle) shows the same trajectory, but perturbed. (right) The log-likelihood of both observed trajectories, calculated online while the trajectories were unrolled.
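To make the monitoring idea concrete, here is a minimal sketch of an execute-and-monitor loop in the spirit of Algorithm 2: a Kalman filter tracks the measured position, and the log-likelihood of each measurement under the predictive distribution is compared against a threshold. Treating R as a scalar position variance and flagging failures with a simple print are our simplifications, not the paper's implementation; all names are illustrative.

import numpy as np

def execute_and_monitor(A, B, u, Q, R, mu0, V0, observations, loglik_threshold):
    """Kalman-filter execution of a probabilistic DMP (in the spirit of Algorithm 2),
    monitoring the per-step log-likelihood of the measured position for failures."""
    C = np.array([[0.0, 0.0, 1.0]])          # we observe the actual position only
    mu, V = mu0, V0
    desired, logliks = [], []
    for t, o_t in enumerate(observations):
        # feedforward prediction of the desired (hidden) state
        mu_p = A @ mu + (B * u[t]).flatten()
        V_p = A @ V @ A.T + Q[t]
        # innovation update with uncertainty-based gain K
        S = (C @ V_p @ C.T)[0, 0] + R        # innovation variance (scalar)
        K = (V_p @ C.T / S).flatten()        # Kalman gain, shape (3,)
        innovation = o_t - (C @ mu_p)[0]
        mu = mu_p + K * innovation
        V = V_p - np.outer(K, C @ V_p)
        # log-likelihood of the measurement under the predictive distribution
        ll = -0.5 * (np.log(2.0 * np.pi * S) + innovation ** 2 / S)
        desired.append(mu)
        logliks.append(ll)
        if ll < loglik_threshold:            # e.g. 2x the minimum value seen in training
            print(f"possible execution failure at time step {t}")
    return np.array(desired), np.array(logliks)

The threshold can, for example, be derived from the minimum log-likelihood observed on the training demonstrations of the primitive, as described below.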
Figure 3 shows a 1D primitive being executed, first perturbation free (left), and then with the movement artificially held, such that the primitive is no longer executed even though the probabilistic model expects it to continue (middle). On the right we see how the likelihood values evolve during movement execution. Note how, once we artificially hold the movement, the likelihood value drops significantly. Thus, we can use this likelihood measure to detect execution failures.

For initial quantitative evaluation purposes we recorded a dataset of 2D trajectories of letters with a digitizing tablet. All letters that are easily written with one stroke were recorded, a total of 22. Each of these letters is meant to represent a movement primitive. To learn a probabilistic representation per primitive, we collected 10 training demonstrations and an additional 10 demonstrations for testing purposes.

Once a probabilistic representation of a motion primitive has been learned, we measure the minimum log-likelihood of each training demonstration given the learned model parameters, and store that value with the parameters. Throughout the execution of a motion primitive the log-likelihood value may increase or decrease depending on how much variation from the mean we observe. The challenge is thus to distinguish natural variation from a perturbation and/or failure. Here we simply classify an execution as failed if, at any point during execution, the log-likelihood value drops below 2 times the minimum value reported for that primitive. On our data this works very well: of all 22 x 10 = 220 test cases, only 2 were classified as failed when not perturbed. When we perturb the simulated execution of each test case by artificially blocking the execution (as illustrated in Figure 3), all 220 test cases are classified as failed.

Conclusions

We have presented a probabilistic model for dynamic movement primitives. Inference in this graphical model is equivalent to Kalman filtering, and when performing inference a feedback term is automatically added to the DMP trajectory generation process. Besides this insight, we have shown the potential of probabilistic DMPs in the application of failure detection. Future work will explore and evaluate the potential impact of this probabilistic representation in more detail.

References

[Bishop 2006] Bishop, C. 2006. Pattern Recognition and Machine Learning. Springer.

[Calinon et al. 2012] Calinon, S.; Li, Z.; Alizadeh, T.; Tsagarakis, N. G.; and Caldwell, D. G. 2012. Statistical dynamical systems for skills acquisition in humanoids. In International Conference on Humanoid Robots.

[Dragan et al. 2015] Dragan, A. D.; Muelling, K.; Bagnell, J. A.; and Srinivasa, S. S. 2015. Movement primitives via optimization. In International Conference on Robotics and Automation.

[Gams et al. 2014] Gams, A.; Nemec, B.; Ijspeert, A. J.; and Ude, A. 2014. Coupling movement primitives: Interaction with the environment and bimanual tasks. IEEE Transactions on Robotics.

[Ijspeert et al. 2013] Ijspeert, A. J.; Nakanishi, J.; Hoffmann, H.; Pastor, P.; and Schaal, S. 2013. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation.

[Khansari-Zadeh and Billard 2010] Khansari-Zadeh, S. M., and Billard, A. 2010. Imitation learning of globally stable non-linear point-to-point robot motions using nonlinear programming.
In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, 2676–2683. IEEE.

[Khansari-Zadeh and Billard 2011] Khansari-Zadeh, S. M., and Billard, A. 2011. Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Transactions on Robotics 27(5):943–957.

[Meier et al. 2011] Meier, F.; Theodorou, E.; Stulp, F.; and Schaal, S. 2011. Movement segmentation using a primitive library. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011).

[Meier, Theodorou, and Schaal 2012] Meier, F.; Theodorou, E.; and Schaal, S. 2012. Movement segmentation and recognition for imitation learning. In International Conference on Artificial Intelligence and Statistics, 761–769.

[Paraschos et al. 2013] Paraschos, A.; Daniel, C.; Peters, J. R.; and Neumann, G. 2013. Probabilistic movement primitives. In Advances in Neural Information Processing Systems, 2616–2624.

[Pastor et al. 2009] Pastor, P.; Hoffmann, H.; Asfour, T.; and Schaal, S. 2009. Learning and generalization of motor skills by learning from demonstration. In Robotics and Automation, 2009. ICRA'09. IEEE International Conference on, 763–768. IEEE.

[Pastor et al. 2011] Pastor, P.; Righetti, L.; Kalakrishnan, M.; and Schaal, S. 2011. Online movement adaptation based on previous sensor experiences. In International Conference on Intelligent Robots and Systems.

[Pastor et al. 2012] Pastor, P.; Kalakrishnan, M.; Meier, F.; Stulp, F.; Buchli, J.; Theodorou, E.; and Schaal, S. 2012. From dynamic movement primitives to associative skill memories. Robotics and Autonomous Systems.

[Rückert et al. 2013] Rückert, E. A.; Neumann, G.; Toussaint, M.; and Maass, W. 2013. Learned graphical models for probabilistic planning provide a new class of movement primitives. Frontiers in Computational Neuroscience 6:97.

[Schaal 1999] Schaal, S. 1999. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3(6):233–242.

[Tenorth, Bandouch, and Beetz 2009] Tenorth, M.; Bandouch, J.; and Beetz, M. 2009. The TUM kitchen data set of everyday manipulation activities for motion tracking and action recognition. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, 1089–1096. IEEE.

[Toussaint 2009] Toussaint, M. 2009. Probabilistic inference as a model of planned behavior. Künstliche Intelligenz 3(9):23–29.

[Wilson and Bobick 1999] Wilson, A. D., and Bobick, A. F. 1999. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.