A Receding Horizon Push Recovery Strategy for Balancing the iCub Humanoid Robot Stefano Dafarra, Francesco Romano and Francesco Nori iCub Facility Department, Istituto Italiano di Tecnologia, 16163 Genova, Italy, name.surname@iit.it Abstract. Balancing and reacting to strong and unexpected pushes is a critical requirement for humanoid robots. We recently designed a capture point based approach which interfaces with a momentum-based torque controller and we implemented and validated it on the iCub humanoid robot. In this work we implement a Receding Horizon control, also known as Model Predictive Control, to add the possibility to predict the future evolution of the robot, especially the constraints switching given by the hybrid nature of the system. We prove that the proposed MPC extension makes the step-recovery controller more robust and reliable when exe- cuting the recovery strategy. Experiments in simulation show the results of the proposed approach. Keywords: receding horizon, push recovery, iCub humanoid robot This work was supported by the FP7 EU project CoDyCo (No. 600716 ICT 2011.2.1 Cognitive Systems and Robotics) and Horizon 2020 EU project An.Dy. (No. 731540 Research and Innovation Programme) 1 Introduction A humanoid robot is generally built with the final intent of having a machine which can help humans performing boring or dangerous tasks. The adoption of a two legged design ideally allows the robot to share the same environment of humans. On the other hand, embedding relatively simple motor skills on such complicated machines is not a straightforward task. A recent trend consists in facing this problem through the adoption of optimal control techniques, like Model Predictive Controllers (MPC). The peculiarity of MPC resides in the possibility of inserting a feedback from the plant into the optimization procedure, through the “Receding Horizon Principle” [1]. Changes in the constraints set, e.g. when walking, result in a different evolu- tion of the constrained dynamical system, making the overall system hybrid [2]. Model Predictive Controllers are appealing for controlling hybrid systems [3]. In fact, while predicting future states, MPC allow to cope with both time- and state- dependent switching, eventually undertaking actions to prepare for impending dynamics variations. Nevertheless, MPC is affected by the same problematics related to hybrid systems integration, which are indeed open research problems. arXiv:1705.10638v1 [cs.RO] 30 May 2017 2 Dafarra et al. In literature the complexity of the robot dynamics is usually boiled down to simplified models like the Linear Inverted Pendulum [4], in order to devise simple strategies. In this context, in [5] a model predictive controller has been employed to stabilize walking patterns. It is also worth mentioning the Capture Point framework [6] which allows to elaborate conditions about the possibility of the robot to maintain the upright position. In [7] it has also been applied in conjunction with MPC. In our approach we decided to take a different direction, taking into account the dynamic evolution of the center of mass (CoM) rather then adopting simple models. In addition, with respect to [8, 9, 10] we define a reactive planner which directly interfaces with the momentum-based whole body torque controller al- ready implemented on iCub [11, 12]. The hybrid nature of the system is taken into account by means of time-varying constraints. 2 Background Throughout this paper we adopt the following notation. – I represents an inertial reference frame with the origin placed on the ground, with the z−axis pointing against the gravity and the x−axis oriented frontally with respect to the robot. – xCoM ∈R3 is the position of the center of mass with respect to I. – ∥x∥2 W = x⊤Wx is the weighted square norm of x. – 1n represents a n × n identity matrix. 0n×m ∈Rn×m is a zero matrix while 0n = 0n×1 is a zero column vector of size n. – x∧∈R3×3 denotes the skew-symmetric matrix such that x∧y = x×y, where × denotes the cross product operator in R3. 2.1 Momentum-based whole-body torque control The momentum-based balancing controller has a hierarchical structure accom- plishing two different control tasks. The most important task is in charge of controlling the robot linear and angular momentum. We denote with H = [H⊤ lin, H⊤ ang]⊤∈R6 the robot linear and angular mo- mentum (computed around the CoM). Its rate of change is: ˙H = nc X i=1 CoMXifi + m¯g (1) where fi ∈R6 = [f⊤ i , τi⊤]⊤is the i-th of the nc contact wrenches. They are measured in a frame fixed with the contact surface. Matrices CoMXi ∈R6×6 transform these wrenches in a frame oriented as the inertial frame I, with the origin located on the center of mass position. Finally the robot total mass is indicated with m while ¯g ∈R6 is the 6D gravity acceleration vector. By con- trolling the robot momentum through joint torques, the first task enables us to consider f = [f ⊤ 1 , · · · , f ⊤ nc]⊤as virtual control inputs. A Receding Horizon Push Recovery Strategy 3 The second objective is responsible of tracking joint references, without in- terfering with the momentum control. In addition, it guarantees the stability of the zero dynamics resulting from the convergence of the first task. For additional details see the results in [11]. 2.2 Optimal control problem A continuous time optimal control problem can be stated as follows: minimize x(t),u(t) Γ = Z tf 0 L(x(t), u(t))dt + E(x(tf)) (2a) subject to ˙x(t) = f(x(t), u(t)), x(0) = x0 t ∈  0, tf  (2b) g(x(t), u(t)) ≤0, t ∈  0, tf  (2c) where tf is the “control horizon”, u ∈Rm is the control variable, x ∈Rn is the state variable (whose initial value is x0) and f : Rn×Rm →Rn the corresponding state evolution. L(x(t), u(t)) is the integral cost, while E(x(tf)) weights x at the end of the horizon. Finally g : Rn × Rm →Rng is the constraints function, embracing both equalities and inequalities. 3 Problem Formulation 3.1 The model In this work we focus on the dynamics of the center of mass. If we consider our case of interest, where no contacts are available except the two applied at the feet, we can rewrite Eq. 1 as: " ¨xCoM ˙Hang # = m−113 03×3 03×3 13  h CoMXl CoMXr i fl fr  +¯g (3) where the indexes l and r are relative to the left and right foot respectively, and we used the fact that Hlin = m ˙xCoM. In order to model the step in a model predictive framework, we can assume to know the instant in which the swing foot will touch the ground. In fact, the considered model does not contain any information about the posture of the robot and therefore it is not possible to define a “transition function”, e.g. the distance from the foot to the ground is generally used to predict when the step is going to take place. In other words, the controller is aware that the impact will take place in a precise instant in the future, but it does not know whether the quantities involved in the model will affect this instant or not. The most viable choice is then to consider this instant as constant and known in advance, equal to timpact, i.e. to the time needed to perform a step. For the same reason, the position that the swing foot will have after the step is also assumed to be known and constant. This takes particular importance since CoMXs (where s refers to the swing foot) directly depends on this position. Summarizing, the peculiar characteristics of the step, i.e. the duration and the target position, are assumed to be known and constant. 4 Dafarra et al. 3.2 The angular momentum We focus now on the matrix CoMXi (here i is a placeholder for subscript l or r), whose structure depends on the choice of the frame to describe the robot momentum. The matrix has the following form: CoMXi =  13 03×3 (xi −xCoM)∧ 13  . (4) The derivative of the angular momentum ˙Hang may thus be written as ˙Hang = Pl,r i (xi −xCoM)∧fi + τi. The product between xCoM and fi causes the whole formulation to be non-convex which makes the resolution dramatically more complex. In literature this problem is faced by minimizing an upper-bound of the angular momentum [10], or by approximating it through quadratic constraints [13]. In our application, angular momentum is mainly needed to bound the usage of contact wrenches, rather than be precisely controlled to zero. Thus, we can accept a more coarse approximation, relying on its Taylor expansion (around the last available values of fi and xCoM) truncated to the first order, namely: ˙Hang ≈ {l,r} X i ˙Hang + ∂˙Hang ∂fi  fi −f0 i  + ∂˙Hang ∂xCoM  xCoM −x0 CoM  ˙Hang ≈ {l,r} X i τi +  xi −x0 CoM ∧ f0 i +  xi −x0 CoM ∧ fi −f0 i  + +  f0 i ∧ xCoM −x0 CoM  . ⇒˙Hang ≈ {l,r} X i τi +  xi −x0 CoM ∧ fi +  f0 i ∧ xCoM −x0 CoM  (5) where we exploited the anticommutative property of the cross product, i.e. A∧B = −B∧A. The superscript 0 refers to the point at which the Taylor expansion is computed, in our case from the last feedback obtained from the robot. The approximation introduced with the truncation to the first order is o fi −f0 i  xCoM −x0 CoM  . If we consider a controller horizon tf not too wide, the term related to the position of the CoM is expected to be small enough, mak- ing this linear approximation effective. Finally, by defining the state variable γ and the control input variable f as γ := [ x⊤ CoM ˙x⊤ CoM H⊤ ang ]⊤, f := [ f ⊤ l f ⊤ r ]⊤we can rewrite Eq. (3) as: ˙γ = ˜Evγγ + ˜Fγf + ˜Gγ + ˜S0 γ, (6) ˜Evγ =   03×3 13 03×3 03×3 03×3 03×3 f0 l + f0 r ∧03×3 03×3  , ˜Fγ=   03×3 03×3 03×3 03×3 m−113 03×3 m−113 03×3 xl −x0 CoM ∧ 13 xr −x0 CoM ∧ 13  , ˜Gγ =  03 −¯g  , ˜S0 γ = " 06 − f0 l + f0 r ∧x0 CoM # . A Receding Horizon Push Recovery Strategy 5 3.3 Constraints definition The constraints to be enforced are mainly related to the feasibility of the exerted wrenches, e.g. center of pressure and friction cone among the most common constraints related to unilateral contacts. Furthermore the wrench acting on the swinging foot should be null for every instant before the impact, i.e. timpact. We start by examining the constraints acting on the stance foot (here as- sumed to be the left): Aclfl ≤bcl ∀t : t ≤tf. (7) Eq. (7) encompasses all the considered inequalities constraints, namely: i) friction cone with linear approximation, ii) center of pressure, iii) positivity of the normal component of the contact force. The constraints on the swing foot instead, catches the hybrid nature of the system. In particular they represent the fact that the wrench must be null before the impact and feasible after it. Formally: ( fr = 06 ∀t : t < timpact Acrfr ≤bcr ∀t : timpact ≤t ≤tf (8) where Acr and bcr describe the same feasibility constraints as Acl and bcl. The above equation assumes that timpact ≤tf. If the impact does not occur inside the control horizon, then the wrench exerted by the right foot is forced to be null throughout the whole horizon. We also included an additional constraint to enforce that balancing is kept after the step. In particular, after this instant, we can constraint the instanta- neous capture point to be inside the convex hull of the two feet, which can be predicted by knowing the future position of the right foot. Thus, we can define a set of linear inequalities such that if Achxicp ≤bch ∀t : t ≥timpact is satisfied, than xicp, i.e. the instantaneous capture point location, is in the convex hull. By imposing this constraint, we are forcing the instantaneous capture point to be in a capturable state after the step is performed. In fact, the convex hull represents the region in which it is possible to stabilize the instantaneous capture point dynamics without taking additional steps. 3.4 Cost function definition Following the optimal control formulation of Eq. (2) we are left to define the cost function applied within the MPC controller, called Γ. Note that different terms of the cost function act only after the step is taken. Formally, it has the following expression: Γ = 1 2 Z tf 0 γ(τ) −γd(τ) 2 Kγ dτ + Z tf ¯timp γ(τ) −γd(τ) 2 Kimp γ dτ+ (9) + Z tf 0 f(τ) 2 Kf dτ + Z tf ¯timp xicp(τ) −xd icp(τ) 2 Kicp dτ + γ(tf) −γd(tf) 2 Kimp γ ! 6 Dafarra et al. 0 2 4 6 -0.2 0 0.2 0.4 0.6 (a) 0 2 4 6 -0.3 -0.2 -0.1 0 0.1 (b) Fig. 1: (a) represents the center of mass evolution while performing the step. The first dotted line denotes the time of the push which occurred at t = 2.4s. The second one at t = 3.0s indicates when the right foot hits the ground. (b) presents the evolution of the y-component of the instantaneous capture point. The horizontal green dotted lines represent the approximation of the convex hull. where the superscript d indicates the reference values. K(·) (·) is a real positive semi-definite matrix of gains with suitable dimensions, accounting also for unit of measurement mismatches inside Γ. ¯timp is the minimum between timpact and tf while, for the sake of simplicity, the initial time instant is set to zero. Thus, it is possible to vary the cost applied to the state γ after the impact. This is done through the use of the matrix Kimp γ which, after the impact, adds up to Kγ. Finally, a terminal cost term is inserted using the same matrix Kimp γ , to model the finiteness of the control horizon. Consider now the final objective of having the center of mass over the centroid of the support polygon once the step is made. We decide to weight only the z−component of the CoM error throughout the whole horizon, while weighting the traverse components (i.e. x and y) only in the terminal cost (last term of Eq.(9)) and after the step is made (second part of Eq.(9)). The same rationale could be applied to the reference for the instantaneous capture point. During the step, its dynamics is exponentially diverging and, as a consequence, this contribution takes a role only after timpact, where its distance from the centroid of the convex hull is weighted. Finally, the requested reaction forces are weighted too, mostly for avoiding impulsive responses which are dangerous for the mechanical structure of the robot, without any warranty that they will be actually attained through the underneath controller. Intuitively, the tracking error for the desired wrenches will be lower with a smooth and limited reference. A Receding Horizon Push Recovery Strategy 7 0 2 4 6 0 100 200 300 400 0 2 4 6 Time [s] 0 100 200 300 400 Force [N] FL;Z FR;Z Fig. 2: Representation of the z-component of the demanded forces. 4 Simulation Results The proposed MPC approach has been tested on an iCub model in the Gazebo simulator [14], employing YARP Plugins [15]. The iCub humanoid robot [16] possesses 53 actuated joints, but nearly half of them are concentrated on the eyes and the hands. For this reason only 23 degrees of freedom (DoF) are torque controlled and used for balancing purposes. The proposed controller has been implemented in the MATLAB R ⃝/Simulink R ⃝ environment, using the WBToolbox library[17]. The experimental scenario con- sists of the robot balancing on its left foot. We have chosen this simple scenario to test the performance of the presented strategy with a single contact activation. When a strong push occurs the step recovery strategy starts. The time step dt is set to 10ms, while the controller horizon N is chosen to be 15. It has been observed that for smaller control horizons, the robot is not able to sustain the disturbance. Figure 1 depicts the result of the push experiment performed adopting the MPC approach. The robot was able to fully recover after the push, undergoing just a slight oscillation after the impact of the swing foot, whose oc- currence is highlighted by the second dashed line. The capability of recovering by performing the step can be noticed also from Figure 1, where the behavior of the y-component of the Instantaneous Capture Point is depicted (the x component has undergone just a slight perturbation). In particular the Capture Point, after exiting the support polygon (corresponding to the left foot), is fully contained in the new convex hull once the step is performed. Finally, it is worth considering the desired wrenches. Figures 2 compare the desired vertical force obtained using the presented control strategy and the one proposed in [18] (on the right). While oscillating in the first stage of the double support, the steady state is reached much faster than the previous approach. 5 Conclusions and Future Work The presented approach adopts, as model, the dynamics of the center of mass with only a small approximation necessary to remove non-convexities. It also 8 Dafarra et al. considers the discontinuous contribution of the swing foot through the adoption of time varying constraints. With respect to [18], this approach does not need the definition of a trajectory for the CoM, which indeed is part of the optimization process. As a consequence, the robustness of this technique relies more in the intrinsic knowledge of the system rather than in the expertise of the designer. This approach has been meant not to be a simple planner, but instead to provide reaction to external disturbances in real-time. However, at the present time it takes almost 0.1 seconds to get a solution on a machine running Ubuntu 14.04 on a quad-core Intel R ⃝Core i5@2.30GHz with 16GB of RAM, by using as solver the MATLAB R ⃝interface of MOSEK R ⃝. As a future work, this strategy will be implemented on an optimized C++ code and applied on the real platform. Finally, as an additional improvement we plan to supply the controller with more informations about the kinematics of the leg, thereby modeling the step movement directly as part of the problem formulation and relaxing the hypoth- esis of knowing a priori the time of the impact. References [1] Mayne D, Michalska H (1990) Receding horizon control of nonlinear sys- tems. IEEE Transactions on Automatic Control 35(7):814–824 [2] Lygeros J, Tomlin C, Sastry S (1999) Hybrid systems: modeling, analysis and control. preprint [3] Lazar M, Heemels W, Weiland S, Bemporad A (2006) Stabilizing model pre- dictive control of hybrid systems. IEEE Transactions on Automatic Control [4] Kajita S, Kanehiro F, Kaneko K, Yokoi K, Hirukawa H (2001) The 3D linear inverted pendulum mode: a simple modeling for a biped walking pattern generation. In: Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, vol 1, pp 239–246 vol.1 [5] Wieber PB (2006) Trajectory free linear model predictive control for stable walking in the presence of strong perturbations. In: 2006 6th IEEE-RAS International Conference on Humanoid Robots, IEEE, pp 137–142 [6] Pratt J, CarffJ, Drakunov S, Goswami A (2006) Capture point: A step toward humanoid push recovery. In: Humanoid Robots, 2006 6th IEEE- RAS International Conference on, pp 200–207 [7] Krause M, Englsberger J, Wieber PB, Ott C (2012) Stabilization of the cap- ture point dynamics for bipedal walking based on model predictive control. IFAC Proceedings Volumes 45(22):165–171 [8] Dai H, Valenzuela A, Tedrake R (2014) Whole-body motion planning with centroidal dynamics and full kinematics. In: 2014 IEEE-RAS International Conference on Humanoid Robots, IEEE, pp 295–302 [9] Herzog A, Rotella N, Schaal S, Righetti L (2015) Trajectory generation for multi-contact momentum control. In: Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on, IEEE, pp 874–880 [10] Dai H, Tedrake R (2016) Planning robust walking motion on uneven terrain via convex optimization. In: Humanoid Robots (Humanoids), 2016 IEEE- RAS International Conference on A Receding Horizon Push Recovery Strategy 9 [11] Nava G, Romano F, Nori F, Pucci D (2016) Stability analysis and design of momentum-based controllers for humanoid robots. Intelligent Robots and Systems (IROS) 2016 IEEE International Conference on [12] Nori F, Traversaro S, Eljaik J, Romano F, Del Prete A, Pucci D (2015) iCub whole-body control through force regulation on rigid noncoplanar contacts. Frontiers in Robotics and AI 2(6) [13] Ponton B, Herzog A, Schaal S, Righetti L (2016) A convex model of momen- tum dynamics for multi-contact motion generation. In: Humanoid Robots (Humanoids), 2016 IEEE-RAS International Conference on [14] Koenig N, Howard A (2004) Design and use paradigms for gazebo, an open- source multi-robot simulator. Intelligent Robots and Systems, 2004 (IROS 2004) Proceedings 2004 IEEE/RSJ International Conference on [15] Mingo E, Traversaro S, Rocchi A, Ferrati M, Settimi A, Romano F, Na- tale L, Bicchi A, Nori F, Tsagarakis NG (2014) Yarp based plugins for gazebo simulator. In: 2014 Modelling and Simulation for Autonomous Sys- tems Workshop (MESAS), Roma, Italy, 5 -6 May 2014 [16] Metta G, Sandini G, Vernon D, Caldwell D, Tsagarakis N, Beira R, Santos- Victor J, Ijspeert A, Righetti L, Cappiello G, et al (2005) The RobotCub project-an open framework for research in embodied cognition. Humanoids Workshop, IEEE–RAS International Conference on Humanoid Robots [17] Romano F, Traversaro S, Pucci D, Prete AD, Eljaik J, Nori F (2017) A whole-body software abstraction layer for control design of free-floating me- chanical systems. In: 2017 IEEE 1st International Conference on Robotic Computing [18] Dafarra S, Romano F, Nori F (2016) Torque-controlled stepping-strategy push recovery: Design and implementation on the iCub Humanoid robot. In: Humanoid Robots (Humanoids), 2016 IEEE-RAS International Conference on