Learning Null Space Projections in Operational Space Formulation

Hsiu-Chin Lin and Matthew Howard*

July 27, 2016

Abstract

In recent years, a number of tools have become available that recover the underlying control policy from constrained movements. However, few have explicitly considered learning the constraints of the motion and ways to cope with unknown environments. In this paper, we consider learning the null space projection matrix of a kinematically constrained system in the absence of any prior knowledge of the underlying policy, the geometry, or the dimensionality of the constraints. Our evaluations have demonstrated the effectiveness of the proposed approach on problems of differing dimensionality and with different degrees of non-linearity.

1 Introduction

Many everyday human skills can be considered in terms of performing some task subject to a set of self-imposed or environmental constraints. For example, when pouring water from a bottle, self-imposed constraints apply to the position and the orientation of the hand so that the water falls within the glass. When wiping a table (Fig. 1), the surface of the table acts as an environmental constraint that restricts the hand movements when in contact with the surface.

A promising way to provide robots with skills is to take examples of human demonstrations and attempt to learn a control policy that captures the demonstrated behaviour [1, 2, 3]. One common approach is to take the operational space formulation [4]: for example, given a constraint in the end-effector space, produce a set of joint-space movements that satisfy the constraint. Behaviour may be subject to various constraints that are usually non-linear in actuator space [5, 6], for example, maintaining balance of the robot (higher priority) while accomplishing an end-effector task (lower priority) [7], or avoiding obstacles [8].

In recent years, a number of new tools have become available for recovering the underlying policy from constrained data [9, 10]; however, few have explicitly considered learning the constraints themselves and coping with unknown environments. Previous work on constraint estimation takes force measurements at the end-effector to calculate the plane normal to the constraint [11, 12]. However, such approaches are limited to problems of a robot manipulator acting on a smooth surface in three-dimensional space, and rely on force sensors, which are normally expensive.

* H. Lin (H.Lin@bham.ac.uk) is at the School of Computer Science, University of Birmingham, UK. M. Howard (matthew.j.howard@kcl.ac.uk) is at the Dept. of Informatics, King's College London, UK.

Figure 1: Examples of wiping on different tables [13]. The behaviour (wiping) is subject to various constraints imposed by the environment in which the behaviour is performed (the surfaces).

In this paper, we propose a method for directly learning the kinematic constraints present in movement observations, as represented by the null space projection matrix of a kinematically constrained system. The proposed approach requires no prior information about the dimensionality of the constraints, nor does it require information about the policy underlying the observed movement.
2 Problem Definition

Based on the principles of analytical dynamics [14], we consider that the underlying policy is subject to a set of $S$-dimensional ($S \le Q$) constraints
$$A(x)\,u(x) = b(x) \tag{1}$$
where $x \in \mathbb{R}^P$ represents the state, $u \in \mathbb{R}^Q$ represents the action, and $b \neq 0$ is the task-space policy describing the underlying task to be accomplished. The constraint matrix $A(x) \in \mathbb{R}^{S \times Q}$ describes the constraints and projects the task-space policy onto the relevant part of the control space. Inverting (1) results in the relation
$$u(x) = A^\dagger(x)\,b(x) + N(x)\,\pi(x) \tag{2}$$
where $A^\dagger$ is the Moore-Penrose pseudo-inverse of $A$,
$$N(x) := I - A^\dagger(x)A(x) \in \mathbb{R}^{Q \times Q} \tag{3}$$
is the projection matrix, and $I \in \mathbb{R}^{Q \times Q}$ is the identity matrix. The projection matrix $N$ projects the null space policy $\pi$ onto the null space of $A$, which, in general, has non-linear dependence on both time and state. Note that, since $N$ projects $\pi$ onto the null space of $A$, the two terms
$$u^{ts}(x) := A^\dagger(x)\,b(x) \quad \text{and} \quad u^{ns}(x) := N(x)\,\pi(x) \tag{4}$$
are orthogonal to each other. We term them the task space component and the null space component, respectively.

It would be useful to know the decomposition of $A$, $b$, $N$, and $\pi$; however, the true quantities of those variables are unavailable by assumption. In human demonstrations, it is normally not clear which dimensions are restricted as part of the task constraints. For example, when picking up a glass of water, a reasonable assumption is that the orientation of the hand is controlled such that the glass is kept upright. However, it is less clear whether the hand position is part of the task (the result of $b$) or whether a comfortable position is chosen as part of the redundancy resolution (the result of $\pi$).

Several studies have been devoted to learning $u^{ns}$ (or, equivalently, $u^{ts}$) and $\pi$ [10], but so far, few have explicitly considered estimating $A$ or $N$. However, the ability to estimate $A$ or $N$ is favourable for adaptation. When dealing with an unseen constraint (i.e., a different $A$), it may be time-consuming to re-learn a policy $\pi$, and doing so may require many additional human demonstrations. For example, adapting the wiping behaviour to a different table (see Fig. 1) requires prior knowledge of the surface itself. If the constraint (the surface) can be estimated, we can adapt the previously learnt wiping behaviour onto the new constraint.

In [15], it was first demonstrated that $N$ could be learnt purely from data for the case $Q = 2$. Subsequently, [16] showed that $N$ can be estimated for problems where $A \in \mathbb{R}^{S \times Q}$ and $1 < S < Q$, for the special case $Au = 0$. By extending that work, here we propose a method to estimate $N$ for the generic case described by (1) (with $b \neq 0$), the first time this has been shown for problems in the full operational space formulation (1)-(2).
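To make the quantities involved concrete before describing the method, the following minimal NumPy sketch (illustrative only, not from the paper) evaluates the decomposition (2)-(4) for a hypothetical one-dimensional constraint; the particular values of $A$, $b$ and $\pi$ are arbitrary stand-ins.

```python
# A minimal sketch of the decomposition (2)-(4); A, b and pi are
# illustrative stand-ins chosen only to show the quantities involved.
import numpy as np

Q = 3                                    # action dimension
A = np.array([[1.0, 0.0, 0.0]])          # hypothetical 1-D constraint (S = 1)
b = np.array([0.5])                      # hypothetical task space policy value
pi = np.array([0.2, -0.3, 0.1])          # hypothetical null space policy value

A_pinv = np.linalg.pinv(A)               # Moore-Penrose pseudo-inverse A†
N = np.eye(Q) - A_pinv @ A               # projection matrix (3)

u_ts = A_pinv @ b                        # task space component (4)
u_ns = N @ pi                            # null space component (4)
u = u_ts + u_ns                          # observed action (2)

print(float(u_ts @ u_ns))                # ~0: the two components are orthogonal
```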
3 Method

The proposed method works on data given as $N$ pairs of observed states $x_n$ and observed actions $u_n$. It is assumed that (i) the observations follow the formulation in (2), (ii) the task space policy $b$ varies across observations, (iii) the $u_n$ are generated using the same null space policy $\pi$, and (iv) neither $A$, $b$ nor $N$ is explicitly known for any given observation. We define the shorthand notation $A^\dagger_n := A^\dagger(x_n)$, $b_n := b(x_n)$, $N_n := N(x_n)$ and $\pi_n := \pi(x_n)$.

3.1 Learning Null Space Component

The first step is to extract an estimate of the null space component (4) from the raw observations. From [10], an estimate $\tilde{u}^{ns}(x)$ is sought which minimises
$$E[\tilde{u}^{ns}] = \sum_{n=1}^{N} \|\tilde{P}_n u_n - \tilde{u}^{ns}_n\|^2 \tag{5}$$
where $\tilde{u}^{ns}_n := \tilde{u}^{ns}(x_n)$ and $\tilde{P}_n := \tilde{u}^{ns}_n \tilde{u}^{ns\top}_n / \|\tilde{u}^{ns}_n\|^2$. This exploits the identity $Pu = P(u^{ts} + u^{ns}) = u^{ns}$ by seeking an estimate consistent with it; see [10] for details.

3.2 Learning Null Space Projections

Having an estimate of the null space term $\tilde{u}^{ns}$, and knowing that the data follow the relationship (2), allows several properties of (2) to be used to form an estimate of $A$.

Firstly, since $u^{ns}$ is the projection of $\pi$ onto the image space of $N$ (see (4)), and by the idempotence of $N$,
$$N u^{ns} = u^{ns} \tag{6}$$
must hold [16]. This means that an estimate $\tilde{N}$ may be furnished by optimising (6), i.e., minimising
$$E[\tilde{N}] = \sum_{n=1}^{N} \|\tilde{u}^{ns}_n - \tilde{N}_n \tilde{u}^{ns}_n\|^2 \tag{7}$$
where $\tilde{N}_n := \tilde{N}(x_n)$. Fig. 2a-2b shows a visualisation of this objective function. In Fig. 2a, an example data point is plotted where $A$ is a vector parallel to the $z$-axis, its null space is the $xy$-plane, the null space component $u^{ns}$ is parallel to the $y$-axis, and the task space component $u^{ts}$ is parallel to the $z$-axis. The objective function (7) aims at minimising the distance between $u^{ns}$ and its projection onto the null space of $A$ (green plane), illustrated as the red dashed line.

Figure 2: Visualisation of the objective functions. (a) Example data point with task and null space components plotted. (b) The objective function (7) minimises $\|u^{ns} - \tilde{N}u^{ns}\|$ (red dashed line). (c) Alternatively, (8) minimises $\|\tilde{N}u^{ts}\|$ (blue dashed line). (d) It is proposed to use (9), which minimises the sum of these distances.

Secondly, note that, by definition, $u^{ts}$ and $u^{ns}$ are orthogonal (i.e., $u^{ts\top}u^{ns} = 0$), and so the true projection matrix must also satisfy $Nu^{ts} = 0$. Using this insight, another alternative is to seek an estimate $\tilde{N}$ that minimises
$$E[\tilde{N}] = \sum_{n=1}^{N} \|\tilde{N}_n \tilde{u}^{ts}_n\|^2 \tag{8}$$
where $\tilde{u}^{ts}_n := \tilde{u}^{ts}(x_n)$. A visualisation of (8) for the example data point is shown in Fig. 2c, where now the blue dashed line indicates the distance minimised.

Since both (7) and (8) contain information about the projection matrix, the third alternative, proposed here, is to minimise the sum of the two, namely,
$$E[\tilde{N}] = \sum_{n=1}^{N} \|\tilde{u}^{ns}_n - \tilde{N}_n \tilde{u}^{ns}_n\|^2 + \|\tilde{N}_n \tilde{u}^{ts}_n\|^2 \tag{9}$$
as illustrated in Fig. 2d. While this incurs a slight increase in computational cost (due to the need to evaluate two terms instead of one), it has the important benefit of ensuring that the learnt $\tilde{A}$ has the correct rank, a property that is missed if (7) or (8) is used in isolation. In other words, (9) helps ensure that the learnt constraints have the correct dimensionality, as shall be illustrated in the following.

3.3 Ensuring Correct Rank

Consider again the example data point illustrated in Fig. 2a, with $u^{ts}$ (blue) parallel to the $z$-axis and $u^{ns}$ (red) parallel to the $y$-axis. Minimising (7) or (8) in isolation for this data point has subtly different results.

Figure 3: Issues with learning $\tilde{N}$ through (7) or (8). (a) Using (7), both $\tilde{A}_1$ ($xz$-plane) and $\tilde{A}_2$ ($x$-axis) are candidate solutions. Note that $\tilde{A}_2$ is incorrect since $\tilde{N}_2 u^{ts} = u^{ts}$, but this is not evident from the error measure. (b) Using (8) instead, $\tilde{A}_1$ (on the $z$-axis) and $\tilde{A}_2$ (the $yz$-plane) are candidate solutions. However, $\tilde{A}_2$ is wrong because $\tilde{N}_2 u^{ns} = 0$.
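Before examining these failure modes in detail, the following minimal sketch (illustrative, not the paper's implementation) evaluates the objectives (7)-(9) for a single candidate unit constraint vector, given estimates of the null and task space components stacked row-wise; the paper optimises these objectives over a parametric, state-dependent model of $A(x)$, and all names and data below are assumptions for illustration.

```python
# Evaluating the objectives (7)-(9) for a fixed candidate constraint vector alpha.
import numpy as np

def projector(alpha):
    """Null space projector N = I - A†A for the one-row constraint A = alpha^T."""
    A = alpha.reshape(1, -1)
    return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

def e_ns(alpha, U_ns):                        # objective (7)
    N = projector(alpha)
    return np.sum((U_ns - U_ns @ N.T) ** 2)

def e_ts(alpha, U_ts):                        # objective (8)
    return np.sum((U_ts @ projector(alpha).T) ** 2)

def e_sum(alpha, U_ns, U_ts):                 # objective (9)
    return e_ns(alpha, U_ns) + e_ts(alpha, U_ts)

# With data generated under a true constraint, (9) vanishes at the true alpha
# but not at an arbitrary candidate.
rng = np.random.default_rng(0)
alpha_true = np.array([0.0, 0.0, 1.0])        # hypothetical true constraint direction
N_true = projector(alpha_true)
U_ns = rng.standard_normal((150, 3)) @ N_true.T                    # null space parts
U_ts = rng.standard_normal((150, 1)) @ alpha_true.reshape(1, -1)   # task space parts
print(e_sum(alpha_true, U_ns, U_ts))                               # ~0
print(e_sum(np.array([1.0, 0.0, 0.0]), U_ns, U_ts))                # > 0
```

This combined check on both components is what the stopping criterion of Algorithm 1 (introduced below) relies on.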
3.3.1 Minimising (7)

Recall that (7) relies on finding $\tilde{N}$ such that $u^{ns}$ lies in the image space of $\tilde{N}$, i.e., $u^{ns} \in \mathrm{Image}(\tilde{N})$. The problem is that any superset of $N$ also minimises this error measure. Let $\tilde{N}'$ be a projection whose image space is a superset of that of $\tilde{N}$, written $\tilde{N} \subset \tilde{N}'$. Since $u^{ns} \in \mathrm{Image}(\tilde{N})$ and $\tilde{N} \subset \tilde{N}'$, it is also true that $u^{ns} \in \mathrm{Image}(\tilde{N}')$. In other words, the rank of $\tilde{N}$ can be overestimated by minimising (7).

This is visualised in Fig. 3a. There, two candidate solutions are (i) $\tilde{A}_1$ ($xz$-plane), with the $y$-axis as the null space, and (ii) $\tilde{A}_2$ ($x$-axis), with the $yz$-plane as the null space. Both minimise (7) exactly, so without prior knowledge of the task space or of the dimensionality $S$, it is hard to decide which is the better estimate. If the data contain rich enough variations in $u^{ns}$ such that $u^{ns}$ spans $\mathbb{R}^{Q-S}$, then the $\tilde{N}$ with the lowest rank can heuristically be chosen, but this may be hard to verify in real-world problems. Meanwhile, note that $\tilde{A}_2$ is not a correct solution since $\tilde{N}_2 u^{ts} = u^{ts}$.

3.3.2 Minimising (8)

The converse problem arises when optimising (8) instead. By definition, there exists a solution $\tilde{N}$ of rank $Q-S$ such that $\tilde{N}u^{ts} = 0$. However, note that there also exist subspaces of $\tilde{N}$ that satisfy this condition. Specifically, (8) seeks a projection matrix $\tilde{N}$ such that its image space is orthogonal to $u^{ts}$. The rank of the true solution $N$ is $Q-S$, which implies that the image space of $N$ can be described by $Q-S$ linearly independent row vectors. If we find an $\tilde{N}$ that is orthogonal to $u^{ts}$, then each row vector of $\tilde{N}$, and any linear combination of them, is also orthogonal to $u^{ts}$.

The issue is shown graphically in Fig. 3b. There, candidate solutions $\tilde{A}_1$ ($z$-axis) and $\tilde{A}_2$ ($yz$-plane) both minimise (8). However, $\tilde{A}_2$ is wrong because $\tilde{N}_2 u^{ns} = 0$. The risk here is underestimating the rank of $\tilde{N}$. If it is known that $u^{ts}$ spans the task space, then the $\tilde{N}$ with the highest rank (or the $\tilde{A}$ with the lowest rank) can be chosen, but this again relies on a heuristic choice.

3.3.3 Minimising (9)

If we have rich enough observations such that $u^{ts}$ spans $\mathbb{R}^{S}$ and $u^{ns}$ spans $\mathbb{R}^{Q-S}$, there is a projection matrix $N$ of rank $Q-S$ such that $Nu^{ns} = u^{ns}$ and $Nu^{ts} = 0$. From the prior analysis, if $\tilde{N}u^{ns} = u^{ns}$ is satisfied, a projection matrix $\tilde{N}$ such that $N \subseteq \tilde{N}$ has been found. Likewise, if $\tilde{N}u^{ts} = 0$, then it is also true that $\tilde{N} \subseteq N$. If both conditions are met, $\tilde{N} \subseteq N \subseteq \tilde{N}$, and the only possibility is $\tilde{N} = N$. Therefore, (9) can be applied to ensure that the estimated $\tilde{N}$ has the correct rank.

Assume that $A_n$ is formed from a set of $S$ row vectors $A_n = [\alpha_1(x_n)^\top, \alpha_2(x_n)^\top, \ldots, \alpha_S(x_n)^\top]^\top$, where $\alpha_s$ corresponds to the $s$-th constraint in the observation. $\tilde{A}$ can then be estimated by an iterative approach, whereby a series of $\alpha_s$ are fitted to form an estimate $\tilde{A}$ [16]. Specifically, the $s$-th constraint vector $\alpha_s$ is learnt by optimising (7), and $\alpha_s$ is added only if it does not reduce the fit under (9). The process is summarised in Algorithm 1.

Algorithm 1: Learning Null-space Projection
Input: x: observed states; u: observed actions
Output: Ã: the estimated constraint matrix
1: Decompose u into ũ^ts and ũ^ns by (5)
2: Set Ã ← ∅ and s ← 1
3: Learn α*_1 by minimising (7)
4: while adding α*_s to Ã does not increase (9) do
5:   Set Ã ← [α*_1, ..., α*_s]^⊤ and s ← s + 1
6:   Learn α*_s by (7) such that α*_s ⊥ α*_j ∀ j < s
7: end while
8: Return Ã
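A sketch of the outer loop of Algorithm 1 is given below. It is simplified to a constraint matrix that is constant over the state (the paper fits state-dependent rows via the parameterisation (11) and Levenberg-Marquardt); the row-fitting routine, the orthogonal-complement basis construction, and the data layout are assumptions made for illustration.

```python
# Illustrative sketch of Algorithm 1 for a state-independent constraint matrix.
import numpy as np
from scipy.optimize import minimize

def null_projector(A):
    return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

def e_ns(A, U_ns):                            # objective (7)
    N = null_projector(A)
    return np.sum((U_ns - U_ns @ N.T) ** 2)

def e_sum(A, U_ns, U_ts):                     # objective (9)
    N = null_projector(A)
    return np.sum((U_ns - U_ns @ N.T) ** 2) + np.sum((U_ts @ N.T) ** 2)

def fit_row(U_ns, rows):
    """Fit one unit constraint row minimising (7), orthogonal to earlier rows."""
    Q = U_ns.shape[1]
    # orthonormal basis of the complement of the rows found so far
    B = np.eye(Q) if not rows else np.linalg.svd(np.array(rows))[2][len(rows):].T
    def cost(v):
        a = B @ v / np.linalg.norm(B @ v)
        return e_ns(np.vstack(rows + [a]), U_ns)
    v = minimize(cost, np.random.randn(B.shape[1])).x
    return B @ v / np.linalg.norm(B @ v)

def learn_constraint(U_ns, U_ts, tol=1e-8):
    """Add constraint rows while doing so does not increase (9) (Algorithm 1)."""
    rows, best = [], np.inf
    while len(rows) < U_ns.shape[1]:
        cand = fit_row(U_ns, rows)
        err = e_sum(np.vstack(rows + [cand]), U_ns, U_ts)
        if err > best + tol:
            break
        rows, best = rows + [cand], err
    return np.vstack(rows)
```

With data generated as in the earlier sketch of (7)-(9), `learn_constraint` should typically return a single row aligned (up to sign) with the true constraint direction, and the attempt to add a second row increases (9), so the loop stops with the correct rank.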
Note that no prior knowledge of the true projection matrix $N$ is required.

3.4 Learning constraints in operational space formulation

In the context of the operational space formulation [4], a constraint can be imposed in the operational/task space, which has a one-to-one relationship with the state space. For example, given a constraint in the end-effector (task) space, produce the joint movement (state space) that satisfies the constraint. We propose two methods for learning the constraints in this formulation.

3.4.1 Learning the selection matrix Λ

One way to represent $A_n$ is to assume that $A_n = \Lambda J_n$, where $\Lambda$ is a selection matrix specifying which dimensions are constrained (i.e., $\Lambda_{s,s} = 1$ if the $s$-th dimension of the end-effector is constrained), and $J_n$ is the Jacobian matrix that relates the joint velocity to the end-effector velocity. If $J$ is known, we can form $\Lambda$ as a set of $S$ orthonormal vectors $\Lambda = [\Lambda_1^\top, \Lambda_2^\top, \ldots, \Lambda_S^\top]^\top$, where $\Lambda_s \in \mathbb{R}^{1 \times S}$ corresponds to the $s$-th dimension in the task space and $\Lambda_i \perp \Lambda_j$ for $i \neq j$.

From [16], the objective function in (7) can be written as
$$E[\tilde{N}] = \sum_{n=1}^{N} \tilde{u}^{ns\top}_n \tilde{A}^\dagger_n \tilde{A}_n \tilde{u}^{ns}_n .$$
Substituting $A_n = \Lambda J_n$, (7) can be written as
$$E[\tilde{N}] = \sum_{n=1}^{N} \tilde{u}^{ns\top}_n (\Lambda J_n)^\dagger (\Lambda J_n)\, \tilde{u}^{ns}_n \tag{10}$$
Then, the optimal $\Lambda$ can be formed by iteratively searching for the choice of $\Lambda_s$ that minimises (10).

Following [16], a unit vector $\hat{\alpha} = (\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_Q)$ of arbitrary dimension $Q$ can be represented by $Q-1$ parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_{Q-1})^\top$, where
$$\hat{a}_1 = \cos\theta_1,\quad \hat{a}_2 = \sin\theta_1\cos\theta_2,\quad \hat{a}_3 = \sin\theta_1\sin\theta_2\cos\theta_3,\quad \ldots,\quad \hat{a}_{Q-1} = \Big(\prod_{q=1}^{Q-2}\sin\theta_q\Big)\cos\theta_{Q-1},\quad \hat{a}_Q = \prod_{q=1}^{Q-1}\sin\theta_q \tag{11}$$
Using the formulation above, each $\Lambda_s$ can be represented by parameters $\theta_s \in \mathbb{R}^{Q-1}$. Note that estimating $\theta^*_s$ is a non-linear least squares problem, which cannot be solved in closed form. We use the Levenberg-Marquardt algorithm, a numerical optimisation technique, to find the optimal $\theta^*_s$.

3.4.2 Learning state dependent constraint vector α(x_n)

In case the Jacobian is not available, the second approach considers directly learning the constraint vectors $\alpha$. Assume that $A_n$ is formed from a set of $S$ unit vectors $A_n = [\hat{\alpha}_1(x_n), \hat{\alpha}_2(x_n), \ldots, \hat{\alpha}_S(x_n)]^\top$, where $\hat{\alpha}_s$ corresponds to the $s$-th constraint in the observation and $\hat{\alpha}_i \perp \hat{\alpha}_j$ for all $i \neq j$.

Similar to the procedure in §3.4.1, we can represent each $\hat{\alpha}_s$ as a set of $Q-1$ parameters $\theta_s$ as described in (11). For parameter estimation, each $\hat{\alpha}_s$ is modelled through $\theta_s(x_n) = W_s\,\beta(x_n)$, where $W_s \in \mathbb{R}^{(Q-1) \times M}$ is a matrix of weights, and $\beta(x_n) \in \mathbb{R}^M$ is a vector of $M$ fixed basis functions. We choose the normalised radial basis functions $\beta_i(x_n) = K(x_n - c_i) / \sum_{m=1}^{M} K(x_n - c_m)$, where $K(\cdot)$ denotes a Gaussian kernel and the $c_i$ are $M$ predetermined centres chosen according to k-means.
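The following minimal sketch (illustrative, not the paper's implementation) shows the unit-vector parameterisation (11) and the normalised Gaussian radial basis features used to model $\theta_s(x)$; the kernel width, the centre placement, and all names below are assumptions made for illustration.

```python
import numpy as np

def theta_to_unit(theta):
    """Map Q-1 angles theta to a unit vector in R^Q as in (11)."""
    theta = np.asarray(theta, dtype=float)
    Q = theta.size + 1
    a = np.ones(Q)
    for q in range(Q - 1):
        a[q] *= np.cos(theta[q])       # a_q picks up cos(theta_q)
        a[q + 1:] *= np.sin(theta[q])  # all later entries pick up sin(theta_q)
    return a

def nrbf_features(x, centres, width=1.0):
    """Normalised radial basis feature vector beta(x) with Gaussian kernels."""
    d2 = np.sum((centres - x) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / width ** 2)
    return k / np.sum(k)

# e.g. a state-dependent constraint row for Q = 3: theta(x) = W beta(x), then (11)
centres = np.random.uniform(-1.0, 1.0, size=(16, 2))   # stand-in for k-means centres
W = np.random.randn(2, 16)                              # weights to be learnt
x = np.array([0.2, -0.4])
alpha_hat = theta_to_unit(W @ nrbf_features(x, centres))
print(alpha_hat, np.linalg.norm(alpha_hat))             # a unit constraint row
```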
4 Evaluation

In this section, some numerical results are presented to evaluate the learning performance.

4.1 Evaluation Criteria

The goal of this work is to predict the projection matrix $N$ underlying the constrained observations, so that these may be reproduced through a suitable learning scheme (e.g., [9, 10, 17]). To test the performance of learning, the following evaluation criteria are defined.

4.1.1 Normalised Projected Policy Error

This error measures the difference between the policy subject to the true constraints and the policy subject to the estimated constraints. Formally, the normalised projected policy error (NPPE) is defined as
$$E_{PPE} = \frac{1}{N\sigma^2_u}\sum_{n=1}^{N}\|N_n\pi_n - \tilde{N}_n\pi_n\|^2 \tag{12}$$
where $N$ is the number of data points, the $\pi_n$ are samples of the policy, and $N_n$ and $\tilde{N}_n$ are the true and the learnt projection matrices, respectively. The error is normalised by the variance of the observations, $\sigma^2_u$.

4.1.2 Normalised Projected Observation Error

To evaluate the fit of $\tilde{N}$ under the objective function (9), we suggest using the normalised projected observation error (NPOE),
$$E_{POE} = \frac{1}{N\sigma^2_u}\sum_{n=1}^{N}\|u^{ns}_n - \tilde{N}_n u^{ns}_n\|^2 + \|\tilde{N}_n u^{ts}_n\|^2 \tag{13}$$
which reaches zero only if the model exactly satisfies (9).

4.1.3 Normalised Null Space Component Error

Since the quality of the input data to the proposed method depends on the accuracy of $\tilde{u}^{ns}$, we also suggest looking at the normalised null space component error (NNCE),
$$E_{NCE} = \frac{1}{N\sigma^2_u}\sum_{n=1}^{N}\|u^{ns}_n - \tilde{u}^{ns}_n\|^2 \tag{14}$$
which measures the distance between the true and the learnt null space components $u^{ns}$ and $\tilde{u}^{ns}$, respectively.
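A short sketch of these error measures follows. The data layout (one row or one projector per sample) and the use of the pooled variance of the observed actions for $\sigma^2_u$ are assumptions made for illustration.

```python
import numpy as np

def nppe(N_true, N_est, Pi, U):
    """Normalised projected policy error (12); N_true/N_est stack Q x Q projectors."""
    e = sum(np.sum((Nt @ p - Ne @ p) ** 2) for Nt, Ne, p in zip(N_true, N_est, Pi))
    return e / (len(Pi) * np.var(U))

def npoe(N_est, U_ns, U_ts, U):
    """Normalised projected observation error (13)."""
    e = sum(np.sum((uns - Ne @ uns) ** 2) + np.sum((Ne @ uts) ** 2)
            for Ne, uns, uts in zip(N_est, U_ns, U_ts))
    return e / (len(U_ns) * np.var(U))

def nnce(U_ns_true, U_ns_est, U):
    """Normalised null space component error (14)."""
    return np.sum((U_ns_true - U_ns_est) ** 2) / (len(U_ns_true) * np.var(U))
```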
4.2 Toy Example

Our first experiment demonstrates the approach on a two-dimensional system with a one-dimensional constraint ($A \in \mathbb{R}^{1\times 2}$). We consider three different null space policies $\pi$: (i) a linear policy, $\pi = -L\bar{x}$, where $\bar{x} := (x^\top, 1)^\top$ and $L = ((2, 4, 0), (1, 3, -1))^\top$; (ii) a limit-cycle policy, $\dot{r} = r(\rho - r^2)$, with radius $\rho = 0.75\,$m and angular velocity $\dot{\phi} = 1\,$rad/s, where $(r, \phi)$ is the polar representation of the state, i.e., $x = (r\cos\phi, r\sin\phi)^\top$; and (iii) a sinusoidal policy, $\pi = (\cos z_1 \cos z_2,\ -\sin z_1 \sin z_2)^\top$, with $z_1 = \pi x_1$ and $z_2 = \pi(x_2 + \tfrac{1}{2})$.

The training data consist of 150 data points, drawn randomly across the space, $(x)_i \sim U(-1, 1)$, $i \in \{1, 2\}$. The data points are subjected to a 1-D constraint $A = \hat{\alpha} \in \mathbb{R}^{1\times 2}$ in the direction of the unit vector $\hat{\alpha}$, which is drawn uniform-randomly ($\theta \sim U(0, \pi]$ rad) at the start of each experiment. The underlying task is to move with fixed velocity to the target point $\rho^*$ along the direction given by $\hat{\alpha}$. This is achieved with a linear attractor policy in task space
$$b(x) = \beta^{ts}(\rho^* - \rho) \tag{15}$$
where $\rho$ denotes the position in task space and $\beta^{ts} = 0.1$ is a scaling factor. For each trajectory, the task space target was drawn randomly, $\rho^* \sim U[-2, 2]$.

Policy        NNCE           NPPE            NPOE
Linear        ~10^-7         ~10^-9          ~10^-9
Limit-cycle   0.08 ± 0.02    0.001 ± 0.002   0.001 ± 0.002
Sinusoidal    5.26 ± 4.46    0.011 ± 0.017   0.014 ± 0.021

Table 1: NNCE, NPPE, and NPOE (mean ± s.d.) × 10^-2 over 50 trials.

A sample data set for the limit-cycle policy is presented in Fig. 4 (left), where the colours denote the true null space component $u^{ns}$ (black) and the true observation $u$ (blue).

Figure 4: Samples of training data generated from a limit-cycle policy (left) and a grid of test points (right). Shown are the true null space component $u^{ns}$ (black), the predicted null space component $\tilde{u}^{ns}$ (red), and the true observation $u$ (blue).

The null space component is modelled as $\tilde{u}^{ns} = w\beta$, where $\beta$ is a vector of $M = 16$ radial basis functions and $w$ is a vector of weights learnt through minimisation of the objective function (5). Then, the projection matrix $\tilde{N}$ is learnt by minimising (7), according to the scheme outlined in §3. The experiment is repeated 50 times and evaluated on a set of 150 test data points, generated through the same procedure as described above.

Table 1 summarises the NPPE, NPOE, and NNCE ((12)-(14)) for each policy. The values are (mean ± s.d.) over 50 trials on the hold-out testing set. In terms of NPPE and NPOE, a good approximation is learnt, with both measures below $10^{-4}$, without access to the true decomposition into $u^{ts}$ and $u^{ns}$. Looking at the NNCE results, we note that the quality of $\tilde{N}$ is affected by the accuracy of $\tilde{u}^{ns}$. However, if the true $u^{ns}$ are given, NPPE and NPOE are both lower than $10^{-15}$, and this is not significantly affected by the policy.

The results for the limit-cycle policy are shown in Fig. 4 (right). The predicted $\tilde{u}^{ns}$ (red) are plotted on top of the true $u^{ns}$ (black). As can be seen, there is good agreement between the two, verifying that, using $\tilde{u}^{ns}$, there is little degradation in predicting the constrained motion.

To further characterise the performance of the proposed approach, we also looked at the effect of varying the size of the input data for the limit-cycle policy. We tested the method for $5 < N < 250$ data points and estimated $\tilde{N}$ from the learnt $\tilde{u}^{ts}$ and $\tilde{u}^{ns}$. The results (in log scale) over 50 trials are plotted in Fig. 5 (left). It can be seen that the NPPE, NPOE, and NNCE rapidly decrease as the number of input data points increases. This is to be expected, since a data set with richer variations can form a more accurate estimate of $\tilde{u}^{ts}$, $\tilde{u}^{ns}$, and $\tilde{N}$. Note that even with a relatively small data set ($N < 50$), the error is still very low ($\approx 10^{-3}$).

Figure 5: NNCE, NPPE, and NPOE for an increasing number of data points (left) and increasing noise levels in the observed $u$ (right). Curves are mean ± s.d. over 50 trials.

We also tested how the level of noise in the training data affects the performance of the method. For this, we contaminated the limit-cycle policy $\pi$ with Gaussian noise, the scale of which we varied to match up to 20% of the data. The resulting NNCE, NPPE, and NPOE follow the noise level, as plotted in Fig. 5 (right). It should be noted, however, that the error is still relatively low (NPPE $< 10^{-2}$), even when the noise is as high as 5% of the variance of the data.

4.3 Three Link Planar Arm

The goal of the second experiment is to assess the performance of the proposed approach for more realistic constraints. For this, constrained motion data from a kinematic simulation of a planar three-link arm is used.

The set-up is as follows. The state and action spaces of the arm are described by the joint angles $x := q \in \mathbb{R}^3$ and the joint velocities $u := \dot{q} \in \mathbb{R}^3$. The task space is described by the end-effector coordinates $r = (r_x, r_z, r_\theta)^\top$, where $r_x$ and $r_z$ specify the position and $r_\theta$ the orientation. Joint space motion is recorded as the arm performs tasks under different constraints in the end-effector space. As discussed in §3.4.1, a task constraint $A$ at state $x_n$ is described through the form
$$A_n = \Lambda J_n \tag{16}$$
where $J_n \in \mathbb{R}^{3\times 3}$ is the manipulator Jacobian and $\Lambda \in \mathbb{R}^{3\times 3}$ is the selection matrix specifying the coordinates to be constrained. The following three constraints are defined (a minimal sketch of forming (16) for this arm is given after the list):
1. $\Lambda_{(x,z)} = \mathrm{diag}(1, 1, 0)$,
2. $\Lambda_{(x,\theta)} = \mathrm{diag}(1, 0, 1)$, and
3. $\Lambda_{(z,\theta)} = \mathrm{diag}(0, 1, 1)$.
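The sketch below illustrates how a constraint of the form (16) and the corresponding projector can be assembled for such an arm. The link lengths and the kinematic convention (motion in the $x$-$z$ plane, absolute orientation $r_\theta = q_1 + q_2 + q_3$) are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

LINKS = np.array([1.0, 1.0, 1.0])       # assumed link lengths

def jacobian(q):
    """Manipulator Jacobian of r = (r_x, r_z, r_theta) with respect to q."""
    ang = np.cumsum(q)                   # absolute link angles
    c, s = np.cos(ang), np.sin(ang)
    Jx = [-np.sum(LINKS[i:] * s[i:]) for i in range(3)]
    Jz = [np.sum(LINKS[i:] * c[i:]) for i in range(3)]
    return np.array([Jx, Jz, [1.0, 1.0, 1.0]])

Lam_xz = np.diag([1.0, 1.0, 0.0])        # constrain r_x and r_z, leave r_theta free
q = np.radians([5.0, 95.0, 5.0])         # a joint configuration near the start states
A = Lam_xz @ jacobian(q)                 # constraint matrix (16)
N = np.eye(3) - np.linalg.pinv(A) @ A    # corresponding projector (3)
```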
Choosing $\Lambda_{(x,z)}$ allows the orientation to follow the null space policy $\pi$, provided that the end-effector position moves as required by the task space policy $b$. $\Lambda_{(x,\theta)}$ restricts the space defined by the $x$-coordinate $r_x$ and the orientation of the end-effector $r_\theta$, while $r_z$ is unconstrained.

The trajectories are recorded from a null space policy $\pi = -L(x - x^*)$, where $x^* = 0$ and $L = I$, under the active constraint. In each trajectory, the task space policy is a linear policy tracking a task space target $r^*$, which is drawn uniformly from $r^*_x \sim U[-1, 1]$, $r^*_z \sim U[0, 2]$, $r^*_\theta \sim U[0, \pi]$. The start states for each trajectory were randomly selected from the uniform distribution $q_1 \sim U[0°, 10°]$, $q_2 \sim U[90°, 100°]$, $q_3 \sim U[0°, 10°]$. Targets without a valid inverse kinematics solution are discarded. For each task constraint, 50 trajectories, each of length 50 steps, were generated. Following the same procedure, another 50 trajectories are generated as unseen testing data. Fig. 6 (left) shows an example trajectory with constraint $\Lambda_{(x,z)}$ and task space target $r^* = [1, 0]$ (black line).

In a real situation, the true null space component $u^{ns}$ is unlikely to be available in the raw data. Therefore, a parametric model of the form $\tilde{u}^{ns} = w\beta$ is learnt from this data by minimising (5). Here, $\beta$ is a vector of 100 radial basis functions and $w$ is a vector of parameters. The learnt model is then used to learn the constraint $A$ through the methods outlined in §3.4. In the following, results for 50 repeats of this experiment are reported.

Constraint    NNCE          Method     NPPE           NPOE
Λ_(x,z)       0.32 ± 1.07   Learn α̂   3.54 ± 2.29    0.06 ± 0.04
                            Learn Λ    0.30 ± 0.44    0.04 ± 0.04
Λ_(x,θ)       3.89 ± 1.10   Learn α̂   7.79 ± 12.0    2.14 ± 6.48
                            Learn Λ    0.32 ± 1.42    2.53 ± 1.11
Λ_(z,θ)       0.61 ± 0.60   Learn α̂   2.80 ± 2.59    0.43 ± 0.24
                            Learn Λ    0.24 ± 0.51    0.14 ± 0.25

Table 2: NNCE, NPPE, and NPOE (mean ± s.d.) × 10^-2 for each constraint over 50 trials.

Looking at the NPPE and NPOE columns in Table 2, a good approximation of the null space projector is learnt for each of the constraints, with errors of order $10^{-2}$ or lower in all cases. It can also be seen that the errors when learning $\Lambda$ are lower than those when learning $\hat{\alpha}$. This is to be expected, since the former relies on prior knowledge of $J(x_n)$, while the latter models the non-linear, state-dependent $\Lambda J(x_n)$ in the absence of this information.

To further test the accuracy of the approximation, the trajectories generated under the learnt constraints can be compared with those under the ground-truth constraints, using the same task and null space policies. In Fig. 6 (left), the red line shows this for the aforementioned example trajectory with learnt $\tilde{\Lambda}_{(x,z)}$. As can be seen, the latter matches the true trajectory extremely well.

Finally, to test the ability of the learnt approximation to generalise to new situations, the trajectories generated with previously unseen (i) null space policy $\pi'$ and (ii) task space policy $b'$ are also compared, to see if the learnt constraint can be used to predict behavioural outcomes for policies not present in the training data. Fig. 6 (centre) shows the trajectory generated when a new null space policy $\pi' = -0.05\|x\|^2$, not present in the training data, is subjected to (i) the true constraints (i.e., $A^\dagger b + N\pi'$, black) and (ii) the learnt constraint (i.e., $\tilde{A}^\dagger b + \tilde{N}\pi'$, red).
Fig. 6 (right) shows the trajectory generated when a new task space policy $b'$, with a new task space target $r' = [-1, 2]$, is applied under (i) the true constraints (i.e., $A^\dagger b' + N\pi$, black) and (ii) the learnt constraint (i.e., $\tilde{A}^\dagger b' + \tilde{N}\pi$, red). In both cases, a close match is seen between the predicted behaviour under the true and learnt constraints, indicating good generalisation performance.

Figure 6: Example trajectories generated from testing data (left), a new null space policy (centre), and a new task space policy (right). Shown are trajectories using the true $A$ (black) and the learnt $\tilde{A}$ (red).

5 Conclusion

In this paper, we have considered the null space projection matrix of a kinematically constrained system, and have developed a method by which that matrix can be approximated in the absence of any prior knowledge of the underlying movement policy, or of the geometry or dimensionality of the constraints. Our evaluations have demonstrated the effectiveness of the proposed approach on problems of differing dimensionality and with different degrees of non-linearity. They have also validated the use of our method for adapting a learnt policy to novel constraints.

For future research, we plan to validate the proposed method on robots with higher degrees of freedom and on human data where the true policy and constraint are both unknown, and to study variants of the approach that may improve its efficiency through iterative learning.

References

[1] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, Robot programming by demonstration. MIT Press, 2007.
[2] J. Craig, P. Hsu, and S. Sastry, “Adaptive control of mechanical manipulators,” Int. J. Robotics Research, vol. 6, no. 2, pp. 16–28, 1987.
[3] M. Kawato, “Feedback-error-learning neural network for supervised motor learning,” in Advanced Neural Computers, R. Eckmiller, Ed. Elsevier, 1990, pp. 365–372.
[4] O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE J. Robotics & Automation, vol. 3, no. 1, pp. 43–53, 1987.
[5] M. Gienger, H. Janssen, and C. Goerick, “Task-oriented whole body motion for humanoid robots,” in 5th IEEE-RAS Int. Conf. Humanoid Robots, 2005.
[6] O. Khatib, L. Sentis, J. Park, and J. Warren, “Whole-body dynamic behavior and control of human-like robots,” Int. J. Humanoid Robotics, 2004.
[7] L. Sentis and O. Khatib, “Synthesis of whole-body behaviors through hierarchical control of behavioral primitives,” Int. J. Humanoid Robotics, vol. 2, no. 4, pp. 505–518, 2005.
[8] M. Stilman and J. Kuffner, “Planning among movable obstacles with artificial constraints,” Int. J. Robotics Research, vol. 27, no. 11-12, pp. 1295–1307, 2008.
[9] M. Howard, S. Klanke, M. Gienger, C. Goerick, and S. Vijayakumar, “A novel method for learning policies from variable constraint data,” Autonomous Robots, 2009.
[10] C. Towell, M. Howard, and S. Vijayakumar, “Learning nullspace policies,” in IEEE Int. Conf. Intelligent Robots and Systems, 2010.
[11] M. Blauer and P. Belanger, “State and parameter estimation for robotic manipulators using force measurements,” IEEE Transactions on Automatic Control, vol. 32, no. 12, 1987.
[12] T. Yoshikawa, “Dynamic hybrid position/force control of robot manipulators,” IEEE J. Robotics & Automation, vol. 3, no. 5, 1987.
[13] M. Howard, S. Klanke, M. Gienger, C. Goerick, and S. Vijayakumar, “Learning potential-based policies from constrained motion,” in IEEE Int. Conf. Humanoid Robots, 2008.
[14] F. E. Udwadia and R. E. Kalaba, Analytical dynamics: a new approach. Cambridge University Press, 2007.
[15] H.-C. Lin, M. Howard, and S. Vijayakumar, “A novel approach for representing and generalising periodic gaits,” Robotica, vol. 32, pp. 1225–1244, 2014.
[16] ——, “Learning null space projections,” in IEEE Int. Conf. Robotics and Automation, 2015, pp. 2613–2619.
[17] M. Howard, S. Klanke, M. Gienger, C. Goerick, and S. Vijayakumar, “Robust constraint-consistent learning,” in IEEE Int. Conf. Intelligent Robots and Systems, 2009.