arXiv:1405.7178v3 [cs.RO] 20 Oct 2015 MOVIC2014 Vol.X, No.X, XXXX Artificial Wrestling: A Dynamical Formulation of Autonomous Agents Fighting in a Coupled Inverted Pendula Framework Katsutoshi YOSHIDA∗, Shigeki MATSUMOTO∗and Yoichi MATSUE∗∗ ∗Department of Mechanical and Intelligent Engineering, Utsunomiya University 7-1-2 Yoto, Utsunomiya, Tochigi, 321-8585, Japan E-mail: yoshidak@cc.utsunomiya-u.ac.jp ∗∗Yuhara Mfg Co., Ltd 1256 Ujiie, Sakura, Tochigi 329-1311, Japan Abstract We develop autonomous agents fighting with each other, inspired by human wrestling. For this purpose, we propose a coupled inverted pendula (CIP) framework in which: 1) tips of two inverted pendulums are linked by a connection rod, 2) each pendulum is primarily stabilized by a PD-controller, 3) and is additionally equipped with an intelligent controller. Based on this framework, we dynamically formulate an intelligent controller designed to store dynamical correspondence from initial states to final states of the CIP model, to receive state vectors of the model, and to output impulsive control forces to produce desired final states of the model. Developing a quantized and reduced order design of this controller, we have a practical control procedure based on an off-line learning method. We then conduct numerical simulations to investigate individual performance of the intelligent controller, showing that the performance can be improved by adding a delay element into the intelligent controller. The result shows that the performance depends not only on quantization resolutions of learning data but also on delay time of the delay element. Finally, we install the intelligent controllers into both pendulums in the proposed framework to demonstrate autonomous competitive behavior between inverted pendulums. Key words : Multiagent System, Competitive Problem, Intelligent Control, Nonlinear Dynamics, Reachable Set 1. Introduction Wrestling seems to be composed artificially of two mechanical agents maintaining their balance, coupled via me- chanical interactions such as contact, connection, collision, etc., and equipped with intelligent controllers competitive with each other. In this paper, we develop a simple model to create such competitive agents. For this purpose, we propose a coupled inverted pendula (CIP) framework in which: 1) tips of two inverted pendulums are linked by a connection rod, 2) each pendulum is primarily stabilized by a PD-controller, 3) and is additionally equipped with an intelligent controller that individually generates a series of impulsive internal forces to achieve its own desired final states based on knowledge of correspondence from initial states to the final states. In general, multiple agents can exhibit competitive and cooperative dynamics when sharing common resources and environments. Historically, early mathematical insights into such mutual interactions seem to have appeared in the filed of mathematical ecology (Hofbauer and Sigmund, 1998) in which population dynamics of different species sharing a common environment is described by a system of coupled nonlinear differential equations such as the Lotka-Volterra equation. Contrary to the ecosystem in which the medium of interaction is given by environments, in our CIP framework, the medium of interaction is given by a mechanical structure. In our previous study (Yoshida and Ohta, 2008), we have already demonstrated that the CIP model, even without intelligent controllers, can produce competitive dynamics comparable to that in the ecosystem such as coexistence and dominance by assigning competitive meanings to the stable equilibriums of the CIP model. Although quite similar mechanical models have been considered in the field of multiple manipulator systems (Hsu, 1993; Nakamura et al., 1987; Panwar et al., 2012), they have only focused on cooperative dynamics because of their aim at developing coordinated motions in those systems. ∗Received DD MM, YYYY (No. XX-XXXX) [DOI: 10.1299/XXX.X.1] Copyright c⃝2014 by JSME 1 MOVIC2014 Vol.X, No.X, XXXX In our study mentioned above (Yoshida and Ohta, 2008), each pendulum is PD-controlled to be bistable at the top and bottom dead points such that the coupled system produces quadra-stability. In absence of additional inputs, this system converges into one of the four stable positions (equilibriums) depending upon the initial conditions. We then applied a single impulsive force to one of the pendulums to generate switching behavior from a given stable position to a desired stable position and considered it as a prototype of fighting-like behavior. In this prototype, however, the behavior is exactly determined by the initial position and strength of the impulse because of uniqueness of solution of differential equation. Therefore, it is quite hard to say that this first prototype is comparable to the wrestling players who seek how to generate internal forces to achieve desired final positions in autonomous ways. In order to build such autonomous agents fighting with each other, a certain intelligent motion controller is required. On such controllers, extensive research has been conducted in the field of multi-robot systems (Maravall et al., 2013; Pagello et al., 1999). The major issue in this field appears to be how to obtain cooperative group dynamics of robots both in algorithm-based approaches (Stone and Veloso, 2000) and in differential-equation-based approaches (Hsu, 1993; Nakamura et al., 1987; Panwar et al., 2012). On the other hand, competitive group behavior seems to have been studied mainly based on algorithm-based ap- proaches. For example, Nelson et al. (2004) studied an evolutionary controller to investigate a form of reinforcement learning that makes use of competitive tournaments of games (robot capture the flag) played by individuals in a popula- tion of neural controllers. Moreover, Wu et al. (2013) developed rule or knowledge-based techniques to analyze strategy in robot soccer game. In this way, in contrast to our approach (Yoshida and Ohta, 2008), differential-equation-based tech- niques are not always essential in these studies because they seek step-by-step algorithms predicting free space determined by other robots. In this paper, we first introduce the CIP framework to describe the competitive behavior in differential-equation- based manners. Next, we develop a competitive intelligent controller that receives state vectors of the CIP model and outputs impulsive control forces to produce desired final states. After evaluating individual performance of the intelligent controller and investigating how to improve the performance, we will demonstrate autonomous competitive behavior between two inverted pendulums equipped with the proposed intelligent controllers. 2. Coupled inverted pendula framework 2.1. Coupled inverted pendula In order to create wrestler-like mechanical agents maintaining their balance while being coupled mechanically with each other, we consider a CIP model (Yoshida and Ohta, 2008) as shown in Fig. 1. Each inverted pendulum consists of a cart moving along the horizontal floor (Y = 0) and a simple pendulum rotating about a point on the cart. For simplicity, a common physical specification is given to both of the pendulums where mθ is a mass and r is a length of the pendulum, and mx is a mass of the cart. Linking the tips of the pendulums with a viscoelastic connection rod of length w, we obtain the CIP model in Fig. 1 where Ti is an input torque on θi, f i is a reaction force acting on the tip of ith pendulum, and kw and cw are a spring coefficient and a viscous friction coefficient of the rod respectively. We assume that a mass of the rod is negligible. As configuration of this linkage is uniquely determined by the four variables: horizontal displacements of the carts x1, x2 and slant angles of the pendulums θ1, θ2, the dynamics of this linkage is described by the eight-dimensional state vector: x := (xT 1 , xT 2 )T, xi := xi, ˙xi, θi, ˙θi T (i = 1, 2), (1) where AT denotes the transpose of a matrix A. According to Lagrangian mechanics and assuming viscous friction forces cx ˙xi and cθ ˙θi on xi and θi respectively, we obtain equations of motion (EOM) of the CIP model in Fig. 1 as follows:  (mx + mθ)¨xi + (mθr cos θi)¨θi −mθr˙θ2 i sin θi = −cx ˙xi + (1, 0)fi, (mθr cos θi)¨xi + (mθr2)¨θi −mθgr sin θi = −cθ ˙θi + r(cos θi, −sin θi)f i + Ti (i = 1, 2), (2) where ˙X := dX/dt. 2.2. Reaction force from the connection rod We calculate the reaction force, say p, from the connection rod as shown in Fig. 2. The displacement vector w from 2 MOVIC2014 Vol.X, No.X, XXXX Fig. 1 Coupled inverted pendula model via viscoelastic connection. Fig. 2 Reaction force p from the connection rod. the left-hand tip to the right-hand tip of pendulums is given by w =  wX wY := X2 −X1, Xi =  Xi Yi =  xi + r sin θi r cos θi  (i = 1, 2), (3) and the length of rod is expressed as w = ∥w∥= p {(x2 −x1) + r(sin θ2 −sin θ1)}2 + {r(cos θ2 −cos θ1)}2 (4) with the time derivative ˙w = ( ˙wXwX + ˙wYwY)/w. Then, we model viscoelasticity of the connection rod as p = ∥p∥:= −kw(w −w0) −cw ˙w, (5) where w0 is a natural length of the connection rod. As the force vector p is parallel to the displacement vector w, we have the reaction force: p = (p/w)w. (6) Substituting p into the EOM (2) through, f 1 = −p, f 2 = p or f i = (−1)ip , (7) we obtain an analytic expression of the CIP model shown in Fig. 1 via the viscoelastic connection. 2.3. Modeling floor In the previous study (Yoshida and Ohta, 2008), the pendulum can fall down freely to the bottom dead point, in other word, there was no floor in the previous CIP model. In that case, both of forward and backward falling motions converge to the same equilibrium of the model so that orbital information is required in order to detect the direction of falling. Not only to simplify the detection process but also to develop more realistic simulator of wrestling, we introduce the floor model to the CIP model in the following manner. Based on penalty methods (Moore and Wilhelms, 1988), we first model a normal force Ri from the floor (Y = 0) acting on the tip of ith pendulum as Ri = U(−Yi){−k fYi −c f ˙Yi}, (8) where Yi is a height of the ith tip from the floor in (3), U( · ) is a unit step function, and k f, c f are viscoelastic parameters representing property of the reaction. In practice, in order to avoid numerical errors, we approximate the step function with a sigmoid function differentiable, defined by Uσ(s) := 1 + exp(−σs) −1 , (9) where limσ→∞Uσ(s) = U(s) holds. Furthermore, a Coulomb friction force Fi acting on the ith tip from the floor can be expressed as Fi = −µRi sgn( ˙Xi), (10) where µ is a friction coefficient, ˙Xi is a relative horizontal velocity of the ith tip from the floor, and sgn( · ) is a unit signum function whose smooth approximation can be given by sgn(s) ≈sgnσ(s) := 2Uσ(s) −1. Therefore, the CIP model via the viscoelastic connection in (7) on the floor can be obtained by substituting f i = (−1)ip + (Fi, Ri)T (i = 1, 2) (11) into the EOM in (2). 3 MOVIC2014 Vol.X, No.X, XXXX Fig. 3 Competitive interpretation of the equilibriums. Fig. 4 Architecture of the proposed control system to generate competitive motions. 2.4. Standing control with falling The CIP framework in absence of intelligent controllers is completed by giving dynamical meanings of winning and losing to states of the CIP model. To this end, we begin with developing a feedback controller by which each inverted pendulum on the floor forms three stable equilibriums: θi = 0 for standing or winning and θi = ±π/2 for falling or losing. This can be done by introducing a feedback controller in the following form: Ti = upd i := trapα(θi; ∆θ){−Kpθi −Kd ˙θi} (i = 1, 2), (12) where trapα(θ; ∆θ) := Uα(θ + ∆θ) · Uα(−θ + ∆θ) (13) is a smooth trapezoidal function of unit height and centered at θ = 0 made of a product of the sigmoid function in (9), ∆θ > 0 is a half width of the trapezoidal shape, and the shape is getting steeper as α increases. It follows from the deadband characteristics in (12) that upd i simply acts as a PD controller within the limit |θi| < ∆θ while it rapidly cuts offthe output outside of the interval. Therefore, appropriate setting of the gains Kp, Kd make it possible for the ith pendulum to be stabilized about the standing position θi = 0 while to be falling down to the floor when |θi| exceeds the given limit ∆θ. It is worthy to note that in the field of gerontology and related fields, human standing (or falling) limits comparable to the threshold ∆θ have been measured by the functional reach test (Duncan et al., 1990) in which the difference between arm length and maximal forward reach of human subjects is measured to evaluate risk of falls of them. 2.5. CIP framework In what follows, we compare the inverted pendulums in this model to wrestler-like agents maintaining their standing balance. Since each agent with the standing control in (12) has the three stable equilibriums, a pair of the agents being coupled with each other under the suitable conditions can produce 3 × 3 = 9 stable equilibriums: ωi := lim t→∞x(t) = ¯x1, 0, ¯θ1, 0, ¯x2, 0, ¯θ2, 0T (i = 1, · · · , 9), (14) as shown in Fig. 3 when equating horizontal translations of final position ¯x1, ¯x2 without loss of generality. Namely, the components x1(t), x2(t) of the solution x(t) of (2) are not stable asymptotically but neutrally because no restoring forces on x1(t), x2(t) are assumed by definition. It also should be noted that due to the penalty method in (8), the gravity force makes the equilibrium ¯θi on the floor slightly exceed the floor, i.e., |¯θi| −π/2 > 0, but we formally denote θi = ¯θi = ±π/2 because this slight exceedance only affects almost converged states and does not change the correspondence from the initial to final states. We then attach competitive meanings to the nine equilibriums as listed in Fig. 3 in which the agent that remains standing is regarded as a winner. Eventually, we have the CIP framework composed of the set of (A) and (B): (A) The CIP model: the system of equations defined in (2), (11) and (12). (B) The win-loss matrix: the competitive interpretation of the nine equilibriums defined in Fig. 3. 4 MOVIC2014 Vol.X, No.X, XXXX 3. Intelligent controller Based on the CIP framework in Section 2.5, we develop an intelligent controller (IC) to produce desired final posi- tions in Fig. 3 from given initial states of CIP by generating certain impulsive forces. 3.1. Problem setting and requirements According to the definition of the state vector x = (xT 1 , xT 2 )T in (1), the CIP model in (2) can be expressed as an eight-dimensional dynamical system: ˙x = f(x, T), x(0) = x0, T := (T1, T2)T (15) that can be divided into a pair of four-dimensional subsystems:  ˙x1 = f 1(x1, x2, T1), ˙x2 = f 2(x2, x1, T2). (16) We introduce the IC by adding an intelligent control input uic i to the torque Ti of CIP model as T := upd + uic = upd 1 , upd 2 T + uic 1 , uic 2 T, (17) where upd i is the standing control input already given in (12). In the present study, the input-output relationship around uic i is designed as shown in Fig. 4, in which uic i receives all the state vectors x1, x2 and outputs a series of impulsive forces given by uic i (t) := N X j=1 Pi I∆τ(t −t j i ), (18) where I∆τ(t) =  (∆τ)−1 (0 ≤t < ∆τ), 0 (otherwise) (19) is a rectangular function of unit area of width ∆τ ≪1, Pi is an angular impulse of the input torque uic i (t), and {t1 i , · · · , tN i } is a series of rise time satisfying t1 i < t2 i < · · · < tN i , max j,k |t j i −tk i | ≥τG ≥∆τ, (20) where τG is a relaxation time to avoid overlapped outputs. In practical implementation, the rise times t1 i , · · · , tN i are supposed to be determined sequentially by a real time architecture described in Fig. 5, which is composed of three components, a classifier C, a selector S J, and an impulse generator G. 3.2. Classifier C We define the classifier C as a function from a state vector x = (xT 1 , xT 2 )T at the time t = t0, say x(t0) = ξ0, to an index number ν of equilibrium ων. The function C takes the value C(ξ0) = ν if a solution of the following system: ˙x = f(x, T), x(t0) = ξ0, T := upd +  P1 I∆τ(t −t0) 0  (21) converges to the equilibrium ων. From this definition in which a single impulse at t = t0 is applied only on the left-hand agent, it is implied that the classifier C is valid only in absence of additional inputs, in other words, it can fail to return correct equilibriums for more general cases of input as in (18) where both agents can produce impulsive forces for their own decisions. Despite that, we will proceed with a discussion to build a first prototype of artificial wrestling. Consider the transition operator of a solution of (21) as x(t) := φt(ξ0, T), (22) 5 MOVIC2014 Vol.X, No.X, XXXX Fig. 5 Intelligent controller. Fig. 6 Impulse generator. and define a set of the initial state ξ0 approaching the equilibrium ωi as Φi := n ξ0 ∈R8 lim t→∞φt(ξ0, T) = ωi o . (23) The set Φi is generally called a basin of attraction (Lhommeau et al., 2011) or a reachable set (Bayadi et al., 2013). It follows from uniqueness of solution of initial value problem in (21) that the reachable set in (23) satisfies, Φi ∩Φ j = ∅ (i , j). (24) Thus, the classifier C is obtained as a single-valued function: Cξ0  := ν if ξ0 ∈Φν. (25) In Section 3.5, we will discuss a numerical approximation of C because explicit expressions of C( · ) are hardly obtainable from nonlinear systems such as (15) and also (21). 3.3. Selector SJ In the competitive problem in Fig. 3, some of the equilibriums are selected depending upon strategies of the agent considered. Such a selection process can be modeled by a selector S J given by δ = S J(ν) :=  1 (ν ∈J ⊂{1, · · · , 9}), 0 (otherwise), (26) where ν = C(x) is an output of the classifier and J is a given subset of indices of the equilibriums ω1, · · · , ω9. For example, the trajectory x(t) in (21) starting from an initial state ξ0 satisfying (S J ◦C)(ξ0) := S J  C(ξ0)  = 1 for J = {2, 3} (27) converges one of the two equilibriums ω2 and ω3. 3.4. Impulse generator G The impulse generator G is designed to receive the binary signal δ(t) = S J(ν(t)) from the selector and output the unit impulse G(t) as shown in Fig. 6, which is composed of a two-input AND gate and two timer functions TI and TG. The timer TI produces a unit impulse as G(t) = TI(t) := I∆τ(t −tr), (28) and the timer TG cuts offthe binary signal δ(t) = S J(ν(t)) for a given relaxation time τG discussed in (20) as TG(t) :=  0 (tr < t < tr + τG), 1 (otherwise), (29) where tr is the rise time from 0 to 1 of the Boolean product ˆδ(t) = S J(ν(t))∧TG(t). Note that although the signal ˆδ(t) can be pulses of infinitesimal width in this continuous time expression, this problem will never arise in discrete time applications with digital computers. 6 MOVIC2014 Vol.X, No.X, XXXX 3.5. Numerical approximation of the classifier C Explicit expression of C(x) is hardly obtainable from nonlinear systems such as (15). It seems that there are two types of solutions: solving (15) in process or making a numerical table of the mapping C : x 7→ν in advance. We take the latter approach in the following manner. For practical applications, we introduce a linear measurement equation: y = Hx, y ∈RM, x ∈R8, (30) where M ≤8 and H is a M × 8 matrix of rank M, and define a reduced-order reachable set in the following sense: Ψi = H(Φi) := n η0 ∈RM ξ0 = h(η0) ∈R8, lim t→∞φt(ξ0, T) = ωi o , (31) where x′ = h(y) (, x in general) is a certain inverse of y = Hx. An actual example of h will be given in Section 3.6. Firstly, we take a M-dimensional cubic region D of measuring range within a direct sum of the reachable sets as 9 M i=1 Ψi ⊃D := [a1, b1] × · · · × [aM, · · · , bM] (32) where L represents a direct sum (disjoint union) and [a, b] denotes an interval. We divide it into a direct sum of uniform subcubes Di as D = M n Di i ∈I = [1, · · · , m1] × · · · × [1, · · · , mM] o , (33) where I is a space of M-dimensional indices and m j is the number of subcubes in the jth direction. We then define center points of the subcubes yi ∈Di (i ∈I) whose jth component is given by (yi) j := a j + (i) j −1 2 ! b j −a j m j , (34) where (u) j denotes the jth component of a vector u. The setup above allows us to build a numerical method as follows: ( 1 ) As an offline learning procedure, the mapping ¯C: ¯C(i) := ν if lim t→∞φt h(yi), T = ων, (35a) is numerically stored by solving (21) from ξ0 = h(yi). ( 2 ) Within the IC in process, the classifier C is quantized by C∗as C(x) ≈C∗(x) := ¯C(i) for i such that Hx ∈Di. (35b) In this method, accuracy of the classifier C∗depends on the dimension of measurement M, the size and placement of measuring range [a j, b j], and the resolution of quantization of reachable set m j (j = 1, · · · , M). In summary, we have obtained the IC for the left-hand agent as a composed function of the quantized classifier C∗, the selector S J, and the impulse generator G, which results in the closed-loop form: uic 1 = uic 1 x(t); J := P1 · (G ◦S J ◦C∗)x(t). (36) If M = 8 and τG is sufficiently large, it is implied that the solution x(t) in (21) starting from any initial states in D undergoes an impulsive force at the time t = t0 that uic 1 decides autonomously and that it converges to the stable equilibriums specified by J, under the resolution limit, minj(m j) →∞. 3.6. Reduced-order design of measurement for rigid connection The connection rod can behave like a rigid rod if the viscoelastic parameters kw, cw are sufficiently large. For the rest of this paper, we restrict our problem to this nearly rigid connection. Although human wrestling involves more flexible interactions between agents, this allows us substantially to reduce the computational efforts as follows. In this case, dependency between displacements and velocities of the left-hand and right-hand carts is imposed in the following form: x2 = x2(x1, θ1, θ2) = x1 −r(sin θ2 −sin θ1) + q w2 0 −r2(cos θ2 −cos θ1)2, (37a) ˙x2 = ˙x2(˙x1, θ1, ˙θ1, θ2, ˙θ2) = ˙x1 −r(˙θ2 cos θ2 −˙θ1 cos θ1) + r2(cos θ2 −cos θ1)(˙θ2 sin θ2 −˙θ1 sin θ1) q w2 0 −r2(cos θ2 −cos θ1)2 , (37b) 7 MOVIC2014 Vol.X, No.X, XXXX Table 1 Parameter setting of the CIP model. Parameters Values mθ mass of pendulum 0.68 kg mx mass of cart 0.067 kg g acceleration of gravity 9.8 m/s2 r length of pendulum 0.3 m cx viscous coefficient along x 0.01 Ns/m cθ viscous coefficient about θ 0.01 Ns/m w0 natural length of connection rod 1 m kw spring coefficient of connection rod 5000 N/m cw viscous coefficient of connection rod 50 Ns/m kf spring coefficient of floor 500 N/m cf viscous coefficient of floor 10 Ns/m µ Coulomb friction coefficient of floor 0 σ steepness of step function 106 α steepness of trapezoidal function 25 Kp proportional gain of standing control 1 Kd derivative gain of standing control 0.01 ∆θ threshold of standing control π/6 rad Table 2 Parameter setting of the four-dimensional IC and numerical integration. Parameters Values Qmax maximal strength of impulse 0.06 Nms P1 strength of impulse of the 1st IC = Qmax P2 strength of impulse of the 2nd IC = −Qmax ∆τ width of impulse = ∆t τG relaxation time of impulse generator = ∆t D region of measurement [−0.13, 0.43] × [−3.28, 10.58] ×[−0.35, 0.31]×[−3.80, 5.15] ∆t step size of numerical integration 5 × 10−4 s Free parameters m resolution of numerical classifier (mj := m for all j) Q strength of impulsive disturbance τd delay time of delay element where w = w0 ≫2r is a constant length of the rigid rod. Then, a loss-less state feedback to IC can be done by the following measurement: y = Hx = (x1, ˙x1, θ1, ˙θ1, θ2, ˙θ2)T, H = h e(6) 1 , e(6) 2 , e(6) 3 , e(6) 4 , o(6), o(6), e(6) 5 , e(6) 6 i , (38a) where e(d) i , o(d) denote the ith standard basis vector and the zero vector in Euclidean space Rd respectively. A corresponding inverse satisfying the rigid constraint can be defined by h(y) := H+y + x2(x1, θ1, θ2)e(8) 5 + ˙x2(˙x1, θ1, ˙θ1, θ2, ˙θ2)e(8) 6 , (38b) where H+ is the Moore-Penrose pseudoinverse of H. This measurement reduces computational efforts to obtain the quantized reachable sets of ¯C in (35a) from O(m8) to O(m6) with respect to the resolution of quantization m. In the following numerical examples, we perform a further reduction of order given by y = Hx = (θ1, ˙θ1, θ2, ˙θ2)T, H = h o(4), o(4), e(4) 1 , e(4) 2 , o(4), o(4), e(4) 3 , e(4) 4 i , (39a) h(y) := H+y + x1e(8) 1 + ˙x1e(8) 2 + x2(x1, θ1, θ2)e(8) 5 + ˙x2(˙x1, θ1, ˙θ1, θ2, ˙θ2)e(8) 6 with x1 = ˙x1 = 0. (39b) Although this measurement suffers complete loss of information about the cart motion x1, ˙x1 (and x2, ˙x2 via (37)), it reduces the computational efforts into O(m4). In this paper, we employ this four-dimensional measurement in priority to reducing the computational efforts. Moreover, in this four-dimensional measurement, the controller uic 1 (x; J) originally designed for the left-hand agent can symmetrically be reused for the right-hand agent by a transformation: uic 2 (x; J′) := uic 1 (x′; J), P2 := −P1, x′ := −(xT 2 , xT 1 )T, (40) where J′ = {4, 7} for J = {2, 3} due to the transpose of the 3 × 3 matrix of ωi in Fig. 3. 4. Numerical investigation We conduct numerical experiments to evaluate performance of the four-dimensional IC of the measurement in (39). For simplicity, in what follows, we set the resolution m j of quatization to a common value m for all j. The parameter values used in the following examples are listed in Table 1. The physical dimensions mθ, mx and r are roughly collected from the commercially available inverted pendulum (ZMP INC., 2011). For numerical integration, the fourth-order Runge-Kutta- Gill method is employed with the time step ∆t listed in Table 2. 4.1. Individual performance of IC In order to investigate individual performance of IC, we first consider impulse responses of the CIP model in (15) equipped with the IC in the left-hand only given by T = upd + uic + u = upd +  uic 1 x(t); J1  0 +  v(t) 0 , v(t) := QI∆τ(t), J1 := {2, 3}, (41) 8 MOVIC2014 Vol.X, No.X, XXXX where upd is the standing controller in (12), Q is an impulse of initial disturbance, uic 1 is the four-dimensional IC developed in (39), and I∆τ(t) is the unit impulse function defined in (19). In the following numerical examples, we assume the maximal strength of impulse Qmax = 0.06 so that v(t) cannot produce switching motions from the trivial initial state x(0) = o(8) + w0e(8) 5 (or ω1) to the other stable states in order to avoid trivial switching motions. Following this assumption, the region of measurement D is taken as listed in Table 2 so that it circumscribes at least all the trajectories y(t) = Hx(t) for the maximal disturbance Q = Qmax. We also set the strength of impulse of IC to a common value P1 = −P2 = Qmax to avoid the trivial switching motions mentioned above. Note that under the conditions of kw and cw listed in Table 1 and for Q ≤Qmax, dynamic change of length of the connection rod in (41) is limited to |w(t) −w0|/w0 < 0.01 so that the measurement in Section 3.6 is expected to work. Solving (15) with (41) numerically from a trivial initial state x(0) = o(8) + w0e(8) 5 for a given Q, we have the corre- sponding final position ων and obtain correspondence from Q to ν as plotted in Fig. 7 for the resolution m = 50 where Q is taken at NQ = 100 points with uniform increment over the interval [0, Qmax]. The small circles represent the results of ν(Q) in presence of the IC’s outputs and the cross marks represent those in absence of the outputs due to weak v(t). To evaluate the performance, we define a success rate in the following form: E = E(J) := NJ/(NQ −N0) (0 ≤E ≤1), (42) where NJ is the number of points on Q at which limt→∞x(t) = ων for all ν ∈J and N0 is the number of points in absence of the IC’s outputs. The success rate of the result for m = 50 shown in Fig. 7 is calculated as E = 0.165. Also, the rate for m = 100 can be obtained as E = 0.305 in the same manner. Therefore, it appears that our IC has low performance. 4.2. Performance improvement with a delay element One of the reasons of the low performance above can be explained in Fig. 8. The colored areas represent the quantized reachable sets: Ψ∗ ν := M n Di ¯C(i) = ν, i ∈I o (ν = 1, · · · , 9), (43) plotted on the hyper plane that contains the point x(t1) on the phase trajectory at which the IC’s output occurs (indicated by the cross mark). The resolutions of quantization of reachable sets are m = 50 for Fig. 8 (a) and m = 1000 for Fig. 8 (b). In the low resolution case in Fig. 8 (a), it can numerically be clarified that x(t1) belongs to the reachable set Ψ∗ 2 (in red) of ω2 while x(t) (t > t1) actually converges to ω4 (, ω2). Such misclassification can be refined by the high resolution as shown in Fig. 8 (b). In this case, the point x(t1) is primly classified into Ψ∗ 4 (in blue) of ω4. Although in theory, taking a sufficiently large resolution m provides nearly exact reachable sets, it greatly enlarges computational efforts. Another approach to reduce the misclassification is to replace the quantized reachable set Ψ∗ i in (43) of ωi with a subset Ψ◦ i := Ψ∗ i −∆Ψ∗ i where ∆Ψ∗ i ⊂Ψ∗ i is a set of border points neighboring the other sets Ψ∗ j (j , i). However, extracting the border points is not necessarily easy because the reachable sets sometimes exhibit nested structures as discussed by one of the authors (Yoshida, 2009). Actually, in Fig. 8 (b), quite narrow region of Ψ∗ 1 (in white) appears between Ψ∗ 2 (in red) and Ψ∗ 4 (in blue). Therefore, let us take yet another approach by replacing the IC with a delayed IC (DIC) in the following form: udic i  x(t); J, τd := uic i  x(t −τd); J  (i = 1, 2), (44) where τd is a delay time. It seems reasonable to expect that a trajectory about to crossing a course-grained border of a reachable set reaches the true border soon. Figure 9 shows the success rate E corresponding to this replacement given by T = upd +  udic 1  x(t); J1, τd 0 + u, J1 := {2, 3}, (45) where E is averaged over the two types of initial disturbance: u(t) = v(t), 0T and 0, v(t)T, and the other procedures of obtaining E are the same as in Fig. 7. In Fig. 9, the triangles and the circles represent E as functions of the delay time τd of DIC in (44) for m = 50 and 100 respectively. It is clearly seen that the functions E(τd) are nearly concave down. The maximal values are E = 0.392 at τd = 0.0045 and E = 0.683 at τd = 0.0025 for m = 50 and 100 respectively. Therefore, the DIC roughly doubles the success rate, namely, 0.392/0.165 ≈2.38 for m = 50 and 0.683/0.305 ≈2.24 for m = 100. 9 MOVIC2014 Vol.X, No.X, XXXX ν Q Success rate E = 0.165 IC unresponsive 1 2 3 4 5 6 7 8 9 0 0.01 0.02 0.03 0.04 0.05 0.06 Fig. 7 Index of final state ων as a function of the strength of initial disturbance Q for the resolution m = 50. -1.5 -1 -0.5 0 -1.5 -1 -0.5 0 0.5 1 θ · 2 θ · 1 (a) Trajectory IC output 1 2 3 4 5 6 7 8 9 -1.5 -1 -0.5 0 -1.5 -1 -0.5 0 0.5 1 θ · 2 θ · 1 (b) Trajectory IC output 1 2 3 4 5 6 7 8 9 Fig. 8 Misclassification of reachable sets. Colored areas are quantized reachable sets for (a) m=50 and (b) m=1000. 0 0.2 0.4 0.6 0.8 0 0.002 0.004 0.006 0.008 0.01 E τd m = 50 m = 100 Fig. 9 Success rate E as functions of the delay time τd of DIC in (44). 4.3. Competitive behavior In this final section, we install the DICs into both sides of controllers as T = upd +  udic 1  x(t); J1, τd 1  udic 2  x(t); J2, τd 2  + u, J1 := {2, 3}, J2 := {4, 7}, (46) where upd is the standing controller in (12), udic 1 and udic 2 is the four-dimensional DIC in (44) through (40), and u(t) = v(t), 0T, 0, v(t)T are initial impulsive disturbances of strength Q as shown in (41). The competitive meanings of J1 and J2 are shown in Fig. 3. Figure 10 shows a competitive behavior between the normal IC: udic 1 x(t); J1, 0 = uic 1 x(t); J1  for m = 100 and the DIC with optimal delay time: udic 2 x(t); J2, 0.0045 for m = 50. The time responses in Fig. 10 (a) are obtained by solving (15) with (46) from x(0) = o(8)+w0e(8) 5 under the conditions listed in Table 1 and Table 2 by applying the initial disturbance u(t) = v(t), 0T with Q = 0.0186. The same solution is plotted in Fig. 10 (b) as a motion of the CIP mechanism. For convenience, we refer to the former IC as “IC100” and the later as “DIC50”. It is shown in Fig. 10 that the ICs above generate the impulsive control forces autonomously to make the CIP system drop into ω2. According to the competitive interpretation in Fig. 3 inspired by some kind of wrestling match, it can be 10 MOVIC2014 Vol.X, No.X, XXXX -π/2 0 π/2 0 3 6 9 12 15 P2 0 P1 θi ui dic t i =1 i =2 (a) Time responses 0 0.1 0.2 0.3 0.4 0.5 -0.2 0 0.2 0.4 0.6 0.8 1 1.2 Y X (b) Motion of mechanism Fig. 10 Competitive behavior for IC100 vs DIC50 (τd 2 = 0.0045). ν Q Total success rate E1 = 0.395 (ic; m = 100) E2 = 0.575 (dic; m = 50) v=(v(t),0) v=(0,v(t)) IC unresponsive 1 2 3 4 5 6 7 8 9 0 0.01 0.02 0.03 0.04 0.05 0.06 Fig. 11 Index of final state ων as functions of Q for IC100 vs DIC50 (τd 2 = 0.0045). said that “the left agent wins by pulling the right agent.” Figure 11 shows the result of competition. The plots are obtained in the same manner as those in Fig. 7, except that in Fig. 11, the open and filled circles represent the results for u(t) = v(t), 0T and 0, v(t)T respectively, and that the cross marks represent the results in which neither of IC100 and DIC50 produces its own output. Similar to the individual case in (42), we define the success rate of this competition as Ei = E(Ji) := NJi/(NQ −N0) (0 ≤Ei ≤1) (i = 1, 2), (47) where NQ = 100 × 2 is the number of all plots in Fig. 11, N0 is the number of trials in absence of the IC’s outputs (cross marks), and the definition of NJi is the same as that of NJ in (42). From the result in Fig. 11, the success rates are obtained as E1 = 66/167 ≈0.395 for IC100 and E2 = 96/167 ≈0.575 for DIC50. Therefore, it is shown that at least based on the present definition of competition and success rate, the performance improvement using time delay is more effective than that doubling the quantization resolution without the time delay. 5. Conclusion In this paper, we discussed a competitive problem in which mechanical agents are fighting with each other and formulated it as the set of: (A) the nonlinear dynamical model with nine stable equilibriums and (B) the matrix describing competitive interpretation of these equilibriums. Based on this framework, we proposed a competitive IC that receives the state vector and output the impulsive forces to make the competitor fall down. Developing a quantized and reduced order design of the controller, we derived a practical control procedure along with an off-line learning method. To investigate performance of the controller in individual use and also in competition use, we conducted numerical experiments and obtained the following results: • The individual performance of IC depends on resolutions of the quantized reachable sets. • The individual performance of IC can be improved by adding the delay element into the IC. 11 MOVIC2014 Vol.X, No.X, XXXX • To improve the competitive performance of IC, adding the delay element may become more effective than refining the resolutions. In future work, we plan to investigate a further order reduction of measurement based on time delayed embedding methods and to improve classification accuracy by applying machine learning techniques. We also plan to conduct com- petitions between humans and the proposed ICs. Moreover, application of position control to the carts will be considered to investigate competitive problems in a bounded area. Acknowledgment The authors appreciate the feedback offered by Dr. Munehisa Sekikawa. References Bayadi, R., Banavar, R. N. and Chang, D. E., Characterizing the reachable set for a spacecraft with two rotors, Systems & Control Letters, Vol.62, No.6 (2013), pp.453–460. Duncan, P. W., Weiner, D. K., Chandler, J. and Studenski, S., Functional reach: A new clinical measure of balance, Journal of Gerontology, Vol.45, No.6 (1990), pp.M192–M197. Hofbauer, J. and Sigmund, K., Evolutionary Games and Population Dynamics (1998), Cambridge University Press. Hsu, P., Coordinated control of multiple manipulator systems, IEEE Transactions on Robotics and Automation, Vol.9, No.4 (1993), pp.400–410. Lhommeau, M., Jaulin, L. and Hardouin, L., Capture basin approximation using interval analysis, International Journal of Adaptive Control and Signal Processing, Vol.25, No.3 (2011), pp.264–272. Maravall, D., de Lope, J. and Dom´ınguez, R., Coordination of communication in robot teams by reinforcement learning, Robotics and Autonomous Systems, Vol.61, No.7 (2013), pp.661–666. Moore, M. and Wilhelms, J., Collision detection and response for computer animation, SIGGRAPH Computer Graphics, Vol.22, No.4 (1988), pp.289–298. Nakamura, Y., Nagai, K. and Yoshikawa, T., Mechanics of coordinative manipulation by multiple robotic mechanisms, Proceedings of 1987 IEEE International Conference on Robotics and Automation, Vol.4 (1987), pp.991–998. Nelson, A., Grant, E. and Henderson, T., Evolution of neural controllers for competitive game playing with teams of mobile robots, Robotics and Autonomous Systems, Vol.46, No.3 (2004), pp.135–150. Pagello, E., DAngelo, A., Montesello, F., Garelli, F. and Ferrari, C., Cooperative behaviors in multi-robot systems through implicit communication, Robotics and Autonomous Systems, Vol.29, No.1 (1999), pp.65–77. Panwar, V., Kumar, N., Sukavanam, N. and Borm, J., Adaptive neural controller for cooperative multiple robot manipula- tor system manipulating a single rigid object, Applied Soft Computing, Vol.12, No.1 (2012), pp.216–227. Stone, P. and Veloso, M., Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, Vol.8, No.3 (2000), pp.345–383. Wu, J., Sn´aˇsel, V., Ochodkov´a, E., Martinoviˇc, J., Svatoˇn, V. and Abraham, A., Analysis of strategy in robot soccer game, Neurocomputing, Vol.109 (2013), pp.66–75. Yoshida, K., Fractal dependence on initial conditions of coupled inverted pendula model of competition and cooperation, Journal of System Design and Dynamics, Vol.3, No.6 (2009), pp.966–974. Yoshida, K. and Ohta, H., Coupled inverted pendula model of competition and cooperation, Journal of System Design and Dynamics, Vol.2, No.3 (2008), pp.727–737. ZMP INC., e-nuvo WHEEL (2011), http://www.zmp.co.jp/e-nuvo/en/. 12