arXiv:1405.7178v3 [cs.RO] 20 Oct 2015  MOVIC2014   Vol.X, No.X, XXXX  Artificial Wrestling: A Dynamical Formulation of Autonomous Agents Fighting in a Coupled Inverted Pendula Framework  Katsutoshi YOSHIDA ∗ , Shigeki MATSUMOTO ∗   and Yoichi MATSUE ∗∗  ∗   Department of Mechanical and Intelligent Engineering, Utsunomiya University 7-1-2 Yoto, Utsunomiya, Tochigi, 321-8585, Japan E-mail: yoshidak@cc.utsunomiya-u.ac.jp  ∗∗   Yuhara Mfg Co., Ltd 1256 Ujiie, Sakura, Tochigi 329-1311, Japan  Abstract  We develop autonomous agents fighting with each other, inspired by human wrestling.   For this purpose, we propose a coupled inverted pendula (CIP) framework in which: 1) tips of two inverted pendulums are linked by a connection rod, 2) each pendulum is primarily stabilized by a PD-controller, 3) and is additionally equipped with an intelligent controller. Based on this framework, we dynamically formulate an intelligent controller designed to store dynamical correspondence from initial states to final states of the CIP model, to receive state vectors of the model, and to output impulsive control forces to produce desired final states of the model. Developing a quantized and reduced order design of this controller, we have a practical control procedure based on an o ff -line learning method. We then conduct numerical simulations to investigate individual performance of the intelligent controller, showing that the performance can be improved by adding a delay element into the intelligent controller. The result shows that the performance depends not only on quantization resolutions of learning data but also on delay time of the delay element. Finally, we install the intelligent controllers into both pendulums in the proposed framework to demonstrate autonomous competitive behavior between inverted pendulums.  Key words   : Multiagent System, Competitive Problem, Intelligent Control, Nonlinear Dynamics, Reachable Set  1. Introduction  Wrestling seems to be composed artificially of two mechanical agents maintaining their balance, coupled via me- chanical interactions such as contact, connection, collision, etc., and equipped with intelligent controllers competitive with each other. In this paper, we develop a simple model to create such competitive agents. For this purpose, we propose a coupled inverted pendula (CIP) framework in which: 1) tips of two inverted pendulums are linked by a connection rod, 2) each pendulum is primarily stabilized by a PD-controller, 3) and is additionally equipped with an intelligent controller that individually generates a series of impulsive internal forces to achieve its own desired final states based on knowledge of correspondence from initial states to the final states. In general, multiple agents can exhibit competitive and cooperative dynamics when sharing common resources and environments. Historically, early mathematical insights into such mutual interactions seem to have appeared in the filed of mathematical ecology (Hofbauer and Sigmund, 1998) in which population dynamics of di ff erent species sharing a common environment is described by a system of coupled nonlinear di ff erential equations such as the Lotka-Volterra equation. Contrary to the ecosystem in which the medium of interaction is given by environments, in our CIP framework, the medium of interaction is given by a mechanical structure.   In our previous study (Yoshida and Ohta, 2008), we have already demonstrated that the CIP model, even without intelligent controllers, can produce competitive dynamics comparable to that in the ecosystem such as coexistence and dominance by assigning competitive meanings to the stable equilibriums of the CIP model. Although quite similar mechanical models have been considered in the field of multiple manipulator systems (Hsu, 1993; Nakamura et al., 1987; Panwar et al., 2012), they have only focused on cooperative dynamics because of their aim at developing coordinated motions in those systems.  ∗   Received DD MM, YYYY (No. XX-XXXX) [DOI: 10.1299/XXX.X.1]  Copyright c ©   2014 by JSME  1
MOVIC2014   Vol.X, No.X, XXXX  In our study mentioned above (Yoshida and Ohta, 2008), each pendulum is PD-controlled to be bistable at the top and bottom dead points such that the coupled system produces quadra-stability. In absence of additional inputs, this system converges into one of the four stable positions (equilibriums) depending upon the initial conditions. We then applied a single impulsive force to one of the pendulums to generate switching behavior from a given stable position to a desired stable position and considered it as a prototype of fighting-like behavior. In this prototype, however, the behavior is exactly determined by the initial position and strength of the impulse because of uniqueness of solution of di ff erential equation. Therefore, it is quite hard to say that this first prototype is comparable to the wrestling players who seek how to generate internal forces to achieve desired final positions in autonomous ways. In order to build such autonomous agents fighting with each other, a certain intelligent motion controller is required. On such controllers, extensive research has been conducted in the field of multi-robot systems (Maravall et al., 2013; Pagello et al., 1999). The major issue in this field appears to be how to obtain cooperative group dynamics of robots both in algorithm-based approaches (Stone and Veloso, 2000) and in di ff erential-equation-based approaches (Hsu, 1993; Nakamura et al., 1987; Panwar et al., 2012). On the other hand, competitive group behavior seems to have been studied mainly based on algorithm-based ap- proaches. For example, Nelson et al. (2004) studied an evolutionary controller to investigate a form of reinforcement learning that makes use of competitive tournaments of games (robot capture the flag) played by individuals in a popula- tion of neural controllers. Moreover, Wu et al. (2013) developed rule or knowledge-based techniques to analyze strategy in robot soccer game. In this way, in contrast to our approach (Yoshida and Ohta, 2008), di ff erential-equation-based tech- niques are not always essential in these studies because they seek step-by-step algorithms predicting free space determined by other robots. In this paper, we first introduce the CIP framework to describe the competitive behavior in di ff erential-equation- based manners. Next, we develop a competitive intelligent controller that receives state vectors of the CIP model and outputs impulsive control forces to produce desired final states. After evaluating individual performance of the intelligent controller and investigating how to improve the performance, we will demonstrate autonomous competitive behavior between two inverted pendulums equipped with the proposed intelligent controllers.  2. Coupled inverted pendula framework  2.1. Coupled inverted pendula  In order to create wrestler-like mechanical agents maintaining their balance while being coupled mechanically with each other, we consider a CIP model (Yoshida and Ohta, 2008) as shown in Fig. 1. Each inverted pendulum consists of a cart moving along the horizontal floor ( Y   =   0) and a simple pendulum rotating about a point on the cart. For simplicity, a common physical specification is given to both of the pendulums where   m θ   is a mass and   r   is a length of the pendulum, and   m x   is a mass of the cart. Linking the tips of the pendulums with a viscoelastic connection rod of length   w , we obtain the CIP model in Fig. 1 where   T i   is an input torque on   θ i ,   f   i   is a reaction force acting on the tip of   i th pendulum, and   k w  and   c w   are a spring coe ffi cient and a viscous friction coe ffi cient of the rod respectively. We assume that a mass of the rod is negligible. As configuration of this linkage is uniquely determined by the four variables: horizontal displacements of the carts   x 1 ,   x 2   and slant angles of the pendulums   θ 1 , θ 2 , the dynamics of this linkage is described by the eight-dimensional state vector:  x   : =   ( x T  1   ,   x T  2   ) T   ,   x i   : =   ( x i ,    ̇ x i , θ i ,    ̇ θ i  ) T   ( i   =   1 ,   2) ,   (1) where   A T   denotes the transpose of a matrix   A . According to Lagrangian mechanics and assuming viscous friction forces  c x    ̇ x i   and   c θ    ̇ θ i   on   x i   and   θ i   respectively, we obtain equations of motion (EOM) of the CIP model in Fig. 1 as follows:            ( m x   +   m θ )  ̈ x i   +   ( m θ r   cos   θ i ) ̈ θ i   −   m θ r    ̇ θ 2  i   sin   θ i   =   − c x    ̇ x i   +   (1 ,   0)   f i ,  ( m θ r   cos   θ i )  ̈ x i   +   ( m θ r 2 ) ̈ θ i   −   m θ g r   sin   θ i   =   − c θ    ̇ θ i   +   r (cos   θ i ,   −   sin   θ i )   f   i   +   T i   ( i   =   1 ,   2) ,   (2) where  ̇ X   : =   dX / dt .  2.2. Reaction force from the connection rod  We calculate the reaction force, say   p , from the connection rod as shown in Fig. 2. The displacement vector   w   from  2
MOVIC2014   Vol.X, No.X, XXXX  Fig. 1   Coupled inverted pendula model via viscoelastic connection.   Fig. 2   Reaction force   p   from the connection rod.  the left-hand tip to the right-hand tip of pendulums is given by  w   =        w X  w Y          : =   X 2   −   X 1 ,   X i   =        X i  Y i          =        x i   +   r   sin   θ i  r   cos   θ i          ( i   =   1 ,   2) ,   (3) and the length of rod is expressed as  w   =   ‖ w ‖   =   √ { ( x 2   −   x 1 )   +   r (sin   θ 2   −   sin   θ 1 ) } 2   +   { r (cos   θ 2   −   cos   θ 1 ) } 2   (4) with the time derivative  ̇ w   =   (  ̇ w X   w X   +    ̇ w Y   w Y   ) /w.   Then, we model viscoelasticity of the connection rod as  p   =   ‖ p ‖   : =   − k w ( w   −   w 0 )   −   c w    ̇ w,   (5) where   w 0   is a natural length of the connection rod. As the force vector   p   is parallel to the displacement vector   w , we have the reaction force:  p   =   ( p /w ) w .   (6) Substituting   p   into the EOM (2) through,  f   1   =   − p ,   f   2   =   p   (   or   f   i   =   ( − 1) i   p   ) ,   (7) we obtain an analytic expression of the CIP model shown in Fig. 1 via the viscoelastic connection.  2.3. Modeling floor  In the previous study (Yoshida and Ohta, 2008), the pendulum can fall down freely to the bottom dead point, in other word, there was no floor in the previous CIP model. In that case, both of forward and backward falling motions converge to the same equilibrium of the model so that orbital information is required in order to detect the direction of falling. Not only to simplify the detection process but also to develop more realistic simulator of wrestling, we introduce the floor model to the CIP model in the following manner. Based on penalty methods (Moore and Wilhelms, 1988), we first model a normal force   R i   from the floor ( Y   =   0) acting on the tip of   i th pendulum as  R i   =   U ( − Y i ) {− k   f   Y i   −   c   f    ̇ Y i } ,   (8) where   Y i   is a height of the   i th tip from the floor in (3),   U (   ·   ) is a unit step function, and   k   f   ,   c   f   are viscoelastic parameters representing property of the reaction. In practice, in order to avoid numerical errors, we approximate the step function with a sigmoid function di ff erentiable, defined by  U σ ( s ) : =   { 1   +   exp( − σ s ) } − 1   ,   (9) where lim σ →∞   U σ ( s )   =   U ( s ) holds. Furthermore, a Coulomb friction force   F i   acting on the   i th tip from the floor can be expressed as  F i   =   − μ R i   sgn(  ̇ X i ) ,   (10) where   μ   is a friction coe ffi cient,  ̇ X i   is a relative horizontal velocity of the   i th tip from the floor, and sgn(   ·   ) is a unit signum function whose smooth approximation can be given by sgn( s )   ≈   sgn σ ( s ) : =   2 U σ ( s )   −   1 .  Therefore, the CIP model via the viscoelastic connection in (7) on the floor can be obtained by substituting  f   i   =   ( − 1) i   p   +   ( F i ,   R i ) T   ( i   =   1 ,   2)   (11) into the EOM in (2).  3
MOVIC2014   Vol.X, No.X, XXXX  Fig. 3   Competitive interpretation of the equilibriums.   Fig. 4   Architecture of the proposed control system to generate competitive motions.  2.4. Standing control with falling  The CIP framework in absence of intelligent controllers is completed by giving dynamical meanings of winning and losing to states of the CIP model. To this end, we begin with developing a feedback controller by which each inverted pendulum on the floor forms three stable equilibriums:   θ i   =   0 for standing or winning and   θ i   =   ± π/ 2 for falling or losing. This can be done by introducing a feedback controller in the following form:  T i   =   u pd  i   : =   trap α ( θ i ;   ∆ θ ) {− K p   θ i   −   K d    ̇ θ i }   ( i   =   1 ,   2) ,   (12) where trap α ( θ ;   ∆ θ ) : =   U α ( θ   + ∆ θ )   ·   U α ( − θ   + ∆ θ )   (13) is a smooth trapezoidal function of unit height and centered at   θ   =   0 made of a product of the sigmoid function in (9),  ∆ θ >   0 is a half width of the trapezoidal shape, and the shape is getting steeper as   α   increases. It follows from the deadband characteristics in (12) that   u pd  i   simply acts as a PD controller within the limit   | θ i |   <   ∆ θ  while it rapidly cuts o ff   the output outside of the interval.   Therefore, appropriate setting of the gains   K p ,   K d   make it possible for the   i th pendulum to be stabilized about the standing position   θ i   =   0 while to be falling down to the floor when  | θ i |   exceeds the given limit   ∆ θ . It is worthy to note that in the field of gerontology and related fields, human standing (or falling) limits comparable to the threshold   ∆ θ   have been measured by the   functional reach test   (Duncan et al., 1990) in which the di ff erence between arm length and maximal forward reach of human subjects is measured to evaluate risk of falls of them.  2.5. CIP framework  In what follows, we compare the inverted pendulums in this model to wrestler-like agents maintaining their standing balance. Since each agent with the standing control in (12) has the three stable equilibriums, a pair of the agents being coupled with each other under the suitable conditions can produce 3   ×   3   =   9 stable equilibriums:  ω i   : =   lim  t →∞   x ( t )   =   (    ̄ x 1 ,   0 ,    ̄ θ 1 ,   0 ,    ̄ x 2 ,   0 ,    ̄ θ 2 ,   0 ) T   ( i   =   1 ,   · · ·   ,   9) ,   (14) as shown in Fig. 3 when equating horizontal translations of final position  ̄ x 1 ,    ̄ x 2   without loss of generality. Namely, the components   x 1 ( t ) ,   x 2 ( t ) of the solution   x ( t ) of (2) are not stable asymptotically but neutrally because no restoring forces on   x 1 ( t ) ,   x 2 ( t ) are assumed by definition. It also should be noted that due to the penalty method in (8), the gravity force makes the equilibrium  ̄ θ i   on the floor slightly exceed the floor, i.e.,   |  ̄ θ i | −   π/ 2   >   0, but we formally denote   θ i   =    ̄ θ i   =   ± π/ 2 because this slight exceedance only a ff ects almost converged states and does not change the correspondence from the initial to final states. We then attach competitive meanings to the nine equilibriums as listed in Fig. 3 in which the agent that remains standing is regarded as a winner. Eventually, we have the   CIP framework   composed of the set of (A) and (B): (A)   The CIP model: the system of equations defined in (2), (11) and (12). (B)   The win-loss matrix: the competitive interpretation of the nine equilibriums defined in Fig. 3.  4
MOVIC2014   Vol.X, No.X, XXXX  3. Intelligent controller  Based on the CIP framework in Section 2.5, we develop an intelligent controller (IC) to produce desired final posi- tions in Fig. 3 from given initial states of CIP by generating certain impulsive forces.  3.1. Problem setting and requirements  According to the definition of the state vector   x   =   ( x T  1   ,   x T  2   ) T   in (1), the CIP model in (2) can be expressed as an eight-dimensional dynamical system:  ̇ x   =   f   ( x ,   T ) ,   x (0)   =   x 0 ,   T   : =   ( T 1 ,   T 2 ) T   (15) that can be divided into a pair of four-dimensional subsystems:             ̇ x 1   =   f   1 ( x 1 ,   x 2 ,   T 1 ) ,   ̇ x 2   =   f   2 ( x 2 ,   x 1 ,   T 2 ) .   (16) We introduce the IC by adding an intelligent control input   u ic  i   to the torque   T i   of CIP model as  T   : =   u pd   +   u ic   =   ( u pd 1   ,   u pd 2  ) T   +   ( u ic 1   ,   u ic 2  ) T   ,   (17) where   u pd  i   is the standing control input already given in (12). In the present study, the input-output relationship around   u ic  i   is designed as shown in Fig. 4, in which   u ic  i   receives all the state vectors   x 1 ,   x 2   and outputs a series of impulsive forces given by  u ic  i   ( t ) : =   N ∑  j = 1  P i   I ∆ τ ( t   −   t   j i   ) ,   (18) where  I ∆ τ ( t )   =            ( ∆ τ ) − 1   (0   ≤   t   <   ∆ τ ) ,  0   (otherwise)   (19) is a rectangular function of unit area of width   ∆ τ   ≪   1,   P i   is an angular impulse of the input torque   u ic  i   ( t ), and   { t 1  i   ,   · · ·   ,   t N i   }  is a series of rise time satisfying  t 1  i   <   t 2  i   <   · · ·   <   t N i   ,   max  j , k   | t   j i   −   t k i   | ≥   τ G   ≥   ∆ τ,   (20) where   τ G   is a relaxation time to avoid overlapped outputs. In practical implementation, the rise times   t 1  i   ,   · · ·   ,   t N i   are supposed to be determined sequentially by a real time architecture described in Fig. 5, which is composed of three components, a classifier   C , a selector   S   J , and an impulse generator   G .  3.2. Classifier   C  We define the classifier   C   as a function from a state vector   x   =   ( x T  1   ,   x T  2   ) T   at the time   t   =   t 0 , say   x ( t 0 )   =   ξ 0 , to an index number   ν   of equilibrium   ω ν . The function   C   takes the value   C ( ξ 0 )   =   ν   if a solution of the following system:  ̇ x   =   f   ( x ,   T ) ,   x ( t 0 )   =   ξ 0 ,   T   : =   u pd   +        P 1   I ∆ τ ( t   −   t 0 ) 0          (21) converges to the equilibrium   ω ν . From this definition in which a single impulse at   t   =   t 0   is applied only on the left-hand agent, it is implied that the classifier   C   is valid only in absence of additional inputs, in other words, it can fail to return correct equilibriums for more general cases of input as in (18) where both agents can produce impulsive forces for their own decisions. Despite that, we will proceed with a discussion to build a first prototype of artificial wrestling. Consider the transition operator of a solution of (21) as  x ( t ) : =   φ t ( ξ 0 ,   T ) ,   (22)  5
MOVIC2014   Vol.X, No.X, XXXX  Fig. 5   Intelligent controller.   Fig. 6   Impulse generator.  and define a set of the initial state   ξ 0   approaching the equilibrium   ω i   as  Φ i   : =   { ξ 0   ∈   R 8   ∣ ∣ ∣ ∣   lim  t →∞   φ t ( ξ 0 ,   T )   =   ω i  } .   (23) The set   Φ i   is generally called a basin of attraction (Lhommeau et al., 2011) or a reachable set (Bayadi et al., 2013). It follows from uniqueness of solution of initial value problem in (21) that the reachable set in (23) satisfies,  Φ i   ∩   Φ   j   =   ∅   ( i   ,   j ) .   (24) Thus, the classifier   C   is obtained as a single-valued function:  C ( ξ 0  )   : =   ν   if   ξ 0   ∈   Φ ν .   (25) In Section 3.5, we will discuss a numerical approximation of   C   because explicit expressions of   C (   ·   ) are hardly obtainable from nonlinear systems such as (15) and also (21).  3.3. Selector   S J  In the competitive problem in Fig. 3, some of the equilibriums are selected depending upon strategies of the agent considered. Such a selection process can be modeled by a selector   S   J   given by  δ   =   S   J ( ν ) : =            1   ( ν   ∈   J   ⊂ { 1 ,   · · ·   ,   9 } ) ,  0   (otherwise) ,   (26) where   ν   =   C ( x ) is an output of the classifier and   J   is a given subset of indices of the equilibriums   ω 1 ,   · · ·   ,   ω 9 .   For example, the trajectory   x ( t ) in (21) starting from an initial state   ξ 0   satisfying ( S   J   ◦   C )( ξ 0 ) : =   S   J  ( C ( ξ 0 ) )   =   1   for   J   =   { 2 ,   3 }   (27) converges one of the two equilibriums   ω 2   and   ω 3 .  3.4. Impulse generator   G  The impulse generator   G   is designed to receive the binary signal   δ ( t )   =   S   J ( ν ( t )) from the selector and output the unit impulse   G ( t ) as shown in Fig. 6, which is composed of a two-input AND gate and two timer functions   T I   and   T G . The timer   T I   produces a unit impulse as  G ( t )   =   T I   ( t ) : =   I ∆ τ ( t   −   t r ) ,   (28) and the timer   T G   cuts o ff   the binary signal   δ ( t )   =   S   J ( ν ( t )) for a given relaxation time   τ G   discussed in (20) as  T G ( t ) : =            0   ( t r   <   t   <   t r   +   τ G ) ,  1   (otherwise) ,   (29) where   t r   is the rise time from 0 to 1 of the Boolean product ˆ δ ( t )   =   S   J ( ν ( t )) ∧ T G ( t ). Note that although the signal ˆ δ ( t ) can be pulses of infinitesimal width in this continuous time expression, this problem will never arise in discrete time applications with digital computers.  6
MOVIC2014   Vol.X, No.X, XXXX  3.5. Numerical approximation of the classifier   C  Explicit expression of   C ( x ) is hardly obtainable from nonlinear systems such as (15). It seems that there are two types of solutions: solving (15) in process or making a numerical table of the mapping   C   :   x   7   →   ν   in advance. We take the latter approach in the following manner. For practical applications, we introduce a linear measurement equation:  y   =   H x ,   y   ∈   R M ,   x   ∈   R 8 ,   (30) where   M   ≤   8 and   H   is a   M   ×   8 matrix of rank   M , and define a reduced-order reachable set in the following sense:  Ψ i   =   H ( Φ i ) : =   { η 0   ∈   R M   ∣ ∣ ∣   ξ 0   =   h ( η 0 )   ∈   R 8 ,   lim  t →∞   φ t ( ξ 0 ,   T )   =   ω i  } ,   (31) where   x ′   =   h ( y ) ( ,   x   in general) is a certain inverse of   y   =   H x . An actual example of   h   will be given in Section 3.6. Firstly, we take a   M -dimensional cubic region   D   of measuring range within a direct sum of the reachable sets as  9 ⊕  i = 1  Ψ i   ⊃   D   : =   [ a 1 ,   b 1 ]   × · · · ×   [ a M ,   · · ·   ,   b M ]   (32) where   ⊕   represents a direct sum (disjoint union) and [ a ,   b ] denotes an interval. We divide it into a direct sum of uniform subcubes   D i   as  D   =  ⊕ { D i   ∣ ∣ ∣   i   ∈   I   =   [1 ,   · · ·   ,   m 1 ]   × · · · ×   [1 ,   · · ·   ,   m M ] } ,   (33) where   I   is a space of   M -dimensional indices and   m   j   is the number of subcubes in the   j th direction. We then define center points of the subcubes   y i   ∈   D i   ( i   ∈   I ) whose   j th component is given by ( y i )   j   : =   a   j   +  (  ( i )   j   −   1  2  )   b   j   −   a   j  m   j  ,   (34) where ( u )   j   denotes the   j th component of a vector   u . The setup above allows us to build a numerical method as follows: ( 1 )   As an o ffl ine learning procedure, the mapping  ̄ C :  ̄ C ( i ) : =   ν   if   lim  t →∞   φ t  ( h ( y i ) ,   T )   =   ω ν ,   (35a) is numerically stored by solving (21) from   ξ 0   =   h ( y i ). ( 2 )   Within the IC in process, the classifier   C   is quantized by   C ∗   as  C ( x )   ≈   C ∗ ( x ) : =    ̄ C ( i )   for   i   such that   H x   ∈   D i .   (35b) In this method, accuracy of the classifier   C ∗   depends on the dimension of measurement   M , the size and placement of measuring range [ a   j ,   b   j ], and the resolution of quantization of reachable set   m   j   (   j   =   1 ,   · · ·   ,   M ). In summary, we have obtained the IC for the left-hand agent as a composed function of the quantized classifier   C ∗ , the selector   S   J , and the impulse generator   G , which results in the closed-loop form:  u ic 1   =   u ic 1  ( x ( t );   J )   : =   P 1   ·   ( G   ◦   S   J   ◦   C ∗ ) ( x ( t ) ) .   (36) If   M   =   8 and   τ G   is su ffi ciently large, it is implied that the solution   x ( t ) in (21) starting from any initial states in  D   undergoes an impulsive force at the time   t   =   t 0   that   u ic 1   decides autonomously and that it converges to the stable equilibriums specified by   J , under the resolution limit, min   j ( m   j )   → ∞ .  3.6. Reduced-order design of measurement for rigid connection  The connection rod can behave like a rigid rod if the viscoelastic parameters   k w ,   c w   are su ffi ciently large. For the rest of this paper, we restrict our problem to this nearly rigid connection. Although human wrestling involves more flexible interactions between agents, this allows us substantially to reduce the computational e ff orts as follows.   In this case, dependency between displacements and velocities of the left-hand and right-hand carts is imposed in the following form:  x 2   =   x 2 ( x 1 , θ 1 , θ 2 )   =   x 1   −   r (sin   θ 2   −   sin   θ 1 )   +  √  w 2 0   −   r 2 (cos   θ 2   −   cos   θ 1 ) 2 ,   (37a)  ̇ x 2   =    ̇ x 2 (  ̇ x 1 , θ 1 ,    ̇ θ 1 , θ 2 ,    ̇ θ 2 )   =    ̇ x 1   −   r ( ̇ θ 2   cos   θ 2   −    ̇ θ 1   cos   θ 1 )   +   r 2 (cos   θ 2   −   cos   θ 1 )( ̇ θ 2   sin   θ 2   −    ̇ θ 1   sin   θ 1 )  √  w 2 0   −   r 2 (cos   θ 2   −   cos   θ 1 ) 2  ,   (37b)  7
MOVIC2014   Vol.X, No.X, XXXX  Table 1   Parameter setting of the CIP model.  Parameters   Values  m θ   mass of pendulum   0.68 kg  m x   mass of cart   0.067 kg  g   acceleration of gravity   9.8 m / s 2  r   length of pendulum   0.3 m  c x   viscous coe ffi cient along   x   0 . 01 Ns / m  c θ   viscous coe ffi cient about   θ   0 . 01 Ns / m  w 0   natural length of connection rod   1 m  k w   spring coe ffi cient of connection rod   5000 N / m  c w   viscous coe ffi cient of connection rod   50 Ns / m  k f   spring coe ffi cient of floor   500 N / m  c f   viscous coe ffi cient of floor   10 Ns / m  μ   Coulomb friction coe ffi cient of floor   0  σ   steepness of step function   10 6  α   steepness of trapezoidal function   25  K p   proportional gain of standing control   1  K d   derivative gain of standing control   0 . 01  ∆ θ   threshold of standing control   π/ 6 rad  Table 2   Parameter setting of the four-dimensional IC and numerical integration.  Parameters   Values  Q max   maximal strength of impulse   0.06 Nms  P 1   strength of impulse of the 1st IC   =   Q max  P 2   strength of impulse of the 2nd IC   =   − Q max  ∆ τ   width of impulse   = ∆ t  τ G   relaxation time of impulse generator   = ∆ t D   region of measurement   [ − 0 . 13 ,   0 . 43]   ×   [ − 3 . 28 ,   10 . 58]  × [ − 0 . 35 ,   0 . 31]   ×   [ − 3 . 80 ,   5 . 15]  ∆ t   step size of numerical integration   5   ×   10 − 4   s  Free parameters  m   resolution of numerical classifier   ( m j   : =   m   for all   j )  Q   strength of impulsive disturbance  τ d   delay time of delay element  where   w   =   w 0   ≫   2 r   is a constant length of the rigid rod. Then, a loss-less state feedback to IC can be done by the following measurement:  y   =   H x   =   ( x 1 ,    ̇ x 1 , θ 1 ,    ̇ θ 1 , θ 2 ,    ̇ θ 2 ) T   ,   H   =   [ e (6) 1   ,   e (6) 2   ,   e (6) 3   ,   e (6) 4   ,   o (6) ,   o (6) ,   e (6) 5   ,   e (6) 6  ]   ,   (38a) where   e ( d )  i   ,   o ( d )   denote the   i th standard basis vector and the zero vector in Euclidean space   R d   respectively. A corresponding inverse satisfying the rigid constraint can be defined by  h ( y ) : =   H + y   +   x 2 ( x 1 , θ 1 , θ 2 ) e (8) 5   +    ̇ x 2 (  ̇ x 1 , θ 1 ,    ̇ θ 1 , θ 2 ,    ̇ θ 2 ) e (8) 6   ,   (38b) where   H +   is the Moore-Penrose pseudoinverse of   H .   This measurement reduces computational e ff orts to obtain the quantized reachable sets of  ̄ C   in (35a) from   O ( m 8 ) to   O ( m 6 ) with respect to the resolution of quantization   m . In the following numerical examples, we perform a further reduction of order given by  y   =   H x   =   ( θ 1 ,    ̇ θ 1 , θ 2 ,    ̇ θ 2 ) T   ,   H   =   [ o (4) ,   o (4) ,   e (4) 1   ,   e (4) 2   ,   o (4) ,   o (4) ,   e (4) 3   ,   e (4) 4  ]   ,   (39a)  h ( y ) : =   H + y   +   x 1 e (8) 1   +    ̇ x 1 e (8) 2   +   x 2 ( x 1 , θ 1 , θ 2 ) e (8) 5   +    ̇ x 2 (  ̇ x 1 , θ 1 ,    ̇ θ 1 , θ 2 ,    ̇ θ 2 ) e (8) 6   with   x 1   =    ̇ x 1   =   0 .   (39b) Although this measurement su ff ers complete loss of information about the cart motion   x 1 ,    ̇ x 1   (and   x 2 ,    ̇ x 2   via (37)), it reduces the computational e ff orts into   O ( m 4 ). In this paper, we employ this four-dimensional measurement in priority to reducing the computational e ff orts. Moreover, in this four-dimensional measurement, the controller   u ic 1   ( x ;   J ) originally designed for the left-hand agent can symmetrically be reused for the right-hand agent by a transformation:  u ic 2   ( x ;   J ′ ) : =   u ic 1   ( x ′ ;   J ) ,   P 2   : =   − P 1 ,   x ′   : =   − ( x T  2   ,   x T  1   ) T   ,   (40) where   J ′   =   { 4 ,   7 }   for   J   =   { 2 ,   3 }   due to the transpose of the 3   ×   3 matrix of   ω i   in Fig. 3.  4. Numerical investigation  We conduct numerical experiments to evaluate performance of the four-dimensional IC of the measurement in (39). For simplicity, in what follows, we set the resolution   m   j   of quatization to a common value   m   for all   j . The parameter values used in the following examples are listed in Table 1. The physical dimensions   m θ ,   m x   and   r   are roughly collected from the commercially available inverted pendulum (ZMP INC., 2011). For numerical integration, the fourth-order Runge-Kutta- Gill method is employed with the time step   ∆ t   listed in Table 2.  4.1. Individual performance of IC  In order to investigate individual performance of IC, we first consider impulse responses of the CIP model in (15) equipped with the IC in the left-hand only given by  T   =   u pd   +   u ic   +   u   =   u pd   +        u ic 1  ( x ( t );   J 1  )  0          +        v ( t ) 0          ,   v ( t ) : =   QI ∆ τ ( t ) ,   J 1   : =   { 2 ,   3 } ,   (41)  8
MOVIC2014   Vol.X, No.X, XXXX  where   u pd   is the standing controller in (12),   Q   is an impulse of initial disturbance,   u ic 1   is the four-dimensional IC developed in (39), and   I ∆ τ ( t ) is the unit impulse function defined in (19). In the following numerical examples, we assume the maximal strength of impulse   Q max   =   0 . 06 so that   v ( t ) cannot produce switching motions from the trivial initial state   x (0)   =   o (8)   +   w 0 e (8) 5   (or   ω 1 ) to the other stable states in order to avoid trivial switching motions. Following this assumption, the region of measurement   D   is taken as listed in Table 2 so that it circumscribes at least all the trajectories   y ( t )   =   H x ( t ) for the maximal disturbance   Q   =   Q max . We also set the strength of impulse of IC to a common value   P 1   =   − P 2   =   Q max   to avoid the trivial switching motions mentioned above. Note that under the conditions of   k w   and   c w   listed in Table 1 and for   Q   ≤   Q max , dynamic change of length of the connection rod in (41) is limited to   | w ( t )   −   w 0 | /w 0   <   0 . 01 so that the measurement in Section 3.6 is expected to work. Solving (15) with (41) numerically from a trivial initial state   x (0)   =   o (8)   +   w 0 e (8) 5   for a given   Q , we have the corre- sponding final position   ω ν   and obtain correspondence from   Q   to   ν   as plotted in Fig. 7 for the resolution   m   =   50 where   Q  is taken at   N Q   =   100 points with uniform increment over the interval [0 ,   Q max ]. The small circles represent the results of  ν ( Q ) in presence of the IC’s outputs and the cross marks represent those in absence of the outputs due to weak   v ( t ). To evaluate the performance, we define a success rate in the following form:  E   =   E ( J ) : =   N J / ( N Q   −   N 0 )   (0   ≤   E   ≤   1) ,   (42) where   N J   is the number of points on   Q   at which lim t →∞   x ( t )   =   ω ν   for all   ν   ∈   J   and   N 0   is the number of points in absence of the IC’s outputs. The success rate of the result for   m   =   50 shown in Fig. 7 is calculated as   E   =   0 . 165. Also, the rate for  m   =   100 can be obtained as   E   =   0 . 305 in the same manner. Therefore, it appears that our IC has low performance.  4.2. Performance improvement with a delay element  One of the reasons of the low performance above can be explained in Fig. 8. The colored areas represent the quantized reachable sets:  Ψ ∗  ν   : =  ⊕ { D i   ∣ ∣ ∣ ∣    ̄ C ( i )   =   ν,   i   ∈   I }   ( ν   =   1 ,   · · ·   ,   9) ,   (43) plotted on the hyper plane that contains the point   x ( t 1 ) on the phase trajectory at which the IC’s output occurs (indicated by the cross mark). The resolutions of quantization of reachable sets are   m   =   50 for Fig. 8 (a) and   m   =   1000 for Fig. 8 (b). In the low resolution case in Fig. 8 (a), it can numerically be clarified that   x ( t 1 ) belongs to the reachable set   Ψ ∗  2  (in red) of   ω 2   while   x ( t ) ( t   >   t 1 ) actually converges to   ω 4   ( ,   ω 2 ). Such misclassification can be refined by the high resolution as shown in Fig. 8 (b).   In this case, the point   x ( t 1 ) is primly classified into   Ψ ∗  4   (in blue) of   ω 4 .   Although in theory, taking a su ffi ciently large resolution   m   provides nearly exact reachable sets, it greatly enlarges computational e ff orts. Another approach to reduce the misclassification is to replace the quantized reachable set   Ψ ∗  i   in (43) of   ω i   with a subset   Ψ ◦  i   : = Ψ ∗  i   − ∆Ψ ∗  i   where   ∆Ψ ∗  i   ⊂   Ψ ∗  i   is a set of border points neighboring the other sets   Ψ ∗  j   (   j   ,   i ). However, extracting the border points is not necessarily easy because the reachable sets sometimes exhibit nested structures as discussed by one of the authors (Yoshida, 2009). Actually, in Fig. 8 (b), quite narrow region of   Ψ ∗  1   (in white) appears between   Ψ ∗  2   (in red) and   Ψ ∗  4   (in blue). Therefore, let us take yet another approach by replacing the IC with a delayed IC (DIC) in the following form:  u dic  i  ( x ( t );   J , τ d )   : =   u ic  i  ( x ( t   −   τ d   );   J )   ( i   =   1 ,   2) ,   (44) where   τ d   is a delay time. It seems reasonable to expect that a trajectory about to crossing a course-grained border of a reachable set reaches the true border soon. Figure 9 shows the success rate   E   corresponding to this replacement given by  T   =   u pd   +         u dic 1  ( x ( t );   J 1 , τ d   )  0           +   u ,   J 1   : =   { 2 ,   3 } ,   (45) where   E   is averaged over the two types of initial disturbance:   u ( t )   =   ( v ( t ) ,   0 ) T   and   ( 0 , v ( t ) ) T   , and the other procedures of obtaining   E   are the same as in Fig. 7. In Fig. 9, the triangles and the circles represent   E   as functions of the delay time   τ d  of DIC in (44) for   m   =   50 and 100 respectively. It is clearly seen that the functions   E ( τ d   ) are nearly concave down. The maximal values are   E   =   0 . 392 at   τ d   =   0 . 0045 and   E   =   0 . 683 at   τ d   =   0 . 0025 for   m   =   50 and 100 respectively. Therefore, the DIC roughly doubles the success rate, namely, 0 . 392 / 0 . 165   ≈   2 . 38 for   m   =   50 and 0 . 683 / 0 . 305   ≈   2 . 24 for   m   =   100.  9
MOVIC2014   Vol.X, No.X, XXXX  ν  Q  Success rate   E   = 0.165   IC unresponsive  1  2  3  4  5  6  7  8  9  0   0.01   0.02   0.03   0.04   0.05   0.06  Fig. 7   Index of final state   ω ν   as a function of the strength of initial disturbance   Q   for the resolution   m   =   50.  -1.5  -1  -0.5  0  -1.5   -1   -0.5   0   0.5   1  θ  · 2  θ  · 1  (a)   Trajectory  IC output  1  2  3  4  5  6  7  8  9  -1.5  -1  -0.5  0  -1.5   -1   -0.5   0   0.5   1  θ  · 2  θ  · 1  (b)   Trajectory  IC output  1  2  3  4  5  6  7  8  9  Fig. 8   Misclassification of reachable sets. Colored areas are quantized reachable sets for (a) m = 50 and (b) m = 1000.  0  0.2  0.4  0.6  0.8  0   0.002   0.004   0.006   0.008   0.01  E  τ d  m   = 50  m   = 100  Fig. 9   Success rate   E   as functions of the delay time   τ d   of DIC in (44).  4.3. Competitive behavior  In this final section, we install the DICs into both sides of controllers as  T   =   u pd   +          u dic 1  ( x ( t );   J 1 , τ d  1  )  u dic 2  ( x ( t );   J 2 , τ d  2  )          +   u ,   J 1   : =   { 2 ,   3 } ,   J 2   : =   { 4 ,   7 } ,   (46) where   u pd   is the standing controller in (12),   u dic 1   and   u dic 2   is the four-dimensional DIC in (44) through (40), and   u ( t )   =  ( v ( t ) ,   0 ) T   ,   ( 0 , v ( t ) ) T   are initial impulsive disturbances of strength   Q   as shown in (41). The competitive meanings of   J 1   and  J 2   are shown in Fig. 3. Figure 10 shows a competitive behavior between the normal IC:   u dic 1  ( x ( t );   J 1 ,   0 )   =   u ic 1  ( x ( t );   J 1  )   for   m   =   100 and the DIC with optimal delay time:   u dic 2  ( x ( t );   J 2 ,   0 . 0045 )   for   m   =   50. The time responses in Fig. 10 (a) are obtained by solving (15) with (46) from   x (0)   =   o (8)   + w 0 e (8) 5   under the conditions listed in Table 1 and Table 2 by applying the initial disturbance  u ( t )   =   ( v ( t ) ,   0 ) T   with   Q   =   0 . 0186. The same solution is plotted in Fig. 10 (b) as a motion of the CIP mechanism. For convenience, we refer to the former IC as “IC100” and the later as “DIC50”. It is shown in Fig. 10 that the ICs above generate the impulsive control forces autonomously to make the CIP system drop into   ω 2 . According to the competitive interpretation in Fig. 3 inspired by some kind of wrestling match, it can be  10
MOVIC2014   Vol.X, No.X, XXXX  - π /2  0  π /2  0   3   6   9   12   15  P 2  0  P 1  θ i  u i  dic  t i   =1  i   =2  (a) Time responses  0  0.1  0.2  0.3  0.4  0.5  -0.2   0   0.2   0.4   0.6   0.8   1   1.2  Y X  (b) Motion of mechanism Fig. 10   Competitive behavior for IC100 vs DIC50 ( τ d  2   =   0 . 0045).  ν  Q  Total success rate   E 1   = 0.395 (ic;   m   = 100)  E 2   = 0.575 (dic;   m   = 50)  v =( v ( t ),0)  v =(0, v ( t ))  IC unresponsive  1  2  3  4  5  6  7  8  9  0   0.01   0.02   0.03   0.04   0.05   0.06  Fig. 11   Index of final state   ω ν   as functions of   Q   for IC100 vs DIC50 ( τ d  2   =   0 . 0045).  said that “the left agent wins by pulling the right agent.” Figure 11 shows the result of competition. The plots are obtained in the same manner as those in Fig. 7, except that in Fig. 11, the open and filled circles represent the results for   u ( t )   =   ( v ( t ) ,   0 ) T   and   ( 0 , v ( t ) ) T   respectively, and that the cross marks represent the results in which neither of IC100 and DIC50 produces its own output. Similar to the individual case in (42), we define the success rate of this competition as  E i   =   E ( J i ) : =   N J i   / ( N Q   −   N 0 )   (0   ≤   E i   ≤   1)   ( i   =   1 ,   2) ,   (47) where   N Q   =   100   ×   2 is the number of all plots in Fig. 11,   N 0   is the number of trials in absence of the IC’s outputs (cross marks), and the definition of   N J i   is the same as that of   N J   in (42). From the result in Fig. 11, the success rates are obtained as   E 1   =   66 / 167   ≈   0 . 395 for IC100 and   E 2   =   96 / 167   ≈   0 . 575 for DIC50. Therefore, it is shown that at least based on the present definition of competition and success rate, the performance improvement using time delay is more e ff ective than that doubling the quantization resolution without the time delay.  5. Conclusion  In this paper, we discussed a competitive problem in which mechanical agents are fighting with each other and formulated it as the set of: (A) the nonlinear dynamical model with nine stable equilibriums and (B) the matrix describing competitive interpretation of these equilibriums. Based on this framework, we proposed a competitive IC that receives the state vector and output the impulsive forces to make the competitor fall down. Developing a quantized and reduced order design of the controller, we derived a practical control procedure along with an o ff -line learning method. To investigate performance of the controller in individual use and also in competition use, we conducted numerical experiments and obtained the following results:  •   The individual performance of IC depends on resolutions of the quantized reachable sets.  •   The individual performance of IC can be improved by adding the delay element into the IC.  11
MOVIC2014   Vol.X, No.X, XXXX  •   To improve the competitive performance of IC, adding the delay element may become more e ff ective than refining the resolutions. In future work, we plan to investigate a further order reduction of measurement based on time delayed embedding methods and to improve classification accuracy by applying machine learning techniques. We also plan to conduct com- petitions between humans and the proposed ICs. Moreover, application of position control to the carts will be considered to investigate competitive problems in a bounded area.  Acknowledgment  The authors appreciate the feedback o ff ered by Dr. Munehisa Sekikawa.  References  Bayadi, R., Banavar, R. N. and Chang, D. E., Characterizing the reachable set for a spacecraft with two rotors, Systems & Control Letters, Vol.62, No.6 (2013), pp.453–460. Duncan, P. W., Weiner, D. K., Chandler, J. and Studenski, S., Functional reach: A new clinical measure of balance, Journal of Gerontology, Vol.45, No.6 (1990), pp.M192–M197. Hofbauer, J. and Sigmund, K., Evolutionary Games and Population Dynamics (1998), Cambridge University Press. Hsu, P., Coordinated control of multiple manipulator systems, IEEE Transactions on Robotics and Automation, Vol.9, No.4 (1993), pp.400–410. Lhommeau, M., Jaulin, L. and Hardouin, L., Capture basin approximation using interval analysis, International Journal of Adaptive Control and Signal Processing, Vol.25, No.3 (2011), pp.264–272. Maravall, D., de Lope, J. and Dom ́ ınguez, R., Coordination of communication in robot teams by reinforcement learning, Robotics and Autonomous Systems, Vol.61, No.7 (2013), pp.661–666. Moore, M. and Wilhelms, J., Collision detection and response for computer animation, SIGGRAPH Computer Graphics, Vol.22, No.4 (1988), pp.289–298. Nakamura, Y., Nagai, K. and Yoshikawa, T., Mechanics of coordinative manipulation by multiple robotic mechanisms, Proceedings of 1987 IEEE International Conference on Robotics and Automation, Vol.4 (1987), pp.991–998. Nelson, A., Grant, E. and Henderson, T., Evolution of neural controllers for competitive game playing with teams of mobile robots, Robotics and Autonomous Systems, Vol.46, No.3 (2004), pp.135–150. Pagello, E., DAngelo, A., Montesello, F., Garelli, F. and Ferrari, C., Cooperative behaviors in multi-robot systems through implicit communication, Robotics and Autonomous Systems, Vol.29, No.1 (1999), pp.65–77. Panwar, V., Kumar, N., Sukavanam, N. and Borm, J., Adaptive neural controller for cooperative multiple robot manipula- tor system manipulating a single rigid object, Applied Soft Computing, Vol.12, No.1 (2012), pp.216–227. Stone, P. and Veloso, M., Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, Vol.8, No.3 (2000), pp.345–383. Wu, J., Sn ́ aˇ sel, V., Ochodkov ́ a, E., Martinoviˇ c, J., Svatoˇ n, V. and Abraham, A., Analysis of strategy in robot soccer game, Neurocomputing, Vol.109 (2013), pp.66–75. Yoshida, K., Fractal dependence on initial conditions of coupled inverted pendula model of competition and cooperation, Journal of System Design and Dynamics, Vol.3, No.6 (2009), pp.966–974. Yoshida, K. and Ohta, H., Coupled inverted pendula model of competition and cooperation, Journal of System Design and Dynamics, Vol.2, No.3 (2008), pp.727–737. ZMP INC., e-nuvo WHEEL (2011), http: // www.zmp.co.jp / e-nuvo / en / .  12