Multi-Objective Design of State Feedback Controllers Using Reinforced Quantum-Behaved Particle Swarm Optimization Kaveh Hassani∗and Won-Sook Lee† School of Computer Science and Electrical Engineering University of Ottawa, Canada Abstract In this paper, a novel and generic multi-objective design paradigm is proposed which utilizes quantum-behaved PSO(QPSO) for deciding the optimal configuration of the LQR controller for a given problem considering a set of competing objectives. There are three main contributions introduced in this paper as follows. (1) The standard QPSO algorithm is reinforced with an informed initialization scheme based on the simulated annealing algorithm and Gaussian neighborhood selection mechanism. (2) It is also augmented with a local search strategy which integrates the advantages of memetic algorithm into conventional QPSO. (3) An aggregated dynamic weighting criterion is introduced that dynamically combines the soft and hard constraints with control objectives to provide the designer with a set of Pareto optimal solutions and lets her to decide the target solution based on practical preferences. The proposed method is compared against a gradient-based method, seven meta-heuristics, and the trial-and-error method on two control benchmarks using sensitivity analysis and full factorial parameter selection and the results are validated using one-tailed T-test. The experimental results suggest that the proposed method outperforms opponent methods in terms of controller effort, measures associated with transient response and criteria related to steady-state. 1 Introduction Optimal control theory refers to the controller design patterns that simultaneously sat- isfy the physical constraints of the controlled process and optimize some predetermined performance criteria. The evolution of optimal control theory has led to the emergence of linear quadratic regulators (LQR) – an optimal multivariable feedback control approach that improves the stability and minimizes the excursion in state trajectories of a system while requiring minimum controller effort. Applying LQR technique to a controllable linear time-invariant (LTI) system results in a set of optimal feedback gains that mini- mizes a quadratic criterion and stabilizes the system [31]. The LQR approach has been utilized in vast variety of real world engineering applications such as but not limited to missile guidance [46,47], flight control [26], multiple spacecraft formation [50], control- ling unmanned vehicles [18], active car suspension [45], ABS break system [22], power ∗email: kaveh.hassani@uottawa.ca; Corresponding author †email: wslee@uottawa.ca The complete paper appears in Applied Soft Computing DOI: http://dx.doi.org/10.1016/j.asoc.2015.12.024 1 arXiv:1607.00765v1 [cs.NE] 4 Jul 2016 converters [38,40], active power filter [25], and tuning PID controllers [28]. Essentially, LQR controllers minimize a quadratic cost function also known as performance index that consists of two penalty matrices including state (Q) and control (R) weighting ma- trices. These two parameters are main design parameters to be selected by the designer and greatly influence the behavior of the controller. It is worth noting that it is not a trivial task to decide these two matrices. In general, deciding the LQR parameters for a given process is a continuous, multimodal and multi-objective optimization problem. For the simplicity purposes, in most of the applications the problem is modeled as a single-objective which limits the designer to only one configuration. In the proposed approach, the problem is considered as a multi-objective problem and as a result the designer is provided with a set of Pareto optimal solutions which lets her to decide the target solution based on some practical preferences. Traditionally, weighting matrices are determined by the trial-and-error method in which a domain expert adjusts the weighting matrices intuitively and then refines them iteratively to obtain a satisfactory performance. This method is not feasible for high- dimensional systems and even for simpler systems is labor-intensive and time-consuming. Bryson’s method [23] is another iterative method in which initial state and feedback variables are normalized with respect to their largest permissible and then are uti- lized to initialize the weighting matrices. Then, similar to trial-and-error method, the weighting matrices are gradually refined to approach the minimum index value. Pole placement [44] is another popular technique for determining the weighting matrices in which the matrices are decided based on the given poles. This approach neither guaran- tees a good performance nor the satisfaction of the constraints. Other approaches have been proposed as well such as utilizing asymptotic modal properties [15] and express- ing the system as an explicit function of the weighting matrix elements [57]. Yet they suffer from the similar deficiencies. These classic approaches are labor-intensive, time- consuming and do not guarantee the expected performance. They only aim to minimize the quadratic performance index and ignore the other competing or incommensurable control objectives such as minimizing the overshoot, rise-time, settling-time, and the steady-state error. In order to tackle these problems, some studies have utilized soft computing tech- niques including but not limited to particle swarm optimization (PSO) [56], artificial bee colony (ABC) [5], ant colony optimization (ACO) [9], genetic algorithm (GA) [61], differential evolution (DE) [34], memetic algorithm (MA) [64], artificial immune sys- tems (AIS) [43], imperialist competitive algorithm (ICA) [42], neural networks [60], and fuzzy systems [55]. These methods can explore the search hyperspace in an informed manner and converge to the optimal solutions in a few iterations using a combination of knowledge sharing and individual explorations. The main problem with most of the computational intelligence techniques is that they are prone to premature convergence which causes them to get trapped within the local optima. In this paper, a novel and generic multi-objective design paradigm is proposed which utilizes a global convergence guaranteed variation of particle swarm optimization (PSO) called quantum-behaved PSO (QPSO) for deciding the optimal configuration of the LQR controller for a given problem considering a set of competing objectives including the quadratic performance index, overshoot, rise-time, settling-time, steady-state error, and integrated absolute error. The proposed method is called reinforced multi-objective quantum-behaved PSO (RMO-QPSO). The rationale behind the selection of QPSO as the core optimizer is as follows. (1) The experimental studies suggest that PSO as the predecessor of QPSO outperforms GA, DE, MA, ACO, and ABC in terms of success 2 rate, solution quality and processing time [6, 11]. It is also experimentally shown that PSO is scalable, requires less computational resources, and its processing time grows at a linear rate with respect to the size of the problem [1]. (2) it is theoretically guaranteed that the QPSO converges to the global optimum; (3) it is less sensitive to the bias problem as it has only one parameter, and (3) it outperforms other PSO variations in finding the optimal solutions [51]. There are three main contributions introduced in this paper as follows. (1) The standard QPSO algorithm is reinforced with an informed initialization based on the simulated annealing and Gaussian neighborhood selection mechanism. (2) It is also augmented with a local search strategy which integrates the advantages of memetic algorithm into conventional QPSO. (3) Finally, an aggregated dynamic weighting crite- rion is introduced that dynamically combines the soft and hard constraints with control objectives to provide the designer with a set of Pareto optimal solutions and lets her to decide the target solution based on practical preferences. As far as the authors’ knowledge is concerned, this is the first time that a multi-objective derivation of PSO is applied for deciding the configuration of LQR controllers. Without loss of generality, RMO-QPSO is utilized to decide the configuration of LQR controllers for two control benchmarks including: (1) stabilizing an inverted pendulum system, and (2) controlling a flight landing system. In order to have comparative studies, nine different techniques (i.e. one trial-and-error based method, one gradient based technique, and seven stochastic meta-heuristics) including the trail-and-error method, Levenberg-Marquardt optimization (LM), GA, DE, ABC, PSO, QPSO, chaotic PSO (CPSO), and adaptive inertia weighted PSO (AIWPSO) are utilized to decide the LQR parameters of the same benchmarks. In order to mitigate the bias problem and have a fair comparison among the different techniques, full factorial parameter selection and sensitivity analysis [29] are exploited for all of the applied techniques to find the best parameter setting including population size, iteration number, etc. The comparative results are also validated using one-tailed T-test to investigate whether the experimental results are statistically significant. The paper is organized as follows. Section 2 provides the mathematical foundation of LQR controllers. In section 3 an overview of related works is presented. In section 4, we investigate the concept of multi-objective QPSO. In section 5, we present the proposed technique for optimal tuning of LQR controllers and in section 6 we discuss the experimental results. Section 7 concludes the paper. 2 Linear Quadratic Regulators The LQR is an optimal multivariable feedback control approach that improves the sta- bility and minimizes the excursion in the state trajectories of a system while requiring minimum controller effort. From a Mathematical point of view, for a controllable LTI system with a state-space model shown in 1, the LQR approach constructs a linear state feedback law as depicted in 2.  ˙x(t) = Ax(t) + Bu(t) y(t) = Cx(t) + Du(t) (1) u(t) = −Kx(t) (2) In these equations, x(t) denotes an n-dimensional state vector,y(t) presents an r- dimensional output vector, and u(t) is an m-dimensional control vector. K ∈ℜn×m is 3 the optimal state-feedback gain matrix. The control law in 2 minimizes the quadratic performance index shown in 3 which integrates the state and control energies through the time. In other words, it minimizes the distance between the process outputs and the desired outputs with minimum control energy. J = Z ∞ 0 (xT Qx + uT Ru)dt (3) In 3 Q ∈ℜn × n denotes a symmetric positive semi-definite state weighting (state penalty) matrix and R ∈ℜm × m denotes a symmetric positive definite control weight- ing (control penalty) matrix. The control gain matrix K is given by 4. K = R−1BT P (4) where P is a unique symmetric positive semi-definite solution to the algebraic Riccati equation shown in 5. PA + AT P + Q −PBR−1BT P = 0 (5) 3 Related Works One of the most successful yet less applied techniques for optimal designing of LQR controllers is PSO. The superiority of PSO over GA in finding optimal weighting matrices has been experimentally shown in some studies [13, 63]. In [37] PSO, GA and trial- and-error methods are utilized to adjust LQR weighting matrices which is applied to controlling an aircraft landing flare system. It is concluded that LQR design based on PSO is more efficient and robust compared to other methods. In [59] authors compared the performance of the ordinary LQR, the LQR with prescribed degree of stability (LQRPDS) and the PSO-based LQR in controlling distribution static compensator and showed that PSO-based design results in best performance under different operating conditions. In [49] a method is proposed to determine the weighting matrices using PSO with pole region constraint for controlling a flexible-link manipulator. In [37] it is suggested that PSO-based LQR produces better result compared to trial-and-error approach for the active suspension system. In [42] authors applied ordinary LQR, PSO-based LQR, AWPSO-based LQR and ICA-based LQR for optimal load frequency control and con- cluded that AWPSO-based LQR outperforms other approaches in terms of settling-time and maximum overshoot. A PSO-based optimal real-time LQR controller for stabilizing an inverted pendulum system is proposed in [14]. In [3] wavelet-PSO is proposed for tuning LQR controllers and is applied to an optimal structural control. In [24] it is con- cluded that contrary to trial-and-error approach, a PSO-based state feedback controller does not have sub-optimal performance in case of partial state feedback. Hassani et al. [17] proposed a quantum-behaved PSO algorithm for tuning a given LQR controller. They exploited a conventional weighted aggregation of control objectives which can only provide the designer with one solution. More studies on PSO-based LQR design can be found in [26,48,56,62]. They are a few studies that exploit multi-objective design for tuning LQR controllers. In [33] a multi-objective evolutionary algorithm is presented to control a double inverted pendulum system. They concluded that multi-objective approach results in shorter ad- justing time and smaller amplitude value deviating from steady-state in comparison 4 with the pole assignment design approach. [36] also applies a multi-objective evolution- ary algorithm to stabilize a double inverted pendulum system. Their results suggest that multi-objective approach results in shorter adjusting time and smaller amplitude value deviating from steady-state in comparison to non-dominated sorting GA. As far as the authors’ knowledge is concerned, it is the first time that a multi-objective derivative of the PSO algorithm is applied for optimal design of LQR controllers. 4 Multi-Objective Quantum-Behaved PSO 4.1 Multi-Objective Optimization MOO is a common practice in engineering applications and refers to optimizing two or more competing, confronting or incommensurable objective functions simultaneously. In contrast to single-objective problems in which optimal solution is well-defined, in multi-objective problems it is not possible to reach the global optimum for all objectives at the same point. Instead there is a whole set of optimal trade-offs which forms the solution set called Pareto optimal set. Although it is possible to treat a problem with multi-objective nature as a single-objective problem, it is not an efficient approach due to two main reasons: (1) the objectives must be aggregated to a single weighted objective, and (2) the optimizer must run for several times to obtain the Pareto solutions. A MOO problem over an n-dimensional search space X is defined as finding the vector X∗= [x∗ 1, x∗ 2, . . . , x∗ n] ∈X such that:  Minimize fi(x), i = 1 . . . k Subject to gj(x) ⩽0, j = 1 . . . m (6) fi(x) and gj(x) are vectors of k objective functions and m constraints defined over X, respectively. Mostly it is not possible to find such a global and acceptable solution for all conflicting objectives. A feasible solution is to find a set of solutions in a way that each solution satisfies 6 without being dominated by other solutions. A solution u = [u1, . . . , uk] is said to dominate solution v = [v1, . . . , vk] if and only if fi(u) ⩽ fi(v)∀i = 1 . . . k and fi(u) < fi(v) for at least one objective function and is denoted by u ≺v. A feasible solution x is Pareto optimal if and only if there is no other solution that dominates x. The set of all Pareto optimal solutions is referred as Pareto optimal set and the set of corresponding objective function values is called the Pareto front. The goal of a given MOO method is to find the Pareto optimal set which results in the best possible Pareto front regarding the given objectives and constraints. 4.2 Particle Swarm Optimization The PSO algorithm [10] is a population-based optimization technique that explores the search space by simulating the choreographed motion of the birds based on socio- cognitive characteristics of flocking individuals. A PSO algorithm consists of a swarm of |S| particles each containing a position and a velocity vectors and a scalar fitness value. For a D-dimensional objective function, the position and velocity vectors of each particle are D-dimensional as well. The position vector represents the position of the particle within the D-dimensional search space and encodes a potential solution to the optimization problem at hand. The velocity vector determines the direction and mag- nitude of the changes in position vector through time. Each particle shares information with other particles through the best position seen so far in the swarm history and 5 follows its trajectory toward the global optimum based on Newtonian mechanics. A domain-dependent fitness function is used to determine how strong a given particle is (i.e. how close the particle is to the global optimum). The velocity vector of the particle i in a D-dimensional space is updated in through time using 7. ⃗vi(t + 1) = w(t) ×⃗vi(t) + cp × ⃗rp ⊗(⃗xi,pbest(t) −⃗xi(t)) + cg × ⃗rg ⊗(⃗xgbest(t) −⃗xi(t)) (7) ⃗vi(t) denotes the D-dimensional velocity vector of the particle i whose values are defined within the range of [vmin, vmax]. w(t) is the inertia weight that controls the momentum of the particle by weighting the contribution of its previous velocity values. The inertia weight is equivalent to an infinite impulse response low pass filter. cp and cg are two positive parameters known as cognitive and social learning rates that determine the weight of cognitive aspects and social pressure in decision making process, respec- tively. rp and rg are D-dimensional vectors of randomly generated values by a uniform distribution in the range of [0, 1]. ⊗denotes point-wise vector multiplication operator. ⃗xi,pbest is the best position seen by the particle i up to time t (pbest), xgbest(t) is the best position seen by whole swarm up to time t (gbest), and ⃗xi(t) is the position vector of particle i in time t. The inertia factor is computed by 8. w(t) = wmax −(wmax −wmin) × t/T (8) wmax and wmin are maximum and minimum values of inertia, respectively. t and T denote current iteration and maximum number of iterations, respectively. The position vector of particle i is updated using 9. ⃗xi(t + 1) = ⃗xi(t) + ⃗vi(t + 1) (9) Without loss of generality, the best personal and the best global positions for a minimization problem are computed by 10 and 11, respectively. In these equations, f denotes the fitness function (i.e. performance index or cost function). ⃗xi,pbest(t + 1) =  ⃗xi,pbest(t), f (⃗xi(t + 1)) ≥f (⃗xi,pbest(t)) ⃗xi(t + 1), f (⃗xi(t + 1)) ≤f (⃗xi,pbest(t)) (10) ⃗xgbest(t + 1) = arg min ⃗xi,pbest f (⃗xi,pbest (t + 1)) (11) The PSO algorithm has been exploited in optimizing various engineering problems such as electric power systems [2], wireless-sensor networks [27], image processing [4,35], robotics [21,41], to name only a few. 4.3 Quantum-behaved PSO The main drawback of the PSO algorithm is that it does not guarantee the global convergence and is prone to the premature convergence [58]. The quantum-behaved PSO (QPSO) [52–54] is a global convergence guaranteed optimizer that belongs to the bare-bones PSO (BBPSO) category in which particles do not have a velocity vector and their position is updated by sampling a prob-ability distribution of interest. The QPSO was inspired by quantum mechanics and trajectory analysis of PSO [51]. In [7], the trajectory analysis of the particles showed that each particle oscillates around its local attractor. It was also mathematically shown that if the upper limits of cognitive and 6 social learning rates are selected properly, each particle will converge toward its local attractor. The local attractor is interpreted as the center of gravity toward which the particle careens while declining its kinetic energy. If the search space is stationary (i.e. which is the case in most of the practical applications) there will be no periodic orbits also known as unstable equilibria in the search hyperspace. Although satisfying this condition guarantees the convergence, there is no systematic way to select those limits. The local attractor of the ith particle denoted by pi is computed using 12. ⃗pi(t) = (cp × ⃗xi,pbest(t) + cg × ⃗xgbest(t)) cp + cg (12) Contrary to PSO algorithm in which the particles follow their trajectory toward their local attractor based on Newtonian mechanics, in QPSO algorithm particles obey the quantum mechanics in which trajectory is not a valid concept. It is assumed that particles are attracted to a quantum potential field which is centered on their local attractors. In QPSO, the state of a particle is expressed by its wave function Ψ (x, t). The probability of particle i being in position xi is computed by probability density function |Ψ (x, t) |2 whose form depends on the potential field in which the particle is moving. One possible model is quantum Delta potential well model [52]. In this model, it is assumed that a particle moves in a Delta potential well field in the search space with center p calculated by 12. In the quantum model, only the probability density function of the particle position is available while for computing the objective function the exact position of the particle is required. To address this issue,Monte Carlo sampling method is used to simulate the measure-ment process from the wave function which estimates the position of the ith particle as expressed by 13. ⃗xi(t + 1) = ⃗pi(t) ± |⃗xi(t) −⃗pi(t)| ⊗ln (1 \ ⃗u) g (13) In this equation, pi(t) denotes the local attractor of particle i computed using 12, u is a D-dimensional randomly generated vector following a uniform distribution in the range of [0, 1], and g is a control parameter greater than ln √ 2. The QPSO algorithm is summarized in Algorithm 1. |T| and |S| denote the number of iterations and swarm size,respectively. U[0, 1] is a uniform random number in the range of [0, 1] and f(x) is an arbitrary objective function. The QPSO algorithm has some advantages over other variations of the PSO algorithm. It is less sensitive to the bias problem as it has only one parameter to be tuned. It is theoretically guaranteed that the QPSO converges to the global optimum. It is also experimentally shown that the QPSO algorithm outperforms other PSO variations in finding the optimal solutions and has stronger exploring capabilities [51]. For more comprehensive evaluations on the QPSO algorithm, please refer to [51]. The QPSO algorithm has been successfully applied to a vast variety of engineering problems such as but not limited to system identification [12], image processing [30], power systems [65], neural network training [32], brain-computer interfacing [16], etc. 5 Proposed Approach 5.1 Individual Representation In order to encode the problem of deciding the optimal weights of a given LQR, each particle is considered as a potential solution that encodes the elements of weighting 7 Algorithm 1: The standard QPSO algorithm. Result: returns the position vector of the global best particle begin Initialize the current positions randomly for t = 1 to T do for i = 1 to |S| do Calculate fitness f (⃗xi(t)) Update personal best (⃗xi,pbest(t)) using 10 Update global best (⃗xgbest(t)) using 11 cp, cg ∼U[0, 1] Compute the local attractor ⃗pi(t) using 12 for d = 1 to |D| do u ∼U[0, 1] L = (1/g) × |⃗xi,d(t) −⃗pi,d(t)| ⃗xi,d(t) = ⃗pi,d(t) −L × ln (1/u) with probability 0.5 otherwise ⃗xi,d(t) = ⃗pi,d(t) + L × ln (1/u) return ⃗xgbest matrices into its position vector. The state and control weighting matrices consist of n2 and m2 elements respectively which results in the total number of n2 + m2 elements to be decided. Finding the optimum solution within a (n2 + m2)-dimensional space is a challenging task due to the curse-of-dimensionality effect (e.g. 250 features for a system with 15 state and 5 control variables). A practical solution which is widely exploited in the real world applications is to define both matrices in diagonal form to reduce the search space to (n+m) dimensions. Hence, as shown in 14, the particles’ position vector is defined as an (n+m) dimensional vector that represents the concatenation of diagonal state and control weighting matrices. ⃗xi(t) = [xi,1(t), . . . , xi,n+m(t)] = [Q1,1(t), . . . , Qn,n(t), R1,1(t), . . . , Rm,m(t)] (14) This simple trick reduces the number of features from n2 + m2 to n + m and as a result enhances the exploration capabilities and reduces the computational costs while reserving the accuracy. 5.2 Evaluation Criteria To decide the evaluation criteria, the optimization goals are categorized to hard con- straints and soft objectives. The former category refers to those constraints that must be satisfied. If a solution does not satisfy a given hard constrain, it is considered as an infeasible solution. On the other hand, those solutions that satisfy the hard con- straints are feasible solutions that compete within the feasible sub-space to reach the better regions by optimizing the soft objectives. According to the definition in Section 2, Q and R matrices must be symmetric positive matrices and therefore those solutions with negative values are considered as infeasible solutions. This is the only hard con- straint addressed in the proposed approach. The infeasible solutions are treated using a stochastic approach inspired by local search strategy in MA [20]. These solutions are 8 either repaired with probability Pr or debilitated by being penalized with probability (1Pr). The repair probability is defined by 15. Pr (⃗xi(t)) = 1 −   n+m X j=1 H (xi,j(t)) n + m   (15) In this equation, xi(t) denotes the position vector of particle i in time t, n and m represent the cardinality of diagonal Q and R matrices respectively and H(.) is the Heaviside step function that maps negative values to zero and positive values to one. The repair probability decreases proportional to the number of negative values within the position vector of a given particle. The repair strategy is thoroughly discussed in Section 5.3. Three classes of soft objectives are considered as follows. (1) A given LQR controller must minimize the quadratic performance index J defined by 3. (2) An optimal con- troller also should minimize the time-domain control objectives that correspond to the transient response. The transient response of a system may be described using two com- peting factors including the swiftness and the closeness of the response to the desired response. The swiftness of the response is measured by the rise-time Tr. The closenes- sis measured by the overshoot OS and settling-time Ts. (3) The steady-state behavior of the system is addressed by minimizing the steady-state error Ess. Furthermore, in order to augment the multi-objective aspect of the evaluation criteria, each particle is rewarded using16 which is the difference between the number of solutions that are dominated by that particle and the number of solutions that dominate the particle. R (⃗xi) =   X ∀x∈S,x̸=xi xi ≺x  −   X ∀x∈S,x̸=xi xi ≻x   (16) The domination reward is normalized using a sigmoid function and aggregated with the violation penalty into a uniform weighted penalty-reward factor defined by 17. fP−R (⃗xi) =  1 1 + e−|S|×R(⃗xi)  × Pr (⃗xi(t)) × φ (17) In this equation, fP−R denotes the penalty-reward factor and φ denotes the penalty for violating the hard constraint. In order to aggregate the soft objectives in a multi- objective fashion, dynamic weighted aggregation (DWA) method in exploited in which the gradual changes (set by the change frequency F) of the weights force the particles to keep moving on the Pareto front. For a problem with two competing objectives, the dynamics of the weights is defined by 18. w1(t) = | sin (2πt/F) |, w2(t) = 1 −w1(t) (18) It has been shown that DWA outperforms bang-bang weighted aggregation (BWA) in case of convex Pareto front and is identical to BWA with concave Pareto front. DWA is also competitive with population-based non-Pareto approaches such as VEGA and VEPSO in terms of optimality while outperforms them in terms of efficiency [39]. The aggregated multi-objective criterion based on DWA is defined as shown in 19. f (⃗xi) = fP−R (⃗xi) × ((w1(t) × (log10(J) + Ess)) + ((w2(t) × (OS + Ts −TR)))) (19) 9 As shown in 19, the quadratic performance index is normalized using a logarithmic operation. This is because the quadratic performance index is mostly much bigger than other elements in real world applications. The dynamically aggregated objective function shown in 19 lets the designer to decide the priority between minimizing the control effort and steady-state response, and optimizing the transient response while it automatically satisfies the hard constraint of the system. In other words, it provides the designer with a set of Pareto optimal solutions and lets her to decide the target solution based on practical preferences. 5.3 Optimization process The standard QPSO algorithm introduced in [52] is augmented with two mechanisms in- cluding informed initialization and repairing strategy. Mostly, population-based heuris- tics construct an initial population of random individuals and then exploit an iterative optimization to push the population toward the global optima.The informed initializa- tion refers to initializing the population based on a priori knowledge and has been shown that is more efficient than naive and random initialization [8]. In order to reinforce the algorithm with informed initialization, simulated annealing search [19] is exploited which is a fast search strategy that can escape the local optima due to its stochastic nature. This algorithm optimizes the problem at hand by emulating the cooling process of a solid until it reaches the minimum energy configuration. In this algorithm, in each step a random neighbor of the current state is chosen. In case that the selected neigh- bor is fitter than the cur-rent state, the search continues exploring from the neighbor state.Otherwise, it moves to the neighbor state with a probability proportional to the current temperature and the difference between the fitness values of the current and the neighbor states. This prob-ability decreases as the temperature reduces through the time. The proposed informed initialization based on simulated annealing is shown in Algorithm 2. Algorithm 2: The informed initialization strategy. Result: returns a diverse and semi-optimal set of solutions begin Initialize population:⃗xi ∼U[0, Xmax], i = 1 . . . |S| for i = 1 to |S| do w1, w2 ∼U[0, 1] for t = 1 to Tini do for d = 1 to D do Succd (⃗xi) = min (max (N (⃗xi,d, σ) , 0) , Xmax) with probability pSucc Calculate f (⃗xi) and Succ (⃗xi) using 14 with current w1 and w2 ∆E = f (⃗xi) −f (Succ (⃗xi)) T = e−αt if ∆E > 0 then f (⃗xi) ←f (Succ (⃗xi)) else f (⃗xi) ←f (Succ (⃗xi)) with probability e∆E/T return ⃗xi, i = 1 . . . |S| 10 The neighbor state (Succ(xi)) is selected using Gaussian mutation operator adopted from evolutionary computation in which N(x, σ) is a normal distribution of random variable x with standard deviation of σ. The simulated annealing search is performed for Tini iterations on each particle which is fairly smaller than the number of iterations within the main optimizer. This process results in an initial swarm that is close to the Pareto optimal. It is noteworthy that to provide the initial swarm with a good degree of diversity,the aggregation weights are set randomly for each particle rather than using 18. The repair strategy is introduced to map the infeasible solutions into the feasible sub- space. As mentioned in Section 5.2, a particle with negative values within its position vectors violates the described hard constraint and is considered as an infeasible solution. A naive repair strategy is to randomly replace the negative values with random positive values. A more profound strategy is to exploit the strong feasible solutions to guide the infeasible particles to those regions of feasible sub-space that are more likely to be close to the Pareto optimal. For this purpose, the cross-over and tournament selection operations are adopted from evolutionary computations. The repair strategy is as follows. For each infeasible particle, two feasible particles that are strongest in a randomly selected neighborhood are selected. Then, for each negative value embedded within the infeasible particles, the corresponding values of two selected particles are combined in a linear weighted manner. The negative value of the particle is then overwritten by the resulted value. The algorithm of repair strategy is shown in Algorithm 3. Algorithm 3: The repair strategy to map the infeasible particles to the feasible sub-space. Data: ⃗xi : position of infeasible particle i Result: returns a feasible solution begin ⃗p1, ⃗p2 ←Tournament-Selection() for d = 1 to D do λ ∼U[0, 1] if ⃗xi,d < 0 then ⃗xi,d = λ⃗p1,d + (1 −λ) ⃗p2,d with probability 0.5 otherwise ⃗xi,d = (1 −λ) ⃗p1,d + λ⃗p2,d return ⃗xi Using informed initialization and repair strategy, we introduce our proposed rein- forced MO-QPSO algorithm for optimal tuning of LQR controllers in Algorithm 4. In this algorithm, |T|, |S|, and D denote the number of iterations, swarm size, and the number of diagonal elements of the weighting matrices, respectively. As shown, the proposed algorithm returns all non-dominant particles which build the Pareto optimal set. These Pareto optimal solutions are then presented to the designer and she decides the final solution based on some domain-dependent and practical considerations and her intuitions and expertise. The schematic of the integration of proposed RMO-QPSO algorithm into the closed-loop system is illustrated in Figure 1. 11 Algorithm 4: Proposed reinforced RMO-QPSO algorithm for optimal tuning of LQR controllers. Result: returns set of optimal diagonal Q and R matrices begin Informed-Initialization(): Algorithm 2 for t = 1 to |T| do for i = 1 to |S| do Calculate P from Riccati equation: 5 Calculate the feedback gain K: 4 Simulate the closed-loop system: 2 Calculate fitness f (⃗xi(t)): 19 Repair (⃗xi(t)) with probability Pr (⃗xi(t)) : 15 and Algorithm 3 Update personal best ⃗xi,pbest(t): 10 Update global best ⃗xgbest(t): 11 cp, cg ∼U[0, 1] Compute the local attractor using ⃗pi(t): 12 for i = 1 to |S| do u ∼U[0, 1] L = (1/g) × |⃗xi,d(t) −⃗pi,d(t)| ⃗xi,d(t) = ⃗pi,d(t) −L × ln (1/u) with probability 0.5 otherwise ⃗xi,d(t) = ⃗pi,d(t) + L × ln (1/u) return all non-dominant particles Plant B A C D RMO-QPSO K + + + y x u ẋ + - ur + Figure 1: Schematic of the proposed RMO-QPSO within the closed-loop system. 12 θ lcosθ l u x y lsinθ mg M w h u q θ µ δ (a) (b) Figure 2: Schematics of evaluation benchmarks. (a) Inverted pendulum system with two degrees of freedom. (b) Aircraft landing flare system. 6 Experimental results 6.1 Modeling Without loss of generality, two classic control benchmarks are used for evaluating the introduced method. These benchmarks are an inverted pendulum system with two degrees of freedom and an aircraft landing flare system. The schematic of these two systems is illustrated in Figure 2. The inverted pendulum system consists of a pendulum mounted on a movable cart which is restricted to linear motion. The control objective is to maintain the unstable equilibrium position by moving the cart along a horizontal track. The pendulum initially starts in an upright position with two acting forces including a vertical gravity force (mg) and an external horizontal force u. The state space representation of the system derived from Lagrange equation and linearized using Taylor series expansion is depicted in 20.   ˙x ˙υ ˙θ ˙ω  =   0 1 0 0 0 0 −mg M 0 0 0 0 1 0 0 g(m+M) lM 0  ×   x υ θ ω  +   0 1 M 0 −1 lM  ×  u  (20) In this equation, x denotes the position of the cart and v is its linear velocity, θ is the angle of the pendulum with respect to the vertical direction, and ω is its angular velocity. The simulation parameters are set as follows. The mass of the cart and pendulum are set to 0.5 kg and 0.2 kg, respectively. The length of the pendulum and the gravitational acceleration are set to 0.6 m and 9.81 ms2, respectively. The initial condition of the system is defined as [x(0)υ(0)θ(0)ω(0)]T = [0009]T . The second system is an aircraft landing flare system with the simplified motion shown in 21. 13   ˙u ˙w ˙q ˙θ ˙h ˙e   =   0.058 0.065 0 0.171 0 1 0.303 0.685 1.109 0 0 0 0.072 0.685 0.947 0 0 0 0 0 1 0 0 0 0 −1 0 1.133 0 0 0 0 0 0 0 0.571   ×   u w q θ h e   +   0 0 0.119 0.054 0 0.074 1.117 0 0.115 0 0 0 0 0 0 0 0.571 0   ×   µ γ δ   (21) In this equation, u denotes the longitudinal velocity, w presents the vertical velocity, q is the angular velocity about pitch with respect to the ground, θ denotes the pitch with respect to the ground, h is the height, e is the forward acceleration caused by throttle action, µ is the elevator angle, δ is the spoiler angle, and γ is the throttle value. The simulated landing trajectory is described by h′ + 0.2h = 0 and the initial condition is defined as [u(0)w(0)q(0)θ(0)h(0)e(0)]T = [52.513150.5]T . Considering the dynamics of this system, we use the integrated absolute error(IAE) measure rather than steady-state error. Hence, in 19 we replace the Ess term with IAE measure. Using landing trajectory and considering h′ = w + 1.133 × θ from 21, the IAE measure is computed using 22. IAE = Z tf 0 |e(t)|dt = Z tf 0 |w(t) + 1.133 × θ + 0.2 × h(t)|dt (22) 6.2 Experimental Setup with Sensitivity Analysis In order to conduct comparative studies on the performance of the proposed algorithm, the two benchmark systems are stabilized using the trail-and-error, LM, GA, DE, ABC, PSO, CPSO, AIWPSO, and RMO-QPSO methods. In order to eliminate the effect of bias (i.e.sensitivity of the meta-heuristics to their parameter setting) in the comparisons, sensitivity analysis based on full factor design [29] is utilized. Although full factorial design is not feasible for systems with high degrees of freedom such as NN but it is completely tractable for most of the meta-heuristics as there are a few parameters to be decided. In order to carry out this task, each system is stabilized using a fixed set of parameter values for five times and the performance criterion is averaged. This process is repeated for all possible parameter settings and as soon as all the settings are evaluated, the setting that has resulted in the best performance is selected as the parameter setting for the meta-heuristic at hand. In order to generate the all possible parameter settings, a systematic and incremental approach is utilized. For each given parameter, a predetermined range is defined and then starting from the lower boundary, the value of the parameter is incremented by a defined step size until it reaches the upper boundary. These ranges and step sizes are defined as follows. The range of iterations T for all competing methods except trail-and-error is set to [25,150] with step size of 25. For LM optimizer, the sensitivity analysis is performed on the blending factor λ within the range of [5,15] and step size of 1. For population-based meta-heuristics, the range of population size |S| is set to[10,100] with increment factor of 10. For the GA, single point uniform cross-over, tournament selection mechanism with the neighborhood radius of 5 chromosomes, and Gaussian mutation operators are utilized. The cross-over probability pc is defined in the range of [0.3, 0.9] with step size of 0.1 and the mutation probability pm is defined in the range of [0.05, 0.3] with the step size of 0.05. Considering that contrary to GA, DE mostly relies on the mutation operator, the parameter setting for DE is define by switching the ranges of the cross-over 14 Table 1: Acquired parameter settings for simulations by sensitivity analysis. Method Parameter setting Trail-error — LM λ ←10, T ←150 GA |S| = 70, T ←150, pc ←0.9, pm ←0.1 DE |S| = 50, T ←150, pc ←0.15, pm ←0.8 ABC |S| = 30, T ←175 PSO |S| = 40, T ←50, cp ←0.7, cg ←1.5 CPSO |S| = 20, T ←175 AIWPSO |S| = 40, T ←70, wmin ←0.05, wmax ←0.95 MO-QPSO |S| = 20, T ←75, g ←1.5ln √ 2 and mutation probabilities used in GA. The ABC setting is as follows. The percentage of onlooker bees is set to 50% of the colony, the employed bees are set to 50% of the colony and the number of scout bees is selected as one. In the PSO algorithm, the range of the both cognitive and social parameters is set to [0.5, 2.0] with a step size of 0.1. The CPSO and AIWPSO settings are identical to the PSO setting. The chaotic sequence ofCPSO is generated by logistic maps. Finally, the control parameter of RMO-QPSO g is set to the range of [0,1]. As suggested in literature [52], the control parameter is updated by coefficient of ln √ 2.The results of parameter settings for different optimizers using full factorial sensitivity analysis are summarized in Table 1. As shown, the best parameter settings for two benchmarks are selected using full factorial design. These acquired parameter settings will mitigate the bias sensitivity of the opponent methods and will result in fair comparisons. 6.3 Simulation Results The nine competitive methods are applied to stabilize the two benchmarks based on the parameter settings decided by sensitivity analysis. The trial-and-error method and the MA optimizer are run for one time as they do not have stochastic nature. On the other hand, the stochastic meta-heuristics are run for fifty times to approximate their behavior based on the acquired averages and standard deviations. The results are shown in Table 2. As shown in Table 2, in case of inverted pendulum system,results indicate that all of the methods have successfully eliminated the steady-state error. In terms of overshoot the trial-and-error out-performs other methods. This is due to the slow response of the designed controller by expert. In terms of the rise-time RMO-QPSO and CSPO outperform other methods where as in terms of settling-time RMO-QPSO reaches the steady state faster than others. Also,RMO-QPSO and DE reach the best quadratic performance index.In controlling the flight landing system, RMO-QPSO outperforms other techniques in terms of integrated absolute error, overshoot,rise-time and settling time. In terms of minimizing the quadratic index CPSO and RMO-QPSO perform best. The results suggest in the inverted pendulum system, RMO-QPSO enhances the rising time, settling time, and the quadratic performance index by 4%, 8%, and 2%, respectively in comparison to the second best optimizer. Considering the fairly fast dynamics of the inverted pendulum system, these enhancements can boost the stabi- lization process in turn. Also, the results suggest that for the flight landing system, 15 Table 2: Acquired results from stabilizing two benchmark problems (inverted pendulum on the left and flight landing on the right). Optimizer Ess OS Tr Ts J IAE OS Tr Ts J Trial-error Mean 0 0.23 0.46 2.09 26,000 31.45 0.22 9.25 18.23 24,500 SD – – – – – – – – – – LM Mean 0 0.51 0.49 1.84 24,200 26.08 0.21 9.03 14.32 26,700 SD – – – – – – – – – – GA Mean 0 0.85 0.54 1.19 22,000 20.32 0.19 6.53 12.31 24,200 SD 0 0.06 0.07 0.12 740.9 4.58 0.07 1.46 2.34 512.5 DE Mean 0 0.62 0.46 1.11 20,500 14.35 0.11 2.51 11.22 24,220 SD 0 0.04 0.04 0.08 230.5 5.26 0.04 0.84 0.98 251.3 ABC Mean 0 0.43 0.47 1.13 20,570 10.78 0.17 4.76 11.84 20,550 SD 0 0.02 0.06 0.08 268.0 1.56 0.04 0.79 1.87 98.5 PSO Mean 0 0.89 0.48 1.11 21,600 11.21 0.18 5.23 11.97 22,000 SD 0 0.07 0.09 0.06 231.6 3.35 0.06 0.62 1.36 112.0 CPSO Mean 0 0.46 0.43 1.15 20,700 10.06 0.19 4.89 11.93 19,300 SD 0 0.02 0.02 0.07 212.3 3.24 0.11 0.97 0.76 104.3 AIWPSO Mean 0 0.59 0.47 1.20 20,850 9.64 0.15 3.91 11.46 20,580 SD 0 0.07 0.05 0.07 225.0 2.01 0.05 1.12 1.12 115.6 RMO-QPSO Mean 0 0.67 0.43 1.02 20,500 8.89 0.07 1.42 7.27 19,300 ± 0 0.10 0.09 0.70 220.1 6.21 0.23 1.03 2.32 472.1 RMO-QPSO enhances the integrated absolute error, over-shoot, rising time, settling time, and the quadratic performance index by 8%, 50%, 70%, 50%, and 6%, respec- tively. It is noteworthy that due to slower dynamics of the flight landing system, the proposed algorithm has resulted in greater enhancements in comparison with the in- verted pendulum system. All in all, the results suggest that RMO-QPSO outperforms trail-and-error,gradient-based and stochastic techniques. It is also shown that ABC,DE, CPSO and AIWPSO achieve better performance in comparison to trial-and-error, ML, GA and PSO. In order to prove that the results shown in Table 2 carry statistical significance, pairwise one-tailed t-test is performed between the results of the RMO-QPSO and the other stochastic methods. The results of the t-test are summarized in Table 3. In a standard t-test, the null hypothesis is rejected if the corresponding p-value is less than 0.05. As shown in Table 3,the t-tests performed among the results of different objective of RMO-QPSO and opponent stochastic methods reveals that all p-values are smaller than the threshold value. In other words, all the performed t-tests reject the null hypothesis and hence the simulation results are in fact statistically valid. The responses of the pendulum’s angle and its angular velocity to the initial condi- tions are depicted in Figure 3. The responses of the spoiler angle, elevator angle, and throttle value of the flight landing system to the initial conditions are illustrated in Fig- ure 4. It is noteworthy that for the purpose of readability, the diagrams only illustrate the responses of the systems under the control of four best performed methods. 16 Table 3: Acquired results from stabilizing two benchmark problems (inverted pendulum on the left and flight landing on the right). Optimizer OS Tr Ts J IAE OS Tr Ts J GA p-value 0.02 0.01 0.00 0.00 0.00 0.01 0.00 0.02 0.01 DE p-value 0.01 0.00 0.00 0.00 0.03 0.00 0.02 0.00 0.01 ABC p-value 0.00 0.02 0.04 0.03 0.00 0.03 0.02 0.04 0.00 PSO p-value 0.00 0.02 0.00 0.00 0.02 0.04 0.00 0.03 0.00 CPSO p-value 0.00 0.00 0.04 0.01 0.00 0.03 0.01 0.00 0.04 AIWPSO p-value 0.03 0.04 0.00 0.00 0.02 0.00 0.00 0.01 0.00 7 Conclusion In this paper, a novel and generic multi-objective design paradigm is proposed which utilizes a global convergence guaranteed variation of particle swarm optimization (PSO) called quantum-behaved PSO (QPSO) for deciding the optimal configuration of the LQR controller for a given problem considering a set of competing objectives. There are three main contributions introduced in this paper as follows. (1) The standard QPSO algorithm is reinforced with an informed initialization scheme based on the sim- ulated annealing algorithm and Gaussian neighborhood selection mechanism. (2) It is also augmented with a local search strategy which integrates the advantages of memetic algorithm into conventional QPSO. (3) Finally, an aggregated dynamic weighting crite- rion is introduced that dynamically combines the soft and hard constraints with control objectives to provide the designer with a set of Pareto optimal solutions and lets her to decide the target solution based on practical preferences. As far as the authors’ knowledge is concerned, this is the first time that a multi-objective derivation of PSO is applied for deciding the configuration of LQR controllers. The proposed method is compared against a gradient-based method, seven meta- heuristics, and the trial-and-error method on two control benchmarks including inverted pendulum system and flight landing system using sensitivity analysis and full factorial parameter selection and the results are validated using one-tailed T-test. The exper- imental results suggest that in inverted pendulum system, RMO-QPSO enhances the rising time, settling time, and the quadratic performance index by 4%, 8%, and 2%, respectively in comparison to the second best optimizer. Also, the results suggest that for the flight landing system, RMO-QPSO enhances the integrated absolute error, over- shoot, rising time, settling time, and the quadratic performance index by 8%, 50%, 70%, 50%, and 6%, respectively. All in all, the results suggest that RMO-QPSO outperforms trail-and-error, gradient-based and stochastic techniques. It is also shown that ABC, DE, CPSO and AIWPSO achieve better performance in comparison to trial-and-error, ML, GA and PSO. A possible direction for future studies is to enhance the RMO-QPSO in a way that it can address the online optimization of the LQR based on the signal stream. 17 1.0 0.6 0.4 0.2 0 -0.2 0.8 Angle Time 0 1 2 3 4 5 6 DE CPSO ABC MO-QPSO 10 6 4 2 0 -2 8 Angular velocity Time 0 1 2 3 4 5 6 DE CPSO ABC MO-QPSO (a) (b) Figure 3: The responses of inverted pendulum system to the initial conditions. (a) Response of the angle. (b) Response of the angular velocity. -5 -10 -15 -20 -25 0 Spoiler angle Time 0 5 10 15 20 25 30 ABC AIWPSO DE MO-QPSO -7 0 -1 -2 -3 -4 2 1 -5 -6 Elevator angle Time 0 5 10 15 20 25 30 ABC AIWPSO DE MO-QPSO 10 5 0 -5 15 Throttle value Time 0 5 10 15 20 25 30 (a) (b) (c) ABC AIWPSO DE MO-QPSO Figure 4: The responses of the flight landing system to the initial conditions. (a) Response of the spoiler angle. (b) Response of the elevator angle. (c) Response of the throttlevalue. 18 References [1] Bahriye Akay. A study on particle swarm optimization and artificial bee colony algorithms for multilevel thresholding. Applied Soft Computing, 13(6):3066–3091, 2013. [2] Mohammed R AlRashidi and Mohamed E El-Hawary. A survey of particle swarm optimization applications in electric power systems. IEEE Transactions on Evolu- tionary Computation, 13(4):913–918, 2009. [3] Fereidoun Amini, N Khanmohammadi Hazaveh, and A Abdolahi Rad. Wavelet pso- based lqr algorithm for optimal structural control using active tuned mass dampers. Computer-Aided Civil and Infrastructure Engineering, 28(7):542–557, 2013. [4] Akhilesh Chander, Amitava Chatterjee, and Patrick Siarry. A new social and mo- mentum component adaptive pso algorithm for image segmentation. Expert Systems with Applications, 38(5):4998–5004, 2011. [5] Sun Changhao and Haibin Duan. Artificial bee colony optimized controller for unmanned rotorcraft pendulum. Aircraft Engineering and Aerospace Technology, 85(2):104–114, 2013. [6] Pinar Civicioglu and Erkan Besdok. A conceptual comparison of the cuckoo-search, particle swarm optimization, differential evolution and artificial bee colony algo- rithms. Artificial Intelligence Review, 39(4):315–346, 2013. [7] Maurice Clerc and James Kennedy. The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE transactions on Evolu- tionary Computation, 6(1):58–73, 2002. [8] Dipankar Dasgupta, German Hernandez, Deon Garrett, Pavan Kalyan Vejandla, Aishwarya Kaushal, Ramjee Yerneni, and James Simien. A comparison of multi- objective evolutionary algorithms with informed initialization and kuhn-munkres algorithm for the sailor assignment problem. In Proceedings of the 10th annual conference companion on Genetic and evolutionary computation, pages 2129–2134. ACM, 2008. [9] Ali Douik, Liouane Hend, and Hassani Messaoud. Optimised eigenstructure assign- ment by ant system and lqr approaches. IJCSA, 5(4):45–56, 2008. [10] Russ C Eberhart, James Kennedy, et al. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995. [11] Emad Elbeltagi, Tarek Hegazy, and Donald Grierson. Comparison among five evolutionary-based optimization algorithms. Advanced engineering informatics, 19(1):43–53, 2005. [12] Gao Fei, Li Zhuo-Qiu, and Tong Heng-Qing. Parameters estimation online for lorenz system by a novel quantum-behaved particle swarm optimization. Chinese Physics B, 17(4):1196, 2008. 19 [13] S Amir Ghoreishi and Mohammad Ali Nekoui. Optimal weighting matrices de- sign for lqr controller based on genetic algorithm and pso. In Advanced Materials Research, volume 433, pages 7546–7553. Trans Tech Publ, 2012. [14] Liu Guoping, Xiao Genfu, and Yang Xiaohui. The lqr real-time control for the inverted pendulum based on pso. In Electrical and Control Engineering (ICECE), 2010 International Conference on, pages 2363–2366. IEEE, 2010. [15] Charles Harvey and Gunter Stein. Quadratic weights for asymptotic regulator properties. IEEE Transactions on Automatic Control, 23(3):378–387, 1978. [16] Kaveh Hassani and Won-Sook Lee. An incremental framework for classification of eeg signals using quantum particle swarm optimization. In 2014 IEEE Inter- national Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pages 40–45. IEEE, 2014. [17] Kaveh Hassani and Won-Sook Lee. Optimal tuning of linear quadratic regulators using quantum particle swarm optimization. In Proceedings of the International Conference on Control, Dynamic Systems, and Robotics (CDSR14), 2014. [18] Wilmar Hernandez and Norberto Ca˜nas. Non-linear control of an autonomous ground vehicle. In IECON 2011-37th Annual Conference on IEEE Industrial Elec- tronics Society, pages 2678–2683. IEEE, 2011. [19] Lester Ingber. Simulated annealing: Practice versus theory. Mathematical and computer modelling, 18(11):29–57, 1993. [20] Hisao Ishibuchi, Tadashi Yoshida, and Tadahiko Murata. Balance between genetic search and local search in memetic algorithms for multiobjective permutation flow- shop scheduling. IEEE transactions on evolutionary computation, 7(2):204–223, 2003. [21] Wisnu Jatmiko, Kosuke Sekiyama, and Toshio Fukuda. A pso-based mobile robot for odor source localization in dynamic advection-diffusion with obstacles envi- ronment: theory, simulation and measurement. IEEE Computational Intelligence Magazine, 2(2):37–51, 2007. [22] Tor Arne Johansen, Idar Petersen, Jens Kalkkuhl, and Jens Ludemann. Gain- scheduled wheel slip control in automotive brake systems. IEEE Transactions on Control Systems Technology, 11(6):799–811, 2003. [23] MA Johnson and MJ Grimble. Recent trends in linear optimal quadratic multivari- able control system design. In IEE Proceedings D-Control Theory and Applications, volume 134, pages 53–71. IET, 1987. [24] Srinivas Bhaskar Karanki, Mahesh K Mishra, and B Kalyan Kumar. Particle swarm optimization-based feedback controller for unified power-quality conditioner. IEEE transactions on power delivery, 25(4):2814–2824, 2010. [25] Bachir Kedjar and Kamal Al-Haddad. Dsp-based implementation of an lqr with integral action for a three-phase three-wire shunt active power filter. IEEE Trans- actions on Industrial Electronics, 56(8):2821–2828, 2009. 20 [26] Arsalan H Khan, Zhang Weiguo, Shi Jingping, and Zeashan H Khan. Optimized reconfigurable modular flight control design using swarm intelligence. Procedia Engineering, 24:621–628, 2011. [27] Raghavendra V Kulkarni and Ganesh Kumar Venayagamoorthy. Particle swarm optimization in wireless-sensor networks: A brief survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 41(2):262– 267, 2011. [28] E Vinodh Kumar and Jovitha Jerome. Lqr based optimal tuning of pid controller for trajectory tracking of magnetic levitation system. Procedia Engineering, 64:254– 264, 2013. [29] Yooyoung Lee, James J Filliben, Ross J Micheals, and P Jonathon Phillips. Sensitiv- ity analysis for biometric systems: A methodology based on orthogonal experiment designs. Computer Vision and Image Understanding, 117(5):532–550, 2013. [30] Xiujuan Lei and Ali Fu. Two-dimensional maximum entropy image segmentation method based on quantum-behaved particle swarm optimization algorithm. In 2008 Fourth International Conference on Natural Computation, volume 3, pages 692–696. IEEE, 2008. [31] Frank L Lewis and Vassilis L Syrmos. Optimal control. John Wiley & Sons, 1995. [32] Shouyi Li, Ronggui Wang, Weiwei Hu, and Jianqing Sun. A new qpso based bp neural network for face detection. In Fuzzy information and engineering, pages 355–363. Springer, 2007. [33] Yong Li, Jianchang Liu, and Yu Wang. Design approach of weighting matrices for lqr based on multi-objective evolution algorithm. In Information and Automation, 2008. ICIA 2008. International Conference on, pages 1188–1192. IEEE, 2008. [34] Hend Liouane, Ibtissem Chiha, Ali Douik, and Hassani Messaoud. Probabilistic differential evolution for optimal design of lqr weighting matrices. In 2012 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications (CIMSA) Proceedings, pages 18–23. IEEE, 2012. [35] Madhubanti Maitra and Amitava Chatterjee. A hybrid cooperative–comprehensive learning based pso algorithm for image segmentation using multilevel thresholding. Expert Systems with Applications, 34(2):1341–1350, 2008. [36] Mohammad Ali Nekoui and Hassan Heidari Jame Bozorgi. Weighting matrix se- lection method for lqr design based on a multi-objective evolutionary algorithm. In Advanced Materials Research, volume 383, pages 1047–1054. Trans Tech Publ, 2012. [37] Arfah Syahida Mohd Nor, Hazlina Selamat, and Ahmad Jais Alimin. Optimal controller design for a railway vehicle suspension system using particle swarm op- timization. Jurnal Teknologi, 54(1):71–84, 2012. [38] Carlos Olalla, Ramon Leyva, Abdelali El Aroudi, and Isabelle Queinnec. Robust lqr control for pwm converters: an lmi approach. IEEE Transactions on industrial electronics, 56(7):2548–2558, 2009. 21 [39] Konstantinos E Parsopoulos and Michael N Vrahatis. Particle swarm optimization method in multiobjective problems. In Proceedings of the 2002 ACM symposium on Applied computing, pages 603–607. ACM, 2002. [40] M Bayati Poodeh, S Eshtehardiha, and M Namnabat. Optimized state controller on dc-dc converter. In 2007 7th Internatonal Conference on Power Electronics, pages 153–158. IEEE, 2007. [41] Jim Pugh and Alcherio Martinoli. Inspiring and modeling multi-robot search with particle swarm optimization. In 2007 IEEE Swarm Intelligence Symposium, pages 332–339. IEEE, 2007. [42] Elyas Rakhshani. Intelligent linear-quadratic optimal output feedback regulator for a deregulated automatic generation control system. Electric Power Components and Systems, 40(5):513–533, 2012. [43] SA Panimadai Ramaswamy, Ganesh K Venayagamoorthy, and SN Balakrishnan. Optimal control of class of non-linear plants using artificial immune systems: Ap- plication of the clonal selection algorithm. In 2007 IEEE 22nd International Sym- posium on Intelligent Control, pages 249–254. IEEE, 2007. [44] Mehrdad Saif. Optimal linear regulator pole-placement by weight selection. Inter- national Journal of Control, 50(1):399–414, 1989. [45] Yahaya Md Sam, Mohd Ruddin Hj Ab Ghani, and Nasarudin Ahmad. Lqr controller for active car suspension. In TENCON 2000. Proceedings, volume 1, pages 441–444. IEEE, 2000. [46] Andrey V Savkin, PUBUDU N Pathirana, and FARHAN A Faruqi. Problem of precision missile guidance: Lqr and h control frameworks. IEEE Transactions on Aerospace and Electronic Systems, 39(3):901–910, 2003. [47] Andrey V Savkin, PUBUDU N Pathirana, and FARHAN A Faruqi. Problem of precision missile guidance: Lqr and h control frameworks. IEEE Transactions on Aerospace and Electronic Systems, 39(3):901–910, 2003. [48] Mahmud Iwan Solihin, Rini Akmeliawati, et al. Pso-based optimization of state feedback tracking controller for a flexible link manipulator. In Soft Computing and Pattern Recognition, 2009. SOCPAR’09. International Conference of, pages 72–76. IEEE, 2009. [49] Mahmud Iwan Solihin, Ari Legowo, Rini Akmeliawati, et al. Comparison of lqr and pso-based state feedback controller for tracking control of a flexible link manipula- tor. In Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on, pages 354–358. IEEE, 2010. [50] Scott R Starin, RK Yedavalli, and Andrew G Sparks. Design of a lqr controller of reduced inputs for multiple spacecraft formation flying. In American Control Conference, 2001. Proceedings of the 2001, volume 2, pages 1327–1332. IEEE, 2001. [51] Jun Sun, Wei Fang, Xiaojun Wu, Vasile Palade, and Wenbo Xu. Quantum-behaved particle swarm optimization: Analysis of individual particle behavior and parameter selection. Evolutionary computation, 20(3):349–393, 2012. 22 [52] Jun Sun, Bin Feng, and Wenbo Xu. Particle swarm optimization with particles having quantum behavior. In Congress on Evolutionary Computation, 2004. [53] Jun Sun, Wenbo Xu, and Bin Feng. A global search strategy of quantum-behaved particle swarm optimization. In Cybernetics and Intelligent Systems, 2004 IEEE Conference on, volume 1, pages 111–116. IEEE, 2004. [54] Jun Sun, Wenbo Xu, and Bin Feng. Adaptive parameter control for quantum- behaved particle swarm optimization on individual level. In 2005 IEEE Interna- tional Conference on Systems, Man and Cybernetics, volume 4, pages 3049–3054. IEEE, 2005. [55] Chin-Wang Tao, Jin-Shiuh Taur, and YC Chen. Design of a parallel distributed fuzzy lqr controller for the twin rotor multi-input multi-output system. Fuzzy Sets and Systems, 161(15):2081–2103, 2010. [56] Shang-Jeng Tsai, Chih-Li Huo, Ying-Kuei Yang, and Tsung-Ying Sun. Variable feedback gain control design based on particle swarm optimizer for automatic fighter tracking problems. Applied Soft Computing, 13(1):58–75, 2013. [57] J Tyler and F Tuteur. The use of a quadratic performance index to design multi- variable control systems. IEEE Transactions on Automatic Control, 11(1):84–92, 1966. [58] Frans Van Den Bergh. An analysis of particle swarm optimizers. PhD thesis, University of Pretoria, 2006. [59] P Harsha Vardhana, B Kalyan Kumar, and Mahesh Kumar. A robust controller for dstatcom. In 2009 International Conference on Power Engineering, Energy and Electrical Drives, pages 546–551. IEEE, 2009. [60] Lukasz Wiklendt, Stephan Chalup, and Rick Middleton. A small spiking neural net- work with lqr control applied to the acrobot. Neural Computing and Applications, 18(4):369–375, 2009. [61] Chaiporn Wongsathan and Chanapoom Sirima. Application of ga to design lqr controller for an inverted pendulum system. In Robotics and Biomimetics, 2008. ROBIO 2008. IEEE International Conference on, pages 951–954. IEEE, 2009. [62] Xin Xiong and Zhou Wan. The simulation of double inverted pendulum control based on particle swarm optimization lqr algorithm. In 2010 IEEE International Conference on Software Engineering and Service Sciences, pages 253–256. IEEE, 2010. [63] Xian Qiang Zeng, Lan Jing, Ze En Yao, and Yu Hui Guo. A pso-based lqr controller for accelerator pwm power supply. In Advanced Materials Research, volume 490, pages 71–75. Trans Tech Publ, 2012. [64] Jing Zhang, Lixiang Zhang, and Jing Xie. Application of memetic algorithm in control of linear inverted pendulum. In 2011 IEEE International Conference on Cloud Computing and Intelligence Systems, pages 103–107. IEEE, 2011. 23 [65] Zhang Zhisheng. Quantum-behaved particle swarm optimization algorithm for eco- nomic load dispatch of power system. Expert Systems with Applications, 37(2):1800– 1803, 2010. 24