PDE-Based Optimization for Stochastic Mapping and Coverage Strategies using Robotic Ensembles

Karthik Elamvazhuthi¹, Hendrik Kuiper², and Spring Berman¹

¹ School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ, USA
² School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ, USA

November 30, 2017

Abstract

This paper presents a novel partial differential equation (PDE)-based framework for controlling an ensemble of robots, which have limited sensing and actuation capabilities and exhibit stochastic behaviors, to perform mapping and coverage tasks. We model the ensemble population dynamics as an advection-diffusion-reaction PDE model and formulate the mapping and coverage tasks as identification and control problems for this model. In the mapping task, robots are deployed over a closed domain to gather data, which is unlocalized and independent of robot identities, for reconstructing the unknown spatial distribution of a region of interest. We frame this task as a convex optimization problem whose solution represents the region as a spatially-dependent coefficient in the PDE model. We then consider a coverage problem in which the robots must perform a desired activity at a programmable probability rate to achieve a target spatial distribution of activity over the reconstructed region of interest. We formulate this task as an optimal control problem in which the PDE model is expressed as a bilinear control system, with the robots' coverage activity rate and velocity field defined as the control inputs. We validate our approach with simulations of a combined mapping and coverage scenario in two environments with three target coverage distributions.
arXiv:1711.11018v1 [cs.SY] 29 Nov 2017

1 Introduction

Over the past few decades, partial differential equation (PDE) models of multi-agent systems have been used extensively in mathematical biology to analyze collective behaviors such as chemotaxis, flocking, schooling, predator-prey interactions, and pattern formation [33]. Many of these models are linear or nonlinear advection-diffusion type PDEs, which describe the spatiotemporal evolution of probability densities of agents. Mathematical tools such as bifurcation analysis, optimization, and control theory can be applied to these continuum macroscopic models to make qualitative and quantitative predictions about the system behavior. Typically, each PDE model corresponds to a discrete microscopic model that captures the stochastic and deterministic actions of individual agents. While these microscopic models are more accurate descriptions of the agents' behavior, the macroscopic models enable tractable analysis for large agent numbers.

Recently, this work has motivated the use of similar types of PDEs to model and control the spatiotemporal dynamics of very large collectives, or swarms [15], of small, resource-constrained robots (e.g., [24, 39]) that are currently being developed for applications such as environmental monitoring, exploration, surveillance, disaster response, and biomedical procedures. PDEs have been used to characterize the distributions of chemotactic robots in a diffusive fluid environment [17], miniature robots inspecting a model of jet turbine blades [37], and honeybee-inspired agents that aggregate at the optimal value of a scalar field [7]. The parameters of these PDE models can be mapped to control inputs that drive the robots' motion and probability rates of switching between states or tasks, and the collective behavior of the robots follows the PDE model prediction in expectation.
Several works have exploited this correspondence to control the spatial distribution of an ensemble [31, 14]. These control approaches can be viewed as extensions of stochastic task allocation schemes based on nonspatial rate equation models [5, 8, 28, 27]. Other applications of continuum population dynamical models to multi-agent control include optimized confinement strategies [22], consensus using the theory of mean field controls [32], controlled flocking [35] that includes non-parallel motions [21], and pattern generation in the presence of obstacles [36]. There has also been some recent work on using PDEs to model Laplacian network dynamics of agents for formation control; see [16, 29, 11] and references therein.

We apply this PDE-based modeling framework to develop a control approach for allocating tasks among an ensemble of robots. In our scenarios, a task is defined as a desired activity that a robot performs in a certain spatial region of the environment. The tasks can be performed in parallel, and multiple robots can be simultaneously allocated to each task. While various deterministic approaches have been developed for multi-robot task allocation, including centralized and decentralized market-based techniques [9, 6] and centralized methods for optimal task assignment and trajectory planning [1, 41], their computation and/or communication requirements do not scale well to very large numbers of robots and tasks. In contrast to these works, we develop a stochastic approach in which tasks are performed at random times by unidentified robots with limited computing capabilities and no communication or global localization. Such limitations will be common in swarm robotic platforms, e.g., micro aerial vehicles [24] and microrobots [39], and in scenarios where the robots operate in GPS-denied environments where communication is impractical or unreliable. In our proposed approach, a task allocation emerges from the collective ensemble activity.
We first consider a mapping problem in which the objective is to estimate a scalar spatial field from unlocalized data obtained by the robots. We then define a coverage problem in which the ensemble must produce a target spatial density of activity over a region of interest, which may be estimated in the mapping problem. For this problem, we express the PDE model as a bilinear control system [4] and formulate an optimal control problem that computes the control inputs. Since we do not assume that agents are capable of global localization or estimation of the local agent population density, we frame the coverage problem as an open-loop control problem that does not require feedback on agent positions or densities. We follow the variational approach described in [40] for optimal control of the PDE model. While there has been some prior work on bilinear optimal control of systems of PDEs [34, 3], these works do not address the types of PDEs that we consider. An optimal control problem for a bilinear parabolic PDE was formulated in [34] with the diffusion coefficient as the control. In [3], bilinear control of a class of advection-reaction systems was considered; unlike our PDE models, these systems did not include diffusion.

The mapping and controller synthesis approaches described in this paper require a central supervisor with the computational capabilities necessary to solve the associated optimization problems. Despite this centralized component, the approaches are scalable with the number of agents in the ensemble since each agent executes the same controllers with the same control variables, which are preprogrammed or broadcast by the supervisor. In our coverage strategy, there are only three control variables to be computed; in contrast, the most naive approach to controlling an ensemble of N agents moving in d dimensions would require the computation of Nd control inputs.
We first presented our coverage approach in [12], where we introduced a similar optimal control problem, derived the gradient of the objective functional with respect to the control parameters, and used a gradient descent algorithm to compute the optimal control. This paper provides a complete analysis of our approach in [12] by investigating the well-posedness of the PDE model and the optimal control problem. The theory of weak solutions that we use to establish the well-posedness of the PDE model is classical [13]. However, to the best of our knowledge, there have been no prior results on well-posedness that can be directly applied to our model, which is a system of PDEs in which diffusion is present only in one of the species, the control variables are time-dependent, and a zero-flux boundary condition is imposed on the boundary of a Lipschitz domain. In this paper, we prove the existence and uniqueness of solutions of our PDE model by deriving suitable energy estimates for the solutions. We also use these derived energy estimates to ensure that the computation of the gradient, performed using the adjoint equation approach, is well-posed. Moreover, we prove the existence of an optimal control for the problem using standard compactness arguments adapted to the PDE control setting [40]. In addition to this analysis, our formulation of the mapping problem in the same framework is a novel contribution of this paper; in [12] it was assumed that the environment is known beforehand.

The paper is organized as follows. Section 2 describes the robot capabilities and their programmed behaviors during the mapping and coverage assignments, and Section 3 defines the microscopic and macroscopic models of the ensemble and its activity during each assignment.
Section 4 defines key mathematical terminology that is used in Sections 5 and 6 to formulate and analyze the mapping and coverage objectives, respectively, as optimization problems that incorporate the macroscopic models. We validate our approach in Section 7 with simulations in which a region of interest must first be mapped and then covered with a target distribution of robot activity, and we conclude in Section 8.

2 Task Descriptions and Assumptions

We consider a scenario in which (1) a small number of agents must map a region of interest in an unknown, bounded environment, which we refer to as the mapping assignment, and then (2) a larger ensemble of agents must produce a target spatial distribution of activity within the mapped regions, which we call the coverage assignment. For instance, this activity could consist of sensor measurements or, as in our previous work [12], contacts with flowers to effect crop pollination. The mapping and coverage assignments will be formulated in a decoupled manner by posing them as two separate optimization problems in terms of their associated mean-field PDE models. We will then demonstrate through numerical simulations that these two problems can be solved sequentially in order to achieve the desired coverage objective.

2.1 Robot capabilities

We assume that the agents lack global localization, inter-agent communication, and prior data about the environment. Each agent is equipped with local sensing capabilities, allowing it to detect and distinguish between different types of regions within its sensing range, and a compass, enabling it to move with a specified heading. Additionally, the agents have sufficient memory to store the times at which they record observations of regions of interest. Similarly, it is assumed that agents have sufficient memory to store time-dependent velocity and task-switching parameters for the coverage assignment.
Both the mapping and coverage assignments involve numerical optimization computations that are performed offline by an external supervisor. After the mapping assignment, the supervisor collects recorded information from the robots and uses this information to reconstruct the map of the environment. We note that the supervisor does not have information on the individual identities of the robots. Based on this map, the supervisor calculates the parameters of the agent behaviors for a specified coverage objective and broadcasts these parameters to the agents before they are deployed for the coverage assignment. This broadcast architecture has been previously proposed for controlling large numbers of robots [30]. Although the external supervisor constitutes a potential single point of failure, it enables the control of large ensembles of agents that are subject to the constraints that we consider.

2.2 Robot controllers

The agents traverse the environment, a bounded open connected set Ω ⊂ R² with Lipschitz continuous boundary ∂Ω, with a combination of deterministic and stochastic motion during a deployment. Each agent moves with a time-dependent velocity field v(t) ∈ R², which may be broadcast to the agents or imposed on them using external stimuli, such as magnetic fields in microrobotic applications. Concurrently, each agent exhibits random motion that may be programmed, for instance to perform probabilistic search and tracking, or that arises from inherent sensor and actuator noise. We assume that this random movement can be modeled as a Brownian motion that drives diffusion with an associated diffusion coefficient D. This approach to modeling the motion of members of a robotic swarm has, for example, been used previously in [20]. The agents switch stochastically between behavioral states at constant or time-dependent transition rates, which define the probability per unit time for an agent to switch from one state to another.
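In discrete-time simulation, a transition rate translates into a per-timestep switching probability of approximately rate × Δt. A minimal sketch of this conversion (Python; the rate, timestep, and agent count are illustrative values, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def switches(rate, dt, n_agents):
    """Sample which of n_agents switch state during one timestep of length dt,
    given a transition rate (probability per unit time). Valid when rate*dt << 1."""
    p = rate * dt                       # probability of switching in [t, t + dt)
    return rng.random(n_agents) < p     # boolean array: True = agent switches

# Example: rate of 2.0 per second, 1 ms timestep, 10000 agents
s = switches(2.0, 1e-3, 10_000)
frac = s.mean()                         # expected fraction of switchers is about 0.002
```

For larger rate × Δt, the exact per-step probability 1 − exp(−rate·Δt) would be the safer choice; the linearized form above is the one implied by the conditional probabilities used in Section 3.1.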
We define a region of interest, which may be disconnected and is assumed to be Lebesgue measurable, as Γ ⊂ Ω. During mapping, a continually moving agent records observations of Γ at a constant rate k_o. During coverage, an agent that is moving over Γ pauses to perform an action (such as a sensor measurement) at a time-dependent rate k(t), and it resumes moving at a constant rate k_f, which determines the time taken to complete the action. This coverage strategy can be extended to scenarios where there are different types of regions, and the agents perform actions over each region at a different rate, as in our prior work [12]. We specify that the agents' velocity field and transition rates are controllable parameters. In the mapping assignment, v(t) is designed to ensure thorough exploration of the environment (for instance, following a lawnmower pattern), and k_o is assigned a high value to yield a large number of observations and thus produce an accurate map. In the coverage assignment, k_f is fixed while v(t) and k(t) are computed prior to the agents' deployment according to the optimal control approach in Section 6.

3 Models of Ensemble Mapping and Coverage

3.1 Microscopic Models

The microscopic model is used to simulate the individual robots' motion and the probabilistic decisions that are produced by the robot controllers in Section 2.2.

Remark 1 Here we describe the microscopic model at a formal level as a discrete-time stochastic process. A rigorous correspondence between the microscopic model and the macroscopic model for the case Ω = R² was shown recently by the authors and collaborators in [43].

We model a robot's changes in state and performance of desired actions as a Chemical Reaction Network (CRN) in which the possible species are M, a moving robot; S, a stationary robot; and A, an instance of a desired robot activity. These reactions can only occur when a robot is located in the region of interest Γ.
For example, in an artificial pollination scenario [12], Γ could represent the subset of the domain in which flowers are distributed. A robot's mapping activity consists of a single irreversible reaction,

M → M + A  (rate k_o),  (1)

where A is the robot's observation of the region of interest. A robot's coverage activity consists of two irreversible reactions,

M → S + A  (rate k(t)),  (2)
S → M  (rate k_f),  (3)

where A is a desired action that the robot performs.

In addition, we model the displacement of a robot i over a timestep Δt using the standard-form Langevin equation [18],

X_i(t + Δt) − X_i(t) = Y_i(t) ( v(t)Δt + (2DΔt)^{1/2} Z_i(t) ),  (4)

where X_i(t) ∈ R² is the position of robot i ∈ {1, ..., N} at time t and Z_i(t) ∈ R² is a vector of independent standard normal random variables that are generated at time t. Here, Y_i(t) is a binary variable associated with the discrete state of robot i at time t. When the robot is performing the mapping assignment, Y_i(t) = 1 for all t. When it is performing the coverage assignment, Y_i(t) = 1 if the robot is in state M at time t and Y_i(t) = 0 if it is in state S at time t. During the coverage assignment, Y_i(t) evolves according to the conditional probabilities P(Y_i(t + Δt) = 0 | Y_i(t) = 1) = H_Γ(X_i(t)) k(t)Δt and P(Y_i(t + Δt) = 1 | Y_i(t) = 0) = k_f Δt, where H_Γ is the indicator function of the set Γ and P is the probability measure induced by the process (X, Y) on the set of sample paths. The robot avoids collisions with the domain boundary by performing a specular reflection when it encounters this boundary. We emphasize that the velocity v(t) and diffusion coefficient D are the same for each robot i. This assumption enables the coverage problem to be posed as a PDE control problem, as shown in Sections 5 and 6.
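The microscopic coverage model above can be simulated directly: equation (4) gives the position update, and the conditional probabilities give the state switching. A minimal sketch (Python), assuming a unit-square domain, a hypothetical disc-shaped region Γ, and illustrative parameter values that are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def in_region(X):
    """Hypothetical indicator H_Gamma: a disc of radius 0.2 (illustrative only)."""
    return np.linalg.norm(X - np.array([0.5, 0.5]), axis=-1) < 0.2

def step(X, Y, v, D, k, k_f, dt):
    """One timestep of the microscopic coverage model.
    X: (N, 2) positions; Y: (N,) binary moving (1) / stationary (0) flags."""
    N = X.shape[0]
    Z = rng.standard_normal((N, 2))
    # Langevin displacement, Eq. (4): only moving robots (Y = 1) are displaced
    X = X + Y[:, None] * (v * dt + np.sqrt(2.0 * D * dt) * Z)
    # Specular reflection at the boundary of the unit square
    X = np.where(X < 0.0, -X, X)
    X = np.where(X > 1.0, 2.0 - X, X)
    # Stochastic switching: M -> S at rate k(t) over Gamma, S -> M at rate k_f
    u1, u2 = rng.random(N), rng.random(N)
    stop = (Y == 1) & in_region(X) & (u1 < k * dt)
    go = (Y == 0) & (u2 < k_f * dt)
    Y = Y.copy()
    Y[stop] = 0
    Y[go] = 1
    return X, Y

X = rng.random((100, 2))
Y = np.ones(100, dtype=int)
for _ in range(200):
    X, Y = step(X, Y, v=np.array([0.05, 0.0]), D=1e-3, k=5.0, k_f=2.0, dt=0.01)
```

The single reflection per step is adequate when the expected displacement per timestep is much smaller than the domain size, which the parameter choices above satisfy.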
3.2 Macroscopic Models

The macroscopic model consists of a set of advection-diffusion-reaction (ADR) PDEs that describe the evolution of probability densities of agents that follow the microscopic model, conditioned on the initial distribution of the agents' states. Since we assume that there are no interactions between the agents, the random variables associated with the agents are independent and identically distributed, and hence we can drop the subscript i from X_i(t), Y_i(t), and Z_i(t) and represent the macroscopic model as a single system of PDEs rather than N systems of PDEs. The state variables of this model are the population density fields of moving robots, y_1(x, t); stationary robots, denoted by y_2(x, t) in the coverage model; and instances of the desired robot activity, denoted by y_2(x, t) in the mapping model and by y_3(x, t) in the coverage model. We define Q = Ω × (0, T) and Σ = ∂Ω × (0, T) for some fixed final time T. The vector n(x) ∈ R² is the outward normal to ∂Ω, defined for almost every x ∈ ∂Ω. We also define an initial density of moving robots y_0(x) over Ω that is normalized such that ∫_Ω y_0(x) dx = 1.

The macroscopic model of mapping is given by:

∂y_1/∂t = ∇ · (D∇y_1 − v(t)y_1) in Q,
∂y_2/∂t = k_o H_Γ(x) y_1 in Q,  (5)

with the no-flux boundary condition

n · (D∇y_1 − v(t)y_1) = 0 on Σ  (6)

and initial conditions

y_1(·, 0) = y_0,  y_2(·, 0) = 0 on Ω,  (7)

where 0 is the function on Ω that takes the value 0 almost everywhere. The microscopic model of mapping defined in Section 3.1 is related to the above macroscopic model by the approximation P(X(t) ∈ Λ) ≈ ∫_Λ y_1(x, t) dx. Additionally, ∫_Λ y_2(x, t) dx is the expected number of observations recorded by the robots in the region Λ up until time t.
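Because the observation density y_2 in model (5) has no spatial operator of its own, its update decouples from that of y_1: given grid snapshots of y_1, y_2 simply accumulates k_o H_Γ y_1 over time. A minimal forward-Euler sketch of this accumulation (Python; the grid, mask, and parameter values are illustrative, not from the paper):

```python
import numpy as np

def accumulate_observations(y1_snapshots, H_gamma, k_o, dt):
    """Integrate dy2/dt = k_o * H_Gamma * y1 (second equation of model (5))
    with forward Euler. y1_snapshots: (nt, nx, ny) moving-robot densities;
    H_gamma: (nx, ny) 0/1 indicator mask of the region of interest."""
    y2 = np.zeros_like(y1_snapshots[0])
    for y1 in y1_snapshots:
        y2 = y2 + dt * k_o * H_gamma * y1
    return y2

# Uniform y1 over 10 steps: y2 ends at k_o * (nt * dt) on Gamma, 0 elsewhere
H = np.zeros((8, 8))
H[2:5, 2:5] = 1.0
y1s = np.ones((10, 8, 8))
y2 = accumulate_observations(y1s, H, k_o=3.0, dt=0.1)
```

Summing y2 against the grid cell area then approximates the cumulative observation count ∫_Λ y_2 dx used to relate the macroscopic and microscopic models.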
The macroscopic model of coverage is defined as:

∂y_1/∂t = ∇ · (D∇y_1 − v(t)y_1) − k(t)H_Γ(x)y_1 + k_f y_2 in Q,
∂y_2/∂t = k(t)H_Γ(x)y_1 − k_f y_2 in Q,
∂y_3/∂t = k(t)H_Γ(x)y_1 in Q,  (8)

with the no-flux boundary condition (6) and initial conditions

y_1(·, 0) = y_0,  y_2(·, 0) = y_3(·, 0) = 0 on Ω.  (9)

In this macroscopic model, the density fields y_1(x, t) and y_2(x, t) are related to the microscopic model of coverage, given by the stochastic process (X(t), Y(t)) defined in Section 3.1, as follows:

P((X(t), Y(t)) ∈ (Λ × {1})) ≈ ∫_Λ y_1(x, t) dx,
P((X(t), Y(t)) ∈ (Λ × {0})) ≈ ∫_Λ y_2(x, t) dx,  (10)

where Λ is a measurable subset of Ω. Additionally, ∫_Λ y_3(x, t) dx represents the expected number of instances of robot coverage activity performed in the region Λ up until time t.

4 Mathematical Preliminaries

We now define some mathematical terms that are used in later sections. Given two Banach spaces P and Q, L(P, Q) is the space of bounded operators from P to Q. For a finite collection of Banach spaces {Z_1, ..., Z_M}, we equip the Cartesian product of the spaces, Z_1 × ... × Z_M, with the norm (Σ_{α=1}^M ‖·‖²_α)^{1/2}. We define L²(Ω) as the Hilbert space of square-integrable functions over Ω, where Ω ⊂ Rⁿ is a bounded open subset of a Euclidean domain of dimension n. The Hilbertian structure of L²(Ω) is induced by the standard inner product ⟨·,·⟩_{L²(Ω)} : L²(Ω) × L²(Ω) → R, given by ⟨f, g⟩_{L²(Ω)} = ∫_Ω f(x)g(x) dx for each f, g ∈ L²(Ω). The norm ‖·‖_{L²(Ω)} on the space L²(Ω) is defined as ‖f‖_{L²(Ω)} = ⟨f, f⟩^{1/2}_{L²(Ω)} for all f ∈ L²(Ω). For a positive integer n ∈ Z⁺, Pⁿ refers to the Cartesian product of n copies of the space P; that is, Pⁿ = Π_{i=1}^n P. We use P* to denote the space of linear continuous functionals on the Banach space P.
The bilinear form that induces the duality between P and P* is given by ⟨·,·⟩_{P*,P} : P* × P → R, where ⟨x, y⟩_{P*,P} = x(y) for each x ∈ P* and each y ∈ P.

We define the Sobolev space H¹(Ω) = { z ∈ L²(Ω) : ∂z/∂x_1, ∂z/∂x_2 ∈ L²(Ω) }. Here, the spatial derivatives are to be understood as weak derivatives defined in the distributional sense. See [13] for this notion of a derivative of a function that is not necessarily differentiable in the classical sense. We equip the space with the usual Sobolev norm ‖y‖_{H¹(Ω)} = ( ‖y‖²_{L²(Ω)} + Σ_{i=1}² ‖∂y/∂x_i‖²_{L²(Ω)} )^{1/2}. The dual space of H¹(Ω), denoted by H¹(Ω)*, is the space of bounded linear functionals on H¹(Ω), acting through the inner product of L²(Ω). Defining V = H¹(Ω) and X = V × L²(Ω)², it follows that X* := V* × L²(Ω)². We say that a sequence f_n converges weakly to f in X, written as f_n ⇀ f, if lim_{n→∞} ⟨φ, f_n⟩_{X*,X} = ⟨φ, f⟩_{X*,X} for all φ in X*. For norm convergence of a sequence f_n to f, we write f_n → f. The space L²(0, T; X) consists of all strongly measurable functions u : [0, T] → X for which ‖u‖_{L²(0,T;X)} := ( ∫₀ᵀ ‖u(t)‖²_X dt )^{1/2} < ∞. The space C([0, T]; X) consists of all continuous functions u : [0, T] → X for which ‖u‖_{C([0,T];X)} = max_{0≤t≤T} ‖u(t)‖_X < ∞. We will also use the trace operator τ : H¹(Ω) → L²(∂Ω), defined as the unique linear bounded operator that satisfies (τu)(x) = u(x) for every x ∈ ∂Ω whenever u is in C¹(Ω̄) (that is, u is at least once differentiable in the classical sense). Informally speaking, the trace operator gives meaning to the evaluation of an element of H¹(Ω) on ∂Ω, which is a set of measure zero.

5 Optimization Problem for the Mapping Assignment

This section formulates the mapping assignment as an optimization problem that identifies the unknown coefficient H_Γ in the mapping macroscopic model (5).
This coefficient is reconstructed from the cumulative robot sensor data ĝ(t), the total number of observations made over the entire domain by all the robots up until time t. We recall that the robots record these observations stochastically according to the chemical reaction (1) and that the agents' spatial states evolve according to equation (4). Here, the velocity input v(t) is predetermined by the user and should be chosen such that the agents approximately cover the entire domain. An example of such a choice for v(t) is one that would drive the robots along a lawnmower path, as shown in Fig. 1 in Section 7. In the mapping macroscopic model (5), the cumulative robot sensor data is interpreted as a continuous quantity given by an integral over the domain, g(t) = ∫_Ω y_2(x, t) dx. We address the existence and uniqueness of the solutions of the macroscopic model (5) in Section 6, since this model is a special case of the coverage macroscopic model (8) for which k_f = 0, k(t) = k_o, and the stationary robot state is absent.

We define Ĥ as an estimate of H_Γ. Due to the one-sided coupling between the state variables y_1 and y_2 in the PDE model (5), the variable y_1 is not affected by the coefficient H_Γ. Hence, assuming that the PDE is well-posed and that a unique solution exists in H¹(Ω) × L²(Ω), we can pose the mapping problem as follows:

Problem 2 (Mapping Problem) Given an initial condition y_0(x) of PDE model (5) and the corresponding solution y_1(x, t), find an estimate Ĥ ∈ S_ad, where S_ad := { u ∈ L²(Ω) : 0 ≤ u(x) ≤ 1 a.e. x ∈ Ω }, that satisfies the following equation:

(KĤ)(t) := ∫_Ω k_o Ĥ(x) y_1(x, t) dx = g(t).  (11)

Ideally, the estimate Ĥ would be constrained to be a function that takes values of 0 or 1 over its domain. However, this constraint would make the resulting inverse problem, i.e.,
solving equation (11) for Ĥ in Problem 2, non-convex when considered as an optimization problem. To make the optimization problem tractable, we relax the range of Ĥ and consider it as an essentially bounded element of L²(Ω). Generally, equations of the form (11) need not have unique solutions unless some special conditions on k_o y_1(x, t), the kernel of the integral operator K : L²(Ω) → L²(0, T), can be guaranteed. To resolve the ill-posedness of the inverse problem, it can alternatively be posed as an optimization problem with a functional J(Ĥ) that is convex, but not necessarily strictly convex:

min_{Ĥ ∈ S_ad} J(Ĥ) = ‖KĤ − g‖²_{L²(0,T)}.  (12)

To ensure a unique solution for Ĥ, the optimization problem can be recast with a strictly convex functional:

min_{Ĥ ∈ S_ad} J_λ(Ĥ) = (1/2)‖KĤ − g‖²_{L²(0,T)} + (λ/2)‖Ĥ‖²_{L²(Ω)},  (13)

where λ > 0 is the regularization parameter that is often used in the so-called "Tikhonov regularization" of inverse problems [25]. Since S_ad is convex in L²(Ω), the existence and uniqueness of solutions to regularized problems of the form (13) can be guaranteed and has been well-studied in the theory of inverse problems [25]. Hence, we have the following result in Theorem 3. The assumption in the theorem that y_1 ∈ C([0, T]; L²(Ω)) is justified in Section 6.2.

Theorem 3 A unique solution to the regularized inverse problem (13) exists for each λ > 0, under the assumptions that y_1 ∈ C([0, T]; L²(Ω)) and k_o > 0.

Proof. For y_1 ∈ C([0, T]; L²(Ω)) and k_o > 0, the operator K in equation (11) is a bounded operator from L²(Ω) to L²(0, T). Thus, the result follows from [25, Theorem 2.11].

We use a gradient descent method to solve the optimization problem (13). To apply this method, we need to characterize the derivative of the objective functional J_λ(Ĥ). This functional is differentiable in the Fréchet sense.
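After discretization, the regularized problem (13) with the box constraint set S_ad can be solved by projected gradient descent: the operator K becomes a quadrature matrix, its adjoint becomes (up to quadrature weights) the matrix transpose, and the constraint 0 ≤ Ĥ ≤ 1 is enforced by clipping. A minimal sketch (Python); the discretization, step-size rule, and parameter values are illustrative choices, not the paper's implementation:

```python
import numpy as np

def solve_mapping(K, g, lam, iters=500):
    """Projected gradient descent for min_H 0.5*||K H - g||^2 + 0.5*lam*||H||^2
    subject to 0 <= H <= 1 (the relaxed constraint set S_ad).
    K: (nt, nx) discretized integral operator; g: (nt,) cumulative data."""
    nx = K.shape[1]
    H = np.full(nx, 0.5)                             # feasible initial guess
    step = 1.0 / (np.linalg.norm(K, 2) ** 2 + lam)   # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = K.T @ (K @ H - g) + lam * H           # discrete Tikhonov gradient; K.T plays K*
        H = np.clip(H - step * grad, 0.0, 1.0)       # project back onto [0, 1]
    return H

# Synthetic test problem: recover a 0/1 indicator from noiseless data
rng = np.random.default_rng(0)
K = 0.1 * rng.random((40, 25))
H_true = (rng.random(25) > 0.5).astype(float)
g = K @ H_true
H_est = solve_mapping(K, g, lam=1e-4)
```

With the step size bounded by the reciprocal Lipschitz constant, the projected iteration decreases the objective monotonically, which makes the scheme a simple stand-in for the gradient descent method referenced above.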
Since K ∈ L(L²(Ω), L²(0, T)), the Fréchet derivative of K is K itself. Then, by the chain rule of differentiation, the Fréchet derivative of J_λ(Ĥ), denoted by J′_λ(Ĥ), is defined as

⟨J′_λ(Ĥ), s⟩_{L²(Ω)} = ⟨KĤ − g, Ks⟩_{L²(0,T)} + λ⟨Ĥ, s⟩_{L²(Ω)}.  (14)

Using the Riesz representation theorem [13, Appendix D, Theorem 2], we can obtain an explicit representation of the gradient of J_λ(Ĥ),

∇J_λ(Ĥ) = K*(KĤ − g) + λĤ ∈ L²(Ω),  (15)

where K* ∈ L(L²(0, T), L²(Ω)) is defined as

(K*G)(x) = ∫₀ᵀ k_o G(t) y_1(x, t) dt  ∀ G ∈ L²(0, T).  (16)

To verify that this characterization of K* is correct, it is straightforward to check that

⟨KĤ, G⟩_{L²(0,T)} − ⟨Ĥ, K*G⟩_{L²(Ω)} = 0  ∀ Ĥ ∈ L²(Ω), ∀ G ∈ L²(0, T).  (17)

6 Optimal Control Problem for the Coverage Assignment

In Section 6.1, we formulate the coverage assignment as an optimal control problem that is used to compute the parameters v(t) and k(t) in the coverage macroscopic model (8). We analyze the existence and uniqueness of solutions to the PDE model (8) in Section 6.2, the well-posedness of the optimal control in Section 6.3, and the differentiability of the objective functional in Section 6.4.

6.1 Formulation of the Optimal Control Problem

We first express the PDE model (8) as a bilinear control system with control inputs v(t) and k(t). Toward this end, we supply the following definitions. Recall from Section 4 that V = H¹(Ω), X = V × L²(Ω)², and X* := V* × L²(Ω)², and that L(X, L²(Ω)³) denotes the space of linear bounded operators from X to L²(Ω)³. We define the operators A, B_i ∈ L(X, L²(Ω)³), i = 1, 2, 3, as follows:

A = [ D∇²   k_f   0
      0    −k_f   0
      0     0     0 ],

B_1 = [ −∂/∂x_1  0  0
        0        0  0
        0        0  0 ],

B_2 = [ −∂/∂x_2  0  0
        0        0  0
        0        0  0 ],

B_3 = [ −H_Γ  0  0
        H_Γ   0  0
        H_Γ   0  0 ].
We also define the spaces F = L²(0, T; L²(Ω)³), G = L²(0, T; L²(∂Ω)), Y = L²(0, T; X), and Y* = L²(0, T; X*). These definitions will be used to establish the well-posedness of the PDE (8) using the classical theory of weak solutions of linear PDEs [13]. For f ∈ F, g ∈ G, and y_0 ∈ L²(Ω)³, let y ∈ Y be a function with time derivative ∂y/∂t ∈ Y*. Denoting the components of the robots' velocity field by v(t) = [v_x(t) v_y(t)]ᵀ, we define the control inputs u_1(t) = v_x(t), u_2(t) = v_y(t), and u_3(t) = k(t). We can now write the PDE (8) with boundary condition (6) and initial condition (9) as a bilinear control system in the following form:

∂y/∂t = Ay + Σ_{i=1}³ u_i B_i y + f in Q,
n · (D∇y_1 − [u_1 u_2]ᵀ y_1) = g on Σ,
y(0) = y_0 on Ω,  (18)

where Q and Σ are as defined in Section 3.2. We note that the solution of the PDE model in equations (8), (6), (9) is the solution of the system (18) with f = 0 and g = 0. We consider the more general form (18) for the purpose of analyzing the differentiability properties of the control-to-state map (defined in Section 6.4) and the objective functional in the optimal control problem.

A function y is a weak solution of system (18) if

⟨∂y/∂t, φ⟩_{Y*,Y} = ∫₀ᵀ ⟨A_g(t)y(t), φ(t)⟩_{X*,X} dt + Σ_{i=1}³ ⟨u_i B_i y, φ⟩_F + ⟨f, φ⟩_F  (19)

for all φ ∈ L²(0, T; X). Here, A_g(t) : X → X* is the variational form of the operator A for each t ∈ (0, T). The boundary condition (6) is incorporated into A_g in the variational formulation using Green's theorem as

A_g(t) = [ M_g(t)   k_f   0
           0       −k_f   0
           0        0     0 ],  (20)

where M_g(t) : V → V* for each t ∈ (0, T) is the Laplacian in the variational formulation and is defined as

⟨M_g(t)y, φ⟩_{V*,V} = −⟨D∇y, ∇φ⟩_{L²(Ω)} + ∫_{∂Ω} ( g(t) + n · [u_1(t) u_2(t)]ᵀ y ) φ dx  (21)

for all y, φ ∈ V. The coverage problem can now be framed as follows.
Problem 4 (Coverage Problem) Define a target spatial distribution of robot coverage activity, y_Ω ∈ L²(Ω)³, to be achieved by time T, and a set of admissible control inputs, U_ad = { u ∈ L²(0, T)³ : u_i^min ≤ u_i(t) ≤ u_i^max, i = 1, 2, 3, a.e. t ∈ (0, T) }. Let Ȳ = C([0, T]; L²(Ω)³), and define a function W ∈ L(L²(Ω)³) that weights the relative significance of minimizing the distances between different state variables and their target distributions. Then, given an initial condition y_0(x), solve the optimal control problem

min_{(y,u) ∈ Ȳ × U_ad} J(y, u) = (1/2)‖Wy(·, T) − y_Ω‖²_{L²(Ω)³} + (λ/2)‖u‖²_{L²(0,T)³}  (22)

subject to equation (18) with f = 0 and g = 0.

Note that, due to the essential bounds on u, we have that u ∈ L^∞(0, T)³. Here, we only consider essentially bounded control inputs since we only prove existence of solutions for control variables in L^∞(0, T)³. The existence and uniqueness of solutions for control variables in L^p(0, T)³, p < ∞, is a nontrivial issue and is beyond the scope of this work.

6.2 Energy Estimates

Energy estimates refer to bounds on the solutions of a PDE system with respect to certain parameters of interest, such as the initial condition, coefficients, or boundary parameters. Whereas in the theory of weak solutions of PDEs [13] these energy estimates are used to show the existence of solutions, in optimal control analysis they are used to study the differentiability properties of the control-to-state map. In this section, we derive energy estimates for the solutions of the PDE model (19) and use them to establish existence and uniqueness of solutions to the model, a result (Lemma 7) that we previously stated without proof in our work [12]. More importantly, the estimates are then used to prove existence of solutions to the optimal control problem. The differentiability of the control-to-state map also follows from these energy estimates.
Lemma 5. Let $b \in \mathbb{R}^2$ be such that $\max_{i=1,2} |b_i| \le \max_{i=1,2} |u_i^{\max}| + \max_{i=1,2} |u_i^{\min}|$. Suppose that $\tilde{g} \in L^2(\partial\Omega)$. Define $M : V \to V^*$ as
\[
\langle M y, \phi \rangle_{V^*,V} = \langle D \nabla y, \nabla \phi \rangle_{L^2(\Omega)} - \int_{\partial\Omega} ( \tilde{g} + \mathbf{n} \cdot b \, y ) \phi \, d\mathbf{x}. \tag{23}
\]
Then we have the following energy estimate for all $y \in V$:
\[
\tilde{\beta} \| y \|^2_V \le \langle M y, y \rangle_{V^*,V} + \tilde{\alpha} \big( \| \tilde{g} \|^2_{L^2(\partial\Omega)} + \| y \|^2_{L^2(\Omega)} \big), \tag{24}
\]
which holds for some $\tilde{\alpha}, \tilde{\beta} > 0$ that depend only on $\Omega$, $\max_{i=1,2} |u_i^{\min}|$, and $\max_{i=1,2} |u_i^{\max}|$.

Proof. Setting $\phi = y$ in definition (23), we obtain the inequalities
\[
D \int_\Omega |\nabla y|^2 \, d\mathbf{x} \le \langle M y, y \rangle_{V^*,V} + \int_{\partial\Omega} ( \tilde{g} + \mathbf{n} \cdot b \, y ) y \, d\mathbf{x}
\le \langle M y, y \rangle_{V^*,V} + \int_{\partial\Omega} | \tilde{g} || y | \, d\mathbf{x} + \| b \| \int_{\partial\Omega} | y |^2 \, d\mathbf{x},
\]
from which we can conclude that
\[
D \| y \|^2_V \le \langle M y, y \rangle_{V^*,V} + \tfrac{1}{2} \| \tilde{g} \|^2_{L^2(\partial\Omega)} + \tfrac{1}{2} \| \tau y \|^2_{L^2(\partial\Omega)} + \| b \| \| \tau y \|^2_{L^2(\partial\Omega)} + D \| y \|^2_{L^2(\Omega)}, \tag{25}
\]
where $\tau$ is the trace operator defined in Section 4. It then follows from a modified form of the Trace Theorem for domains with Lipschitz boundaries [19, Theorem 1.5.1.10] that there exists a constant $K$, independent of $\epsilon \in (0,1)$ and $y \in V$, such that the following inequality holds for every $\epsilon \in (0,1)$:
\[
\| \tau y \|^2_{L^2(\partial\Omega)} \le K \epsilon^{1/2} \| y \|^2_V + K ( \epsilon^{-1/2} - \epsilon^{1/2} ) \| y \|^2_{L^2(\Omega)}.
\]
Thus, for each $\epsilon \in (0,1)$, the inequality (25) becomes
\[
D \| y \|^2_V \le \langle M y, y \rangle_{V^*,V} + \tfrac{1}{2} \| \tilde{g} \|^2_{L^2(\partial\Omega)} + \tilde{b}_\epsilon \| y \|^2_V + C_\epsilon \| y \|^2_{L^2(\Omega)}, \tag{26}
\]
where $\tilde{b}_\epsilon = \tfrac{K}{2} \epsilon^{1/2} + r K \epsilon^{1/2}$, $C_\epsilon = D + \tfrac{K}{2} ( \epsilon^{-1/2} - \epsilon^{1/2} ) + r K ( \epsilon^{-1/2} - \epsilon^{1/2} )$, and $r = \max_{i=1,2} |u_i^{\max}| + \max_{i=1,2} |u_i^{\min}|$. Note that we can choose an $\epsilon > 0$ arbitrarily small, independent of $y \in V$, such that $D > \tilde{b}_\epsilon$. We fix such an $\epsilon > 0$. Then we set $\tilde{\beta} = D - \tilde{b}_\epsilon$ and $\tilde{\alpha} = \max \{ \tfrac{1}{2}, C_\epsilon \}$. This concludes the proof.

Corollary 6. Define $A_g(t) : X \to X^*$ as in equation (20).
Then, for almost every $t \in (0,T)$, we have the following energy estimate:
\[
\beta \| y \|^2_X \le - \langle A_g(t) y, y \rangle_{X^*,X} + \alpha \big( \| g(t) \|^2_{L^2(\partial\Omega)} + \| y \|^2_{L^2(\Omega)^3} \big), \tag{27}
\]
which holds for some $\alpha, \beta > 0$ that depend only on $\Omega$, $\max_{i=1,2} |u_i^{\min}|$, and $\max_{i=1,2} |u_i^{\max}|$.

Proof. Comparing equations (21) and (23), we observe that $M_g(t) = -M$, $b = [u_1(t) \ u_2(t)]^T$, and $\tilde{g} = g(t)$ for almost every $t \in (0,T)$. Then, from the definition (20) of the operator $A_g(t)$, we have that
\[
- \langle A_g(t) y, y \rangle_{X^*,X} = \langle M y_1, y_1 \rangle_{V^*,V} - k_f \langle y_1, y_2 \rangle_{L^2(\Omega)} + k_f \| y_2 \|^2_{L^2(\Omega)}.
\]
Using Cauchy's inequality and Young's inequality [13],
\[
\langle M y_1, y_1 \rangle_{V^*,V} \le - \langle A_g(t) y, y \rangle_{X^*,X} + \tfrac{k_f}{2} \| y_1 \|^2_{L^2(\Omega)} + \tfrac{k_f}{2} \| y_2 \|^2_{L^2(\Omega)} + k_f \| y_2 \|^2_{L^2(\Omega)}.
\]
We set $\beta = \tilde{\beta}$ and $\alpha = \max \{ \tilde{\alpha}, \tfrac{3}{2} k_f \}$, where $\tilde{\alpha}$ and $\tilde{\beta}$ are defined as in the proof of Lemma 5. This concludes the proof.

Lemma 7. Given $f \in L^2(0,T;L^2(\Omega)^3)$, $g \in L^2(0,T;L^2(\partial\Omega))$, and an initial condition $y_0 \in L^2(\Omega)^3$, there exists a unique solution $y \in C([0,T];L^2(\Omega)^3)$ to system (18). We have the following energy estimate for this solution:
\[
\| y \|_{C([0,T];L^2(\Omega)^3)} + \| y \|_{L^2(0,T;X)} + \| \partial y / \partial t \|_{L^2(0,T;X^*)}
\le K \big( \| y_0 \|_{L^2(\Omega)^3} + \| f \|_{L^2(0,T;L^2(\Omega)^3)} + \| g \|_{L^2(0,T;L^2(\partial\Omega))} \big), \tag{28}
\]
where $K > 0$ depends only on $\Omega$, $\max_{1 \le i \le 3} |u_i^{\max}|$, and $\max_{1 \le i \le 3} |u_i^{\min}|$.

Proof. We first determine a bound on the sum $\| y \|_{C([0,T];L^2(\Omega)^3)} + \| y \|_{L^2(0,T;X)}$ in the energy estimate (28). Let $y$ be a weak solution of system (19), and set $\phi = y$ in (19). Then
\[
\Big\langle \frac{\partial y}{\partial t}(t), y(t) \Big\rangle_{X^*,X} - \langle A_g(t) y(t), y(t) \rangle_{X^*,X} = \sum_{i=1}^{3} \langle u_i(t) B_i y(t), y(t) \rangle_{L^2(\Omega)^3} + \langle f(t), y(t) \rangle_{L^2(\Omega)^3}.
\]
Combining this equality with the inequality (27), and observing that
\[
\Big\langle \frac{\partial y}{\partial t}(t), y(t) \Big\rangle_{X^*,X} = \frac{1}{2} \frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3},
\]
we obtain the inequality
\[
\frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3} + 2\beta \| y(t) \|^2_X \le 2 \sum_{i=1}^{3} \| u_i \|_{L^\infty(0,T)} \langle B_i y(t), y(t) \rangle_{L^2(\Omega)^3} + 2 \langle f(t), y(t) \rangle_{L^2(\Omega)^3} + 2\alpha \big( \| g(t) \|^2_{L^2(\partial\Omega)} + \| y_1(t) \|^2_{L^2(\Omega)} \big).
\]
Using Cauchy's inequality and Young's inequality [13], we derive the following inequality:
\[
\frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3} + 2\beta \| y(t) \|^2_X \le 2 \sum_{i=1}^{3} \| u_i \|_{L^\infty(0,T)} \| B_i y(t) \|_{L^2(\Omega)^3} \| y(t) \|_{L^2(\Omega)^3} + \big( \| f(t) \|^2_{L^2(\Omega)^3} + \| y(t) \|^2_{L^2(\Omega)^3} \big) + 2\alpha \big( \| g(t) \|^2_{L^2(\partial\Omega)} + \| y(t) \|^2_{L^2(\Omega)^3} \big). \tag{29}
\]
Let $E = r \big( \max_{1 \le i \le 3} |u_i^{\max}| + \max_{1 \le i \le 3} |u_i^{\min}| \big)$, where $r$ is the maximum of the operator norms of the operators $B_i$, $i = 1,2,3$. Then we can bound the first term on the right side of inequality (29) as follows:
\[
2 \sum_{i=1}^{3} \| u_i \|_{L^\infty(0,T)} \| B_i y(t) \|_{L^2(\Omega)^3} \| y(t) \|_{L^2(\Omega)^3} \le 2E \| y(t) \|_X \| y(t) \|_{L^2(\Omega)^3}. \tag{30}
\]
Applying Young's inequality, we can bound the term on the right side of inequality (30):
\[
2E \| y(t) \|_X \| y(t) \|_{L^2(\Omega)^3} \le \beta \| y(t) \|^2_X + \frac{E^2}{\beta} \| y(t) \|^2_{L^2(\Omega)^3}. \tag{31}
\]
Combining these upper bounds with the inequality (29), we find that
\[
\frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3} + \beta \| y(t) \|^2_X \le \frac{E^2}{\beta} \| y(t) \|^2_{L^2(\Omega)^3} + \big( \| f(t) \|^2_{L^2(\Omega)^3} + \| y(t) \|^2_{L^2(\Omega)^3} \big) + 2\alpha \big( \| g(t) \|^2_{L^2(\partial\Omega)} + \| y(t) \|^2_{L^2(\Omega)^3} \big),
\]
and therefore
\[
\frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3} + \beta \| y(t) \|^2_X \le C \big( \| y(t) \|^2_{L^2(\Omega)^3} + \| f(t) \|^2_{L^2(\Omega)^3} + \| g(t) \|^2_{L^2(\partial\Omega)} \big) \tag{32}
\]
for $C = \frac{E^2}{\beta} + 1 + 2\alpha$. This inequality implies that
\[
\frac{d}{dt} \| y(t) \|^2_{L^2(\Omega)^3} \le C \big( \| y(t) \|^2_{L^2(\Omega)^3} + \| f(t) \|^2_{L^2(\Omega)^3} + \| g(t) \|^2_{L^2(\partial\Omega)} \big).
\]
Setting $\eta(t) = \| y(t) \|^2_{L^2(\Omega)^3}$ and $\psi(t) = C \big( \| f(t) \|^2_{L^2(\Omega)^3} + \| g(t) \|^2_{L^2(\partial\Omega)} \big)$, so that $\eta'(t) \le C \eta(t) + \psi(t)$, and applying Gronwall's lemma [13] with these functions, we get
\[
\max_{0 \le t \le T} \| y(t) \|^2_{L^2(\Omega)^3} \le e^{CT} \big( \| y_0 \|^2_{L^2(\Omega)^3} + \| f \|^2_{L^2(0,T;L^2(\Omega)^3)} + \| g \|^2_{L^2(0,T;L^2(\partial\Omega))} \big).
\]
Then integrating both sides of inequality (32) over the time interval $(0,T)$ and using the inequality above, we obtain a bound on the sum of the first two terms in the energy estimate (28):
\[
\max_{0 \le t \le T} \| y(t) \|^2_{L^2(\Omega)^3} + \| y \|^2_{L^2(0,T;X)} \le C' \big( \| y_0 \|^2_{L^2(\Omega)^3} + \| f \|^2_{L^2(0,T;L^2(\Omega)^3)} + \| g \|^2_{L^2(0,T;L^2(\partial\Omega))} \big), \tag{33}
\]
where $C' = \max \{ e^{CT}, \tfrac{C}{\beta} \}$.

Next, we derive a bound on the third term in estimate (28), $\| \partial y / \partial t \|_{L^2(0,T;X^*)}$. We know that $B_i$ is a bounded operator from $X$ to $L^2(\Omega)^3$ for each $i \in \{1,2,3\}$. Therefore, $\| B_i q \|_{L^2(\Omega)^3} \le c_i \| q \|_X$ for all $q \in X$ and for some positive constants $c_i$. Similarly, $A_g \in \mathcal{L}(X, X^*)$ when $g = 0$. It follows that $\| A_g q \|^2_{X^*} \le c_A \big( \| q \|^2_X + \| g \|^2_{L^2(\partial\Omega)} \big)$ for all $q \in X$, for a fixed $g \in L^2(\partial\Omega)$ and some positive constant $c_A$. Let $v \in X$ be such that $\| v \|_X \le 1$. Then, using inequality (33) and the constants $c_A$ and $c_i$, $i = 1,2,3$, we find that any weak solution $y$ of system (18) satisfies the inequality
\[
\Big| \Big\langle \frac{\partial y}{\partial t}(t), v \Big\rangle_{X^*,X} \Big| \le C'' \big( \| y(t) \|_X + \| f(t) \|_{L^2(\Omega)^3} + \| g(t) \|_{L^2(\partial\Omega)} \big)
\]
for almost every $t \in (0,T)$ and for the constant $C'' = r' \, C' \max \{ c_A, c_1, c_2, c_3 \}$, where $r' = \max_{1 \le i \le 3} |u_i^{\max}| + \max_{1 \le i \le 3} |u_i^{\min}| + \max_{i=1,2} |b_i| + 1$. Here, we are bounding the left-hand side of equation (19) by setting $\phi(t) = v$ for almost every $t \in (0,T)$. The inequality above implies that
\[
\int_0^T \Big\| \frac{\partial y}{\partial t}(t) \Big\|^2_{X^*} dt \le C'' \int_0^T \big( \| y(t) \|^2_X + \| f(t) \|^2_{L^2(\Omega)^3} + \| g(t) \|^2_{L^2(\partial\Omega)} \big) \, dt.
\]
The bound on the term $\| \partial y / \partial t \|_{L^2(0,T;X^*)}$ follows from the above inequality. Hence, the bounds in the energy estimate (28) hold for $K = \max \{ 2C', 2C'' \}$. To confirm the existence and uniqueness of the solution to the PDE (8), we note that the weak form of the PDE (19) is in the abstract form
\[
\frac{dy}{dt} = \tilde{A}(t) y + \tilde{f}(t), \qquad y(0) = y_0, \tag{34}
\]
where $\tilde{A}(t) = A_0(t) + \sum_{i=1}^{3} u_i(t) B_i \in \mathcal{L}(X, X^*)$ and $\tilde{f} \in L^2(0,T;X^*)$ is given by
\[
\langle \tilde{f}(t), \psi \rangle_{X^*,X} = \langle f(t), \psi \rangle_{L^2(\Omega)^3} + \int_{\partial\Omega} g(\mathbf{x}, t) \psi_1(\mathbf{x}) \, d\mathbf{x}
\]
for almost every $t \in [0,T]$. Here, the term $\int_{\partial\Omega} g(\mathbf{x}, t) \psi_1(\mathbf{x}) \, d\mathbf{x}$ defines a bounded functional on $X$ due to the bounds on the trace map $\tau$ [19, Theorem 1.5.1.10]. Then the result follows from [42, Theorem 26.1] and the estimate (27).

6.3 Existence of Optimal Control

In this section, we prove the existence of a solution to the optimal control problem (22). Toward this end, we define the control-to-state mapping, $\Xi : U_{ad} \to \bar{Y}$, which maps a control, $u$, to $y$, the corresponding solution to system (18) with $f = 0$ and $g = 0$. We introduce the following reduced optimization problem,
\[
\min_{u \in U_{ad}} \hat{J}(u) := J(\Xi(u), u), \tag{35}
\]
where $J$ is defined in problem (22). This problem incorporates the PDE system (18) into the objective functional, rather than defining it as a set of constraints as in the original formulation (22). We shall henceforth analyze the reduced problem (35).

Theorem 8. An optimal control $u^*$ exists that minimizes the objective functional $\hat{J}$ in problem (35).

Proof. The functional $\hat{J}(u)$ is bounded from below, which implies that the infimum is a finite real number. Therefore, $q = \inf_{u \in U_{ad}} \hat{J}(u)$ exists. We now determine an optimal pair $(y^*, u^*)$ for which $J(y^*, u^*) = q$. Let $\{u_n\}_{n=1}^\infty$ be a minimizing sequence such that $\hat{J}(u_n) \to q$ as $n \to \infty$. $U_{ad}$ is a bounded, closed, and convex set, and thus is weakly sequentially compact.
Hence, there exists a subsequence, again denoted by $\{u_n\}_{n=1}^\infty$, such that
\[
u_n \rightharpoonup u^* \ \text{ in } L^2(0,T)^3, \tag{36}
\]
with $u^* \in U_{ad}$. Recall from Section 4 that $X = V \times L^2(\Omega)^2$. Then the uniform boundedness of the solutions in the energy estimate (28) allows us to extract a subsequence $\{y_n\}_{n=1}^\infty$, where $y_n = \Xi(u_n)$, such that
\[
y_n \rightharpoonup y^* \ \text{ in } L^2(0,T;X). \tag{37}
\]
It is necessary to confirm that $\Xi(u^*) = y^*$, since we do not know whether $\Xi$ is weakly continuous. Using the energy estimate (28), we have uniform bounds on $\| y_n \|_{L^2(0,T;X)} + \| \partial y_n / \partial t \|_{L^2(0,T;X^*)}$. Therefore, by the Aubin-Lions lemma [38], there exists a subsequence such that
\[
y_{n,1} \to y^*_1 \ \text{ in } L^2(0,T;L^2(\Omega)). \tag{38}
\]
The energy estimate (28) implies that the terms $\nabla y_{n,1}$, $\partial y_n / \partial t$, and $k_f y_{n,2}$ are uniformly bounded. Hence, we can also conclude that there exist subsequences that satisfy:
\[
\nabla y_{n,1} \rightharpoonup \nabla y^*_1 \ \text{ in } L^2(0,T;L^2(\Omega)), \qquad
\nabla y_{n,1} \to \nabla y^*_1 \ \text{ in } L^2(0,T;V^*),
\]
\[
\frac{\partial y_n}{\partial t} \rightharpoonup \frac{\partial y^*}{\partial t} \ \text{ in } L^2(0,T;X^*), \qquad
k_f y_{n,2} \rightharpoonup k_f y^*_2 \ \text{ in } L^2(0,T;L^2(\Omega)).
\]
We denote the first two components of $u_n$ by $v_n$ and the third component by $k_n$. From the strong convergence of $y_{n,1}$ in $L^2(0,T;L^2(\Omega))$ and the weak convergence of $u_n$ in $L^2(0,T)^3$, we can further deduce that:
\[
k_n H_\Gamma y_{n,1} \rightharpoonup k^* H_\Gamma y^*_1 \ \text{ in } L^2(0,T;L^2(\Omega)), \qquad
v_n \cdot \nabla y_{n,1} \rightharpoonup v^* \cdot \nabla y^*_1 \ \text{ in } L^2(0,T;V^*).
\]
To prove convergence of the terms in the weak form of the PDE (19) that arise from the boundary condition (6), we apply Green's theorem:
\[
\langle v_n \cdot \nabla y_{n,1}, \phi \rangle_{L^2(0,T;L^2(\Omega))} + \int_0^T \int_{\partial\Omega} \mathbf{n} \cdot ( v_n y_{n,1} \phi ) \, d\mathbf{x} \, dt = - \langle v_n y_{n,1}, \nabla \phi \rangle_{L^2(0,T;L^2(\Omega))} \tag{39}
\]
for all $\phi \in L^2(0,T;V)$. Due to the strong convergence of $y_{n,1}$ in $L^2(0,T;L^2(\Omega))$ and the weak convergence of $v_n$ in $L^2(0,T)^2$, we have that
\[
v_n y_{n,1} \rightharpoonup v^* y^*_1 \ \text{ in } L^2(0,T;L^2(\Omega)).
\]
(40)
The convergence of all the sequences defined thus far implies that the sequence of solutions $y_n = \Xi(u_n)$, which satisfies
\[
\Big\langle \frac{\partial y_n}{\partial t}, \phi \Big\rangle_{Y^*,Y} = \langle A_g y_n, \phi \rangle_{Y^*,Y} + \sum_{i=1}^{3} \langle u_{n,i} B_i y_n, \phi \rangle_F \tag{41}
\]
with $g = 0$, converges to the solution $y^* = \Xi(u^*)$, which satisfies
\[
\Big\langle \frac{\partial y^*}{\partial t}, \phi \Big\rangle_{Y^*,Y} = \langle A_g y^*, \phi \rangle_{Y^*,Y} + \sum_{i=1}^{3} \langle u^*_i B_i y^*, \phi \rangle_F \tag{42}
\]
with $g = 0$. It remains to be shown that $\hat{J}(u^*) = q$. Because $J$ is weakly lower semicontinuous, we can state that
\[
J(y^*, u^*) \le \liminf_{n \to \infty} J(y_n, u_n) = q. \tag{43}
\]
Since $q$ was defined earlier as the infimum of $\hat{J}(u)$, we conclude that
\[
\hat{J}(u^*) = J(y^*, u^*) = q, \tag{44}
\]
which completes the proof.

6.4 Differentiability of the Control-to-State Map

In this section, we summarize several results from our prior work [12] that we apply to derive the gradient of $\hat{J}(u)$. We use this gradient in the method of projected gradient descent to numerically compute a locally optimal control. The numerical computation of the gradient is made possible by expressing it in terms of the adjoint equation, which is a system of PDEs. This method for computing the control variables is explained in [10, Section 5.2]. See also [40, Section 3.7.1], where the method is discussed in the more general context of optimal control of parabolic PDEs.

Proposition 9. The mapping $\Xi$ is directionally differentiable at every $u \in U_{ad}$ along each direction $h \in L^\infty(0,T)^3$. Its directional derivative at $u$ in the direction $h$, denoted by $\Xi'(u)h$, is given by the solution $w$ of the following equation:
\[
\frac{\partial w}{\partial t} = Aw + \sum_{i=1}^{3} u_i B_i w + \sum_{i=1}^{3} h_i B_i y \ \ \text{in } Q, \qquad
\mathbf{n} \cdot \big( \nabla w_1 - [u_1 \ u_2]^T w_1 \big) = \mathbf{n} \cdot \big( [h_1 \ h_2]^T y_1 \big) \ \ \text{on } \Sigma, \qquad
w(0) = 0 \ \ \text{on } \Omega. \tag{45}
\]

Theorem 10. The reduced objective functional $\hat{J}$ is directionally differentiable at every $u \in U_{ad}$ along all $h \in L^\infty(0,T)^3$.
Its directional derivative along $h \in L^\infty(0,T)^3$ has the form
\[
d\hat{J}(u)h = \int_0^T \langle \mathbf{n} \cdot ([h_1 \ h_2]^T p_1), y_1 \rangle_{L^2(\partial\Omega)} \, dt + \int_0^T \Big\langle \sum_{i=1}^{3} h_i B_i y, p \Big\rangle_{L^2(\Omega)^3} dt + \lambda \langle u, h \rangle_{L^2(0,T)^3}, \tag{46}
\]
where $p$ is the solution of the backward-in-time adjoint equation,
\[
-\frac{\partial p_1}{\partial t} = \nabla \cdot ( D \nabla p_1 + v(t) p_1 ) + k(t) H_\Gamma ( -p_1 + p_2 + p_3 ) \ \ \text{in } Q, \qquad
-\frac{\partial p_2}{\partial t} = k_f p_1 - k_f p_2 \ \ \text{in } Q, \qquad
-\frac{\partial p_3}{\partial t} = 0 \ \ \text{in } Q, \tag{47}
\]
with the Neumann boundary condition
\[
\mathbf{n} \cdot \nabla p_1 = 0 \ \ \text{on } \Sigma \tag{48}
\]
and final-time condition
\[
p(T) = W^* ( W y(\cdot,T) - y_\Omega ), \tag{49}
\]
where $W$ is defined in Problem 4.

Remark 11. The adjoint equation (47)-(49) should not be confused with the backward heat equation, which is known to be ill-posed in $H^1(\Omega)$. The well-posedness of system (47)-(49) can be established using a standard variable transformation of the form $p^*(t) = p(T-t)$. The resulting equation,
\[
\frac{\partial p^*_1}{\partial t} = \nabla \cdot ( D \nabla p^*_1 + v(t) p^*_1 ) + k(t) H_\Gamma ( -p^*_1 + p^*_2 + p^*_3 ) \ \ \text{in } Q, \qquad
\frac{\partial p^*_2}{\partial t} = k_f p^*_1 - k_f p^*_2 \ \ \text{in } Q, \qquad
\frac{\partial p^*_3}{\partial t} = 0 \ \ \text{in } Q, \tag{50}
\]
with the Neumann boundary condition
\[
\mathbf{n} \cdot \nabla p^*_1 = 0 \ \ \text{on } \Sigma \tag{51}
\]
and initial condition
\[
p^*(0) = W^* ( W y(\cdot,T) - y_\Omega ), \tag{52}
\]
can be shown to have a unique solution $p^*$ using arguments similar to those in the proof of Lemma 7.

7 Simulation Results and Discussion

In this section, we validate our approaches to the mapping assignment (Section 5) and the coverage assignment (Section 6) with numerical simulations. The PDE models of ensemble mapping and coverage, presented in Section 3.2, are numerically solved using the method of lines. In this method, the operators associated with the PDE are first discretized in space; for this spatial discretization, we use a finite-difference scheme with flux limiters on a uniform mesh. Time is treated as a continuous variable, and the resulting system of ordinary differential equations (ODEs) is solved using built-in ODE solvers in MATLAB.
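The method of lines can be illustrated with a minimal one-dimensional sketch, written here in Python with SciPy rather than MATLAB, and with a plain first-order upwind flux in place of the paper's 2D flux-limited scheme; all names and parameter values are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

D, v = 1e-2, 0.5                    # illustrative diffusion and advection coefficients
nx = 100
dx = 1.0 / nx
x = (np.arange(nx) + 0.5) * dx      # cell centers on [0, 1]

def rhs(t, y):
    """Semi-discretized right-hand side in conservative finite-volume form.
    F holds the total flux v*y - D*dy/dx at the nx+1 cell faces; the zero-flux
    boundary condition n.(D grad y - v y) = 0, as in (6), sets F = 0 at the walls."""
    F = np.zeros(nx + 1)
    F[1:-1] = v * y[:-1] - D * (y[1:] - y[:-1]) / dx   # upwind advection (v > 0)
    return -(F[1:] - F[:-1]) / dx

y0 = np.exp(-((x - 0.2) ** 2) / 0.005)
y0 /= y0.sum() * dx                 # normalize to a probability density
sol = solve_ivp(rhs, (0.0, 1.0), y0, method="BDF")   # stiff ODE solver in time
```

Because the scheme is in flux form, the total mass $\int_\Omega y \, d\mathbf{x}$ is conserved up to solver tolerance, mirroring the conservation of robot density in the macroscopic model.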
See [23] for further details on this approach to numerically solving advection-diffusion-reaction type PDEs.

7.1 Mapping Assignment

We validate our mapping approach in two test cases. In both scenarios, the domain is $\Omega = [0,100]^2 \ m^2$, the ensemble has $N = 30$ agents, the diffusion coefficient is $D = 10^{-4} \ m^2/s$, and the agents' rate of recording observations over a region of interest is $k_o = 100 \ s^{-1}$, yielding a high probability rate of registering observations. The agents' velocity field $v(t)$ is defined to drive the agents along a lawnmower path, as illustrated by the two agent trajectories in Figure 1. This path is perturbed by stochastic fluctuations arising from the agents' diffusive motion.

Figure 1: Trajectories of two agents during the mapping assignment.

We define two environments, Case 1 and Case 2, with different regions of interest, labeled Actual $H_\Gamma(\mathbf{x})$ in Figure 2. The results of our mapping approach are shown in the plots labeled Estimated $H_\Gamma(\mathbf{x})$. To use the estimated maps in simulations of our coverage approach (Section 7.2), we thresholded the estimated $H_\Gamma(\mathbf{x})$ in both environments by defining the Thresholded $H_\Gamma(\mathbf{x})$, also shown in Figure 2, as $H_T(\mathbf{x}) = 1$ for each $\mathbf{x} \in \Omega$ such that $H_\Gamma(\mathbf{x}) \ge 0.5$ and $H_T(\mathbf{x}) = 0$ otherwise.

In both cases, our optimization method reconstructs the spatial coefficient $H_\Gamma(\mathbf{x})$ with considerable accuracy, even though a relatively small number of robots was used ($N = 30$). The largest error in the estimates occurs in the top half of the Case 2 environment, which can be attributed to the increased dispersion of the agents, due to their diffusive motion, as they reach the upper portion of the domain. We would expect a larger ensemble of agents to generate a more accurate map, since the microscopic model converges to the macroscopic model as $N \to \infty$ [43].
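The stochastic trajectories of Figure 1 come from a microscopic model in which each agent integrates the commanded velocity plus diffusive noise. A minimal Euler-Maruyama sketch with reflection at the domain walls follows; the velocity schedule and all parameter names are illustrative stand-ins, not the paper's exact lawnmower field:

```python
import numpy as np

rng = np.random.default_rng(0)
D, dt, L, N = 1e-4, 0.1, 100.0, 30   # diffusion, time step, domain side, agents

def velocity(t, speed=1.0, leg=90.0):
    """Illustrative lawnmower-like schedule: sweep right, hop up, sweep left, hop up."""
    phase = int(t // leg) % 4
    return np.array([(speed, 0.0), (0.0, speed),
                     (-speed, 0.0), (0.0, speed)][phase])

def step(X, t):
    """One Euler-Maruyama step of dX = v(t) dt + sqrt(2D) dW, reflected at the walls."""
    X = X + velocity(t) * dt + np.sqrt(2 * D * dt) * rng.standard_normal(X.shape)
    X = np.where(X < 0.0, -X, X)         # reflect at the lower/left walls
    X = np.where(X > L, 2 * L - X, X)    # reflect at the upper/right walls
    return X

X = np.full((N, 2), 10.0)                # all agents start near (10, 10)
for k in range(2000):
    X = step(X, k * dt)
```

The reflection implements the zero-flux (no-penetration) boundary behavior of the macroscopic model at the level of individual agents.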
7.2 Coverage Assignment

We validate our coverage approach in three test cases, Cases 1, 2, and 3. The region of interest in Case 1 is defined as Thresholded $H_\Gamma(\mathbf{x})$ in Figure 2(a), and the region of interest in Cases 2 and 3 is defined as Thresholded $H_\Gamma(\mathbf{x})$ in Figure 2(b). In all scenarios, the domain is $\Omega = [0,100]^2 \ m^2$, and the diffusion coefficient is $D = 5 \times 10^{-4} \ m^2/s$. To illustrate the scalability of our control methodology to large numbers of agents, we simulate an ensemble with $N = 1000$ agents. All agents are initialized in the moving state, with an initial density given by a Gaussian distribution centered at $[x_1^o \ x_2^o] = [10 \ 10]$. Thus, the initial conditions (9) are:
\[
y_1(\mathbf{x}, 0) = A \exp \Big( -\frac{(x_1 - x_1^o)^2}{2\sigma_{x_1}^2} - \frac{(x_2 - x_2^o)^2}{2\sigma_{x_2}^2} \Big), \qquad y_2(\mathbf{x}, 0) = 0, \qquad y_3(\mathbf{x}, 0) = 0, \tag{53}
\]
with $\sigma_{x_1} = \sigma_{x_2} = 0.02$ and $A$ defined such that the Gaussian distribution integrates to 1 over the domain. The final time is set to $T = 800 \ s$ for Cases 1 and 2 and $T = 300 \ s$ for Case 3.

Figure 2: Actual, estimated, and thresholded maps of regions of interest. (a) Case 1; (b) Case 2.

We define $y_\Omega$, the target spatial distribution in Problem 4, as follows. We partition the domain $\Omega$ into $P \times P$ cells, with $P = 20$, denoted by $\{\Omega_{nm}\}$. The cell $\Omega_{00}$ occupies the region $[0, \tfrac{100}{P}] \times [0, \tfrac{100}{P}]$, and all other cells are defined as $\Omega_{nm} = \big( \tfrac{100m}{P}, \tfrac{100(m+1)}{P} \big] \times \big( \tfrac{100n}{P}, \tfrac{100(n+1)}{P} \big]$, where $n, m \in \{0, 1, \dots, P-1\}$ and $n \neq 0$ or $m \neq 0$.
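The normalization constant $A$ in (53) can be fixed numerically by scaling a discretized Gaussian so that its Riemann sum over $\Omega$ equals 1. A sketch follows, using a broader $\sigma$ than the paper's 0.02 purely so that the bump is resolved on a coarse illustrative mesh:

```python
import numpy as np

nx = 200
dx = 100.0 / nx                          # Omega = [0, 100]^2, uniform mesh
xc = (np.arange(nx) + 0.5) * dx          # cell centers
X1, X2 = np.meshgrid(xc, xc, indexing="ij")
sigma, x0 = 5.0, (10.0, 10.0)            # illustrative width; center from (53)

# Unnormalized Gaussian of (53); dividing by its Riemann sum realizes the
# amplitude A that makes the density integrate to 1 over the domain.
y1_0 = np.exp(-((X1 - x0[0])**2 + (X2 - x0[1])**2) / (2 * sigma**2))
y1_0 /= y1_0.sum() * dx**2
```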
For Case 1 and Case 2, we set the target number of instances of desired robot activity in each cell $\Omega_{nm}$ to be $z^*_{nm} = C \times \frac{\mu(\Omega_{nm} \cap \Gamma)}{\mu(\Omega_{nm})}$, where $C$ is a positive constant and $\mu$ is the Lebesgue measure on $\Omega$. The target distribution of robot coverage activity, which is the third component of $y_\Omega$, is defined as $y^*_3(\mathbf{x}, T) = C/50$ for all $\mathbf{x} \in \Gamma$. Since we do not require the first two components of $y_\Omega$ (the densities of moving and stationary robots) to reach target distributions at time $T$, we set the operator $W$ in Problem 4 to be a diagonal operator matrix of the form $\mathrm{diag}([0 \ 0 \ I])$, where $I$ is the identity operator on $L^2(\Omega)$. We specify the following sub-cases: $C = 450$ in Case 1a and Case 2a, and $C = 3600$ in Case 1b and Case 2b. For Case 3, coverage activity is desired only in the upper half of the domain; we set $y^*_3(\mathbf{x}, T) = 36$ for $\mathbf{x} \in \Gamma$ with $x_2 \ge 60$, and $y^*_3(\mathbf{x}, T) = 0$ otherwise.

Figure 3 plots the time evolution of the objective function $J$ for each case. For Cases 1a,b and 2a,b, the low values of $J$ at the final time $T = 800 \ s$ indicate that the target coverage density is nearly achieved. This is in part due to the accuracy of the thresholded estimate of $H_\Gamma(\mathbf{x})$ obtained from the mapping task (Section 7.1). The lower error in the estimated $H_\Gamma(\mathbf{x})$ for Case 1 compared to the mapping error for Case 2 contributes to the better coverage performance (i.e., lower values of $J$) in Case 1a versus Case 2a and in Case 1b versus Case 2b.

For Case 3, the value of $J$ is higher at the final time $T = 300 \ s$ than the values of $J$ for Cases 1a,b and 2a,b at $T = 800 \ s$, indicating that the final coverage density is relatively farther from the target density. The poorer coverage performance in Case 3, in which coverage activity is limited to a subset of the region of interest, is a consequence of the limited controllability of the system, which can be attributed to three factors.
First, only three control variables are used to control the PDE model, which is an infinite-dimensional dynamical system. Second, the system is constrained to achieve the target coverage density by time $T$. Third, the time-dependent diffusion, reaction, and advection operators in the PDE model commute [26], and this commutativity of the control and drift vector fields degrades the controllability properties of the system [2]. Thus, the ensemble would achieve better coverage performance in assignments with less stringent requirements on the system's controllability properties. Such assignments include those in which the target coverage distribution is proportional to an environmental parameter, or in which the objective is to achieve a minimum coverage density in each region of interest rather than an exact coverage density.

Figure 4 plots the target coverage density $z^*_{nm}$ for Cases 1b and 2a alongside the corresponding expected coverage density, $y_3(\mathbf{x}, T)$ from the macroscopic PDE model (18), and the achieved coverage density, $z^{\Omega_{nm}}_3(T)$ from the microscopic model, at the final time $T = 800 \ s$. The lower value of the target coverage density in Cases 1a, 2a ($C = 450$) than in Cases 1b, 2b ($C = 3600$) results in larger stochastic fluctuations of the achieved coverage density around the expected coverage density. These larger fluctuations produce poorer coverage performance (higher values of $J$) in Case 1a versus Case 1b and in Case 2a versus Case 2b. The plots in Figure 4 display the relative degree of these fluctuations in Case 1b versus Case 2a at time $T = 800 \ s$.
The plots show that the achieved coverage density from the microscopic model approximates the expected coverage density from the macroscopic model, due to the large number of agents ($N = 1000$) in the ensemble. As demonstrated in [43], the discrepancy between these two density fields will tend to zero as $N \to \infty$.

Figure 3: Time evolution of the objective function $J$ in Problem 4 for different coverage scenarios (Cases 1a and 2a with $C = 450$, Cases 1b and 2b with $C = 3600$, and Case 3).

8 Conclusion

In this paper, we presented stochastic approaches to mapping and coverage assignments for an ensemble of autonomous robots in a PDE control framework. This framework enables the modeling and control of a robotic ensemble in a rigorous way that is scalable with the number of robots. We demonstrated that temporal data obtained by a small ensemble of diffusive agents can provide rich information about the spatial distribution of a region of interest, despite severe restrictions on the agents' sensing, localization, tracking, and computational capabilities. We also showed that we can pose the coverage task as an optimal control problem that computes the agents' control inputs to achieve a target distribution of coverage activity over the previously mapped regions of interest.

In future work, we will investigate the controllability properties of the PDE models presented in this paper. Additionally, we plan to incorporate pairwise interactions between agents, such as those defined by attraction-repulsion potentials, in order to increase the cohesiveness of the ensemble and improve the reachability properties of the system. The component of the robots' velocity field that is induced by pairwise interactions would be included in the advection term, resulting in a nonlinear PDE as the macroscopic model.
This type of model would require more advanced analytical tools than the ones used in this paper.

Figure 4: Expected, achieved, and target densities of robot coverage activity for two scenarios. (a) Case 1b; (b) Case 2a.

References

[1] Aviv Adler, Mark de Berg, Dan Halperin, and Kiril Solovey. Efficient multi-robot motion planning for unlabeled discs in simple polygons. In Algorithmic Foundations of Robotics XI, pages 1-17. Springer, 2015.
[2] Andrei A. Agrachev and Yuri Sachkov. Control Theory from the Geometric Viewpoint, volume 87. Springer Science & Business Media, 2013.
[3] M. Annunziato and A. Borzì. Optimal control of a class of piecewise deterministic processes. European Journal of Applied Mathematics, 25(1):1-25, 2014.
[4] John M. Ball, Jerrold E. Marsden, and Marshall Slemrod. Controllability for distributed bilinear systems. SIAM Journal on Control and Optimization, 20(4):575-597, 1982.
[5] Spring Berman, Ádám Halász, M. Ani Hsieh, and Vijay Kumar. Optimized stochastic policies for task allocation in swarms of robots. IEEE Transactions on Robotics, 25(4):927-937, 2009.
[6] H.-L. Choi, L. Brunet, and J. P. How. Consensus-based decentralized auctions for robust task allocation. IEEE Transactions on Robotics, 25(4):912-926, 2009.
[7] Nikolaus Correll and Heiko Hamann. Probabilistic modeling of swarming systems. In Springer Handbook of Computational Intelligence, pages 1423-1432. Springer, 2015.
[8] Nikolaus Correll and Alcherio Martinoli. System identification of self-organizing robotic swarms. In Distributed Autonomous Robotic Systems 7, pages 31-40, 2006.
[9] M. B. Dias, R. M. Zlot, N. Kalra, and A. Stentz. Market-based multirobot coordination: a survey and analysis. Proceedings of the IEEE, 94(7):1257-1270, 2006.
[10] Karthik Elamvazhuthi. A Variational Approach to Planning, Allocation and Mapping in Robot Swarms Using Infinite Dimensional Models. Arizona State University, 2014.
[11] Karthik Elamvazhuthi and Spring Berman. Scalable formation control of multi-robot chain networks using a PDE abstraction. In Int'l. Symp. on Distributed Autonomous Robotic Systems (DARS), Daejeon, Korea, 2014.
[12] Karthik Elamvazhuthi and Spring Berman. Optimal control of stochastic coverage strategies for robotic swarms. In IEEE Int'l. Conf. on Robotics and Automation (ICRA), pages 1822-1829, 2015.
[13] Lawrence C. Evans. Partial Differential Equations. American Mathematical Society, Providence, Rhode Island, 1998.
[14] Greg Foderaro, Silvia Ferrari, and Thomas A. Wettergren. Distributed optimal control for multi-agent trajectory optimization. Automatica, 50(1):149-154, 2014.
[15] Gianpiero Francesca and Mauro Birattari. Automatic design of robot swarms: achievements and challenges. Frontiers in Robotics and AI, 3(29), 2016.
[16] Paul Frihauf and Miroslav Krstic. Leader-enabled deployment onto planar curves: A PDE-based approach. IEEE Transactions on Automatic Control, 56(8):1791-1806, 2011.
[17] Aram Galstyan, Tad Hogg, and Kristina Lerman. Modeling and mathematical analysis of swarms of microscopic robots. In Swarm Intelligence Symposium (SIS), pages 201-208. IEEE, 2005.
[18] Daniel T. Gillespie. The chemical Langevin equation. Journal of Chemical Physics, 113(1):297-306, 2000.
[19] Pierre Grisvard. Elliptic Problems in Nonsmooth Domains. SIAM, 2011.
[20] Heiko Hamann and Heinz Wörn. A framework of space-time continuous models for algorithm design in swarm robotics. Swarm Intelligence, 2(2):209-239, 2008.
[21] Thanh-Trung Han and Shuzhi Sam Ge. Styled-velocity flocking of autonomous vehicles: A systematic design. IEEE Transactions on Automatic Control, 60(8):2015-2030, 2015.
[22] Musad Haque, Amir Rahmani, Magnus Egerstedt, and Anthony Yezzi. Efficient foraging strategies in multi-agent systems through curve evolutions. IEEE Transactions on Automatic Control, 59(4):1036-1041, 2014.
[23] Willem Hundsdorfer and Jan G. Verwer. Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations, volume 33. Springer Science & Business Media, 2013.
[24] Konstantinos Karydis and Vijay Kumar. Energetics in robotic flight at small scales. Royal Society Interface Focus, 7(1):20160088, 2016.
[25] Andreas Kirsch. An Introduction to the Mathematical Theory of Inverse Problems, volume 120. Springer, 2011.
[26] Debby Lanser and Jan G. Verwer. Analysis of operator splitting for advection-diffusion-reaction problems from air pollution modelling. Journal of Computational and Applied Mathematics, 111(1):201-216, 1999.
[27] Kristina Lerman, Chris Jones, Aram Galstyan, and Maja J. Matarić. Analysis of dynamic task allocation in multi-robot systems. International Journal of Robotics Research, 25(3):225-241, 2006.
[28] Alcherio Martinoli, Kjerstin Easton, and William Agassounon. Modeling swarm robotic systems: A case study in collaborative distributed manipulation. International Journal of Robotics Research, 23:415-436, 2004.
[29] Thomas Meurer and Miroslav Krstic. Finite-time multi-agent deployment: A nonlinear PDE motion planning approach. Automatica, 47(11):2534-2542, 2011.
[30] Nathan Michael, Jonathan Fink, Savvas Loizou, and Vijay Kumar. Architecture, abstractions, and algorithms for controlling large teams of robots: Experimental testbed and results. In Robotics Research, pages 409-419. Springer, 2010.
[31] Dejan Milutinovic and Pedro Lima. Modeling and optimal centralized control of a large-size robotic population. IEEE Transactions on Robotics, 22(6):1280-1285, 2006.
[32] Mojtaba Nourian, Peter E. Caines, Roland P. Malhame, and Minyi Huang. Nash, social and centralized solutions to consensus problems via mean field control theory. IEEE Transactions on Automatic Control, 58(3):639-653, 2013.
[33] Akira Okubo. Dynamical aspects of animal grouping: swarms, schools, flocks, and herds. Advances in Biophysics, 22:1-94, 1986.
[34] Yongsheng Ou and Eugenio Schuster. On the stability of receding horizon control of bilinear parabolic PDE systems. In 49th IEEE Conference on Decision and Control (CDC), pages 851-857. IEEE, 2010.
[35] Benedetto Piccoli, Francesco Rossi, and Emmanuel Trélat. Control to flocking of the kinetic Cucker-Smale model. arXiv preprint arXiv:1411.4687, 2014.
[36] Luciano C. A. Pimenta, Nathan Michael, Renato C. Mesquita, Guilherme A. S. Pereira, and Vijay Kumar. Control of swarms based on hydrodynamic models. In IEEE Int'l. Conf. on Robotics and Automation (ICRA), pages 1948-1953, 2008.
[37] Amanda Prorok, Nikolaus Correll, and Alcherio Martinoli. Multi-level spatial modeling for stochastic distributed robotic systems. International Journal of Robotics Research, 30(5):574-589, 2011.
[38] Jacques Simon. Compact sets in the space $L^p(0,T;B)$. Annali di Matematica Pura ed Applicata, 146(1):65-96, 1986.
[39] Metin Sitti, Hakan Ceylan, Wenqi Hu, Joshua Giltinan, Mehmet Turan, Sehyuk Yim, and Eric Diller. Biomedical applications of untethered mobile milli/microrobots. Proceedings of the IEEE, 103(2):205-224, 2015.
[40] Fredi Tröltzsch. Optimal Control of Partial Differential Equations: Theory, Methods and Applications, volume 112. American Mathematical Society, 2010.
[41] Matthew Turpin, Nathan Michael, and Vijay Kumar. CAPT: Concurrent assignment and planning of trajectories for multiple robots. International Journal of Robotics Research, 33(1):98-112, 2014.
[42] J. Wloka. Partial Differential Equations. Cambridge University Press, Cambridge, 1987.
[43] Fangbo Zhang, Andrea L. Bertozzi, Karthik Elamvazhuthi, and Spring Berman. Performance bounds on spatial coverage tasks by stochastic robotic swarms. IEEE Transactions on Automatic Control, 63(6), 2018. To appear.