Conservative collision prediction and avoidance for stochastic trajectories in continuous time and space

Jan-P. Calliess, Michael Osborne, Stephen Roberts*
Dept. of Engineering Science, University of Oxford

September 13, 2018

* The authors gratefully acknowledge funding via EPSRC EP/I011587.

Abstract

Existing work in multi-agent collision prediction and avoidance typically assumes discrete-time trajectories with Gaussian uncertainty or trajectories that are completely deterministic. We propose an approach that allows detection of collisions even between continuous, stochastic trajectories, with the only restriction that means and covariances can be computed. To this end, we employ probabilistic bounds to derive criterion functions whose negative values provably indicate potential collisions. For criterion functions that are Lipschitz, an algorithm is provided to rapidly find negative values or prove their absence. We propose an iterative policy-search approach that avoids prior discretisations and, upon termination, yields collision-free trajectories with adjustably high certainty. We test our method with both fixed-priority and auction-based protocols for coordinating the iterative planning process. Results are provided in collision-avoidance simulations of feedback-controlled plants.

1 Introduction

Due to their practical importance, multi-agent collision avoidance and control have been extensively studied across different communities including AI, robotics and control. Considering continuous stochastic trajectories, reflecting each agent's uncertainty about its neighbours' time-indexed locations in an environment space, we exploit a distribution-independent bound on collision probabilities to develop a conservative collision-prediction module. It avoids temporal discretisation by stating collision prediction as a one-dimensional optimization problem. If mean and standard deviation are computable Lipschitz functions of time, one can derive Lipschitz constants that allow us to guarantee collision-prediction success with low computational effort. This is often the case, for instance, when dynamic knowledge of the involved trajectories is available (e.g. maximum velocities or even the stochastic differential equations). To avoid collisions detected by the prediction module, we let an agent re-plan repeatedly until no more collisions occur with a definable probability. Here, re-planning refers to modifying a control signal (influencing the basin of attraction and equilibrium point of the agent's stochastic dynamics) so as to bound the collision probability while seeking low plan-execution cost in expectation. To keep the exposition concrete, we focus our descriptions on an example scenario where the plans correspond to sequences of setpoints of a feedback controller regulating an agent's noisy state trajectory. However, one can apply our method in the context of more general policy-search problems.

In order to foster low social cost across the entire agent collective, we compare two different coordination mechanisms. Firstly, we consider a simple fixed-priority scheme [11], and secondly, we modify an auction-based coordination protocol [7] to work in our continuous setting. In contrast to pre-existing work in auction-style multi-agent planning (e.g. [7,16]) and multi-agent collision avoidance (e.g. [1,2,15]), we avoid a priori discretizations of space and time.
Instead, we recast the coordination problem as one of incremental open-loop policy search. That is, as a succession of continuous optimisation or root-finding problems that can be efficiently and reliably solved by modern optimisation and root-finding techniques (e.g. [13,23]).

While our current experiments were conducted with linear stochastic differential equation (SDE) models with state-independent noise (yielding Gaussian processes), our method is also applicable to any situation where means and covariances can be evaluated. This encompasses non-linear, non-Gaussian cases that may have state-dependent uncertainties (cf. [12]).

This preprint is an extended and improved version of a conference paper that appeared in Proc. of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014) [6].

1.1 Related Work

Multi-agent trajectory planning and task allocation methods have been related to auction mechanisms by identifying locations in state space with atomic goods to be auctioned in a sequence of repeated coordination rounds (e.g. [7,16,26]). Unfortunately, even in finite domains the coordination is known to be intractable – for instance, the sequential allocation problem is NP-hard in the number of goods and agents [14,22]. Furthermore, collision avoidance corresponds to non-convex interactions, which renders standard optimization techniques that rely on convexity of the joint state space inapplicable to the coordination problem. In recent years, several works have investigated the use of mixed-integer programming techniques for single- and multi-agent model-predictive control with collision avoidance, both in deterministic and stochastic settings [7,19]. To connect the problem to pre-existing mixed-integer optimization tools, these works had to limit the models to dynamics governed by linear, time-discrete difference equations with state-independent state noise. The resulting plans were finite sequences of control inputs that could be chosen freely from a convex set. The controls gained from optimization are open-loop – to obtain closed-loop policies, the optimization problems have to be successively re-solved on-line in a receding-horizon fashion. However, computational effort may prohibit such an approach in multi-agent systems with rapidly evolving states.

Furthermore, prior time-discretisation comes with a natural trade-off. On the one hand, one would desire a high temporal resolution in order to limit the chance of missing a collision occurring between consecutive time steps. On the other hand, communication restrictions, as well as poor scalability of mixed-integer programming techniques in the dimensionality of the input vectors, impose severe restrictions on this resolution. To address this trade-off, [10] proposed to interpolate between the optimized time steps in order to detect collisions occurring between the discrete time steps. Whenever a collision was detected, they proposed to augment the temporal resolution by the time step of the detected collision, thereby growing the state vectors incrementally as needed. A detected conflict, at time t, is then resolved by solving a new mixed-integer linear programme over an augmented state space, now including the state at t. This approach can result in a succession of solution attempts of optimization problems of increasing complexity, but can nonetheless prove relatively computationally efficient.
Unfortunately, their method is limited to linear, deterministic state dynamics. Another thread of works relies on dividing space into polytopes [1,17], while still others [8,9,15,21] adopt a potential field. In not accommodating uncertainty and stochasticity, these approaches are forced to be overly conservative in order to prevent collisions in real systems.

In contrast to all these works, we consider a different scenario. Our exposition focuses on the assumption that each agent is regulated by influencing its continuous stochastic dynamics. For instance, we might have a given feedback controller with which one can interact by providing a sequence of setpoints constituting the agent's plan. While this restricts the choice of control action, it also simplifies computation as the feedback law is fixed. The controller can generate a continuous, state-dependent control signal based on a discrete number of control decisions, embodied by the setpoints. Moreover, it renders our method applicable in settings where the agents' plants are controlled by standard off-the-shelf controllers (such as the omnipresent PID controllers) rather than by more sophisticated customized ones. Instead of imposing discreteness, we make the often more realistic assumption that agents follow continuous time-state trajectories within a given continuous time interval. Unlike most work [1,21,25,27] in this field, we allow for stochastic dynamics, where each agent cannot be certain about the location of its team members. This is crucial for many real-world multi-agent systems. The uncertainties are modelled as state noise, which can reflect physical disturbances or merely model inaccuracies. While our exposition's focus is on stochastic differential equations, our approach is generally applicable in all contexts where the first two moments of the predicted trajectories can be evaluated at all times. As noted above, this paper is an extended version of work that has been published in the proceedings of AAMAS'14 [6], and an earlier stage of this work was presented at an ICML workshop [5].

2 Predictive Probabilistic Collision Detection with Criterion Functions

Task. Our aim is to design a collision-detection module that can decide whether a set of (predictive) stochastic trajectories is collision-free (in the sense defined below). The module we derive is guaranteed to make this decision correctly based on knowledge of the first and second order moments of the trajectories alone. In particular, no assumptions are made about the family of stochastic processes the trajectories belong to. As the required collision probabilities will generally have to be expressed as non-analytic integrals, we content ourselves with a fast, conservative approach. That is, we are willing to tolerate a non-zero false-alarm rate as long as decisions can be made rapidly and with zero false-negative rate. Of course, for certain distributions and plant shapes, one may derive closed-form solutions for the collision probability that may be less conservative and hence lead to faster termination and shorter paths. In such cases, our derivations can serve as a template for the construction of criterion functions on the basis of the tighter probabilistic bounds.

Problem Formalization. Formally, a collision between two objects (or agents) $a, r$ at time $t \in I := [t_0, t_f] \subset \mathbb{R}$ can be described by the event $C^{a,r}(t) = \{ (x^a(t), x^r(t)) \mid \|x^a(t) - x^r(t)\|_2 \le \frac{\Lambda^a + \Lambda^r}{2} \}$.
Here, $\Lambda^a, \Lambda^r$ denote the objects' diameters, and $x^a, x^r : I \to \mathbb{R}^D$ are two (possibly uncertain) trajectories in a common, $D$-dimensional interaction space. In a stochastic setting, we desire to bound the collision probability below a threshold $\delta \in (0,1)$ at any given time in $I$. We loosely say that the trajectories are collision-free if $\Pr[C^{a,r}(t)] < \delta, \forall t \in I$.

Approach. For conservative collision detection between two agents' stochastic trajectories $x^a, x^r$, we construct a criterion function $\gamma^{a,r} : I \to \mathbb{R}$ (e.g. as per Eq. 2 below). A conservative criterion function has the property $\gamma^{a,r}(t) > 0 \Rightarrow \Pr[C^{a,r}(t)] < \delta$. That is, a collision between the trajectories with probability above $\delta$ can be ruled out if $\gamma^{a,r}$ attains only positive values. If one could evaluate the function $t \mapsto \Pr[C^{a,r}(t)]$, an ideal criterion function would be
$$\gamma^{a,r}_{\mathrm{ideal}}(t) := \delta - \Pr[C^{a,r}(t)]. \qquad (1)$$
It is ideal in the sense that $\gamma^{a,r}_{\mathrm{ideal}}(t) > 0 \Leftrightarrow \Pr[C^{a,r}(t)] < \delta$. However, in most cases, evaluating this criterion function in closed form will not be feasible. Therefore, we adopt a conservative approach: we determine a criterion function $\gamma^{a,r}$ such that provably $\gamma^{a,r}(t) \le \gamma^{a,r}_{\mathrm{ideal}}(t), \forall t$, accepting the possibility of false alarms. That is, it is possible that $\gamma^{a,r}(t) \le 0$ for some times $t$ in spite of $\gamma^{a,r}_{\mathrm{ideal}}(t) > 0$.

Utilising the conservative criterion functions for collision prediction, we assume a collision occurs unless $\min_{t \in I} \gamma^{a,r}(t) > 0, \forall r \neq a$. If the trajectories' means and standard deviations are Lipschitz functions of time, then one can often show that $\gamma^{a,r}$ is Lipschitz as well. In such cases negative values of $\gamma^{a,r}$ can be found or ruled out rapidly, as will be discussed in Sec. 2.1. In situations where a Lipschitz constant is unavailable or hard to determine, we can base our detection on the output of a global minimization method such as DIRECT [13].

2.1 Finding negative function values of Lipschitz functions

Let $t_0, t_f \in \mathbb{R}$, $t_0 \le t_f$, $I := [t_0, t_f] \subset \mathbb{R}$. Assume we are given a Lipschitz continuous target function $f : I \to \mathbb{R}$ with Lipschitz constant $L \ge 0$. That is, $\forall S \subset I\ \exists L_S \le L\ \forall x, x' \in S : |f(x) - f(x')| \le L_S |x - x'|$. Let $t_0 < t_1 < t_2 < \dots < t_N < t_{N+1} := t_f$ and define $G_N = (t_0, \dots, t_{N+1})$ to be the sample grid of size $N + 2 \ge 2$ consisting of the inputs at which we choose to evaluate the target $f$. Our goal is to prove or disprove the existence of a negative function value of the target $f$.

2.1.1 A naive algorithm

As a first, naive method, Alg. 1 leverages Lipschitz continuity to answer the question of positivity correctly after a finite number of function evaluations. The algorithm evaluates the function on a finite grid, assuming a uniform Lipschitz number $L$. The grid is iteratively refined until either a negative function value is found or the Lipschitz continuity of the function allows us to infer that no negative function values can exist. The latter is the case whenever $\min_{t \in G_N} \gamma(t) > L\,\Delta$, where $G_N = (t_0, \dots, t_{N+1})$ is the grid of function input (time) samples, $\Delta = |t_{i+1} - t_i|$ ($i = 0, \dots, N$) and $L > 0$ is a Lipschitz number of the function $\gamma : (t_0, t_f) \to \mathbb{R}$ which is to be evaluated. The claim is established by the following lemma:

Lemma 2.1. Let $\gamma : [t_0, t_f] \subset \mathbb{R} \to \mathbb{R}$ be a Lipschitz function with Lipschitz number $L > 0$. Furthermore, let $G_N = (t_0, t_1, \dots, t_{N+1})$ be an equidistant grid with $\Delta = |t_{i+1} - t_i|$ ($i = 0, \dots, N$). We have $\gamma(t) > 0, \forall t \in (t_0, t_f)$, if $\forall t \in G_N : \gamma(t) > L\,\Delta$.

Proof. Since $L$ is a Lipschitz constant of $\gamma$ we have $|\gamma(t) - \gamma(t')| \le L|t - t'|$ for all $t, t' \in (t_0, t_f)$.
Now, let $t^* \in (t_0, t_f)$ and $t_i, t_{i+1} \in G_N$ such that $t^* \in [t_i, t_{i+1}]$. Consistent with the premise of the implication we aim to show, we assume $\gamma(t_i), \gamma(t_{i+1}) > L\Delta$ and, without loss of generality, $\gamma(t_i) \le \gamma(t_{i+1})$. Let $\delta := |t_i - t^*|$. Since $t_i \le t^* \le t_{i+1}$ we have $0 \le \Delta - \delta$. Finally, $0 < L\Delta < |\gamma(t_i)|$ implies $\gamma(t^*) \ge \gamma(t_i) - |\gamma(t_i) - \gamma(t^*)| \ge \gamma(t_i) - L|t_i - t^*| > L\Delta - L\delta = L(\Delta - \delta) \ge 0$.

input: Domain boundaries $t_0, t_f \in \mathbb{R}$, function $\gamma : (t_0, t_f) \to \mathbb{R}$, Lipschitz constant $L > 0$.
output: Flag flag indicating the presence of a non-positive function value (flag = 1 indicates existence of a non-positive function value; flag = 0 indicates it has been ruled out that a negative function value can exist). Variable criticalTime contains the time of a non-positive function value if such exists (criticalTime = $t_0 - 1$ iff $\gamma((t_0, t_f)) \subset \mathbb{R}_+$).

flag ← −1; criticalTime ← $t_0 - 1$; TimeGrid ← $\{t_0, t_f\}$; r ← −1;
repeat
    r ← r + 1;
    Δ ← $(t_f - t_0)/2^r$;  N ← $(t_f - t_0)/\Delta$;
    TimeGrid ← $\bigcup_{i=0}^{N} \{t_0 + i\Delta\}$;
    minVal ← $\min_{t \in \mathrm{TimeGrid}} \gamma(t)$;
    if minVal ≤ 0 then
        flag ← 1; criticalTime ← $\arg\min_{t \in \mathrm{TimeGrid}} \gamma(t)$;
    else if minVal > L Δ then
        flag ← 0;
until flag = 1 OR flag = 0;

Algorithm 1: Naive algorithm deciding whether a Lipschitz continuous function $\gamma$ has a non-positive value on a compact domain. Note, if minVal > L Δ the function is guaranteed to map into the positive reals exclusively.

Figure 1: Proving the existence of a negative value of the function $x \mapsto |\sin(x)|\cos(x) + \frac{1}{4}$. Left: initial condition. Centre: first refinement. Right: the second refinement has revealed the existence of a negative value. (Each panel shows the ground truth, the floor and ceiling bounds, and the samples.)

Apart from providing a termination criterion, the lemma establishes that larger Lipschitz numbers will generally cause longer run-times of the algorithm, as finer resolutions Δ will be required to establish non-negativity of the function under investigation.

2.1.2 An improved adaptive algorithm

Next, we present an improved version of the algorithm provided above. We can define two functions, a ceiling $u_N$ and a floor $l_N$, such that (i) they bound the target, $\forall t \in I : l_N(t) \le \gamma(t) \le u_N(t)$, and (ii) the bounds get tighter for denser grids. In particular, one can show that $l_N, u_N \to \gamma$ uniformly as $N \to \infty$ if $G_N$ converges to a dense subset of $I$. Define $\xi^l_N := \arg\min_{x \in I} l_N(x)$. On each interval $J_i = (t_i, t_{i+1})$ the floor attains its minimal value $\frac{\gamma(t_{i+1}) + \gamma(t_i)}{2} - L\,\frac{t_{i+1} - t_i}{2}$ at the point $\frac{t_{i+1} + t_i}{2} - \frac{\gamma(t_{i+1}) - \gamma(t_i)}{2L}$; hence $l_N(\xi^l_N) = \min_i \big( \frac{\gamma(t_{i+1}) + \gamma(t_i)}{2} - L\,\frac{t_{i+1} - t_i}{2} \big)$ and $\xi^l_N$ is the corresponding point in a minimizing interval (see [13,23]). It is straightforward to refine this to take localised Lipschitz constants into account: $l_N(\xi^l_N) = \min_i \big( \frac{\gamma(t_{i+1}) + \gamma(t_i)}{2} - L_{J_i}\,\frac{t_{i+1} - t_i}{2} \big)$ where $L_{J_i}$ is a Lipschitz number valid on the interval $J_i = (t_i, t_{i+1})$.

This suggests the following algorithm: we refine the grid $G_N$ to the grid $G_{N+1}$ by including $\xi^l_N$ (with value $\gamma(\xi^l_N)$) as a new sample. This process is repeated until either of the following stopping conditions is met: (i) a negative function value of $\gamma$ is discovered ($\gamma(\xi^l_N) < 0$), or (ii) $l_N(\xi^l_N) \ge 0$ (in which case we are guaranteed that no negative function values can exist). For pseudo-code refer to Alg. 2. An example run is depicted in Fig. 1. Note, without our stopping criteria, our algorithm degenerates to Shubert's minimization method [23]. The stopping criteria are important to save computation, especially in the absence of negative function values.

input: Domain boundaries $t_0, t_f \in \mathbb{R}$, function $\gamma : (t_0, t_f) \to \mathbb{R}$, Lipschitz constant $L > 0$.
output: Flag flag indicating the presence of a non-positive function value (flag = 1 indicates existence of a non-positive function value; flag = 0 indicates it has been ruled out that a negative function value can exist). Variable criticalTime contains the time of a non-positive function value if such exists (criticalTime = $t_0 - 1$ iff $\gamma((t_0, t_f)) \subset \mathbb{R}_+$).

flag ← −1; criticalTime ← $t_0 - 1$; $G_N$ ← $\{t_0, t_f\}$; N ← 0;
repeat
    $i^*$ ← $\arg\min_{i} \big( \frac{\gamma(t_{i+1}) + \gamma(t_i)}{2} - L\,\frac{t_{i+1} - t_i}{2} \big)$;
    $\xi^l$ ← $\frac{t_{i^*+1} + t_{i^*}}{2} - \frac{\gamma(t_{i^*+1}) - \gamma(t_{i^*})}{2L}$;
    $l_N(\xi^l)$ ← $\frac{\gamma(t_{i^*+1}) + \gamma(t_{i^*})}{2} - L\,\frac{t_{i^*+1} - t_{i^*}}{2}$;
    minVal ← $\gamma(\xi^l)$;
    if minVal ≤ 0 then
        flag ← 1; criticalTime ← $\xi^l$;
    else if $l_N(\xi^l) > 0$ then
        flag ← 0;
    else
        N ← N + 1; $G_N$ ← $G_N \cup \{\xi^l\}$;
    end
until flag = 1 OR flag = 0;

Algorithm 2: Adaptive algorithm based on Shubert's method to prove whether a Lipschitz continuous function $\gamma$ has a non-positive value on a compact domain. Note, if $l_N(\xi^l) > 0$ the function is guaranteed to map into the positive reals exclusively.
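To make the procedure concrete, the following is a minimal Python sketch of the adaptive check of Alg. 2 (function and variable names are our own illustration, not taken from any implementation accompanying the paper). It assumes a global Lipschitz number L for gamma on [t0, tf] and returns either a witness time of a non-positive value or a certificate that none exists.

import math

def find_nonpositive(gamma, t0, tf, L, max_iter=10000):
    """Adaptive (Shubert-style) search for a non-positive value of a Lipschitz
    function gamma on [t0, tf] with Lipschitz number L.
    Returns (flag, critical_time): flag=1 with a witness time if a non-positive
    value was found; flag=0 and None if positivity has been certified."""
    grid = [t0, tf]
    vals = [gamma(t0), gamma(tf)]
    if min(vals) <= 0:
        return 1, grid[vals.index(min(vals))]
    for _ in range(max_iter):
        # Floor bound on each interval (t_i, t_{i+1}):
        # minimal value (g_i + g_{i+1})/2 - L*(t_{i+1} - t_i)/2,
        # attained at (t_i + t_{i+1})/2 - (g_{i+1} - g_i)/(2L).
        floors = [(0.5 * (vals[i] + vals[i + 1]) - 0.5 * L * (grid[i + 1] - grid[i]), i)
                  for i in range(len(grid) - 1)]
        floor_val, i = min(floors)
        if floor_val > 0:
            return 0, None          # no non-positive value can exist
        xi = 0.5 * (grid[i] + grid[i + 1]) - (vals[i + 1] - vals[i]) / (2.0 * L)
        g_xi = gamma(xi)
        if g_xi <= 0:
            return 1, xi            # witness of a non-positive value
        grid.insert(i + 1, xi)      # refine the grid with the new sample
        vals.insert(i + 1, g_xi)
    raise RuntimeError("budget exhausted without a decision")

For instance, find_nonpositive(lambda x: abs(math.sin(x)) * math.cos(x) + 0.25, 1.5, 4.5, L=1.0) locates a negative value of the example function of Fig. 1 (L = 1 is a valid Lipschitz number for that function).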
2.2 Deriving collision criterion functions

This subsection is dedicated to the derivation of a (Lipschitz) criterion function. Following the approach of [7,20], the idea is to define hyper-cuboids $H^a, H^r$ sufficiently large to contain a large enough proportion of each agent's probability mass to ensure that no collision occurs (with sufficient confidence) as long as the cuboids do not overlap. We then define the criterion function so as to attain negative values whenever the hyper-cuboids do overlap.

For ease of notation, we omit the time index $t$. For instance, in this subsection, $x^a$ now denotes the random variable $x^a(t)$ rather than the stochastic trajectory. We next derive sufficient conditions for the absence of collisions, i.e. for $\Pr[C^{a,r}] < \delta$. To this end, we make an intermediate step: for each agent $q \in \{a, r\}$ we define an open hyper-cuboid $H^q$ centred around the mean $\mu^q = \langle x^q(t) \rangle$. As a $D$-dimensional hyper-cuboid, $H^q$ is completely determined by its centre point $\mu^q$ and its edge lengths $l^q_1, \dots, l^q_D$. Let $O^q$ denote the event that $x^q \notin H^q$ and $P^q := \Pr[O^q]$.

We derive a simple disjunctive constraint on the component distances of the means under which we can guarantee that the collision probability is not greater than the probability of at least one object being outside its hyper-cuboid. This is the case if the hyper-cuboids do not overlap, that is, if their max-norm distance is at least $\Lambda^{a,r} := \frac{\Lambda^a + \Lambda^r}{2}$. Before engaging in a formal discussion we need to establish a preparatory fact:

Lemma 2.2. Let $\mu^q_j$ denote the $j$th component of object $q$'s mean and $r^q_j = \frac{1}{2} l^q_j$. Furthermore, let $F^{a,r} := \overline{C^{a,r}}$ be the event that no collision occurs and $B^{a,r} := H^a \times H^r$ the event that $x^a \in H^a$ and $x^r \in H^r$. Assume the component-wise distance between the hyper-cuboids $H^a, H^r$ is at least $\Lambda^{a,r}$, which is expressed by the following disjunctive constraint: $\exists j \in \{1, \dots, D\} : |\mu^a_j - \mu^r_j| > \Lambda^{a,r} + r^a_j + r^r_j$. Then we have $B^{a,r} \subset F^{a,r}$.

Proof. Since $\|x\|_\infty \le \|x\|_2$ for all $x$, we have $F_\infty := \{(x^a, x^r) \mid \|x^a - x^r\|_\infty > \Lambda^{a,r}\} \subset \{(x^a, x^r) \mid \|x^a - x^r\|_2 > \Lambda^{a,r}\} = F^{a,r}$. It remains to be shown that $B^{a,r} \subset F_\infty$: let $(x^a, x^r) \in B^{a,r} = H^a \times H^r$. Thus, $\forall j \in \{1, \dots, D\}, q \in \{a, r\} : |x^q_j - \mu^q_j| \le r^q_j$. For contradiction, assume $(x^a, x^r) \notin F_\infty$. Then $|x^a_i - x^r_i| \le \Lambda^{a,r}$ for all $i \in \{1, \dots, D\}$. Hence, $|\mu^a_i - \mu^r_i| = |\mu^a_i - x^a_i + x^a_i - x^r_i + x^r_i - \mu^r_i| \le |\mu^a_i - x^a_i| + |x^a_i - x^r_i| + |x^r_i - \mu^r_i| \le r^a_i + \Lambda^{a,r} + r^r_i, \forall i \in \{1, \dots, D\}$, which contradicts the disjunctive constraint in the premise of the lemma. q.e.d.

Theorem 2.3. Let $\mu^q_j$ denote the $j$th component of object $q$'s mean and $r^q_j = \frac{1}{2} l^q_j$. Assume $x^a, x^r$ are random variables with means $\mu^a = \langle x^a \rangle$ and $\mu^r = \langle x^r \rangle$, respectively.
Assume further that the max-norm distance between the hyper-cuboids $H^a, H^r$ is at least $\Lambda^{a,r} > 0$ (i.e. the hyper-cuboids do not overlap), which is expressed by the following disjunctive constraint: $\exists j \in \{1, \dots, D\} : |\mu^a_j - \mu^r_j| > \Lambda^{a,r} + r^a_j + r^r_j$. Then we have
$$\Pr[C^{a,r}] \le P^a + P^r - P^a P^r \le P^a + P^r$$
where $P^q = \Pr[x^q \notin H^q]$ ($q \in \{a, r\}$).

Proof. As in Lem. 2.2, let $F^{a,r} := \overline{C^{a,r}}$ be the event that no collision occurs and let $B^{a,r} := H^a \times H^r$. We have $\Pr[C^{a,r}] = 1 - \Pr[\overline{C^{a,r}}] = 1 - \Pr[F^{a,r}]$. By Lem. 2.2 we have $B^{a,r} \subset F^{a,r}$ and thus $1 - \Pr[F^{a,r}] \le 1 - \Pr[B^{a,r}] = \Pr[\overline{B^{a,r}}]$. Now, $\Pr[\overline{B^{a,r}}] = \Pr[x^a \notin H^a \vee x^r \notin H^r] = P^a + P^r - P^a P^r \le P^a + P^r$ (using the independence of the two agents' trajectories for the middle equality; the final inequality holds in any case). q.e.d.

One way to define a criterion function is as follows:
$$\gamma^{a,r}(t; \varrho(t)) := \max_{i=1,\dots,D} \big\{ |\mu^a_i(t) - \mu^r_i(t)| - \Lambda^{a,r} - r^a_i(t) - r^r_i(t) \big\} \qquad (2)$$
where $\varrho = (r^a_1, \dots, r^a_D, r^r_1, \dots, r^r_D)$ is the parameter vector of radii. (For notational convenience, we will often omit explicit mention of the parameter $\varrho$ in the function argument.) For more than two agents, agent $a$'s overall criterion function is $\Gamma^a(t) := \min_{r \in \mathcal{A} \setminus \{a\}} \gamma^{a,r}(t)$. Thm. 2.3 tells us that the collision probability is bounded from above by the desired threshold $\delta$ if $\gamma^{a,r}(t) > 0$, provided we choose the radii $r^a_j, r^r_j$ ($j = 1, \dots, D$) such that $P^a, P^r \le \frac{\delta}{2}$.

Let $q \in \{a, r\}$. Probability theory provides several distribution-independent bounds relating the radii of a (possibly partly unbounded) hyper-cuboid to the probability of not falling into it. That is, these are bounds of the form $P^q \le \beta(r^q_1, \dots, r^q_D; \Theta)$ where $\beta$ is a continuous function that decreases monotonically with increasing radii and $\Theta$ represents additional information. In the case of Chebyshev-type bounds, information about the first two moments is folded in, i.e. $\Theta = (\mu^q, C^q)$ where $C^q(t) \in \mathbb{R}^{D \times D}$ is the covariance matrix. We then solve for radii that fulfil the inequality $\frac{\delta}{2} \ge \beta(r^q_1, \dots, r^q_D; \Theta)$ while simultaneously ensuring collision avoidance with the desired probability.

Inspecting Eq. 2, it becomes clear that, in order to maximally diminish the conservatism of the criterion function, it would be ideal to choose the radii in $\varrho$ such that $\varrho = \arg\max_\varrho \gamma^{a,r}(t; \varrho) = \arg\max_{r^a_1,\dots,r^a_D,\, r^r_1,\dots,r^r_D} \max_{i=1,\dots,D} \{ |\mu^a_i - \mu^r_i| - \Lambda^{a,r} - r^a_i - r^r_i \}$ subject to the constraints $\frac{\delta}{2} \ge \beta(r^q_1, \dots, r^q_D; \Theta)$ ($q \in \{a, r\}$). Solving this constrained optimisation problem can often be done in closed form. In the context where $\beta$ is derived from a Chebyshev-type bound, we propose to set all radii except $r^a_i, r^r_i$ (for some dimension $i$) as large as possible (in order to decrease $\beta$ and thereby satisfy the constraints) while setting $r^a_i, r^r_i$ as small as possible without violating the constraints. That is, we define the radii as follows: set $r^q_j := \infty, \forall j \neq i$. The remaining unknown variable, $r^q_i$, is then defined as the solution to the equation $\frac{\delta}{2} = \beta(r^q_1, \dots, r^q_D; \Theta)$. The resulting criterion function, denoted by $\gamma^{a,r}_i$, of course depends on the arbitrary choice of dimension $i$. Therefore, we obtain a less conservative criterion function by repeating this process for each dimension $i$ and then constructing a new criterion function as the point-wise maximum: $\gamma^{a,r}(t) := \max_i \gamma^{a,r}_i(t)$. A concrete example of this procedure is provided below.
2.2.1 Example constructions of distribution-independent criterion functions

We can use the above derivation as a template for generating criterion functions. Consider the following concrete example. Combining the union bound and the standard (one-dimensional) Chebyshev bound yields $P^q = \Pr[x^q \notin H^q] \le \sum_{j=1}^{D} \frac{C^q_{jj}}{(r^q_j)^2} =: \beta(r^q_1, \dots, r^q_D; C^q)$. Setting every radius except $r^q_i$ to infinitely large values and $\beta$ equal to $\frac{\delta}{2}$ yields $\frac{\delta}{2} = \frac{C^q_{ii}}{(r^q_i)^2}$, i.e. $r^q_i = \sqrt{\frac{2 C^q_{ii}}{\delta}}$. (Note, this is a correction of the radius provided in the conference version of this paper.) Finally, inserting these radii (for $q = a, r$) into Eq. 2 yields our first collision criterion function: $\gamma^{a,r}_i(t) := |\mu^a_i(t) - \mu^r_i(t)| - \Lambda^{a,r} - \sqrt{\frac{2 C^a_{ii}(t)}{\delta}} - \sqrt{\frac{2 C^r_{ii}(t)}{\delta}}$. Of course, this argument can be made for any choice of dimension $i$. Hence, a less conservative, yet valid, choice is
$$\gamma^{a,r}(t) := \max_{i=1,\dots,D} \Big\{ |\mu^a_i(t) - \mu^r_i(t)| - \Lambda^{a,r} - \sqrt{\tfrac{2 C^a_{ii}(t)}{\delta}} - \sqrt{\tfrac{2 C^r_{ii}(t)}{\delta}} \Big\}. \qquad (3)$$

Notice, this function has the desirable property of being Lipschitz continuous, provided the mean functions $\mu^q_i : I \to \mathbb{R}$ and standard deviation functions $\sigma^q_{ii} = \sqrt{C^q_{ii}} : I \to \mathbb{R}_+$ are. In particular, it is easy to show that $L(\gamma^{a,r}) \le \max_{i=1,\dots,D} \big( L(\mu^a_i) + L(\mu^r_i) + \sqrt{\tfrac{2}{\delta}}\, ( L(\sigma^a_{ii}) + L(\sigma^r_{ii}) ) \big)$ where, as before, $L(f)$ denotes a Lipschitz constant of function $f$.

For the special case of two dimensions, we can derive a less conservative alternative criterion function based on a tighter two-dimensional Chebyshev-type bound [28]:

Theorem 2.4 (Alternative collision criterion function). Let the spatial dimensionality be $D = 2$. Choosing
$$r^q_i(t) := \sqrt{\tfrac{1}{2\delta^a}}\;\sqrt{\, C^q_{ii}(t) + \frac{\sqrt{\, C^q_{ii}(t)\, C^q_{jj}(t)\, \big( C^q_{ii}(t)\, C^q_{jj}(t) - (C^q_{ij}(t))^2 \big)}}{C^q_{jj}(t)} }$$
($q \in \{a, r\}$, $i \in \{1, 2\}$, $j \in \{1, 2\} \setminus \{i\}$) in Eq. 2 yields a valid distribution-independent criterion function. That is, $\gamma^{a,r}(t) > 0 \Rightarrow \Pr[C^{a,r}(t)] < \delta^a$.

A proof sketch and a Lipschitz constant (for non-zero uncertainty) are provided in the appendix. Note, the Lipschitz constant we have derived therein becomes infinite in the limit of vanishing variance. In that case, the presence of negative criterion values can be tested based on the sign of the minimum of the criterion function, which can be found employing a global optimiser. Future work will investigate to what extent Hölder continuity, instead of Lipschitz continuity, can be leveraged to yield an algorithm similar to the one provided in Sec. 2.1.2.

Figure 2: Criterion function values (as per Eq. 3) as a function of $\|\langle x^r \rangle - \langle x^a \rangle\|_\infty$, with $\delta = 0.05$, $\Lambda^{a,r} = 1$ (each panel marks where a collision is ruled out vs. suspected). Left: variances $C^a = C^r = \mathrm{diag}(.00001, .00001)$. Centre: variances $C^a = C^r = \mathrm{diag}(.1, .1)$. Right: variances $C^a = C^r = \mathrm{diag}(.1, .1)$ with the improved criterion function (as per Thm. B.3).
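To make the construction concrete, here is a minimal Python sketch (our own illustration; mu_a, mu_r, C_a, C_r are hypothetical callables returning the trajectory moments) of the criterion function of Eq. 3 and of the pairwise detection decision. It reuses find_nonpositive from the sketch in Sec. 2.1.2; the multi-agent criterion of the next subsection is simply the point-wise minimum of such pairwise functions.

import math

def criterion_eq3(mu_a, mu_r, C_a, C_r, Lambda_ar, delta):
    """Distribution-independent criterion of Eq. 3.
    mu_a(t), mu_r(t): mean functions returning length-D sequences;
    C_a(t), C_r(t): covariance functions returning D x D matrices (nested lists).
    Returns gamma(t); gamma(t) > 0 rules out a collision at time t with
    probability at least 1 - delta."""
    def gamma(t):
        ma, mr, Ca, Cr = mu_a(t), mu_r(t), C_a(t), C_r(t)
        return max(abs(ma[i] - mr[i]) - Lambda_ar
                   - math.sqrt(2.0 * Ca[i][i] / delta)
                   - math.sqrt(2.0 * Cr[i][i] / delta)
                   for i in range(len(ma)))
    return gamma

def collision_free(gamma, t0, tf, L_gamma):
    """A collision is suspected unless min_t gamma(t) > 0 on [t0, tf];
    the decision is made with the Lipschitz search of Alg. 2."""
    flag, _ = find_nonpositive(gamma, t0, tf, L_gamma)
    return flag == 0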
2.2.2 Multi-agent case

Let $a \in \mathcal{A}$ and let $\mathcal{A}' \subset \mathcal{A}$ with $a \notin \mathcal{A}'$ be a subset of agents. We define the event that $a$ collides with at least one of the agents in $\mathcal{A}'$ at time $t$ as $C^{a,\mathcal{A}'}(t) := \{ (x^a(t), x^r(t)) \mid \exists r \in \mathcal{A}' : \|x^a(t) - x^r(t)\|_2 \le \Lambda^{a,r} \} = \bigcup_{r \in \mathcal{A}'} C^{a,r}(t)$. By the union bound, $\Pr[C^{a,\mathcal{A}'}(t)] \le \sum_{r \in \mathcal{A}'} \Pr[C^{a,r}(t)]$.

Theorem 2.5 (Multi-Agent Criterion). Let the $\gamma^{a,r}$ be valid criterion functions defined w.r.t. collision bound $\delta^a$. We define the multi-agent collision criterion function $\Gamma^{a,\mathcal{A}'}(t) := \min_{r \in \mathcal{A}'} \gamma^{a,r}(t)$. If $\Gamma^{a,\mathcal{A}'}(t) > 0$ then the collision probability with $\mathcal{A}'$ is bounded below $\delta^a |\mathcal{A}'|$. That is, $\Pr[C^{a,\mathcal{A}'}(t)] < \delta^a |\mathcal{A}'|$.

Proof. We have established that if $\forall r \in \mathcal{A}' : \gamma^{a,r}(t) > 0$ then $\Pr[C^{a,r}(t)] < \delta^a, \forall r \in \mathcal{A}'$. Now, assume $\Gamma^{a,\mathcal{A}'}(t) > 0$. Hence $\forall r \in \mathcal{A}' : \gamma^{a,r}(t) > 0$ and thus $\forall r \in \mathcal{A}' : \Pr[C^{a,r}(t)] < \delta^a$. Therefore $\sum_{r \in \mathcal{A}'} \Pr[C^{a,r}(t)] < |\mathcal{A}'|\, \delta^a$. By the union bound, $\Pr[C^{a,\mathcal{A}'}(t)] \le \sum_{r \in \mathcal{A}'} \Pr[C^{a,r}(t)]$. Consequently, $\Pr[C^{a,\mathcal{A}'}(t)] < |\mathcal{A}'|\, \delta^a$. q.e.d.

Moreover, $\Gamma^{a,\mathcal{A}'}$ is Lipschitz if the constituent functions $\gamma^{a,r}$ are (see Appendix B).

Our distribution-independent collision criterion functions have the virtue that they work for all distributions, not only the omnipresent Gaussian. Unfortunately, distribution-independence is gained at the price of conservativeness (cf. Fig. 2). In our experiments in Sec. 4, the collision criterion function as per Thm. B.3 is utilized as an integral component of our collision avoidance mechanisms. The results suggest that the conservativeness of our detection module does not entail false-alarm rates so high as to render the distribution-independent approach impractical. That said, whenever distributional knowledge is available, one can use our derivations as a template to generate refined criterion functions via Eq. 2 with adjusted radii $r_i, r_j$ reflecting the distribution at hand.

3 Collision Avoidance

In this section we outline the core ideas of our proposed approach to multi-agent collision avoidance. After specifying the agents' dynamics and formalizing the notion of a single-agent plan, we define the multi-agent planning task. Then we describe how conflicts picked up by our collision prediction method can be resolved. In Sec. 3.1 we describe the two coordination approaches we consider for generating conflict-free plans.

I) Model (example). We assume the system contains a set $\mathcal{A}$ of agents indexed by $a \in \{1, \dots, |\mathcal{A}|\}$. Each agent $a$'s associated plant has a probabilistic state trajectory following stochastic controlled $D$-dimensional state dynamics (we consider the case $D = 2$) in the continuous interval of (future) time $I = (t_0, t_f]$. We desire to ask agents to adjust their policies to avoid collisions. Each policy gives rise to a stochastic belief over the trajectory resulting from executing the policy. For our method to work, all we require is that the trajectory's mean function $m : I \to \mathbb{R}^D$ and covariance matrix function $\Sigma : I \to \mathbb{R}^{D \times D}$ are evaluable for all times $t \in I$. A prominent class for which closed-form moments can easily be derived are linear stochastic differential equations (SDEs). For instance, we consider the SDE
$$dx^a(t) = K \big( \xi^a(t) - x^a(t) \big)\, dt + B\, dW \qquad (4)$$
where $K, B \in \mathbb{R}^{D \times D}$ are matrices, $x^a : I \to \mathbb{R}^D$ is the state trajectory and $W$ is a vector-valued Wiener process. Here, $u(x^a; \xi^a) := K(\xi^a - x^a)$ can be interpreted as the control policy of a linear feedback controller parametrised by $\xi^a$. It regulates the state to track a desired trajectory $\xi^a(t) = \zeta^a_0 \chi_{\{0\}}(t) + \sum_{i=1}^{H^a} \zeta^a_i \chi_{\tau^a_i}(t)$, where $\chi_{\tau_i} : \mathbb{R} \to \{0, 1\}$ denotes the indicator function of the half-open interval $\tau^a_i = (t^a_{i-1}, t^a_i] \subset [0, T^a]$ and each $\zeta^a_i \in \mathbb{R}^D$ is a setpoint. If $K$ is positive definite, the agent's state trajectory is determined by the setpoint sequence $p^a = (t^a_i, \zeta^a_i)_{i=0}^{H^a}$ (aside from the random disturbances), which we will refer to as the agent's plan. For example, plan $p^a := \big( (t_0, x^a_0), (t_f, x^a_f) \big)$ could be used to regulate agent $a$'s start state $x^a_0$ to a given goal state $x^a_f$ between times $t_0$ and $t_f$. For simplicity, we assume the agents are always initialized with plans of this form before coordination commences.
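For intuition, the following is a minimal Python sketch (our own illustration) simulating one realisation of the controlled SDE of Eq. 4 under a setpoint plan, using the Euler-Maruyama scheme also employed in our simulations of Sec. 4. Here the diagonal matrices K and B are represented by their diagonal entries, and the numerical values (gains, noise level, time step) are illustrative choices, not taken from the experiments.

import random

def simulate_plan(plan, x0, K, B, t0, tf, dt=0.001):
    """Euler-Maruyama simulation of dx = K(xi(t) - x) dt + B dW (Eq. 4).
    plan: list of (time, setpoint) pairs sorted by time; the setpoint zeta_i
    is active on the interval (t_{i-1}, t_i] as in the text.
    K, B: diagonal entries of the gain and noise matrices."""
    def setpoint(t):
        for ti, zi in plan:
            if t <= ti:
                return zi
        return plan[-1][1]
    x, t = list(x0), t0
    traj = [(t0, list(x0))]
    while t < tf:
        xi = setpoint(t)
        dW = [random.gauss(0.0, dt ** 0.5) for _ in x]
        x = [x[d] + K[d] * (xi[d] - x[d]) * dt + B[d] * dW[d]
             for d in range(len(x))]
        t += dt
        traj.append((t, list(x)))
    return traj

# Illustrative usage: an agent regulated from (5, 0) via (5, 7) at 1 s to (0, 7),
# mirroring agent 2's plan in EXP1.
path = simulate_plan(plan=[(1.0, (5.0, 7.0)), (2.0, (0.0, 7.0))],
                     x0=(5.0, 0.0), K=(4.0, 4.0), B=(0.3, 0.3), t0=0.0, tf=2.0)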
One may interpret a setpoint as a way to alter the stochastic trajectory. Below, we will determine setpoints that modify a stochastic trajectory to reduce collision probability while maintaining low expected cost. From the vantage point of policy search, $\xi^a$ is agent $a$'s policy parameter that has to be adjusted to avoid collisions.

II) Task. Each agent $a$ desires to find a sequence of setpoints (a plan $p^a$) such that (i) it moves from its start state $x^a_0$ to its goal state $x^a_f$ along a low-cost trajectory, and (ii) along the trajectory its plant (with diameter $\Lambda^a$) does not collide with any other agent's plant in state space with at least a given probability $1 - \delta \in (0, 1)$.

III) Collision resolution. An agent seeks to avoid collisions by adding new setpoints to its plan until the collision probability of the resulting state trajectory drops below the threshold $\delta$. For choosing these new setpoints we consider two methods, WAIT and FREE. In the first method the agent inserts a time-setpoint pair $(t, x^a_0)$ into the previous plan $p^a$. Since this aims to cause the agent to wait at its start location $x^a_0$ we call the method WAIT. It is possible that multiple such insertions are necessary until collisions are avoided. Of course, if a higher-priority agent decides to traverse through $x^a_0$, this method is too rigid to resolve the conflict. In the second method the agent optimizes over the time and location of the new setpoint. Let $p^a_{\uparrow(t,s)}$ be the plan updated by insertion of the time-setpoint pair $(t, s) \in I \times \mathbb{R}^D$. We propose to choose the candidate setpoint $(t, s)$ that minimizes a weighted sum of the expected cost entailed by executing the updated plan $p^a_{\uparrow(t,s)}$ and a hinge-loss collision penalty $c^a_{\mathrm{coll}}(p^a_{\uparrow(t,s)}) := \lambda \max\{0, -\min_t \Gamma^a(t)\}$. Here, $\Gamma^a$ is computed under the assumption that $p^a_{\uparrow(t,s)}$ is executed, and $\lambda \gg 0$ determines the extent to which collisions are penalized. Since the new setpoint can be chosen freely in time and state space we refer to the method as FREE. A sketch of the resulting setpoint-selection step is given below.
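The following minimal Python sketch (our own illustration) spells out the FREE update. The callable score_plan stands in for the weighted plan cost of Sec. 4 (including the hinge-loss collision penalty above), and simple random search stands in for the local optimizer with random restarts used in the experiments; both are simplifying assumptions, not the paper's implementation.

import random

def free_insert(plan, score_plan, t0, tf, bounds, n_candidates=10):
    """FREE collision-resolution step (simplified): sample candidate
    time-setpoint pairs, insert each into the plan and keep the cheapest.
    score_plan(plan) should return w1*c_traj + w2*c_miss + w3*c_coll,
    with c_coll the hinge loss lambda * max(0, -min_t Gamma(t)).
    bounds: ((x_min, x_max), (y_min, y_max)) box for candidate setpoints."""
    best_plan, best_score = plan, score_plan(plan)
    for _ in range(n_candidates):
        t = random.uniform(t0, tf)
        s = tuple(random.uniform(lo, hi) for lo, hi in bounds)
        candidate = sorted(plan + [(t, s)])   # insert (t, s), keep time order
        c = score_plan(candidate)
        if c < best_score:
            best_plan, best_score = candidate, c
    return best_plan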
3.1 Coordination

We will now consider how to integrate our collision detection and avoidance methods into a coordination framework that determines who needs to avoid whom, and at what stage of the coordination process. Such decisions are known to significantly impact the social cost (i.e. the sum of all agents' individual costs) of the agent collective.

Fixed priorities (FP). As a baseline method for coordination we consider a basic fixed-priority method (e.g. [3,11]). Here, each agent has a unique ranking (or priority) according to its index $a$ (i.e. agent 1 has the highest priority, agent $|\mathcal{A}|$ the lowest). When all higher-ranking agents are done planning, agent $a$ is informed of their planned trajectories, which it has to avoid with a probability greater than $1 - \delta$. This can be done by repeatedly invoking the collision detection and resolution methods described above until no further collisions with higher-ranking agents are found.

Lazy Auction Protocol (AUC). While the FP method is simple and fast, the rigidity of the fixed ranking can lead to sub-optimal social cost and coordination success. Furthermore, its sequential nature does not take advantage of possible parallelization as a distributed method could. To alleviate this, we propose to revert the ranking flexibly on a case-by-case basis. In particular, the agents are allowed to compete for the right to gain passage (e.g. across a region where a collision was detected) by submitting bids in the course of an auction. The structure of the approach is outlined in Alg. 3.

input: Agents $a \in \mathcal{A}$, cost functions $c^a$, dynamics, initial start and goal states, initial plans $p^1, \dots, p^{|\mathcal{A}|}$.
output: Collision-free plans $p^1, \dots, p^{|\mathcal{A}|}$.

repeat
    for $a \in \mathcal{A}$ do
        [flag$^a$, $\mathcal{C}^a$, $t_{\mathrm{coll}}$] ← CollDetect$^a$($a$, $\mathcal{A} \setminus \{a\}$);
        if flag$^a$ = 1 then
            winner ← Auction($\mathcal{C}^a \cup \{a\}$, $t_{\mathrm{coll}}$);
            foreach $r \in (\mathcal{C}^a \cup \{a\}) \setminus \{\mathrm{winner}\}$ do
                $p^r$ ← Avoid$^r$($(\mathcal{C}^a \cup \{a\}) \setminus \{r\}$, $t_{\mathrm{coll}}$);
                Broadcast$^r$($p^r$);
            end
        end
    end
until $\forall a \in \mathcal{A}$ : flag$^a$ = 0;

Algorithm 3: Lazy auction coordination method (AUC), written in a sequentialized form. Collisions are resolved by choosing new setpoints to enforce collision avoidance. $\mathcal{C}^a$: set of agents detected to be in conflict with agent $a$. flag$^a$: collision detection flag (= 0 iff no collision detected). $t_{\mathrm{coll}}$: earliest time at which a collision was detected. Avoid: collision resolution method updating the plan by a single new setpoint according to WAIT or FREE.

Assume an agent $a$ detects a collision at a particular time $t_{\mathrm{coll}}$ and invites the set of agents $\mathcal{C}^a = \{r \mid \gamma^{a,r}(t_{\mathrm{coll}}) \le 0\}$ to join an auction to decide who needs to avoid whom. In particular, the auction determines a winner who is not required to alter his plan. The losing agents need to insert a new setpoint into their respective plans, designed to avoid all other agents in $\mathcal{C}^a$ while keeping the plan cost low. The idea is to design the auction rules as a heuristic method for minimizing the social cost of the ensuing solution. To this end, we define the bids such that their magnitude is proportional to a heuristic estimate of the expected regret for losing and not gaining passage. That is, agent $a$ submits a bid $b^a = l^a - s^a$. The magnitude $l^a$ is defined as $a$'s anticipated cost $c^a_{\mathrm{plan}}(p^a_{\uparrow(t,s)})$ for the event that the agent does not secure the "right of passage" and has to create a new setpoint $(t, s)$ (according to (III)) tailored to avoid all other agents engaged in the current auction. On the other hand, $s^a := c^a_{\mathrm{plan}}(p^a)$ is the cost of the unchanged plan $p^a$. If there is a tie among multiple agents, the agent with the lowest index among the highest bidders wins. Acknowledging that $s^{\mathrm{winner}} + \sum_{a \neq \mathrm{winner}} l^a$ is an estimated social cost (based on current beliefs over trajectories) after the auction, we see that the winner determination rule greedily attempts to minimize social cost: $b^{\mathrm{winner}} \ge b^r\ \forall r \iff \forall r : s^r + \sum_{a \neq r} l^a \ge s^{\mathrm{winner}} + \sum_{a \neq \mathrm{winner}} l^a$.
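To fix ideas, here is a minimal Python sketch of one auction round (our own illustration; plan_cost and avoid_plan are hypothetical stand-ins for the cost function c_plan and the setpoint-insertion step of (III)).

def run_auction(conflict_agents, plans, plan_cost, avoid_plan):
    """Single auction round among the agents involved in a detected conflict.
    Each agent bids b = l - s, where s is the cost of its current plan and
    l the cost of the plan it would adopt if forced to avoid the others.
    The highest bidder keeps its plan (ties broken by lowest index);
    all losers adopt their avoidance plans."""
    bids, avoidance = {}, {}
    for a in conflict_agents:
        s = plan_cost(a, plans[a])
        avoidance[a] = avoid_plan(a, plans, conflict_agents - {a})
        bids[a] = plan_cost(a, avoidance[a]) - s
    winner = min(conflict_agents, key=lambda a: (-bids[a], a))
    for a in conflict_agents - {winner}:
        plans[a] = avoidance[a]
    return winner, plans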
Figure 3: EXP1. Draws from uncoordinated agents' plans (left), after coordination and collision resolution with methods FP-WAIT (centre) and AUC-WAIT (right). (Each panel plots simulated trajectories against dim. 1, dim. 2 and time.)

Quantity | Experiment 1              | Experiment 2
         | NONE   AUC-WAIT  FP-WAIT  | NONE   AUC-FREE  FP-FREE
A        | 78     0         0        | 51     0         0
B        | 13.15  13.57     12.57    | 14.94  16.22     18.13
C        | 0.05   0.04      25.8     | 0.05   0.05      0.05
D        | 0      6         3        | 0      4         4

Table 1: Quantities estimated based on 100 draws from the SDEs simulating executions of the different plans in EXP1 and EXP2. A: estimated collision probability [%]; B: average path length away from goal; C: average squared distance of the final state to the goal; D: number of collision resolution rounds. Notice, our collision avoidance methods succeed in preventing collisions. In EXP1 the FP-WAIT method failed to reach its first goal in time, which is reflected in the squared distance-to-goal measure. Note the discrepancies in average path length are relatively low due to convexity effects and the contribution of state noise to the path lengths.

Figure 4: EXP2. Draws from uncoordinated agents' plans (left), after coordination and collision resolution with methods FP-FREE (centre) and AUC-FREE (right).

4 Simulations

As a first test, we simulated three simple multi-agent scenarios, EXP1, EXP2 and EXP3. Each agent's dynamics were an instantiation of an SDE of the form of Eq. 4. We set $\delta$ to achieve collision avoidance with certainty greater than 95%. Collision prediction was based on the improved criterion function as per Thm. B.3. During collision resolution with the FREE method each agent $a$ assessed a candidate plan $p^a$ according to the cost function $c^a_{\mathrm{plan}}(p^a) = w_1\, c^a_{\mathrm{traj}}(p^a) + w_2\, c^a_{\mathrm{miss}}(p^a) + w_3\, c^a_{\mathrm{coll}}(p^a)$. Here $c^a_{\mathrm{traj}}$ is a heuristic penalizing expected control energy or path length; the second summand, $c^a_{\mathrm{miss}}(p^a) = \| x^a(t_f) - x^a_f \|^2$, penalizes expected deviation from the goal state; the third term $c^a_{\mathrm{coll}}(p^a)$ penalizes collisions (cf. III). The weights are design parameters which we set to $w_1 = 10$, $w_2 = 10^3$ and $w_3 = 10^6$, emphasizing avoidance of mission failure and collisions. Note, if our method were deployed in a receding-horizon fashion, the parameters could also be adapted online using standard learning techniques such as no-regret algorithms [18,24].

Figure 5: Example of EXP3 with 5 agents. Draws from uncoordinated agents' plans (left) and after coordination and collision resolution with method AUC-FREE (right).

Figure 6: Recorded results for EXP3 with 1 to 6 agents: percentage of draws with collisions (uncoordinated, AUC, FP) and number of coordination rounds (AUC, FP), as a function of the number of agents. Note, all collisions were successfully avoided.

EXP1. Collision resolution was done with the WAIT method to update plans. Draws from the SDEs with the initial plans of the agents are depicted in Fig. 3 (left). The curves represent 20 noisy trajectories of agents 1 (red) and 2 (blue). Each curve is a draw from the stochastic differential dynamics obtained by simulating the execution of the given initial plan. The trajectories were simulated with the Euler-Maruyama method for a time interval of $I = [0\,\mathrm{s}, 2\,\mathrm{s}]$. The spread of the families of curves is due to the random disturbances each agent's controller had to compensate for during runtime. Agent 1 desired to control the state from start state $x^1_0 = (5, 10)$ to goal $x^1_f = (5, 5)$. Agent 2 desired to move from start state $x^2_0 = (5, 0)$ via intermediate goal $x^2_{f_1} = (5, 7)$ (at 1 s) to final goal state $x^2_{f_2} = (0, 7)$. While the agents meet their goals under the initial plans, their execution would imply a high probability of colliding around state (5, 6) (cf. Fig. 3 (left), Tab. 1). Coordination with fixed priorities (1 (red) > 2 (blue)) yields conflict-free plans (Fig. 3 (centre)). However, agent 2 is forced to wait so long at its start location that it cannot reach intermediate waypoint $x^2_{f_1}$ in time and therefore decides to move directly to its second goal. This could incur high social cost due to missing one of the designated goals (Tab. 1).
By contrast, the auction method is flexible enough to reverse the ranking at the detected collision point, causing agent 1 to wait instead of agent 2 (Fig. 3 (right)). Thereby, agent 2 is able to reach both of its goal states in time. This success is reflected by low social cost (see Tab. 1).

Figure 7: EXP1. Criterion functions for collision detection of agent 2 before and after coordination. The first graph accurately warns of a collision before coordination (as indicated by negative values), whereas the one corresponding to collision-free trajectories lies in the positive half-space, as desired.

EXP2. The setup was analogous to EXP1 but with three agents and different start and goal states, as depicted in Fig. 4. Furthermore, collisions were avoided with the FREE method with 10 random initializations of the local optimizer. Coordination of plans with fixed priorities (1 (red) > 2 (blue) > 3 (green)) caused agent 2 to avoid agent 1 by moving to the left. Consequently, agent 3 now had to temporarily leave its start and goal state to get out of the way (see Fig. 4 (centre)). With two agents moving to avoid collisions, social cost was relatively high (see Tab. 1). During coordination with the auction-based method, agent 2 first chose to avoid agent 1 (as in the FP method). However, losing the auction to agent 3 at a later stage of coordination, agent 2 finally decided to circumvent agent 1 by arcing to the right instead of to the left. This allowed agent 3 to stay in place (see Tab. 1).

EXP3. Next, we conducted a sequence of experiments for varying numbers of agents ranging from $|\mathcal{A}| = 1, \dots, 7$. In each experiment all agents' start locations were placed on a circle. Their respective goals were placed on the opposite ends of the circle. The eigenvalues of the feedback gain matrices of each agent were drawn at random from a uniform distribution on the range [2, 7]. An example situation for an experiment with 5 agents is depicted in Fig. 5. Collision avoidance was achieved. Note that despite this setting being close to worst case (i.e. almost all agents try to traverse a common, narrow corridor), the coordination overhead is moderate (see Fig. 6, right). Also, all collisions were successfully avoided (see Fig. 6, left).

5 Conclusions

This work considered multi-agent planning under stochastic uncertainty and non-convex chance constraints for collision avoidance. In contrast to pre-existing work, we did not need to rely on prior space or time discretisation. This was achieved by deriving criterion functions with the property that the collision probability is guaranteed to be below a freely definable threshold $\delta \in (0, 1)$ if the criterion function attains no negative values. Thereby, stochastic collision detection is reduced to deciding whether such negative values exist. For Lipschitz criterion functions, we provided an algorithm for making this decision rapidly. We described a general procedure for deriving criterion functions and presented two such functions based on Chebyshev-type bounds. The advantage of using Chebyshev inequalities is their independence of the underlying distribution. Therefore, our approach is applicable to any stochastic state noise model for which the first two moments can be computed at arbitrary times.
In particular, this would apply to models with state-dependent uncertainty and non-convex chance constraints which, to the best of our knowledge, have not been successfully approached in the multi-agent control literature. Nonetheless, future work could build on our results and derive less conservative criterion functions by using more problem-specific probabilistic inequalities. For instance, in simple cases such as additive Gaussian noise, tighter bounds can be given [4] and used in Eq. 2.

To enforce collision avoidance, our method modified the agents' plans until no collisions could be detected. To coordinate the detection and avoidance efforts of the agents, we employed an auction-based as well as a fixed-priority method. Our experiments are a first indication that our approach can succeed in finding collision-free plans with high certainty, with the number of required coordination rounds scaling mildly in the number of agents. While in its present form the coordination mechanism does not come with a termination guarantee, in none of our simulations have we encountered an infinite loop. For graph routing, [7] provides a termination guarantee for the lazy auction approach under mild assumptions. Current work considers whether their analysis can be extended to our continuous setting. Moreover, if required, our approach can be combined with a simple stopping criterion that terminates the coordination attempt when a computational budget is expended or an infinite loop is detected.

The computation time within each coordination round depends heavily on the time required for finding a new setpoint and for collision detection. This involves minimizing $(t, s) \mapsto c^a_{\mathrm{plan}}(p^a_{\uparrow(t,s)})$ and $c^a_{\mathrm{coll}}$, respectively. The worst-case complexity depends on the choice of cost functions, their domains and the chosen optimizer. Fortunately, we can draw on a plethora of highly advanced global optimisation methods (e.g. [13,23]) guaranteeing rapid optimization success. In terms of execution time, we can expect considerable alleviations from implementation in a compiled language. Furthermore, the collision detection and avoidance methods are based on global optimization and thus would be highly amenable to parallel processing – this could especially benefit the auction approach.

While our exposition focussed on the task of defining setpoints of feedback-controlled agents, the developed methods can readily be applied to other policy-search settings where the first two moments of the probabilistic beliefs over the trajectories (that would result from applying the found policies) can be computed.

References

[1] N. Ayanian and V. Kumar. Decentralized feedback controllers for multiagent teams in environments with obstacles. IEEE Trans. on Robotics, 26(5), 2010.

[2] D. Bareiss and J. van den Berg. Reciprocal collision avoidance for quadrotor helicopters using LQR-obstacles. In AAAI-12 Workshop on Multiagent Pathfinding, 2012.

[3] M. Bennewitz, W. Burgard, and S. Thrun. Exploiting constraints during prioritized path planning for teams of mobile robots. In IROS, 2001.

[4] L. Blackmore, H. Li, and B. Williams. A probabilistic approach to optimal robust path planning with obstacles. In American Control Conference, 2006.

[5] J. Calliess, M. Osborne, and S. Roberts. Towards auction-based multi-agent collision-avoidance under continuous stochastic dynamics. Presented at ICML 2012, Workshop on Markets, Mechanisms, and Multi-Agent Models - Examining the Interaction of Machine Learning and Economics, 2012.
[6] J. Calliess, M. Osborne, and S. Roberts. Conservative collision prediction and avoidance for stochastic trajectories in continuous time and space. In Proc. of the 13th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), 2014. 20 [7] J. Calliess, D. Lyons, and U. Hanebeck. Lazy auctions for multi-robot collision avoidance and motion control under uncertainty. Technical Report PARG-01-11, Dept. of Engineering Science, University of Oxford, 2011. [8] D.E. Chang, S.C. Shadden, J.E. Marsden, and R. Olfati-Saber. Collision avoid- ance for multiple agent systems. In Decision and Control, 2003. Proceedings. 42nd IEEE Conference on, volume 1, pages 539–543. IEEE, 2003. [9] D.V. Dimarogonas, S.G. Loizou, K.J. Kyriakopoulos, and M.M. Zavlanos. A feedback stabilization and collision avoidance scheme for multiple independent non-point agents. Automatica, 42(2):229–243, 2006. [10] M. G. Earl and R. D’Andrea. Iterative MILP Methods for Vehicle-Control Prob- lems. IEEE Trans. on Robotics, 2005. [11] M. A. Erdmann and T. Lozano-Perez. On multiple moving objects. Algorithmica, 1987. [12] C. Gardiner. Stochastic Methods-A Handbook for the Natural and Social Sciences. Springer, 4th edition, 2009. [13] D.R. Jones, C.D. Peritunen, and B.E. Stuckman. Lipschitzian optimization with- out the lipschitz constant. JOTA, 79(1), 1991. [14] Sven Koenig, Craig Tovey, X. Zheng, and I. Sungur. Sequential bundle-bid single- sale auction algorithms for decentralized control. In IJCAI, 2007. [15] D. Kostic, S. Adinandra, J. Caarls, and H. Nijmeijer. Collision-free motion coordi- nation of unicycle multi-agent systems. In American Control Conference (ACC), 2010, pages 3186–3191. IEEE, 2010. [16] M. Lagoudakis, V. Markakis, D. Kempe, P. Keskinocak, S. Koenig, A. Kleywegt, C. Tovey, A. Meyerson, and S. Jain. Auction-based multi-robot routing. In Int. Conf. on Robotics: Science and Systems, 2005. [17] Y. Li and K. Gupta. Motion planning of multiple agents in virtual environments on parallel architectures. In ICRA, 2007. [18] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. In IEEE Symposium on Foundations of Computer Science, pages 256–261, 1989. [19] D. Lyons, J. Calliess, and U. Hanebeck. Chance constrained model predictive control for multi-agent systems with coupling constraints. In American Control Conference (ACC), 2012. [20] Daniel Lyons, J.P. Calliess, and U.D. Hanebeck. Chance-constrained Model Pre- dictive Control for Multi-Agent Systems. Arxiv preprint arXiv:1104.5384, 2011. [21] S. Mastellone, D.M. Stipanovi´c, C.R. Graunke, K.A. Intlekofer, and M.W. Spong. Formation control and collision avoidance for multi-agent non-holonomic sys- tems: Theory and experiments. The International Journal of Robotics Research, 27(1):107–126, 2008. [22] T. Sandholm. Algorithm for optimal winner determination on combinatorial auc- tions. Artif. Int., 2002. [23] B. Shubert. A sequential method seeking the global maximum of a function. SIAM J. on Numerical Analysis, 9, 1972. [24] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian Process Optimization in the Bandit Setting : No Regret and Experimental Design. In ICML, 2010. 21 [25] D.M. Stipanovi´c, P.F. Hokayem, M.W. Spong, D.D. ˇSiljak, et al. Cooperative avoidance control for multiagent systems. Journal of Dynamic Systems, Mea- surement, and Control, 129:699, 2007. [26] C. Tovey, M. Lagoudakis, S. Jain, and S. Koenig. The generation of bidding rules for auction-based robot coordination. 
In Multi-Robot Systems: From Swarms to Intelligent Automata, volume 3, pages 3–14, 2005. [27] J. Van den Berg, M. Lin, and D. Manocha. Reciprocal velocity obstacles for real-time multi-agent navigation. In ICRA, 2008. [28] P. Whittle. A multivariate generalization of Tchebichev’s inequality. Quarterly Journal of Mathem., 1958. 22 Supplementary derivations. A Derivations of the covariance and mean of the feedback-controlled agents as deployed in the trajectory planning experiments We will now solve the mean and covariance for the dynamics given by the Ito- SDE describing the dynamics of the feedback controlled agents considered in the experiments of this paper. Our aim is to obtain closed-form solutions avoiding the need to approximate any integrals. This ameliorates the computational burden of our method since mean and covariance matrix functions need to be evaluated frequently in the course of collision detection and resolution. Theorem A.1. For all t ∈[t0, T] let ξ(t), x(t) ∈RD, A = diag(a1, ..., aD), K = diag(k1, ..., kD), B = diag(√ν1, ..., √νD) and let x(t0) be a normally distributed random vector. The solution to the SDE dx = (Ax −K(x −ξ))dt + B dW is a Gaussian process with vector-valued mean function µ : [t0, T] →RD and matrix- valued covariance function C : [t0, T]2 →RD×D. Here the mean components are µj(t) = e(kj−aj)(t0−t) ⟨xj(t0)⟩+ Z t t0 kj e(kj−aj)(˜t−t)ξj(˜t)d˜t and the covariance matrix function is C(s, t) = diag(cov11(s, t), ..., covDD(s, t)) where covjj(s, t) = e(aj−kj)(t+s−2t0)(⟨x2 j(t0)⟩−⟨xj(t0)⟩2) + νj 2(kj −aj)[e(aj−kj)|t−s| −e(kj−aj)(2t0−(s+t))]. Proof. Let qj := kj −aj. Owing to the diagonal form of A, K and B and the independence of the output dimensions of the vector-valued Wiener process, the given SDE decomposes into a system of D indpendent 1-dimensional SDEs dxj = (−qjxj +kjξj)dt+√νjdWj (j = 1, ..., D) which can be treated separately. For ease of notation we omit the subscripts yielding an SDE of the form: dx = (−qx + kξ)dt + √ν dW. To solve each of these SDEs we introduce the substitution y := xeqt. With Ito’s product rule we find dy = x deqt + eqt dx + dx deqt = ξkeqt dt + √νeqtdW where the last equality follows from substitution of the SDE for dx and utilization the formal rules dt dW = 0, (dt)2 = 0. This SDE is solved by y(t) = y(t0) + R t t0 kξ(˜t)eq˜td˜t + R t t0 eq˜t√νdW(˜t) where the last integral is to be interpreted as an Ito integral. Re-substitution yields x(t) = x(t0)eq(t0−t) + Z t t0 k eq(˜t−t)ξ(˜t)d˜t + Z t t0 eq(˜t−t)√νdW(˜t). 23 Random variables x(t) (t ∈[t0, T]) are given as affine transformations of a Wiener process. Since the Wiener process is Gaussian, the solution is clearly a Gaussian process as long as x(t0) is a normally distributed random variable. For notational convenience we define Jt := Z t t0 k eq(˜t−t)ξ(˜t) d˜t, Ωt := Z t t0 eq(˜t−t)√ν dW(˜t) and ct := x(t0)eq(t0−t). Therefore, we can write x(t) = ct + Jt + Ωt. It remains to find the first and second moments. Owing to the linearity of the expectation and, since the expectation of the Ito integral of a non-anticipating integrand is zero, the mean function is: µ(t) = ⟨x(t)⟩= ⟨ct⟩+ ⟨Jt⟩+ ⟨Ωt⟩= ⟨x(t0)⟩ek(t0−t) + Jt. Now, we calculate the second moment: cov(s, t) = ⟨x(s)x(t)⟩−µ(s)µ(t) where ⟨x(s)x(t)⟩= ⟨(cs + Js + Ωs)(ct + Jt + Ωt)⟩ = ⟨csct⟩+ ⟨csJt⟩+ ⟨csΩt⟩+ ... + ⟨ΩsΩt⟩. Once again leveraging that the expectation of an Ito integral with non-anticipating integrand is zero, the cross-terms ⟨cqΩr⟩, ⟨JqΩr⟩(q, r ∈{s, t}) vanish. 
Hence, we obtain ⟨x(s)x(t)⟩= ⟨csct⟩+ ⟨csJt⟩+ ⟨ctJs⟩+ JtJs + ⟨ΩsΩt⟩ = ⟨csct⟩+ Jt⟨cs⟩+ Js⟨ct⟩+ JtJs + ⟨ΩsΩt⟩ = µ(s)µ(t) + ⟨csct⟩−⟨cs⟩⟨ct⟩+ ⟨ΩsΩt⟩. Hence, cov(s, t) = ⟨csct⟩−⟨cs⟩⟨ct⟩+ ⟨ΩsΩt⟩ = e−q(t+s−2t0)(⟨x2(t0)⟩−⟨x(t0)⟩2) + ⟨ΩsΩt⟩ It is well-known that for non-anticipating f, g interval I we have ⟨ Z I f(t)dW(t) Z I g(t)dW(t)⟩= Z I f(t)g(t)dt. 24 Applying this fact as well as leveraging the independent increments property of the Wiener process yields: ⟨ΩsΩt⟩= ν eq(s+t) Z min{t,s} t0 e2q˜td˜t = ν 2qeq(s+t) [e2k min{t,s} −e2qt0] = ν 2q [eq(2 min{t,s}−(s+t) −eq(2t0−(s+t))] = ν 2q [e−q|t−s| −eq(2t0−(s+t))] Notice, that when altering the plan and hence, ξ, the integral Jt,j := Z t t0 kj e(kj−aj)(˜t−t)ξj(˜t)d˜t has to be computed. For general forms of allowable ξ we would have to rely on numerical approximation methods. Since repeated calculations of solutions need to be done in the course of coordination we will want to alleviate the computational burden thereof as much as possible. This motivated our restric- tion to plans that give rise to step functions for which we can show that Jt is closed-form. Lemma A.2. Let t ≥t0, t, t0 ∈[0, T a]. Given plan pa = (ta i , ζa i ) Ha i=0 where each ζa i = (ζa i,j)D j=1. Let i = arg maxi{ti ≤t0}, i = arg mini{ti ≥t} and I := {i ∈{1, ..., Ha}|i < i ≤i}. Furthermore, let ξa j denote the jth component of step-function setpoint signal ξa. We have Z t t0 kj eqj(˜t−t)ξa j (˜t)d˜t = X i∈I kj qj ζa i,j(eqj(min{ti,t}−t) −eqj(max{ti−1,t0}−t)). Proof. Remember, τ a i = (ta i−1, ta i ] ⊂[0, T a]. We have Z t t0 kj exp(qj(˜t −t))ξa j (˜t)d˜t = Z t t0 kj exp(qj(˜t −t)) Ha X i=1 ζa i,jχτ a i (˜t)d˜t = X i∈I Z min{t,ti} max{t0,ti−1} kj exp(qj(˜t −t))ζa i,jd˜t 25 Calculation of the anti-derivate and substitution of the integration bounds yields the desired result. We immediately get the following corollary Corollary A.3. Let a ∈A be an agent with controlled plant dynamics dxa = (Axa −K(xa −ξa))dt + B dW where for all t ∈[t0, T]: ξa(t), xa(t) ∈RD, A = diag(a1, ..., aD), K = diag(k1, ..., kD), B = diag(√ν1, ..., √νD). Let xa(t0) be a normally distributed random vector. Assume a’s plan is pa = (ta i , ζa i ) Ha 0=1 where each ζa i = (ζa i,j)D j=1. Let t ≥t0, i = arg maxi{ti ≤t0}, i = arg mini{ti ≥t} and I := {i ∈{1, ..., Ha}|i < i ≤i}. Furthermore, let ξa j denote the jth component of step-function reference signal ξa. The solution to agent a’s SDE is a Gaussian process with vector-valued mean function µ : [t0, T] →RD and matrix-valued covariance function C : [t0, T]2 → RD×D. Here the mean components are µj(t) = ekj(t0−t) ⟨xj(t0)⟩ + X i∈I kj kj −aj ζa i,j(e(kj−aj)(min{ti,t}−t) −e(kj−aj)(max{ti−1,t0}−t)) and the covariance matrix function is C(s, t) = diag(cov11(s, t), ..., covDD(s, t)) where covjj(s, t) = e(aj−kj)(t+s−2t0)(⟨x2 j(t0)⟩−⟨xj(t0)⟩2) + νj 2(kj −aj)[e(aj−kj)|t−s| −e(kj−aj)(2t0−(s+t))]. B Deriving Lipschitz numbers So far, we have established how knowledge of the Lipschitz number of a Lipschitz function can be utilized to exclude the presence of any negative function values on a compact domain based on a finite number of function evaluations. To employ this insight in the context of collision detection we will have to know a Lipschitz number L of the criterion functions γa,r. We may consider two cases: • We have a belief quantified as a distribution over the smallest Lipschitz constant L′. Let the cumulative distribution function of this belief be denoted by F : R →[0, 1]. That is, F(x) = Pr[L′ < x]. 
B Deriving Lipschitz numbers

So far, we have established how knowledge of a Lipschitz number of a Lipschitz function can be utilized to exclude the presence of any negative function values on a compact domain based on a finite number of function evaluations. To employ this insight in the context of collision detection, we have to know a Lipschitz number $L$ of the criterion functions $\gamma^{a,r}$. We may consider two cases:
• We have a belief, quantified as a distribution over the smallest Lipschitz constant $L'$. Let the cumulative distribution function of this belief be denoted by $F : \mathbb{R} \to [0,1]$, that is, $F(x) = \Pr[L' < x]$. If we desire a guaranteed success of collision detection of at least $\delta \in (0,1)$, we invoke Alg. 1 with a Lipschitz number $L \ge \min\{x \in \mathbb{R} \mid F(x) \ge \delta\}$ to detect non-positive values of $\gamma^{a,r}$.
• If we do not have such a belief function, or desire complete certainty of collision-detection success, it may be possible to derive a Lipschitz number for the $\gamma^{a,r}$ based on the mean and covariance functions of the agents' stochastic trajectories. In turn, these may be derived from the agents' SDEs. How this can be accomplished is the subject of the remainder of this section.

B.1 Lipschitz arithmetic

In preparation for the derivation of Lipschitz numbers, we need to establish a few basic properties of Lipschitz continuous functions. As a convention, $L_I(\varphi)$ will always denote a Lipschitz constant of a Lipschitz continuous function $\varphi$ on the set $I$.

Lemma B.1 (Lipschitz arithmetic). Let $I, J \subset \mathbb{R}_+$. Let $f : \mathbb{R}\to\mathbb{R}$ be Lipschitz on $I$ with Lipschitz number $L_I(f)$ and $g:\mathbb{R}\to\mathbb{R}$ be Lipschitz on $J$ with Lipschitz number $L_J(g)$. We have:
1. The mapping $t\mapsto |f(t)|$ is Lipschitz on $I$ with constant $L_I(f)$.
2. If $g$ is Lipschitz on all of $J = f(I)$, the composition $g\circ f : t\mapsto g(f(t))$ is Lipschitz on $I$ with Lipschitz constant $L_I(g\circ f)\le L_J(g)\,L_I(f)$.
3. Let $r\in\mathbb{R}$. Then $rf : t\mapsto r f(t)$ is Lipschitz on $I$ with Lipschitz constant $L_I(rf) = |r|\,L_I(f)$.
4. $f+g : t\mapsto f(t)+g(t)$ has Lipschitz number at most $L_I(f) + L_J(g)$ (on $I\cap J$).
5. Let $m_f := \sup_{t\in I}|f(t)|$ and $m_g := \sup_{t\in I}|g(t)|$. The product function $f\cdot g : t\mapsto f(t)g(t)$ has a Lipschitz number on $I$ of at most $m_f L_J(g) + m_g L_I(f)$.
6. Let $h(t) := \max\{f(t), g(t)\}$ for all $t\in I\cap J$. Then $L_{I\cap J}(h)\le\max\{L_I(f), L_J(g)\}$.
7. Let $b := \inf_{t\in I}|f(t)| > 0$ and $\varphi(t) := \frac{1}{f(t)}$ for all $t\in I$. Then $L_I(\varphi)\le b^{-2}\,L_I(f)$ on $I$.
8. If $f$ is continuously differentiable on $I$, then $L_I(f) = \sup_{t\in I}|\dot f(t)|$.
9. Let $c\in\mathbb{R}$ and $f(t) = c$ for all $t\in I$. Then $L_I(f) = 0$.
10. $L_I(f^2)\le 2\,s(f)\,L_I(f)$, where $s(f) := \sup_{t\in I}|f(t)|$.
11. If $f$ is continuously differentiable, then for all $q\in\mathbb{Q}$: $L_I(f^q) = |q|\sup_{\tau\in I}|f^{q-1}(\tau)\,\dot f(\tau)|$.

Proof. 1) We show that $|f|$ has the same Lipschitz number as $f$. Let $t, t'\in I$ be arbitrary. We enter a case differentiation.
1st case: $f(t), f(t')\ge 0$. Hence $\big||f(t)| - |f(t')|\big| = |f(t) - f(t')|\le L_I(f)|t-t'|$, since $f$ is Lipschitz.
2nd case: $f(t)\ge 0$, $f(t')\le 0$. Note that $|y| = -y$ iff $y\le 0$. Hence $\big||f(t)| - |f(t')|\big|\le |f(t)| + |f(t')| = f(t) - f(t') = |f(t) - f(t')|\le L_I(f)\,|t-t'|$.
3rd case: $f(t)\le 0$, $f(t')\ge 0$. Completely analogous to the 2nd case.
4th case: $f(t), f(t')\le 0$. Then $\big||f(t)| - |f(t')|\big| = |f(t') - f(t)| = |f(t) - f(t')|\le L_I(f)\,|t-t'|$, since $f$ is Lipschitz.
2) For arbitrary $t, t'\in I$ we have $|g(f(t)) - g(f(t'))|\le L_J(g)\,|f(t) - f(t')|\le L_J(g)\,L_I(f)\,|t-t'|$, where the two inequalities are due to the Lipschitz properties of $g$ and $f$, respectively.
3) For arbitrary $t, t'\in I$ and $r\in\mathbb{R}$ we have $|r f(t) - r f(t')| = |r|\,|f(t)-f(t')|\le |r|\,L_I(f)\,|t-t'|$.
4) For arbitrary $t,t'\in I\cap J$ we have $|g(t)+f(t) - (g(t')+f(t'))|\le |g(t)-g(t')| + |f(t)-f(t')|\le (L_J(g) + L_I(f))\,|t-t'|$.
5) Let $t,t'\in I$ and $d := f(t) - f(t')$. Then $|g(t)f(t) - g(t')f(t')| = |g(t)(f(t')+d) - g(t')f(t')|\le |g(t)-g(t')|\,|f(t')| + |g(t)|\,|d|\le L_J(g)|t-t'|\sup_{t'\in I}|f(t')| + \sup_{t\in I}|g(t)|\,L_I(f)|t-t'| = \big(m_f L_J(g) + m_g L_I(f)\big)|t-t'|$.
6) A proof can be found in N. Weaver, Lipschitz Algebras.
7) Let $t,t'\in I$. Then $\big|\frac{1}{f(t)} - \frac{1}{f(t')}\big| = \big|\frac{f(t')}{f(t')f(t)} - \frac{f(t)}{f(t')f(t)}\big| = \frac{|f(t')-f(t)|}{|f(t')|\,|f(t)|}\le\frac{L_I(f)\,|t-t'|}{\big(\inf_{t\in I}|f(t)|\big)^2}$.
8) Define $\ell := \sup_{t\in I}|\dot f(t)|$. In two steps, we show that $\ell$ is the smallest Lipschitz constant. Firstly, we show that it is a Lipschitz constant. Let $t, t'\in I$, $t < t'$.
By the mean value theorem, there exists $\xi\in[t,t']\subset I$ such that $\frac{|f(t)-f(t')|}{|t-t'|} = |\dot f(\xi)|\le\ell$. Secondly, we show that $\ell$ is the smallest Lipschitz constant. Let $\bar\ell$ be another Lipschitz constant such that $\bar\ell\le\ell$. Since $I$ is compact and $\dot f$ is continuous, there is some $\xi\in I$ such that $|\dot f(\xi)| = \ell$. Pick any sequence $(t_k)_{k=1}^{\infty}$ in $I\setminus\{\xi\}$ such that $t_k\to\xi$ as $k\to\infty$. Since every $t_k\in I$ and $\bar\ell$ is a Lipschitz constant on $I$, we have $\bar\ell\ge\frac{|f(t_k)-f(\xi)|}{|t_k-\xi|}\xrightarrow{k\to\infty}|\dot f(\xi)| = \ell$. Thus $\bar\ell = \ell$.
9) Immediate consequence of 8).
10) $L(f^2) = L(f\cdot f)\le s(f)L(f) + L(f)s(f)$, where the inequality follows from property 5).
11) By property 8), $L_I(f^q) = \sup_{\tau\in I}\big|\frac{\mathrm{d}}{\mathrm{d}t}f^q(\tau)\big| = |q|\sup_{\tau\in I}|f^{q-1}(\tau)\,\dot f(\tau)|$.

Notice that several of the inequalities in the lemma are not tight (e.g. properties 5 and 10). Therefore, it may sometimes be better not to apply them if one is instead able to determine the Lipschitz constant directly, yielding a lower Lipschitz number.

Lemma B.2. For $0 < a < b$, let $J\subset\mathbb{R}_+$ be the domain interval of the square-root function $\sqrt{\cdot} : J\to\mathbb{R}_+,\ t\mapsto\sqrt t$, such that $\inf J = a$ and $\sup J = b$. We have $L_J(\sqrt{\cdot})\le\frac{1}{2\sqrt a}$, where $L_J(\sqrt{\cdot})$ denotes a Lipschitz constant of the square-root function on $J$.
Proof. Applying Lem. B.1 and leveraging the differentiability of the square-root function reduces the problem of determining a Lipschitz constant to finding $\sup_{t\in J}\big|\frac{\mathrm{d}\sqrt{\cdot}}{\mathrm{d}t}(t)\big| = \sup_{t\in J}\big|\frac{1}{2\sqrt t}\big|$. Since $\frac{\mathrm{d}^2}{\mathrm{d}t^2}\sqrt{\cdot} = -\frac{1}{4\sqrt{\cdot}^{\,3}}$ attains only negative values on $J\subset\mathbb{R}_+$, we know that the first derivative $\frac{\mathrm{d}\sqrt{\cdot}}{\mathrm{d}t} = \frac{1}{2\sqrt{\cdot}}$ is strictly monotonically decreasing. Thus, $\sup_{t\in J}\big|\frac{\mathrm{d}\sqrt{\cdot}}{\mathrm{d}t}(t)\big| = \frac{\mathrm{d}\sqrt{\cdot}}{\mathrm{d}t}(\inf J) = \frac{1}{2\sqrt{\inf J}} = \frac{1}{2\sqrt a}$.
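The rules of Lem. B.1 and Lem. B.2 lend themselves to a mechanical implementation in which every function is summarised by a Lipschitz number together with a bound on its supremum (and, where needed, infimum). The following is a minimal sketch; the representation of these quantities by plain floats and the helper names are our own choices:

```python
# Lipschitz arithmetic of Lem. B.1 / Lem. B.2.
# Each argument L* is a Lipschitz number, s* a bound on sup|.|, inf* on inf|.|.

def lip_sum(Lf, Lg):                 # Lem. B.1.4
    return Lf + Lg

def lip_scale(r, Lf):                # Lem. B.1.3
    return abs(r) * Lf

def lip_product(Lf, sf, Lg, sg):     # Lem. B.1.5
    return sf * Lg + sg * Lf

def lip_compose(Lg, Lf):             # Lem. B.1.2 (g applied after f)
    return Lg * Lf

def lip_max(Lf, Lg):                 # Lem. B.1.6
    return max(Lf, Lg)

def lip_reciprocal(Lf, inf_abs_f):   # Lem. B.1.7, requires inf|f| > 0
    return Lf / inf_abs_f ** 2

def lip_square(Lf, sf):              # Lem. B.1.10
    return 2.0 * sf * Lf

def lip_sqrt(inf_J):                 # Lem. B.2, on an interval J with inf J = a > 0
    return 1.0 / (2.0 * inf_J ** 0.5)
```

For example, a Lipschitz number for $t\mapsto |f(t)|\,g(t)$ is obtained as lip_product(Lf, sf, Lg, sg), since $|f|$ inherits $f$'s Lipschitz number by Lem. B.1.1.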
B.2 An alternative collision-criterion function and the derivation of its Lipschitz number

Theorem B.3 (Alternative collision criterion function). Let the spatial dimensionality be $D = 2$. Let $\delta_a\in(0,1)$ denote the maximum upper bound on the instantaneous collision probability at time $t$ that agent $a\in\mathcal A$ is allowed to tolerate. Let $\mu^a(t)\in\mathbb{R}^D$ be the mean of trajectory $x^a$ and $C^a_{ij}(t)$ the spatial covariance between dimensions $i$ and $j$ at time $t$. For $i\in\{1,2\}$, $j\in\{1,2\}\setminus\{i\}$ let
$$r^a_i(t) := \sqrt{\tfrac{1}{2\delta_a}}\,\sqrt{\,C^a_{ii}(t) + \frac{\sqrt{C^a_{ii}(t)C^a_{jj}(t)\big(C^a_{ii}(t)C^a_{jj}(t) - (C^a_{ij}(t))^2\big)}}{C^a_{jj}(t)}\,}.$$
Let $a, r$ be two agents whose plants' radii sum to $\Lambda^{a,r}$ and whose state trajectories are $x^a, x^r$, respectively. Define $b^{a,r}_j(t,t') := r^a_j(t) + r^r_j(t') + \Lambda^{a,r}$. The function
$$\gamma^{a,r}(t) := \max_{j\in\{1,\dots,D\}}\big\{|\mu^a_j(t) - \mu^r_j(t)| - b^{a,r}_j(t,t)\big\}$$
is a valid criterion function. That is, $\gamma^{a,r}(t) > 0\ \Rightarrow\ \Pr[\mathcal C^{a,r}(t)] < \delta_a$.
Proof. (Sketch) Let $t\in I$ be an arbitrary but fixed time. It is straightforward to adapt the proof of Thm. 2 in [20] to our case, showing that $\Pr[\mathcal C^{a,r}(t)] < \delta_a$ if $|\mu^a_j(t) - \mu^r_j(t)| - b^{a,r}_j(t,t) > 0$ for at least one $j\in\{1,\dots,D\}$. Hence, $\{t\in I\mid\gamma^{a,r}(t) > 0\}\subset\{t\in I\mid\Pr[\mathcal C^{a,r}(t)] < \delta_a\}$.

Definition B.4. For future reference we define $s : f\mapsto\sup_{t\in I}|f(t)|$ and $\iota : f\mapsto\inf_{t\in I}|f(t)|$ on the space of continuous functions on the interval $I$.

Theorem B.5. Let $D = 2$ be the dimensionality of the state space. For any agent $a$ let $\mu^a_j : I = [t_0, t_f]\to\mathbb{R}$ denote the $j$th component of $a$'s mean function and $C^a_{ij}(t)$ the covariance of agent $a$'s trajectory between dimensions $j$ and $i$ at time $t$. For $q\in\{a,r\}$ and all $i,j\in\{1,\dots,D\}$ assume the $\mu^q_j$ and $C^q_{ij}$ are Lipschitz on $I$ with Lipschitz numbers $L(\mu^q_j)$ and $L(C^q_{ij})$, respectively. Let $\gamma^{a,r}$ denote the collision criterion function between agents $a$ and $r$ as defined in Thm. B.3. Then $\gamma^{a,r}$ is Lipschitz on $I$ with Lipschitz constant
$$L(\gamma^{a,r}) \le \max_{j\in\{1,\dots,D\}}\big\{L(\mu^a_j - \mu^r_j) + L(b^{a,r}_j)\big\} \le \max_{j\in\{1,\dots,D\}}\big\{L(\mu^a_j) + L(\mu^r_j) + L(b^{a,r}_j)\big\},$$
where $\gamma^{a,r}$, $b^{a,r}_j$ and $\alpha^a_j, \alpha^r_j$ are defined as in Thm. B.3, and for all $i\in\{1,2\}$, $j\in\{1,2\}\setminus\{i\}$ and $q\in\{a,r\}$ we have:
1. $L(b^{a,r}_j)\le\frac12\big(L(\alpha^a_j) + L(\alpha^r_j)\big)$,
2. $L(\alpha^q_i)\le\sqrt{\tfrac{1}{2\delta_q}}\,\frac{1}{\iota(g_i)}\,L(g_i)$, where $g_i(t) := C^q_{ii}(t) + \sqrt{(C^q_{ii})^2 - \frac{C^q_{ii}\,(C^q_{ij})^2}{C^q_{jj}}}$ and where
(i) $L(g_i)\le L(C^q_{ii}) + Q\,L\big((C^q_{ii})^2\big) + Q\,s\big((C^q_{ij})^2\big)\,L\big(\tfrac{C^q_{ii}}{C^q_{jj}}\big) + Q\,L\big((C^q_{ij})^2\big)\,s\big(\tfrac{C^q_{ii}}{C^q_{jj}}\big)$, and
(ii) $L(g_i)\le L(C^q_{ii}) + Q\,L\big((C^q_{ii})^2\big) + Q\,s(C^q_{ii})\,L\big(\tfrac{(C^q_{ij})^2}{C^q_{jj}}\big) + Q\,L(C^q_{ii})\,s\big(\tfrac{(C^q_{ij})^2}{C^q_{jj}}\big)$.
Here, $Q = \inf_{t\in I}\big((C^q_{ii}(t))^2 - \tfrac{C^q_{ii}(t)\,(C^q_{ij}(t))^2}{C^q_{jj}(t)}\big) = \inf_{t\in I}(g_i - C^q_{ii})^2(t)$.
3. Also, $L(\alpha^q_j)\le\frac{1}{2\sqrt a}\,L\big(\tfrac{C^q_{jj}}{C^q_{ii}}\big)\,s(\alpha^q_i) + s\Big(\sqrt{\tfrac{C^q_{jj}}{C^q_{ii}}}\Big)\,L(\alpha^q_i)$, where $a = \inf_{t\in I}\frac{C^q_{jj}(t)}{C^q_{ii}(t)}$,
4. $L\big(\tfrac{C^q_{ii}}{C^q_{jj}}\big)\le L(C^q_{jj})\,\iota(C^q_{jj})^{-2}\,s(C^q_{ii}) + s\big(\tfrac{1}{C^q_{jj}}\big)\,L(C^q_{ii})$ if $\iota(C^q_{jj}) = \inf_{t\in I}|C^q_{jj}(t)| > 0$,
5. and, similarly, $L\big(\tfrac{(C^q_{ij})^2}{C^q_{jj}}\big)\le L(C^q_{jj})\,\iota(C^q_{jj})^{-2}\,s\big((C^q_{ij})^2\big) + s\big(\tfrac{1}{C^q_{jj}}\big)\,L\big((C^q_{ij})^2\big)$ if $\iota(C^q_{jj}) > 0$,
6. $\forall i,j\in\{1,2\}:\ L\big((C^q_{ij})^2\big)\le 2\,s(C^q_{ij})\,L(C^q_{ij})$.

Proof. The inequalities follow from successively applying Lem. B.1 to the definitions of the parts of the criterion function given in Thm. B.3. In our derivations we note which of the properties of the lemma we utilize as a superscript above the (in)equality sign. We have
$$L(\gamma^{a,r}) \overset{\text{def.}}{=} L\big(\max_j\{|\mu^a_j-\mu^r_j| - b^{a,r}_j\}\big)\overset{(6)}{\le}\max_j L\big(|\mu^a_j-\mu^r_j| - b^{a,r}_j\big)\le\max_j\big(L(|\mu^a_j-\mu^r_j|) + L(b^{a,r}_j)\big)\overset{(1,3,4)}{\le}\max_j\big(L(\mu^a_j)+L(\mu^r_j)+L(b^{a,r}_j)\big).$$
Proof of Ineq. 1): By definition, $L(b^{a,r}_j) = L\big(\tfrac12(\alpha^a_j + \alpha^r_j) + \Lambda^{a,r}\big)\overset{(3,4)}{\le}\tfrac12\big(L(\alpha^a_j)+L(\alpha^r_j)\big) + L(\Lambda^{a,r})\overset{(9)}{=}\tfrac12\big(L(\alpha^a_j)+L(\alpha^r_j)\big)$.
To prove the remaining inequalities, let $q\in\{a,r\}$.
Proof of Ineq. 2): Let $g_i(t) := C^q_{ii}(t) + \sqrt{(C^q_{ii})^2 - \frac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}}$. Then $L(\alpha^q_i)\overset{(3)}{=}\sqrt{\tfrac{1}{2\delta_q}}\,L(\sqrt{g_i})\le\sqrt{\tfrac{1}{2\delta_q}}\,\frac{1}{\iota(g_i)}\,L(g_i)$, where the last inequality follows from Lem. B.2 and Lem. B.1.(2). Furthermore,
$$L(g_i)\overset{(4)}{\le}L(C^q_{ii}) + L\Big(\sqrt{(C^q_{ii})^2 - \tfrac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}}\Big) \le L(C^q_{ii}) + Q\,L\Big((C^q_{ii})^2 - \tfrac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}\Big),$$
where $Q = \iota\big((C^q_{ii})^2 - \tfrac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}\big)$ (cf. Lem. B.2). Hence, $L(g_i)\le L(C^q_{ii}) + Q\,L\big((C^q_{ii})^2\big) + Q\,L\big(\tfrac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}\big)$.
Proof of Ineq. 3): We have $L(\alpha^q_j)\overset{\text{def.}}{=}L\Big(\sqrt{\tfrac{C^q_{jj}}{C^q_{ii}}}\,\alpha^q_i\Big)\overset{(5)}{\le}s\Big(\sqrt{\tfrac{C^q_{jj}}{C^q_{ii}}}\Big)L(\alpha^q_i) + L\Big(\sqrt{\tfrac{C^q_{jj}}{C^q_{ii}}}\Big)s(\alpha^q_i)\overset{(2)}{\le}s\Big(\sqrt{\tfrac{C^q_{jj}}{C^q_{ii}}}\Big)L(\alpha^q_i) + L_J(\sqrt{\cdot})\,L\big(\tfrac{C^q_{jj}}{C^q_{ii}}\big)\,s(\alpha^q_i)$, where $J = \tfrac{C^q_{jj}}{C^q_{ii}}(I)$. Inequality 3) now follows from applying Lem. B.2.
From the last bound in the proof of Ineq. 2) we can make two alternative derivations (stated for $i = 1$, $j = 2$):
(i) $L(g_1)\le L(C^q_{11}) + Q\,L\big((C^q_{11})^2\big) + Q\,s\big((C^q_{12})^2\big)\,L\big(\tfrac{C^q_{11}}{C^q_{22}}\big) + Q\,L\big((C^q_{12})^2\big)\,s\big(\tfrac{C^q_{11}}{C^q_{22}}\big)$; alternatively,
(ii) $L(g_1)\le L(C^q_{11}) + Q\,L\big((C^q_{11})^2\big) + Q\,s(C^q_{11})\,L\big(\tfrac{(C^q_{12})^2}{C^q_{22}}\big) + Q\,L(C^q_{11})\,s\big(\tfrac{(C^q_{12})^2}{C^q_{22}}\big)$.
Proof of Ineq. 4): $L\big(\tfrac{C^q_{ii}}{C^q_{jj}}\big) = L\big(C^q_{ii}\cdot\tfrac{1}{C^q_{jj}}\big)\overset{(5)}{\le}L\big(\tfrac{1}{C^q_{jj}}\big)\,s(C^q_{ii}) + L(C^q_{ii})\,s\big(\tfrac{1}{C^q_{jj}}\big)\overset{(7)}{\le}b^{-2}\,L(C^q_{jj})\,s(C^q_{ii}) + L(C^q_{ii})\,s\big(\tfrac{1}{C^q_{jj}}\big)$, where $b\in\mathbb{R}_+$ is chosen such that $C^q_{jj}(I)\cap[-b,b]=\emptyset$ (we assume such a $b$ exists); a valid choice certainly is $b := \inf_{t\in I}|C^q_{jj}(t)|$.
Proof of Ineq. 5): Completely analogous to the proof of 4).
Proof of Ineq. 6): Consequence of Lem. B.1.(10).

The theorem provides a recipe for finding a Lipschitz bound for the collision criterion function, given known Lipschitz numbers of the trajectories' mean and spatial covariance mappings. However, since most of the inequalities are not tight, one should attempt to determine Lipschitz numbers directly wherever possible rather than using the inequalities provided in Lem. B.1. For instance, if one can determine the best Lipschitz number $L\big((C^q_{ij})^2\big)$ directly (e.g. by utilizing Lem. B.1.11), this will normally yield a better Lipschitz constant than the one obtained by expanding into $2\,s(C^q_{ij})\,L(C^q_{ij})$ via Lem. B.1.10.
Examining the terms in the inequalities, we notice the occurrence of suprema of covariances, $s(C^q_{ij})$, and of inverted covariances of the form $s(1/C^q_{ii})$. The latter requires non-vanishing uncertainty in our model. Furthermore, the need to evaluate these extrema is not too burdensome, as they can be found rapidly by pre-existing Lipschitz optimisers, which are highly efficient. In many cases the optima are even known a priori. For instance, if the uncertainty is known to increase monotonically over time, we have $s(C^q_{ij}) = C^q_{ij}(\sup I)$ and $s(1/C^q_{ij}) = 1/C^q_{ij}(\inf I)$. Alternatively, the covariances may admit an analytic closed-form solution for the extremum, which can be derived before run-time.
We will revisit these issues in Sec. B.3, where we examine a concrete application of the theorem to a multi-agent control scenario.
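Before doing so, it may help to see how the criterion function of Thm. B.3 itself can be evaluated. The following is a minimal sketch for D = 2; the callables mu_a, C_a, mu_r, C_r, and the use of a separate tolerance delta_r for agent r's radius term, are our own illustrative assumptions:

```python
import numpy as np

def radius(C, delta, i):
    """r_i(t) of Thm. B.3 for D = 2: i is the dimension of interest, j the other one."""
    j = 1 - i
    inner = np.sqrt(C[i, i] * C[j, j] * (C[i, i] * C[j, j] - C[i, j] ** 2)) / C[j, j]
    return np.sqrt(1.0 / (2.0 * delta)) * np.sqrt(C[i, i] + inner)

def gamma_criterion(t, mu_a, C_a, mu_r, C_r, delta_a, delta_r, Lambda_ar):
    """gamma^{a,r}(t): a positive value certifies Pr[collision at time t] < delta_a.

    mu_a, mu_r: callables t -> mean vector (shape (2,)).
    C_a, C_r:   callables t -> 2x2 covariance matrix.
    Lambda_ar:  sum of the two agents' radii.
    """
    Ca, Cr = C_a(t), C_r(t)
    ma, mr = mu_a(t), mu_r(t)
    vals = []
    for j in range(2):
        b_j = radius(Ca, delta_a, j) + radius(Cr, delta_r, j) + Lambda_ar
        vals.append(abs(ma[j] - mr[j]) - b_j)
    return max(vals)
```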
B.3 A Lipschitz number for the criterion function of our feedback-controlled agents

Let $a\in\mathcal A$ be an agent with controlled plant dynamics given by the Ito SDE
$$\mathrm{d}x^a = K(\xi^a - x^a)\,\mathrm{d}t + B\,\mathrm{d}W.$$
Here $x^a(t)\in\mathbb{R}^D$ denotes agent $a$'s state (e.g. its location) and $\xi^a(t)\in\mathbb{R}^D$ is the agent's setpoint signal at time $t\in I = [t_0, t_f]$. Furthermore, $K = \mathrm{diag}(k_1,\dots,k_D) > 0$ is the controller's gain matrix and $B = \mathrm{diag}(\sqrt{\nu_1},\dots,\sqrt{\nu_D})$ reflects the magnitude of the uncertainties (disturbances). Let the uncertain start state $x^a(t_0)$ be a normally distributed random vector. Assume $a$'s plan is $p^a = (t^a_i,\zeta^a_i)_{i=0}^{H_a}$, where each $\zeta^a_i = (\zeta^a_{i,j})_{j=1}^D$. Let $t\ge t_0$, $\underline{i} = \arg\max_i\{t_i\le t_0\}$, $\overline{i} = \arg\min_i\{t_i\ge t\}$ and $I := \{i\in\{1,\dots,H_a\}\mid\underline{i} < i\le\overline{i}\}$, and let $\xi^a_j$ denote the $j$th component of the step-function reference signal $\xi^a$. For ease of notation we drop the agent superscripts where convenient throughout the remainder of this subsection.
The solution to agent $a$'s SDE is a Gaussian process with vector-valued mean function $\mu^a : [t_0,t_f]\to\mathbb{R}^D$ and matrix-valued covariance function $C^a : [t_0,t_f]^2\to\mathbb{R}^{D\times D}$. By applying Ito calculus to the suitable expectations of the SDE (cf. Thm. A.1 with $A = 0$) we can show that
$$\mu^a_j(t) = e^{k^a_j(t_0-t)}\langle x^a_j(t_0)\rangle + k^a_j\,e^{-k^a_j t}\int_{t_0}^{t} e^{k^a_j\tilde t}\,\xi^a_j(\tilde t)\,\mathrm{d}\tilde t.$$
The covariance matrix function is $(s,t)\mapsto\mathrm{diag}(\mathrm{cov}^a_{11}(s,t),\dots,\mathrm{cov}^a_{DD}(s,t))$, where
$$\mathrm{cov}^a_{jj}(s,t) = e^{-k^a_j(t+s-2t_0)}\big(\langle (x^a_j(t_0))^2\rangle - \langle x^a_j(t_0)\rangle^2\big) + \frac{\nu^a_j}{2k^a_j}\big[e^{-k^a_j|t-s|} - e^{k^a_j(2t_0-(s+t))}\big].$$
Using the notation of Thm. B.5 we have
$$C^a_{jj}(t) = \mathrm{cov}^a_{jj}(t,t) = e^{-2k^a_j(t-t_0)}\big(\langle(x^a_j(t_0))^2\rangle - \langle x^a_j(t_0)\rangle^2\big) + \frac{\nu^a_j}{2k^a_j}\big[1 - e^{2k^a_j(t_0-t)}\big] = e^{-2k^a_j(t-t_0)}\Big(C^a_{jj}(t_0) - \frac{\nu^a_j}{2k^a_j}\Big) + \frac{\nu^a_j}{2k^a_j},$$
where $C^a_{jj}(t_0)$ is assumed to be a known quantification of the initial state uncertainty.
Next we derive Lipschitz constants for the component means and covariances, which is necessary to derive a Lipschitz number for the collision criterion function. Firstly, we consider the mean function. Defining $v^a_j(t) := e^{k^a_j(t_0-t)}\langle x^a_j(t_0)\rangle$, $w^a_j(t) := k^a_j e^{-k^a_j t}$ and $\dot q^a_j(t) := e^{k^a_j t}\xi^a_j(t)$, we can restate the component mean function as $\mu^a_j(t) = v^a_j(t) + w^a_j(t)\,q^a_j(t)$. Leveraging Lem. B.1 we see that
$$L(\mu^a_j)\le L(v^a_j) + s(w^a_j)\,L(q^a_j) + s(q^a_j)\,L(w^a_j) \qquad (5)$$
$$\le s(\dot v^a_j) + s(w^a_j)\,s(\dot q^a_j) + s(q^a_j)\,s(\dot w^a_j), \qquad (6)$$
where, as before, $s(f) = \sup_{t\in I}|f(t)|$ for any function $f$. Evaluation of the suprema depends on the setpoint signal $\xi$ and on the $k_j$.
For instance, choosing a constant setpoint $\xi$ and $k_j > 0$ yields:
• $s(\dot v^a_j) = \sup_{t\in[t_0,t_f]}|-k^a_j\langle x^a_j(t_0)\rangle e^{k^a_j t_0}e^{-k^a_j t}| = |k^a_j\langle x^a_j(t_0)\rangle|$, where the last equality holds since $e^{-k^a_j t}$ decreases monotonically,
• $s(w^a_j) = \sup_{t\in[t_0,t_f]}|k^a_j e^{-k^a_j t}| = |k^a_j|\,e^{-k^a_j t_0}$,
• $s(\dot w^a_j) = \sup_{t\in[t_0,t_f]}|-(k^a_j)^2 e^{-k^a_j t}| = (k^a_j)^2 e^{-k^a_j t_0}$,
• $s(\dot q^a_j) = \sup_{t\in[t_0,t_f]}|\xi^a_j e^{k^a_j t}| = |\xi^a_j|\,e^{k^a_j t_f}$,
• $s(q^a_j) = \sup_{t\in[t_0,t_f]}\big|\int_{t_0}^{t} e^{k^a_j\tilde t}\xi^a_j\,\mathrm{d}\tilde t\big| = \big|\tfrac{\xi^a_j}{k^a_j}\big[e^{k^a_j t_f} - e^{k^a_j t_0}\big]\big|$.
Here we leveraged the monotonicity of the exponential function.
Next, we derive Lipschitz constants for the covariances. Note that the cross-covariances vanish: $C^a_{ij}(t) = 0$ for all $t$ and $i\ne j$. Fortunately, the diagonal entries of the covariance matrix function are also continuously differentiable. In particular, we have
$$\dot C^a_{jj}(t) = -2k^a_j\,e^{-2k^a_j(t-t_0)}\Big(C^a_{jj}(t_0) - \frac{\nu^a_j}{2k^a_j}\Big).$$
We can once again utilise Lem. B.1, yielding the Lipschitz bound
$$L(C^a_{jj})\le s(\dot C^a_{jj}) = \sup_{t\in[t_0,t_f]}\Big|2k^a_j\Big(C^a_{jj}(t_0) - \frac{\nu^a_j}{2k^a_j}\Big)\Big|\,e^{-2k^a_j(t-t_0)} = \Big|2k^a_j\Big(C^a_{jj}(t_0) - \frac{\nu^a_j}{2k^a_j}\Big)\Big|,$$
where the last equality follows from the fact that $t\mapsto\exp(-2k_j(t-t_0))$ is monotonically decreasing. In summary, we have found
$$L(C^a_{jj})\le\Big|2k^a_j\Big(C^a_{jj}(t_0) - \frac{\nu^a_j}{2k^a_j}\Big)\Big|, \qquad (7)$$
$$L(C^a_{12}) = L(C^a_{21}) = 0, \qquad (8)$$
$$L(\mu^a_j)\le s(\dot v^a_j) + s(w^a_j)\,s(\dot q^a_j) + s(q^a_j)\,s(\dot w^a_j) \qquad (9)$$
$$= |k^a_j\langle x^a_j(t_0)\rangle| + |k^a_j||\xi^a_j|\,e^{k^a_j(t_f-t_0)} + |k^a_j||\xi^a_j|\,\big|e^{k^a_j(t_f-t_0)} - 1\big|. \qquad (10)$$
Next, we combine our estimates of the mean and covariances with Lem. B.1 and Thm. B.5 to derive a Lipschitz number for the criterion function defined in Thm. B.3. Let $q\in\{a,r\}$. Since $L(C^q_{12}) = 0$ we have
$$g_i(t) := C^q_{ii}(t) + \sqrt{(C^q_{ii})^2 - \frac{C^q_{ii}(C^q_{ij})^2}{C^q_{jj}}} = 2\,C^q_{ii}(t).$$
By Thm. B.5.2 this implies $L(\alpha^q_i)\le\sqrt{\tfrac{1}{2\delta_q}}\,\frac{1}{\iota(g_i)}L(g_i) = \sqrt{\tfrac{1}{2\delta_q}}\,\frac{1}{\iota(2C^q_{ii})}L(2C^q_{ii}) = \sqrt{\tfrac{1}{2\delta_q}}\,\frac{1}{\iota(C^q_{ii})}L(C^q_{ii})$, where the last equality is due to Lem. B.1.3 and the fact that $\inf_t|r f(t)| = |r|\inf_t|f(t)|$ for all constants $r$ and functions $f$. Next, we determine $\iota(C^q_{ii})$. Inspecting its derivative, we notice that $C^q_{ii}$ is strictly monotonically increasing iff $C^q_{ii}(t_0) - \frac{\nu^q_i}{2k^q_i} < 0$ and monotonically decreasing otherwise. Moreover, $C^q_{ii}$ does not attain negative values, implying $|C^q_{ii}| = C^q_{ii}$. Hence,
$$\iota(C^q_{ii}) = \inf\{|C^q_{ii}(I)|\} = \inf\{C^q_{ii}(I)\} = \begin{cases} C^q_{ii}(t_0), & \text{if } C^q_{ii}(t_0) < \frac{\nu^q_i}{2k^q_i};\\ C^q_{ii}(t_f), & \text{if } C^q_{ii}(t_0)\ge\frac{\nu^q_i}{2k^q_i}.\end{cases}$$
Now we have all the necessary ingredients to utilise Thm. B.5 in order to wrap up:
$$L(\gamma^{a,r})\le\max_j\big\{L(\mu^a_j) + L(\mu^r_j) + L(b^{a,r}_j)\big\}\le\max_j\big\{L(\mu^a_j) + L(\mu^r_j) + \tfrac12 L(\alpha^a_j) + \tfrac12 L(\alpha^r_j)\big\}$$
$$\le\max_j\Big\{|k^a_j\mu^a_j(t_0)| + |k^a_j||\xi^a_j|\,e^{k^a_j(t_f-t_0)} + |k^a_j||\xi^a_j|\big|e^{k^a_j(t_f-t_0)} - 1\big| + |k^r_j\mu^r_j(t_0)| + |k^r_j||\xi^r_j|\,e^{k^r_j(t_f-t_0)} + |k^r_j||\xi^r_j|\big|e^{k^r_j(t_f-t_0)} - 1\big| + \tfrac12\sqrt{\tfrac{1}{2\delta_a}}\,\frac{L(C^a_{jj})}{\iota(C^a_{jj})} + \tfrac12\sqrt{\tfrac{1}{2\delta_r}}\,\frac{L(C^r_{jj})}{\iota(C^r_{jj})}\Big\}.$$
This Lipschitz number can take large values on parts of the domain. Therefore, it may be helpful to recompute the Lipschitz numbers adaptively for different parts of the domain.
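Putting the pieces of this subsection together, the resulting Lipschitz number can be computed as in the following sketch. It assumes D = 2, constant setpoints and positive gains, as above; the function signature and the array-based per-dimension representation are our own:

```python
import numpy as np

def lipschitz_bound_gamma(k_a, xi_a, mu0_a, C0_a, nu_a, delta_a,
                          k_r, xi_r, mu0_r, C0_r, nu_r, delta_r, t0, tf):
    """Lipschitz number for gamma^{a,r} as derived in Sec. B.3 (D = 2).
    All per-dimension arguments are length-2 arrays (gains k, constant setpoints xi,
    initial means mu0, initial variances C0, noise magnitudes nu)."""
    def per_agent(k, xi, mu0, C0, nu, delta):
        # Lipschitz number of the mean, Eqs. (9)-(10).
        L_mu = (np.abs(k * mu0)
                + np.abs(k) * np.abs(xi) * np.exp(k * (tf - t0))
                + np.abs(k) * np.abs(xi) * np.abs(np.exp(k * (tf - t0)) - 1.0))
        # Lipschitz number and infimum of C_jj, Eq. (7) and the case distinction above.
        L_C = np.abs(2.0 * k * (C0 - nu / (2.0 * k)))
        C_tf = np.exp(-2.0 * k * (tf - t0)) * (C0 - nu / (2.0 * k)) + nu / (2.0 * k)
        iota_C = np.where(C0 < nu / (2.0 * k), C0, C_tf)
        # L(alpha_j): cross-covariances vanish, so g_i = 2 C_ii.
        L_alpha = np.sqrt(1.0 / (2.0 * delta)) * L_C / iota_C
        return L_mu, L_alpha
    L_mu_a, L_alpha_a = per_agent(k_a, xi_a, mu0_a, C0_a, nu_a, delta_a)
    L_mu_r, L_alpha_r = per_agent(k_r, xi_r, mu0_r, C0_r, nu_r, delta_r)
    return float(np.max(L_mu_a + L_mu_r + 0.5 * L_alpha_a + 0.5 * L_alpha_r))
```

Recomputing this bound on subintervals of $[t_0, t_f]$ implements the adaptive refinement suggested above.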
C Utilising priors encoding belief over change of sign of a criterion function

Detection of collisions is based on excluding the possibility of negative criterion-function values. However, as these functions are non-convex, any numerical procedure executed on a digital computer has to achieve this with only a finite number of function evaluations. Given this, what is our confidence in not having missed a negative criterion-function value?
Thus far, we have proposed using knowledge (i.e. a prior) about a Lipschitz number of the criterion function to rule out collisions in continuous time based on a finite number of samples. In addition to the Lipschitz-based method presented above, we now consider an alternative method that assumes a prior belief about the anticipated number of sign changes of a criterion function.
Before commencing, it will prove helpful to introduce the notion of a sign change point (SCP). An SCP is a point in time separating two regions of differing sign of a function. More precisely, time $t$ is an SCP of function $f$ if there exist open intervals $I' := (t', t)$ and $I'' := (t, t'')$ such that $\mathrm{sgn}(f(\tau'))\ne\mathrm{sgn}(f(\tau''))$ for all $\tau'\in I'$, $\tau''\in I''$. To give an example, consider the function $\varphi : t\mapsto -t\,\chi_{\mathbb{R}_-}(t) + 0\cdot\chi_{[0,1]}(t) + (t-1)\,\chi_{\mathbb{R}_{>1}}(t)$. As before, $\chi_S$ denotes the indicator function of set $S$. Function $\varphi$ has exactly two SCPs, at $t = 0$ and $t = 1$.
Resuming our discussion, assume we are given $f(t_0),\dots,f(t_k)$ on a lattice of times $0 = t_0 < \dots < t_k = T$. If $f(t_i)\le 0$ for some $t_i$, we conservatively assume that a collision occurs. On the other hand, if all evaluations are positive, we wish to quantify our confidence that all intermediate, unobserved values are positive as well. This is the case if no SCPs occur. The presence of an odd number of SCPs between two time steps $t_i, t_{i+1}$ is detectable by checking $\mathrm{sgn}(f(t_i))\ne\mathrm{sgn}(f(t_{i+1}))$. In fact, if the total number $n$ of SCPs in $[0,T]$ is odd, we will detect a change of sign. By contrast, if every interval contains an even number of SCPs, we have $\mathrm{sgn}(f(t_i)) = \mathrm{sgn}(f(t_{i+1}))$ for all $i$ and hence may remain oblivious to negative function values inside the intervals $(t_i, t_{i+1})$.
Now assume we are given a distribution $Q : \mathbb{N}\to[0,1]$ representing our belief over the number of occurring SCPs. By the law of total probability, our belief that we will miss the existence of a collision is
$$\sum_{n\in 2\mathbb{N}} P_{n,k}\,Q[n], \qquad (11)$$
where $P_{n,k}$ denotes the probability of missing the existence of a collision during collision detection, given that $n$ SCPs occur in the interior of the lattice. In preparation of the next theorem we need the following result.

Lemma C.1 (Improved bound). Given a set $S = \{s_1,\dots,s_n\}$ of $n\in 2\mathbb{N}$ objects, let $P$ be the number of ways the set can be partitioned into pairs, i.e. $P = \big|\{\{P_1,\dots,P_{n/2}\}\mid\forall i\ne j : P_i\cap P_j = \emptyset,\ \bigcup_i P_i = \{s_1,\dots,s_n\},\ \forall i\,\exists q\ne r : P_i = \{s_q, s_r\}\}\big|$. We have $P\le\binom{n(n-1)/2}{n/2}$.
Proof. We can create $(n^2-n)/2 = n(n-1)/2$ distinct sets of the form $\{s_q, s_r\}$ of cardinality two. That is, $|T| = n(n-1)/2$, where $T = \{\{s_i, s_j\}\mid i\ne j,\ i,j = 1,\dots,n\}$. To generate a partition $\{P_1,\dots,P_{n/2}\}$ we need to select a subset of $T$ containing $n/2$ two-element sets. Conservatively (not taking into account that not every $n/2$-element subset is an actual partition), this can be done in at most $\binom{n(n-1)/2}{n/2}$ ways. Hence $P\le\binom{n(n-1)/2}{n/2}$.

Theorem C.2. Assume we are given $n\in 2\mathbb{N}_0$ SCPs $s_1,\dots,s_n$ whose locations are drawn independently from an identical distribution (i.i.d.). Furthermore, we are given a grid of test points $0 = t_0 < t_1 < \dots < t_k = T$, where the intermediate times are chosen such that $\forall i\in\{1,\dots,n\}, j\in\{1,\dots,k\} : \Pr[s_i\in(t_{j-1}, t_j)] = 1/k$. The probability of missing the existence of a collision by looking for non-positive elements in the sample $f(t_0),\dots,f(t_k)$ satisfies
$$P_{n,k}\le\frac{P}{\sqrt{k}^{\,n}}\le\frac{\binom{n(n-1)/2}{n/2}}{\sqrt{k}^{\,n}},$$
where $P$ is a function of $n$ (but not of $k$) as defined in Lemma C.1. In particular, we have $\lim_{k\to\infty}P_{n,k} = 0$.
Proof. We define the sample space $\Omega := \{(b_1,\dots,b_n)\in\{1,\dots,k\}^n\}$, where each $b_i\in\{1,\dots,k\}$ denotes the index of the time interval ("bin") $(t_{b_i-1}, t_{b_i}]$ that the $i$th SCP $s_i$ falls into ($i = 1,\dots,n$).
Due to the assumption that the assignment of SCPs to bins is i.i.d. uniform, each sample has equal probability and we can compute $P_{n,k}$ as a Laplace probability, that is, $P_{n,k} = \frac{|G|}{|\Omega|}$, where $G$ is the set of events describing that no bin contains an odd number of SCPs (because if at least one bin does contain an odd number, we detect the presence of a collision). Obviously, $|\Omega| = k^n$. On the other hand, $G = \{(v_1,\dots,v_k)\mid\sum_{j=1}^{k} v_j = n,\ \forall j\in\{1,\dots,k\} : v_j\in\{0,\dots,n\}\cap 2\mathbb{N}_0\}$, where $v_j\in\{0,\dots,n\}$ denotes the number of SCPs falling into bin $j\in\{1,\dots,k\}$.
We will find an upper bound on $G$'s cardinality by constructing a finite set $H$ for whose cardinality one can easily establish $\sqrt{k}^{\,n} P$ as an upper bound. We show that one can define an injective function $\psi : G\to H$; this establishes $|G|\le|H|$. Thus, $\frac{|G|}{|\Omega|}\le\frac{|H|}{|\Omega|}\le\frac{P}{\sqrt{k}^{\,n}}$, which will hence conclude the proof.
We generate $H$ by invoking a two-stage process (where in each stage it is easy to enumerate all possible elements that are generated). In the first stage, we partition the SCPs into $n/2$ pairs (which we always can, since we assumed $n$ to be even). In the second stage, we assign these pairs to the bins, in which the pairs are merged into joint sets of SCPs. Therefore, $H = \{(M_1,\dots,M_k)\mid M_1\subset\{s_1,\dots,s_n\},\dots,M_k\subset\{s_1,\dots,s_n\},\ |M_1|,\dots,|M_k|\in 2\mathbb{N}_0,\ |M_1|+\dots+|M_k| = n,\ \bigcup_i M_i = \{s_1,\dots,s_n\}\}$. Let $P$ be the number of ways in which one can partition the $n$ SCPs into $n/2$ (unordered) pairs, i.e. $P = |\{\{P_1,\dots,P_{n/2}\}\mid\forall i\ne j : P_i\cap P_j = \emptyset,\ \bigcup_i P_i = \{s_1,\dots,s_n\},\ \forall i\,\exists q\ne r : P_i = \{s_q,s_r\}\}|$ (cf. Lem. C.1 for a bound). In the second stage, the pairs are distributed among the $k$ bins (intervals), which can be done in $B = k^{n/2}$ ways, before the sets within each bin are merged. The number of different paths the process can take to generate an element of $H$ is $P$ (the number of partitions into pairs) multiplied by $B$ (the number of ways the pairs constituting the partition can then be distributed into the bins). By construction, each final assignment (subsets of SCPs to bins) generated by the two-stage process is an element of $H$. Conversely, let $(M_1,\dots,M_k)\in H$; then it is easy to verify that it could be generated by the two-stage process (however, there may be multiple paths in the process generating the same element of $H$). Hence, $|H|\le P\,B$.
We finalise our considerations by defining the function $\psi : G\to H$, $(v_1,\dots,v_k)\mapsto(M_1,\dots,M_k)$, where $M_1 := \{s_1,\dots,s_{v_1}\}$ and, for $i > 1$, $M_i := \{s_{1+w_i},\dots,s_{v_i+w_i}\}$ with $w_i = \sum_{j<i} v_j$.
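To illustrate how the bound of Thm. C.2 combines with a prior $Q$ via Eq. (11), the following is a minimal sketch. The truncation point n_max, the clamping of each per-term bound at 1, and the example prior are our own pragmatic choices, not part of the derivation:

```python
from math import comb

def missed_collision_bound(Q, k, n_max):
    """Upper bound on the probability of missing a collision, Eq. (11),
    using P_{n,k} <= C(n(n-1)/2, n/2) / sqrt(k)^n from Thm. C.2.

    Q:     callable n -> prior belief that exactly n SCPs occur.
    k:     number of bins, i.e. k+1 grid points t_0 < ... < t_k.
    n_max: truncation point for the sum over even n.
    """
    bound = 0.0
    for n in range(2, n_max + 1, 2):  # only an even number of SCPs can go undetected
        P_nk = comb(n * (n - 1) // 2, n // 2) / k ** (n / 2)
        bound += min(1.0, P_nk) * Q(n)  # P_{n,k} is a probability, so clamp its bound at 1
    return bound

# Example with a (hypothetical) geometric prior over the number of SCPs.
if __name__ == "__main__":
    prior = lambda n: 0.5 ** (n + 1)
    print(missed_collision_bound(prior, k=100, n_max=20))
```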