arXiv:1510.06496v2 [cs.SY] 14 Dec 2016

Synthesizing least-limiting guidelines for safety of semi-autonomous systems

Jana Tumova and Dimos V. Dimarogonas

The authors are with the ACCESS Linnaeus Center, School of Electrical Engineering, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden, and with the KTH Centre for Autonomous Systems. This work was supported by the H2020 ERC Starting Grant BUCOPHSYS. {tumova,dimos}@kth.se

Abstract— We consider the problem of synthesizing safe-by-design control strategies for semi-autonomous systems. Our aim is to address situations when safety cannot be guaranteed solely by the autonomous, controllable part of the system and a certain level of collaboration is needed from the uncontrollable part, such as the human operator. In this paper, we propose a systematic solution to generating least-limiting guidelines, i.e. guidelines that restrict the human operator as little as possible in the worst-case long-term system executions. The algorithm leverages ideas from 2-player turn-based games.

I. INTRODUCTION

Recent technological developments have extended the application areas of autonomous and semi-autonomous cyber-physical systems to a variety of everyday scenarios, from industrial automation to transportation and housekeeping services. These examples have a common factor: they involve operation in an uncertain environment in the presence of highly unpredictable and uncontrollable agents, such as humans. In robot-aided manufacturing, there is a natural combination of autonomy and human contribution. In semi-autonomous driving, the vehicle is partially controlled automatically and partially by a human driver. Even in fully autonomous driving, passengers and pedestrians interact with the vehicle and actively influence the overall system safety and performance. The need for guarantees on the behaviors of these systems is all the more crucial as the stakes are high. Formal verification and formal methods-based synthesis techniques were designed to provide such guarantees, and they have recently gained considerable popularity in applications to correct-by-design robot control. For instance, in [12], [18] temporal logic control of robots in uncertain, reactive environments was addressed. In [11], control synthesis for nondeterministic systems from temporal logic specifications was developed. Loosely speaking, these works achieve the provable guarantees by accounting for the worst-case scenarios in the control synthesis procedure. The uncertainty is therein treated as an adversary, which, however, often prevents the synthesis procedure from finding a correct-by-design autonomous controller.

In this paper, we take a fresh perspective on correct-by-design control synthesis. We specifically focus on situations when the desired controller does not exist. In contrast to the above-mentioned approach, we view the uncertain, uncontrollable elements in the system as collaborative in the sense that they have as much interest in keeping the overall system behavior safe, effective, and efficient as the autonomous controller does. At the same time, we still view them as largely uncontrollable in the sense that they have their own intentions and we cannot force them to follow literal step-by-step instructions. Instead, we aim to advise them on what not to do only when strictly necessary, while keeping their options as rich as possible.
For example, consider a collaborative human-robot manufacturing task with the goal of assembling products ABC by connecting pieces of types A and C to a piece of type B. The human operator can put together A with B or with BC, whereas the autonomous robot can put together B or AB with C. Our goal is to guarantee system safety, meaning that the human and the robot never work with the same piece of type B at the same time. While we can design a controller for the robot that does not reach for a piece being held by the human, we cannot guarantee that the human will not reach for a piece being held by the robot. To that end, we aim to synthesize guidelines for the human, i.e. to advise that reaching for a piece the robot holds will lead to a safety violation. Under the assumption that the human follows this advice, safety is guaranteed. Yet, this advice is still much less restrictive for the human operator than if the human-robot system were considered controllable as a whole. Namely, in that case, a correct-by-design controller could dictate that the human always touch only solo B pieces while the robot would be supposed to work only with AB pieces pre-produced by the human. Clearly, the former guidelines allow much more freedom in the human's decisions, as the human may choose to work with a B piece or a BC piece. A similar situation occurs in an autonomous driving scenario with a pedestrian crossing the street. If the pedestrian jumps right in front of the car, a collision is unavoidable. A possible guideline for the human that enables system safety would be to never cross the street. This is, however, a very limiting constraint. Instead, advising the human not to cross the street when the car is close seems quite reasonable.

This paper introduces a systematic way to synthesize least-limiting guidelines for the uncontrollable elements in (semi-)autonomous systems, such as humans in human-robot systems, that allow the autonomous part of the system to maintain safety. Similarly to some related work on correct-by-design control synthesis (e.g., [11]), we model the overall system state space as a two-player game on a graph with a safety winning condition. The autonomous, controllable entity takes the role of the game protagonist, whereas the uncontrollable entity is the adversary. We specifically address situations where the protagonist does not have a winning strategy in the game. We formalize the notion of an adviser as a function that "forbids" the application of certain adversary inputs in certain system states. Furthermore, we classify advisers based on the level of limitation they impose on the adversary. Finally, we provide an algorithm to find a least-limiting adviser that allows the protagonist to win the game, i.e. to keep the system safe. We also discuss the use of the synthesized advisers for on-the-fly guidance of the system execution. In this work, we do not focus on how the interface between the adviser and the uncontrollable element, such as the human, should look. Rather, the contribution of this paper can be summarized as the development of a theoretical framework for automated synthesis of reactive, least-limiting guidelines and control strategies that guarantee system safety.

Related work includes literature on synthesis of environment assumptions that enable a winning game [6] and on using counter-strategies for synthesizing assumptions in the generalized reactivity (1) (GR(1)) fragment of LTL [13], [1].
These works, however, synthesize the assumptions in the form of logic formulas, whereas we focus on guiding the adversary by explicitly enumerating the inputs that should not be applied. Synthesis of maximally permissive strategies is considered in [4] and also in the discrete-event systems literature in [17], where, however, only controllable inputs are restricted. Our approach differs from the above works in that we aim for a systematic construction of reactive guidelines, in the sense that if the least-limiting adviser is not followed, a suitable substitute adviser is supplied if one exists. We also use a different criterion to measure the level of limitation, namely the worst-case long-term average of restrictions, as opposed to the cumulative number of restrictions considered in [6] or the size of the set of behaviors considered in [4]. Other related literature studies problems of minimal model repair [3], [7], synthesis of least-violating strategies [9], [16], or design of reward structures for decision-making processes in the context of human-machine interaction [14]. This work can also be viewed in the context of literature on collaborative human-robot control, e.g., [15], [10].

The paper is structured as follows. In Sec. II we introduce necessary notation and preliminaries. In Sec. III, we state our problem. In Sec. IV, we introduce the synthesis algorithm in detail and discuss the use of the synthesized solution for on-the-fly guidance. Sec. V concludes the paper and discusses future research. Throughout the paper, we provide several illustrative examples demonstrating the developed theory.

II. PRELIMINARIES

Given a set $S$, we use $2^S$, $|S|$, $S^*$, and $S^\omega$ to denote the powerset of $S$, the cardinality of $S$, the set of all finite sequences of elements from $S$, and the set of all infinite sequences of elements from $S$, respectively. Given a finite sequence $w$ and a finite or infinite sequence $w'$, we use $w \cdot w'$ to denote their concatenation. Let $w(i)$ and $w^{\rightsquigarrow j}$ denote the $i$-th element of word $w$ and the prefix of $w$ that ends in $w(j)$, respectively. Furthermore, assuming that $S$ is a set of finite sequences and $S'$ is a set of finite and/or infinite sequences, $S \cdot S' = \{w \cdot w' \mid w \in S \wedge w' \in S'\}$. $\mathbb{Z}$ denotes the set of integers.

Definition 1 (Arena) A 2-player turn-based game arena is a transition system $\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T)$, where $S$ is a nonempty, finite set of states; $\langle S_p, S_a \rangle$ is a partition of $S$ into the set of protagonist (player $p$) states $S_p$ and the set of adversary (player $a$) states $S_a$, such that $S_p \cap S_a = \emptyset$ and $S_p \cup S_a = S$; $s_{init} \in S_p$ is the initial protagonist state; $U_p$ is the set of inputs of the protagonist; $U_a$ is the set of inputs of the adversary; $T = T_p \cup T_a$ is a partial injective transition function, where $T_p : S_p \times U_p \to S_a$ and $T_a : S_a \times U_a \to S_p$.

Note that in a protagonist state, only an input of the protagonist can be applied, and analogously, in an adversary state, only an input of the adversary can be applied. We assume that from a protagonist state, the system can only transition to an adversary state and vice versa. This assumption is not restrictive, since it can be easily shown that any game arena with $T_p : S_p \times U_p \to S$ and $T_a : S_a \times U_a \to S$ can be transformed to satisfy it. Loosely speaking, each transition from a protagonist state to a protagonist state is split into two transitions, to and from a new adversary state. An analogous transformation can be applied to transitions from adversary states to adversary states.
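To make the objects of Def. 1 concrete, the following Python sketch shows one possible in-memory encoding of a game arena. The class name Arena, the field names, and the helper enabled are our own choices, not part of the paper; the same encoding is reused by the sketches in later sections.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Input = Hashable

@dataclass
class Arena:
    """A 2-player turn-based game arena (Def. 1)."""
    s_p: Set[State]                      # protagonist states S_p
    s_a: Set[State]                      # adversary states S_a
    s_init: State                        # initial protagonist state
    u_p: Set[Input]                      # protagonist inputs U_p
    u_a: Set[Input]                      # adversary inputs U_a
    # partial transition function T = T_p ∪ T_a, stored as (state, input) -> state
    trans: Dict[Tuple[State, Input], State] = field(default_factory=dict)

    def enabled(self, s: State) -> Set[Input]:
        """Inputs enabled in state s, i.e. U_i^{s} in the paper's notation."""
        return {u for (src, u) in self.trans if src == s}

    def add_transition(self, s: State, u: Input, s_next: State) -> None:
        # enforce strict alternation between protagonist and adversary states
        assert (s in self.s_p and s_next in self.s_a) or (s in self.s_a and s_next in self.s_p)
        self.trans[(s, u)] = s_next
```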
Let $U_i^{s_i} = \{u_i \in U_i \mid T_i(s_i, u_i) \text{ is defined}\}$ denote the set of inputs of player $i \in \{p, a\}$ that are enabled in the state $s_i \in S_i$. Arena $\mathcal{T}$ is non-blocking if $|U_i^{s_i}| \geq 1$ for all $i \in \{p, a\}$ and all $s_i \in S_i$, and blocking otherwise. A play in $\mathcal{T}$ is an infinite alternating sequence of protagonist and adversary states $\pi = s_{p,1} s_{a,1} s_{p,2} s_{a,2} \ldots$, such that $s_{p,1} = s_{init}$ and for all $j \geq 1$ there exist $u_{p,j} \in U_p$, $u_{a,j} \in U_a$, such that $T_p(s_{p,j}, u_{p,j}) = s_{a,j}$ and $T_a(s_{a,j}, u_{a,j}) = s_{p,j+1}$. Note that for each play $\pi$, $\pi(2k) \in S_a$, while $\pi(2k-1) \in S_p$, for all $k \geq 1$. A play prefix $\pi^{\rightsquigarrow j} = \pi(1) \ldots \pi(j)$ is a finite prefix of a play $\pi = \pi(1)\pi(2)\ldots$. Let $Plays^{\mathcal{T}}$ denote the set of all plays in $\mathcal{T}$.

If the set of plays $Plays^{\dot{\mathcal{T}}}$ of a blocking arena $\dot{\mathcal{T}}$ is nonempty, then $\dot{\mathcal{T}}$ can be transformed into an equivalent non-blocking arena $\mathcal{T}$ via a systematic removal of blocking states and their adjacent transitions; the blocking states are defined inductively as follows: (i) each $s_i \in S_i$, $i \in \{p, a\}$, such that $U_i^{s_i} = \emptyset$ is a blocking state, and (ii) if $T_i(s_i, u_i)$ is a blocking state for each $u_i \in U_i^{s_i}$, then $s_i$, $i \in \{p, a\}$, is a blocking state, too. Then $Plays^{\dot{\mathcal{T}}} = Plays^{\mathcal{T}}$.

A deterministic control strategy (or strategy, for short) of player $i \in \{p, a\}$ is a partial function $\sigma_i^{\mathcal{T}} : S^* \cdot S_i \to U_i$ that assigns an enabled input $u_i \in U_i^{s_i}$ of player $i$ to each play prefix in $\mathcal{T}$ that ends in a state $s_i \in S_i$ of player $i$. Strategies $\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}$ induce a play $\pi_{\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}} = s_{p,1} s_{a,1} s_{p,2} s_{a,2} \ldots \in (S_p \cdot S_a)^\omega$, such that $s_{p,1} = s_{init}$, and for all $j \geq 1$, $T_p(s_{p,j}, \sigma_p(s_{p,1} s_{a,1} \ldots s_{p,j})) = s_{a,j}$ and $T_a(s_{a,j}, \sigma_a(s_{p,1} s_{a,1} \ldots s_{p,j} s_{a,j})) = s_{p,j+1}$. A strategy $\sigma_i^{\mathcal{T}}$ is called memoryless if it satisfies $\sigma_i^{\mathcal{T}}(s_1 \ldots s_n) = \sigma_i^{\mathcal{T}}(s'_1 \ldots s'_m)$ whenever $s_n = s'_m$. Hence, with a slight abuse of notation, memoryless control strategies are viewed as functions $\varsigma_i^{\mathcal{T}} : S_i \to U_i$. The set of all strategies of player $i$ in $\mathcal{T}$ is denoted by $\Sigma_i^{\mathcal{T}}$. The set of all plays induced by all strategies in $\Sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}$, i.e. the set of all plays in $\mathcal{T}$, is $Plays_{\Sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}} = \{\pi_{\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}} \mid \sigma_p^{\mathcal{T}} \in \Sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}} \in \Sigma_a^{\mathcal{T}}\}$. Analogously, we use $Plays_{\sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}} = \{\pi_{\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}} \mid \sigma_a^{\mathcal{T}} \in \Sigma_a^{\mathcal{T}}\}$ to denote the set of plays induced by a given strategy $\sigma_p^{\mathcal{T}}$ and all strategies $\sigma_a^{\mathcal{T}} \in \Sigma_a^{\mathcal{T}}$.

A game $\mathcal{G} = (\mathcal{T}, W)$ consists of a game arena $\mathcal{T}$ and a winning condition $W \subseteq Plays_{\Sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}}$, which is in general a subset of plays in $\mathcal{T}$. A safety winning condition is $W_{Safe} = \{\pi \in Plays_{\Sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}} \mid \pi(j) \in Safe \text{ for all } j \geq 1\}$, where $S = \langle Safe, Unsafe \rangle$ is a partition of the set of states into the safe and unsafe state subsets. A protagonist's strategy $\sigma_p^{\mathcal{T}}$ is winning if $Plays_{\sigma_p^{\mathcal{T}}, \Sigma_a^{\mathcal{T}}} \subseteq W$. Let $\Omega_p^{\mathcal{T}} \subseteq \Sigma_p^{\mathcal{T}}$ denote the set of all protagonist's winning strategies.

Let $\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T)$ be an arena and $w : S \times S \to \mathbb{Z}$ be a weight function that assigns a weight to each pair $(s, s')$ such that there exists $u \in U_p \cup U_a$ with $T(s, u) = s'$. Then $(\mathcal{T}, w)$ can be viewed as an arena of a mean-payoff game. The value secured by a protagonist's strategy $\sigma_p^{\mathcal{T}}$ is

$$\nu(\sigma_p^{\mathcal{T}}) = \inf_{\sigma_a^{\mathcal{T}} \in \Sigma_a^{\mathcal{T}}} \liminf_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} w\big(\pi_{\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}}(j),\ \pi_{\sigma_p^{\mathcal{T}}, \sigma_a^{\mathcal{T}}}(j+1)\big).$$

An optimal protagonist's strategy $\sigma_p^{\mathcal{T}\star}$ secures the optimal value $\nu(\sigma_p^{\mathcal{T}\star}) = \sup_{\sigma_p^{\mathcal{T}} \in \Sigma_p^{\mathcal{T}}} \nu(\sigma_p^{\mathcal{T}})$. Several algorithms exist to find the optimal protagonist's strategy, see, e.g., [5]. For more details on games on graphs in general, we refer the interested reader, e.g., to [2].
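The transformation of a blocking arena into an equivalent non-blocking one, described above, is a simple fixed-point pruning. A minimal illustration, assuming the Arena encoding from the previous sketch (the helper name prune_blocking is ours), could look as follows:

```python
def prune_blocking(arena: Arena) -> Arena:
    """Remove blocking states and their adjacent transitions until a fixed point.

    A state is blocking if it has no enabled input; removing transitions into
    blocking states may create new blocking states, which mirrors rules (i) and (ii)
    of Sec. II. Unreachable states are left untouched, as in the paper.
    """
    states = set(arena.s_p) | set(arena.s_a)
    trans = dict(arena.trans)
    changed = True
    while changed:
        changed = False
        blocking = {s for s in states if not any(src == s for (src, _u) in trans)}
        if blocking:
            states -= blocking
            trans = {(s, u): t for (s, u), t in trans.items()
                     if s not in blocking and t not in blocking}
            changed = True
    return Arena(s_p=arena.s_p & states, s_a=arena.s_a & states,
                 s_init=arena.s_init, u_p=set(arena.u_p), u_a=set(arena.u_a),
                 trans=trans)
```

If the initial state itself ends up blocking, the set of plays of the original arena was empty and no equivalent non-blocking arena exists.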
III. PROBLEM FORMULATION

The system that we consider consists of two entities: the first one is the autonomous part of the system that we aim to control (e.g., a robotic arm), and the second one is the agent that is uncontrollable and, to a large extent, unpredictable (e.g., a human operator in a human-robot manufacturing scenario). The overall state of such a system is determined by the system states of these entities (e.g., the positions of the robotic and human arms and of the objects in their common workspace, and the status of the manufacturing). In this paper, we consider systems with a finite set of states $Q$ (obtained, e.g., by partitioning the workspace into cells). The system state can change if one of the entities takes a decision and applies an input (e.g., the robot can move the arm from one cell to another, or the human can pick up an object). For simplicity, we assume that the entities take regular turns in applying their inputs. This assumption is, however, not too restrictive, as we may allow the entities to apply a special pass input $\epsilon$ that does not induce any change to the current system state.

To model the system formally, we call the former, controllable entity the protagonist, the latter, uncontrollable entity the adversary, and we capture the impact of their inputs on the system states through a game arena (see Def. 1)

$$\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T). \qquad (1)$$

The set of arena states is $S = Q \times \{p, a\}$ and each arena state $s = (q, i) \in S$ is defined by the system state $q \in Q$ and the entity $i \in \{p, a\}$ whose turn it is to apply its input, i.e. $(q, p) \in S_p$ and $(q, a) \in S_a$, for all $q \in Q$. Behaviors of the system are thus captured through plays in the arena. The goal of the former, controllable entity is to keep the system safe, i.e. to avoid the subset of unsafe system states, while the latter entity has its own goals, such as to reach a certain system state. Formally, the protagonist is given a partition of states $S = \langle Safe, Unsafe \rangle$ and the corresponding safety winning condition $W_{Safe}$. The arena $\mathcal{T}$ together with the safety winning condition $W_{Safe}$ establish a game $(\mathcal{T}, W_{Safe})$.

Example 1 Consider the simplified manufacturing scenario outlined in the introduction. A system state is determined by the current pieces in the workspace and their status; each of them is either on the desk, held by the human, or held by the robot: $Q \subseteq 2^{\{A,B,C,AB,BC,ABC\} \times \{desk,\, human,\, robot\}}$. The robot acts as the protagonist and the human as the adversary. $s_{init} = (\{(A, desk), (B, desk), (C, desk)\}, a)$ is an example of an initial system state. The inputs of the robot are $U_p = (\{grab_p, drop_p\} \times \{A, B, C, AB, BC, ABC\}) \cup (\{connect_p\} \times \{(B, C), (AB, C)\})$ and, similarly, $U_a = (\{grab_a, drop_a\} \times \{A, B, C, AB, BC, ABC\}) \cup (\{connect_a\} \times \{(A, B), (A, BC)\})$. The transition function reflects the effect of inputs on the system state. For instance, $T\big((\{(A, desk), (B, desk), (C, desk)\}, a), (grab_a, A)\big) = (\{(A, human), (B, desk), (C, desk)\}, p)$, or $T\big((\{(A, desk), (B, robot), (C, robot)\}, p), (connect_p, (B, C))\big) = (\{(A, desk), (BC, robot)\}, a)$. Note that the transition function does not have to be enumerated manually. Rather, it can be generated from conditions, such as $T\big((\{(x, y)\} \cup Z, a), (grab_a, x)\big) = (\{(x, human)\} \cup Z, p)$, applied to all $x \in \{A, B, C, AB, BC, ABC\}$, $y \in \{desk, robot\}$, and $Z \subseteq (\{A, B, C, AB, BC, ABC\} \setminus \{x\}) \times \{desk, human, robot\}$.

The problem of finding a protagonist's winning control strategy $\sigma_p^{\mathcal{T}}$ guaranteeing system safety has been studied before, and even more complex winning conditions have been considered [2].
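As an illustration of how the transition function of Example 1 can be generated from conditions rather than enumerated by hand, the following sketch instantiates the grab rule of the example. The function and constant names are ours, and only physically consistent workspace contents Z (each remaining piece assigned at most one place) are enumerated, which is an assumption on how the state set Q is meant to be read:

```python
from itertools import chain, combinations, product

PIECES = ["A", "B", "C", "AB", "BC", "ABC"]
PLACES = ["desk", "human", "robot"]

def grab_a_transitions():
    """Adversary 'grab' transitions of Example 1, generated from the rule
    T(({(x, y)} ∪ Z, a), (grab_a, x)) = ({(x, human)} ∪ Z, p)
    for x in PIECES, y in {desk, robot}, and any consistent rest-of-workspace Z."""
    transitions = {}
    for x in PIECES:
        others = [p for p in PIECES if p != x]
        subsets = chain.from_iterable(combinations(others, r) for r in range(len(others) + 1))
        for subset in subsets:
            for places in product(PLACES, repeat=len(subset)):
                z = frozenset(zip(subset, places))     # one possible rest-of-workspace Z
                for y in ("desk", "robot"):
                    src = (frozenset({(x, y)}) | z, "a")
                    dst = (frozenset({(x, "human")}) | z, "p")
                    transitions[(src, ("grab_a", x))] = dst
    return transitions
```

Analogous rules would generate the drop and connect transitions of both players; together they populate the trans dictionary of the Arena sketch from Sec. II.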
In this work, we focus on the situation when the protagonist does not have a winning control strategy. For such cases, we aim to generate a least-limiting subset of adversary's control strategies that would permit the protagonist to win. Loosely speaking, this subset can be viewed as the minimal guidelines for the adversary's collaboration. Note that this problem differs from supervisory control of discrete event systems, as we do not limit only the application of controllable inputs, but also of the uncontrollable ones. However, it also differs from the synthesis of controllers for fully controllable systems, as we aim to limit the adversary's application of uncontrollable inputs as little as possible. We formalize the guidelines for the adversary's collaboration through the notions of an adviser and an adviser restricted arena.

Definition 2 (Adviser) An adviser is a mapping $\alpha : S_a \to 2^{U_a}$, where $\alpha(s_a) \subseteq U_a^{s_a}$ represents the subset of adversary's inputs that are forbidden in state $s_a$. Given an arena $\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T_p \cup T_a)$ and an adviser $\alpha$, the adviser restricted arena is $\dot{\mathcal{T}}^\alpha = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, \dot{T}^\alpha_p \cup \dot{T}^\alpha_a)$, where $\dot{T}^\alpha_p = T_p$ and $\dot{T}^\alpha_a = T_a \setminus \{(s_a, u_a, s_p) \mid u_a \in \alpha(s_a)\}$. The set of all plays in $\dot{\mathcal{T}}^\alpha$ is denoted by $Plays^{\dot{\alpha}}$.

If $\alpha(s_a) = U_a^{s_a}$ for some $s_a \in S_a$, the adviser restricted arena becomes blocking, and hence not every finite sequence $s_{p,1} s_{a,1} s_{p,2} s_{a,2} \ldots s_{a,k}$ satisfying $s_{p,1} = s_{init}$, $\dot{T}^\alpha_p(s_{p,j}, u_{p,j}) = s_{a,j}$ for all $1 \leq j \leq k$, and $\dot{T}^\alpha_a(s_{a,\ell}, u_{a,\ell}) = s_{p,\ell+1}$ for all $1 \leq \ell < k$, can be extended to a play. However, if $Plays^{\dot{\alpha}}$ is nonempty, we can transform $\dot{\mathcal{T}}^\alpha$ into a non-blocking adviser restricted arena

$$\mathcal{T}^\alpha = (S^\alpha, \langle S^\alpha_p, S^\alpha_a \rangle, s_{init}, U_p, U_a, T^\alpha_p \cup T^\alpha_a) \qquad (2)$$

that has exactly the same set of plays $Plays^{\alpha} = Plays^{\dot{\alpha}}$ as $\dot{\mathcal{T}}^\alpha$, as outlined in Sec. II. Let us denote the sets of all protagonist's and adversary's strategies in $\mathcal{T}^\alpha$ by $\Sigma^\alpha_p$ and $\Sigma^\alpha_a$, respectively. $Plays_{\sigma^\alpha_p, \Sigma^\alpha_a}$ refers to the set of plays induced by $\sigma^\alpha_p \in \Sigma^\alpha_p$ and $\Sigma^\alpha_a$ in $\mathcal{T}^\alpha$. If, however, $Plays^{\dot{\alpha}}$ is empty, a non-blocking adviser restricted arena $\mathcal{T}^\alpha$ does not exist.

Given the winning condition $W_{Safe}$, we define a good adviser $\alpha$ as one that permits the protagonist to achieve safety in the non-blocking adviser restricted arena $\mathcal{T}^\alpha$.

Definition 3 (Good adviser) An adviser $\alpha$ is good for $(\mathcal{T}, W_{Safe})$ if there exists a non-blocking adviser restricted arena $\mathcal{T}^\alpha$ and a protagonist's strategy $\sigma^\alpha_p \in \Sigma^\alpha_p$, such that $Plays_{\sigma^\alpha_p, \Sigma^\alpha_a} \subseteq W_{Safe}$. Given a good adviser $\alpha$, the set of protagonist's winning strategies is denoted by $\Omega^\alpha_p \subseteq \Sigma^\alpha_p$.

Since there may be several good advisers, we need to distinguish which of them limit the adversary less and which more. To that end, we associate each adviser with a cost, called the adviser level of limitation.

Definition 4 (Adviser level of limitation) Given an arena $\mathcal{T}$ and a good adviser $\alpha$, we define the adviser level of limitation

$$\lambda(\alpha) = \inf_{\sigma^\alpha_p \in \Omega^\alpha_p} \gamma(\sigma^\alpha_p), \quad \text{where} \qquad (3)$$

$$\gamma(\sigma^\alpha_p) = \sup_{\sigma^\alpha_a \in \Sigma^\alpha_a} \limsup_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \big|\alpha\big(\pi_{\sigma^\alpha_p, \sigma^\alpha_a}(2j)\big)\big|. \qquad (4)$$

In other words, $\lambda(\alpha)$ is the worst-case long-term average of the number of forbidden inputs along the plays induced by the best-case protagonist's strategy $\sigma^\alpha_p$. The choice of the worst-case long-term average is motivated by the fact that although the adversary can be advised, it cannot be controlled. On the other hand, the consideration of the best-case $\sigma^\alpha_p$ is due to the protagonist being fully controllable.
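A minimal sketch of the adviser restricted arena construction of Def. 2, assuming the Arena class and the prune_blocking helper from the earlier sketches and representing an adviser as a dictionary from adversary states to sets of forbidden inputs:

```python
def restrict(arena: Arena, adviser: dict) -> Arena:
    """Build the (non-blocking) adviser restricted arena of Def. 2: protagonist
    transitions are kept, and every adversary transition whose input is forbidden
    by the adviser is dropped."""
    trans = {(s, u): t for (s, u), t in arena.trans.items()
             if s in arena.s_p or u not in adviser.get(s, set())}
    restricted = Arena(s_p=set(arena.s_p), s_a=set(arena.s_a), s_init=arena.s_init,
                       u_p=set(arena.u_p), u_a=set(arena.u_a), trans=trans)
    # If some state lost all of its outgoing transitions, the arena became blocking;
    # prune_blocking recovers the equivalent non-blocking arena T^alpha (Eq. (2)).
    return prune_blocking(restricted)
```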
We provide some intuitive explanation of the introduced terminology through the following illustrative example.

Example 2 (Safety game and adviser) An example of a game arena with a safety winning condition $W_{Safe}$ is given in Fig. 1(A). The squares illustrate the protagonist's states and the circles the adversary's ones. Transitions are depicted as arrows between them and they are labeled with the respective inputs that trigger them. The safe states in $Safe$ are shown in green and the unsafe ones in $Unsafe$ in blue. Fig. 1(B)-(D) show three advisers $\alpha_B$, $\alpha_C$ and $\alpha_D$, respectively, via marking the forbidden transitions in red. In Fig. 1(B), $\alpha_B(s_2) = \{u_{a3}\}$, $\alpha_B(s_4) = \{u_{a4}, u_{a5}\}$, and $\alpha_B(s_6) = \{u_{a6}, u_{a7}\}$. In Fig. 1(C), $\alpha_C(s_2) = \{u_{a3}\}$ and $\alpha_C(s_4) = \alpha_C(s_6) = \emptyset$. Finally, in Fig. 1(D), $\alpha_D(s_2) = \{u_{a2}, u_{a3}\}$ and $\alpha_D(s_4) = \alpha_D(s_6) = \emptyset$.

Fig. 1: (A) An example of a game arena with a safety winning condition. The protagonist's and adversary's states are illustrated as squares and circles, respectively. The safe set Safe is in green, the unsafe set Unsafe in blue. Transitions are depicted as arrows between them and labeled with the respective inputs that trigger them. (B)-(D) show three different advisers $\alpha_B$, $\alpha_C$ and $\alpha_D$, respectively, via marking the forbidden transitions in red.

For $\alpha_B$, the non-blocking adviser restricted arena contains the states $S^{\alpha_B} = \{s_1, s_2, s_3\}$. The set of protagonist's strategies in $\mathcal{T}^{\alpha_B}$ is $\Sigma^{\alpha_B}_p = \{\sigma^{\alpha_B}_p\}$, such that $\sigma^{\alpha_B}_p(\pi(1)\ldots\pi(2j)s_1) = u_{p1}$ and $\sigma^{\alpha_B}_p(\pi(1)\ldots\pi(2j)s_3) = u_{p3}$, for all play prefixes $\pi(1)\ldots\pi(2j)$, $j \geq 0$, of all plays $\pi \in Plays_{\Sigma^{\alpha_B}_p, \Sigma^{\alpha_B}_a}$. Since $\sigma^{\alpha_B}_p$ is winning, $\alpha_B$ is good. It is easy to see that the set of protagonist's winning strategies and the set of all adversary's strategies in $\mathcal{T}^{\alpha_B}$ induce the set of plays $Plays_{\Omega^{\alpha_B}_p, \Sigma^{\alpha_B}_a} = \{s_1 s_2 \pi(3) s_2 \pi(5) s_2 \pi(7) s_2 \ldots \mid \pi(2j+1) \in \{s_1, s_3\} \text{ for all } j \geq 1\}$. The strategy $\sigma^{\alpha_B}_p \in \Omega^{\alpha_B}_p$ is therefore associated with the value $\gamma(\sigma^{\alpha_B}_p) = \sup_{\sigma^{\alpha_B}_a \in \Sigma^{\alpha_B}_a} \limsup_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} |\alpha_B(\pi_{\sigma^{\alpha_B}_p, \sigma^{\alpha_B}_a}(2j))| = \limsup_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} |\alpha_B(s_2)| = 1$, and the level of limitation of $\alpha_B$ is $\lambda(\alpha_B) = 1$.

Although it might seem that adviser $\alpha_B$ is more limiting than $\alpha_C$, this is not the case. The non-blocking adviser restricted arena $\mathcal{T}^{\alpha_C}$ in this case contains all states from $\mathcal{T}$, $S^{\alpha_C} = S$. However, the set of winning protagonist's strategies $\Omega^{\alpha_C}_p$ in $\mathcal{T}^{\alpha_C}$ is analogous to case (B). Namely, if $\sigma^{\alpha_C}_p(\pi(1)\ldots\pi(2j)s_1) = u_{p2}$ or $\sigma^{\alpha_C}_p(\pi(1)\pi(2)\ldots\pi(2j)s_3) = u_{p4}$, the resulting play would not be winning for the protagonist, as all adversary's choices in $s_4$ lead to an unsafe state. Hence, $Plays_{\Omega^{\alpha_C}_p, \Sigma^{\alpha_C}_a} = Plays_{\Omega^{\alpha_B}_p, \Sigma^{\alpha_B}_a}$ and the level of limitation of $\alpha_C$ is $\lambda(\alpha_C) = 1$. Finally, $\alpha_D$ is more limiting than $\alpha_B$ and $\alpha_C$. Following similar reasoning as above, we can see that $Plays_{\Omega^{\alpha_D}_p, \Sigma^{\alpha_D}_a} = \{s_1 s_2 s_1 s_2 s_1 s_2 s_1 s_2 \ldots\}$, but since $|\alpha_D(s_2)| = 2$, we have $\lambda(\alpha_D) = \inf_{\sigma^{\alpha_D}_p \in \Omega^{\alpha_D}_p} \gamma(\sigma^{\alpha_D}_p) = \limsup_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} |\alpha_D(s_2)| = 2$.

Problem 1 Consider $\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T)$ and a safety winning condition $W_{Safe}$ given via a partition $S = \langle Safe, Unsafe \rangle$. Synthesize an adviser $\alpha^\star$ and a protagonist's winning strategy $\sigma^{\alpha^\star\star}_p$, such that: (i) $\alpha^\star$ is good and $\sigma^{\alpha^\star\star}_p \in \Omega^{\alpha^\star}_p$; (ii) $\lambda(\alpha^\star) = \inf_{\alpha \in \mathcal{A}} \lambda(\alpha)$, where $\mathcal{A}$ is the set of all good advisers for $(\mathcal{T}, W_{Safe})$, i.e. $\alpha^\star$ is least-limiting; and (iii) $\gamma(\sigma^{\alpha^\star\star}_p) = \inf_{\sigma^{\alpha^\star}_p \in \Omega^{\alpha^\star}_p} \gamma(\sigma^{\alpha^\star}_p)$, i.e. $\sigma^{\alpha^\star\star}_p$ is optimal.
IV. SOLUTION

Our solution builds on several steps: first, we generate a so-called nominal adviser, which assigns to each adversary state a set of forbidden inputs. We prove that the nominal adviser is good by construction, but does not have to be least-limiting. Second, building on the nominal adviser, we efficiently generate a finite set of candidate advisers. Third, the structural properties of the candidate advisers inherited from the properties of the nominal adviser allow us to prove that the problem of finding $\alpha^\star$ and $\sigma^{\alpha^\star\star}_p$ can be transformed into a mean-payoff game. By that, we prove that at least one $\sigma^{\alpha^\star\star}_p$ is memoryless and hence we establish decidability of Problem 1. Finally, we discuss how the set of candidate advisers and their associated optimal protagonist's winning strategies can be used to guide an adversary who disobeys a subset of the advice provided by a least-limiting adviser.

A. Nominal adviser

The algorithm to find the nominal adviser $\alpha_0$ is summarized in Alg. 1. It systematically finds a set of states $Losing$, from which reaching the unsafe set $Unsafe$ cannot be avoided under any possible protagonist's and adversary's choice of inputs. The set $Losing$ is obtained via the computation of the finite converging sequence $Unsafe = Losing_0 \subset Losing_1 \subset \ldots \subset Losing_{n-1} = Losing_n = Losing$, $n \geq 0$, where for all $0 \leq j < n$, $Losing_{j+1}$ is the set of states each of which either already belongs to $Losing_j$ or has all outgoing transitions leading to $Losing_j$ (line 15). The nominal adviser $\alpha_0$ is set to forbid all transitions that lead to $Losing$ (line 12). By construction, the algorithm terminates in at most $|S|$ iterations of the while loop (lines 9-16). The following three lemmas summarize the key features of $\alpha_0$ computed according to Alg. 1. The first two state that, if there exists a good adviser for $(\mathcal{T}, W_{Safe})$, then the nominal adviser is good. The third states that, if the nominal adviser forbids the adversary to apply an input $u_a \in \alpha_0(s_a)$ in a state $s_a$, then there does not exist a less limiting good adviser $\alpha'$ such that $u_a \notin \alpha'(s_a)$.

Algorithm 1: The nominal adviser $\alpha_0$
Data: $\mathcal{T} = (S, \langle S_p, S_a \rangle, s_{init}, U_p, U_a, T)$, and unsafe set $Unsafe \subseteq S$
Result: $\alpha_0 : S_a \to 2^{U_a}$
1  forall $s_a \in S_a$ do
2      $\alpha_0(s_a) := \emptyset$
3  end
4  forall $s_a \in Unsafe$ do
5      $\alpha_0(s_a) := U_a^{s_a}$
6  end
7  $Losing_0 := Unsafe$
8  $j := 0$
9  while $j = 0$ or $Losing_j \neq Losing_{j-1}$ do
10     forall $s_p \in Losing_j$ do
11         forall $s_a, u_a$ such that $T(s_a, u_a) = s_p$ do
12             $\alpha_0(s_a) := \alpha_0(s_a) \cup \{u_a\}$
13         end
14     end
15     $Losing_{j+1} := Losing_j \cup \{s_i \in S_i \mid \bigcup_{u_i \in U_i^{s_i}} \{T_i(s_i, u_i)\} \subseteq Losing_j,\ i \in \{p, a\}\}$
16     $j := j + 1$
17 end
18 $Losing := Losing_j$
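For concreteness, the following Python sketch is one way to implement Alg. 1 on top of the Arena encoding assumed in the earlier sketches; it accumulates the forbidden inputs while growing the Losing set to its fixed point, and assumes the input arena is non-blocking. It is an illustration under those assumed data structures, not the authors' reference implementation.

```python
def nominal_adviser(arena: Arena, unsafe: set):
    """Alg. 1: compute the nominal adviser alpha_0 and the set Losing."""
    # lines 1-6: forbid all enabled inputs in unsafe adversary states, nothing elsewhere
    alpha0 = {s_a: set() for s_a in arena.s_a}
    for s_a in arena.s_a & unsafe:
        alpha0[s_a] = set(arena.enabled(s_a))
    # lines 7-17: grow Losing until a fixed point; forbid adversary inputs leading into it
    losing = set(unsafe)
    while True:
        for (s, u), t in arena.trans.items():
            if s in arena.s_a and t in losing:
                alpha0[s].add(u)                      # line 12
        grown = losing | {s for s in arena.s_p | arena.s_a
                          if all(arena.trans[(s, u)] in losing for u in arena.enabled(s))}
        if grown == losing:                           # line 9: fixed point reached
            break
        losing = grown
    return alpha0, losing
```

If $s_{init}$ ends up in the returned losing set, Lemma 1 below implies that no good adviser exists; otherwise, restricting the arena by alpha0 and pruning (the restrict helper from Sec. III) yields the non-blocking arena $\mathcal{T}^{\alpha_0}$ used in the sequel.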
Lemma 1 If $s_{init} \in Losing$, then there does not exist a good adviser for $(\mathcal{T}, W_{Safe})$.

Proof: Suppose that $s_{init} \in Losing$ and there exists a good adviser $\alpha$ for $(\mathcal{T}, W_{Safe})$. Then there exists a non-blocking adviser restricted arena $\mathcal{T}^\alpha$ and a protagonist's strategy $\sigma^\alpha_p \in \Sigma^\alpha_p$, such that $Plays_{\sigma^\alpha_p, \Sigma^\alpha_a} \subseteq W_{Safe}$ in $\mathcal{T}^\alpha$. Consider a play $\pi = s_{p,1} s_{a,1} s_{p,2} s_{a,2} \ldots \in Plays_{\sigma^\alpha_p, \Sigma^\alpha_a}$ in $\mathcal{T}^\alpha$ and note that $\pi$ does not intersect $Unsafe$. Suppose that $s_{p,1} = s_{init} \in Losing \setminus Unsafe$. Then there exists $j \geq 1$, such that $s_{init} \in Losing_j$, but $s_{init} \notin Losing_{j-1}$ and $\bigcup_{u_p \in U_p^{s_{init}}} \{T^\alpha_p(s_{init}, u_p)\} \subseteq Losing_{j-1}$ (line 15). Thus, $s_{a,1} \in Losing_{j-1}$. Furthermore, if $s_{a,1} \notin Unsafe$, then $\bigcup_{u_a \in U_a^{s_{a,1}}} \{T^\alpha_a(s_{a,1}, u_a)\} \subseteq Losing_{j-2}$ (line 15). Via inductive application of analogous arguments, we obtain that there exists $k \geq 1$, such that either $s_{p,k} \in Losing_0 = Unsafe$ or $s_{a,k} \in Losing_0 = Unsafe$. This contradicts the assumption that $\pi$ is winning, i.e. the assumption that $\alpha$ is a good adviser.

Lemma 2 If $s_{init} \notin Losing$, then $\alpha_0$ computed by Alg. 1 is a good adviser.

Proof: Let $s_{init} \notin Losing$. From the construction of $Losing$ and $\alpha_0$, it follows that for all $s_p \in S_p \setminus Losing$ there exist an input $u_p$ and a state $s_a \in S_a \setminus Losing$, such that $T_p(s_p, u_p) = s_a$ (line 15). Furthermore, for all $s_a \in S_a \setminus Losing$ and all $u_a \in U_a^{s_a} \setminus \alpha_0(s_a)$ it holds that $T(s_a, u_a) \in S_p \setminus Losing$ (line 12). Hence, there exists a play $\pi \in Plays^{\dot{\alpha}_0}$ in $\dot{\mathcal{T}}^{\alpha_0}$, and thus there also exist a non-blocking adviser restricted arena $\mathcal{T}^{\alpha_0}$ and $\sigma^{\alpha_0}_p \in \Sigma^{\alpha_0}_p$, such that any play $\pi \in Plays_{\sigma^{\alpha_0}_p, \Sigma^{\alpha_0}_a}$ in $\mathcal{T}^{\alpha_0}$ does not intersect $Losing$. Because $Unsafe \subseteq Losing$, it holds that $Plays_{\sigma^{\alpha_0}_p, \Sigma^{\alpha_0}_a} \subseteq W_{Safe}$ and the adviser $\alpha_0$ is good.

Intuitively, Lemmas 1 and 2 state that the restrictions imposed by the nominal adviser $\alpha_0$ are sufficient. As a corollary, it also holds that the non-blocking nominal adviser restricted arena $\mathcal{T}^{\alpha_0}$ does not contain any state in $Losing$, and therefore all plays in $\mathcal{T}^{\alpha_0}$ are winning. Note, however, that the nominal adviser does not have to be least-limiting. As we illustrate through the following example, imposing additional restrictions on the adversary's choices might, perhaps surprisingly, lead to the avoidance of adversary states in which a high number of inputs is forbidden.

Example 3 An example of a safety game is shown in Fig. 2(A). The result of the nominal adviser computation according to Alg. 1 is illustrated in Fig. 2(B). Namely, $Losing = \{s_4\}$, and $\alpha_0(s_2) = \{u_{a2}\}$, $\alpha_0(s_6) = \{u_{a5}\}$, and $\alpha_0(s_7) = \emptyset$. There is only one protagonist's strategy, $\{\sigma^{\alpha_0}_p\} = \Sigma^{\alpha_0}_p$, and it is winning, $\sigma^{\alpha_0}_p \in \Omega^{\alpha_0}_p$, since $Plays_{\sigma^{\alpha_0}_p, \Sigma^{\alpha_0}_a} = \{s_1 s_2 s_3 s_6 s_3 s_6 \ldots,\ s_1 s_2 s_5 s_7 s_5 s_7 \ldots\}$. The level of limitation of $\alpha_0$ is thus $\lambda(\alpha_0) = \sup_{\sigma^{\alpha_0}_a \in \Sigma^{\alpha_0}_a} \limsup_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} |\alpha_0(\pi_{\sigma^{\alpha_0}_p, \sigma^{\alpha_0}_a}(2j))| = \limsup_{n\to\infty} \frac{1}{n} \big(|\alpha_0(s_2)| + \sum_{j=2}^{n} |\alpha_0(s_6)|\big) = 1$. Loosely speaking, the worst-case adversary's strategy $\sigma_a$ that respects the nominal adviser $\alpha_0$ takes the play to the left-hand branch of the system. Fig. 2(C) shows an alternative adviser $\alpha'$ that guides each play to the right-hand branch of the system. It is good, since there is a non-blocking adviser restricted arena $\mathcal{T}^{\alpha'}$ and the only protagonist's strategy $\sigma^{\alpha'}_p \in \Sigma^{\alpha'}_p$ on $\mathcal{T}^{\alpha'}$ is winning, since $Plays_{\sigma^{\alpha'}_p, \Sigma^{\alpha'}_a} = \{s_1 s_2 s_5 s_7 s_5 s_7 \ldots\}$. The level of limitation of $\alpha'$ is $\lambda(\alpha') = \limsup_{n\to\infty} \frac{1}{n} \big(|\alpha'(s_2)| + \sum_{j=2}^{n} |\alpha'(s_7)|\big) = \limsup_{n\to\infty} \frac{1}{n} |\alpha'(s_2)| \ll 1$. Hence, $\alpha'$ is less limiting than the nominal adviser $\alpha_0$.

Lemma 3 Consider an adviser $\alpha'$ for $(\mathcal{T}, W_{Safe})$ and suppose that there exist a state $s_a \in S_a$ and $u_a \in U_a$, such that $u_a \in \alpha_0(s_a)$ and $u_a \notin \alpha'(s_a)$. Then $\alpha'$ is either not good or at least as limiting as the nominal adviser $\alpha_0$, i.e. $\lambda(\alpha_0) \leq \lambda(\alpha')$.

Proof: The proof is by contradiction. Consider an adviser $\alpha'$ for $(\mathcal{T}, W_{Safe})$. Suppose that there exist a state $s_a \in S_a$ and $u_a \in U_a$, such that $u_a \in \alpha_0(s_a)$ and $u_a \notin \alpha'(s_a)$, and that $\alpha'$ is good. Furthermore, let $\Omega^{\alpha'}_p$ be the set of protagonist's winning strategies on the non-blocking adviser restricted arena $\mathcal{T}^{\alpha'}$, and assume that $\alpha'$ is less limiting than $\alpha_0$, i.e. that $\lambda(\alpha') < \lambda(\alpha_0)$. Then, from the definition of $\lambda$ in Eq. (3), there exists a protagonist's strategy $\sigma^{\alpha'}_p \in \Omega^{\alpha'}_p$, such that $\gamma(\sigma^{\alpha'}_p) < \lambda(\alpha_0)$. Hence, there also exists a winning play $\pi = s_{p,1} s_{a,1} s_{p,2} s_{a,2} \ldots \in Plays_{\sigma^{\alpha'}_p, \Sigma^{\alpha'}_a}$ on $\mathcal{T}^{\alpha'}$ with the property that for some $k \geq 1$, $s_{p,k+1} = T(s_{a,k}, u_a)$, where $u_a \notin \alpha'(s_{a,k})$ and $u_a \in \alpha_0(s_{a,k})$. If such a winning play did not exist, it would hold that $\gamma(\sigma^{\alpha'}_p) \geq \lambda(\alpha_0)$, which contradicts the assumption that $\alpha'$ is less limiting than $\alpha_0$.
Since $u_a \in \alpha_0(s_{a,k})$, it holds that $s_{p,k+1} \in Losing$ by construction (line 12). Either $s_{p,k+1} \in Unsafe$, which directly contradicts the assumption that $\alpha'$ is good, or $s_{p,k+1} \in Losing_j$, for some $j \geq 1$. From the iterative construction of $Losing$, we obtain $s_{a,k+1} \in Losing_{j-1}$ and, if $s_{a,k+1} \notin Unsafe$, then $s_{p,k+2} \in Losing_{j-2}$. By inductive reasoning it follows that there exists $\ell \geq k+1$, such that either $s_{p,\ell} \in Losing_0 = Unsafe$ or $s_{a,\ell} \in Losing_0 = Unsafe$. This contradicts the assumption that $\pi$ is winning, i.e. the assumption that $\alpha'$ is good.

Fig. 2: (A) An example of a game arena with a safety winning condition. The protagonist's and adversary's states are illustrated as squares and circles, respectively. The safe set Safe is in green, the unsafe set Unsafe in blue. Transitions are depicted as arrows between them and labeled with the respective inputs that trigger them. (B) shows the nominal adviser $\alpha_0$ and $Losing$ via marking the forbidden transitions and the states in $Losing$ in red. (C) shows an alternative adviser $\alpha'$ that is also good and less limiting than $\alpha_0$.

Thanks to Lemma 3, we know that there exists a good adviser $\alpha^\star$ that is least-limiting and builds on the nominal one in the following sense: $\alpha_0(s_a) \subseteq \alpha^\star(s_a)$, for all $s_a \in S_a$. Whereas following the nominal adviser is essential for maintaining system safety, following the additional restrictions suggested by $\alpha^\star$ can be perceived as a weak form of advice. If this advice is not respected by the adversary, safety is not necessarily going to be violated; however, in order to maintain safety, the adversary might need to obey further, more limiting advice. We discuss in Sec. IV-D how to use the combination of a least-limiting adviser and the nominal one in order to guide the adversary during the system execution (the play on the game arena).

B. Least-limiting solution

Let $\dot{\mathcal{A}}_{cand}$ denote the finite set of candidate advisers obtained from the nominal adviser $\alpha_0$, $\dot{\mathcal{A}}_{cand} = \{\alpha \mid \alpha_0(s_a) \subseteq \alpha(s_a), \text{ for all } s_a \in S_a\}$. Note that $\alpha \in \dot{\mathcal{A}}_{cand}$ does not have to be good, since it might not allow for the existence of a non-blocking adviser restricted arena $\mathcal{T}^\alpha$. As outlined in Sec. II, it can however be decided whether $\dot{\mathcal{T}}^\alpha$ from Def. 2 has an equivalent non-blocking arena $\mathcal{T}^\alpha$. Building on ideas from Lemmas 1 and 2, we can easily see that the existence of a non-blocking adviser restricted arena $\mathcal{T}^\alpha$ also implies the existence of a protagonist's winning strategy $\sigma^\alpha_p \in \Omega^\alpha_p$. In fact, because the states from $Losing$ were removed from $\mathcal{T}^{\alpha_0}$ (lines 4-6, 9-16 of Alg. 1), all plays in $\mathcal{T}^\alpha$ are winning and $\Sigma^\alpha_p = \Omega^\alpha_p$. We define

$$\mathcal{A}_{cand} = \{\alpha \in \dot{\mathcal{A}}_{cand} \mid \alpha \text{ is a good adviser}\}. \qquad (5)$$

From Lemma 3 and the construction of $\mathcal{A}_{cand}$, at least one least-limiting good adviser belongs to $\mathcal{A}_{cand}$. In the remainder of the solution, we focus on solving the following sub-problem for each $\alpha \in \mathcal{A}_{cand}$.

Problem 2 Consider a good adviser $\alpha \in \mathcal{A}_{cand}$. Find $\lambda(\alpha)$ and an optimal protagonist's winning strategy $\sigma^{\alpha\star}_p$ with $\gamma(\sigma^{\alpha\star}_p) = \inf_{\sigma^\alpha_p \in \Omega^\alpha_p} \gamma(\sigma^\alpha_p) = \inf_{\sigma^\alpha_p \in \Sigma^\alpha_p} \gamma(\sigma^\alpha_p)$.

We propose to translate Problem 2 into finding an optimal strategy for a mean-payoff game on a modified arena $\widetilde{\mathcal{T}}^\alpha$:

Definition 5 (Mean-payoff game arena $\widetilde{\mathcal{T}}^\alpha$) Given a non-blocking adviser restricted arena $\mathcal{T}^\alpha = (S^\alpha, \langle S^\alpha_p, S^\alpha_a \rangle, s_{init}, U_p, U_a, T^\alpha_p \cup T^\alpha_a)$, we define the mean-payoff game arena $\widetilde{\mathcal{T}}^\alpha = (\mathcal{T}^\alpha, w)$, where $w(s_p, s_a) = -|\alpha(s_a)|$ for all $s_p, s_a$ with $T^\alpha_p(s_p, u_p) = s_a$ for some $u_p$, and $w(s_a, s_p) = 0$ for all $s_a, s_p$ with $T^\alpha_a(s_a, u_a) = s_p$ for some $u_a$.
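As a rough illustration of Def. 5, the weights can be attached to a restricted arena as follows; the helper name and the dictionary representation of the adviser are assumptions carried over from the earlier sketches. Solving the resulting mean-payoff game, e.g. with the algorithm of [5], is left to an off-the-shelf solver.

```python
def mean_payoff_weights(restricted: Arena, adviser: dict) -> dict:
    """Def. 5: weight -|alpha(s_a)| on protagonist-to-adversary edges, 0 otherwise."""
    weights = {}
    for (s, _u), t in restricted.trans.items():
        if s in restricted.s_p:          # edge s_p -> s_a
            weights[(s, t)] = -len(adviser.get(t, set()))
        else:                            # edge s_a -> s_p
            weights[(s, t)] = 0
    return weights
```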
Lemma 4 Problem 2 reduces to the problem of optimal strategy synthesis for the mean-payoff game $\widetilde{\mathcal{T}}^\alpha$.

Proof: The optimal strategy $\widetilde{\sigma}^{\alpha\star}_p$ for the mean-payoff game $\widetilde{\mathcal{T}}^\alpha$, obtained e.g. by the algorithm from [5], has the value

$$\nu(\widetilde{\sigma}^{\alpha\star}_p) = \sup_{\sigma^\alpha_p \in \Sigma^\alpha_p} \inf_{\sigma^\alpha_a \in \Sigma^\alpha_a} \liminf_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} w\big(\pi_{\sigma^\alpha_p, \sigma^\alpha_a}(j), \pi_{\sigma^\alpha_p, \sigma^\alpha_a}(j+1)\big) = -\inf_{\sigma^\alpha_p \in \Sigma^\alpha_p} \sup_{\sigma^\alpha_a \in \Sigma^\alpha_a} \limsup_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \Big(-w\big(\pi_{\sigma^\alpha_p, \sigma^\alpha_a}(j), \pi_{\sigma^\alpha_p, \sigma^\alpha_a}(j+1)\big)\Big) = -\frac{1}{2} \inf_{\sigma^\alpha_p \in \Sigma^\alpha_p} \sup_{\sigma^\alpha_a \in \Sigma^\alpha_a} \limsup_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \big|\alpha\big(\pi_{\sigma^\alpha_p, \sigma^\alpha_a}(2j)\big)\big| = -\frac{\lambda(\alpha)}{2},$$

where the factor $\frac{1}{2}$ accounts for the fact that only the protagonist-to-adversary transitions carry nonzero weight. Hence, maximizing $\nu$ over $\Sigma^\alpha_p$ is equivalent to minimizing $\gamma$ over $\Sigma^\alpha_p$ and, as noted above, $\Sigma^\alpha_p = \Omega^\alpha_p$. Therefore $\widetilde{\sigma}^{\alpha\star}_p$ is an optimal protagonist's winning strategy, $\lambda(\alpha) = -2\,\nu(\widetilde{\sigma}^{\alpha\star}_p)$, and the proof is complete.

It has been shown in [8] that in mean-payoff games, memoryless strategies suffice to achieve the optimal value. In fact, using the algorithm from [5], the strategy $\widetilde{\sigma}^{\alpha\star}_p$ takes the form of a memoryless strategy $\widetilde{\varsigma}^{\alpha\star}_p : S^\alpha_p \to U_p$.

C. Overall solution

We summarize how the algorithms from Sec. IV-A and Sec. IV-B serve in finding a solution to Problem 1.

1) The nominal adviser $\alpha_0$ is built according to Alg. 1. If there does not exist a non-blocking adviser restricted arena $\mathcal{T}^{\alpha_0}$, then there does not exist a solution to Problem 1.

2) The set of candidate advisers $\mathcal{A}_{cand}$ is built according to Eq. (5).

3) For each candidate adviser $\alpha \in \mathcal{A}_{cand}$, the value $\lambda(\alpha)$ and the memoryless optimal protagonist's winning strategy $\varsigma^{\alpha\star}_p \in \Omega^\alpha_p$ are computed through the translation to mean-payoff game optimal strategy synthesis according to Def. 5.

4) An adviser $\alpha^\star \in \mathcal{A}_{cand}$ with $\lambda(\alpha^\star) = \inf_{\alpha \in \mathcal{A}_{cand}} \lambda(\alpha)$, together with its associated optimal strategy $\varsigma^{\alpha^\star\star}_p$, is the solution to Problem 1.

D. Guided system execution

Finally, we discuss how the set of good advisers $\mathcal{A}_{cand}$ can be used to guide the adversary on-the-fly during the system execution. Given an adviser $\alpha \in \mathcal{A}_{cand}$, let us call the fact that $u_a \in \alpha(s_a)$ a piece of advice. We distinguish two types of advice, hard and soft. Hard advice is imposed by the nominal adviser, $u_a \in \alpha_0(s_a)$, while soft advice is the remaining advice, which can be violated without jeopardizing the system safety. The goal of the guided execution is to permit the adversary to disobey a piece of soft advice and to react to this event via a switch to another, possibly more limiting adviser that does not contain this piece of advice. Let $\preceq$ be a partial ordering on the set $\mathcal{A}_{cand}$, where $\alpha \preceq \alpha'$ if $\alpha(s_a) \subseteq \alpha'(s_a)$ for all $s_a \in S_a$. Hence, for the nominal adviser $\alpha_0$, it holds that $\alpha_0 \preceq \alpha$ for all $\alpha \in \mathcal{A}_{cand}$. The system execution that corresponds to a play in $\mathcal{T}$ proceeds as follows (a schematic sketch is given below):

1) The system starts at the initial state $s_{curr} = s_{init}$, with the current adviser being the least-limiting adviser $\alpha_{curr} = \alpha^\star$ and the current protagonist's strategy being the memoryless winning strategy $\varsigma_{p,curr} = \varsigma^{\alpha^\star\star}_p$.

2) The input $\varsigma_{p,curr}(s_{curr})$ is applied by the protagonist and the system changes its current state $s_{curr}$ according to $T_p$. The current state belongs to the adversary.

3) $\alpha_{curr}(s_{curr})$ is provided. The adversary chooses an input $u_a \in U_a^{s_{curr}}$.
a) If $u_a \notin \alpha_{curr}(s_{curr})$, then the system updates its state $s_{curr}$ according to $T_a$ and proceeds with step 2).
b) If $u_a \in \alpha_0(s_{curr})$, then a piece of hard advice is disobeyed, system safety will be unavoidably violated, and the system needs to stop immediately.
c) If $u_a \in \alpha_{curr}(s_{curr})$, but $u_a \notin \alpha_0(s_{curr})$, then only a piece of soft advice is disobeyed. The current adviser $\alpha_{curr}$ is updated to $\alpha'$ with the property that $\lambda(\alpha') = \inf_{\alpha \in \mathcal{A}_\preceq} \lambda(\alpha)$, where $\mathcal{A}_\preceq = \{\alpha \in \mathcal{A}_{cand} \mid \alpha \preceq \alpha_{curr} \text{ and } u_a \notin \alpha(s_{curr})\}$, and the current protagonist's strategy $\varsigma_{p,curr}$ is updated to $\varsigma^{\alpha'\star}_p$. The current state $s_{curr}$ is updated according to $T_a$ and the system proceeds with step 2).
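The schematic sketch of the guided execution loop mentioned above is given next. The representation of the candidate advisers as (adviser, level of limitation, memoryless strategy) triples and the adversary_choice callback are modeling assumptions of this sketch, not part of the paper:

```python
def guided_execution(arena: Arena, alpha0: dict, candidates, adversary_choice, max_rounds=100):
    """Guided system execution of Sec. IV-D (schematic).

    `candidates` is a list of (adviser, level_of_limitation, memoryless_strategy) triples
    for the good advisers in A_cand; `adversary_choice(state, forbidden)` stands in for
    the uncontrollable adversary. Strategies are dictionaries from protagonist states to inputs.
    """
    # step 1: start from the least-limiting adviser and its optimal winning strategy
    alpha_curr, _, strategy_curr = min(candidates, key=lambda c: c[1])
    s_curr = arena.s_init
    for _ in range(max_rounds):
        # step 2: the protagonist applies its memoryless strategy
        s_curr = arena.trans[(s_curr, strategy_curr[s_curr])]
        # step 3: the adversary picks an input, knowing the currently forbidden ones
        forbidden = alpha_curr.get(s_curr, set())
        u_a = adversary_choice(s_curr, forbidden)
        if u_a in alpha0.get(s_curr, set()):
            # step 3b: hard advice disobeyed, safety can no longer be guaranteed
            raise RuntimeError("hard advice disobeyed: stopping the system")
        if u_a in forbidden:
            # step 3c: soft advice disobeyed; switch to the least-limiting adviser in A_<=
            # that does not forbid u_a (alpha0 always qualifies, so the set is nonempty)
            compatible = [c for c in candidates
                          if u_a not in c[0].get(s_curr, set())
                          and all(c[0].get(s, set()) <= alpha_curr.get(s, set())
                                  for s in arena.s_a)]
            alpha_curr, _, strategy_curr = min(compatible, key=lambda c: c[1])
        # steps 3a/3c: the system state is updated according to T_a
        s_curr = arena.trans[(s_curr, u_a)]
```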
Example 4 Consider the safety game in Fig. 3(A). The result of the nominal adviser computation according to Alg. 1 is illustrated in Fig. 3(B). Namely, $\alpha_0(s_2) = \{u_{a2}\}$, $\alpha_0(s_6) = \emptyset$, $\alpha_0(s_7) = \{u_{a5}\}$, $\alpha_0(s_8) = \{u_{a7}, u_{a8}\}$, and $\alpha_0(s_{11}) = \{u_{a9}\}$. The states in $Losing$ are marked in red. Fig. 3(C) shows the non-blocking adviser restricted arena $\mathcal{T}^{\alpha_0}$ with the removed states and transitions in light grey. The corresponding optimal protagonist's winning strategy $\varsigma^{\alpha_0\star}_p$ in $\mathcal{T}^{\alpha_0}$ is highlighted in green in Fig. 3(B), i.e. $\varsigma^{\alpha_0\star}_p(s_1) = u_{p1}$, $\varsigma^{\alpha_0\star}_p(s_3) = u_{p2}$, $\varsigma^{\alpha_0\star}_p(s_5) = u_{p5}$, and $\varsigma^{\alpha_0\star}_p(s_9) = u_{p6}$. The level of limitation of $\alpha_0$ is $\lambda(\alpha_0) = \limsup_{n\to\infty} \frac{1}{n}\big(|\alpha_0(s_2)| + \sum_{j=2}^{n} |\alpha_0(s_8)|\big) = \limsup_{n\to\infty} \frac{1}{n}(2n-1) = 2$. Fig. 3(D) shows the least-limiting adviser $\alpha^\star$. As opposed to $\alpha_0$, $\alpha^\star(s_2) = \{u_{a2}, u_{a3}\}$, where the advice $u_{a3} \in \alpha^\star(s_2)$ (in magenta) is soft. Fig. 3(E) illustrates the non-blocking adviser restricted arena $\mathcal{T}^{\alpha^\star}$. The optimal protagonist's winning strategy is the only protagonist's strategy in $\mathcal{T}^{\alpha^\star}$. The level of limitation of $\alpha^\star$ is $\lambda(\alpha^\star) = \limsup_{n\to\infty} \frac{1}{n}\big(|\alpha^\star(s_2)| + \sum_{j=2}^{n} |\alpha^\star(s_6)|\big) = \limsup_{n\to\infty} \frac{2}{n} = 0 < \lambda(\alpha_0)$. There exist more good advisers $\alpha' \in \mathcal{A}_{cand}$; for each of them, either $\lambda(\alpha') = \lambda(\alpha_0)$ or $\lambda(\alpha') = \lambda(\alpha^\star)$.

Fig. 3: (A) An example of a game arena with a safety winning condition. (B) The nominal adviser $\alpha_0$ and $Losing$ via marking the forbidden transitions and the states in $Losing$ in red; $\varsigma^{\alpha_0\star}_p$ is in green. (C) The non-blocking adviser restricted arena $\mathcal{T}^{\alpha_0}$. (D) The least-limiting adviser $\alpha^\star$. (E) The non-blocking adviser restricted arena $\mathcal{T}^{\alpha^\star}$.

The guided system execution proceeds as follows. The system starts in state $s_{curr} = s_1$ with $\alpha_{curr} = \alpha^\star$ and $\varsigma_{p,curr} = \varsigma^{\alpha^\star\star}_p$. Input $u_{p1}$ is applied and $s_{curr} = s_2$. Then, $\alpha_{curr}(s_{curr}) = \alpha^\star(s_2)$ is provided. The adversary chooses either $u_{a1}$, $u_{a2}$, or $u_{a3}$, but through the adviser it is advised not to select $u_{a3}$ (soft advice) or $u_{a2}$ (hard advice). If the choice is $u_{a1}$, the system state is updated to $s_{curr} = s_3$, and in the remainder of the execution, the protagonist and the adversary apply $u_{p2}$ and $u_{a4}$, respectively, switching between states $s_3$ and $s_6$. If the choice is $u_{a3}$, a piece of soft advice is disobeyed, the current state becomes $s_5$, and the current adviser and strategy are updated to $\alpha_{curr} = \alpha_0$ and $\varsigma_{p,curr} = \varsigma^{\alpha_0\star}_p$, which satisfy $\lambda(\alpha_0) = \inf_{\alpha \in \mathcal{A}_\preceq} \lambda(\alpha)$. Input $u_{p5}$ is then applied and $s_{curr} = s_8$. In the remainder of the execution, the adversary is guided to respect the hard advice $u_{a7}, u_{a8} \in \alpha_{curr}(s_8)$, leading the system to switch between $s_8$ and $s_9$. If the choice in $s_2$ is $u_{a2}$ despite the hard advice, the system reaches an unsafe state.

V. CONCLUSIONS AND FUTURE WORK

We have studied the problem of synthesizing least-limiting guidelines for decision making in semi-autonomous systems involving entities that are uncontrollable, but partially willing to collaborate on achieving safety of the overall system. We have proposed a rigorous formulation of this problem and an algorithm to synthesize least-limiting advisers for an adversary in a 2-player safety game, and we have proposed a systematic way to guide the system execution with their use. To the best of our knowledge, this paper presents one of the first steps towards studying the problem of synthesizing guidelines for uncontrollable entities. Future work naturally includes extensions to more complex winning conditions, different measures of the level of violation, and continuous state spaces. We also plan to implement the algorithms and demonstrate their potential in a case study.
REFERENCES

[1] R. Alur, S. Moarref, and U. Topcu. Counter-strategy guided refinement of GR(1) temporal logic specifications. In Formal Methods in Computer-Aided Design, pages 26-33. IEEE, 2013.
[2] K. R. Apt and E. Grädel, editors. Lectures in Game Theory for Computer Scientists. Cambridge University Press, 2011.
[3] E. Bartocci, R. Grosu, P. Katsaros, C. Ramakrishnan, and S. Smolka. Model repair for probabilistic systems. In Tools and Algorithms for the Construction and Analysis of Systems, volume 6605 of LNCS, pages 326-340. Springer Berlin Heidelberg, 2011.
[4] J. Bernet, D. Janin, and I. Walukiewicz. Permissive strategies: from parity games to safety games. Theoretical Informatics and Applications, 36(3):261-275, 2002.
[5] L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97-118, 2011.
[6] K. Chatterjee, T. Henzinger, and B. Jobstmann. Environment assumptions for synthesis. In Concurrency Theory (CONCUR), volume 5201 of LNCS, pages 147-161. Springer Berlin Heidelberg, 2008.
[7] T. Chen, E. M. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. Model repair for Markov decision processes. In International Symposium on Theoretical Aspects of Software Engineering (TASE), pages 85-92. IEEE, 2013.
[8] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8(2):109-113, 1979.
[9] M. Faella. Best-effort strategies for losing states. CoRR, abs/0811.1664, 2008.
[10] S. Hirche and M. Buss. Human-oriented control for haptic teleoperation. Proceedings of the IEEE, 100(3):623-647, 2012.
[11] M. Kloetzer and C. Belta. Dealing with nondeterminism in symbolic control. In Hybrid Systems: Computation and Control (HSCC), pages 287-300, Berlin, Heidelberg, 2008. Springer-Verlag.
[12] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas. Temporal logic-based reactive mission and motion planning. IEEE Transactions on Robotics, 25(6):1370-1381, 2009.
[13] W. Li, L. Dworkin, and S. A. Seshia. Mining assumptions for synthesis. In ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), 2011.
[14] M. Mazo and M. Cao. Design of reward structures for sequential decision-making processes using symbolic analysis. In American Control Conference (ACC), pages 4393-4398, 2013.
[15] K. Savla, T. Temple, and E. Frazzoli. Human-in-the-loop vehicle routing policies for dynamic environments. In IEEE Conference on Decision and Control (CDC), pages 1145-1150, 2008.
[16] J. Tumova, G. C. Hall, S. Karaman, E. Frazzoli, and D. Rus. Least-violating control strategy synthesis with safety rules. In Hybrid Systems: Computation and Control (HSCC), pages 1-10. ACM, 2013.
[17] A. van Hulst, M. Reniers, and W. Fokkink. Maximally permissive controlled system synthesis for modal logic. In Theory and Practice of Computer Science (SOFSEM), volume 8939 of LNCS, pages 230-241. Springer Berlin Heidelberg, 2015.
[18] T. Wongpiromsarn, U. Topcu, and R. M. Murray. Receding horizon control for temporal logic specifications. In Hybrid Systems: Computation and Control (HSCC), pages 101-110, 2010.