Optimal Tourist Problems and Anytime Planning of Trip Itineraries

Jingjin Yu, Javed Aslam, Sertac Karaman, Daniela Rus

Abstract

We introduce and study the problem in which a mobile sensing robot (our tourist) is tasked to travel among and gather intelligence at a set of spatially distributed points of interest (POIs). The quality of the information collected at each POI is characterized by some non-decreasing reward function over the time spent at the POI. With a limited time budget, the robot must balance between spending time traveling to POIs and spending time at POIs for information collection (sensing) so as to maximize the total reward. Alternatively, the robot may be required to acquire a minimum amount of reward and hope to do so with the least amount of time. We propose a mixed integer programming (MIP) based anytime algorithm for solving these two NP-hard optimization problems to arbitrary precision. The effectiveness of our algorithm is demonstrated using an extensive set of computational experiments, including the planning of a realistic itinerary for a first-time tourist in Istanbul.

I. INTRODUCTION

Imagine that a roboticist travels to Turkey to attend an international conference in Istanbul. Unfortunately, due to her busy schedule, our roboticist does not have much time for touring this historic city. Yet, as luck would have it, near the end of her trip, she finds herself with a day of spare time and decides to do some sightseeing. Planning such a day trip, however, turns out to be quite challenging: the roboticist must decide among a large number of points of interest (POIs) which ones to go to, how to travel from one POI to another, and how much time she should spend at each POI that she does decide to visit. Naturally, she hopes to get the most out of her tour under her limited time budget. Could we help our roboticist plan an optimal itinerary for such a journey automatically?

Alternatively, an environmental scientist may need to plan an automated, GPS-guided trip for an aerial mobile (sensing) robot to collect scientific data at a set of spatially distributed locations. Because of the high cost associated with operating the robot, our scientist, similar to our roboticist in Istanbul, must select a subset of locations for the aerial robot to visit and decide how much effort (time) the robot should spend at each location to perform necessary measurements. Is there a principled method that our environmental scientist can use for planning such a trip with optimality guarantees?

In this paper, we propose the Optimal Tourist Problem (OTP), which is motivated by and modeled after the scenarios mentioned above. In the basic setup, a tourist is interested in visiting some n POIs that are spatially distributed. Each POI is associated with a reward function or learning curve that is non-decreasing over the time spent at the POI. Because traveling between POIs and staying at a POI to gain reward are both time consuming, optimization problems naturally arise. We introduce two such related problems. In the first problem, a reward-maximizing tourist (RMT) seeks to maximize the gained reward given a limited time budget. From a dual perspective, in the second problem, a budget-minimizing tourist (BMT) seeks to minimize the time spent to collect a predetermined amount of reward. We provide a mixed integer programming (MIP) based anytime algorithm for solving both the RMT and BMT variants of the OTP problem.
J. Yu and D. Rus are with the Computer Science and Artificial Intelligence Lab at the Massachusetts Institute of Technology. E-mail: {jingjin, rus}@csail.mit.edu. J. Aslam is with the Department of Computer Science at Northeastern University. E-mail: jaa@ccs.neu.edu. S. Karaman is with the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology. E-mail: karaman@mit.edu. This work was supported in part by ONR projects N00014-12-1-1000 and N00014-09-1-1051, and the Singapore-MIT Alliance on Research and Technology (SMART) Future of Urban Mobility project.

The primary motivation behind our study of OTP is its potential application to robotic surveillance and monitoring problems such as automated reconnaissance and scientific survey Smith et al. (2011); Grocholsky et al. (2006), which we refer to under the umbrella term of informative path planning (IPP). In an IPP problem, a path is planned to satisfy some information collection objective, sometimes under additional constraints such as path length or a total time limit. In Alamdari et al. (2014), an O(log n) approximation algorithm yields iterative TSP paths that minimize the maximum latency (the inverse of the frequency with which a node is visited) across all n nodes in a connected network. In Smith et al. (2012), the authors proposed a method for generating speed profiles along predetermined cyclic (closed) paths to keep the uncertainty of a varying field bounded using single or multiple robots. For the problem of observing stochastically arriving events at multiple locations with a single mobile robot, a (1+ε)-optimal algorithm was proposed in Yu et al. (2014) to solve the multi-objective optimization problem of maximizing event observation in a balanced manner and minimizing delay between event observations across the locations. Recently, a method called Recursive Adaptive Identification was proposed as a polynomial-time, polylogarithmic-approximation algorithm for attacking adaptive IPP problems Lim et al. (2014). Sampling-based methods Kavraki et al. (1996); LaValle (1998); Karaman and Frazzoli (2011) have also been applied to IPP problems with success. In Hollinger and Sukhatme (2013), Rapidly-Exploring Random Graphs (RRG) are combined with branch-and-bound methods for planning most informative paths. In Lan and Schwager (2013), the authors tackle the problem of planning cyclic trajectories for the best estimation of a time-varying Gaussian random field, using a variation of RRT called Rapidly-Expanding Random Cycles (RRC).

An optimization problem that is intimately connected to OTP is the Orienteering Problem (OP) Chao et al. (1996); Vansteenwegen et al. (2011); Gavalas et al. (2014), which is obtained when the rewards at the POIs are fixed in an RMT problem. The fixed reward is collected in full once a POI is visited. OP, which is easily seen to be NP-hard, is observed to be difficult to solve exactly even for medium-sized instances with over a hundred POIs. On the side of approximation algorithms, constant approximation ratios down to (2+ε) are only known under metric settings for OP with uniform reward across the POIs on undirected graphs Chekuri et al. (2012). No constant-ratio approximation algorithm is known for directed graphs. On the other hand, many MIP-based algorithms exist for OP and related problems Vansteenwegen et al. (2011); Gavalas et al. (2014). These algorithms often allow the precise encoding of the problem in the MIP model.
A work in this domain that is closest to ours studies an OP problem in which the reward may depend on the time spent at the POIs Erdoğan and Laporte (2013). It proposes a solution method that iteratively adds constraints that are violated by the incomplete model. In comparison, our work studies a more general problem that allows multiple starting POIs and arbitrary reward functions. Moreover, we construct a static (i.e., the constraints are fixed), arbitrarily precise MIP model that gives rise to a natural anytime algorithm. On the side of trip planning problems, many interesting works De Choudhury et al. (2010); Basu Roy et al. (2011); Yoon et al. (2012) compute "optimal" itineraries according to some reward metric. For example, the authors of De Choudhury et al. (2010) apply a recursive greedy approximation algorithm for OP Chekuri and Pál (2005) to plan suggested itineraries. Most of these works focus on the data mining aspect of trip planning problems, e.g., how POI-related data, such as the average visiting times for POIs and tourist preferences inferred through POI correlations, may be derived and used. In contrast, we provide a clean separation between two elements of the OTP problem, the transportation model and the reward model, and focus on the interaction between these two elements through an algorithmic study.

The rest of the paper is organized as follows. In Section II, we formulate the two variants of OTP, RMT and BMT. In Section III, we provide a step-by-step introduction of our MIP model for solving the proposed OTP variants, after which many generalizations are also presented. In Section IV, we discuss the overall algorithm and some of its important properties in more detail. We present computational simulations in Section V and conclude in Section VI.

II. PROBLEM FORMULATION

Let the set $V = \{v_1, \ldots, v_n\}$ represent $n$ points of interest (POIs) in $\mathbb{R}^2$. There is a directed edge $e_{i,j}$ between two POI vertices $v_i, v_j \in V$ if there is a path from $v_i$ to $v_j$ that does not pass through any intermediate POIs. When an edge $e_{i,j}$ exists, let $d_{i,j}$ denote its length. There is a tourist (alternatively, an agent or a mobile robot) that travels between the POIs following single integrator dynamics. Denoting the tourist's location as $p$, when the tourist is traveling from POI to POI, $\dot{p} = u$ with $\|u\| = 1$; otherwise, $\dot{p} = 0$. The tourist is interested in visiting the POIs. To do so, she starts from some base vertex $v_B \in B \subset V$ with $|B| = n_B \le n$, travels between the POIs, and eventually returns to $v_B$. For example, $B$ may represent the choices of hotels. For each $v_i \in V$, she associates a maximum reward $r_i$ with the location, which can be gained through spending time at $v_i$. We assume that the obtained reward depends on the time $t_i$ the tourist spends at $v_i$. More precisely, the obtained reward is defined as $r_i f_i(t_i)$, in which $f_i \in [0,1]$ is some non-decreasing function of $t_i$. We further require that $f_i$ is $C^1$ continuous and $f_i'(0)$ is bounded away from zero. That is, for all $1 \le i \le n$, $f_i'$ is continuous and $f_i'(0) \ge \lambda$ for some fixed $\lambda > 0$. We also assume that $f_i(0) = 0$ for convenience (it can be easily verified later that this does not reduce generality).

Remark. We mention that no generality is lost by focusing on non-decreasing functions. After presenting our MIP models in Section III, it will become clear that any reasonable $f_i$ can be turned into an equivalent non-decreasing function, which can then be used in setting up the MIP model. We will revisit this point in Section III-D.
The function $f_i$ may effectively be viewed as a learning curve. In this paper, two specific types of one-parameter learning curves are studied in detail: linear and exponential. Let $\lambda_i > 0$ denote the learning rate. In the case of a linear learning curve,

f_i(t_i) = \lambda_i t_i, \quad 0 \le t_i \le 1/\lambda_i.   (1)

The exponential learning curve is specified as

f_i(t_i) = 1 - e^{-\lambda_i t_i}, \quad 0 \le t_i < +\infty,   (2)

which captures the notion of "diminishing returns" that is often present in learning tasks.

After a trip is completed, our tourist will have traveled through a subset of the edges $E_{tr} \subset E$ and will have spent time $t_1, \ldots, t_n$, $t_i \ge 0$, at the $n$ POIs. She will have spent a total time of

J_T := \sum_{e_{i,j} \in E_{tr}} d_{i,j} + \sum_{i=1}^{n} t_i   (3)

and gained a total reward of

J_R := \sum_{i=1}^{n} r_i f_i(t_i).   (4)

Note that some edges $e_{i,j}$ may be passed through by the tourist multiple times, in which case $d_{i,j}$ is included once each time $e_{i,j}$ is enumerated in (3). That is, $E_{tr}$ is a multi-set. We define $T := \{t_1, \ldots, t_n\}$, $R := \{r_1, \ldots, r_n\}$, and $F := \{f_1, \ldots, f_n\}$.

During the trip planning phase, a tourist often faces the challenging task of planning ahead so as to spend the optimal amount of time traveling and sightseeing to gain the most out of a trip. This gives rise to two OTP variants. In the first, our optimal tourist is given a time budget $M_T$, during which she hopes to maximize her total reward. That is,

Problem 1 (Reward-Maximizing Tourist (RMT)) Given a 5-tuple $(V, B, D, R, F)$ and a time budget $M_T > 0$, compute the sets $E_{tr}$ and $T$ such that $J_R$ is maximized under the constraint $J_T \le M_T$.

We do not need to specify the edge set $E$ because it is implicitly fixed by $D$. The second, equally natural problem is in a sense a dual problem of RMT, in which the goal is to minimize the time spent to achieve a predetermined reward.

Problem 2 (Budget-Minimizing Tourist (BMT)) Given a 5-tuple $(V, B, D, R, F)$ and a reward requirement $M_R > 0$, compute the sets $E_{tr}$ and $T$ such that $J_T$ is minimized under the constraint $J_R \ge M_R$.

Besides RMT and BMT as formulated in this section, many practical variations are possible. For example, it may be the case that a path (starting and ending at hotels, train stations, and so on) is required instead of a closed tour. Alternatively, a multi-day itinerary may be more desirable than a one-day itinerary. These variations and a few additional generalizations are also addressed later in this paper (in Section III-D).

Remark. We emphasize that the problems formulated in this section apply to an array of scenarios other than itinerary planning for tourists. For example, our tourist may well be a mobile aerial robot equipped with on-board cameras and automated computer vision-based algorithms for traffic monitoring at key intersections in a large city. In this case, spending more time at a given location will allow more observations, leading to higher quality information about the traffic pattern at the given location. Given limited flying time, the aerial robot must balance between traveling around and spending time at important sites to gather more traffic information (under some proper metric). We can easily imagine extensions of this traffic monitoring application to surveillance, reconnaissance, and scientific exploration tasks.
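To make the two objectives concrete before the MIP models are developed, the following minimal sketch scores a fixed candidate itinerary under (1)-(4). All numbers (distances, rewards, learning rates) and the hand-picked tour are illustrative placeholders, not data from the paper.

```python
import math

# Hypothetical 4-POI instance; d[i][j] are travel times, r[i] maximum rewards,
# lam[i] learning rates.  All values are made up for illustration only.
d = [[0.0, 0.3, 0.5, 0.4],
     [0.3, 0.0, 0.2, 0.6],
     [0.5, 0.2, 0.0, 0.3],
     [0.4, 0.6, 0.3, 0.0]]
r   = [1.0, 1.5, 1.2, 1.8]
lam = [1.0, 2.0, 1.5, 1.0]

def f_linear(lam_i, t):        # linear learning curve, equation (1)
    return min(lam_i * t, 1.0)

def f_exponential(lam_i, t):   # exponential learning curve, equation (2)
    return 1.0 - math.exp(-lam_i * t)

def evaluate(tour, stay, f):
    """Total time J_T as in (3) and total reward J_R as in (4) for a closed
    tour (a cyclic vertex sequence such as [0, 2, 1, 0]) with staying times."""
    travel = sum(d[a][b] for a, b in zip(tour, tour[1:]))
    J_T = travel + sum(stay)
    J_R = sum(r[i] * f(lam[i], t_i) for i, t_i in enumerate(stay))
    return J_T, J_R

print(evaluate([0, 2, 1, 0], [0.5, 0.4, 0.6, 0.0], f_exponential))
```

The RMT and BMT formulations search over the tour and the staying times jointly, instead of scoring a fixed choice as done here.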
III. MIP MODELS FOR BMT AND RMT

In this section, we propose mixed integer programming models for solving RMT and BMT using an MIP solver. First, we describe an MIP model derived from an existing one for the orienteering problem (OP) that applies to RMT and BMT problems with $|B| = 1$ (i.e., a single base) and linear learning curves. The case of $|B| = 1$ is often referred to as a rooted problem. Then, the MIP model is generalized to allow multiple bases and arbitrary learning curves through linearization. Before moving to model construction, we point out that the proposed problems are computationally intractable, given their similarity to TSP and OP.

Proposition 1 RMT and BMT are NP-hard.

PROOF. Let $r_i \equiv 1$ and let the functions from the set $F$ be linear with unit slope, i.e., $f_i' = \lambda_i \equiv 1$. The maximum achievable reward is then $n$, and achieving such a reward requires $t_i = 1$ for all $1 \le i \le n$. Under these restrictions, solving a BMT instance with $M_R = n$ is equivalent to finding a TSP tour over all $n$ POI vertices, which is NP-hard. Now, given a time budget $M_T$, the decision problem of whether $M_T$ is sufficient for achieving a reward of $J_R \ge n$ is NP-hard, implying that RMT is NP-hard as well. □

A. MIP Model for a Single Base and Linear Learning Curves

In this subsection, we introduce an MIP model for BMT and RMT with a single base and with the set $F$ containing only linear functions. These models are partially based on models from Vansteenwegen et al. (2011); Erdoğan and Laporte (2013). Without loss of generality, let our tourist start from $v_1$. Because the reward at a given POI only depends on the total time spent at the POI, we also assume that the time the tourist spends at a POI is spent during a single visit to the POI. When a tourist spends time at a POI, we say the tourist stays at the POI. With these assumptions, the tourist will eventually have stayed at some $\ell$ POIs in the order $v_{s_1}, \ldots, v_{s_\ell}$, and will have spent time $t_{s_1}, \ldots, t_{s_\ell}$ at these POIs. For $i \notin \{s_1, \ldots, s_\ell\}$, $t_i = 0$. Although the tourist only needs to stay at a POI at most once, she may need to pass through a POI multiple times (e.g., if the POI is a transportation hub). To distinguish these two types of visits to a POI, we perform a transitive closure on the set $D$. That is, we compute all-pairs shortest paths for $v_i, v_j \in V$, $1 \le i, j \le n$. This gives us a set of shortest directed paths $P = \{p_{i,j}\}$ with corresponding lengths $D' = \{d'_{i,j}\}$. We say that the tourist takes a path $p_{i,j}$ if the tourist stays at $v_j$ immediately after staying at $v_i$, except when the tourist starts and ends her trip at $v_1$. With this update, the tourist's final tour is simply $p_{s_1,s_2}, \ldots, p_{s_\ell,s_1}$.
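The transitive closure step just described (Line 1 of Algorithm 1 in Section IV) can be implemented with the Floyd-Warshall algorithm. A minimal sketch follows; the edge-dictionary input format is an assumption made for illustration.

```python
def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on the POI graph.  `edges` maps a directed pair (i, j)
    to the length d_{i,j}.  Returns the distance matrix D' and a successor
    table from which each shortest path p_{i,j} can be reconstructed."""
    INF = float("inf")
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[i if i == j else None for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        dist[i][j], nxt[i][j] = w, j
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def shortest_path(nxt, i, j):
    """Vertex sequence of p_{i,j}, including POIs that are merely passed through."""
    if nxt[i][j] is None:
        return None
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path
```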
Let $x_{ij}$ be a binary variable with $x_{ij} = 1$ if and only if $p_{i,j}$ is taken by the tourist. The number of times that the tourist stays at (resp. leaves after staying at) a POI vertex $v_i$ is $\sum_{j=1, j \ne i}^{n} x_{ij}$ (resp. $\sum_{j=1, j \ne i}^{n} x_{ji}$). Both summations can be at most one since, by assumption, the tourist never stays at a POI twice. The tour constraint then says they must be equal, i.e., $\sum_{j=1, j \ne i}^{n} x_{ij} = \sum_{j=1, j \ne i}^{n} x_{ji}$. Let $x_i$ be the binary variable indicating whether the tourist stays at $v_i$. We have the following edge-use constraints

\sum_{j=1, j \ne i}^{n} x_{ij} = \sum_{j=1, j \ne i}^{n} x_{ji} = x_i \le 1, \quad \forall\, 2 \le i \le n.   (5)

The case of $i = 1$ is special since we need to ensure that $v_1$ is visited, even if the tourist does not actually stay at $v_1$. For this purpose, we add a self-loop variable $x_{11}$ at $v_1$ and require

\sum_{j=1}^{n} x_{1j} = \sum_{j=1}^{n} x_{j1} = x_1 = 1.   (6)

The constraints (5) and (6) guarantee that the tourist takes a tour starting from $v_1$. However, they do not prevent multiple disjoint tours from being created. To prevent this from happening, a sub-tour restriction constraint is introduced. Let $2 \le u_i \le n$ be integer variables for $2 \le i \le n$. If there is a single tour starting from $v_1$, then the $u_i$ can be chosen to satisfy the constraints

u_i - u_j + 1 \le (n-1)(1 - x_{ij}), \quad 2 \le i, j \le n,\ i \ne j.   (7)

To see that this is true, note that since $u_i - u_j + 1 \le n - 1$ regardless of the values taken by $2 \le u_i, u_j \le n$, (7) can only be violated if $x_{ij} = 1$. The condition $x_{ij} = 1$ only holds if the path $p_{i,j}$ is taken. Setting $u_i$ to be the order in which the tourist stays at $v_i$, if $x_{ij} = 1$, then $u_i - u_j + 1 = 0$, satisfying (7). On the other hand, if there is another tour besides the one starting from $v_1$, then for any edge of that tour with $x_{ij} = 1$, the right-hand side of (7) equals zero. For (7) to hold, we must have $u_i - u_j + 1 \le 0 \Rightarrow u_i < u_j$. However, this condition cannot hold for all consecutive pairs of POI vertices on a cycle. Thus, (7) enforces that only a single tour may exist.

With the introduction of the variables $\{x_{ij}\}$, the time spent by the tourist is given by

J_T = \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} x_{ij} d'_{ij} + \sum_{i=1}^{n} t_i.   (8)

To represent the total reward $J_R$, we introduce a continuous variable $w_i$, $1 \le i \le n$, to denote the reward collected at $v_i$. For a linear $f_i$, the learning rate $\lambda_i$ is simply the slope of $f_i$. The reward $w_i$ and the visiting time $t_i$ then satisfy

w_i \le r_i x_i,   (9)
w_i = t_i \lambda_i.   (10)

The constraint (9) allows reward only if the tourist stays at $v_i$ and limits the maximum reward to $r_i$. The constraint (10) reflects the linear dependency of the reward $w_i$ on the visiting time $t_i$. The total reward $J_R$ is simply

J_R = \sum_{i=1}^{n} w_i.   (11)

Solving RMT with a single base and linear learning curves can then be encoded as a mixed integer program that seeks to maximize $J_R$ subject to $J_T \le M_T$, (5), (6), (7), (9), and (10). Similarly, solving BMT with a single base and linear learning curves can be encoded as a mixed integer program that minimizes $J_T$ subject to $J_R \ge M_R$, (5), (6), (7), (9), and (10).
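A compact way to assemble the single-base model is sketched below using the Gurobi Python interface (the paper's own implementation uses Gurobi through Java; the variable and function names here are not from the paper). The sketch follows constraints (5)-(11) with the tour rooted at vertex index 0.

```python
import gurobipy as gp
from gurobipy import GRB

def build_single_base_rmt(dist, r, lam, M_T):
    """RMT with a single base (vertex 0) and linear learning curves,
    following constraints (5)-(11); `dist` is the shortest-path matrix D'."""
    n = len(r)
    m = gp.Model("rmt")
    x  = m.addVars(n, n, vtype=GRB.BINARY, name="x")    # x[i,j] = 1 iff p_{i,j} is taken
    xv = m.addVars(n, vtype=GRB.BINARY, name="xv")      # x_i = 1 iff the tourist stays at v_i
    u  = m.addVars(range(1, n), lb=2, ub=n, vtype=GRB.INTEGER, name="u")
    t  = m.addVars(n, lb=0.0, name="t")                 # time spent at v_i
    w  = m.addVars(n, lb=0.0, name="w")                 # reward collected at v_i

    for i in range(1, n):                                # edge-use constraints (5)
        m.addConstr(gp.quicksum(x[i, j] for j in range(n) if j != i) == xv[i])
        m.addConstr(gp.quicksum(x[j, i] for j in range(n) if j != i) == xv[i])
    # base vertex: constraint (6), including the self-loop variable x[0,0]
    m.addConstr(gp.quicksum(x[0, j] for j in range(n)) == 1)
    m.addConstr(gp.quicksum(x[j, 0] for j in range(n)) == 1)
    m.addConstr(xv[0] == 1)
    for i in range(1, n):                                # sub-tour restriction (7)
        for j in range(1, n):
            if i != j:
                m.addConstr(u[i] - u[j] + 1 <= (n - 1) * (1 - x[i, j]))
    for i in range(n):                                   # reward constraints (9) and (10)
        m.addConstr(w[i] <= r[i] * xv[i])
        m.addConstr(w[i] == lam[i] * t[i])
    travel = gp.quicksum(dist[i][j] * x[i, j]
                         for i in range(n) for j in range(n) if i != j)
    m.addConstr(travel + t.sum() <= M_T)                 # J_T <= M_T, with J_T as in (8)
    m.setObjective(w.sum(), GRB.MAXIMIZE)                # maximize J_R as in (11)
    return m
```

For BMT, the same constraints are kept, the budget constraint is replaced by a reward requirement on the sum of the w variables, and the objective becomes minimizing the travel-plus-staying time.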
B. Incorporating Multiple Bases

We now look at the case of $|B| > 1$. To enable the selection of any particular $v_i \in B$, a virtual origin vertex $o$ is created, which is both a source and a sink. Then, each base vertex $v_i$ is split into two copies, $v_i^{in}$ and $v_i^{out}$. The edges connecting $v_i$ to other POI vertices of $V$ are split such that all edges going from $v_i$ to other POI vertices are now rooted at $v_i^{out}$ and all edges connecting other POI vertices to $v_i$ now end at $v_i^{in}$. In addition, two crossover edges between $v_i^{in}$ and $v_i^{out}$ are added, one in each direction. Lastly, an outgoing edge from $o$ to $v_i^{out}$ and an incoming edge from $v_i^{in}$ to $o$ are added. An illustration of this gadget is given in Figure 1.

Fig. 1. [left] A base vertex $v_i$ and its outgoing (dotted) and incoming (solid) edges. [right] The gadget that splits $v_i$ into $v_i^{in}$ and $v_i^{out}$, along with the split edges and the newly added four (bold) edges.

This gadget is duplicated for every element of $B$ using the same origin vertex $o$. The basic MIP model from the previous subsection is then updated to enable the routing of the tourist through at least one element of $B$. For each $v_i \in B$, we create four additional binary variables to represent whether the four newly added edges are used in a solution. These variables are $x_i^{o,out}$ (edge from $o$ to $v_i^{out}$), $x_i^{in,o}$ (edge from $v_i^{in}$ to $o$), $x_i^{out,in}$ (edge from $v_i^{out}$ to $v_i^{in}$), and $x_i^{in,out}$ (edge from $v_i^{in}$ to $v_i^{out}$). To ensure that at least one vertex of $B$ is used, we add the constraint

\sum_{v_i \in B} x_i^{o,out} = 1.   (12)

The edge-use constraints also need to be updated accordingly. Due to the vertex split for vertices from the set $B$, we have two sets of such edge-use constraints. The constraint (5) now applies to all non-base vertices. The constraint (6) is updated for all base vertices $v_i \in B$ to

\sum_{j=1, j \ne i}^{n} x_{ij} + x_i^{out,in} - x_i^{in,out} - x_i^{o,out} = 0,   (13)
\sum_{j=1, j \ne i}^{n} x_{ji} + x_i^{out,in} - x_i^{in,out} - x_i^{in,o} = 0.   (14)

With constraint (12), $o$ goes to exactly one $v_i^{out}$ and later returns from the corresponding $v_i^{in}$. Then, constraints (13) and (14), along with the existing edge-use constraint (5), ensure that one or more tours are created. Finally, to prevent multiple tours from being created, we update the variables $u_i$ to $1 \le u_i \le n$ for $1 \le i \le n$. For a base vertex $v_i \in B$, we add the constraint

u_i - u_j + 1 \le (2 - x_{ij} - x_i^{in,out})\, n.   (15)

If $v_i$ is not a base vertex, we require

u_i - u_j + 1 \le (1 - x_{ij})\, n.   (16)

Constraints (15) and (16) replace the constraint (7). The constraint (16) has the same effect as the constraint (7) in preventing a separate tour from being created. For base vertices, when $x_i^{in,out} = 1$, which is the case unless $x_i^{o,out} = 1$, the constraint (15) is the same as (16). If $x_i^{o,out} = 1$, then (15) becomes $u_i - u_j + 1 \le n + (1 - x_{ij})\, n$, which always holds. That is, the constraint (15) treats the selected base vertex differently.
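For concreteness, the vertex-split gadget of Figure 1 can be generated mechanically from the POI graph. The sketch below labels the copies with string suffixes and gives the four added edges zero length; both choices are assumptions made for illustration, not prescribed by the paper.

```python
def split_base_vertices(edges, bases):
    """Apply the gadget of Fig. 1: each base vertex b becomes b_in / b_out with
    two crossover edges, and a virtual origin 'o' acts as both source and sink.
    `edges` maps directed pairs (i, j) to lengths; returns the new edge map."""
    tail = lambda v: f"{v}_out" if v in bases else v
    head = lambda v: f"{v}_in" if v in bases else v
    new_edges = {(tail(i), head(j)): d for (i, j), d in edges.items()}
    for b in bases:
        new_edges[(f"{b}_out", f"{b}_in")] = 0.0   # crossover edges, one per direction
        new_edges[(f"{b}_in", f"{b}_out")] = 0.0
        new_edges[("o", f"{b}_out")] = 0.0         # origin -> v_out
        new_edges[(f"{b}_in", "o")] = 0.0          # v_in -> origin
    return new_edges

# e.g. a hypothetical 4-POI instance with bases {0, 3}
print(split_base_vertices({(0, 1): 2.0, (1, 0): 2.0, (1, 2): 1.0,
                           (2, 3): 1.5, (3, 0): 2.5}, bases={0, 3}))
```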
C. Linearization of Arbitrary Learning Curves

To accommodate arbitrary learning curves in our MIP model, a linearization scheme is used. We show that, with carefully constructed linear approximations of the $f_i$, arbitrarily optimal MIP models can be built. The basic idea behind our linearization scheme is rather simple. Given a $C^1$ continuous $f_i \in [0,1]$ with $f_i'(0) \ge \lambda > 0$, it can be approximated to arbitrary precision with a continuous, piecewise linear function $\tilde{f}_i$ such that for arbitrary $\varepsilon > 0$ and all $t_i \ge 0$,

|f_i - \tilde{f}_i| / f_i \le \varepsilon,   (17)

with $\tilde{f}_i$ having the form (see, e.g., Figure 2)

\tilde{f}_i(t_i) =
  a_{i,1} t_i + b_{i,1},   0 \le t_i \le t_{i,1}
  a_{i,2} t_i + b_{i,2},   t_{i,1} \le t_i \le t_{i,2}
  \ldots
  a_{i,k_i} t_i + b_{i,k_i},   t_{i,k_i-1} \le t_i < \infty.   (18)

A numerical procedure for computing such an $\tilde{f}_i$ is provided in Section IV.

Fig. 2. Approximation of some $f_i$ with a continuous, piecewise linear function (bold dashed line segments). The approximation is concave over $[0, t_{i,2}]$, $[t_{i,2}, t_{i,3}]$, and so on.

Once a particular $\tilde{f}_i$ is constructed, the constraints on the reward $w_i$ must be updated. To make the explanation clear, we use the $\tilde{f}_i$ from Figure 2 as a concrete example. Starting from $t_i = 0$, we introduce a new continuous variable $t_i^1$ over the first maximally concave segment of $\tilde{f}_i$. In the case of the $\tilde{f}_i$ in Figure 2, the first maximally concave segment contains two line segments, ending at $t_{i,2}$. In this case, we have $0 \le t_i^1 \le t_{i,2}$. To represent the reward obtained over the first maximally concave segment, a continuous variable $w_i^1$ is introduced, which satisfies the constraints

w_i^1 \le a_{i,1} t_i^1 + b_{i,1}, \qquad w_i^1 \le a_{i,2} t_i^1 + b_{i,2}.

Then, for the next maximally concave segment, another continuous variable $t_i^2$ is introduced. In our example, the second maximally concave segment contains one line segment and thus

t_{i,2} \le t_i^2 \le t_{i,3}.   (19)

We need to ensure that $t_i^2$ is active only if $t_i^1$ is maximized. We achieve this through the introduction of an additional binary variable $x_i^2$, which is set to satisfy the constraint

x_i^2 \le t_i^1 / t_{i,2}.

The constraint ensures that $x_i^2 = 1$ only if $t_i^1$ is maximized and takes the value $t_{i,2}$. To avoid potential numerical issues that may prevent $x_i^2 = 1$ from being attained, in practice we may write the constraint as $x_i^2 \le (t_i^1 + \delta) / t_{i,2}$, in which $\delta$ is a small positive real number. We can then activate $t_i^2$ through the constraint

t_i^2 \le x_i^2 (t_{i,3} - t_{i,2}) + t_{i,2},

which also renders the constraint (19) unnecessary. The reward for this second maximally concave segment, $w_i^2$, is then

w_i^2 \le a_{i,3} t_i^2 + b_{i,3} - (a_{i,2} t_{i,2} + b_{i,2}).

After all of $\tilde{f}_i$ is encoded in this way, we combine the individual time and reward variables into $t_i$ and $w_i$ as

t_i = t_i^1 + (t_i^2 - t_{i,2}) + \ldots,   (20)
w_i = w_i^1 + w_i^2 + \ldots.   (21)

We note that the number of additional constraints introduced is proportional to the complexity of $\tilde{f}_i$. We now prove that the overall MIP model constructed in this way allows arbitrary approximations of the original problem.

Theorem 2 Given an RMT instance specified by a 5-tuple $(V, B, D, R, F)$, $M_T > 0$, and a positive real number $\varepsilon$, a $(1+\varepsilon)$-optimal solution of this RMT instance can be computed by solving a mixed integer programming problem, obtained over a $(1 + \varepsilon/2)$ piecewise linear approximation of $F$.

PROOF. Assume that the RMT instance has an optimal solution that has a reward $J_R^*$ and spends $t_1^*, \ldots, t_n^*$ time at the $n$ POI vertices. Let $\tilde{f}_i$ be a piecewise linear $(\varepsilon/2)$-approximation of $f_i$ for $1 \le i \le n$. Assume that the optimal solution to the RMT instance $(V, B, D, R, \tilde{F})$ has a reward $J_R^\dagger$ and spends $t_1^\dagger, \ldots, t_n^\dagger$ time at the $n$ POI vertices. Using the approximation with the form given in (18) and satisfying (17), we have

(1 - \tfrac{\varepsilon}{2}) f_i(t_i) \le \tilde{f}_i(t_i) \le (1 + \tfrac{\varepsilon}{2}) f_i(t_i)   (22)

and

\frac{1}{1 + \varepsilon/2} \tilde{f}_i(t_i) \le f_i(t_i) \le \frac{1}{1 - \varepsilon/2} \tilde{f}_i(t_i).   (23)

Then (by (23)),

J_R^\dagger = \sum_{i=1}^{n} f_i(t_i^\dagger) \le \frac{1}{1 - \varepsilon/2} \sum_{i=1}^{n} \tilde{f}_i(t_i^\dagger),   (24)

in which the summation $\sum_{i=1}^{n} \tilde{f}_i(t_i^\dagger)$ is the reward returned by the MIP algorithm. On the other hand, by (22), we have

\sum_{i=1}^{n} \tilde{f}_i(t_i^*) \le (1 + \tfrac{\varepsilon}{2}) \sum_{i=1}^{n} f_i(t_i^*),   (25)

which implies that

\sum_{i=1}^{n} \tilde{f}_i(t_i^\dagger) \le (1 + \tfrac{\varepsilon}{2}) \sum_{i=1}^{n} f_i(t_i^*).   (26)

To see that (26) holds, assume instead that it is false. Then, by (22) again, we have

(1 + \tfrac{\varepsilon}{2}) \sum_{i=1}^{n} f_i(t_i^*) < \sum_{i=1}^{n} \tilde{f}_i(t_i^\dagger) \le (1 + \tfrac{\varepsilon}{2}) \sum_{i=1}^{n} f_i(t_i^\dagger).

We then have $J_R^* = \sum_{i=1}^{n} f_i(t_i^*) < \sum_{i=1}^{n} f_i(t_i^\dagger)$, a contradiction, since $t_1^\dagger, \ldots, t_n^\dagger$ is also feasible for the original instance (the travel times and the budget $M_T$ are unchanged). Putting (24) and (26) together yields

J_R^\dagger \le \frac{1 + \varepsilon/2}{1 - \varepsilon/2} J_R^* = (1 + \varepsilon + o(\varepsilon)) J_R^*.  □

For the BMT problem, since time is split between traveling and actually staying at POIs, a direct $(1+\varepsilon)$-optimality assurance cannot be established. Nevertheless, for a BMT instance with a reward requirement of $M_R > 0$, assuming that the optimal solution requires $J_T^*$ time, we can guarantee that a reward of at least $(1-\varepsilon) M_R$ is achieved using time no more than $J_T^*$.

Theorem 3 Given a BMT instance specified by a 5-tuple $(V, B, D, R, F)$, $M_R > 0$, and a positive real number $\varepsilon$, let its optimal solution have a required total time of $J_T^*$. Then, an MIP model can be constructed that computes a solution with $J_R \ge (1-\varepsilon) M_R$ and $J_T \le J_T^*$.

PROOF. For simplicity as well as diversity, we use a piecewise linear approximation that is slightly different. Instead of making the piecewise linear function satisfy (17), we use only line segments that are no less than $f_i$. That is, we can construct $\tilde{f}_i$ such that

f_i(t_i) \le \tilde{f}_i(t_i) \le \frac{1}{1-\varepsilon} f_i(t_i).

Suppose that the optimal solution to the original BMT instance spends $t_1^*, \ldots, t_n^*$ time at the $n$ POI vertices. Since $\tilde{f}_i(t_i) \ge f_i(t_i)$ for all $1 \le i \le n$, the approximate MIP model constructed using $\tilde{F}$ instead of $F$ will not need as much time to reach a reward of $M_R$. That is, the approximate model produces a solution with $J_T \le J_T^*$.
Let the time spent at the POI vertices in the solution to the approximate MIP model be $t_1^\dagger, \ldots, t_n^\dagger$; then the actual achieved reward is

J_R = \sum_{i=1}^{n} f_i(t_i^\dagger) \ge (1-\varepsilon) \sum_{i=1}^{n} \tilde{f}_i(t_i^\dagger) \ge (1-\varepsilon) M_R.  □

D. Extensions and Generalizations

Before concluding this section, we briefly mention a few extensions and generalizations of our MIP model. We only cover the RMT problem here; the extension to BMT is straightforward.

Multiple tours: In a sense, the MIP model described so far creates a single-day itinerary, since the plan is a single tour that starts and ends at the same base. However, our MIP model can be easily generalized to allow the planning of trips with multiple tours. There are two possible generalizations with different applications. The first possibility is to force multiple tours to start at the same base, which represents the problem of a tourist staying at the same hotel for multiple days. Given the number of days $m$, we may obtain a more general MIP model by simply creating $m$ copies of the edge-use variables, i.e., the $x_{ij}$'s. For each copy, a separate maximum time constraint (a daily time limit) is imposed. These $m$ copies are then aggregated, i.e., through $\sum_{k=1}^{m} x_{ijk} = x_{ij}$. We also require that the base POI vertices have either 0 or $m$ incoming edges being used. The rest of the MIP model remains essentially the same. The second possibility is applicable to multi-robot surveillance problems, in which all tours are disjoint. That is, each mobile robot covers a disjoint set of POIs to cooperatively collect the maximum amount of reward. Note that this implies $|B| \ge m$. To achieve this, we again create multiple copies of the edge-use variables, enforce the time constraints for each copy, and aggregate the variables. Then, we let the base POI vertices have at most a single incoming edge being used.

Non-cyclic trips: The current MIP model forces a (cyclic) tour to be created. Whereas this may be more applicable to tourists, sometimes it may be beneficial to have non-cyclic routes. For example, using multiple hotels may allow a tourist to significantly increase the potential total reward due to reduced travel time. Alternatively, in a surveillance or monitoring problem, a single-use probe may be sent to follow a one-time, non-cyclic route. To allow this, we may simply remove the constraint that forces both the incoming and the outgoing edge between the origin vertex $o$ and a base vertex to be used. Non-cyclic trips can be directly combined with the multiple-tour generalization.

Variations on learning curves: Although we focus on non-decreasing, $C^1$ continuous learning curves with first order derivatives bounded away from zero, other types of learning curves can also be supported. The only requirement on the $f_i$'s is that they can be approximated arbitrarily well using a piecewise linear, continuous function with a finite number of line segments. In particular, we note that a non-monotone learning curve does not present an issue for the MIP model. Given a general $f_i$ that is non-monotone, we can turn it into a non-decreasing function over which our MIP model can be applied. To do so, starting from $t_i = 0$, we find the first local maximum, say at $t_i = t_{i,1}$, at which point we augment $f_i$ by extending it from $f_i(t_{i,1})$ until it reaches the original $f_i$ at a point $t_i = t_{i,2}$ where $f_i$ starts increasing again. We then repeat the same iterative process starting from $t_i = t_{i,2}$. Such augmentation of $f_i$ is never problematic because our MIP model maximizes the reward using the least amount of time and will never allocate more time at a POI vertex when the reward is less or remains the same.
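The augmentation just described can be computed numerically before the piecewise linear approximation step; on a sampled grid it amounts to taking a running maximum of the original curve. The sketch below assumes that sampling the curve is acceptable for the application.

```python
def nondecreasing_envelope(f, t_max, samples=1000):
    """Turn a possibly non-monotone learning curve into a non-decreasing one:
    after each local maximum the value is held flat until f recovers, which on
    a sampled grid is simply a running maximum."""
    ts = [t_max * k / samples for k in range(samples + 1)]
    values, best = [], 0.0
    for t in ts:
        best = max(best, f(t))
        values.append(best)
    return ts, values
```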
IV. THE ALGORITHM AND ITS ANALYSIS

The overall algorithm construction is outlined in Algorithm 1. In Line 1, the algorithm computes all-pairs shortest paths and their respective lengths using a transitive closure based algorithm, for example the Floyd-Warshall algorithm Floyd (1962); Warshall (1962). Then, in Lines 2-4, the algorithm computes a piecewise linear $(1+\varepsilon/2)$-approximation of each $f_i \in F$, if necessary. Finally, once $D'$ is computed and all of $\tilde{F}$ is built, Lines 5-11 of the algorithm can be carried out according to the steps outlined in Section III.

Algorithm 1: OPTIMALTOURISTITINERARY
Input: $V$, the set of POIs; $B$, the set of bases; $D$, the set of (incomplete) inter-POI distances; $R$, the set of maximum POI rewards; $F$, the set of learning curves; $M_T$ (or $M_R$), the time (or reward) constraint; and $\varepsilon$, the required accuracy.
Output: $J_R^*$, the maximum attainable reward (or $J_T^*$, the minimum required time), and $E_{tr}$, a set of visited edges associated with $J_R^*$ (or $J_T^*$).
   % Compute all pairs of shortest paths between all 1 ≤ i, j ≤ n
1  (P', D') ← FLOYDWARSHALL(V, D)
   % Compute for each f_i ∈ F, 1 ≤ i ≤ n, a piecewise linear (1 + ε/2)-approximation
2  for f_i ∈ F, 1 ≤ i ≤ n do
3      f̃_i ← COMPUTEEPSILONAPPROXIMATION(f_i, ε/2)
4  end
   % Set up the MIP model and optimize it using an MIP solver
5  (V', D') ← VERTEXSPLIT(V, B, D')        % split v ∈ B
6  BUILDMODEL(V', D', R, F̃)                % also builds J_R and J_T
7  if M_T is given then
8      set J_T ≤ M_T and maximize J_R      % maximize reward
9  else
10     set J_R ≥ M_R and minimize J_T      % minimize time
11 end
12 return J_R^* (or J_T^*) and the associated E_tr

In the rest of this section, we cover two important properties of our algorithm.

A. Finite Complexity of Piecewise Linear Approximation

In Section III, we mentioned that a reasonably nice learning curve can be approximated to arbitrary precision using a piecewise linear function, which is not difficult to imagine. However, to encode the approximating piecewise linear function in the MIP model, the function must have finitely many line segments. We now show that the approximation indeed has limited complexity.

Theorem 4 Let $f \in [0,1]$ be a $C^1$ continuous, non-decreasing function with $f(0) = 0$ and $f'(0) \ge \lambda$ for some fixed $\lambda > 0$. For any given $\varepsilon > 0$, there exists a piecewise linear approximation of $f$ containing only a finite number of line segments, denoted $\tilde{f}$, such that

|f(t) - \tilde{f}(t)| / f(t) \le \varepsilon.   (27)

PROOF. At $t = 0$, by the continuity of $f'(t)$, for an arbitrary $\lambda_\varepsilon > 0$ there exists $t_\delta$ such that for all $0 \le t \le t_\delta$,

f'(0) - \lambda_\varepsilon \le f'(t) \le f'(0) + \lambda_\varepsilon.   (28)

Since $f'(0) \ge \lambda$, choosing $\lambda_\varepsilon = \varepsilon \lambda \le \varepsilon f'(0)$, we obtain from (28) that

(1 - \varepsilon) f'(0) \le f'(t) \le (1 + \varepsilon) f'(0).   (29)

Then, since $f(0) = 0$, (29) implies that

(1 - \varepsilon) f'(0)\, t \le f(t) \le (1 + \varepsilon) f'(0)\, t.   (30)

We let the first (leftmost) line segment of the approximation $\tilde{f}$ be simply $f'(0)\, t$ for $0 \le t \le t_\delta =: \tau_1$ (see Figure 3 for a graphical illustration). Then, the second inequality of (30) becomes

f(t) \le (1 + \varepsilon) \tilde{f}(t) \;\Rightarrow\; \frac{1}{1 + \varepsilon} f(t) \le \tilde{f}(t) \;\Rightarrow\; (1 - \varepsilon) f(t) \le \tilde{f}(t),

which implies (27) for $0 \le t \le \tau_1$. The same holds for the first inequality of (30).

Fig. 3. A graphical illustration of the constructive proof for Theorem 4.
For the second line segment, we simply extend from $(\tau_1, f'(0)\tau_1)$ either horizontally (when $f'(0)\tau_1 > f(\tau_1)$) or vertically (when $f'(0)\tau_1 < f(\tau_1)$) until the line segment meets $f$. Let this point on $f(t)$ be $(\tau_2, f(\tau_2))$. The rest of $\tilde{f}$ can then be iteratively defined starting from the point $(\tau_2, f(\tau_2))$. For the third line segment, we let its end point be $(\tau_3, f(\tau_3))$ such that $f(\tau_3) = \min\{1, (1+\varepsilon) f(\tau_2)\}$. Because $f$ is non-decreasing, over $\tau_2 \le t \le \tau_3$ we have $f(\tau_2) \le f(t) \le f(\tau_3) \le (1+\varepsilon) f(\tau_2)$, and the same holds for $\tilde{f}$ over $\tau_2 \le t \le \tau_3$. Therefore, over $\tau_2 \le t \le \tau_3$,

|\tilde{f}(t) - f(t)| \le \varepsilon f(\tau_2) \;\Rightarrow\; \frac{|\tilde{f}(t) - f(t)|}{f(t)} \le \varepsilon \frac{f(\tau_2)}{f(t)} \le \varepsilon.

We can then iteratively define the rest of $\tilde{f}$ similarly. Because each iteration extends the value of $\tilde{f}$ by a factor of $(1+\varepsilon)$ and we start from $f(\tau_2) > 0$, $\tilde{f}$ reaches 1 in a finite number of iterations. □

Remark. We emphasize that the constructive proof of Theorem 4 may yield approximations that are far from the best piecewise linear approximations. On the other hand, practical non-linear learning functions often do not require complex piecewise linear functions to approximate. As an example, when a learning curve from the exponential family is used, e.g., $f_i(t_i) = 1 - e^{-\lambda_i t_i}$, a 1.05-approximation of $f_i$ can be achieved using only four line segments. Since the derivative of $f_i$ can be easily computed in this case, numerically computing the approximation is fairly easy. Moreover, only a one-time computation is required; simple scaling can then extend the computation easily to different learning rates ($\lambda_i$'s) and rewards ($r_i$'s). The original and the approximated curves for the case of $\lambda_i = 1$ are illustrated in Figure 4. It is straightforward to verify that $|\tilde{f}_i - f_i| / f_i < 0.05$.

Fig. 4. The exponential learning curve $f_i$ (for $\lambda_i = 1$) and its four-segment piecewise linear 1.05-approximation $\tilde{f}_i$ with breakpoints (0.10, 0.10), (0.63, 0.49), and (2.01, 0.91). The different $k$'s (1.00, 0.74, 0.30, 0.01) indicate the slopes of the corresponding line segments.
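As a companion to Theorem 4 and the remark above, the sketch below instantiates the constructive procedure for the exponential curve $f(t) = 1 - e^{-\lambda t}$ over a finite horizon and measures the resulting relative error numerically. Consistent with the remark, the construction is valid but far from economical: for ε = 0.05 it produces dozens of segments, whereas the hand-tuned approximation of Figure 4 needs only four.

```python
import math

def pwl_exponential(lam=1.0, eps=0.05, t_max=3.0):
    """Breakpoints (t, value) of a piecewise linear approximation of
    f(t) = 1 - exp(-lam * t), built as in the constructive proof of Theorem 4."""
    f = lambda t: 1.0 - math.exp(-lam * t)
    # First segment: the tangent lam * t, kept while f'(t) >= (1 - eps) f'(0).
    tau1 = -math.log(1.0 - eps) / lam
    pts = [(0.0, 0.0), (tau1, lam * tau1)]
    # Second segment: extend horizontally until the line meets f again.
    tau2 = -math.log(1.0 - lam * tau1) / lam
    pts.append((tau2, f(tau2)))
    # Remaining segments: each chord raises the value by a factor of (1 + eps).
    t_k = tau2
    while (1.0 + eps) * f(t_k) < f(t_max):
        t_k = -math.log(1.0 - (1.0 + eps) * f(t_k)) / lam
        pts.append((t_k, f(t_k)))
    pts.append((t_max, f(t_max)))
    return pts

def max_relative_error(pts, lam=1.0, samples=2000):
    """Measured max of |f~ - f| / f on (0, t_max] for the interpolant of pts."""
    f = lambda t: 1.0 - math.exp(-lam * t)
    t_max, worst = pts[-1][0], 0.0
    for k in range(1, samples + 1):
        t = t_max * k / samples
        for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
            if t0 <= t <= t1:
                y = y0 if t1 == t0 else y0 + (y1 - y0) * (t - t0) / (t1 - t0)
                worst = max(worst, abs(y - f(t)) / f(t))
                break
    return worst

pts = pwl_exponential()
print(len(pts) - 1, "segments, measured max relative error",
      round(max_relative_error(pts), 4))
```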
For our particular problems, the anytime property is quite useful since computing the true optimal solution to the (potentially approximate) MIP model for RMT and BMT can be very time consuming. We will see in Section V that for medium sized problems, a 1.2-optimal solution, which is fairly good for practical purposes, can often be computed quickly. V. COMPUTATIONAL EXPERIMENTS In this section, we evaluate our proposed algorithm in several computational experiments. In these experiments, we look at the solution structure, computational performance, and an application to planning a day tour of Istanbul. The simulation is implemented in the Java programming language. For the MIP solver, Gurobi Gurobi Optimization (2014) is used. Our computational experiments were carried out on an Intel Core-i7 3930K PC with 64GB of memory. A. Anytime Solution Structure Our first set of experiments was performed over a randomly generated example, created in the following way. The example contains 30 uniformly randomly distributed POIs in a 10×15 rectangle (see Figure 5). Each POI vi is associated with a λi ∈[1,2) and an ri ∈[1,2) that were both uniformly randomly selected. The λi’s and ri’s are selected not to vary by much because we expect that in practice, this will present a more difficult choice for a tourist or a mobile robot. For fi, both linear (e.g., with the form (1)) and exponential (e.g., with the form (2)) types were used, with the learning rates specified by the λi’s. We set ε = 0.05 when we approximate the non-linear fi’s with piece-wise linear functions (that is, we use the linear approximation illustrated in Figure 4 with proper scaling). Note that ε = 0.05 yields a 1.1- optimal MIP model for exponential fi’s. These steps determine the sets V, R, F, eF. We let B to be the set {v1,v9,v17,v25}. For deciding E and D, we let there be an edge between two POI vertices vi,vj if the Euclidean distance between them is no more than 10. Finally, the constraints were set as follows. For RMT, MT = 50 for both linear and exponential fi’s. For BMT, MR = 30.55 for linear fi’s and MR = 25.78 for exponential fi’s. These MR’s were selected because they are the optimal JR value for the respective RMT problems with MT = 50. For each problem instance, we extract the solution after the gap becomes no more than 100%, 50%, 20%, 10%, 5%, 1%, and 0%. These solutions for the RMT instance with linear learning curves are illustrated in Figure 5. Because the large number of POIs involved, we do not list the computed ti’s but point out that, in the linear case, when the set of POIs for staying is selected, it is always beneficial to exhaust the reward at POIs with the largest learning rate since time is best used this way. The computation of these five solutions took 0.96,1.05,2.16,3.70, and 10.2 seconds, respectively. Confirming that the last solution (Figure 5(e)) is indeed the optimal solution took 76 seconds. For the BMT instance with MR = 30.55, we similarly plot the solutions at different accuracies in Figure 6. Note that the optimal solution (Figure 6(e)) yields the same tour as the optimal solution to the corresponding RMT problem (Figure 5(e)). We note that JT is actually smaller than 50 in this case, suggesting MT = 50 is not necessary to reach a reward of JR = 30.55. The computation of these five solutions took 0.48,0.60,3.13,6.19, and 28.60 seconds, respectively. Confirming that the last solution is indeed the optimal solution took 50 seconds. For exponential learning curves, similar results were obtained. 
B. Computational Performance

Since the models for RMT and BMT attempt to solve an NP-hard problem precisely (note that the problem after linearization remains NP-hard), no polynomial time algorithm exists unless P = NP. Therefore, our evaluation of the algorithm's computational performance is limited to an empirical one. For this, two large sets of computations were performed. In the first set of computations, rectangular grids of various sizes were constructed. The POIs reside on the lattice points of these grids, with the rewards and learning rates selected uniformly at random from [1,2). Vertices $n/3$ and $2n/3$ are selected as base vertices. For each choice of grid size, 10 example problems are created. For the RMT instances, a time budget of 1.5 times the grid perimeter is used. For the BMT instances, a reward requirement of 0.6 times the grid perimeter is used. These constraints are chosen to allow the tour to go through 10% to 25% of the total POIs. For both RMT and BMT instances, we perform computations with both linear and exponential learning curves (with 5% linearization). The average time, in seconds, required to compute a solution up to a given accuracy is listed in Table I. The numbers in parentheses denote the number of times, out of a total of ten, that the computation completed within a limit of 900 seconds.

Our second set of computations generates the POI locations uniformly at random, according to the same rules used in Section V-A, in a $|V| \times 1.2|V|$ rectangle. Then, for RMT instances, a time budget of $4\sqrt{|V|}$ is used. For BMT instances, a reward requirement of $2\sqrt{|V|}$ is used. The rest of the setup is done as in the rectangular grid case. The computational performance is listed in Table II.

From the computational experiments, we observe that in the grid case, for up to 200 POIs, the proposed method can compute a 1.2-optimal (corresponding to a 20% gap) MIP solution for almost all instances (199 out of 200 instances) under very reasonable computation time. Moreover, for up to 80 POIs, the method can compute a 1.05-optimal MIP solution for almost all instances (158 out of 160 instances). When the POIs are selected randomly, the computation seems to be more challenging. Computing 1.2-optimal MIP solutions starts to become challenging when there are more than 40 POIs. The difficulty seems to come from the fact that randomly selected POIs can potentially be packed more densely in certain local regions.
Nevertheless, we were still able to compute 1.5-optimal MIP solutions in most of the cases when there are 100 POIs. Overall, the two large sets of computations suggest that our algorithm can be used to do itinerary planning for practical-sized instances in large cities. C. Planning a One-Day Istanbul Tour As a last computational example, we illustrate how one may use real data to compute a day tour of Is- tanbul over 20 POIs.1 These 20 POIs are selected by taking the top-ranked attractions from TripAdvisor’s2 city guide for Istanbul. We select the top 20 POIs that are not general areas and have at least 300 user reviews. These POIs are (the ordering is by the POI’s rank): 1. Suleymaniye Mosque, 2. Rahmi M. Koc Museum, 3. Rustem Pasha Mosque, 4. Hagia Sophia Museum, 5. Kariye Museum, 6. Basilica Cistern, 7. Bosphorus Strait, 8. Blue Mosque, 9. Rumeli Fortress, 10. Eyup Sultan Mosque, 11. Kucuk Ayasofya Camii, 12. Topkapi Palace, 13. Miniaturk, 14. Istanbul Archaeological Museums, 15. Gulhane Park, 16. Istanbul Modern Museum, 17. New Mosque, 18. Dolmabahce Palace, 19. The Bosphorus Bridge, and 20. Galata Tower. 1We intentionally limited the size and complexity of this example to provide all important details. 2http://www.tripadvisor.com 16 TABLE I COMPUTATION TIME FOR SOLVING RMT AND BMT OVER POIS LOCATED AT THE LATTICE POINTS ON VARIOUS SIZED INTEGER GRIDS. grid size problem learning curve MIP gap 100% 50% 20% 10% 5% 1% 0% 4×5 RMT linear 0.085s (10) 0.135s (10) 0.203s (10) 0.261s (10) 0.675s (10) 2.285s (10) 2.357s (10) BMT linear 0.070s (10) 0.108s (10) 0.271s (10) 0.571s (10) 0.974s (10) 1.101s (10) 1.102s (10) RMT exponential 0.149s (10) 0.171s (10) 0.240s (10) 0.388s (10) 0.471s (10) 1.293s (10) 1.343s (10) BMT exponential 0.061s (10) 0.090s (10) 0.174s (10) 0.364s (10) 0.505s (10) 0.605s (10) 0.608s (10) 5×6 RMT linear 0.309s (10) 0.342s (10) 0.439s (10) 0.531s (10) 1.561s (10) 17.00s (10) 18.66s (10) BMT linear 0.191s (10) 0.250s (10) 0.868s (10) 2.038s (10) 5.580s (10) 8.038s (10) 8.080s (10) RMT exponential 0.361s (10) 0.395s (10) 0.522s (10) 0.814s (10) 1.225s (10) 11.31s (10) 12.41s (10) BMT exponential 0.147s (10) 0.194s (10) 0.586s (10) 1.803s (10) 5.710s (10) 9.383s (10) 9.483s (10) 6×7 RMT linear 0.683s (10) 0.687s (10) 0.816s (10) 1.009s (10) 5.790s (10) 161.3s (7) 209.8s (7) BMT linear 0.501s (10) 0.514s (10) 6.308s (10) 31.76s (10) 79.22s (10) 127.9s (10) 129.0s (10) RMT exponential 0.870s (10) 0.914s (10) 1.784s (10) 5.268s (10) 17.91s (10) 182.6s (8) 234.6s (8) BMT exponential 0.701s (10) 0.715s (10) 3.718s (10) 11.37s (10) 78.40s (10) 79.43s (8) 80.87s (8) 8×10 RMT linear 2.272s (10) 2.443s (10) 2.953s (10) 21.58s (10) 87.13s (10) 454.6s (3) 809.0s (1) BMT linear 2.188s (10) 2.382s (10) 3.111s (10) 20.75s (10) 134.1s (9) 284.2s (6) 295.2s (6) RMT exponential 2.134s (10) 2.345s (10) 5.664s (10) 22.75s (10) 67.28s (10) 342.5s (4) 498.5s (3) BMT exponential 2.530s (10) 2.849s (10) 20.64s (10) 79.32s (10) 274.6s (9) 492.9s (6) 524.7s (6) 10×20 RMT linear 17.31s (10) 17.31s (10) 18.96s (10) 98.66s (10) 433.9s (7) N/A N/A BMT linear 43.28s (10) 48.84s (10) 93.40s (10) 241.9s (9) 346.8s (4) N/A N/A RMT exponential 17.33s (10) 26.87s (10) 48.06s (9) 59.64s (5) 317.3s (1) N/A N/A BMT exponential 37.17s (10) 44.29s (10) 241.3s (10) 424.2s (6) 435.4s (1) N/A N/A After the POIs are selected, we compute the maximum reward of these POIs using the formula 3√nreview+ 10−rank/5, in which nreview is the total number of reviews received for the POI on TripAdvisor and rank is the POI’s rank 
on TripAdvisor. The attractions are mostly museums and architectural sites, to which we assign the learning rates of 1−0.01ri, i.e., we expect a tourist to spend more time at more renowned POIs . Using Google Map3, we extracted the pair-wise distances between any two of these POIs and build the sets E and D. The base vertex set is selected to contain the 1st, 6th, 11th, and 16th ranked POIs. With these parameters, we solve the RMT problem with exponential learning curves and a time budget of 9 hours. From the solution (an exact solution to the 1.1-optimal MIP model, computed in about five seconds) we extracted the itinerary listed in Table III. The itinerary visits 14 POIs and yields a reward of 115 out of a total possible reward of 380. A visual inspection of the itinerary suggests that it is a fairly reasonable solution to our proposed problem. During a recent trip to Istanbul for the WAFR 2014 conference, due to a tight schedule, some of us 3http://maps.google.com. 17 TABLE II COMPUTATION TIME FOR SOLVING RMT AND BMT OVER POIS THAT ARE UNIFORMLY RANDOMLY SELECTED. # of samples problem learning curve MIP gap 100% 50% 20% 10% 5% 1% 0% 20 RMT linear 0.118s (10) 0.236s (10) 0.730s (10) 2.645s (10) 4.832s (10) 5.887s (10) 5.92 s (10) BMT linear 0.049s (10) 0.112s (10) 1.183s (10) 1.727s (10) 1.908s (10) 1.962s (10) 1.966s (10) RMT exponential 0.274s (10) 0.379s (10) 3.780s (10) 6.499s (10) 13.38s (10) 17.49s (10) 17.57s (10) BMT exponential 0.071s (10) 0.166s (10) 2.531s (10) 5.731s (10) 6.946s (10) 7.400s (10) 7.424s (10) 30 RMT linear 0.435s (10) 1.122s (10) 15.41s (10) 74.97s (10) 228.1s (9) 81.90s (7) 82.95s (7) BMT linear 0.289s (10) 0.821s (10) 18.23s (10) 54.56s (10) 76.39s (10) 81.34s (10) 81.58s (10) RMT exponential 1.221s (10) 2.664s (10) 17.44s (10) 81.39s (10) 168.4s (9) 159.2s (8) 161.8s (8) BMT exponential 0.339s (10) 1.243s (10) 8.676s (10) 24.96s (10) 41.74s (10) 47.93s (10) 48.15s (10) 42 RMT linear 6.058s (10) 13.73s (10) 49.33s (8) 164.5s (6) 262.9s (3) 170.2s (1) 187.8s (1) BMT linear 0.612s (10) 1.513s (10) 142.2s (9) 107.0s (7) 141.2s (7) 148.9s (7) 149.1s (7) RMT exponential 14.13s (10) 24.80s (10) 93.65s (10) 132.2s (6) 288.2s (4) 375.9s (3) 381.1s (3) BMT exponential 1.711s (10) 7.759s (10) 179.4s (7) 195.4s (4) 279.6s (3) 362.7s (3) 365.0s (3) 100 RMT linear 54.26s (10) 57.26s (10) 439.3s (8) N/A N/A N/A N/A BMT linear 19.98s (10) 105.5s (10) N/A N/A N/A N/A N/A RMT exponential 59.45s (10) 125.3s (9) 309.0s (5) 790.8s (1) N/A N/A N/A BMT exponential 12.03s (10) 251.1s (10) 577.3s (1) N/A N/A N/A N/A 200 RMT linear 40.26s (10) 255.0s (8) N/A N/A N/A N/A N/A BMT linear 170.7s (10) 185.1s (9) N/A N/A N/A N/A N/A RMT exponential 34.50s (10) 229.5s (8) N/A N/A N/A N/A N/A BMT exponential 223.4s (10) 494.3s (3) N/A N/A N/A N/A N/A only had a few hours to visit local attractions. In the end, we visited the Hagia Sophia Museum, the Blue Mosque, and the Basilica Cistern. It turns out that, when we run the RMT algorithm with three hours of budget, this is the exact itinerary returned by the algorithm (see Table IV). VI. CONCLUSION In this paper, we proposed the Optimal Tourist Problem (OTP) that tie together the problem of maximizing information collection efforts at point-of-interests (POIs) and minimizing the required time spent on traveling between the set of discrete, distributed POIs. A particular novelty is that our formulation encompasses a general class of time-based reward functions. 
For solving the two variants of OTP, RMT and BMT, we construct an exact (when reward function is linear) or an arbitrarily optimal (when reward function is non-linear) MIP model that gives rise to an anytime algorithm for solving such problems. Computational results suggest that our algorithm is applicable to practical-sized itinerary planning or informative path planning problems and generates fairly sensible plans. 18 TABLE III A 9-HOUR COMPUTED ITINERARY IN ISTANBUL. 1 Start from the Suleymaniye Mosque, stay for 0.84 hour 2 Take a taxi to Topkapi Palace (8 min), stay for 0.88 hour 3 Take a taxi to Kucuk Ayasofya Camii (6 min), stay for 0.14 hour 4 Walk to Blue Mosque (6 min), stay for 0.90 hour 5 Walk to Basilica Cistern (4 min), stay for 0.90 hour 6 Walk to Hagia Sophia Museum (4 min), stay for 0.93 hour 7 Walk to Gulhane Park (4 min), stay for 0.11 hour 8 Walk to Archaeological Museums (2 min), stay for 0.78 hour 9 Take a taxi to Rustem Pasha Mosque (6 min), stay for 0.78 hour 10 Take a taxi to Rahmi M. Koc Museum (9 min), stay for 0.76 hour 11 Take a taxi to Kariye Museum (8 min), stay for 0.82 hour 12 Take a taxi and return to Suleymaniye Mosque (12 min) TABLE IV A 3-HOUR COMPUTED ITINERARY IN ISTANBUL. 1 Start at Basilica Cistern, stay for 0.90 hour 2 Walk to Hagia Sophia Museum (4 min), stay for 0.93 hour 3 Walk to Blue Mosque (8 min), stay for 0.89 hour 4 Walk back to Basilica Cistern (4 min) REFERENCES R. N. Smith, M. Schwager, S. L. Smith, B. H. Jones, D. Rus, and G. S. Sukhatme, “Persistent ocean monitoring with underwater gliders: Adapting sampling resolution,” Journal of Field Robotics, vol. 28, no. 5, pp. 714–741, Sep-Oct 2011. B. Grocholsky, J. Keller, V. Kumar, and G. Pappas, “Cooperative air and ground surveillance,” IEEE Robotics and Automation Magazine, vol. 13, no. 3, pp. 16–25, Sep 2006. S. Alamdari, E. Fata, and S. L. Smith, “Persistent monitoring in discrete environments: Minimizing the maximum weighted latency between observations,” The International Journal of Robotics Research, vol. 33, no. 1, pp. 138–154, 2014. S. L. Smith, M. Schwager, and D. Rus, “Persistent robotic tasks: Monitoring and sweeping in changing environments,” IEEE Transactions on Robotics, vol. 28, no. 2, pp. 410–426, April 2012. J. Yu, S. Karaman, and D. Rus, “Persistent monitoring of events with stochastic arrivals at multiple stations,” in Proceedings IEEE International Conference on Robotics & Automation, 2014, preliminary extended journal version available at http://arxiv.org/abs/1309.6041. Z. W. Lim, D. Hsu, and W. S. Lee, “Adaptive informative path planning in metric spaces,” in Proceedings Workshop on Algorithmic Foundations of Robotics, 2014. L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars, “Probabilistic roadmaps for path planning in high-dimensional configuration spaces,” IEEE Transactions on Robotics & Automation, vol. 12, no. 4, pp. 566–580, Jun. 1996. S. M. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” Iowa State University, Tech. Rep., Oct 1998, computer Science Department TR 98-11. S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, June 2011. [Online]. Available: http://ares.lids.mit.edu/papers/Karaman.Frazzoli.IJRR11.pdf G. A. Hollinger and G. S. Sukhatme, “Sampling-based motion planning for robotic information gathering,” in Robotics: Science and Systems, 2013. X. Lan and M. 
Schwager, “Planning periodic persistent monitoring trajectories for sensing robots in gaussian random fields,” in Proceedings IEEE International Conference on Robotics & Automation, May 2013, pp. 2407–2412. I. Chao, B. Golden, and E. Wasil, “Theory and methodology - the team orienteering problem,” European Journal of Operational Research, vol. 88, pp. 464–474, 1996. P. Vansteenwegen, W. Souffriau, and D. V. Oudheusden, “The orienteering problem: A survey,” European Journal of Operational Research, vol. 209, pp. 1–10, 2011. D. Gavalas, C. Konstantopoulos, J. Mastakas, and G. Pantziou, “A survey on algorithmic approaches for solving tourist trip design problems,” Journal of Heuristics, vol. 20, no. 3, pp. 291–328, 2014. C. Chekuri, N. Korula, and M. P´al, “Improved algorithms for orienteering and related problems,” ACM Transactions on Algorithms (TALG), vol. 8, no. 3, p. 23, 2012. G. Erdoˇgan and G. Laporte, “The orienteering problem with variable profits,” Networks, vol. 61, no. 2, pp. 104–116, 2013. M. De Choudhury, M. Feldman, S. Amer-Yahia, N. Golbandi, R. Lempel, and C. Yu, “Automatic construction of travel itineraries using social breadcrumbs,” in Proceedings of the 21st ACM conference on Hypertext and hypermedia. ACM, 2010, pp. 35–44. S. Basu Roy, G. Das, S. Amer-Yahia, and C. Yu, “Interactive itinerary planning,” in Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011, pp. 15–26. 19 H. Yoon, Y. Zheng, X. Xie, and W. Woo, “Social itinerary recommendation from user-generated digital trails,” Personal and Ubiquitous Computing, vol. 16, no. 5, pp. 469–484, 2012. C. Chekuri and M. P´al, “A recursive greedy algorithm for walks in directed graphs,” in Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on. IEEE, 2005, pp. 245–253. R. W. Floyd, “Algorithm 97: shortest path,” Communications of the ACM, vol. 5, no. 6, p. 345, 1962. S. Warshall, “A theorem on boolean matrices,” Journal of the ACM (JACM), vol. 9, no. 1, pp. 11–12, 1962. A. H. Land and A. G. Doig, “An automatic method of solving discrete programming problems,” Econometrica, vol. 28, no. 3, pp. 497–520, 1960. I. Gurobi Optimization, “Gurobi optimizer reference manual,” 2014. [Online]. Available: http://www.gurobi.com