Supplementary Materials for Dynamic Service Migration in Mobile Edge-Clouds

Shiqiang Wang*, Rahul Urgaonkar†, Murtaza Zafer‡¶, Ting He†, Kevin Chan§, and Kin K. Leung*

* Imperial College London, United Kingdom, Email: {shiqiang.wang11, kin.leung}@imperial.ac.uk
† IBM T. J. Watson Research Center, Yorktown Heights, NY, United States, Email: {rurgaon, the}@us.ibm.com
‡ Nyansa Inc., Palo Alto, CA, United States, Email: [email protected]
§ Army Research Laboratory, Adelphi, MD, United States, Email: [email protected]
¶ Contributions of the author to this work are not related to his current employment at Nyansa Inc.

I. INTRODUCTION

Supplementary materials for [1] are provided in this report. The variables, notations, and numberings (for equations, propositions, etc., except for sections and references) mentioned in this report refer to those in [1]. Numberings newly introduced in this report all start with the prefix "S".

This report is organized as follows. Section II outlines the difference between service migration and cellular handover. The main notations in [1] are summarized in Section III. Section IV provides some additional discussions on the simulation results in Section IV-D of [1]. The proofs are given in Section V, and a method to approximate a general cost function with a constant-plus-exponential cost function is discussed in Section VI.

II. RELATIONSHIP WITH CELLULAR HANDOVER

A related area of work that is also relevant to user mobility is handover policies in the context of cellular networks. However, the notion of service migration is quite different from cellular handover. Handover is usually decided by the signal strengths from different basestations, and a handover must occur if a user is no longer covered by its original basestation. In the service migration context, a user may continue receiving service from an MEC even if it is no longer associated with that basestation, because the user can exchange data with the remote MEC via its currently associated basestation. As a result, the service for a particular user can potentially be placed on any MEC, and the service migration problem has a much larger decision space.

III. SUMMARY OF NOTATIONS

The main notations in [1] are summarized in Table S.1.

Table S.1
SUMMARY OF MAIN NOTATIONS IN [1]

L : Set of locations
l : Location
||l1 − l2|| : Distance between locations l1 and l2
u(t) : User location at timeslot t
h(t) : Service location at timeslot t
cm(x) : Migration cost
cd(x) : Transmission cost
s(t) : Initial state at slot t
s′(t) : Intermediate state at slot t
π : Decision policy
a(s) : Action taken when the system is in state s(t)
Ca(s) : Sum of migration and transmission costs when taking action a(s) in slot t
V(s0) : Discounted sum cost when starting at state s0
P_{s′0, s1} : Transition probability from intermediate state s′0 to initial state s1 (where s1 is in the next slot)
γ : Discount factor of the MDP
d(t) : User-service distance in slot t before possible migration (state in the distance-based MDP)
N : Number of states (excluding state zero) in the distance-based MDP
p0, p, q : Transition probabilities of the distance-based MDP (see Fig. 3)
βc, βl, δc, δl, µ, θ : Parameters of the constant-plus-exponential cost functions (see (4) and (5))
Ak, Bk, D, H, m1, m2 : Parameters of the closed-form solution of the distance-based MDP (see (7)–(10))
{nk : k ≥ 0} : Series of migration states (i.e., all nk such that a(nk) ≠ nk)
r : Transition probability to one of the neighbors in the 2-D model
e(t) : Offset of the user from the service as a 2-D vector (state in the 2-D MDP)

IV. ADDITIONAL DISCUSSIONS ON SIMULATION RESULTS IN [1, SECTION IV-D]

A. Reduction in Computation Time

The computation time reduction is explained as follows. As discussed in [1, Section IV-B], for a distance-based MDP with N states (excluding state zero), the 2-D MDP has M = 3N^2 + 3N states (excluding state (0, 0)). When we ignore the complexity of matrix inversion in the policy iteration procedure, the standard value and policy iteration approaches on the 2-D MDP have a complexity of O(M^2) in each iteration, because the optimal action needs to be found for each state, which requires enumerating through all the states and all possible actions for each state (similarly as in Lines 18–20 of Algorithm 1). In the simulations, N = 10, so we have M = 330. Recall that the complexity of Algorithm 1 used in the proposed approach is O(N^2), so the ratio of the computational complexities of the two approaches can be approximated by M^2/N^2 ≈ 10^3. Therefore, the standard value and policy iteration approaches consume about 10^3 times more computation time than the proposed approach.
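As a quick check of the arithmetic above, the short Python sketch below reproduces M = 330 and the roughly 10^3 complexity ratio. It only restates the numbers used in this subsection and is not part of the simulation code of [1].

# Sanity check of the state-space sizes and complexity ratio discussed above.
N = 10                      # states (excluding state zero) in the distance-based MDP
M = 3 * N**2 + 3 * N        # states (excluding (0, 0)) in the 2-D MDP

work_2d = M**2              # O(M^2) per iteration for standard value/policy iteration on the 2-D MDP
work_dist = N**2            # O(N^2) per iteration for Algorithm 1 on the distance-based MDP

print(M)                    # 330
print(work_2d / work_dist)  # 1089.0, i.e., roughly 10^3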
B. Observations on Cost Value Characteristics

The cost from the optimal policy approaches the cost from the always-migrate policy when −βl is small, and it approaches the cost from the never-migrate policy when −βl is large. This is because, when −βl is small, the migration cost is relatively small and migration can be beneficial in most cases; when −βl is large, the migration cost is large and it is better not to migrate in most cases.

The myopic policy is the same as the never-migrate policy when −βl ≥ 0.5, because cm(x) and cd(x) are both concave according to the simulation settings and we always have cm(x) ≥ cd(x) when −βl ≥ 0.5. It is also because the myopic policy does not consider the future impact of actions.

There is an intersection of the costs from the never-migrate and always-migrate policies. When −βl takes values larger than the value at the intersection point, the optimal cost is close to the cost from the never-migrate policy when γ is small, and the gap becomes larger with a larger γ. The reason is that the impact of migration is more obvious when we look further ahead into the future.
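To make the crossover behavior described above concrete, the following minimal Monte Carlo sketch compares the discounted sums of costs of the never-migrate and always-migrate policies on a one-dimensional distance random walk. All numerical values (p0, p, q, γ, the cost parameters, the horizon, and the number of runs) are illustrative assumptions chosen for this sketch, not the simulation settings of [1]; the point is only that always-migrate is cheaper for small −βl while never-migrate is cheaper for large −βl.

# Monte Carlo sketch: never-migrate vs. always-migrate under a constant-plus-exponential
# cost model. All parameter values below are illustrative assumptions, not the settings of [1].
import random

p0, p, q = 0.6, 0.25, 0.15   # assumed transition probabilities of the distance-based MDP
gamma = 0.9                  # discount factor
N = 10                       # largest distance considered in this sketch

def c_d(x):                  # assumed transmission cost, increasing and concave in x
    return 0.0 if x == 0 else 1.0 - 0.9 ** x

def c_m(x, beta_l):          # assumed migration cost with beta_c = -beta_l, so c_m(0) = 0
    return 0.0 if x == 0 else -beta_l + beta_l * (0.9 ** x)

def step(d):                 # one slot of the user-service distance random walk
    u = random.random()
    if d == 0:
        return 1 if u < p0 else 0
    if u < p:
        return min(d + 1, N)
    if u < p + q:
        return d - 1
    return d

def discounted_cost(always_migrate, beta_l, horizon=200, runs=2000):
    total = 0.0
    for _ in range(runs):
        d, cost = 0, 0.0
        for t in range(horizon):
            if always_migrate:
                cost += (gamma ** t) * (c_m(d, beta_l) + c_d(0))
                d = 0                      # service is moved to the user location
            else:
                cost += (gamma ** t) * c_d(d)
            d = step(d)
        total += cost
    return total / runs

for beta_l in (-0.1, -1.0, -5.0):          # increasingly expensive migration
    print(-beta_l, discounted_cost(False, beta_l), discounted_cost(True, beta_l))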
V. PROOFS

A. Proof of Lemma 1

Assume that we are given a service migration policy π under which the service can be migrated to a location that is farther away from the user. We will show that, for an arbitrary sample path of the user, we can find a (possibly history dependent) policy ψ that never migrates to locations farther away from the user and performs not worse than policy π.

In the following, we consider an arbitrary sample path of the user. For this sample path, denote by t0 the first timeslot (starting from t = 0) in which the service is migrated to somewhere farther away from the user when following policy π. The initial state at timeslot t0 is denoted by s(t0) = (u(t0), h(t0)). When following π, the state shifts from s(t0) to s′π(t0) = (u(t0), h′π(t0)), where ||u(t0) − h′π(t0)|| > ||u(t0) − h(t0)||. The subsequent states in this case are denoted by sπ(t) = (u(t), hπ(t)) for t > t0.

Now, we define a policy ψ such that the following conditions are satisfied for the given sample path of the user:
• The migration actions in timeslots t < t0 are the same when following either ψ or π.
• The policy ψ specifies that there is no migration within the timeslots t ∈ [t0, tm − 1], where tm > t0 is a timeslot index defined below.
• The policy ψ is defined such that, at timeslot tm, the service is migrated to h′π(tm), where h′π(tm) is the service location (after possible migration at timeslot tm) when following π.
• For timeslots t > tm, the migration actions for policies ψ and π are the same.

For t > t0, the states when following ψ are denoted by sψ(t) = (u(t), hψ(t)). The timeslot tm is defined as the first timeslot after t0 such that the following condition is satisfied:
1) ||u(tm) − hψ(tm)|| > ||u(tm) − h′π(tm)||, i.e., the transmission cost (before migration at timeslot tm) when following ψ is larger than the transmission cost (after possible migration at timeslot tm) when following π.

Accordingly, within the interval [t0 + 1, tm − 1], the transmission cost when following ψ is always less than or equal to the transmission cost when following π, and there is no migration when following ψ. Therefore, for timeslots t ∈ [t0, tm − 1], policy π cannot bring lower cost than policy ψ. In timeslot tm, we can always choose a migration action for ψ whose migration cost is smaller than or equal to the sum of the migration costs of π within [t0, tm], because ψ can migrate (in timeslot tm) along the same migration path as π within [t0, tm]. It follows that, for timeslots within [t0, tm], policy π cannot perform better than policy ψ, and both policies have the same costs for timeslots within [0, t0 − 1] and [tm + 1, ∞).

The above procedure can be repeated so that all migration actions to a location farther away from the user are removed without increasing the overall cost; thus, we can redefine ψ to be a policy that removes all such migrations. We note that the policy ψ can be constructed without prior knowledge of the user's future locations. For any policy π, the policy ψ is a policy that does not migrate whenever π migrates to a location farther away from the user (corresponding to timeslot t0), and then migrates to h′π(tm) when condition 1 is satisfied (corresponding to timeslot tm).

The policy ψ is a history dependent policy, because its actions depend on the past actions of the underlying policy π. From [2, Section 6], we know that history dependent policies cannot outperform Markovian policies for our problem (note that the optimal policies found from (2) and (3) are Markovian policies). Therefore, there exists a Markovian policy that does not migrate to a location farther away from the user and does not perform worse than π. Thus, we have proved the lemma.

B. Proof of Corollary 1

The proof follows the same procedure as the proof of Lemma 1. For any given policy π that migrates to locations other than the user location, we show that there exists a policy ψ that does not perform such migrations and performs not worse than the original policy π. The difference is that t0 is defined as the first timeslot such that u(t0) ≠ h′π(t0), and tm is defined as the first timeslot after t0 such that u(tm) = h′π(tm). Due to the strict inequality relationship of the cost functions given in the corollary, and because 0 < γ < 1, we can conclude that π is not optimal.
C. Proof of Proposition 1

For a given policy, the discounted sum cost V(d) for 0 ≤ d ≤ N can be computed from a set of difference equations [3]. For 0 < d < n0 or nk−1 < d < nk, we have a(d) = d according to (6), and thus

  V(d) = δc + δl θ^d + γ Σ_{d1 = d−1}^{d+1} P_{d,d1} V(d1)    (S.1)

We can rewrite (S.1) as

  V(d) = φ1 V(d − 1) + φ2 V(d + 1) + φ3 + φ4 θ^d    (S.2)

where φ1 ≜ γq / (1 − γ(1 − p − q)), φ2 ≜ γp / (1 − γ(1 − p − q)), φ3 ≜ δc / (1 − γ(1 − p − q)), and φ4 ≜ δl / (1 − γ(1 − p − q)). This difference equation has characteristic roots

  m1 = (1 − √(1 − 4φ1φ2)) / (2φ2),  m2 = (1 + √(1 − 4φ1φ2)) / (2φ2)    (S.3)

When 0 < γ < 1, we always have m1 ≠ m2. The homogeneous version of (S.2) has the general solution

  Vh(d) = Ak m1^d + Bk m2^d    (S.4)

To solve the non-homogeneous equation (S.2), we try a particular solution of the form

  Vp(d) = D + H θ^d      if 1 − φ1/θ − φ2 θ ≠ 0
  Vp(d) = D + H d θ^d    if 1 − φ1/θ − φ2 θ = 0    (S.5)

where D and H are constant coefficients. By substituting (S.5) into (S.2), we get

  D = φ3 / (1 − φ1 − φ2)    (S.6)

  H = φ4 / (1 − φ1/θ − φ2 θ)    if 1 − φ1/θ − φ2 θ ≠ 0
  H = φ4 / (φ1/θ − φ2 θ)        if 1 − φ1/θ − φ2 θ = 0    (S.7)

Therefore, V(d) can be expressed as (7).
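As a quick numerical check of (S.6) and (S.7), the following sketch (with arbitrarily chosen values of p, q, γ, δc, δl, and θ, none of which come from [1]) verifies that the particular solution Vp(d) = D + H θ^d indeed satisfies the recursion (S.2); it only illustrates the algebra above.

# Numerical check, with arbitrarily chosen parameter values, that the particular solution
# V_p(d) = D + H * theta**d with D from (S.6) and H from (S.7) satisfies the difference
# equation (S.2).
p, q, gamma = 0.3, 0.2, 0.9                      # assumed transition probabilities and discount factor
delta_c, delta_l, theta = 1.0, -1.0, 0.8         # assumed transmission cost parameters

denom = 1.0 - gamma * (1.0 - p - q)
phi1, phi2 = gamma * q / denom, gamma * p / denom
phi3, phi4 = delta_c / denom, delta_l / denom

D = phi3 / (1.0 - phi1 - phi2)                   # (S.6)
H = phi4 / (1.0 - phi1 / theta - phi2 * theta)   # (S.7), case 1 - phi1/theta - phi2*theta != 0

def V_p(d):
    return D + H * theta ** d

for d in range(1, 8):
    lhs = V_p(d)
    rhs = phi1 * V_p(d - 1) + phi2 * V_p(d + 1) + phi3 + phi4 * theta ** d
    assert abs(lhs - rhs) < 1e-9                 # (S.2) holds up to floating-point error
print("particular solution verified")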
D. Proof of Proposition 2

Recall that, among the neighbors (i′, j′) of a cell (i, j) in the 2-D offset-based MDP {e(t)} with i > 0, there are two (or, correspondingly, one) cells with i′ = i − 1, and two (or, correspondingly, three) cells with i′ = i + 1. To "even out" the different numbers of neighboring cells, we define a new (modified) MDP {y(t)} for the 2-D offset-based model, where the states are connected as in the original 2-D MDP, but the transition probabilities are different, as shown in Fig. S.1. In the modified MDP {y(t)}, the transition probabilities starting from state (0, 0) to each of its neighboring cells have the same value r. For all the other states (i, j) ≠ (0, 0), they are defined as follows:
• The transition probability to each of its neighbors with the same ring index i is r.
• If state (i, j) has two (correspondingly, one) neighbors in the lower ring i − 1, then the transition probability to each of its neighbors in the lower ring is 1.5r/2 (correspondingly, 1.5r).
• If state (i, j) has two (correspondingly, three) neighbors in the higher ring i + 1, then the transition probability to each of its neighbors in the higher ring is 2.5r/2 (correspondingly, 2.5r/3).

[Figure S.1. Illustration of the original and modified 2-D MDPs, where only some exemplar states and transition probabilities are shown: (a) original, (b) modified.]

We denote the discounted sum cost of the original MDP {e(t)} by V(i, j), and that of the modified MDP {y(t)} by U(i, j), where (i, j) stands for the initial state in the discounted sum cost definition.

Part I – Upper bound on the difference between V(i, j) and U(i, j) for a given policy π

Assume we have the same policy π for the original and modified MDPs. Then, the balance equation of V(i, j) is

  V(i, j) = Ca(i, j) + γ [ (1 − 6r) V(a(i, j)) + r Σ_{(i′,j′) ∈ N(a(i,j))} V(i′, j′) ]    (S.8)

where a(i, j) is the new state after possible migration at state (i, j), and N(a(i, j)) is the set of states that are neighbors of state a(i, j). For U(i, j), we have

  U(i, j) = Ca(i, j) + γ [ (1 − 6r) U(a(i, j)) + Σ_{(i′,j′) ∈ N(a(i,j))} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} U(i′, j′) ]    (S.9)

where P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} is the transition probability of the modified MDP {y(t)} as specified earlier.

In the following, let i_{a(i,j)} denote the ring index of a(i, j). We define the sets

  N−(a(i, j)) = { (i′, j′) ∈ N(a(i, j)) : i′ = i_{a(i,j)} − 1 }
  N+(a(i, j)) = { (i′, j′) ∈ N(a(i, j)) : i′ = i_{a(i,j)} + 1 }

to represent the neighboring states of a(i, j) that are respectively in the lower and higher rings. We use |·| to denote the number of elements in a set.

Assume that |U(i, j) − V(i, j)| ≤ ε for all i and j, where the value of ε is unknown for now. Subtracting (S.8) from (S.9) and taking the absolute value yields (S.10), (S.11), and (S.12) below, where N2(i, j) is the set of states that are two-hop neighbors of state (i, j), and λ_{i_{a(i,j)}} = 0 when i_{a(i,j)} = 0 while λ_{i_{a(i,j)}} = 1 when i_{a(i,j)} > 0:

  |U(i, j) − V(i, j)|
  = | γ [ (1 − 6r) (U(a(i, j)) − V(a(i, j))) + Σ_{(i′,j′) ∈ N(a(i,j))} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} (U(i′, j′) − V(i′, j′))
      ± 0.5 λ_{i_{a(i,j)}} r ( Σ_{(i′,j′) ∈ N+(a(i,j))} V(i′, j′) / |N+(a(i, j))| − Σ_{(i′,j′) ∈ N−(a(i,j))} V(i′, j′) / |N−(a(i, j))| ) ] |    (S.10)
  ≤ γ ε + 0.5 γ r | Σ_{(i′,j′) ∈ N+(a(i,j))} V(i′, j′) / |N+(a(i, j))| − Σ_{(i′,j′) ∈ N−(a(i,j))} V(i′, j′) / |N−(a(i, j))| |    (S.11)
  ≤ γ ε + 0.5 γ r max_{i,j,j′ : (i+1,j) ∈ N2(i−1,j′)} | V(i+1, j) − V(i−1, j′) |    (S.12)
The first two terms of (S.10) subtract the discounted sum cost of the original MDP {e(t)} from that of the modified MDP {y(t)}, by assuming that both chains have the same transition probabilities as specified by the modified MDP. The difference in their transition probabilities is captured by the last term of (S.10). There is no difference in the transition probabilities when i_{a(i,j)} = 0, thus λ_{i_{a(i,j)}} = 0 when i_{a(i,j)} = 0. In the following, we consider i_{a(i,j)} > 0 and further explain the last term of (S.10).

We first note that there is a difference in the transition probabilities only when moving to the lower or higher ring:
• The sum probability of moving to the lower ring in {y(t)} is by 0.5r smaller (or, correspondingly, greater) than that in {e(t)}.
• The sum probability of moving to the higher ring in {y(t)} is by 0.5r greater (or, correspondingly, smaller) than that in {e(t)}.

Therefore, the transition probability difference for each neighboring state in the lower (or higher) ring is ±0.5r divided by the number of neighbors in the lower (or higher) ring. Also note that the probability differences for the lower and higher rings have opposite signs. This explains the last term of (S.10), which captures the difference in the transition probabilities and its impact on the discounted sum costs. The inequality in (S.11) follows from the triangle inequality and the assumption |U(i, j) − V(i, j)| ≤ ε. We note that the subtraction in the last term of (S.11) only involves V(i′, j′) values of states that are two-hop neighbors of each other, so we obtain the inequality in (S.12) by replacing this term with its maximum.

From (S.12), we can obtain a balance equation for the upper bound of |U(i, j) − V(i, j)|, which is

  ε = γ ε + 0.5 γ r max_{i,j,j′ : (i+1,j) ∈ N2(i−1,j′)} | V(i+1, j) − V(i−1, j′) |    (S.13)

Because 0 < γ < 1, the value of V(i, j) converges after a number of iterations according to (S.8) [2, Section 6], so |V(i+1, j) − V(i−1, j′)| also converges, and the value of ε can be solved from (S.13) as

  εV = γ r max_{i,j,j′ : (i+1,j) ∈ N2(i−1,j′)} | V(i+1, j) − V(i−1, j′) | / (2(1 − γ))    (S.14)

Note that the above argument also applies when interchanging V and U, which means that an alternative upper bound is

  εU = γ r max_{i,j,j′ : (i+1,j) ∈ N2(i−1,j′)} | U(i+1, j) − U(i−1, j′) | / (2(1 − γ))    (S.15)

The upper bound can also be expressed as ε = min{εV, εU}, but either εV or εU may have the smaller value.

Part II – Optimal policy for the modified 2-D MDP {y(t)} is equivalent to the optimal policy for the distance-based MDP {d(t)}

We note that the optimal policy of an MDP can be found from value iteration [2, Section 6]. For the modified MDP {y(t)}, we initialize with the never-migrate policy, which gives the initial value function U0(i, j) = cd(i), satisfying U0(i, j) = U0(i, j′) for all i and j ≠ j′. Suppose Un(i, j) = Un(i, j′) for all i and j ≠ j′. In each iteration, we use the following equation to obtain the new value function and the corresponding actions for each i and j:

  Un+1(i, j) = min_a { Ca(i, j) + γ Σ_{i′} Σ_{j′} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} Un(i′, j′) }    (S.16)

From the hexagon model in Fig. 5, we can see that, for ring indexes i and i′ with i′ < i, we can always reach a state in ring i′ from state (i, j) within i − i′ hops, regardless of the index j. In the (n+1)th iteration, if it is optimal to migrate from state (i, j) to a state with ring index i′, then the migration destination must be i − i′ hops away from the origin state; because Un(i′, j) = Un(i′, j′) for all i′ and j ≠ j′, it cannot be beneficial to migrate to somewhere farther away. Further, if it is optimal to migrate at a state (i, j) to a state in ring i′, it must be optimal to migrate at states (i, j) for all j to a state (which may not be the same state) in ring i′, bringing the same cost, i.e., Un+1(i, j) = Un+1(i, j′) for j ≠ j′. This is due to the symmetry of the cost functions (in the sense that Un(i, j) = Un(i, j′) for all i and j ≠ j′) and the symmetry of the transition probabilities (in the sense that the sum probability of reaching ring i − 1 from any state in ring i is the same, and the sum probability of reaching ring i + 1 from any state in ring i is also the same). Similarly, if it is optimal not to migrate at a state (i, j), then it is optimal not to migrate at states (i, j) for all j, which also brings Un+1(i, j) = Un+1(i, j′) for any j ≠ j′.

Because the value iteration converges to the optimal policy and its corresponding cost as n → ∞, for the optimal policy of the modified MDP {y(t)}, we have the same discounted sum cost for states in the same ring, i.e., U*(i, j) = U*(i, j′) for all i and j ≠ j′. Meanwhile, for a given i, the optimal actions a*(i, j) and a*(i, j′) for any j ≠ j′ have the same ring index i_{a*(i,j)}. Since the optimal actions a*(i, j) only depend on the ring index i and the ring index of a*(i, j) does not change with j, the optimal policy for {y(t)} can be directly mapped to a policy for the distance-based MDP {d(t)}. A policy for {d(t)} can also be mapped to a policy for {y(t)} by considering the shortest path between different states in {y(t)}, as discussed in [1, Section IV-B]. This implies that there is a one-to-one mapping between the optimal policy for {y(t)} and a policy for {d(t)}, because the optimal policy for {y(t)} also only migrates along the shortest path between states.

We now show that the policy for {d(t)} obtained from the optimal policy for {y(t)} is optimal for {d(t)}. To find the optimal policy for {d(t)}, we can perform value iteration according to the following update equation:

  Un+1(i) = min_a { Ca(i) + γ Σ_{i′} Σ_{j′} P_{y(t)=a(i), y(t+1)=(i′,j′)} Un(i′) }    (S.17)

The difference between (S.16) and (S.17) is that (S.17) does not distinguish the actions and value functions with different j indexes. Recall that for the modified 2-D MDP {y(t)}, we have Un(i, j) = Un(i, j′) for all n, i, and j ≠ j′, so the index j can be naturally removed from the value functions. Further, if it is optimal to migrate at state (i, j) to a state in ring i′, it must be optimal to migrate at states (i, j) for all j to a state (which may not be the same state) in ring i′. The migration costs for different j are the same because they all follow the shortest path from state (i, j) to ring i′. If it is optimal not to migrate at state (i, j), then it is optimal not to migrate at states (i, j) for all j. Therefore, we can also remove the j index associated with the actions, without affecting the value function. It follows that the optimal policy for {y(t)} is equivalent to the optimal policy for {d(t)}, and both bring the same value functions (discounted sum costs).
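The ring-collapsed value iteration (S.17) is straightforward to implement once the transition probabilities of {y(t)} are aggregated per ring (6r out of ring 0, and otherwise 1.5r to the lower ring and 2.5r to the higher ring, as defined above). The sketch below illustrates this. The one-slot cost model Ca(i) = cm(i − a) + cd(a) for a ≤ i (migrate along the shortest path, then transmit), the reflecting behavior at ring N, and all numerical values are assumptions made for this illustration rather than the exact formulation of [1].

# Value-iteration sketch for the distance-based MDP {d(t)}, following the update (S.17).
# Ring-aggregated transition probabilities are taken from the modified MDP {y(t)} above.
# The cost model, boundary handling at ring N, and numerical values are illustrative assumptions.
N, r, gamma = 10, 0.1, 0.9

def c_d(x):                                            # assumed transmission cost
    return 0.0 if x == 0 else 1.0 - 0.9 ** x

def c_m(x):                                            # assumed migration cost
    return 0.0 if x == 0 else 0.5 * (1.0 - 0.8 ** x)

def expected_next(U, a):
    """Expected next-slot value when the service ends the current slot in ring a."""
    if a == 0:
        return (1 - 6 * r) * U[0] + 6 * r * U[1]
    down, up = 1.5 * r, 2.5 * r                        # aggregated per-ring probabilities
    nxt_up = min(a + 1, N)                             # fold the upward move back at the boundary
    return down * U[a - 1] + (1 - down - up) * U[a] + up * U[nxt_up]

U = [c_d(i) for i in range(N + 1)]                     # initialize with the never-migrate cost
for _ in range(500):                                   # value iteration per (S.17)
    # only actions a <= i are considered (cf. Lemma 1: never migrate farther from the user)
    U = [min(c_m(i - a) + c_d(a) + gamma * expected_next(U, a) for a in range(i + 1))
         for i in range(N + 1)]

policy = [min(range(i + 1), key=lambda a: c_m(i - a) + c_d(a) + gamma * expected_next(U, a))
          for i in range(N + 1)]
print(policy)                                          # ring index to migrate to, for each starting ring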
Part III – Error bound for the distance-based approximation

By now, we have shown an upper bound on the discounted sum cost difference between the original and modified 2-D MDPs {e(t)} and {y(t)} when both MDPs use the same policy, and we have shown that the optimal policy for the modified 2-D MDP {y(t)} is equivalent to the optimal policy for the distance-based MDP {d(t)}. Note that the true optimal cost is obtained by solving for the optimal policy for the original 2-D MDP {e(t)}, and the approximate optimal cost is obtained by applying the optimal policy for {d(t)} to {e(t)}. In the following, we consider the upper bound on the difference between the true and approximate optimal discounted sum costs.

We start with a (true) optimal policy π*_true for {e(t)} and denote the discounted sum costs from this policy by V_π*true(i, j). When using the same policy on {y(t)}, the difference between the costs U_π*true(i, j) and V_π*true(i, j) satisfies the upper bound given in (S.14), i.e.,

  | U_π*true(i, j) − V_π*true(i, j) | ≤ ε_{Vπ*true}    (S.18)

where ε_{Vπ*true} denotes the bound (S.14) evaluated with V = V_π*true. Since V_π*true(i, j) are the optimal costs, we have

  max_{i,j,j′ : (i+1,j) ∈ N2(i−1,j′)} | V_π*true(i+1, j) − V_π*true(i−1, j′) | ≤ max_x { cm(x + 2) − cm(x) }    (S.19)

because, otherwise, there would exist at least one pair of states (i+1, j) and (i−1, j′) for which it is beneficial to migrate from state (i+1, j) to state (i−1, j′), according to the 2-D extension of (11); the cost of performing such a migration is upper bounded by the right-hand side of (S.19), which contradicts the fact that π*_true is optimal. Define

  εc = γ r max_x { cm(x + 2) − cm(x) } / (2(1 − γ))    (S.20)

From (S.14), (S.18), and (S.19), we have

  | U_π*true(i, j) − V_π*true(i, j) | ≤ εc    (S.21)

According to the equivalence of {y(t)} and {d(t)}, we know that the optimal policy π*_appr of {d(t)} is also optimal for {y(t)}. Hence, we have

  U_π*appr(i, j) ≤ U_π*true(i, j)    (S.22)

because the cost from the optimal policy cannot be higher than the cost from any other policy.

When using the policy π*_appr on {e(t)}, we get costs V_π*appr(i, j). From (S.15), and because U_π*appr(i, j) is the optimal cost for {y(t)}, we have

  | V_π*appr(i, j) − U_π*appr(i, j) | ≤ ε_{Uπ*appr} ≤ εc    (S.23)

From (S.21), (S.22), and (S.23), we get

  V_π*appr(i, j) − V_π*true(i, j) ≤ 2εc    (S.24)

which completes the proof.
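For concreteness, here is a tiny numeric illustration of (S.20) and the resulting 2εc gap in (S.24), using assumed values γ = 0.9, r = 0.1, and a constant-plus-exponential migration cost cm(x) = βc + βl µ^x with βc = 1, βl = −1, µ = 0.8 (so cm(x + 2) − cm(x) is largest at x = 0); none of these values come from [1].

# Numeric illustration of the error bound (S.20) and the 2*eps_c gap in (S.24).
# All parameter values are assumptions chosen for this example.
gamma, r = 0.9, 0.1
beta_c, beta_l, mu = 1.0, -1.0, 0.8

def c_m(x):
    return beta_c + beta_l * mu ** x

max_diff = max(c_m(x + 2) - c_m(x) for x in range(100))   # attained at x = 0 for these values
eps_c = gamma * r * max_diff / (2 * (1 - gamma))           # (S.20)
print(max_diff, eps_c, 2 * eps_c)                          # 0.36, 0.162, 0.324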
VI. APPROXIMATING A GENERAL COST FUNCTION WITH CONSTANT-PLUS-EXPONENTIAL COST FUNCTION

In the following, we only focus on the migration cost function cm(x), because cm(x) and cd(x) are defined in the same way. For a given cost function f(n), where n = 0, 1, 2, ..., we would like to approximate it with the constant-plus-exponential cost function cm(n) = βc + βl µ^n. Because cm(n) includes both a constant term and an exponential term, we cannot obtain an exact analytical expression for the parameters that minimize a commonly used error function, such as the mean squared error. One way, however, is to solve for the parameters that minimize the error function numerically. To reduce the computational complexity compared with such a numerical solution, we provide an approximate solution in this section.

We are given an integer w, which may be chosen according to practical considerations, and we solve for the parameters βc, βl, and µ according to the following system of equations:

  βc + βl = f(0)    (S.25)
  βc + βl µ^w = f(w)    (S.26)
  βc + βl µ^{2w} = f(2w)    (S.27)

Subtracting (S.25) respectively from (S.26) and (S.27), and dividing the two results, gives

  (µ^{2w} − 1) / (µ^w − 1) = (f(2w) − f(0)) / (f(w) − f(0))    (S.28)

subject to µ^w ≠ 1. We can then solve for µ^w from (S.28), which gives

  µ^w = ( R ± √(R^2 − 4(R − 1)) ) / 2    (S.29)

where R ≜ (f(2w) − f(0)) / (f(w) − f(0)) ≥ 1; it follows that we always have R^2 − 4(R − 1) ≥ 0 and µ^w ≥ 0. To guarantee that µ^w ≠ 1, we set µ^w = 1 ± ε0 if |µ^w − 1| < ε0, where ε0 is a small number and the sign is the same as the sign of µ^w − 1. Then, we have

  µ = ( ( R ± √(R^2 − 4(R − 1)) ) / 2 )^{1/w}    (S.30)

from which we can solve

  βc = ( f(0) µ^w − f(w) ) / ( µ^w − 1 )    (S.31)
  βl = ( f(w) − f(0) ) / ( µ^w − 1 )    (S.32)

According to (S.30), we have two possible values of µ, and correspondingly two possible sets of values of βc and βl. To determine which is better, we compute the sum squared error Σ_{n=0}^{2w} ( f(n) − (βc + βl µ^n) )^2 for each parameter set, and we choose the parameter set that produces the smaller sum squared error. Some examples of approximation results are shown in Fig. S.2.

[Figure S.2. Examples of approximating a general cost function with the constant-plus-exponential cost function: (a) f(n) = ln(n + 1) + 10, (b) f(n) = 15√(n + 1) + 5, (c) f(n) = n^2.]
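The procedure in (S.25)–(S.32) translates directly into a few lines of Python, as sketched below. The example cost function f(n) = ln(n + 1) + 10 is taken from Fig. S.2(a), while the choice w = 10 and the tolerance ε0 = 1e-6 are arbitrary illustrative assumptions.

# Sketch of the fitting procedure (S.25)-(S.32). The example f and the window w are
# illustrative choices; this is not the authors' reference implementation.
import math

def fit_constant_plus_exponential(f, w, eps0=1e-6):
    """Approximate f(n) by beta_c + beta_l * mu**n using the three points n = 0, w, 2w."""
    R = (f(2 * w) - f(0)) / (f(w) - f(0))                      # (S.28)
    disc = math.sqrt(max(0.0, R * R - 4 * (R - 1)))
    candidates = []
    for mu_w in ((R + disc) / 2, (R - disc) / 2):              # (S.29), both signs
        if abs(mu_w - 1) < eps0:                               # guard mu**w != 1
            mu_w = 1 + eps0 if mu_w >= 1 else 1 - eps0
        mu = mu_w ** (1.0 / w)                                 # (S.30)
        beta_c = (f(0) * mu_w - f(w)) / (mu_w - 1)             # (S.31)
        beta_l = (f(w) - f(0)) / (mu_w - 1)                    # (S.32)
        sse = sum((f(n) - (beta_c + beta_l * mu ** n)) ** 2 for n in range(2 * w + 1))
        candidates.append((sse, beta_c, beta_l, mu))
    return min(candidates)[1:]                                 # parameter set with the smaller error

beta_c, beta_l, mu = fit_constant_plus_exponential(lambda n: math.log(n + 1) + 10, w=10)
print(beta_c, beta_l, mu)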
ACKNOWLEDGMENT

This research was sponsored in part by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES

[1] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, and K. K. Leung, "Dynamic service migration in mobile edge-clouds," in Proc. of IFIP Networking 2015, May 2015.
[2] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2009, vol. 414.
[3] S. Elaydi, An Introduction to Difference Equations, 3rd ed. Springer, 2005.