
Supplementary Materials for Dynamic Service
Migration in Mobile Edge-Clouds
Shiqiang Wang∗, Rahul Urgaonkar†, Murtaza Zafer‡¶, Ting He†, Kevin Chan§, and Kin K. Leung∗
∗ Imperial College London, United Kingdom, Email: {shiqiang.wang11, kin.leung}@imperial.ac.uk
† IBM T. J. Watson Research Center, Yorktown Heights, NY, United States, Email: {rurgaon, the}@us.ibm.com
‡ Nyansa Inc., Palo Alto, CA, United States, Email: [email protected]
§ Army Research Laboratory, Adelphi, MD, United States, Email: [email protected]
¶ Contributions of the author to this work are not related to his current employment at Nyansa Inc.
I. INTRODUCTION

Supplementary material for [1] is provided in this report. The variables, notations, and numberings (for equations, propositions, etc., except for sections and references) mentioned in this report refer to those in [1]. Numberings newly introduced in this report all start with the prefix "S".

This report is organized as follows. Section II outlines the difference between service migration and cellular handover. The main notations in [1] are summarized in Section III. Section IV provides some additional discussions on the simulation results in Section IV-D of [1]. The proofs are given in Section V, and a method for approximating a general cost function with a constant-plus-exponential cost function is discussed in Section VI.
II. RELATIONSHIP WITH CELLULAR HANDOVER
A related area of work that is also relevant to user mobility is handover policies in the context of cellular networks. However, the notion of service migration is quite different from cellular handover. Handover is usually decided by the signal strengths from different basestations, and a handover must occur if a user is no longer covered by its original basestation. In the service migration context, a user may continue receiving service from an MEC even if it is no longer associated with the basestation hosting that MEC, because the user can exchange data with the remote MEC via its currently associated basestation. As a result, the service for a particular user can potentially be placed on any MEC, and the service migration problem has a much larger decision space than handover.
III. SUMMARY OF NOTATIONS

The main notations in [1] are summarized in Table S.1.

Table S.1
SUMMARY OF MAIN NOTATIONS IN [1]

  L                          Set of locations
  l                          Location
  ‖l_1 − l_2‖                Distance between locations l_1 and l_2
  u(t)                       User location at timeslot t
  h(t)                       Service location at timeslot t
  c_m(x)                     Migration cost
  c_d(x)                     Transmission cost
  s(t)                       Initial state at slot t
  s′(t)                      Intermediate state at slot t
  π                          Decision policy
  a(s)                       Action taken when the system is in state s(t)
  C_a(s)                     Sum of migration and transmission costs when taking action a(s) in slot t
  V(s_0)                     Discounted sum cost when starting at state s_0
  P_{s′_0, s_1}              Transition probability from intermediate state s′_0 to initial state s_1 (where s_1 is in the next slot)
  γ                          Discount factor of the MDP
  d(t)                       User-service distance in slot t before possible migration (state in the distance-based MDP)
  N                          Number of states (excluding state zero) in the distance-based MDP
  p_0, p, q                  Transition probabilities of the distance-based MDP (see Fig. 3)
  β_c, β_l, δ_c, δ_l, µ, θ   Parameters related to the constant-plus-exponential cost function (see (4) and (5))
  A_k, B_k, D, H, m_1, m_2   Parameters related to the closed-form solution of the distance-based MDP (see (7)–(10))
  {n_k : k ≥ 0}              Series of migration states (i.e., all n_k such that a(n_k) ≠ n_k)
  r                          Transition probability to one of the neighbors in the 2-D model
  e(t)                       Offset of the user from the service as a 2-D vector (state in the 2-D MDP)
IV. ADDITIONAL DISCUSSIONS ON SIMULATION RESULTS IN [1, SECTION IV-D]

A. Reduction in Computation Time

The reduction in computation time is explained as follows. As discussed in [1, Section IV-B], for a distance-based MDP with N states (excluding state zero), the 2-D MDP has M = 3N^2 + 3N states (excluding state (0, 0)). Even when we ignore the complexity of matrix inversion in the policy iteration procedure, the standard value and policy iteration approaches on the 2-D MDP have a complexity of O(M^2) in each iteration, because the optimal action needs to be found for each state, which requires enumerating all the states and all possible actions for each state (similarly to Lines 18–20 of Algorithm 1). In the simulations, N = 10, so we have M = 330. Recall that the complexity of Algorithm 1 used in the proposed approach is O(N^2), so the ratio of the computational complexities of the two approaches can be approximated by M^2/N^2 ≈ 10^3. Therefore, the standard value and policy iteration approaches consume about 10^3 times more computation time than the proposed approach.
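As a quick sanity check of the figures in Section IV-A above, the following minimal sketch (Python, using the simulation value N = 10 stated in the text) computes M = 3N^2 + 3N and the per-iteration complexity ratio M^2/N^2.

```python
# Minimal sketch: per-iteration complexity ratio of standard value/policy
# iteration on the 2-D MDP versus Algorithm 1 on the distance-based MDP.
N = 10                 # number of distance-based states (excluding state zero)
M = 3 * N**2 + 3 * N   # number of 2-D MDP states (excluding state (0, 0))

ratio = M**2 / N**2    # O(M^2) per iteration versus O(N^2) for Algorithm 1
print(M, ratio)        # 330, 1089.0 -- roughly 10^3, as stated above
```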
B. Observations on Cost Value Characteristics

The cost from the optimal policy approaches the cost from the always-migrate policy when −β_l is small, and it approaches the cost from the never-migrate policy when −β_l is large. This is because, when −β_l is small, the migration cost is relatively small and migration is beneficial in most cases; when −β_l is large, the migration cost is large and it is better not to migrate in most cases. The myopic policy is the same as the never-migrate policy when −β_l ≥ 0.5, because, according to the simulation settings, c_m(x) and c_d(x) are both concave and we always have c_m(x) ≥ c_d(x) when −β_l ≥ 0.5, and also because the myopic policy does not consider the future impact of actions. There is an intersection between the costs of the never-migrate and always-migrate policies. When −β_l takes values larger than the value at the intersection point, the optimal cost is close to the cost from the never-migrate policy when γ is small, and the gap becomes larger with a larger γ. The reason is that the impact of migration is more pronounced when we look further ahead into the future.
V. PROOFS

A. Proof of Lemma 1

Assume that we are given a service migration policy π under which the service can be migrated to a location that is farther away from the user. We will show that, for an arbitrary sample path of the user, we can find a (possibly history-dependent) policy ψ that never migrates to locations farther away from the user and that performs no worse than policy π.

In the following, we consider an arbitrary sample path of the user. For this sample path, denote by t_0 the first timeslot (starting from t = 0) in which the service is migrated to somewhere farther away from the user when following policy π. The initial state at timeslot t_0 is denoted by s(t_0) = (u(t_0), h(t_0)). When following π, the state shifts from s(t_0) to s′_π(t_0) = (u(t_0), h′_π(t_0)), where ‖u(t_0) − h′_π(t_0)‖ > ‖u(t_0) − h(t_0)‖. The subsequent states in this case are denoted by s_π(t) = (u(t), h_π(t)) for t > t_0.

Now, we define a policy ψ such that the following conditions are satisfied for the given sample path of the user:
• The migration actions in timeslots t < t_0 are the same when following either ψ or π.
• The policy ψ specifies that there is no migration within the timeslots t ∈ [t_0, t_m − 1], where t_m > t_0 is a timeslot index that is defined below.
• The policy ψ is defined such that, at timeslot t_m, the service is migrated to h′_π(t_m), where h′_π(t_m) is the service location (after possible migration at timeslot t_m) when following π.
• For timeslots t > t_m, the migration actions for policies ψ and π are the same.
For t > t_0, the states when following ψ are denoted by s_ψ(t) = (u(t), h_ψ(t)).

The timeslot t_m is defined as the first timeslot after t_0 such that the following condition is satisfied:
1) ‖u(t_m) − h_ψ(t_m)‖ > ‖u(t_m) − h′_π(t_m)‖, i.e., the transmission cost (before migration at timeslot t_m) when following ψ is larger than the transmission cost (after possible migration at timeslot t_m) when following π.

Accordingly, within the interval [t_0 + 1, t_m − 1], the transmission cost when following ψ is always less than or equal to the transmission cost when following π, and there is no migration when following ψ. Therefore, for timeslots t ∈ [t_0, t_m − 1], policy π cannot bring lower cost than policy ψ.

In timeslot t_m, we can always choose a migration action for ψ whose migration cost is smaller than or equal to the sum of the migration costs of π within [t_0, t_m]. The reason is that ψ can migrate (in timeslot t_m) along the same migration path as π within [t_0, t_m].

It follows that, for timeslots within [t_0, t_m], policy π cannot perform better than policy ψ, and both policies have the same costs for timeslots within [0, t_0 − 1] and [t_m + 1, ∞). The above procedure can be repeated so that all migration actions to a location farther away from the user can be removed without increasing the overall cost; thus, we can redefine ψ to be a policy that removes all such migrations.

We note that the policy ψ can be constructed without prior knowledge of the user's future locations. For any policy π, the policy ψ is a policy that does not migrate whenever π migrates to a location farther away from the user (corresponding to timeslot t_0), and then migrates to h′_π(t_m) when condition 1 is satisfied (corresponding to timeslot t_m).
The policy ψ is a history-dependent policy, because its actions depend on the past actions of the underlying policy π. From [2, Section 6], we know that history-dependent policies cannot outperform Markovian policies for our problem (note that the optimal policies found from (2) and (3) are Markovian policies). Therefore, there exists a Markovian policy that does not migrate to a location farther away from the user and that performs no worse than π. Thus, we have proved the lemma.
B. Proof of Corollary 1

The proof follows the same procedure as the proof of Lemma 1. For any given policy π that migrates to locations other than the user location, we show that there exists a policy ψ that does not perform such migrations and that performs no worse than the original policy π. The difference is that t_0 is defined as the first timeslot such that u(t_0) ≠ h′_π(t_0), and t_m is defined as the first timeslot after t_0 such that u(t_m) = h′_π(t_m). Due to the strict inequality relationship of the cost functions given in the corollary, and because 0 < γ < 1, we can conclude that π is not optimal.
C. Proof of Proposition 1

For a given policy, the discounted sum cost V(d) for 0 ≤ d ≤ N can be computed from a set of difference equations [3]. For 0 < d < n_0 or n_{k−1} < d < n_k, we have a(d) = d, so according to (6) we have

    V(d) = δ_c + δ_l θ^d + γ Σ_{d_1=d−1}^{d+1} P_{d,d_1} V(d_1)                              (S.1)

We can rewrite (S.1) as

    V(d) = φ_1 V(d−1) + φ_2 V(d+1) + φ_3 + φ_4 θ^d                                           (S.2)

where φ_1 ≜ γp / (1 − γ(1 − p − q)), φ_2 ≜ γq / (1 − γ(1 − p − q)), φ_3 ≜ δ_c / (1 − γ(1 − p − q)), and φ_4 ≜ δ_l / (1 − γ(1 − p − q)). This difference equation has characteristic roots

    m_1 = (1 − √(1 − 4 φ_1 φ_2)) / (2 φ_2),    m_2 = (1 + √(1 − 4 φ_1 φ_2)) / (2 φ_2)        (S.3)

When 0 < γ < 1, we always have m_1 ≠ m_2. The homogeneous version of (S.2) has the general solution

    V_h(d) = A_k m_1^d + B_k m_2^d                                                           (S.4)

To solve the non-homogeneous equation (S.2), we try a particular solution of the form

    V_p(d) = D + H θ^d       if 1 − φ_1/θ − φ_2 θ ≠ 0
    V_p(d) = D + H d θ^d     if 1 − φ_1/θ − φ_2 θ = 0                                        (S.5)

where D and H are constant coefficients. By substituting (S.5) into (S.2), we get

    D = φ_3 / (1 − φ_1 − φ_2)                                                                (S.6)

    H = φ_4 / (1 − φ_1/θ − φ_2 θ)     if 1 − φ_1/θ − φ_2 θ ≠ 0
    H = φ_4 / (φ_1/θ − φ_2 θ)         if 1 − φ_1/θ − φ_2 θ = 0                               (S.7)

Therefore, V(d) can be expressed as (7).
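To make the closed-form structure above concrete, the following sketch (Python; the parameter values p, q, γ, δ_c, δ_l, θ are arbitrary assumptions, not the simulation settings of [1]) computes φ_1–φ_4, the characteristic roots m_1, m_2 from (S.3), and the coefficients D and H from (S.6) and (S.7), and then numerically checks that the particular solution V_p(d) = D + H θ^d indeed satisfies the difference equation (S.2).

```python
import math

# Hypothetical parameters (for illustration only; not the values used in [1]).
p, q, gamma = 0.3, 0.2, 0.9
delta_c, delta_l, theta = 1.0, 2.0, 0.8

denom = 1 - gamma * (1 - p - q)
phi1, phi2 = gamma * p / denom, gamma * q / denom
phi3, phi4 = delta_c / denom, delta_l / denom

# Characteristic roots of the homogeneous equation, as in (S.3).
disc = math.sqrt(1 - 4 * phi1 * phi2)
m1, m2 = (1 - disc) / (2 * phi2), (1 + disc) / (2 * phi2)

# Particular-solution coefficients, as in (S.6) and (S.7) (non-degenerate case).
D = phi3 / (1 - phi1 - phi2)
H = phi4 / (1 - phi1 / theta - phi2 * theta)

# Check that V_p(d) = D + H*theta^d satisfies (S.2) for a few values of d.
V = lambda d: D + H * theta ** d
for d in range(1, 6):
    lhs = V(d)
    rhs = phi1 * V(d - 1) + phi2 * V(d + 1) + phi3 + phi4 * theta ** d
    assert abs(lhs - rhs) < 1e-9
print(m1, m2, D, H)
```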
D. Proof of Proposition 2

Recall that among the neighbors (i′, j′) of a cell (i, j) in the 2-D offset-based MDP {e(t)}, when i > 0, we have two (or, correspondingly, one) cells with i′ = i − 1, and two (or, correspondingly, three) cells with i′ = i + 1. To "even out" the different numbers of neighboring cells, we define a new (modified) MDP {y(t)} for the 2-D offset-based model, where the states are connected as in the original 2-D MDP but the transition probabilities are different, as shown in Fig. S.1.

In the modified MDP {y(t)}, the transition probabilities from state (0, 0) to each of its neighboring cells have the same value r. For all other states (i, j) ≠ (0, 0), they are defined as follows:
• The transition probability to each of its neighbors with the same ring index i is r.
• If state (i, j) has two (correspondingly, one) neighbors in the lower ring i − 1, then the transition probability to each of its neighbors in the lower ring is (1.5/2)r (correspondingly, 1.5r).
• If state (i, j) has two (correspondingly, three) neighbors in the higher ring i + 1, then the transition probability to each of its neighbors in the higher ring is (2.5/2)r (correspondingly, (2.5/3)r).

[Figure S.1: Illustration of the original and modified 2-D MDPs, where only some exemplar states and transition probabilities are shown: (a) original, (b) modified.]

We denote the discounted sum cost from the original MDP {e(t)} by V(i, j), and that from the modified MDP {y(t)} by U(i, j), where (i, j) stands for the initial state in the discounted sum cost definition.
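As an illustration of the modified transition probabilities of {y(t)} defined above, the following sketch (Python; the function name and interface are my own hypothetical choices) returns the per-neighbor probabilities for a state in a ring i > 0, given how many of its neighbors lie in the lower and higher rings, and checks that the outgoing probabilities together with the self-transition probability 1 − 6r sum to one.

```python
def modified_transition_probs(n_lower, n_higher, r):
    """Per-neighbor transition probabilities of the modified MDP {y(t)}
    for a state in ring i > 0 (hypothetical helper, for illustration only).

    n_lower  -- number of neighbors in ring i-1 (2 or, correspondingly, 1)
    n_higher -- number of neighbors in ring i+1 (2 or, correspondingly, 3)
    """
    p_same = r                     # each of the two same-ring neighbors
    p_lower = 1.5 * r / n_lower    # total probability 1.5r to the lower ring
    p_higher = 2.5 * r / n_higher  # total probability 2.5r to the higher ring
    p_stay = 1 - 6 * r             # no movement
    return p_same, p_lower, p_higher, p_stay

r = 0.05
for n_lower, n_higher in [(2, 2), (1, 3)]:   # edge cell and corner cell of a ring
    p_same, p_lower, p_higher, p_stay = modified_transition_probs(n_lower, n_higher, r)
    total = 2 * p_same + n_lower * p_lower + n_higher * p_higher + p_stay
    assert abs(total - 1.0) < 1e-12
```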
Part I – Upper bound on the difference between V(i, j) and U(i, j) for a given policy π

Assume we have the same policy π for the original and modified MDPs. Then, the balance equation for V(i, j) is

    V(i, j) = C_a(i, j) + γ[(1 − 6r) V(a(i, j)) + r Σ_{(i′,j′)∈N(a(i,j))} V(i′, j′)]                          (S.8)

where a(i, j) is the new state after possible migration at state (i, j), and N(a(i, j)) is the set of states that are neighbors of state a(i, j). For U(i, j), we have

    U(i, j) = C_a(i, j) + γ[(1 − 6r) U(a(i, j)) + Σ_{(i′,j′)∈N(a(i,j))} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} U(i′, j′)]   (S.9)

where P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} is the transition probability of the modified MDP {y(t)} as specified earlier.

In the following, let i_{a(i,j)} denote the ring index of a(i, j). We define the sets

    N⁻(a(i, j)) = {(i′, j′) ∈ N(a(i, j)) : i′ = i_{a(i,j)} − 1}
    N⁺(a(i, j)) = {(i′, j′) ∈ N(a(i, j)) : i′ = i_{a(i,j)} + 1}

to represent the neighboring states of a(i, j) that are respectively in the lower and higher rings. We use |·| to denote the number of elements in a set.

Assume that |U(i, j) − V(i, j)| ≤ ε for all i and j, where the value of ε is unknown for now. Subtracting (S.8) from (S.9) and taking the absolute value yields (S.10), (S.11), and (S.12) below, where N_2(i, j) is the set of states that are two-hop neighbors of state (i, j), and λ_{i_{a(i,j)}} = 0 when i_{a(i,j)} = 0 while λ_{i_{a(i,j)}} = 1 when i_{a(i,j)} > 0:

    |U(i, j) − V(i, j)|
      = γ | (1 − 6r)(U(a(i, j)) − V(a(i, j)))
            + Σ_{(i′,j′)∈N(a(i,j))} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} (U(i′, j′) − V(i′, j′))
            ± 0.5 λ_{i_{a(i,j)}} r ( Σ_{(i′,j′)∈N⁻(a(i,j))} V(i′, j′) / |N⁻(a(i, j))|
                                     − Σ_{(i′,j′)∈N⁺(a(i,j))} V(i′, j′) / |N⁺(a(i, j))| ) |                    (S.10)

      ≤ γε + 0.5γr | Σ_{(i′,j′)∈N⁻(a(i,j))} V(i′, j′) / |N⁻(a(i, j))|
                     − Σ_{(i′,j′)∈N⁺(a(i,j))} V(i′, j′) / |N⁺(a(i, j))| |                                      (S.11)

      ≤ γε + 0.5γr max_{i,j,j′ : (i+1,j)∈N_2(i−1,j′)} |V(i+1, j) − V(i−1, j′)|                                 (S.12)

The first two terms of (S.10) subtract the discounted sum cost of the original MDP {e(t)} from that of the modified MDP {y(t)}, assuming that both chains have the same transition probabilities, namely those specified by the modified MDP. The difference between their actual transition probabilities is captured by the last term of (S.10). There is no difference in the transition probabilities when i_{a(i,j)} = 0; thus λ_{i_{a(i,j)}} = 0 when i_{a(i,j)} = 0.

In the following, we consider i_{a(i,j)} > 0 and further explain the last term of (S.10). We first note that the transition probabilities differ only when moving to the lower or higher ring:
• The sum probability of moving to the lower ring in {y(t)} is smaller (or, correspondingly, greater) than that in {e(t)} by 0.5r.
• The sum probability of moving to the higher ring in {y(t)} is greater (or, correspondingly, smaller) than that in {e(t)} by 0.5r.
Therefore, the transition probability difference for each neighboring state in the lower (or higher) ring is ±0.5r divided by the number of neighbors in the lower (or higher) ring. Also note that the probability differences for the lower and higher rings have opposite signs. This explains the last term of (S.10), which captures the difference in the transition probabilities and its impact on the discounted sum costs.

The inequality in (S.11) follows from the triangle inequality. We note that the subtraction in the last term of (S.10) only occurs between V(i′, j′) values of states that are two-hop neighbors, so we obtain the inequality in (S.12) by replacing this term with the maximum.

From (S.12), we can obtain a balance equation for the upper bound of |U(i, j) − V(i, j)|, namely

    ε = γε + 0.5γr max_{i,j,j′ : (i+1,j)∈N_2(i−1,j′)} |V(i+1, j) − V(i−1, j′)|                                 (S.13)

Because 0 < γ < 1, the value of V(i, j) converges after a number of iterations according to (S.8) [2, Section 6], so |V(i+1, j) − V(i−1, j′)| also converges, and the value of ε can be solved from (S.13) as

    ε_V = γr max_{i,j,j′ : (i+1,j)∈N_2(i−1,j′)} |V(i+1, j) − V(i−1, j′)| / (2(1 − γ))                          (S.14)

Note that the above argument also applies when interchanging V and U, which gives an alternative upper bound

    ε_U = γr max_{i,j,j′ : (i+1,j)∈N_2(i−1,j′)} |U(i+1, j) − U(i−1, j′)| / (2(1 − γ))                          (S.15)

The upper bound can also be expressed as ε = min{ε_V, ε_U}, but either ε_V or ε_U may have the smaller value.
Part II – Optimal policy for the modified 2-D MDP {y(t)} is equivalent to the optimal policy for the distance-based MDP {d(t)}

We note that the optimal policy of an MDP can be found from value iteration [2, Section 6]. For the modified MDP {y(t)}, we initialize with a never-migrate policy, which gives the initial value function U_0(i, j) = c_d(i), satisfying U_0(i, j) = U_0(i, j′) for all i and j ≠ j′.

Suppose U_n(i, j) = U_n(i, j′) for all i and j ≠ j′. In each iteration, we use the following equation to obtain the new value function and the corresponding actions for each i and j:

    U_{n+1}(i, j) = min_a { C_a(i, j) + γ Σ_{i′} Σ_{j′} P_{y′(t)=a(i,j), y(t+1)=(i′,j′)} U_n(i′, j′) }      (S.16)

From the hexagon model in Fig. 5, we can see that for ring indexes i and i′, where i′ < i, we can always reach a state in ring i′ from state (i, j) in i − i′ hops, regardless of the index j. In the (n+1)th iteration, if it is optimal to migrate from state (i, j) to a state with ring index i′, then the migration destination must be i − i′ hops away from the origin state; because U_n(i′, j) = U_n(i′, j′) for all i′ and j ≠ j′, it cannot be beneficial to migrate to somewhere farther away.

Further, if it is optimal to migrate at a state (i, j) to a state in ring i′, it must be optimal to migrate at states (i, j) for all j to a state (which may not be the same state) in ring i′, bringing the same cost, i.e., U_{n+1}(i, j) = U_{n+1}(i, j′) for j ≠ j′. This is due to the symmetry of the cost functions (in the sense that U_n(i, j) = U_n(i, j′) for all i and j ≠ j′) and the symmetry of the transition probabilities (in the sense that the sum probability of reaching ring i − 1 from any state in ring i is the same, and the sum probability of reaching ring i + 1 from any state in ring i is also the same). Similarly, if it is optimal not to migrate at a state (i, j), then it is optimal not to migrate at states (i, j) for all j, which also gives U_{n+1}(i, j) = U_{n+1}(i, j′) for any j ≠ j′.

Because the value iteration converges to the optimal policy and its corresponding cost as n → ∞, for the optimal policy of the modified MDP {y(t)}, we have the same discounted sum cost for all states in the same ring, i.e., U*(i, j) = U*(i, j′) for all i and j ≠ j′. Meanwhile, for a given i, the optimal actions a*(i, j) and a*(i, j′) for any j ≠ j′ have the same ring index i_{a*(i,j)}.

Since the optimal actions a*(i, j) only depend on the ring index i, and the ring index of a*(i, j) does not change with j, the optimal policy for {y(t)} can be directly mapped to a policy for the distance-based MDP {d(t)}. A policy for {d(t)} can also be mapped to a policy for {y(t)} by considering the shortest path between different states in {y(t)}, as discussed in [1, Section IV-B]. This implies that there is a one-to-one mapping between the optimal policy for {y(t)} and a policy for {d(t)}, because the optimal policy for {y(t)} also only migrates along the shortest path between states.

We now show that the policy for {d(t)} obtained from the optimal policy for {y(t)} is optimal for {d(t)}. To find the optimal policy for {d(t)}, we can perform value iteration according to the following update equation:

    U_{n+1}(i) = min_a { C_a(i) + γ Σ_{i′} Σ_{j′} P_{y′(t)=a(i), y(t+1)=(i′,j′)} U_n(i′) }                  (S.17)

The difference between (S.16) and (S.17) is that (S.17) does not distinguish the actions and value functions with different j indexes. Recall that for the modified 2-D MDP {y(t)} we have U_n(i, j) = U_n(i, j′) for all n, i, and j ≠ j′, so the index j can be naturally removed from the value functions. Further, if it is optimal to migrate at state (i, j) to a state in ring i′, it must be optimal to migrate at states (i, j) for all j to a state (which may not be the same state) in ring i′. The migration costs for different j are the same because they all follow the shortest path from state (i, j) to ring i′. If it is optimal not to migrate at state (i, j), then it is optimal not to migrate at states (i, j) for all j. Therefore, we can also remove the j index associated with the actions without affecting the value function. It follows that the optimal policy for {y(t)} is equivalent to the optimal policy for {d(t)}, both bringing the same value functions (discounted sum costs).
Part III – Error bound for distance-based approximation

By now, we have shown an upper bound on the discounted sum cost difference between the original and modified 2-D MDPs {e(t)} and {y(t)} when both MDPs use the same policy. We have also shown that the optimal policy for the modified 2-D MDP {y(t)} is equivalent to the optimal policy for the distance-based MDP {d(t)}. Note that the true optimal cost is obtained by solving for the optimal policy of the original 2-D MDP {e(t)}, whereas the approximate optimal cost is obtained by applying the optimal policy for {d(t)} to {e(t)}. In the following, we consider the upper bound on the difference between the true and approximate optimal discounted sum costs.

We start with a (true) optimal policy π*_true for {e(t)}, and denote the discounted sum costs from this policy by V_{π*_true}(i, j). When using the same policy on {y(t)}, the difference between the costs U_{π*_true}(i, j) and V_{π*_true}(i, j) satisfies the upper bound given in (S.14), i.e.,

    U_{π*_true}(i, j) − V_{π*_true}(i, j) ≤ ε_{V_{π*_true}}                                                    (S.18)
Since V_{π*_true}(i, j) are the optimal costs, we have

    max_{i,j,j′ : (i+1,j)∈N_2(i−1,j′)} |V_{π*_true}(i+1, j) − V_{π*_true}(i−1, j′)| ≤ max_x {c_m(x+2) − c_m(x)}   (S.19)

because, otherwise, there would exist at least one pair of states (i+1, j) and (i−1, j′) for which it is beneficial to migrate from state (i+1, j) to state (i−1, j′), according to the 2-D extension of (11). The cost after performing such a migration is upper bounded as in (S.19), which contradicts the fact that π*_true is optimal.

Define

    ε_c = γr max_x {c_m(x+2) − c_m(x)} / (2(1 − γ))                                                            (S.20)

From (S.14), (S.18), and (S.19), we have

    U_{π*_true}(i, j) − V_{π*_true}(i, j) ≤ ε_c                                                                (S.21)

According to the equivalence of {y(t)} and {d(t)}, we know that the optimal policy π*_appr of {d(t)} is also optimal for {y(t)}. Hence, we have

    U_{π*_appr}(i, j) ≤ U_{π*_true}(i, j)                                                                      (S.22)

because the cost from the optimal policy cannot be higher than the cost from any other policy.

When using the policy π*_appr on {e(t)}, we get costs V_{π*_appr}(i, j). From (S.15), and because U_{π*_appr}(i, j) is the optimal cost for {y(t)}, we have

    V_{π*_appr}(i, j) − U_{π*_appr}(i, j) ≤ ε_{U_{π*_appr}} ≤ ε_c                                              (S.23)

From (S.21), (S.22), and (S.23), we get

    V_{π*_appr}(i, j) − V_{π*_true}(i, j) ≤ 2 ε_c                                                              (S.24)

which completes the proof.
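The error bound 2ε_c from (S.20) and (S.24) is straightforward to evaluate numerically. The sketch below (Python; the cost-function parameters and the search range x ≤ 50 are arbitrary assumptions made for illustration) computes max_x {c_m(x+2) − c_m(x)} for a constant-plus-exponential migration cost and the resulting bound.

```python
# Hypothetical constant-plus-exponential migration cost c_m(x) = beta_c + beta_l * mu**x
# (parameter values are placeholders, not the simulation settings of [1]).
beta_c, beta_l, mu = 1.0, -0.5, 0.9
gamma, r = 0.9, 0.05

c_m = lambda x: beta_c + beta_l * mu ** x

# max_x {c_m(x+2) - c_m(x)} over a finite range of distances (assumed x <= 50).
max_diff = max(c_m(x + 2) - c_m(x) for x in range(0, 51))

eps_c = gamma * r * max_diff / (2 * (1 - gamma))   # (S.20)
print(max_diff, eps_c, 2 * eps_c)                  # 2*eps_c bounds the cost gap in (S.24)
```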
VI. APPROXIMATING A GENERAL COST FUNCTION WITH CONSTANT-PLUS-EXPONENTIAL COST FUNCTION

In the following, we only focus on the migration cost function c_m(x), because c_m(x) and c_d(x) are defined in the same way. For a given cost function f(n), where n = 0, 1, 2, ..., we would like to approximate it with the constant-plus-exponential cost function given by c_m(n) = β_c + β_l µ^n. Because c_m(n) includes both a constant term and an exponential term, we cannot obtain an exact analytical expression for the parameters that minimize a commonly used error function, such as the mean squared error. One way, however, is to solve for the parameters that minimize the error function numerically. To reduce the computational complexity compared with a numerical solution, we provide an approximate solution in this section.

We are given an integer w, which may be chosen according to practical considerations, and we solve for the parameters β_c, β_l, and µ according to the following system of equations:

    β_c + β_l = f(0)                                                            (S.25)
    β_c + β_l µ^w = f(w)                                                        (S.26)
    β_c + β_l µ^{2w} = f(2w)                                                    (S.27)

Subtracting (S.25) respectively from (S.26) and (S.27) and dividing the two results gives

    (µ^{2w} − 1) / (µ^w − 1) = (f(2w) − f(0)) / (f(w) − f(0))                   (S.28)

subject to µ^w ≠ 1. We can then solve for µ^w from (S.28), which gives

    µ^w = (R ± √(R² − 4(R − 1))) / 2                                            (S.29)

where R ≜ (f(2w) − f(0)) / (f(w) − f(0)) ≥ 1. It follows that we always have R² − 4(R − 1) ≥ 0 and µ^w ≥ 0. To guarantee that µ^w ≠ 1, we set µ^w = 1 ± ε_0 if |µ^w − 1| < ε_0, where ε_0 is a small number and the sign is the same as the sign of µ^w − 1. Then, we have

    µ = ((R ± √(R² − 4(R − 1))) / 2)^{1/w}                                      (S.30)

from which we can solve

    β_c = (f(0) µ^w − f(w)) / (µ^w − 1)                                         (S.31)

    β_l = (f(w) − f(0)) / (µ^w − 1)                                             (S.32)

According to (S.30), we have two possible values of µ, and correspondingly two possible sets of values of β_c and β_l. To determine which is better, we compute the sum squared error Σ_{n=0}^{2w} (f(n) − (β_c + β_l µ^n))² for each parameter set, and choose the parameter set that produces the smaller sum squared error.

Some examples of approximation results are shown in Fig. S.2.

[Figure S.2: Examples of approximating a general cost function with an exponential cost function, showing the original and approximated values of the cost function versus n: (a) f(n) = ln(n+1) + 10, (b) f(n) = √(n+1) + 5, (c) f(n) = n².]
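A compact implementation of the above fitting procedure is sketched below (Python). It follows (S.25)–(S.32) and the sum-squared-error tie-break directly; the example function f and the choice w = 5 are arbitrary assumptions for illustration.

```python
import math

def fit_constant_plus_exponential(f, w, eps0=1e-6):
    """Approximate f(n) by beta_c + beta_l * mu**n using (S.25)-(S.32)."""
    R = (f(2 * w) - f(0)) / (f(w) - f(0))
    candidates = []
    for sign in (+1, -1):
        mu_w = (R + sign * math.sqrt(R * R - 4 * (R - 1))) / 2   # (S.29)
        if abs(mu_w - 1) < eps0:                                 # enforce mu^w != 1
            mu_w = 1 + eps0 if mu_w >= 1 else 1 - eps0
        mu = mu_w ** (1.0 / w)                                   # (S.30)
        beta_c = (f(0) * mu_w - f(w)) / (mu_w - 1)               # (S.31)
        beta_l = (f(w) - f(0)) / (mu_w - 1)                      # (S.32)
        sse = sum((f(n) - (beta_c + beta_l * mu ** n)) ** 2 for n in range(2 * w + 1))
        candidates.append((sse, beta_c, beta_l, mu))
    return min(candidates)[1:]   # parameter set with the smaller sum squared error

# Example: approximate f(n) = ln(n + 1) + 10 with w = 5 (hypothetical choice).
beta_c, beta_l, mu = fit_constant_plus_exponential(lambda n: math.log(n + 1) + 10, 5)
print(beta_c, beta_l, mu)
```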
ACKNOWLEDGMENT

This research was sponsored in part by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
REFERENCES

[1] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, and K. K. Leung, "Dynamic service migration in mobile edge-clouds," in Proc. of IFIP Networking 2015, May 2015.
[2] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2009, vol. 414.
[3] S. Elaydi, An Introduction to Difference Equations, Third Edition. Springer, 2005.