### Endhost-Based Shortest Path Routing in Dynamic Networks: Supporting Materials

```Endhost-Based Shortest Path Routing in Dynamic Networks:
Supporting Materials
Ting He∗ , Dennis Goeckel† , Ramya Raghavendra∗ , and Don Towsley†
∗ IBM Research, Yorktown, NY, USA. Email: {the,rraghav}@us.ibm.com
† University of Massachusetts, Amherst, MA, USA.
Email: {[email protected],[email protected]}.umass.edu
∗
1
Introduction
This report contains supplementary materials in support of [1], including proofs and algorithm pseudo codes.
We refer to the original paper for formulation and definitions.
2
Proof of Theorems
Theorem 2.1. For mutually independent and i.i.d. link weights, the average regret of any O(log T )-regret
algorithm under coupled probing and routing satisfies
X
E[RT ] ≥ min
∆p uT (p)
(1)
Φ
p∈P ′
∆
for all sufficiently large T , where ∆p = E[Wp − Wp∗ ] is the suboptimality of path p, and uT (p) the average
number of times p is used up to time T , constrained by
∆
Φ = {(uT (p))p∈P ′ : uT (p) ≥ 0, ∀p ∈ P ′ ;
X
log T
, ∀i ∈ E ′ }.
pi uT (p) ≥
D
i
′
(2)
p∈P
Proof. The proof is based on a similar idea as in the proof of Theorem 3.1 in [2]: if we modify the link
weights so that a competitive link becomes part of the optimal path and thus must be used most of the
time by any good algorithm, then this link must be used sufficiently often (Ω(log T ) in T steps) by the same
algorithm under the original link weights because we do not know which weight configuration is in effect,
thus introducing a minimum learning cost.
Specifically, consider a competitive link i ∈ E ′ . For any ǫ > 0, select a new link weight distribution L′i
such that a suboptimal path p ∈ P ′ containing link i becomes optimal and |D(Li ||L′i ) − Di | ≤ ǫDi .
Consider any algorithm achieving O(log T ) regret under arbitrary link configurations. Let UT (i) (UT′ (i))
denote the number of times link i is used up to time T under the original (new) configuration. The O(log T )
∗ Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number
W911NF-06-3-0001. The views and conclusions are those of the authors and do not represent the U.S. Army Research Laboratory, the
U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce
and distribute reprints for Government purposes notwithstanding copyright notation.
1
regret implies that E[T − UT′ (i)] = O(log T ) since link i is part of the optimal path under the new configuration. From the proof of Theorem 3.1 in [2], we have that under the original configuration,
lim Pr{UT (i) <
T →∞
log T
} = 0,
(1 + 2ǫ)(1 + ǫ)Di
(3)
i.e., lim inf T →∞ E[UT (i)]/ log T ≥ 1/Di . That is, for all sufficiently large T , each competitive link i ∈ E ′
must be used at least log T /Di times on the average.
∆
Let uT (p) = E[UT (p)] for each p ∈ P ′ denote the average number of times that a suboptimal path
p is used up to time T . The above result implies that the usage of all suboptimal paths, denoted by
uT (p) (p ∈ P ′ ), must result in a per-link usage of at least log T /Di for each competitive link i, i.e.,
(uT (p))p∈P ′ ∈ Φ for Φ defined in (2). This constraint
Pholds for any algorithm with O(log T ) regret. Since
for a fixed algorithm, the average regret is given by p∈P ′ ∆p uT (p), the minimum value of this function
over all possible (uT (p))p∈P ′ ∈ Φ thus bounds the minimum average regret. That is, the average regret of
an arbitrary O(log T )-regret algorithm is lower bounded as
X
E[RT ] ≥ min
∆p uT (p).
(4)
Φ
p∈P ′
∆
Corollary 2.2. The lower bound in (1) is Ω(log T ) and O(min(N, M ) log T ). Specifically, for Dmin = min′ Di ,
i∈E
∆min
∆max
log T ≤ min ∆T uT ≤
min(M, N ) log T.
Φ
Dmin
Dmin
P
P
P
Proof. The lower bound is simply because ∆T uT ≥ ∆min p∈P ′ uT (p) and p∈P ′ uT (p) ≥ p∈P ′ pi uT (p) ≥
log T /Di for all i ∈ E ′ .
For the upper bound, first note that uT (p) = maxi∈E ′ :pi =1 log T /Di is a feasible solution. Thus,
X
∆p log T
∆max
min ∆T uT ≤
≤
M log T.
(5)
Φ
mini∈E ′ :pi =1 Di
Dmin
′
p∈P
Moreover, since each path contains at least one link, the total number of times of using suboptimal
paths
P
u
(p)
≤
does
not
need
to
exceed
the
total
number
of
times
of
using
competitive
i.e.,
min
′
Φ
T
p∈P
P
log
T
/D
.
Thus,
′
i
i∈E
min ∆T uT ≤ ∆max
Φ
X log T
∆max
≤
N log T.
D
Dmin
i
′
(6)
i∈E
Theorem 2.3. The average regret of OSPR is O(N 4 ) and specifically,
2c
4
2
3
E[RT ] ≤ ∆max N
H
N
+
H
+
1
,
∆2min
∆2min
(7)
where c is a constant satisfying c ≥ log M/N .
Proof. Define U (t) as the number of times suboptimal paths are used for routing in the first t steps. Since
we always probe the least measured link, we have that mini mi (t) ≥ ⌊t/N ⌋. For any n ≥ 0,
E[U (T )] ≤ nN +
T
X
Pr{∃p ∈ P ′ :
T
X
pi l̂i (t − 1) ≤
i
t=nN +1
≤ nN +
X
X
t=nN +1 p∈P ′
Pr{
X
i
2
pi ˆli (t − 1) ≤
X
p∗i l̂i (t − 1)}
(8)
i
X
i
p∗i l̂i (t − 1)},
(9)
where (8) is because we have chosen p̂(t) to be the empirically shortest path based on l̂(t − 1). We leverage
the Chernoff-Hoeffding bound to bound (9).
Chernoff-Hoeffding bound [3]: Let X1 , . . . , X
n be random variables with common range [0, 1] such that
P
E[Xt |X1 , . . . , Xt−1 ] = µ, ∀1 ≤ t ≤ n. Let Sn = ni=1 Xi . Then for all a ≥ 0,
2
Pr{Sn ≥ nµ + a} ≤ e−2a
/n
2
Pr{Sn ≤ nµ − a} ≤ e−2a
,
/n
.
(10)
Using this result, we will show that if mi ≥ k for all i, then
X
X
2
Pr{
pi l̂i ≤
p∗i ˆli } ≤ 2He−2k(∆min /2H) .
i
(11)
i
This is because by union bound, we have
Pr{
X
i
pi l̂i ≤
X
i
X
X
∆min
∆min
} + Pr{
p∗i l̂i ≥
p∗i li +
}
2
2
i
i
i
i
X
X
∆min
∆min
≤
Pr{l̂i − li ≤ − P } +
Pr{l̂i − li ≥ P ∗ }
2
p
2
∗
i i
i pi
i:p =1
p∗i l̂i } ≤ Pr{
X
pi l̂i ≤
X
pi li −
i:pi =1
i
≤ H Pr{l̂i − li ≤ −
∆min
∆min
} + H Pr{l̂i − li ≥
}.
2H
2H
Then applying the Chernoff-Hoeffding bound to lˆi and relaxing it using mi ≥ k give (11).
Applying (11) to (9), noting mi (t − 1) ≥ ⌊(t − 1)/N ⌋ ≥ t−1
N − 1, gives
E[U (T )] ≤ nN +
T
X
2
2M He(1+N )∆min/(2N H
2
)
2
· e−t∆min /(2N H
2
)
t=nN +1
≤ nN +
2
2
2M He−(n−1)N ∆min/(2N H )
.
2
2
1 − e−∆min /(2N H )
(12)
This bound can grow exponentially in N because of the M -factor. We can, however, select n such that the
2
2
product M e−(n−1)N ∆min/(2N H ) = O(1). Specifically, if M ≤ ecN , setting n = 1 + 2cN H 2 /∆2min gives
E[U (T )] ≤
2c
2H
2c
4
≈ 2 N 2H 2 + N + 2 N H 3
N 2H 2 + N +
2
2)
−∆
/(2N
H
∆2min
min
∆
∆
1−e
min
min
(13)
for N H 2 ≫ ∆2min .
Finally, since we use suboptimal paths U (T ) times in the first T steps, each incurring at most ∆max extra
weight, the average regret is bounded as
2c
4
2
3
E[RT ] ≤ ∆max E[U (T )] ≤ ∆max N
H
N
+
H
+
1
.
(14)
∆2min
∆2min
Theorem 2.4. For mutually independent and temporally i.i.d. link weights, the average regret of any
constant-regret (in T ) algorithm under decoupled probing and routing satisfies
E[RT ] ≥
∆min (1 − e−δmin T )
1 − e−δmin
for all sufficiently large T .
3
(15)
Proof. For any given algorithm with constant regret in T , let U (T ) denote the number of times suboptimal
paths are used up to time T . We have
E[U (T )] =
T
X
Pr{∃p ∈ P ′ appearing optimal at time t}.
(16)
t=1
Consider the event that the empirical
L′ make a suboptimal path p ∈ P ′ appear
Pdistributions
∗
′
better than the optimal path,
i.e.,
p
E[L
]
<
p
E[L
].
By
Sanov’s theorem [4], we know that this
i
i
i i
i i
P
− i mi D(L′i ||Li )
happens with probability e
for all mi ’s large enough, and this approximation is accurate to
the firstPorder in the exponent. At time t, since mi (t − 1) ≤ t − 1, this probability is lower bounded by
′
′
e−(t−1) i D(Li ||Li ) P
= e−(t−1)D(L ||L)
P.
∆
Let Ψp = {L′ : i pi E[L′i ] < i p∗i E[L′i ]} for each p ∈ P ′ . Then Pr{∃p ∈ P ′ appearing optimal at t} is
lower bounded by
′
max′ max
Pr{empirically, links ∼ L′ } ≥ e−(t−1) minP ′ minΨp D(L ||L) = e−(t−1)δmin .
′
p∈P L ∈Ψp
(17)
Applying this to (16) gives
E[U (T )] ≥
1 − e−δmin T
.
1 − e−δmin
(18)
Then substituting this bound into E[RT ] ≥ ∆min E[U (T )] yields the final result.
Corollary 2.5. The average regret of OSPR with S-path probing is O(N 4 /S) and specifically,
∆max N 2cN H 2
4H 3
E[RT ] ≤
+ 2 +1 ,
S
∆2min
∆min
(19)
assuming c ≥ log M/N .
Proof. The proof is analogous to that of Theorem 2.3. Still let U (t) denote the number of times of using
suboptimal paths in the first t steps. With multi-path probing, the number of measurements per link satisfies
mini mi (t) ≥ ⌊St/N ⌋ ≥ St/N − 1. Applying this to (11) gives
2 !
X
X
S(t
−
1)
∆
min
∗
Pr{
pi lˆi (t − 1) ≤
pi l̂i (t − 1)} ≤ 2Hexp −2
−1
.
(20)
N
2H
i
i
Substituting (20) into (9) gives
E[U (T )] ≤ nN +
T
X
2
2M He(S+N )∆min/(2N H
2
)
2
· e−tS∆min /(2N H
2
)
t=nN +1
2
2
2M He−(Sn−1)∆min/(2H )
≤ nN +
.
2
2
1 − e−S∆min/(2N H )
If M ≤ ecN , then setting n =
N
E[U (T )] ≤
S
1
S (1
+ 2cN H 2 /∆2min ) yields
2cN H 2
2H
N 2cN H 2
4H 3
+1 +
≈
+ 2 +1
2
2
∆2min
S
∆2min
∆min
1 − e−S∆min /(2N H )
for N H 2 ≫ S∆2min . The final result is then obtained by E[RT ] ≤ ∆max E[U (T )].
4
(21)
Corollary 2.6. The average regret of OSPR-UP for source-destination pair f is bounded as
∆2min,f P
Cis
∆max,f Nf cf Nf + 2H
2
s
f
E[RT,f ] ≤ max
Nf ∆2min,f P Cis
i∈Ef
s Ns
2Hf2
+
4∆max,f Hf3
∆2min,f mini∈Ef
for cf ≥ log Mf /Nf .
P
Cis
s Ns
(22)
Proof. Consider a source-destination pair f ; subscript f will be omitted if there is no ambiguity.
The key is
P
to bound the minimum link sample
size
min
m
(t)
for
general
C
’s.
By
definition,
m
(t)
≥ t for all
i∈Ef
i
is
is
i
P
s = 1, . . . , S. In the worst case i mis (t) = t, step 5 in Algorithm 4 iteratively solves the following integer
linear programming (ILP)1 :
max k
s.t.
S
X
(23)
mis (t)Cis ≥ k,
i = 1, . . . , N,
s=1
N
X
mis (t) ≤ t,
s = 1, . . . , S,
i=1
mis (t) ∈ N,
∀i, ∀s.
That is, distributing the t probing P
paths of each source-destination pair s to evenly cover the links,
P such that
the minimum link sample size mini Ss=1 mis (t)Cis is maximized. In particular, consider mini∈Ef Ss=1 mis (t)Cis ,
the minimum link sample size for source-destination pair f . Denote its maximum value under the constraints of (23) by k ∗ . Moreover, note that mis (t) = ⌊t/Ns ⌋Cis is a feasible solution to (23). Thus,
PS
∆
mini∈Ef mi (t) ≥ k ∗ ≥ mini∈Ef s=1 ⌊t/Ns ⌋Cis = κ(t).
The rest of the proof follows that of Theorem 2.3. Specifically, applying
!
X tCis X (1 + Ns )Cis
min mi (t − 1) ≥ κ(t − 1) ≥ min
−
i∈Ef
i∈Ef
Ns
Ns
s
s
into (11) and then into (9) yields
T
X
X Cis X (1 + Ns )Cis
∆2
E[U (T )] ≤ max nNf +
2M Hexp − min2 t
−
i∈Ef
2H
Ns
Ns
s
s
t=nNf +1
2 P
∆min
(Ns −nNf )Cis
2HM exp 2H
2
s
Ns
.
≤ max nNf +
∆2min P Cis
i∈Ef
1 − exp − 2H 2
s Ns
!!
(24)
For M ≤ ecNf , setting
n = max
i∈Ef
1 Here
cNf +
∆2min
2H 2
Nf ∆2min
2H 2
N denotes the set of natural numbers.
5
P
P
s Cis
Cis
s Ns
(25)
makes M exp
∆2min
2H 2
P
s
(Ns −nNf )Cis
Ns
E[U (T )] ≤ max nNf +
i∈Ef
≤ 1 for all i ∈ Ef . Therefore, (24) becomes
2H
∆2min P
1 − exp − 2H
2
s
Cis
Ns
≈ nNf +
4H 3
P
∆2min mini∈Ef s
Cis
Ns
(26)
for S∆2min ≪ H 2 mins Ns . Finally, combining (26) with E[R(T )] ≤ ∆max E[U (T )] and substituting (25) gives
the result.
Corollary 2.7. Consider a variant of OSPR where a probed link is excluded from probing for K slots2 . If
t=1 are K-dependent and K < N , then the regret of this variant of OSPR has the same
upper bound as in (7).
Proof. The modified probing process guarantees that samples used to estimate mean link weights are mutually independent. Thus, the same proof in Theorem 2.3 applies, except that the minimum link sample size
at slot t is now bounded by mini mi (t) ≥ ⌊t/ max(N, K + 1)⌋. As long as K < N , this bound is still ⌊t/N ⌋,
the same as that in the proof of Theorem 2.3. Therefore, the upper bound in Theorem 2.3 still holds.
Corollary 2.8. If probes are delayed by no more than (β − 1)N slots (β ≥ 1), the average regret of OSPR
remains O(N 4 ) and specifically,
4
2c
2
3
E[RT ] ≤ ∆max N
H
N
+
H
+
β
,
(27)
∆2min
∆2min
where the terms are defined the same as in (7).
Proof. The proof follows the same arguments as in the proof of Theorem 2.3, except that the minimum link
sample size at slot t is now bounded by mini mi (t) ≥ ⌊ Nt − β + 1⌋ ≥ Nt − β. Applying this bound to (11)
and then to (9) gives
E[U (T )] ≤ nN +
T
X
2M He−2(
t−1
2
N −β)(∆min /2H)
t=nN +1
2
2
2M He−(n−β)∆min/(2H )
≤ nN +
.
2
2
1 − e−∆min /(2N H )
(28)
For M ≤ ecN , setting n = β + 2cN H 2 /∆2min yields
E[U (T )] ≤
4
2c
N 2 H 2 + βN + 2 N H 3 .
∆2min
∆min
(29)
This implies that the regret bound is the same as (14) except that the constant term within the parentheses
is changed from 1 to β.
Proposition 2.9. If link weights in the control/sliding windows are i.i.d. respectively with the same mean,
then
s
(
)
2
2
Pr |l̂i,c − ˆli,s | ≥
log
≤ α.
(30)
ω
α
2 An excluded link can still be on a probing path, but it cannot be considered as a least measured link, and its measurement
will not be used for link quality estimation.
6
Algorithm 1 Learning with Linear Cost (LLC)
Require: Candidate paths P.
Ensure: Select a path to route and probe at each time step and compute estimated mean link weights
(l̂i )N
i=1 .
1: Initialization: select paths p(1), . . . , p(N ) s.t. pi (i) = 1 to measure each link at least once, compute
(l̂i , mi )N
i=1 accordingly
2: for t = N + 1, . . . do
q
P
log t
3:
Select the path p(t) such that p(t) = arg minp∈P i pi l̂i − (H+1)
mi
4:
Update (l̂i , mi )N
i=1 accordingly
Algorithm 2 Online SPR (OSPR)
Require: Candidate paths P.
Ensure: Select a path to route and a path to probe at each time step, and compute estimated mean link
weights (l̂i )N
i=1 .
Initialization: for t = 1, . . . , N ,
1: Route over a randomly selected path
2: Probe paths p(1), . . . , p(N ) s.t. pi (i) = 1 to measure each link at least once
3: Compute (l̂i , mi )N
i=1 accordingly
4:
5:
6:
7:
for t = N + 1, . . . do
P
Route over path p̂(t) = arg minp∈P i pi ˆli
Probe the path p(t) containing the least measured link, i.e., link j s.t. mj = mini mi
Update (l̂i , mi )N
i=1 accordingly
Proof. We prove the result by bounding the deviation between l̂i,c and l̂i,s . For any ǫ > 0, we have
ǫ
ǫ
Pr{|l̂i,c − ˆli,s | ≥ ǫ} ≤ Pr{|l̂i,c − li | ≥ } + Pr{|l̂i,s − li | ≥ }
2
2
2
≤ 2e−ωǫ /2 ,
q
where (31) is by applying the Chernoff-Hoeffding bound (10). Then setting ǫ = ω2 log
bound equal to α.
3
(31)
2
α
makes the upper
Pseudo Code for Algorithms
Algorithm 1 is the coupled probing/routing algorithm proposed in [5]. Algorithm 2 is the basic version of
our decoupled probing and routing algorithm based on single-source learning. Algorithms 3 and 4 are the
extended versions of the above for joint learning of multiple sources, with/without coordination in probing
path selection. Algorithm 5 is the window-based detection procedure used to trigger re-learning in the case
of fundamental changes in link qualities.
References
[1] T. He, D. Goeckel, R. Raghavendra, and D. Towsley, “Endhost-Based Shortest Path Routing in Dynamic Networks: An Online
Learning Approach,” 2012, under submission.
[2] V. Anantharam, P. Varaiya, and J. Walrand, “Asymptotically Efficient Allocation Rules for the Multiarmed Bandit Problem with
Multiple Plays—Part I: I.I.D. Rewards,” IEEE Trans. Automatic Control, 1987.
[3] D. Pollard, Convergence of Stochastic Processes, 1984.
7
Algorithm 3 Online SPR under Coordinated Probing (OSPR-CP)
Require: Candidate paths Pf for each source-destination pair (f = 1, . . . , S).
Ensure: At each time step, globally select S paths to probe to compute estimated mean link weights (l̂i )N
i=1
and locally select a path to route for each source.
Probing:
N
1: Initialization: probe each link at least once, compute (l̂i , mi )N
i=1 , broadcast (l̂i )i=1 to all the sources
2: for each time step t do
3:
for s = 1, . . . , S do
∆ S
4:
Probe the path p ∈ P = Sf=1 Pf containing the globally least measured link
5:
Update (l̂i , mi )N
i=1 accordingly
N
6:
Broadcast (l̂i )i=1 to all the sources
7:
8:
9:
Routing: for a source-destination pair f after receiving initial (l̂i )N
i=1
for each time step t do
P
Route over path p̂f (t) = arg minp∈Pf i pi l̂i
i=1
Algorithm 4 Online SPR under Uncoordinated Probing (OSPR-UP)
Require: Candidate paths Pf for a source-destination pair of interest.
Ensure: Select a path to route and a path to probe at each time step, and compute estimated mean link
weights (l̂i )N
i=1 .
1: Initialization: probe each link covered by Pf at least once, compute (l̂i,f , mi,f )N
i=1 , share it with the
other sources
2: for each time step t do
P
P
3:
Compute the aggregate link estimate: ˆli = ( Ss=1 l̂i,s mi,s )/( Ss=1 mi,s ) for all i
P
4:
Route over path p̂f (t) = arg minp∈Pf i pi l̂i
PS
5:
Probe the path pf (t) ∈ Pf containing the globally least measured (in terms of mi = s=1 mi,s ) link
among those covered by Pf
6:
Update (l̂i,f , mi,f )N
i=1 accordingly and share it with the other sources
Algorithm 5 Window-based Change Detection (WCD)
Require: Window size ω, false alarm constraint α.
Ensure: Detect changes on each link i and restart learning upon detection.
1: for each new measurement of link i do
2:
Update ˆli,s
q
3:
if |l̂i,c − l̂i,s | ≥ ω2 log α2 then
4:
5:
6:
reset (l̂i , mi ) (and also l̂i,c , l̂i,s )
else
update (l̂i , mi ) according to the measurement
[4] T. Cover and J. Thomas, Elements of Information Theory.
John Wiley & Sons, Inc., 1991.
[5] Y. Gai, B. Krishnamachari, and R. Jain, “Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits
with Linear Rewards and Individual Observations,” IEEE/ACM Trans. Networking, 2012.
8
```
Similar pages
ROHM BA7180AS