Endhost-Based Shortest Path Routing in Dynamic Networks: Supporting Materials

Ting He∗, Dennis Goeckel†, Ramya Raghavendra∗, and Don Towsley†
∗ IBM Research, Yorktown, NY, USA. Email: {the,rraghav}@us.ibm.com
† University of Massachusetts, Amherst, MA, USA. Email: {[email protected],[email protected]}.umass.edu

∗ Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-06-3-0001. The views and conclusions are those of the authors and do not represent the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation.

1 Introduction

This report contains supplementary materials in support of [1], including proofs and algorithm pseudocode. We refer to the original paper for the formulation and definitions.

2 Proofs of Theorems

Theorem 2.1. For mutually independent and i.i.d. link weights, the average regret of any O(log T)-regret algorithm under coupled probing and routing satisfies

    E[R_T] ≥ min_Φ ∑_{p∈P′} ∆_p u_T(p)    (1)

for all sufficiently large T, where ∆_p := E[W_p − W_{p^*}] is the suboptimality of path p, and u_T(p) is the average number of times p is used up to time T, constrained by

    Φ := { (u_T(p))_{p∈P′} : u_T(p) ≥ 0, ∀p ∈ P′; ∑_{p∈P′} p_i u_T(p) ≥ (log T)/D_i, ∀i ∈ E′ }.    (2)

Proof. The proof is based on an idea similar to that of Theorem 3.1 in [2]: if we modify the link weights so that a competitive link becomes part of the optimal path, and thus must be used most of the time by any good algorithm, then the same algorithm must also use this link sufficiently often (Ω(log T) times in T steps) under the original link weights, because it cannot tell which weight configuration is in effect; this introduces a minimum learning cost.

Specifically, consider a competitive link i ∈ E′. For any ε > 0, select a new link weight distribution L′_i such that a suboptimal path p ∈ P′ containing link i becomes optimal and |D(L_i||L′_i) − D_i| ≤ εD_i. Consider any algorithm achieving O(log T) regret under arbitrary link configurations. Let U_T(i) (U′_T(i)) denote the number of times link i is used up to time T under the original (new) configuration. The O(log T) regret implies that E[T − U′_T(i)] = O(log T), since link i is part of the optimal path under the new configuration. From the proof of Theorem 3.1 in [2], we have that under the original configuration,

    lim_{T→∞} Pr{ U_T(i) < (log T)/((1+2ε)(1+ε)D_i) } = 0,    (3)

i.e., liminf_{T→∞} E[U_T(i)]/log T ≥ 1/D_i. That is, for all sufficiently large T, each competitive link i ∈ E′ must be used at least (log T)/D_i times on average.

Let u_T(p) := E[U_T(p)] for each p ∈ P′ denote the average number of times that a suboptimal path p is used up to time T. The above result implies that the usage of all suboptimal paths, denoted by u_T(p) (p ∈ P′), must result in a per-link usage of at least (log T)/D_i for each competitive link i, i.e., (u_T(p))_{p∈P′} ∈ Φ for Φ defined in (2). This constraint holds for any algorithm with O(log T) regret. Since, for a fixed algorithm, the average regret is given by ∑_{p∈P′} ∆_p u_T(p), the minimum value of this function over all (u_T(p))_{p∈P′} ∈ Φ bounds the minimum average regret. That is, the average regret of an arbitrary O(log T)-regret algorithm is lower bounded as

    E[R_T] ≥ min_Φ ∑_{p∈P′} ∆_p u_T(p).    (4)

Corollary 2.2. The lower bound in (1) is Ω(log T) and O(min(N, M) log T). Specifically, for D_min := min_{i∈E′} D_i,

    (∆_min/D_min) log T ≤ min_Φ ∆^T u_T ≤ (∆_max/D_min) min(M, N) log T.

Proof. The lower bound holds because ∆^T u_T ≥ ∆_min ∑_{p∈P′} u_T(p) and ∑_{p∈P′} u_T(p) ≥ ∑_{p∈P′} p_i u_T(p) ≥ (log T)/D_i for all i ∈ E′. For the upper bound, first note that u_T(p) = max_{i∈E′: p_i=1} (log T)/D_i is a feasible solution. Thus,

    min_Φ ∆^T u_T ≤ ∑_{p∈P′} ∆_p (log T)/(min_{i∈E′: p_i=1} D_i) ≤ (∆_max/D_min) M log T.    (5)

Moreover, since each path contains at least one link, the total number of times suboptimal paths are used need not exceed the total number of times competitive links are used, i.e., min_Φ ∑_{p∈P′} u_T(p) ≤ ∑_{i∈E′} (log T)/D_i. Thus,

    min_Φ ∆^T u_T ≤ ∆_max ∑_{i∈E′} (log T)/D_i ≤ (∆_max/D_min) N log T.    (6)

Theorem 2.3. The average regret of OSPR is O(N^4); specifically,

    E[R_T] ≤ ∆_max N ( (2cNH^2)/∆_min^2 + (4H^3)/∆_min^2 + 1 ),    (7)

where c is a constant satisfying c ≥ (log M)/N.

Proof. Define U(t) as the number of times suboptimal paths are used for routing in the first t steps. Since we always probe the least measured link, we have min_i m_i(t) ≥ ⌊t/N⌋. For any n ≥ 0,

    E[U(T)] ≤ nN + ∑_{t=nN+1}^{T} Pr{ ∃p ∈ P′: ∑_i p_i l̂_i(t−1) ≤ ∑_i p_i^* l̂_i(t−1) }    (8)
            ≤ nN + ∑_{t=nN+1}^{T} ∑_{p∈P′} Pr{ ∑_i p_i l̂_i(t−1) ≤ ∑_i p_i^* l̂_i(t−1) },    (9)

where (8) holds because p̂(t) is chosen as the empirically shortest path based on l̂(t−1), and (9) follows from the union bound.

We leverage the Chernoff-Hoeffding bound to bound (9).

Chernoff-Hoeffding bound [3]: Let X_1, ..., X_n be random variables with common range [0, 1] such that E[X_t | X_1, ..., X_{t−1}] = µ for all 1 ≤ t ≤ n. Let S_n = ∑_{i=1}^{n} X_i. Then for all a ≥ 0,

    Pr{S_n ≥ nµ + a} ≤ e^{−2a^2/n},    Pr{S_n ≤ nµ − a} ≤ e^{−2a^2/n}.    (10)

Using this result, we will show that if m_i ≥ k for all i, then

    Pr{ ∑_i p_i l̂_i ≤ ∑_i p_i^* l̂_i } ≤ 2H e^{−2k(∆_min/(2H))^2}.    (11)

Indeed, by the union bound,

    Pr{ ∑_i p_i l̂_i ≤ ∑_i p_i^* l̂_i }
      ≤ Pr{ ∑_i p_i l̂_i ≤ ∑_i p_i l_i − ∆_min/2 } + Pr{ ∑_i p_i^* l̂_i ≥ ∑_i p_i^* l_i + ∆_min/2 }
      ≤ ∑_{i: p_i=1} Pr{ l̂_i − l_i ≤ −∆_min/(2 ∑_j p_j) } + ∑_{i: p_i^*=1} Pr{ l̂_i − l_i ≥ ∆_min/(2 ∑_j p_j^*) }
      ≤ H Pr{ l̂_i − l_i ≤ −∆_min/(2H) } + H Pr{ l̂_i − l_i ≥ ∆_min/(2H) }.

Then applying the Chernoff-Hoeffding bound to l̂_i and relaxing using m_i ≥ k gives (11).

Applying (11) to (9), and noting m_i(t−1) ≥ ⌊(t−1)/N⌋ ≥ (t−1)/N − 1, gives

    E[U(T)] ≤ nN + ∑_{t=nN+1}^{T} 2MH e^{(1+N)∆_min^2/(2NH^2)} · e^{−t∆_min^2/(2NH^2)}
            ≤ nN + 2MH e^{−(n−1)N∆_min^2/(2NH^2)} / (1 − e^{−∆_min^2/(2NH^2)}).    (12)

This bound can grow exponentially in N because of the factor M. We can, however, select n such that the product M e^{−(n−1)N∆_min^2/(2NH^2)} = O(1). Specifically, if M ≤ e^{cN}, setting n = 1 + 2cNH^2/∆_min^2 gives

    E[U(T)] ≤ (2c/∆_min^2) N^2 H^2 + N + 2H/(1 − e^{−∆_min^2/(2NH^2)})
            ≈ (2c/∆_min^2) N^2 H^2 + N + (4/∆_min^2) N H^3    (13)

for NH^2 ≫ ∆_min^2. Finally, since suboptimal paths are used U(T) times in the first T steps, each use incurring at most ∆_max extra weight, the average regret is bounded as

    E[R_T] ≤ ∆_max E[U(T)] ≤ ∆_max N ( (2cNH^2)/∆_min^2 + (4H^3)/∆_min^2 + 1 ).    (14)

Theorem 2.4. For mutually independent and temporally i.i.d. link weights, the average regret of any constant-regret (in T) algorithm under decoupled probing and routing satisfies

    E[R_T] ≥ ∆_min (1 − e^{−δ_min T})/(1 − e^{−δ_min})    (15)

for all sufficiently large T.

Proof. For any given algorithm with constant regret in T, let U(T) denote the number of times suboptimal paths are used up to time T. We have

    E[U(T)] = ∑_{t=1}^{T} Pr{ ∃p ∈ P′ appearing optimal at time t }.    (16)

Consider the event that the empirical link weight distributions L′ make a suboptimal path p ∈ P′ appear better than the optimal path, i.e., ∑_i p_i E[L′_i] < ∑_i p_i^* E[L′_i]. By Sanov's theorem [4], this happens with probability e^{−∑_i m_i D(L′_i||L_i)} for all m_i large enough, and this approximation is accurate to first order in the exponent. At time t, since m_i(t−1) ≤ t−1, this probability is lower bounded by e^{−(t−1) ∑_i D(L′_i||L_i)} = e^{−(t−1) D(L′||L)}.

Let Ψ_p := { L′ : ∑_i p_i E[L′_i] < ∑_i p_i^* E[L′_i] } for each p ∈ P′. Then Pr{∃p ∈ P′ appearing optimal at t} is lower bounded by

    max_{p∈P′} max_{L′∈Ψ_p} Pr{ empirically, links ∼ L′ } ≥ e^{−(t−1) min_{p∈P′} min_{L′∈Ψ_p} D(L′||L)} = e^{−(t−1)δ_min}.    (17)

Applying this to (16) gives

    E[U(T)] ≥ ∑_{t=1}^{T} e^{−(t−1)δ_min} = (1 − e^{−δ_min T})/(1 − e^{−δ_min}).    (18)

Then substituting this bound into E[R_T] ≥ ∆_min E[U(T)] yields the final result.

Corollary 2.5. The average regret of OSPR with S-path probing is O(N^4/S); specifically,

    E[R_T] ≤ (∆_max N / S) ( (2cNH^2)/∆_min^2 + (4H^3)/∆_min^2 + 1 ),    (19)

assuming c ≥ (log M)/N.

Proof. The proof is analogous to that of Theorem 2.3. Again let U(t) denote the number of times suboptimal paths are used in the first t steps. With multi-path probing, the number of measurements per link satisfies min_i m_i(t) ≥ ⌊St/N⌋ ≥ St/N − 1. Applying this to (11) gives

    Pr{ ∑_i p_i l̂_i(t−1) ≤ ∑_i p_i^* l̂_i(t−1) } ≤ 2H exp( −2( S(t−1)/N − 1 )( ∆_min/(2H) )^2 ).    (20)

Substituting (20) into (9) gives

    E[U(T)] ≤ nN + ∑_{t=nN+1}^{T} 2MH e^{(S+N)∆_min^2/(2NH^2)} · e^{−tS∆_min^2/(2NH^2)}
            ≤ nN + 2MH e^{−(Sn−1)∆_min^2/(2H^2)} / (1 − e^{−S∆_min^2/(2NH^2)}).    (21)

If M ≤ e^{cN}, then setting n = (1/S)(1 + 2cNH^2/∆_min^2) yields

    E[U(T)] ≤ (N/S)( (2cNH^2)/∆_min^2 + 1 ) + 2H/(1 − e^{−S∆_min^2/(2NH^2)}) ≈ (N/S)( (2cNH^2)/∆_min^2 + (4H^3)/∆_min^2 + 1 )

for NH^2 ≫ S∆_min^2. The final result is then obtained from E[R_T] ≤ ∆_max E[U(T)].

Corollary 2.6. The average regret of OSPR-UP for source-destination pair f is bounded as

    E[R_{T,f}] ≤ ∆_{max,f} [ max_{i∈E_f} ( (2c_f N_f H_f^2/∆_{min,f}^2 + ∑_s C_{is}) / (∑_s C_{is}/N_s) ) + (4H_f^3)/( ∆_{min,f}^2 min_{i∈E_f} ∑_s C_{is}/N_s ) ]    (22)

for c_f ≥ (log M_f)/N_f.

Proof. Consider a source-destination pair f; the subscript f is omitted when there is no ambiguity. The key is to bound the minimum link sample size min_{i∈E_f} m_i(t) for general C_{is}. By definition, ∑_i m_{is}(t) ≥ t for all s = 1, ..., S. In the worst case ∑_i m_{is}(t) = t, and step 5 of Algorithm 4 iteratively solves the following integer linear program (ILP)¹:

    max k    (23)
    s.t. ∑_{s=1}^{S} m_{is}(t) C_{is} ≥ k,  i = 1, ..., N,
         ∑_{i=1}^{N} m_{is}(t) ≤ t,  s = 1, ..., S,
         m_{is}(t) ∈ ℕ, ∀i, ∀s.

That is, the ILP distributes the t probing paths of each source-destination pair s to evenly cover the links, such that the minimum link sample size min_i ∑_{s=1}^{S} m_{is}(t) C_{is} is maximized. In particular, consider min_{i∈E_f} ∑_{s=1}^{S} m_{is}(t) C_{is}, the minimum link sample size for source-destination pair f.
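A small executable sketch may help make the allocation problem (23) concrete. Since the text does not prescribe a particular solver, the function below uses a greedy rounding heuristic as an illustrative stand-in for an exact ILP solver: each pair's next probing slot is assigned to its currently least-sampled coverable link. The names `allocate_probes` and the example instance are assumptions for illustration only.

```python
# Greedy sketch of the probing-path allocation ILP (23): distribute t probes
# per source-destination pair s so that the minimum link sample size
# min_i sum_s m[i][s] * C[i][s] is large. This greedy rule is a heuristic
# stand-in for an exact ILP solver, not the paper's implementation.

def allocate_probes(C, t):
    """C: N x S 0/1 coverage matrix (C[i][s] = 1 if pair s can measure link i).
    t: number of probing slots per pair. Returns m, an N x S allocation."""
    N, S = len(C), len(C[0])
    m = [[0] * S for _ in range(N)]
    samples = [0] * N  # current sample size of each link
    for s in range(S):
        coverable = [i for i in range(N) if C[i][s]]
        for _ in range(t):
            i = min(coverable, key=lambda j: samples[j])  # least measured link
            m[i][s] += 1
            samples[i] += C[i][s]
    return m

# Example: 3 links, 2 pairs; pair 0 covers links 0,1; pair 1 covers links 1,2.
C = [[1, 0], [1, 1], [0, 1]]
m = allocate_probes(C, 4)
min_samples = min(sum(m[i][s] * C[i][s] for s in range(2)) for i in range(3))
```

In this toy instance the greedy allocation spreads each pair's four probes over its coverable links, so every link ends up with at least two samples.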
Denote its maximum value under the constraints of (23) by k^*. Moreover, note that m_{is}(t) = ⌊t/N_s⌋ C_{is} is a feasible solution to (23). Thus, min_{i∈E_f} m_i(t) ≥ k^* ≥ min_{i∈E_f} ∑_{s=1}^{S} ⌊t/N_s⌋ C_{is} =: κ(t). The rest of the proof follows that of Theorem 2.3. Specifically, applying

    min_{i∈E_f} m_i(t−1) ≥ κ(t−1) ≥ min_{i∈E_f} ( ∑_s (t C_{is})/N_s − ∑_s ((1+N_s) C_{is})/N_s )

to (11) and then to (9) yields

    E[U(T)] ≤ max_{i∈E_f} [ nN_f + ∑_{t=nN_f+1}^{T} 2MH exp( −(∆_min^2/(2H^2)) ( t ∑_s C_{is}/N_s − ∑_s (1+N_s)C_{is}/N_s ) ) ]
            ≤ max_{i∈E_f} [ nN_f + 2MH exp( (∆_min^2/(2H^2)) ∑_s ((N_s − nN_f) C_{is})/N_s ) / ( 1 − exp( −(∆_min^2/(2H^2)) ∑_s C_{is}/N_s ) ) ].    (24)

For M ≤ e^{cN_f}, setting

    n = max_{i∈E_f} ( cN_f + (∆_min^2/(2H^2)) ∑_s C_{is} ) / ( N_f (∆_min^2/(2H^2)) ∑_s C_{is}/N_s )    (25)

makes M exp( (∆_min^2/(2H^2)) ∑_s ((N_s − nN_f)C_{is})/N_s ) ≤ 1 for all i ∈ E_f. Therefore, (24) becomes

    E[U(T)] ≤ max_{i∈E_f} nN_f + 2H/( 1 − exp( −(∆_min^2/(2H^2)) ∑_s C_{is}/N_s ) )
            ≈ max_{i∈E_f} nN_f + (4H^3)/( ∆_min^2 min_{i∈E_f} ∑_s C_{is}/N_s )    (26)

for S∆_min^2 ≪ H^2 min_s N_s. Finally, combining (26) with E[R_T] ≤ ∆_max E[U(T)] and substituting (25) gives the result.

¹ Here ℕ denotes the set of natural numbers.

Corollary 2.7. Consider a variant of OSPR where a probed link is excluded from probing for K slots². If the link weights (L_i(t))_{t=1}^{∞} are K-dependent and K < N, then the regret of this variant of OSPR has the same upper bound as in (7).

Proof. The modified probing process guarantees that the samples used to estimate the mean link weights are mutually independent. Thus, the proof of Theorem 2.3 applies, except that the minimum link sample size at slot t is now bounded by min_i m_i(t) ≥ ⌊t/max(N, K+1)⌋. As long as K < N, this bound remains ⌊t/N⌋, the same as in the proof of Theorem 2.3. Therefore, the upper bound in Theorem 2.3 still holds.

Corollary 2.8. If probes are delayed by no more than (β−1)N slots (β ≥ 1), the average regret of OSPR remains O(N^4) and specifically,

    E[R_T] ≤ ∆_max N ( (2cNH^2)/∆_min^2 + (4H^3)/∆_min^2 + β ),    (27)

where the terms are defined as in (7).

Proof.
The proof follows the same arguments as the proof of Theorem 2.3, except that the minimum link sample size at slot t is now bounded by min_i m_i(t) ≥ ⌊t/N − β + 1⌋ ≥ t/N − β. Applying this bound to (11) and then to (9) gives

    E[U(T)] ≤ nN + ∑_{t=nN+1}^{T} 2MH e^{−2((t−1)/N − β)(∆_min/(2H))^2}
            ≤ nN + 2MH e^{−(n−β)∆_min^2/(2H^2)} / (1 − e^{−∆_min^2/(2NH^2)}).    (28)

For M ≤ e^{cN}, setting n = β + 2cNH^2/∆_min^2 yields

    E[U(T)] ≤ (2c/∆_min^2) N^2 H^2 + βN + (4/∆_min^2) N H^3.    (29)

This implies that the regret bound is the same as (14), except that the constant term within the parentheses changes from 1 to β.

Proposition 2.9. If the link weights in the control/sliding windows are respectively i.i.d. with the same mean, then

    Pr{ |l̂_{i,c} − l̂_{i,s}| ≥ √( (2/ω) log(2/α) ) } ≤ α.    (30)

² An excluded link can still be on a probing path, but it cannot be considered as a least measured link, and its measurement will not be used for link quality estimation.

Algorithm 1 Learning with Linear Cost (LLC)
Require: Candidate paths P.
Ensure: Select a path to route and probe at each time step and compute estimated mean link weights (l̂_i)_{i=1}^{N}.
1: Initialization: select paths p(1), ..., p(N) s.t. p_i(i) = 1 to measure each link at least once; compute (l̂_i, m_i)_{i=1}^{N} accordingly
2: for t = N+1, ... do
3:   Select the path p(t) = argmin_{p∈P} ∑_i p_i ( l̂_i − √((H+1) log t / m_i) )
4:   Update (l̂_i, m_i)_{i=1}^{N} accordingly

Algorithm 2 Online SPR (OSPR)
Require: Candidate paths P.
Ensure: Select a path to route and a path to probe at each time step, and compute estimated mean link weights (l̂_i)_{i=1}^{N}.
Initialization: for t = 1, ..., N:
1:   Route over a randomly selected path
2:   Probe paths p(1), ..., p(N) s.t. p_i(i) = 1 to measure each link at least once
3:   Compute (l̂_i, m_i)_{i=1}^{N} accordingly
4: for t = N+1, ... do
5:   Route over path p̂(t) = argmin_{p∈P} ∑_i p_i l̂_i
6:   Probe the path p(t) containing the least measured link, i.e., link j s.t. m_j = min_i m_i
7:   Update (l̂_i, m_i)_{i=1}^{N} accordingly
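The decoupled probe/route loop of Algorithm 2 can be sketched in a few executable lines. Everything concrete below — the four-link topology, the two candidate paths, the Bernoulli link weights, and the helper `observe` — is an illustrative assumption, not part of the paper's model.

```python
import random

# Minimal sketch of Algorithm 2 (OSPR) on a toy 4-link network with two
# candidate paths. At each step: route over the empirically shortest path,
# probe a path containing the least measured link, and update the estimates.

random.seed(0)
N = 4                                   # number of links
PATHS = [(1, 1, 0, 0), (0, 0, 1, 1)]    # path incidence vectors p = (p_1..p_N)
MEANS = [0.2, 0.3, 0.4, 0.5]            # true mean link weights l_i (assumed)

l_hat = [0.0] * N   # empirical mean link weights
m = [0] * N         # sample counts m_i

def observe(i):
    """Draw one Bernoulli weight sample of link i and update (l_hat_i, m_i)."""
    w = 1.0 if random.random() < MEANS[i] else 0.0
    m[i] += 1
    l_hat[i] += (w - l_hat[i]) / m[i]   # running-mean update

# Initialization: probe paths p(1..N) so every link is measured at least once.
for i in range(N):
    observe(i)

for t in range(N + 1, 1001):
    # Routing: empirically shortest path based on current l_hat.
    p_hat = min(PATHS, key=lambda p: sum(pi * lh for pi, lh in zip(p, l_hat)))
    # Probing: any path containing the least measured link j.
    j = min(range(N), key=lambda i: m[i])
    probe_path = next(p for p in PATHS if p[j] == 1)
    for i in range(N):
        if probe_path[i]:
            observe(i)

best = min(PATHS, key=lambda p: sum(pi * mu for pi, mu in zip(p, MEANS)))
```

Because probing always targets the least measured link, the sample counts stay balanced across links even though routing quickly concentrates on the path with the smaller true mean weight.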
Proof. We prove the result by bounding the deviation between l̂_{i,c} and l̂_{i,s}. For any ε > 0, we have

    Pr{ |l̂_{i,c} − l̂_{i,s}| ≥ ε } ≤ Pr{ |l̂_{i,c} − l_i| ≥ ε/2 } + Pr{ |l̂_{i,s} − l_i| ≥ ε/2 } ≤ 2e^{−ωε^2/2},    (31)

where the last inequality follows from the Chernoff-Hoeffding bound (10). Then setting ε = √((2/ω) log(2/α)) makes the upper bound equal to α.

3 Pseudocode for Algorithms

Algorithm 1 is the coupled probing/routing algorithm proposed in [5]. Algorithm 2 is the basic version of our decoupled probing and routing algorithm based on single-source learning. Algorithms 3 and 4 extend it to joint learning by multiple sources, with and without coordination in probing path selection, respectively. Algorithm 5 is the window-based detection procedure used to trigger re-learning in the case of fundamental changes in link qualities.

Algorithm 3 Online SPR under Coordinated Probing (OSPR-CP)
Require: Candidate paths P_f for each source-destination pair (f = 1, ..., S).
Ensure: At each time step, globally select S paths to probe to compute estimated mean link weights (l̂_i)_{i=1}^{N}, and locally select a path to route for each source.
Probing:
1: Initialization: probe each link at least once, compute (l̂_i, m_i)_{i=1}^{N}, broadcast (l̂_i)_{i=1}^{N} to all the sources
2: for each time step t do
3:   for s = 1, ..., S do
4:     Probe the path p ∈ P := ∪_{f=1}^{S} P_f containing the globally least measured link
5:     Update (l̂_i, m_i)_{i=1}^{N} accordingly
6:   Broadcast (l̂_i)_{i=1}^{N} to all the sources
Routing (at source-destination pair f, after receiving the initial (l̂_i)_{i=1}^{N}):
7: for each time step t do
8:   Route over path p̂_f(t) = argmin_{p∈P_f} ∑_i p_i l̂_i
9:   Receive updated (l̂_i)_{i=1}^{N}

Algorithm 4 Online SPR under Uncoordinated Probing (OSPR-UP)
Require: Candidate paths P_f for a source-destination pair of interest.
Ensure: Select a path to route and a path to probe at each time step, and compute estimated mean link weights (l̂_i)_{i=1}^{N}.
1: Initialization: probe each link covered by P_f at least once, compute (l̂_{i,f}, m_{i,f})_{i=1}^{N}, share it with the other sources
2: for each time step t do
3:   Compute the aggregate link estimates: l̂_i = ( ∑_{s=1}^{S} l̂_{i,s} m_{i,s} ) / ( ∑_{s=1}^{S} m_{i,s} ) for all i
4:   Route over path p̂_f(t) = argmin_{p∈P_f} ∑_i p_i l̂_i
5:   Probe the path p_f(t) ∈ P_f containing the globally least measured link (in terms of m_i = ∑_{s=1}^{S} m_{i,s}) among those covered by P_f
6:   Update (l̂_{i,f}, m_{i,f})_{i=1}^{N} accordingly and share it with the other sources

Algorithm 5 Window-based Change Detection (WCD)
Require: Window size ω, false alarm constraint α.
Ensure: Detect changes on each link i and restart learning upon detection.
1: for each new measurement of link i do
2:   Update l̂_{i,s}
3:   if |l̂_{i,c} − l̂_{i,s}| ≥ √((2/ω) log(2/α)) then
4:     reset (l̂_i, m_i) (and also l̂_{i,c}, l̂_{i,s})
5:   else
6:     update (l̂_i, m_i) according to the measurement

References

[1] T. He, D. Goeckel, R. Raghavendra, and D. Towsley, "Endhost-Based Shortest Path Routing in Dynamic Networks: An Online Learning Approach," 2012, under submission.
[2] V. Anantharam, P. Varaiya, and J. Walrand, "Asymptotically Efficient Allocation Rules for the Multiarmed Bandit Problem with Multiple Plays—Part I: I.I.D. Rewards," IEEE Trans. Automatic Control, 1987.
[3] D. Pollard, Convergence of Stochastic Processes, 1984.
[4] T. Cover and J. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[5] Y. Gai, B. Krishnamachari, and R. Jain, "Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards and Individual Observations," IEEE/ACM Trans. Networking, 2012.
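To make the threshold test of Proposition 2.9, as used in Algorithm 5, concrete, here is a single-link sketch of the window-based change detector. The `ChangeDetector` class, its window bookkeeping (control window = first ω samples after a reset, sliding window = latest ω samples), and the synthetic step stream are assumed implementation details for illustration, not the paper's code.

```python
import math
from collections import deque

# Sketch of Algorithm 5 (WCD) for one link: compare the control-window mean
# with the sliding-window mean against the threshold sqrt((2/w) log(2/alpha))
# from Proposition 2.9; on detection, reset both windows (restart learning).

class ChangeDetector:
    def __init__(self, w, alpha):
        self.w = w
        self.threshold = math.sqrt((2.0 / w) * math.log(2.0 / alpha))
        self.control = []               # first w samples after the last reset
        self.sliding = deque(maxlen=w)  # most recent w samples

    def update(self, x):
        """Feed one link-weight measurement; return True if a change is detected."""
        if len(self.control) < self.w:
            self.control.append(x)
        self.sliding.append(x)
        if len(self.control) < self.w or len(self.sliding) < self.w:
            return False  # not enough samples in both windows yet
        l_c = sum(self.control) / self.w
        l_s = sum(self.sliding) / self.w
        if abs(l_c - l_s) >= self.threshold:
            # change detected: restart learning by resetting both windows
            self.control, self.sliding = [], deque(maxlen=self.w)
            return True
        return False

det = ChangeDetector(w=50, alpha=0.05)
stream = [0.2] * 200 + [0.9] * 200   # mean link weight jumps from 0.2 to 0.9
alarms = [t for t, x in enumerate(stream) if det.update(x)]
```

On this noise-free stream the detector fires once, shortly after the jump at step 200: the sliding-window mean must drift far enough from the control-window mean to cross the threshold, after which both windows are rebuilt from post-change samples and no further alarms occur.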
