Extended Ethernet Congestion Management (E2CM): Per Path ECM-A Hybrid Proposal

Extended Ethernet Congestion Management
(E2CM): Per Path ECM - A Hybrid Proposal
M. Gusat, C. Minkenberg and R. Luijten
IBM Research GmbH, Zurich
March 14th 2007
Outline
• Status at 802.1
• Critique Analysis of BCN
• IBM hybrid proposal: E2CM
¾ unite BCN/ECM + recent CM proposals
• E2CM simulations
¾ A) Orientative results
¾ B) Reference results
• Conclusions
IBM Research GmbH, Zurich
2
Status at 802.1au
1. 802.1 WG critique of BCN performance
1. Stability: oscillations and ‘slow’ convergence
2. Fairness: BCN’s rate allocation is ‘unfair’
2. Recent rate-based (RB) CM proposals in 802.1
1. BECN evolves into FECN (R. Jain)
2. FECN-like destination-based probing / DBP (M. Seaman)
3. First reconciliation proposal: QCN (B. Prabhakar)
¾ optimized BCN
¾ same or better performance at lower overhead
¾ 802.1au agreement on CM method appears difficult...
9 adhoc simulation progress slowdown
IBM Research GmbH, Zurich
3
Critique of BCN vs. ECN:
Of Fairness, Stability and Speed
IBM Research GmbH, Zurich
4
Current CM Proposals:
R. Jain’s FECN + B. Prabhakar’s QCN + M. Seaman’s DBP
A) R. Jain presents in Monterey a new rate-based CM (RB-CM) proposal, i.e. FECN
•
Performance claims (top 3):
1.
2.
3.
“Perfect Fairness”
“Fast Convergence”
“No PAUSE required or issued”
[Obs.: here “perfect” stands for max-min]
[Obs.: 10ms here]
[Obs.: in certain cases]
Note: FECN results are to be validated by other simulation teams.
B) M. Seaman’s DST-based probing (DBP) proposal
“...algorithm comprising three sets of algorithms for Sources, Bridges, and Destinations.
1.
the Destination originates and transmits regular Rate Report (RR) frames to each active Source.
2. Each RR traces the reverse path from the Destination to the Source and carries an advertised
rate for use by the Source in transmitting to that Destination. The RR originally carries a rate set
by the destination to be its receiving link speed. “
C) B. Prabhakar’s insight leading to the QCN proposal
“BCN generates extra signaling traffic
¾
¾
Hence sampling probability is kept at 1%; this can go up to 10% and improve responsiveness by a lot
But, if [i] forward signaling is possible, or [ii] another means of signaling more frequently can
be found, then we can send less information per signal”
IBM Research GmbH, Zurich
5
IBM Proposal: BCN + FECN + DBP + QCN => E2CM
The Hybrid Dilemma: How to Combine the Best
Features ... Only?
Tree Saturation => Complex CM ... yet leads to our proposal
port 1
port 4
.
.
.
.
.
. . .
.
.
.
. . .
.
port 4
.
.
.
HCARx
.
. . .
. . . . .
.
HCATx
port 1
shortcut routes within same SEs
. . . . .
HCATx
(PPT animation)
.
.
port 128
port 128
32 4x4
HCARx
32 4x4
32 4x4
32 4x4
16 8x8
same SEs
•
Documented in papers from sim and h/w experiments [refs available]
•
Effect: nonlinear (saturation chain) and time-varying system (2 non-invariants)
•
Transport lag => transcendental characteristic equation
¾
¾
Link-level flow control induces blocking chains in depth and breadth
Few hot flows hog resources (high-order HOL-B) => blocking many/all cold/victim flows
¾
Impact on piecewise linearization, particularly at linearization points
¾
Root-locus and Routh-Hurwitz apply only to rational transfer functions...
IBM Research GmbH, Zurich
7
Critique – 1 Analysis : Stability of BCN
•
A key observation: Baseline BCN has robust performance in the linear region!
¾ However, its dynamic range (DR) is limited by the queue capacity
¾ Furthermore, possibly fed by n simultaneous arrivals...
from ECM Spec
Saturated Integrator behavior...
⇒ Feedback: Fb(t) = -(q(t) – Qeq) + w*(dq/dt) / (μj* ps) =>
0 ≤ q(t) ≤ qmax
OVF
UDF
¾ Fast transition between lower/upper saturation (n+1 stochastic procs)
–
–
requires frequent use of saturation signals: BCN_Max, BCN(0,0)
non-linear saturation patches reduce the efficiency of the baseline control alg.
IBM Research GmbH, Zurich
8
Extending the Linear Region of a Saturated Integrator
•
How to scale BCN’s stability properties w/ network size?
1. Increase the dynamic range by chaining the j queues along the path i...
2. Control the chain of queues instead of the individual queue
¾ Adopt per path probing ...
•
OVF
UDF
Concatenate multiple queues along a path into a Path Queue ->
Σqij
¾ State equations
From local queue stability to per path stability:
LQueue)
dq/dt = HSD*λ(t) – μj , where
max(HSD) = N, and max (μj ) = Cj
PQueue)
Q’ij = Σidqij/dt = Σiλij (t) – μj , 1< i ≤ HSD
Obs.: Slope steepness decreases: from n+1 to 2 stochastic procs
IBM Research GmbH, Zurich
OVF
UDF
9
Critique - 2 Analysis : Fairness of BCN
•
BCN’s ‘unfairness’: from probabilistic sampling of aggregate occupancy
¾ Queue contains frames from any flow => lack of per-flow state sensing
9 Per-flow state in the bridge is prohibited by PAR
•
A) One approach is to calculate / iterate the fair share (FS) as in
ABR methods such as FECN, UT, OSU, H2
¾ TBD: scalability and h/w complexity
•
B) Other methods were proposed by Stanford Univ. [Allerton paper]
•
C) IBM proposal: A per-flow probing sensor in the edge node...
¾ Probing: triggered by BCN, or autonomous (congestion avoidance)
¾ Why per flow? -> fast max-min convergence even w/o FS in the bridge
IBM Research GmbH, Zurich
10
E2CM Principles: Dual Heritage
•
E2CM includes saturation tree mgnt: avoidance, isolation and recovery.
¾ Historically this wasn’t the case in TCP and ATM ABR
I) BCN and QCN heritage: E2CM draws upon the baseline BCN
¾ PD control + feedback equation (extended, instead of piecewise linearized)
¾ AIMD-based SRF + parameters
9 Potential QCN optimizations:
–
–
Bridges do not send increase signals
Sparse quantization (6/5 bit)
II) FECN and DBP heritage: SRC probes for RTT + DST calculates rate
¾ from FECN and DBP we adopt per path probing and DST rate reporting
Effect: E2CM extends BCN’s dynamic range to a wider linear region proportional
to the number of buffers traversed per path
¾ increased stability and phase margin (improved convergence)
¾ scalability w/ network size
•
By adopting per-flow accounting in the end node, E2CM converges to fair
allocation rates w/o using rate-based calculations in the bridge
IBM Research GmbH, Zurich
11
E2CM Operation (PPT animation)
Probe
Switch 1
src
1. BCN
1. Probe
arrives
arrives
at source
at source
1. Probe arrives at dst
2. Update timestamp, insert
1. Qeq exceeded flow service rate
2. Send BCN to
3.source
Return probe to source
Switch
2
BCN
2. Install
2. Pathrate
occupancy
limiter computed
3. Inject
3. AIMD
probe
control
w/ timestamp
applied using
same rate limiter
Switch 3
dst
•
Probing is triggered by BCN frames; only rate-limited flows are probed
•
Per flow, BCN and probes employ the same rate limiter
¾ Insert one probe every X KB of data sent per flow, e.g. X = 75 KB
¾ Probes traverse network inband: Objective is to observe real current queuing delay
¾
¾
¾
¾
Control per-flow (probe) as well as per-queue (BCN) occupancy
CPID of probes = destination MAC
Rate limiter is never associated with probe CPID
Parameters re. probes may be set differently (in particular Qeq,flow, Qmax,flow, Gd,flow, Gi,flow)
IBM Research GmbH, Zurich
12
Proposed Compromise Scheme details
•
Extension: BCN reception triggers RTT and Tput probing in the end nodes
¾
A) While activated SRC periodically (t or n-pkts) insert probe frame for every RLT flow
9
9
Probe contains timestamp
Probes traverse network in-band with regular data frames
¾
B) Upon reception of forward probe, the DST will
¾
C) Upon reception of reverse probe back at the originating SRC
¾
1.
2.
3.
Update timestamp to reflect forward latency L
Calculate and report the flow service rate R since last flow probe
Return probe to sending SRC
1.
2.
Adjust latency L for flight time L0
Apply Little’s Formula:
3.
Apply the extended BCN source response function
9
9
One rate limiter per flow
Associated with last negative feedback
Q = (L - L0)*R
1.
Yields the mean number of bytes of probed flow stored on entire forward path
1.
E.g., set Qequilibrium and apply AIMD rate adjustment
Net: E2CM extends buffer occupancy per path and flow (however, CM triggering may be
done on rate –instead of size– thresholds)
9
9
per flow: Separates hot from cold flows
per path: extends the region of linear BCN operation
IBM Research GmbH, Zurich
13
Reaction Point
if (bcn.type() == BCN_BCN) {
// Compute BCN reaction as usual
...
} else if (bcn.type() == BCN_PROBE) {
// Store minimum latency as time of flight
if (flightTime > bcn.getLatency() || flightTime == 0.0)
flightTime = bcn.getLatency();
// Compute amount of data queued on forward path, adjusting for flight time
flowQ = bcn.getThroughput()*(bcn.getLatency() - flightTime);
flowdQ = max( min( flowQ - flowLastQ, 2*flowQeq ), -2*flowQeq );
flowQoff = max( min( flowQeq - flowQ, flowQeq ), -flowQeq );
if (flowQ > flowQmax) // Qmax threshold exceeded?
feedback = -(1+2*W)*flowQeq; // Apply maximum negative feedback
else
feedback = (flowQoff - W*flowdQ); // Compute feedback
flowLastQ = flowQ; // Store last queue estimate
// Apply AIMD rate adjustment
if (feedback > 0) // Additive increase
rate = rate + flowGi*feedback*rateUnit;
else if (feedback < 0) // Multiplicative decrease
rate = rate * (1.0 + flowGd*feedback);
}
// If needed, instantiate new rate limiter or update rate
// Associate rate limiter with CP if feedback < 0 and not probe
...
IBM Research GmbH, Zurich
14
E2CM Frame Format
•
General fields
¾ Congestion point MAC
9 Inserted by congestion point
9 Probe congestion point MAC = flow destination MAC
¾ Flow identifier
9 Hash based on src MAC, dst MAC, priority
¾ BCN type
9 BCN, BCN_MAX, BCN_ZERO, BCN_PROBE_FWD, BCN_PROBE_REV
•
BCN-specific fields
¾ Queue offset
9 Inserted by congestion point
¾ Queue delta
9 Inserted by congestion point
•
Probe-specific fields
¾ Forward latency
9 Already provided in original BCN format (but different usage)
9 Timestamp inserted by source node
9 Updated by destination node (latency = now – timestamp)
¾ Flow throughput field
9 Inserted by destination node
9 Measured between two subsequent probes for same flow
* Red: new fields
IBM Research GmbH, Zurich
15
E2CM: Selected Initial Simulation Results
Baseline IG and OG simulations: Orientative Preview
(not for reference)
Input-generated Hotspot
Node 2
Node 3
Node 4
5%
Core
Switch
40%
Node 5
Node 6
•
•
•
5%
Node 1
80%
80%
5 flows sending to hotspot: aggregate load = 21 Gb/s
Max-min fair rates = (0.5; 0.5; 3; 3; 3) Gb/s = (62.5; 62.5; 375; 375; 375) MB/s
Hotspot starts at t = 0.1s; at t=0.5s, service rate of node 1 is reduced by half; fair rates = (62.5; 62.5; 167; 167;
167) MB/s
IBM Research GmbH, Zurich
17
BCN vs. E2CM : Fair and Steady Rate Allocation
BCN
E2CM
Rate fluctuations
Stable max-min fair rates
•
Graphs show aggregate (red) and per-flow throughput
•
1.
2.
3.
4.
5.
6.
Params
Qeq_BCN = 75 kB, Qeq_E2CM = 15 kB, Gd = 1.333*10^-6, Gi = 6.6667*10^-4
(Gdf = 0.5, Gif = 0.1; E2CM gains are 5 times as high, because Qeq is 5 times as low)
Ru = Rmin = 250 kB/s = 2 Mb/s
M = 300 kB/port, Thr_hi = 295500, Thr_lo = 147750
sample interval = 75 kB (for BCN as well as E2CM), 15 kB for rate-limited flows
BCN_MAX disabled
IBM Research GmbH, Zurich
18
Output-Generated Single-Hop Hotspot
Node 2
85%
Service rate = 20%
Core
Switch
Node N
•
•
•
85%
Node 1
85%
All nodes: Uniform destination distribution, load = 85% (8.5 Gb/s)
Node 1 service rate = 20%
One congestion point
¾
¾
Hotspot degree = N-1
All flows affected
•
Params
1.
2.
3.
4.
5.
6.
Qeq_BCN = 75 kB, Qeq_E2CM = 15 kB, Gd = 1.333*10^-6, Gi = 6.6667*10^-5
(Gdf = 0.5, Gif = 0.01; E2CM gains are 5 times as high, because Qeq is 5 times as low)
Ru = Rmin = 250 kB/s = 2 Mb/s
M = 300 kB/port, Thr_hi = 295500, Thr_lo = 147750
sample interval = 75 kB (for BCN as well as E2CM), 15 kB for rate-limited flows
BCN_MAX enabled (Qsc = 280500)
IBM Research GmbH, Zurich
19
BCN vs. E2CM : Output-generated (Tp and Q)
Reduced transient
Slow recovery
Fast recovery
Transient
No underutilization
Underutilization
E2CM
BCN
IBM Research GmbH, Zurich
20
Reference Simulation Results
E2CM r1.0
Simulation Setup & Parameters
•
Traffic
–
–
–
•
•
I.i.d. Bernoulli arrivals
Uniform destination distribution (to all
nodes except self)
Fixed frame size = 1500 B
•
Scenarios
1.
2.
3.
4.
5.
6.
Baseline input-generated (IG)
Max-min (mice-elephant) IG
Single-hop output-generated (OG)
Multi-Hop OG background HS
Bursty On-Off
Parking lot
•
Switch
–
–
–
–
M = 300 KB/port
Partitioned memory per input, shared
among all outputs
No limit on per-output memory usage
PAUSE enabled
•
•
•
Applied on a per input basis based on
local high/low watermarks
watermarkhigh = 280 KB
watermarklow = 260 KB
Adapter
–
–
–
–
–
ECM
–
–
–
–
–
–
–
–
–
•
Per-node virtual output queuing
No limit on number of rate limiters
Unlimited ingress buffer size
Egress buffer size = 150 KB
PAUSE enabled
•
•
watermarkhigh = 140 KB
watermarklow = 130 KB
W = 2.0
Qeq = 75 KB (= M/4)
Gd = 0.5 / ((2*W+1)*Qeq)
Gi0 = (Rlink / Runit) * ((2*W+1)*Qeq)
Gi = 0.005 * Gi0
Psample = 2% (on average 1 sample every 75
KB) or 10% (15 KB)
Runit = Rmin = 1 Mb/s
BCN_MAX enabled, threshold = 280 KB
No BCN(0,0), no self-increase
E2CM (per-flow)
–
–
–
–
–
–
–
W = 2.0
Qeq = 15 KB
Gd = 2.5 / ((2*W+1)*Qeq)
Gi = 0.025 * Gi0
Psample = 2% (on average 1 sample every 75
KB) or 10% (15 KB)
Runit = Rmin = 1 Mb/s
BCN_MAX enabled, threshold = 56 KB
IBM Research GmbH, Zurich
22
1. Baseline Input-Generated Hotspot
Node 2
50%
Node 3
50%
Edge
Switch 1
50%
Node 1
Edge
Switch 2
Edge
Switch 5
Node 6
Core
Switch
50%
Edge
Switch 3
Edge
Switch 6
Node 4
Node 5 50%
•
•
•
Edge
Switch 4
Node 7
Four culprit flows of 5 Gb/s each from nodes 2, 3, 4, 5 to node 6 (hotspot)
One victim flows of 5 Gb/s from node 1 to node 7
Fair allocation provides 2.5 Gb/s to all culprits and 5 Gb/s to the victim
IBM Research GmbH, Zurich
23
Results Baseline scenario (Tp)
ECM
E2CM
No underutilization
75 KB
Victim not affected
Unfair rate
allocation
~150 ms transient
Perfect rate
allocation
Fast recovery
15 KB
Still unfair rate
allocation
IBM Research GmbH, Zurich
Shorter transient
24
Results Baseline scenario (Q)
ECM
E2CM
Stable queue length
Stable queue length
75 KB
Stable per-flow
occupancy!
15 KB
IBM Research GmbH, Zurich
25
Results Baseline scenario (Tp, Gi = 0.25*Gi0)
ECM
E2CM
Fair allocation
75 KB
Fast fairness
15 KB
IBM Research GmbH, Zurich
26
Convergence times IG single-hop
Gi = 0.025 * Gi0
400
ECM, 75 KB
ECM, 15 KB
300
E2CM, 15 KB
250
200
150
100
50
14
E2CM, 75 KB
Convergence time (ms)
relative
Convergence time (ms)
350
Gi = 0.25 * Gi0
16
Ideal
ECM, 75 KB
E2CM, 75 KB
ECM, 15 KB
E2CM, 15 KB
12
10
8
6
4
2
0
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.05
Convergence accuracy
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Convergence accuracy
•
Convergence times determined over 1-ms averages of hot OQ length
•
Relative accuracy a means that measured values stay within band [v *(1-a), v *(1+a)],
where v is the steady-state value, so band width = 2*a
IBM Research GmbH, Zurich
27
2. Input-Generated Mice-Elephant Hotspot
Node 2
Node 3
Node 4
5%
Node 6
•
•
Core
Switch
40%
Node 5
•
•
5%
Node 1
80%
80%
5 flows sending to hotspot: aggregate load = 21 Gb/s
Max-min fair rates = (0.5; 0.5; 3; 3; 3) Gb/s =
= (62.5; 62.5; 375; 375; 375) MB/s
Hotspot max-min fair rates = (62.5; 62.5; 167; 167; 167) MB/s
Achieved...
IBM Research GmbH, Zurich
28
Results “Mice-Elephant” scenario (Tp and Q)
ECM
E2CM
75 KB
IBM Research GmbH, Zurich
Gi = 0.1 * (Rlink / Runit) * ((2*W+1)*Qeq)
29
3. Output-Generated Single-Hop Hotspot
Node 2
85%
Service rate = 10%
Core
Switch
Node N
•
•
•
85%
Node 1
85%
All nodes: Uniform destination distribution, load = 85% (8.5 Gb/s)
Node 1 service rate = 10%
One congestion point
–
Hotspot degree = N-1
– All flows affected => step response (test stability)
IBM Research GmbH, Zurich
30
Results OG multi-hop scenario (Tp)
ECM
E2CM
75 KB
15 KB
IBM Research GmbH, Zurich
31
Results OG (Q): >2x Faster Convergence..!
ECM
E2CM
Lag => zero in RH splane => undershoots
Reduced
undershoot
75 KB
15 KB
IBM Research GmbH, Zurich
32
Convergence times single-hop OG scenario
0.18
0.45
0.35
ECM, 75 KB
E2CM, 75 KB
ECM, 15 KB
E2CM, 15 KB
0.3
0.25
0.2
0.15
0.1
0.05
Absolute
0.16
Absolute convergence tim e (s)
Relative convergence time (s)
0.4
0.14
0.12
0.1
0.08
0.06
0.04
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.02
Convergence accuracy (%)/100
Relative
0
ECM, 75 KB
E2CM, 75 KB
ECM, 15 KB
E2CM, 15 KB
Configuration
•
Convergence times determined over 1-ms averages of hot OQ length
•
Relative accuracy a means that measured values stay within band [v *(1-a), v
*(1+a)], where v is the steady-state value, so band width = 2*a
•
Absolute accuracy means that measured values stay within band [1.5, 280] KB
IBM Research GmbH, Zurich
33
4. OG Multi-Hop Background Traffic Hotspot
25%
Node 1
Service rate = 5%
Node 2
40%
Edge
Switch 1
25%
Edge
Switch 3
Node 3 25%
Node 4
Node 5
25%
40% Node 8
Core
Switch
25%
Edge
Switch 2
Node 7
Node 9
Edge
Switch 4
40%
40%
Node 10
Node 6 25%
•
•
•
•
All nodes: Uniform destination distribution
Nodes 1-6 load = 25% (2.5 Gb/s), nodes 7-10 load = 40% (4 Gb/s)
– Mean aggregate load = (6*.25+4*.4)/10 = 31% (3.1 Gb/s)
Node 7 service rate = 5%
Five congestion points
– All switches and all flows affected
IBM Research GmbH, Zurich
34
Results OG multi-hop BGND (Q)
ECM
E2CM
75 KB
15 KB
IBM Research GmbH, Zurich
35
Convergence times multi-hop OG scenario
1.4
Convergence time (s)
1.2
ECM, 75 KB
E2CM, 75 KB
ECM, 15 KB
E2CM, 15 KB
•
Convergence times
determined over 1-ms
averages of hot OQ length
•
Relative accuracy a means
that measured values stay
within band [v *(1-a), v *(1+a)],
where v is the steady-state
value, so band width = 2*a
•
Absolute accuracy means that
measured values stay within
band [1.5, 280] KB
1
0.8
relative
0.6
0.4
0.2
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Convergence accuracy (%)/100
0.16
absolute
0.14
Convergence time (s)
0.12
0.1
0.08
0.06
0.04
0.02
0
ECM, 75 KB
E2CM, 75 KB
ECM, 15 KB
Configuration
E2CM, 15 KB
IBM Research GmbH, Zurich
36
5. Bursty Baseline Input-Generated Hotspot
Node 2 100%
100%
Node 3
Edge
Switch 1
Node 1
Edge
Switch 2
Edge
Switch 5
Node 6
Core
Switch
100%
Edge
Switch 3
Edge
Switch 6
Node 4
Node 5 100%
•
•
•
•
Edge
Switch 4
Node 7
Four hot flows of 10 Gb/s each from nodes 2, 3, 4, 5 to node 6 (hotspot)
Every 100 ms, flows from nodes 4 and 5 are switched from off to on and vice versa
(duty cycle = 200 ms)
Fair allocation provides 2.5 Gb/s per flow when 4 are active, 5 Gb/s when 2 are active
Pause disabled, very small adapter buffers (10 frames)
IBM Research GmbH, Zurich
37
Bursty Baseline IG scenario (Tp): Convergence < 5ms
ECM
E2CM
Convergence but unfair rate
allocation
Fast convergence (< 5ms)
and fair rate allocation
75 KB
Close to ideal
15 KB
IBM Research GmbH, Zurich
38
Bursty Baseline IG scenario (Q)
ECM
E2CM
Fast convergence on stable queue
level (75 KB) after every transition
Idem
75 KB
Note different queue levels (30 vs. 60
KB) depending on number of active flows
Idem
Idem
15 KB
IBM Research GmbH, Zurich
39
6. Parking Lot Scenario
Node 1
Node 2
100%
Node 6
Switch 2
Switch 3
Node 7
Node 9
100%
Node 4
100%
100%
Node 5
•
•
•
•
•
Node 8
100%
Switch 1
Node 3
100%
Four hot flows of 10 Gb/s each from nodes 1, 2, 3, 4 to node 9 (hotspot)
Two cold flows of 10 Gb/s from node 5 to 7 and 6 to 8
Max-min fair allocation provides 2.0 Gb/s to all flows
Proportionally fair allocation provides 1.67 Gb/s to all hot flows and 3.33 Gb/s to all
cold flows
Pause disabled, very small adapter buffers (10 frames)
IBM Research GmbH, Zurich
40
Parking Lot Scenario (75KB) – Qeq, all = 15 KB
E2CM
ECM
Tp
Proportional fairness
Q
IBM Research GmbH, Zurich
41
E2CM - Parking Lot Scenario (75KB) – Per-flow Qeq
Qeq,hot = 15 KB, Qeq,cold = 7.5 KB
Qeq,all = 15 KB
Tp
Proportional fairness: 4x1.67 Gb/s, 2x3.33 Gb/s
Max-min fairness: 6x2 Gb/s
Q
IBM Research GmbH, Zurich
42
Conclusion
•
E2CM extends baseline BCN w/ per-path probing adopted from DBP and FECN
•
E2CM addresses the main critiques of BCN
¾ improves performance in
9 stability and speed, based on an saturated integrator w/ extended dynamic range
(distributed queue Qij = Σqij)
9 linear range of Fb is now scalable w/ no. of buffers in the network
9 fairness (per flow accuracy possible in end-nodes, when needed)
–
•
scalability cost to 100+ Gbps Ethernet and 1M-node datacenter is TBD
User-defined fairness:
¾ 1. max-min (canonical, beneficial for ‘mice’)
¾ 2. proportional (tempers ‘remote’ flows w/ long routing distance)
¾ 3. max-Tput (maximize utilization at cost of unfairness, but no starvation)
•
Synergy w/ FECN/DBP (probing) and w/ baseline ECM (param tuning)
•
We propose the hybrid E2CM as baseline CM approach
That’s all, thanks!
IBM Research GmbH, Zurich
43