.ppt

Optimal Space Lower
Bounds for all Frequency
Moments
David Woodruff
Based on SODA ’04 paper
The Streaming Model [AMS96]
4





3
7
3
1
1
0
…
Stream of elements a1, …, aq each in {1, …, m}
Want to compute statistics on stream
Elements arranged in adversarial order
Algorithms given one pass over stream
Goal: Minimum space algorithm
Frequency Moments
Notation
 q = stream size, m = universe size
 fi = # occurrences of item i
k-th moment
 F0 = # of Distinct elements
 F1 = q
 F2 = repeat rate
Why are frequency moments important?
Applications


Estimating # distinct elts. w/ low space
 Estimate selectivity of queries to DB w/o expensive sort
 Routers gather # distinct destinations w/limited memory.
Estimating F2 estimates size of self-joins:
Bob
a
Alice b
Bob
c
Alice
b
y
x
Bob
a
x
, Alice y
Bob
a
z
Bob
Bob
c
x
Bob
c
z
Bob
z
The Best Determininistic Algorithm
 Trivial algorithm for Fk
 Store/update fi for each item i, sum fik at end
 Space = O(mlog q): m items i, log q bits to count fi
 Negative Results [AMS96]:
 Compute Fk exactly => (m) space
 Any deterministic alg. outputs x with |Fk –
x| <  must use (m) space
What about randomized algorithms?
Randomized Approx Algs for Fk
 Randomized alg. -approximates Fk if outputs x s.t.
Pr[|Fk – x| <  Fk ] > 2/3
 Can -approximate F0 [BJKST02], F2 [AMS96], Fk
[CK04], k > 2 in space:
(big-Oh notation suppresses polylog(1/, m, q) factors)
 Ideas:
 Hashing: O(1)-wise independence
 Sampling
Example: F0 [BJKST02]
 Idea: For random function h:[m] -> [0,1] and distinct
elts b1, b2, …, bF0, expect mini h(bi) ¼ 1/F0
Algorithm:
 Choose 2-wise indep. hash function h: [m] -> [m3]
 Maintain t = (1/2) distinct smallest values h(bi)
 Let v be t-th smallest value
 Output tm3/v as estimate for F0
 Success prob up to 1- => take median O(log 1/) copies
 Space: O((log 1/)/2)
Example: F2 [AMS99]
Algorithm:
 Choose 4-wise indep. hash function h:[m] -> {-1,1}
 Maintain Z = i in [m] fi ¢ h(i)
 Output Y = Z2 as estimate for F2
Correctness:
Chebyshev’s inequality => O(1/2) space
Previous Lower Bounds:
 [AMS96] 8 k, –approximating Fk => (log m) space
 [Bar-Yossef] -approximating F0 => (1/) space
 [IW03] -approximating F0 =>
space if
 Questions:
bound hold for k  0?
 Does it hold for F0 for smaller ?
 Does the
Our First Result

Optimal Lower Bound: 8 k  1, any  = (m-.5),
-approximate Fk => (-2) bits of space.
 F1 = q trivial in log q space
 Fk trivial in O(m log q) space, so need  = (m-.5)

Technique: Reduction from 2-party protocol for
computing Hamming distance (x,y)

Use tools from communication complexity
Lower Bound Idea
Alice
y 2 {0,1}m
x 2 {0,1}m
Stream s(x)
(1 § ) Fk
algorithm A
Bob
Stream s(y)
S
Internal
state of A
(1 § ) Fk
algorithm A
• Compute (1 § ) Fk(s(x) ± s(y)) w.p. > 2/3
• Idea: If can decide f(x,y) w.p. > 2/3, space used
by A at least randomized 1-way comm. Complexity of f
Randomized 1-way comm. complexity

Boolean function f: X £ Y ! {0,1}
Alice has x 2 X, Bob y 2 Y. Bob wants f(x,y)
Only 1 message m sent: must be from Alice to Bob
Communication cost = maxx,y Ecoins [|m|]

 -error randomized 1-way communication



complexity R(f), is cost of optimal protocol
computing f with probability ¸ 1-
Ok, but how do we lower bound R(f)?
Shatter Coefficients [KNR]
 F = {f : X ! {0,1}} function family, f 2 F length-|X| bitstring
 For S µ X, shatter coefficient SC(fS) of S :
|{f |S}f 2 F| = # distinct bitstrings when F restricted to S
 SC(F, p) = maxS µ X, |S| = p SC(fS). If SC(fS) = 2|S|, S shattered
 Treat f: X £ Y ! {0,1} as function family fX :
 fX = { fx(y) : Y ! {0,1} | x 2 X }, where fx(y) = f(x,y)
 Theorem [BJKS]: For every f: X £ Y ! {0,1}, every integer p,
R1/3(f) = (log(SC(fX, p)))
Warmup: (1/) Lower Bound [Bar-Yossef]
 Alice input x 2R {0,1}m, wt(x) = m/2
 Bob input y 2R {0,1}m, wt(y) = m
 s(x), s(y) any streams w/char. vectors x, y
PROMISE:
(1) wt(x Æ y) = 0
OR
(2) wt(x Æ y) = m
f(x,y) = 0
f(x,y) = 1
F0(s(x) ± s(y)) = m/2 + m
F0(s(x) ± s(y)) = m/2




R1/3(f) = (1/) [Bar-Yossef] (uses shatter coeffs)
(1+’)m/2 < (1 - ’)(m/2 + m) for ’ = ()
Hence, can decide f ! F0 alg. uses (1/) space
Too easy! Can replace F0 alg. with a Sampler!
Our Reduction:
Hamming Distance Decision Problem (HDDP)
Set t = (1/2)
Alice
Bob
x 2 {0,1}t
y 2 {0,1}t
Promise Problem :
(x,y) · t/2 – (t1/2)
f(x,y) = 0
OR
(x,y) > t/2
f(x,y) = 1
Lower bound R1/3(f) via SC(fX, t), but need a lemma
Main Lemma
S µ{0,1}n
=T
= S-T

y
9 S µ {0,1}n with |S| = n s.t. exist 2(n) “good”
sets T µ S s.t.

9 y 2 {0,1}n s.t


8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0
8 t 2 S – T, (y,t) > n/2
Lemma Resolves HDDP Complexity


Theorem: R1/3(f) = (t) = (-2).
Proof:
 Alice gets yT for random good set T
applying main lemma with n = t.
 Bob gets random s 2 S
 Let f: {yT }T £ S ! {0,1}.
 Main Lemma =>SC(f) = 2(t)
 [BJKS] => R1/3(f) = (t) = (-2)

Corollary: (1/2) space for randomized 2-party
protocol to approximate (x,y) between inputs
 First known lower bound in terms of !
Back to Frequency Moments
Use -approximator for Fk to solve HDDP
y 2 {0,1}t
ay
Fk Alg
s 2 S µ {0,1}t
i-th universe element included exactly
as
once in stream ay iff yi = 1 (as same)
State
Fk Alg
Solving HDDP with Fk

Alice/Bob compute -approx to Fk(ay ± as)
Fk(ay ± as) = 2k wt(y Æ s) + 1k (y,s)
For k  1,

Alice also transmits wt(y) in log m space.


Conclusion: -approximating Fk(ay ± as) decides HDDP, so
space for Fk is (t) = (-2)
Back to the Main Lemma

Recall: show 9 S µ {0,1}n with |S| = n s.t. 2(n)
“good” sets T µ S s.t:
 9 y 2 {0,1}n s.t
1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0
2. 8 t 2 S – T, (y,t) > n/2

Probabilistic Method
 Choose n random elts in {0,1}n for S
 Show arbitrary T µ S of size n/2 is good with
probability > 2-zn for constant z < 1.
 Expected # good T is 2(n)
 So exists S with 2(n) good T
Proving the Main Lemma



T ={t1, …, tn/2} µ S arbitrary
Let y be majority codeword of T
What is probability p that both:
1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0
2. 8 t 2 S – T, (y,t) > n/2


Put x = Pr[8 t 2 T, (y,t) · n/2 – cn1/2]
Put y = Pr[8 t 2 S-T, (y,t) > n/2] = 2-n/2
Independence => p = xy = x2-n/2
The Matrix Problem
 Wlog, assume y = 1n (recall y is majority word)
 Want lower bound Pr[8 t 2 T, (y,t) · n/2 – cn1/2]
 Equivalent to matrix problem:
t1 ->
t2 ->
…
tn/2 ->
101001000101111001
100101011100011110
001110111101010101
101010111011100011
For random n/2 x n binary matrix M, each column
majority 1, what is probablity each row ¸ n/2 + cn1/2 1s?
A First Attempt



Set family A µ 2^{0,1}n monotone increasing if
S1 2 A, S1 µ S2 => S2 2 A
For uniform distribution on S µ {0,1}n, and A, B monotone
increasing families, [Kleitman]
Pr[A Å B] ¸ Pr[A] ¢ Pr[B]
First try:
 Let R be event M ¸ n/2 + cn1/2 1s in each row, C event M
majority 1 in each column
 Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R | C] = Pr[R Å C]/Pr[C]
 M characteristic vector of subset of [.5n2] => R,C monotone
increasing
 => Pr[R Å C]/Pr[C] ¸ Pr[R]Pr[C]/Pr[C] = Pr[R] < 2-n/2
 But we need > 2-zn/2 for constant z < 1, so this fails…
A Second Attempt

Second Try:
 R1: M ¸ n/2 + cn1/2 1s in first m rows
 R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows
 C: M majority 1 in each column
Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R1 Å R2 | C]
= Pr[R1 Å R2 Å C]/Pr[C]
R1, R2, C monotone increasing
=> Pr[R1 Å R2 Å C]/Pr[C] ¸ Pr[R1 Å C]Pr[R2]/Pr[C]
= Pr[R1 | C] Pr[R2]
Want this at least 2-zn/2 for z < 1

Pr[ Xi > n/2 + cn1/2] > ½ - c (2/pi)1/2 [Stirling]

Independence => Pr[R2] > (½ - c(2/pi)1/2)n/2 - m




Remains to show Pr[R1 | C] large.
Computing Pr[R1 | C]
 Pr[R1 | C] = Pr[M ¸ n/2 + cn1/2 1s in 1st m rows | C]
 Show Pr[R1 | C] > 2-z’m for certain constant z’ < 1
 Ingredients:
 Expect to get n/2 + (n1/2) 1s in each of 1st m rows | C
 Use negative correlation of entries in a given row =>
show n/2 + (n1/2) 1s in a given row w/good probability
for small enough c
 A simple worst-case conditioning argument on these
1st m rows shows they all have ¸ n/2 + cn1/2 1s
Completing the Proof





Recall: what is probability p = xy, where
1. x = Pr[ 8 t 2 T, (y, t) · n/2 – cn1/2]
2. y = Pr[ 8 t 2 S – T, (y,t) > n/2] = 2-n/2
3. R1: M ¸ n/2 + cn1/2 1s in first m rows
4. R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows
5. C: M majority 1 in each column
x ¸ Pr[R1 | C] Pr[R2] ¸ 2-z’m (½ - c(2/pi)1/2)n/2 – m
Analysis shows z’ small so this ¸ 2-z’’n/2, z’’ < 1
Hence p = xy ¸ 2-(z’’+1)n/2
Hence expected # good sets 2n-O(log n)p = 2(n)
So exists S with 2(n) good T
Bipartite Graphs
 Matrix Problem  Bipartite Graph Counting
Problem:
…
…
 How many bipartite graphs exist on n/2 by n
vertices s.t. each left vertex has degree > n/2 +
cn1/2 and each right vertex degree > n/2?
Our Result on # of Bipartite Graphs

Bipartite graph count:
 Argument shows at least 2n^2/2 – zn/2 –n such bipartite
graphs for constant z < 1.
 Main lemma shows # bipartite graphs on n + n
vertices w/each vertex degree > n/2 is > 2n^2-zn-n
 Can replace > with <
 Previous knowncount: 2n^2-2n
 [MW – personal comm.]
 Follows easily from Kleitman inequality
Summary

Results:

Optimal Fk Lower Bound: 8 k  1 and any
 = (m-1/2), any -approximator for Fk must use
(-2) bits of space.

Communication Lower Bound of (-2) for oneway communication complexity of (, )approximating (x, y)

Bipartite Graph Count: # bipartite graphs on
n + n vertices w/each vertex degree > n/2 at
least 2n^2-zn-n for constant z < 1.