Optimal Space Lower Bounds for all Frequency Moments
David Woodruff
Based on SODA '04 paper

The Streaming Model [AMS96]
Stream of elements a1, …, aq, each in {1, …, m} (e.g., 4 3 7 3 1 1 0 …)
Want to compute statistics on the stream
Elements arrive in adversarial order
Algorithms are given one pass over the stream
Goal: minimum-space algorithm

Frequency Moments
Notation: q = stream size, m = universe size, fi = # of occurrences of item i
k-th moment: Fk = Σi fi^k
F0 = # of distinct elements, F1 = q, F2 = repeat rate

Why are frequency moments important?
Applications:
- Estimating # of distinct elements with low space: estimate selectivity of queries to a DB without an expensive sort; routers gather # of distinct destinations with limited memory
- Estimating F2 estimates the size of self-joins
(illustration: a relation and its self-join)

The Best Deterministic Algorithm
Trivial algorithm for Fk: store/update fi for each item i, sum the fi^k at the end
Space = O(m log q): m items i, log q bits to count each fi

Negative Results [AMS96]
Computing Fk exactly requires Ω(m) space
Any deterministic algorithm that outputs x with |Fk − x| < εFk must use Ω(m) space
What about randomized algorithms?

Randomized Approximation Algorithms for Fk
A randomized algorithm ε-approximates Fk if it outputs x s.t. Pr[|Fk − x| < εFk] > 2/3
Can ε-approximate F0 [BJKST02], F2 [AMS96], and Fk [CK04] for k > 2 in sublinear space
(big-Oh notation suppresses polylog(1/ε, m, q) factors)
Ideas: hashing (O(1)-wise independence), sampling

Example: F0 [BJKST02]
Idea: for a random function h: [m] → [0,1] and distinct elements b1, b2, …, b_F0, expect mini h(bi) ≈ 1/F0
Algorithm:
- Choose a 2-wise independent hash function h: [m] → [m^3]
- Maintain the t = Θ(1/ε^2) smallest distinct values h(bi)
- Let v be the t-th smallest value; output t·m^3/v as the estimate for F0
To push the success probability up to 1 − δ, take the median of O(log 1/δ) copies
Space: O((log 1/δ)/ε^2)

Example: F2 [AMS99]
Algorithm: Choose a 4-wise independent
hash function h: [m] → {−1, 1}
Maintain Z = Σ_{i ∈ [m]} fi · h(i); output Y = Z^2 as the estimate for F2
Correctness: Chebyshev's inequality ⇒ O(1/ε^2) space

Previous Lower Bounds
[AMS96]: ∀ k, ε-approximating Fk requires Ω(log m) space
[Bar-Yossef]: ε-approximating F0 requires Ω(1/ε) space
[IW03]: ε-approximating F0 requires Ω(1/ε^2) space if ε is not too small
Questions: Does the bound hold for k ≠ 0? Does it hold for F0 for smaller ε?

Our First Result
Optimal lower bound: ∀ k ≠ 1 and any ε = Ω(m^(−1/2)), ε-approximating Fk requires Ω(ε^(−2)) bits of space
F1 = q is trivial in log q space
Fk is trivial in O(m log q) space, so we need ε = Ω(m^(−1/2))
Technique: reduction from a 2-party protocol for computing the Hamming distance Δ(x,y), using tools from communication complexity

Lower Bound Idea
Alice has x ∈ {0,1}^m, Bob has y ∈ {0,1}^m; A is a (1 ± ε) Fk algorithm
Alice runs A on a stream s(x), then sends the internal state S of A to Bob, who continues running A on a stream s(y)
- Together they compute (1 ± ε) Fk(s(x) ∘ s(y)) w.p. > 2/3
- Idea: if this decides f(x,y) w.p. > 2/3, the space used by A is at least the randomized 1-way communication complexity of f

Randomized 1-way communication complexity
Boolean function f: X × Y → {0,1}
Alice has x ∈ X, Bob has y ∈ Y; Bob wants f(x,y)
Only one message m is sent, and it must go from Alice to Bob
Communication cost = max_{x,y} E_coins[|m|]
The δ-error randomized 1-way communication complexity R_δ(f) is the cost of the optimal protocol computing f with probability ≥ 1 − δ
Ok, but how do we lower bound R_δ(f)?

Shatter Coefficients [KNR]
F = {f : X → {0,1}} a function family; each f ∈ F is a length-|X| bitstring
For S ⊆ X, the shatter coefficient SC(f_S) of S is |{f|_S : f ∈ F}| = # of distinct bitstrings when F is restricted to S
SC(F, p) = max_{S ⊆ X, |S| = p} SC(f_S); if SC(f_S) = 2^|S|, S is shattered
Treat f: X × Y → {0,1} as the function family f_X = {f_x: Y → {0,1} | x ∈ X}, where f_x(y) = f(x,y)
Theorem [BJKS]: for every f: X × Y → {0,1} and every integer p, R_{1/3}(f) = Ω(log SC(f_X, p))

Warmup: Ω(1/ε) Lower Bound [Bar-Yossef]
Alice's input: x ∈_R {0,1}^m with wt(x) = m/2
Bob's input: y ∈_R {0,1}^m with wt(y) = εm
s(x), s(y): any streams with characteristic
vectors x, y
PROMISE: either (1) wt(x ∧ y) = 0, so f(x,y) = 0 and F0(s(x) ∘ s(y)) = m/2 + εm, or (2) wt(x ∧ y) = εm, so f(x,y) = 1 and F0(s(x) ∘ s(y)) = m/2
R_{1/3}(f) = Ω(1/ε) [Bar-Yossef] (uses shatter coefficients)
(1 + ε′)·m/2 < (1 − ε′)(m/2 + εm) for ε′ = Θ(ε)
Hence an F0 algorithm can decide f, so it must use Ω(1/ε) space
Too easy! Can replace the F0 algorithm with a sampler!

Our Reduction: Hamming Distance Decision Problem (HDDP)
Set t = Θ(1/ε^2); Alice has x ∈ {0,1}^t, Bob has y ∈ {0,1}^t
Promise problem: either Δ(x,y) ≤ t/2 − Θ(t^(1/2)), so f(x,y) = 0, or Δ(x,y) > t/2, so f(x,y) = 1
Lower bound R_{1/3}(f) via SC(f_X, t), but we need a lemma

Main Lemma
∃ S ⊆ {0,1}^n with |S| = n s.t. there exist 2^(Ω(n)) "good" sets T ⊆ S, i.e., sets T for which ∃ y ∈ {0,1}^n s.t.
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2

Lemma Resolves HDDP Complexity
Theorem: R_{1/3}(f) = Ω(t) = Ω(ε^(−2))
Proof: Apply the main lemma with n = t. Alice gets y_T for a random good set T; Bob gets a random s ∈ S
Let f: {y_T}_T × S → {0,1}. Main lemma ⇒ SC(f) = 2^(Ω(t)); [BJKS] ⇒ R_{1/3}(f) = Ω(t) = Ω(ε^(−2))
Corollary: Ω(1/ε^2) space for any randomized 2-party protocol approximating Δ(x,y) between the inputs
First known lower bound in terms of ε!

Back to Frequency Moments
Use an ε-approximator for Fk to solve HDDP
Alice holds y ∈ {0,1}^t, Bob holds s ∈ S ⊆ {0,1}^t
The i-th universe element is included exactly once in the stream a_y iff y_i = 1 (and likewise for a_s)
Alice runs the Fk algorithm on a_y and sends its state to Bob

Solving HDDP with Fk
Alice/Bob compute an ε-approximation to Fk(a_y ∘ a_s)
Fk(a_y ∘ a_s) = 2^k · wt(y ∧ s) + 1^k · Δ(y, s)
For k ≠ 1, Alice also transmits wt(y) in log m space
Conclusion: ε-approximating Fk(a_y ∘ a_s) decides HDDP, so the space for Fk is Ω(t) = Ω(ε^(−2))

Back to the Main Lemma
Recall: show ∃ S ⊆ {0,1}^n with |S| = n s.t. there are 2^(Ω(n)) "good" sets T ⊆ S, i.e., sets T for which ∃ y ∈ {0,1}^n s.t.
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2

Probabilistic Method
Choose n random elements of {0,1}^n for S
Show that an arbitrary T ⊆ S of size n/2 is good with probability > 2^(−zn) for a constant z < 1.
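The reduction above hinges on the identity Fk(a_y ∘ a_s) = 2^k · wt(y ∧ s) + 1^k · Δ(y, s): an element contributes 2^k when it appears in both streams and 1 when it appears in exactly one. A minimal Python check of this identity, where the bit vectors and the values of k are arbitrary illustrative choices:

```python
from collections import Counter

def f_k(stream, k):
    """k-th frequency moment: sum over items of (frequency of item)^k."""
    return sum(c ** k for c in Counter(stream).values())

def stream_of(bits):
    """Stream containing universe element i exactly once iff bits[i] == 1."""
    return [i for i, b in enumerate(bits) if b]

y = [1, 0, 1, 1, 0, 1]
s = [1, 1, 0, 1, 0, 0]

wt_and = sum(a & b for a, b in zip(y, s))    # wt(y AND s): shared elements
hamming = sum(a != b for a, b in zip(y, s))  # Hamming distance D(y, s)

for k in [0, 2, 3]:
    direct = f_k(stream_of(y) + stream_of(s), k)   # Fk of concatenated streams
    predicted = 2 ** k * wt_and + 1 ** k * hamming
    assert direct == predicted
```

For k = 0 the identity reads wt(y ∧ s) + Δ(y, s) = F0, the number of distinct elements in the combined stream, which matches the warmup reduction.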
Expected # of good T is then 2^(Ω(n))
So there exists an S with 2^(Ω(n)) good T

Proving the Main Lemma
Let T = {t1, …, t_{n/2}} ⊆ S be arbitrary, and let y be the majority codeword of T
What is the probability p that both:
1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some constant c > 0
2. ∀ t ∈ S − T, Δ(y, t) > n/2
Put x = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)]
Put y = Pr[∀ t ∈ S − T, Δ(y, t) > n/2] = 2^(−n/2)
Independence ⇒ p = xy = x · 2^(−n/2)

The Matrix Problem
Wlog, assume y = 1^n (recall y is the majority word)
Want to lower bound Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)]
Equivalent matrix problem: for a random n/2 × n binary matrix M (rows t1, …, t_{n/2}) in which each column has majority 1, what is the probability that each row has ≥ n/2 + cn^(1/2) 1s?

A First Attempt
A set family A ⊆ 2^({0,1}^n) is monotone increasing if S1 ∈ A and S1 ⊆ S2 imply S2 ∈ A
For the uniform distribution on S ⊆ {0,1}^n and monotone increasing families A, B: Pr[A ∩ B] ≥ Pr[A] · Pr[B] [Kleitman]
First try: let R be the event that M has ≥ n/2 + cn^(1/2) 1s in each row, and C the event that each column of M has majority 1
Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)] = Pr[R | C] = Pr[R ∩ C]/Pr[C]
Viewing M as the characteristic vector of a subset of [n^2/2], R and C are monotone increasing, so Pr[R ∩ C]/Pr[C] ≥ Pr[R]·Pr[C]/Pr[C] = Pr[R] < 2^(−n/2)
But we need > 2^(−zn/2) for a constant z < 1, so this fails…

A Second Attempt
Second try:
R1: M has ≥ n/2 + cn^(1/2) 1s in each of the first m rows
R2: M has ≥ n/2 + cn^(1/2) 1s in each of the remaining n/2 − m rows
C: each column of M has majority 1
Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)] = Pr[R1 ∩ R2 | C] = Pr[R1 ∩ R2 ∩ C]/Pr[C]
R1, R2, C monotone increasing ⇒ Pr[R1 ∩ R2 ∩ C]/Pr[C] ≥ Pr[R1 ∩ C]·Pr[R2]/Pr[C] = Pr[R1 | C] · Pr[R2]
Want this to be at least 2^(−zn/2) for z < 1
Pr[Σi Xi > n/2 + cn^(1/2)] > 1/2 − c(2/π)^(1/2) [Stirling]
Independence ⇒ Pr[R2] > (1/2 − c(2/π)^(1/2))^(n/2 − m)
It remains to show Pr[R1 | C] is large.
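The correlation inequality doing the work above (Kleitman's lemma, a special case of the Harris/FKG inequality) can be verified exhaustively on a tiny instance of the matrix problem. The sketch below enumerates all 2 × 4 binary matrices, viewed as subsets of an 8-element ground set, and checks Pr[R ∩ C] ≥ Pr[R] · Pr[C] for two monotone increasing events; the dimensions and thresholds are arbitrary small stand-ins for the events R and C in the slides:

```python
from itertools import product
from fractions import Fraction

ROWS, COLS = 2, 4

def rows_heavy(m):
    """Monotone increasing event R: every row has at least 3 ones."""
    return all(sum(row) >= 3 for row in m)

def cols_hit(m):
    """Monotone increasing event C: every column has at least 1 one."""
    return all(any(m[r][c] for r in range(ROWS)) for c in range(COLS))

# Enumerate all 2^(ROWS*COLS) binary matrices under the uniform distribution.
matrices = [
    tuple(tuple(bits[r * COLS:(r + 1) * COLS]) for r in range(ROWS))
    for bits in product([0, 1], repeat=ROWS * COLS)
]
total = Fraction(len(matrices))

pr_R = Fraction(sum(rows_heavy(m) for m in matrices)) / total
pr_C = Fraction(sum(cols_hit(m) for m in matrices)) / total
pr_RC = Fraction(sum(rows_heavy(m) and cols_hit(m) for m in matrices)) / total

# Kleitman: monotone increasing events are positively correlated.
assert pr_RC >= pr_R * pr_C
```

Adding more matrices to either family can only help the other event, which is the intuition behind positive correlation of increasing events.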
Computing Pr[R1 | C]
Pr[R1 | C] = Pr[M has ≥ n/2 + cn^(1/2) 1s in each of the 1st m rows | C]
Show Pr[R1 | C] > 2^(−z′m) for a certain constant z′ < 1
Ingredients:
- Given C, expect n/2 + Θ(n^(1/2)) 1s in each of the 1st m rows
- Use negative correlation of the entries within a given row to show a given row has n/2 + Θ(n^(1/2)) 1s with good probability, for small enough c
- A simple worst-case conditioning argument on these 1st m rows shows they all have ≥ n/2 + cn^(1/2) 1s

Completing the Proof
Recall: what is the probability p = xy, where
1. x = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)]
2. y = Pr[∀ t ∈ S − T, Δ(y, t) > n/2] = 2^(−n/2)
3. R1: M has ≥ n/2 + cn^(1/2) 1s in each of the first m rows
4. R2: M has ≥ n/2 + cn^(1/2) 1s in each of the remaining n/2 − m rows
5. C: each column of M has majority 1
x ≥ Pr[R1 | C] · Pr[R2] ≥ 2^(−z′m) · (1/2 − c(2/π)^(1/2))^(n/2 − m)
Analysis shows z′ is small, so this is ≥ 2^(−z″n/2) for some z″ < 1
Hence p = xy ≥ 2^(−(z″+1)n/2)
Hence the expected # of good sets is 2^(n − O(log n)) · p = 2^(Ω(n)), since there are (n choose n/2) = 2^(n − O(log n)) candidate sets T and (z″ + 1)/2 < 1
So there exists an S with 2^(Ω(n)) good T

Bipartite Graphs
The matrix problem corresponds to a bipartite graph counting problem:
How many bipartite graphs on n/2 by n vertices are there s.t. each left vertex has degree > n/2 + cn^(1/2) and each right vertex has degree > n/2?

Our Result on # of Bipartite Graphs
The argument shows there are at least 2^(n^2/2 − zn/2 − n) such bipartite graphs for a constant z < 1
The main lemma shows the # of bipartite graphs on n + n vertices with each vertex of degree > n/2 is > 2^(n^2 − zn − n)
Can replace > with <
Previously known count: 2^(n^2 − 2n) [MW – personal comm.]; follows easily from the Kleitman inequality

Summary
Results:
- Optimal Fk lower bound: ∀ k ≠ 1 and any ε = Ω(m^(−1/2)), any ε-approximator for Fk must use Ω(ε^(−2)) bits of space
- Communication lower bound of Ω(ε^(−2)) for the one-way communication complexity of ε-approximating Δ(x, y)
- Bipartite graph count: the # of bipartite graphs on n + n vertices with each vertex of degree > n/2 is at least 2^(n^2 − zn − n) for a constant z < 1
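For intuition about the counting problem in the summary, the degree-constrained bipartite graphs can be enumerated exhaustively when n is tiny. The sketch below counts bipartite graphs on n + n vertices (as n × n biadjacency matrices) in which every vertex has degree > n/2; the asymptotic bounds 2^(n^2 − zn − n) and 2^(n^2 − 2n) only bite for large n, so this illustrates the counting problem rather than verifying the bounds:

```python
from itertools import product

def count_heavy_bipartite(n):
    """Count n x n binary biadjacency matrices in which every row and
    every column sums to more than n/2, i.e., every left and right
    vertex of the bipartite graph has degree > n/2."""
    count = 0
    for bits in product([0, 1], repeat=n * n):
        m = [bits[i * n:(i + 1) * n] for i in range(n)]
        if all(sum(row) > n / 2 for row in m) and \
           all(sum(m[i][j] for i in range(n)) > n / 2 for j in range(n)):
            count += 1
    return count

print(count_heavy_bipartite(3))
```

Brute force is exponential in n^2 entries, so this is only feasible for n up to about 4; the point of the main lemma is precisely to count such graphs when enumeration is hopeless.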