The Data Stream Space Complexity of Cascaded Norms
T.S. Jayram, David Woodruff (IBM Almaden)

Data Streams
- Algorithms access data in a sequential fashion
- One pass / small space
- Need to be randomized and approximate [FM, MP, AMS]
[Figure: a stream of numeric updates flowing past an algorithm that keeps only a small main memory]

Frequency Moments and Norms
- Stream defines updates to a set of items 1, 2, ..., d
- f_i = weight of item i
- Positive-only vs. turnstile model
- k-th Frequency Moment: F_k = Σ_i |f_i|^k
- p-th Norm: L_p = ‖f‖_p = (Σ_i |f_i|^p)^{1/p}
  - Maximum frequency: p = ∞
  - Distinct elements: p = 0
  - Heavy hitters
- Assume length of stream and magnitude of updates is ≤ poly(d)

Classical Results
- Approximating L_p and F_p is the same problem
- For 0 ≤ p ≤ 2, F_p is approximable in Õ(1) space (AMS, FM, Indyk, ...)
- For p > 2, F_p is approximable in Õ(d^{1-2/p}) space (IW); this is best possible (BJKS, CKS)

Cascaded Aggregates
- Stream defines updates to pairs of items in {1, 2, ..., n} × {1, 2, ..., d}
- f_ij = weight of item (i, j)
- Two aggregates P and Q: apply Q to each row of the n × d matrix M = (f_ij), then apply P to the resulting vector of row values:
  P ∘ Q (M) = P(Q(Row 1), Q(Row 2), ..., Q(Row n))
- P ∘ Q = cascaded aggregate

Motivation
- Multigraph traffic streams for analyzing IP traffic [Cormode-Muthukrishnan]
  - Corresponds to P ∘ F_0 for different P's
  - F_0 returns #destinations accessed by each source
  - Also introduced the more general problem of estimating P ∘ Q
- Computing complex join estimates
- Product metrics [Andoni-Indyk-Krauthgamer]
  - Stock volatility, computational geometry, operator norms

The Picture
[Figure: map of known bounds in the (p, k) plane (p from 0 to 2 on one axis, k on the other). Annotations: Θ(1); Õ(n^{1-2/k}); Õ(n^{1-1/k}) for L_k ∘ L_p with p ∈ {0, 1} in the turnstile model [MW]; we give the Ω(n^{1-1/k}) bound for L_k ∘ L_0 and L_k ∘ L_1; follows from techniques of [ADIW]; upper bound for L_2 ∘ L_0 without deletions [CM]; [Ganguly] (without deletions); n^{1-2/p}; "?" for the k < p region]
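As a toy illustration of the definition above (not the streaming algorithm itself), the cascaded aggregate F_k ∘ F_p can be computed exactly on a small in-memory matrix; the function name here is our own:

```python
def cascaded_moment(M, k, p):
    """Exact F_k o F_p of a matrix M: apply F_p to each row,
    then sum the k-th powers of the resulting row values."""
    row_values = [sum(abs(f) ** p for f in row) for row in M]  # F_p(Row i)
    return sum(v ** k for v in row_values)                     # F_k of that vector

M = [[1, 2],
     [3, 4]]
# F_2(Row 1) = 1 + 4 = 5, F_2(Row 2) = 9 + 16 = 25
# F_2 o F_2(M) = 5^2 + 25^2 = 650
print(cascaded_moment(M, k=2, p=2))  # -> 650
```

A streaming algorithm must approximate this quantity without ever storing the matrix; the point of the talk is how much space that requires.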
Our Results
- We give a 1-pass Õ(n^{1-2/k} d^{1-2/p}) space algorithm when k ≥ p
- We also provide a matching lower bound based on multiparty disjointness

Our Problem: Estimating F_k ∘ F_p
- M = n × d matrix of frequencies f_ij
- F_k ∘ F_p (M) = Σ_i (Σ_j |f_ij|^p)^k = Σ_i F_p(Row i)^k

High Level Ideas: F_k ∘ F_p
1. We want the F_k-value of the vector (F_p(Row 1), ..., F_p(Row n))
2. We try to sample a row i with probability ∝ F_p(Row i)
3. Spend an extra pass to compute F_p(Row i)
4. Could then output F_p(M) · F_p(Row i)^{k-1}
(can be seen as a generalization of [AMS])
How do we do the sampling efficiently??

Review – Estimating F_p [IW]
- Level sets: S_t = { i : (1+ε)^t ≤ |f_i| ≤ (1+ε)^{t+1} }
- Level t is good if |S_t|(1+ε)^{2t} ≥ F_2/B
  - Items from such level sets are also called good
- ε-Histogram [IW]: finds approximate sizes s'_t of the level sets S_t
  - For all S_t, s'_t ≤ (1+ε)|S_t|
  - For good S_t, s'_t ≥ (1-ε)|S_t|
  - Also provides Õ(1) random samples from each good S_t
  - Space: Õ(B)

Sampling Rows According to F_p Value
- Treat the n × d matrix M as a vector: run ε-Histogram on M for a certain B
- To sample:
  - Choose a good t with probability s'_t(1+ε)^{pt}/F_p'(M), where F_p'(M) = Σ_{good t} s'_t(1+ε)^{pt}
  - Choose a random sample (i, j) from S_t
  - Let row i be the current sample
- Pr[row i] = Σ_t [s'_t(1+ε)^{pt}/F_p'(M)] · [|S_t ∩ row i| / |S_t|]
- Obtain a (1±ε)-approximation to F_p(row i)/F_p(M)
- F_k ∘ F_p(M') ≥ (1-ε) F_k ∘ F_p(M), where M' is M restricted to good items (Hölder's inequality)

Problems
1. The high-level algorithm requires many samples (up to n^{1-1/k}) from the S_t, but [IW] gives just Õ(1); can't afford to repeat in low space
2. The algorithm may misclassify a pair (i, j) into S_t when it is in S_{t-1}

High Level Ideas: F_k ∘ F_p
1. We want the F_k-value of the vector (F_p(Row 1), ..., F_p(Row n))
2. We try to sample a row i with probability ∝ F_p(Row i)
3. Spend an extra pass to compute F_p(Row i)
4. Could then output F_p(M) · F_p(Row i)^{k-1}
(can be seen as a generalization of [AMS])
How do we avoid an extra pass??
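The row-sampling estimator in the high-level ideas above is unbiased: if row i is drawn with probability F_p(Row i)/F_p(M) and the output is F_p(M) · F_p(Row i)^{k-1}, the expectation telescopes to Σ_i F_p(Row i)^k = F_k ∘ F_p(M). A small sketch checking this exactly in memory (helper names are our own; the real algorithm only has approximate access to these quantities):

```python
def fp(row, p):
    """F_p of a single row."""
    return sum(abs(f) ** p for f in row)

def expected_estimate(M, k, p):
    """E[estimate] where row i is drawn with prob F_p(Row i)/F_p(M)
    and the output is F_p(M) * F_p(Row i)^(k-1)."""
    fp_M = sum(fp(row, p) for row in M)          # F_p of the whole matrix
    return sum((fp(row, p) / fp_M)               # Pr[row i]
               * fp_M * fp(row, p) ** (k - 1)    # estimator value on row i
               for row in M)

M = [[1, 2], [3, 4]]
# Direct computation: F_2(Row 1)^2 + F_2(Row 2)^2 = 5^2 + 25^2 = 650
print(expected_estimate(M, k=2, p=2))  # -> 650.0
```

Unbiasedness alone is not enough; the variance argument is what drives the n^{1-1/k} sample bound later in the talk.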
Avoiding an Extra Pass
- Now we can sample a row i ∝ F_p(Row i)
- We design a new F_k-algorithm to run on (F_p(Row 1), ..., F_p(Row n)) which only receives IDs i with probability ∝ F_p(Row i)
- For each j ∈ [log n], the algorithm does:
  1. Choose a random subset of n/2^j rows
  2. Sample a row i from this set with Pr[Row i] ∝ F_p(Row i)
- We show that Õ(n^{1-1/k}) oracle samples are enough to estimate F_k up to 1±ε

New Lower Bounds
- Alice holds an n × d matrix A; Bob holds an n × d matrix B
- NO instance: for all rows i, Δ(A_i, B_i) ≤ 1
- YES instance: there is a unique row j for which Δ(A_j, B_j) = d, and for all i ≠ j, Δ(A_i, B_i) ≤ 1
- We show distinguishing these cases requires Ω(n/d) randomized communication
- Implies estimating L_k(L_0) or L_k(L_1) needs Ω(n^{1-1/k}) space

Information Complexity
- Paradigm [CSWY, BJKS]: the information cost IC is the amount of information the transcript reveals about the inputs
- For any function f, CC(f) ≥ IC(f)
- Using their direct sum theorem, it suffices to show an Ω(1/d) information cost of a protocol for deciding if Δ(x, y) = d or Δ(x, y) ≤ 1
- Caveat: the distribution is only on instances where Δ(x, y) ≤ 1

Working with Hellinger Distance
- Given the probability distribution vector π(x, y) over transcripts of an input (x, y)
- Let ψ(x, y)_τ = π(x, y)_τ^{1/2} for all transcripts τ
- Information cost can be lower bounded by E_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖²
- Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP])
- Short diagonals property: E_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖² ≥ (1/d) E_{Δ(u,v)=d} ‖ψ(u, u) − ψ(u, v)‖²
[Figure: a quadrilateral with sides a, b, c, d and diagonals e, f: a² + b² + c² + d² ≥ e² + f²]

Open Problems
- L_k ∘ L_p estimation for k < p
- Other cascaded aggregates, e.g. entropy
- Cascaded aggregates with 3 or more stages
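The short-diagonals fact used above — for any four points, the sum of squared side lengths of the quadrilateral dominates the sum of squared diagonals — is easy to check numerically. This is a sanity-check sketch with our own helper names, not part of the paper's proof:

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def short_diagonals_holds(P, Q, R, S):
    """Sides PQ, QR, RS, SP vs. diagonals PR, QS:
    a^2 + b^2 + c^2 + d^2 >= e^2 + f^2 for any four points."""
    sides = sq_dist(P, Q) + sq_dist(Q, R) + sq_dist(R, S) + sq_dist(S, P)
    diags = sq_dist(P, R) + sq_dist(Q, S)
    return sides >= diags - 1e-9  # tolerance for floating-point error

random.seed(0)
pts = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
print(short_diagonals_holds(*pts))  # -> True
```

In the lower bound the four "points" are the vectors ψ(u, u), ψ(u, v) for nearby inputs, so the inequality transfers distance between far instances (Δ = d) down to the Δ = 1 instances the hard distribution is supported on.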