
The Data Stream Space Complexity
of Cascaded Norms
T.S. Jayram
David Woodruff
IBM Almaden
Data streams

• Algorithms access data in a sequential fashion
• One pass / small space
• Need to be randomized and approximate [FM, MP, AMS]

[Figure: an algorithm with small main memory reading the stream 2 3 4 16 0 100 5 4 501 200 401 2 3 6 0]
Frequency Moments and Norms
• Stream defines updates to a set of items 1, 2, …, d
• f_i = weight of item i (positive-only vs. turnstile model)
• k-th Frequency Moment: F_k = Σ_i |f_i|^k
• p-th Norm: L_p = ‖f‖_p = (Σ_i |f_i|^p)^{1/p}
  – Maximum frequency: p = ∞
  – Distinct Elements: p = 0
  – Heavy hitters
• Assume the length of the stream and the magnitude of updates are ≤ poly(d)
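To pin down the notation, here is a small offline Python sketch (mine, not the talk's; all names are illustrative) that builds the frequency vector of a stream and evaluates F_k and L_p, including the p = 0 and p = ∞ special cases:

from collections import Counter

def frequency_vector(stream):
    """f_i = weight of item i; here each occurrence adds weight 1."""
    return Counter(stream)

def F(f, k):
    """k-th frequency moment: F_k = sum_i |f_i|^k."""
    return sum(abs(v) ** k for v in f.values())

def L(f, p):
    """p-th norm: L_p = (sum_i |f_i|^p)^(1/p).
    p = 0 counts distinct elements; p = inf is the maximum frequency."""
    if p == 0:
        return sum(1 for v in f.values() if v != 0)
    if p == float("inf"):
        return max(abs(v) for v in f.values())
    return F(f, p) ** (1.0 / p)

f = frequency_vector([2, 3, 4, 16, 0, 100, 5, 4, 501, 200, 401, 2, 3, 6, 0])
print(F(f, 2), L(f, 0), L(f, float("inf")))  # F_2, distinct elements, max frequency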
Classical Results
• Approximating L_p and F_p is the same problem
• For 0 ≤ p ≤ 2, F_p is approximable in Õ(1) space (AMS, FM, Indyk, …)
• For p > 2, F_p is approximable in Õ(d^{1−2/p}) space (IW), and this is best-possible (BJKS, CKS)
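For flavor, here is a minimal sketch of the AMS idea behind the Õ(1)-space bound for F_2. It is simplified — fully random ±1 signs kept in a dictionary instead of the 4-wise independent hash the real algorithm uses, so the memory saving is not literal — and every name is illustrative:

import random
from collections import Counter

def ams_f2(stream, reps=200):
    """Average of z^2 over independent repetitions, where z = sum_i sign(i)*f_i.
    E[z^2] = F_2, so the average concentrates around F_2."""
    total = 0.0
    for _ in range(reps):
        sign = {}   # stand-in for a 4-wise independent +/-1 hash function
        z = 0
        for item in stream:
            z += sign.setdefault(item, random.choice((-1, 1)))
        total += z * z
    return total / reps

stream = [1, 2, 1, 3, 2, 1]
exact = sum(v * v for v in Counter(stream).values())  # F_2 = 9 + 4 + 1 = 14
print(ams_f2(stream), exact)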
Cascaded Aggregates

• Stream defines updates to pairs of items in {1, 2, …, n} × {1, 2, …, d}
• f_ij = weight of item (i, j)
• Two aggregates P and Q: apply Q to each row of the frequency matrix, then P to the resulting vector of row values

    ( f_11  f_12  …  f_1d )          ( Q(Row 1) )
    ( f_21  f_22  …  f_2d )   —Q→    ( Q(Row 2) )   —P→   P ∘ Q
    (  ⋮     ⋮    ⋱   ⋮   )          (    ⋮     )
    ( f_n1  f_n2  …  f_nd )          ( Q(Row n) )

P ∘ Q = cascaded aggregate
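Offline, the definition is a two-line computation. This illustrative Python sketch (mine, not from the talk) applies an inner aggregate Q to every row and an outer aggregate P to the resulting vector:

def cascade(P, Q, M):
    """P o Q (M) = P(Q(Row 1), ..., Q(Row n)) for M given as a list of rows."""
    return P([Q(row) for row in M])

# Example: F_2 o F_0 -- count distinct nonzero entries per row, then sum the squares.
F0 = lambda row: sum(1 for x in row if x != 0)
F2 = lambda vec: sum(v * v for v in vec)

M = [[1, 0, 3],
     [0, 0, 2],
     [5, 1, 1]]
print(cascade(F2, F0, M))  # row F_0 values are 2, 1, 3, so the answer is 4 + 1 + 9 = 14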
Motivation
• Multigraph streams for analyzing IP traffic [Cormode-Muthukrishnan]
  – Corresponds to P ∘ F0 for different P's
  – F0 returns the number of destinations accessed by each source
• They also introduced the more general problem of estimating P ∘ Q
• Computing complex join estimates
• Product metrics [Andoni-Indyk-Krauthgamer]
• Stock volatility, computational geometry, operator norms
The Picture
[Figure: "Estimating Lk ∘ Lp" — a map of the (p, k) plane showing known space bounds, with the diagonal k = p marked, Θ(1) regions for small k and p, and a "?" over the open region k < p]

• We give the Ω(n^{1−1/k}) lower bound for Lk ∘ L0 and Lk ∘ L1
• Õ(n^{1−1/k}) for Lk ∘ Lp for any p ∈ {0, 1} in the turnstile model [MW]; for L2 ∘ L0 without deletions [CM]; [Ganguly] (without deletions) via the techniques of [ADIW]
• Our upper bound: a 1-pass Õ(n^{1−2/k} d^{1−2/p}) space algorithm when k ≥ p
• We also provide a matching lower bound based on multiparty disjointness
Our Problem: Fk ∘ Fp

        ( f_11  f_12  …  f_1d )
    M = ( f_21  f_22  …  f_2d )
        (  ⋮     ⋮    ⋱   ⋮   )
        ( f_n1  f_n2  …  f_nd )

Fk ∘ Fp(M) = Σ_i (Σ_j |f_ij|^p)^k = Σ_i Fp(Row i)^k
High Level Ideas: Fk ∘ Fp
1. We want the Fk-value of the vector (Fp(Row 1), …, Fp(Row n))
2. We try to sample a row i with probability ∝ Fp(Row i)
3. Spend an extra pass to compute Fp(Row i)
4. Could then output Fp(M) · Fp(Row i)^{k−1}; this is unbiased, since its expectation is Σ_i [Fp(Row i)/Fp(M)] · Fp(M) · Fp(Row i)^{k−1} = Fk ∘ Fp(M)
(can be seen as a generalization of [AMS])
[Speech bubble: How do we do the sampling efficiently??]
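The sketch below (illustrative and offline: it samples exactly rather than via the streaming histogram) simulates steps 2–4 and checks that the estimator's mean matches Fk ∘ Fp(M):

import random

def Fp(row, p):
    return sum(abs(x) ** p for x in row)

def one_estimate(M, k, p):
    """Sample row i w.p. Fp(Row i)/Fp(M), output Fp(M) * Fp(Row i)^(k-1)."""
    w = [Fp(row, p) for row in M]
    FpM = sum(w)
    wi = random.choices(w, weights=w)[0]
    return FpM * wi ** (k - 1)

M = [[1, 0, 3], [0, 2, 2], [5, 1, 1]]
k, p = 3, 2
exact = sum(Fp(row, p) ** k for row in M)          # Fk o Fp(M)
mean = sum(one_estimate(M, k, p) for _ in range(100000)) / 100000
print(exact, round(mean))                           # the mean is close to the exact value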
Review – Estimating Fp [IW]
• Level sets: S_t = { i : (1+ε)^t ≤ |f_i| ≤ (1+ε)^{t+1} }
• Level t is good if |S_t|·(1+ε)^{2t} ≥ F_2/B
• Items from such level sets are also called good
• ε-Histogram [IW]: finds approximate sizes s'_t of the level sets
  – For all S_t: s'_t ≤ (1+ε)|S_t|
  – For good S_t: s'_t ≥ (1−ε)|S_t|
• Also provides Õ(1) random samples from each good S_t
• Space: Õ(B)
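Purely for intuition, the level-set bookkeeping looks like this offline, with exact counts standing in for the histogram's estimates s'_t (names are mine, not [IW]'s):

import math
from collections import defaultdict

def level_sets(f, eps):
    """Group items by level: i goes to S_t when (1+eps)^t <= |f_i| < (1+eps)^(t+1)."""
    S = defaultdict(set)
    for i, v in f.items():
        if v != 0:
            S[int(math.log(abs(v), 1 + eps))].add(i)
    return S

def good_levels(S, eps, B):
    """Level t is good if |S_t| * (1+eps)^(2t) >= F_2 / B."""
    F2 = sum(len(s) * (1 + eps) ** (2 * t) for t, s in S.items())  # F_2 up to (1+eps)^O(1)
    return {t for t, s in S.items() if len(s) * (1 + eps) ** (2 * t) >= F2 / B}

f = {1: 1, 2: 3, 3: 9, 4: 10, 5: 100}
S = level_sets(f, eps=0.1)
print(dict(S), good_levels(S, eps=0.1, B=4))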
Sampling Rows According to Fp value

• Treat the n × d matrix M as a vector and run ε-Histogram on M for a suitable B
• Obtain a (1±ε)-approximation s'_t to |S_t| for each good t
• To sample:
  – Choose a good t with probability s'_t(1+ε)^{pt}/F'_p(M), where F'_p(M) = Σ_{good t} s'_t(1+ε)^{pt}
  – Choose a random sample (i, j) from S_t
  – Let row i be the current sample
• Pr[row i] = Σ_t [s'_t(1+ε)^{pt}/F'_p(M)] · [|S_t ∩ row i| / |S_t|] ≈ Fp(row i)/Fp(M)
• Fk ∘ Fp(M') ≥ (1−ε) Fk ∘ Fp(M), where M' is M restricted to good items (Hölder's inequality)

Problems:
1. The high-level algorithm requires many samples (up to n^{1−1/k}) from the S_t, but [IW] gives just Õ(1); we can't afford to repeat in low space
2. The algorithm may misclassify a pair (i, j) into S_t when it is in S_{t−1}
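As a sanity check of the sampling rule (again offline, with exact level-set sizes in place of the histogram's estimates s'_t; all names are mine), the two steps look like:

import random

def sample_row(S_good, eps, p):
    """Pick a level t w.p. proportional to |S_t|*(1+eps)^(p*t),
    then a uniform pair (i, j) in S_t; report its row i."""
    levels = list(S_good)
    weights = [len(S_good[t]) * (1 + eps) ** (p * t) for t in levels]
    t = random.choices(levels, weights=weights)[0]
    i, j = random.choice(sorted(S_good[t]))
    return i

# S_good maps each good level t to the set of matrix positions (i, j) it contains.
S_good = {0: {(0, 1), (2, 0)}, 3: {(0, 0), (1, 2), (2, 2)}}
print(sample_row(S_good, eps=0.1, p=2))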

High Level Ideas: Fk ∘ Fp
1. We want the Fk-value of the vector (Fp(Row 1), …, Fp(Row n))
2. We try to sample a row i with probability ∝ Fp(Row i)
3. Spend an extra pass to compute Fp(Row i)
4. Could then output Fp(M) · Fp(Row i)^{k−1}
(can be seen as a generalization of [AMS])
[Speech bubble: How do we avoid an extra pass??]
Avoiding an Extra Pass

• Now we can sample a Row i with probability ∝ Fp(Row i)
• We design a new Fk-algorithm to run on (Fp(Row 1), …, Fp(Row n)) which only receives row IDs i with probability ∝ Fp(Row i)
• For each j ∈ [log n], the algorithm does the following (sketched below):
  1. Choose a random subset of n/2^j rows
  2. Sample a row i from this set with Pr[Row i] ∝ Fp(Row i)
• We show that Õ(n^{1−1/k}) oracle samples are enough to estimate Fk up to 1±ε
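A minimal sketch of that subsampling loop (illustrative; it consumes a vector of oracle weights rather than a stream):

import random

def scale_samples(row_weights):
    """For each scale j = 0, 1, ..., restrict to a random subset of n/2^j rows and
    draw one row there with probability proportional to its weight."""
    n = len(row_weights)
    out = []
    j = 0
    while (n >> j) >= 1:
        subset = random.sample(range(n), n >> j)
        w = [row_weights[i] for i in subset]
        if sum(w) > 0:
            out.append((j, random.choices(subset, weights=w)[0]))
        j += 1
    return out

weights = [1, 16, 2, 3, 250, 4, 1, 9]  # stand-ins for Fp(Row 1), ..., Fp(Row n)
print(scale_samples(weights))           # one (scale, sampled row) pair per scale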
New Lower Bounds
Alice holds an n × d matrix A; Bob holds an n × d matrix B. (Δ denotes Hamming distance.)
NO instance: for all rows i, Δ(A_i, B_i) ≤ 1
YES instance: there is a unique row j for which Δ(A_j, B_j) = d, and for all i ≠ j, Δ(A_i, B_i) ≤ 1
We show that distinguishing these cases requires Ω(n/d) randomized communication
This implies that estimating Lk(L0) or Lk(L1) needs Ω(n^{1−1/k}) space
Information Complexity Paradigm

• [CSWY, BJKS]: the information cost IC is the amount of information the transcript reveals about the inputs
• For any function f, CC(f) ≥ IC(f)
• Using their direct sum theorem, it suffices to show an Ω(1/d) information cost lower bound for any protocol deciding whether Δ(x, y) = d or Δ(x, y) ≤ 1
• Caveat: the distribution is supported only on instances where Δ(x, y) ≤ 1
Working with Hellinger Distance

• Given the probability distribution vector π(x, y) over transcripts of an input (x, y), let ψ(x, y)_τ = π(x, y)_τ^{1/2} for all τ
• The information cost can be lower bounded by E_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖²
• Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP])
• Short diagonals property:
  E_{Δ(u,v)=1} ‖ψ(u, u) − ψ(u, v)‖² ≥ (1/d) · E_{Δ(u,v)=d} ‖ψ(u, u) − ψ(u, v)‖²

[Figure: a quadrilateral with sides a, b, c, d and diagonals e, f, illustrating a² + b² + c² + d² ≥ e² + f²]
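The short-diagonals inequality holds for any four points in Euclidean space (it follows from ‖a − b + c − d‖² ≥ 0); a throwaway Python check:

import random

def sq(P, Q):
    """Squared Euclidean distance between points P and Q."""
    return sum((x - y) ** 2 for x, y in zip(P, Q))

for _ in range(10000):
    A, B, C, D = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    sides = sq(A, B) + sq(B, C) + sq(C, D) + sq(D, A)  # a^2 + b^2 + c^2 + d^2
    diags = sq(A, C) + sq(B, D)                        # e^2 + f^2
    assert sides >= diags - 1e-9
print("a^2 + b^2 + c^2 + d^2 >= e^2 + f^2 held in every trial")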
Open Problems

• Lk ∘ Lp estimation for k < p
• Other cascaded aggregates, e.g. entropy
• Cascaded aggregates with 3 or more stages