Streaming Algorithms, etc.
Piotr Indyk
MIT

Data Streams
• A data stream is a (massive) sequence of data
  – Too large to store (on disk, in memory, cache, etc.)
• Examples:
  – Network traffic (source/destination)
  – Database transactions
  – Sensor networks
  – Satellite data feed
  – …
• Approaches:
  – Ignore it
  – Develop algorithms for dealing with such data

Example application: Monitoring Network Traffic
• Router routes packets (many packets)
  – Where do they come from?
  – Where do they go to?
• Ideally, would like to maintain a traffic matrix x[.,.]
  – For each (src,dst) packet, increment x_{src,dst}
  – Requires way too much space! (2^32 × 2^32 entries)
  – Need to maintain a compressed version of the matrix
[Figure: traffic matrix x, rows indexed by source, columns by destination]

This course
• Systematic introduction to the area
  – Emphasis on common themes
  – Connections between streaming, sketching, compressed sensing, communication complexity, …
  – Second of its kind (previous edition from Fall ’07: see my web page at MIT)
• Style: algorithmic/theoretical…
  – Background in linear algebra and probability assumed

Topics
• Streaming model. Estimating distinct elements (L0 norm)
• Estimating L2 norm (AMS), Johnson-Lindenstrauss
• Lp norm (p<2), other norms, entropy
• Heavy hitters: L1 norm, L2 norm, sparse approximations
• Sparse recovery via LP decoding
• Lower bounds: communication complexity, indexing, L2 norm
• Options: MST, bi-chromatic matching, insertions-only streams, Fourier sampling, …

Plan For This Lecture
• Introduce the data stream model(s)
• Basic algorithms
  – Estimating the number of distinct elements in a stream
  – Intro to frequency moments and norms

Basic Data Stream Model
• Single pass over the data: i1, i2, …, in
  – Typically, we assume n is known
• Bounded storage (typically n^α or log^c n)
  – Units of storage: bits, words or “elements” (e.g., points, nodes/edges)
• Fast processing time per element
• Randomness OK (in fact, almost always necessary)

8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 ...

Counting Distinct Elements
• Stream elements: numbers from {1…m}
• Goal: estimate the number of distinct elements DE in the stream
  – Up to a factor 1±ε
  – With probability 1-P
• Simpler goal: for a given T>0, provide an algorithm which, with probability 1-P:
  – Answers YES, if DE > (1+ε)T
  – Answers NO, if DE < (1-ε)T
• Run, in parallel, the algorithm with T = 1, 1+ε, (1+ε)^2, …, n
  – Total space multiplied by log_{1+ε} n ≈ log(n)/ε
  – Probability of failure multiplied by the same factor

Vector Interpretation
Stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5
[Figure: vector x with coordinates 1…9, one counter per possible element]
• Initially, x = 0
• Insertion of i is interpreted as x_i = x_i + 1
• Want to estimate DE(x) = ||x||_0

Estimating DE(x)
[Figure: vector x with coordinates 1…9; the coordinates belonging to a random set S are marked (here T=4)]
• Choose a random set S of coordinates
  – For each i, we have Pr[i∈S] = 1/T
• Maintain Sum_S(x) = Σ_{i∈S} x_i
• Estimation algorithm A:
  – YES, if Sum_S(x) > 0
  – NO, if Sum_S(x) = 0
• Analysis:
  – Pr = Pr[Sum_S(x)=0] = (1-1/T)^DE
  – For T “large enough”: (1-1/T)^DE ≈ e^{-DE/T}
  – Using calculus, for ε small enough:
    • If DE > (1+ε)T, then Pr ≈ e^{-(1+ε)} < 1/e - ε/3
    • If DE < (1-ε)T, then Pr ≈ e^{-(1-ε)} > 1/e + ε/3
[Plot: Pr[Sum_S(x)=0] = (1-1/T)^DE as a function of DE, crossing 1/e around DE = T]
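To make the test concrete, here is a minimal Python sketch of algorithm A. The class name ThresholdDE, the salted built-in hash, and the update/answer interface are illustration choices, not part of the lecture: the random set S is simulated by hashing, so Pr[i∈S] = 1/T holds (idealizing the hash as fully random) without ever storing S, and the only state kept is the single counter Sum_S(x).

```python
import random

class ThresholdDE:
    """Algorithm A: one randomized test of whether DE(x) exceeds a threshold T.

    The random set S = {i : h(i) = 1} is simulated with a salted hash, so
    Pr[i in S] = 1/T without ever storing S. The only state kept is the
    single counter Sum_S(x).
    """

    def __init__(self, T):
        self.T = max(1, int(T))              # threshold, used as hash modulus
        self.salt = random.getrandbits(64)   # fixes one random choice of S
        self.sum_s = 0                       # Sum_S(x) = sum_{i in S} x_i

    def _in_s(self, i):
        return hash((self.salt, i)) % self.T == 0   # Pr[i in S] = 1/T

    def update(self, i, delta=1):
        # Insertion of element i means x_i += delta; delta = -1 is a deletion,
        # which works because Sum_S(x) is linear in x (turnstile model).
        if self._in_s(i):
            self.sum_s += delta

    def answer(self):
        # YES (True)  <=> Sum_S(x) > 0, suggesting DE > (1+eps)T
        # NO  (False) <=> Sum_S(x) = 0, suggesting DE < (1-eps)T
        return self.sum_s > 0
```

A single instance errs with constant probability (the gap around 1/e is only ±ε/3), which is exactly what algorithm B amplifies next.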
Estimating DE(x) ctd.
• We have Algorithm A:
  – If DE > (1+ε)T, then Pr < 1/e - ε/3
  – If DE < (1-ε)T, then Pr > 1/e + ε/3
• Algorithm B:
  – Select sets S1 … Sk, k = O(log(1/P)/ε^2)
  – Let Z = number of Sum_{Sj}(x) that are equal to 0
  – By the Chernoff bound (defined in the interlude below), with probability > 1-P:
    • If DE > (1+ε)T, then Z < k/e
    • If DE < (1-ε)T, then Z > k/e
• Total space: O( log(n)/ε × log(1/P)/ε^2 ) numbers in range 0…n
  – Can remove the log(n)/ε factor
• Bibliographic note: [Flajolet-Martin’85]

Interlude – Chernoff bound
• Let Z1 … Zk be i.i.d. Bernoulli variables, with Pr[Zj=1] = p
• Let Z = ∑_j Zj
• For any 1 > ε > 0, we have
  Pr[ |E[Z]-Z| > εE[Z] ] ≤ 2 exp( -ε^2 E[Z]/3 )

Comments
• Implementing S:
  – Choose a hash function h: {1…m} -> {1…T}
  – Define S = {i: h(i)=1}
• Implementing h:
  – Pseudorandom generators. More later.
• Better algorithms known:
  – Theory: O( log(1/ε)/ε^2 + log n ) bits [Bar-Yossef-Jayram-Kumar-Sivakumar-Trevisan’02]
  – Practice: 128 bytes suffice for all works of Shakespeare, ε ≈ 10% [Durand-Flajolet’03]

More comments
• The algorithm uses “linear sketches” Sum_{Sj}(x) = Σ_{i∈Sj} x_i
• Can implement decrements x_i = x_i - 1
  – I.e., the stream can contain deletions of elements (as long as x ≥ 0)
  – Other names: dynamic model, turnstile model

More General Problem
• What other functions of a vector x can we maintain in small space?
• Lp norms: ||x||_p = ( ∑_i |x_i|^p )^{1/p}
  – We also have ||x||_∞ = max_i |x_i|
  – … and ||x||_0 = DE(x), since ||x||_p^p = ∑_i |x_i|^p → DE(x) as p → 0 (each nonzero |x_i|^p tends to 1)
• Alternatively: frequency moments Fp = p-th power of the Lp norm (exception: F0 = L0)
• How much space do you need to estimate ||x||_p (for const. ε)?
• Theorem:
  – For p∈[0,2]: polylog n space suffices
  – For p>2: n^{1-2/p} polylog n space suffices and is necessary
  [Alon-Matias-Szegedy’96, Feigenbaum-Kannan-Strauss-Viswanathan’99, Indyk’00, Coppersmith-Kumar’04, Ganguly’04, Bar-Yossef-Jayram-Kumar-Sivakumar’02,’03, Saks-Sun’03, Indyk-Woodruff’05]
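Putting the distinct-elements pieces of this lecture together, here is a hedged end-to-end sketch. It reuses the illustrative ThresholdDE class from the earlier sketch; the class name DistinctElements and the constant 3 inside k are my own choices — the lecture only fixes the asymptotics k = O(log(1/P)/ε^2). Algorithm B runs k independent copies of A and compares Z to k/e, and the outer grid T = 1, (1+ε), (1+ε)^2, …, n turns the threshold test into a 1±ε estimate.

```python
import math

class DistinctElements:
    """A sketch of the full estimator: algorithm B (k copies of algorithm A,
    compare Z to k/e) run over the grid T = 1, (1+eps), (1+eps)^2, ..., n.
    Reuses the illustrative ThresholdDE class from the earlier sketch."""

    def __init__(self, n, eps=0.1, fail_prob=0.05):
        # k = O(log(1/P)/eps^2); the constant 3 is an illustrative choice
        k = math.ceil(3 * math.log(1 / fail_prob) / eps**2)
        self.batches = []                 # one batch of k copies per threshold T
        T = 1.0
        while T <= n:
            self.batches.append((T, [ThresholdDE(T) for _ in range(k)]))
            T *= 1 + eps

    def update(self, i, delta=1):
        # Feed the element to every copy; the sketch is linear, so deletions
        # (delta = -1) are also supported as long as x stays nonnegative.
        for _, copies in self.batches:
            for c in copies:
                c.update(i, delta)

    def estimate(self):
        # For each T, algorithm B says YES iff Z < k/e, where Z counts the
        # copies with Sum_S(x) = 0. With high probability the YES answers
        # form a prefix of the grid; return the largest T that still says YES.
        best = 1.0
        for T, copies in self.batches:
            Z = sum(1 for c in copies if c.sum_s == 0)
            if Z < len(copies) / math.e:
                best = T
        return best
```

This naive version pays the log(n)/ε factor in both space and update time; the slides note it can be removed — one way is to reuse a single hash function whose value determines membership in all the thresholds' sets at once (nested sets), in the spirit of [Flajolet-Martin’85].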
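A quick, unoptimized usage check under the same illustrative assumptions. The coarse parameters (ε = 0.3, P = 0.2) are chosen only so the pure-Python demo finishes quickly; the ±ε/3 margin in the lecture's analysis assumes smaller ε.

```python
import random

random.seed(0)
stream = [random.randrange(1, 501) for _ in range(5000)]   # ~500 distinct values

de = DistinctElements(n=5000, eps=0.3, fail_prob=0.2)      # coarse, to keep the demo fast
for i in stream:
    de.update(i)

print(f"estimate ≈ {de.estimate():.0f}, true DE = {len(set(stream))}")
# Expected: with probability >= 1-P the estimate is within roughly a (1±eps) factor.
```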