4
$\begingroup$

I have been reading up on the Count Sketch algorithm, and I stumpled upon the Count Sketh algorithm explained in section 5 of https://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf. Then, I tried to solve some of the exercises. However, exercise 5-3 is causing me some problems.

If we let $\hat{f}_a$ denote the estimated frequency associated with a key $a$ and $f_a$ be actual frequency associated with key $a$, then it can be show that $E[\hat{f}_a]=f_a$. Furthermore, for $j\in[n]$, we let $\mathbf{f}_{-j}$ be the $(n-1)$ dimensional vector obtained by dropping the $i$th entry of frequency vector $\mathbf{f}.$ It can then be shown that $$ \text{Var}[\hat{f}_a]=\frac{||\mathbf{f}_{-a}||_2^2}{k} $$ when we use the hash function $h : [n] \rightarrow [k]$. Thus, using Chebyshev's inequality, it can be shown that $$ \text{Pr}[|\hat{f}_a - f_a| \geq \epsilon ||\mathbf{f}_{-a}||_2] \leq \frac{1}{3} $$ for $k=\frac{3}{\epsilon^2}$.

In exercise 5-3, $\textbf{f}_{-a}^{\text{res}(\ell)}$ denotes the $(n-1)$-dimensional vector obtained by setting the $\ell$ largest (by absolute value) entries to zero in $\mathbf{f}_{-a}$. The exercise then essentially asks the reader to show that $$ \text{Pr}[|\hat{f}_a - f_a| \geq \epsilon ||\mathbf{f}_{-a}^{\text{res}(\ell)}||_2] \leq \frac{1}{3} $$ where $k=\frac{6}{\epsilon^2}$ and $\ell=1/\epsilon^2$. However, I don't know how to show that the inequality above holds. I have tried using Chebyshev's inequality, but I still can't seem to make it work. Any help would be appreciated.

$\endgroup$

1 Answer 1

3
+50
$\begingroup$

$\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$

Denote the support of $f$ by $f_1\le f_2\le...\le f_s$. Given a frequency sequence $f$, denote by $E_{f,a}$ the random variable $\hat{f}_a$ where the stream frequencies are given by $f$ (the evaluation depends on the inner randomness of the sketch and the input stream).

The probability that the hash of one of the top $\ell=1/\epsilon^2$ elements collides with the hash of some given element $a$ is bounded by $\frac{\ell}{k}$ (union bound). Conditioning on the event that such a collision did not occur, the evaluation $\hat{f}_a$ does not depend on the top $\ell$ entries, as they do not affect the hash entry of $a$. More formally, conditioned on the event $C=\bigcap\limits_{i=0}^{\ell-1}\mathbb{1}_{h\left(f_{s-i}\right)\neq h(a)}$, $E_{f,a}$ and $E_{f^{res(\ell)},a}$ are equally distributed. This yields:

$$\Pr\left[\left|\hat{f}_a-f_a\right|\ge \epsilon\norm{f_{-a}^{res(\ell)}}\right]= \Pr[C]\cdot \Pr\left[\left|\hat{f}_a-f_a\right|\ge \epsilon\norm{f_{-a}^{res(\ell)}}\big| C\right]+\\ \left(1-\Pr[C]\right)\Pr\left[\left|\hat{f}_a-f_a\right|\ge \epsilon\norm{f_{-a}^{res(\ell)}} \big|\overline{C}\right]\le\\ \Pr\left[\left|\hat{f}_a-f_a\right|\ge \epsilon\norm{f_{-a}^{res(\ell)}}\big| C\right]+\frac{1}{6}$$

Since $E_{f,a}, E_{f^{res(\ell)},a}$ are equally distributed (conditioned on $C$), the left hand term equals $\Pr\left[\left|\hat{f}_a-f_a\right|\ge \epsilon\norm{f_{-a}}\right]$, which you can bound by $1/6$ with your original result.

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.