In Theorem D.5 of Goldreich's Computational Complexity: A Conceptual Perspective, Goldreich states:
Let $m \leq n$ be integers, $H_n^m$ be a family of pairwise independent hash functions, and $S \subseteq \{0,1\}^n$. Then, for every $y \in \{0,1\}^m$ and every $\epsilon > 0$, for all but at most an $\frac{2^m}{\epsilon |S|}$ fraction of $h \in H_n^m$ it holds that:
$$(1-\epsilon) \frac{|S|}{2^m} < |\{ x \in S \, : \, h(x) = y \}| < (1+\epsilon) \frac{|S|}{2^m}.$$
To prove this theorem, Goldreich notes that the expected size of this set is $|S| / 2^m$. However, I do not see how this works for all $y$. For example, suppose $x = 0 \in S$, and let $y = 0$. Then, by definition the hashed set will always contain $x$, so the expected value should go to $1$ as $m \rightarrow \infty$, not $|S| / 2^m \rightarrow 0$. More generally, any time I pick a $y$ such that $y = h(x)$ for some $x \in S$, the expected value will go to $1$, not $0$. Why does this lemma hold for all $y$ if specific values will yield different expected values?