0
$\begingroup$

I want to visualize or calculate the compression bounds for hypothesis classes. I learnt how to figure out the VC dimension. Let's say I define two hypothesis class. For example: $$ H_k = \{h ∈ (0,1)_X \mid |h^{−1}(1)| ≤ k\} $$

and $$ H_{decst} = \{h^i_a \mid a ∈ R, i ∈ (1,2)\}, \text{ where } h^i_a((x1,x2)) = 1[x_i ≤ a]. $$

For VC dimension, I showed The VC dimension of $H_{decst}$ is 2. We can shatter the two points $x_1 = (2,0)$ and $x_2 = (0, 2)$ with the four functions $h_{0.5}, h_{1.5}, h_{1.5}, h_{2.5}$.

Also, It is not hard to see that $H_k$ shatters any set $C ⊆ N$ of size $k$, but no larger set, since the all-1 labeling can not be realized on more than $k$ points with $H_k$. Thus, the VC-dimension of $H_k$ is $k$.

Similarly, I want to figure out the compression bounds for $H_k$ and $H_{decst}$.

$\endgroup$

1 Answer 1

1
$\begingroup$

Generally, every class of finite VC dimension admits to a (exponential size in the dimension) compression scheme. This was an open question resolved a few years ago by Shay and Amir in this paper. However, this is an overkill for your question, since $H_k$ obviously has a $k$-comprresion scheme (only keep the samples labeled $1$, of which there are at most $k$, and the reconstruction is zero everywhere else).

$H_{desct}$ can be compressed to two samples, given $\left((a_i,b_i),y_i\right)_{i=1}^n$, we know that either there exists $j$ such $\forall i\in[n]: y_i=\mathbb{1}_{a_i\le a_j}$ or there exists $j$ such that $\forall i\in[n] : y_i=\mathbb{1}_{b_i\le b_j}$. The compression keeps $(a_j,b_j)$ and one more sample $(a_{j'},b_{j'})$ to indicate the correct coordinate of the threshold, e.g. if the former condition holds and $\forall i\in[n]: y_i=\mathbb{1}_{a_i\le a_j}$ then pick $j'$ such that $y_{j'}\neq y_j$ (so $y_{j'}=0)$ and $b_{j'}<b_j$ (this allows us to reconstruct the function $\mathbb{1}_{x_1\le a_j}$ from the two samples). If no such $j'$ exists then $\mathbb{1}_{x_2\le b_j}$ is also a consistent hypothesis, and our compression can keep only one sample.

$\endgroup$
2
  • $\begingroup$ According to what you mean, any finite classes admits to a compression scheme. What about a finite set of natural numbers. Think of a hypothesis class that outputs 1 everytime there is an element present in A but 0 if its not present. I don't think this can be compressed. Can you elaborate on this ? $\endgroup$ Commented Dec 14, 2020 at 1:14
  • 1
    $\begingroup$ Do you mean $\mathcal{X}=2^{\mathbb{N}}$ and $\mathcal{H}=\{h_{x_0} | x_0\in\mathbb{N}\}$ where $h_{x_0}(A)=\mathbb{1}_{x_0\in A}$? In that case $VC_{\mathcal{H}}=\infty$ (Given $n\in\mathbb{N}$ construct an $n$-size shattered set as follows, select different $x_1,...,x_{2^n}\in\mathbb{N}$ and index the subsets of $[n]$ by $1,...,2^n$. Now simply generate sets $A_1,...,A_n$ such that if $I\subseteq[n]$ is the $i'th$ subset then $x_i\in A_j$ iff $j\in I$. You can show that this set is indeed shattered). Infinite VC dimension implies no constant size compression scheme. $\endgroup$ Commented Dec 14, 2020 at 7:22

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.