
Given a hypothesis $h:X\rightarrow Y$ returned by an Empirical Risk Minimization (ERM) strategy in the realizable case (i.e. $h$ is consistent with the sample examples), where $X=[0,1]\subseteq \mathbb{R}$, $Y=\{0,1\}$, and $D$ is the uniform distribution over $X$: how can I prove that for an accuracy parameter $0<\epsilon<1$ we might get $err_D(h)>\epsilon$? Here $err_D(h)$ is the true error of $h$ under the distribution $D$.

This is homework, and all I can do is use probabilities (as shown in the course slides and other sources) to give a bound on the error, namely $P(err_D(h)>\epsilon)\leq k(1-\epsilon)^m$, where $k=|H_B|$ is the number of bad hypotheses, $H_B=\{\bar{h}\in H : err_D(\bar{h})>\epsilon\}$, and $m$ is the sample size.
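To get a feel for what this bound gives, here is a rough numeric sketch (the class size $k$, accuracy $\epsilon$, and sample sizes $m$ below are arbitrary values chosen only for illustration; note that for small $m$ the bound exceeds $1$, i.e. it does not rule out $err_D(h)>\epsilon$ at all):

```python
# Numeric illustration of the PAC bound P(err_D(h) > eps) <= k * (1 - eps)^m
# for a finite hypothesis class. The values of k, eps, and m are arbitrary
# and chosen only to show how the bound behaves as the sample size grows.

def pac_bound(k: int, eps: float, m: int) -> float:
    """Upper bound on the probability that ERM returns an eps-bad hypothesis."""
    return k * (1 - eps) ** m

k, eps = 100, 0.1
for m in (10, 50, 100, 200):
    print(f"m = {m:4d}: bound = {pac_bound(k, eps, m):.6f}")
```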

My question is: is there any way other than using probabilities to prove this? It just seems odd to me that the question asks for a proof and the answer is a probability bound. I am new to the field of learning theory, and this might seem like a silly question, but I am really trying to understand the intuition behind using probabilities.

  • I can't understand the question. Can you explain it to non-experts? Commented Sep 29, 2013 at 7:54
  • @YuvalFilmus Sorry, I have edited the question. Here is a pointer: courses.cs.washington.edu/courses/cse546/12wi/slides/… Commented Sep 29, 2013 at 8:26
  • I also don't understand what you are asking. Are you trying to prove that you might get $err_D(h)>\epsilon$? Clearly there are examples where this happens (even if the probability for it is very small). Commented Sep 29, 2013 at 16:50
  • @Shaull Yes, my ultimate goal is to prove this. However, all the sources I have come across prove it with probabilities (the probability of such a thing happening). Commented Sep 29, 2013 at 17:59
  • @seteropere There are two different things here: one is the (probabilistic) upper bound on the error, which you present here. The second is a lower bound, i.e. showing that $err_D(h)>\epsilon$ can actually occur, or a stronger result stating that $Pr(err_D(h)>\epsilon)>t$ for some $t$. Which of the two are you trying to find a "non-probabilistic" proof for? Commented Sep 29, 2013 at 19:57

1 Answer


Consider a case where the actual label is $1$ for all $x\in [0,1]$. Suppose $H$ consists of two classifiers: $h$, which realizes this (i.e. $h(x)=1$ for all $x\in [0,1]$), and $g$, which is almost entirely wrong: $g(x)=0$ everywhere except on some finite set of points $S$, on which $g(x)=1$.

If the sample happens to be exactly $S$, then both classifiers have empirical error $0$ on the sample, so ERM may return $g$. However, since $S$ is finite, it has measure $0$ under the uniform distribution, so $err_D(g)=1>\epsilon$.

As mentioned in the comments, this doesn't really prove anything useful, since the probability that the sample will be exactly $S$ is $0$.
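To make the construction concrete, here is a small simulation sketch (the particular finite set $S$ and the size of the fresh evaluation sample are arbitrary choices; the "sample" is forced to be exactly $S$, which under the uniform distribution is a probability-$0$ event):

```python
import numpy as np

rng = np.random.default_rng(0)

# True labeling: every x in [0, 1] has label 1.
def true_label(x):
    return np.ones_like(x, dtype=int)

# An arbitrary finite set S (chosen here only for illustration).
S = np.array([0.1, 0.25, 0.5, 0.75, 0.9])

# h: always predicts 1 -- the consistent, zero-true-error hypothesis.
def h(x):
    return np.ones_like(x, dtype=int)

# g: predicts 0 everywhere except on the finite set S.
def g(x):
    return np.isin(x, S).astype(int)

# The pathological sample: exactly S (a measure-zero event under D).
sample = S
print("empirical error of h:", np.mean(h(sample) != true_label(sample)))  # 0.0
print("empirical error of g:", np.mean(g(sample) != true_label(sample)))  # 0.0

# Estimating err_D(g) on a fresh uniform sample shows it is essentially 1.
fresh = rng.uniform(0.0, 1.0, size=100_000)
print("estimated err_D(g):", np.mean(g(fresh) != true_label(fresh)))
```

Both hypotheses are consistent with the sample, so an ERM learner is free to return $g$, whose true error is $1$.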
