$\begingroup$

In Aurelien Geron's book, I found this passage:

This cost function makes sense because –log(t) grows very large when t approaches 0, so the cost will be large if the model estimates a probability close to 0 for a positive instance, and it will also be very large if the model estimates a probability close to 1 for a negative instance. On the other hand, –log(t) is close to 0 when t is close to 1, so the cost will be close to 0 if the estimated probability is close to 0 for a negative instance or close to 1 for a positive instance, which is precisely what we want.

What I don't get is: how will the cost be large if the model estimates a probability close to 0 for a positive instance, and also very large if the model estimates a probability close to 1 for a negative instance?

$\endgroup$

2 Answers

$\begingroup$

The cost function of Logistic Regression, derived via Maximum Likelihood Estimation, is:

$$ \text{cost}(h_{\theta}(x), y) = \begin{cases} -\log(h_{\theta}(x)) & \text{if } y = 1 \\ -\log(1 - h_{\theta}(x)) & \text{if } y = 0 \end{cases} $$
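Both cases come from a single expression, the negative log-likelihood of a Bernoulli label averaged over the training set. Here $m$ (the number of training instances) and the superscript $(i)$ indexing are notation introduced for this sketch:

$$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log h_{\theta}(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right) \right] $$

For a single instance, $y = 1$ zeroes the second term and leaves $-\log h_{\theta}(x)$, while $y = 0$ zeroes the first term and leaves $-\log(1-h_{\theta}(x))$.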

  • If y = 1 (positive): i) cost = 0 if the prediction is correct (i.e. $h_{\theta}(x) = 1$); ii) cost $\rightarrow \infty$ if $h_{\theta}(x)\rightarrow 0$.
  • If y = 0 (negative): i) cost = 0 if the prediction is correct (i.e. $h_{\theta}(x) = 0$); ii) cost $\rightarrow \infty$ if $(1-h_{\theta}(x))\rightarrow 0$, i.e. if $h_{\theta}(x)\rightarrow 1$.

The intuition is that larger mistakes (more confident wrong predictions) should get larger penalties. Further reading: 1, 2, 3, 4.
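A minimal Python sketch of the two bullet cases, assuming $h_{\theta}(x)$ is already a probability in [0, 1]; the function name `logistic_cost` is hypothetical. It makes explicit that the value fed into $-\log$ is the probability the model assigned to the instance's actual class, so a confidently wrong prediction is penalised heavily whichever class the instance belongs to:

```python
import math


def logistic_cost(h: float, y: int) -> float:
    """Per-instance cost: -log(h) for a positive (y = 1), -log(1 - h) for a negative (y = 0)."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be a probability in [0, 1]")
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    # Probability the model assigned to the instance's actual class
    # (this plays the role of t in -log(t)).
    p_true = h if y == 1 else 1.0 - h
    # -log(0) is unbounded, so return infinity instead of raising a math domain error.
    return math.inf if p_true == 0.0 else -math.log(p_true)


if __name__ == "__main__":
    # Predicted probabilities h_theta(x), from near 0 to near 1.
    for h in (0.001, 0.1, 0.5, 0.9, 0.999):
        print(f"h = {h:<5}  cost(y=1) = {logistic_cost(h, 1):.4f}  cost(y=0) = {logistic_cost(h, 0):.4f}")
```

Returning `math.inf` at the boundary mirrors the limits in the bullets; a common practical alternative is to clip predicted probabilities slightly away from 0 and 1 before taking the log.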

$\endgroup$
    $\begingroup$ Yes, I get it. I just couldn't digest how the cost function penalises the model whenever the predicted and actual probabilities differ. $\endgroup$ Commented Nov 10, 2018 at 6:54
$\begingroup$

Not trying to oversimplify the answer, but if you compute these values by hand with a calculator, you can see this in action:

If t is close to 1, let's just say 0.9999 for the example, then (using the natural logarithm): $$ -\log(t) = -\log(0.9999) = 0.000100005 $$

conversely,

If t is close to 0, let's just say 0.0001 for the example, then: $$ -\log(t) = -\log(0.0001) = 9.21034 $$

So if the probability is high, the cost function returns a small number, but if the probability is low, it returns a (relatively) large number.
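To reproduce the calculator figures, here is a short Python sketch. The figures match the natural logarithm (a base-10 log would give 4 for $t = 0.0001$). The last computation is an added illustration, assuming the book's t stands for the probability assigned to the instance's actual class, which ties back to the negative-instance case in the question:

```python
import math

# math.log is the natural logarithm (base e), which reproduces the figures above.
small_cost = -math.log(0.9999)  # t close to 1
large_cost = -math.log(0.0001)  # t close to 0
print(f"-log(0.9999) = {small_cost:.6g}")  # prints -log(0.9999) = 0.000100005
print(f"-log(0.0001) = {large_cost:.6g}")  # prints -log(0.0001) = 9.21034

# A negative instance predicted at 0.9999 has t = 1 - 0.9999 (the probability of
# its actual class), so it lands on the large cost, not the small one.
negative_cost = -math.log(1 - 0.9999)
print(f"-log(1 - 0.9999) = {negative_cost:.6g}")
```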

Perhaps I missed the point of your question, in which case, I apologize.

$\endgroup$
