
I want to know how the equation for binary cross entropy came about. My approach is the following:

Let's say we have two ground truths: $y_1$ and $y_2$. We also have two predictions $p_1$ and $p_2$. Now, $p_2$ can also be defined as $1 -p_1$ since we're dealing with a binary problem.

From this, how exactly do we arrive at this equation: $$−(y\log{p}+(1−y)\log{(1−p)})$$

And if we treat this as a loss function, why does it make sense to minimize it?

  • Hint: What's the log-likelihood of a Bernoulli probability model? Commented May 20, 2018 at 17:40

1 Answer


Suppose there's a random variable $Y$ where $Y \in \{0,1\}$ (for binary classification). If $p = P(Y = 1)$ is the predicted probability, the Bernoulli probability model gives the likelihood of an observed label $y$:

$$ L(p) = p^y (1-p)^{1-y} $$

$$ \log L(p) = y\log p + (1-y) \log (1-p) $$

It's often easier to work with derivatives when the objective is expressed in logs, and because the logarithm is monotonically increasing, the maximizer of the log-likelihood is the same as the maximizer of the likelihood. By convention, a cost or loss function is non-negative and grows as the model performs worse. Negating the log-likelihood gives exactly that: $-(y\log p + (1-y)\log(1-p))$ is zero for a perfect prediction and increases as the prediction deviates from the label. Minimizing this negated quantity is therefore equivalent to maximizing the log-likelihood.
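As a quick sanity check, here is a minimal sketch (plain Python, with made-up prediction values) showing that the negated Bernoulli log-likelihood matches the binary cross-entropy formula and behaves like a loss:

```python
import math

def binary_cross_entropy(y, p):
    """Negative Bernoulli log-likelihood: -(y*log(p) + (1-y)*log(1-p))."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Equivalent to negating the log of the likelihood L(p) = p^y * (1-p)^(1-y)
y, p = 1, 0.8
likelihood = p**y * (1 - p)**(1 - y)
assert abs(binary_cross_entropy(y, p) - (-math.log(likelihood))) < 1e-12

# A confident correct prediction yields a loss near 0 ...
print(binary_cross_entropy(1, 0.99))  # ~0.01
# ... while a confident wrong prediction is heavily penalized.
print(binary_cross_entropy(1, 0.01))  # ~4.61
```

Note that the loss blows up as $p \to 0$ with $y = 1$ (or $p \to 1$ with $y = 0$), which is why library implementations typically clamp $p$ away from 0 and 1 for numerical stability.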

  • So maximizing log(L(p)) is the same as minimizing cross entropy as I have defined it? Commented May 21, 2018 at 3:09
  • The above equation has a maximum at 0 and is negative for all other values. Thus in the ideal case (a perfect prediction), the value of log(L(p)) will be 0, its maximum. Conversely, the negative of log(L(p)) has its minimum at 0. Commented May 21, 2018 at 7:33
