  • This answer is what I was looking for. In my own current experience, which involves learning target probabilities, BCE is far more robust than KL; KL was essentially unusable. KL and BCE aren't "equivalent" loss functions. Commented Nov 29, 2019 at 16:31
  • When you said "the first part" and "the second part", which one was which? Commented May 30, 2020 at 20:27
  • 1
    $\begingroup$ @zewen's answer can be misleading as he claims that in mini-batch training, CE can be more robust than KL. In most of standard mini-batch training, we use gradient-based approach, and the gradient of $H(p)$ with respect to $q$ (which is a function of our model parameter) would be zero. So in these cases, CE and KL as a loss function are identical. $\endgroup$ Commented Sep 23, 2021 at 13:41
  • 1
    $\begingroup$ Are you sure the 1st formula is correct? Seems the p,d are ordered wrong. $\endgroup$ Commented Sep 28, 2022 at 3:29
  • 1
    $\begingroup$ I don't understand why the $H(p)$ constant makes the training less robust. The gradient should still be exactly the same, no? So is it just that your loss curve may look a bit more jiggly, but you training is still unchanged? $\endgroup$ Commented Dec 9, 2023 at 19:33
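The identity the commenters are debating, $H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$, and the claim that both losses have identical gradients when the target $p$ is fixed, can be checked numerically. A minimal sketch in NumPy; the distribution `p` and logits `z` are made-up illustrative values, not from the original post:

```python
import numpy as np

# Made-up fixed target distribution p and model logits z (illustrative values).
p = np.array([0.7, 0.2, 0.1])
z = np.array([0.5, 0.1, -0.3])
q = np.exp(z) / np.exp(z).sum()        # model distribution: softmax(z)

H_p = -(p * np.log(p)).sum()           # entropy of the target: constant w.r.t. z
CE  = -(p * np.log(q)).sum()           # cross-entropy H(p, q)
KL  = (p * np.log(p / q)).sum()        # KL(p || q)

# The identity H(p, q) = H(p) + KL(p || q):
assert np.isclose(CE, H_p + KL)

# Both losses have the same gradient w.r.t. the logits z, namely q - p,
# because H(p) does not depend on the model. Check by finite differences:
softmax = lambda v: np.exp(v) / np.exp(v).sum()
ce = lambda v: -(p * np.log(softmax(v))).sum()
kl = lambda v: (p * np.log(p / softmax(v))).sum()

def loss_grad(loss, eps=1e-6):
    # Central-difference numerical gradient at z.
    return np.array([
        (loss(z + eps * e) - loss(z - eps * e)) / (2 * eps)
        for e in np.eye(len(z))
    ])

assert np.allclose(loss_grad(ce), q - p, atol=1e-6)
assert np.allclose(loss_grad(kl), q - p, atol=1e-6)
```

Since the two losses differ only by the constant $H(p)$, their gradients (and hence the optimization trajectory) coincide; only the reported loss values are shifted, which is consistent with the last comment above.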