Normally, zero initialization is really frowned upon when training neural networks for classification. But today, while implementing softmax regression from scratch and playing around with it a bit, I came across a phenomenon where, apparently, when there is no hidden layer it is actually better to use zero initialization for the model's weights.
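For context, here is a minimal sketch of what I mean by softmax regression with the two initialization schemes. This is not my exact notebook code (that's linked below); the class name, the `zero_init` flag, and the random scale of 0.01 are just illustrative choices.

```python
import torch
import torch.nn as nn

# Softmax regression is just one linear layer with no hidden layer;
# the softmax itself is applied implicitly by the cross-entropy loss.
class SoftmaxRegression(nn.Module):
    def __init__(self, in_features, num_classes, zero_init=True):
        super().__init__()
        self.linear = nn.Linear(in_features, num_classes)
        if zero_init:
            # Zero initialization: every weight and bias starts at exactly 0.
            nn.init.zeros_(self.linear.weight)
            nn.init.zeros_(self.linear.bias)
        else:
            # Random initialization: small Gaussian weights
            # (one possible scheme; the exact scale is just illustrative).
            nn.init.normal_(self.linear.weight, mean=0.0, std=0.01)
            nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        # Flatten images into vectors and return raw logits.
        return self.linear(x.flatten(start_dim=1))
```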
For example, when I train the model on the MNIST dataset, here are the plots of training accuracy, training loss, validation accuracy, and validation loss for each weight initialization scheme I used.
As you can see, the metrics are much better when I use zero initialization. To make sure that this is not specific to MNIST, I also ran the same experiment on the CIFAR10 dataset. Here are the plots for CIFAR10.
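If it helps to see the structure of the experiment, here is roughly how each run was done. This is a simplified sketch, not the exact notebook code: `SoftmaxRegression` refers to the model sketched above, `run_experiment` is a hypothetical helper name, and the data loaders are assumed to be standard torchvision ones.

```python
import torch
import torch.nn as nn

def run_experiment(model, train_loader, val_loader, epochs=20, lr=0.1, device="cpu"):
    """Train one model and record the per-epoch metrics shown in the plots."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}

    for _ in range(epochs):
        # Training pass
        model.train()
        total, correct, loss_sum = 0, 0, 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            logits = model(x)
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()
            loss_sum += loss.item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)
        history["train_loss"].append(loss_sum / total)
        history["train_acc"].append(correct / total)

        # Validation pass
        model.eval()
        total, correct, loss_sum = 0, 0, 0.0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)
                loss_sum += criterion(logits, y).item() * y.size(0)
                correct += (logits.argmax(dim=1) == y).sum().item()
                total += y.size(0)
        history["val_loss"].append(loss_sum / total)
        history["val_acc"].append(correct / total)
    return history

# Example comparison for MNIST (784 inputs, 10 classes):
# zero_hist = run_experiment(SoftmaxRegression(784, 10, zero_init=True), train_loader, val_loader)
# rand_hist = run_experiment(SoftmaxRegression(784, 10, zero_init=False), train_loader, val_loader)
```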
As you can see from both datasets (more clearly in MNIST), the randomly initialized models have converged, yet they are still quite far behind the zero-initialized models. I would really appreciate it if anyone could explain this phenomenon, or share ideas on why it might happen.
If you want to reproduce the experiments, here is the notebook I created.

