
Normally, zero initialization is strongly discouraged when training a neural network for classification. But today, while implementing softmax regression from scratch and playing around with it a bit, I came across a phenomenon where, apparently, when there is no hidden layer, it is better to use zero initialization for the model's weights.
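
To be concrete, here is a minimal sketch of the kind of comparison I mean (not my actual notebook; the data below is a synthetic stand-in with MNIST-like shapes, and the learning rate and epoch count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, W, b, lr=0.1, epochs=100):
    """Full-batch gradient descent on softmax regression (no hidden layer)."""
    n = X.shape[0]
    for _ in range(epochs):
        P = softmax(X @ W + b)           # predicted class probabilities, shape (n, k)
        W -= lr * X.T @ (P - Y) / n      # gradient of mean cross-entropy w.r.t. W
        b -= lr * (P - Y).mean(axis=0)   # gradient w.r.t. the bias
    return W, b

# Synthetic stand-in data with MNIST-like shapes (the real experiment uses MNIST/CIFAR10).
n, d, k = 1000, 784, 10
X = rng.normal(size=(n, d))
Y = np.eye(k)[rng.integers(0, k, size=n)]  # one-hot labels

# The two initialization schemes being compared.
W_zero, b_zero = train(X, Y, np.zeros((d, k)), np.zeros(k))                 # zero init
W_rand, b_rand = train(X, Y, 0.01 * rng.normal(size=(d, k)), np.zeros(k))   # small random init
```

The only difference between the two runs is the starting value of `W`; everything else, including the gradient updates, is identical.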

For example, when I train the model on the MNIST dataset, here are the plots of training accuracy, training loss, validation accuracy, and validation loss for each weight initialization scheme I used.

[MNIST plots: training/validation accuracy and loss per initialization scheme]

As you can see, the metrics are much better when I use zero initialization. To make sure this is not specific to MNIST, I also ran the experiment on the CIFAR10 dataset. Here are the plots for CIFAR10.

[CIFAR10 plots: training/validation accuracy and loss per initialization scheme]

As you can see from both datasets (more clearly in MNIST), the models initialized randomly have converged, yet they are still quite far from the model that was zero-initialized. I'd really appreciate it if anyone could explain this phenomenon, or share ideas on why it might happen.

If you want to reproduce the experiments, here is the notebook I created.

  • It works because with no hidden layer, the model parameters are identified. It's no different than multi-class logistic regression. en.m.wikipedia.org/wiki/Identifiability Commented Mar 25 at 19:53
  • More details: stats.stackexchange.com/questions/349418/… Commented Mar 25 at 20:08
  • I'm not sure that I understand your comment. How does identifiability explain why zero initialization is better than random initialization? Commented Mar 26 at 7:29
  • Each weight gets a unique update during optimization. This update moves the weights towards the global minimum. Commented Mar 26 at 11:28
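
To make the comments above concrete (this is the standard gradient for softmax regression with cross-entropy loss, not something from the thread): with no hidden layer the model is exactly multinomial logistic regression, whose loss is convex in $W$. For a single example $(x, y)$ the gradient with respect to the weight vector $w_k$ of class $k$ is

$$\nabla_{w_k} L = \bigl(p_k - \mathbf{1}[y = k]\bigr)\, x, \qquad p_k = \frac{\exp(w_k^\top x)}{\sum_j \exp(w_j^\top x)},$$

which already differs between the true class and the other classes at $W = 0$ (where every $p_k = 1/K$), so there is no symmetry that random weights need to break, and convexity means gradient descent heads toward the same global optimum from any starting point.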
