
Reason for not shrinking the bias term

For a linear model, $y = \beta_0 + x^\top\beta + \varepsilon$, the shrinkage penalty typically takes the form $P(\beta)$.

Why do we not shrink the bias term $\beta_0$? By comparison, should we shrink the bias term in a neural network model?
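To make the setup concrete, here is a small NumPy sketch (my own illustration, not from the question) of ridge regression where the penalty $\lambda\lVert\beta\rVert^2$ is applied to the slopes only, with the bias $\beta_0$ left unpenalized. The data and the `lam` value are arbitrary choices for demonstration.

```python
import numpy as np

# Illustration: ridge regression with an UNPENALIZED intercept.
# The penalty matrix has a zero in the position corresponding to beta_0,
# so shrinkage acts on the slopes beta only.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -2.0, 0.5])
y = 10.0 + X @ true_beta + 0.1 * rng.normal(size=n)  # large true intercept

lam = 5.0
Xa = np.hstack([np.ones((n, 1)), X])  # prepend a column of ones for beta_0
P = lam * np.eye(p + 1)
P[0, 0] = 0.0                         # do NOT shrink the bias term
coef = np.linalg.solve(Xa.T @ Xa + P, Xa.T @ y)

print(coef)  # coef[0] stays near 10 even under heavy shrinkage of the slopes
```

If `P[0, 0]` were set to `lam` instead, the fitted intercept would be pulled toward zero, making predictions depend on an arbitrary shift of the response.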

yliueagle