Reason for not shrinking the bias (intercept) term in regression
For a linear model, $y=\beta_0+x\beta+\varepsilon$, the shrinkage (penalty) term always takes the form $P(\beta)$.

What is the reason that we do not shrink the bias (intercept) term $\beta_0$? Should we shrink the bias terms in neural network models?
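To make the asymmetry concrete, here is a minimal sketch (function and variable names are my own) of ridge regression in which only the slope coefficients $\beta$ are penalized: centering the data lets the intercept $\beta_0$ be recovered from the sample means afterwards, outside the penalty.

```python
import numpy as np

def ridge_unpenalized_intercept(X, y, lam):
    """Ridge regression that shrinks only the slopes, not the intercept."""
    # Center predictors and response; on centered data the intercept is 0,
    # so the ridge penalty lam * ||beta||^2 touches only the slopes.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form ridge solution on centered data:
    # beta = (Xc' Xc + lam * I)^{-1} Xc' yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    # Recover the intercept from the means, with no shrinkage applied to it.
    beta0 = y_mean - x_mean @ beta
    return beta0, beta
```

Note that with `lam = 0` this reduces to ordinary least squares, and as `lam` grows the slopes shrink toward zero while the fitted intercept still tracks the mean of $y$, which is the behavior penalizing $\beta_0$ would destroy.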