
(Background info taken from my blog.) In logistic regression, the hypothesis function, which models the relationship between the dependent variable $P(y = 1)$ and the independent variable $X$, is: \begin{align*} H_i = h(X_i) &= \frac{1}{1 + e^{-X_i \cdot \beta}} \end{align*} where $X_i$ is the $i$th row of the design matrix $\underset{m \times n}{X}$, or in matrix form: \begin{align*} H &= \frac{1}{1 + e^{-X \beta}} \end{align*}

$H$ is an $m\times 1$ vector. Except for the product $X\beta$, all operations are element-wise. The cost function $J$ measures the deviance of the modeled dependent variable from the observed $y$:
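As a sanity check, the vectorized hypothesis is a one-liner in R (the data below is made up for illustration):

```r
# Vectorized hypothesis: H = 1 / (1 + exp(-X %*% beta)), an m x 1 column of
# probabilities; everything is element-wise except the product X %*% beta.
sigmoid_hypothesis = function(X, beta) {
  1 / (1 + exp(-X %*% beta))
}

set.seed(1)
m = 5
X = cbind(1, matrix(rnorm(m * 2), m))   # 5 x 3 design matrix with intercept column
beta = matrix(c(0.5, -1, 2))            # 3 x 1 coefficient vector
H = sigmoid_hypothesis(X, beta)         # every entry lies strictly in (0, 1)
```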

\begin{align*} J &= (1/m)\sum_{i = 1}^m [-y_i\log H_i - (1-y_i)\log (1-H_i)] \\ &= (1/m)\sum_{i = 1}^m \left[ -y_i \log \frac{1}{1+e^{- X_i \cdot \beta}} - (1 - y_i) \log \left( 1 - \frac{1}{1+e^{- X_i \cdot \beta}} \right) \right] \\ &= (1/m)\sum_{i = 1}^m \left[ y_i \log (1+e^{-X_i \cdot \beta}) + (1 - y_i) \log \left( 1+e^{X_i \cdot \beta} \right) \right] \\ \end{align*}
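Transcribed directly into R (a sketch; a numerically safer version would work with `log1p` of the linear predictor instead of forming $H$ and $1-H$ explicitly):

```r
# J = (1/m) * sum over i of [ -y_i*log(H_i) - (1 - y_i)*log(1 - H_i) ]
logistic_cost = function(X, beta, y) {
  H = 1 / (1 + exp(-X %*% beta))
  as.numeric(-(t(y) %*% log(H) + t(1 - y) %*% log(1 - H)) / nrow(X))
}

set.seed(2)
X = cbind(1, matrix(rnorm(8), 4))   # 4 x 3 design matrix
beta = matrix(rep(0, 3))            # beta = 0 gives H_i = 0.5 for every row
y = matrix(c(0, 1, 1, 0))
J = logistic_cost(X, beta, y)       # equals log(2) exactly when beta = 0
```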

\begin{align*} \frac{\partial J}{\partial \beta_j} &= \dfrac {1} {m} \sum_{i=1}^m \left[ y_{i}H_{i}e^{-X_{i}\cdot \beta }\left( -X_{ij}\right) + \left( 1-y_{i}\right) \dfrac {1} {1+e^{X_{i}\cdot\beta }}e^{X_{i}\cdot\beta }X_{ij} \right] \\ &= \dfrac {1} {m} \sum_{i=1}^m \left[ y_{i}H_{i}e^{-X_{i}\cdot \beta }\left( -X_{ij}\right) + \left( 1-y_{i}\right) H_i X_{ij} \right] \\ &= \dfrac {1} {m} \sum_{i=1}^m H_{i}X_{ij}\left( -y_{i}e^{-X_{i}\cdot \beta }+1-y_{i}\right) \\ &= \dfrac {1} {m} \sum_{i=1}^m H_{i}X_{ij}\left( 1-y_{i}\left( 1+e^{-X_i\cdot\beta }\right) \right) \\ &= \dfrac {1} {m} \sum_{i=1}^m H_{i}X_{ij}\left( 1-y_{i} / H_i\right) \\ &= \dfrac {1} {m} \sum_{i=1}^m \left( H_{i}-y_{i}\right) X_{ij} \\ &= \dfrac {1} {m} (H - y) \cdot X_j \\\\ \frac{\partial J}{\partial \beta} &= \dfrac {1} {m} X^T (H-y) \end{align*}
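The closed form $\frac{\partial J}{\partial \beta} = \frac{1}{m} X^T (H - y)$ is easy to verify numerically against central finite differences (made-up data; the tolerance is chosen loosely):

```r
logistic_cost = function(X, beta, y) {
  H = 1 / (1 + exp(-X %*% beta))
  as.numeric(-(t(y) %*% log(H) + t(1 - y) %*% log(1 - H)) / nrow(X))
}

# Analytic gradient: (1/m) * X^T (H - y)
logistic_grad = function(X, beta, y) {
  H = 1 / (1 + exp(-X %*% beta))
  t(X) %*% (H - y) / nrow(X)
}

set.seed(3)
X = cbind(1, matrix(rnorm(20), 10))
y = matrix(rbinom(10, 1, 0.5))
beta = matrix(rnorm(3))

# Central finite differences, one coordinate j at a time
eps = 1e-6
num_grad = sapply(seq_len(nrow(beta)), function(j) {
  e = matrix(0, nrow(beta))
  e[j] = eps
  (logistic_cost(X, beta + e, y) - logistic_cost(X, beta - e, y)) / (2 * eps)
})
# num_grad should agree with the analytic gradient to high precision
```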

Let $f_i' = \left( H_{i}-y_{i}\right) X_{ij}$ (for each $j$). Then, according to this video:


batch gradient descent can be described as:

Until convergence:

for all $j$:

$$\theta_j := \theta_j - \alpha \sum_{i=1}^m f_i'$$
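The batch rule above, sketched in R (here the summed gradient is averaged over $m$; the fixed step size and iteration count are arbitrary choices):

```r
# Batch gradient descent: every update uses the gradient accumulated over
# all m rows at once, i.e. the full sum in the update rule above.
batch_gd = function(X, y, alpha = 0.5, iters = 500) {
  beta = matrix(0, ncol(X))
  for (t in seq_len(iters)) {
    H = 1 / (1 + exp(-X %*% beta))                    # m x 1 fitted probabilities
    beta = beta - alpha * t(X) %*% (H - y) / nrow(X)  # averaged full-data gradient
  }
  beta
}

cost = function(X, beta, y) {
  H = 1 / (1 + exp(-X %*% beta))
  as.numeric(-(t(y) %*% log(H) + t(1 - y) %*% log(1 - H)) / nrow(X))
}

set.seed(4)
X = cbind(1, matrix(rnorm(200), 100))
true_beta = matrix(c(-0.5, 1, 2))
y = matrix(rbinom(100, 1, as.numeric(1 / (1 + exp(-X %*% true_beta)))))
b = batch_gd(X, y)   # cost(X, b, y) ends up well below the cost at beta = 0
```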

and stochastic gradient descent can be described as:

Shuffle the rows of data, and until convergence:

for all $i$ in $1\cdots m$:

for all $j$ in $0\cdots n$: \begin{align*} \theta_j := \theta_j - \alpha f_i' \end{align*}
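A minimal working epoch of this loop in R; the crucial point is that $\theta$ (here `beta`) is overwritten immediately after every single row, so the next row's gradient is evaluated at the updated coefficients (step size and epoch count are arbitrary):

```r
# One SGD epoch: shuffle the rows, then update beta after each row i using
# only that row's gradient (H_i - y_i) * X_i.
sgd_epoch = function(X, y, beta, alpha = 0.05) {
  for (i in sample(nrow(X))) {
    Hi = 1 / (1 + exp(-sum(X[i, ] * beta)))
    beta = beta - alpha * (Hi - y[i]) * X[i, ]  # beta changes inside the loop
  }
  beta
}

cost = function(X, beta, y) {
  H = as.numeric(1 / (1 + exp(-X %*% beta)))
  -mean(y * log(H) + (1 - y) * log(1 - H))
}

set.seed(5)
X = cbind(1, matrix(rnorm(200), 100))
true_beta = c(-0.5, 1, 2)
y = rbinom(100, 1, as.numeric(1 / (1 + exp(-X %*% true_beta))))
beta = rep(0, ncol(X))
for (epoch in 1:20) beta = sgd_epoch(X, y, beta)
# The final cost should be well below the cost at beta = 0
```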

This looks straightforward, but when I implement stochastic gradient descent in R, it fails to converge anywhere close to the optimum. Here is the code:

    logreg = function(y, x) {
      alpha = 1.15
      x = as.matrix(x)
      x = cbind(1, x)
      m = nrow(x)
      m1 = sample(m)
      n = ncol(x)
      b = matrix(rep(1, n))
      newb = b + .1
      h = 1 / (1 + exp(-x %*% b))
      J = -(t(y) %*% log(h) + t(1-y) %*% log(1 - h))
      newJ = J + .5
      while(1) {
        cat("outer while...\n")
        for(i in m1) {
          Vi = exp(-as.numeric(x[i, ] %*% b))
          Hi = 1 / (1 + Vi)
          Ei = (Hi - y[i])
          sDerivJ = matrix(Ei * x[i, ])
          newb = b - alpha * sDerivJ
        }
        h = 1 / (1 + exp(-x %*% newb))
        newJ = -(t(y) %*% log(h) + t(1-y) %*% log(1 - h))
        if((newJ - J)/J > .15) {
          alpha = alpha/2
          next
        }
        print(b)
        print(newb)
        b = newb
        J = newJ
        if(max(abs(b - newb)) < 0.001) {
          break
        }
      }
      b
    }

    nr = 5000
    nc = 20
    set.seed(17)
    x = matrix(rnorm(nr*nc), nr)
    y = matrix(sample(0:1, nr, repl=T), nr)

    testglm = function() {
      res = summary(glm(y~x, family=binomial))
      print(res)
    }
    testlogreg = function() {
      res = logreg(y, x)
      print(res)
    }
    print(system.time(testlogreg()))
    print(system.time(testglm()))

I am wondering what went wrong.

  • I did not go through all your code, but noticed that you have a pretty high $\alpha$. You might consider using a much smaller value, such as 0.1 or 0.01. Commented Apr 16, 2014 at 4:31
  • I have tried that; it doesn't help. I also used step-halving in the code, which doesn't help either. Commented Apr 16, 2014 at 8:50

1 Answer


I would suggest you make the question concise and skip the derivation part.

Two suggestions:

  • Choose a small $\alpha$.
  • Slowly decrease $\alpha$ over time.

For details, see Alex.R's answer here:

Why one epoch for stochastic gradient descent (SGD) is much better than one iteration for gradient decent (GD)?
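The second suggestion can be sketched as a simple schedule such as $\alpha_t = \alpha_0 / (1 + k t)$ (the starting rate $\alpha_0$ and decay constant $k$ below are arbitrary illustrative values):

```r
# Decaying learning rate: large early steps, ever-smaller later steps, which
# is what lets SGD settle down instead of bouncing around the optimum.
decayed_alpha = function(t, alpha0 = 0.5, k = 0.1) {
  alpha0 / (1 + k * t)
}

alphas = sapply(0:9, decayed_alpha)  # starts at 0.5 and strictly decreases
```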

