
If MLE (Maximum Likelihood Estimation) cannot give a proper closed-form solution for the parameters in Logistic Regression, why is this method discussed so much? Why not just stick to Gradient Descent for estimating parameters?

  • Wait a sec... when does Gradient Descent give a closed-form solution? (given that you also state that it "estimates parameters") Commented May 5, 2022 at 3:11

3 Answers


Maximum likelihood is a method for estimating parameters.

Gradient descent is a numerical optimization technique that helps us solve problems we might not be able to solve by traditional means (e.g., when taking the derivative and setting it equal to zero does not yield a closed-form solution).

The two can coexist.

In fact, when we use gradient descent to minimize the crossentropy loss in a logistic regression, we are solving for a maximum likelihood estimator of the regression parameters, as minimizing crossentropy loss and maximizing likelihood are equivalent in logistic regression.
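To spell out that equivalence, writing $y_i \in \{0, 1\}$ for the labels and $p_i$ for the model's predicted probabilities, the log-likelihood of the data is
$$\ell = \sum_i \left[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\right],$$
which is exactly the negative of the cross-entropy loss, so maximizing one is the same as minimizing the other.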

In order to descend a gradient, you have to have a function. If we take the negative log-likelihood and descend the gradient until we find the minimum, we have done the equivalent of finding the maximum of the log-likelihood and, thus, the likelihood.
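As a minimal sketch of this (assuming NumPy, made-up synthetic data, and an arbitrary learning rate and iteration count), plain gradient descent on the negative log-likelihood, which is the same quantity as the cross-entropy loss, approaches the maximum likelihood estimates:

```python
import numpy as np

# Hypothetical synthetic data: 200 points, 2 features, made-up "true" coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([1.5, -2.0]), 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_w + true_b))))

# Negative log-likelihood of the Bernoulli model (== cross-entropy loss):
#   sum_i [ log(1 + exp(z_i)) - y_i * z_i ],   with z_i = x_i . w + b
# Its gradient with respect to z_i is sigmoid(z_i) - y_i.
w, b, lr = np.zeros(2), 0.0, 0.01
for _ in range(5000):
    z = X @ w + b
    resid = 1 / (1 + np.exp(-z)) - y   # sigmoid(z) - y
    w -= lr * (X.T @ resid)            # descend the gradient for the weights
    b -= lr * resid.sum()              # descend the gradient for the intercept

print(w, b)  # approaches the maximum likelihood estimate (and roughly true_w, true_b)
```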


I think you are comparing apples and oranges here. Maximum likelihood is the maximum value of your likelihood function, which describes your data-generation process. Specifically, the likelihood gives you the probability of observing your data, given the data-generation model you imagine. It is similar to a loss metric in that respect.

Gradient descent is an approach to varying your parameters in such a way as to maximize/minimize some function, e.g. a loss-metric.

So why are you trying to compare these two things? It would seem to me that you can use the likelihood as a loss-function (normally the log-likelihood) and then run gradient descent on its negative to maximize it.

Perhaps that's what you meant. Why would you use likelihood as the loss-metric for gradient descent? In my use-cases it helped when the available data was not evenly sampled or was heteroscedastic, e.g. if you want to do regression to get y as a function of x = 0...1, but the variance of y is greater in the region x = 0.2...0.4 than in other regions. Using least squares as the loss metric may give a poor fit (since the intrinsic assumption of least squares is that the variance in y is the same everywhere).
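Here is a rough sketch of that situation (the data, noise levels, and use of SciPy's general-purpose minimizer in place of hand-rolled gradient descent are all assumptions for illustration). The Gaussian negative log-likelihood with known per-point variances down-weights the noisy region, while ordinary least squares treats every point equally:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: y = 2x + 1, with much noisier y for x in (0.2, 0.4).
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 300)
sigma = np.where((x > 0.2) & (x < 0.4), 2.0, 0.2)   # heteroscedastic noise level
y = 2 * x + 1 + rng.normal(0, sigma)

def gaussian_nll(params):
    """Negative log-likelihood of a line under Gaussian noise with known per-point sigma."""
    a, b = params
    resid = y - (a * x + b)
    return np.sum(0.5 * (resid / sigma) ** 2 + np.log(sigma))

nll_fit = minimize(gaussian_nll, x0=[0.0, 0.0]).x   # likelihood-based fit (down-weights noisy points)
ols_fit = np.polyfit(x, y, 1)                       # ordinary least squares fit (equal weights)
print("NLL fit (slope, intercept):", nll_fit)
print("OLS fit (slope, intercept):", ols_fit)
```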

  • Minimizing crossentropy loss is equivalent to maximum likelihood estimation in logistic regression. Commented Jul 5, 2022 at 18:46
  • @Dave, your comment is correct, but I am not sure why it is relevant. Yes, minimizing cross-entropy can be equivalent to likelihood maximization, at least when you are sampling from a Bernoulli distribution. Does it change my comment in any way? Commented Jul 6, 2022 at 3:13

MLE (Maximum Likelihood Estimation) is discussed a lot because it is relatively straightforward and broadly applicable. The downside is that MLE is not always the best method for a given context, and sometimes it cannot be applied at all.

In practice, gradient descent and other numerical optimization methods are commonly used for estimating the parameters in logistic regression.
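For instance (a sketch with hypothetical simulated data, assuming statsmodels is available), an off-the-shelf logistic regression routine computes the estimates by iterative numerical optimization, since no closed-form solution exists:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical simulated data for a logistic model with made-up coefficients.
rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))            # intercept column + 2 features
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.5, 1.5, -2.0]))))

result = sm.Logit(y, X).fit()   # maximizes the likelihood numerically (Newton's method by default)
print(result.params)            # the fitted coefficients, i.e. the maximum likelihood estimates
```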

  • Minimizing crossentropy loss is equivalent to maximum likelihood estimation in logistic regression. Commented Jul 5, 2022 at 18:46
