
If MLE (Maximum Likelihood Estimation) cannot give a proper closed-form solution for the parameters in Logistic Regression, why is this method discussed so much? Why not just stick to Gradient Descent for estimating parameters?

  • Wait a sec... when does Gradient Descent give a closed-form solution? (given that you also state that it "estimates parameters") Commented May 5, 2022 at 3:11

3 Answers


Maximum likelihood is a method for estimating parameters.

Gradient descent is a numerical optimization technique that helps us solve problems we might not be able to solve by traditional means (e.g., when taking the derivative and setting it equal to zero does not yield a closed-form solution).

The two can coexist.

In fact, when we use gradient descent to minimize the crossentropy loss in a logistic regression, we are solving for a maximum likelihood estimator of the regression parameters, as minimizing crossentropy loss and maximizing likelihood are equivalent in logistic regression.
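To spell out that equivalence, writing $y_i \in \{0, 1\}$ for the labels and $p_i$ for the model's predicted probabilities, the log-likelihood of the data is
$$\ell = \sum_i \left[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\right],$$
which is exactly the negative of the cross-entropy loss, so maximizing one is the same as minimizing the other.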

In order to descend a gradient, you have to have a function. If we take the negative log-likelihood and descend the gradient until we find the minimum, we have done the equivalent of finding the maximum of the log-likelihood and, thus, the likelihood.
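As a minimal sketch of this (assuming NumPy, made-up synthetic data, and an arbitrary learning rate and iteration count), plain gradient descent on the negative log-likelihood, which is the same quantity as the cross-entropy loss, approaches the maximum likelihood estimates:

```python
import numpy as np

# Hypothetical synthetic data: 200 points, 2 features, made-up "true" coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([1.5, -2.0]), 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_w + true_b))))

# Negative log-likelihood of the Bernoulli model (== cross-entropy loss):
#   sum_i [ log(1 + exp(z_i)) - y_i * z_i ],   with z_i = x_i . w + b
# Its gradient with respect to z_i is sigmoid(z_i) - y_i.
w, b, lr = np.zeros(2), 0.0, 0.01
for _ in range(5000):
    z = X @ w + b
    resid = 1 / (1 + np.exp(-z)) - y   # sigmoid(z) - y
    w -= lr * (X.T @ resid)            # descend the gradient for the weights
    b -= lr * resid.sum()              # descend the gradient for the intercept

print(w, b)  # approaches the maximum likelihood estimate (and roughly true_w, true_b)
```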


I think you are comparing apples and oranges here. Maximum likelihood is the maximum value of your likelihood function, which describes your data-generation process. Specifically, the likelihood gives you the probability of observing your data, given the data-generation model you imagine. It is similar to a loss metric in that respect.

Gradient descent is an approach to varying your parameters in such a way as to maximize/minimize some function, e.g. a loss-metric.

So why are you trying to compare these two things? It would seem to me that you can use the likelihood as a loss-function (normally the log-likelihood) and then run gradient descent on its negative to maximize it.

Perhaps that's what you meant. Why would you use likelihood as the loss-metric for gradient descent? In my use-cases it helped when the available data was not evenly sampled or was heteroscedastic, e.g. if you want to do regression to get y as a function of x = 0...1, but the variance of y is greater in the region x = 0.2...0.4 than in other regions. Using least squares as the loss metric may give a poor fit (since the intrinsic assumption of least squares is that the variance in y is the same everywhere).
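Here is a rough sketch of that situation (the data, noise levels, and use of SciPy's general-purpose minimizer in place of hand-rolled gradient descent are all assumptions for illustration). The Gaussian negative log-likelihood with known per-point variances down-weights the noisy region, while ordinary least squares treats every point equally:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: y = 2x + 1, with much noisier y for x in (0.2, 0.4).
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 300)
sigma = np.where((x > 0.2) & (x < 0.4), 2.0, 0.2)   # heteroscedastic noise level
y = 2 * x + 1 + rng.normal(0, sigma)

def gaussian_nll(params):
    """Negative log-likelihood of a line under Gaussian noise with known per-point sigma."""
    a, b = params
    resid = y - (a * x + b)
    return np.sum(0.5 * (resid / sigma) ** 2 + np.log(sigma))

nll_fit = minimize(gaussian_nll, x0=[0.0, 0.0]).x   # likelihood-based fit (down-weights noisy points)
ols_fit = np.polyfit(x, y, 1)                       # ordinary least squares fit (equal weights)
print("NLL fit (slope, intercept):", nll_fit)
print("OLS fit (slope, intercept):", ols_fit)
```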

  • Minimizing crossentropy loss is equivalent to maximum likelihood estimation in logistic regression. Commented Jul 5, 2022 at 18:46
  • @Dave, your comment is correct, but I am not sure why it is relevant. Yes, minimizing cross-entropy can be equivalent to likelihood maximization, at least when you are sampling from a Bernoulli distribution. Does it change my comment in any way? Commented Jul 6, 2022 at 3:13

MLE (Maximum Likelihood Estimation) is discussed a lot because it is relatively straightforward and broadly applicable. The downside is that MLE is not always the best method for a given context, and sometimes it cannot be applied at all.

In practice, gradient descent and other numerical optimization methods are commonly used for estimating the parameters in logistic regression.
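For instance (a sketch with hypothetical simulated data, assuming statsmodels is available), an off-the-shelf logistic regression routine computes the estimates by iterative numerical optimization, since no closed-form solution exists:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical simulated data for a logistic model with made-up coefficients.
rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))            # intercept column + 2 features
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.5, 1.5, -2.0]))))

result = sm.Logit(y, X).fit()   # maximizes the likelihood numerically (Newton's method by default)
print(result.params)            # the fitted coefficients, i.e. the maximum likelihood estimates
```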

  • Minimizing crossentropy loss is equivalent to maximum likelihood estimation in logistic regression. Commented Jul 5, 2022 at 18:46
