
# Levenberg-Marquardt algorithm for nonlinear regression

## Nonlinear regression: generic problem statement

Let us consider a regression problem where a scalar target variable $y$ must be predicted based on a vector of observables $x \in \mathbb{R}^n$.

We assume that the dynamics are nonlinear and, specifically, that

$$y = f(x, \theta) + \varepsilon,$$

where $\theta \in \mathbb{R}^p$ is a vector of unknown real parameters, $f$ is a known deterministic function, nonlinear in $\theta$, and $\varepsilon$ is a random noise with distribution $\mathcal{N}(0, \sigma^2)$ for some positive and unknown value of $\sigma$.

If we have $N$ independent observations $(x_i, y_i)$, $i = 1, \dots, N$, we can estimate the value of $\theta$ by maximizing the log-likelihood. We can optionally choose to weight some observations more or less than others by choosing weights $w_i > 0$ and assuming that $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 / w_i)$ for each $i$.

Under these assumptions, the log-likelihood is given by

$$\log L(\theta, \sigma) = -\frac{N}{2} \log(2\pi\sigma^2) + \frac{1}{2} \sum_{i=1}^{N} \log w_i - \frac{1}{2\sigma^2} \sum_{i=1}^{N} w_i \bigl(y_i - f(x_i, \theta)\bigr)^2.$$

Setting, for simplicity of notation,

$$r_i(\theta) = y_i - f(x_i, \theta),$$

we see that maximizing the log-likelihood is equivalent to minimizing the following objective function (weighted sum of squared residuals):

$$\mathrm{Obj}(\theta) = \sum_{i=1}^{N} w_i \, r_i(\theta)^2.$$

Moreover, the maximum likelihood estimate for $\sigma$ is

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} w_i \, r_i(\hat{\theta})^2 = \frac{\mathrm{Obj}(\hat{\theta})}{N}.$$
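
As a concrete sketch of these quantities in NumPy (the exponential model, data and weights below are illustrative examples, not part of the repository):

```python
import numpy as np

def objective(theta, f, x, y, w):
    """Weighted sum of squared residuals: Obj(theta) = sum_i w_i * r_i(theta)^2."""
    r = y - f(x, theta)              # residuals r_i(theta)
    return np.sum(w * r**2)

def sigma_mle(theta_hat, f, x, y, w):
    """Maximum likelihood estimate of sigma: sqrt(Obj(theta_hat) / N)."""
    return np.sqrt(objective(theta_hat, f, x, y, w) / len(y))

# Example: an exponential-decay model, nonlinear in theta = (a, k)
f = lambda x, theta: theta[0] * np.exp(-theta[1] * x)
x = np.linspace(0.0, 5.0, 50)
rng = np.random.default_rng(0)
y = f(x, np.array([2.0, 0.7])) + 0.05 * rng.standard_normal(x.size)
w = np.ones_like(y)                  # uniform weights
print(objective(np.array([2.0, 0.7]), f, x, y, w))
```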

## The Levenberg-Marquardt algorithm

The Levenberg-Marquardt algorithm finds the minimum of $\mathrm{Obj}$ iteratively, by computing a series of local quadratic approximations.

The algorithm requires that an initial guess $\theta_0$ is provided for the unknown vector of parameters. The objective function $\mathrm{Obj}$ is then approximated locally in a neighbourhood of $\theta_0$ with the following quadratic function (the approximation is only valid for small values of $\|\delta\|$):

$$\mathrm{Obj}(\theta_0 + \delta) \approx q(\delta) = \sum_{i=1}^{N} w_i \left( r_i(\theta_0) - \nabla_\theta f(x_i, \theta_0)^T \delta \right)^2.$$

The peculiarity here is that, thanks to the objective function's special form, we can obtain a local quadratic approximation by taking the first-order expansion of $f$, instead of the second-order expansion of the objective function itself (as we would be forced to do in the general case).

Defining, for simplicity of notation,

$$J_{ij} = \frac{\partial f}{\partial \theta_j}(x_i, \theta_0), \qquad W = \mathrm{diag}(w_1, \dots, w_N), \qquad r = \bigl(r_1(\theta_0), \dots, r_N(\theta_0)\bigr)^T,$$

we have that this quadratic has a unique global minimum satisfying the equation

$$\nabla_\delta \, q(\delta) = -2 \, J^T W (r - J \delta) = 0,$$

which is equivalent to requiring that the displacement $\delta$ solves the following linear system:

$$(J^T W J) \, \delta = J^T W r.$$
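
A minimal sketch of this step in NumPy, with the Jacobian approximated by finite differences (the helper names `jacobian_fd` and `lm_displacement` are ours for illustration, not the repository's API):

```python
import numpy as np

def jacobian_fd(f, x, theta, eps=1e-6):
    """Finite-difference approximation of J_ij = df(x_i, theta)/dtheta_j."""
    J = np.empty((x.size, theta.size))
    f0 = f(x, theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        J[:, j] = (f(x, theta + step) - f0) / eps
    return J

def lm_displacement(f, x, y, w, theta0):
    """Solve (J^T W J) delta = J^T W r for the displacement delta."""
    r = y - f(x, theta0)
    J = jacobian_fd(f, x, theta0)
    JTW = J.T * w                    # equivalent to J^T @ diag(w)
    return np.linalg.solve(JTW @ J, JTW @ r)
```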

This picture illustrates the calculation of δ in a simple one-dimensional case:

*(figure: the local quadratic approximation of the objective and the resulting displacement $\delta$ in one dimension)*

In the picture, the displacement has been applied as-is. In practice, since the quadratic approximation is generally only valid locally, $\delta$ just provides the displacement direction, while its modulus is re-scaled according to a small positive number $h$ (the step) when updating $\theta$:

$$\theta_1 = \theta_0 + h \, \delta.$$
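
Putting the pieces together, a minimal iteration loop might look as follows, reusing `lm_displacement` and the example data from the sketches above (the fixed step `h` and the simple stopping rule are simplifying assumptions; practical implementations adapt the step):

```python
import numpy as np

def fit(f, x, y, w, theta0, h=0.1, max_iter=200, tol=1e-8):
    """Iteratively refine theta via repeated local quadratic approximations."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        delta = lm_displacement(f, x, y, w, theta)   # direction from the linear system
        theta = theta + h * delta                    # re-scaled displacement
        if h * np.linalg.norm(delta) < tol:          # stop when the update is negligible
            break
    return theta

theta_hat = fit(f, x, y, w, np.array([1.0, 1.0]))
print(theta_hat)  # should approach (2.0, 0.7) for the example data above
```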

## Regularization

The Levenberg-Marquardt algorithm can be extended to incorporate regularization terms. Viewing the problem from a Bayesian perspective, we can decide to provide a prior distribution on $\theta$. For simplicity, we assume that the prior distributions on the different parameter components are independent, so that the global prior distribution can be factorized:

$$p(\theta) = \prod_{j=1}^{p} p_j(\theta_j).$$

We assume, moreover, that the one-dimensional priors on the single components are either gaussian or lognormal:

$$\theta_j \sim \mathcal{N}(\mu_j, \tau_j^2) \qquad \text{or} \qquad \log \theta_j \sim \mathcal{N}(\mu_j, \tau_j^2).$$

Reasoning in Bayesian terms, this time we estimate $\theta$ via maximum posterior instead of maximum likelihood. The posterior distribution of $\theta$ is

$$p(\theta \mid y) = \frac{p(y \mid \theta) \, p(\theta)}{p(y)} \propto p(y \mid \theta) \, p(\theta)$$

(using the fact that $p(y)$ does not depend on $\theta$).

Keeping the same notation as before, the objective function to minimize is now

$$\varphi(\theta, \sigma) = N \log \sigma + \frac{1}{2\sigma^2} \sum_{i=1}^{N} w_i \, r_i(\theta)^2 - \sum_{j=1}^{p} \log p_j(\theta_j) + \mathrm{const}$$

(the term $\mathrm{const}$ incorporates terms that are independent of both $\theta$ and $\sigma$). This corresponds to minimizing a weighted sum of squared residuals plus a series of regularization terms.

The following contributions are added to $\varphi$ and to the $j$-th component of its gradient due to the presence of the prior distributions:

Gaussian prior:

$$\Delta \varphi = \frac{(\theta_j - \mu_j)^2}{2 \tau_j^2}, \qquad \Delta \frac{\partial \varphi}{\partial \theta_j} = \frac{\theta_j - \mu_j}{\tau_j^2}.$$

Lognormal prior:

$$\Delta \varphi = \log \theta_j + \frac{(\log \theta_j - \mu_j)^2}{2 \tau_j^2}, \qquad \Delta \frac{\partial \varphi}{\partial \theta_j} = \frac{1}{\theta_j} \left( 1 + \frac{\log \theta_j - \mu_j}{\tau_j^2} \right).$$

(In the lognormal case we have used the first-order expansion $\log(\theta_j + \delta_j) \approx \log \theta_j + \delta_j / \theta_j$ to keep the local model quadratic in $\delta$.)
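
As a sketch, the two prior types could be encoded as small helpers returning their value, gradient and curvature contributions; the lognormal curvature uses the first-order expansion of log mentioned above (all names are illustrative):

```python
import numpy as np

def gaussian_prior_terms(theta_j, mu, tau):
    """Contributions of a gaussian prior N(mu, tau^2) on theta_j."""
    value = (theta_j - mu)**2 / (2.0 * tau**2)
    grad = (theta_j - mu) / tau**2
    curv = 1.0 / tau**2                      # second derivative, constant
    return value, grad, curv

def lognormal_prior_terms(theta_j, mu, tau):
    """Contributions of a lognormal prior (log theta_j ~ N(mu, tau^2))."""
    z = np.log(theta_j)
    value = z + (z - mu)**2 / (2.0 * tau**2)
    grad = (1.0 + (z - mu) / tau**2) / theta_j
    curv = 1.0 / (tau**2 * theta_j**2)       # from log(theta+d) ~ log theta + d/theta
    return value, grad, curv
```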

The linear system is now

$$\bigl(J^T W J + \sigma^2 D\bigr) \, \delta = J^T W r - \sigma^2 g,$$

where

$$D_{jj} = \frac{1}{\tau_j^2} \ \text{(gaussian)}, \qquad D_{jj} = \frac{1}{\tau_j^2 \, \theta_{0,j}^2} \ \text{(lognormal)},$$

$g$ is the vector of the gradient contributions of the priors evaluated at $\theta_0$, and $\sigma$ is replaced by its maximum likelihood estimate (see above).
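
A sketch of the regularized step, reusing `jacobian_fd` and the prior helpers above; `sigma_hat` stands for the maximum likelihood estimate of $\sigma$, and all names are illustrative:

```python
import numpy as np

def regularized_displacement(f, x, y, w, theta0, priors, sigma_hat):
    """Solve (J^T W J + sigma^2 D) delta = J^T W r - sigma^2 g.

    `priors` maps a parameter index j to a (terms_fn, mu, tau) triple,
    e.g. {0: (gaussian_prior_terms, 0.0, 10.0)}.
    """
    r = y - f(x, theta0)
    J = jacobian_fd(f, x, theta0)
    JTW = J.T * w                            # J^T @ diag(w)
    A = JTW @ J
    b = JTW @ r
    for j, (terms_fn, mu, tau) in priors.items():
        _, grad, curv = terms_fn(theta0[j], mu, tau)
        A[j, j] += sigma_hat**2 * curv       # sigma^2 * D_jj
        b[j] -= sigma_hat**2 * grad          # -sigma^2 * g_j
    return np.linalg.solve(A, b)
```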
