Assume that we have $N$ observations $(x_1, y_1), \ldots, (x_N, y_N)$, where the $x_i$ are observable predictors and the $y_i$ are target real variables (i.e., the variables that must be predicted).
We would like to estimate a nonlinear model of the form
$$y_i = f(x_i, \theta) + \varepsilon_i,$$
where $\theta$ is a vector of $k$ unknown real parameters, $f$ is a known function nonlinear in $\theta$, and $\varepsilon_i \sim N(0, \sigma^2)$ for some positive value of $\sigma$.
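For concreteness, here is a minimal sketch in Python/NumPy that simulates data from this setup. The exponential form of $f$, the parameter values, and the noise level $\sigma$ are illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonlinear model: f(x, theta) = theta_0 * exp(theta_1 * x)
# (an assumed example; any function nonlinear in theta would do).
def f(x, theta):
    return theta[0] * np.exp(theta[1] * x)

N = 100
theta_true = np.array([2.0, -1.5])  # assumed "true" parameters
sigma = 0.1                         # assumed noise standard deviation

x = rng.uniform(0.0, 2.0, size=N)                      # observable predictors
y = f(x, theta_true) + rng.normal(0.0, sigma, size=N)  # targets y_i = f(x_i, theta) + eps_i
```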
Setting
$$\varepsilon_i = y_i - f(x_i, \theta)$$
and assuming that the $N$ observations are independent, the log-likelihood of the model given our $N$ observations is
$$\ell(\theta, \sigma^2) = -\frac{N}{2} \ln\!\left(2 \pi \sigma^2\right) - \frac{1}{2 \sigma^2} \sum_{i=1}^N \big(y_i - f(x_i, \theta)\big)^2.$$
We estimate the model parameters by maximizing the log-likelihood with respect to $\theta$. Since $\theta$ enters the log-likelihood only through the last term, this is equivalent to minimizing the following objective function (the sum of squared residuals):
$$\phi(\theta) = \sum_{i=1}^N \big(y_i - f(x_i, \theta)\big)^2.$$
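Continuing the sketch above, both quantities are straightforward to code; note that $\theta$ appears in the log-likelihood only through the sum of squared residuals, which is why the two optimization problems coincide. The function names are illustrative.

```python
def phi(theta, x, y):
    """Sum of squared residuals: the objective minimized over theta."""
    r = y - f(x, theta)
    return r @ r

def log_likelihood(theta, sigma2, x, y):
    """Gaussian log-likelihood; theta enters only through phi(theta)."""
    n = len(y)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - phi(theta, x, y) / (2.0 * sigma2)
```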
Start from an initial guess $\theta_0$ and approximate the objective function around the current iterate $\theta_t$ with the following quadratic function:
$$\phi(\theta_t + \delta) \approx \sum_{i=1}^N \big( y_i - f(x_i, \theta_t) - \nabla_\theta f(x_i, \theta_t)^\top \delta \big)^2.$$
Thanks to the objective function's special form, we can calculate a local quadratic approximation by taking the first-order expansion of $f$,
$$f(x_i, \theta_t + \delta) \approx f(x_i, \theta_t) + \nabla_\theta f(x_i, \theta_t)^\top \delta,$$
instead of the second-order expansion of the objective function itself.
Defining for simplicity the residual vector $r \in \mathbb{R}^N$ and the $N \times k$ Jacobian matrix $J$,
$$r_i = y_i - f(x_i, \theta_t), \qquad J_{ij} = \frac{\partial f(x_i, \theta_t)}{\partial \theta_j},$$
so that the quadratic approximation can be written compactly as $\lVert r - J \delta \rVert^2$,
we have that this quadratic approximation reaches its minimum when its gradient with respect to $\delta$ vanishes,
$$-2 J^\top (r - J \delta) = 0,$$
which is satisfied when the displacement $\delta$ solves the following linear system (the normal equations):
$$J^\top J \, \delta = J^\top r.$$
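A minimal Gauss–Newton loop for the illustrative model above might look as follows; the analytic Jacobian matches the assumed exponential $f$, and the starting point and iteration count are arbitrary choices rather than recommendations.

```python
def jacobian(x, theta):
    """N x k Jacobian of f at theta: J[i, j] = d f(x_i, theta) / d theta_j."""
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])

def gauss_newton(x, y, theta0, n_iter=20):
    theta = theta0.astype(float).copy()
    for _ in range(n_iter):
        r = y - f(x, theta)                        # residuals at the current iterate
        J = jacobian(x, theta)
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # normal equations: J'J delta = J'r
        theta += delta                             # move to the next iterate
    return theta

theta_hat = gauss_newton(x, y, np.array([1.0, -1.0]))
```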
Since the $\ell_2$ and $\ell_1$ penalty terms have (sub)gradients
$$\nabla_\theta \big( \lambda \lVert \theta \rVert_2^2 \big) = 2 \lambda \theta, \qquad \partial_\theta \big( \lambda \lVert \theta \rVert_1 \big) = \lambda \, \operatorname{sign}(\theta),$$
the following contributions are added to the gradient of $\phi$ in the two cases:
$$2 \lambda (\theta_t + \delta) \quad (\ell_2), \qquad \lambda \, \operatorname{sign}(\theta_t) \quad (\ell_1).$$
The linear system is now
$$A \, \delta = b,$$
where
$$A = J^\top J + \lambda I, \quad b = J^\top r - \lambda \theta_t \quad (\ell_2), \qquad A = J^\top J, \quad b = J^\top r - \tfrac{\lambda}{2} \operatorname{sign}(\theta_t) \quad (\ell_1).$$
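Under the $\ell_2$ reading of the system above, the only change to the Gauss–Newton loop is the shifted matrix and right-hand side; $\lambda$ here is an arbitrary illustrative value, not a recommendation.

```python
def gauss_newton_l2(x, y, theta0, lam=1e-2, n_iter=20):
    theta = theta0.astype(float).copy()
    for _ in range(n_iter):
        r = y - f(x, theta)
        J = jacobian(x, theta)
        A = J.T @ J + lam * np.eye(len(theta))  # J'J + lambda * I
        b = J.T @ r - lam * theta               # J'r - lambda * theta_t
        theta += np.linalg.solve(A, b)
    return theta
```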
