Let us consider a regression problem where a scalar target variable $y$ must be predicted based on a vector of observable predictors $x$.
We assume that the dynamics are nonlinear and, specifically, that
$$y = f(x, \theta) + \varepsilon,$$
where $\theta$ is a vector of unknown real parameters, $f$ is a known deterministic function nonlinear in $\theta$, and $\varepsilon$ is a random noise with distribution
$$\varepsilon \sim N(0, \sigma^2)$$
for some positive value of $\sigma$.
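To make the setup concrete, here is a minimal Python sketch that simulates data from a model of this form. The exponential-decay choice of $f$, the parameter values, and the scalar predictor are hypothetical illustrations, not part of the assumptions above.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, theta):
    # Hypothetical nonlinear model: exponential decay plus offset.
    # theta = (a, b, c); the model is nonlinear in the parameter b.
    a, b, c = theta
    return a * np.exp(-b * x) + c

theta_true = np.array([2.0, 1.5, 0.5])
sigma = 0.1

x = rng.uniform(0.0, 5.0, size=100)        # predictors (scalar here for simplicity)
eps = rng.normal(0.0, sigma, size=x.size)  # Gaussian noise with standard deviation sigma
y = f(x, theta_true) + eps                 # observed targets
```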
If we have $N$ independent observations $(x_1, y_1), \ldots, (x_N, y_N)$, we can estimate the value of $\theta$ by maximizing the log-likelihood. We can optionally choose to weight some observations more or less than others by choosing weights
$$w_1, \ldots, w_N > 0$$
and assuming that
$$\varepsilon_i \sim N\!\left(0, \frac{\sigma^2}{w_i}\right)$$
for all $i$ (where $\sigma$ is unknown).
Under these assumptions, the log-likelihood is given by
$$\ell(\theta, \sigma) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) + \frac{1}{2}\sum_{i=1}^{N}\ln w_i - \frac{1}{2\sigma^2}\sum_{i=1}^{N} w_i \bigl(y_i - f(x_i, \theta)\bigr)^2.$$
Setting for simplicity of notation
$$r_i(\theta) = y_i - f(x_i, \theta),$$
we see that maximizing the log-likelihood is equivalent to minimizing the following objective function (weighted sum of squared residuals):
$$\varphi(\theta) = \sum_{i=1}^{N} w_i\, r_i(\theta)^2.$$
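A direct translation of this objective into Python could look as follows (a sketch that reuses the hypothetical $f$, $x$ and $y$ from the simulation above):

```python
def residuals(theta, x, y):
    # r_i(theta) = y_i - f(x_i, theta)
    return y - f(x, theta)

def objective(theta, x, y, w):
    # Weighted sum of squared residuals: phi(theta) = sum_i w_i * r_i(theta)^2
    r = residuals(theta, x, y)
    return np.sum(w * r**2)

w = np.ones_like(y)                    # equal weights as the default choice
print(objective(theta_true, x, y, w))  # small, since theta_true generated the data
```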
Start from an initial guess $\theta_0$ and approximate the objective function around $\theta_0$ with the following quadratic function:
$$\varphi(\theta_0 + \delta) \approx \widetilde{\varphi}(\delta) = \sum_{i=1}^{N} w_i \Bigl(r_i(\theta_0) - \nabla_{\theta} f(x_i, \theta_0)^{\top}\delta\Bigr)^2.$$
Thanks to the objective function's special form, we can calculate a local quadratic approximation by taking the first-order expansion of $f$ instead of the second-order expansion of the objective function itself.
Defining for simplicity
$$r = \begin{pmatrix} r_1(\theta_0) \\ \vdots \\ r_N(\theta_0) \end{pmatrix}, \qquad
J = \begin{pmatrix} \nabla_{\theta} f(x_1, \theta_0)^{\top} \\ \vdots \\ \nabla_{\theta} f(x_N, \theta_0)^{\top} \end{pmatrix}, \qquad
W = \operatorname{diag}(w_1, \ldots, w_N),$$
we have that this quadratic approximation reaches its minimum when its gradient with respect to the displacement $\delta$ vanishes,
$$\nabla_{\delta}\,\widetilde{\varphi}(\delta) = -2\, J^{\top} W (r - J\delta) = 0,$$
which is satisfied when the displacement $\delta$ solves the following linear system:
$$\bigl(J^{\top} W J\bigr)\,\delta = J^{\top} W r.$$
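Putting the pieces together, a bare-bones iteration based on this linear system might look like the sketch below. The central-difference Jacobian, the fixed number of iterations, and the helper names are illustrative choices; in practice one would typically supply analytical derivatives or rely on a library routine such as scipy.optimize.least_squares.

```python
def jacobian(theta, x, h=1e-6):
    # Numerical Jacobian of f with respect to theta:
    # J[i, j] = d f(x_i, theta) / d theta_j, via central differences.
    J = np.empty((x.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        J[:, j] = (f(x, theta + step) - f(x, theta - step)) / (2.0 * h)
    return J

def gauss_newton(theta0, x, y, w, n_iter=20):
    theta = np.asarray(theta0, dtype=float)
    W = np.diag(w)
    for _ in range(n_iter):
        r = residuals(theta, x, y)   # residual vector r(theta)
        J = jacobian(theta, x)       # Jacobian of f at the current theta
        # Solve (J^T W J) delta = J^T W r for the displacement delta.
        delta = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
        theta = theta + delta
    return theta

theta_hat = gauss_newton([1.0, 1.0, 0.0], x, y, w)
```

Note that, without any step-size control, this plain iteration is not guaranteed to decrease the objective at every step.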
Since
The following contributions are added to the gradient of φ in the two cases:
The linear system is now
where
