Let us consider a regression problem where a scalar target variable $y$ must be predicted based on a vector of observable predictors $x$.
We assume that the dynamics are nonlinear and, specifically, that
$$y = f(x, \theta) + \varepsilon,$$
where $\theta$ is a vector of unknown real parameters, $f$ is a known deterministic function nonlinear in $\theta$, and $\varepsilon$ is a random noise with distribution
$$\varepsilon \sim N(0, \sigma^2)$$
for some positive value of $\sigma$.
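To make the setup concrete, here is a minimal Python sketch that simulates data from a model of this form. The exponential-decay choice of $f$, the parameter values, and the scalar predictor are hypothetical illustrations, not part of the assumptions above.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, theta):
    # Hypothetical nonlinear model: exponential decay plus offset.
    # theta = (a, b, c); the model is nonlinear in the parameter b.
    a, b, c = theta
    return a * np.exp(-b * x) + c

theta_true = np.array([2.0, 1.5, 0.5])
sigma = 0.1

x = rng.uniform(0.0, 5.0, size=100)        # predictors (scalar here for simplicity)
eps = rng.normal(0.0, sigma, size=x.size)  # Gaussian noise with standard deviation sigma
y = f(x, theta_true) + eps                 # observed targets
```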
If we have $N$ independent observations $(x_1, y_1), \ldots, (x_N, y_N)$, we can estimate the value of $\theta$ by maximizing the log-likelihood. We can optionally choose to weight some observations more or less than others by choosing weights
$$w_1, \ldots, w_N > 0$$
and assuming that
$$\varepsilon_i \sim N\!\left(0, \frac{\sigma^2}{w_i}\right)$$
for all $i$ (where $\sigma$ is unknown).
Under these assumptions, the log-likelihood is given by
$$\ell(\theta, \sigma) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) + \frac{1}{2}\sum_{i=1}^{N}\ln w_i - \frac{1}{2\sigma^2}\sum_{i=1}^{N} w_i \bigl(y_i - f(x_i, \theta)\bigr)^2.$$
Setting for simplicity of notation
$$r_i(\theta) = y_i - f(x_i, \theta),$$
we see that maximizing the log-likelihood is equivalent to minimizing the following objective function (weighted sum of squared residuals):
$$\varphi(\theta) = \sum_{i=1}^{N} w_i\, r_i(\theta)^2.$$
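A direct translation of this objective into Python could look as follows (a sketch that reuses the hypothetical $f$, $x$ and $y$ from the simulation above):

```python
def residuals(theta, x, y):
    # r_i(theta) = y_i - f(x_i, theta)
    return y - f(x, theta)

def objective(theta, x, y, w):
    # Weighted sum of squared residuals: phi(theta) = sum_i w_i * r_i(theta)^2
    r = residuals(theta, x, y)
    return np.sum(w * r**2)

w = np.ones_like(y)                    # equal weights as the default choice
print(objective(theta_true, x, y, w))  # small, since theta_true generated the data
```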
Start from an initial guess $\theta_0$ and approximate the objective function around $\theta_0$ with the following quadratic function:
$$\varphi(\theta_0 + \delta) \approx \widetilde{\varphi}(\delta) = \sum_{i=1}^{N} w_i \Bigl(r_i(\theta_0) - \nabla_{\theta} f(x_i, \theta_0)^{\top}\delta\Bigr)^2.$$
Thanks to the objective function's special form, we can calculate a local quadratic approximation by taking the first-order expansion of $f$ instead of the second-order expansion of the objective function itself.
Defining for simplicity
$$r = \begin{pmatrix} r_1(\theta_0) \\ \vdots \\ r_N(\theta_0) \end{pmatrix}, \qquad
J = \begin{pmatrix} \nabla_{\theta} f(x_1, \theta_0)^{\top} \\ \vdots \\ \nabla_{\theta} f(x_N, \theta_0)^{\top} \end{pmatrix}, \qquad
W = \operatorname{diag}(w_1, \ldots, w_N),$$
we have that this quadratic approximation reaches its minimum when its gradient with respect to the displacement $\delta$ vanishes,
$$\nabla_{\delta}\,\widetilde{\varphi}(\delta) = -2\, J^{\top} W (r - J\delta) = 0,$$
which is satisfied when the displacement $\delta$ solves the following linear system:
$$\bigl(J^{\top} W J\bigr)\,\delta = J^{\top} W r.$$
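Putting the pieces together, a bare-bones iteration based on this linear system might look like the sketch below. The central-difference Jacobian, the fixed number of iterations, and the helper names are illustrative choices; in practice one would typically supply analytical derivatives or rely on a library routine such as scipy.optimize.least_squares.

```python
def jacobian(theta, x, h=1e-6):
    # Numerical Jacobian of f with respect to theta:
    # J[i, j] = d f(x_i, theta) / d theta_j, via central differences.
    J = np.empty((x.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        J[:, j] = (f(x, theta + step) - f(x, theta - step)) / (2.0 * h)
    return J

def gauss_newton(theta0, x, y, w, n_iter=20):
    theta = np.asarray(theta0, dtype=float)
    W = np.diag(w)
    for _ in range(n_iter):
        r = residuals(theta, x, y)   # residual vector r(theta)
        J = jacobian(theta, x)       # Jacobian of f at the current theta
        # Solve (J^T W J) delta = J^T W r for the displacement delta.
        delta = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
        theta = theta + delta
    return theta

theta_hat = gauss_newton([1.0, 1.0, 0.0], x, y, w)
```

Note that, without any step-size control, this plain iteration is not guaranteed to decrease the objective at every step.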
Since
The following contributions are added to the gradient of φ in the two cases:
The linear system is now
where
