Why is the Kalman filter so popular?

Question

This is something I have been trying to understand.

Consider the following local level linear state space model:

Observation equation: $y_t = \mu_t + \epsilon_t$ where $\epsilon_t \sim N(0, \sigma_\epsilon^2)$

State equation: $\mu_t = \mu_{t-1} + \eta_t$ where $\eta_t \sim N(0, \sigma_\eta^2)$

To estimate the parameters of this model, we need to maximize the following likelihood. It is the joint density of the observations given the hidden states and parameters, multiplied by the density of the states given the parameters, with the unobserved states integrated out:

$$L(\sigma_\epsilon^2, \sigma_\eta^2) = \int p(y_1, \ldots, y_T | \mu_1, \ldots, \mu_T, \sigma_\epsilon^2) \cdot p(\mu_1, \ldots, \mu_T | \sigma_\eta^2) \, d\mu_1 \cdots d\mu_T$$

The first factor in the integrand is a product of normal densities:

$$p(y_1, ..., y_T | \mu_1, ..., \mu_T, \sigma_\epsilon^2) = \prod_{t=1}^T \frac{1}{\sqrt{2\pi\sigma_\epsilon^2}} \exp\left(-\frac{(y_t - \mu_t)^2}{2\sigma_\epsilon^2}\right)$$

The second factor is also a product of normal densities (a prior is placed on the first state $\mu_1$ for initialization):

$$p(\mu_1, ..., \mu_T | \sigma_\eta^2) = p(\mu_1) \prod_{t=2}^T \frac{1}{\sqrt{2\pi\sigma_\eta^2}} \exp\left(-\frac{(\mu_t - \mu_{t-1})^2}{2\sigma_\eta^2}\right)$$

Together, taking the prior $p(\mu_1)$ to be $N(0, \kappa^2)$, the full likelihood becomes:

$$\begin{aligned} L(\sigma_\epsilon^2, \sigma_\eta^2) = \int & \left[\prod_{t=1}^T \frac{1}{\sqrt{2\pi\sigma_\epsilon^2}} \exp\left(-\frac{(y_t - \mu_t)^2}{2\sigma_\epsilon^2}\right)\right] \\ & \times \left[\frac{1}{\sqrt{2\pi\kappa^2}} \exp\left(-\frac{\mu_1^2}{2\kappa^2}\right) \prod_{t=2}^T \frac{1}{\sqrt{2\pi\sigma_\eta^2}} \exp\left(-\frac{(\mu_t - \mu_{t-1})^2}{2\sigma_\eta^2}\right)\right] d\mu_1 \cdots d\mu_T \end{aligned}$$

My Question: Why can't this likelihood function be numerically optimized to obtain the estimates of the state space model? Why is the Kalman Filter typically used for this instead?

My guess is that in the 1960s, evaluating and optimizing this likelihood numerically was difficult given the limited computing power. The Kalman Filter has a simpler objective function, and the computation is simplified because it relies on recursion.

Is this correct?

To complement what Royi said below, this paper explains how the Kalman filter can be viewed as a regression problem: jstor.org/stable/2284643
– mark leeds, Commented Jul 12 at 10:38
Thank you for sending this! I have been trying to find an answer to the following question: math.stackexchange.com/questions/5082004/… Do you have any ideas?
– stats_noob, Commented Jul 12 at 10:47

Royi · Accepted Answer · 2025-07-12 17:45:51Z

The main advantage of the Kalman Filter in this context is that it solves the problem sequentially.
Namely, given the current estimate and a new measurement, it computes the updated estimate.
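To make the sequential update concrete, here is a minimal sketch for the local level model from the question. The function name `local_level_filter` and its arguments are hypothetical, and the $N(0, \kappa^2)$ prior on $\mu_1$ is assumed as in the question:

```python
import numpy as np

def local_level_filter(y, sigma_eps2, sigma_eta2, kappa2):
    """Kalman filter for the local level model (hypothetical helper).

    Returns the filtered means E[mu_t | y_1..y_t] and their variances.
    Assumes the prior mu_1 ~ N(0, kappa2) from the question.
    """
    m_pred, p_pred = 0.0, kappa2  # prior on the first state
    means, variances = [], []
    for y_t in y:
        # Update: blend the prediction with the new measurement.
        gain = p_pred / (p_pred + sigma_eps2)
        m = m_pred + gain * (y_t - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
        # Predict: the random walk keeps the mean and adds sigma_eta2.
        m_pred, p_pred = m, p + sigma_eta2
    return np.array(means), np.array(variances)

means, variances = local_level_filter(
    [2.0, 3.0], sigma_eps2=1.0, sigma_eta2=0.5, kappa2=1.0
)
print(means, variances)
```

With these illustrative numbers the filtered means are 1.0 and 2.0 and both variances are 0.5. Each new measurement costs a constant amount of work, and nothing from the history has to be stored beyond the current mean and variance.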

You could package the entire history into a single Maximum Likelihood problem.
The Kalman Filter instead applies a simple update rule: given the solution for $n - 1$ measurements and one new measurement, it yields the same optimal solution as re-solving the full problem with $n$ measurements.
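For the local level model in the question, the packaged problem can be written out explicitly. This is a sketch assuming the $N(0, \kappa^2)$ prior on $\mu_1$; the symbols $\Sigma$, $v_t$ and $S_t$ are introduced here for illustration and are not in the original post. Since $\mu_t = \mu_1 + \sum_{s=2}^{t} \eta_s$, the observation vector $y = (y_1, \ldots, y_T)^\top$ is jointly Gaussian, so the integral has a closed form:

$$y \sim N(0, \Sigma), \qquad \Sigma_{st} = \kappa^2 + \left(\min(s, t) - 1\right) \sigma_\eta^2 + \sigma_\epsilon^2 \, \mathbf{1}[s = t]$$

$$\log L(\sigma_\epsilon^2, \sigma_\eta^2) = -\frac{T}{2} \log 2\pi - \frac{1}{2} \log |\Sigma| - \frac{1}{2} y^\top \Sigma^{-1} y$$

The Kalman Filter evaluates the same quantity one observation at a time through the prediction error decomposition, where $v_t$ is the one-step-ahead prediction error of $y_t$ and $S_t$ its variance:

$$\log L(\sigma_\epsilon^2, \sigma_\eta^2) = -\frac{1}{2} \sum_{t=1}^{T} \left( \log 2\pi S_t + \frac{v_t^2}{S_t} \right)$$

The batch form needs the determinant and inverse of a dense $T \times T$ matrix, while the recursive form needs only scalar operations per time step.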

So you may view the Kalman Filter as a framework for solving this problem sequentially.
This fits real-world dynamical systems and their control systems.
In this setting, data arrives sequentially and compute resources are usually limited relative to the latency within which the solution must be delivered.

Remark: The Kalman Filter is basically a sequential solution of the following problem (the stationary case):

$$\begin{align} \arg \min_{ \left\{\boldsymbol{x}_{l}\right\}_{l = 0}^{k}, \left\{\boldsymbol{w}_{l}\right\}_{l = 0}^{k - 1} } \quad & \frac{1}{2} {\left( \boldsymbol{x}_{0} - \bar{\boldsymbol{x}}_{0} \right)}^{T} \boldsymbol{P}_{0}^{-1} \left( \boldsymbol{x}_{0} - \bar{\boldsymbol{x}}_{0} \right) \\ & + \frac{1}{2} \sum_{l = 0}^{k} {\left( \boldsymbol{z}_{l} - \boldsymbol{H} \boldsymbol{x}_{l} \right)}^{T} \boldsymbol{R}^{-1} \left( \boldsymbol{z}_{l} - \boldsymbol{H} \boldsymbol{x}_{l} \right) \\ & + \frac{1}{2} \sum_{l = 0}^{k - 1} \boldsymbol{w}_{l}^{T} \boldsymbol{Q}^{-1} \boldsymbol{w}_{l} \\ \text{subject to} \quad & \begin{aligned} \boldsymbol{x}_{l + 1} & = \boldsymbol{F} \boldsymbol{x}_{l} + \boldsymbol{G} \boldsymbol{w}_{l}, \; l = 0, 1, 2, \ldots, k - 1 \end{aligned} \end{align}$$

This is equivalent to the Maximum Likelihood problem.
See How to Regularize the State Variables of a Kalman Filter.
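As an illustration, the batch problem can be specialized to the local level model from the question. This is a sketch under stated assumptions: scalar state, $F = G = H = 1$, $Q = \sigma_\eta^2$, $R = \sigma_\epsilon^2$, prior mean 0 with variance $\kappa^2$, and indices starting at 1 as in the question rather than 0. The function names are hypothetical. The objective is quadratic in the states, so its minimizer solves a tridiagonal linear system, and the last component should agree with the Kalman Filter estimate after the final measurement:

```python
import numpy as np

def batch_states(y, sigma_eps2, sigma_eta2, kappa2):
    """Minimize the quadratic batch objective over all states at once."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Hessian of the objective: measurement terms on the diagonal.
    A = np.eye(T) / sigma_eps2
    A[0, 0] += 1.0 / kappa2  # prior term on the first state
    for t in range(1, T):
        # (mu_t - mu_{t-1})^2 / sigma_eta2 touches a 2x2 block.
        A[t, t] += 1.0 / sigma_eta2
        A[t - 1, t - 1] += 1.0 / sigma_eta2
        A[t, t - 1] -= 1.0 / sigma_eta2
        A[t - 1, t] -= 1.0 / sigma_eta2
    b = y / sigma_eps2
    return np.linalg.solve(A, b)

def filtered_last_mean(y, sigma_eps2, sigma_eta2, kappa2):
    """Run the scalar Kalman recursion and return the final estimate."""
    m_pred, p_pred = 0.0, kappa2
    m = m_pred
    for y_t in y:
        gain = p_pred / (p_pred + sigma_eps2)
        m = m_pred + gain * (y_t - m_pred)
        p = (1.0 - gain) * p_pred
        m_pred, p_pred = m, p + sigma_eta2
    return m

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=50)) + rng.normal(size=50)  # simulated data
batch = batch_states(y, sigma_eps2=1.0, sigma_eta2=1.0, kappa2=10.0)
last = filtered_last_mean(y, sigma_eps2=1.0, sigma_eta2=1.0, kappa2=10.0)
print(np.isclose(batch[-1], last))
```

Only the last component matches the filter output. The earlier components of the batch solution use measurements that arrived later, so they correspond to smoothed rather than filtered estimates.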

Thank you for your answer @Royi! Is it possible to compare the Kalman Filter with the full likelihood?
– stats_noob, Commented Jul 13 at 17:49
@stats_noob, given that the assumptions of the Kalman Filter hold, they have the same solution.
– Royi, Commented Jul 14 at 4:50
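
The agreement can be checked numerically for the local level model from the question. The sketch below is not from the original thread: the function names are hypothetical, and it assumes the $N(0, \kappa^2)$ prior on $\mu_1$. It evaluates the likelihood once as a single multivariate normal density of all observations, and once by accumulating one-step prediction errors in a Kalman recursion:

```python
import numpy as np

def loglik_direct(y, sigma_eps2, sigma_eta2, kappa2):
    """Log-likelihood from the joint Gaussian density of y_1..y_T."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    idx = np.arange(T)  # 0-based, so min(idx_s, idx_t) = min(s, t) - 1
    cov_mu = kappa2 + sigma_eta2 * np.minimum.outer(idx, idx)
    Sigma = cov_mu + sigma_eps2 * np.eye(T)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)

def loglik_kalman(y, sigma_eps2, sigma_eta2, kappa2):
    """Log-likelihood from the prediction error decomposition."""
    m_pred, p_pred = 0.0, kappa2
    ll = 0.0
    for y_t in y:
        v = y_t - m_pred          # one-step prediction error
        s = p_pred + sigma_eps2   # its variance
        ll += -0.5 * (np.log(2.0 * np.pi * s) + v * v / s)
        gain = p_pred / s
        m = m_pred + gain * v
        p = (1.0 - gain) * p_pred
        m_pred, p_pred = m, p + sigma_eta2
    return ll

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=30)) + rng.normal(size=30)  # simulated data
params = dict(sigma_eps2=0.8, sigma_eta2=1.3, kappa2=5.0)
print(loglik_direct(y, **params), loglik_kalman(y, **params))
```

The two values should agree up to floating-point error, so handing either function to a numerical optimizer should give the same parameter estimates. The practical difference is cost: the direct version factorizes a dense $T \times T$ matrix at every evaluation, while the recursion is linear in $T$.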
