Differences in the Formulations of Kalman Filter and Kalman-Bucy Filter

The discrete-time filtering formulation differs from its continuous-time counterpart. The difference is revealed by looking at the observations.

Published

08 May 2025

Discrete-time filtering: Kalman filter

Consider a discrete-time dynamical system of the following form,

$\mathbf{x}_{n+1} = \mathbf{M} \mathbf{x}_n + \bm{\mathsf{\Sigma}}_{\mathbf{x}} \mathbf{\epsilon}_{\mathbf{x},n} \tag{\small 1a}$ $\mathbf{y}_{n+1} = \mathbf{H} \mathbf{x}_{n+1} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathbf{\epsilon}_{\mathbf{y},n+1} \tag{\small 1b}$ where $\mathbf{x}_n$ and $\mathbf{y}_n$ are unobserved states and observations at discrete time $t_n$ , respectively. $\mathbf{\epsilon}_{\mathbf{x}, n}$ and $\mathbf{\epsilon}_{\mathbf{y}, n}$ are standard Gaussian random noises. $\mathbf{M}$ , $\mathbf{H}$ , $\bm{\mathsf{\Sigma}}_{\mathbf{x}}$ , and $\bm{\mathsf{\Sigma}}_{\mathbf{y}}$ are matrices (assumed to be constant-in-time here for brevity). (1a) is often called the “model” and $\mathbf{Q} = \bm{\mathsf{\Sigma}}_{\mathbf{x}} \bm{\mathsf{\Sigma}}_{\mathbf{x}}^\top$ is the corresponding model error covariance. Similarly, $\mathbf{R} = \bm{\mathsf{\Sigma}}_{\mathbf{y}} \bm{\mathsf{\Sigma}}_{\mathbf{y}}^\top$ is the observation error covariance.

The discrete-time filtering problem aims for the posterior distribution $\mathbb{P}[\mathbf{x}_n | \mathbf{y}_n]$ . The Kalman filter gives the following posterior (analysis) state estimate and covariance for (1), $\mathbf{x}^a_n = \mathbf{x}^f_n + \mathbf{K}_n (\mathbf{y}_n - \mathbf{H} \mathbf{x}^f_n) \tag{\small 2a}$ $\mathbf{P}^a_{n} = (\mathbf{I} - \mathbf{K}_n \mathbf{H}) \mathbf{P}^f_{n} \tag{\small 2b}$ where $\mathbf{P}^f_{n} := \mathbb{E}[(\mathbf{x}^f_{n} - \mathbf{\mu}_{\mathbf{x},n})(\mathbf{x}^f_{n} - \mathbf{\mu}_{\mathbf{x},n})^\top],$ $\mathbf{P}^a_{n} := \mathbb{E}[(\mathbf{x}^a_n - \mathbf{\mu}_{\mathbf{x},n})(\mathbf{x}^a_n - \mathbf{\mu}_{\mathbf{x},n})^\top],$ $\mathbf{K}_n = \mathbf{P}^f_{n} \mathbf{H}^\top \left( \mathbf{H} \mathbf{P}^f_n \mathbf{H}^\top + \mathbf{R} \right).$ The prior (forecast) covariance matrix is often obtained by $\mathbf{P}^f_{n+1} = \mathbf{M} \mathbf{P}_n^a \mathbf{M}^\top + \mathbf{Q}.$

Continuous-time filtering: Kalman-Bucy filter

Consider a continuous stochastic differential equation of the following form, $\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \mathbf{M} \mathbf{x} + \bm{\mathsf{\Sigma}}_{\mathbf{x}} \dot{\mathbf{W}}_\mathbf{x} \tag{\small 3a}$ $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t} = \mathbf{H} \mathbf{x} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \dot{\mathbf{W}}_\mathbf{y} \tag{\small 3b}$ where $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are unobserved states and observations at time $t$ , respectively. $\dot{\mathbf{W}}_\mathbf{x}$ and $\dot{\mathbf{W}}_\mathbf{y}$ are Gaussian white noises. $\mathbf{M}$ , $\mathbf{H}$ , $\bm{\mathsf{\Sigma}}_{\mathbf{x}}$ , and $\bm{\mathsf{\Sigma}}_{\mathbf{y}}$ are constant matrices. $\mathbf{Q} = \bm{\mathsf{\Sigma}}_{\mathbf{x}} \bm{\mathsf{\Sigma}}_{\mathbf{x}}^\top$ is the model error covariance, and $\mathbf{R} = \bm{\mathsf{\Sigma}}_{\mathbf{y}} \bm{\mathsf{\Sigma}}_{\mathbf{y}}^\top$ is the observation error covariance.

The continuous-time filtering problem aims for the posterior $\mathbb{P}[\mathbf{x}(t) | \mathbf{y}(s), 0 \leq s \leq t]$ . The posterior mean and covariance for (3) are given by the Kalman-Bucy filter (Bishop & Del Moral, 2023) as $\frac{\mathrm{d}\mathbf{\chi}}{\mathrm{d}t} = \mathbf{M} \mathbf{\chi} + \mathbf{K}(t) \left( \frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t} - \mathbf{H} \mathbf{\chi} \right) \tag{\small 4a}$ $\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}t} = \mathbf{M} \mathbf{P} + \mathbf{P} \mathbf{M}^\top - \mathbf{P} \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H} \mathbf{P} + \mathbf{Q} \tag{\small 4b}$ where $\mathbf{x}(t) | \mathbf{y}(s), 0 \leq s \leq t \sim \mathcal{N}(\mathbf{\chi}(t), \mathbf{P}(t)),$ $\mathbf{K}(t) = \mathbf{P}(t) \mathbf{H}^\top \mathbf{R}^{-1}.$ (4b) is known as the Riccati equation.

Kalman Filter and Kalman-Bucy filter are not translatable

The Kalman filter system (1) and Kalman-Bucy filter system (3), although similar in form, are not translatable with each other. The key difference that prohibits the translation is not in the model equations (1a), (3a), but in the observation equations (1b), (3b)¹. We will try the translation while focusing on the observation equations to reveal the differences.

When taking $\Delta t \to 0$ in the Kalman filter, one will quickly realize that the observation noise $\mathbf{\epsilon}_{\mathbf{y}}(t)$ does not have a continuous path, while the observation noise $\dot{\mathbf{W}}_{\mathbf{y}}$ in the Kalman-Bucy filter gives a continuous path for $\mathbf{y}(t)$ . This already indicates the differences between the two formulations. To look into the details, we write down the time increments of observations based on the observation equation (1b) in the Kalman filter: $\mathbf{y}_{n+1} - \mathbf{y}_{n} = \mathbf{H}(\mathbf{M} - \mathbf{I}) \mathbf{x}_n + \mathbf{H} \bm{\mathsf{\Sigma}}_{\mathbf{x}} \mathbf{\epsilon}_{\mathbf{x}, n} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} (\mathbf{\epsilon}_{\mathbf{y}, n+1} - \mathbf{\epsilon}_{\mathbf{y}, n}) \tag{\small 5}.$

Compared to the discretized observation equation (3b) in the Kalman-Bucy filter: $\mathbf{y}_{n+1} - \mathbf{y}_{n} = \mathbf{H} \mathbf{x}_n \Delta t + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathbf{\epsilon}_{\mathbf{y},n} \sqrt{\Delta t} \tag{\small 6}$

(5) is different in the following ways:

The observation noise does not go to zero as $\Delta t \to 0$ , unlike $\mathrm{d} \mathbf{W}_{\mathbf{y}}$ in the Kalman-Bucy filter.
The influences of the observation noises on the dynamics of $\mathbf{y}$ do not accumulate over time.

The first point results in the discontinuity of $\mathbf{y}(t)$ . The second point says that previous observation noises are abandoned in time marching, which is inherited from the assumption that the observation noises of the Kalman filter are independent in time, and the observation procedure does not interfere with the underlying dynamics. In contrast, the observation noises in the Kalman-Bucy filter are involved in the dynamics of $\mathbf{y}(t)$ ². Therefore, the Kalman-Bucy filter is not the continuous-time version of the Kalman filter by taking the limit of $\Delta t \to 0$ . The Kalman filter is not the discrete-time version of the Kalman-Bucy filter by simply discretizing the continuous equations.

Historically, the observations $\mathbf{y}$ in the Kalman filter are replaced by the time derivative $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t}$ (or increments $\mathrm{d}\mathbf{y}$ ) in the Kalman-Bucy filter (Kalman & Bucy, 1961; Jazwinski, 1970; Bergemann & Reich, 2012). Although this replacement leads to a form of the Kalman-Bucy filter similar to the Kalman filter, it introduces an inconsistency in the observation noises — consider $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t}$ as the “observations,” the observation noise $\dot{\mathbf{W}}_{\mathbf{y}}$ has an infinite magnitude, or consider $\mathrm{d}\mathbf{y}$ as the “observations,” the observation noise $\mathrm{d}\mathbf{W}_{\mathbf{y}}$ has a zero magnitude, both of which are unrealistic for the Kalman filter when taking the limit $\Delta t \to 0$ . The Kalman-Bucy filter only looks like the Kalman filter in the integral sense: taking a $\Delta t$ that is not too small, then integrating (3b) over $t_0, t_0 + \Delta t$ , we have: $\mathbf{y}(t_0 + \Delta t) - \mathbf{y}(t_0) = \int_{t_0}^{t_0 + \Delta t} \mathbf{H} \mathbf{x}(s) \, ds + \int_{t_0}^{t_0 + \Delta t} \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathrm{d} \mathbf{W}_\mathbf{y}$ The time increments $\mathbf{y}(t_0 + \Delta t) - \mathbf{y}(t_0)$ are considered as the “observation” with a noise $\int_{t_0}^{t_0 + \Delta t} \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathrm{d} \mathbf{W}_\mathbf{y} \sim \mathcal{N}(0, \bm{\mathsf{\Sigma}}_{\mathbf{y}} \Delta t)$ . By solely looking at $\mathbf{y}$ , the $\mathbf{y}$ in the Kalman-Bucy filter is very different from the observations $\mathbf{y}$ defined in the Kalman filter. This should be noted with caution to avoid potential confusion.

(1a) is the discrete formulation of an SDE, which is usually translatable to a time discretization of an SDE.

↩
"Measurement error" is a better term for the Kalman-filter-type observation noise, while the Kalman-Bucy-filter-type observation noise is closer to "observation model error" that characterizes the unresolved dynamics of observations.

↩

Bishop, A. N., & Del Moral, P. (2023). On the mathematical theory of ensemble (linear-Gaussian) Kalman–Bucy filtering. Mathematics of Control, Signals, and Systems, 35(4), 835–903. https://doi.org/10.1007/s00498-023-00357-2
Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction theory. Journal of Fluids Engineering, Transactions of the ASME, 83(1), 95–108. https://doi.org/10.1115/1.3658902
Jazwinski, A. (1970). Stochastic Processes And Filtering Theory. 64. https://doi.org/10.1016/S0076-5392(09)60368-4
Bergemann, K., & Reich, S. (2012). An ensemble Kalman-Bucy filter for continuous data assimilation. Meteorologische Zeitschrift, 21(3), 213–219. https://doi.org/10.1127/0941-2948/2012/0307