Differences in the Formulations of Kalman Filter and Kalman-Bucy Filter

The discrete-time filtering formulation differs from its continuous-time counterpart. The difference is revealed by looking at the observations.

Discrete-time filtering: Kalman filter

Consider a discrete-time dynamical system of the following form,

$$
\mathbf{x}_{n+1} = \mathbf{M} \mathbf{x}_n + \bm{\mathsf{\Sigma}}_{\mathbf{x}} \mathbf{\epsilon}_{\mathbf{x},n} \tag{\small 1a}
$$

$$
\mathbf{y}_{n+1} = \mathbf{H} \mathbf{x}_{n+1} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathbf{\epsilon}_{\mathbf{y},n+1} \tag{\small 1b}
$$

where $\mathbf{x}_n$ and $\mathbf{y}_n$ are the unobserved states and the observations at discrete time $t_n$, respectively. $\mathbf{\epsilon}_{\mathbf{x}, n}$ and $\mathbf{\epsilon}_{\mathbf{y}, n}$ are standard Gaussian random noises, independent in time. $\mathbf{M}$, $\mathbf{H}$, $\bm{\mathsf{\Sigma}}_{\mathbf{x}}$, and $\bm{\mathsf{\Sigma}}_{\mathbf{y}}$ are matrices (assumed to be constant in time here for brevity). (1a) is often called the "model," and $\mathbf{Q} = \bm{\mathsf{\Sigma}}_{\mathbf{x}} \bm{\mathsf{\Sigma}}_{\mathbf{x}}^\top$ is the corresponding model error covariance. Similarly, $\mathbf{R} = \bm{\mathsf{\Sigma}}_{\mathbf{y}} \bm{\mathsf{\Sigma}}_{\mathbf{y}}^\top$ is the observation error covariance.
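To make (1) concrete, here is a minimal Python sketch that simulates its scalar (one-dimensional) version, so that $\mathbf{M}$, $\mathbf{H}$, $\bm{\mathsf{\Sigma}}_{\mathbf{x}}$, and $\bm{\mathsf{\Sigma}}_{\mathbf{y}}$ are plain numbers. The helper name `simulate_discrete`, the parameter values, and the choice to also observe the initial state (giving a $y_0$) are illustrative assumptions, not part of the formulation above.

```python
import random


def simulate_discrete(m, h, sigma_x, sigma_y, x0, n_steps, seed=0):
    """Simulate the scalar version of (1a)-(1b) (hypothetical helper).

    Returns lists xs and ys of length n_steps + 1; ys[0] is an observation
    of the initial state x0 (an assumption made for convenience).
    """
    rng = random.Random(seed)
    xs = [x0]
    ys = [h * x0 + sigma_y * rng.gauss(0.0, 1.0)]
    for _ in range(n_steps):
        # (1a): x_{n+1} = M x_n + Sigma_x eps_{x,n}
        x_next = m * xs[-1] + sigma_x * rng.gauss(0.0, 1.0)
        # (1b): y_{n+1} = H x_{n+1} + Sigma_y eps_{y,n+1}, a fresh draw every step
        y_next = h * x_next + sigma_y * rng.gauss(0.0, 1.0)
        xs.append(x_next)
        ys.append(y_next)
    return xs, ys


xs, ys = simulate_discrete(m=0.9, h=1.0, sigma_x=0.1, sigma_y=0.5, x0=1.0, n_steps=50)
print(len(xs), len(ys))  # 51 51
```

Setting both noise amplitudes to zero reduces the sketch to the deterministic recursion $x_n = M^n x_0$, $y_n = H x_n$, which is a quick sanity check.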

The discrete-time filtering problem aims for the posterior distribution $\mathbb{P}[\mathbf{x}_n \mid \mathbf{y}_1, \dots, \mathbf{y}_n]$. The Kalman filter gives the following posterior (analysis) state estimate and covariance for (1):

$$
\mathbf{x}^a_n = \mathbf{x}^f_n + \mathbf{K}_n (\mathbf{y}_n - \mathbf{H} \mathbf{x}^f_n) \tag{\small 2a}
$$

$$
\mathbf{P}^a_{n} = (\mathbf{I} - \mathbf{K}_n \mathbf{H}) \mathbf{P}^f_{n} \tag{\small 2b}
$$

where

$$
\mathbf{P}^f_{n} := \mathbb{E}[(\mathbf{x}^f_{n} - \mathbf{\mu}_{\mathbf{x},n})(\mathbf{x}^f_{n} - \mathbf{\mu}_{\mathbf{x},n})^\top],
$$

$$
\mathbf{P}^a_{n} := \mathbb{E}[(\mathbf{x}^a_n - \mathbf{\mu}_{\mathbf{x},n})(\mathbf{x}^a_n - \mathbf{\mu}_{\mathbf{x},n})^\top],
$$

$$
\mathbf{K}_n = \mathbf{P}^f_{n} \mathbf{H}^\top \left( \mathbf{H} \mathbf{P}^f_n \mathbf{H}^\top + \mathbf{R} \right)^{-1}.
$$

The prior (forecast) mean is propagated by the model as $\mathbf{x}^f_{n+1} = \mathbf{M} \mathbf{x}^a_n$, and the prior (forecast) covariance matrix is often obtained by

$$
\mathbf{P}^f_{n+1} = \mathbf{M} \mathbf{P}_n^a \mathbf{M}^\top + \mathbf{Q}.
$$
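A minimal scalar sketch of the forecast/analysis cycle, written in Python, may help fix the order of operations. In one dimension the matrix inverse in $\mathbf{K}_n$ becomes a division. The function names and the initial analysis $(x^a_0, P^a_0)$ are assumptions for illustration only.

```python
def kf_analysis(x_f, p_f, y, h, r):
    """Analysis step (2a)-(2b), scalar case."""
    k = p_f * h / (h * p_f * h + r)   # K_n = P^f H^T (H P^f H^T + R)^{-1}
    x_a = x_f + k * (y - h * x_f)     # (2a)
    p_a = (1.0 - k * h) * p_f         # (2b)
    return x_a, p_a


def kf_forecast(x_a, p_a, m, q):
    """Forecast step: x^f_{n+1} = M x^a_n and P^f_{n+1} = M P^a_n M^T + Q."""
    return m * x_a, m * p_a * m + q


def kalman_filter(ys, m, h, q, r, x_a0, p_a0):
    """Alternate forecast and analysis; ys[j] is the observation y_{j+1}.

    (x_a0, p_a0) is an assumed initial analysis at t_0.
    """
    x_a, p_a = x_a0, p_a0
    means, variances = [], []
    for y in ys:
        x_f, p_f = kf_forecast(x_a, p_a, m, q)
        x_a, p_a = kf_analysis(x_f, p_f, y, h, r)
        means.append(x_a)
        variances.append(p_a)
    return means, variances


# Static state (M = 1, Q = 0), prior variance 1, ten observations with R = 1:
# after n observations the analysis variance is 1 / (n + 1).
means, variances = kalman_filter([1.0] * 10, m=1.0, h=1.0, q=0.0, r=1.0, x_a0=0.0, p_a0=1.0)
print(round(means[-1], 6), round(variances[-1], 6))  # 0.909091 0.090909
```

In this static example the analysis mean $n/(n+1)$ is just the average of the prior mean $0$ and the $n$ observations, which is a convenient way to check the recursion by hand.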

Continuous-time filtering: Kalman-Bucy filter

Consider a continuous-time system of stochastic differential equations of the following form:

$$
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \mathbf{M} \mathbf{x} + \bm{\mathsf{\Sigma}}_{\mathbf{x}} \dot{\mathbf{W}}_\mathbf{x} \tag{\small 3a}
$$

$$
\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t} = \mathbf{H} \mathbf{x} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \dot{\mathbf{W}}_\mathbf{y} \tag{\small 3b}
$$

where $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are the unobserved states and the observations at time $t$, respectively. $\dot{\mathbf{W}}_\mathbf{x}$ and $\dot{\mathbf{W}}_\mathbf{y}$ are Gaussian white noises, i.e., formal time derivatives of standard Wiener processes $\mathbf{W}_\mathbf{x}$ and $\mathbf{W}_\mathbf{y}$. $\mathbf{M}$, $\mathbf{H}$, $\bm{\mathsf{\Sigma}}_{\mathbf{x}}$, and $\bm{\mathsf{\Sigma}}_{\mathbf{y}}$ are constant matrices. $\mathbf{Q} = \bm{\mathsf{\Sigma}}_{\mathbf{x}} \bm{\mathsf{\Sigma}}_{\mathbf{x}}^\top$ is the model error covariance, and $\mathbf{R} = \bm{\mathsf{\Sigma}}_{\mathbf{y}} \bm{\mathsf{\Sigma}}_{\mathbf{y}}^\top$ is the observation error covariance.

The continuous-time filtering problem aims for the posterior $\mathbb{P}[\mathbf{x}(t) \mid \mathbf{y}(s), 0 \leq s \leq t]$. The posterior mean and covariance for (3) are given by the Kalman-Bucy filter (Bishop & Del Moral, 2023) as

$$
\frac{\mathrm{d}\mathbf{\chi}}{\mathrm{d}t} = \mathbf{M} \mathbf{\chi} + \mathbf{K}(t) \left( \frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t} - \mathbf{H} \mathbf{\chi} \right) \tag{\small 4a}
$$

$$
\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}t} = \mathbf{M} \mathbf{P} + \mathbf{P} \mathbf{M}^\top - \mathbf{P} \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H} \mathbf{P} + \mathbf{Q} \tag{\small 4b}
$$

where

$$
\mathbf{x}(t) \mid \mathbf{y}(s), 0 \leq s \leq t \sim \mathcal{N}(\mathbf{\chi}(t), \mathbf{P}(t)),
$$

$$
\mathbf{K}(t) = \mathbf{P}(t) \mathbf{H}^\top \mathbf{R}^{-1}.
$$

(4b) is known as the Riccati equation.
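As a rough illustration, the scalar version of (4a)-(4b) can be integrated with forward Euler, replacing $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t}\,\mathrm{d}t$ by observed increments of $y$. The sketch below assumes that time-stepping scheme, hypothetical helper names, and arbitrary parameter values. It also checks that $P(t)$ approaches the positive root of the scalar algebraic Riccati equation $2MP - P^2H^2/R + Q = 0$.

```python
import math


def riccati_rhs(p, m, h, q, r):
    """Scalar right-hand side of the Riccati equation (4b)."""
    return m * p + p * m - p * h * (1.0 / r) * h * p + q


def kalman_bucy(dys, dt, m, h, q, r, chi0, p0):
    """Forward-Euler integration of (4a)-(4b) (hypothetical helper).

    dys[k] is the observation increment y(t_{k+1}) - y(t_k), which stands in
    for (dy/dt) * dt in (4a).
    """
    chi, p = chi0, p0
    chis, ps = [chi], [p]
    for dy in dys:
        k_gain = p * h / r                                        # K(t) = P H^T R^{-1}
        chi = chi + m * chi * dt + k_gain * (dy - h * chi * dt)   # (4a) times dt
        p = p + riccati_rhs(p, m, h, q, r) * dt                   # (4b) times dt
        chis.append(chi)
        ps.append(p)
    return chis, ps


def riccati_steady_state(m, h, q, r):
    """Positive root of 2 M P - P^2 H^2 / R + Q = 0 (scalar case)."""
    return r * (m + math.sqrt(m * m + q * h * h / r)) / (h * h)


# P(t) does not depend on the data, so zero increments suffice to watch it converge.
dt, n_steps = 1e-3, 20_000
chis, ps = kalman_bucy([0.0] * n_steps, dt, m=-0.5, h=1.0, q=0.2, r=0.1, chi0=1.0, p0=1.0)
print(round(ps[-1], 6), round(riccati_steady_state(-0.5, 1.0, 0.2, 0.1), 6))  # 0.1 0.1
```

Because the gain $K(t)$ depends only on $P(t)$, the covariance part can be precomputed offline, just as in the discrete Kalman filter.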

Kalman filter and Kalman-Bucy filter are not translatable

Although the Kalman filter system (1) and the Kalman-Bucy filter system (3) look similar in form, neither can be translated into the other. The key obstacle to the translation is not the model equations (1a) and (3a) but the observation equations (1b) and (3b).¹ Below we attempt the translation, focusing on the observation equations, to expose the differences.

When taking $\Delta t = t_{n+1} - t_n \to 0$ in the Kalman filter, one quickly realizes that the observation noise $\mathbf{\epsilon}_{\mathbf{y}}(t)$, drawn independently at every time, does not have a continuous path, while the observation noise $\dot{\mathbf{W}}_{\mathbf{y}}$ in the Kalman-Bucy filter yields a continuous path for $\mathbf{y}(t)$. This already indicates that the two formulations differ. To look into the details, we write down the time increments of the observations using the observation equation (1b) of the Kalman filter:

$$
\mathbf{y}_{n+1} - \mathbf{y}_{n} = \mathbf{H}(\mathbf{M} - \mathbf{I}) \mathbf{x}_n + \mathbf{H} \bm{\mathsf{\Sigma}}_{\mathbf{x}} \mathbf{\epsilon}_{\mathbf{x}, n} + \bm{\mathsf{\Sigma}}_{\mathbf{y}} (\mathbf{\epsilon}_{\mathbf{y}, n+1} - \mathbf{\epsilon}_{\mathbf{y}, n}). \tag{\small 5}
$$
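The algebra behind (5) can be checked numerically. The sketch below, in Python, simulates the scalar version of (1) with seeded noise and evaluates both sides of (5). The helper name and parameter values are illustrative assumptions, and the residual it returns should be at the level of floating-point rounding.

```python
import random


def max_residual_of_eq5(m, h, sigma_x, sigma_y, x0, n_steps, seed=1):
    """Simulate the scalar system (1) and return max |lhs - rhs| of (5)."""
    rng = random.Random(seed)
    eps_x = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    eps_y = [rng.gauss(0.0, 1.0) for _ in range(n_steps + 1)]
    xs = [x0]
    for n in range(n_steps):
        xs.append(m * xs[n] + sigma_x * eps_x[n])                       # (1a)
    ys = [h * xs[n] + sigma_y * eps_y[n] for n in range(n_steps + 1)]  # (1b)
    worst = 0.0
    for n in range(n_steps):
        lhs = ys[n + 1] - ys[n]
        rhs = (h * (m - 1.0) * xs[n] + h * sigma_x * eps_x[n]
               + sigma_y * (eps_y[n + 1] - eps_y[n]))                   # (5)
        worst = max(worst, abs(lhs - rhs))
    return worst


print(max_residual_of_eq5(m=0.95, h=2.0, sigma_x=0.3, sigma_y=0.5, x0=1.0, n_steps=200))
```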

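Compare (5) with the observation equation (3b) of the Kalman-Bucy filter discretized by the Euler-Maruyama scheme, where $\mathbf{W}_{\mathbf{y}}(t_{n+1}) - \mathbf{W}_{\mathbf{y}}(t_n) = \sqrt{\Delta t}\, \mathbf{\epsilon}_{\mathbf{y},n}$ with standard Gaussian $\mathbf{\epsilon}_{\mathbf{y},n}$:

$$
\mathbf{y}_{n+1} - \mathbf{y}_{n} = \mathbf{H} \mathbf{x}_n \Delta t + \bm{\mathsf{\Sigma}}_{\mathbf{y}} \mathbf{\epsilon}_{\mathbf{y},n} \sqrt{\Delta t} \tag{\small 6}
$$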

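Compared with (6), (5) differs in two ways:

  1. The observation-noise term $\bm{\mathsf{\Sigma}}_{\mathbf{y}} (\mathbf{\epsilon}_{\mathbf{y}, n+1} - \mathbf{\epsilon}_{\mathbf{y}, n})$ does not go to zero as $\Delta t \to 0$, unlike the increment $\bm{\mathsf{\Sigma}}_{\mathbf{y}}\, \mathrm{d} \mathbf{W}_{\mathbf{y}}$ in the Kalman-Bucy filter, whose standard deviation scales as $\sqrt{\Delta t}$.
  2. The influences of the observation noises on the dynamics of $\mathbf{y}$ do not accumulate over time.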
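The first point can be seen in a small Python experiment: on a fixed window $[0, 1]$, refine the step and record the largest single jump of $y$ under each observation model. This is a sketch under stated assumptions: scalar case, a state advanced by an Euler-Maruyama step of (3a) (read as a model of the form (1a)), $y(0) = 0$ for the Kalman-Bucy path, and arbitrary parameter values and helper names.

```python
import math
import random


def observation_paths(m, h, sigma_x, sigma_y, x0, t_end, n_steps, seed=0):
    """Return (y_kf, y_kb) along one scalar state path on [0, t_end].

    y_kf uses the Kalman-filter observation (1b): y_n = H x_n + sigma_y eps_n.
    y_kb uses the Kalman-Bucy increment (6), starting from y(0) = 0.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x = x0
    y_kf = [h * x + sigma_y * rng.gauss(0.0, 1.0)]
    y_kb = [0.0]
    for _ in range(n_steps):
        # (6): increment driven by the current state x_n
        y_kb.append(y_kb[-1] + h * x * dt + sigma_y * sqrt_dt * rng.gauss(0.0, 1.0))
        # Euler-Maruyama step of (3a) for the state
        x = x + m * x * dt + sigma_x * sqrt_dt * rng.gauss(0.0, 1.0)
        # (1b): independent noise at every observation time
        y_kf.append(h * x + sigma_y * rng.gauss(0.0, 1.0))
    return y_kf, y_kb


def largest_jump(ys):
    """Largest |y_{n+1} - y_n| along a path."""
    return max(abs(b - a) for a, b in zip(ys, ys[1:]))


# Same window [0, 1], finer and finer steps.
for n_steps in (100, 10_000):
    y_kf, y_kb = observation_paths(-0.5, 1.0, 0.3, 0.5, 1.0, 1.0, n_steps)
    print(n_steps, round(largest_jump(y_kf), 3), round(largest_jump(y_kb), 3))
```

The Kalman-filter path keeps jumps of order $\Sigma_y$ however small the step, while the largest Kalman-Bucy jump shrinks as the step is refined.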

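The first point results in the discontinuity of $\mathbf{y}(t)$. The second point follows from the telescoping of the noise terms in (5): summing (5) from $n = 0$ to $N - 1$, the observation-noise contributions reduce to $\bm{\mathsf{\Sigma}}_{\mathbf{y}} (\mathbf{\epsilon}_{\mathbf{y}, N} - \mathbf{\epsilon}_{\mathbf{y}, 0})$, whose covariance is $2\mathbf{R}$ for every $N$, whereas summing (6) gives $\bm{\mathsf{\Sigma}}_{\mathbf{y}} \sqrt{\Delta t} \sum_{n=0}^{N-1} \mathbf{\epsilon}_{\mathbf{y}, n}$, a random walk with covariance $N \Delta t\, \mathbf{R}$. In other words, previous observation noises are discarded as time marches on. This is inherited from the assumptions that the observation noises of the Kalman filter are independent in time and that the observation procedure does not interfere with the underlying dynamics. In contrast, the observation noises in the Kalman-Bucy filter are part of the dynamics of $\mathbf{y}(t)$.² Therefore, the Kalman-Bucy filter is not obtained from the Kalman filter by taking the limit $\Delta t \to 0$, and the Kalman filter is not obtained by simply discretizing the continuous-time equations of the Kalman-Bucy filter.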
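The telescoping argument can be checked with a small Monte Carlo experiment in Python. This sketch assumes the scalar case and isolates the observation-noise part of $y_N - y_0$ under (5) and under (6); the helper name, sample size, and parameter values are illustrative choices. The Kalman-filter variance should stay near $2\Sigma_y^2$ for every $N$, while the Kalman-Bucy variance should grow like $\Sigma_y^2 N \Delta t$.

```python
import math
import random


def accumulated_noise_variances(sigma_y, dt, n_steps, n_samples=5000, seed=0):
    """Monte Carlo variance of the observation-noise part of y_N - y_0.

    Kalman filter, from (5): sum of sigma_y * (eps_{n+1} - eps_n), which
    telescopes to sigma_y * (eps_N - eps_0).
    Kalman-Bucy, from (6): sum of sigma_y * sqrt(dt) * eps_n, a random walk.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    kf_sq, kb_sq = 0.0, 0.0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_steps + 1)]
        kf = sum(sigma_y * (eps[n + 1] - eps[n]) for n in range(n_steps))
        kb = sum(sigma_y * sqrt_dt * eps[n] for n in range(n_steps))
        kf_sq += kf * kf
        kb_sq += kb * kb
    # Both sums have zero mean, so the mean square estimates the variance.
    return kf_sq / n_samples, kb_sq / n_samples


sigma_y, dt = 0.5, 0.1
for n_steps in (10, 50):
    kf_var, kb_var = accumulated_noise_variances(sigma_y, dt, n_steps)
    # Theory: kf_var close to 2 * sigma_y**2 for every N; kb_var close to sigma_y**2 * N * dt.
    print(n_steps, round(kf_var, 3), round(kb_var, 3))
```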

Historically, the observations $\mathbf{y}$ in the Kalman filter are replaced by the time derivative $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t}$ (or the increments $\mathrm{d}\mathbf{y}$) in the Kalman-Bucy filter (Kalman & Bucy, 1961; Jazwinski, 1970; Bergemann & Reich, 2012). Although this replacement gives the Kalman-Bucy filter a form similar to the Kalman filter, it introduces an inconsistency in the observation noises. If $\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}t}$ is regarded as the "observation," the observation noise $\bm{\mathsf{\Sigma}}_{\mathbf{y}} \dot{\mathbf{W}}_{\mathbf{y}}$ has infinite magnitude (over a step $\Delta t$ its covariance is $\mathbf{R}/\Delta t$, which diverges as $\Delta t \to 0$); if $\mathrm{d}\mathbf{y}$ is regarded as the "observation," the observation noise $\bm{\mathsf{\Sigma}}_{\mathbf{y}}\, \mathrm{d}\mathbf{W}_{\mathbf{y}}$ has zero magnitude (its covariance $\mathbf{R} \Delta t$ vanishes). Neither is a realistic limit of the Kalman filter's observation noise as $\Delta t \to 0$.

The Kalman-Bucy filter resembles the Kalman filter only in the integral sense. Taking a $\Delta t$ that is not too small and integrating (3b) over $[t_0, t_0 + \Delta t]$, we have

$$
\mathbf{y}(t_0 + \Delta t) - \mathbf{y}(t_0) = \int_{t_0}^{t_0 + \Delta t} \mathbf{H} \mathbf{x}(s) \, \mathrm{d}s + \int_{t_0}^{t_0 + \Delta t} \bm{\mathsf{\Sigma}}_{\mathbf{y}} \, \mathrm{d} \mathbf{W}_\mathbf{y}.
$$

The increment $\mathbf{y}(t_0 + \Delta t) - \mathbf{y}(t_0)$ is then treated as the "observation," with noise $\int_{t_0}^{t_0 + \Delta t} \bm{\mathsf{\Sigma}}_{\mathbf{y}} \, \mathrm{d} \mathbf{W}_\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{R} \Delta t)$, where $\mathbf{R} = \bm{\mathsf{\Sigma}}_{\mathbf{y}} \bm{\mathsf{\Sigma}}_{\mathbf{y}}^\top$. In short, the quantity $\mathbf{y}$ in the Kalman-Bucy filter is very different from the observations $\mathbf{y}$ in the Kalman filter, even though the same symbol is used for both; this should be kept in mind to avoid confusion.
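The integral-sense correspondence can be made concrete with a scalar Python sketch. It applies the discrete analysis (2a)-(2b) to the increment $y(t_0 + \Delta t) - y(t_0)$ and compares the result with one forward-Euler step of the observation terms of (4a)-(4b). Several simplifications are assumed here: $M = 0$ and $Q = 0$ to isolate the observation update, $x$ held roughly constant over the window (so the increment has observation operator $H\Delta t$ and noise variance $R\Delta t$), and an arbitrary increment $0.05\sqrt{\Delta t}$ and arbitrary parameter values. The discrete gain works out to $PH/(H^2 P \Delta t + R)$, which tends to the Kalman-Bucy gain $PHR^{-1}$ as $\Delta t \to 0$.

```python
def increment_analysis(chi, p, dy, h, r, dt):
    """Discrete analysis (2a)-(2b) with the increment dy as the 'observation'.

    Scalar case; observation operator H*dt and noise variance R*dt come from
    the integrated (3b), assuming x is roughly constant over the window.
    """
    h_d, r_d = h * dt, r * dt
    k_d = p * h_d / (h_d * p * h_d + r_d)   # equals P H / (H^2 P dt + R)
    chi_a = chi + k_d * (dy - h_d * chi)     # (2a)
    p_a = (1.0 - k_d * h_d) * p              # (2b)
    return chi_a, p_a


def kalman_bucy_step(chi, p, dy, h, r, dt):
    """One forward-Euler step of the observation terms of (4a)-(4b) (M = 0, Q = 0)."""
    k = p * h / r                            # K(t) = P H^T R^{-1}
    chi_new = chi + k * (dy - h * chi * dt)
    p_new = p - p * h * (1.0 / r) * h * p * dt
    return chi_new, p_new


def discrepancy(dt, chi=0.3, p=1.0, h=1.0, r=0.1):
    """Gap between the two updates for an illustrative increment dy = 0.05 sqrt(dt)."""
    dy = 0.05 * dt ** 0.5
    kf = increment_analysis(chi, p, dy, h, r, dt)
    kb = kalman_bucy_step(chi, p, dy, h, r, dt)
    return abs(kf[0] - kb[0]), abs(kf[1] - kb[1])


for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    print(dt, discrepancy(dt))
```

The two updates agree only to leading order in $\Delta t$. This matches the point above: the correspondence holds for increments of $\mathbf{y}$ over a window, not for $\mathbf{y}$ itself.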

¹ (1a) is a discrete-time formulation that can usually be translated into a time discretization of an SDE such as (3a).

² "Measurement error" is a better term for the Kalman-filter-type observation noise, while the Kalman-Bucy-filter-type observation noise is closer to an "observation model error" that characterizes the unresolved dynamics of the observations.

References

  1. Bishop, A. N., & Del Moral, P. (2023). On the mathematical theory of ensemble (linear-Gaussian) Kalman–Bucy filtering. Mathematics of Control, Signals, and Systems, 35(4), 835–903. https://doi.org/10.1007/s00498-023-00357-2
  2. Kalman, R. E., & Bucy, R. S. (1961). New results in linear filtering and prediction theory. Journal of Basic Engineering, 83(1), 95–108. https://doi.org/10.1115/1.3658902
  3. Jazwinski, A. H. (1970). Stochastic processes and filtering theory (Mathematics in Science and Engineering, Vol. 64). Academic Press. https://doi.org/10.1016/S0076-5392(09)60368-4
  4. Bergemann, K., & Reich, S. (2012). An ensemble Kalman-Bucy filter for continuous data assimilation. Meteorologische Zeitschrift, 21(3), 213–219. https://doi.org/10.1127/0941-2948/2012/0307