I am embarking on a project that works with sensors. Sensor readings are noisy, so they need to be filtered. There are many ways to filter a stream of noisy data, and one that I recently learned is the Kalman filter. A good place to learn about it is: http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/. Here I am going to discuss this technique, mostly for my own future reference.

It seems that the main idea behind this technique is that the product of two Gaussian functions is also Gaussian. This property allows us to combine new measurements with current predictions to produce a better estimate, in a recursive manner.

Motivating example: we are tracking a vehicle using GPS. We can't tell exactly where the vehicle is, but we can draw a circle within which we believe the vehicle should be located. The centre of the circle is the point where we most strongly believe the vehicle to be, though it could be anywhere further from that point, with decreasing probability.

In this kind of example, if we model the quantity we want to track as Gaussian, then what we need are the mean \(x\) and the covariance matrix \(P\). In the example above, the vehicle moves over time, so \(x\) changes over time, and so does \(P\), which grows as time passes (we become more and more uncertain of where the vehicle is). Our main objective is to keep track of \(x\) and \(P\), and at the same time find a way to bound \(P\).

Given \(x_k\) and \(P_k\), the mean and covariance at time k, we can derive \(x_{k+1}\) and \(P_{k+1}\), the predicted mean and covariance at time k+1. Let \(F_k\) be the matrix that represents the update from \(x_k\) to \(x_{k+1}\). Then we have \(x_{k+1} = F_k x_k\), and furthermore \(P_{k+1} = Cov(F_kx_k) = F_kP_kF_k^T\). This is called the "prediction" stage of the Kalman filter. To factor in external forces and process error, we can extend the above formulation into a more general one: \(x_{k+1} = F_kx_k + C_k\) and \(P_{k+1} = F_kP_kF_k^T + D_k\), for some \(C_k\) and \(D_k\) that represent the external force and the process error respectively.
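The prediction stage can be sketched in a few lines of NumPy. The position/velocity model and the specific numbers below are illustrative assumptions on my part, not taken from any particular application:

```python
import numpy as np

def predict(x, P, F, C, D):
    """Prediction stage: propagate the mean and covariance one timestep.

    x : state mean          P : state covariance
    F : state-transition matrix
    C : external-force term D : process-error covariance
    """
    x_next = F @ x + C
    P_next = F @ P @ F.T + D
    return x_next, P_next

# Example: state = (position, velocity), timestep dt = 1.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # position += velocity * dt
C = np.zeros(2)                 # no external force
D = 0.01 * np.eye(2)            # small process error
x, P = predict(np.array([0.0, 1.0]), np.eye(2), F, C, D)
# x -> [1.0, 1.0]; P has grown, reflecting increased uncertainty
```

Note how \(P\) grows at every prediction even though the mean just moves deterministically; this is the "getting more uncertain over time" behaviour described above.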

One timestep ahead, at time k+1, suppose that we also obtain a (noisy) reading of \(x\). Let's call this reading \(y_{k+1}\), and say that the measurement also follows a Gaussian distribution, with covariance matrix \(Q_{k+1}\). The idea is to correct our prediction \(x_{k+1}\) and \(P_{k+1}\) using the new noisy reading. The way the Kalman filter resolves the difference is by multiplying the two distributions N(x, P) and N(y, Q) to come up with a new distribution whose mean and covariance are optimal given x, P, y and Q. As it turns out, N(x, P) N(y, Q) is also Gaussian: \(N(x, P) N(y, Q) = c \exp\left\{-\frac{1}{2}\left( (v-x)^TP^{-1}(v-x) + (v-y)^TQ^{-1}(v-y) \right)\right\}\) (for some constant c) can be rewritten as \( d \exp \left\{ -\frac{1}{2} \left( (v-u)^TS^{-1}(v-u) \right) \right\} \) (for some constant d) where

\(S^{-1} = P^{-1} + Q^{-1}\) and \(u = SP^{-1}x + SQ^{-1}y\) (by grouping and comparing terms, and using the fact that a covariance matrix is symmetric).
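The fused mean and covariance can be computed directly from these two formulas. A quick NumPy sketch (the scalar numbers are made up for the sanity check):

```python
import numpy as np

def fuse(x, P, y, Q):
    """Multiply N(x, P) by N(y, Q):
    S^{-1} = P^{-1} + Q^{-1},  u = S P^{-1} x + S Q^{-1} y."""
    S = np.linalg.inv(np.linalg.inv(P) + np.linalg.inv(Q))
    u = S @ np.linalg.inv(P) @ x + S @ np.linalg.inv(Q) @ y
    return u, S

# Scalar sanity check: with equal uncertainties, the fused mean is the
# average of the two means, and the variance is halved.
u, S = fuse(np.array([0.0]), np.array([[2.0]]),
            np.array([4.0]), np.array([[2.0]]))
# u ≈ [2.0], S ≈ [[1.0]]
```

Intuitively, each source is weighted by the inverse of its covariance, so the more certain source pulls the fused mean toward itself.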

Here it is clear that we can compute S and u, which are the "correction" of our prediction x, P based on our reading y, Q. The usual formulation of the Kalman filter can be derived from the above:

Let \(K = P(P+Q)^{-1} \). K is called the Kalman gain. We will show that:

(1). \(S = P - KP\) and,

(2). \(u = x + K(y-x)\).

For (1): we can reverse engineer the process. Suppose that we do not know what K is. Then by letting \(S = (P^{-1}+Q^{-1})^{-1} = P - KP\), we get \(I = P(P^{-1}+Q^{-1}) - KP(P^{-1}+Q^{-1})\), which simplifies to \(K = PQ^{-1}(I+PQ^{-1})^{-1}\). However, notice that \((P+Q)Q^{-1} = PQ^{-1} + I\), which implies \(Q^{-1}(PQ^{-1}+I)^{-1}=(P+Q)^{-1}\). Hence we can rewrite K as \(K = P(P+Q)^{-1}\). This also gives \(S = P-KP\).

For (2): since \(S^{-1} = P^{-1}+Q^{-1}\), we have \(SQ^{-1} = (QS^{-1})^{-1} = (QP^{-1}+I)^{-1} = ((Q+P)P^{-1})^{-1} = P(P+Q)^{-1} = K\), and similarly \(SP^{-1} = (PS^{-1})^{-1} = ((P+Q)Q^{-1})^{-1} = Q(P+Q)^{-1} = I - K\). Substituting these into \(u = SP^{-1}x + SQ^{-1}y\) gives \(u = (I-K)x + Ky = x + K(y-x)\).
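As a numerical check of (1) and (2), the Kalman-gain form should agree with the information form for arbitrary positive-definite P and Q. A small sketch with randomly generated covariances:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)); P = A @ A.T + np.eye(2)  # random SPD covariance
B = rng.standard_normal((2, 2)); Q = B @ B.T + np.eye(2)
x = rng.standard_normal(2); y = rng.standard_normal(2)

# Information form: S^{-1} = P^{-1} + Q^{-1}, u = S P^{-1} x + S Q^{-1} y
S_info = np.linalg.inv(np.linalg.inv(P) + np.linalg.inv(Q))
u_info = S_info @ np.linalg.inv(P) @ x + S_info @ np.linalg.inv(Q) @ y

# Kalman-gain form: K = P (P + Q)^{-1}, S = P - K P, u = x + K (y - x)
K = P @ np.linalg.inv(P + Q)
S_gain = P - K @ P
u_gain = x + K @ (y - x)

assert np.allclose(S_info, S_gain) and np.allclose(u_info, u_gain)
```

The Kalman-gain form is preferable in practice: it inverts \(P+Q\) once, instead of inverting P, Q, and their combination separately.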

Together, (1) and (2) form the "update" stage of the process. The new values u and S are then used as x and P in the next prediction, and the process repeats. (1) and (2) can be generalized further in case we need to transform \(x\) into the dimension of \(y\), by providing a transformation matrix \(H\) such that \(Hx\) is in the dimension of \(y\) and \(HPH^T\) is the covariance of \(Hx\).
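Putting the prediction and update stages together, here is a minimal scalar sketch of the whole loop: a filter tracking a constant value from noisy readings, with F = 1 and no external force. This is my own toy example (the signal level 5.0, noise standard deviation 0.5, and the variance parameters are made-up numbers), not the implementation from the repo linked below:

```python
import numpy as np

def kalman_1d(readings, q_measure=1.0, d_process=0.01):
    """Minimal scalar Kalman filter for a static state (F = 1, no external force).
    q_measure: measurement variance Q, d_process: process-error variance D."""
    x, P = readings[0], 1.0          # initialize from the first reading
    filtered = [x]
    for y in readings[1:]:
        # predict: the mean stays put, uncertainty grows by the process error
        P = P + d_process
        # update: K = P / (P + Q), then correct toward the new reading
        K = P / (P + q_measure)
        x = x + K * (y - x)
        P = P - K * P
        filtered.append(x)
    return filtered

# Noisy constant signal around 5.0
noisy = 5.0 + np.random.default_rng(1).normal(0.0, 0.5, 100)
smooth = kalman_1d(noisy)
```

With `d_process = 0`, P would shrink toward zero and the gain K with it, so the filter would eventually stop reacting to new readings; the process-error term keeps it responsive.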

The next step for me is to make use of this filter and see the effectiveness of this technique. I'll post the results in a future post.

UPDATE:

Implementation of the Kalman filter can be found here: https://github.com/prajogotio/expriment-kalmanfilter. Interesting fact: the process error is actually an important factor that allows the filter to stay close to the reading values; without it, the filter becomes overconfident in its own predictions and stops responding to new readings.

Preliminary result:

The blue line represents the raw accelerometer readings on one of the axes, while the smoother red line represents the filtered data. The filter clearly does smooth the graph, but the picture also shows that further experimentation with the filter's parameters is necessary before it can be applied in practice.