Computer Vision – Feature Detection

Feature detection is the task of finding key points (pixels of interest) in an image. This raises an immediate question: what is a key point? A key point is a point that is unique within the local area around it and can therefore be found again and matched to the corresponding point in another image.

Aperture Problem

The following diagram will make it clear which points can and cannot be good key points.


The diagram above shows the same edge in two images between which features are to be matched. If the point from which all the arrows originate is taken as a key point, there is no unique point in the second image to which it can be matched, because all the points along the edge look the same. The arrows show that the selected point could be matched with many points on the edge, which is not desirable. We therefore search for corner points, which are locally unique.
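To make the argument concrete, here is a minimal sketch using NumPy on two synthetic images of my own (they are not the images from the diagram). The helper `ssd` is a hypothetical name; it compares a small patch with a shifted copy of itself using the sum of squared differences. On a straight edge, shifting along the edge changes nothing, so the match is ambiguous. At a corner, every shift changes the patch.

```python
import numpy as np

def ssd(image, y, x, dy, dx, half=2):
    """Sum of squared differences between the patch centred at (y, x)
    and the patch centred at (y + dy, x + dx)."""
    a = image[y - half:y + half + 1, x - half:x + half + 1]
    b = image[y + dy - half:y + dy + half + 1, x + dx - half:x + dx + half + 1]
    return float(np.sum((a - b) ** 2))

# Synthetic 20x20 image with a vertical edge: left half dark, right half bright.
edge = np.zeros((20, 20))
edge[:, 10:] = 1.0

# Synthetic 20x20 image with a corner: only the bottom-right quadrant is bright.
corner = np.zeros((20, 20))
corner[10:, 10:] = 1.0

# Sliding along the edge (a vertical shift) leaves the patch unchanged.
edge_along = ssd(edge, 10, 10, 3, 0)
# Sliding across the edge (a horizontal shift) changes it.
edge_across = ssd(edge, 10, 10, 0, 3)
# At the corner, both shifts change the patch.
corner_vertical = ssd(corner, 10, 10, 3, 0)
corner_horizontal = ssd(corner, 10, 10, 0, 3)

print(edge_along, edge_across, corner_vertical, corner_horizontal)
```

A difference of zero for `edge_along` is exactly the ambiguity the arrows describe: the point on the edge matches equally well anywhere along the edge.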

Interest Point Detection

There is a mathematical formulation for finding corners in an image. We won’t go into the mathematical details, but I will mention the logic behind it. For every pixel in the image we define an auto-correlation matrix A as follows:

A = w * \begin{bmatrix}  I_{x}^{2} & I_{x}I_{y}\\  I_{x}I_{y} & I_{y}^{2}  \end{bmatrix}

Here I_{x} is the horizontal derivative and I_{y} is the vertical derivative at that pixel. w is the weighting function, typically a Gaussian, applied over the neighbourhood of the pixel. Note that w is not a constant: its value varies with the position inside that neighbourhood. Let \lambda_{0} and \lambda_{1} be the eigenvalues of this matrix. It can be proved that a pixel is a corner point if both \lambda_{0} and \lambda_{1} are ‘big’ in value. Loosely speaking, a pixel is a key point if its gradient is big in both directions. Several quantities have been defined which, if found to be above a certain threshold, indicate that both \lambda_{0} and \lambda_{1} are big. Some of them are listed below:

Szeliski Detector
det(A) / trace(A) = \lambda_{0} \lambda_{1} / (\lambda_{0} + \lambda_{1})

Harris Detector
det(A) - \alpha trace(A)^{2} = \lambda_{0} \lambda_{1} - \alpha ( \lambda_{0} +  \lambda_{1} )^{2}

\alpha is typically taken to be 0.06

Tomasi Detector
min(\lambda_{0}, \lambda_{1})

Triggs Detector
\lambda_{0} - \alpha \lambda_{1}, where \lambda_{0} is the smaller of the two eigenvalues

\alpha is typically taken to be 0.06
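As a rough sketch, the four quantities can be computed for a single pixel from the three distinct entries of A. The function names, the dictionary keys, and the small `eps` guard against division by zero are my own assumptions and not part of any library. `alpha` defaults to the 0.06 quoted above. The eigenvalues of a symmetric 2x2 matrix have a closed form, so no linear algebra package is needed.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (smaller first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius, mean + radius

def corner_measures(a, b, c, alpha=0.06, eps=1e-12):
    """Scores for A = [[a, b], [b, c]], where a, b, c are the weighted
    Ix^2, IxIy and Iy^2 at one pixel. eps is an assumed guard for flat regions."""
    lam0, lam1 = eigenvalues_2x2(a, b, c)
    det = a * c - b * b      # equals lam0 * lam1
    trace = a + c            # equals lam0 + lam1
    return {
        "szeliski": det / (trace + eps),
        "harris": det - alpha * trace ** 2,
        "tomasi": lam0,
        "triggs": lam0 - alpha * lam1,
    }

# Three illustrative matrices: no gradient, gradient in one direction only,
# and equally strong gradients in both directions.
flat = corner_measures(0.0, 0.0, 0.0)
edge = corner_measures(10.0, 0.0, 0.0)
corner = corner_measures(10.0, 0.0, 10.0)

for name, scores in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, scores)
```

For these inputs every measure scores the corner-like matrix higher than the edge-like one, which is the behaviour a threshold relies on.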

The algorithm thus is as follows:

Convert the image to grayscale and blur using a Gaussian
Compute the horizontal and vertical derivatives of the image I_{x} and I_{y}
Compute the three images corresponding to the outer products of these gradients (I_{x}^{2}, I_{x}I_{y} and I_{y}^{2})
Convolve each of these images with a larger Gaussian
For each pixel in the original image
      Construct the auto-correlation matrix A using the three images (I_{x}^{2}, I_{x}I_{y} and I_{y}^{2})
      Compute one of the quantities mentioned above (Szeliski, Harris, Tomasi, Triggs)
      If the value is above a chosen threshold and is a local maximum, report the pixel as a key point
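The steps above can be sketched in plain NumPy as follows. This is my own minimal illustration, not the OpenCV code referred to in this post. It uses the Harris quantity, and the names `gaussian_kernel`, `blur` and `harris_keypoints`, the two sigmas, the relative threshold and the 3x3 local-maximum test are all assumptions chosen for the example. The input is assumed to be a grayscale image already.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalised to sum to 1, truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def harris_keypoints(gray, sigma_d=1.0, sigma_i=2.0, alpha=0.06, rel_threshold=0.1):
    """Return ((row, col) key points, response image) for a grayscale image."""
    gray = np.asarray(gray, dtype=float)
    smooth = blur(gray, sigma_d)              # blur with a small Gaussian
    iy, ix = np.gradient(smooth)              # vertical (axis 0), horizontal (axis 1)
    ixx = blur(ix * ix, sigma_i)              # products of gradients, each
    ixy = blur(ix * iy, sigma_i)              # convolved with a larger Gaussian
    iyy = blur(iy * iy, sigma_i)
    det = ixx * iyy - ixy ** 2                # det(A) at every pixel
    trace = ixx + iyy                         # trace(A) at every pixel
    response = det - alpha * trace ** 2       # Harris quantity

    # A pixel is a local maximum if no 3x3 neighbour has a larger response.
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones((h, w), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_max &= response >= neighbour

    # Threshold relative to the strongest response in this image.
    strong = response > rel_threshold * response.max()
    ys, xs = np.nonzero(is_max & strong)
    return list(zip(ys.tolist(), xs.tolist())), response

# Usage on a synthetic image: a bright square on a dark background.
image = np.zeros((40, 40))
image[10:30, 10:30] = 1.0
keypoints, response = harris_keypoints(image)
print(keypoints)
```

Because the whole response image is computed with array operations, the per-pixel loop from the description disappears. Swapping in one of the other quantities only requires changing the line that computes `response`.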

The OpenCV code of the above algorithm can be found here.

There are no universal thresholds for these quantities, so they have to be tuned by trial and error. Of the above, the Harris detector is the most widely used because it is computationally cheap (it needs only the determinant and trace of A, so no eigenvalues have to be calculated) and also gives good features. The following image shows the result of applying the above quantities to Lenna:

