 Computer Vision – Shape Context Descriptor

I recently came across a wonderful paper by Serge Belongie, Jitendra Malik (one of the pioneers of Computer Vision) and Jan Puzicha. It is called ‘Shape Matching and Object Recognition Using Shape Contexts’. It describes how objects can be characterized by their shape, and how this description can be used to match and recognize objects. In this post I will deal with the construction and properties of Shape Context descriptors. I will use the infinity symbol (as shown below) as an example.

Sampling of Points

To describe the shape of the object, we discretize the internal and external contours of the object into a number of sample points. The number of sample points is not fixed and can vary from application to application. The sample points for the image above are shown below. For this particular example, I have taken 200 sample points.

The Descriptor
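
Before moving on to the descriptor itself, the sampling step described above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the contour is already available as an ordered list of boundary points, and it uses a lemniscate curve as a stand-in for the infinity symbol. The function name sample_contour and the even arc-length spacing are my own choices, not something prescribed by the paper.

```python
import numpy as np

def sample_contour(contour, n_samples=200):
    """Pick n_samples points evenly spaced by arc length along a closed contour.

    contour: (M, 2) array of ordered boundary points (assumed input format).
    """
    contour = np.asarray(contour, dtype=float)
    closed = np.vstack([contour, contour[:1]])             # repeat first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)  # length of each polyline segment
    arc = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    targets = np.linspace(0.0, arc[-1], n_samples, endpoint=False)
    x = np.interp(targets, arc, closed[:, 0])
    y = np.interp(targets, arc, closed[:, 1])
    return np.column_stack([x, y])

# Stand-in for the infinity symbol: a densely traced lemniscate.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
dense = np.column_stack([np.cos(t) / (1 + np.sin(t) ** 2),
                         np.sin(t) * np.cos(t) / (1 + np.sin(t) ** 2)])
points = sample_contour(dense, 200)
print(points.shape)
```

For a real image with internal and external contours, each contour would need to be extracted and sampled separately; that part is not covered by this sketch.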

The sampled points are treated as features, and a descriptor is calculated for each one of them. The descriptor is a 5 x 12 matrix, that is, it has 60 bins, where each bin stores the number of sample points that fall inside it.

What are bins? How do we decide if a particular point belongs to a particular bin? Bins are obtained by discretizing the space around a point. The image below will make it clearer. This discretization measures the relative position of every other point with respect to the point for which the descriptor is being calculated. The five circles are the 5 divisions with respect to distance, and the 12 lines are the divisions with respect to angle. Why aren’t the circles equidistant? That is because the bins are uniform in log-polar space, so the rings get wider as the distance grows. This makes the descriptor more sensitive to the positions of nearby points than to those of points far away.

Now the descriptor can be calculated by counting the number of points in each bin. Remember, this is the calculation for one point. It has to be repeated for every sampled point.
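
The binning described above can be sketched as follows. This is a minimal sketch under some assumptions of mine: the ring limits r_inner and r_outer are expressed in units of the mean pairwise distance and their default values are only illustrative, points closer than r_inner are counted in the innermost ring, and points beyond r_outer are ignored. Angles here are measured from the image x-axis.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return the n_r x n_theta log-polar histogram around points[index]."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Mean distance over all pairs, used as the unit of length (assumed convention).
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    mean_dist = pairwise.sum() / (n * (n - 1))

    # Position of every other point relative to the reference point.
    offsets = np.delete(points, index, axis=0) - points[index]
    r = np.linalg.norm(offsets, axis=1) / mean_dist
    theta = np.arctan2(offsets[:, 1], offsets[:, 0]) % (2 * np.pi)

    # Ring edges are evenly spaced in log(r); angular edges are evenly spaced.
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    theta_edges = np.linspace(0.0, 2 * np.pi, n_theta + 1)

    # Very close points go into the innermost ring; points beyond r_outer are dropped.
    r = np.maximum(r, r_inner)
    hist, _, _ = np.histogram2d(r, theta, bins=[r_edges, theta_edges])
    return hist.astype(int)

# Toy example: a centre point with four neighbours at unit distance.
cross = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
descriptor = shape_context(cross, 0)
print(descriptor.shape, descriptor.sum())
```

Calling shape_context once per index gives the full set of descriptors, one 5 x 12 matrix per sampled point.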

One doubt I had about the figure above was the orientation of the bins. Why was this particular orientation chosen? If the ‘A’ were rotated about the center, the same bins would produce a different descriptor. It turns out that the image is only for representation. To handle rotation, the paper constructs the angular divisions with respect to the tangent at that point, treating the tangent direction as the reference axis.
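
To see why a tangent-based reference axis removes the dependence on orientation, here is a small sketch. It assumes the points are ordered along a single closed contour, and it estimates the tangent with central differences between neighbouring samples; both the helper names and the tangent estimate are my own assumptions for illustration.

```python
import numpy as np

def tangent_angles(points):
    """Tangent direction at each point of an ordered, closed contour."""
    points = np.asarray(points, dtype=float)
    # Central difference: next neighbour minus previous neighbour.
    d = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    return np.arctan2(d[:, 1], d[:, 0])

def relative_angles(points, index):
    """Angles from points[index] to all other points, measured from the local tangent."""
    points = np.asarray(points, dtype=float)
    offsets = np.delete(points, index, axis=0) - points[index]
    absolute = np.arctan2(offsets[:, 1], offsets[:, 0])
    return (absolute - tangent_angles(points)[index]) % (2 * np.pi)

# An ellipse and a copy rotated by 0.7 radians about the origin.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(t), np.sin(t)])
a = 0.7
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
rotated = ellipse @ R.T

# Difference between the two sets of relative angles, wrapped into [-pi, pi).
diff = relative_angles(ellipse, 5) - relative_angles(rotated, 5)
diff = (diff + np.pi) % (2 * np.pi) - np.pi
print(np.abs(diff).max())
```

The absolute angles of the rotated copy all shift by 0.7 radians, but so does the tangent, so the difference printed at the end is at the level of floating-point error.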

Properties

-> The Shape Context framework is translation invariant since every measurement is taken relative to other points on the shape, never relative to the image origin.
-> It can be made rotation invariant by computing the angles with respect to the tangent at the point for which the descriptor is being calculated.
-> It can be made scale invariant by normalizing the distances by the mean distance between all pairs of points in the shape.
-> The Shape Context is a very discriminative point descriptor, incorporating global shape information into a local descriptor.
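
The translation and scale properties listed above can be checked numerically. The sketch below computes all descriptors at once and compares them for a shifted and a scaled copy of the same point set. It is only an illustration under my own assumptions: the points are random pixel coordinates rather than a real contour, the ring limits are illustrative, and angles are measured from the x-axis rather than the tangent.

```python
import numpy as np

def all_shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return an (N, n_r, n_theta) array holding one log-polar histogram per point."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    offsets = points[None, :, :] - points[:, None, :]     # offsets[i, j] = p_j - p_i
    dists = np.linalg.norm(offsets, axis=2)
    r = dists / (dists.sum() / (n * (n - 1)))             # normalise by mean pairwise distance
    theta = np.arctan2(offsets[..., 1], offsets[..., 0]) % (2 * np.pi)

    # Ring index from log-spaced edges; points inside r_inner go to ring 0.
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.clip(np.searchsorted(r_edges, r, side="right") - 1, 0, None)
    t_bin = np.minimum((theta / (2 * np.pi / n_theta)).astype(int), n_theta - 1)

    # Skip points beyond r_outer and the reference point itself.
    valid = (r_bin < n_r) & ~np.eye(n, dtype=bool)
    rows = np.broadcast_to(np.arange(n)[:, None], (n, n))
    hist = np.zeros((n, n_r, n_theta), dtype=int)
    np.add.at(hist, (rows[valid], r_bin[valid], t_bin[valid]), 1)
    return hist

rng = np.random.default_rng(0)
shape = rng.integers(0, 1000, size=(50, 2)).astype(float)   # made-up pixel coordinates

base = all_shape_contexts(shape)
moved = all_shape_contexts(shape + np.array([10.0, -3.0]))  # translated copy
scaled = all_shape_contexts(2.0 * shape)                    # scaled copy

a = 0.3
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
turned = all_shape_contexts(shape @ R.T)                    # rotated copy

print(np.array_equal(base, moved), np.array_equal(base, scaled),
      np.array_equal(base, turned))
```

Because this sketch uses absolute angles, the rotated copy is not expected to match; that is the case the tangent-relative angles are meant to handle.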