| Fast Connected Components Labeling |
|---|
Connected components labeling and extraction is a useful first step in many computer vision applications. Grouping spatially connected pixels compresses the information in an image in a way that is useful for tasks such as object identification and shape analysis. Traditional approaches to labeling are computationally expensive, mainly because they scan the image multiple times to merge the parts of a connected region, even though each part is easily extracted on its own. Improved algorithms have addressed this repeated-scan problem. Many of them store additional information during the first image scan, sometimes with the help of predesigned data structures, so that sub-regions of a component can be grouped without an additional pass. We have developed one such fast connected components labeling algorithm using a region coloring approach. It computes region attributes such as size, moments, and bounding boxes in a single pass through the image. Working in the context of real-time pupil detection for an eye tracking system, we compared the running time of our algorithm with a contour tracing-based labeling approach and a region coloring method developed for a hardware eye detection system. For region attribute extraction, our algorithm outperforms both comparison methods. Labeling each pixel, which requires a second pass through the image, performs comparably to them. For more details, a paper describing our proposed algorithm is available online. The original publication is available at "www.springerlink.com".
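To make the single-pass idea concrete, here is a minimal Python sketch. It is not the algorithm from our paper: it assumes a binary image given as a list of rows, 4-connectivity, and a plain union-find structure whose roots carry the region attributes. The function name `region_attributes` and the attribute keys are hypothetical and chosen only for this illustration.

```python
def region_attributes(image):
    """Scan a binary image once (4-connectivity) and return per-region attributes."""
    parent = []  # union-find forest over provisional labels
    attrs = []   # attrs[label] is meaningful only while label is a root

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return ra
        parent[rb] = ra
        keep, gone = attrs[ra], attrs[rb]
        # Merge the attributes of the absorbed sub-region into the surviving root.
        for key in ("size", "sum_x", "sum_y"):
            keep[key] += gone[key]
        for key in ("min_x", "min_y"):
            keep[key] = min(keep[key], gone[key])
        for key in ("max_x", "max_y"):
            keep[key] = max(keep[key], gone[key])
        return ra

    prev = []  # labels of the previous row
    for y, row in enumerate(image):
        cur = [-1] * len(row)
        for x, value in enumerate(row):
            if not value:
                continue
            left = cur[x - 1] if x > 0 else -1
            up = prev[x] if x < len(prev) else -1
            if left < 0 and up < 0:
                # No labeled neighbor: start a new provisional region.
                label = len(parent)
                parent.append(label)
                attrs.append({"size": 0, "sum_x": 0, "sum_y": 0,
                              "min_x": x, "max_x": x, "min_y": y, "max_y": y})
            elif left >= 0 and up >= 0:
                label = union(left, up)  # two sub-regions meet at this pixel
            else:
                label = find(max(left, up))
            region = attrs[label]
            region["size"] += 1
            region["sum_x"] += x  # first-order moments; centroid = sum / size
            region["sum_y"] += y
            region["min_x"] = min(region["min_x"], x)
            region["max_x"] = max(region["max_x"], x)
            region["max_y"] = max(region["max_y"], y)
            cur[x] = label
        prev = cur
    return [attrs[i] for i in range(len(parent)) if parent[i] == i]
```

Because the attributes are merged whenever two sub-regions meet, no second pass is needed to obtain sizes, centroids, or bounding boxes. A second pass would only be required to write the final label of every pixel.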
| Shift Expectation Maximization |
|---|
Multidimensional clustering usually assumes that data points are aligned with
each other, but this does not hold for many real-world datasets. This poses a problem for
clustering algorithms, especially ones that adopt a probabilistic mixture modeling approach.
Misalignment of the data points tends to increase the estimated variances along the feature
dimensions. It might also lead to an artificial increase in the number of mixture components
required to fit the data. One such scenario arises in the clustering of action potentials
(spikes) recorded from the brains of insects or monkeys where the goal is to assign spikes
arising from a single cell or unit to a unique cluster. Usually different spikes from a single
cell are not aligned with each other due to imperfections in the measurement device. Another
scenario is the grouping of images of similar objects with the objects at possibly different
positions in those images. The Expectation Maximization framework for estimating the
parameters of a mixture model provides a way to handle misalignments. This is done by
introducing shift as a hidden variable in the generative model for the data and assuming
misalignments are due to the unknown random shifts. We have realized this for the case of a
Gaussian mixture model assuming a finite number of discrete random shifts independent of the
clusters. Clustering is performed in a reduced dimensional space within which it is possible to
align the data points by reversing the shifts that might have occurred when the points were
being generated. The plot on the left above shows synthetic data points generated from a mixture
of two Gaussians with means resembling an upright and an inverted triangle in a 15 dimensional
space and then randomly shifted. The cluster means in a suitable data subspace obtained using
our Shift Expectation Maximization algorithm are shown on the plot to the right. We are planning
to apply this to neuronal spike datasets recorded from insect brains, clustering them while accounting
for possible misalignments.
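The role of the shift as a hidden variable can be illustrated with a small Python sketch. It is a simplification and not our implementation: it assumes circular shifts, a shared isotropic variance `sigma**2`, and clustering in the original space rather than in a reduced-dimensional subspace. The function names `shift`, `e_step`, and `update_means` are hypothetical.

```python
import math


def shift(v, s):
    """Circularly shift the list v to the right by s positions (s may be negative)."""
    s %= len(v)
    return v[-s:] + v[:-s] if s else list(v)


def e_step(x, means, weights, shifts, shift_probs, sigma):
    """Posterior r[k][j] over (cluster k, shift shifts[j]) for one data point x."""
    log_r = []
    for mu, w in zip(means, weights):
        row = []
        for s, ps in zip(shifts, shift_probs):
            d2 = sum((a - b) ** 2 for a, b in zip(x, shift(mu, s)))
            # The Gaussian normalizer is shared by all (k, j) pairs and cancels.
            row.append(math.log(w) + math.log(ps) - d2 / (2 * sigma ** 2))
        log_r.append(row)
    top = max(max(row) for row in log_r)
    unnorm = [[math.exp(v - top) for v in row] for row in log_r]
    z = sum(map(sum, unnorm))
    return [[v / z for v in row] for row in unnorm]


def update_means(data, resp, shifts, n_clusters):
    """Average the points after undoing each candidate shift, weighted by the posterior."""
    dim = len(data[0])
    new_means = []
    for k in range(n_clusters):
        total, acc = 0.0, [0.0] * dim
        for x, r in zip(data, resp):
            for j, s in enumerate(shifts):
                aligned = shift(x, -s)  # reverse the hypothesized shift
                total += r[k][j]
                for i in range(dim):
                    acc[i] += r[k][j] * aligned[i]
        new_means.append([a / max(total, 1e-300) for a in acc])
    return new_means
```

The posterior is computed jointly over cluster and shift, so a point that matches a cluster mean only after shifting still contributes to that cluster, and the mean update sees the point in its aligned position.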
| Competitive Expectation Maximization |
|---|
Expectation Maximization (EM) is an optimization technique commonly used in the machine learning
community to handle missing data problems. A typical application of this algorithm
is fitting a probabilistic mixture model, e.g., a Gaussian Mixture Model (GMM), to a set
of data points. Maximum Likelihood (ML) estimation of the mixture parameters is cast as
a missing data problem and solved with EM. However, EM, being a hill-climbing technique, does not always
lead to the global ML estimate. Depending on the starting point, it converges to a nearby local
maximum, which makes it sensitive to initialization. Another problem in modeling a data set with a mixture
model is automatically estimating the appropriate number of components in the mixture model. To
address these issues, a technique called Competitive Expectation Maximization (CEM) was proposed in
this paper. Working with Kobus Barnard,
I have tried to implement a variant of CEM following the spirit of the paper. The above plots show the
initial and final GMM configurations for a synthetic data set using our implementation of CEM. The
final solution is close to the configuration of the GMM that generated the data. This can be applied
to any kind of mixture model employing EM as the optimization procedure to automatically estimate the
number of mixture components and seek the global ML solution with a high probability. Using our implementation, the following clips help
in visualizing the process of automatic selection of number and configuration of clusters for the above synthetic data set starting from
two different randomly initialized configurations. Observe the sequence of cluster split, merge and annihilation operations as the
algorithm proceeds in time!
From 1 to 8 clusters
From 20 to 8 clusters
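The annihilation operation mentioned above can be illustrated with a small Python sketch. It is neither the CEM algorithm from the paper nor our variant: it only shows one EM iteration for a 1-D Gaussian mixture followed by removal of components whose weight has collapsed, and it omits the split and merge operations. The names `em_step` and `annihilate` and the threshold `min_weight` are hypothetical.

```python
import math


def em_step(data, comps):
    """One EM iteration for a 1-D Gaussian mixture; comps holds (weight, mean, var) tuples."""
    resp = []
    for x in data:
        p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
             for w, m, v in comps]
        z = sum(p) or 1e-300  # guard against total underflow
        resp.append([pk / z for pk in p])
    updated = []
    for k, (_, old_mean, old_var) in enumerate(comps):
        nk = sum(r[k] for r in resp)
        if nk < 1e-12:
            # Component explains no data: give it zero weight, keep its old shape.
            updated.append((0.0, old_mean, old_var))
            continue
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        updated.append((nk / len(data), mean, max(var, 1e-6)))
    return updated


def annihilate(comps, min_weight=0.05):
    """Drop components with weight below min_weight and renormalize the rest."""
    kept = [c for c in comps if c[0] >= min_weight]
    if not kept:
        return comps
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, v) for w, m, v in kept]
```

Alternating `em_step` and `annihilate` lets a mixture that starts with too many components shrink toward the number the data supports, which is the behavior visible in the "From 20 to 8 clusters" clip.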
| Semantic Evaluation of Features Using Word Prediction Performance |
|---|
Research in the field of multimedia indexing and retrieval has tried to
exploit the semantic information carried by keywords attached to images. There
exist huge databases of images that come with words describing the context of
each image. The semantic information carried by the words associated with images
can be very helpful in organizing and indexing the data. Since these words
describe the content of the images - individual objects or their characteristics
- there exists a correlation between them and the visual features computed from
the images. Models have been introduced to extract this correlation structure
between words and features with the help of clustering methods that learn the
joint statistics of image words and segments. The very fact that words and
images can be modeled using a joint probability distribution has given rise to a
new application called "auto-annotation". It is the process of attaching words
to pictures automatically. The predicted words are indicative of scene
semantics. This can be viewed as a method of general object recognition
performed as machine translation from the object's visual representation to its
verbal description. The annotation performance can be measured by comparing
predicted words to words that are already associated with the test images.
Furthermore, the availability of such labeled databases makes it possible to
test the performance on a large scale and obtain reliable performance measures.
The focus of my work is to use this auto-annotation performance measure to
evaluate suitable feature combinations. For further details, here are a few links:
Keiji Yanai, Nikhil V. Shirahatti, Prasad Gabbur, Kobus Barnard, Evaluation Strategies for Image
Understanding and Retrieval, Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval (MIR), Singapore, November, 2005
(Invited paper).
Kobus Barnard, Pinar Duygulu, Raghavendra Guru, Prasad Gabbur, David Forsyth, The effects of segmentation and
feature choice in a translation model of object recognition, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol.
2, pp. 675-682, 2003.
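Comparing predicted words with the words already attached to a test image can be sketched as follows. This is only a minimal Python illustration using per-image precision and recall, which is an assumed choice and not necessarily the measure used in the papers above. The function name `annotation_scores` is hypothetical.

```python
def annotation_scores(predicted, actual):
    """Return (precision, recall) of predicted words against an image's actual keywords."""
    predicted, actual = set(predicted), set(actual)
    correct = len(predicted & actual)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


# Example: two of the three predicted words appear among the four actual keywords.
precision, recall = annotation_scores(["tiger", "grass", "sky"],
                                      ["tiger", "grass", "water", "tree"])
```

Averaging such scores over a large annotated test set gives one number per feature combination, which is what makes feature sets comparable.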
| Modifications to the Normalized Cuts Segmentation Algorithm |
|---|
Normalized Cuts is a grouping criterion that aims to partition a set
of points into coherent subsets, originally developed by the Berkeley segmentation research group. It follows a graph theoretic approach
to partition a point set. Given a similarity measure (affinity) between each pair of points in the set, it tries to group together
points that have large affinity for each other. This criterion has been applied to the domain of image segmentation. A possible
approach is to treat each pixel in an image as a point in some arbitrary feature space and group together those pixels that are very
similar to each other according to the features chosen. I have worked on modifying a few aspects of a version of the Normalized Cuts
segmentation algorithm to achieve better grouping of regions in natural images. Results of this work form part of my
Master's thesis:
Prasad Gabbur, Quantitative evaluation of feature sets, segmentation algorithms, and color constancy algorithms
using word prediction, University of Arizona, Electrical and Computer Engineering department, Master's thesis, 2003.
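For reference, the criterion itself is easy to state. For a partition of the point set V into A and B, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), where cut sums the affinities across the partition and assoc sums the affinities from one side to all points. The Python sketch below evaluates this value and finds the best two-way split by brute force, which is feasible only for a handful of points. Practical implementations approximate the minimum through an eigenvector problem instead. The sketch does not include any of my modifications, assumes a symmetric affinity matrix `W` in which every point has positive total affinity, and uses the hypothetical names `ncut` and `best_bipartition`.

```python
from itertools import combinations


def ncut(W, A):
    """Normalized cut value of splitting points 0..n-1 into A and its complement."""
    n = len(W)
    A = set(A)
    B = set(range(n)) - A
    cut = sum(W[i][j] for i in A for j in B)
    assoc_a = sum(W[i][j] for i in A for j in range(n))
    assoc_b = sum(W[i][j] for i in B for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(W):
    """Exhaustively search all two-way splits (tiny point sets only)."""
    n = len(W)
    candidates = (set(c) for r in range(1, n // 2 + 1)
                  for c in combinations(range(n), r))
    return min(candidates, key=lambda A: ncut(W, A))
```

The normalization by assoc is what discourages cutting off a single isolated point, which a plain minimum cut would tend to do.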
| Face Detection/Tracking in Color Image Sequences |
|---|
The idea of using physical attributes such as the face, fingerprints, voiceprints, or any of several other characteristics to prove human identity has a lot of appeal. Any human trait that is unique and sufficiently stable can serve as a distinguishing measure for verifying, recognizing, or classifying people. The face is one such attribute that clearly distinguishes different individuals. In fact, the face is the attribute most commonly used by the human visual system to identify people. This is one reason research has been aimed at developing computational systems for automatic face recognition. Automatic face recognition is the process of matching a test face image to one of the faces stored in a prepared face database. Real-world images do not necessarily contain isolated faces that can directly serve as inputs to a face recognition (FR) system. Hence, facial regions need to be isolated or segmented before being fed to an FR system.

Face detection may seem like a trivial task. After all, we human beings do it in our daily lives without any effort. The human visual system can easily detect a human face and differentiate it from its surroundings, but it is not easy to train a computer to do so. My work involves detection/segmentation and tracking of faces in color image sequences with complex backgrounds. Skin color is the main cue for detecting faces, with appropriate mixture models used to estimate the underlying skin color distribution. For further details, here are a few links:
Prem Kuchi, Prasad Gabbur, P. Subbanna Bhat, Sumam David S., Human Face Detection and Tracking using Skin Color
Modeling and Connected Component Operators, IETE Jl. of Research, Vol. 38, No. 3&4, pp. 289-293, May-Aug 2002.
Prasad Gabbur, Detection and Segmentation of Human Faces in Color Images with Complex Backgrounds, detailed project report as a requirement for the ECE532 Computer Vision course, Fall 2001.
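As a rough illustration of skin color modeling, the Python sketch below fits a single Gaussian with diagonal covariance to sample skin pixels in normalized (r, g) chromaticity space and classifies a pixel by its Mahalanobis distance to that model. This is a deliberate simplification of the mixture models used in the work above. The choice of color space, the distance threshold `max_dist`, and the function names are assumptions made only for this example.

```python
def chromaticity(rgb):
    """Map an (R, G, B) pixel to normalized (r, g), discarding overall intensity."""
    r, g, b = rgb
    s = r + g + b
    return (r / s, g / s) if s else (1 / 3, 1 / 3)


def fit_gaussian(skin_pixels):
    """Estimate the per-channel mean and variance of skin samples in (r, g) space."""
    pts = [chromaticity(p) for p in skin_pixels]
    n = len(pts)
    mean = [sum(p[i] for p in pts) / n for i in (0, 1)]
    var = [max(sum((p[i] - mean[i]) ** 2 for p in pts) / n, 1e-6) for i in (0, 1)]
    return mean, var


def is_skin(rgb, model, max_dist=3.0):
    """Accept a pixel whose Mahalanobis distance to the skin model is at most max_dist."""
    mean, var = model
    c = chromaticity(rgb)
    d2 = sum((c[i] - mean[i]) ** 2 / var[i] for i in (0, 1))
    return d2 <= max_dist ** 2
```

Pixels accepted by such a test form a binary mask, and connected component operators can then group the mask into candidate face regions.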
| Color Constancy in a Translation Model of Object Recognition |
|---|
Color is an extremely useful feature for characterizing and recognizing objects. It has been studied in detail for specific recognition
tasks such as skin detection. Color is also possibly the most useful of the features typically used in a content-based image retrieval (CBIR)
system. However, the color of a scene depends on the color of the light illuminating the scene. In other words, the same object can appear
to be differently colored if viewed under lights having different spectral components (different colors). This poses a difficulty for
systems that use color as a cue in recognizing objects. Two different approaches can be taken to deal with this problem. One would be
to make the system learn about different lighting conditions it can encounter by presenting it with exemplars under those conditions.
The other would be to remove the effects of illumination color and obtain an illumination independent description of the scene. This is
essentially the goal of computational color constancy algorithms. These algorithms attempt to estimate the illumination color of a
scene or obtain an illumination independent description of the scene that more precisely reflects its physical content. I am working on
evaluating the two approaches using the same translation model of object recognition. In this model, objects in a scene are recognized
by predicting words for the scene automatically, given a set of visual descriptors for the scene. The translation model learns the correlation
between visual descriptors and words using a large annotated image database. Preliminary evaluation results using simple color
constancy algorithms and details of the experiments can be found in the following paper:
Kobus Barnard, Prasad Gabbur, Color and Color Constancy in a Translation Model for Object Recognition,
Proc. IS&T/SID 11th Color Imaging Conference, pp. 364-369, 2003.
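To show what removing the effect of the illumination color can look like, here is a Python sketch of the gray world method, a widely known simple color constancy algorithm. It is given only as an example and is not claimed to be one of the algorithms evaluated in the paper above. It assumes the scene average should be gray and scales each channel accordingly. The function name `gray_world` is hypothetical.

```python
def gray_world(pixels):
    """Scale each channel so that the average color of the image becomes gray."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # A channel with zero mean carries no information, so leave it unscaled.
    gains = [gray / m if m else 1.0 for m in means]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]
```

After correction, the three channel means are equal, so a global color cast introduced by the light source is removed before visual descriptors are computed.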