Research


[Figures: test, test, test, result]
Fast Connected Components Labeling

Connected components labeling and extraction is a useful initial step in many computer vision applications. Grouping spatially connected pixels together compresses the information in an image in a way that is useful for tasks such as object identification and shape analysis. Traditional approaches to labeling are computationally expensive, mainly because they scan the image multiple times to group together the parts of a connected region, each of which is easily extracted individually. Over time, this repeated image scan problem has been addressed with improved algorithmic approaches. Many of these store additional information during the first image scan, sometimes with the help of predesigned data structures, which enables grouping of sub-regions of a component without an additional pass. We have developed one such fast connected components labeling algorithm using a region coloring approach. It computes region attributes such as size, moments, and bounding boxes in a single pass through the image. Working in the context of real-time pupil detection for an eye tracking system, we have compared the time performance of our algorithm with a contour tracing-based labeling approach and a region coloring method developed for a hardware eye detection system. We found that the region attribute extraction performance of our algorithm exceeds that of these comparison methods. Further, labeling each pixel, which requires a second pass through the image, has comparable performance. For more details, a paper describing our proposed algorithm is available online. The original publication is available at "www.springerlink.com".
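To make the idea of storing extra information during the first scan concrete, here is a minimal sketch in Python. It is not the algorithm from our paper: it is the classic union-find style of region coloring with 4-connectivity, and the function name label_regions and the attribute layout are assumptions chosen for illustration. Region attributes (size, coordinate sums as first-order moments, bounding box) are accumulated and merged during the first pass, and a second pass is needed only if every pixel must carry its final label.

```python
def label_regions(image):
    """Label 4-connected foreground regions of a binary image (list of rows of 0/1).

    Returns (labels, attributes). attributes maps each final label to its
    size, coordinate sums (first-order moments), and bounding box.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]    # union-find forest; index 0 is the background
    stats = [None]  # per label: [size, sum_r, sum_c, min_r, min_c, max_r, max_c]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return ra
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra
        sa, sb = stats[ra], stats[rb]
        # Merge attributes here so no extra image scan is needed to join sub-regions.
        stats[ra] = [sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2],
                     min(sa[3], sb[3]), min(sa[4], sb[4]),
                     max(sa[5], sb[5]), max(sa[6], sb[6])]
        return ra

    # First pass: provisional labels plus running attributes.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                root = union(up, left)
            elif up or left:
                root = find(up or left)
            else:
                root = len(parent)
                parent.append(root)
                stats.append([0, 0, 0, r, c, r, c])
            labels[r][c] = root
            s = stats[root]
            s[0] += 1
            s[1] += r
            s[2] += c
            s[3], s[4] = min(s[3], r), min(s[4], c)
            s[5], s[6] = max(s[5], r), max(s[6], c)

    # Attributes are complete at this point, before any second pass.
    attributes = {
        lab: {"size": s[0], "sum_r": s[1], "sum_c": s[2],
              "bbox": (s[3], s[4], s[5], s[6])}
        for lab, s in enumerate(stats) if lab and parent[lab] == lab
    }

    # Optional second pass: replace provisional labels by their final roots.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels, attributes
```

For a U-shaped region, the two vertical arms receive different provisional labels and are joined when the scan reaches the bottom row; their attributes are merged at that moment, without rescanning the image.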


[Figures: data with random shifts (left); subspace cluster means (right)]




Shift Expectation Maximization



Multidimensional clustering usually assumes that data points are aligned with each other, but this does not hold in many real-world datasets. This poses a problem for clustering algorithms, especially ones that adopt a probabilistic mixture modeling approach. Misalignment of the data points tends to increase the estimated variances along the feature dimensions. It might also lead to an artificial increase in the number of mixture components required to fit the data. One such scenario arises in the clustering of action potentials (spikes) recorded from the brains of insects or monkeys, where the goal is to assign spikes arising from a single cell or unit to a unique cluster. Usually, different spikes from a single cell are not aligned with each other due to imperfections in the measurement device. Another scenario is the grouping of images of similar objects, with the objects at possibly different positions in those images. The Expectation Maximization framework for estimating the parameters of a mixture model provides a way to handle misalignments. This is done by introducing shift as a hidden variable in the generative model for the data and assuming that misalignments are due to unknown random shifts. We have realized this for the case of a Gaussian mixture model, assuming a finite number of discrete random shifts independent of the clusters. Clustering is performed in a reduced-dimensional space within which it is possible to align the data points by reversing the shifts that might have occurred when the points were being generated. The plot on the left above shows synthetic data points generated from a mixture of two Gaussians, with means resembling an upright and an inverted triangle in a 15-dimensional space, and then randomly shifted. The cluster means in a suitable data subspace obtained using our Shift Expectation Maximization algorithm are shown in the plot on the right.
We are planning to apply this to neuronal spike datasets recorded from insect brains to cluster them accounting for possible misalignments.
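As a rough illustration of treating the shift as a hidden variable, the sketch below computes the joint posterior over (cluster, shift) for a single data point, which is the quantity an E-step would need. It is not our implementation. It assumes circular shifts, a uniform prior over shifts, and one shared isotropic variance sigma2, and it works in the full data space rather than a reduced subspace. The names shift and joint_responsibilities are hypothetical.

```python
import math


def shift(vec, s):
    """Circularly shift vec to the right by s positions (s may be negative)."""
    n = len(vec)
    return [vec[(i - s) % n] for i in range(n)]


def joint_responsibilities(x, means, weights, shifts, sigma2):
    """Posterior p(cluster k, shift s | x) for one point x.

    Assumed model: x = shift(means[k], s) + isotropic Gaussian noise with
    variance sigma2, with shifts uniform and independent of the cluster.
    """
    log_post = {}
    for k, (mu, w) in enumerate(zip(means, weights)):
        for s in shifts:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, shift(mu, s)))
            # The Gaussian normalizing constant is the same for every (k, s)
            # under a shared variance, so it cancels and is omitted.
            log_post[(k, s)] = (math.log(w) - math.log(len(shifts))
                                - sq_dist / (2.0 * sigma2))
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {ks: math.exp(v - top) / total for ks, v in log_post.items()}
```

With an upright and an inverted triangle as the two means, a point generated by shifting the upright triangle by one position gets its largest responsibility at (cluster 0, shift 1); an M-step would then undo each candidate shift, weighted by these responsibilities, before updating the means.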


[Figures: initial clusters; final clusters]




Competitive Expectation Maximization



Expectation Maximization (EM) is a commonly used optimization technique in the machine learning community for handling missing data problems. A typical application of this algorithm is in fitting a probabilistic mixture model, e.g., a Gaussian Mixture Model (GMM), to a set of data points. To seek the Maximum Likelihood (ML) estimate of the parameters of the mixture model, the optimization problem is cast as a missing data problem and solved with EM. However, EM, being a hill-climbing technique, does not always lead to the global ML estimate. Depending on the starting point, it converges to a nearby local maximum, which makes it sensitive to initialization. Another problem in modeling a data set with a mixture model is automatically estimating the appropriate number of components. To address these issues, a technique called Competitive Expectation Maximization (CEM) was proposed in this paper. Working with Kobus Barnard, I have tried to implement a variant of CEM following the spirit of the paper. The above plots show the initial and final GMM configurations for a synthetic data set using our implementation of CEM. The final solution is close to the configuration of the GMM that generated the data. This approach can be applied to any kind of mixture model that uses EM as the optimization procedure, to automatically estimate the number of mixture components and seek the global ML solution with high probability. Using our implementation, the following clips help in visualizing the process of automatic selection of the number and configuration of clusters for the above synthetic data set, starting from two different randomly initialized configurations. Observe the sequence of cluster split, merge and annihilation operations as the algorithm proceeds in time!

From 1 to 8 clusters
From 20 to 8 clusters
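For readers unfamiliar with the underlying hill-climbing procedure, here is a minimal sketch of plain EM for a one-dimensional GMM. It is only the baseline that CEM builds on: the split, merge and annihilation operations are not shown, and the function names (normal_pdf, em_step, fit_gmm), the unit initial variances, and the small variance floor are assumptions made for this illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    k = len(weights)
    # E-step: responsibilities of each component for each point.
    resp, loglik = [], 0.0
    for x in data:
        joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        loglik += math.log(total)
        resp.append([p / total for p in joint])
    # M-step: re-estimate parameters from the weighted points.
    # A component whose total responsibility reaches zero is not handled here.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * x for r, x in zip(resp, data)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mj)
        new_v.append(max(vj, 1e-6))  # assumed floor to avoid a collapsed variance
    return new_w, new_m, new_v, loglik


def fit_gmm(data, init_means, iterations=20):
    """Run EM from the given initial means; the number of components stays fixed."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    logliks = []
    for _ in range(iterations):
        weights, means, variances, ll = em_step(data, weights, means, variances)
        logliks.append(ll)
    return weights, means, variances, logliks
```

The recorded log-likelihoods never decrease from one iteration to the next, which is exactly why the result depends on where the run starts. The number of components is fixed by init_means, which is the other limitation that CEM is meant to address.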


[Figure: annotated]




Semantic Evaluation of Features Using Word Prediction Performance



Research in the field of multimedia indexing and retrieval has tried to exploit the semantic information carried by keywords attached to images. There exist huge databases of images that come with words describing the context of each image. The semantic information carried by these words can be very helpful in organizing and indexing the data. Since the words describe the content of the images (individual objects or their characteristics), there is a correlation between them and the visual features computed from the images. Models have been introduced to extract this correlation structure between words and features with the help of clustering methods that learn the joint statistics of image words and segments. The fact that words and images can be modeled using a joint probability distribution has given rise to a new application called "auto-annotation", the process of attaching words to pictures automatically. The predicted words are indicative of scene semantics. This can be viewed as a method of general object recognition performed as machine translation from an object's visual representation to its verbal description. Annotation performance can be measured by comparing the predicted words to the words that are already associated with the test images. Furthermore, the availability of such labeled databases makes it possible to test performance on a large scale and obtain reliable performance measures. The focus of my work is to use this auto-annotation performance measure to evaluate suitable feature combinations. For further details, here are a few links:

Keiji Yanai, Nikhil V. Shirahatti, Prasad Gabbur, Kobus Barnard, Evaluation Strategies for Image Understanding and Retrieval, Proc. of ACM Multimedia Workshop on Multimedia Information Retrieval (MIR), Singapore, November, 2005 (Invited paper).

Kobus Barnard, Pinar Duygulu, Raghavendra Guru, Prasad Gabbur, David Forsyth, The effects of segmentation and feature choice in a translation model of object recognition, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 2, pp. 675-682, 2003.
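
As a simple illustration of comparing predicted words to the words already associated with a test image, the sketch below computes per-image precision and recall. This is an assumed, generic scoring scheme and not necessarily the measure used in the papers above; the function name word_prediction_scores and the example words are hypothetical.

```python
def word_prediction_scores(predicted, annotated):
    """Return (precision, recall) of predicted words against annotated words."""
    predicted, annotated = set(predicted), set(annotated)
    correct = len(predicted & annotated)
    # Guard against empty word lists rather than dividing by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical example: three predicted words, four annotated words.
precision, recall = word_prediction_scores(
    ["tiger", "grass", "sky"], ["tiger", "grass", "water", "cat"])
```

Averaging such scores over a large annotated test set gives one number per feature combination, which is what makes feature sets comparable on a large scale.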


[Figures: original; modified]




Modifications to the Normalized Cuts Segmentation Algorithm



Normalized Cuts is a grouping criterion, originally developed by the Berkeley segmentation research group, that aims to partition a set of points into coherent subsets. It follows a graph theoretic approach: given a similarity measure (affinity) between each pair of points in the set, it tries to group together points that have large affinity with each other. This criterion has been applied to image segmentation. One possible approach is to treat each pixel in an image as a point in some feature space and group together those pixels that are very similar to each other according to the chosen features. I have worked on modifying a few aspects of a version of the Normalized Cuts segmentation algorithm to achieve better grouping of regions in natural images. Results of this work have formed a part of my Masters thesis:

Prasad Gabbur, Quantitative evaluation of feature sets, segmentation algorithms, and color constancy algorithms using word prediction, University of Arizona, Electrical and Computer Engineering department, Masters thesis, 2003.
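
To make the grouping criterion described above concrete, the sketch below evaluates the standard two-way normalized cut value, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), for a given split of a small point set. It only scores a candidate partition; it does not search for the best one and it does not include any of my modifications. The function name ncut_value and the toy affinity matrix are assumptions for illustration.

```python
def ncut_value(affinity, group_a):
    """Normalized cut value of splitting points into group_a and the rest.

    affinity is a symmetric matrix (list of lists) of pairwise similarities.
    Both groups must be non-empty and have non-zero total affinity.
    """
    n = len(affinity)
    a = set(group_a)
    b = set(range(n)) - a
    # cut(A, B): total affinity of the edges removed by the split.
    cut = sum(affinity[i][j] for i in a for j in b)
    # assoc(X, V): total affinity from the points in X to all points.
    assoc_a = sum(affinity[i][j] for i in a for j in range(n))
    assoc_b = sum(affinity[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


# Toy example: points 0 and 1 are similar, as are points 2 and 3.
W = [[0.0, 1.0, 0.1, 0.1],
     [1.0, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.0, 1.0],
     [0.1, 0.1, 1.0, 0.0]]
good_split = ncut_value(W, [0, 1])  # keeps the similar pairs together
bad_split = ncut_value(W, [0, 2])   # separates the similar pairs
```

A lower value indicates a better grouping, so the split that keeps the high-affinity pairs together scores lower than the one that separates them.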


[Figures: test; result]








Face Detection/Tracking in Color Image Sequences



The idea of using physical attributes such as the face, fingerprints, voiceprints or any of several other characteristics to prove human identity has a lot of appeal. Any trait of human beings that is unique and sufficiently stable can serve as a distinguishing measure for verifying, recognizing or classifying them. The face is one such attribute that clearly distinguishes different individuals. In fact, it is the attribute most commonly used by the human visual system to identify people. This explains why research has been aimed at developing computational systems for automatic face recognition. Automatic face recognition is the process of matching a test face image to one of the faces stored in a prepared face database. Real world images need not contain isolated faces that can directly serve as inputs to a face recognition (FR) system. Hence, there is a need to isolate or segment facial regions to be fed to an FR system. It may seem that face detection is a trivial task. After all, we human beings do this in our daily lives without any effort. The human visual system can easily detect and differentiate a human face from its surroundings, but it is not easy to train a computer to do so. My work involves detection/segmentation and tracking of faces in color image sequences with complex backgrounds. Skin color is the main cue for detecting faces, with appropriate mixture models used to estimate the underlying skin color distribution. For further details, here are a few links:

Prem Kuchi, Prasad Gabbur, P. Subbanna Bhat, Sumam David S., Human Face Detection and Tracking using Skin Color Modeling and Connected Component Operators, IETE Jl. of Research, Vol. 38, No. 3&4, pp. 289-293, May-Aug 2002.

Prasad Gabbur, Detection and Segmentation of Human Faces in Color Images with Complex Backgrounds, detailed project report as a requirement for the ECE532 Computer Vision course, Fall 2001.


[Figures: canonical; unknown]







Color Constancy in a Translation Model of Object Recognition

Color is an extremely useful feature for characterizing and recognizing objects. It has been studied in detail for specific recognition tasks such as skin detection. Color is also possibly the most useful of the features typically used in a content-based image retrieval (CBIR) system. However, the color of a scene depends on the color of the light illuminating it. In other words, the same object can appear to be differently colored if viewed under lights having different spectral components (different colors). This poses a difficulty for systems that use color as a cue in recognizing objects. Two different approaches can be taken to deal with this problem. One is to make the system learn about the different lighting conditions it can encounter by presenting it with exemplars under those conditions. The other is to remove the effects of illumination color and obtain an illumination independent description of the scene. This is essentially the goal of computational color constancy algorithms. These algorithms attempt to estimate the illumination color of a scene or obtain an illumination independent description of the scene that more precisely reflects its physical content. I am working on evaluating the two approaches using the same translation model of object recognition. In this model, objects in a scene are recognized by predicting words for the scene automatically, given a set of visual descriptors for the scene. The translation model learns the correlation between visual descriptors and words using a large annotated image database. Preliminary evaluation results using simple color constancy algorithms and details of the experiments can be found in the following paper:

Kobus Barnard, Prasad Gabbur, Color and Color Constancy in a Translation Model for Object Recognition, Proc. IS&T/SID 11th Color Imaging Conference, pp. 364-369, 2003.
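
As an example of what a simple color constancy algorithm can look like, here is a minimal sketch of the classic gray-world method. Choosing gray world here is my assumption for illustration; the paragraph above does not name the algorithms that were evaluated. The method assumes that the average color of a scene is gray, takes the per-channel means as an estimate of the illuminant, and rescales each channel so that the means become equal. The function name gray_world is hypothetical.

```python
def gray_world(pixels):
    """Apply gray-world correction to a list of (r, g, b) pixel tuples."""
    n = len(pixels)
    # Per-channel means act as the illuminant estimate.
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    gray = sum(means) / 3.0
    # Scale each channel so that its mean matches the overall gray level.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(p[ch] * gains[ch] for ch in range(3)) for p in pixels]


# Hypothetical example: two pixels with a strong reddish cast.
corrected = gray_world([(200, 100, 50), (100, 50, 25)])
```

After the correction, all three channel means are equal, which removes a global color cast when the gray-world assumption holds and distorts colors when it does not.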


This page is always under construction :)