iGroup: Weakly supervised image and video grouping
Gilbert, A and Bowden, R. iGroup: Weakly supervised image and video grouping. In: ICCV 2011, 2011-11-06 - 2011-11-13, Barcelona, Spain.
Gilbert_ICCV11_Paper.pdf - Accepted version Manuscript
Available under License : See the attached licence file.
We present a generic, efficient and iterative algorithm for interactively clustering classes of images and videos. The approach moves away from the use of large hand-labelled training datasets, instead allowing the user to find natural groups of similar content based upon a handful of “seed” examples. Two efficient data mining tools originally developed for text analysis, min-Hash and APriori, are used and extended to achieve both speed and scalability on large image and video datasets. Inspired by the Bag-of-Words (BoW) architecture, the idea of an image signature is introduced as a simple descriptor on which nearest neighbour classification can be performed. The image signature is then dynamically expanded to identify common features amongst samples of the same class. The iterative approach uses APriori to identify common and distinctive elements of a small set of labelled true and false positive signatures. These elements are then accentuated in the signature to increase similarity between examples and “pull” positive classes together. By repeating this process, the accuracy of similarity increases dramatically despite only a few training examples: only 10% of the labelled groundtruth is needed compared to other approaches. The approach is tested on two image datasets, including the Caltech101 dataset, and on three state-of-the-art action recognition datasets. On the YouTube video dataset the accuracy increases from 72% to 97% using only 44 labelled examples from a dataset of over 1200 videos. The approach is both scalable and efficient, with an iteration on the full YouTube dataset taking around 1 minute on a standard desktop machine.
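As a rough illustration of the signature idea described in the abstract, the sketch below builds a min-Hash signature from a set of visual-word IDs and assigns a query to the most similar “seed” example. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the universal hash family, and the representation of an image as a set of integer word IDs are all hypothetical choices made for illustration, and the APriori-based accentuation step is not shown.

```python
import random
from typing import Dict, List, Set, Tuple

# Assumption: visual-word IDs are non-negative integers smaller than PRIME.
PRIME = 2_147_483_647  # large prime used by the illustrative hash family


def make_hash_params(num_hashes: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for hash functions h(w) = (a * w + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(words: Set[int],
                      params: List[Tuple[int, int]]) -> List[int]:
    """Return the minimum hash value per hash function for a non-empty set."""
    return [min((a * w + b) % PRIME for w in words) for a, b in params]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature entries, an estimate of Jaccard overlap."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def nearest_seed(query_sig: List[int],
                 seed_sigs: Dict[str, List[int]]) -> str:
    """Return the label of the seed signature most similar to the query."""
    return max(seed_sigs,
               key=lambda label: estimated_similarity(query_sig, seed_sigs[label]))
```

In use, one would compute `minhash_signature` once per image or video with a shared `params` list, then call `nearest_seed` against the handful of labelled seed signatures. Comparing fixed-length signatures rather than full feature sets is what keeps the nearest neighbour step cheap on large datasets.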
|Item Type:||Conference or Workshop Item (Conference Poster)|
|Divisions:||Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing|
|Identification Number:||10.1109/ICCV.2011.6126493|
|Additional Information:||Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Depositing User:||Symplectic Elements|
|Date Deposited:||21 May 2012 10:17|
|Last Modified:||23 Sep 2013 19:24|