iGroup: Weakly supervised image and video grouping

Gilbert, A and Bowden, R (2011) iGroup: Weakly supervised image and video grouping. In: ICCV 2011, 2011-11-06 - 2011-11-13, Barcelona, Spain.

Files:
SRI_deposit_agreement.pdf (licence, 32 kB)
Gilbert_ICCV11_Paper.pdf (deleted; restricted to repository staff only, 4 MB)
aaFinalSubmittedICCV11WithAck.pdf (available under licence, see the attached licence file, 4 MB)

Abstract

We present a generic, efficient and iterative algorithm for interactively clustering classes of images and videos. The approach moves away from the use of large hand-labelled training datasets, instead allowing the user to find natural groups of similar content based upon a handful of “seed” examples. Two efficient data mining tools originally developed for text analysis, min-Hash and APriori, are used and extended to achieve both speed and scalability on large image and video datasets. Inspired by the Bag-of-Words (BoW) architecture, the idea of an image signature is introduced as a simple descriptor on which nearest-neighbour classification can be performed. The image signature is then dynamically expanded to identify common features amongst samples of the same class. The iterative approach uses APriori to identify common and distinctive elements of a small set of labelled true and false positive signatures. These elements are then accentuated in the signature to increase similarity between examples and “pull” positive classes together. By repeating this process, the accuracy of similarity increases dramatically despite only a few training examples: only 10% of the labelled ground truth required by other approaches is needed. The method is tested on two image datasets, including the Caltech101 [9] dataset, and on three state-of-the-art action recognition datasets. On the YouTube [18] video dataset, accuracy increases from 72% to 97% using only 44 labelled examples from a dataset of over 1200 videos. The approach is both scalable and efficient, with an iteration on the full YouTube dataset taking around 1 minute on a standard desktop machine.
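
The two mechanics the abstract leans on, min-Hash signatures whose collision rate approximates set similarity, and accentuating visual words that labelled positives share, can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' implementation: the integer visual-word ids, the VOCAB constant and the duplicate-copies accentuation scheme are assumptions for illustration, and the APriori mining step that would supply frequent_words is omitted.

    import random

    K = 100                # signature length (number of min-Hash functions)
    PRIME = 2_147_483_647  # large prime for the universal hash family
    VOCAB = 100_000        # assumed visual-word vocabulary size
    random.seed(0)
    HASHES = [(random.randrange(1, PRIME), random.randrange(PRIME))
              for _ in range(K)]

    def min_hash_signature(words):
        # words: set of integer visual-word ids describing one image or video
        return [min((a * w + b) % PRIME for w in words) for a, b in HASHES]

    def similarity(sig_a, sig_b):
        # fraction of agreeing entries estimates the Jaccard similarity
        # of the underlying visual-word sets
        return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

    def accentuate(words, frequent_words, copies=2):
        # hypothetical accentuation step: words found common and distinctive
        # among labelled positives get extra weighted copies (fresh ids above
        # VOCAB), so positive examples collide more often under min-Hash
        boosted = set(words)
        for w in words & frequent_words:
            for c in range(1, copies):
                boosted.add(c * VOCAB + w)
        return boosted

    # toy usage: two clips sharing 3 of 5 visual words, true Jaccard = 0.6
    a = {1, 5, 7, 42}
    b = {1, 5, 7, 99}
    print(similarity(min_hash_signature(a), min_hash_signature(b)))  # ~0.6

Because the signature has constant length regardless of dataset size, comparing and re-hashing all items after each accentuation round stays cheap, which is consistent with the reported one-minute iteration over the full YouTube dataset.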

Item Type: Conference or Workshop Item (Poster)
Additional Information: Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 29 Oct 2013 11:30
Last Modified: 29 Oct 2013 11:30
URI: http://epubs.surrey.ac.uk/id/eprint/802867
