University of Surrey


An evaluation of bags-of-words and spatio-temporal shapes for action recognition

De Campos, T, Barnard, M, Mikolajczyk, K, Kittler, J, Yan, F, Christmas, W and Windridge, D (2011) An evaluation of bags-of-words and spatio-temporal shapes for action recognition. In: 2011 IEEE Workshop on Applications of Computer Vision (WACV), 2011-01-05 - 2011-01-07.


Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two very popular approaches for action recognition from video. The former (BoW) is an unstructured global representation of a video, built from a large set of local features. The latter (STS) uses a single feature located on a region of interest in the video (where the actor is). Despite the popularity of these methods, no comparison between them has been carried out. Moreover, because BoW and STS differ intrinsically in terms of context inclusion and globality/locality of operation, an appropriate evaluation framework has to be designed carefully. This paper compares the two approaches on four datasets with varying degrees of space-time specificity of the actions and varying relevance of the contextual background. We use the same local feature extraction method and the same classifier for both approaches. In addition to BoW and STS, we also evaluate novel variations of BoW constrained in time or space. We observe that the STS approach leads to better results in all datasets whose background is of little relevance to action classification.
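
To make the distinction between a global BoW and a BoW constrained in space or time more concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes that each local feature is a pair of an (x, y, t) location and a descriptor vector, that a codebook of visual words already exists, and that the constraint is an axis-aligned space-time box. The names `bow_histogram`, `constrained_bow`, `inside` and the toy data are hypothetical and chosen only for this example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalised histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An empty set of descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def inside(location, box):
    """True if an (x, y, t) location lies within an axis-aligned space-time box."""
    x, y, t = location
    x0, y0, t0, x1, y1, t1 = box
    return x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1


def constrained_bow(features, codebook, box):
    """BoW built only from features located inside the box (assumed form of constraint)."""
    kept = [desc for loc, desc in features if inside(loc, box)]
    return bow_histogram(kept, codebook)


# Toy data: two codewords, three features given as ((x, y, t), descriptor).
codebook = [(0.0, 0.0), (1.0, 1.0)]
features = [
    ((5, 5, 1), (0.1, 0.0)),
    ((6, 5, 2), (0.9, 1.1)),
    ((50, 50, 9), (1.0, 0.9)),  # far from the actor, e.g. background
]

# Global BoW: every feature in the video contributes, background included.
global_hist = bow_histogram([desc for _, desc in features], codebook)

# Constrained BoW: only features inside a region around the actor contribute.
local_hist = constrained_bow(features, codebook, (0, 0, 0, 10, 10, 5))
```

In this toy setting the global histogram is influenced by the background feature, while the constrained histogram ignores it. This mirrors, in a very simplified way, the abstract's point that the relevance of the contextual background affects which representation is preferable.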

Item Type: Conference or Workshop Item (Paper)
Additional Information: © 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Divisions: Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Depositing User: Symplectic Elements
Date Deposited: 27 Feb 2012 11:37
Last Modified: 23 Sep 2013 18:58
