University of Surrey


Hollywood 3D: What are the best 3D features for Action Recognition?

Hadfield, Simon, Lebeda, K and Bowden, Richard (2016) Hollywood 3D: What are the best 3D features for Action Recognition? International Journal of Computer Vision, 121 (1). pp. 95-110.


preprint.pdf - Accepted version Manuscript
Restricted to Repository staff only
Available under License : See the attached licence file.



Action recognition “in the wild” is extremely challenging, particularly when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent growth of 3D data from broadcast content and commercial depth sensors makes it possible to overcome this. However, there is little work examining the best way to exploit this new modality. In this paper we introduce the Hollywood 3D benchmark, the first dataset of “in the wild” action footage that includes 3D data. The dataset consists of 650 stereo video clips across 14 action classes, taken from Hollywood movies. We provide stereo calibrations and depth reconstructions for each clip. We also provide an action recognition pipeline and propose a number of specialised depth-aware techniques, including five interest point detectors and three feature descriptors. Extensive tests allow evaluation of different appearance and depth encoding schemes. Our novel techniques that exploit this depth reach performance levels more than triple those of the best baseline algorithm using only appearance information. The benchmark data, code and calibrations are all made available to the community.

Item Type: Article
Subjects : Electronic Engineering
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors :
Hadfield, Simon
Lebeda, K
Bowden, Richard
Date : 21 June 2016
DOI : 10.1007/s11263-016-0917-2
Grant Title : Learning to Recognise Dynamic Visual Content from Broadcast Footage
Copyright Disclaimer : © The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Uncontrolled Keywords : Action recognition · in the wild · 3D · structure · depth · 3D motion · Hollywood 3D · benchmark
Depositing User : Symplectic Elements
Date Deposited : 24 May 2016 15:48
Last Modified : 11 Dec 2018 11:22


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800