University of Surrey


Scalable online annotation & object localisation for broadcast media production.

Gray, Charles (2016) Scalable online annotation & object localisation for broadcast media production. Doctoral thesis, University of Surrey.

draft_final_charles_gray_mphil_corrections.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.



More video content is being produced by production companies and professional videographers than ever before, thanks to the adoption of digital media technologies at every stage of the production pipeline. With hundreds of hours of footage being captured by even a small production company, organising and searching these collections has become a very challenging and time-consuming task. This thesis investigates online video annotation for broadcast media production, including scalable video concept detection and object localisation. Most production tools and research focus on asset management of large-scale video collections; we also focus on making sense of the content within an individual production video by extracting salient metadata and localising objects.

We present a scalable semantic video concept detection framework, applied to automated metadata annotation (video logging) in a broadcast production environment. Video logging demands both accurate and fast concept detection. Whilst research often focuses on the former, the latter is essential in practical scenarios where days of footage may be shot per broadcast episode and production is dependent on the immediate availability of metadata. We present a hierarchical classification framework that delivers benefits to both through two contributions. First, a dynamic weighting scheme for combining video features from multiple modalities, enabling higher detection accuracy over diverse production footage. Second, a hierarchical classification strategy that exploits ontological relationships between concepts to scale sub-linearly with the number of classes, yielding a real-time solution. We demonstrate an end-to-end production system using a cloud-based architecture with our detection framework.

We also describe a novel, fully automatic algorithm for identifying salient objects in video based on their motion.
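The sub-linear scaling claim can be illustrated with a minimal sketch (this is not the thesis implementation; the ontology, classifier interface, and threshold are all hypothetical): child concept classifiers are only evaluated when their parent concept fires, so whole subtrees of the ontology are pruned per query.

```python
# Illustrative sketch, NOT the thesis code: hierarchical concept
# detection over an ontology tree. A child classifier is evaluated
# only if its parent's score passes the threshold, so the number of
# evaluations grows sub-linearly with the total number of concepts.

def detect_concepts(features, node, classifiers, threshold=0.5):
    """Return the set of concept labels detected under `node`.

    `node` is a (label, children) pair encoding the ontology;
    `classifiers` maps each label to a scoring function returning a
    confidence in [0, 1] (a hypothetical interface).
    """
    label, children = node
    detected = set()
    if classifiers[label](features) < threshold:
        return detected              # prune this entire subtree
    detected.add(label)
    for child in children:           # only reached if the parent fired
        detected |= detect_concepts(features, child, classifiers, threshold)
    return detected


# Toy ontology: "animal" has children "dog" and "cat".
ontology = ("animal", [("dog", []), ("cat", [])])
classifiers = {
    "animal": lambda f: f.get("fur", 0.0),
    "dog":    lambda f: f.get("bark", 0.0),
    "cat":    lambda f: f.get("meow", 0.0),
}

result = detect_concepts({"fur": 0.9, "bark": 0.8}, ontology, classifiers)
```

Here the "cat" classifier is evaluated but scores zero, while in a deeper ontology a failed parent would skip its descendants entirely.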
Spatially coherent clusters of optical flow vectors are sampled to generate estimates of affine motion parameters local to super-pixels identified within each frame. These estimates, combined with spatial data, form coherent point distributions in a 5D solution space corresponding to objects or parts thereof. These distributions are temporally de-noised using a particle filtering approach, and clustered to estimate the position and motion parameters of salient moving objects in the clip. We demonstrate localisation of salient objects in a variety of clips exhibiting moving and cluttered backgrounds.
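The 5D clustering step can be sketched as follows. This is a simplified stand-in, not the thesis algorithm: each super-pixel contributes a point combining its centroid (x, y) with local motion parameters (reduced here to translation tx, ty and a scale change s), and a naive k-means replaces the particle-filtered clustering described above. All dimensions, data, and parameter choices are illustrative assumptions.

```python
# Illustrative sketch, NOT the thesis code: points from the same
# moving object form a coherent cluster in a 5D (x, y, tx, ty, s)
# space; clustering them recovers per-object position and motion.
import numpy as np

def cluster_5d(points, k=2, iters=20):
    """Naive k-means over 5D (x, y, tx, ty, s) points."""
    # Deterministic initialisation: spread seeds across the data.
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid from its members.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids


# Two synthetic "objects": one moving right, one moving up-left,
# each with small noise on its 5D points (hypothetical data).
rng = np.random.default_rng(1)
obj_a = [10.0, 10.0, 3.0, 0.0, 1.0] + 0.1 * rng.normal(size=(20, 5))
obj_b = [50.0, 40.0, 0.0, -2.0, 1.0] + 0.1 * rng.normal(size=(20, 5))
points = np.vstack([obj_a, obj_b])
labels, centroids = cluster_5d(points)
```

The recovered centroids then summarise each object's position and motion parameters; in the thesis these estimates are additionally de-noised over time with a particle filter.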

Item Type: Thesis (Doctoral)
Subjects : Video concept detection; Multi-modal fusion; Video object localisation
Divisions : Theses
Authors : Gray, Charles
Date : 30 November 2016
Funders : Sony Broadcast Professional Research Labs
Contributors : Collomosse
Depositing User : Charles Gray
Date Deposited : 15 Dec 2016 08:55
Last Modified : 16 Jan 2019 17:09




