University of Surrey


Deep Architectures and Ensembles for Semantic Video Classification

Ong, Eng-Jon, Husain, Sameed, Bober-Irizar, Mikel and Bober, Miroslaw (2018) Deep Architectures and Ensembles for Semantic Video Classification. IEEE Transactions on Circuits and Systems for Video Technology.

Deep Architectures and Ensembles for Semantic Video Classification.pdf - Accepted version Manuscript



This work addresses the problem of accurate semantic labelling of short videos. To this end, we evaluate a multitude of different deep nets, including traditional recurrent neural networks (LSTM, GRU), temporal-agnostic networks (FV, VLAD, BoW), fully connected neural networks with mid-stage AV fusion, and others. We also propose a residual architecture-based DNN for video classification, with state-of-the-art classification performance at significantly reduced complexity. Furthermore, we propose four new approaches to diversity-driven multi-net ensembling, one based on a fast correlation measure and three incorporating a DNN-based combiner. We show that significant performance gains can be achieved by ensembling diverse nets, and we investigate factors contributing to high diversity. Using the extensive YouTube8M dataset, we provide an in-depth evaluation and analysis of the nets' behaviour. We show that the performance of the ensemble is state-of-the-art, achieving the highest accuracy on the YouTube8M Kaggle test data. The ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets, where the resulting method achieves accuracy comparable with state-of-the-art methods using similar input features.
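The idea of correlation-driven ensembling mentioned in the abstract can be illustrated with a minimal sketch. This is not the method from the paper: the use of Pearson correlation over flattened prediction scores, the greedy selection rule, and the function names `pairwise_correlation`, `select_diverse` and `average_ensemble` are all assumptions made for illustration only.

```python
import numpy as np


def pairwise_correlation(preds):
    """Pearson correlation between the models' flattened prediction scores.

    preds has shape (n_models, n_samples, n_classes).
    Returns an (n_models, n_models) matrix.
    """
    flat = preds.reshape(preds.shape[0], -1)
    return np.corrcoef(flat)


def select_diverse(preds, scores, k):
    """Greedy selection (an illustrative assumption, not the paper's rule).

    Start from the model with the best validation score, then repeatedly
    add the model whose mean correlation with the already selected
    models is lowest.
    """
    corr = pairwise_correlation(preds)
    selected = [int(np.argmax(scores))]
    while len(selected) < k:
        remaining = [m for m in range(len(scores)) if m not in selected]
        mean_corr = [corr[m, selected].mean() for m in remaining]
        selected.append(remaining[int(np.argmin(mean_corr))])
    return selected


def average_ensemble(preds, members):
    """Combine the chosen models by averaging their prediction scores."""
    return preds[members].mean(axis=0)


# Toy data: model 1 is a near-duplicate of model 0, model 2 is independent.
rng = np.random.default_rng(0)
base = rng.random((50, 4))
preds = np.stack([
    base,
    base + 1e-3 * rng.random((50, 4)),
    rng.random((50, 4)),
])
scores = [0.90, 0.89, 0.80]  # hypothetical validation scores

members = select_diverse(preds, scores, k=2)
combined = average_ensemble(preds, members)
```

In this toy setting the near-duplicate model is skipped in favour of the less correlated one, even though its individual score is higher, which is the intuition behind preferring diverse nets when building an ensemble.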

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering
Authors :
Husain, Sameed
Bober-Irizar, Mikel
Date : 2018
Funders : Engineering and Physical Sciences Research Council (EPSRC)
Copyright Disclaimer : © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords : Computer Vision; Artificial Neural Networks; Machine Learning Algorithms
Depositing User : Clive Harris
Date Deposited : 15 Nov 2018 13:52
Last Modified : 11 Dec 2018 11:24

