University of Surrey

The MediaMill TRECVID 2009 Semantic Video Search Engine

Snoek, C, Sande, K, Rooij, O, Huurnink, B, Uijlings, J, Liempt, M, Bugalho, M, Trancoso, I, Yan, F, Tahir, M, Mikolajczyk, K, Kittler, J, Rijke, M, Geusebroek, J, Gevers, T, Worring, M, Koelma, D and Smeulders, A (2009) The MediaMill TRECVID 2009 Semantic Video Search Engine

Abstract

In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by including 20 audio concepts and fusion using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-word representations, a GPU implementation, and compute clusters, we scale-up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and relevance feedback mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2009 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.

Item Type: Conference or Workshop Item (Conference Paper)
Divisions : Faculty of Engineering and Physical Sciences > Electronic Engineering > Centre for Vision Speech and Signal Processing
Authors : Snoek, C; Sande, K; Rooij, O; Huurnink, B; Uijlings, J; Liempt, M; Bugalho, M; Trancoso, I; Yan, F; Tahir, M; Mikolajczyk, K; Kittler, J; Rijke, M; Geusebroek, J; Gevers, T; Worring, M; Koelma, D; Smeulders, A
Author Email : Unspecified for all authors
Author ORCID : Unspecified for all authors
Date : 2009
Depositing User : Symplectic Elements
Date Deposited : 14 Dec 2012 10:23
Last Modified : 09 Jun 2014 13:14
URI: http://epubs.surrey.ac.uk/id/eprint/733282
