University of Surrey


Shot Descriptors for Video Temporal Decomposition.

Sidiropoulos, Panagiotis. (2012) Shot Descriptors for Video Temporal Decomposition. Doctoral thesis, University of Surrey (United Kingdom).

Available under License Creative Commons Attribution Non-commercial Share Alike.



Video temporal decomposition is an essential element of a variety of video processing applications, from semantic indexing and classification to non-linear browsing, video summarization and video retrieval. The decomposition is traditionally conducted using shots as the structural units of the video. However, while shots are video segments that can be explicitly defined, they lack semantic meaning. Scenes, on the other hand, are generally defined as the elementary semantic video units, and are expected to generate more meaningful video representations and to enhance the performance of video processing applications that employ temporal decomposition. Before scene segmentation can replace shot segmentation, however, it needs to reach the high performance levels of the latter. This thesis aims to provide directions towards this goal, first by identifying some of the main current limitations of video scene segmentation and then by suggesting ways to overcome them.

More specifically, four main limitations have been identified. The first is the ambiguity in the definition of what a scene is, which is an inherent characteristic of the domain: the general definition of a scene as the elementary semantic unit is interpreted differently depending on the video genre, the application, etc. The second is the semantic gap between what makes two shots belong to the same scene and the available scene descriptors. Scenes are formed by links between pairs of neighboring shots that are similar in content, yet this content similarity cannot be efficiently modeled by the low-level descriptors that the community typically uses for this purpose. The third is the limited scalability of existing scene segmentation algorithms: it appears difficult to generalize and efficiently tune scene segmentation approaches not only for videos of multiple genres but even for a small number of videos from the same genre. The fourth is the lack of a uni-dimensional evaluation measure that would efficiently gauge the performance of an automatic scene segmentation system.

This thesis includes the development of a novel approach to evaluating video temporal decomposition algorithms, which is not only effective in evaluating scene segmentation techniques and in helping to optimize their parameters, but also satisfies a number of qualitative prerequisites that previous measures do not. Furthermore, the novel measure is proven to be a metric, a property that can be used to alleviate the effects of the ambiguity in the scene definition. Subsequently, a scheme that fully exploits the scene discrimination potential of shot descriptors derived from both the visual and the audio modalities is presented, followed by the introduction of a number of novel shot descriptors. These employ high-level features automatically extracted from the visual and the auditory channels, which are shown to contribute towards improved segmentation of video into scenes. Finally, conclusions and future work complete this thesis.

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Sidiropoulos, Panagiotis.
Date : 2012
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2012.
Depositing User : EPrints Services
Date Deposited : 14 May 2020 14:16
Last Modified : 14 May 2020 14:19

