University of Surrey

Novel Image Representations For Visual Categorisation With Bag-of-Words.

Koniusz, Piotr. (2013) Novel Image Representations For Visual Categorisation With Bag-of-Words. Doctoral thesis, University of Surrey (United Kingdom).

Available under License Creative Commons Attribution Non-commercial Share Alike.


Visual Category Recognition aims at fast classification of objects, as well as scenery, action, and semantically complex concepts in collections of unannotated images. Its applications include security and crime prevention, rapid selection of content for efficient media practices, television and press archives, organisation of visual content in social media, e-commerce, robotic recognition, and many more. A variety of approaches to visual categorisation exist. However, due to the complex nature of visual appearances and the complex taxonomy of objects, a simplifying statistical model developed for natural language processing, called Bag-of-Words, is typically used. In such a model, descriptors are extracted from images at keypoint locations and then expressed as vectors representing visual word appearances, referred to as mid-level features. A pooling step is carried out to transform the mid-level features from an image into a final vectorial representation called an image signature. Finally, a classifier is applied.

Segmentation-based interest points for matching and recognition are investigated first. Two simple methods for extracting features from the segmentation maps are proposed. They focus on the boundaries and the centres of gravity of the segments. Segmentation-based image descriptors are proposed next. They are extracted from pairs of adjacent regions of an unsupervised segmentation. Thus, semi-local structural appearances are exploited, which limits the contribution of uniform regions.

A highly popular technique for coding the local image descriptors in Bag-of-Words, called Soft Assignment, is combined with Linear Coordinate Coding to minimise its quantisation loss, which strongly correlates with the best classification performance. An approach that introduces spatial information to Bag-of-Words, called Spatial Coordinate Coding, is proposed. It reduces the size of mid-level features tenfold. Moreover, as dominant orientations of edges and colour are sources of bias in images, we learn them at multiple levels of coarseness by Dominant Angle and Colour Pyramid Matching.

A number of techniques for generating mid-level features, as well as various pooling methods that aggregate mid-level features into image signatures, are investigated. We generalise these pooling methods to account for the descriptor interdependence and introduce an improved pooling that addresses noise effects in mid-level features. Bag-of-Words typically extracts first-order statistics from mid-level features. To improve recognition, aggregation over co-occurrences of visual words in mid-level features is proposed. An appropriate derivation is provided and various likelihood-inspired pooling operators are investigated. Moreover, an extension to multiple modalities is proposed.
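
The generic Bag-of-Words pipeline summarised in the abstract (descriptors, mid-level features, pooling, image signature) can be illustrated with a minimal sketch. It is not the thesis's formulation: the Gaussian soft-assignment kernel, the `sigma` value, the random stand-in codebook and descriptors, and the function names `soft_assign`, `pool`, and `cooccurrence_pool` are all assumptions made for illustration. In practice the codebook would be learned (for example by k-means) and the descriptors would come from a real feature extractor.

```python
import numpy as np


def soft_assign(descriptors, codebook, sigma=1.0):
    """Code each descriptor as soft memberships over K visual words.

    Assumed form: a Gaussian kernel on squared Euclidean distance,
    normalised so that each row sums to one.
    """
    # Pairwise squared distances, shape (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum before exponentiating for numerical stability;
    # this does not change the normalised result.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)


def pool(codes, how="avg"):
    """Aggregate mid-level features (N, K) into a first-order signature (K,)."""
    if how == "avg":
        return codes.mean(axis=0)
    if how == "max":
        return codes.max(axis=0)
    raise ValueError(f"unknown pooling: {how}")


def cooccurrence_pool(codes):
    """Second-order signature: average outer product of mid-level features.

    Only the upper triangle is kept because the matrix is symmetric.
    """
    m = codes.T @ codes / codes.shape[0]
    return m[np.triu_indices(m.shape[0])]


# Hypothetical data: 200 descriptors of dimension 16 and 8 visual words.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 16))
codebook = rng.normal(size=(8, 16))

codes = soft_assign(descriptors, codebook)   # mid-level features
sig_avg = pool(codes, "avg")                 # first-order, average pooling
sig_max = pool(codes, "max")                 # first-order, max pooling
sig_co = cooccurrence_pool(codes)            # co-occurrence statistics
```

The resulting signature (`sig_avg`, `sig_max`, or `sig_co`) is what would be passed to a classifier. The co-occurrence signature has K(K+1)/2 entries rather than K, which is the usual cost of moving from first-order to second-order statistics.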

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Koniusz, Piotr.
Date : 2013
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2013.
Depositing User : EPrints Services
Date Deposited : 06 May 2020 12:07
Last Modified : 06 May 2020 12:13
