University of Surrey


Learning a Structured Model For Visual Category Recognition.

Gupta, Ashish. (2013) Learning a Structured Model For Visual Category Recognition. Doctoral thesis, University of Surrey (United Kingdom).

Available under License Creative Commons Attribution Non-commercial Share Alike.



This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct set of prototypes from training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge in classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in the literature disregard this structure when computing a visual model. Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure.
The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to issues of uncertainty and plausibility in the assignment of descriptors to dictionary elements. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that are able to model the local feature vector distributions and provide performance benefits.
To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces. A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold. In a similar manner, co-clustering is used with an encoded feature data matrix to compute groups of dictionary elements, referred to as ‘topics’. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the co-clustered groups of sub-spaces and dictionary elements have semantic relevance.
All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two matrices, which are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the dictionary or coefficient matrices. With regard to sub-space embedding, sparse principal component analysis is one such method that induces sparsity amongst the sub-spaces selected to represent each descriptor. Similarly, a sparsity-inducing regularization method called Lasso is used for feature encoding, which uses only a sub-set of dictionary elements to represent each image. While these methods are effective, they disregard structure in the data matrix. To improve on this, structured sparse principal component analysis is used in conjunction with co-clustered groups of sub-spaces to induce sparsity at the group level. The resultant structured sparse sub-manifold dictionary is demonstrated to provide performance benefits. In a similar manner, group Lasso is used with co-clustered groups of dictionary elements to induce sparsity in terms of topics. The structured sparse encoding is demonstrated to improve aggregate performance in comparison to regular sparse coding.
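
The difference between Lasso and group Lasso encoding described above can be illustrated with a small sketch. It is not the thesis implementation: it assumes a fixed dictionary `D` with unit-norm columns, a hand-picked grouping of dictionary elements standing in for co-clustered topics, and a plain proximal-gradient (ISTA) solver. The names `encode`, `soft_threshold`, `group_soft_threshold`, `lam` and `groups` are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_soft_threshold(v, groups, t):
    """Block soft-thresholding: each group's sub-vector is shrunk as a unit."""
    out = np.zeros_like(v)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > t:
            out[idx] = (1.0 - t / norm) * v[idx]
    return out

def encode(x, D, lam, groups=None, n_iter=500):
    """Minimise 0.5 * ||x - D a||^2 + lam * penalty(a) by proximal gradient.

    The penalty is the L1 norm (Lasso) when groups is None, otherwise the
    sum of per-group L2 norms (group Lasso).
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))  # gradient step on the data-fit term
        if groups is None:
            a = soft_threshold(z, step * lam)
        else:
            a = group_soft_threshold(z, groups, step * lam)
    return a

# Toy data (assumption): 6 dictionary elements split into two "topics" of 3.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 6))
D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary columns
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
x = D[:, :3] @ np.array([1.0, -0.5, 0.8])  # descriptor built from the first topic only

a_lasso = encode(x, D, lam=0.1)                 # sparsity over individual elements
a_group = encode(x, D, lam=0.1, groups=groups)  # sparsity over whole topics
```

Stacking such coefficient vectors as columns of a matrix `A` gives the factorization view of the abstract, with the data matrix approximated by `D @ A`. The only change between the two encoders is the proximal step, which is where the group structure enters.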
In conclusion, this thesis estimates structure in descriptor sub-spaces and the learnt dictionary, uses co-clustering to compute semantically relevant sub-manifolds and a topic dictionary, and finally incorporates the estimated structure in sparse coding methods, demonstrating a performance gain.
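
The hard-partitioning issue raised in the abstract can be made concrete with a short sketch contrasting hard assignment with soft memberships. It uses the standard fuzzy c-means membership formula as a stand-in, since the abstract does not specify the thesis's own fuzzy algorithms. The function names, the fuzzifier `m` and the toy centres are assumptions for illustration only.

```python
import numpy as np

def pairwise_distances(X, C):
    """Euclidean distance from every descriptor (row of X) to every centre (row of C)."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)

def hard_assign(X, C):
    """Hard partitioning: each descriptor goes to its single nearest centre."""
    return pairwise_distances(X, C).argmin(axis=1)

def fuzzy_memberships(X, C, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships: u_ij is proportional to d_ij ** (-2 / (m - 1)).

    Each row sums to 1, so a descriptor near a partition boundary is shared
    between dictionary elements instead of being forced into one of them.
    """
    d = pairwise_distances(X, C) + eps  # eps avoids division by zero at a centre
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy example (assumption): two dictionary elements and two descriptors,
# one exactly on the boundary between the centres and one close to the first.
C = np.array([[0.0, 0.0], [2.0, 0.0]])
X = np.array([[1.0, 0.0], [0.1, 0.0]])

labels = hard_assign(X, C)
U = fuzzy_memberships(X, C)
```

The boundary descriptor receives equal membership in both dictionary elements, whereas hard assignment has to pick one arbitrarily. That arbitrariness is the uncertainty in assignment that the abstract refers to.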

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Gupta, Ashish.
Date : 2013
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2013.
Depositing User : EPrints Services
Date Deposited : 24 Apr 2020 15:26
Last Modified : 24 Apr 2020 15:26

