Robust aggregation of local image descriptors for visual search.
Husain, Syed S. (2016) Robust aggregation of local image descriptors for visual search. Doctoral thesis, University of Surrey.
Sameed_thesis_19_5_2016.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.
Visual search and recognition underpin numerous applications, including management of multimedia content, mobile commerce, surveillance, navigation and robotics. However, the task remains challenging, predominantly because of the variability of object appearance and the ever-increasing size of the databases, which often exceed billions of images. The objective of this thesis is to develop a robust, compact and discriminative image representation suitable for visual search tasks. The thesis contributes to four research areas.
First, we propose a novel method, named Robust Visual Descriptor (RVD), for deriving a compact and robust representation of image content, which significantly advances the state of the art and delivers world-class performance. In our approach, the local descriptors are assigned to multiple cluster centres with rank weights, leading to a stable and reliable global image representation. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization, and aggregated based on the neighbourhood rank information. We then propose two extensions to the core RVD descriptor. The first de-correlates the weighted residual vectors by applying cluster-level PCA before aggregation. In the second, the weighted residual vectors are whitened in each cluster before aggregation, leading to a balanced energy distribution across dimensions and improved performance.
Compressing floating-point global signatures to binary codes improves storage requirements and matching speed for large-scale image retrieval tasks. Our third contribution is to derive a compact and robust binary image signature from the core RVD representation. In addition, we propose a novel binary descriptor matching algorithm, PCAE with Weighted Hamming distance (PCAE+WH), to minimize the quantization loss associated with converting floating-point vectors to discrete binary codes.
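
The aggregation steps described above (rank-weighted assignment to several cluster centres, per-cluster residuals, direction-preserving normalization, and rank-based accumulation) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function name `rvd_aggregate`, the example rank weights, the choice of L1 scaling as the direction-preserving normalization, and the final L2 normalization of the signature are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def rvd_aggregate(descriptors, centres, rank_weights):
    """Illustrative rank-weighted residual aggregation (hypothetical helper).

    descriptors : (n, d) array of local descriptors
    centres     : (c, d) array of cluster centres (e.g. from k-means)
    rank_weights: one weight per neighbourhood rank, nearest first (assumed values)
    """
    n_clusters, dim = centres.shape
    k = len(rank_weights)
    accum = np.zeros((n_clusters, dim))

    # Squared Euclidean distance from every descriptor to every centre.
    dists = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest centres for each descriptor, nearest first.
    nearest = np.argsort(dists, axis=1)[:, :k]

    for rank in range(k):
        idx = nearest[:, rank]
        residuals = descriptors - centres[idx]
        # Assumption: L1 scaling. Dividing by a positive scalar changes the
        # length of each residual but keeps its direction.
        norms = np.abs(residuals).sum(axis=1, keepdims=True)
        residuals = residuals / np.maximum(norms, 1e-12)
        # Accumulate the weighted residuals into the cluster they belong to.
        np.add.at(accum, idx, rank_weights[rank] * residuals)

    # Assumption: concatenate the per-cluster vectors and L2-normalize.
    signature = accum.ravel()
    return signature / max(np.linalg.norm(signature), 1e-12)
```

Called with, say, 50 random 8-dimensional descriptors, 4 centres and three rank weights, the function returns a single 32-dimensional unit-length vector. The cluster-level PCA and whitening extensions would act on the weighted residuals before the accumulation step and are not shown here.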
In the context of industry work on Compact Descriptors for Visual Search (CDVS) and its standardization in MPEG (ISO), we propose a scalable RVD representation. Bitrate scalability is achieved by employing novel Cluster Selection and Bit Selection mechanisms, which support interoperable binary RVD representations. Moreover, we propose a very efficient and effective score function based on weighted Hamming distance to compute the similarity between two binary representations.
Our fourth contribution is to develop an image classification system based on the RVD representation. We introduce an effective method to incorporate second-order statistics into the original RVD framework.
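
As a rough illustration of comparing two binary signatures with a weighted Hamming distance, the sketch below sums a per-bit weight over the positions where two codes differ. It shows the generic idea only and is neither the score function nor the PCAE+WH algorithm of the thesis. The function name and the example weights are hypothetical.

```python
import numpy as np


def weighted_hamming_distance(code_a, code_b, bit_weights):
    """Sum the weights of the bit positions where two binary codes differ.

    bit_weights are assumed to be given; how they would be derived is not
    stated in the abstract.
    """
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    weights = np.asarray(bit_weights, dtype=float)
    differing = np.logical_xor(code_a, code_b)  # True where the bits disagree
    return float(weights[differing].sum())
```

With uniform weights of 1 this reduces to the ordinary Hamming distance. Non-uniform weights let disagreements on some bits count for more than disagreements on others.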
Item Type: Thesis (Doctoral)
Subjects: Pattern Recognition, Computer Vision, Visual Search
Date: 30 June 2016
Funders: Centre for Vision, Speech and Signal Processing
Depositing User: Syed Sameed Husain
Date Deposited: 12 Jul 2016 08:39
Last Modified: 12 Jul 2016 08:39