University of Surrey


Sketch based image retrieval on big visual data.

Bui, Tu (2019) Sketch based image retrieval on big visual data. Doctoral thesis, University of Surrey.

Text (thesis corrected version)
thesis_corrected.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.



The deluge of visual content on the Internet - from user-generated content to commercial image collections - motivates intuitive new methods for searching digital image content: how can we find a particular image in a database of millions? Sketch-based image retrieval (SBIR) is an emerging research topic in which a free-hand drawing is used to visually query photographic images. SBIR aligns with emerging trends for visual content consumption on mobile touch-screen devices, for which gestural interactions such as sketch are a natural alternative to textual input. This thesis presents several contributions to the SBIR literature.

First, we propose a cross-domain learning framework that maps both sketches and images into a joint embedding space that is invariant to depictive style while preserving semantics. The resulting embedding enables direct comparison and search between sketches and images, and is based upon a multi-branch convolutional neural network (CNN) trained using unique parameter-sharing and training schemes. The deeply learned embedding is shown to yield state-of-the-art retrieval performance on several SBIR benchmarks.

Second, in two separate works we propose to disambiguate sketched queries by combining sketched shape with a secondary modality: SBIR with colour, and SBIR with aesthetic context. The former enables querying with coloured line-art sketches. Colour and shape features are extracted locally using a modified version of the gradient field orientation histogram (GF-HoG) before being globally pooled using dictionary learning. Various colour-shape fusion strategies are explored, coupled with an efficient indexing scheme for fast retrieval. The latter supports querying with a sketched shape accompanied by one or more images that serve as an aesthetic constraint governing the visual style of the search results. We propose to model structure and style separately, disentangling one modality from the other, and then to learn structure-style fusion using a hierarchical triplet network. This method enables further studies beyond SBIR, such as style blending, style analogy, and retrieval with alternative-modality queries.

Third, we explore mid-grain SBIR -- a novel setting requiring retrieved images to match both the category and the key visual characteristics of the sketch, without demanding fine-grain, instance-level matching of a specific object. We study a semi-supervised approach that requires mainly class-labelled sketches and images, plus a small number of instance-labelled sketch-image pairs. The approach aligns sketch and image embeddings before pooling them into clusters from which mid-grain similarity may be measured. Our learned model demonstrates not only intra-category discrimination (mid-grain) but also improved inter-category discrimination (coarse-grain) on a newly created MidGrain65c dataset.
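The joint sketch-photo embedding and the hierarchical triplet network above both rest on triplet ranking: a sketch (anchor) is pulled toward the embedding of its matching photo and pushed away from a non-matching one. As a minimal illustration of that objective only - not the thesis's multi-branch CNN, whose architecture and training schedule are not reproduced here - the standard triplet margin loss can be sketched as:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet ranking loss on embedding vectors: encourages the anchor
    (e.g. a sketch embedding) to lie closer to the positive (its matching
    photo) than to the negative (any other photo) by at least `margin`.
    The margin value here is an illustrative choice, not the thesis's."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to match
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to non-match
    return max(0.0, d_pos - d_neg + margin)
```

When the positive already sits well inside the margin the loss is zero and contributes no gradient; training therefore focuses on triplets that violate the ranking constraint.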
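For the colour line-art work, the abstract describes extracting shape and colour features locally and fusing them. The sketch below illustrates the general shape of that pipeline under heavy simplification: `orientation_histogram` is a plain gradient-orientation histogram (GF-HoG additionally diffuses the sparse sketch gradient field before binning, which is omitted here), `colour_histogram` is a coarse joint RGB histogram, and `fuse` shows weighted concatenation as one possible fusion strategy; all three function names and the weighting are illustrative assumptions, not the thesis's actual descriptors.

```python
import numpy as np

def orientation_histogram(gray, bins=9):
    """Magnitude-weighted histogram of gradient orientations over a patch
    (simplified stand-in for a GF-HoG-style local shape descriptor)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def colour_histogram(rgb, bins=4):
    """Coarse joint RGB histogram as a simple colour descriptor."""
    hist, _ = np.histogramdd(rgb.reshape(-1, 3),
                             bins=(bins,) * 3, range=[(0, 256)] * 3)
    flat = hist.ravel()
    return flat / (flat.sum() + 1e-8)

def fuse(shape_feat, colour_feat, w=0.5):
    """Early fusion by weighted concatenation -- one of several
    colour-shape fusion strategies one might explore."""
    return np.concatenate([w * shape_feat, (1.0 - w) * colour_feat])
```

Because both component histograms are L1-normalised, the fused vector's entries still sum to one, which keeps the relative weighting of shape versus colour explicit in the single parameter `w`.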

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Bui, Tu (ORCID: 0000-0001-6622-9703)
Date : 31 January 2019
Funders : EPSRC
DOI : 10.15126/thesis.00850099
Grant Title : Sketch based management of big visual data
Contributors :
Depositing User : Tu Bui
Date Deposited : 07 Feb 2019 08:48
Last Modified : 07 Feb 2019 08:48




© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.