University of Surrey

Test tubes in the lab Research in the ATI Dance Research

Cross-modal classification and retrieval of multimodal data using combinations of neural networks.

Saragiotis, Panagiotis. (2006) Cross-modal classification and retrieval of multimodal data using combinations of neural networks. Doctoral thesis, University of Surrey (United Kingdom)..

Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (29MB) | Preview


Current neurobiological thinking supported, in part, by experimentation stresses the importance of cross-modality. Uni-modal cognitive tasks, language and vision, for example, are performed with the help of many networks working simultaneously or sequentially; and for cross-modal tasks, like picture / object naming and word illustration, the output of these networks is combined to produce higher cognitive behaviour. The notion of multi-net processing is used typically in the pattern recognition literature, where ensemble networks of weak classifiers - typically supervised - appear to outperform strong classifiers. We have built a system, based on combinations of neural networks, that demonstrates how cross-modal classification can be used to retrieve multi-modal data using one of the available modalities of information. Two multi-net systems were used in this work: one comprising Kohonen SOMs that interact with each other via a Hebbian network and a fuzzy ARTMAP network where the interaction is through the embedded map field. The multi-nets were used for the cross-modal retrieval of images given keywords and for finding the appropriate keywords for an image. The systems were trained on two publicly available image databases that had collateral annotations on the images. The Hemera collection, comprising images of pre-segmented single objects, and the Corel collection with images of multiple objects were used for automatically generating various sets of input vectors. We have attempted to develop a method for evaluating the performance of multi-net systems using a monolithic network trained on modally-undifferentiated vectors as an intuitive bench-mark. To this extent single SOM and fuzzy ART networks were trained using a concatenated visual / linguistic vector to test the performance of multi-net systems with typical monolithic systems. Both multi-nets outperform the respective monolithic systems in terms of information retrieval measures of precision and recall on test images drawn from both datasets; the SOM multi-net outperforms the fuzzy ARTMAP both in terms of convergence and precision-recall. The performance of the SOM-based multi-net in retrieval, classification and auto-annotation is on a par with that of state of the art systems like "ALIP" and "Blobworld". Much of the neural network based simulations reported in the literature use supervised learning algorithms. Such algorithms are suited when classes of objects are predefined and objects in themselves are quite unique in terms of their attributes. We have compared the performance of our multi-net systems with that of a multi-layer perceptron (MLP). The MLP does show substantially greater precision and recall on a (fixed) class of objects when compared with our unsupervised systems. However when 'lesioned' -the network connectivity 'damaged' deliberately- the multi-net systems show a greater degree of robustness. Cross-modal systems appear to hold considerable intellectual and commercial potential and the multi-net approach facilitates the simulation of such systems.

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Saragiotis, Panagiotis.
Date : 2006
Contributors :
Depositing User : EPrints Services
Date Deposited : 09 Nov 2017 12:14
Last Modified : 15 Mar 2018 21:49

Actions (login required)

View Item View Item


Downloads per month over past year

Information about this web site

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800