University of Surrey


Clowns, Crowds and Clouds: A Cross-Enterprise Approach to Detecting Information Leakage without Leaking Information

Gillam, L and Cooke, N (2011) Clowns, Crowds and Clouds: A Cross-Enterprise Approach to Detecting Information Leakage without Leaking Information. In: Cloud Computing for Enterprise Architectures. Springer-Verlag New York Inc, pp. 301-322. ISBN 1447122356

Text: Version of Record (317kB), restricted to repository staff only.
Available under licence: see the attached licence file (33kB).


In this paper we elaborate a near-duplicate and plagiarism detection service that combines Crowd and Cloud computing to search for and evaluate matching documents. We believe that our approach could be used across collaborating or competing Enterprises, or against the web, without any Enterprise needing to reveal the contents of its corporate (confidential) documents. The Cloud service uses a novel document fingerprinting approach that derives grammatical patterns without requiring grammatical knowledge and without relying on hash-based approaches. Our approach generates a lossy, highly compressed document signature from which fixed-length patterns can be generated as fingerprints or shingles. Fingerprint sizes are set by estimating the likely rate of random hits, given the size of the pattern and the size of the target being searched. Our Cloud service is geared towards detecting Clowns: those who have leaked, or may attempt to leak, confidential or sensitive information, or who have otherwise plagiarized, without needing a copy of the original information to be provided. Crowds are to be used to validate the results of systematic evaluation of the service, ensuring that modifications to the service remain effective and enabling continuous scaling up. We discuss the formulation of the service and assess the efficacy of the fingerprinting approach against an international benchmarking competition, in which we believe our system achieves top-5 performance (Precision = 0.96, Recall = 0.39).
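To make the general idea of a lossy signature, fixed-length shingles and a random-hit-rate estimate more concrete, the following is a minimal sketch under stated assumptions; it is not the chapter's method. The word classes (`FUNCTION_WORDS` and the short/medium/long length buckets), the 4-symbol alphabet, the uniform-symbol model in `expected_random_hits`, and the `max_hits` threshold are all hypothetical choices made for illustration. In keeping with the abstract's note that the approach does not rely on hashing, shingles here are compared directly as strings.

```python
import re

# HYPOTHETICAL function-word list and word classes, for illustration only;
# the chapter's actual encoding is not reproduced here.
FUNCTION_WORDS = {
    "a", "an", "and", "as", "at", "by", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "to", "was", "with",
}
ALPHABET_SIZE = 4  # symbols used below: 'f', 's', 'm', 'l'


def signature(text: str) -> str:
    """Lossy signature: one symbol per word (an assumed encoding)."""
    symbols = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in FUNCTION_WORDS:
            symbols.append("f")  # function word
        elif len(word) <= 4:
            symbols.append("s")  # short content word
        elif len(word) <= 7:
            symbols.append("m")  # medium content word
        else:
            symbols.append("l")  # long content word
    return "".join(symbols)


def shingles(sig: str, n: int) -> set:
    """All distinct contiguous patterns of length n, kept as plain strings."""
    return {sig[i:i + n] for i in range(len(sig) - n + 1)}


def expected_random_hits(n: int, alphabet_size: int, target_shingles: int) -> float:
    """Expected chance matches for one query shingle against target_shingles
    patterns, ASSUMING independent, uniformly distributed symbols."""
    return target_shingles * alphabet_size ** (-n)


def choose_length(alphabet_size: int, target_shingles: int, max_hits: float = 0.01) -> int:
    """Smallest shingle length whose expected random hit count is below max_hits."""
    n = 1
    while expected_random_hits(n, alphabet_size, target_shingles) >= max_hits:
        n += 1
    return n


def containment(query: str, target: str, n: int) -> float:
    """Fraction of the query's shingles that also occur in the target."""
    q = shingles(signature(query), n)
    t = shingles(signature(target), n)
    return len(q & t) / len(q) if q else 0.0
```

For example, `signature("The cat sat on the extraordinarily long mat")` gives `"fssfflss"`, and `choose_length(ALPHABET_SIZE, 10**6)` returns 14, because 4^14 is the first power of 4 large enough to push the expected chance matches against a million target shingles below 0.01. Real text is far from uniform, so this simple model probably underestimates random hits, and a practical system would need empirically calibrated rates.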

Item Type: Book Section
Subjects: Computing
Divisions: Faculty of Engineering and Physical Sciences > Computing Science
Authors: Gillam, L; Cooke, N
Editors: Mahmood, Z; Hill, R
Date: 25 November 2011
Uncontrolled Keywords: Business & Economics, Cloud Computing, Plagiarism Detection, Data Leak Prevention
Depositing User: Symplectic Elements
Date Deposited: 06 Mar 2017 14:18
Last Modified: 06 Mar 2017 14:52


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800