Anomaly Detection in Non-Stationary and Distributed Environments
O'Reilly, Colin (2014) Anomaly Detection in Non-Stationary and Distributed Environments Doctoral thesis, University of Surrey.
Colin_OReilly_PhD_thesis.pdf - Thesis (version of record)
Available under License Creative Commons Attribution Non-commercial Share Alike.
Anomaly detection is an important aspect of data analysis, used to identify data items that differ significantly from normal data. It is applied in a variety of fields, such as machine monitoring, environmental monitoring and security applications, and is a well-studied area of pattern recognition and machine learning. In this thesis, the key challenges of performing anomaly detection in non-stationary and distributed environments are addressed separately.

In non-stationary environments the data distribution may alter, meaning that the concepts to be learned evolve in time. Anomaly detection techniques must be able to adapt to a non-stationary data distribution in order to perform optimally. This requires an update to the model that is being used to classify data. A batch approach to the problem requires a reconstruction of the model each time an update is required. Incremental learning overcomes this issue by using the previous model as the basis for an update. Two kernel-based incremental anomaly detection techniques are proposed. The first technique uses kernel principal component analysis to perform anomaly detection. The kernel eigenspace is incrementally updated by splitting and merging kernel eigenspaces. The technique is shown to be more accurate than current state-of-the-art solutions. The second technique offers a reduction in the number of computations by using an incrementally updated hypersphere in kernel space.

In a non-stationary environment, the parameters of the model must be updated as well as the model itself. Anomaly detection algorithms require the selection of appropriate parameters in order to perform optimally for a given data set, so a change in the data distribution calls for new parameter values. An automatic parameter optimization procedure is proposed for the one-class quarter-sphere support vector machine, in which the ν parameter is selected automatically based on the anomaly rate in the training set.

In environments such as wireless sensor networks, data might be distributed amongst a number of nodes. In this case, distributed learning is required, where nodes construct a classifier, or an approximation of the classifier, that would have been formed had all the data been available to one instance of the algorithm. A principal component analysis based anomaly detection method is proposed that uses the solution to a convex optimization problem. The convex optimization problem is then derived in a distributed form, with each node running a local instance of the algorithm. Nodes are able to iterate towards an anomaly detector equivalent to the global solution by exchanging short messages.

Detailed evaluations of the proposed techniques are performed against existing state-of-the-art techniques using a variety of synthetic and real-world data sets. Results for non-stationary environments illustrate the necessity of adapting an anomaly detection model to the changing data distribution. It is shown that the proposed incremental techniques maintain accuracy while reducing the number of computations. In addition, optimal parameters derived from an unlabelled training set are shown to exhibit superior performance to statically selected parameters. For distributed environments, it is shown that local learning is insufficient due to the lack of examples. Distributed learning can be performed in a manner where a centralized model can be derived by passing small amounts of information between neighbouring nodes. This approach yields a model that obtains performance equal to that of the centralized model.
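As a rough illustration of the hypersphere idea mentioned in the abstract, the sketch below scores a point by its squared distance to the mean of the training points in kernel space, and adds new points without recomputing the model from scratch. It is not the algorithm from the thesis: the class name `IncrementalKernelSphere`, the RBF kernel, the `gamma` value and the choice of the mean as the centre are all assumptions made for this example, and the radius is left to the caller.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an arbitrary illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


class IncrementalKernelSphere:
    """Hypothetical sketch: hypersphere centred on the kernel-space mean.

    The centre is never formed explicitly. The model keeps the training
    points and the running sum of all pairwise kernel values, so adding a
    point reuses the previous model instead of rebuilding it.
    """

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.points = []
        self.pair_sum = 0.0  # sum over all i, j of k(x_i, x_j)

    def update(self, x):
        """Add one point using O(n) kernel evaluations."""
        cross = sum(self.kernel(x, p) for p in self.points)
        self.pair_sum += 2.0 * cross + self.kernel(x, x)
        self.points.append(x)

    def distance_sq(self, x):
        """Squared distance from phi(x) to the centre:
        k(x, x) - (2/n) * sum_i k(x, x_i) + (1/n^2) * sum_ij k(x_i, x_j)."""
        n = len(self.points)
        cross = sum(self.kernel(x, p) for p in self.points)
        return self.kernel(x, x) - 2.0 * cross / n + self.pair_sum / (n * n)

    def is_anomaly(self, x, radius_sq):
        """Flag x when it falls outside the sphere of the given squared radius."""
        return self.distance_sq(x) > radius_sq


model = IncrementalKernelSphere()
for point in [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.2), (0.2, 0.1)]:
    model.update(point)

near = model.distance_sq((0.05, 0.05))  # close to the training data
far = model.distance_sq((3.0, 3.0))     # far from the training data
```

A point near the training data yields a much smaller distance than a remote one. A practical detector would also need a rule for choosing the radius and for discarding old points as the distribution drifts, neither of which is attempted here.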
Item Type: Thesis (Doctoral)
Subjects: Machine learning
Date: December 2014
Funders: Engineering and Physical Sciences Research Council (EPSRC)
Copyright Disclaimer: Copyright 2014 The Author
Depositing User: Colin O'Reilly
Date Deposited: 15 Apr 2016 15:56
Last Modified: 15 Apr 2016 15:56