University of Surrey


Feature Selection and Causal Discovery For Ensemble Classifiers.

Duangsoithong, Rakkrit. (2012) Feature Selection and Causal Discovery For Ensemble Classifiers. Doctoral thesis, University of Surrey (United Kingdom).

27558476.pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.


Abstract

The rapid development of computer and information technology has improved a large number of applications, such as web text mining, intrusion detection, biomedical informatics, gene selection in microarray data, medical data mining, and clinical decision support systems, and many information databases have been created as a result. However, in some applications, especially in the medical area, clinical data may contain hundreds to thousands of features with relatively few samples. A consequence is increased complexity, which degrades efficiency and accuracy. Moreover, in such a high-dimensional feature space, many features may be irrelevant or redundant and should be removed to ensure good generalisation performance. Otherwise, the classifier may over-fit the data; that is, it may specialise on features that are not relevant for discrimination. To overcome this problem, feature selection and ensemble classification are applied. In this thesis, an empirical analysis of bootstrap and random subspace feature selection for multiple classifier systems is presented, and bootstrap feature selection and embedded feature ranking for ensemble MLP classifiers, along with a stopping criterion based on the out-of-bootstrap estimate, are proposed. Feature selection does not usually take causal discovery into account. However, in some cases, such as when the test distribution is shifted by manipulation from an external agent, causal discovery can benefit feature selection under these uncertain conditions. It can also learn the underlying data structure, provide a better understanding of the data generation process, and give better accuracy and robustness under uncertainty. Conversely, feature selection enables global causal discovery algorithms to deal with high-dimensional data by eliminating irrelevant and redundant features before the causal relationships between features are explored. A redundancy-based ensemble causal feature selection approach using bootstrap and random subspace is analysed, together with a comparison between correlation-based and causal feature selection for ensemble classifiers. Finally, hybrid correlation-causal feature selection for multiple classifier systems is proposed in order to scale up causal discovery and deal with high-dimensional features.
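
As a rough illustration of the bootstrap feature selection and out-of-bootstrap estimate mentioned in the abstract, the following minimal sketch may help. It is not the thesis's method: the correlation-based ranking, the nearest-centroid base classifier (used here in place of an MLP), the synthetic data, and every function name and parameter are assumptions chosen to keep the example short and dependency-free. It also does not implement a stopping criterion; it only computes the error on the samples left out of each bootstrap replicate.

```python
import random
from statistics import mean

def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def top_k_features(X, y, rows, k):
    """Rank features by |correlation with the label| on the given rows."""
    labels = [y[i] for i in rows]
    n_features = len(X[0])
    scores = [pearson_abs([X[i][j] for i in rows], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

def fit_centroids(X, y, rows, feats):
    """Nearest-centroid model: one mean vector per class (assumed base learner)."""
    cents = {}
    for c in set(y[i] for i in rows):
        members = [i for i in rows if y[i] == c]
        cents[c] = [mean(X[i][j] for i in members) for j in feats]
    return cents

def predict(cents, x, feats):
    """Return the class whose centroid is closest on the selected features."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))
    return min(cents, key=dist)

def bootstrap_ensemble(X, y, n_members=15, k=2, seed=0):
    """Train one member per bootstrap replicate; report mean out-of-bootstrap error."""
    rng = random.Random(seed)
    n = len(X)
    members, oob_errors = [], []
    for _ in range(n_members):
        rows = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(rows)
        oob = [i for i in range(n) if i not in in_bag]  # samples left out
        feats = top_k_features(X, y, rows, k)  # selection on the replicate only
        cents = fit_centroids(X, y, rows, feats)
        members.append((feats, cents))
        if oob:
            wrong = sum(predict(cents, X[i], feats) != y[i] for i in oob)
            oob_errors.append(wrong / len(oob))
    return members, (mean(oob_errors) if oob_errors else float("nan"))

def vote(members, x):
    """Majority vote over the ensemble members."""
    votes = [predict(cents, x, feats) for feats, cents in members]
    return max(set(votes), key=votes.count)

# Synthetic data (assumption): feature 0 carries the label, features 1-4 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[4 * label + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(4)]
     for label in y]

members, oob_error = bootstrap_ensemble(X, y)
print("mean out-of-bootstrap error:", round(oob_error, 3))
print("prediction for a class-1-like point:", vote(members, [4, 0, 0, 0, 0]))
```

Because features are selected separately inside each bootstrap replicate, the left-out samples play no part in either selection or training, which is what makes the out-of-bootstrap error usable as an estimate. One possible use, under the same assumptions, would be to compare this estimate across several values of `k`.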

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Duangsoithong, Rakkrit.
Date : 2012
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2012.
Depositing User : EPrints Services
Date Deposited : 24 Apr 2020 15:26
Last Modified : 24 Apr 2020 15:26
URI: http://epubs.surrey.ac.uk/id/eprint/855176


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800