University of Surrey

Machine learning for the exploitation of high throughput omics data : a case study on identifying circadian disruption from human blood transcriptomic data.

Alganmi, Nofe (2019) Machine learning for the exploitation of high throughput omics data : a case study on identifying circadian disruption from human blood transcriptomic data. Doctoral thesis, University of Surrey.

NG.pdf - Version of Record
Restricted to Repository staff only
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

The DNA microarray is a high-throughput technology that can scan thousands of genes simultaneously and read their expression levels. However, there are many challenges associated with the data. One of the main challenges is the curse of dimensionality, which makes it difficult to learn without overfitting. Therefore, we proposed an unsupervised nonlinear machine learning framework to explore circadian rhythmic features as a case study. An auto-encoder can automatically learn features of the microarray data and reveal knowledge that can help in modelling the complex relations between the features for a circadian disorder in the future. Features derived from unsupervised algorithms can serve as input features to supervised learning, be used to build discriminative markers, and be used directly as functional modules. The constructed features are typically a compressed representation of the input data in a lower dimension. They maintain the essential information in the input but are better organized than the input, with less noise and fewer artifacts. It is therefore easier to build classifiers on the summarized features than on the raw input data, and the success of a classifier depends heavily on the choice of data representation. We demonstrated this finding using a machine learning classification framework. With our representation, we could improve the accuracy of a simple linear SVM from 63% to 75%. We also proposed a novel machine learning approach to evaluating circadian disruption using robust regression as a contextual anomaly detection method. The main novelty of this work comes from applying a point anomaly detection technique with respect to a circadian rhythmicity context. To the best of our knowledge, this work is the first to introduce the use of the NR1D1/NR1D2 clock genes as prior knowledge to detect gene pathways involved in the response to sleep disruption.
In the Circadian Disruption Detection (CDD) model, we implemented and validated a model that successfully models the normal samples, whereas on anomalous samples, i.e. samples with a significant transcriptional effect under circadian disruption, the model performed poorly. Results of the analysis of variance (ANOVA) and t-test show the benefit of using our robust multi-regression errors as a biomarker to detect sleep deprivation from gene microarray data. We found a significant difference between the error distributions for the normal-sleep and anomalous samples at the p<0.05 level. The model can be used to obtain a quantitative measurement of sleep disruption in humans regardless of the time of day.
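
The idea of using regression errors as an anomaly score can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single clock-gene expression value as the context variable, swaps in a Theil-Sen estimator (median of pairwise slopes) as a simple stand-in for the robust multi-regression, and uses synthetic data. All names (`theil_sen_fit`, `simulate`, `welch_t`) and all numeric settings are hypothetical.

```python
import random
import statistics
from itertools import combinations


def theil_sen_fit(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def abs_errors(x, y, slope, intercept):
    """Absolute prediction error of the fitted line for each sample."""
    return [abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)]


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def simulate(n, noise, shift, rng):
    """Synthetic (clock gene, target gene) pairs; `shift` mimics disruption."""
    clock = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    target = [0.8 * c + 0.1 + shift * rng.choice((-1, 1)) + rng.gauss(0.0, noise)
              for c in clock]
    return clock, target


rng = random.Random(0)

# Fit the robust model on normal-sleep samples only (assumed training setup).
train_x, train_y = simulate(40, 0.05, 0.0, rng)
slope, intercept = theil_sen_fit(train_x, train_y)

# Score held-out normal samples and simulated disrupted samples.
normal_x, normal_y = simulate(20, 0.05, 0.0, rng)
disrupted_x, disrupted_y = simulate(20, 0.05, 0.6, rng)
normal_err = abs_errors(normal_x, normal_y, slope, intercept)
disrupted_err = abs_errors(disrupted_x, disrupted_y, slope, intercept)

t_stat = welch_t(disrupted_err, normal_err)
print("mean error (normal):   ", statistics.mean(normal_err))
print("mean error (disrupted):", statistics.mean(disrupted_err))
print("Welch t statistic:     ", t_stat)
```

In this sketch the model is fitted on normal samples only, so samples that break the assumed clock-gene relationship produce larger errors, and the two error distributions can then be compared with a t-test as the abstract describes.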

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Alganmi, Nofe
Date : 29 March 2019
Funders : King Abdulaziz University Jeddah, Saudi Arabia
DOI : 10.15126/thesis.00850560
Contributors :
Contribution | Name | Email
http://www.loc.gov/loc.terms/relators/THS | Tang, H Lilian | H.Tang@surrey.ac.uk
http://www.loc.gov/loc.terms/relators/THS | Laing, Emma | E.Laing@surrey.ac.uk
Depositing User : Nofe Alganmi
Date Deposited : 04 Apr 2019 07:49
Last Modified : 04 Apr 2019 07:50
URI: http://epubs.surrey.ac.uk/id/eprint/850560

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800