University of Surrey

Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions

Chen, Huaming, Li, Fuyi, Wang, Lei, Jin, Yaochu, Chi, Chi-Hung, Kurgan, Lukasz, Song, Jiangning and Shen, Jun (2020) Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions. Briefings in Bioinformatics.

Chen_et_al_BIB_R1_0329[122016].pdf - Accepted version Manuscript
Restricted to Repository staff only until 2 April 2021.

Abstract

In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein-protein interaction identification, including human-pathogen protein-protein interactions (HP-PPIs). Despite this progress, experimental methods are generally expensive in both time and labor, especially given the enormous number of potential protein-interacting partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical, both for facilitating the detection of interactions and for mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine-learning-based computational methods for predicting human-bacterium protein-protein interactions (HB-PPIs). We first review a large number of publicly available HP-PPI databases and critically evaluate their availability. Taking advantage of the well-structured nature of these databases, we then preprocess the data and identify six bacterial pathogens with human hosts that can serve as study subjects. Additionally, we thoroughly review the literature on “host-pathogen interactions” and summarize the existing models, which we use to jointly study the impact of different feature-representation algorithms and to evaluate the performance of existing machine-learning computational models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopt the primary protocol from the literature and dedicate our analysis to a comprehensive assessment of sequence information and machine-learning models. The resulting systematic evaluation of machine-learning models and a wide range of sequence-based feature-representation algorithms is presented as a comparative survey of HB-PPI prediction performance.
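
As a rough illustration of what a sequence-based feature representation can look like, the sketch below encodes a human-bacterium protein pair by amino acid composition, a widely used sequence descriptor. The abstract does not say which representations the paper evaluates, so the choice of amino acid composition, the function names, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is aligned.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X) are ignored rather than counted.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(human_seq, bacterium_seq):
    """Concatenate both composition vectors into one 40-dimensional vector."""
    return amino_acid_composition(human_seq) + amino_acid_composition(bacterium_seq)


# Toy sequences, invented for illustration only (not real proteins).
features = encode_pair("MKTAYIAKQR", "MSDNELLKKA")
print(len(features))
```

A fixed-length vector of this kind could then be passed to any standard classifier, with known interacting pairs as positive examples. How negative examples are chosen is a separate design decision that this sketch does not address.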

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Computer Science
Authors :
Name | Email | ORCID
Chen, Huaming | |
Li, Fuyi | |
Wang, Lei | |
Jin, Yaochu | Yaochu.Jin@surrey.ac.uk |
Chi, Chi-Hung | |
Kurgan, Lukasz | |
Song, Jiangning | |
Shen, Jun | |
Date : 27 May 2020
DOI : 10.1093/bib/bbaa068
Copyright Disclaimer : © The Author(s) 2020. Published by Oxford University Press. All rights reserved.
Uncontrolled Keywords : bioinformatics; human-pathogen interactions; protein-protein interactions; systematic evaluation; sequential analysis; machine learning
Depositing User : James Marshall
Date Deposited : 29 May 2020 15:08
Last Modified : 29 May 2020 15:08
URI: http://epubs.surrey.ac.uk/id/eprint/857011

© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800