
Author:

Li, Tianjun [1] | Chen, Long [2] | Gan, Min [3]

Indexed by:

EI Scopus SCIE

Abstract:

Background: Mass spectra are usually acquired by Liquid Chromatography-Mass Spectrometry (LC-MS) analysis in isotope-labeled proteomics experiments. In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples under different conditions. The core task of quality control in quantitative LC-MS experiments is to filter out low-quality peptides with questionable profiles. Classification approaches are the commonly used methods for this problem. However, previous quality control methods often ignore or mishandle the data imbalance problem. In this study, we introduced a quality control framework based on the extreme gradient boosting machine (XGBoost) and carefully addressed the imbalanced data problem within it.

Results: In the XGBoost-based framework, we suggest applying the synthetic minority over-sampling technique (SMOTE) to re-balance the data and using the balanced data to train boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that our proposed framework significantly increases the reliability of peptide heavy-light ratio estimation.

Conclusions: Our results indicate that this framework is a powerful method for peptide quality assessment. For feature extraction, the extracted ion chromatogram (XIC) based features contribute to the peptide quality assessment. For the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is capable of peptide quality control. Overall, our proposed framework provides reliable results for further proteomics studies.
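The re-balancing step named in the abstract can be illustrated with a minimal, standard-library-only sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function names `smote` and `rebalance`, the choice of label `1` as the minority class, and the toy two-feature data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rebalance(features, labels, minority_label=1, k=5, seed=0):
    """Oversample the (assumed) minority class until both classes are equal in size."""
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    deficit = (len(features) - len(minority)) - len(minority)
    new_points = smote(minority, max(deficit, 0), k=k, seed=seed)
    return features + new_points, labels + [minority_label] * len(new_points)


# Hypothetical toy data: three minority samples (label 1), five majority samples (label 0).
features = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
            [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
balanced_features, balanced_labels = rebalance(features, labels)
```

In a pipeline like the one the abstract describes, the re-balanced training set would then be passed to a boosted-tree classifier, and the trained classifier applied to other, untouched data. Oversampling is applied to training data only, so that evaluation reflects the original class proportions.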

Keyword:

Gradient Boosting; Imbalanced Data; Mass Spectra; Proteomics; Quality Control

Affiliation:

  • [1] [Li, Tianjun] Univ Macau, Dept Comp & Informat Sci, Taipa, Macao, Peoples R China
  • [2] [Chen, Long] Univ Macau, Dept Comp & Informat Sci, Taipa, Macao, Peoples R China
  • [3] [Gan, Min] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou, Fujian, Peoples R China

Reprint Author's Address:

  • [Chen, Long] Univ Macau, Dept Comp & Informat Sci, Taipa, Macao, Peoples R China

Source:

BMC BIOINFORMATICS

ISSN: 1471-2105

Year: 2019

Issue: 1

Volume: 20

JCR@2019: 3.242

JCR@2023: 2.900

ESI Discipline: COMPUTER SCIENCE

ESI HC Threshold: 162

JCR Journal Grade: 2

CAS Journal Grade: 2

Cited Count:

WoS CC Cited Count: 7

SCOPUS Cited Count: 7

ESI Highly Cited Papers on the List: 0
