Sequence-based prediction of protein folding rates using contacts, secondary structures and support vector machines

Guan Ning Lin, Zheng Wang, Dong Xu, Jianlin Cheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Predicting protein folding rates is useful for understanding the protein folding process and for guiding protein design. Most previous methods for predicting folding rate require the tertiary structure of a protein as input, and most do not distinguish between the different kinetic natures (two-state and multi-state folding) of proteins. Here we developed a method, SeqRate, that uses support vector machines to predict both the folding kinetic type (two-state versus multi-state) and the real-valued folding rate from features extracted from the protein sequence alone. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec⁻¹) on the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification, and its folding rate prediction accuracy improves on previous sequence-based methods. Both the web server and the software for predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold-rate/index.html.
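
The three evaluation measures named in the abstract (classification accuracy, Pearson correlation coefficient, and mean absolute difference on the base-10 logarithmic scale) can be illustrated with a minimal sketch. This is not the SeqRate implementation: the function names are hypothetical, and the sample rates and labels are made-up values chosen only to show the arithmetic, not data from the paper.

```python
import math


def log10_rates(rates_per_sec):
    """Convert folding rates given in sec^-1 to the base-10 logarithmic scale."""
    return [math.log10(r) for r in rates_per_sec]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def accuracy(predicted_labels, true_labels):
    """Fraction of kinetic-type labels predicted correctly."""
    hits = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return hits / len(true_labels)


# Made-up illustrative values (assumption): NOT results from the paper.
experimental = [1000.0, 10.0, 0.1, 100.0]   # folding rates in sec^-1
predicted = [100.0, 10.0, 1.0, 1000.0]      # folding rates in sec^-1

log_exp = log10_rates(experimental)
log_pred = log10_rates(predicted)

r = pearson(log_exp, log_pred)
mad = mean_abs_diff(log_exp, log_pred)

true_types = ["two-state", "multi-state", "two-state", "two-state", "multi-state"]
pred_types = ["two-state", "multi-state", "multi-state", "two-state", "multi-state"]
acc = accuracy(pred_types, true_types)

print(f"Pearson r = {r:.2f}, mean abs. diff. = {mad:.2f}, accuracy = {acc:.0%}")
```

Because both measures are computed on log10 values, a mean absolute difference of 1.0 would correspond to predictions that are off by one order of magnitude on average.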

Original language: English (US)
Title of host publication: 2009 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2009
Pages: 3-8
Number of pages: 6
DOIs: https://doi.org/10.1109/BIBM.2009.21
State: Published - Dec 1 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2009 - Washington, D.C., United States
Duration: Nov 1 2009 → Nov 4 2009

Fingerprint

Protein folding
Protein Folding
Support vector machines
Proteins
Kinetics
Benchmarking
Tertiary Protein Structure
Software
Support Vector Machine
Servers

Keywords

  • Classification
  • Folding rate
  • Folding type
  • Support vector machine

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Biomedical Engineering
  • Health Informatics

Cite this

Lin, G. N., Wang, Z., Xu, D., & Cheng, J. (2009). Sequence-based prediction of protein folding rates using contacts, secondary structures and support vector machines. In 2009 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2009 (pp. 3-8). [5341882] https://doi.org/10.1109/BIBM.2009.21

@inproceedings{e366ff4f64de4b27960582b1886ae542,
title = "Sequence-based prediction of protein folding rates using contacts, secondary structures and support vector machines",
abstract = "Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic natures (two-state folding and multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using features extracted from only protein sequence with support vector machines. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80{\%}. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec$^{-1}$) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold-rate/index.html.",
keywords = "Classification, Folding rate, Folding type, Support vector machine",
author = "Lin, {Guan Ning} and Zheng Wang and Dong Xu and Jianlin Cheng",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/BIBM.2009.21",
language = "English (US)",
isbn = "9780769538853",
pages = "3--8",
booktitle = "2009 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2009",

}

TY - GEN

T1 - Sequence-based prediction of protein folding rates using contacts, secondary structures and support vector machines

AU - Lin, Guan Ning

AU - Wang, Zheng

AU - Xu, Dong

AU - Cheng, Jianlin

PY - 2009/12/1

Y1 - 2009/12/1

AB - Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic natures (two-state folding and multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using features extracted from only protein sequence with support vector machines. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec⁻¹) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold-rate/index.html.

KW - Classification

KW - Folding rate

KW - Folding type

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=74549147412&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74549147412&partnerID=8YFLogxK

U2 - 10.1109/BIBM.2009.21

DO - 10.1109/BIBM.2009.21

M3 - Conference contribution

SN - 9780769538853

SP - 3

EP - 8

BT - 2009 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2009

ER -