Evaluating the absolute quality of a single protein model using structural features and support vector machines

Zheng Wang, Allison N. Tegge, Jianlin Cheng

Research output: Contribution to journal › Article

65 Citations (Scopus)

Abstract

Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template-based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking.
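The two evaluation figures quoted in the abstract, the per-target correlation and the ranking loss, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: that the correlation is Pearson's, and that the loss for a target is the true GDT-TS of the best model minus the true GDT-TS of the model ranked first by predicted score. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists (assumed metric)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-ranked model.

    "Top-ranked" means the model with the highest predicted score
    (an assumed reading of the abstract's definition of loss).
    """
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


def evaluate(targets):
    """Average correlation and loss over targets.

    targets maps a target id to (predicted_scores, true_scores),
    one entry per model of that target.
    """
    corrs = [pearson(p, t) for p, t in targets.values()]
    losses = [ranking_loss(p, t) for p, t in targets.values()]
    return sum(corrs) / len(corrs), sum(losses) / len(losses)


# Toy GDT-TS values, made up for illustration only.
toy_targets = {
    "T1": ([50, 60, 70], [55, 65, 60]),
    "T2": ([10, 20, 30], [10, 20, 30]),
}
mean_corr, mean_loss = evaluate(toy_targets)
print(mean_corr, mean_loss)
```

Averaging per target, rather than pooling all models, matches the abstract's phrasing of a correlation "averaged over 95 protein targets".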

Original language: English (US)
Pages (from-to): 638-647
Number of pages: 10
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 75
Issue number: 3
DOI: 10.1002/prot.22275
State: Published - May 15 2009
Externally published: Yes

Keywords

  • Machine learning
  • Protein model evaluation
  • Protein model quality assurance
  • Protein structure prediction
  • Support vector machine

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology

Cite this

Evaluating the absolute quality of a single protein model using structural features and support vector machines. / Wang, Zheng; Tegge, Allison N.; Cheng, Jianlin.

In: Proteins: Structure, Function and Bioinformatics, Vol. 75, No. 3, 15.05.2009, p. 638-647.

@article{ce06a9e9cf2e45f6a7702e4f2336a791,
title = "Evaluating the absolute quality of a single protein model using structural features and support vector machines",
abstract = "Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template-based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking.",
keywords = "Machine learning, Protein model evaluation, Protein model quality assurance, Protein structure prediction, Support vector machine",
author = "Wang, Zheng and Tegge, {Allison N.} and Cheng, Jianlin",
year = "2009",
month = "5",
day = "15",
doi = "10.1002/prot.22275",
language = "English (US)",
volume = "75",
pages = "638--647",
journal = "Proteins: Structure, Function and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - Evaluating the absolute quality of a single protein model using structural features and support vector machines

AU - Wang, Zheng

AU - Tegge, Allison N.

AU - Cheng, Jianlin

PY - 2009/5/15

Y1 - 2009/5/15

AB - Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template-based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking.

KW - Machine learning

KW - Protein model evaluation

KW - Protein model quality assurance

KW - Protein structure prediction

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=66149156968&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=66149156968&partnerID=8YFLogxK

U2 - 10.1002/prot.22275

DO - 10.1002/prot.22275

M3 - Article

C2 - 19004001

AN - SCOPUS:66149156968

VL - 75

SP - 638

EP - 647

JO - Proteins: Structure, Function and Bioinformatics

JF - Proteins: Structure, Function and Bioinformatics

SN - 0887-3585

IS - 3

ER -