Three-Level Prediction of Protein Function by Combining Profile-Sequence Search, Profile-Profile Search, and Domain Co-Occurrence Networks

Zheng Wang, Renzhi Cao, Jianlin Cheng

Research output: Contribution to journalArticle

14 Citations (Scopus)

Abstract

Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progresses in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindingly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function prediction when no homology exists. These results show that our approach can combine complementary strengths of most widely used BLAST-based function prediction methods, rarely used in function prediction but more sensitive profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).

Original languageEnglish (US)
Article numberS3
JournalBMC Bioinformatics
Volume14
Issue numberSUPPL.3
DOIs
StatePublished - Feb 28 2013
Externally publishedYes

Fingerprint

Proteins
Protein
Homology
Prediction
RNA Sequence Analysis
Protein Engineering
Gene Ontology
Sequence Alignment
Drug Design
Protein Sequence Analysis
Annotation
Profile
DNA Sequence Analysis
Mutagenesis
Genes
Predict
Genome
Protein Sequence
RNA
Sequencing

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Three-Level Prediction of Protein Function by Combining Profile-Sequence Search, Profile-Profile Search, and Domain Co-Occurrence Networks. / Wang, Zheng; Cao, Renzhi; Cheng, Jianlin.

In: BMC Bioinformatics, Vol. 14, No. SUPPL.3, S3, 28.02.2013.

Research output: Contribution to journalArticle

@article{51e26e92738d498996490f128c6324e4,
title = "Three-Level Prediction of Protein Function by Combining Profile-Sequence Search, Profile-Profile Search, and Domain Co-Occurrence Networks",
abstract = "Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progresses in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindingly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function prediction when no homology exists. These results show that our approach can combine complementary strengths of most widely used BLAST-based function prediction methods, rarely used in function prediction but more sensitive profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).",
author = "Zheng Wang and Renzhi Cao and Jianlin Cheng",
year = "2013",
month = "2",
day = "28",
doi = "10.1186/1471-2105-14-S3-S3",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.3",

}

TY - JOUR

T1 - Three-Level Prediction of Protein Function by Combining Profile-Sequence Search, Profile-Profile Search, and Domain Co-Occurrence Networks

AU - Wang, Zheng

AU - Cao, Renzhi

AU - Cheng, Jianlin

PY - 2013/2/28

Y1 - 2013/2/28

N2 - Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progresses in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindingly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function prediction when no homology exists. These results show that our approach can combine complementary strengths of most widely used BLAST-based function prediction methods, rarely used in function prediction but more sensitive profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).

AB - Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation as a vast number of protein sequences with unknown function are routinely being generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progresses in the last several years, the accuracy of protein function prediction still needs to be improved in order to be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrated profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindingly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that our three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. Particularly, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function prediction when no homology exists. These results show that our approach can combine complementary strengths of most widely used BLAST-based function prediction methods, rarely used in function prediction but more sensitive profile-profile comparison-based homology detection methods, and non-homology-based domain co-occurrence networks, to effectively extend the power of function prediction from high homology, to low homology, to no homology (ab initio cases).

UR - http://www.scopus.com/inward/record.url?scp=84879310012&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879310012&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-S3-S3

DO - 10.1186/1471-2105-14-S3-S3

M3 - Article

C2 - 23514381

AN - SCOPUS:84879310012

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.3

M1 - S3

ER -