PANDA: Protein function prediction using domain architecture and affinity propagation

Zheng Wang, Chenguang Zhao, Yiheng Wang, Zheng Sun, Nan Wang

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions as Gene Ontology (GO) terms. PANDA first runs a profile-profile alignment algorithm to search the PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt to search for homologues. PANDA integrates a domain architecture inference algorithm, based on Bayesian statistics, that calculates the probability of a protein having a GO term. All candidate GO terms are pooled and filtered by Z-score. The remaining GO terms are then clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked all the baseline predictors that PANDA integrates, as well as every pooling and filtering step of PANDA. PANDA achieves better performance than the baseline predictors in terms of area under the precision-recall curve. PANDA can be accessed at http://dna.cs.miami.edu/PANDA/.
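The Z-score filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how PANDA computes its scores or which cutoff it uses, so everything here is an assumption made for illustration: the function name `zscore_filter`, the `threshold` parameter, the choice to compute the Z-score across the pooled candidates, and the example scores.

```python
from statistics import mean, pstdev


def zscore_filter(scores, threshold=0.0):
    """Keep GO terms whose score is at least `threshold` standard
    deviations above the mean of all pooled candidate scores.

    `scores` maps a GO term ID to a confidence score. This is a
    hypothetical interface, not PANDA's actual one.
    """
    if not scores:
        return {}
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        # All candidates scored identically, so none stands out; keep them all.
        return dict(scores)
    return {
        term: score
        for term, score in scores.items()
        if (score - mu) / sigma >= threshold
    }


# Hypothetical pooled scores from several baseline predictors.
pooled = {
    "GO:0003677": 0.92,
    "GO:0005524": 0.85,
    "GO:0016301": 0.40,
    "GO:0046872": 0.15,
}

kept = zscore_filter(pooled, threshold=0.0)
print(sorted(kept))
```

With a threshold of 0, only the terms scoring at or above the pooled mean survive. In a pipeline like the one described, the surviving terms would then be passed on to the clustering step.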

Original language: English (US)
Article number: 3484
Journal: Scientific Reports
Volume: 8
Issue number: 1
DOI: 10.1038/s41598-018-21849-1
State: Published - Dec 1 2018

Fingerprint

Ontology
Genes
Proteins
Statistics

ASJC Scopus subject areas

  • General

Cite this

PANDA: Protein function prediction using domain architecture and affinity propagation. / Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan.

In: Scientific Reports, Vol. 8, No. 1, 3484, 01.12.2018.

@article{1236143ae92a4f7699db69412cc2ff02,
title = "PANDA: Protein function prediction using domain architecture and affinity propagation",
author = "Zheng Wang and Chenguang Zhao and Yiheng Wang and Zheng Sun and Nan Wang",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41598-018-21849-1",
language = "English (US)",
volume = "8",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - PANDA

T2 - Protein function prediction using domain architecture and affinity propagation

AU - Wang, Zheng

AU - Zhao, Chenguang

AU - Wang, Yiheng

AU - Sun, Zheng

AU - Wang, Nan

PY - 2018/12/1

Y1 - 2018/12/1

UR - http://www.scopus.com/inward/record.url?scp=85042550112&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042550112&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-21849-1

DO - 10.1038/s41598-018-21849-1

M3 - Article

C2 - 29472600

AN - SCOPUS:85042550112

VL - 8

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 3484

ER -