MMM-QSAR recognition of ribonucleases without alignment: Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence

Guillermín Agüero-Chapín, Humberto González-Díaz, Gustavo De La Riva, Edrey Rodríguez, Aminael Sánchez-Rodríguez, Gianni Podda, Roberto I Vazquez-Padron

Research output: Contribution to journal › Article

46 Citations (Scopus)

Abstract

The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues), or when no similar sequence is recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, a prior 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. Without using alignment, the MMM-QSAR discriminated RNases from other proteins with a discrimination power of 97.35%, a result as good as that of the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation without sequence alignment, with the same accuracy as classic HMM models.
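The abstract does not specify how the 2D-HP-map or its Markov matrix is built. The following Python sketch is only a rough illustration of the general idea and rests on several hypothetical choices:

1. A hydrophobic/polar residue partition (`HYDROPHOBIC`).
2. A turtle-style walk on a square lattice, in which each H residue turns the heading left and each P residue turns it right before stepping.
3. Edge weights that count how often two lattice points are joined by a step, row-normalized into a stochastic matrix P.
4. Spectral moments taken as π_k = Tr(P^k).

All function names are illustrative. This is not claimed to be the authors' construction.

```python
"""Hypothetical sketch of Markov-matrix spectral moments (MMM) for a 2D HP walk."""

# Hypothetical hydrophobic (H) set; every other residue is treated as polar (P).
HYDROPHOBIC = set("AVLIMFWC")


def hp_walk(sequence):
    """Return the lattice points visited by an assumed HP turtle walk.

    Start at (0, 0) heading east; before each step an H residue turns the
    heading 90 degrees left and a P residue turns it 90 degrees right.
    """
    x, y, dx, dy = 0, 0, 1, 0
    points = [(x, y)]
    for residue in sequence.upper():
        if residue in HYDROPHOBIC:
            dx, dy = -dy, dx  # counterclockwise (left) turn
        else:
            dx, dy = dy, -dx  # clockwise (right) turn
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def markov_matrix(points):
    """Row-normalize the undirected step counts between visited lattice points."""
    nodes = sorted(set(points))
    index = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)
    weights = [[0.0] * n for _ in range(n)]
    for a, b in zip(points, points[1:]):
        i, j = index[a], index[b]
        weights[i][j] += 1.0
        weights[j][i] += 1.0
    matrix = []
    for row in weights:
        total = sum(row)  # every visited point has at least one step edge
        matrix.append([w / total for w in row])
    return matrix


def mat_mul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(matrix, max_order):
    """Return [Tr(P^1), ..., Tr(P^max_order)] for a stochastic matrix P."""
    moments = []
    power = matrix
    for order in range(1, max_order + 1):
        moments.append(sum(power[i][i] for i in range(len(power))))
        if order < max_order:
            power = mat_mul(power, matrix)
    return moments


def mmm_descriptors(sequence, max_order=5):
    """Compute hypothetical MMM descriptors for one protein sequence."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    return spectral_moments(markov_matrix(hp_walk(sequence)), max_order)


if __name__ == "__main__":
    print(mmm_descriptors("MKVLAAGIVALLLAA", max_order=5))
```

Each step moves to a neighbouring lattice point, so the walk has no self-loops and π_1 is always 0 under these assumptions. A descriptor vector such as `mmm_descriptors(seq, 5)` could then be passed to a classifier; the abstract does not state which model the authors trained on the MMM values.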

Original language: English
Pages (from-to): 434-448
Number of pages: 15
Journal: Journal of Chemical Information and Modeling
Volume: 48
Issue number: 2
DOI: 10.1021/ci7003225
State: Published - Feb 1 2008
Externally published: Yes

Fingerprint

activity structure
Hidden Markov models
Ribonucleases
social isolation
Assays
Ribonuclease III
Proteins
Schizosaccharomyces pombe Proteins
Genes
Molecular biology
Bioinformatics
biology
discrimination
Amino acids
Values
Group

ASJC Scopus subject areas

  • Chemistry(all)
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

MMM-QSAR recognition of ribonucleases without alignment: Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence. / Agüero-Chapín, Guillermín; González-Díaz, Humberto; De La Riva, Gustavo; Rodríguez, Edrey; Sánchez-Rodríguez, Aminael; Podda, Gianni; Vazquez-Padron, Roberto I.

In: Journal of Chemical Information and Modeling, Vol. 48, No. 2, 01.02.2008, p. 434-448.


Agüero-Chapín, Guillermín ; González-Díaz, Humberto ; De La Riva, Gustavo ; Rodríguez, Edrey ; Sánchez-Rodríguez, Aminael ; Podda, Gianni ; Vazquez-Padron, Roberto I. / MMM-QSAR recognition of ribonucleases without alignment: Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence. In: Journal of Chemical Information and Modeling. 2008 ; Vol. 48, No. 2. pp. 434-448.
@article{5aed0f3b181049f19ebfcbc155bdc66d,
title = "MMM-QSAR recognition of ribonucleases without alignment: Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence",
abstract = "The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues), or when no similar sequence is recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, a prior 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. Without using alignment, the MMM-QSAR discriminated RNases from other proteins with a discrimination power of 97.35{\%}, a result as good as that of the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation without sequence alignment, with the same accuracy as classic HMM models.",
author = "Guillerm{\'i}n Ag{\"u}ero-Chap{\'i}n and Humberto Gonz{\'a}lez-D{\'i}az and {De La Riva}, Gustavo and Edrey Rodr{\'i}guez and Aminael S{\'a}nchez-Rodr{\'i}guez and Gianni Podda and Vazquez-Padron, {Roberto I}",
year = "2008",
month = "2",
day = "1",
doi = "10.1021/ci7003225",
language = "English",
volume = "48",
pages = "434--448",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "2",

}

TY - JOUR

T1 - MMM-QSAR recognition of ribonucleases without alignment

T2 - Comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence

AU - Agüero-Chapín, Guillermín

AU - González-Díaz, Humberto

AU - De La Riva, Gustavo

AU - Rodríguez, Edrey

AU - Sánchez-Rodríguez, Aminael

AU - Podda, Gianni

AU - Vazquez-Padron, Roberto I

PY - 2008/2/1

Y1 - 2008/2/1

N2 - The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues), or when no similar sequence is recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, a prior 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. Without using alignment, the MMM-QSAR discriminated RNases from other proteins with a discrimination power of 97.35%, a result as good as that of the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation without sequence alignment, with the same accuracy as classic HMM models.

AB - The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues), or when no similar sequence is recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, a prior 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. Without using alignment, the MMM-QSAR discriminated RNases from other proteins with a discrimination power of 97.35%, a result as good as that of the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation without sequence alignment, with the same accuracy as classic HMM models.

UR - http://www.scopus.com/inward/record.url?scp=41549153907&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=41549153907&partnerID=8YFLogxK

U2 - 10.1021/ci7003225

DO - 10.1021/ci7003225

M3 - Article

C2 - 18254616

AN - SCOPUS:41549153907

VL - 48

SP - 434

EP - 448

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 2

ER -