Automatic discovery of bioluminescent proteins from large protein databases

Tao Meng, Mei-Ling Shyu, Hua Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Accurate annotation of protein features is increasingly important for enriching gene ontology databases. In this work, we present a framework to predict whether any given protein sequence is bioluminescent. Bioluminescent proteins are produced by living organisms and emit light naturally. Bioluminescence is thought to serve several functions in living organisms, including camouflage, attraction of prey, and communication. Bioluminescent proteins are also widely used in biotechnology as labels in assay development, reporters of gene expression, and imaging agents. Currently, bioluminescent proteins are mainly curated by researchers through experimental analysis, which is a time-consuming process. Data mining algorithms, by contrast, provide an efficient way to detect candidate bioluminescent proteins and to prioritize the experimental work. While traditional alignment-based algorithms (such as BLAST) show promising results in sequence analysis, they suffer from the limitation that the test sequence must be homologous to sequences in the available training data sets. To overcome this limitation, our proposed framework uses a set of homology-independent features that are extracted directly from the primary sequences to represent both the global physicochemical properties and the sequence-order characteristics of proteins. In addition, a novel subspace-based data filtering algorithm is proposed to eliminate noise from the training data. One existing framework addressing the same problem was implemented and compared with our proposed framework. The experimental results indicate that our proposed framework shows promising performance. Moreover, the proposed framework is generic and could easily be applied to the annotation of other protein properties.
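
To make the idea of homology-independent features concrete, the following is a minimal Python sketch and not the feature set used in the paper, which the abstract does not specify. It assumes two illustrative descriptors computed directly from a primary sequence: amino acid composition as a global property, and a lag-based hydropathy correlation as a sequence-order property. The function names, the choice of the Kyte-Doolittle hydropathy scale, and the `max_lag` parameter are all assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (an assumed choice of physicochemical scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard residues (a global property)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def order_correlation(seq, lag):
    """Mean squared hydropathy difference between residues `lag` positions apart."""
    values = [HYDROPATHY[a] for a in seq.upper() if a in HYDROPATHY]
    pairs = list(zip(values, values[lag:]))
    if not pairs:
        return 0.0
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def feature_vector(seq, max_lag=3):
    """Concatenate composition with sequence-order terms for lags 1..max_lag."""
    order_terms = [order_correlation(seq, k) for k in range(1, max_lag + 1)]
    return composition(seq) + order_terms
```

Because neither descriptor involves aligning the query against known proteins, a vector like this can be computed for a sequence with no detectable homolog in the training data, which is the property the abstract emphasizes.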

Original language: English
Title of host publication: Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
Pages: 355-362
Number of pages: 8
DOI: https://doi.org/10.1109/ICSC.2013.67
State: Published - Dec 1 2013
Event: 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013 - Irvine, CA, United States
Duration: Sep 16 2013 to Sep 18 2013

Fingerprint

Proteins
Bioluminescence
Camouflage
Biotechnology
Gene expression
Data mining
Ontology
Labels
Assays
Genes
Imaging techniques
Communication
Testing

Keywords

  • Bioluminescence
  • Classification
  • Lasso
  • Subspace-based filtering

ASJC Scopus subject areas

  • Software

Cite this

Meng, T., Shyu, M-L., & Zhang, H. (2013). Automatic discovery of bioluminescent proteins from large protein databases. In Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013 (pp. 355-362). [6693542] https://doi.org/10.1109/ICSC.2013.67

@inproceedings{4767e6a1d0714087a04d1b93e1696706,
title = "Automatic discovery of bioluminescent proteins from large protein databases",
keywords = "Bioluminescence, Classification, Lasso, Subspace-based filtering",
author = "Tao Meng and Mei-Ling Shyu and Hua Zhang",
year = "2013",
month = "12",
day = "1",
doi = "10.1109/ICSC.2013.67",
language = "English",
isbn = "9780769551197",
pages = "355--362",
booktitle = "Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013",

}

TY - GEN
T1 - Automatic discovery of bioluminescent proteins from large protein databases
AU - Meng, Tao
AU - Shyu, Mei-Ling
AU - Zhang, Hua
PY - 2013/12/1
Y1 - 2013/12/1
KW - Bioluminescence
KW - Classification
KW - Lasso
KW - Subspace-based filtering
UR - http://www.scopus.com/inward/record.url?scp=84893971344&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893971344&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2013.67
DO - 10.1109/ICSC.2013.67
M3 - Conference contribution
SN - 9780769551197
SP - 355
EP - 362
BT - Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
ER -