Automatic discovery of bioluminescent proteins from large protein databases

Tao Meng, Mei Ling Shyu, Hua Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Accurate annotation of protein features is becoming increasingly important for enriching gene ontology databases. In this work, we present a framework to predict whether any given protein sequence is bioluminescent. Bioluminescent proteins are produced by living organisms and emit light naturally. Bioluminescence is thought to serve several functions in living organisms, including camouflage, attraction of prey, and communication. Bioluminescent proteins are also widely used in biotechnology as labels in assay development, reporters of gene expression, and imaging agents. Currently, bioluminescent proteins are mainly curated by researchers through experimental analysis, which is a time-consuming process. Data mining based algorithms, by contrast, provide an efficient way to detect candidate bioluminescent proteins and to prioritize the experimental work. While traditional alignment-based algorithms (such as BLAST) show promising results in sequence analysis, they suffer from the limitation that the test sequence must show homology to sequences in the available training data sets. To overcome this limitation, our proposed framework uses a set of homology-independent features that are extracted directly from the primary sequences to represent the global physicochemical properties as well as the sequence-order characteristics of proteins. In addition, a novel subspace-based data filtering algorithm is proposed to eliminate noise from the training data. One existing framework addressing the same problem was implemented and compared with our proposed framework. The experimental results indicate that our proposed framework shows promising performance. Moreover, the proposed framework is generic and could easily be applied to the annotation of other protein properties.
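To make the idea of homology-independent features concrete, the following minimal Python sketch computes two descriptors directly from a primary sequence: amino acid composition (a global property) and dipeptide composition (a simple sequence-order property). The abstract does not list the exact features used, so these two descriptors, and the function names `composition_features`, `dipeptide_features`, and `featurize`, are illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence: str) -> str:
    """Upper-case the sequence and drop non-standard residue codes."""
    return "".join(aa for aa in sequence.upper() if aa in AMINO_ACIDS)


def composition_features(sequence: str) -> List[float]:
    """Fraction of each standard amino acid (20 values, order-independent)."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_features(sequence: str) -> List[float]:
    """Fraction of each adjacent residue pair (400 values, order-sensitive)."""
    seq = _clean(sequence)
    pairs = Counter(a + b for a, b in zip(seq, seq[1:]))
    total = sum(pairs.values())
    return [
        pairs[a + b] / total if total else 0.0
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    ]


def featurize(sequence: str) -> List[float]:
    """Concatenate both descriptors into one 420-dimensional vector.

    Hypothetical helper for illustration only; no alignment against
    other sequences is needed to compute it.
    """
    return composition_features(sequence) + dipeptide_features(sequence)
```

Because the vector depends only on the sequence itself, it can be computed for a protein with no detectable homolog in the training set, which is the situation where alignment-based tools such as BLAST are limited. A vector of this kind could then be passed to any standard classifier.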

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
Pages: 355-362
Number of pages: 8
DOIs: https://doi.org/10.1109/ICSC.2013.67
State: Published - Dec 1 2013
Event: 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013 - Irvine, CA, United States
Duration: Sep 16 2013 - Sep 18 2013

Publication series

Name: Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013

Other

Other: 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
Country: United States
City: Irvine, CA
Period: 9/16/13 - 9/18/13

Keywords

  • Bioluminescence
  • Classification
  • Lasso
  • Subspace-based filtering

ASJC Scopus subject areas

  • Software

Cite this

Meng, T., Shyu, M. L., & Zhang, H. (2013). Automatic discovery of bioluminescent proteins from large protein databases. In Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013 (pp. 355-362). [6693542] (Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013). https://doi.org/10.1109/ICSC.2013.67