A multi-model based approach for driver missense identification

Ahmed T. Soliman, Mei-Ling Shyu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The rapid growth of DNA and protein sequencing techniques over the last decade has boosted the availability and scale of mutation data, creating a need for automated approaches to predict driver mutations. Identifying driver mutations is essential to better understand and measure cancer progression, and thus to enable proper diagnosis and targeted treatment. Here, we present a scalable machine-learning-based approach to identify driver missense mutations. The proposed approach builds on and expands our previously proposed framework. It deploys a group of independent parallel classifiers, each handling a single feature set. A model fusion module then combines the classifiers' outputs to produce a final mutation label. Each classifier is trained and validated independently on its corresponding feature set, and each feature set undergoes feature selection to filter out low-significance features. In this paper, four protein sequence-level feature sets are leveraged: two amino acid index feature sets (AAIndex1 and AAIndex2), one pseudo amino acid composition (PseAAC) feature set, and one feature set generated using wavelet analysis. Because its components run in parallel, the approach can be extended to consume additional features with minimal impact on computational complexity. Experiments were performed to assess the performance of the proposed approach and to compare it with other similar approaches.
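
The architecture described in the abstract (one classifier per feature set, per-set feature selection, then fusion into a final label) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-centroid stand-in classifier, the variance-threshold filter in `select_features`, the score-averaging rule in `fuse`, the use of threads for the independent fits, and the toy numbers. Only the four feature-set names and the driver/passenger labels come from the record above.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pvariance


def select_features(rows, min_variance=1e-3):
    """Hypothetical filter: keep column indices whose variance exceeds a threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > min_variance]


class CentroidClassifier:
    """Stand-in per-feature-set model (assumed, not the paper's classifier)."""

    def fit(self, rows, labels):
        # Feature selection runs separately for each feature set.
        self.keep = select_features(rows)
        self.centroids = {}
        for label in set(labels):
            members = [self._project(r) for r, y in zip(rows, labels) if y == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def _project(self, row):
        return [row[i] for i in self.keep]

    def driver_score(self, row):
        """Score in [0, 1]; higher means closer to the driver centroid."""
        x = self._project(row)
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        total = dist["driver"] + dist["passenger"]
        return 0.5 if total == 0 else dist["passenger"] / total


def train_all(feature_sets, labels):
    """Fit one classifier per feature set; the fits share no state, so they can run concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(CentroidClassifier().fit, rows, labels)
            for name, rows in feature_sets.items()
        }
        return {name: future.result() for name, future in futures.items()}


def fuse(models, sample, threshold=0.5):
    """Assumed fusion rule: average the per-model scores and apply a threshold."""
    scores = [models[name].driver_score(sample[name]) for name in models]
    return "driver" if mean(scores) > threshold else "passenger"


# Toy data (invented): four mutations, each described by four feature sets.
labels = ["driver", "driver", "passenger", "passenger"]
feature_sets = {
    "AAIndex1": [[0.9, 1.0, 5.0], [0.8, 1.0, 5.2], [0.1, 1.0, 1.0], [0.2, 1.0, 1.1]],
    "AAIndex2": [[2.0, 0.3], [2.1, 0.2], [0.5, 0.9], [0.4, 0.8]],
    "PseAAC": [[0.7], [0.6], [0.2], [0.3]],
    "wavelet": [[3.0, 3.0], [3.2, 3.1], [1.0, 0.9], [1.1, 1.0]],
}
models = train_all(feature_sets, labels)

sample = {
    "AAIndex1": [0.85, 1.0, 5.1],
    "AAIndex2": [2.0, 0.25],
    "PseAAC": [0.65],
    "wavelet": [3.1, 3.0],
}
prediction = fuse(models, sample)
```

In this sketch, adding a fifth feature set only means adding one more entry to `feature_sets`; no existing classifier is retrained or modified, which is the extensibility property the abstract attributes to the parallel design. Threads are used here only to show the independence of the fits; a real deployment would more likely use separate processes or machines.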

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 419-425
Number of pages: 7
ISBN (Print): 9781538626597
DOI: https://doi.org/10.1109/IRI.2018.00068
State: Published - Aug 2 2018
Event: 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018 - Salt Lake City, United States
Duration: Jul 7 2018 to Jul 9 2018

Other

Other: 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018
Country: United States
City: Salt Lake City
Period: 7/7/18 to 7/9/18

Keywords

  • Cancer genome
  • Driver mutation
  • Passenger mutation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Artificial Intelligence
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Public Administration

Cite this

Soliman, A. T., & Shyu, M-L. (2018). A multi-model based approach for driver missense identification. In Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018 (pp. 419-425). [8424739] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2018.00068