Abstract
The rapid growth of DNA and protein sequencing techniques over the last decade has increased the availability and scale of mutation data, creating a need for automated approaches to predict driver mutations. Identifying driver mutations is essential to better understand and measure cancer progression, and thus to enable proper diagnosis and targeted treatment of cancer. Here, we present a scalable machine-learning-based approach to identify driver missense mutations. The proposed approach builds on and extends our previously proposed framework. It can deploy a group of independent, parallel classifiers, each of which handles a single feature set. A model fusion module then combines the classifiers' outputs to produce a final mutation label. Each classifier is trained and validated independently on its corresponding feature set. Feature sets undergo a feature selection process that filters out low-significance features. In this paper, four protein sequence-level feature sets are used: two amino acid index feature sets (AAIndex1 and AAIndex2), one pseudo amino acid composition (PseAAC) feature set, and one feature set generated using wavelet analysis. Because its components run in parallel, the proposed approach can be extended to consume additional feature sets with minimal impact on computational complexity. Experiments were performed to assess the performance of the proposed approach and to compare it with similar approaches.
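The abstract describes the architecture only at a high level: one classifier per feature set, per-set feature selection, and a fusion module. The following Python sketch illustrates that general pattern under stated assumptions. It is not the authors' implementation. The feature filter (absolute difference of class means), the per-set model (nearest centroid), and the fusion rule (averaging per-set scores) are stand-ins chosen for brevity, and the names `select_features`, `FeatureSetClassifier`, `train_parallel`, and `fuse` are hypothetical. The toy data is made up.

```python
# Hypothetical sketch of the abstract's design: independent per-feature-set
# classifiers trained in parallel, each with its own feature selection, plus
# a fusion step. The filter, classifier, and fusion rule are stand-ins.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from math import dist, exp


def select_features(rows, labels, k):
    """Keep the k feature indices whose class means differ most (stand-in filter)."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def gap(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(rows[0])), key=gap, reverse=True)[:k]


@dataclass
class FeatureSetClassifier:
    """Nearest-centroid classifier over one feature set (stand-in model)."""
    k: int
    selected: list = field(default_factory=list)
    centroids: dict = field(default_factory=dict)

    def fit(self, rows, labels):
        self.selected = select_features(rows, labels, self.k)
        for cls in (0, 1):
            members = [[r[j] for j in self.selected]
                       for r, y in zip(rows, labels) if y == cls]
            self.centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def score(self, row):
        """Pseudo-probability of 'driver': softmax over negated centroid distances."""
        x = [row[j] for j in self.selected]
        near_driver = exp(-dist(x, self.centroids[1]))
        near_passenger = exp(-dist(x, self.centroids[0]))
        return near_driver / (near_driver + near_passenger)


def train_parallel(feature_sets, labels, k):
    """Fit one classifier per feature set concurrently; each sees only its own set.

    Threads keep the sketch short; CPU-bound training would need a process
    pool or separate machines for a real parallel speedup.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(FeatureSetClassifier(k).fit, rows, labels)
                   for name, rows in feature_sets.items()}
        return {name: fut.result() for name, fut in futures.items()}


def fuse(models, sample):
    """Assumed fusion rule: average the per-set scores, threshold at 0.5 (1 = driver)."""
    avg = sum(models[name].score(row) for name, row in sample.items()) / len(models)
    return 1 if avg >= 0.5 else 0


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = driver, 0 = passenger (toy labels)
    # Made-up toy matrices, one row per mutation; real inputs would be the
    # AAIndex1, AAIndex2, PseAAC, and wavelet feature sets.
    train = {
        "set_a": [[0.9, 0.1, 5.0], [0.8, 0.2, 5.1], [0.1, 0.9, 5.0], [0.2, 0.8, 4.9]],
        "set_b": [[3.0, 1.0], [2.8, 1.1], [0.5, 1.0], [0.4, 0.9]],
    }
    models = train_parallel(train, labels, k=2)
    print(fuse(models, {"set_a": [0.85, 0.15, 5.0], "set_b": [2.9, 1.0]}))  # 1
```

Because each classifier touches only its own feature set, adding a new set means training one more independent model, and the fusion step gains only one more term in the average. This is the extensibility property the abstract claims, although the paper's actual fusion module may work differently.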
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 419-425 |
Number of pages | 7 |
ISBN (Print) | 9781538626597 |
State | Published - Aug 2 2018 |
Event | 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018, Salt Lake City, United States; duration: Jul 7 2018 → Jul 9 2018 |
Other
Other | 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018 |
---|---|
Country/Territory | United States |
City | Salt Lake City |
Period | 7/7/18 → 7/9/18 |
Keywords
- Cancer genome
- Driver mutation
- Passenger mutation
ASJC Scopus subject areas
- Computer Networks and Communications
- Software
- Artificial Intelligence
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Public Administration