MR2: A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy

Haluk Damgacioglu, Nurcin Celik, Emrah Celik

Research output: Contribution to conference (paper)

Abstract

Recent advances reveal that DNA methylation plays an important role in regulating different genome functions, and that anomalous methylation levels are associated with various cancer types. Feature selection algorithms are geared towards high-throughput analysis of DNA methylation to help identify idiosyncratic DNA methylation profiles associated with cancer types and subtypes. In high-dimensional and highly correlated DNA methylation data, feature selection algorithms aim at selecting an efficient and comprehensive feature set to better capture characteristics of phenotypes. In this work, we introduce a two-stage feature selection algorithm (MR2) based on maximum relevance and minimum redundancy criteria. In the first stage, the features that satisfy the relevance conditions are retained. In the second stage, the final subset of loci is selected to reach minimal redundancy by using a k-medoids clustering algorithm that embeds a succinct uncertainty measure score. The performance of the proposed feature selection algorithm is benchmarked against principal component analysis and four other commonly used filtering methods, using lung and breast cancer datasets obtained from Gene Expression Omnibus and comparing classification errors in support vector machine classifiers. Our MR2 algorithm outperforms these filtering-based algorithms while also providing more interpretable results.
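The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the relevance conditions or the uncertainty measure score, so everything concrete below is an assumption made for illustration and is not the authors' MR2 implementation: relevance is approximated by a gap between class means (the hypothetical `min_gap` threshold), redundancy by a `1 - |Pearson correlation|` distance, and the function names `relevance_filter`, `k_medoids`, and `two_stage_select` are invented here. Each locus is represented as a list of beta values, one per sample.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def relevance_filter(columns, labels, min_gap=0.2):
    """Stage 1 (assumed rule): keep loci whose class means differ by >= min_gap."""
    keep = []
    for j, col in enumerate(columns):
        g0 = [v for v, c in zip(col, labels) if c == 0]
        g1 = [v for v, c in zip(col, labels) if c == 1]
        if abs(mean(g0) - mean(g1)) >= min_gap:
            keep.append(j)
    return keep


def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix (list of lists)."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(n_iter):
        # Assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [
            min(members, key=lambda i: sum(dist[i][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return sorted(medoids)


def two_stage_select(columns, labels, k, min_gap=0.2):
    """Stage 1 filters for relevance; stage 2 keeps one medoid locus per cluster."""
    relevant = relevance_filter(columns, labels, min_gap)
    if len(relevant) <= k:
        return relevant
    # Assumed redundancy measure: highly correlated loci are close together.
    dist = [[1.0 - abs(pearson(columns[a], columns[b])) for b in relevant]
            for a in relevant]
    return [relevant[m] for m in k_medoids(dist, k)]


# Toy data (synthetic, not from the paper): loci 0-3 are shifted by 0.4 in
# class 1, loci 4-7 are noise around the same level in both classes.
rng = random.Random(1)
labels = [0] * 10 + [1] * 10
columns = [[0.3 + (0.4 * c if j < 4 else 0.0) + rng.gauss(0, 0.03) for c in labels]
           for j in range(8)]
selected = two_stage_select(columns, labels, k=2)
print(selected)
```

Because the selected features are actual loci (the cluster medoids) rather than linear combinations of loci, a selection of this kind stays directly interpretable, which is the contrast with principal component analysis drawn in the abstract.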

Original language: English (US)
Pages: 1169-1174
Number of pages: 6
State: Published - Jan 1 2018
Event: 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018 - Orlando, United States
Duration: May 19 2018 to May 22 2018

Other

Other: 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018
Country: United States
City: Orlando
Period: 5/19/18 to 5/22/18

Keywords

  • Beta distribution
  • Classification
  • DNA methylation
  • Feature selection
  • Minimal redundancy

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Industrial and Manufacturing Engineering

Cite this

Damgacioglu, H., Celik, N., & Celik, E. (2018). MR2: A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy. 1169-1174. Paper presented at 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018, Orlando, United States.