MR2

A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy

Haluk Damgacioglu, Nurcin Celik, Emrah Celik

Research output: Contribution to conference › Paper

Abstract

Recent advances reveal that DNA methylation plays an important role in regulating genome functions, and that anomalous methylation levels are associated with various cancer types. Feature selection algorithms are geared towards high-throughput analysis of DNA methylation to help identify idiosyncratic DNA methylation profiles associated with cancer types and subtypes. In high-dimensional and highly correlated DNA methylation data, feature selection algorithms aim to select an efficient and comprehensive feature set that better captures the characteristics of phenotypes. In this work, we introduce a two-stage feature selection algorithm (MR2) based on maximum relevance and minimum redundancy criteria. In the first stage, the features that satisfy the relevance conditions are retained. In the second stage, the final subset of loci is selected to reach minimal redundancy by using a k-medoids clustering algorithm that embeds a succinct uncertainty measure score. The proposed algorithm is benchmarked against principal component analysis and four other commonly used filtering methods on lung and breast cancer datasets obtained from Gene Expression Omnibus, with performance measured by classification error in support vector machine classifiers. Our MR2 algorithm outperforms these filtering-based algorithms while also providing more interpretable results.
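The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the relevance conditions or the uncertainty measure score, so the sketch assumes a simple class-mean gap as the relevance criterion and a plain alternating k-medoids on the distance 1 - |Pearson correlation|. All function names, the `min_gap` threshold, and the toy beta values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def relevance_filter(columns, labels, min_gap):
    """Stage 1 (assumed criterion): keep loci whose class means differ by >= min_gap."""
    keep = []
    for j, col in enumerate(columns):
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        if abs(sum(pos) / len(pos) - sum(neg) / len(neg)) >= min_gap:
            keep.append(j)
    return keep


def k_medoids(dist, k, n_iter=50, seed=0):
    """Alternating k-medoids on a precomputed distance matrix; returns medoid indices."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid (a medoid joins itself).
        clusters = {m: [] for m in medoids}
        for i in range(len(dist)):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance to its cluster members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][i] for i in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def select_features(columns, labels, min_gap, k):
    """Stage 1 filter, then Stage 2: cluster relevant loci and keep one medoid per cluster."""
    relevant = relevance_filter(columns, labels, min_gap)
    k = min(k, len(relevant))
    # Assumed redundancy distance: highly correlated loci are close to each other.
    dist = [[1.0 - abs(pearson(columns[a], columns[b])) for b in relevant]
            for a in relevant]
    return [relevant[m] for m in k_medoids(dist, k)]


# Toy methylation beta values: one list per locus, one entry per sample.
columns = [
    [0.90, 0.80, 0.85, 0.20, 0.10, 0.15],  # locus 0: differs strongly between classes
    [0.88, 0.82, 0.86, 0.22, 0.12, 0.14],  # locus 1: near copy of locus 0 (redundant)
    [0.50, 0.52, 0.49, 0.51, 0.50, 0.48],  # locus 2: no class difference (irrelevant)
    [0.30, 0.10, 0.20, 0.70, 0.90, 0.80],  # locus 3: differs in the opposite direction
]
labels = [1, 1, 1, 0, 0, 0]
selected = select_features(columns, labels, min_gap=0.3, k=2)
print(selected)
```

Because the selected features are actual loci (cluster medoids) rather than linear combinations of loci, the output stays directly interpretable, which is the contrast with principal component analysis that the abstract draws.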

Original language: English (US)
Pages: 1169-1174
Number of pages: 6
State: Published - Jan 1 2018
Event: 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018 - Orlando, United States
Duration: May 19 2018 to May 22 2018

Other

Other: 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018
Country: United States
City: Orlando
Period: 5/19/18 to 5/22/18

Fingerprint

Methylation
Redundancy
Feature extraction
Throughput
Gene expression
Clustering algorithms
Principal component analysis
Support vector machines
Classifiers
Genes
DNA Methylation

Keywords

  • Beta distribution
  • Classification
  • DNA methylation
  • Feature selection
  • Minimal redundancy

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Industrial and Manufacturing Engineering

Cite this

Damgacioglu, H., Celik, N., & Celik, E. (2018). MR2: A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy. 1169-1174. Paper presented at 2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018, Orlando, United States.

@conference{e480bf7ace9946d78ef86d7ce830ee4a,
title = "MR2: A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy",
keywords = "Beta distribution, Classification, DNA methylation, Feature selection, Minimal redundancy",
author = "Haluk Damgacioglu and Nurcin Celik and Emrah Celik",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
pages = "1169--1174",
note = "2018 Institute of Industrial and Systems Engineers Annual Conference and Expo, IISE 2018 ; Conference date: 19-05-2018 Through 22-05-2018",

}

TY - CONF

T1 - MR2

T2 - A two-stage feature selection algorithm in high-throughput methylation data for max-relevance and min-redundancy

AU - Damgacioglu, Haluk

AU - Celik, Nurcin

AU - Celik, Emrah

PY - 2018/1/1

Y1 - 2018/1/1

KW - Beta distribution

KW - Classification

KW - DNA methylation

KW - Feature selection

KW - Minimal redundancy

UR - http://www.scopus.com/inward/record.url?scp=85054019800&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054019800&partnerID=8YFLogxK

M3 - Paper

SP - 1169

EP - 1174

ER -