Estimating gene expression from high-dimensional DNA methylation levels in cancer data: A bimodal unsupervised dimension reduction algorithm

Haluk Damgacioglu, Emrah Celik, Nurcin Celik

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


Recent molecular and genetic studies have revealed the importance of DNA methylation, a key epigenetic mark, in regulating gene expression, as well as the abnormal DNA methylation profiles found in various diseases, including cancer. Unsupervised learning methods geared towards high-throughput DNA methylation analysis are used to extract useful information from high-dimensional genome-wide methylation data and to provide crucial insights for accurate early diagnosis and treatment of cancer. These methods depend heavily on the performance of an earlier dimension reduction step that aims to find the best subset of attributes to retain for learning. Because of their myopic and arbitrary search mechanisms, widely used algorithms in the literature commonly produce trivial cluster structures and fail to shed light on the relationship between DNA methylation and cancer types. To address this issue, we introduce a bimodal unsupervised dimension reduction algorithm (BOUNDER) that, before the data are fed into the learning method, identifies the best subset of loci for downstream analysis by considering the variability and redundancy across all samples through bimodal modeling. BOUNDER models each locus as a bimodal representation using a piecewise linear function with two segments and filters the informative loci based on the characteristics of the fitted line. To the best of our knowledge, this is the first study to use bimodal modeling for unsupervised learning in DNA methylation analysis. BOUNDER is tailored for DNA methylation analysis through a detailed parameter tuning analysis. Its performance is benchmarked against that of widely used conventional algorithms on real lung, breast, kidney, and urological cancer datasets obtained from the Gene Expression Omnibus, in terms of accuracy in hierarchical clustering and k-means clustering. Computational experiments reveal that BOUNDER outperforms the PCA- and filtering-based approaches, providing the highest accuracy in 6 out of 9 datasets while yielding more interpretable results through a correlation analysis. BOUNDER is also shown to be more robust than multiple other conventional dimension reduction algorithms across different datasets.
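The abstract describes the core idea only at a high level: fit a two-segment piecewise linear function to each locus and filter loci on the characteristics of the fit. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the published BOUNDER procedure: the two independent least-squares lines, the function names `fit_two_segments` and `select_loci`, and the thresholds `min_gap` and `min_frac` are all hypothetical choices made here, since the abstract does not give the actual fitting or filtering criteria.

```python
import numpy as np


def fit_two_segments(values):
    """Fit two independent least-squares lines to the sorted values of one locus.

    Returns (k, slope_left, slope_right, sse), where k is the split index that
    minimizes the total squared error. Assumption: the segments are fitted
    separately and need not join at the breakpoint.
    """
    y = np.sort(np.asarray(values, dtype=float))
    n = y.size
    if n < 4:
        raise ValueError("need at least 4 samples to fit two segments")
    x = np.arange(n, dtype=float)
    best = None
    for k in range(2, n - 1):  # both segments keep at least 2 points
        sse = 0.0
        slopes = []
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            slope, intercept = np.polyfit(xs, ys, 1)
            resid = ys - (slope * xs + intercept)
            sse += float(resid @ resid)
            slopes.append(float(slope))
        if best is None or sse < best[3]:
            best = (k, slopes[0], slopes[1], sse)
    return best


def select_loci(beta, min_gap=0.3, min_frac=0.2):
    """Return column indices of loci that look bimodal.

    beta is a (samples x loci) matrix of methylation levels. A locus is kept
    when the mean levels of its two segments differ by at least min_gap and
    each segment holds at least min_frac of the samples. Both thresholds are
    hypothetical, chosen only for illustration.
    """
    beta = np.asarray(beta, dtype=float)
    n = beta.shape[0]
    keep = []
    for j in range(beta.shape[1]):
        k, _, _, _ = fit_two_segments(beta[:, j])
        y = np.sort(beta[:, j])
        gap = y[k:].mean() - y[:k].mean()
        if gap >= min_gap and min(k, n - k) >= min_frac * n:
            keep.append(j)
    return keep
```

Under these assumptions, a locus whose samples split into a low-methylation group and a high-methylation group is retained, while a locus whose values stay within a narrow band is dropped. The retained columns would then be passed to a clustering method such as hierarchical or k-means clustering.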

Original language: English (US)
Pages (from-to): 348-357
Number of pages: 10
Journal: Computers and Industrial Engineering
State: Published - Apr 2019


Keywords

  • Beta distribution
  • Big data analytics
  • DNA methylation data
  • Dimension reduction algorithm
  • Piecewise linear curve fitting
  • Values

ASJC Scopus subject areas

  • Computer Science(all)
  • Engineering(all)

