Intra-cluster distance minimization in DNA methylation analysis using an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST)

Haluk Damgacioglu, Emrah Celik, Nurcin Celik

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Recent advances in DNA methylation profiling have paved the way for understanding the underlying epigenetic mechanisms of various diseases such as cancer. While conventional distance-based clustering algorithms (e.g., hierarchical and k-means clustering) have been heavily used in such profiling owing to their speed in high-throughput analysis, these methods commonly converge to suboptimal solutions and/or trivial clusters due to their greedy search nature. Hence, methodologies are needed to improve the quality of the clusters formed by these algorithms without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle the high dimensionality of the data, an outlier detection algorithm to identify outliers in the data, and an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine different real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The accuracy of the cluster identification obtained by our proposed algorithms is higher than that of hierarchical and k-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB and are available at: http://www.coe.miami.edu/simlab/tclust.html.
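The abstract does not spell out the internals of the three algorithms, so the following Python sketch is only an illustration of the general ideas, not the authors' MATLAB implementation of T-CLUST. It assumes that "variance-based dimension reduction" can be approximated by keeping the highest-variance features, and that the Tabu component can be approximated by a short memory of recently used initial medoid sets across restarts of plain k-medoids. All function names and parameters (`top_variance_features`, `tabu_restart_k_medoids`, `restarts`, `tabu_size`) are hypothetical, and outlier detection is not covered.

```python
import random
from collections import deque
from statistics import pvariance


def top_variance_features(rows, keep):
    """Indices of the `keep` columns (e.g. CpG sites) with the largest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, medoids, max_iter=100):
    """Plain alternating k-medoids started from the given medoid indices."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes the intra-cluster distance sum.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # only possible when points are duplicated
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(euclidean(points[c], points[i]) for i in members),
            ))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), total_cost(points, medoids)


def tabu_restart_k_medoids(points, k, restarts=20, tabu_size=10, seed=0):
    """Restart k-medoids, skipping initial medoid sets held in a tabu list.

    Assumption: this restart-level memory is an illustrative stand-in,
    not the search strategy described in the paper.
    """
    rng = random.Random(seed)
    tabu = deque(maxlen=tabu_size)  # oldest entries drop out automatically
    best_medoids, best_cost = None, float("inf")
    for _ in range(restarts):
        start = frozenset(rng.sample(range(len(points)), k))
        if start in tabu:
            continue  # this initial solution was tried recently
        tabu.append(start)
        medoids, cost = k_medoids(points, sorted(start))
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(tabu_restart_k_medoids(data, k=2))
```

Keeping the best result over several restarts is what reduces the dependence on any single initial solution; the tabu list only prevents the search from spending restarts on starting points it has just evaluated.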

Original language: English (US)
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOIs: 10.1109/TCBB.2018.2886006
State: Accepted/In press - Jan 1 2018

Fingerprint

Taboo
DNA Methylation
Clustering algorithms
Clustering Algorithm
Cluster Analysis
K-means Clustering
Profiling
High Throughput
Throughput
Methylation
Outlier Detection
Dimension Reduction
Gene expression
DNA Fingerprinting
MATLAB
Outlier
Gene Expression
Dimensionality
Cancer
Trivial

Keywords

  • Biomarker identification
  • clustering
  • DNA methylation analysis
  • k-medoids clustering
  • outlier detection

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

@article{19bac1d9915144939d7723fcebb2ce21,
title = "Intra-cluster distance minimization in {DNA} methylation analysis using an advanced {Tabu}-based iterative k-medoids clustering algorithm ({T-CLUST})",
abstract = "Recent advances in DNA methylation profiling have paved the way for understanding the underlying epigenetic mechanisms of various diseases such as cancer. While conventional distance-based clustering algorithms (e.g., hierarchical and k-means clustering) have been heavily used in such profiling owing to their speed in high-throughput analysis, these methods commonly converge to suboptimal solutions and/or trivial clusters due to their greedy search nature. Hence, methodologies are needed to improve the quality of the clusters formed by these algorithms without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle the high dimensionality of the data, an outlier detection algorithm to identify outliers in the data, and an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine different real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The accuracy of the cluster identification obtained by our proposed algorithms is higher than that of hierarchical and k-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB and are available at: http://www.coe.miami.edu/simlab/tclust.html.",
keywords = "Biomarker identification, clustering, DNA methylation analysis, k-medoids clustering, outlier detection",
author = "Haluk Damgacioglu and Emrah Celik and Nurcin Celik",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TCBB.2018.2886006",
language = "English (US)",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Intra-cluster distance minimization in DNA methylation analysis using an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST)
AU - Damgacioglu, Haluk
AU - Celik, Emrah
AU - Celik, Nurcin
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Recent advances in DNA methylation profiling have paved the way for understanding the underlying epigenetic mechanisms of various diseases such as cancer. While conventional distance-based clustering algorithms (e.g., hierarchical and k-means clustering) have been heavily used in such profiling owing to their speed in high-throughput analysis, these methods commonly converge to suboptimal solutions and/or trivial clusters due to their greedy search nature. Hence, methodologies are needed to improve the quality of the clusters formed by these algorithms without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle the high dimensionality of the data, an outlier detection algorithm to identify outliers in the data, and an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine different real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The accuracy of the cluster identification obtained by our proposed algorithms is higher than that of hierarchical and k-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB and are available at: http://www.coe.miami.edu/simlab/tclust.html.
KW - Biomarker identification
KW - clustering
KW - DNA methylation analysis
KW - k-medoids clustering
KW - outlier detection
UR - http://www.scopus.com/inward/record.url?scp=85058180996&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058180996&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2018.2886006
DO - 10.1109/TCBB.2018.2886006
M3 - Article
AN - SCOPUS:85058180996
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
SN - 1545-5963
ER -