Sparse Convex Clustering

Binhuan Wang, Yilong Zhang, Wei Sun, Yixin Fang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attention because it addresses the instability of traditional nonconvex clustering methods. Although its computational and statistical properties have been studied, the performance of convex clustering has not yet been investigated in the high-dimensional setting, where the data contain a large number of features, many of which carry no information about the clustering structure. In this article, we demonstrate that the performance of convex clustering can be distorted when uninformative features are included in the clustering. To overcome this, we introduce a new clustering method, referred to as Sparse Convex Clustering, that simultaneously clusters observations and performs feature selection. The key idea is to formulate convex clustering as a regularization problem with an adaptive group-lasso penalty on the cluster centers. To balance the trade-off between cluster fitting and sparsity, a tuning criterion based on clustering stability is developed. Theoretically, we obtain a finite-sample error bound for our estimator and further establish its variable selection consistency. The effectiveness of the proposed method is examined through a variety of numerical experiments and a real data application. Supplementary material for this article is available online.
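A rough sketch of the formulation described in the abstract (the notation below is introduced here for illustration and is not taken verbatim from the article): convex clustering estimates a center a_i for each observation x_i, a fused pairwise penalty pulls centers together to form clusters, and the column-wise adaptive group-lasso term zeroes out uninformative features.

\[
\min_{A \in \mathbb{R}^{n \times p}} \;
\frac{1}{2}\sum_{i=1}^{n}\lVert x_i - a_i\rVert_2^2
\;+\; \gamma_1 \sum_{i < i'} w_{ii'}\,\lVert a_i - a_{i'}\rVert_2
\;+\; \gamma_2 \sum_{j=1}^{p} u_j\,\lVert A_{\cdot j}\rVert_2
\]

Here a_i denotes the i-th row of the center matrix A, A_{·j} its j-th column (one feature across all centers), w_{ii'} pairwise fusion weights, u_j adaptive feature weights, and (γ1, γ2) tuning parameters trading off cluster fitting against sparsity; a column driven exactly to zero corresponds to a feature excluded from the clustering.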

Original language: English (US)
Pages (from-to): 393-403
Number of pages: 11
Journal: Journal of Computational and Graphical Statistics
Volume: 27
Issue number: 2
DOI: 10.1080/10618600.2017.1377081
State: Published - Apr 3 2018


Keywords

  • Convex clustering
  • Finite sample error
  • Group LASSO
  • High-dimensionality
  • Sparsity

ASJC Scopus subject areas

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty

Cite this

Wang, B., Zhang, Y., Sun, W., & Fang, Y. (2018). Sparse Convex Clustering. Journal of Computational and Graphical Statistics, 27(2), 393-403. https://doi.org/10.1080/10618600.2017.1377081

