TY - JOUR
T1 - Local sparse bump hunting
AU - Dazard, Jean Eudes
AU - Rao, J. Sunil
N1 - Funding Information:
The authors are grateful to two anonymous referees, the associate editor, and the editor for valuable comments and suggestions. This research was conducted in part while J.-E. Dazard was a postdoctoral fellow in the Division of Biostatistics, mentored by J. Sunil Rao under NIH grant R25-CA04186. J. Sunil Rao was partially supported by NSF grant DMS-0405072 and by NIH grant K25-CA89867. Additional support came from grants of the Case Comprehensive Cancer Center (NIH-National Cancer Institute P30-CA043703) and the Clinical and Translational Science Award (NIH-National Center for Research Resources UL1-RR024989).
PY - 2010/12
Y1 - 2010/12
N2 - The search for structures in real datasets, for example, in the form of bumps, components, classes, or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without prespecifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer microarray dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.
AB - The search for structures in real datasets, for example, in the form of bumps, components, classes, or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without prespecifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer microarray dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.
KW - Classification
KW - Clustering
KW - Density estimation
KW - Mode/class discovery
KW - Patient rule induction method
KW - Sparse principal components
UR - http://www.scopus.com/inward/record.url?scp=79952801963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952801963&partnerID=8YFLogxK
U2 - 10.1198/jcgs.2010.09029
DO - 10.1198/jcgs.2010.09029
M3 - Article
AN - SCOPUS:79952801963
VL - 19
SP - 900
EP - 929
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
SN - 1061-8600
IS - 4
ER -