Cross-validation and peeling strategies for survival bump hunting using recursive peeling methods

Jean Eudes Dazard, Michael Choe, Michael Leblanc, Jonnagadda S Rao

Research output: Contribution to journal › Article

Abstract

We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our survival bump hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non-/semi-parametric statistics such as the hazard ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival- or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted for the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low- and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups, unlike other models. This provides insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome for which tailored medical interventions could be made. An R package, Patient Rule Induction Method in Survival, Regression and Classification settings (PRIMsrc), is available on the Comprehensive R Archive Network (CRAN) and GitHub.
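The recursive peeling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and not the PRIMsrc API. It is an assumption-laden toy: the function names `logrank_stat` and `peel`, the parameters `alpha` and `min_support`, and the choice of a signed two-sample log-rank statistic (in-box versus out-of-box) as the peeling criterion are all hypothetical choices made for illustration. Cross-validation of the number of peeling steps, which the paper treats in detail, is omitted.

```python
def logrank_stat(times, events, in_box):
    """Signed two-sample log-rank Z (in-box vs. out-of-box).

    Positive values mean fewer events were observed inside the box
    than expected under no difference, i.e. better in-box survival.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for s in times if s >= t)                          # at risk, all
        n1 = sum(1 for s, b in zip(times, in_box) if s >= t and b)   # at risk, in box
        d = sum(1 for s, e in zip(times, events) if s == t and e)    # events at t
        d1 = sum(1 for s, e, b in zip(times, events, in_box)
                 if s == t and e and b)                              # in-box events at t
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return 0.0 if var == 0 else -obs_minus_exp / var ** 0.5


def peel(X, times, events, alpha=0.1, min_support=0.2):
    """Greedy top-down peeling toward a high-survival box.

    At each step, a fraction `alpha` of the points still in the box is
    removed from the low or high end of one covariate, choosing the
    peel that maximizes the log-rank statistic. Peeling stops before
    the box would hold fewer than `min_support` of all samples.
    Returns the final membership mask and the peeling trajectory as
    (covariate index, side, support, statistic) tuples.
    """
    n, p = len(X), len(X[0])
    in_box = [True] * n
    trajectory = []
    while True:
        idx = [i for i in range(n) if in_box[i]]
        k = max(1, int(alpha * len(idx)))
        if len(idx) - k < min_support * n:
            break
        best = None
        for j in range(p):
            order = sorted(idx, key=lambda i: X[i][j])
            for side, drop in (("low", order[:k]), ("high", order[-k:])):
                cand = list(in_box)
                for i in drop:
                    cand[i] = False
                z = logrank_stat(times, events, cand)
                if best is None or z > best[0]:
                    best = (z, j, side, cand)
        z, j, side, in_box = best
        trajectory.append((j, side, sum(in_box), z))
    return in_box, trajectory
```

In a real analysis the stopping point along the trajectory would be chosen by cross-validation rather than by a fixed `min_support`, which is the tuning problem the paper addresses.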

Original language: English (US)
Pages (from-to): 12-42
Number of pages: 31
Journal: Statistical Analysis and Data Mining
Volume: 9
Issue number: 1
DOI: 10.1002/sam.11301
State: Published - Feb 1 2016

Fingerprint

Recursive Method
Peeling
Cross-validation
Error statistics
Log-rank Test
Hazards
Extremes
Tuning
Nelson-Aalen Estimator
Strategy
Statistics
Rule Induction
Survival Model
Concordance
Alternatives
Nonparametric Model
Parameter Tuning
Prediction Error
Decision Rules
Extremum

Keywords

  • Bump hunting
  • Cross-validation
  • Exploratory survival/risk analysis
  • Non-parametric method
  • Patient rule induction method
  • Survival/risk estimation and prediction

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Analysis

Cite this

Cross-validation and peeling strategies for survival bump hunting using recursive peeling methods. / Dazard, Jean Eudes; Choe, Michael; Leblanc, Michael; Rao, Jonnagadda S.

In: Statistical Analysis and Data Mining, Vol. 9, No. 1, 01.02.2016, p. 12-42.

@article{6c0bc7f785a3462bbdc9114f99cd6e86,
title = "Cross-validation and peeling strategies for survival bump hunting using recursive peeling methods",
abstract = "We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our survival bump hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non-/semi-parametric statistics such as the hazard ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival- or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted for the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low- and high-dimensional settings. Although several non-parametric survival models exist, none address the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups, unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome for which tailored medical interventions could be made. An R package Patient Rule Induction Method in Survival, Regression and Classification settings (PRIMsrc) is available on Comprehensive R Archive Network (CRAN) and GitHub.",
keywords = "Bump hunting, Cross-validation, Exploratory survival/risk analysis, Non-parametric method, Patient rule induction method, Survival/risk estimation and prediction",
author = "Dazard, {Jean Eudes} and Choe, Michael and Leblanc, Michael and Rao, {Jonnagadda S}",
year = "2016",
month = "2",
day = "1",
doi = "10.1002/sam.11301",
language = "English (US)",
volume = "9",
pages = "12--42",
journal = "Statistical Analysis and Data Mining",
issn = "1932-1872",
publisher = "John Wiley and Sons Inc.",
number = "1",
}

TY - JOUR

T1 - Cross-validation and peeling strategies for survival bump hunting using recursive peeling methods

AU - Dazard, Jean Eudes

AU - Choe, Michael

AU - Leblanc, Michael

AU - Rao, Jonnagadda S

PY - 2016/2/1

Y1 - 2016/2/1

AB - We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our survival bump hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non-/semi-parametric statistics such as the hazard ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival- or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted for the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low- and high-dimensional settings. Although several non-parametric survival models exist, none address the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups, unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome for which tailored medical interventions could be made. An R package Patient Rule Induction Method in Survival, Regression and Classification settings (PRIMsrc) is available on Comprehensive R Archive Network (CRAN) and GitHub.

KW - Bump hunting

KW - Cross-validation

KW - Exploratory survival/risk analysis

KW - Non-parametric method

KW - Patient rule induction method

KW - Survival/risk estimation and prediction

UR - http://www.scopus.com/inward/record.url?scp=84958770882&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958770882&partnerID=8YFLogxK

U2 - 10.1002/sam.11301

DO - 10.1002/sam.11301

M3 - Article

AN - SCOPUS:84958770882

VL - 9

SP - 12

EP - 42

JO - Statistical Analysis and Data Mining

JF - Statistical Analysis and Data Mining

SN - 1932-1872

IS - 1

ER -