Stabilized Nearest Neighbor Classifier and its Statistical Properties

Wei Sun, Xingye Qiao, Guang Cheng

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.
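The classification instability (CIS) described in the abstract measures how often a classification method, trained on two independent samples from the same population, disagrees with itself on new points. A minimal empirical sketch of that idea is below, using a plain majority-vote k-nearest-neighbor classifier; this is a toy illustration only, not the paper's estimator or its SNN method, and the function names and toy population are my own:

```python
import random
from collections import Counter

def knn_predict(train, k, x):
    """Majority vote among the k training points nearest to x (squared Euclidean distance).

    train: list of (features, label) pairs, features being tuples of floats."""
    neighbors = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def empirical_cis(sample_pop, n, k, test_points, trials=20, seed=0):
    """Monte Carlo estimate of classification instability:
    the fraction of test points on which classifiers trained on two
    independent samples of size n disagree, averaged over trials."""
    rng = random.Random(seed)
    disagreements, total = 0, 0
    for _ in range(trials):
        d1 = [sample_pop(rng) for _ in range(n)]  # first independent training sample
        d2 = [sample_pop(rng) for _ in range(n)]  # second independent training sample
        for x in test_points:
            if knn_predict(d1, k, x) != knn_predict(d2, k, x):
                disagreements += 1
            total += 1
    return disagreements / total
```

For instance, with `sample_pop` drawing from a two-class Gaussian mixture, `empirical_cis` returns a value in [0, 1]; the paper's asymptotic result says that for weighted nearest neighbor classifiers this quantity scales with the Euclidean norm of the weight vector, which is what the SNN classifier exploits.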

Original language: English (US)
Pages (from-to): 1254-1265
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 111
Issue number: 515
DOIs: 10.1080/01621459.2015.1089772
State: Published - Jul 2 2016

Keywords

  • Bayes risk
  • Classification
  • Margin condition
  • Minimax optimality
  • Reproducibility
  • Stability

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Stabilized Nearest Neighbor Classifier and its Statistical Properties. / Sun, Wei; Qiao, Xingye; Cheng, Guang.

In: Journal of the American Statistical Association, Vol. 111, No. 515, 02.07.2016, p. 1254-1265.

@article{e2475a67594a4430a7455d03b72d4679,
title = "Stabilized Nearest Neighbor Classifier and its Statistical Properties",
abstract = "The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.",
keywords = "Bayes risk, Classification, Margin condition, Minimax optimality, Reproducibility, Stability",
author = "Wei Sun and Xingye Qiao and Guang Cheng",
year = "2016",
month = "7",
day = "2",
doi = "10.1080/01621459.2015.1089772",
language = "English (US)",
volume = "111",
pages = "1254--1265",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "515",
}

TY - JOUR

T1 - Stabilized Nearest Neighbor Classifier and its Statistical Properties

AU - Sun, Wei

AU - Qiao, Xingye

AU - Cheng, Guang

PY - 2016/7/2

Y1 - 2016/7/2

N2 - The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.

AB - The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.

KW - Bayes risk

KW - Classification

KW - Margin condition

KW - Minimax optimality

KW - Reproducibility

KW - Stability

UR - http://www.scopus.com/inward/record.url?scp=84991730446&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991730446&partnerID=8YFLogxK

U2 - 10.1080/01621459.2015.1089772

DO - 10.1080/01621459.2015.1089772

M3 - Article

AN - SCOPUS:84991730446

VL - 111

SP - 1254

EP - 1265

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 515

ER -