Optimal Data-Driven Regression Discontinuity Plots

Sebastian Calonico, Matias D. Cattaneo, Rocío Titiunik

Research output: Contribution to journal › Article

63 Citations (Scopus)

Abstract

Exploratory data analysis plays a central role in applied statistics and econometrics. In the popular regression-discontinuity (RD) design, the use of graphical analysis has been strongly advocated because it provides both easy presentation and transparent validation of the design. RD plots are nowadays widely used in applications, despite their formal properties being unknown: these plots are typically presented employing ad hoc choices of tuning parameters, which makes these procedures less automatic and more subjective. In this article, we formally study the most common RD plot based on an evenly spaced binning of the data, and propose several (optimal) data-driven choices for the number of bins depending on the goal of the researcher. These RD plots are constructed either to approximate the underlying unknown regression functions without imposing smoothness in the estimator, or to approximate the underlying variability of the raw data while smoothing out the otherwise uninformative scatterplot of the data. In addition, we introduce an alternative RD plot based on quantile-spaced binning, study its formal properties, and propose similar (optimal) data-driven choices for the number of bins. The main proposed data-driven selectors employ spacings estimators, which are simple and easy to implement in applications because they do not require additional choices of tuning parameters. Altogether, our results offer an array of alternative RD plots that are objective and automatic when implemented, providing a reliable benchmark for graphical analysis in RD designs. We illustrate the performance of our automatic RD plots using several empirical examples and a Monte Carlo study. All results are readily available in R and STATA using the software packages described in Calonico, Cattaneo, and Titiunik. Supplementary materials for this article are available online.
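
To make the two binning schemes named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the bin-level sample means that an RD plot displays, separately on each side of the cutoff, using either evenly spaced or quantile-spaced bins. The function names, the simulated data, the convention that observations at the cutoff belong to the right side, and above all the hand-picked number of bins are assumptions made for illustration. The article's contribution is choosing the number of bins from the data, which this sketch does not attempt.

```python
import random
from statistics import mean


def even_edges(xs, n_bins, lo, hi):
    """Evenly spaced bin edges over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + j * width for j in range(n_bins)] + [hi]


def quantile_edges(xs, n_bins, lo, hi):
    """Interior edges at empirical quantiles, so bins hold about equal counts."""
    s = sorted(xs)
    inner = [s[(j * len(s)) // n_bins] for j in range(1, n_bins)]
    return [lo] + inner + [hi]


def binned_means(xs, ys, edges):
    """Return (bin midpoint, mean of y, count) for every non-empty bin."""
    n_bins = len(edges) - 1
    groups = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        # Bin j covers [edges[j], edges[j+1]); the last bin also takes x == hi.
        j = 0
        while j < n_bins - 1 and x >= edges[j + 1]:
            j += 1
        groups[j].append(y)
    return [((edges[j] + edges[j + 1]) / 2, mean(g), len(g))
            for j, g in enumerate(groups) if g]


def rd_plot_points(xs, ys, cutoff, n_bins, edges_fn=even_edges):
    """Bin each side of the cutoff separately (assumes both sides are non-empty).

    n_bins is hand-picked here (an assumption); it is NOT a data-driven choice.
    """
    sides = {"left": [], "right": []}
    for x, y in zip(xs, ys):
        sides["left" if x < cutoff else "right"].append((x, y))
    out = {}
    for name, pairs in sides.items():
        sx = [x for x, _ in pairs]
        sy = [y for _, y in pairs]
        lo, hi = (min(sx), cutoff) if name == "left" else (cutoff, max(sx))
        out[name] = binned_means(sx, sy, edges_fn(sx, n_bins, lo, hi))
    return out


# Simulated data (hypothetical): a jump of 1.0 in the outcome at cutoff 0.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(500)]
ys = [0.5 * x + (1.0 if x >= 0 else 0.0) + random.gauss(0, 0.3) for x in xs]

for fn in (even_edges, quantile_edges):
    pts = rd_plot_points(xs, ys, cutoff=0.0, n_bins=5, edges_fn=fn)
    print(fn.__name__, "left-side bin counts:", [c for _, _, c in pts["left"]])
```

With evenly spaced bins the widths are equal and the counts vary with the density of the running variable. With quantile-spaced bins the counts are roughly equal and the widths vary. Plotting the midpoints against the bin means on each side gives the dots of an RD plot.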

Original language: English (US)
Pages (from-to): 1753-1769
Number of pages: 17
Journal: Journal of the American Statistical Association
Volume: 110
Issue number: 512
DOI: 10.1080/01621459.2015.1017578
State: Published - Oct 2 2015

Fingerprint

Data-driven
Discontinuity
Regression
Binning
Parameter Tuning
Estimator
Exploratory Data Analysis
Unknown
Selector
Regression discontinuity
Alternatives
Regression Function
Monte Carlo Study
Econometrics
Quantile
Software Package
Spacing
Smoothing
Smoothness
Benchmark

Keywords

  • Binning
  • Partitioning
  • RD plots
  • Tuning parameter selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Optimal Data-Driven Regression Discontinuity Plots. / Calonico, Sebastian; Cattaneo, Matias D.; Titiunik, Rocío.

In: Journal of the American Statistical Association, Vol. 110, No. 512, 02.10.2015, p. 1753-1769.

Calonico, Sebastian ; Cattaneo, Matias D. ; Titiunik, Rocío. / Optimal Data-Driven Regression Discontinuity Plots. In: Journal of the American Statistical Association. 2015 ; Vol. 110, No. 512. pp. 1753-1769.
@article{1429b72099ae464b9066008dd141666d,
title = "Optimal Data-Driven Regression Discontinuity Plots",
keywords = "Binning, Partitioning, RD plots, Tuning parameter selection",
author = "Sebastian Calonico and Cattaneo, {Matias D.} and Roc{\'i}o Titiunik",
year = "2015",
month = "10",
day = "2",
doi = "10.1080/01621459.2015.1017578",
language = "English (US)",
volume = "110",
pages = "1753--1769",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "512",

}

TY - JOUR

T1 - Optimal Data-Driven Regression Discontinuity Plots

AU - Calonico, Sebastian

AU - Cattaneo, Matias D.

AU - Titiunik, Rocío

PY - 2015/10/2

Y1 - 2015/10/2

KW - Binning

KW - Partitioning

KW - RD plots

KW - Tuning parameter selection

UR - http://www.scopus.com/inward/record.url?scp=84938914336&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938914336&partnerID=8YFLogxK

U2 - 10.1080/01621459.2015.1017578

DO - 10.1080/01621459.2015.1017578

M3 - Article

VL - 110

SP - 1753

EP - 1769

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 512

ER -