Bandit problems with infinitely many arms

Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp

Research output: Contribution to journal › Article

29 Citations (Scopus)

Abstract

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and then consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n, and we exhibit classes of strategies that achieve the latter.
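As a rough sanity check on the 2/√n rate, here is a back-of-the-envelope heuristic, not the paper's proof. It assumes a hypothetical "m-run" rule, which may differ from the strategies the paper actually analyzes: stay with an arm after a success, switch to a fresh arm after a failure, and commit to an arm permanently once it produces m consecutive successes. With F uniform on (0, 1), and ignoring the roughly m log m pulls spent searching:

```latex
\begin{align*}
\Pr(\text{a fresh arm completes a run of } m)
  &= \int_0^1 p^m \, dp = \frac{1}{m+1}, \\
\mathbb{E}[\text{arms discarded}]
  &= (m+1) - 1 = m \quad \text{(one failure each)}, \\
\mathbb{E}[1 - p \mid \text{run completed}]
  &= \frac{1}{m+2} \quad \bigl(p \mid \text{run completed} \sim \mathrm{Beta}(m+1,\,1)\bigr), \\
\mathbb{E}[\text{failures in } n \text{ pulls}]
  &\approx m + \frac{n}{m+2} \approx m + \frac{n}{m}, \\
\min_{m > 0} \Bigl( m + \frac{n}{m} \Bigr)
  &= 2\sqrt{n} \quad \text{at } m = \sqrt{n}.
\end{align*}
```

Dividing by n gives an expected failure proportion of about 2/√n, which matches the upper value quoted in the abstract.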

Original language: English (US)
Pages (from-to): 2103-2116
Number of pages: 14
Journal: Annals of Statistics
Volume: 25
Issue number: 5
DOI: 10.1214/aos/1069362389
State: Published - Jan 1 1997

Keywords

  • Bandit problems
  • Dynamic allocation of Bernoulli processes
  • Sequential experimentation
  • Staying with a winner
  • Switching with a loser
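To make the "staying with a winner" and "switching with a loser" ideas concrete, the following Python sketch simulates one simple run-based rule under F uniform on (0, 1). The rule is to stay with an arm after a success, switch to a fresh arm after a failure, and commit permanently once an arm produces m consecutive successes, with m = round(√n). The rule and the function names `m_run_failures` and `mean_failure_proportion` are illustrative assumptions. They are not claimed to reproduce the exact strategies analyzed in the paper.

```python
import math
import random


def m_run_failures(n: int, m: int, rng: random.Random) -> int:
    """Count failures in n pulls under a hypothetical m-run rule.

    Illustrative assumption (not necessarily the paper's exact strategy):
    arm success probabilities are drawn i.i.d. from F = Uniform(0, 1);
    stay with the current arm after a success, switch to a fresh arm after
    a failure, and commit to an arm for good once it has produced m
    consecutive successes.
    """
    p = rng.random()           # success probability of the first arm
    run = 0                    # consecutive successes on the current arm
    committed = False          # True once some arm has completed a run of m
    failures = 0
    for _ in range(n):
        if rng.random() < p:   # success: stay with a winner
            run += 1
            if run >= m:
                committed = True
        else:                  # failure
            failures += 1
            if not committed:  # switch with a loser: draw a fresh arm
                p = rng.random()
                run = 0
    return failures


def mean_failure_proportion(n: int, reps: int, seed: int = 0) -> float:
    """Average failure proportion over `reps` independent runs, with m = round(sqrt(n))."""
    rng = random.Random(seed)
    m = max(1, round(math.sqrt(n)))
    total = sum(m_run_failures(n, m, rng) for _ in range(reps))
    return total / (reps * n)


if __name__ == "__main__":
    n = 2500
    est = mean_failure_proportion(n, reps=200)
    print(f"simulated: {est:.4f}   2/sqrt(n): {2 / math.sqrt(n):.4f}")
```

With n = 2500 the simulated proportion should fall in the neighborhood of 2/√n = 0.04. The exact figure varies with the seed, and the pulls spent searching for a good arm shift it slightly. Committing after a run of about √n balances roughly √n failures spent discarding arms against a residual failure rate of about 1/√n on the committed arm.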

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Berry, D. A., Chen, R. W., Zame, A., Heath, D. C., & Shepp, L. A. (1997). Bandit problems with infinitely many arms. Annals of Statistics, 25(5), 2103-2116. https://doi.org/10.1214/aos/1069362389

@article{4ea6c76b4164409fa08fa94f59a0ef82,
title = "Bandit problems with infinitely many arms",
keywords = "Bandit problems, Dynamic allocation of Bernoulli processes, Sequential experimentation, Staying with a winner, Switching with a loser",
author = "Berry, {Donald A.} and Chen, {Robert W.} and Alan Zame and Heath, {David C.} and Shepp, {Larry A.}",
year = "1997",
month = "1",
day = "1",
doi = "10.1214/aos/1069362389",
language = "English (US)",
volume = "25",
pages = "2103--2116",
journal = "Annals of Statistics",
issn = "0090-5364",
publisher = "Institute of Mathematical Statistics",
number = "5",

}
