Bandit problems with infinitely many arms

Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp

Research output: Contribution to journal › Article

32 Scopus citations

Abstract

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n and we exhibit classes of strategies that achieve the latter.
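To make the setting concrete, the following is a minimal simulation sketch for the uniform case. The abstract does not spell out the strategies it analyzes, so the rule used here is an assumption chosen for illustration: keep discarding arms at their first failure until some arm produces a run of m successes, then stay with that arm for all remaining pulls, with m taken to be about √n. The function names, the choice of m, and the number of repetitions are hypothetical and not taken from the paper.

```python
import math
import random


def simulate_run_strategy(n, m, rng):
    """Failure proportion over n pulls for an illustrative run-based rule.

    Assumed rule (not confirmed by the source): play an arm until it fails;
    on failure, discard it and draw a fresh arm, unless the current arm has
    already produced m consecutive successes, in which case keep it forever.
    """
    failures = 0
    p = rng.random()      # success probability of the current arm, F = Uniform(0, 1)
    streak = 0            # consecutive successes on the current arm
    committed = False     # True once some arm has reached a run of m successes
    for _ in range(n):
        if rng.random() < p:          # success
            streak += 1
            if streak >= m:
                committed = True
        else:                         # failure
            failures += 1
            if not committed:
                p = rng.random()      # switch to a new, untried arm
                streak = 0
    return failures / n


def average_failure_proportion(n, repetitions=20, seed=0):
    """Average the simulated failure proportion over independent repetitions."""
    rng = random.Random(seed)
    m = max(1, round(math.sqrt(n)))   # assumed run length of about sqrt(n)
    total = sum(simulate_run_strategy(n, m, rng) for _ in range(repetitions))
    return total / repetitions


if __name__ == "__main__":
    n = 10_000
    print("simulated failure proportion:", average_failure_proportion(n))
    print("sqrt(2)/sqrt(n):", math.sqrt(2) / math.sqrt(n))
    print("2/sqrt(n):", 2 / math.sqrt(n))
```

Running the script prints the simulated average next to the two quantities from the abstract, √2/√n and 2/√n, so they can be compared for a given n. The simulation is only a rough check; it does not reproduce the paper's analysis.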

Original language: English (US)
Pages (from-to): 2103-2116
Number of pages: 14
Journal: Annals of Statistics
Volume: 25
Issue number: 5
State: Published - Oct 1997

Keywords

  • Bandit problems
  • Dynamic allocation of Bernoulli processes
  • Sequential experimentation
  • Staying with a winner
  • Switching with a loser

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
