### Abstract

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n and we exhibit classes of strategies that achieve the latter.
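The setting in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a simple "stay with a winner, switch on a loser" rule (borrowed from the keywords), draws each new arm's success probability from the uniform distribution on (0, 1), and optionally commits to an arm for good once it has produced a run of successes. The function name `simulate_failure_proportion` and the parameter `run_threshold` are hypothetical names chosen for this illustration.

```python
import math
import random


def simulate_failure_proportion(n, run_threshold=None, seed=0):
    """Simulate n pulls and return the observed failure proportion.

    Illustrative assumption, not the paper's stated procedure:
    keep pulling the current arm while it succeeds, and draw a fresh
    arm after any failure. If run_threshold is given, an arm that
    reaches that many consecutive successes is kept for all
    remaining pulls, even after later failures.
    """
    rng = random.Random(seed)
    p = rng.random()      # success probability of the current arm, uniform on (0, 1)
    run = 0               # consecutive successes on the current arm
    committed = False     # True once we stop switching arms
    failures = 0

    for _ in range(n):
        if rng.random() < p:
            run += 1
            if run_threshold is not None and run >= run_threshold:
                committed = True
        else:
            failures += 1
            if not committed:
                p = rng.random()  # switch on a loser: draw a new arm
                run = 0

    return failures / n


if __name__ == "__main__":
    n = 10_000
    # Choosing the run length near sqrt(n) is an assumption made for this demo.
    threshold = int(math.sqrt(n))
    observed = simulate_failure_proportion(n, run_threshold=threshold, seed=1)
    print("observed failure proportion:", observed)
    print("reference value 2/sqrt(n):  ", 2 / math.sqrt(n))
```

A single run is noisy, so averaging over many seeds gives a fairer comparison with the √2/√n and 2/√n values quoted in the abstract.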

Original language | English (US) |
---|---|
Pages (from-to) | 2103-2116 |
Number of pages | 14 |
Journal | Annals of Statistics |
Volume | 25 |
Issue number | 5 |
DOIs | https://doi.org/10.1214/aos/1069362389 |
State | Published - Jan 1 1997 |

### Keywords

- Bandit problems
- Dynamic allocation of Bernoulli processes
- Sequential experimentation
- Staying with a winner
- Switching with a loser

### ASJC Scopus subject areas

- Statistics and Probability
- Statistics, Probability and Uncertainty

### Cite this

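Berry, D. A., Chen, R. W., Zame, A., Heath, D. C., & Shepp, L. A. (1997). Bandit problems with infinitely many arms. *Annals of Statistics*, *25*(5), 2103-2116. https://doi.org/10.1214/aos/1069362389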

**Bandit problems with infinitely many arms.** / Berry, Donald A.; Chen, Robert W.; Zame, Alan; Heath, David C.; Shepp, Larry A.

Research output: Contribution to journal › Article


TY - JOUR

T1 - Bandit problems with infinitely many arms

AU - Berry, Donald A.

AU - Chen, Robert W.

AU - Zame, Alan

AU - Heath, David C.

AU - Shepp, Larry A.

PY - 1997/1/1

Y1 - 1997/1/1

AB - We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n and we exhibit classes of strategies that achieve the latter.

KW - Bandit problems

KW - Dynamic allocation of Bernoulli processes

KW - Sequential experimentation

KW - Staying with a winner

KW - Switching with a loser

UR - http://www.scopus.com/inward/record.url?scp=0031534756&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031534756&partnerID=8YFLogxK

U2 - 10.1214/aos/1069362389

DO - 10.1214/aos/1069362389

M3 - Article

AN - SCOPUS:0031534756

VL - 25

SP - 2103

EP - 2116

JO - Annals of Statistics

JF - Annals of Statistics

SN - 0090-5364

IS - 5

ER -