Active learning for streaming data in a contextual bandit framework

Linqi Song, Jie Xu, Congduan Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Contextual bandit algorithms have been shown to be effective in solving sequential decision-making problems under uncertainty. A common assumption in the literature is that the learner observes the realized (ground-truth) reward at no cost, which is not realistic in many practical scenarios. When observing the ground-truth reward is costly, a key challenge is how to judiciously acquire the ground truth by weighing its benefits against its costs, so as to balance learning efficiency and learning cost. In this paper, we design a novel contextual bandit-based learning algorithm and endow it with an active learning capability. When the learner queries an annotator for the ground truth, it also sends the prior information it has learned about the ground truth, thereby reducing the query cost. We prove that the learning regret of the proposed algorithm achieves the same order as that of conventional contextual bandit algorithms in cost-free scenarios, implying that, surprisingly, the cost of acquiring the ground truth does not increase the learning regret in the long run; the prior information about the ground truth plays a critical role in this result.
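The mechanism the abstract describes can be illustrated with a rough sketch. The following is a minimal, hypothetical LinUCB-style learner — not the paper's actual algorithm — in which the learner pays a fixed cost per ground-truth query and queries the annotator only while the chosen arm's confidence width is above a bar, so labeling cost stops accumulating once the estimates are reliable. All dimensions, thresholds, and costs below are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_arms, T = 5, 3, 2000                  # context dim, arms, horizon (assumed)
theta_true = rng.normal(size=(n_arms, d))  # hidden per-arm reward parameters
query_cost, alpha, tau = 0.1, 1.0, 0.2     # label cost, UCB scale, query bar

A = [np.eye(d) for _ in range(n_arms)]     # per-arm ridge Gram matrices
b = [np.zeros(d) for _ in range(n_arms)]   # per-arm response vectors

n_queries, total_cost = 0, 0.0
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                 # unit-norm context
    scores, widths = [], []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]           # ridge estimate of arm a's parameter
        w = alpha * float(np.sqrt(x @ A_inv @ x))  # confidence width
        scores.append(float(theta_hat @ x) + w)
        widths.append(w)
    arm = int(np.argmax(scores))           # optimistic (UCB) arm choice
    # Active-learning step: buy the ground-truth label only while the chosen
    # arm's estimate is still uncertain; once its width falls below tau,
    # queries (and hence labeling cost) stop accumulating.
    if widths[arm] > tau:
        reward = float(theta_true[arm] @ x) + 0.1 * rng.normal()
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
        n_queries += 1
        total_cost += query_cost

print(f"queried the annotator in {n_queries} of {T} rounds "
      f"(total query cost {total_cost:.1f})")
```

Because the confidence widths shrink as labeled contexts accumulate, the number of costly queries grows sublinearly in the horizon, which is the intuition behind the paper's claim that query cost need not change the order of the regret.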

Original language: English (US)
Title of host publication: Proceedings of the 2019 5th International Conference on Computing and Data Engineering, ICCDE 2019
Publisher: Association for Computing Machinery
Pages: 29-35
Number of pages: 7
ISBN (Electronic): 9781450361248
DOI: 10.1145/3330530.3330543
State: Published - May 4, 2019
Event: 5th International Conference on Computing and Data Engineering, ICCDE 2019 - Shanghai, China
Duration: May 4, 2019 - May 6, 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Computing and Data Engineering, ICCDE 2019
Country: China
City: Shanghai
Period: 5/4/19 - 5/6/19


Keywords

  • Active learning
  • Contextual bandits
  • Streaming data

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Software

Cite this

Song, L., Xu, J., & Li, C. (2019). Active learning for streaming data in a contextual bandit framework. In Proceedings of the 2019 5th International Conference on Computing and Data Engineering, ICCDE 2019 (pp. 29-35). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3330530.3330543

@inproceedings{352bef377d1845b7915ac9c8477a0e7a,
title = "Active learning for streaming data in a contextual bandit framework",
abstract = "Contextual bandit algorithms have been shown to be effective in solving sequential decision making problems under uncertainty. A common assumption in the literature is that the realized (ground truth) reward is observed by the learner at no cost, which, however, is not realistic in many practical scenarios. When observing the ground truth reward is costly, a key challenge is how to judiciously acquire the ground truth by assessing the benefits and costs in order to balance learning efficiency and learning cost. In this paper, we design a novel contextual bandit-based learning algorithm and endow it with the active learning capability. In addition to sending a query to an annotator for the ground truth, prior information about the ground truth learned by the learner is sent together, thereby reducing the query cost. We prove that the learning regret of the proposed algorithm achieves the same order as that of conventional contextual bandit algorithms in cost-free scenarios, implying that, surprisingly, cost due to acquiring the ground truth does not increase the learning regret in the long-run, where the prior information about the ground truth plays a critical role.",
keywords = "Active learning, Contextual bandits, Streaming data",
author = "Linqi Song and Jie Xu and Congduan Li",
year = "2019",
month = "5",
day = "4",
doi = "10.1145/3330530.3330543",
language = "English (US)",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
pages = "29--35",
booktitle = "Proceedings of the 2019 5th International Conference on Computing and Data Engineering, ICCDE 2019",

}
