Distributed multi-agent online learning based on global feedback

Jie Xu, Cem Tekin, Simpson Zhang, Mihaela Van Der Schaar

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

In this paper, we develop online learning algorithms that enable distributed agents to cooperatively learn how to maximize an overall reward in scenarios where only noisy global feedback is available and the agents exchange no information among themselves. We prove that our algorithms' learning regrets (the losses incurred by the algorithms due to uncertainty) grow logarithmically in time, so the time-averaged reward converges to the optimal average reward. We also characterize how the regret depends on the size of the action space and show that this relationship is governed by how informative the reward structure is about each agent's individual action. When the overall reward is fully informative, regret is linear in the total number of actions of all the agents; when the reward function is not informative, regret is linear in the number of joint actions. Our analytic and numerical results show that the proposed learning algorithms significantly outperform existing online learning solutions in terms of regret and learning speed. We illustrate how our theoretical framework can be used in practice by applying it to online Big Data mining using distributed classifiers.

Original language: English (US)
Article number: 7041172
Pages (from-to): 2225-2238
Number of pages: 14
Journal: IEEE Transactions on Signal Processing
Volume: 63
Issue number: 9
DOI: 10.1109/TSP.2015.2403288
State: Published - May 1, 2015
Externally published: Yes


Keywords

  • Big Data mining
  • distributed cooperative learning
  • multiagent learning
  • multiarmed bandits
  • online learning
  • reward informativeness

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing

Cite this

@article{94666fac81954eb7934f3a6f2611eb14,
title = "Distributed multi-agent online learning based on global feedback",
abstract = "In this paper, we develop online learning algorithms that enable distributed agents to cooperatively learn how to maximize an overall reward in scenarios where only noisy global feedback is available and the agents exchange no information among themselves. We prove that our algorithms' learning regrets (the losses incurred by the algorithms due to uncertainty) grow logarithmically in time, so the time-averaged reward converges to the optimal average reward. We also characterize how the regret depends on the size of the action space and show that this relationship is governed by how informative the reward structure is about each agent's individual action. When the overall reward is fully informative, regret is linear in the total number of actions of all the agents; when the reward function is not informative, regret is linear in the number of joint actions. Our analytic and numerical results show that the proposed learning algorithms significantly outperform existing online learning solutions in terms of regret and learning speed. We illustrate how our theoretical framework can be used in practice by applying it to online Big Data mining using distributed classifiers.",
keywords = "Big Data mining, distributed cooperative learning, multiagent learning, multiarmed bandits, online learning, reward informativeness",
author = "Jie Xu and Cem Tekin and Simpson Zhang and {Van Der Schaar}, Mihaela",
year = "2015",
month = "5",
day = "1",
doi = "10.1109/TSP.2015.2403288",
language = "English (US)",
volume = "63",
pages = "2225--2238",
journal = "IEEE Transactions on Signal Processing",
issn = "1053-587X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",

}
