Reinforcement learning via kernel temporal difference.

Jihye Bae, Pratik Chhatbar, Joseph T. Francis, Justin C. Sanchez, Jose C. Principe

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

This paper introduces a kernel adaptive filter implemented with stochastic gradient on temporal differences, kernel Temporal Difference (TD)(λ), to estimate the state-action value function in reinforcement learning. The case λ=0 will be studied in this paper. Experimental results show the method's applicability for learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared to the implementation of a time delay neural network (TDNN) trained with backpropagation of the temporal difference error. From the experiments, it is observed that kernel TD(0) allows faster convergence and a better solution than the neural network.
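
The abstract describes the value function as a kernel expansion trained by a stochastic gradient on the temporal-difference error. As a rough illustration only (not the authors' implementation, and not the state-action decoder used in the monkey experiment), a minimal sketch of kernel TD(0) value estimation with a Gaussian kernel might look like the following Python; the class name, kernel choice, and all parameter values are assumptions.

import numpy as np


class KernelTD0:
    """Illustrative sketch of TD(0) value estimation with a Gaussian kernel expansion.

    V(x) = sum_i alpha_i * k(x, c_i). Each observed transition appends a new
    center whose coefficient is the step size times the TD error, i.e. a
    stochastic-gradient step in the RKHS. All parameter defaults are
    illustrative assumptions, not values from the paper.
    """

    def __init__(self, eta=0.1, gamma=0.9, sigma=1.0):
        self.eta = eta        # step size
        self.gamma = gamma    # discount factor
        self.sigma = sigma    # Gaussian kernel width
        self.centers = []     # stored input states
        self.alphas = []      # expansion coefficients

    def _kernel(self, x, c):
        d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # Current estimate V(x) from the kernel expansion
        return sum(a * self._kernel(x, c)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        # TD(0) error: delta = r + gamma * V(x') - V(x)
        target = reward if terminal else reward + self.gamma * self.value(x_next)
        delta = target - self.value(x)
        # Stochastic-gradient step: add x as a new center weighted by eta * delta
        self.centers.append(np.asarray(x, dtype=float))
        self.alphas.append(self.eta * delta)
        return delta


# Example: a single transition on a 2-D state
agent = KernelTD0()
agent.update(x=[0.0, 0.0], reward=1.0, x_next=[0.5, 0.2])
print(agent.value([0.0, 0.0]))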

Original language: English
Pages (from-to): 5662-5665
Number of pages: 4
Journal: Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference
Volume: 2011
State: Published - Dec 1 2011
Externally published: Yes

Fingerprint

  • Reinforcement learning
  • Learning
  • Neural networks
  • Adaptive filters
  • Backpropagation
  • Decoding
  • Time delay
  • Experiments
  • Haplorhini
  • Reinforcement (Psychology)

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Biomedical Engineering
  • Health Informatics

Cite this

Reinforcement learning via kernel temporal difference. / Bae, Jihye; Chhatbar, Pratik; Francis, Joseph T.; Sanchez, Justin C.; Principe, Jose C.

In: Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 2011, 01.12.2011, p. 5662-5665.

Research output: Contribution to journal › Article

@article{c86607e56dfc4085be23dd79efa721e4,
title = "Reinforcement learning via kernel temporal difference.",
abstract = "This paper introduces a kernel adaptive filter implemented with stochastic gradient on temporal differences, kernel Temporal Difference (TD)(λ), to estimate the state-action value function in reinforcement learning. The case λ=0 will be studied in this paper. Experimental results show the method's applicability for learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared to the implementation of a time delay neural network (TDNN) trained with backpropagation of the temporal difference error. From the experiments, it is observed that kernel TD(0) allows faster convergence and a better solution than the neural network.",
author = "Jihye Bae and Pratik Chhatbar and Francis, {Joseph T.} and Sanchez, {Justin C.} and Principe, {Jose C.}",
year = "2011",
month = "12",
day = "1",
language = "English",
volume = "2011",
pages = "5662--5665",
journal = "Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference",
issn = "1557-170X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
