Reinforcement learning via kernel temporal difference

Jihye Bae, Pratik Chhatbar, Joseph T. Francis, Justin C. Sanchez, Jose C. Principe

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

4 Citations (Scopus)

Abstract

This paper introduces kernel Temporal Difference (TD)(λ), a kernel adaptive filter implemented with stochastic gradient updates on temporal differences, to estimate the state-action value function in reinforcement learning. The case λ = 0 is studied here. Experimental results show the method's applicability to learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared with those of a time delay neural network (TDNN) trained by backpropagation of the temporal difference error. The experiments show that kernel TD(0) converges faster and reaches a better solution than the neural network.
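
To make the temporal-difference update concrete, below is a minimal sketch of a kernel TD(0) learner in Python. It is not the authors' implementation: the paper estimates a state-action value function for motor decoding, whereas this sketch updates a plain state-value estimate with a Gaussian kernel and a growing set of centers; the class and parameter names (KernelTD0, step_size, gamma, sigma) are illustrative assumptions.

    import numpy as np

    class KernelTD0:
        """Sketch of kernel TD(0): value estimate as a kernel expansion,
        updated by stochastic gradient on the TD error (assumed hyperparameters)."""

        def __init__(self, step_size=0.1, gamma=0.9, sigma=1.0):
            self.step_size = step_size   # learning rate for the TD update
            self.gamma = gamma           # discount factor
            self.sigma = sigma           # Gaussian kernel bandwidth
            self.centers = []            # stored past states used as kernel centers
            self.coeffs = []             # expansion coefficients for each center

        def _kernel(self, x, c):
            # Gaussian (RBF) kernel between a state x and a stored center c
            return np.exp(-np.sum((np.asarray(x) - c) ** 2) / (2.0 * self.sigma ** 2))

        def value(self, x):
            # Value estimate: weighted sum of kernels over all stored centers
            return sum(a * self._kernel(x, c) for a, c in zip(self.coeffs, self.centers))

        def update(self, x, reward, x_next, terminal=False):
            # TD(0) error: r + gamma * V(x') - V(x). The stochastic gradient step
            # in the kernel space adds the current state as a new center whose
            # coefficient is the step size times the TD error.
            target = reward if terminal else reward + self.gamma * self.value(x_next)
            td_error = target - self.value(x)
            self.centers.append(np.asarray(x, dtype=float))
            self.coeffs.append(self.step_size * td_error)
            return td_error

    # Example use on a single transition (x, r, x') with made-up numbers:
    # agent = KernelTD0(step_size=0.1, gamma=0.9, sigma=0.5)
    # agent.update(np.array([0.2, -0.1]), reward=1.0, x_next=np.array([0.25, 0.0]))

One transition is consumed per call to update(), and the value estimate is a kernel expansion over all stored centers, so the model grows with the data unless some sparsification criterion is added.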

Original language: English
Title of host publication: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
Pages: 5662-5665
Number of pages: 4
ISBN (Print): 9781424441211
DOI: 10.1109/IEMBS.2011.6091370
State: Published - Dec 26 2011
Event: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2011 - Boston, MA, United States
Duration: Aug 30 2011 - Sep 3 2011

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Biomedical Engineering
  • Health Informatics

Cite this

Bae, J., Chhatbar, P., Francis, J. T., Sanchez, J. C., & Principe, J. C. (2011). Reinforcement learning via kernel temporal difference. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (pp. 5662-5665). [6091370] https://doi.org/10.1109/IEMBS.2011.6091370
