Reinforcement learning via kernel temporal difference.

Jihye Bae, Pratik Chhatbar, Joseph T. Francis, Justin C. Sanchez, Jose C. Principe

Research output: Contribution to journal › Article

5 Scopus citations

Abstract

This paper introduces a kernel adaptive filter implemented with stochastic gradient descent on temporal differences, kernel Temporal Difference (TD)(λ), to estimate the state-action value function in reinforcement learning. The case λ=0 is studied here. Experimental results show the method's applicability to learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared with a time delay neural network (TDNN) trained by backpropagation of the temporal difference error. The experiments show that kernel TD(0) converges faster and reaches a better solution than the neural network.
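Since the abstract only outlines the algorithm, the following minimal Python sketch illustrates the kind of kernel TD(0) update it describes: a growing kernel expansion, in the style of kernel adaptive filters, whose coefficients are set by the temporal-difference error. The class name `KernelTD0`, the Gaussian kernel choice, and the parameters `eta`, `gamma`, and `sigma` are illustrative assumptions, not details taken from the paper (which applies the method to state-action values for neural decoding).

```python
import numpy as np

class KernelTD0:
    """Sketch of a kernel TD(0) value estimator (assumed details, not the paper's code)."""

    def __init__(self, eta=0.1, gamma=0.9, sigma=1.0):
        self.eta = eta        # learning rate (step size)
        self.gamma = gamma    # discount factor
        self.sigma = sigma    # Gaussian kernel width (assumed kernel choice)
        self.centers = []     # stored states acting as kernel centers
        self.alphas = []      # expansion coefficients

    def _kernel(self, x, y):
        # Gaussian (RBF) kernel between two state vectors
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # V(x) = sum_i alpha_i * k(c_i, x), the kernel expansion of the value estimate
        return sum(a * self._kernel(c, x) for a, c in zip(self.alphas, self.centers))

    def update(self, x_t, r_next, x_next, terminal=False):
        # TD(0) error: delta = r + gamma * V(x') - V(x)
        v_next = 0.0 if terminal else self.value(x_next)
        delta = r_next + self.gamma * v_next - self.value(x_t)
        # Stochastic-gradient step in the RKHS: add a center at x_t with
        # coefficient eta * delta (the kernel TD(0) update).
        self.centers.append(np.asarray(x_t, dtype=float))
        self.alphas.append(self.eta * delta)
        return delta


# Toy usage on a short sequence of scalar states (purely illustrative)
agent = KernelTD0(eta=0.2, gamma=0.95, sigma=0.5)
states = [0.0, 0.2, 0.5, 0.9]
rewards = [0.0, 0.0, 1.0]
for x, r, x_next in zip(states[:-1], rewards, states[1:]):
    agent.update(x, r, x_next, terminal=(x_next == states[-1]))
print(agent.value(0.5))
```

Note that, as in other kernel adaptive filters, the expansion grows by one center per update, so practical use typically requires a sparsification or budget strategy; that aspect is not covered in this sketch.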

ASJC Scopus subject areas

  • Signal Processing
  • Biomedical Engineering
  • Computer Vision and Pattern Recognition
  • Health Informatics
