Stochastic kernel temporal difference for reinforcement learning

Jihye Bae, Luis Sanchez Giraldo, Pratik Chhatbar, Joseph Francis, Justin C. Sanchez, Jose Principe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper introduces a kernel adaptive filter using the stochastic gradient on temporal differences, kernel TD(λ), to estimate the state-action value function Q in reinforcement learning. Kernel methods are powerful for solving nonlinear problems, but their growing computational complexity and memory requirements limit their applicability in practical scenarios. To overcome this, the quantization approach introduced in [1] is applied. To help understand the behavior of the algorithm and illustrate the role of its parameters, we apply it to a 2-dimensional spatial navigation task. Eligibility traces are commonly applied in TD learning to improve data efficiency, so the relations among the eligibility trace λ, the step size, and the filter size are examined. Moreover, kernel TD(0) is applied to neural decoding of an 8-target center-out reaching task performed by a monkey. Results show that the method can effectively learn the mapping from brain states to actions for this task.
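The abstract only sketches the method at a high level. For orientation, the following is a minimal Python sketch of a kernelized TD(λ) update with online quantization of the kernel centers, in the spirit of what is described above; all names and choices (Gaussian kernel, SARSA-style state-action input x, step_size, quant_size, accumulating traces) are illustrative assumptions, not the paper's notation or exact algorithm.

```python
import numpy as np


class QKTDSketch:
    """Illustrative kernel TD(lambda) value estimator with quantization.

    Q(x) is modeled as a weighted sum of Gaussian kernels centered on
    (quantized) past inputs x, where x is assumed to be a state-action
    feature vector; the coefficients follow a stochastic TD(lambda)
    gradient update. A sketch only, not the paper's exact algorithm.
    """

    def __init__(self, step_size=0.5, kernel_width=1.0,
                 quant_size=0.1, lam=0.5, gamma=0.9):
        self.eta = step_size        # stochastic-gradient step size
        self.sigma = kernel_width   # Gaussian kernel bandwidth (filter size)
        self.eps_q = quant_size     # quantization threshold for new centers
        self.lam = lam              # eligibility-trace decay lambda
        self.gamma = gamma          # discount factor
        self.centers = []           # kernel centers (the dictionary)
        self.alphas = []            # kernel expansion coefficients
        self.traces = []            # eligibility trace per center

    def _kernel(self, x, c):
        d = np.asarray(x, dtype=float) - c
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        """Q estimate: weighted sum of kernels over the dictionary."""
        return sum(a * self._kernel(x, c)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        """One on-policy TD(lambda) step for the transition x -> x_next."""
        q_next = 0.0 if terminal else self.value(x_next)
        delta = reward + self.gamma * q_next - self.value(x)

        # Quantization: reuse the nearest existing center if it lies within
        # eps_q of the new input; otherwise grow the dictionary. This is
        # what keeps memory and per-step cost bounded.
        idx = None
        if self.centers:
            dists = [np.linalg.norm(np.asarray(x, dtype=float) - c)
                     for c in self.centers]
            j = int(np.argmin(dists))
            if dists[j] <= self.eps_q:
                idx = j
        if idx is None:
            self.centers.append(np.asarray(x, dtype=float))
            self.alphas.append(0.0)
            self.traces.append(0.0)
            idx = len(self.centers) - 1

        # Accumulating eligibility traces over kernel units: decay all,
        # mark the unit representing the current input, then push the TD
        # error into every eligible coefficient.
        self.traces = [self.gamma * self.lam * e for e in self.traces]
        self.traces[idx] += 1.0
        self.alphas = [a + self.eta * delta * e
                       for a, e in zip(self.alphas, self.traces)]
        return delta
```

With lam set to 0 this reduces to a kernel TD(0)-style update of the kind the abstract applies to the neural-decoding task, and quant_size trades dictionary growth against approximation accuracy.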

Original language: English
Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing
ISBN (Print): 9781457716232
DOI: 10.1109/MLSP.2011.6064634
State: Published - Dec 5, 2011
Event: 21st IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2011 - Beijing, China
Duration: Sep 18, 2011 - Sep 21, 2011

Other

Other: 21st IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2011
Country: China
City: Beijing
Period: 9/18/11 - 9/21/11

Fingerprint

  • Reinforcement learning
  • Adaptive filters
  • Decoding
  • Computational complexity
  • Brain
  • Navigation
  • Data storage equipment

Keywords

  • adaptive filtering
  • kernel methods
  • reinforcement learning
  • temporal difference learning

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing

Cite this

Bae, J., Giraldo, L. S., Chhatbar, P., Francis, J., Sanchez, J. C., & Principe, J. (2011). Stochastic kernel temporal difference for reinforcement learning. In IEEE International Workshop on Machine Learning for Signal Processing (Article 6064634). https://doi.org/10.1109/MLSP.2011.6064634
