Feedback for reinforcement learning based brain-machine interfaces using confidence metrics

Noeline W. Prins, Justin C. Sanchez, Abhishek Prasad

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Objective. For brain-machine interfaces (BMIs) to be used in activities of daily living by paralyzed individuals, the BMI should be as autonomous as possible. One of the challenges is how feedback is extracted and utilized in the BMI. Our long-term goal is to create autonomous BMIs that can use evaluative feedback from the brain to update the decoding algorithm and adapt the decoder intelligently. In this study, we show how to extract the necessary evaluative feedback from a biologically realistic (synthetic) source, how to use both the quantity and the quality of that feedback, and how the feedback information can be incorporated into a reinforcement learning (RL) controller architecture to maximize its performance. Approach. Motivated by the perception-action-reward cycle (PARC) in the brain, which links reward to cognitive decision making and goal-directed behavior, we used a reward-based RL architecture, Actor-Critic RL, as the model. Instead of using an error signal to build an autonomous BMI, we envision using a reward signal from the nucleus accumbens (NAcc), which plays a key role in linking reward to motor behaviors. To deal with the complexity and non-stationarity of biological reward signals, we used a confidence metric to indicate the degree of feedback accuracy. This confidence was added to the Actor's weight update equation in the RL controller architecture. If the confidence was high (>0.2), the BMI decoder used the feedback to update its parameters; when the confidence was low, the decoder ignored the feedback and left its parameters unchanged. The range between high and low confidence was termed the 'ambiguous' region: when feedback fell within this region, the decoder updated its weights at a reduced rate determined by the confidence.
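The confidence-gated update described above can be sketched as follows. This is a minimal illustration, not the paper's exact equations: the upper threshold of 0.2 comes from the abstract, while the lower bound, the base learning rate, and the linear scaling inside the ambiguous region are assumptions made here for concreteness.

```python
import numpy as np

HIGH_CONF = 0.2   # above this, feedback is fully trusted (paper's >0.2 threshold)
LOW_CONF = -0.2   # below this, feedback is ignored (assumed symmetric bound)

def effective_learning_rate(confidence, base_lr=0.05):
    """Scale the Actor's learning rate by the confidence in the feedback."""
    if confidence > HIGH_CONF:
        return base_lr                      # high confidence: full update
    if confidence < LOW_CONF:
        return 0.0                          # low confidence: skip the update
    # 'ambiguous' region: update at a reduced, confidence-proportional rate
    frac = (confidence - LOW_CONF) / (HIGH_CONF - LOW_CONF)
    return base_lr * frac

def update_actor(weights, gradient, confidence):
    """One confidence-gated weight update for the Actor."""
    lr = effective_learning_rate(confidence)
    return weights + lr * gradient
```

The key design point is that low-confidence (likely erroneous) feedback produces no weight change at all, rather than a small change in a possibly wrong direction.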
We used two biologically realistic models to generate synthetic data, the Izhikevich model for motor cortex (MI) and the Humphries model for the NAcc, to validate the proposed controller architecture. Main results. We show that overall BMI performance improved when a threshold close to the decision boundary was used to reject erroneous feedback, and that the stability of the system improved when feedback was thresholded in this way. Significance. This study is a step towards making BMIs autonomous. While our method is not fully autonomous, the results demonstrate that the extensive training time needed at the beginning of each BMI session can be significantly decreased: decoder training was limited to 10 trials in the first BMI session, and subsequent sessions initialized the decoder with the previous session's weights. We also present a method in which a threshold can be applied to any decoder whose feedback signal is less than perfect, so that erroneous feedback is avoided and the stability of the system is increased.
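For context on the synthetic data source, a single Izhikevich neuron can be simulated in a few lines. This is a generic sketch of the standard Izhikevich (2003) model with regular-spiking parameters, not the paper's full MI population simulation; the drive current and duration below are illustrative choices.

```python
import numpy as np

def izhikevich_spikes(I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Simulate one Izhikevich neuron driven by input current I (one value
    per time step of dt ms); returns the spike times in steps."""
    v, u = c, b * c               # membrane potential and recovery variable
    spikes = []
    for t, i_t in enumerate(I):
        # two half-steps for the membrane equation improve numerical stability
        for _ in range(2):
            v += (dt / 2) * (0.04 * v * v + 5 * v + 140 - u + i_t)
        u += dt * a * (b * v - u)
        if v >= 30.0:             # spike: record time and reset
            spikes.append(t)
            v, u = c, u + d
    return spikes

# a constant suprathreshold drive produces a regular (tonic) spike train,
# while zero drive leaves the neuron at rest
train = izhikevich_spikes(np.full(500, 10.0))
```

A spike train like `train` is the kind of synthetic neural signal a decoder or reward model can be validated against before moving to recorded data.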

Original language: English (US)
Article number: 036016
Journal: Journal of Neural Engineering
Volume: 14
Issue number: 3
DOI: 10.1088/1741-2552/aa6317
State: Published - Mar 30 2017


Keywords

  • biological feedback
  • brain machine interface
  • confidence
  • decoder
  • nucleus accumbens
  • reinforcement learning

ASJC Scopus subject areas

  • Biomedical Engineering
  • Cellular and Molecular Neuroscience

Cite this

Feedback for reinforcement learning based brain-machine interfaces using confidence metrics. / Prins, Noeline W.; Sanchez, Justin C.; Prasad, Abhishek.

In: Journal of Neural Engineering, Vol. 14, No. 3, 036016, 30.03.2017.

Research output: Contribution to journal › Article

@article{b863eed0c09a4bd69af158c690314341,
title = "Feedback for reinforcement learning based brain-machine interfaces using confidence metrics",
keywords = "biological feedback, brain machine interface, confidence, decoder, nucleus Accumbens, reinforcement learning",
author = "Prins, {Noeline W.} and Sanchez, {Justin C.} and Abhishek Prasad",
year = "2017",
month = "3",
day = "30",
doi = "10.1088/1741-2552/aa6317",
language = "English (US)",
volume = "14",
journal = "Journal of Neural Engineering",
issn = "1741-2560",
publisher = "IOP Publishing Ltd.",
number = "3",

}

TY - JOUR

T1 - Feedback for reinforcement learning based brain-machine interfaces using confidence metrics

AU - Prins, Noeline W.

AU - Sanchez, Justin C.

AU - Prasad, Abhishek

PY - 2017/3/30

Y1 - 2017/3/30


KW - biological feedback

KW - brain machine interface

KW - confidence

KW - decoder

KW - nucleus Accumbens

KW - reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85020472961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020472961&partnerID=8YFLogxK

U2 - 10.1088/1741-2552/aa6317

DO - 10.1088/1741-2552/aa6317

M3 - Article

C2 - 28240598

AN - SCOPUS:85020472961

VL - 14

JO - Journal of Neural Engineering

JF - Journal of Neural Engineering

SN - 1741-2560

IS - 3

M1 - 036016

ER -