Spoken emotion recognition using glottal symmetry

Alexander I. Iliev, Michael S. Scordilis

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Speech variability in real-world situations makes spoken emotion recognition a challenging task. While a variety of temporal and spectral speech features have been proposed, this paper investigates the effectiveness of using the glottal airflow signal in recognizing emotions. The speech used in this investigation is from a classical recording of the theatrical play Waiting for Godot by Samuel Beckett. Six emotions were investigated: happy, angry, sad, fear, surprise, and neutral. The proposed method was tested on the original recording and under simulated distortion conditions. In clean signal conditions the proposed method achieved average recognition rates of 76% for four emotions and 66.5% for all six emotions. Furthermore, it proved fairly robust under signal distortion and noisy conditions, achieving recognition rates of 60% for four and 51.6% for six emotions for severely low-pass filtered speech, while with additive white Gaussian noise at SNR = 10 dB recognition rates were 53% and 47% for the four- and six-emotion tasks, respectively. Results indicate that glottal signal features provide good separation of spoken emotions and achieve enhanced classification performance when compared to other approaches.
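
As an illustration of two elements mentioned in the abstract, the minimal Python sketch below (not the paper's actual pipeline) shows how the additive white Gaussian noise condition at SNR = 10 dB can be simulated and how a simple per-cycle symmetry ratio might be computed from an estimated glottal airflow waveform. The glottal-flow estimate and the symmetry definition used here are illustrative assumptions, not the features defined in the paper.

# A minimal sketch, not the paper's method: it assumes a 1-D NumPy signal and
# illustrates (1) adding white Gaussian noise at a chosen SNR and (2) a simple
# per-cycle symmetry ratio computed from an estimated glottal airflow waveform.
import numpy as np

def add_awgn(x, snr_db):
    """Return x plus white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

def cycle_symmetry(glottal_flow, cycle_starts):
    """Fraction of each glottal cycle spent before the airflow peak.

    glottal_flow : estimated glottal airflow waveform (1-D array)
    cycle_starts : sample indices where successive glottal cycles begin
    A value of 0.5 corresponds to a symmetric cycle; this definition is an
    illustrative assumption, not the feature set used in the paper.
    """
    ratios = []
    for a, b in zip(cycle_starts[:-1], cycle_starts[1:]):
        cycle = glottal_flow[a:b]
        if len(cycle) < 2:
            continue
        peak = int(np.argmax(cycle))            # instant of maximum airflow
        ratios.append(peak / float(len(cycle)))
    return np.array(ratios)

# Toy usage with synthetic stand-ins for real speech and glottal-flow estimates:
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120.0 * t)               # toy "voiced" signal at 120 Hz
noisy = add_awgn(x, snr_db=10)                  # the SNR = 10 dB condition
starts = np.arange(0, len(x), fs // 120)        # nominal cycle boundaries
print(cycle_symmetry(x, starts)[:5])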

Original language: English
Article number: 624575
Journal: EURASIP Journal on Advances in Signal Processing
Volume: 2011
DOI: 10.1155/2011/624575
State: Published - Jun 22, 2011

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Spoken emotion recognition using glottal symmetry. / Iliev, Alexander I.; Scordilis, Michael S.

In: EURASIP Journal on Advances in Signal Processing, Vol. 2011, 624575, 22.06.2011.

Research output: Contribution to journal › Article

@article{3cd199f635ae4b538aca953f1754004b,
title = "Spoken emotion recognition using glottal symmetry",
abstract = "Speech variability in real-world situations makes spoken emotion recognition a challenging task. While a variety of temporal and spectral speech features have been proposed, this paper investigates the effectiveness of using the glottal airflow signal in recognizing emotions. The speech used in this investigation is from a classical recording of the theatrical play Waiting for Godot by Samuel Beckett. Six emotions were investigated: happy, angry, sad, fear, surprise, and neutral. The proposed method was tested on the original recording and on simulated distortion conditions. In clean signal conditions the proposed method achieved average recognition rates of 76 for four emotions and 66.5 for all six emotions. Furthermore, it proved fairly robust under signal distortion and noisy conditions achieving recognition rates of 60 for four and 51.6 for six emotions for severely low-pass filtered speech, while with additive white Gaussian noise at SNR = 10dB recognition rates were 53 and 47 for the four and six-emotion tasks, respectively. Results indicate that glottal signal features provide good separation of spoken emotions and achieve enhanced classification performance when compared to other approaches.",
author = "Iliev, {Alexander I.} and Scordilis, {Michael S}",
year = "2011",
month = "6",
day = "22",
doi = "10.1155/2011/624575",
language = "English",
volume = "2011",
journal = "Eurasip Journal on Advances in Signal Processing",
issn = "1687-6172",
publisher = "Springer Publishing Company",

}
