Spoken emotion recognition through optimum-path forest classification using glottal features

Alexander I. Iliev, Michael S. Scordilis, João P. Papa, Alexandre X. Falcão

Research output: Contribution to journal › Article

65 Citations (Scopus)

Abstract

A new method for the recognition of spoken emotions is presented, based on features of the glottal airflow signal. Its effectiveness is tested on the new optimum-path forest (OPF) classifier as well as on six previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment from ten speakers (5 male, 5 female), each speaking ten sentences in four emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths, for both the glottal and the corresponding speech signal. Experimental results indicate that the best performance is obtained with glottal-only features, with SVM and OPF generally providing the highest recognition rates, while performance for GMM or for the combination of glottal and speech features was relatively inferior. For this text-dependent, multi-speaker task the top-performing classifiers achieved perfect recognition rates with 6th-order glottal MFCCs.
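Of the features named in the abstract, the glottal symmetry admits a compact illustration. The sketch below uses one hypothetical (but common) definition — the ratio of opening-phase to closing-phase duration within a single glottal cycle, with the airflow peak taken as the phase boundary; the paper's exact formulation may differ.

```python
import numpy as np

def glottal_symmetry(pulse):
    """Opening-to-closing duration ratio of one glottal cycle.

    The opening phase runs from cycle onset to the airflow peak, the
    closing phase from the peak to closure. (Illustrative definition;
    the paper's exact formulation may differ.)
    """
    peak = int(np.argmax(pulse))
    opening = peak               # samples before the airflow peak
    closing = len(pulse) - peak  # samples from the peak to cycle end
    return opening / closing

# Synthetic, asymmetric glottal-like pulse: slow 70-sample rise,
# fast 30-sample fall -- the shape typical of glottal airflow.
pulse = np.concatenate([
    np.linspace(0.0, 1.0, 70, endpoint=False),  # opening phase
    np.linspace(1.0, 0.0, 30, endpoint=False),  # closing phase
])
print(round(glottal_symmetry(pulse), 2))  # prints 2.33
```

A value above 1 reflects the usual glottal asymmetry (gradual opening, abrupt closure); emotion-dependent changes in vocal effort shift this ratio, which is what makes it a candidate feature.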

Original language: English
Pages (from-to): 445-460
Number of pages: 16
Journal: Computer Speech and Language
Volume: 24
Issue number: 3
DOI: 10.1016/j.csl.2009.02.005
State: Published - Jul 1, 2010

Keywords

  • Emotion recognition
  • Glottal analysis
  • Optimum-path forest
  • Speech analysis

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Theoretical Computer Science

Cite this

Spoken emotion recognition through optimum-path forest classification using glottal features. / Iliev, Alexander I.; Scordilis, Michael S.; Papa, João P.; Falcão, Alexandre X.

In: Computer Speech and Language, Vol. 24, No. 3, 01.07.2010, p. 445-460.

@article{ada3ddfd27bc4274a770799953d9b19d,
title = "Spoken emotion recognition through optimum-path forest classification using glottal features",
abstract = "A new method for the recognition of spoken emotions is presented, based on features of the glottal airflow signal. Its effectiveness is tested on the new optimum-path forest (OPF) classifier as well as on six previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment from ten speakers (5 male, 5 female), each speaking ten sentences in four emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths, for both the glottal and the corresponding speech signal. Experimental results indicate that the best performance is obtained with glottal-only features, with SVM and OPF generally providing the highest recognition rates, while performance for GMM or for the combination of glottal and speech features was relatively inferior. For this text-dependent, multi-speaker task the top-performing classifiers achieved perfect recognition rates with 6th-order glottal MFCCs.",
keywords = "Emotion recognition, Glottal analysis, Optimum-path forest, Speech analysis",
author = "Iliev, {Alexander I.} and Scordilis, {Michael S} and Papa, {Jo{\~a}o P.} and Falc{\~a}o, {Alexandre X.}",
year = "2010",
month = "7",
day = "1",
doi = "10.1016/j.csl.2009.02.005",
language = "English",
volume = "24",
pages = "445--460",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
number = "3",

}
