Neural network-based control strategy for a speech formant synthesizer

Michael S Scordilis, John N. Gowdy

Research output: Contribution to journal › Article

Abstract

The quality of text-to-speech conversion performed by machines is still unacceptably low and does not compare well with natural speech. As a result, synthetic speech has not found wide acceptance and has been restricted mainly to applications for the handicapped. The problems of automatic speech synthesis stem primarily from the methods used to control mathematical models of the human vocal tract as its properties change over time during discourse. In formant synthesis, rules relate the incoming phonemic information to values of the synthesizer control vectors. Such rules are usually developed through the analysis of a representative set of utterances and adjusted through listening tests. The tedious nature of the parameter extraction process and the lack of unambiguous relationships between acoustic events and spectral information have hindered effective control of the models. In this article, artificial neural networks are employed to address the latter concern. For this purpose, a set of 56 common words composed of larynx-produced phonemes was analyzed and used to train a network cluster. The system was able to produce intelligible speech for certain phonemic combinations, and the role of neural networks in designing alternatives to the classical rule-based approach was studied.
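
The abstract gives no implementation details, so the sketch below is only an illustration of the general idea it describes: training a small feedforward network to map a phonemic context window to a formant-synthesizer control vector, in place of hand-written rules. The language (Python/NumPy), phoneme inventory size, context width, control-vector layout, layer sizes, and the synthetic training targets are all assumptions made for this example, not details from the paper.

```python
# Illustrative sketch only: a small feedforward network that maps a one-hot
# phoneme-context window to a vector of formant-synthesizer control
# parameters (e.g., F1-F3 frequencies and bandwidths). Layer sizes, the
# phoneme inventory, and the control-vector layout are assumptions made
# for this example, not details taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_PHONEMES = 40          # assumed phoneme inventory size
CONTEXT = 3              # previous, current, next phoneme
N_CONTROLS = 6           # e.g., F1-F3 frequencies and bandwidths
HIDDEN = 32

# Network weights: one sigmoid hidden layer, linear output.
W1 = rng.normal(0, 0.1, (N_PHONEMES * CONTEXT, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_CONTROLS))
b2 = np.zeros(N_CONTROLS)

def encode_context(phoneme_ids):
    """One-hot encode a window of phoneme indices into a single input vector."""
    x = np.zeros(N_PHONEMES * CONTEXT)
    for slot, p in enumerate(phoneme_ids):
        x[slot * N_PHONEMES + p] = 1.0
    return x

def forward(x):
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))   # hidden activations
    return h, h @ W2 + b2                       # estimated control vector

# Toy training data: random phoneme contexts paired with random target
# control vectors. A real system would use values measured from analyzed
# utterances, as the abstract describes.
X = np.array([encode_context(rng.integers(0, N_PHONEMES, CONTEXT))
              for _ in range(200)])
Y = rng.normal(0, 1, (200, N_CONTROLS))

lr = 0.05
for epoch in range(200):
    H, Y_hat = forward(X)
    err = Y_hat - Y                             # gradient of MSE at the output
    # Backpropagate through the linear output and sigmoid hidden layer.
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, controls = forward(encode_context([3, 17, 25]))
print("predicted control vector:", controls)
```

In the study itself, the targets would presumably be control-parameter values extracted from the analyzed set of 56 words, with the synthesized output judged through listening tests; this sketch substitutes random data and makes no claim about the network architecture actually used.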

Original language: English
Pages (from-to): 195-203
Number of pages: 9
Journal: Journal of Artificial Neural Networks
ISSN: 1073-5828
Volume: 2
Issue number: 3
State: Published - Dec 1 1995
Externally published: Yes

Fingerprint

  • Neural networks
  • Speech synthesis
  • Parameter extraction
  • Acoustics
  • Mathematical models

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Neural network-based control strategy for a speech formant synthesizer. / Scordilis, Michael S; Gowdy, John N.

In: Journal of Artificial Neural Networks, Vol. 2, No. 3, 01.12.1995, p. 195-203.
