Analysis, enhancement and evaluation of five pitch determination techniques

Peter Veprek, Michael S Scordilis

Research output: Contribution to journal (Article)

49 Citations (Scopus)

Abstract

Speech classification into voiced and unvoiced (or silent) portions is important in many speech processing applications. In addition, segmentation of voiced speech into individual pitch epochs is necessary in several high quality speech synthesis and coding techniques. This paper introduces criteria for measuring the performance of automatic procedures performing this task against manually segmented and labeled data. First, five basic pitch determination algorithms (PDAs) (SIFT, comb filter energy maximization, spectrum decimation/accumulation, optimal temporal similarity and dyadic wavelet transform) are evaluated and their performance is analyzed. A set of enhancements is then developed and applied to the basic algorithms, which yields superior performance by virtually eliminating multiple and sub-multiple pitch assignment errors and reducing all other errors. Evaluation shows that the enhancements improved performance of all five PDAs with the improvement ranging from 3.5% for the comb filter energy maximization method to 8.3% for the dyadic wavelet transform method.

Original language: English
Pages (from-to): 249-270
Number of pages: 22
Journal: Speech Communication
Volume: 37
Issue number: 3-4
DOIs: 10.1016/S0167-6393(01)00017-6
State: Published - Jul 1 2002

Fingerprint

Comb filters
Enhancement
Wavelet analysis
Wavelet transforms
Evaluation
Performance
Speech coding
Speech processing
Speech synthesis
Filter
Energy
Decimation

Keywords

  • Pitch determination
  • Speech analysis
  • Speech segmentation

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Analysis, enhancement and evaluation of five pitch determination techniques. / Veprek, Peter; Scordilis, Michael S.

In: Speech Communication, Vol. 37, No. 3-4, 01.07.2002, p. 249-270.

Veprek, Peter ; Scordilis, Michael S. / Analysis, enhancement and evaluation of five pitch determination techniques. In: Speech Communication. 2002 ; Vol. 37, No. 3-4. pp. 249-270.
@article{80338ff7536e4d75924c9ebead13b388,
title = "Analysis, enhancement and evaluation of five pitch determination techniques",
abstract = "Speech classification into voiced and unvoiced (or silent) portions is important in many speech processing applications. In addition, segmentation of voiced speech into individual pitch epochs is necessary in several high quality speech synthesis and coding techniques. This paper introduces criteria for measuring the performance of automatic procedures performing this task against manually segmented and labeled data. First, five basic pitch determination algorithms (PDAs) (SIFT, comb filter energy maximization, spectrum decimation/accumulation, optimal temporal similarity and dyadic wavelet transform) are evaluated and their performance is analyzed. A set of enhancements is then developed and applied to the basic algorithms, which yields superior performance by virtually eliminating multiple and sub-multiple pitch assignment errors and reducing all other errors. Evaluation shows that the enhancements improved performance of all five PDAs with the improvement ranging from 3.5{\%} for the comb filter energy maximization method to 8.3{\%} for the dyadic wavelet transform method.",
keywords = "Pitch determination, Speech analysis, Speech segmentation",
author = "Veprek, Peter and Scordilis, Michael S.",
year = "2002",
month = "7",
day = "1",
doi = "10.1016/S0167-6393(01)00017-6",
language = "English",
volume = "37",
pages = "249--270",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "3--4",
}

TY - JOUR

T1 - Analysis, enhancement and evaluation of five pitch determination techniques

AU - Veprek, Peter

AU - Scordilis, Michael S

PY - 2002/7/1

Y1 - 2002/7/1

N2 - Speech classification into voiced and unvoiced (or silent) portions is important in many speech processing applications. In addition, segmentation of voiced speech into individual pitch epochs is necessary in several high quality speech synthesis and coding techniques. This paper introduces criteria for measuring the performance of automatic procedures performing this task against manually segmented and labeled data. First, five basic pitch determination algorithms (PDAs) (SIFT, comb filter energy maximization, spectrum decimation/accumulation, optimal temporal similarity and dyadic wavelet transform) are evaluated and their performance is analyzed. A set of enhancements is then developed and applied to the basic algorithms, which yields superior performance by virtually eliminating multiple and sub-multiple pitch assignment errors and reducing all other errors. Evaluation shows that the enhancements improved performance of all five PDAs with the improvement ranging from 3.5% for the comb filter energy maximization method to 8.3% for the dyadic wavelet transform method.

KW - Pitch determination

KW - Speech analysis

KW - Speech segmentation

UR - http://www.scopus.com/inward/record.url?scp=0036642776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036642776&partnerID=8YFLogxK

U2 - 10.1016/S0167-6393(01)00017-6

DO - 10.1016/S0167-6393(01)00017-6

M3 - Article

AN - SCOPUS:0036642776

VL - 37

SP - 249

EP - 270

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 3-4

ER -