Single channel speech enhancement by frequency domain constrained optimization and temporal masking

Jin Wen, Michael S Scordilis

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

A speech enhancement algorithm is proposed that exploits the masking properties of the human auditory system. Enhancement is formulated as a frequency-domain constrained optimization problem: the noise components of the noisy speech are suppressed by a gain function, subject to the constraint that both the signal distortion and the residual noise fall below the masking thresholds. Both temporal and simultaneous masking effects are incorporated into the estimation of the masking thresholds. The algorithm was tested on speech corrupted separately by white Gaussian noise and by multitalker babble noise, and its performance was evaluated using ITU-T PESQ scores and segmental SNR. Experimental results indicate that the proposed gain function performs slightly but consistently better than a previously proposed perceptually motivated enhancement algorithm. A greater improvement is achieved when temporal masking effects are also incorporated.
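
The abstract does not state the paper's actual gain function, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that, for each frequency bin, the clean-speech power S, the noise power N and the masking threshold T are already known, and it chooses the gain G that minimizes the total error energy (1-G)^2*S + G^2*N subject to the distortion constraint (1-G)^2*S <= T and the residual-noise constraint G^2*N <= T. Temporal (forward) masking is approximated by letting each bin's threshold decay by at most a fixed ratio per frame. The function names `forward_masking` and `constrained_gain`, the `decay` parameter, the cost function and the fallback for infeasible bins are all hypothetical and are not taken from the authors' method; estimating S, N and T, and computing PESQ or segmental SNR, are not shown.

```python
import math

EPS = 1e-12  # guards against division by zero in empty bins


def forward_masking(sim_thresholds, decay=0.5):
    """Add a simple forward (temporal) masking term to per-frame thresholds.

    sim_thresholds: list of frames, each a list of per-bin simultaneous
    masking thresholds (power). decay: hypothetical per-frame ratio in (0, 1);
    a bin's threshold is not allowed to fall faster than this ratio.
    """
    out, prev = [], None
    for frame in sim_thresholds:
        if prev is None:
            cur = list(frame)
        else:
            # Keep the larger of the current threshold and the decayed previous one.
            cur = [max(t, decay * p) for t, p in zip(frame, prev)]
        out.append(cur)
        prev = cur
    return out


def constrained_gain(speech_psd, noise_psd, threshold):
    """Per-bin gain minimizing (1-G)^2*S + G^2*N subject to
    (1-G)^2*S <= T (distortion masked) and G^2*N <= T (residual noise masked).
    """
    gains = []
    for s, n, t in zip(speech_psd, noise_psd, threshold):
        wiener = s / (s + n + EPS)  # unconstrained minimizer of the error energy
        lower = max(0.0, 1.0 - math.sqrt(t / (s + EPS)))  # from the distortion constraint
        upper = min(1.0, math.sqrt(t / (n + EPS)))  # from the residual-noise constraint
        if lower <= upper:
            # Projecting the minimizer of a 1-D convex quadratic onto an
            # interval solves the constrained problem exactly.
            gains.append(min(max(wiener, lower), upper))
        else:
            # Both constraints cannot hold at once; fall back to the
            # unconstrained gain (an assumption for this sketch).
            gains.append(wiener)
    return gains
```

Projecting the Wiener gain onto the feasible interval keeps the sketch closed-form for each bin; a different cost function or infeasibility rule would change the result.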

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Pages: 1411-1414
Number of pages: 4
Volume: 3
State: Published - Dec 1 2006
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: Sep 17 2006 to Sep 21 2006

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country: United States
City: Pittsburgh, PA
Period: 9/17/06 to 9/21/06

Fingerprint

Speech intelligibility
Speech enhancement
Constrained optimization
Signal distortion

Keywords

  • Psychoacoustical model
  • Speech enhancement
  • Temporal masking

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Wen, J., & Scordilis, M. S. (2006). Single channel speech enhancement by frequency domain constrained optimization and temporal masking. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (Vol. 3, pp. 1411-1414).

Single channel speech enhancement by frequency domain constrained optimization and temporal masking. / Wen, Jin; Scordilis, Michael S.

INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP. Vol. 3 2006. p. 1411-1414.


Wen, J & Scordilis, MS 2006, Single channel speech enhancement by frequency domain constrained optimization and temporal masking. in INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP. vol. 3, pp. 1411-1414, INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP, Pittsburgh, PA, United States, 9/17/06.
Wen J, Scordilis MS. Single channel speech enhancement by frequency domain constrained optimization and temporal masking. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP. Vol. 3. 2006. p. 1411-1414
Wen, Jin ; Scordilis, Michael S. / Single channel speech enhancement by frequency domain constrained optimization and temporal masking. INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP. Vol. 3 2006. pp. 1411-1414
@inproceedings{b543cd6290d64ac8b39114cdb889e5e8,
title = "Single channel speech enhancement by frequency domain constrained optimization and temporal masking",
abstract = "A speech enhancement algorithm is proposed that exploits the masking properties of the human auditory system. The enhancement is formulated as a frequency domain constrained optimization problem. The noise components of the noisy speech are suppressed by a gain function subject to the constraint that both the signal distortion and residual noise should fall below the masking thresholds. Temporal as well as simultaneous masking effects are incorporated into the estimation of masking thresholds. The enhancement algorithm was tested with speech corrupted by white Gaussian and multitalker babble noise, respectively. Its performance was evaluated by ITU PESQ scores and segmental SNR. Experimental results indicate that the proposed gain function performs slightly but consistently better than a former perceptually motivated enhancement algorithm. Greater improvement is achieved by incorporating the temporal masking effects.",
keywords = "Psychoacoustical model, Speech enhancement, Temporal masking",
author = "Jin Wen and Scordilis, {Michael S}",
year = "2006",
month = "12",
day = "1",
language = "English",
isbn = "9781604234497",
volume = "3",
pages = "1411--1414",
booktitle = "INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP",

}

TY - GEN

T1 - Single channel speech enhancement by frequency domain constrained optimization and temporal masking

AU - Wen, Jin

AU - Scordilis, Michael S

PY - 2006/12/1

Y1 - 2006/12/1

N2 - A speech enhancement algorithm is proposed that exploits the masking properties of the human auditory system. The enhancement is formulated as a frequency domain constrained optimization problem. The noise components of the noisy speech are suppressed by a gain function subject to the constraint that both the signal distortion and residual noise should fall below the masking thresholds. Temporal as well as simultaneous masking effects are incorporated into the estimation of masking thresholds. The enhancement algorithm was tested with speech corrupted by white Gaussian and multitalker babble noise, respectively. Its performance was evaluated by ITU PESQ scores and segmental SNR. Experimental results indicate that the proposed gain function performs slightly but consistently better than a former perceptually motivated enhancement algorithm. Greater improvement is achieved by incorporating the temporal masking effects.

AB - A speech enhancement algorithm is proposed that exploits the masking properties of the human auditory system. The enhancement is formulated as a frequency domain constrained optimization problem. The noise components of the noisy speech are suppressed by a gain function subject to the constraint that both the signal distortion and residual noise should fall below the masking thresholds. Temporal as well as simultaneous masking effects are incorporated into the estimation of masking thresholds. The enhancement algorithm was tested with speech corrupted by white Gaussian and multitalker babble noise, respectively. Its performance was evaluated by ITU PESQ scores and segmental SNR. Experimental results indicate that the proposed gain function performs slightly but consistently better than a former perceptually motivated enhancement algorithm. Greater improvement is achieved by incorporating the temporal masking effects.

KW - Psychoacoustical model

KW - Speech enhancement

KW - Temporal masking

UR - http://www.scopus.com/inward/record.url?scp=44949088741&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44949088741&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:44949088741

SN - 9781604234497

VL - 3

SP - 1411

EP - 1414

BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

ER -