Model-Based Feature Enhancement for Reverberant Speech Recognition

Krueger, Alexander; Haeb-Umbach, Reinhold

A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech, and Language Processing 18 (2010) 1692–1707.

Download (ext.)

https://groups.uni-paderborn.de/nt/pubs/2010/KrHa10.pdf

DOI

10.1109/TASL.2010.2049684

Journal Article | English

Author

Krueger, Alexander; Haeb-Umbach, Reinhold

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Abstract

In this paper, we present a new technique for automatic speech recognition (ASR) in reverberant environments. Our approach is aimed at the enhancement of the logarithmic Mel power spectrum, which is computed at an intermediate stage to obtain the widely used Mel frequency cepstral coefficients (MFCCs). Given the reverberant logarithmic Mel power spectral coefficients (LMPSCs), a minimum mean square error estimate of the clean LMPSCs is computed by carrying out Bayesian inference. We employ switching linear dynamical models as an a priori model for the dynamics of the clean LMPSCs. Further, we derive a stochastic observation model which relates the clean to the reverberant LMPSCs through a simplified model of the room impulse response (RIR). This model requires only two parameters, namely RIR energy and reverberation time, which can be estimated from the captured microphone signal. The performance of the proposed enhancement technique is studied on the AURORA5 database and compared to that of constrained maximum-likelihood linear regression (CMLLR). It is shown by experimental results that our approach significantly outperforms CMLLR and that up to 80% of the errors caused by the reverberation are recovered. In addition to the fact that the approach is compatible with the standard MFCC feature vectors, it leaves the ASR back-end unchanged. It is of moderate computational complexity and suitable for real-time applications.
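
The abstract mentions an observation model driven by only two RIR parameters, energy and reverberation time. The sketch below is an illustration of that idea under stated assumptions, not the authors' model: it assumes an exponentially decaying RIR power envelope that drops by 60 dB over the reverberation time, and it approximates the reverberant Mel power in each frame as a weighted sum of clean Mel powers from the current and preceding frames. The function names, the 10 ms frame shift, and the omission of the stochastic error term are all hypothetical choices made for this example.

```python
import math


def frame_energy_weights(rir_energy, t60, frame_shift=0.01, num_frames=None):
    """Per-frame energy of an exponentially decaying RIR power envelope.

    Assumption: the envelope decays by 60 dB over t60 seconds and its
    energy over the modelled frames equals rir_energy.
    """
    decay = 3.0 * math.log(10.0) / t60  # amplitude decay rate in 1/s
    if num_frames is None:
        num_frames = math.ceil(t60 / frame_shift)
    # Integral of exp(-2 * decay * t) over each frame, up to a constant factor.
    raw = [
        math.exp(-2.0 * decay * k * frame_shift)
        - math.exp(-2.0 * decay * (k + 1) * frame_shift)
        for k in range(num_frames)
    ]
    scale = rir_energy / sum(raw)
    return [scale * w for w in raw]


def reverberate_log_mel(clean_log_mel, weights):
    """Deterministic part of a clean-to-reverberant LMPSC mapping (sketch).

    clean_log_mel is a list of frames, each a list of log Mel power values.
    Powers are combined across frames with the given weights; cross terms
    and any model error are ignored here.
    """
    reverberant = []
    for t in range(len(clean_log_mel)):
        frame = []
        for q in range(len(clean_log_mel[t])):
            power = sum(
                w * math.exp(clean_log_mel[t - k][q])
                for k, w in enumerate(weights)
                if t - k >= 0
            )
            frame.append(math.log(power))
        reverberant.append(frame)
    return reverberant
```

For example, `frame_energy_weights(1.0, 0.5)` yields one weight per 10 ms frame for a reverberation time of 0.5 s, and passing those weights to `reverberate_log_mel` smears each clean frame over the frames that follow it. An enhancement method such as the one described in the abstract would aim to invert this kind of relation statistically rather than apply it.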

Publishing Year

2010

Journal Title

IEEE Transactions on Audio, Speech, and Language Processing

Volume

18

Issue

7

Page

1692-1707

LibreCat-ID

11846

Cite this

Krueger A, Haeb-Umbach R. Model-Based Feature Enhancement for Reverberant Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing. 2010;18(7):1692-1707. doi:10.1109/TASL.2010.2049684

Krueger, A., & Haeb-Umbach, R. (2010). Model-Based Feature Enhancement for Reverberant Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 18(7), 1692–1707. https://doi.org/10.1109/TASL.2010.2049684

@article{Krueger_Haeb-Umbach_2010, title={Model-Based Feature Enhancement for Reverberant Speech Recognition}, volume={18}, DOI={10.1109/TASL.2010.2049684}, number={7}, journal={IEEE Transactions on Audio, Speech, and Language Processing}, author={Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2010}, pages={1692–1707} }

Krueger, Alexander, and Reinhold Haeb-Umbach. “Model-Based Feature Enhancement for Reverberant Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 18, no. 7 (2010): 1692–1707. https://doi.org/10.1109/TASL.2010.2049684.

A. Krueger and R. Haeb-Umbach, “Model-Based Feature Enhancement for Reverberant Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1692–1707, 2010.

Krueger, Alexander, and Reinhold Haeb-Umbach. “Model-Based Feature Enhancement for Reverberant Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, 2010, pp. 1692–707, doi:10.1109/TASL.2010.2049684.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)

URL

https://groups.uni-paderborn.de/nt/pubs/2010/KrHa10.pdf

Access Level

Closed Access
