A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech

V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (2014) 95–109.

Journal Article | English
Author
Leutnant, Volker; Krueger, Alexander; Haeb-Umbach, Reinhold
Abstract
In this contribution, we present a theoretical and experimental investigation into the effects of reverberation and noise on features in the logarithmic mel power spectral domain, an intermediate stage in the computation of the mel frequency cepstral coefficients that are prevalent in automatic speech recognition (ASR). Gaining insight into the complex interaction between clean speech, noise, and noisy reverberant speech features is essential for any ASR system to be robust against the noise and reverberation present in distant microphone input signals. The findings are gathered in a probabilistic formulation of an observation model which may be used in model-based feature compensation schemes. The proposed observation model extends previous models in three major directions. First, the contribution of additive background noise to the observation error is explicitly taken into account. Second, an energy compensation constant is introduced which ensures an unbiased estimate of the reverberant speech features. Third, a recursive variant of the observation model is developed, resulting in reduced computational complexity when used in model-based feature compensation. The experimental section is used to evaluate the accuracy of the model and to describe how its parameters can be determined from test data.
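For orientation only, the kind of relationship the abstract refers to can be sketched with a commonly used log-sum approximation: the noisy reverberant feature is taken as the logarithm of the summed powers of delayed clean speech and noise. This is not the model proposed in the article. It leaves out the observation error, the energy compensation constant, and the recursive variant named above. The function name `logmel_observation` and the per-delay log-gain table `h` are hypothetical and introduced only for illustration.

```python
import math

def logmel_observation(x, n, h):
    """Log-sum approximation of noisy reverberant log mel power features.

    x: clean-speech features, list of T frames with Q mel channels each
    n: noise features, same shape as x
    h: hypothetical log-domain reverberation gains, one frame of Q
       values per delay tau (an assumption, not taken from the article)
    """
    num_frames = len(x)
    num_channels = len(x[0])
    y = []
    for t in range(num_frames):
        frame = []
        for q in range(num_channels):
            # Start with the additive noise power in this channel.
            power = math.exp(n[t][q])
            # Add delayed clean-speech power, one term per available delay.
            for tau in range(min(len(h), t + 1)):
                power += math.exp(x[t - tau][q] + h[tau][q])
            frame.append(math.log(power))
        y.append(frame)
    return y

# Usage: three frames, one mel channel, two reverberation delays.
x = [[0.0], [0.0], [0.0]]
n = [[0.0], [0.0], [0.0]]
h = [[0.0], [math.log(0.5)]]
y = logmel_observation(x, n, h)
```

Summing in the power domain and returning to the log domain is what makes the interaction between speech, reverberation, and noise nonlinear in these features. The sketch drops cross terms between the components, which is the kind of mismatch an observation error term is meant to capture.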
Publishing Year
2014
Journal Title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume
22
Issue
1
Page
95-109

Cite this

Leutnant V, Krueger A, Haeb-Umbach R. A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2014;22(1):95-109. doi:10.1109/TASLP.2013.2285480
Leutnant, V., Krueger, A., & Haeb-Umbach, R. (2014). A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(1), 95–109. https://doi.org/10.1109/TASLP.2013.2285480
@article{Leutnant_Krueger_Haeb-Umbach_2014, title={A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech}, volume={22}, DOI={10.1109/TASLP.2013.2285480}, number={1}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2014}, pages={95–109} }
Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, no. 1 (2014): 95–109. https://doi.org/10.1109/TASLP.2013.2285480.
V. Leutnant, A. Krueger, and R. Haeb-Umbach, “A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 1, pp. 95–109, 2014.
Leutnant, Volker, et al. “A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech.” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 1, 2014, pp. 95–109, doi:10.1109/TASLP.2013.2285480.
