Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition

Type: article
Authors: Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach

Abstract: In this contribution, we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Furthermore, the computational effort and the memory requirements are reduced by using a recursive formulation of the observation model. The performance of the proposed algorithms is first studied experimentally on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.
Year: 2013
Language: English
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 21
Issue: 8
Pages: 1640–1652
DOI: https://doi.org/10.1109/TASL.2013.2258013

Keywords: Bayes methods; compensation; error statistics; reverberation; speech recognition; Bayesian feature enhancement; background noise; clean speech feature vectors; connected digits recognition task; memory requirements; noisy reverberant data; posteriori probability density function; recursive formulation; reverberant logarithmic mel power spectral coefficients; robust automatic speech recognition; signal-to-noise ratios; time-variant observation; word error rate reduction; model-based Bayesian feature enhancement; observation model for reverberant and noisy speech; recursive observation model

Citation: Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 21, no. 8 (2013): 1640–1652. https://doi.org/10.1109/TASL.2013.2258013.