Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR
J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: ICASSP 2019, Brighton, UK, 2019.
Conference Paper | English
Author
Heymann, Jahn;
Drude, Lukas;
Haeb-Umbach, Reinhold;
Kinoshita, Keisuke;
Nakatani, Tomohiro
Abstract
Signal dereverberation using the Weighted Prediction Error (WPE) method has been proven to be an effective means to raise the accuracy of far-field speech recognition. WPE was first proposed as an iterative algorithm; follow-up works have reformulated it as a recursive least squares algorithm and thereby enabled its use in online applications. For this algorithm, the estimation of the power spectral density (PSD) of the anechoic signal plays an important role and strongly influences its performance. Recently, we showed that using a neural network PSD estimator leads to improved performance for online automatic speech recognition. This, however, comes at a price. To train the network, we require parallel data, i.e., utterances simultaneously available in clean and reverberated form. Here we propose to overcome this limitation by training the network jointly with the acoustic model of the speech recognizer. To be specific, the gradients computed from the cross-entropy loss between the target senone sequence and the acoustic model network output are backpropagated through the complex-valued dereverberation filter estimation to the neural network for PSD estimation. Evaluation on two databases demonstrates improved performance for online processing scenarios while imposing fewer requirements on the available training data and thus widening the range of applications.
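To make the recursive least squares formulation mentioned in the abstract more concrete, the following is a minimal sketch of a PSD-weighted RLS update for one STFT frequency bin. It is not the authors' implementation. It assumes a single microphone, takes the PSD estimate as a plain input instead of a neural network output, and leaves out the joint training with the acoustic model. The function names, the toy signal, and all parameter defaults (taps, delay, forgetting factor) are hypothetical choices made for illustration.

```python
import random


def rls_wpe_bin(y, psd, taps=4, delay=2, alpha=0.99):
    """Sketch of online WPE for one frequency bin of a single-channel signal.

    y   : list of complex STFT coefficients, one per frame
    psd : list of positive PSD estimates of the anechoic signal, one per frame
          (assumed given here; the paper estimates them with a neural network)
    """
    # Inverse weighted correlation matrix (identity start) and filter (zeros).
    r_inv = [[1 + 0j if i == j else 0j for j in range(taps)] for i in range(taps)]
    g = [0j] * taps
    out = []
    for t, y_t in enumerate(y):
        # Delayed window of past observations; frames before the start are zero.
        x = [y[t - delay - k] if t - delay - k >= 0 else 0j for k in range(taps)]
        # Dereverberated frame: subtract the predicted late reverberation g^H x.
        d = y_t - sum(gk.conjugate() * xk for gk, xk in zip(g, x))
        # Gain vector; the PSD weights how much this frame influences the filter.
        rx = [sum(r_inv[i][j] * x[j] for j in range(taps)) for i in range(taps)]
        denom = alpha * psd[t] + sum(xk.conjugate() * v for xk, v in zip(x, rx)).real
        gain = [v / denom for v in rx]
        # Rank-one update of the inverse correlation matrix (Woodbury identity).
        xr = [sum(x[i].conjugate() * r_inv[i][j] for i in range(taps))
              for j in range(taps)]
        r_inv = [[(r_inv[i][j] - gain[i] * xr[j]) / alpha for j in range(taps)]
                 for i in range(taps)]
        # Filter update driven by the current prediction error.
        g = [gk + kk * d.conjugate() for gk, kk in zip(g, gain)]
        out.append(d)
    return out


def toy_reverberant_bin(n_frames=600, seed=0):
    """Hypothetical toy data: a white complex source plus one delayed echo."""
    rng = random.Random(seed)
    s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_frames)]
    y = [s[t] + (0.5 * s[t - 2] if t >= 2 else 0j) for t in range(n_frames)]
    return s, y


if __name__ == "__main__":
    source, observed = toy_reverberant_bin()
    # A constant PSD reduces the recursion to ordinary RLS linear prediction.
    enhanced = rls_wpe_bin(observed, [1.0] * len(observed))
    print(len(enhanced), "frames processed")
```

In this sketch the PSD enters only through the denominator of the gain, which is the place where a learned estimator would plug in. Because every step is built from differentiable arithmetic, an automatic differentiation framework could in principle propagate a downstream loss back to that estimate, which is the idea the abstract describes.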
Publishing Year
2019
Proceedings Title
ICASSP 2019, Brighton, UK
Cite this
Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR. In: ICASSP 2019, Brighton, UK; 2019.
Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., & Nakatani, T. (2019). Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR. In ICASSP 2019, Brighton, UK.
@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2019, title={Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR}, booktitle={ICASSP 2019, Brighton, UK}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani, Tomohiro}, year={2019} }
Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and Tomohiro Nakatani. “Joint Optimization of Neural Network-Based WPE Dereverberation and Acoustic Model for Robust Online ASR.” In ICASSP 2019, Brighton, UK, 2019.
J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR,” in ICASSP 2019, Brighton, UK, 2019.
Heymann, Jahn, et al. “Joint Optimization of Neural Network-Based WPE Dereverberation and Acoustic Model for Robust Online ASR.” ICASSP 2019, Brighton, UK, 2019.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
ICASSP_2019_Heymann_Paper.pdf
199.11 KB
Access Level
Open Access
Last Uploaded
2019-12-17T07:28:06Z