Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR

Heymann, Jahn; Drude, Lukas; Haeb-Umbach, Reinhold; Kinoshita, Keisuke; Nakatani, Tomohiro

J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: ICASSP 2019, Brighton, UK, 2019.

Conference Paper | English

Author

Heymann, Jahn; Drude, Lukas; Haeb-Umbach, Reinhold; Kinoshita, Keisuke; Nakatani, Tomohiro

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Project

Computing Resources Provided by the Paderborn Center for Parallel Computing

Abstract

Signal dereverberation using the Weighted Prediction Error (WPE) method has proven to be an effective means of raising the accuracy of far-field speech recognition. First proposed as an iterative algorithm, it was reformulated in follow-up works as a recursive least squares algorithm, which enabled its use in online applications. For this algorithm, the estimation of the power spectral density (PSD) of the anechoic signal plays an important role and strongly influences its performance. Recently, we showed that using a neural network PSD estimator leads to improved performance for online automatic speech recognition. This, however, comes at a price: to train the network, we require parallel data, i.e., utterances simultaneously available in clean and reverberated form. Here we propose to overcome this limitation by training the network jointly with the acoustic model of the speech recognizer. To be specific, the gradients computed from the cross-entropy loss between the target senone sequence and the acoustic model network output are backpropagated through the complex-valued dereverberation filter estimation to the neural network for PSD estimation. Evaluation on two databases demonstrates improved performance for online processing scenarios while imposing fewer requirements on the available training data and thus widening the range of applications.
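The recursive least squares form of WPE mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single channel and a single frequency bin in plain Python, and the function names (rls_wpe_bin, recent_power_psd) and parameters (taps, delay, alpha, context) are chosen here for illustration only. In particular, recent_power_psd is a simple stand-in for the PSD estimator; in the paper this role is played by a neural network.

```python
def recent_power_psd(y, context=2):
    """Placeholder PSD estimate: mean power of the current and previous frames.

    Assumption for illustration only; the paper uses a neural network here.
    """
    psd = []
    for t in range(len(y)):
        frames = y[max(0, t - context): t + 1]
        psd.append(sum(abs(v) ** 2 for v in frames) / len(frames))
    return psd


def rls_wpe_bin(y, psd, taps=3, delay=2, alpha=0.99, eps=1e-10):
    """Online WPE for one frequency bin of a single-channel STFT.

    y:     complex STFT coefficients over time
    psd:   PSD estimate of the target signal, one value per frame
    taps:  length of the prediction filter
    delay: prediction delay in frames (protects the direct signal)
    alpha: RLS forgetting factor
    """
    # Inverse of the weighted correlation matrix, initialised to identity.
    inv_r = [[1 + 0j if i == j else 0j for j in range(taps)]
             for i in range(taps)]
    g = [0j] * taps  # prediction filter coefficients
    out = []
    for t, y_t in enumerate(y):
        # Delayed observations; frames before the start count as zero.
        buf = [y[t - delay - k] if t - delay - k >= 0 else 0j
               for k in range(taps)]
        # Subtract the predicted late reverberation: x = y - g^H buf.
        x_t = y_t - sum(gk.conjugate() * bk for gk, bk in zip(g, buf))
        # Gain vector: inv_r buf / (alpha * psd + buf^H inv_r buf).
        rb = [sum(inv_r[i][j] * buf[j] for j in range(taps))
              for i in range(taps)]
        quad = sum(bk.conjugate() * rk for bk, rk in zip(buf, rb)).real
        denom = alpha * max(psd[t], eps) + quad
        gain = [rk / denom for rk in rb]
        # Update the inverse correlation matrix and the filter.
        br = [sum(buf[i].conjugate() * inv_r[i][j] for i in range(taps))
              for j in range(taps)]
        inv_r = [[(inv_r[i][j] - gain[i] * br[j]) / alpha
                  for j in range(taps)] for i in range(taps)]
        g = [gk + kk * x_t.conjugate() for gk, kk in zip(g, gain)]
        out.append(x_t)
    return out
```

A call such as rls_wpe_bin(y, recent_power_psd(y)) processes the frames of y one at a time, which is what makes the method usable online. Because every step consists of differentiable arithmetic on complex numbers, the same recursion written in an automatic differentiation framework would let gradients flow from a downstream loss back to the PSD estimate, which is the idea behind the joint training described above.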

Publishing Year

2019

Proceedings Title

ICASSP 2019, Brighton, UK

LibreCat-ID

12875

Cite this

Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR. In: ICASSP 2019, Brighton, UK; 2019.

Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., & Nakatani, T. (2019). Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR. In ICASSP 2019, Brighton, UK.

@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2019,
  title={Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR},
  booktitle={ICASSP 2019, Brighton, UK},
  author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani, Tomohiro},
  year={2019}
}

Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and Tomohiro Nakatani. “Joint Optimization of Neural Network-Based WPE Dereverberation and Acoustic Model for Robust Online ASR.” In ICASSP 2019, Brighton, UK, 2019.

J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR,” in ICASSP 2019, Brighton, UK, 2019.

Heymann, Jahn, et al. “Joint Optimization of Neural Network-Based WPE Dereverberation and Acoustic Model for Robust Online ASR.” ICASSP 2019, Brighton, UK, 2019.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Main File(s)

File Name

ICASSP_2019_Heymann_Paper.pdf 199.11 KB

Access Level

Open Access

Last Uploaded

2019-12-17T07:28:06Z
