Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition

Heymann, Jahn; Drude, Lukas; Haeb-Umbach, Reinhold

Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition

J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language, 2016.

Download (ext.)

https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf

Conference Paper | English

Author

Heymann, Jahn^LibreCat; Drude, Lukas^LibreCat; Haeb-Umbach, Reinhold^LibreCat

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Abstract

We present a system for the 4th CHiME challenge which significantly increases the performance for all three tracks with respect to the provided baseline system. The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural network to estimate signal statistics. These then steer a Generalized Eigenvalue beamformer. The back-end consists of a 22 layer deep Wide Residual Network and two extra BLSTM layers. Working on a whole utterance instead of frames allows us to refine Batch-Normalization. We also train our own BLSTM-based language model. Adding a discriminative speaker adaptation leads to further gains. The final system achieves a word error rate on the six channel real test data of 3.48%. For the two channel track we achieve 5.96% and for the one channel track 9.34%. This is the best reported performance on the challenge achieved by a single system, i.e., a configuration, which does not combine multiple systems. At the same time, our system is independent of the microphone configuration. We can thus use the same components for all three tracks.

Publishing Year

2016

Proceedings Title

Computer Speech and Language

LibreCat-ID

11834

Cite this

Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language. ; 2016.

Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer Speech and Language.

@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition}, booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }

Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” In Computer Speech and Language, 2016.

J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer Speech and Language, 2016.

Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” Computer Speech and Language, 2016.

All files available under the following license(s):

Copyright Statement: