Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition

J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language, 2016.

Conference Paper | English
Abstract
We present a system for the 4th CHiME challenge that significantly improves performance on all three tracks relative to the provided baseline system. The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural network to estimate signal statistics, which then steer a Generalized Eigenvalue beamformer. The back-end consists of a 22-layer-deep Wide Residual Network and two extra BLSTM layers. Working on whole utterances instead of frames allows us to refine Batch-Normalization. We also train our own BLSTM-based language model. Adding discriminative speaker adaptation leads to further gains. The final system achieves a word error rate of 3.48% on the six-channel real test data. For the two-channel track we achieve 5.96% and for the one-channel track 9.34%. This is the best reported performance on the challenge achieved by a single system, i.e., a configuration that does not combine multiple systems. At the same time, our system is independent of the microphone configuration, so we can use the same components for all three tracks.
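The results in the abstract are reported as word error rates. As a reading aid, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the challenge's scoring tool, and it does not reproduce the reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.2%}")
```

Scores over a test set are usually obtained by summing the errors and the reference word counts across all utterances before dividing, rather than by averaging per-utterance rates.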
Publishing Year
2016
Proceedings Title
Computer Speech and Language
LibreCat-ID

Cite this

AMA
Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language; 2016.
APA
Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer Speech and Language.
BibTeX
@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition}, booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }
Chicago
Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” In Computer Speech and Language, 2016.
IEEE
J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer Speech and Language, 2016.
MLA
Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” Computer Speech and Language, 2016.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
Restricted Closed Access
External material:
Supplementary Material
Description
Poster
