{"title":"Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition","related_material":{"link":[{"relation":"supplementary_material","url":"https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf","description":"Poster"}]},"author":[{"last_name":"Heymann","first_name":"Jahn","full_name":"Heymann, Jahn","id":"9168"},{"first_name":"Lukas","last_name":"Drude","id":"11213","full_name":"Drude, Lukas"},{"last_name":"Haeb-Umbach","first_name":"Reinhold","full_name":"Haeb-Umbach, Reinhold","id":"242"}],"publication":"Computer Speech and Language","citation":{"ama":"Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language; 2016.","chicago":"Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” In Computer Speech and Language, 2016.","apa":"Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer Speech and Language.","bibtex":"@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition}, booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }","ieee":"J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer Speech and Language, 2016.","mla":"Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” Computer Speech and Language, 2016.","short":"J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language, 2016."},"type":"conference","_id":"11834","department":[{"_id":"54"}],"date_updated":"2022-01-06T06:51:11Z","year":"2016","oa":"1","status":"public","date_created":"2019-07-12T05:29:09Z","abstract":[{"text":"We present a system for the 4th CHiME challenge which significantly increases the performance for all three tracks with respect to the provided baseline system. The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural network to estimate signal statistics. These statistics then steer a Generalized Eigenvalue beamformer. The back-end consists of a 22-layer-deep Wide Residual Network and two extra BLSTM layers. Working on a whole utterance instead of frames allows us to refine Batch-Normalization. We also train our own BLSTM-based language model. Adding discriminative speaker adaptation leads to further gains. The final system achieves a word error rate of 3.48% on the six-channel real test data. For the two-channel track we achieve 5.96%, and for the one-channel track 9.34%. This is the best reported performance on the challenge achieved by a single system, i.e., a configuration which does not combine multiple systems. At the same time, our system is independent of the microphone configuration. We can thus use the same components for all three tracks.","lang":"eng"}],"main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf","open_access":"1"}],"language":[{"iso":"eng"}],"user_id":"44006"}