BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System

Heymann, Jahn; Drude, Lukas; Boeddeker, Christoph; Hanebrink, Patrick; Haeb-Umbach, Reinhold

J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Download (ext.)

https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_paper.pdf

Conference Paper | English

Author

Heymann, Jahn; Drude, Lukas; Boeddeker, Christoph; Hanebrink, Patrick; Haeb-Umbach, Reinhold

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Project

Computing Resources Provided by the Paderborn Center for Parallel Computing

Abstract

This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network that estimates masks for a statistically optimum beamformer is trained jointly with a network for acoustic modeling. To update the mask estimator's parameters, we propagate the gradients from the acoustic model all the way back through feature extraction and the complex-valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multi-channel data only. Moreover, because the beamformer relies on signal statistics, the approach makes no assumptions about the configuration of the microphone array. In an evaluation on the CHiME 4 dataset, we further observe that joint training improves the word error rate.
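
To make the mask-to-beamformer step concrete, here is a minimal NumPy sketch of the forward pass only. It is not the authors' implementation. The abstract does not name the beamformer, so the MVDR variant (in its reference-channel formulation) is an assumption chosen for illustration, and the function names, array shapes, and the `eps` regularization are all hypothetical. In joint training, the same operations would have to be written in a framework that can differentiate through complex-valued linear algebra.

```python
import numpy as np


def masked_psd(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array of shape (F, T, C) (frequency, time, channel)
    mask: real array of shape (F, T) with values in [0, 1]
    Returns a complex array of shape (F, C, C).
    """
    weighted = stft * mask[..., None]
    # Sum over time of mask * x x^H, evaluated per frequency bin.
    psd = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), eps)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref=0, eps=1e-10):
    """MVDR weights w_f = (Phi_nn^-1 Phi_xx) u / trace(Phi_nn^-1 Phi_xx).

    u selects the reference channel `ref`. ASSUMPTION: MVDR is used only
    as an example of a statistically optimum beamformer.
    """
    num_channels = psd_noise.shape[-1]
    loading = eps * np.eye(num_channels)  # small diagonal loading
    ratio = np.linalg.solve(psd_noise + loading, psd_speech)  # (F, C, C)
    trace = np.trace(ratio, axis1=-2, axis2=-1)  # (F,)
    return ratio[..., ref] / (trace[:, None] + eps)


def apply_beamformer(weights, stft):
    """Beamformer output y_{f,t} = w_f^H x_{f,t}, shape (F, T)."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)
```

In this sketch, a mask estimator would supply one speech mask and one noise mask per time-frequency bin; `masked_psd` turns each into a spatial covariance matrix, `mvdr_weights` derives the filter from those statistics alone, and `apply_beamformer` produces the single-channel signal that feature extraction would consume. No array geometry enters at any point, which matches the abstract's claim that no assumptions about the microphone configuration are needed.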

Publishing Year

2017

Proceedings Title

Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

LibreCat-ID

11809

Cite this

Heymann J, Drude L, Boeddeker C, Hanebrink P, Haeb-Umbach R. BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System. In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP); 2017.

Heymann, J., Drude, L., Boeddeker, C., Hanebrink, P., & Haeb-Umbach, R. (2017). BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

@inproceedings{Heymann_Drude_Boeddeker_Hanebrink_Haeb-Umbach_2017,
  title={BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System},
  booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  author={Heymann, Jahn and Drude, Lukas and Boeddeker, Christoph and Hanebrink, Patrick and Haeb-Umbach, Reinhold},
  year={2017}
}

Heymann, Jahn, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, and Reinhold Haeb-Umbach. “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System.” In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.

J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, and R. Haeb-Umbach, “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Heymann, Jahn, et al. “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.
