BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System

J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.

Conference Paper | English
Abstract
This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network that estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex-valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multi-channel data only. Also, because it relies on the signal statistics for beamforming, the approach makes no assumptions about the configuration of the microphone array. We further observe a performance gain through joint training in terms of word error rate in an evaluation of the system on the CHiME 4 dataset.
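To make the mask-to-beamformer step in the abstract concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The abstract does not name the beamformer, so an MVDR beamformer in the reference-channel formulation is assumed here as one common statistically optimum choice. The function names (`masked_psd`, `mvdr_weights`, `apply_beamformer`), the array layout (frequency, channel, frame), and the diagonal loading are illustrative assumptions. In the paper the masks come from a neural network; here they are simply passed in, and the gradient propagation through the beamformer is not shown.

```python
import numpy as np


def masked_psd(obs, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix per frequency bin.

    obs:  complex STFT of the array signal, shape (F, D, T)
    mask: real-valued mask in [0, 1], shape (F, T)
    returns: complex array of shape (F, D, D)
    """
    weighted = obs * mask[:, None, :]
    # Sum of mask-weighted outer products x x^H over all frames.
    psd = np.einsum("fdt,fet->fde", weighted, obs.conj())
    norm = np.maximum(mask.sum(axis=-1), eps)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref_channel=0, eps=1e-10):
    """Assumed MVDR weights: w = (Phi_NN^-1 Phi_XX) u / tr(Phi_NN^-1 Phi_XX).

    Only second-order signal statistics are used, so no array geometry
    is required. Returns weights of shape (F, D).
    """
    num_channels = psd_noise.shape[-1]
    # Small diagonal loading (an assumption) keeps the solve well conditioned.
    scale = np.trace(psd_noise, axis1=-2, axis2=-1).real
    loading = eps * scale[:, None, None] * np.eye(num_channels)
    numerator = np.linalg.solve(psd_noise + loading, psd_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    # Selecting a column is the multiplication with the unit vector u.
    return numerator[:, :, ref_channel] / (trace[:, None] + eps)


def apply_beamformer(weights, obs):
    """Beamformer output y[f, t] = w[f]^H x[f, t], shape (F, T)."""
    return np.einsum("fd,fdt->ft", weights.conj(), obs)
```

Given a speech mask and a noise mask, the enhanced signal would be obtained as `apply_beamformer(mvdr_weights(masked_psd(obs, speech_mask), masked_psd(obs, noise_mask)), obs)`. Every step is a differentiable operation on complex-valued arrays, which is what makes it possible in principle to pass gradients from an acoustic model back to the mask estimator; doing so in practice would require an automatic differentiation framework rather than plain NumPy.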
Publishing Year
2017
Proceedings Title
Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
LibreCat-ID

Cite this

Heymann, Jahn, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, and Reinhold Haeb-Umbach. “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System.” In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
Restricted Closed Access
External material:
Supplementary Material
Description
Poster
