A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing

J. Heymann, L. Drude, R. Haeb-Umbach, Computer Speech and Language (2017).

Journal Article | English
Abstract
Acoustic beamforming can greatly improve the performance of Automatic Speech Recognition (ASR) and speech enhancement systems when multiple channels are available. We recently proposed a way to support the model-based Generalized Eigenvalue beamforming operation with a powerful neural network for spectral mask estimation. The enhancement system has a number of desirable properties. In particular, no assumptions need to be made about the nature of the acoustic transfer function (e.g., being anechoic), nor does the array configuration need to be known. While the system was originally developed to enhance speech in noisy environments, we show in this article that it is also effective in suppressing reverberation, thus leading to a generic trainable multi-channel speech enhancement system for robust speech processing. To support this claim, we consider two distinct datasets: the CHiME 3 challenge, which features challenging real-world noise distortions, and the REVERB challenge, which focuses on distortions caused by reverberation. We evaluate the system with respect to both a speech enhancement and a recognition task. For the first task we propose a new way to cope with the distortions introduced by the Generalized Eigenvalue beamformer by renormalizing the target energy for each frequency bin, and we measure its effectiveness in terms of the PESQ score. For the latter we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art ASR results on both datasets. We further experiment with different network architectures for spectral mask estimation: a small feed-forward network with only one hidden layer, a Convolutional Neural Network, and a bi-directional Long Short-Term Memory network, showing that even a small network is capable of delivering significant performance improvements.
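The pipeline the abstract describes — neural spectral masks driving a Generalized Eigenvalue (GEV) beamformer — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `gev_beamformer`, the tensor layout, and the regularization constant are assumptions, and the per-frequency target-energy renormalization proposed in the article is omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh

def gev_beamformer(stft, speech_mask, noise_mask):
    """Mask-driven GEV beamforming (minimal sketch).

    stft:        (F, C, T) complex multi-channel STFT
    speech_mask: (F, T) values in [0, 1], e.g. from a neural mask estimator
    noise_mask:  (F, T) values in [0, 1]
    Returns the (F, T) single-channel beamformed STFT.
    """
    F, C, T = stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = stft[f]  # (C, T)
        # Mask-weighted spatial power spectral density (PSD) matrices
        phi_s = (speech_mask[f] * X) @ X.conj().T / max(speech_mask[f].sum(), 1e-10)
        phi_n = (noise_mask[f] * X) @ X.conj().T / max(noise_mask[f].sum(), 1e-10)
        phi_n += 1e-10 * np.eye(C)  # regularize for numerical stability
        # GEV criterion: maximize output SNR -> principal generalized
        # eigenvector of the pencil (phi_s, phi_n)
        _, vecs = eigh(phi_s, phi_n)
        w = vecs[:, -1]  # eigenvector for the largest eigenvalue
        out[f] = w.conj() @ X
    return out
```

Because the GEV filter is only defined up to an arbitrary per-frequency scaling, the raw output exhibits spectral distortions; the renormalization step discussed in the abstract addresses exactly this.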
Publishing Year: 2017
Journal Title: Computer Speech and Language
Cite this

AMA: Heymann J, Drude L, Haeb-Umbach R. A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing. Computer Speech and Language. 2017.
APA: Heymann, J., Drude, L., & Haeb-Umbach, R. (2017). A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing. Computer Speech and Language.
BibTeX: @article{Heymann_Drude_Haeb-Umbach_2017, title={A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing}, journal={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }
Chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.” Computer Speech and Language, 2017.
IEEE: J. Heymann, L. Drude, and R. Haeb-Umbach, “A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing,” Computer Speech and Language, 2017.
MLA: Heymann, Jahn, et al. “A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.” Computer Speech and Language, 2017.
All files available under the following license(s):
Copyright Statement: This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level: Restricted Closed Access
