Multi-Channel Block-Online Source Extraction based on Utterance Adaptation

J.M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A.M. Gomez, A.M. Peinado, in: INTERSPEECH 2019, Graz, Austria, 2019.

Download
OA INTERSPEECH_2019_Heitkaemper_Paper.pdf 225.69 KB
Conference Paper | English
Author
Martin-Donas, Juan M.; Heitkaemper, Jens; Haeb-Umbach, Reinhold; Gomez, Angel M.; Peinado, Antonio M.
Abstract
This paper deals with multi-channel speech recognition in scenarios with multiple speakers. Recently, the spectral characteristics of a target speaker, extracted from an adaptation utterance, have been used to guide a neural network mask estimator to focus on that speaker. In this work, we present two variants of speaker-aware neural networks, which exploit both spectral and spatial information to allow better discrimination between target and interfering speakers. Thus, we introduce either a spatial preprocessing prior to the mask estimation or a spatial plus spectral speaker characterization block whose output is directly fed into the neural mask estimator. The target speaker’s spectral and spatial signature is extracted from an adaptation utterance recorded at the beginning of a session. We further adapt the architecture for low-latency processing by means of block-online beamforming that recursively updates the signal statistics. Experimental results show that the additional spatial information clearly improves source extraction, in particular in the same-gender case, and that our proposal achieves state-of-the-art performance in terms of distortion reduction and recognition accuracy.
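The abstract mentions block-online beamforming that recursively updates the signal statistics. As a rough illustration only, the following NumPy sketch shows one common way such a scheme can be set up: mask-weighted spatial covariance matrices are smoothed from block to block with a forgetting factor, and an MVDR beamformer (in the reference-channel formulation) is recomputed for each block. This is not the authors' implementation. The function names, the forgetting factor, the block length, the identity initialization, and the choice of MVDR are all assumptions made for this example, and the masks are taken as given inputs rather than produced by a speaker-aware neural network.

```python
import numpy as np


def update_psd(psd, stft_block, mask_block, forget=0.95):
    """Recursively update per-frequency spatial covariance matrices.

    psd:        (F, D, D) running estimate (complex)
    stft_block: (F, T, D) complex STFT frames of the current block
    mask_block: (F, T)    real-valued mask in [0, 1]
    forget:     forgetting factor (hypothetical value)
    """
    # Mask-weighted outer products y y^H, normalized by the mask sum.
    weighted = mask_block[..., None] * stft_block
    block_psd = np.einsum('ftd,fte->fde', weighted, stft_block.conj())
    block_psd /= np.maximum(mask_block.sum(axis=1), 1e-10)[:, None, None]
    # Exponential smoothing across blocks.
    return forget * psd + (1.0 - forget) * block_psd


def mvdr_weights(psd_target, psd_noise, ref_channel=0, eps=1e-10):
    """MVDR weights: w = (Phi_nn^-1 Phi_xx) u / trace(Phi_nn^-1 Phi_xx)."""
    num_channels = psd_noise.shape[-1]
    # Small diagonal loading keeps the noise covariance invertible.
    scale = np.trace(psd_noise, axis1=-2, axis2=-1).real / num_channels
    loaded = psd_noise + 1e-6 * scale[:, None, None] * np.eye(num_channels)
    numerator = np.linalg.solve(loaded, psd_target)            # (F, D, D)
    trace = np.trace(numerator, axis1=-2, axis2=-1)            # (F,)
    return numerator[..., ref_channel] / (trace[:, None] + eps)


def process_blocks(stft, target_mask, noise_mask, block_len=10, forget=0.95):
    """Beamform an (F, T, D) STFT block by block; returns an (F, T) output."""
    num_freqs, num_frames, num_channels = stft.shape
    # Identity initialization is a placeholder assumption; statistics from an
    # adaptation utterance could be used instead.
    eye = np.tile(np.eye(num_channels, dtype=complex), (num_freqs, 1, 1))
    psd_t, psd_n = eye.copy(), eye.copy()
    out = np.zeros((num_freqs, num_frames), dtype=complex)
    for start in range(0, num_frames, block_len):
        sl = slice(start, start + block_len)
        psd_t = update_psd(psd_t, stft[:, sl], target_mask[:, sl], forget)
        psd_n = update_psd(psd_n, stft[:, sl], noise_mask[:, sl], forget)
        w = mvdr_weights(psd_t, psd_n)
        # Apply w^H y to every frame of the current block.
        out[:, sl] = np.einsum('fd,ftd->ft', w.conj(), stft[:, sl])
    return out
```

In this sketch the algorithmic latency is governed by `block_len`, since the weights for a block depend only on frames up to the end of that block. A smaller forgetting factor lets the statistics track changes faster at the cost of noisier estimates.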
Publishing Year
2019
Proceedings Title
INTERSPEECH 2019, Graz, Austria
LibreCat-ID

Cite this

Martin-Donas JM, Heitkaemper J, Haeb-Umbach R, Gomez AM, Peinado AM. Multi-Channel Block-Online Source Extraction based on Utterance Adaptation. In: INTERSPEECH 2019, Graz, Austria. ; 2019.
Martin-Donas, J. M., Heitkaemper, J., Haeb-Umbach, R., Gomez, A. M., & Peinado, A. M. (2019). Multi-Channel Block-Online Source Extraction based on Utterance Adaptation. In INTERSPEECH 2019, Graz, Austria.
@inproceedings{Martin-Donas_Heitkaemper_Haeb-Umbach_Gomez_Peinado_2019,
  title={Multi-Channel Block-Online Source Extraction based on Utterance Adaptation},
  booktitle={INTERSPEECH 2019, Graz, Austria},
  author={Martin-Donas, Juan M. and Heitkaemper, Jens and Haeb-Umbach, Reinhold and Gomez, Angel M. and Peinado, Antonio M.},
  year={2019}
}
Martin-Donas, Juan M., Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, and Antonio M. Peinado. “Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.” In INTERSPEECH 2019, Graz, Austria, 2019.
J. M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A. M. Gomez, and A. M. Peinado, “Multi-Channel Block-Online Source Extraction based on Utterance Adaptation,” in INTERSPEECH 2019, Graz, Austria, 2019.
Martin-Donas, Juan M., et al. “Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.” INTERSPEECH 2019, Graz, Austria, 2019.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
OA Open Access
Last Uploaded
2019-11-08T07:46:37Z

