Multi-Channel Block-Online Source Extraction based on Utterance Adaptation

Martin-Donas, Juan M.; Heitkaemper, Jens; Haeb-Umbach, Reinhold; Gomez, Angel M.; Peinado, Antonio M.

J.M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A.M. Gomez, A.M. Peinado, in: INTERSPEECH 2019, Graz, Austria, 2019.

Conference Paper | English

Author

Martin-Donas, Juan M.; Heitkaemper, Jens; Haeb-Umbach, Reinhold; Gomez, Angel M.; Peinado, Antonio M.

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Project

Computing Resources Provided by the Paderborn Center for Parallel Computing

Abstract

This paper deals with multi-channel speech recognition in scenarios with multiple speakers. Recently, the spectral characteristics of a target speaker, extracted from an adaptation utterance, have been used to guide a neural network mask estimator to focus on that speaker. In this work we present two variants of speaker-aware neural networks, which exploit both spectral and spatial information to allow better discrimination between target and interfering speakers. Thus, we introduce either a spatial preprocessing prior to the mask estimation or a spatial plus spectral speaker characterization block whose output is directly fed into the neural mask estimator. The target speaker’s spectral and spatial signature is extracted from an adaptation utterance recorded at the beginning of a session. We further adapt the architecture for low-latency processing by means of block-online beamforming that recursively updates the signal statistics. Experimental results show that the additional spatial information clearly improves source extraction, in particular in the same-gender case, and that our proposal achieves state-of-the-art performance in terms of distortion reduction and recognition accuracy.
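
The block-online beamforming step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule or the beamformer type, so the code below makes several assumptions: exponential forgetting for the recursive statistics update, an MVDR beamformer in its reference-channel formulation, identity-initialized covariance matrices, and time-frequency masks passed in as plain arrays standing in for the output of the neural mask estimator. All function names and parameters (`update_scm`, `mvdr_weights`, `block_online_extract`, `forget`, `block_len`) are hypothetical and not taken from the paper.

```python
import numpy as np


def update_scm(prev_scm, block, mask, forget=0.95):
    """Recursively update a spatial covariance matrix (SCM) with one block.

    prev_scm: (F, M, M) running estimate from earlier blocks
    block:    (F, T, M) complex STFT frames of the current block
    mask:     (F, T) values in [0, 1] for the source of interest
    forget:   assumed exponential forgetting factor (hypothetical value)
    """
    # Mask-weighted sum of outer products y y^H over the frames of the block.
    block_scm = np.einsum("ft,ftm,ftn->fmn", mask, block, block.conj())
    block_scm /= np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]
    # Blend the old statistics with the statistics of the new block.
    return forget * prev_scm + (1.0 - forget) * block_scm


def mvdr_weights(target_scm, noise_scm, ref_channel=0):
    """MVDR weights per frequency, reference-channel formulation (assumed)."""
    num_channels = noise_scm.shape[-1]
    # Small diagonal loading keeps the linear solve well conditioned.
    loaded = noise_scm + 1e-6 * np.eye(num_channels)
    numerator = np.linalg.solve(loaded, target_scm)  # (F, M, M)
    trace = np.trace(numerator, axis1=-2, axis2=-1)  # (F,)
    return numerator[..., ref_channel] / (trace[..., None] + 1e-10)


def block_online_extract(stft, target_masks, noise_masks,
                         block_len=10, forget=0.95):
    """Process an (F, T, M) STFT block by block with recursive statistics."""
    num_freqs, num_frames, num_channels = stft.shape
    identity = np.eye(num_channels, dtype=complex)
    target_scm = np.tile(identity, (num_freqs, 1, 1))
    noise_scm = np.tile(identity, (num_freqs, 1, 1))
    out = np.zeros((num_freqs, num_frames), dtype=complex)
    for start in range(0, num_frames, block_len):
        sl = slice(start, start + block_len)
        target_scm = update_scm(target_scm, stft[:, sl],
                                target_masks[:, sl], forget)
        noise_scm = update_scm(noise_scm, stft[:, sl],
                               noise_masks[:, sl], forget)
        weights = mvdr_weights(target_scm, noise_scm)
        # Apply w^H y to every frame of the current block.
        out[:, sl] = np.einsum("fm,ftm->ft", weights.conj(), stft[:, sl])
    return out
```

Only frames up to the end of the current block enter the statistics, so the latency of this sketch is bounded by `block_len` frames. In a system like the one the abstract describes, the initial statistics could plausibly be seeded from the adaptation utterance rather than from identity matrices, but that detail is not given here.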

Publishing Year

2019

Proceedings Title

INTERSPEECH 2019, Graz, Austria

LibreCat-ID

14824

Cite this

Martin-Donas JM, Heitkaemper J, Haeb-Umbach R, Gomez AM, Peinado AM. Multi-Channel Block-Online Source Extraction based on Utterance Adaptation. In: INTERSPEECH 2019, Graz, Austria; 2019.

Martin-Donas, J. M., Heitkaemper, J., Haeb-Umbach, R., Gomez, A. M., & Peinado, A. M. (2019). Multi-Channel Block-Online Source Extraction based on Utterance Adaptation. In INTERSPEECH 2019, Graz, Austria.

@inproceedings{Martin-Donas_Heitkaemper_Haeb-Umbach_Gomez_Peinado_2019,
  title={Multi-Channel Block-Online Source Extraction based on Utterance Adaptation},
  booktitle={INTERSPEECH 2019, Graz, Austria},
  author={Martin-Donas, Juan M. and Heitkaemper, Jens and Haeb-Umbach, Reinhold and Gomez, Angel M. and Peinado, Antonio M.},
  year={2019}
}

Martin-Donas, Juan M., Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, and Antonio M. Peinado. “Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.” In INTERSPEECH 2019, Graz, Austria, 2019.

J. M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A. M. Gomez, and A. M. Peinado, “Multi-Channel Block-Online Source Extraction based on Utterance Adaptation,” in INTERSPEECH 2019, Graz, Austria, 2019.

Martin-Donas, Juan M., et al. “Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.” INTERSPEECH 2019, Graz, Austria, 2019.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Main File(s)

File Name

INTERSPEECH_2019_Heitkaemper_Paper.pdf 225.69 KB

Access Level

Open Access

Last Uploaded

2019-11-08T07:46:37Z
