Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings
L. Drude, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Sweden, 2017.
Conference Paper | English
Abstract
Recent advances in discriminatively trained mask estimation networks, which extract a single source by means of beamforming techniques, demonstrate that the integration of statistical models and deep neural networks (DNNs) is a promising approach for robust automatic speech recognition (ASR) applications. In this contribution, we demonstrate how discriminatively trained embeddings on spectral features can be tightly integrated into statistical model-based source separation to separate and transcribe overlapping speech. Good generalization to unseen spatial configurations is achieved by estimating a statistical model at test time, while still leveraging discriminative training of deep clustering embeddings on a separate training set. We formulate an expectation-maximization (EM) algorithm which jointly estimates a model for deep clustering embeddings and complex-valued spatial observations in the short-time Fourier transform (STFT) domain at test time. Extensive simulations confirm that the integrated model outperforms (a) a deep clustering model with a subsequent beamforming step and (b) an EM-based model with a beamforming step alone in terms of signal-to-distortion ratio (SDR) and perceptually motivated metric (PESQ) gains. ASR results on a reverberated dataset further show that the aforementioned gains translate to reduced word error rates (WERs) even in reverberant environments.
Publishing Year
2017
Proceedings Title
INTERSPEECH 2017, Stockholm, Sweden
Cite this
Drude L, Haeb-Umbach R. Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In: INTERSPEECH 2017, Stockholm, Sweden. 2017.
Drude, L., & Haeb-Umbach, R. (2017). Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In INTERSPEECH 2017, Stockholm, Sweden.
@inproceedings{Drude_Haeb-Umbach_2017, title={Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings}, booktitle={INTERSPEECH 2017, Stockholm, Sweden}, author={Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }
Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” In INTERSPEECH 2017, Stockholm, Sweden, 2017.
L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings,” in INTERSPEECH 2017, Stockholm, Sweden, 2017.
Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” INTERSPEECH 2017, Stockholm, Sweden, 2017.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Access Level
Closed Access
External material:
Supplementary Material
Description
Slides