Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings

L. Drude, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Sweden, 2017.

Conference Paper | English
Abstract
Recent advances in discriminatively trained mask estimation networks that extract a single source by means of beamforming demonstrate that integrating statistical models with deep neural networks (DNNs) is a promising approach for robust automatic speech recognition (ASR). In this contribution, we demonstrate how discriminatively trained embeddings on spectral features can be tightly integrated into statistical model-based source separation to separate and transcribe overlapping speech. Good generalization to unseen spatial configurations is achieved by estimating a statistical model at test time, while still leveraging discriminative training of deep clustering embeddings on a separate training set. We formulate an expectation maximization (EM) algorithm that, at test time, jointly estimates a model for the deep clustering embeddings and for the complex-valued spatial observations in the short-time Fourier transform (STFT) domain. Extensive simulations confirm that the integrated model outperforms (a) a deep clustering model with a subsequent beamforming step and (b) an EM-based model alone, also followed by a beamforming step, in terms of signal-to-distortion ratio (SDR) and a perceptually motivated metric (PESQ). ASR results on a reverberated dataset further show that these gains translate into reduced word error rates (WERs), even in reverberant environments.
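The abstract does not specify the distributions used, so the following is only a minimal sketch of what a jointly estimated model at test time could look like, under stated assumptions. It assumes a von Mises-Fisher (vMF) mixture with a shared, fixed concentration `kappa` for the unit-norm embeddings and a complex Angular Central Gaussian (cACG) mixture for the normalized multichannel STFT vectors; these model choices, the function names (`normalize`, `quadratic_form`, `joint_em`) and the parameters (`kappa`, `iterations`, `seed`) are hypothetical and not taken from the paper. What it illustrates is the "tight integration" idea: both models share a single class posterior per time-frequency bin, obtained by adding their log-likelihoods in the E-step, and in this sketch the frequency-independent vMF mean directions tie all frequencies together.

```python
import numpy as np


def normalize(x, axis=-1):
    """Scale vectors along `axis` to unit norm; the epsilon guards against zero vectors."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), 1e-10)


def quadratic_form(Y, B):
    """Return y^H B^{-1} y for every class c, frequency f and frame t, shape (C, F, T)."""
    B_inv = np.linalg.inv(B)
    quad = np.einsum('ftd,cfde,fte->cft', Y.conj(), B_inv, Y).real
    return np.maximum(quad, 1e-10)


def joint_em(Y, E, num_classes, kappa=10.0, iterations=20, seed=0):
    """Hypothetical EM sketch: one posterior per TF bin shared by a cACG and a vMF model.

    Y: complex multichannel STFT observations, shape (F, T, D).
    E: real-valued embeddings, one per time-frequency bin, shape (F, T, K).
    Returns posteriors gamma of shape (num_classes, F, T) that sum to one over classes.
    """
    F, T, D = Y.shape
    C = num_classes
    rng = np.random.default_rng(seed)
    Y = normalize(Y)  # the cACG model only depends on the direction of y
    E = normalize(E)  # the vMF model expects unit-norm embeddings

    # Random soft initialization of the class affiliations (an assumption of this sketch).
    gamma = rng.dirichlet(np.ones(C), size=(F, T)).transpose(2, 0, 1)
    B = np.broadcast_to(np.eye(D, dtype=complex), (C, F, D, D)).copy()

    for _ in range(iterations):
        # M-step, spatial part: one fixed-point update of each cACG shape matrix.
        quad = quadratic_form(Y, B)
        weight = gamma.sum(axis=2)[..., None, None] + 1e-10
        B = D * np.einsum('cft,ftd,fte->cfde', gamma / quad, Y, Y.conj()) / weight
        B = B + 1e-6 * np.eye(D)  # small diagonal loading keeps B invertible
        # M-step, embedding part: vMF mean directions shared across all frequencies.
        mu = normalize(np.einsum('cft,ftk->ck', gamma, E))
        # Mixture weights, estimated per frequency.
        pi = gamma.mean(axis=2)

        # E-step: add both log-likelihoods so each TF bin gets ONE joint posterior.
        _, logdet = np.linalg.slogdet(B)
        log_spatial = -logdet[..., None] - D * np.log(quadratic_form(Y, B))
        # The vMF normalizer is identical for all classes (shared kappa), so it cancels.
        log_embed = kappa * np.einsum('ck,ftk->cft', mu, E)
        log_post = np.log(pi + 1e-10)[..., None] + log_spatial + log_embed
        log_post -= log_post.max(axis=0, keepdims=True)  # numerically stable softmax
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=0, keepdims=True)

    return gamma


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy shapes: 65 frequencies, 100 frames, 4 microphones, 20-dimensional embeddings.
    Y = rng.standard_normal((65, 100, 4)) + 1j * rng.standard_normal((65, 100, 4))
    E = rng.standard_normal((65, 100, 20))
    masks = joint_em(Y, E, num_classes=2)
    print(masks.shape)  # (2, 65, 100)
```

In a pipeline like the one the abstract compares against, such posteriors would typically serve as time-frequency masks for a subsequent mask-based beamformer; that step is not shown here.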
Publishing Year
2017
Proceedings Title
INTERSPEECH 2017, Stockholm, Sweden
LibreCat-ID

Cite this

Drude L, Haeb-Umbach R. Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In: INTERSPEECH 2017, Stockholm, Sweden; 2017.
Drude, L., & Haeb-Umbach, R. (2017). Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In INTERSPEECH 2017, Stockholm, Sweden.
@inproceedings{Drude_Haeb-Umbach_2017, title={Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings}, booktitle={INTERSPEECH 2017, Stockholm, Sweden}, author={Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }
Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” In INTERSPEECH 2017, Stockholm, Sweden, 2017.
L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings,” in INTERSPEECH 2017, Stockholm, Sweden, 2017.
Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” INTERSPEECH 2017, Stockholm, Sweden, 2017.

Link(s) to Main File(s)
Access Level
Restricted Closed Access
External material:
Supplementary Material (description: Slides)
