{"main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_paper.pdf","open_access":"1"}],"year":"2017","user_id":"44006","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_slides.pdf","description":"Slides","relation":"supplementary_material"}]},"author":[{"full_name":"Drude, Lukas","last_name":"Drude","first_name":"Lukas","id":"11213"},{"id":"242","first_name":"Reinhold","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach"}],"department":[{"_id":"54"}],"_id":"11754","language":[{"iso":"eng"}],"title":"Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings","date_updated":"2022-01-06T06:51:08Z","publication":"INTERSPEECH 2017, Stockholm, Schweden","type":"conference","citation":{"mla":"Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” INTERSPEECH 2017, Stockholm, Schweden, 2017.","bibtex":"@inproceedings{Drude_Haeb-Umbach_2017, title={Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings}, booktitle={INTERSPEECH 2017, Stockholm, Schweden}, author={Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }","apa":"Drude, L., & Haeb-Umbach, R. (2017). Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In INTERSPEECH 2017, Stockholm, Schweden.","chicago":"Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” In INTERSPEECH 2017, Stockholm, Schweden, 2017.","ama":"Drude L, Haeb-Umbach R. Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In: INTERSPEECH 2017, Stockholm, Schweden. ; 2017.","short":"L. Drude, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Schweden, 2017.","ieee":"L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings,” in INTERSPEECH 2017, Stockholm, Schweden, 2017."},"oa":"1","abstract":[{"lang":"eng","text":"Recent advances in discriminatively trained mask estimation networks to extract a single source utilizing beamforming techniques demonstrate, that the integration of statistical models and deep neural networks (DNNs) are a promising approach for robust automatic speech recognition (ASR) applications. In this contribution we demonstrate how discriminatively trained embeddings on spectral features can be tightly integrated into statistical model-based source separation to separate and transcribe overlapping speech. Good generalization to unseen spatial configurations is achieved by estimating a statistical model at test time, while still leveraging discriminative training of deep clustering embeddings on a separate training set. We formulate an expectation maximization (EM) algorithm which jointly estimates a model for deep clustering embeddings and complex-valued spatial observations in the short time Fourier transform (STFT) domain at test time. Extensive simulations confirm, that the integrated model outperforms (a) a deep clustering model with a subsequent beamforming step and (b) an EM-based model with a beamforming step alone in terms of signal to distortion ratio (SDR) and perceptually motivated metric (PESQ) gains. 
ASR results on a reverberated dataset further show, that the aforementioned gains translate to reduced word error rates (WERs) even in reverberant environments."}],"date_created":"2019-07-12T05:27:37Z","status":"public"}