[{"publication":"ICASSP 2018, Calgary, Canada","type":"conference","abstract":[{"text":"Deep attractor networks (DANs) are a recently introduced method to blindly separate sources from spectral features of a monaural recording using bidirectional long short-term memory networks (BLSTMs). Due to the nature of BLSTMs, this is inherently not online-ready and resorting to operating on blocks yields a block permutation problem in that the index of each speaker may change between blocks. We here propose the joint modeling of spatial and spectral features to solve the block permutation problem and generalize DANs to multi-channel meeting recordings: The DAN acts as a spectral feature extractor for a subsequent model-based clustering approach. We first analyze different joint models in batch-processing scenarios and finally propose a block-online blind source separation algorithm. The efficacy of the proposed models is demonstrated on reverberant mixtures corrupted by real recordings of multi-channel background noise. We demonstrate that both the proposed batch-processing and the proposed block-online system outperform (a) a spatial-only model with a state-of-the-art frequency permutation solver and (b) a spectral-only model with an oracle block permutation solver in terms of signal to distortion ratio (SDR) gains.","lang":"eng"}],"status":"public","_id":"12900","department":[{"_id":"54"}],"user_id":"44006","language":[{"iso":"eng"}],"related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Poster.pdf","relation":"supplementary_material","description":"Poster"}]},"year":"2018","citation":{"ama":"Drude L, Higuchi T, Kinoshita K, Nakatani T, Haeb-Umbach R. Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation. In: <i>ICASSP 2018, Calgary, Canada</i>. ; 2018.","ieee":"L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, and R. 
Haeb-Umbach, “Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation,” in <i>ICASSP 2018, Calgary, Canada</i>, 2018.","chicago":"Drude, Lukas, Takuya Higuchi, Keisuke Kinoshita, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation.” In <i>ICASSP 2018, Calgary, Canada</i>, 2018.","apa":"Drude, L., Higuchi, T., Kinoshita, K., Nakatani, T., &#38; Haeb-Umbach, R. (2018). Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation. In <i>ICASSP 2018, Calgary, Canada</i>.","mla":"Drude, Lukas, et al. “Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation.” <i>ICASSP 2018, Calgary, Canada</i>, 2018.","short":"L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, R. Haeb-Umbach, in: ICASSP 2018, Calgary, Canada, 2018.","bibtex":"@inproceedings{Drude_Higuchi_Kinoshita_Nakatani_Haeb-Umbach_2018, title={Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation}, booktitle={ICASSP 2018, Calgary, Canada}, author={Drude, Lukas and Higuchi, Takuya and Kinoshita, Keisuke and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2018} }"},"date_updated":"2022-01-06T06:51:24Z","oa":"1","date_created":"2019-07-30T14:42:15Z","author":[{"full_name":"Drude, Lukas","id":"11213","last_name":"Drude","first_name":"Lukas"},{"last_name":"Higuchi","full_name":"Higuchi, Takuya","first_name":"Takuya"},{"full_name":"Kinoshita, Keisuke","last_name":"Kinoshita","first_name":"Keisuke"},{"first_name":"Tomohiro","last_name":"Nakatani","full_name":"Nakatani, Tomohiro"},{"id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"title":"Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source 
Separation","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Paper.pdf","open_access":"1"}]},{"abstract":[{"lang":"eng","text":"This contribution presents a speech enhancement system for the CHiME-5 Dinner Party Scenario. The front-end employs multi-channel linear time-variant filtering and achieves its gains without the use of a neural network. We present an adaptation of blind source separation techniques to the CHiME-5 database which we call Guided Source Separation (GSS). Using the baseline acoustic and language model, the combination of Weighted Prediction Error based dereverberation, guided source separation, and beamforming reduces the WER by 10.54% (relative) for the single array track and by 21.12% (relative) on the multiple array track."}],"status":"public","publication":"Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India","type":"conference","language":[{"iso":"eng"}],"_id":"12899","project":[{"name":"Computing Resources Provided by the Paderborn Center for Parallel Computing","_id":"52"}],"department":[{"_id":"54"}],"user_id":"460","year":"2018","citation":{"ama":"Boeddeker C, Heitkaemper J, Schmalenstroeer J, Drude L, Heymann J, Haeb-Umbach R. Front-End Processing for the CHiME-5 Dinner Party Scenario. In: <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>. ; 2018.","ieee":"C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, and R. Haeb-Umbach, “Front-End Processing for the CHiME-5 Dinner Party Scenario,” 2018.","chicago":"Boeddeker, Christoph, Jens Heitkaemper, Joerg Schmalenstroeer, Lukas Drude, Jahn Heymann, and Reinhold Haeb-Umbach. “Front-End Processing for the CHiME-5 Dinner Party Scenario.” In <i>Proc. 
CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>, 2018.","bibtex":"@inproceedings{Boeddeker_Heitkaemper_Schmalenstroeer_Drude_Heymann_Haeb-Umbach_2018, title={Front-End Processing for the CHiME-5 Dinner Party Scenario}, booktitle={Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India}, author={Boeddeker, Christoph and Heitkaemper, Jens and Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2018} }","mla":"Boeddeker, Christoph, et al. “Front-End Processing for the CHiME-5 Dinner Party Scenario.” <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>, 2018.","short":"C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018.","apa":"Boeddeker, C., Heitkaemper, J., Schmalenstroeer, J., Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2018). Front-End Processing for the CHiME-5 Dinner Party Scenario. <i>Proc. 
CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>."},"quality_controlled":"1","related_material":{"link":[{"relation":"supplementary_material","description":"Poster","url":"https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Poster.pdf"}]},"title":"Front-End Processing for the CHiME-5 Dinner Party Scenario","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Paper.pdf","open_access":"1"}],"date_updated":"2023-10-26T08:14:15Z","oa":"1","author":[{"first_name":"Christoph","last_name":"Boeddeker","id":"40767","full_name":"Boeddeker, Christoph"},{"last_name":"Heitkaemper","id":"27643","full_name":"Heitkaemper, Jens","first_name":"Jens"},{"first_name":"Joerg","last_name":"Schmalenstroeer","id":"460","full_name":"Schmalenstroeer, Joerg"},{"last_name":"Drude","id":"11213","full_name":"Drude, Lukas","first_name":"Lukas"},{"last_name":"Heymann","full_name":"Heymann, Jahn","first_name":"Jahn"},{"first_name":"Reinhold","last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold","id":"242"}],"date_created":"2019-07-30T14:35:15Z"},{"title":"The RWTH/UPB System Combination for the CHiME 2018 Workshop","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_RWTH_Paper.pdf","open_access":"1"}],"date_updated":"2023-10-26T08:12:14Z","oa":"1","date_created":"2019-07-12T05:29:58Z","author":[{"last_name":"Kitza","full_name":"Kitza, Markus","first_name":"Markus"},{"first_name":"Wilfried","full_name":"Michel, Wilfried","last_name":"Michel"},{"first_name":"Christoph","last_name":"Boeddeker","full_name":"Boeddeker, Christoph","id":"40767"},{"first_name":"Jens","full_name":"Heitkaemper, Jens","id":"27643","last_name":"Heitkaemper"},{"first_name":"Tobias","last_name":"Menne","full_name":"Menne, Tobias"},{"first_name":"Ralf","full_name":"Schlüter, Ralf","last_name":"Schlüter"},{"full_name":"Ney, 
Hermann","last_name":"Ney","first_name":"Hermann"},{"first_name":"Joerg","last_name":"Schmalenstroeer","id":"460","full_name":"Schmalenstroeer, Joerg"},{"first_name":"Lukas","id":"11213","full_name":"Drude, Lukas","last_name":"Drude"},{"full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann","first_name":"Jahn"},{"id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"year":"2018","citation":{"ieee":"M. Kitza <i>et al.</i>, “The RWTH/UPB System Combination for the CHiME 2018 Workshop,” 2018.","chicago":"Kitza, Markus, Wilfried Michel, Christoph Boeddeker, Jens Heitkaemper, Tobias Menne, Ralf Schlüter, Hermann Ney, et al. “The RWTH/UPB System Combination for the CHiME 2018 Workshop.” In <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>, 2018.","ama":"Kitza M, Michel W, Boeddeker C, et al. The RWTH/UPB System Combination for the CHiME 2018 Workshop. In: <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>. ; 2018.","apa":"Kitza, M., Michel, W., Boeddeker, C., Heitkaemper, J., Menne, T., Schlüter, R., Ney, H., Schmalenstroeer, J., Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2018). The RWTH/UPB System Combination for the CHiME 2018 Workshop. <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>.","bibtex":"@inproceedings{Kitza_Michel_Boeddeker_Heitkaemper_Menne_Schlüter_Ney_Schmalenstroeer_Drude_Heymann_et al._2018, title={The RWTH/UPB System Combination for the CHiME 2018 Workshop}, booktitle={Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India}, author={Kitza, Markus and Michel, Wilfried and Boeddeker, Christoph and Heitkaemper, Jens and Menne, Tobias and Schlüter, Ralf and Ney, Hermann and Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and et al.}, year={2018} }","mla":"Kitza, Markus, et al. 
“The RWTH/UPB System Combination for the CHiME 2018 Workshop.” <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>, 2018.","short":"M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter, H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India, 2018."},"quality_controlled":"1","language":[{"iso":"eng"}],"_id":"11876","department":[{"_id":"54"}],"user_id":"460","abstract":[{"lang":"eng","text":"This paper describes the systems for the single-array track and the multiple-array track of the 5th CHiME Challenge. The final system is a combination of multiple systems, using Confusion Network Combination (CNC). The different systems presented here are utilizing different front-ends and training sets for a Bidirectional Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced by enhancements provided by Paderborn University [1]. The back-end has been implemented using RASR [2] and RETURNN [3]. Additionally, a system combination including the hypothesis word graphs from the system of the submission [1] has been performed, which results in the final best system."}],"status":"public","publication":"Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India","type":"conference"},{"department":[{"_id":"54"}],"user_id":"40767","_id":"11735","language":[{"iso":"eng"}],"type":"report","status":"public","abstract":[{"text":"This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. Especially the derivation of complex-valued functions is a key component of this approach. Therefore the real-valued algorithmic differentiation is extended via the complex-valued chain rule. 
In addition to the basic mathematic operations the derivative of the eigenvalue problem with complex-valued eigenvectors is one of the key results of this report. The potential of this approach is shown with experimental results on the CHiME-3 challenge database. There, the beamforming task is used as a front-end for an ASR system. With the developed derivatives a joint optimization of a speech enhancement and speech recognition system w.r.t. the recognition optimization criterion is possible.","lang":"eng"}],"author":[{"first_name":"Christoph","last_name":"Boeddeker","id":"40767","full_name":"Boeddeker, Christoph"},{"first_name":"Patrick","full_name":"Hanebrink, Patrick","last_name":"Hanebrink"},{"first_name":"Lukas","last_name":"Drude","id":"11213","full_name":"Drude, Lukas"},{"first_name":"Jahn","full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann"},{"id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"date_created":"2019-07-12T05:27:15Z","oa":"1","date_updated":"2022-01-06T06:51:08Z","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2017/ArXiv_2017_BoeddekerHanebrinkHaeb_Article.pdf"}],"title":"On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming","citation":{"mla":"Boeddeker, Christoph, et al. <i>On the Computation of Complex-Valued Gradients with Application to Statistically Optimum Beamforming</i>. 2017.","bibtex":"@book{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming}, author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2017} }","short":"C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. 
Haeb-Umbach, On the Computation of Complex-Valued Gradients with Application to Statistically Optimum Beamforming, 2017.","apa":"Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2017). <i>On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming</i>.","chicago":"Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and Reinhold Haeb-Umbach. <i>On the Computation of Complex-Valued Gradients with Application to Statistically Optimum Beamforming</i>, 2017.","ieee":"C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, <i>On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming</i>. 2017.","ama":"Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. <i>On the Computation of Complex-Valued Gradients with Application to Statistically Optimum Beamforming</i>.; 2017."},"year":"2017"},{"date_created":"2019-07-12T05:27:16Z","author":[{"id":"40767","full_name":"Boeddeker, Christoph","last_name":"Boeddeker","first_name":"Christoph"},{"first_name":"Patrick","last_name":"Hanebrink","full_name":"Hanebrink, Patrick"},{"first_name":"Lukas","last_name":"Drude","id":"11213","full_name":"Drude, Lukas"},{"first_name":"Jahn","full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann"},{"full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"oa":"1","date_updated":"2022-01-06T06:51:08Z","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_boeddeker_paper.pdf"}],"title":"Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation","citation":{"ieee":"C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, “Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation,” in <i>Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","chicago":"Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and Reinhold Haeb-Umbach. “Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation.” In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","ama":"Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation. In: <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2017.","bibtex":"@inproceedings{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2017} }","short":"C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.","mla":"Boeddeker, Christoph, et al. “Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","apa":"Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2017). Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation. In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>."},"year":"2017","department":[{"_id":"54"}],"user_id":"44006","_id":"11736","language":[{"iso":"eng"}],"publication":"Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP)","type":"conference","status":"public","abstract":[{"lang":"eng","text":"In this paper we show how a neural network for spectral mask estimation for an acoustic beamformer can be optimized by algorithmic differentiation. Using the beamformer output SNR as the objective function to maximize, the gradient is propagated through the beamformer all the way to the neural network which provides the clean speech and noise masks from which the beamformer coefficients are estimated by eigenvalue decomposition. A key theoretical result is the derivative of an eigenvalue problem involving complex-valued eigenvectors. Experimental results on the CHiME-3 challenge database demonstrate the effectiveness of the approach. The tools developed in this paper are a key component for an end-to-end optimization of speech enhancement and speech recognition."}]},{"user_id":"44006","department":[{"_id":"54"}],"_id":"11754","language":[{"iso":"eng"}],"type":"conference","publication":"INTERSPEECH 2017, Stockholm, Schweden","status":"public","abstract":[{"text":"Recent advances in discriminatively trained mask estimation networks to extract a single source utilizing beamforming techniques demonstrate that the integration of statistical models and deep neural networks (DNNs) is a promising approach for robust automatic speech recognition (ASR) applications. In this contribution we demonstrate how discriminatively trained embeddings on spectral features can be tightly integrated into statistical model-based source separation to separate and transcribe overlapping speech. Good generalization to unseen spatial configurations is achieved by estimating a statistical model at test time, while still leveraging discriminative training of deep clustering embeddings on a separate training set. 
We formulate an expectation maximization (EM) algorithm which jointly estimates a model for deep clustering embeddings and complex-valued spatial observations in the short time Fourier transform (STFT) domain at test time. Extensive simulations confirm that the integrated model outperforms (a) a deep clustering model with a subsequent beamforming step and (b) an EM-based model with a beamforming step alone in terms of signal to distortion ratio (SDR) and perceptually motivated metric (PESQ) gains. ASR results on a reverberated dataset further show that the aforementioned gains translate to reduced word error rates (WERs) even in reverberant environments.","lang":"eng"}],"author":[{"first_name":"Lukas","last_name":"Drude","id":"11213","full_name":"Drude, Lukas"},{"last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold","first_name":"Reinhold"}],"date_created":"2019-07-12T05:27:37Z","oa":"1","date_updated":"2022-01-06T06:51:08Z","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_paper.pdf"}],"title":"Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings","related_material":{"link":[{"relation":"supplementary_material","description":"Slides","url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_slides.pdf"}]},"citation":{"apa":"Drude, L., &#38; Haeb-Umbach, R. (2017). Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In <i>INTERSPEECH 2017, Stockholm, Schweden</i>.","mla":"Drude, Lukas, and Reinhold Haeb-Umbach. 
“Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","bibtex":"@inproceedings{Drude_Haeb-Umbach_2017, title={Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings}, booktitle={INTERSPEECH 2017, Stockholm, Schweden}, author={Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }","short":"L. Drude, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Schweden, 2017.","ama":"Drude L, Haeb-Umbach R. Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings. In: <i>INTERSPEECH 2017, Stockholm, Schweden</i>. ; 2017.","ieee":"L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features for BSS with Deep Clustering embeddings,” in <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","chicago":"Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings.” In <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017."},"year":"2017"},{"main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_paper.pdf"}],"title":"BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System","date_created":"2019-07-12T05:28:40Z","author":[{"first_name":"Jahn","full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann"},{"full_name":"Drude, Lukas","id":"11213","last_name":"Drude","first_name":"Lukas"},{"first_name":"Christoph","last_name":"Boeddeker","full_name":"Boeddeker, Christoph","id":"40767"},{"first_name":"Patrick","full_name":"Hanebrink, Patrick","last_name":"Hanebrink"},{"first_name":"Reinhold","full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach"}],"oa":"1","date_updated":"2022-01-06T06:51:09Z","citation":{"ieee":"J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, and R. 
Haeb-Umbach, “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System,” in <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","chicago":"Heymann, Jahn, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, and Reinhold Haeb-Umbach. “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System.” In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","ama":"Heymann J, Drude L, Boeddeker C, Hanebrink P, Haeb-Umbach R. BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System. In: <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2017.","bibtex":"@inproceedings{Heymann_Drude_Boeddeker_Hanebrink_Haeb-Umbach_2017, title={BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn and Drude, Lukas and Boeddeker, Christoph and Hanebrink, Patrick and Haeb-Umbach, Reinhold}, year={2017} }","mla":"Heymann, Jahn, et al. “BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.","short":"J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.","apa":"Heymann, J., Drude, L., Boeddeker, C., Hanebrink, P., &#38; Haeb-Umbach, R. (2017). BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System. In <i>Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP)</i>."},"year":"2017","related_material":{"link":[{"relation":"supplementary_material","description":"Poster","url":"https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_poster.pdf"}]},"language":[{"iso":"eng"}],"department":[{"_id":"54"}],"user_id":"40767","_id":"11809","project":[{"_id":"52","name":"Computing Resources Provided by the Paderborn Center for Parallel Computing"}],"status":"public","abstract":[{"lang":"eng","text":"This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multichannel data only. Also, relying on the signal statistics for beamforming, the approach makes no assumptions on the configuration of the microphone array. We further observe a performance gain through joint training in terms of word error rate in an evaluation of the system on the CHiME 4 dataset."}],"publication":"Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)","type":"conference"},{"type":"journal_article","publication":"Computer Speech and Language","status":"public","abstract":[{"text":"Acoustic beamforming can greatly improve the performance of Automatic Speech Recognition (ASR) and speech enhancement systems when multiple channels are available. We recently proposed a way to support the model-based Generalized Eigenvalue beamforming operation with a powerful neural network for spectral mask estimation. 
The enhancement system has a number of desirable properties. In particular, neither assumptions need to be made about the nature of the acoustic transfer function (e.g., being anechoic), nor does the array configuration need to be known. While the system has been originally developed to enhance speech in noisy environments, we show in this article that it is also effective in suppressing reverberation, thus leading to a generic trainable multi-channel speech enhancement system for robust speech processing. To support this claim, we consider two distinct datasets: The CHiME 3 challenge, which features challenging real-world noise distortions, and the Reverb challenge, which focuses on distortions caused by reverberation. We evaluate the system both with respect to a speech enhancement and a recognition task. For the first task we propose a new way to cope with the distortions introduced by the Generalized Eigenvalue beamformer by renormalizing the target energy for each frequency bin, and measure its effectiveness in terms of the PESQ score. For the latter we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art ASR results on both datasets. We further experiment with different network architectures for spectral mask estimation: One small feed-forward network with only one hidden layer, one Convolutional Neural Network and one bi-directional Long Short-Term Memory network, showing that even a small network is capable of delivering significant performance improvements.","lang":"eng"}],"user_id":"44006","department":[{"_id":"54"}],"_id":"11811","language":[{"iso":"eng"}],"citation":{"short":"J. Heymann, L. Drude, R. Haeb-Umbach, Computer Speech and Language (2017).","mla":"Heymann, Jahn, et al. 
“A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.” <i>Computer Speech and Language</i>, 2017.","bibtex":"@article{Heymann_Drude_Haeb-Umbach_2017, title={A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing}, journal={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017} }","apa":"Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2017). A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing. <i>Computer Speech and Language</i>.","ama":"Heymann J, Drude L, Haeb-Umbach R. A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing. <i>Computer Speech and Language</i>. 2017.","ieee":"J. Heymann, L. Drude, and R. Haeb-Umbach, “A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing,” <i>Computer Speech and Language</i>, 2017.","chicago":"Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. 
“A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.” <i>Computer Speech and Language</i>, 2017."},"year":"2017","date_created":"2019-07-12T05:28:43Z","author":[{"last_name":"Heymann","full_name":"Heymann, Jahn","id":"9168","first_name":"Jahn"},{"first_name":"Lukas","full_name":"Drude, Lukas","id":"11213","last_name":"Drude"},{"last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold","first_name":"Reinhold"}],"oa":"1","date_updated":"2022-01-06T06:51:09Z","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/ComputerSpeechLanguage_2017_heymann_paper.pdf","open_access":"1"}],"title":"A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing"},{"type":"conference","publication":"INTERSPEECH 2017, Stockholm, Schweden","abstract":[{"text":"Variational Autoencoders (VAEs) have been shown to provide efficient neural-network-based approximate Bayesian inference for observation models for which exact inference is intractable. Its extension, the so-called Structured VAE (SVAE) allows inference in the presence of both discrete and continuous latent variables. Inspired by this extension, we developed a VAE with Hidden Markov Models (HMMs) as latent models. We applied the resulting HMM-VAE to the task of acoustic unit discovery in a zero resource scenario. Starting from an initial model based on variational inference in an HMM with Gaussian Mixture Model (GMM) emission probabilities, the accuracy of the acoustic unit discovery could be significantly improved by the HMM-VAE. 
In doing so we were able to demonstrate for an unsupervised learning task what is well-known in the supervised learning case: Neural networks provide superior modeling power compared to GMMs.","lang":"eng"}],"status":"public","_id":"11759","user_id":"34851","department":[{"_id":"54"}],"language":[{"iso":"eng"}],"quality_controlled":"1","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_poster.pdf","description":"Poster","relation":"supplementary_material"},{"description":"Slides","relation":"supplementary_material","url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_slides.pdf"}]},"year":"2017","citation":{"chicago":"Ebbers, Janek, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, and Bhiksha Raj. “Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.” In <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","ieee":"J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, and B. Raj, “Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery,” 2017.","ama":"Ebbers J, Heymann J, Drude L, Glarner T, Haeb-Umbach R, Raj B. Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. In: <i>INTERSPEECH 2017, Stockholm, Schweden</i>. ; 2017.","apa":"Ebbers, J., Heymann, J., Drude, L., Glarner, T., Haeb-Umbach, R., &#38; Raj, B. (2017). Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. <i>INTERSPEECH 2017, Stockholm, Schweden</i>.","mla":"Ebbers, Janek, et al. 
“Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.” <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","bibtex":"@inproceedings{Ebbers_Heymann_Drude_Glarner_Haeb-Umbach_Raj_2017, title={Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery}, booktitle={INTERSPEECH 2017, Stockholm, Schweden}, author={Ebbers, Janek and Heymann, Jahn and Drude, Lukas and Glarner, Thomas and Haeb-Umbach, Reinhold and Raj, Bhiksha}, year={2017} }","short":"J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, B. Raj, in: INTERSPEECH 2017, Stockholm, Schweden, 2017."},"date_updated":"2023-11-22T08:29:06Z","oa":"1","author":[{"first_name":"Janek","full_name":"Ebbers, Janek","id":"34851","last_name":"Ebbers"},{"full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann","first_name":"Jahn"},{"id":"11213","full_name":"Drude, Lukas","last_name":"Drude","first_name":"Lukas"},{"first_name":"Thomas","last_name":"Glarner","full_name":"Glarner, Thomas","id":"14169"},{"last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold","first_name":"Reinhold"},{"last_name":"Raj","full_name":"Raj, Bhiksha","first_name":"Bhiksha"}],"date_created":"2019-07-12T05:27:42Z","title":"Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_paper.pdf"}]},{"author":[{"first_name":"Joerg","last_name":"Schmalenstroeer","id":"460","full_name":"Schmalenstroeer, Joerg"},{"last_name":"Heymann","full_name":"Heymann, Jahn","id":"9168","first_name":"Jahn"},{"first_name":"Lukas","id":"11213","full_name":"Drude, Lukas","last_name":"Drude"},{"first_name":"Christoph","full_name":"Boeddeker, Christoph","id":"40767","last_name":"Boeddeker"},{"first_name":"Reinhold","last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, 
Reinhold"}],"date_created":"2019-07-12T05:30:20Z","oa":"1","date_updated":"2023-10-26T08:12:05Z","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb.pdf","open_access":"1"}],"title":"Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming","related_material":{"link":[{"relation":"supplementary_material","description":"Poster","url":"https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb_poster.pdf"}]},"quality_controlled":"1","citation":{"ama":"Schmalenstroeer J, Heymann J, Drude L, Boeddeker C, Haeb-Umbach R. Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming. In: <i>IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</i>. ; 2017.","ieee":"J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, and R. Haeb-Umbach, “Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming,” 2017.","chicago":"Schmalenstroeer, Joerg, Jahn Heymann, Lukas Drude, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming.” In <i>IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</i>, 2017.","mla":"Schmalenstroeer, Joerg, et al. “Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming.” <i>IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</i>, 2017.","bibtex":"@inproceedings{Schmalenstroeer_Heymann_Drude_Boeddeker_Haeb-Umbach_2017, title={Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming}, booktitle={IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)}, author={Schmalenstroeer, Joerg and Heymann, Jahn and Drude, Lukas and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2017} }","short":"J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, R. 
Haeb-Umbach, in: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017.","apa":"Schmalenstroeer, J., Heymann, J., Drude, L., Boeddeker, C., &#38; Haeb-Umbach, R. (2017). Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming. <i>IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</i>."},"year":"2017","department":[{"_id":"54"}],"user_id":"460","_id":"11895","language":[{"iso":"eng"}],"publication":"IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)","type":"conference","status":"public","abstract":[{"text":"Multi-channel speech enhancement algorithms rely on a synchronous sampling of the microphone signals. This, however, cannot always be guaranteed, especially if the sensors are distributed in an environment. To avoid performance degradation the sampling rate offset needs to be estimated and compensated for. In this contribution we extend the recently proposed coherence drift based method in two important directions. First, the increasing phase shift in the short-time Fourier transform domain is estimated from the coherence drift in a matched-filter-like fashion, where intermediate estimates are weighted by their instantaneous SNR. Second, an observed bias is removed by iterating between offset estimation and compensation by resampling a couple of times. The effectiveness of the proposed method is demonstrated by speech recognition results on the output of a beamformer with and without sampling rate offset compensation between the input channels. We compare MVDR and maximum-SNR beamformers in reverberant environments and further show that both benefit from a novel phase normalization, which we also propose in this contribution.","lang":"eng"}]},{"status":"public","abstract":[{"lang":"eng","text":"Noise power spectral density (PSD) estimation is an indispensable component of speech spectral enhancement systems. 
In this paper we present a noise PSD tracking algorithm, which employs a noise presence probability estimate delivered by a deep neural network (DNN). The algorithm provides a causal noise PSD estimate and can thus be used in speech enhancement systems for communication purposes. An extensive performance comparison has been carried out with ten causal state-of-the-art noise tracking algorithms taken from the literature and categorized according to the techniques applied. The experiments showed that the proposed DNN-based noise PSD tracker outperforms all competing methods with respect to all tested performance measures, which include the noise tracking performance and the performance of a speech enhancement system employing the noise tracking component."}],"type":"conference","publication":"12. ITG Fachtagung Sprachkommunikation (ITG 2016)","language":[{"iso":"eng"}],"user_id":"44006","department":[{"_id":"54"}],"_id":"11744","citation":{"apa":"Chinaev, A., Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs. In <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>.","mla":"Chinaev, Aleksej, et al. “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.","bibtex":"@inproceedings{Chinaev_Heymann_Drude_Haeb-Umbach_2016, title={Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs}, booktitle={12. ITG Fachtagung Sprachkommunikation (ITG 2016)}, author={Chinaev, Aleksej and Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }","short":"A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.","chicago":"Chinaev, Aleksej, Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach. “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” In <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.","ieee":"A. Chinaev, J. 
Heymann, L. Drude, and R. Haeb-Umbach, “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs,” in <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.","ama":"Chinaev A, Heymann J, Drude L, Haeb-Umbach R. Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs. In: <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>. ; 2016."},"year":"2016","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16_Presentation.pdf","relation":"supplementary_material","description":"Presentation"}]},"main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16.pdf"}],"title":"Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs","author":[{"first_name":"Aleksej","full_name":"Chinaev, Aleksej","last_name":"Chinaev"},{"full_name":"Heymann, Jahn","id":"9168","last_name":"Heymann","first_name":"Jahn"},{"first_name":"Lukas","id":"11213","full_name":"Drude, Lukas","last_name":"Drude"},{"full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"date_created":"2019-07-12T05:27:25Z","oa":"1","date_updated":"2022-01-06T06:51:08Z"},{"related_material":{"link":[{"relation":"supplementary_material","description":"Slides","url":"https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_slides.pdf"}]},"year":"2016","citation":{"ama":"Drude L, Boeddeker C, Haeb-Umbach R. Blind Speech Separation based on Complex Spherical k-Mode Clustering. In: <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2016.","ieee":"L. Drude, C. Boeddeker, and R. Haeb-Umbach, “Blind Speech Separation based on Complex Spherical k-Mode Clustering,” in <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","chicago":"Drude, Lukas, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Blind Speech Separation Based on Complex Spherical K-Mode Clustering.” In <i>Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","short":"L. Drude, C. Boeddeker, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.","bibtex":"@inproceedings{Drude_Boeddeker_Haeb-Umbach_2016, title={Blind Speech Separation based on Complex Spherical k-Mode Clustering}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Drude, Lukas and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2016} }","mla":"Drude, Lukas, et al. “Blind Speech Separation Based on Complex Spherical K-Mode Clustering.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","apa":"Drude, L., Boeddeker, C., &#38; Haeb-Umbach, R. (2016). Blind Speech Separation based on Complex Spherical k-Mode Clustering. In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>."},"oa":"1","date_updated":"2022-01-06T06:51:08Z","date_created":"2019-07-12T05:27:33Z","author":[{"first_name":"Lukas","id":"11213","full_name":"Drude, Lukas","last_name":"Drude"},{"first_name":"Christoph","id":"40767","full_name":"Boeddeker, Christoph","last_name":"Boeddeker"},{"first_name":"Reinhold","id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach"}],"title":"Blind Speech Separation based on Complex Spherical k-Mode Clustering","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_paper.pdf","open_access":"1"}],"publication":"Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)","type":"conference","status":"public","_id":"11751","department":[{"_id":"54"}],"user_id":"44006","language":[{"iso":"eng"}]},{"status":"public","abstract":[{"text":"Although complex-valued neural networks (CVNNs), networks which can operate with complex arithmetic, have been around for a while, they have not been given reconsideration since the breakthrough of deep network architectures. 
This paper presents a critical assessment whether the novel tool set of deep neural networks (DNNs) should be extended to complex-valued arithmetic. Indeed, with DNNs making inroads in speech enhancement tasks, the use of complex-valued input data, specifically the short-time Fourier transform coefficients, is an obvious consideration. In particular when it comes to performing tasks that heavily rely on phase information, such as acoustic beamforming, complex-valued algorithms are omnipresent. In this contribution we recapitulate backpropagation in CVNNs, develop complex-valued network elements, such as the split-rectified non-linearity, and compare real- and complex-valued networks on a beamforming task. We find that CVNNs hardly provide a performance gain and conclude that the effort of developing the complex-valued counterparts of the building blocks of modern deep or recurrent neural networks can hardly be justified.","lang":"eng"}],"type":"conference","publication":"INTERSPEECH 2016, San Francisco, USA","language":[{"iso":"eng"}],"user_id":"44006","department":[{"_id":"54"}],"_id":"11756","citation":{"ama":"Drude L, Raj B, Haeb-Umbach R. On the appropriateness of complex-valued neural networks for speech enhancement. In: <i>INTERSPEECH 2016, San Francisco, USA</i>. ; 2016.","chicago":"Drude, Lukas, Bhiksha Raj, and Reinhold Haeb-Umbach. “On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement.” In <i>INTERSPEECH 2016, San Francisco, USA</i>, 2016.","ieee":"L. Drude, B. Raj, and R. Haeb-Umbach, “On the appropriateness of complex-valued neural networks for speech enhancement,” in <i>INTERSPEECH 2016, San Francisco, USA</i>, 2016.","apa":"Drude, L., Raj, B., &#38; Haeb-Umbach, R. (2016). On the appropriateness of complex-valued neural networks for speech enhancement. In <i>INTERSPEECH 2016, San Francisco, USA</i>.","short":"L. Drude, B. Raj, R. 
Haeb-Umbach, in: INTERSPEECH 2016, San Francisco, USA, 2016.","bibtex":"@inproceedings{Drude_Raj_Haeb-Umbach_2016, title={On the appropriateness of complex-valued neural networks for speech enhancement}, booktitle={INTERSPEECH 2016, San Francisco, USA}, author={Drude, Lukas and Raj, Bhiksha and Haeb-Umbach, Reinhold}, year={2016} }","mla":"Drude, Lukas, et al. “On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement.” <i>INTERSPEECH 2016, San Francisco, USA</i>, 2016."},"year":"2016","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_slides.pdf","description":"Poster","relation":"supplementary_material"}]},"main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_paper.pdf"}],"title":"On the appropriateness of complex-valued neural networks for speech enhancement","author":[{"last_name":"Drude","full_name":"Drude, Lukas","id":"11213","first_name":"Lukas"},{"first_name":"Bhiksha","last_name":"Raj","full_name":"Raj, Bhiksha"},{"first_name":"Reinhold","last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold","id":"242"}],"date_created":"2019-07-12T05:27:39Z","date_updated":"2022-01-06T06:51:08Z","oa":"1"},{"_id":"11771","department":[{"_id":"54"}],"user_id":"44006","language":[{"iso":"eng"}],"publication":"12. ITG Fachtagung Sprachkommunikation (ITG 2016)","type":"conference","abstract":[{"lang":"eng","text":"This paper is concerned with speech presence probability estimation employing an explicit model of the temporal and spectral correlations of speech. An undirected graphical model is introduced, based on a Factor Graph formulation. It is shown that this undirected model cures some of the theoretical issues of an earlier directed graphical model. 
Furthermore, we formulate a message passing inference scheme based on an approximate graph factorization, identify this inference scheme as a particular message passing schedule based on the turbo principle and suggest further alternative schedules. The experiments show an improved performance over speech presence probability estimation based on an IID assumption, and a slightly better performance of the turbo schedule over the alternatives."}],"status":"public","oa":"1","date_updated":"2022-01-06T06:51:08Z","author":[{"id":"14169","full_name":"Glarner, Thomas","last_name":"Glarner","first_name":"Thomas"},{"first_name":"Mohammad","last_name":"Mahdi Momenzadeh","full_name":"Mahdi Momenzadeh, Mohammad"},{"last_name":"Drude","id":"11213","full_name":"Drude, Lukas","first_name":"Lukas"},{"first_name":"Reinhold","id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach"}],"date_created":"2019-07-12T05:27:56Z","title":"Factor Graph Decoding for Speech Presence Probability Estimation","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner.pdf","open_access":"1"}],"related_material":{"link":[{"relation":"supplementary_material","description":"Slides","url":"https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner_slides.pdf"}]},"year":"2016","citation":{"ieee":"T. Glarner, M. Mahdi Momenzadeh, L. Drude, and R. Haeb-Umbach, “Factor Graph Decoding for Speech Presence Probability Estimation,” in <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.","chicago":"Glarner, Thomas, Mohammad Mahdi Momenzadeh, Lukas Drude, and Reinhold Haeb-Umbach. “Factor Graph Decoding for Speech Presence Probability Estimation.” In <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.","ama":"Glarner T, Mahdi Momenzadeh M, Drude L, Haeb-Umbach R. Factor Graph Decoding for Speech Presence Probability Estimation. In: <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>. 
; 2016.","apa":"Glarner, T., Mahdi Momenzadeh, M., Drude, L., &#38; Haeb-Umbach, R. (2016). Factor Graph Decoding for Speech Presence Probability Estimation. In <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>.","bibtex":"@inproceedings{Glarner_Mahdi Momenzadeh_Drude_Haeb-Umbach_2016, title={Factor Graph Decoding for Speech Presence Probability Estimation}, booktitle={12. ITG Fachtagung Sprachkommunikation (ITG 2016)}, author={Glarner, Thomas and Mahdi Momenzadeh, Mohammad and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }","short":"T. Glarner, M. Mahdi Momenzadeh, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.","mla":"Glarner, Thomas, et al. “Factor Graph Decoding for Speech Presence Probability Estimation.” <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016."}},{"_id":"11812","user_id":"44006","department":[{"_id":"54"}],"language":[{"iso":"eng"}],"type":"conference","publication":"Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)","status":"public","oa":"1","date_updated":"2022-01-06T06:51:09Z","author":[{"first_name":"Jahn","last_name":"Heymann","id":"9168","full_name":"Heymann, Jahn"},{"last_name":"Drude","full_name":"Drude, Lukas","id":"11213","first_name":"Lukas"},{"full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"date_created":"2019-07-12T05:28:44Z","title":"Neural Network Based Spectral Mask Estimation for Acoustic Beamforming","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_paper.pdf"}],"related_material":{"link":[{"description":"Slides","relation":"supplementary_material","url":"https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_slides.pdf"}]},"year":"2016","citation":{"apa":"Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Neural Network Based Spectral Mask Estimation for Acoustic Beamforming. In <i>Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP)</i>.","mla":"Heymann, Jahn, et al. “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","short":"J. Heymann, L. Drude, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.","bibtex":"@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Neural Network Based Spectral Mask Estimation for Acoustic Beamforming}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }","ieee":"J. Heymann, L. Drude, and R. Haeb-Umbach, “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming,” in <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","chicago":"Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming.” In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.","ama":"Heymann J, Drude L, Haeb-Umbach R. Neural Network Based Spectral Mask Estimation for Acoustic Beamforming. In: <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2016."}},{"status":"public","abstract":[{"lang":"eng","text":"We present a system for the 4th CHiME challenge which significantly increases the performance for all three tracks with respect to the provided baseline system. The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural network to estimate signal statistics. These then steer a Generalized Eigenvalue beamformer. The back-end consists of a 22 layer deep Wide Residual Network and two extra BLSTM layers. Working on a whole utterance instead of frames allows us to refine Batch-Normalization. We also train our own BLSTM-based language model. 
Adding a discriminative speaker adaptation leads to further gains. The final system achieves a word error rate on the six channel real test data of 3.48%. For the two channel track we achieve 5.96% and for the one channel track 9.34%. This is the best reported performance on the challenge achieved by a single system, i.e., a configuration, which does not combine multiple systems. At the same time, our system is independent of the microphone configuration. We can thus use the same components for all three tracks."}],"type":"conference","publication":"Computer Speech and Language","language":[{"iso":"eng"}],"user_id":"44006","department":[{"_id":"54"}],"_id":"11834","citation":{"short":"J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language, 2016.","mla":"Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” <i>Computer Speech and Language</i>, 2016.","bibtex":"@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition}, booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }","apa":"Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In <i>Computer Speech and Language</i>.","ieee":"J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition,” in <i>Computer Speech and Language</i>, 2016.","chicago":"Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” In <i>Computer Speech and Language</i>, 2016.","ama":"Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. 
In: <i>Computer Speech and Language</i>. ; 2016."},"year":"2016","related_material":{"link":[{"relation":"supplementary_material","description":"Poster","url":"https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf"}]},"main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf","open_access":"1"}],"title":"Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition","date_created":"2019-07-12T05:29:09Z","author":[{"first_name":"Jahn","last_name":"Heymann","id":"9168","full_name":"Heymann, Jahn"},{"first_name":"Lukas","last_name":"Drude","full_name":"Drude, Lukas","id":"11213"},{"full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"date_updated":"2022-01-06T06:51:11Z","oa":"1"},{"citation":{"ama":"Menne T, Heymann J, Alexandridis A, et al. The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation. In: <i>Computer Speech and Language</i>. ; 2016.","ieee":"T. Menne <i>et al.</i>, “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation,” in <i>Computer Speech and Language</i>, 2016.","chicago":"Menne, Tobias, Jahn Heymann, Anastasios Alexandridis, Kazuki Irie, Albert Zeyer, Markus Kitza, Pavel Golik, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation.” In <i>Computer Speech and Language</i>, 2016.","bibtex":"@inproceedings{Menne_Heymann_Alexandridis_Irie_Zeyer_Kitza_Golik_Kulikov_Drude_Schlüter_et al._2016, title={The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation}, booktitle={Computer Speech and Language}, author={Menne, Tobias and Heymann, Jahn and Alexandridis, Anastasios and Irie, Kazuki and Zeyer, Albert and Kitza, Markus and Golik, Pavel and Kulikov, Ilia and Drude, Lukas and Schlüter, Ralf and et al.}, year={2016} }","short":"T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik, I. Kulikov, L. Drude, R. 
Schlüter, H. Ney, R. Haeb-Umbach, A. Mouchtaris, in: Computer Speech and Language, 2016.","mla":"Menne, Tobias, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation.” <i>Computer Speech and Language</i>, 2016.","apa":"Menne, T., Heymann, J., Alexandridis, A., Irie, K., Zeyer, A., Kitza, M., … Mouchtaris, A. (2016). The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation. In <i>Computer Speech and Language</i>."},"year":"2016","main_file_link":[{"open_access":"1","url":"https://groups.uni-paderborn.de/nt/pubs/2016/chime4_rwthupbforth_paper.pdf"}],"title":"The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation","author":[{"first_name":"Tobias","full_name":"Menne, Tobias","last_name":"Menne"},{"first_name":"Jahn","id":"9168","full_name":"Heymann, Jahn","last_name":"Heymann"},{"first_name":"Anastasios","last_name":"Alexandridis","full_name":"Alexandridis, Anastasios"},{"first_name":"Kazuki","last_name":"Irie","full_name":"Irie, Kazuki"},{"first_name":"Albert","last_name":"Zeyer","full_name":"Zeyer, Albert"},{"full_name":"Kitza, Markus","last_name":"Kitza","first_name":"Markus"},{"last_name":"Golik","full_name":"Golik, Pavel","first_name":"Pavel"},{"first_name":"Ilia","last_name":"Kulikov","full_name":"Kulikov, Ilia"},{"first_name":"Lukas","id":"11213","full_name":"Drude, Lukas","last_name":"Drude"},{"first_name":"Ralf","full_name":"Schlüter, Ralf","last_name":"Schlüter"},{"last_name":"Ney","full_name":"Ney, Hermann","first_name":"Hermann"},{"last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold","id":"242","first_name":"Reinhold"},{"full_name":"Mouchtaris, Athanasios","last_name":"Mouchtaris","first_name":"Athanasios"}],"date_created":"2019-07-12T05:30:35Z","oa":"1","date_updated":"2022-01-06T06:51:12Z","status":"public","abstract":[{"lang":"eng","text":"This paper describes automatic speech recognition (ASR) systems developed jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch track 
of the 4th CHiME Challenge. In the 2ch and 6ch tracks the final system output is obtained by a Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM) is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM) units. The systems differ by front ends and training sets used for the acoustic training. The model for the 1ch track is trained without any preprocessing. For each front end we trained and evaluated individual acoustic models. We compare the ASR performance of different beamforming approaches: a conventional superdirective beamformer [1] and an MVDR beamformer as in [2], where the steering vector is estimated based on [3]. Furthermore we evaluated a BLSTM supported Generalized Eigenvalue beamformer using NN-GEV [4]. The back end is implemented using RWTH's open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices with a Long Short-Term Memory (LSTM) based language model. The overall best results are obtained by a system combination that includes the lattices from the system of UPB's submission [8]. Our final submission scored second in each of the three tracks of the 4th CHiME Challenge."}],"publication":"Computer Speech and Language","type":"conference","language":[{"iso":"eng"}],"department":[{"_id":"54"}],"user_id":"44006","_id":"11908"},{"citation":{"ieee":"L. Drude, F. Jacob, and R. Haeb-Umbach, “DOA-Estimation based on a Complex Watson Kernel Method,” in <i>23th European Signal Processing Conference (EUSIPCO 2015)</i>, 2015.","chicago":"Drude, Lukas, Florian Jacob, and Reinhold Haeb-Umbach. “DOA-Estimation Based on a Complex Watson Kernel Method.” In <i>23th European Signal Processing Conference (EUSIPCO 2015)</i>, 2015.","ama":"Drude L, Jacob F, Haeb-Umbach R. DOA-Estimation based on a Complex Watson Kernel Method. In: <i>23th European Signal Processing Conference (EUSIPCO 2015)</i>. ; 2015.","apa":"Drude, L., Jacob, F., &#38; Haeb-Umbach, R. (2015). 
DOA-Estimation based on a Complex Watson Kernel Method. In <i>23th European Signal Processing Conference (EUSIPCO 2015)</i>.","short":"L. Drude, F. Jacob, R. Haeb-Umbach, in: 23th European Signal Processing Conference (EUSIPCO 2015), 2015.","bibtex":"@inproceedings{Drude_Jacob_Haeb-Umbach_2015, title={DOA-Estimation based on a Complex Watson Kernel Method}, booktitle={23th European Signal Processing Conference (EUSIPCO 2015)}, author={Drude, Lukas and Jacob, Florian and Haeb-Umbach, Reinhold}, year={2015} }","mla":"Drude, Lukas, et al. “DOA-Estimation Based on a Complex Watson Kernel Method.” <i>23th European Signal Processing Conference (EUSIPCO 2015)</i>, 2015."},"year":"2015","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15_Presentation.pdf","description":"Presentation","relation":"supplementary_material"}]},"main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15.pdf","open_access":"1"}],"title":"DOA-Estimation based on a Complex Watson Kernel Method","author":[{"last_name":"Drude","id":"11213","full_name":"Drude, Lukas","first_name":"Lukas"},{"first_name":"Florian","full_name":"Jacob, Florian","last_name":"Jacob"},{"first_name":"Reinhold","last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold"}],"date_created":"2019-07-12T05:27:38Z","oa":"1","date_updated":"2022-01-06T06:51:08Z","status":"public","abstract":[{"lang":"eng","text":"This contribution presents a Direction of Arrival (DoA) estimation algorithm based on the complex Watson distribution to incorporate both phase and level differences of captured microphone array signals. The derived algorithm is reviewed in the context of the Generalized State Coherence Transform (GSCT) on the one hand and a kernel density estimation method on the other hand. A thorough simulative evaluation yields insight into parameter selection and provides details on the performance for both directional and omni-directional microphones. 
A comparison to the well-known Steered Response Power with Phase Transform (SRP-PHAT) algorithm and a state-of-the-art DoA estimator that explicitly accounts for aliasing shows in particular the advantages of the presented algorithm if inter-sensor level differences are indicative of the DoA, as with directional microphones."}],"publication":"23th European Signal Processing Conference (EUSIPCO 2015)","type":"conference","language":[{"iso":"eng"}],"department":[{"_id":"54"}],"user_id":"44006","_id":"11755"},{"citation":{"chicago":"Heymann, Jahn, Lukas Drude, Aleksej Chinaev, and Reinhold Haeb-Umbach. “BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” In <i>Automatic Speech Recognition and Understanding Workshop (ASRU 2015)</i>, 2015.","ieee":"J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge,” in <i>Automatic Speech Recognition and Understanding Workshop (ASRU 2015)</i>, 2015.","ama":"Heymann J, Drude L, Chinaev A, Haeb-Umbach R. BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge. In: <i>Automatic Speech Recognition and Understanding Workshop (ASRU 2015)</i>. ; 2015.","mla":"Heymann, Jahn, et al. “BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” <i>Automatic Speech Recognition and Understanding Workshop (ASRU 2015)</i>, 2015.","bibtex":"@inproceedings{Heymann_Drude_Chinaev_Haeb-Umbach_2015, title={BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge}, booktitle={Automatic Speech Recognition and Understanding Workshop (ASRU 2015)}, author={Heymann, Jahn and Drude, Lukas and Chinaev, Aleksej and Haeb-Umbach, Reinhold}, year={2015} }","short":"J. Heymann, L. Drude, A. Chinaev, R. Haeb-Umbach, in: Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 2015.","apa":"Heymann, J., Drude, L., Chinaev, A., &#38; Haeb-Umbach, R. (2015). BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge. 
In <i>Automatic Speech Recognition and Understanding Workshop (ASRU 2015)</i>."},"year":"2015","title":"BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge","author":[{"first_name":"Jahn","last_name":"Heymann","id":"9168","full_name":"Heymann, Jahn"},{"first_name":"Lukas","full_name":"Drude, Lukas","id":"11213","last_name":"Drude"},{"last_name":"Chinaev","full_name":"Chinaev, Aleksej","first_name":"Aleksej"},{"last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold","first_name":"Reinhold"}],"date_created":"2019-07-12T05:28:41Z","date_updated":"2022-01-06T06:51:09Z","status":"public","publication":"Automatic Speech Recognition and Understanding Workshop (ASRU 2015)","type":"conference","language":[{"iso":"eng"}],"department":[{"_id":"54"}],"user_id":"44006","_id":"11810"},{"related_material":{"link":[{"relation":"supplementary_material","description":"Poster","url":"https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15_Poster.pdf"}]},"year":"2015","citation":{"short":"O. Walter, L. Drude, R. Haeb-Umbach, in: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.","bibtex":"@inproceedings{Walter_Drude_Haeb-Umbach_2015, title={Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model}, booktitle={40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)}, author={Walter, Oliver and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2015} }","mla":"Walter, Oliver, et al. “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian Mixture Model.” <i>40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)</i>, 2015.","apa":"Walter, O., Drude, L., &#38; Haeb-Umbach, R. (2015). Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model. 
In <i>40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)</i>.","ama":"Walter O, Drude L, Haeb-Umbach R. Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model. In: <i>40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)</i>. ; 2015.","ieee":"O. Walter, L. Drude, and R. Haeb-Umbach, “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model,” in <i>40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)</i>, 2015.","chicago":"Walter, Oliver, Lukas Drude, and Reinhold Haeb-Umbach. “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian Mixture Model.” In <i>40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)</i>, 2015."},"oa":"1","date_updated":"2022-01-06T06:51:12Z","date_created":"2019-07-12T05:30:47Z","author":[{"first_name":"Oliver","last_name":"Walter","full_name":"Walter, Oliver"},{"last_name":"Drude","id":"11213","full_name":"Drude, Lukas","first_name":"Lukas"},{"first_name":"Reinhold","last_name":"Haeb-Umbach","id":"242","full_name":"Haeb-Umbach, Reinhold"}],"title":"Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15.pdf","open_access":"1"}],"type":"conference","publication":"40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)","abstract":[{"text":"In this paper we present a source counting algorithm to determine the number of speakers in a speech mixture. In our proposed method, we model the histogram of estimated directions of arrival with a nonparametric Bayesian infinite Gaussian mixture model. 
As an alternative to classical model selection criteria and to avoid specifying the maximum number of mixture components in advance, a Dirichlet process prior is employed over the mixture components. This makes it possible to automatically determine the optimal number of mixture components that most probably model the observations. We demonstrate by experiments that this model outperforms a parametric approach using a finite Gaussian mixture model with a Dirichlet distribution prior over the mixture weights.","lang":"eng"}],"status":"public","_id":"11919","user_id":"44006","department":[{"_id":"54"}],"language":[{"iso":"eng"}]}]