--- _id: '11744' abstract: - lang: eng text: Noise power spectral density (PSD) estimation is an indispensable component of speech spectral enhancement systems. In this paper we present a noise PSD tracking algorithm, which employs a noise presence probability estimate delivered by a deep neural network (DNN). The algorithm provides a causal noise PSD estimate and can thus be used in speech enhancement systems for communication purposes. An extensive performance comparison has been carried out with ten causal state-of-the-art noise tracking algorithms taken from the literature and categorized according to the applied techniques. The experiments showed that the proposed DNN-based noise PSD tracker outperforms all competing methods with respect to all tested performance measures, which include the noise tracking performance and the performance of a speech enhancement system employing the noise tracking component. author: - first_name: Aleksej full_name: Chinaev, Aleksej last_name: Chinaev - first_name: Jahn full_name: Heymann, Jahn id: '9168' last_name: Heymann - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Chinaev A, Heymann J, Drude L, Haeb-Umbach R. Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs. In: 12. ITG Fachtagung Sprachkommunikation (ITG 2016). ; 2016.' apa: Chinaev, A., Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs. In 12. ITG Fachtagung Sprachkommunikation (ITG 2016). bibtex: '@inproceedings{Chinaev_Heymann_Drude_Haeb-Umbach_2016, title={Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs}, booktitle={12. ITG Fachtagung Sprachkommunikation (ITG 2016)}, author={Chinaev, Aleksej and Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Chinaev, Aleksej, Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach. “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” In 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. ieee: A. Chinaev, J. Heymann, L. Drude, and R. Haeb-Umbach, “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs,” in 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. mla: Chinaev, Aleksej, et al. “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. short: 'A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.' date_created: 2019-07-12T05:27:25Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16.pdf oa: '1' publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016) related_material: link: - description: Presentation relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16_Presentation.pdf status: public title: Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs type: conference user_id: '44006' year: '2016' ... --- _id: '11751' author: - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Christoph full_name: Boeddeker, Christoph id: '40767' last_name: Boeddeker - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Drude L, Boeddeker C, Haeb-Umbach R. 
Blind Speech Separation based on Complex Spherical k-Mode Clustering. In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). ; 2016.' apa: Drude, L., Boeddeker, C., & Haeb-Umbach, R. (2016). Blind Speech Separation based on Complex Spherical k-Mode Clustering. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). bibtex: '@inproceedings{Drude_Boeddeker_Haeb-Umbach_2016, title={Blind Speech Separation based on Complex Spherical k-Mode Clustering}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Drude, Lukas and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Drude, Lukas, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Blind Speech Separation Based on Complex Spherical K-Mode Clustering.” In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. ieee: L. Drude, C. Boeddeker, and R. Haeb-Umbach, “Blind Speech Separation based on Complex Spherical k-Mode Clustering,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. mla: Drude, Lukas, et al. “Blind Speech Separation Based on Complex Spherical K-Mode Clustering.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. short: 'L. Drude, C. Boeddeker, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.' date_created: 2019-07-12T05:27:33Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_paper.pdf oa: '1' publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP) related_material: link: - description: Slides relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_slides.pdf status: public title: Blind Speech Separation based on Complex Spherical k-Mode Clustering type: conference user_id: '44006' year: '2016' ... --- _id: '11756' abstract: - lang: eng text: Although complex-valued neural networks (CVNNs), i.e., networks which can operate with complex arithmetic, have been around for a while, they have not been given reconsideration since the breakthrough of deep network architectures. This paper presents a critical assessment of whether the novel tool set of deep neural networks (DNNs) should be extended to complex-valued arithmetic. Indeed, with DNNs making inroads in speech enhancement tasks, the use of complex-valued input data, specifically the short-time Fourier transform coefficients, is an obvious consideration. In particular when it comes to performing tasks that heavily rely on phase information, such as acoustic beamforming, complex-valued algorithms are omnipresent. In this contribution we recapitulate backpropagation in CVNNs, develop complex-valued network elements, such as the split-rectified non-linearity, and compare real- and complex-valued networks on a beamforming task. We find that CVNNs hardly provide a performance gain and conclude that the effort of developing the complex-valued counterparts of the building blocks of modern deep or recurrent neural networks can hardly be justified. author: - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Bhiksha full_name: Raj, Bhiksha last_name: Raj - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Drude L, Raj B, Haeb-Umbach R. 
On the appropriateness of complex-valued neural networks for speech enhancement. In: INTERSPEECH 2016, San Francisco, USA. ; 2016.' apa: Drude, L., Raj, B., & Haeb-Umbach, R. (2016). On the appropriateness of complex-valued neural networks for speech enhancement. In INTERSPEECH 2016, San Francisco, USA. bibtex: '@inproceedings{Drude_Raj_Haeb-Umbach_2016, title={On the appropriateness of complex-valued neural networks for speech enhancement}, booktitle={INTERSPEECH 2016, San Francisco, USA}, author={Drude, Lukas and Raj, Bhiksha and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Drude, Lukas, Bhiksha Raj, and Reinhold Haeb-Umbach. “On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement.” In INTERSPEECH 2016, San Francisco, USA, 2016. ieee: L. Drude, B. Raj, and R. Haeb-Umbach, “On the appropriateness of complex-valued neural networks for speech enhancement,” in INTERSPEECH 2016, San Francisco, USA, 2016. mla: Drude, Lukas, et al. “On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement.” INTERSPEECH 2016, San Francisco, USA, 2016. short: 'L. Drude, B. Raj, R. Haeb-Umbach, in: INTERSPEECH 2016, San Francisco, USA, 2016.' date_created: 2019-07-12T05:27:39Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_paper.pdf oa: '1' publication: INTERSPEECH 2016, San Francisco, USA related_material: link: - description: Poster relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_slides.pdf status: public title: On the appropriateness of complex-valued neural networks for speech enhancement type: conference user_id: '44006' year: '2016' ... --- _id: '11771' abstract: - lang: eng text: This paper is concerned with speech presence probability estimation employing an explicit model of the temporal and spectral correlations of speech. An undirected graphical model is introduced, based on a Factor Graph formulation. It is shown that this undirected model cures some of the theoretical issues of an earlier directed graphical model. Furthermore, we formulate a message passing inference scheme based on an approximate graph factorization, identify this inference scheme as a particular message passing schedule based on the turbo principle and suggest further alternative schedules. The experiments show an improved performance over speech presence probability estimation based on an IID assumption, and a slightly better performance of the turbo schedule over the alternatives. author: - first_name: Thomas full_name: Glarner, Thomas id: '14169' last_name: Glarner - first_name: Mohammad full_name: Mahdi Momenzadeh, Mohammad last_name: Mahdi Momenzadeh - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Glarner T, Mahdi Momenzadeh M, Drude L, Haeb-Umbach R. Factor Graph Decoding for Speech Presence Probability Estimation. In: 12. ITG Fachtagung Sprachkommunikation (ITG 2016). ; 2016.' apa: Glarner, T., Mahdi Momenzadeh, M., Drude, L., & Haeb-Umbach, R. (2016). Factor Graph Decoding for Speech Presence Probability Estimation. In 12. ITG Fachtagung Sprachkommunikation (ITG 2016). bibtex: '@inproceedings{Glarner_Mahdi Momenzadeh_Drude_Haeb-Umbach_2016, title={Factor Graph Decoding for Speech Presence Probability Estimation}, booktitle={12. 
ITG Fachtagung Sprachkommunikation (ITG 2016)}, author={Glarner, Thomas and Mahdi Momenzadeh, Mohammad and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Glarner, Thomas, Mohammad Mahdi Momenzadeh, Lukas Drude, and Reinhold Haeb-Umbach. “Factor Graph Decoding for Speech Presence Probability Estimation.” In 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. ieee: T. Glarner, M. Mahdi Momenzadeh, L. Drude, and R. Haeb-Umbach, “Factor Graph Decoding for Speech Presence Probability Estimation,” in 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. mla: Glarner, Thomas, et al. “Factor Graph Decoding for Speech Presence Probability Estimation.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016. short: 'T. Glarner, M. Mahdi Momenzadeh, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.' date_created: 2019-07-12T05:27:56Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner.pdf oa: '1' publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016) related_material: link: - description: Slides relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner_slides.pdf status: public title: Factor Graph Decoding for Speech Presence Probability Estimation type: conference user_id: '44006' year: '2016' ... --- _id: '11812' author: - first_name: Jahn full_name: Heymann, Jahn id: '9168' last_name: Heymann - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Heymann J, Drude L, Haeb-Umbach R. Neural Network Based Spectral Mask Estimation for Acoustic Beamforming. In: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). ; 2016.' apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Neural Network Based Spectral Mask Estimation for Acoustic Beamforming. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Neural Network Based Spectral Mask Estimation for Acoustic Beamforming}, booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming.” In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming,” in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. mla: Heymann, Jahn, et al. “Neural Network Based Spectral Mask Estimation for Acoustic Beamforming.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016. short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.' date_created: 2019-07-12T05:28:44Z date_updated: 2022-01-06T06:51:09Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_paper.pdf oa: '1' publication: Proc. IEEE Intl. Conf. 
on Acoustics, Speech and Signal Processing (ICASSP) related_material: link: - description: Slides relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_slides.pdf status: public title: Neural Network Based Spectral Mask Estimation for Acoustic Beamforming type: conference user_id: '44006' year: '2016' ... --- _id: '11834' abstract: - lang: eng text: We present a system for the 4th CHiME challenge which significantly increases the performance for all three tracks with respect to the provided baseline system. The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural network to estimate signal statistics. These then steer a Generalized Eigenvalue beamformer. The back-end consists of a 22-layer deep Wide Residual Network and two extra BLSTM layers. Working on a whole utterance instead of frames allows us to refine Batch-Normalization. We also train our own BLSTM-based language model. Adding a discriminative speaker adaptation leads to further gains. The final system achieves a word error rate of 3.48% on the six-channel real test data. For the two-channel track we achieve 5.96% and for the one-channel track 9.34%. This is the best reported performance on the challenge achieved by a single system, i.e., a configuration which does not combine multiple systems. At the same time, our system is independent of the microphone configuration. We can thus use the same components for all three tracks. author: - first_name: Jahn full_name: Heymann, Jahn id: '9168' last_name: Heymann - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language. ; 2016.' apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer Speech and Language. bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition}, booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }' chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” In Computer Speech and Language, 2016. ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer Speech and Language, 2016. mla: Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition.” Computer Speech and Language, 2016. short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language, 2016.' 
date_created: 2019-07-12T05:29:09Z date_updated: 2022-01-06T06:51:11Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf oa: '1' publication: Computer Speech and Language related_material: link: - description: Poster relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf status: public title: Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition type: conference user_id: '44006' year: '2016' ... --- _id: '11908' abstract: - lang: eng text: 'This paper describes automatic speech recognition (ASR) systems developed jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch track of the 4th CHiME Challenge. In the 2ch and 6ch tracks the final system output is obtained by a Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM) is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM) units. The systems differ by front ends and training sets used for the acoustic training. The model for the 1ch track is trained without any preprocessing. For each front end we trained and evaluated individual acoustic models. We compare the ASR performance of different beamforming approaches: a conventional superdirective beamformer [1] and an MVDR beamformer as in [2], where the steering vector is estimated based on [3]. Furthermore we evaluated a BLSTM supported Generalized Eigenvalue beamformer using NN-GEV [4]. The back end is implemented using RWTH's open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices with a Long Short-Term Memory (LSTM) based language model. The overall best results are obtained by a system combination that includes the lattices from the system of UPB's submission [8]. Our final submission scored second in each of the three tracks of the 4th CHiME Challenge.' author: - first_name: Tobias full_name: Menne, Tobias last_name: Menne - first_name: Jahn full_name: Heymann, Jahn id: '9168' last_name: Heymann - first_name: Anastasios full_name: Alexandridis, Anastasios last_name: Alexandridis - first_name: Kazuki full_name: Irie, Kazuki last_name: Irie - first_name: Albert full_name: Zeyer, Albert last_name: Zeyer - first_name: Markus full_name: Kitza, Markus last_name: Kitza - first_name: Pavel full_name: Golik, Pavel last_name: Golik - first_name: Ilia full_name: Kulikov, Ilia last_name: Kulikov - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Ralf full_name: Schlüter, Ralf last_name: Schlüter - first_name: Hermann full_name: Ney, Hermann last_name: Ney - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach - first_name: Athanasios full_name: Mouchtaris, Athanasios last_name: Mouchtaris citation: ama: 'Menne T, Heymann J, Alexandridis A, et al. The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation. In: Computer Speech and Language. ; 2016.' apa: Menne, T., Heymann, J., Alexandridis, A., Irie, K., Zeyer, A., Kitza, M., … Mouchtaris, A. (2016). The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation. In Computer Speech and Language. 
bibtex: '@inproceedings{Menne_Heymann_Alexandridis_Irie_Zeyer_Kitza_Golik_Kulikov_Drude_Schlüter_et al._2016, title={The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation}, booktitle={Computer Speech and Language}, author={Menne, Tobias and Heymann, Jahn and Alexandridis, Anastasios and Irie, Kazuki and Zeyer, Albert and Kitza, Markus and Golik, Pavel and Kulikov, Ilia and Drude, Lukas and Schlüter, Ralf and et al.}, year={2016} }' chicago: Menne, Tobias, Jahn Heymann, Anastasios Alexandridis, Kazuki Irie, Albert Zeyer, Markus Kitza, Pavel Golik, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation.” In Computer Speech and Language, 2016. ieee: T. Menne et al., “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation,” in Computer Speech and Language, 2016. mla: Menne, Tobias, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation.” Computer Speech and Language, 2016. short: 'T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik, I. Kulikov, L. Drude, R. Schlüter, H. Ney, R. Haeb-Umbach, A. Mouchtaris, in: Computer Speech and Language, 2016.' date_created: 2019-07-12T05:30:35Z date_updated: 2022-01-06T06:51:12Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_rwthupbforth_paper.pdf oa: '1' publication: Computer Speech and Language status: public title: The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation type: conference user_id: '44006' year: '2016' ... --- _id: '11755' abstract: - lang: eng text: This contribution presents a Direction of Arrival (DoA) estimation algorithm based on the complex Watson distribution to incorporate both phase and level differences of captured microphone array signals. The derived algorithm is reviewed in the context of the Generalized State Coherence Transform (GSCT) on the one hand and a kernel density estimation method on the other hand. A thorough simulative evaluation yields insight into parameter selection and provides details on the performance for both directional and omni-directional microphones. A comparison to the well-known Steered Response Power with Phase Transform (SRP-PHAT) algorithm and a state-of-the-art DoA estimator which explicitly accounts for aliasing shows in particular the advantages of the presented algorithm if inter-sensor level differences are indicative of the DoA, as with directional microphones. author: - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Florian full_name: Jacob, Florian last_name: Jacob - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Drude L, Jacob F, Haeb-Umbach R. DOA-Estimation based on a Complex Watson Kernel Method. In: 23rd European Signal Processing Conference (EUSIPCO 2015). ; 2015.' apa: Drude, L., Jacob, F., & Haeb-Umbach, R. (2015). DOA-Estimation based on a Complex Watson Kernel Method. In 23rd European Signal Processing Conference (EUSIPCO 2015). bibtex: '@inproceedings{Drude_Jacob_Haeb-Umbach_2015, title={DOA-Estimation based on a Complex Watson Kernel Method}, booktitle={23rd European Signal Processing Conference (EUSIPCO 2015)}, author={Drude, Lukas and Jacob, Florian and Haeb-Umbach, Reinhold}, year={2015} }' chicago: Drude, Lukas, Florian Jacob, and Reinhold Haeb-Umbach. 
“DOA-Estimation Based on a Complex Watson Kernel Method.” In 23rd European Signal Processing Conference (EUSIPCO 2015), 2015. ieee: L. Drude, F. Jacob, and R. Haeb-Umbach, “DOA-Estimation based on a Complex Watson Kernel Method,” in 23rd European Signal Processing Conference (EUSIPCO 2015), 2015. mla: Drude, Lukas, et al. “DOA-Estimation Based on a Complex Watson Kernel Method.” 23rd European Signal Processing Conference (EUSIPCO 2015), 2015. short: 'L. Drude, F. Jacob, R. Haeb-Umbach, in: 23rd European Signal Processing Conference (EUSIPCO 2015), 2015.' date_created: 2019-07-12T05:27:38Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15.pdf oa: '1' publication: 23rd European Signal Processing Conference (EUSIPCO 2015) related_material: link: - description: Presentation relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15_Presentation.pdf status: public title: DOA-Estimation based on a Complex Watson Kernel Method type: conference user_id: '44006' year: '2015' ... --- _id: '11810' author: - first_name: Jahn full_name: Heymann, Jahn id: '9168' last_name: Heymann - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Aleksej full_name: Chinaev, Aleksej last_name: Chinaev - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Heymann J, Drude L, Chinaev A, Haeb-Umbach R. BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge. In: Automatic Speech Recognition and Understanding Workshop (ASRU 2015). ; 2015.' apa: Heymann, J., Drude, L., Chinaev, A., & Haeb-Umbach, R. (2015). BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge. In Automatic Speech Recognition and Understanding Workshop (ASRU 2015). bibtex: '@inproceedings{Heymann_Drude_Chinaev_Haeb-Umbach_2015, title={BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge}, booktitle={Automatic Speech Recognition and Understanding Workshop (ASRU 2015)}, author={Heymann, Jahn and Drude, Lukas and Chinaev, Aleksej and Haeb-Umbach, Reinhold}, year={2015} }' chicago: Heymann, Jahn, Lukas Drude, Aleksej Chinaev, and Reinhold Haeb-Umbach. “BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” In Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 2015. ieee: J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge,” in Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 2015. mla: Heymann, Jahn, et al. “BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 2015. short: 'J. Heymann, L. Drude, A. Chinaev, R. Haeb-Umbach, in: Automatic Speech Recognition and Understanding Workshop (ASRU 2015), 2015.' date_created: 2019-07-12T05:28:41Z date_updated: 2022-01-06T06:51:09Z department: - _id: '54' language: - iso: eng publication: Automatic Speech Recognition and Understanding Workshop (ASRU 2015) status: public title: BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge type: conference user_id: '44006' year: '2015' ... --- _id: '11919' abstract: - lang: eng text: In this paper we present a source counting algorithm to determine the number of speakers in a speech mixture. 
In our proposed method, we model the histogram of estimated directions of arrival with a nonparametric Bayesian infinite Gaussian mixture model. As an alternative to classical model selection criteria and to avoid specifying the maximum number of mixture components in advance, a Dirichlet process prior is employed over the mixture components. This allows us to automatically determine the optimal number of mixture components that most probably model the observations. We demonstrate by experiments that this model outperforms a parametric approach using a finite Gaussian mixture model with a Dirichlet distribution prior over the mixture weights. author: - first_name: Oliver full_name: Walter, Oliver last_name: Walter - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Walter O, Drude L, Haeb-Umbach R. Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model. In: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015). ; 2015.' apa: Walter, O., Drude, L., & Haeb-Umbach, R. (2015). Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model. In 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015). bibtex: '@inproceedings{Walter_Drude_Haeb-Umbach_2015, title={Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model}, booktitle={40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)}, author={Walter, Oliver and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2015} }' chicago: Walter, Oliver, Lukas Drude, and Reinhold Haeb-Umbach. “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian Mixture Model.” In 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015. ieee: O. Walter, L. Drude, and R. Haeb-Umbach, “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model,” in 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015. mla: Walter, Oliver, et al. “Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian Mixture Model.” 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015. short: 'O. Walter, L. Drude, R. Haeb-Umbach, in: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.' date_created: 2019-07-12T05:30:47Z date_updated: 2022-01-06T06:51:12Z department: - _id: '54' language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15.pdf oa: '1' publication: 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015) related_material: link: - description: Poster relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15_Poster.pdf status: public title: Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model type: conference user_id: '44006' year: '2015' ...