--- _id: '11753' abstract: - lang: eng text: This contribution describes a step-wise source counting algorithm to determine the number of speakers in an offline scenario. Each speaker is identified by a variational expectation maximization (VEM) algorithm for complex Watson mixture models and therefore directly yields beamforming vectors for a subsequent speech separation process. An observation selection criterion is proposed which improves the robustness of the source counting in noise. The algorithm is compared to an alternative VEM approach with Gaussian mixture models based on directions of arrival and shown to deliver improved source counting accuracy. The article concludes by extending the offline algorithm towards a low-latency online estimation of the number of active sources from the streaming input data. author: - first_name: Lukas full_name: Drude, Lukas id: '11213' last_name: Drude - first_name: Aleksej full_name: Chinaev, Aleksej last_name: Chinaev - first_name: Dang Hai full_name: Tran Vu, Dang Hai last_name: Tran Vu - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Drude L, Chinaev A, Tran Vu DH, Haeb-Umbach R. Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models. In: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014). ; 2014:213-217.' apa: Drude, L., Chinaev, A., Tran Vu, D. H., & Haeb-Umbach, R. (2014). Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models. In 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014) (pp. 213–217). 
bibtex: '@inproceedings{Drude_Chinaev_Tran Vu_Haeb-Umbach_2014, title={Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models}, booktitle={14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014)}, author={Drude, Lukas and Chinaev, Aleksej and Tran Vu, Dang Hai and Haeb-Umbach, Reinhold}, year={2014}, pages={213–217} }' chicago: Drude, Lukas, Aleksej Chinaev, Dang Hai Tran Vu, and Reinhold Haeb-Umbach. “Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models.” In 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014), 213–17, 2014. ieee: L. Drude, A. Chinaev, D. H. Tran Vu, and R. Haeb-Umbach, “Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models,” in 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–217. mla: Drude, Lukas, et al. “Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models.” 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–17. short: 'L. Drude, A. Chinaev, D.H. Tran Vu, R. Haeb-Umbach, in: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–217.' 
date_created: 2019-07-12T05:27:35Z date_updated: 2022-01-06T06:51:08Z department: - _id: '54' keyword: - Accuracy - Acoustics - Estimation - Mathematical model - Source separation - Speech - Vectors - Bayes methods - Blind source separation - Directional statistics - Number of speakers - Speaker diarization language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14.pdf oa: '1' page: 213-217 publication: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014) related_material: link: - description: Poster relation: supplementary_material url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14_Poster.pdf status: public title: Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models type: conference user_id: '44006' year: '2014' ... --- _id: '11716' abstract: - lang: eng text: The accuracy of automatic speech recognition systems in noisy and reverberant environments can be improved notably by exploiting the uncertainty of the estimated speech features using so-called uncertainty-of-observation techniques. In this paper, we introduce a new Bayesian decision rule that can serve as a mathematical framework from which both known and new uncertainty-of-observation techniques can be either derived or approximated. The new decision rule in its direct form leads to the new significance decoding approach for Gaussian mixture models, which results in better performance compared to standard uncertainty-of-observation techniques in different additive and convolutive noise scenarios. author: - first_name: Ahmed H. full_name: Abdelaziz, Ahmed H. 
last_name: Abdelaziz - first_name: Steffen full_name: Zeiler, Steffen last_name: Zeiler - first_name: Dorothea full_name: Kolossa, Dorothea last_name: Kolossa - first_name: Volker full_name: Leutnant, Volker last_name: Leutnant - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Abdelaziz AH, Zeiler S, Kolossa D, Leutnant V, Haeb-Umbach R. GMM-based significance decoding. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On. ; 2013:6827-6831. doi:10.1109/ICASSP.2013.6638984' apa: Abdelaziz, A. H., Zeiler, S., Kolossa, D., Leutnant, V., & Haeb-Umbach, R. (2013). GMM-based significance decoding. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 6827–6831). https://doi.org/10.1109/ICASSP.2013.6638984 bibtex: '@inproceedings{Abdelaziz_Zeiler_Kolossa_Leutnant_Haeb-Umbach_2013, title={GMM-based significance decoding}, DOI={10.1109/ICASSP.2013.6638984}, booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on}, author={Abdelaziz, Ahmed H. and Zeiler, Steffen and Kolossa, Dorothea and Leutnant, Volker and Haeb-Umbach, Reinhold}, year={2013}, pages={6827–6831} }' chicago: Abdelaziz, Ahmed H., Steffen Zeiler, Dorothea Kolossa, Volker Leutnant, and Reinhold Haeb-Umbach. “GMM-Based Significance Decoding.” In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On, 6827–31, 2013. https://doi.org/10.1109/ICASSP.2013.6638984. ieee: A. H. Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, and R. Haeb-Umbach, “GMM-based significance decoding,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 6827–6831. mla: Abdelaziz, Ahmed H., et al. “GMM-Based Significance Decoding.” Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On, 2013, pp. 6827–31, doi:10.1109/ICASSP.2013.6638984. short: 'A.H. 
Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, R. Haeb-Umbach, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On, 2013, pp. 6827–6831.' date_created: 2019-07-12T05:26:53Z date_updated: 2022-01-06T06:51:07Z department: - _id: '54' doi: 10.1109/ICASSP.2013.6638984 keyword: - Bayes methods - Gaussian processes - convolution - decision theory - decoding - noise - reverberation - speech coding - speech recognition - Bayesian decision rule - GMM - Gaussian mixture models - additive noise scenarios - automatic speech recognition systems - convolutive noise scenarios - decoding approach - mathematical framework - reverberant environments - significance decoding - speech feature estimation - uncertainty-of-observation techniques - Hidden Markov models - Maximum likelihood decoding - Noise - Speech - Speech recognition - Uncertainty - Uncertainty-of-observation - modified imputation - noise robust speech recognition - significance decoding - uncertainty decoding language: - iso: eng page: 6827-6831 publication: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on publication_identifier: issn: - 1520-6149 status: public title: GMM-based significance decoding type: conference user_id: '44006' year: '2013' ... --- _id: '11862' abstract: - lang: eng text: In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Further a reduction of the computational effort and the memory requirements are achieved by using a recursive formulation of the observation model. 
The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000 word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained demonstrating the effectiveness of the approach on real-world data. author: - first_name: Volker full_name: Leutnant, Volker last_name: Leutnant - first_name: Alexander full_name: Krueger, Alexander last_name: Krueger - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: Leutnant V, Krueger A, Haeb-Umbach R. Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing. 2013;21(8):1640-1652. doi:10.1109/TASL.2013.2258013 apa: Leutnant, V., Krueger, A., & Haeb-Umbach, R. (2013). Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 21(8), 1640–1652. https://doi.org/10.1109/TASL.2013.2258013 bibtex: '@article{Leutnant_Krueger_Haeb-Umbach_2013, title={Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition}, volume={21}, DOI={10.1109/TASL.2013.2258013}, number={8}, journal={IEEE Transactions on Audio, Speech, and Language Processing}, author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2013}, pages={1640–1652} }' chicago: 'Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 21, no. 8 (2013): 1640–52. https://doi.org/10.1109/TASL.2013.2258013.' 
ieee: V. Leutnant, A. Krueger, and R. Haeb-Umbach, “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, pp. 1640–1652, 2013. mla: Leutnant, Volker, et al. “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 8, 2013, pp. 1640–52, doi:10.1109/TASL.2013.2258013. short: V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech, and Language Processing 21 (2013) 1640–1652. date_created: 2019-07-12T05:29:42Z date_updated: 2022-01-06T06:51:11Z department: - _id: '54' doi: 10.1109/TASL.2013.2258013 intvolume: ' 21' issue: '8' keyword: - Bayes methods - compensation - error statistics - reverberation - speech recognition - Bayesian feature enhancement - background noise - clean speech feature vectors - compensation - connected digits recognition task - error statistics - memory requirements - noisy reverberant data - posteriori probability density function - recursive formulation - reverberant logarithmic mel power spectral coefficients - robust automatic speech recognition - signal-to-noise ratios - time-variant observation - word error rate reduction - Robust automatic speech recognition - model-based Bayesian feature enhancement - observation model for reverberant and noisy speech - recursive observation model language: - iso: eng page: 1640-1652 publication: IEEE Transactions on Audio, Speech, and Language Processing status: public title: Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition type: journal_article user_id: '44006' volume: 21 year: '2013' ... --- _id: '11939' abstract: - lang: eng text: In this paper a switching linear dynamical model (SLDM) approach for speech feature enhancement is improved by employing more accurate models for the dynamics of speech and noise. 
The model of the clean speech feature trajectory is improved by augmenting the state vector to capture information derived from the delta features. Further a hidden noise state variable is introduced to obtain a more elaborated model for the noise dynamics. Approximate Bayesian inference in the SLDM is carried out by a bank of extended Kalman filters, whose outputs are combined according to the a posteriori probability of the individual state models. Experimental results on the AURORA2 database show improved recognition accuracy. author: - first_name: Stefan full_name: Windmann, Stefan last_name: Windmann - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: 'Windmann S, Haeb-Umbach R. Modeling the dynamics of speech and noise for speech feature enhancement in ASR. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008). ; 2008:4409-4412. doi:10.1109/ICASSP.2008.4518633' apa: Windmann, S., & Haeb-Umbach, R. (2008). Modeling the dynamics of speech and noise for speech feature enhancement in ASR. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008) (pp. 4409–4412). https://doi.org/10.1109/ICASSP.2008.4518633 bibtex: '@inproceedings{Windmann_Haeb-Umbach_2008, title={Modeling the dynamics of speech and noise for speech feature enhancement in ASR}, DOI={10.1109/ICASSP.2008.4518633}, booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008)}, author={Windmann, Stefan and Haeb-Umbach, Reinhold}, year={2008}, pages={4409–4412} }' chicago: Windmann, Stefan, and Reinhold Haeb-Umbach. “Modeling the Dynamics of Speech and Noise for Speech Feature Enhancement in ASR.” In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 4409–12, 2008. https://doi.org/10.1109/ICASSP.2008.4518633. ieee: S. Windmann and R. 
Haeb-Umbach, “Modeling the dynamics of speech and noise for speech feature enhancement in ASR,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 2008, pp. 4409–4412. mla: Windmann, Stefan, and Reinhold Haeb-Umbach. “Modeling the Dynamics of Speech and Noise for Speech Feature Enhancement in ASR.” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 2008, pp. 4409–12, doi:10.1109/ICASSP.2008.4518633. short: 'S. Windmann, R. Haeb-Umbach, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), 2008, pp. 4409–4412.' date_created: 2019-07-12T05:31:11Z date_updated: 2022-01-06T06:51:12Z department: - _id: '54' doi: 10.1109/ICASSP.2008.4518633 keyword: - a posteriori probability - AURORA2 database - Bayesian inference - Bayes methods - channel bank filters - extended Kalman filter banks - hidden noise state variable - Kalman filters - noise dynamics - speech enhancement - speech feature enhancement - speech feature trajectory - switching linear dynamical model approach language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2008/WiHa08-1.pdf oa: '1' page: 4409-4412 publication: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008) status: public title: Modeling the dynamics of speech and noise for speech feature enhancement in ASR type: conference user_id: '44006' year: '2008' ... --- _id: '11870' abstract: - lang: eng text: We derive a class of computationally inexpensive linear dimension reduction criteria by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA). It can be seen that LDA weights contributions of individual class pairs according to the Euclidean distance of the respective class means. We generalize upon LDA by introducing a different weighting function author: - first_name: M. full_name: Loog, M. 
last_name: Loog - first_name: R.P.W. full_name: Duin, R.P.W. last_name: Duin - first_name: Reinhold full_name: Haeb-Umbach, Reinhold id: '242' last_name: Haeb-Umbach citation: ama: Loog M, Duin RPW, Haeb-Umbach R. Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;23(7):762-766. doi:10.1109/34.935849 apa: Loog, M., Duin, R. P. W., & Haeb-Umbach, R. (2001). Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(7), 762–766. https://doi.org/10.1109/34.935849 bibtex: '@article{Loog_Duin_Haeb-Umbach_2001, title={Multiclass linear dimension reduction by weighted pairwise Fisher criteria}, volume={23}, DOI={10.1109/34.935849}, number={7}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, author={Loog, M. and Duin, R.P.W. and Haeb-Umbach, Reinhold}, year={2001}, pages={762–766} }' chicago: 'Loog, M., R.P.W. Duin, and Reinhold Haeb-Umbach. “Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria.” IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 7 (2001): 762–66. https://doi.org/10.1109/34.935849.' ieee: M. Loog, R. P. W. Duin, and R. Haeb-Umbach, “Multiclass linear dimension reduction by weighted pairwise Fisher criteria,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 762–766, 2001. mla: Loog, M., et al. “Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 7, 2001, pp. 762–66, doi:10.1109/34.935849. short: M. Loog, R.P.W. Duin, R. Haeb-Umbach, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 762–766. 
date_created: 2019-07-12T05:29:51Z date_updated: 2022-01-06T06:51:11Z department: - _id: '54' doi: 10.1109/34.935849 intvolume: ' 23' issue: '7' keyword: - approximate pairwise accuracy - Bayes error - Bayes methods - error statistics - Euclidean distance - Fisher criterion - linear dimension reduction - linear discriminant analysis - pattern classification - statistical analysis - statistical pattern classification - weighting function language: - iso: eng main_file_link: - open_access: '1' url: https://groups.uni-paderborn.de/nt/pubs/2001/LoDuHa01.pdf oa: '1' page: 762-766 publication: IEEE Transactions on Pattern Analysis and Machine Intelligence status: public title: Multiclass linear dimension reduction by weighted pairwise Fisher criteria type: journal_article user_id: '44006' volume: 23 year: '2001' ...