---
_id: '11753'
abstract:
- lang: eng
  text: This contribution describes a step-wise source counting algorithm to determine
    the number of speakers in an offline scenario. Each speaker is identified by a
    variational expectation maximization (VEM) algorithm for complex Watson mixture
    models and therefore directly yields beamforming vectors for a subsequent speech
    separation process. An observation selection criterion is proposed which improves
    the robustness of the source counting in noise. The algorithm is compared to an
    alternative VEM approach with Gaussian mixture models based on directions of arrival
    and shown to deliver improved source counting accuracy. The article concludes
    by extending the offline algorithm towards a low-latency online estimation of
    the number of active sources from the streaming input data.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Aleksej
  full_name: Chinaev, Aleksej
  last_name: Chinaev
- first_name: Dang Hai
  full_name: Tran Vu, Dang Hai
  last_name: Tran Vu
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Chinaev A, Tran Vu DH, Haeb-Umbach R. Towards Online Source Counting
    in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models.
    In: <i>14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014)</i>.
    ; 2014:213-217.'
  apa: Drude, L., Chinaev, A., Tran Vu, D. H., &#38; Haeb-Umbach, R. (2014). Towards
    Online Source Counting in Speech Mixtures Applying a Variational EM for Complex
    Watson Mixture Models. In <i>14th International Workshop on Acoustic Signal Enhancement
    (IWAENC 2014)</i> (pp. 213–217).
  bibtex: '@inproceedings{Drude_Chinaev_Tran Vu_Haeb-Umbach_2014, title={Towards Online
    Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson
    Mixture Models}, booktitle={14th International Workshop on Acoustic Signal Enhancement
    (IWAENC 2014)}, author={Drude, Lukas and Chinaev, Aleksej and Tran Vu, Dang Hai
    and Haeb-Umbach, Reinhold}, year={2014}, pages={213–217} }'
  chicago: Drude, Lukas, Aleksej Chinaev, Dang Hai Tran Vu, and Reinhold Haeb-Umbach.
    “Towards Online Source Counting in Speech Mixtures Applying a Variational EM for
    Complex Watson Mixture Models.” In <i>14th International Workshop on Acoustic
    Signal Enhancement (IWAENC 2014)</i>, 213–17, 2014.
  ieee: L. Drude, A. Chinaev, D. H. Tran Vu, and R. Haeb-Umbach, “Towards Online Source
    Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture
    Models,” in <i>14th International Workshop on Acoustic Signal Enhancement (IWAENC
    2014)</i>, 2014, pp. 213–217.
  mla: Drude, Lukas, et al. “Towards Online Source Counting in Speech Mixtures Applying
    a Variational EM for Complex Watson Mixture Models.” <i>14th International Workshop
    on Acoustic Signal Enhancement (IWAENC 2014)</i>, 2014, pp. 213–17.
  short: 'L. Drude, A. Chinaev, D.H. Tran Vu, R. Haeb-Umbach, in: 14th International
    Workshop on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–217.'
date_created: 2019-07-12T05:27:35Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
keyword:
- Accuracy
- Acoustics
- Estimation
- Mathematical model
- Source separation
- Speech
- Vectors
- Bayes methods
- Blind source separation
- Directional statistics
- Number of speakers
- Speaker diarization
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14.pdf
oa: '1'
page: 213-217
publication: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014)
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14_Poster.pdf
status: public
title: Towards Online Source Counting in Speech Mixtures Applying a Variational EM
  for Complex Watson Mixture Models
type: conference
user_id: '44006'
year: '2014'
...
---
_id: '11716'
abstract:
- lang: eng
  text: The accuracy of automatic speech recognition systems in noisy and reverberant
    environments can be improved notably by exploiting the uncertainty of the estimated
    speech features using so-called uncertainty-of-observation techniques. In this
    paper, we introduce a new Bayesian decision rule that can serve as a mathematical
    framework from which both known and new uncertainty-of-observation techniques
    can be either derived or approximated. The new decision rule in its direct form
    leads to the new significance decoding approach for Gaussian mixture models, which
    results in better performance compared to standard uncertainty-of-observation
    techniques in different additive and convolutive noise scenarios.
author:
- first_name: Ahmed H.
  full_name: Abdelaziz, Ahmed H.
  last_name: Abdelaziz
- first_name: Steffen
  full_name: Zeiler, Steffen
  last_name: Zeiler
- first_name: Dorothea
  full_name: Kolossa, Dorothea
  last_name: Kolossa
- first_name: Volker
  full_name: Leutnant, Volker
  last_name: Leutnant
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Abdelaziz AH, Zeiler S, Kolossa D, Leutnant V, Haeb-Umbach R. GMM-based significance
    decoding. In: <i>Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International
    Conference On</i>. ; 2013:6827-6831. doi:<a href="https://doi.org/10.1109/ICASSP.2013.6638984">10.1109/ICASSP.2013.6638984</a>'
  apa: Abdelaziz, A. H., Zeiler, S., Kolossa, D., Leutnant, V., &#38; Haeb-Umbach,
    R. (2013). GMM-based significance decoding. In <i>Acoustics, Speech and Signal
    Processing (ICASSP), 2013 IEEE International Conference on</i> (pp. 6827–6831).
    <a href="https://doi.org/10.1109/ICASSP.2013.6638984">https://doi.org/10.1109/ICASSP.2013.6638984</a>
  bibtex: '@inproceedings{Abdelaziz_Zeiler_Kolossa_Leutnant_Haeb-Umbach_2013, title={GMM-based
    significance decoding}, DOI={<a href="https://doi.org/10.1109/ICASSP.2013.6638984">10.1109/ICASSP.2013.6638984</a>},
    booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International
    Conference on}, author={Abdelaziz, Ahmed H. and Zeiler, Steffen and Kolossa, Dorothea
    and Leutnant, Volker and Haeb-Umbach, Reinhold}, year={2013}, pages={6827–6831}
    }'
  chicago: Abdelaziz, Ahmed H., Steffen Zeiler, Dorothea Kolossa, Volker Leutnant,
    and Reinhold Haeb-Umbach. “GMM-Based Significance Decoding.” In <i>Acoustics,
    Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On</i>,
    6827–31, 2013. <a href="https://doi.org/10.1109/ICASSP.2013.6638984">https://doi.org/10.1109/ICASSP.2013.6638984</a>.
  ieee: A. H. Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, and R. Haeb-Umbach, “GMM-based
    significance decoding,” in <i>Acoustics, Speech and Signal Processing (ICASSP),
    2013 IEEE International Conference on</i>, 2013, pp. 6827–6831.
  mla: Abdelaziz, Ahmed H., et al. “GMM-Based Significance Decoding.” <i>Acoustics,
    Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On</i>,
    2013, pp. 6827–31, doi:<a href="https://doi.org/10.1109/ICASSP.2013.6638984">10.1109/ICASSP.2013.6638984</a>.
  short: 'A.H. Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, R. Haeb-Umbach, in:
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference
    On, 2013, pp. 6827–6831.'
date_created: 2019-07-12T05:26:53Z
date_updated: 2022-01-06T06:51:07Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2013.6638984
keyword:
- Bayes methods
- Gaussian processes
- convolution
- decision theory
- decoding
- noise
- reverberation
- speech coding
- speech recognition
- Bayesian decision rule
- GMM
- Gaussian mixture models
- additive noise scenarios
- automatic speech recognition systems
- convolutive noise scenarios
- decoding approach
- mathematical framework
- reverberant environments
- significance decoding
- speech feature estimation
- uncertainty-of-observation techniques
- Hidden Markov models
- Maximum likelihood decoding
- Noise
- Speech
- Speech recognition
- Uncertainty
- Uncertainty-of-observation
- modified imputation
- noise robust speech recognition
- uncertainty decoding
language:
- iso: eng
page: 6827-6831
publication: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International
  Conference on
publication_identifier:
  issn:
  - 1520-6149
status: public
title: GMM-based significance decoding
type: conference
user_id: '44006'
year: '2013'
...
---
_id: '11862'
abstract:
- lang: eng
  text: In this contribution we extend a previously proposed Bayesian approach for
    the enhancement of reverberant logarithmic mel power spectral coefficients for
    robust automatic speech recognition to the additional compensation of background
    noise. A recently proposed observation model is employed whose time-variant observation
    error statistics are obtained as a side product of the inference of the a posteriori
    probability density function of the clean speech feature vectors. Furthermore, a reduction
    of the computational effort and the memory requirements is achieved by using
    a recursive formulation of the observation model. The performance of the proposed
    algorithms is first experimentally studied on a connected digits recognition task
    with artificially created noisy reverberant data. It is shown that the use of
    the time-variant observation error model leads to a significant error rate reduction
    at low signal-to-noise ratios compared to a time-invariant model. Further experiments
    were conducted on a 5000 word task recorded in a reverberant and noisy environment.
    A significant word error rate reduction was obtained demonstrating the effectiveness
    of the approach on real-world data.
author:
- first_name: Volker
  full_name: Leutnant, Volker
  last_name: Leutnant
- first_name: Alexander
  full_name: Krueger, Alexander
  last_name: Krueger
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Leutnant V, Krueger A, Haeb-Umbach R. Bayesian Feature Enhancement for Reverberation
    and Noise Robust Speech Recognition. <i>IEEE Transactions on Audio, Speech, and
    Language Processing</i>. 2013;21(8):1640-1652. doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>
  apa: Leutnant, V., Krueger, A., &#38; Haeb-Umbach, R. (2013). Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition. <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, <i>21</i>(8), 1640–1652. <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>
  bibtex: '@article{Leutnant_Krueger_Haeb-Umbach_2013, title={Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition}, volume={21}, DOI={<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>},
    number={8}, journal={IEEE Transactions on Audio, Speech, and Language Processing},
    author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2013},
    pages={1640–1652} }'
  chicago: 'Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian
    Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” <i>IEEE
    Transactions on Audio, Speech, and Language Processing</i> 21, no. 8 (2013): 1640–52.
    <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>.'
  ieee: V. Leutnant, A. Krueger, and R. Haeb-Umbach, “Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition,” <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, vol. 21, no. 8, pp. 1640–1652, 2013.
  mla: Leutnant, Volker, et al. “Bayesian Feature Enhancement for Reverberation and
    Noise Robust Speech Recognition.” <i>IEEE Transactions on Audio, Speech, and Language
    Processing</i>, vol. 21, no. 8, 2013, pp. 1640–52, doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>.
  short: V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech,
    and Language Processing 21 (2013) 1640–1652.
date_created: 2019-07-12T05:29:42Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/TASL.2013.2258013
intvolume: '21'
issue: '8'
keyword:
- Bayes methods
- compensation
- error statistics
- reverberation
- speech recognition
- Bayesian feature enhancement
- background noise
- clean speech feature vectors
- connected digits recognition task
- memory requirements
- noisy reverberant data
- posteriori probability density function
- recursive formulation
- reverberant logarithmic mel power spectral coefficients
- robust automatic speech recognition
- signal-to-noise ratios
- time-variant observation
- word error rate reduction
- Robust automatic speech recognition
- model-based Bayesian feature enhancement
- observation model for reverberant and noisy speech
- recursive observation model
language:
- iso: eng
page: 1640-1652
publication: IEEE Transactions on Audio, Speech, and Language Processing
status: public
title: Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
type: journal_article
user_id: '44006'
volume: 21
year: '2013'
...
---
_id: '11939'
abstract:
- lang: eng
  text: In this paper a switching linear dynamical model (SLDM) approach for speech
    feature enhancement is improved by employing more accurate models for the dynamics
    of speech and noise. The model of the clean speech feature trajectory is improved
    by augmenting the state vector to capture information derived from the delta features.
    Furthermore, a hidden noise state variable is introduced to obtain a more elaborate
    model for the noise dynamics. Approximate Bayesian inference in the SLDM is carried
    out by a bank of extended Kalman filters, whose outputs are combined according
    to the a posteriori probability of the individual state models. Experimental results
    on the AURORA2 database show improved recognition accuracy.
author:
- first_name: Stefan
  full_name: Windmann, Stefan
  last_name: Windmann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Windmann S, Haeb-Umbach R. Modeling the dynamics of speech and noise for speech
    feature enhancement in ASR. In: <i>IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP 2008)</i>. ; 2008:4409-4412. doi:<a href="https://doi.org/10.1109/ICASSP.2008.4518633">10.1109/ICASSP.2008.4518633</a>'
  apa: Windmann, S., &#38; Haeb-Umbach, R. (2008). Modeling the dynamics of speech
    and noise for speech feature enhancement in ASR. In <i>IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP 2008)</i> (pp. 4409–4412).
    <a href="https://doi.org/10.1109/ICASSP.2008.4518633">https://doi.org/10.1109/ICASSP.2008.4518633</a>
  bibtex: '@inproceedings{Windmann_Haeb-Umbach_2008, title={Modeling the dynamics
    of speech and noise for speech feature enhancement in ASR}, DOI={<a href="https://doi.org/10.1109/ICASSP.2008.4518633">10.1109/ICASSP.2008.4518633</a>},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP 2008)}, author={Windmann, Stefan and Haeb-Umbach, Reinhold}, year={2008},
    pages={4409–4412} }'
  chicago: Windmann, Stefan, and Reinhold Haeb-Umbach. “Modeling the Dynamics of Speech
    and Noise for Speech Feature Enhancement in ASR.” In <i>IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP 2008)</i>, 4409–12, 2008. <a
    href="https://doi.org/10.1109/ICASSP.2008.4518633">https://doi.org/10.1109/ICASSP.2008.4518633</a>.
  ieee: S. Windmann and R. Haeb-Umbach, “Modeling the dynamics of speech and noise
    for speech feature enhancement in ASR,” in <i>IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP 2008)</i>, 2008, pp. 4409–4412.
  mla: Windmann, Stefan, and Reinhold Haeb-Umbach. “Modeling the Dynamics of Speech
    and Noise for Speech Feature Enhancement in ASR.” <i>IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP 2008)</i>, 2008, pp. 4409–12,
    doi:<a href="https://doi.org/10.1109/ICASSP.2008.4518633">10.1109/ICASSP.2008.4518633</a>.
  short: 'S. Windmann, R. Haeb-Umbach, in: IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP 2008), 2008, pp. 4409–4412.'
date_created: 2019-07-12T05:31:11Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2008.4518633
keyword:
- a posteriori probability
- AURORA2 database
- Bayesian inference
- Bayes methods
- channel bank filters
- extended Kalman filter banks
- hidden noise state variable
- Kalman filters
- noise dynamics
- speech enhancement
- speech feature enhancement
- speech feature trajectory
- switching linear dynamical model approach
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2008/WiHa08-1.pdf
oa: '1'
page: 4409-4412
publication: IEEE International Conference on Acoustics, Speech and Signal Processing
  (ICASSP 2008)
status: public
title: Modeling the dynamics of speech and noise for speech feature enhancement in
  ASR
type: conference
user_id: '44006'
year: '2008'
...
---
_id: '11870'
abstract:
- lang: eng
  text: We derive a class of computationally inexpensive linear dimension reduction
    criteria by introducing a weighted variant of the well-known K-class Fisher criterion
    associated with linear discriminant analysis (LDA). It can be seen that LDA weights
    contributions of individual class pairs according to the Euclidean distance of
    the respective class means. We generalize upon LDA by introducing a different
    weighting function
author:
- first_name: M.
  full_name: Loog, M.
  last_name: Loog
- first_name: R.P.W.
  full_name: Duin, R.P.W.
  last_name: Duin
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Loog M, Duin RPW, Haeb-Umbach R. Multiclass linear dimension reduction by weighted
    pairwise Fisher criteria. <i>IEEE Transactions on Pattern Analysis and Machine
    Intelligence</i>. 2001;23(7):762-766. doi:<a href="https://doi.org/10.1109/34.935849">10.1109/34.935849</a>
  apa: Loog, M., Duin, R. P. W., &#38; Haeb-Umbach, R. (2001). Multiclass linear dimension
    reduction by weighted pairwise Fisher criteria. <i>IEEE Transactions on Pattern
    Analysis and Machine Intelligence</i>, <i>23</i>(7), 762–766. <a href="https://doi.org/10.1109/34.935849">https://doi.org/10.1109/34.935849</a>
  bibtex: '@article{Loog_Duin_Haeb-Umbach_2001, title={Multiclass linear dimension
    reduction by weighted pairwise Fisher criteria}, volume={23}, DOI={<a href="https://doi.org/10.1109/34.935849">10.1109/34.935849</a>},
    number={7}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    author={Loog, M. and Duin, R.P.W. and Haeb-Umbach, Reinhold}, year={2001}, pages={762–766}
    }'
  chicago: 'Loog, M., R.P.W. Duin, and Reinhold Haeb-Umbach. “Multiclass Linear Dimension
    Reduction by Weighted Pairwise Fisher Criteria.” <i>IEEE Transactions on Pattern
    Analysis and Machine Intelligence</i> 23, no. 7 (2001): 762–66. <a href="https://doi.org/10.1109/34.935849">https://doi.org/10.1109/34.935849</a>.'
  ieee: M. Loog, R. P. W. Duin, and R. Haeb-Umbach, “Multiclass linear dimension reduction
    by weighted pairwise Fisher criteria,” <i>IEEE Transactions on Pattern Analysis
    and Machine Intelligence</i>, vol. 23, no. 7, pp. 762–766, 2001.
  mla: Loog, M., et al. “Multiclass Linear Dimension Reduction by Weighted Pairwise
    Fisher Criteria.” <i>IEEE Transactions on Pattern Analysis and Machine Intelligence</i>,
    vol. 23, no. 7, 2001, pp. 762–66, doi:<a href="https://doi.org/10.1109/34.935849">10.1109/34.935849</a>.
  short: M. Loog, R.P.W. Duin, R. Haeb-Umbach, IEEE Transactions on Pattern Analysis
    and Machine Intelligence 23 (2001) 762–766.
date_created: 2019-07-12T05:29:51Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/34.935849
intvolume: '23'
issue: '7'
keyword:
- approximate pairwise accuracy
- Bayes error
- Bayes methods
- error statistics
- Euclidean distance
- Fisher criterion
- linear dimension reduction
- linear discriminant analysis
- pattern classification
- statistical analysis
- statistical pattern classification
- weighting function
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2001/LoDuHa01.pdf
oa: '1'
page: 762-766
publication: IEEE Transactions on Pattern Analysis and Machine Intelligence
status: public
title: Multiclass linear dimension reduction by weighted pairwise Fisher criteria
type: journal_article
user_id: '44006'
volume: 23
year: '2001'
...
