---
_id: '11861'
abstract:
- lang: eng
  text: 'In this contribution we present a theoretical and experimental investigation
    into the effects of reverberation and noise on features in the logarithmic mel
    power spectral domain, an intermediate stage in the computation of the mel frequency
    cepstral coefficients, prevalent in automatic speech recognition (ASR). Gaining
    insight into the complex interaction between clean speech, noise, and noisy reverberant
    speech features is essential for any ASR system to be robust against noise and
    reverberation present in distant microphone input signals. The findings are gathered
    in a probabilistic formulation of an observation model which may be used in model-based
    feature compensation schemes. The proposed observation model extends previous
    models in three major directions: First, the contribution of additive background
    noise to the observation error is explicitly taken into account. Second, an energy
    compensation constant is introduced which ensures an unbiased estimate of the
    reverberant speech features, and, third, a recursive variant of the observation
    model is developed resulting in reduced computational complexity when used in
    model-based feature compensation. The experimental section is used to evaluate
    the accuracy of the model and to describe how its parameters can be determined
    from test data.'
author:
- first_name: Volker
  full_name: Leutnant, Volker
  last_name: Leutnant
- first_name: Alexander
  full_name: Krueger, Alexander
  last_name: Krueger
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Leutnant V, Krueger A, Haeb-Umbach R. A New Observation Model in the Logarithmic
    Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech.
    <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>. 2014;22(1):95-109.
    doi:<a href="https://doi.org/10.1109/TASLP.2013.2285480">10.1109/TASLP.2013.2285480</a>
  apa: Leutnant, V., Krueger, A., &#38; Haeb-Umbach, R. (2014). A New Observation
    Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition
    of Noisy Reverberant Speech. <i>IEEE/ACM Transactions on Audio, Speech, and Language
    Processing</i>, <i>22</i>(1), 95–109. <a href="https://doi.org/10.1109/TASLP.2013.2285480">https://doi.org/10.1109/TASLP.2013.2285480</a>
  bibtex: '@article{Leutnant_Krueger_Haeb-Umbach_2014, title={A New Observation Model
    in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of
    Noisy Reverberant Speech}, volume={22}, DOI={10.1109/TASLP.2013.2285480},
    number={1}, journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2014},
    pages={95–109} }'
  chicago: 'Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “A New
    Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic
    Recognition of Noisy Reverberant Speech.” <i>IEEE/ACM Transactions on Audio, Speech,
    and Language Processing</i> 22, no. 1 (2014): 95–109. <a href="https://doi.org/10.1109/TASLP.2013.2285480">https://doi.org/10.1109/TASLP.2013.2285480</a>.'
  ieee: V. Leutnant, A. Krueger, and R. Haeb-Umbach, “A New Observation Model in the
    Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant
    Speech,” <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>,
    vol. 22, no. 1, pp. 95–109, 2014.
  mla: Leutnant, Volker, et al. “A New Observation Model in the Logarithmic Mel Power
    Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech.” <i>IEEE/ACM
    Transactions on Audio, Speech, and Language Processing</i>, vol. 22, no. 1, 2014,
    pp. 95–109, doi:<a href="https://doi.org/10.1109/TASLP.2013.2285480">10.1109/TASLP.2013.2285480</a>.
  short: V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE/ACM Transactions on Audio,
    Speech, and Language Processing 22 (2014) 95–109.
date_created: 2019-07-12T05:29:41Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/TASLP.2013.2285480
intvolume: '        22'
issue: '1'
keyword:
- computational complexity
- reverberation
- speech recognition
- automatic speech recognition
- background noise
- clean speech
- energy compensation
- logarithmic mel power spectral domain
- mel frequency cepstral coefficients
- microphone input signals
- model-based feature compensation schemes
- noisy reverberant speech automatic recognition
- noisy reverberant speech features
- Atmospheric modeling
- Computational modeling
- Noise
- Noise measurement
- Reverberation
- Speech
- Vectors
- Model-based feature compensation
- observation model for reverberant and noisy speech
- recursive observation model
- robust automatic speech recognition
language:
- iso: eng
page: 95-109
publication: IEEE/ACM Transactions on Audio, Speech, and Language Processing
publication_identifier:
  issn:
  - 2329-9290
status: public
title: A New Observation Model in the Logarithmic Mel Power Spectral Domain for the
  Automatic Recognition of Noisy Reverberant Speech
type: journal_article
user_id: '44006'
volume: 22
year: '2014'
...
---
_id: '11862'
abstract:
- lang: eng
  text: In this contribution we extend a previously proposed Bayesian approach for
    the enhancement of reverberant logarithmic mel power spectral coefficients for
    robust automatic speech recognition to the additional compensation of background
    noise. A recently proposed observation model is employed whose time-variant observation
    error statistics are obtained as a side product of the inference of the a posteriori
    probability density function of the clean speech feature vectors. Furthermore, a reduction
    of the computational effort and the memory requirements is achieved by using
    a recursive formulation of the observation model. The performance of the proposed
    algorithms is first experimentally studied on a connected digits recognition task
    with artificially created noisy reverberant data. It is shown that the use of
    the time-variant observation error model leads to a significant error rate reduction
    at low signal-to-noise ratios compared to a time-invariant model. Further experiments
    were conducted on a 5000-word task recorded in a reverberant and noisy environment.
    A significant word error rate reduction was obtained, demonstrating the effectiveness
    of the approach on real-world data.
author:
- first_name: Volker
  full_name: Leutnant, Volker
  last_name: Leutnant
- first_name: Alexander
  full_name: Krueger, Alexander
  last_name: Krueger
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Leutnant V, Krueger A, Haeb-Umbach R. Bayesian Feature Enhancement for Reverberation
    and Noise Robust Speech Recognition. <i>IEEE Transactions on Audio, Speech, and
    Language Processing</i>. 2013;21(8):1640-1652. doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>
  apa: Leutnant, V., Krueger, A., &#38; Haeb-Umbach, R. (2013). Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition. <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, <i>21</i>(8), 1640–1652. <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>
  bibtex: '@article{Leutnant_Krueger_Haeb-Umbach_2013, title={Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition}, volume={21}, DOI={10.1109/TASL.2013.2258013},
    number={8}, journal={IEEE Transactions on Audio, Speech, and Language Processing},
    author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2013},
    pages={1640–1652} }'
  chicago: 'Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian
    Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” <i>IEEE
    Transactions on Audio, Speech, and Language Processing</i> 21, no. 8 (2013): 1640–52.
    <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>.'
  ieee: V. Leutnant, A. Krueger, and R. Haeb-Umbach, “Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition,” <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, vol. 21, no. 8, pp. 1640–1652, 2013.
  mla: Leutnant, Volker, et al. “Bayesian Feature Enhancement for Reverberation and
    Noise Robust Speech Recognition.” <i>IEEE Transactions on Audio, Speech, and Language
    Processing</i>, vol. 21, no. 8, 2013, pp. 1640–52, doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>.
  short: V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech,
    and Language Processing 21 (2013) 1640–1652.
date_created: 2019-07-12T05:29:42Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/TASL.2013.2258013
intvolume: '        21'
issue: '8'
keyword:
- Bayes methods
- compensation
- error statistics
- reverberation
- speech recognition
- Bayesian feature enhancement
- background noise
- clean speech feature vectors
- connected digits recognition task
- memory requirements
- noisy reverberant data
- posteriori probability density function
- recursive formulation
- reverberant logarithmic mel power spectral coefficients
- robust automatic speech recognition
- signal-to-noise ratios
- time-variant observation
- word error rate reduction
- Robust automatic speech recognition
- model-based Bayesian feature enhancement
- observation model for reverberant and noisy speech
- recursive observation model
language:
- iso: eng
page: 1640-1652
publication: IEEE Transactions on Audio, Speech, and Language Processing
status: public
title: Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
type: journal_article
user_id: '44006'
volume: 21
year: '2013'
...
---
_id: '11943'
abstract:
- lang: eng
  text: A marginalized particle filter is proposed for performing single channel speech
    enhancement with a non-linear dynamic state model. The system consists of a particle
    filter for tracking line spectral pair (LSP) parameters and a Kalman filter per
    particle for speech enhancement. The state model for the LSPs has been learnt
    on clean speech training data. In our approach, parameters and speech samples are
    processed at different time scales by assuming the parameters to be constant for
    small blocks of data. Further enhancement is obtained by an iteration which can
    be applied to these small blocks. The experiments show that SNR gains similar
    to those of the Kalman-LM-iterative algorithm are obtained. However, better values
    of the noise level and the log-spectral distance are achieved.
author:
- first_name: Stefan
  full_name: Windmann, Stefan
  last_name: Windmann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Windmann S, Haeb-Umbach R. Iterative Speech Enhancement using a Non-Linear
    Dynamic State Model of Speech and its Parameters. In: <i>IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP 2006)</i>. Vol 1. ; 2006:I.
    doi:<a href="https://doi.org/10.1109/ICASSP.2006.1660058">10.1109/ICASSP.2006.1660058</a>'
  apa: Windmann, S., &#38; Haeb-Umbach, R. (2006). Iterative Speech Enhancement using
    a Non-Linear Dynamic State Model of Speech and its Parameters. In <i>IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP 2006)</i> (Vol.
    1, p. I). <a href="https://doi.org/10.1109/ICASSP.2006.1660058">https://doi.org/10.1109/ICASSP.2006.1660058</a>
  bibtex: '@inproceedings{Windmann_Haeb-Umbach_2006, title={Iterative Speech Enhancement
    using a Non-Linear Dynamic State Model of Speech and its Parameters}, volume={1},
    DOI={10.1109/ICASSP.2006.1660058},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP 2006)}, author={Windmann, Stefan and Haeb-Umbach, Reinhold}, year={2006},
    pages={I} }'
  chicago: Windmann, Stefan, and Reinhold Haeb-Umbach. “Iterative Speech Enhancement
    Using a Non-Linear Dynamic State Model of Speech and Its Parameters.” In <i>IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006)</i>,
    1:I, 2006. <a href="https://doi.org/10.1109/ICASSP.2006.1660058">https://doi.org/10.1109/ICASSP.2006.1660058</a>.
  ieee: S. Windmann and R. Haeb-Umbach, “Iterative Speech Enhancement using a Non-Linear
    Dynamic State Model of Speech and its Parameters,” in <i>IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP 2006)</i>, 2006, vol. 1, p.
    I.
  mla: Windmann, Stefan, and Reinhold Haeb-Umbach. “Iterative Speech Enhancement Using
    a Non-Linear Dynamic State Model of Speech and Its Parameters.” <i>IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP 2006)</i>, vol.
    1, 2006, p. I, doi:<a href="https://doi.org/10.1109/ICASSP.2006.1660058">10.1109/ICASSP.2006.1660058</a>.
  short: 'S. Windmann, R. Haeb-Umbach, in: IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP 2006), 2006, p. I.'
date_created: 2019-07-12T05:31:15Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2006.1660058
intvolume: '         1'
keyword:
- clean speech training data
- iterative methods
- iterative speech enhancement
- Kalman filter
- Kalman filters
- Kalman-LM-iterative algorithm
- line spectral pair parameters
- log-spectral distance
- marginalized particle filter
- noise level
- nonlinear dynamic state speech model
- particle filtering (numerical methods)
- single channel speech enhancement
- SNR gains
- speech enhancement
- speech samples
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2006/WiHa06-2.pdf
oa: '1'
page: I
publication: IEEE International Conference on Acoustics, Speech and Signal Processing
  (ICASSP 2006)
status: public
title: Iterative Speech Enhancement using a Non-Linear Dynamic State Model of Speech
  and its Parameters
type: conference
user_id: '44006'
volume: 1
year: '2006'
...
