---
_id: '11965'
abstract:
- lang: eng
  text: 'We present an unsupervised training approach for a neural network-based mask
    estimator in an acoustic beamforming application. The network is trained to maximize
    a likelihood criterion derived from a spatial mixture model of the observations.
    It is trained from scratch without requiring any parallel data consisting of degraded
    input and clean training targets. Thus, training can be carried out on real recordings
    of noisy speech rather than simulated ones. In contrast to previous work on unsupervised
    training of neural mask estimators, our approach avoids the need for a possibly
    pre-trained teacher model entirely. We demonstrate the effectiveness of our approach
    by speech recognition experiments on two different datasets: one mainly deteriorated
    by noise (CHiME 4) and one by reverberation (REVERB). The results show that the
    performance of the proposed system is on par with a supervised system using oracle
    target masks for training and with a system trained using a model-based teacher.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heymann J, Haeb-Umbach R. Unsupervised training of neural mask-based
    beamforming. In: <i>INTERSPEECH 2019, Graz, Austria</i>. ; 2019.'
  apa: Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2019). Unsupervised training
    of neural mask-based beamforming. In <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Drude_Heymann_Haeb-Umbach_2019, title={Unsupervised training
    of neural mask-based beamforming}, booktitle={INTERSPEECH 2019, Graz, Austria},
    author={Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2019}
    }'
  chicago: Drude, Lukas, Jahn Heymann, and Reinhold Haeb-Umbach. “Unsupervised Training
    of Neural Mask-Based Beamforming.” In <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.
  ieee: L. Drude, J. Heymann, and R. Haeb-Umbach, “Unsupervised training of neural
    mask-based beamforming,” in <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  mla: Drude, Lukas, et al. “Unsupervised Training of Neural Mask-Based Beamforming.”
    <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  short: 'L. Drude, J. Heymann, R. Haeb-Umbach, in: INTERSPEECH 2019, Graz, Austria,
    2019.'
date_created: 2019-07-18T09:11:39Z
date_updated: 2022-01-06T06:51:14Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-13T06:36:44Z
  date_updated: 2019-08-13T06:41:35Z
  file_id: '12914'
  file_name: INTERSPEECH_2019_Drude_Paper.pdf
  file_size: 223413
  relation: main_file
file_date_updated: 2019-08-13T06:41:35Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: Unsupervised training of neural mask-based beamforming
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12875'
abstract:
- lang: eng
  text: Signal dereverberation using the Weighted Prediction Error (WPE) method has
    been proven to be an effective means to raise the accuracy of far-field speech
    recognition. First proposed as an iterative algorithm, follow-up works have reformulated
    it as a recursive least squares algorithm and therefore enabled its use in online
    applications. For this algorithm, the estimation of the power spectral density
    (PSD) of the anechoic signal plays an important role and strongly influences its
    performance. Recently, we showed that using a neural network PSD estimator leads
    to improved performance for online automatic speech recognition. This, however,
    comes at a price. To train the network, we require parallel data, i.e., utterances
    simultaneously available in clean and reverberated form. Here we propose to overcome
    this limitation by training the network jointly with the acoustic model of the
    speech recognizer. To be specific, the gradients computed from the cross-entropy
    loss between the target senone sequence and the acoustic model network output
are backpropagated through the complex-valued dereverberation filter estimation
    to the neural network for PSD estimation. Evaluation on two databases demonstrates
    improved performance for on-line processing scenarios while imposing fewer requirements
    on the available training data and thus widening the range of applications.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Joint Optimization
    of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online
    ASR. In: <i>ICASSP 2019, Brighton, UK</i>. ; 2019.'
  apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., &#38; Nakatani, T.
    (2019). Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
    Model for Robust Online ASR. In <i>ICASSP 2019, Brighton, UK</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2019, title={Joint
    Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
    Robust Online ASR}, booktitle={ICASSP 2019, Brighton, UK}, author={Heymann, Jahn
    and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
    Tomohiro}, year={2019} }'
  chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
    Tomohiro Nakatani. “Joint Optimization of Neural Network-Based WPE Dereverberation
    and Acoustic Model for Robust Online ASR.” In <i>ICASSP 2019, Brighton, UK</i>,
    2019.
  ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Joint
    Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
    Robust Online ASR,” in <i>ICASSP 2019, Brighton, UK</i>, 2019.
  mla: Heymann, Jahn, et al. “Joint Optimization of Neural Network-Based WPE Dereverberation
    and Acoustic Model for Robust Online ASR.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: ICASSP
    2019, Brighton, UK, 2019.'
date_created: 2019-07-23T07:42:26Z
date_updated: 2022-01-06T06:51:22Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-12-17T07:28:06Z
  date_updated: 2019-12-17T07:28:06Z
  file_id: '15334'
  file_name: ICASSP_2019_Heymann_Paper.pdf
  file_size: 199109
  relation: main_file
file_date_updated: 2019-12-17T07:28:06Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
  Model for Robust Online ASR
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '18107'
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: M.
  full_name: Bacchiani, M.
  last_name: Bacchiani
- first_name: T. N.
  full_name: Sainath, T. N.
  last_name: Sainath
citation:
  ama: 'Heymann J, Bacchiani M, Sainath TN. Performance of Mask Based Statistical
    Beamforming in a Smart Home Scenario. In: <i>2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2018:6722-6726. doi:<a
    href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>'
  apa: Heymann, J., Bacchiani, M., &#38; Sainath, T. N. (2018). Performance of Mask
    Based Statistical Beamforming in a Smart Home Scenario. In <i>2018 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i> (pp. 6722–6726).
    <a href="https://doi.org/10.1109/ICASSP.2018.8462372">https://doi.org/10.1109/ICASSP.2018.8462372</a>
  bibtex: '@inproceedings{Heymann_Bacchiani_Sainath_2018, title={Performance of Mask
    Based Statistical Beamforming in a Smart Home Scenario}, DOI={<a href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>},
    booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)}, author={Heymann, Jahn and Bacchiani, M. and Sainath, T.
    N.}, year={2018}, pages={6722–6726} }'
  chicago: Heymann, Jahn, M. Bacchiani, and T. N. Sainath. “Performance of Mask Based
    Statistical Beamforming in a Smart Home Scenario.” In <i>2018 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, 6722–26, 2018.
    <a href="https://doi.org/10.1109/ICASSP.2018.8462372">https://doi.org/10.1109/ICASSP.2018.8462372</a>.
  ieee: J. Heymann, M. Bacchiani, and T. N. Sainath, “Performance of Mask Based Statistical
    Beamforming in a Smart Home Scenario,” in <i>2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2018, pp. 6722–6726.
  mla: Heymann, Jahn, et al. “Performance of Mask Based Statistical Beamforming in
    a Smart Home Scenario.” <i>2018 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)</i>, 2018, pp. 6722–26, doi:<a href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>.
  short: 'J. Heymann, M. Bacchiani, T.N. Sainath, in: 2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6722–6726.'
date_created: 2020-08-20T14:20:59Z
date_updated: 2022-01-06T06:53:26Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2018.8462372
language:
- iso: eng
page: 6722-6726
publication: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  (ICASSP)
status: public
title: Performance of Mask Based Statistical Beamforming in a Smart Home Scenario
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11835'
abstract:
- lang: eng
  text: Signal dereverberation using the weighted prediction error (WPE) method has
    been proven to be an effective means to raise the accuracy of far-field speech
    recognition. But in its original formulation, WPE requires multiple iterations
    over a sufficiently long utterance, rendering it unsuitable for online low-latency
    applications. Recently, two methods have been proposed to overcome this limitation.
    One utilizes a neural network to estimate the power spectral density (PSD) of
    the target signal and works in a block-online fashion. The other method relies
    on a rather simple PSD estimation which smoothes the observed PSD and utilizes
    a recursive formulation which enables it to work on a frame-by-frame basis. In
    this paper, we integrate a deep neural network (DNN) based estimator into the
    recursive frame-online formulation. We evaluate the performance of the recursive
    system with different PSD estimators in comparison to the block-online and offline
variant on two distinct corpora: the REVERB challenge data, where the signal is
    mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome
    to also consider (directed) noise sources. The results show that although smoothing
    works surprisingly well, the more sophisticated DNN based estimator shows promising
    improvements and shortens the performance gap between online and offline processing.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Frame-Online DNN-WPE
    Dereverberation. In: <i>IWAENC 2018, Tokyo, Japan</i>. ; 2018.'
  apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., &#38; Nakatani, T.
    (2018). Frame-Online DNN-WPE Dereverberation. In <i>IWAENC 2018, Tokyo, Japan</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2018, title={Frame-Online
    DNN-WPE Dereverberation}, booktitle={IWAENC 2018, Tokyo, Japan}, author={Heymann,
    Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
    Tomohiro}, year={2018} }'
  chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
    Tomohiro Nakatani. “Frame-Online DNN-WPE Dereverberation.” In <i>IWAENC 2018,
    Tokyo, Japan</i>, 2018.
  ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Frame-Online
    DNN-WPE Dereverberation,” in <i>IWAENC 2018, Tokyo, Japan</i>, 2018.
  mla: Heymann, Jahn, et al. “Frame-Online DNN-WPE Dereverberation.” <i>IWAENC 2018,
    Tokyo, Japan</i>, 2018.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: IWAENC
    2018, Tokyo, Japan, 2018.'
date_created: 2019-07-12T05:29:10Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Paper.pdf
oa: '1'
publication: IWAENC 2018, Tokyo, Japan
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Poster.pdf
status: public
title: Frame-Online DNN-WPE Dereverberation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11837'
abstract:
- lang: eng
  text: We present a block-online multi-channel front end for automatic speech recognition
    in noisy and reverberated environments. It is an online version of our earlier
    proposed neural network supported acoustic beamformer, whose coefficients are
    calculated from noise and speech spatial covariance matrices which are estimated
    utilizing a neural mask estimator. However, the sparsity of speech in the STFT
    domain causes problems for the initial beamformer coefficients estimation in some
    frequency bins due to lack of speech observations. We propose two methods to mitigate
    this issue. The first is to lower the frequency resolution of the STFT, which
    comes with the additional advantage of a reduced time window, thus lowering the
    latency introduced by block processing. The second approach is to smooth beamforming
    coefficients along the frequency axis, thus exploiting their high interfrequency
    correlation. With both approaches the gap between offline and block-online beamformer
    performance, as measured by the word error rate achieved by a downstream speech
    recognizer, is significantly reduced. Experiments are carried out on two corpora,
    representing noisy (CHiME-4) and noisy reverberant (voiceHome) environments.
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Heymann J, Haeb-Umbach R. Smoothing along Frequency in Online
    Neural Network Supported Acoustic Beamforming. In: <i>ITG 2018, Oldenburg, Germany</i>.
    ; 2018.'
  apa: Heitkaemper, J., Heymann, J., &#38; Haeb-Umbach, R. (2018). Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming. In <i>ITG 2018,
    Oldenburg, Germany</i>.
  bibtex: '@inproceedings{Heitkaemper_Heymann_Haeb-Umbach_2018, title={Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming}, booktitle={ITG
    2018, Oldenburg, Germany}, author={Heitkaemper, Jens and Heymann, Jahn and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: Heitkaemper, Jens, Jahn Heymann, and Reinhold Haeb-Umbach. “Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming.” In <i>ITG
    2018, Oldenburg, Germany</i>, 2018.
  ieee: J. Heitkaemper, J. Heymann, and R. Haeb-Umbach, “Smoothing along Frequency
    in Online Neural Network Supported Acoustic Beamforming,” in <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.
  mla: Heitkaemper, Jens, et al. “Smoothing along Frequency in Online Neural Network
    Supported Acoustic Beamforming.” <i>ITG 2018, Oldenburg, Germany</i>, 2018.
  short: 'J. Heitkaemper, J. Heymann, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany,
    2018.'
date_created: 2019-07-12T05:29:13Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Paper.pdf
oa: '1'
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Slides.pdf
status: public
title: Smoothing along Frequency in Online Neural Network Supported Acoustic Beamforming
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11872'
abstract:
- lang: eng
  text: 'The weighted prediction error (WPE) algorithm has proven to be a very successful
    dereverberation method for the REVERB challenge. Likewise, neural network based
    mask estimation for beamforming demonstrated very good noise suppression in the
    CHiME 3 and CHiME 4 challenges. Recently, it has been shown that this estimator
    can also be trained to perform dereverberation and denoising jointly. However,
    up to now a comparison of a neural beamformer and WPE is still missing, as is
    an investigation into a combination of the two. Therefore, we here provide an
    extensive evaluation of both and consequently propose variants to integrate deep
    neural network based beamforming with WPE. For these integrated variants we identify
    a consistent word error rate (WER) reduction on two distinct databases. In particular,
    our study shows that deep learning based beamforming benefits from a model-based
    dereverberation technique (i.e. WPE) and vice versa. Our key findings are: (a)
    Neural beamforming yields the lower WERs in comparison to WPE the more channels
    and noise are present. (b) Integration of WPE and a neural beamformer consistently
    outperforms all stand-alone systems.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Boeddeker C, Heymann J, et al. Integrating neural network based beamforming
    and weighted prediction error dereverberation. In: <i>INTERSPEECH 2018, Hyderabad,
    India</i>. ; 2018.'
  apa: Drude, L., Boeddeker, C., Heymann, J., Kinoshita, K., Delcroix, M., Nakatani,
    T., &#38; Haeb-Umbach, R. (2018). Integrating neural network based beamforming
    and weighted prediction error dereverberation. In <i>INTERSPEECH 2018, Hyderabad,
    India</i>.
  bibtex: '@inproceedings{Drude_Boeddeker_Heymann_Kinoshita_Delcroix_Nakatani_Haeb-Umbach_2018,
    title={Integrating neural network based beamforming and weighted prediction error
    dereverberation}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Drude,
    Lukas and Boeddeker, Christoph and Heymann, Jahn and Kinoshita, Keisuke and Delcroix,
    Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Christoph Boeddeker, Jahn Heymann, Keisuke Kinoshita, Marc
    Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Integrating Neural Network
    Based Beamforming and Weighted Prediction Error Dereverberation.” In <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018.
  ieee: L. Drude <i>et al.</i>, “Integrating neural network based beamforming and
    weighted prediction error dereverberation,” in <i>INTERSPEECH 2018, Hyderabad,
    India</i>, 2018.
  mla: Drude, Lukas, et al. “Integrating Neural Network Based Beamforming and Weighted
    Prediction Error Dereverberation.” <i>INTERSPEECH 2018, Hyderabad, India</i>,
    2018.
  short: 'L. Drude, C. Boeddeker, J. Heymann, K. Kinoshita, M. Delcroix, T. Nakatani,
    R. Haeb-Umbach, in: INTERSPEECH 2018, Hyderabad, India, 2018.'
date_created: 2019-07-12T05:29:53Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2018, Hyderabad, India
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Slides.pdf
status: public
title: Integrating neural network based beamforming and weighted prediction error
  dereverberation
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11873'
abstract:
- lang: eng
  text: NARA-WPE is a Python software package providing implementations of the weighted
    prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly
    effective tool for speech dereverberation, thus improving the perceptual quality
    of the signal and improving the recognition performance of downstream automatic
    speech recognition (ASR). It is suitable both for single-channel and multi-channel
    applications. The package consists of (1) a Numpy implementation which can easily
    be integrated into a custom Python toolchain, and (2) a TensorFlow implementation
    which allows integration into larger computational graphs and enables backpropagation
    through WPE to train more advanced front-ends. This package comprises an iterative
    offline (batch) version, a block-online version, and a frame-online version which
    can be used in moderately low latency applications, e.g. digital speech assistants.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heymann J, Boeddeker C, Haeb-Umbach R. NARA-WPE: A Python package
    for weighted prediction error dereverberation in Numpy and Tensorflow for online
    and offline processing. In: <i>ITG 2018, Oldenburg, Germany</i>. ; 2018.'
  apa: 'Drude, L., Heymann, J., Boeddeker, C., &#38; Haeb-Umbach, R. (2018). NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing. In <i>ITG 2018, Oldenburg, Germany</i>.'
  bibtex: '@inproceedings{Drude_Heymann_Boeddeker_Haeb-Umbach_2018, title={NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing}, booktitle={ITG 2018, Oldenburg, Germany},
    author={Drude, Lukas and Heymann, Jahn and Boeddeker, Christoph and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: 'Drude, Lukas, Jahn Heymann, Christoph Boeddeker, and Reinhold Haeb-Umbach.
    “NARA-WPE: A Python Package for Weighted Prediction Error Dereverberation in Numpy
    and Tensorflow for Online and Offline Processing.” In <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.'
  ieee: 'L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach, “NARA-WPE: A Python
    package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing,” in <i>ITG 2018, Oldenburg, Germany</i>, 2018.'
  mla: 'Drude, Lukas, et al. “NARA-WPE: A Python Package for Weighted Prediction Error
    Dereverberation in Numpy and Tensorflow for Online and Offline Processing.” <i>ITG
    2018, Oldenburg, Germany</i>, 2018.'
  short: 'L. Drude, J. Heymann, C. Boeddeker, R. Haeb-Umbach, in: ITG 2018, Oldenburg,
    Germany, 2018.'
date_created: 2019-07-12T05:29:54Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Poster.pdf
status: public
title: 'NARA-WPE: A Python package for weighted prediction error dereverberation in
  Numpy and Tensorflow for online and offline processing'
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '29923'
abstract:
- lang: eng
  text: "This paper introduces a new open source platform for end-to-end speech processing
    named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition
    (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch,
    as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style
    for data processing, feature extraction/format, and recipes to provide a complete
    setup for speech recognition and other speech processing experiments. This paper
    explains a major architecture of this software platform, several important functionalities,
    which differentiate ESPnet from other open source ASR toolkits, and experimental
    results with major ASR benchmarks."
author:
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Takaaki
  full_name: Hori, Takaaki
  last_name: Hori
- first_name: Shigeki
  full_name: Karita, Shigeki
  last_name: Karita
- first_name: Tomoki
  full_name: Hayashi, Tomoki
  last_name: Hayashi
- first_name: Jiro
  full_name: Nishitoba, Jiro
  last_name: Nishitoba
- first_name: Yuya
  full_name: Unno, Yuya
  last_name: Unno
- first_name: Nelson
  full_name: Enrique Yalta Soplin, Nelson
  last_name: Enrique Yalta Soplin
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Matthew
  full_name: Wiesner, Matthew
  last_name: Wiesner
- first_name: Nanxin
  full_name: Chen, Nanxin
  last_name: Chen
- first_name: Adithya
  full_name: Renduchintala, Adithya
  last_name: Renduchintala
- first_name: Tsubasa
  full_name: Ochiai, Tsubasa
  last_name: Ochiai
citation:
  ama: 'Watanabe S, Hori T, Karita S, et al. ESPnet: End-to-End Speech Processing
    Toolkit. In: <i>INTERSPEECH 2018, Hyderabad, India</i>. ; 2018:2207–2211. doi:<a
    href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>'
  apa: 'Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y.,
    Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A.,
    &#38; Ochiai, T. (2018). ESPnet: End-to-End Speech Processing Toolkit. <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2207–2211. <a href="https://doi.org/10.21437/Interspeech.2018-1456">https://doi.org/10.21437/Interspeech.2018-1456</a>'
  bibtex: '@inproceedings{Watanabe_Hori_Karita_Hayashi_Nishitoba_Unno_Enrique Yalta
    Soplin_Heymann_Wiesner_Chen_et al._2018, title={ESPnet: End-to-End Speech Processing
    Toolkit}, DOI={<a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>},
    booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Watanabe, Shinji and Hori,
    Takaaki and Karita, Shigeki and Hayashi, Tomoki and Nishitoba, Jiro and Unno,
    Yuya and Enrique Yalta Soplin, Nelson and Heymann, Jahn and Wiesner, Matthew and
    Chen, Nanxin and et al.}, year={2018}, pages={2207–2211} }'
  chicago: 'Watanabe, Shinji, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba,
    Yuya Unno, Nelson Enrique Yalta Soplin, et al. “ESPnet: End-to-End Speech Processing
    Toolkit.” In <i>INTERSPEECH 2018, Hyderabad, India</i>, 2207–2211, 2018. <a href="https://doi.org/10.21437/Interspeech.2018-1456">https://doi.org/10.21437/Interspeech.2018-1456</a>.'
  ieee: 'S. Watanabe <i>et al.</i>, “ESPnet: End-to-End Speech Processing Toolkit,”
    in <i>INTERSPEECH 2018, Hyderabad, India</i>, 2018, pp. 2207–2211, doi: <a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>.'
  mla: 'Watanabe, Shinji, et al. “ESPnet: End-to-End Speech Processing Toolkit.” <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018, pp. 2207–2211, doi:<a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>.'
  short: 'S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique
    Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in:
    INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211.'
date_created: 2022-02-21T10:34:37Z
date_updated: 2023-01-11T11:23:19Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/Interspeech.2018-1456
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2022-02-23T08:03:13Z
  date_updated: 2022-02-23T08:03:13Z
  file_id: '29954'
  file_name: INTERSPEECH_2018_Heymann_Paper.pdf
  file_size: 288907
  relation: main_file
file_date_updated: 2022-02-23T08:03:13Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 2207–2211
publication: INTERSPEECH 2018, Hyderabad, India
status: public
title: 'ESPnet: End-to-End Speech Processing Toolkit'
type: conference
user_id: '59789'
year: '2018'
...
---
_id: '11876'
abstract:
- lang: eng
  text: This paper describes the systems for the single-array track and the multiple-array
    track of the 5th CHiME Challenge. The final system is a combination of multiple
    systems, using Confusion Network Combination (CNC). The different systems presented
    here are utilizing different front-ends and training sets for a Bidirectional
    Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced
    by enhancements provided by Paderborn University [1]. The back-end has been implemented
    using RASR [2] and RETURNN [3]. Additionally, a system combination including the
    hypothesis word graphs from the system of the submission [1] has been performed,
    which results in the final best system.
author:
- first_name: Markus
  full_name: Kitza, Markus
  last_name: Kitza
- first_name: Wilfried
  full_name: Michel, Wilfried
  last_name: Michel
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Tobias
  full_name: Menne, Tobias
  last_name: Menne
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Hermann
  full_name: Ney, Hermann
  last_name: Ney
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kitza M, Michel W, Boeddeker C, et al. The RWTH/UPB System Combination for
    the CHiME 2018 Workshop. In: <i>Proc. CHiME 2018 Workshop on Speech Processing
    in Everyday Environments, Hyderabad, India</i>. ; 2018.'
  apa: Kitza, M., Michel, W., Boeddeker, C., Heitkaemper, J., Menne, T., Schlüter,
    R., Ney, H., Schmalenstroeer, J., Drude, L., Heymann, J., &#38; Haeb-Umbach, R.
    (2018). The RWTH/UPB System Combination for the CHiME 2018 Workshop. <i>Proc.
    CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
    India</i>.
  bibtex: '@inproceedings{Kitza_Michel_Boeddeker_Heitkaemper_Menne_Schlüter_Ney_Schmalenstroeer_Drude_Heymann_et
    al._2018, title={The RWTH/UPB System Combination for the CHiME 2018 Workshop},
    booktitle={Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
    Hyderabad, India}, author={Kitza, Markus and Michel, Wilfried and Boeddeker, Christoph
    and Heitkaemper, Jens and Menne, Tobias and Schlüter, Ralf and Ney, Hermann and
    Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and et al.}, year={2018}
    }'
  chicago: Kitza, Markus, Wilfried Michel, Christoph Boeddeker, Jens Heitkaemper,
    Tobias Menne, Ralf Schlüter, Hermann Ney, et al. “The RWTH/UPB System Combination
    for the CHiME 2018 Workshop.” In <i>Proc. CHiME 2018 Workshop on Speech Processing
    in Everyday Environments, Hyderabad, India</i>, 2018.
  ieee: M. Kitza <i>et al.</i>, “The RWTH/UPB System Combination for the CHiME 2018
    Workshop,” 2018.
  mla: Kitza, Markus, et al. “The RWTH/UPB System Combination for the CHiME 2018 Workshop.”
    <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
    India</i>, 2018.
  short: 'M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter,
    H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME
    2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India,
    2018.'
date_created: 2019-07-12T05:29:58Z
date_updated: 2023-10-26T08:12:14Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_RWTH_Paper.pdf
oa: '1'
publication: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
  Hyderabad, India
quality_controlled: '1'
status: public
title: The RWTH/UPB System Combination for the CHiME 2018 Workshop
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11735'
abstract:
- lang: eng
  text: This report describes the computation of gradients by algorithmic differentiation
    for statistically optimum beamforming operations. In particular, the differentiation
    of complex-valued functions is a key component of this approach. Therefore, real-valued
    algorithmic differentiation is extended via the complex-valued chain rule. In
    addition to the basic mathematical operations, the derivative of the eigenvalue problem
    with complex-valued eigenvectors is one of the key results of this report. The
    potential of this approach is shown with experimental results on the CHiME-3 challenge
    database. There, the beamforming task is used as a front-end for an ASR system.
    With the developed derivatives, a joint optimization of a speech enhancement and
    speech recognition system w.r.t. the recognition optimization criterion is possible.
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Patrick
  full_name: Hanebrink, Patrick
  last_name: Hanebrink
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. <i>On the Computation
    of Complex-Valued Gradients with Application to Statistically Optimum Beamforming</i>.;
    2017.
  apa: Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., &#38; Haeb-Umbach, R.
    (2017). <i>On the Computation of Complex-valued Gradients with Application to
    Statistically Optimum Beamforming</i>.
  bibtex: '@book{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={On the
    Computation of Complex-valued Gradients with Application to Statistically Optimum
    Beamforming}, author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas
    and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2017} }'
  chicago: Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and
    Reinhold Haeb-Umbach. <i>On the Computation of Complex-Valued Gradients with Application
    to Statistically Optimum Beamforming</i>, 2017.
  ieee: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, <i>On
    the Computation of Complex-valued Gradients with Application to Statistically
    Optimum Beamforming</i>. 2017.
  mla: Boeddeker, Christoph, et al. <i>On the Computation of Complex-Valued Gradients
    with Application to Statistically Optimum Beamforming</i>. 2017.
  short: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. Haeb-Umbach, On the
    Computation of Complex-Valued Gradients with Application to Statistically Optimum
    Beamforming, 2017.
date_created: 2019-07-12T05:27:15Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/ArXiv_2017_BoeddekerHanebrinkHaeb_Article.pdf
oa: '1'
status: public
title: On the Computation of Complex-valued Gradients with Application to Statistically
  Optimum Beamforming
type: report
user_id: '40767'
year: '2017'
...
---
_id: '11736'
abstract:
- lang: eng
  text: In this paper we show how a neural network for spectral mask estimation for
    an acoustic beamformer can be optimized by algorithmic differentiation. Using
    the beamformer output SNR as the objective function to maximize, the gradient
    is propagated through the beamformer all the way to the neural network, which provides
    the clean speech and noise masks from which the beamformer coefficients are estimated
    by eigenvalue decomposition. A key theoretical result is the derivative of an
    eigenvalue problem involving complex-valued eigenvectors. Experimental results
    on the CHiME-3 challenge database demonstrate the effectiveness of the approach.
    The tools developed in this paper are a key component for an end-to-end optimization
    of speech enhancement and speech recognition.
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Patrick
  full_name: Hanebrink, Patrick
  last_name: Hanebrink
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. Optimizing Neural-Network
    Supported Acoustic Beamforming by Algorithmic Differentiation. In: <i>Proc. IEEE
    Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2017.'
  apa: Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., &#38; Haeb-Umbach, R.
    (2017). Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic
    Differentiation. In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal
    Processing (ICASSP)</i>.
  bibtex: '@inproceedings{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={Optimizing
    Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation},
    booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
    author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas and Heymann,
    Jahn and Haeb-Umbach, Reinhold}, year={2017} }'
  chicago: Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and
    Reinhold Haeb-Umbach. “Optimizing Neural-Network Supported Acoustic Beamforming
    by Algorithmic Differentiation.” In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech
    and Signal Processing (ICASSP)</i>, 2017.
  ieee: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, “Optimizing
    Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation,”
    in <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    2017.
  mla: Boeddeker, Christoph, et al. “Optimizing Neural-Network Supported Acoustic
    Beamforming by Algorithmic Differentiation.” <i>Proc. IEEE Intl. Conf. on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, 2017.
  short: 'C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc.
    IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.'
date_created: 2019-07-12T05:27:16Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_boeddeker_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
status: public
title: Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation
type: conference
user_id: '44006'
year: '2017'
...
---
_id: '11809'
abstract:
- lang: eng
  text: This paper presents an end-to-end training approach for a beamformer-supported
    multi-channel ASR system. A neural network which estimates masks for a statistically
    optimum beamformer is jointly trained with a network for acoustic modeling. To
    update its parameters, we propagate the gradients from the acoustic model all
    the way through feature extraction and the complex valued beamforming operation.
    Besides avoiding a mismatch between the front-end and the back-end, this approach
    also eliminates the need for stereo data, i.e., the parallel availability of clean
    and noisy versions of the signals. Instead, it can be trained with real noisy
    multi-channel data only. Also, relying on the signal statistics for beamforming,
    the approach makes no assumptions on the configuration of the microphone array.
    We further observe a performance gain through joint training in terms of word
    error rate in an evaluation of the system on the CHiME 4 dataset.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Patrick
  full_name: Hanebrink, Patrick
  last_name: Hanebrink
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heymann J, Drude L, Boeddeker C, Hanebrink P, Haeb-Umbach R. BEAMNET: End-to-End
    Training of a Beamformer-Supported Multi-Channel ASR System. In: <i>Proc. IEEE
    Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2017.'
  apa: 'Heymann, J., Drude, L., Boeddeker, C., Hanebrink, P., &#38; Haeb-Umbach, R.
    (2017). BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR
    System. In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
    (ICASSP)</i>.'
  bibtex: '@inproceedings{Heymann_Drude_Boeddeker_Hanebrink_Haeb-Umbach_2017, title={BEAMNET:
    End-to-End Training of a Beamformer-Supported Multi-Channel ASR System}, booktitle={Proc.
    IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann,
    Jahn and Drude, Lukas and Boeddeker, Christoph and Hanebrink, Patrick and Haeb-Umbach,
    Reinhold}, year={2017} }'
  chicago: 'Heymann, Jahn, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, and
    Reinhold Haeb-Umbach. “BEAMNET: End-to-End Training of a Beamformer-Supported
    Multi-Channel ASR System.” In <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and
    Signal Processing (ICASSP)</i>, 2017.'
  ieee: 'J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, and R. Haeb-Umbach, “BEAMNET:
    End-to-End Training of a Beamformer-Supported Multi-Channel ASR System,” in <i>Proc.
    IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2017.'
  mla: 'Heymann, Jahn, et al. “BEAMNET: End-to-End Training of a Beamformer-Supported
    Multi-Channel ASR System.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and
    Signal Processing (ICASSP)</i>, 2017.'
  short: 'J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, R. Haeb-Umbach, in: Proc.
    IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.'
date_created: 2019-07-12T05:28:40Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_poster.pdf
status: public
title: 'BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System'
type: conference
user_id: '40767'
year: '2017'
...
---
_id: '11811'
abstract:
- lang: eng
  text: 'Acoustic beamforming can greatly improve the performance of Automatic Speech
    Recognition (ASR) and speech enhancement systems when multiple channels are available.
    We recently proposed a way to support the model-based Generalized Eigenvalue beamforming
    operation with a powerful neural network for spectral mask estimation. The enhancement
    system has a number of desirable properties. In particular, no assumptions
    need to be made about the nature of the acoustic transfer function (e.g., being
    anechoic), nor does the array configuration need to be known. While the system
    has been originally developed to enhance speech in noisy environments, we show
    in this article that it is also effective in suppressing reverberation, thus leading
    to a generic trainable multi-channel speech enhancement system for robust speech
    processing. To support this claim, we consider two distinct datasets: The CHiME
    3 challenge, which features challenging real-world noise distortions, and the
    Reverb challenge, which focuses on distortions caused by reverberation. We evaluate
    the system both with respect to a speech enhancement and a recognition task. For
    the first task we propose a new way to cope with the distortions introduced by
    the Generalized Eigenvalue beamformer by renormalizing the target energy for each
    frequency bin, and measure its effectiveness in terms of the PESQ score. For the
    latter we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art
    ASR results on both datasets. We further experiment with different network architectures
    for spectral mask estimation: One small feed-forward network with only one hidden
    layer, one Convolutional Neural Network and one bi-directional Long Short-Term
    Memory network, showing that even a small network is capable of delivering significant
    performance improvements.'
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Heymann J, Drude L, Haeb-Umbach R. A Generic Neural Acoustic Beamforming Architecture
    for Robust Multi-Channel Speech Processing. <i>Computer Speech and Language</i>.
    2017.
  apa: Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2017). A Generic Neural Acoustic
    Beamforming Architecture for Robust Multi-Channel Speech Processing. <i>Computer
    Speech and Language</i>.
  bibtex: '@article{Heymann_Drude_Haeb-Umbach_2017, title={A Generic Neural Acoustic
    Beamforming Architecture for Robust Multi-Channel Speech Processing}, journal={Computer
    Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach,
    Reinhold}, year={2017} }'
  chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “A Generic Neural
    Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.”
    <i>Computer Speech and Language</i>, 2017.
  ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “A Generic Neural Acoustic Beamforming
    Architecture for Robust Multi-Channel Speech Processing,” <i>Computer Speech and
    Language</i>, 2017.
  mla: Heymann, Jahn, et al. “A Generic Neural Acoustic Beamforming Architecture for
    Robust Multi-Channel Speech Processing.” <i>Computer Speech and Language</i>,
    2017.
  short: J. Heymann, L. Drude, R. Haeb-Umbach, Computer Speech and Language (2017).
date_created: 2019-07-12T05:28:43Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/ComputerSpeechLanguage_2017_heymann_paper.pdf
oa: '1'
publication: Computer Speech and Language
status: public
title: A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel
  Speech Processing
type: journal_article
user_id: '44006'
year: '2017'
...
---
_id: '11759'
abstract:
- lang: eng
  text: 'Variational Autoencoders (VAEs) have been shown to provide efficient neural-network-based
    approximate Bayesian inference for observation models for which exact inference
    is intractable. Its extension, the so-called Structured VAE (SVAE), allows inference
    in the presence of both discrete and continuous latent variables. Inspired by
    this extension, we developed a VAE with Hidden Markov Models (HMMs) as latent
    models. We applied the resulting HMM-VAE to the task of acoustic unit discovery
    in a zero-resource scenario. Starting from an initial model based on variational
    inference in an HMM with Gaussian Mixture Model (GMM) emission probabilities,
    the accuracy of the acoustic unit discovery could be significantly improved by
    the HMM-VAE. In doing so we were able to demonstrate for an unsupervised learning
    task what is well-known in the supervised learning case: Neural networks provide
    superior modeling power compared to GMMs.'
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Thomas
  full_name: Glarner, Thomas
  id: '14169'
  last_name: Glarner
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Bhiksha
  full_name: Raj, Bhiksha
  last_name: Raj
citation:
  ama: 'Ebbers J, Heymann J, Drude L, Glarner T, Haeb-Umbach R, Raj B. Hidden Markov
    Model Variational Autoencoder for Acoustic Unit Discovery. In: <i>INTERSPEECH
    2017, Stockholm, Sweden</i>. ; 2017.'
  apa: Ebbers, J., Heymann, J., Drude, L., Glarner, T., Haeb-Umbach, R., &#38; Raj,
    B. (2017). Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.
    <i>INTERSPEECH 2017, Stockholm, Sweden</i>.
  bibtex: '@inproceedings{Ebbers_Heymann_Drude_Glarner_Haeb-Umbach_Raj_2017, title={Hidden
    Markov Model Variational Autoencoder for Acoustic Unit Discovery}, booktitle={INTERSPEECH
    2017, Stockholm, Sweden}, author={Ebbers, Janek and Heymann, Jahn and Drude,
    Lukas and Glarner, Thomas and Haeb-Umbach, Reinhold and Raj, Bhiksha}, year={2017}
    }'
  chicago: Ebbers, Janek, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach,
    and Bhiksha Raj. “Hidden Markov Model Variational Autoencoder for Acoustic Unit
    Discovery.” In <i>INTERSPEECH 2017, Stockholm, Sweden</i>, 2017.
  ieee: J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, and B. Raj, “Hidden
    Markov Model Variational Autoencoder for Acoustic Unit Discovery,” 2017.
  mla: Ebbers, Janek, et al. “Hidden Markov Model Variational Autoencoder for Acoustic
    Unit Discovery.” <i>INTERSPEECH 2017, Stockholm, Sweden</i>, 2017.
  short: 'J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, B. Raj, in:
    INTERSPEECH 2017, Stockholm, Sweden, 2017.'
date_created: 2019-07-12T05:27:42Z
date_updated: 2023-11-22T08:29:06Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_paper.pdf
oa: '1'
publication: INTERSPEECH 2017, Stockholm, Sweden
quality_controlled: '1'
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_poster.pdf
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_slides.pdf
status: public
title: Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
type: conference
user_id: '34851'
year: '2017'
...
---
_id: '11895'
abstract:
- lang: eng
  text: Multi-channel speech enhancement algorithms rely on a synchronous sampling
    of the microphone signals. This, however, cannot always be guaranteed, especially
    if the sensors are distributed in an environment. To avoid performance degradation
    the sampling rate offset needs to be estimated and compensated for. In this contribution
    we extend the recently proposed coherence drift based method in two important
    directions. First, the increasing phase shift in the short-time Fourier transform
    domain is estimated from the coherence drift in a Matched-Filter-like fashion,
    where intermediate estimates are weighted by their instantaneous SNR. Second,
    an observed bias is removed by iterating between offset estimation and compensation
    by resampling a couple of times. The effectiveness of the proposed method is demonstrated
    by speech recognition results on the output of a beamformer with and without sampling
    rate offset compensation between the input channels. We compare MVDR and maximum-SNR
    beamformers in reverberant environments and further show that both benefit from
    a novel phase normalization, which we also propose in this contribution.
author:
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Schmalenstroeer J, Heymann J, Drude L, Boeddeker C, Haeb-Umbach R. Multi-Stage
    Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming.
    In: <i>IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</i>.
    ; 2017.'
  apa: Schmalenstroeer, J., Heymann, J., Drude, L., Boeddeker, C., &#38; Haeb-Umbach,
    R. (2017). Multi-Stage Coherence Drift Based Sampling Rate Synchronization for
    Acoustic Beamforming. <i>IEEE 19th International Workshop on Multimedia Signal
    Processing (MMSP)</i>.
  bibtex: '@inproceedings{Schmalenstroeer_Heymann_Drude_Boeddeker_Haeb-Umbach_2017,
    title={Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
    Beamforming}, booktitle={IEEE 19th International Workshop on Multimedia Signal
    Processing (MMSP)}, author={Schmalenstroeer, Joerg and Heymann, Jahn and Drude,
    Lukas and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2017} }'
  chicago: Schmalenstroeer, Joerg, Jahn Heymann, Lukas Drude, Christoph Boeddeker,
    and Reinhold Haeb-Umbach. “Multi-Stage Coherence Drift Based Sampling Rate Synchronization
    for Acoustic Beamforming.” In <i>IEEE 19th International Workshop on Multimedia
    Signal Processing (MMSP)</i>, 2017.
  ieee: J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, and R. Haeb-Umbach,
    “Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
    Beamforming,” 2017.
  mla: Schmalenstroeer, Joerg, et al. “Multi-Stage Coherence Drift Based Sampling
    Rate Synchronization for Acoustic Beamforming.” <i>IEEE 19th International Workshop
    on Multimedia Signal Processing (MMSP)</i>, 2017.
  short: 'J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, R. Haeb-Umbach,
    in: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017.'
date_created: 2019-07-12T05:30:20Z
date_updated: 2023-10-26T08:12:05Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb.pdf
oa: '1'
publication: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)
quality_controlled: '1'
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb_poster.pdf
status: public
title: Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
  Beamforming
type: conference
user_id: '460'
year: '2017'
...
---
_id: '11744'
abstract:
- lang: eng
  text: Noise power spectral density (PSD) estimation is an indispensable component
    of speech spectral enhancement systems. In this paper we present a noise PSD tracking
    algorithm, which employs a noise presence probability estimate delivered by a
    deep neural network (DNN). The algorithm provides a causal noise PSD estimate
    and can thus be used in speech enhancement systems for communication purposes.
    An extensive performance comparison has been carried out with ten causal state-of-the-art
    noise tracking algorithms taken from the literature and categorized according to the applied
    techniques. The experiments showed that the proposed DNN-based noise PSD tracker
    outperforms all competing methods with respect to all tested performance measures,
    which include the noise tracking performance and the performance of a speech enhancement
    system employing the noise tracking component.
author:
- first_name: Aleksej
  full_name: Chinaev, Aleksej
  last_name: Chinaev
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Chinaev A, Heymann J, Drude L, Haeb-Umbach R. Noise-Presence-Probability-Based
    Noise PSD Estimation by Using DNNs. In: <i>12. ITG Fachtagung Sprachkommunikation
    (ITG 2016)</i>. ; 2016.'
  apa: Chinaev, A., Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Noise-Presence-Probability-Based
    Noise PSD Estimation by Using DNNs. In <i>12. ITG Fachtagung Sprachkommunikation
    (ITG 2016)</i>.
  bibtex: '@inproceedings{Chinaev_Heymann_Drude_Haeb-Umbach_2016, title={Noise-Presence-Probability-Based
    Noise PSD Estimation by Using DNNs}, booktitle={12. ITG Fachtagung Sprachkommunikation
    (ITG 2016)}, author={Chinaev, Aleksej and Heymann, Jahn and Drude, Lukas and Haeb-Umbach,
    Reinhold}, year={2016} }'
  chicago: Chinaev, Aleksej, Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach.
    “Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” In <i>12.
    ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.
  ieee: A. Chinaev, J. Heymann, L. Drude, and R. Haeb-Umbach, “Noise-Presence-Probability-Based
    Noise PSD Estimation by Using DNNs,” in <i>12. ITG Fachtagung Sprachkommunikation
    (ITG 2016)</i>, 2016.
  mla: Chinaev, Aleksej, et al. “Noise-Presence-Probability-Based Noise PSD Estimation
    by Using DNNs.” <i>12. ITG Fachtagung Sprachkommunikation (ITG 2016)</i>, 2016.
  short: 'A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung
    Sprachkommunikation (ITG 2016), 2016.'
date_created: 2019-07-12T05:27:25Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16.pdf
oa: '1'
publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016)
related_material:
  link:
  - description: Presentation
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16_Presentation.pdf
status: public
title: Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11812'
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R. Neural Network Based Spectral Mask Estimation
    for Acoustic Beamforming. In: <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and
    Signal Processing (ICASSP)</i>. ; 2016.'
  apa: Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Neural Network Based
    Spectral Mask Estimation for Acoustic Beamforming. In <i>Proc. IEEE Intl. Conf.
    on Acoustics, Speech and Signal Processing (ICASSP)</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Neural Network Based
    Spectral Mask Estimation for Acoustic Beamforming}, booktitle={Proc. IEEE Intl.
    Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn
    and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }'
  chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Neural Network Based
    Spectral Mask Estimation for Acoustic Beamforming.” In <i>Proc. IEEE Intl. Conf.
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2016.
  ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Neural Network Based Spectral Mask
    Estimation for Acoustic Beamforming,” in <i>Proc. IEEE Intl. Conf. on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, 2016.
  mla: Heymann, Jahn, et al. “Neural Network Based Spectral Mask Estimation for Acoustic
    Beamforming.” <i>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, 2016.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics,
    Speech and Signal Processing (ICASSP), 2016.'
date_created: 2019-07-12T05:28:44Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_slides.pdf
status: public
title: Neural Network Based Spectral Mask Estimation for Acoustic Beamforming
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11834'
abstract:
- lang: eng
  text: We present a system for the 4th CHiME challenge which significantly increases
    the performance for all three tracks with respect to the provided baseline system.
    The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural
    network to estimate signal statistics. These then steer a Generalized Eigenvalue
    beamformer. The back-end consists of a 22 layer deep Wide Residual Network and
    two extra BLSTM layers. Working on a whole utterance instead of frames allows
    us to refine Batch-Normalization. We also train our own BLSTM-based language model.
    Adding a discriminative speaker adaptation leads to further gains. The final system
    achieves a word error rate on the six channel real test data of 3.48%. For the
    two channel track we achieve 5.96% and for the one channel track 9.34%. This is
    the best reported performance on the challenge achieved by a single system, i.e.,
    a configuration, which does not combine multiple systems. At the same time, our
    system is independent of the microphone configuration. We can thus use the same
    components for all three tracks.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative
    Speaker Adaptation for Robust Speech Recognition. In: <i>Computer Speech and Language</i>.
    ; 2016.'
  apa: Heymann, J., Drude, L., &#38; Haeb-Umbach, R. (2016). Wide Residual BLSTM Network
    with Discriminative Speaker Adaptation for Robust Speech Recognition. In <i>Computer
    Speech and Language</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM
    Network with Discriminative Speaker Adaptation for Robust Speech Recognition},
    booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas
    and Haeb-Umbach, Reinhold}, year={2016} }'
  chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM
    Network with Discriminative Speaker Adaptation for Robust Speech Recognition.”
    In <i>Computer Speech and Language</i>, 2016.
  ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with
    Discriminative Speaker Adaptation for Robust Speech Recognition,” in <i>Computer
    Speech and Language</i>, 2016.
  mla: Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker
    Adaptation for Robust Speech Recognition.” <i>Computer Speech and Language</i>,
    2016.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language,
    2016.'
date_created: 2019-07-12T05:29:09Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf
oa: '1'
publication: Computer Speech and Language
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf
status: public
title: Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust
  Speech Recognition
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11908'
abstract:
- lang: eng
  text: 'This paper describes automatic speech recognition (ASR) systems developed
    jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch track of the 4th CHiME
    Challenge. In the 2ch and 6ch tracks the final system output is obtained by a
    Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM)
    is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM)
    units. The systems differ by front ends and training sets used for the acoustic
    training. The model for the 1ch track is trained without any preprocessing. For
    each front end we trained and evaluated individual acoustic models. We compare
    the ASR performance of different beamforming approaches: a conventional superdirective
    beamformer [1] and an MVDR beamformer as in [2], where the steering vector is
    estimated based on [3]. Furthermore we evaluated a BLSTM supported Generalized
    Eigenvalue beamformer using NN-GEV [4]. The back end is implemented using RWTH's
    open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices
    with a Long Short-Term Memory (LSTM) based language model. The overall best results
    are obtained by a system combination that includes the lattices from the system
    of UPB's submission [8]. Our final submission scored second in each of the three
    tracks of the 4th CHiME Challenge.'
author:
- first_name: Tobias
  full_name: Menne, Tobias
  last_name: Menne
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Anastasios
  full_name: Alexandridis, Anastasios
  last_name: Alexandridis
- first_name: Kazuki
  full_name: Irie, Kazuki
  last_name: Irie
- first_name: Albert
  full_name: Zeyer, Albert
  last_name: Zeyer
- first_name: Markus
  full_name: Kitza, Markus
  last_name: Kitza
- first_name: Pavel
  full_name: Golik, Pavel
  last_name: Golik
- first_name: Ilia
  full_name: Kulikov, Ilia
  last_name: Kulikov
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Hermann
  full_name: Ney, Hermann
  last_name: Ney
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Athanasios
  full_name: Mouchtaris, Athanasios
  last_name: Mouchtaris
citation:
  ama: 'Menne T, Heymann J, Alexandridis A, et al. The RWTH/UPB/FORTH System Combination
    for the 4th CHiME Challenge Evaluation. In: <i>Computer Speech and Language</i>.
    ; 2016.'
  apa: Menne, T., Heymann, J., Alexandridis, A., Irie, K., Zeyer, A., Kitza, M., …
    Mouchtaris, A. (2016). The RWTH/UPB/FORTH System Combination for the 4th CHiME
    Challenge Evaluation. In <i>Computer Speech and Language</i>.
  bibtex: '@inproceedings{Menne_Heymann_Alexandridis_Irie_Zeyer_Kitza_Golik_Kulikov_Drude_Schlüter_et
    al._2016, title={The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge
    Evaluation}, booktitle={Computer Speech and Language}, author={Menne, Tobias and
    Heymann, Jahn and Alexandridis, Anastasios and Irie, Kazuki and Zeyer, Albert
    and Kitza, Markus and Golik, Pavel and Kulikov, Ilia and Drude, Lukas and Schlüter,
    Ralf and et al.}, year={2016} }'
  chicago: Menne, Tobias, Jahn Heymann, Anastasios Alexandridis, Kazuki Irie, Albert
    Zeyer, Markus Kitza, Pavel Golik, et al. “The RWTH/UPB/FORTH System Combination
    for the 4th CHiME Challenge Evaluation.” In <i>Computer Speech and Language</i>,
    2016.
  ieee: T. Menne <i>et al.</i>, “The RWTH/UPB/FORTH System Combination for the 4th
    CHiME Challenge Evaluation,” in <i>Computer Speech and Language</i>, 2016.
  mla: Menne, Tobias, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME
    Challenge Evaluation.” <i>Computer Speech and Language</i>, 2016.
  short: 'T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik,
    I. Kulikov, L. Drude, R. Schlüter, H. Ney, R. Haeb-Umbach, A. Mouchtaris, in:
    Computer Speech and Language, 2016.'
date_created: 2019-07-12T05:30:35Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_rwthupbforth_paper.pdf
oa: '1'
publication: Computer Speech and Language
status: public
title: The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11810'
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Aleksej
  full_name: Chinaev, Aleksej
  last_name: Chinaev
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heymann J, Drude L, Chinaev A, Haeb-Umbach R. BLSTM supported GEV Beamformer
    Front-End for the 3RD CHiME Challenge. In: <i>Automatic Speech Recognition and
    Understanding Workshop (ASRU 2015)</i>. ; 2015.'
  apa: Heymann, J., Drude, L., Chinaev, A., &#38; Haeb-Umbach, R. (2015). BLSTM supported
    GEV Beamformer Front-End for the 3RD CHiME Challenge. In <i>Automatic Speech Recognition
    and Understanding Workshop (ASRU 2015)</i>.
  bibtex: '@inproceedings{Heymann_Drude_Chinaev_Haeb-Umbach_2015, title={BLSTM supported
    GEV Beamformer Front-End for the 3RD CHiME Challenge}, booktitle={Automatic Speech
    Recognition and Understanding Workshop (ASRU 2015)}, author={Heymann, Jahn and
    Drude, Lukas and Chinaev, Aleksej and Haeb-Umbach, Reinhold}, year={2015} }'
  chicago: Heymann, Jahn, Lukas Drude, Aleksej Chinaev, and Reinhold Haeb-Umbach.
    “BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” In <i>Automatic
    Speech Recognition and Understanding Workshop (ASRU 2015)</i>, 2015.
  ieee: J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV
    Beamformer Front-End for the 3RD CHiME Challenge,” in <i>Automatic Speech Recognition
    and Understanding Workshop (ASRU 2015)</i>, 2015.
  mla: Heymann, Jahn, et al. “BLSTM Supported GEV Beamformer Front-End for the 3RD
    CHiME Challenge.” <i>Automatic Speech Recognition and Understanding Workshop (ASRU
    2015)</i>, 2015.
  short: 'J. Heymann, L. Drude, A. Chinaev, R. Haeb-Umbach, in: Automatic Speech Recognition
    and Understanding Workshop (ASRU 2015), 2015.'
date_created: 2019-07-12T05:28:41Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
publication: Automatic Speech Recognition and Understanding Workshop (ASRU 2015)
status: public
title: BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge
type: conference
user_id: '44006'
year: '2015'
...
