---
_id: '15792'
abstract:
- lang: eng
  text: In this paper we highlight the privacy risks entailed in deep neural network
    feature extraction for domestic activity monitoring. We employ the baseline system
    proposed in Task 5 of the DCASE 2018 challenge and simulate a feature interception
    attack by an eavesdropper who wants to perform speaker identification. We then
    propose to reduce the aforementioned privacy risks by introducing a variational
    information feature extraction scheme that allows for good activity monitoring
    performance while at the same time minimizing the information of the feature representation,
    thus restricting speaker identification attempts. We analyze the resulting model’s
    composite loss function and the budget scaling factor used to control the balance
    between the performance of the trusted and attacker tasks. It is empirically demonstrated
    that the proposed method reduces speaker identification privacy risks without
    significantly degrading the performance of domestic activity monitoring tasks.
author:
- first_name: Alexandru
  full_name: Nelus, Alexandru
  last_name: Nelus
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Rainer
  full_name: Martin, Rainer
  last_name: Martin
citation:
  ama: 'Nelus A, Ebbers J, Haeb-Umbach R, Martin R. Privacy-preserving Variational
    Information Feature Extraction for Domestic Activity Monitoring Versus Speaker
    Identification. In: <i>INTERSPEECH 2019, Graz, Austria</i>. ; 2019.'
  apa: Nelus, A., Ebbers, J., Haeb-Umbach, R., &#38; Martin, R. (2019). Privacy-preserving
    Variational Information Feature Extraction for Domestic Activity Monitoring Versus
    Speaker Identification. <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Nelus_Ebbers_Haeb-Umbach_Martin_2019, title={Privacy-preserving
    Variational Information Feature Extraction for Domestic Activity Monitoring Versus
    Speaker Identification}, booktitle={INTERSPEECH 2019, Graz, Austria}, author={Nelus,
    Alexandru and Ebbers, Janek and Haeb-Umbach, Reinhold and Martin, Rainer}, year={2019}
    }'
  chicago: Nelus, Alexandru, Janek Ebbers, Reinhold Haeb-Umbach, and Rainer Martin.
    “Privacy-Preserving Variational Information Feature Extraction for Domestic Activity
    Monitoring Versus Speaker Identification.” In <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.
  ieee: A. Nelus, J. Ebbers, R. Haeb-Umbach, and R. Martin, “Privacy-preserving Variational
    Information Feature Extraction for Domestic Activity Monitoring Versus Speaker
    Identification,” 2019.
  mla: Nelus, Alexandru, et al. “Privacy-Preserving Variational Information Feature
    Extraction for Domestic Activity Monitoring Versus Speaker Identification.” <i>INTERSPEECH
    2019, Graz, Austria</i>, 2019.
  short: 'A. Nelus, J. Ebbers, R. Haeb-Umbach, R. Martin, in: INTERSPEECH 2019, Graz,
    Austria, 2019.'
date_created: 2020-02-05T10:07:53Z
date_updated: 2023-11-22T08:27:55Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:11:40Z
  date_updated: 2020-02-05T10:11:40Z
  file_id: '15793'
  file_name: INTERSPEECH_2019_Ebbers_Paper.pdf
  file_size: 454600
  relation: main_file
file_date_updated: 2020-02-05T10:11:40Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: INTERSPEECH 2019, Graz, Austria
quality_controlled: '1'
status: public
title: Privacy-preserving Variational Information Feature Extraction for Domestic
  Activity Monitoring Versus Speaker Identification
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '18107'
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: M.
  full_name: Bacchiani, M.
  last_name: Bacchiani
- first_name: T. N.
  full_name: Sainath, T. N.
  last_name: Sainath
citation:
  ama: 'Heymann J, Bacchiani M, Sainath TN. Performance of Mask Based Statistical
    Beamforming in a Smart Home Scenario. In: <i>2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2018:6722-6726. doi:<a
    href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>'
  apa: Heymann, J., Bacchiani, M., &#38; Sainath, T. N. (2018). Performance of Mask
    Based Statistical Beamforming in a Smart Home Scenario. In <i>2018 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i> (pp. 6722–6726).
    <a href="https://doi.org/10.1109/ICASSP.2018.8462372">https://doi.org/10.1109/ICASSP.2018.8462372</a>
  bibtex: '@inproceedings{Heymann_Bacchiani_Sainath_2018, title={Performance of Mask
    Based Statistical Beamforming in a Smart Home Scenario}, DOI={<a href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>},
    booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)}, author={Heymann, Jahn and Bacchiani, M. and Sainath, T.
    N.}, year={2018}, pages={6722–6726} }'
  chicago: Heymann, Jahn, M. Bacchiani, and T. N. Sainath. “Performance of Mask Based
    Statistical Beamforming in a Smart Home Scenario.” In <i>2018 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, 6722–26, 2018.
    <a href="https://doi.org/10.1109/ICASSP.2018.8462372">https://doi.org/10.1109/ICASSP.2018.8462372</a>.
  ieee: J. Heymann, M. Bacchiani, and T. N. Sainath, “Performance of Mask Based Statistical
    Beamforming in a Smart Home Scenario,” in <i>2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2018, pp. 6722–6726.
  mla: Heymann, Jahn, et al. “Performance of Mask Based Statistical Beamforming in
    a Smart Home Scenario.” <i>2018 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)</i>, 2018, pp. 6722–26, doi:<a href="https://doi.org/10.1109/ICASSP.2018.8462372">10.1109/ICASSP.2018.8462372</a>.
  short: 'J. Heymann, M. Bacchiani, T.N. Sainath, in: 2018 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6722–6726.'
date_created: 2020-08-20T14:20:59Z
date_updated: 2022-01-06T06:53:26Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2018.8462372
language:
- iso: eng
page: 6722-6726
publication: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  (ICASSP)
status: public
title: Performance of Mask Based Statistical Beamforming in a Smart Home Scenario
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11760'
abstract:
- lang: eng
  text: Acoustic event detection, i.e., the task of assigning a human interpretable
    label to a segment of audio, has only recently attracted increased interest in
    the research community. Driven by the DCASE challenges and the availability of
    large-scale audio datasets, the state-of-the-art has progressed rapidly with deep-learning-based
    classifiers dominating the field. Because several potential use cases favor
    a realization on distributed sensor nodes, e.g. ambient assisted living applications,
    habitat monitoring or surveillance, we are concerned with two issues here: firstly,
    the classification performance of such systems, and secondly, the computing resources
    required to achieve a certain performance considering node-level feature extraction.
    In this contribution we look at the balance between the two criteria by employing
    traditional techniques and different deep learning architectures, including convolutional
    and recurrent models in the context of real life everyday audio recordings in
    realistic, however challenging, multisource conditions.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Alexandru
  full_name: Nelus, Alexandru
  last_name: Nelus
- first_name: Rainer
  full_name: Martin, Rainer
  last_name: Martin
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Nelus A, Martin R, Haeb-Umbach R. Evaluation of Modulation-MFCC
    Features and DNN Classification for Acoustic Event Detection. In: <i>DAGA 2018,
    München</i>. ; 2018.'
  apa: Ebbers, J., Nelus, A., Martin, R., &#38; Haeb-Umbach, R. (2018). Evaluation
    of Modulation-MFCC Features and DNN Classification for Acoustic Event Detection.
    In <i>DAGA 2018, München</i>.
  bibtex: '@inproceedings{Ebbers_Nelus_Martin_Haeb-Umbach_2018, title={Evaluation
    of Modulation-MFCC Features and DNN Classification for Acoustic Event Detection},
    booktitle={DAGA 2018, München}, author={Ebbers, Janek and Nelus, Alexandru and
    Martin, Rainer and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Ebbers, Janek, Alexandru Nelus, Rainer Martin, and Reinhold Haeb-Umbach.
    “Evaluation of Modulation-MFCC Features and DNN Classification for Acoustic Event
    Detection.” In <i>DAGA 2018, München</i>, 2018.
  ieee: J. Ebbers, A. Nelus, R. Martin, and R. Haeb-Umbach, “Evaluation of Modulation-MFCC
    Features and DNN Classification for Acoustic Event Detection,” in <i>DAGA 2018,
    München</i>, 2018.
  mla: Ebbers, Janek, et al. “Evaluation of Modulation-MFCC Features and DNN Classification
    for Acoustic Event Detection.” <i>DAGA 2018, München</i>, 2018.
  short: 'J. Ebbers, A. Nelus, R. Martin, R. Haeb-Umbach, in: DAGA 2018, München,
    2018.'
date_created: 2019-07-12T05:27:43Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/Daga_2018_Ebbers_Paper.pdf
oa: '1'
publication: DAGA 2018, München
status: public
title: Evaluation of Modulation-MFCC Features and DNN Classification for Acoustic
  Event Detection
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11835'
abstract:
- lang: eng
  text: Signal dereverberation using the weighted prediction error (WPE) method has
    been proven to be an effective means to raise the accuracy of far-field speech
    recognition. But in its original formulation, WPE requires multiple iterations
    over a sufficiently long utterance, rendering it unsuitable for online low-latency
    applications. Recently, two methods have been proposed to overcome this limitation.
    One utilizes a neural network to estimate the power spectral density (PSD) of
    the target signal and works in a block-online fashion. The other method relies
    on a rather simple PSD estimation which smoothes the observed PSD and utilizes
    a recursive formulation which enables it to work on a frame-by-frame basis. In
    this paper, we integrate a deep neural network (DNN) based estimator into the
    recursive frame-online formulation. We evaluate the performance of the recursive
    system with different PSD estimators in comparison to the block-online and offline
    variant on two distinct corpora: the REVERB challenge data, where the signal is
    mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome
    to also consider (directed) noise sources. The results show that although smoothing
    works surprisingly well, the more sophisticated DNN based estimator shows promising
    improvements and shortens the performance gap between online and offline processing.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Frame-Online DNN-WPE
    Dereverberation. In: <i>IWAENC 2018, Tokyo, Japan</i>. ; 2018.'
  apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., &#38; Nakatani, T.
    (2018). Frame-Online DNN-WPE Dereverberation. In <i>IWAENC 2018, Tokyo, Japan</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2018, title={Frame-Online
    DNN-WPE Dereverberation}, booktitle={IWAENC 2018, Tokyo, Japan}, author={Heymann,
    Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
    Tomohiro}, year={2018} }'
  chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
    Tomohiro Nakatani. “Frame-Online DNN-WPE Dereverberation.” In <i>IWAENC 2018,
    Tokyo, Japan</i>, 2018.
  ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Frame-Online
    DNN-WPE Dereverberation,” in <i>IWAENC 2018, Tokyo, Japan</i>, 2018.
  mla: Heymann, Jahn, et al. “Frame-Online DNN-WPE Dereverberation.” <i>IWAENC 2018,
    Tokyo, Japan</i>, 2018.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: IWAENC
    2018, Tokyo, Japan, 2018.'
date_created: 2019-07-12T05:29:10Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Paper.pdf
oa: '1'
publication: IWAENC 2018, Tokyo, Japan
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Poster.pdf
status: public
title: Frame-Online DNN-WPE Dereverberation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11837'
abstract:
- lang: eng
  text: We present a block-online multi-channel front end for automatic speech recognition
    in noisy and reverberated environments. It is an online version of our earlier
    proposed neural network supported acoustic beamformer, whose coefficients are
    calculated from noise and speech spatial covariance matrices which are estimated
    utilizing a neural mask estimator. However, the sparsity of speech in the STFT
    domain causes problems for the initial beamformer coefficients estimation in some
    frequency bins due to lack of speech observations. We propose two methods to mitigate
    this issue. The first is to lower the frequency resolution of the STFT, which
    comes with the additional advantage of a reduced time window, thus lowering the
    latency introduced by block processing. The second approach is to smooth beamforming
    coefficients along the frequency axis, thus exploiting their high interfrequency
    correlation. With both approaches the gap between offline and block-online beamformer
    performance, as measured by the word error rate achieved by a downstream speech
    recognizer, is significantly reduced. Experiments are carried out on two corpora,
    representing noisy (CHiME-4) and noisy reverberant (voiceHome) environments.
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Heymann J, Haeb-Umbach R. Smoothing along Frequency in Online
    Neural Network Supported Acoustic Beamforming. In: <i>ITG 2018, Oldenburg, Germany</i>.
    ; 2018.'
  apa: Heitkaemper, J., Heymann, J., &#38; Haeb-Umbach, R. (2018). Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming. In <i>ITG 2018,
    Oldenburg, Germany</i>.
  bibtex: '@inproceedings{Heitkaemper_Heymann_Haeb-Umbach_2018, title={Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming}, booktitle={ITG
    2018, Oldenburg, Germany}, author={Heitkaemper, Jens and Heymann, Jahn and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: Heitkaemper, Jens, Jahn Heymann, and Reinhold Haeb-Umbach. “Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming.” In <i>ITG
    2018, Oldenburg, Germany</i>, 2018.
  ieee: J. Heitkaemper, J. Heymann, and R. Haeb-Umbach, “Smoothing along Frequency
    in Online Neural Network Supported Acoustic Beamforming,” in <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.
  mla: Heitkaemper, Jens, et al. “Smoothing along Frequency in Online Neural Network
    Supported Acoustic Beamforming.” <i>ITG 2018, Oldenburg, Germany</i>, 2018.
  short: 'J. Heitkaemper, J. Heymann, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany,
    2018.'
date_created: 2019-07-12T05:29:13Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Paper.pdf
oa: '1'
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Slides.pdf
status: public
title: Smoothing along Frequency in Online Neural Network Supported Acoustic Beamforming
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11872'
abstract:
- lang: eng
  text: 'The weighted prediction error (WPE) algorithm has proven to be a very successful
    dereverberation method for the REVERB challenge. Likewise, neural network based
    mask estimation for beamforming demonstrated very good noise suppression in the
    CHiME 3 and CHiME 4 challenges. Recently, it has been shown that this estimator
    can also be trained to perform dereverberation and denoising jointly. However,
    up to now a comparison of a neural beamformer and WPE is still missing, so is
    an investigation into a combination of the two. Therefore, we here provide an
    extensive evaluation of both and consequently propose variants to integrate deep
    neural network based beamforming with WPE. For these integrated variants we identify
    a consistent word error rate (WER) reduction on two distinct databases. In particular,
    our study shows that deep learning based beamforming benefits from a model-based
    dereverberation technique (i.e. WPE) and vice versa. Our key findings are: (a)
    Neural beamforming yields lower WERs than WPE, and increasingly so the more channels
    and noise are present. (b) Integration of WPE and a neural beamformer consistently
    outperforms all stand-alone systems.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Boeddeker C, Heymann J, et al. Integration of neural network based
    beamforming and weighted prediction error dereverberation. In: <i>INTERSPEECH
    2018, Hyderabad, India</i>. ; 2018.'
  apa: Drude, L., Boeddeker, C., Heymann, J., Kinoshita, K., Delcroix, M., Nakatani,
    T., &#38; Haeb-Umbach, R. (2018). Integration of neural network based beamforming
    and weighted prediction error dereverberation. In <i>INTERSPEECH 2018, Hyderabad,
    India</i>.
  bibtex: '@inproceedings{Drude_Boeddeker_Heymann_Kinoshita_Delcroix_Nakatani_Haeb-Umbach_2018,
    title={Integration of neural network based beamforming and weighted prediction
    error dereverberation}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Drude,
    Lukas and Boeddeker, Christoph and Heymann, Jahn and Kinoshita, Keisuke and Delcroix,
    Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Christoph Boeddeker, Jahn Heymann, Keisuke Kinoshita, Marc
    Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Integration of Neural
    Network Based Beamforming and Weighted Prediction Error Dereverberation.” In <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018.
  ieee: L. Drude <i>et al.</i>, “Integration of neural network based beamforming and
    weighted prediction error dereverberation,” in <i>INTERSPEECH 2018, Hyderabad,
    India</i>, 2018.
  mla: Drude, Lukas, et al. “Integration of Neural Network Based Beamforming and Weighted
    Prediction Error Dereverberation.” <i>INTERSPEECH 2018, Hyderabad, India</i>,
    2018.
  short: 'L. Drude, C. Boeddeker, J. Heymann, K. Kinoshita, M. Delcroix, T. Nakatani,
    R. Haeb-Umbach, in: INTERSPEECH 2018, Hyderabad, India, 2018.'
date_created: 2019-07-12T05:29:53Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2018, Hyderabad, India
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Slides.pdf
status: public
title: Integration of neural network based beamforming and weighted prediction error
  dereverberation
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11873'
abstract:
- lang: eng
  text: NARA-WPE is a Python software package providing implementations of the weighted
    prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly
    effective tool for speech dereverberation, thus improving the perceptual quality
    of the signal and improving the recognition performance of downstream automatic
    speech recognition (ASR). It is suitable both for single-channel and multi-channel
    applications. The package consists of (1) a Numpy implementation which can easily
    be integrated into a custom Python toolchain, and (2) a TensorFlow implementation
    which allows integration into larger computational graphs and enables backpropagation
    through WPE to train more advanced front-ends. This package comprises an iterative
    offline (batch) version, a block-online version, and a frame-online version which
    can be used in moderately low latency applications, e.g. digital speech assistants.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heymann J, Boeddeker C, Haeb-Umbach R. NARA-WPE: A Python package
    for weighted prediction error dereverberation in Numpy and Tensorflow for online
    and offline processing. In: <i>ITG 2018, Oldenburg, Germany</i>. ; 2018.'
  apa: 'Drude, L., Heymann, J., Boeddeker, C., &#38; Haeb-Umbach, R. (2018). NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing. In <i>ITG 2018, Oldenburg, Germany</i>.'
  bibtex: '@inproceedings{Drude_Heymann_Boeddeker_Haeb-Umbach_2018, title={NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing}, booktitle={ITG 2018, Oldenburg, Germany},
    author={Drude, Lukas and Heymann, Jahn and Boeddeker, Christoph and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: 'Drude, Lukas, Jahn Heymann, Christoph Boeddeker, and Reinhold Haeb-Umbach.
    “NARA-WPE: A Python Package for Weighted Prediction Error Dereverberation in Numpy
    and Tensorflow for Online and Offline Processing.” In <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.'
  ieee: 'L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach, “NARA-WPE: A Python
    package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing,” in <i>ITG 2018, Oldenburg, Germany</i>, 2018.'
  mla: 'Drude, Lukas, et al. “NARA-WPE: A Python Package for Weighted Prediction Error
    Dereverberation in Numpy and Tensorflow for Online and Offline Processing.” <i>ITG
    2018, Oldenburg, Germany</i>, 2018.'
  short: 'L. Drude, J. Heymann, C. Boeddeker, R. Haeb-Umbach, in: ITG 2018, Oldenburg,
    Germany, 2018.'
date_created: 2019-07-12T05:29:54Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Poster.pdf
status: public
title: 'NARA-WPE: A Python package for weighted prediction error dereverberation in
  Numpy and Tensorflow for online and offline processing'
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11916'
abstract:
- lang: eng
  text: We present an experimental comparison of seven state-of-the-art machine learning
    algorithms for the task of semantic analysis of spoken input, with a special emphasis
    on applications for dysarthric speech. Dysarthria is a motor speech disorder,
    which is characterized by poor articulation of phonemes. In order to cater for
    these noncanonical phoneme realizations, we employed an unsupervised learning
    approach to estimate the acoustic models for speech recognition, which does not
    require a literal transcription of the training data. Even for the subsequent
    task of semantic analysis, only weak supervision is employed, whereby the training
    utterance is accompanied by a semantic label only, rather than a literal transcription.
    Results on two databases, one of them containing dysarthric speech, are presented
    showing that Markov logic networks and conditional random fields substantially
    outperform other machine learning approaches. Markov logic networks have proved
    to be especially robust to recognition errors, which are caused by imprecise articulation
    in dysarthric speech.
author:
- first_name: Vladimir
  full_name: Despotovic, Vladimir
  last_name: Despotovic
- first_name: Oliver
  full_name: Walter, Oliver
  last_name: Walter
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Despotovic V, Walter O, Haeb-Umbach R. Machine learning techniques for semantic
    analysis of dysarthric speech: An experimental study. <i>Speech Communication
    99 (2018) 242-251 (Elsevier B.V.)</i>. 2018.'
  apa: 'Despotovic, V., Walter, O., &#38; Haeb-Umbach, R. (2018). Machine learning
    techniques for semantic analysis of dysarthric speech: An experimental study.
    <i>Speech Communication 99 (2018) 242-251 (Elsevier B.V.)</i>.'
  bibtex: '@article{Despotovic_Walter_Haeb-Umbach_2018, title={Machine learning techniques
    for semantic analysis of dysarthric speech: An experimental study}, journal={Speech
    Communication 99 (2018) 242-251 (Elsevier B.V.)}, author={Despotovic, Vladimir
    and Walter, Oliver and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: 'Despotovic, Vladimir, Oliver Walter, and Reinhold Haeb-Umbach. “Machine
    Learning Techniques for Semantic Analysis of Dysarthric Speech: An Experimental
    Study.” <i>Speech Communication 99 (2018) 242-251 (Elsevier B.V.)</i>, 2018.'
  ieee: 'V. Despotovic, O. Walter, and R. Haeb-Umbach, “Machine learning techniques
    for semantic analysis of dysarthric speech: An experimental study,” <i>Speech
    Communication 99 (2018) 242-251 (Elsevier B.V.)</i>, 2018.'
  mla: 'Despotovic, Vladimir, et al. “Machine Learning Techniques for Semantic Analysis
    of Dysarthric Speech: An Experimental Study.” <i>Speech Communication 99 (2018)
    242-251 (Elsevier B.V.)</i>, 2018.'
  short: V. Despotovic, O. Walter, R. Haeb-Umbach, Speech Communication 99 (2018)
    242-251 (Elsevier B.V.) (2018).
date_created: 2019-07-12T05:30:44Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/SpeechCommunication_2018_Walter_Paper.pdf
oa: '1'
publication: Speech Communication 99 (2018) 242-251 (Elsevier B.V.)
status: public
title: 'Machine learning techniques for semantic analysis of dysarthric speech: An
  experimental study'
type: journal_article
user_id: '44006'
year: '2018'
...
---
_id: '12898'
abstract:
- lang: eng
  text: Deep clustering (DC) and deep attractor networks (DANs) are a data-driven
    approach to monaural blind source separation. Both approaches provide astonishing single
    channel performance but have not yet been generalized to block-online processing.
    When separating speech in a continuous stream with a block-online algorithm, it
    needs to be determined in each block which of the output streams belongs to whom.
    In this contribution we solve this block permutation problem by introducing an
    additional speaker identification embedding to the DAN model structure. We motivate
    this model decision by analyzing the embedding topology of DC and DANs and show,
    that DC and DANs themselves are not sufficient for speaker identification. This
    model structure (a) improves the signal to distortion ratio (SDR) over a DAN baseline
    and (b) provides up to 61% and up to 34% relative reduction in permutation error
    rate and re-identification error rate compared to an i-vector baseline, respectively.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Thilo
  full_name: von Neumann, Thilo
  last_name: von Neumann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, von Neumann T, Haeb-Umbach R. Deep Attractor Networks for Speaker
    Re-Identification and Blind Source Separation. In: <i>ICASSP 2018, Calgary, Canada</i>.
    ; 2018.'
  apa: Drude, L., von Neumann, T., &#38; Haeb-Umbach, R. (2018). Deep Attractor Networks
    for Speaker Re-Identification and Blind Source Separation. In <i>ICASSP 2018,
    Calgary, Canada</i>.
  bibtex: '@inproceedings{Drude_von Neumann_Haeb-Umbach_2018, title={Deep Attractor
    Networks for Speaker Re-Identification and Blind Source Separation}, booktitle={ICASSP
    2018, Calgary, Canada}, author={Drude, Lukas and von Neumann, Thilo and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Thilo von Neumann, and Reinhold Haeb-Umbach. “Deep Attractor
    Networks for Speaker Re-Identification and Blind Source Separation.” In <i>ICASSP
    2018, Calgary, Canada</i>, 2018.
  ieee: L. Drude, T. von Neumann, and R. Haeb-Umbach, “Deep Attractor Networks for
    Speaker Re-Identification and Blind Source Separation,” in <i>ICASSP 2018, Calgary,
    Canada</i>, 2018.
  mla: Drude, Lukas, et al. “Deep Attractor Networks for Speaker Re-Identification
    and Blind Source Separation.” <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  short: 'L. Drude, T. von Neumann, R. Haeb-Umbach, in: ICASSP 2018, Calgary, Canada,
    2018.'
date_created: 2019-07-30T14:22:53Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Slides.pdf
status: public
title: Deep Attractor Networks for Speaker Re-Identification and Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '12900'
abstract:
- lang: eng
  text: 'Deep attractor networks (DANs) are a recently introduced method to blindly
    separate sources from spectral features of a monaural recording using bidirectional
    long short-term memory networks (BLSTMs). Due to the nature of BLSTMs, this approach
    is inherently not online-ready, and resorting to operating on blocks yields a block
    permutation problem in that the index of each speaker may change between blocks.
    We here propose the joint modeling of spatial and spectral features to solve the
    block permutation problem and generalize DANs to multi-channel meeting recordings:
    The DAN acts as a spectral feature extractor for a subsequent model-based clustering
    approach. We first analyze different joint models in batch-processing scenarios
    and finally propose a block-online blind source separation algorithm. The efficacy
    of the proposed models is demonstrated on reverberant mixtures corrupted by real
    recordings of multi-channel background noise. We demonstrate that both the proposed
    batch-processing and the proposed block-online system outperform (a) a spatial-only
    model with a state-of-the-art frequency permutation solver and (b) a spectral-only
    model with an oracle block permutation solver in terms of signal to distortion
    ratio (SDR) gains.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Takuya
  full_name: Higuchi, Takuya
  last_name: Higuchi
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Higuchi T, Kinoshita K, Nakatani T, Haeb-Umbach R. Dual Frequency-
    and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source
    Separation. In: <i>ICASSP 2018, Calgary, Canada</i>. ; 2018.'
  apa: Drude, L., Higuchi, T., Kinoshita, K., Nakatani, T., &#38; Haeb-Umbach,
    R. (2018). Dual Frequency- and Block-Permutation Alignment for Deep Learning Based
    Block-Online Blind Source Separation. In <i>ICASSP 2018, Calgary, Canada</i>.
  bibtex: '@inproceedings{Drude_Higuchi_Kinoshita_Nakatani_Haeb-Umbach_2018, title={Dual
    Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
    Blind Source Separation}, booktitle={ICASSP 2018, Calgary, Canada}, author={Drude,
    Lukas and Higuchi, Takuya and Kinoshita, Keisuke and Nakatani, Tomohiro and
    Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Takuya Higuchi, Keisuke Kinoshita, Tomohiro Nakatani,
    and Reinhold Haeb-Umbach. “Dual Frequency- and Block-Permutation Alignment for
    Deep Learning Based Block-Online Blind Source Separation.” In <i>ICASSP 2018,
    Calgary, Canada</i>, 2018.
  ieee: L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, and R. Haeb-Umbach,
    “Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
    Blind Source Separation,” in <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  mla: Drude, Lukas, et al. “Dual Frequency- and Block-Permutation Alignment for Deep
    Learning Based Block-Online Blind Source Separation.” <i>ICASSP 2018, Calgary,
    Canada</i>, 2018.
  short: 'L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, R. Haeb-Umbach, in:
    ICASSP 2018, Calgary, Canada, 2018.'
date_created: 2019-07-30T14:42:15Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Poster.pdf
status: public
title: Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
  Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '12901'
abstract:
- lang: eng
  text: This work examines acoustic beamformers employing neural networks (NNs) for
    mask prediction as front-end for automatic speech recognition (ASR) systems for
    practical scenarios like voice-enabled home devices. To test the versatility of
    the mask predicting network, the system is evaluated with different recording
    hardware, different microphone array designs, and different acoustic models of
    the downstream ASR system. Significant gains in recognition accuracy are obtained
    in all configurations despite the fact that the NN had been trained on mismatched
    data. Unlike previous work, the NN is trained on a feature level objective, which
    gives some performance advantage over a mask related criterion. Furthermore, different
    approaches for realizing online, or adaptive, NN-based beamforming are explored,
    where the online algorithms still show significant gains compared to the baseline
    performance.
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Hakan
  full_name: Erdogan, Hakan
  last_name: Erdogan
- first_name: Takuya
  full_name: Yoshioka, Takuya
  last_name: Yoshioka
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Erdogan H, Yoshioka T, Haeb-Umbach R. Exploring Practical Aspects
    of Neural Mask-Based Beamforming for Far-Field Speech Recognition. In: <i>ICASSP
    2018, Calgary, Canada</i>. ; 2018.'
  apa: Boeddeker, C., Erdogan, H., Yoshioka, T., &#38; Haeb-Umbach, R. (2018). Exploring
    Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition.
    In <i>ICASSP 2018, Calgary, Canada</i>.
  bibtex: '@inproceedings{Boeddeker_Erdogan_Yoshioka_Haeb-Umbach_2018, title={Exploring
    Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition},
    booktitle={ICASSP 2018, Calgary, Canada}, author={Boeddeker, Christoph and Erdogan,
    Hakan and Yoshioka, Takuya and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Boeddeker, Christoph, Hakan Erdogan, Takuya Yoshioka, and Reinhold Haeb-Umbach.
    “Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech
    Recognition.” In <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  ieee: C. Boeddeker, H. Erdogan, T. Yoshioka, and R. Haeb-Umbach, “Exploring Practical
    Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition,” in
    <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  mla: Boeddeker, Christoph, et al. “Exploring Practical Aspects of Neural Mask-Based
    Beamforming for Far-Field Speech Recognition.” <i>ICASSP 2018, Calgary, Canada</i>,
    2018.
  short: 'C. Boeddeker, H. Erdogan, T. Yoshioka, R. Haeb-Umbach, in: ICASSP 2018,
    Calgary, Canada, 2018.'
date_created: 2019-07-30T14:53:58Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Boeddeker_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Boeddeker_Slides.pdf
status: public
title: Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field
  Speech Recognition
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '29923'
abstract:
- lang: eng
  text: 'This paper introduces a new open source platform for end-to-end speech processing
    named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition
    (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch,
    as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style
    for data processing, feature extraction/format, and recipes to provide a complete
    setup for speech recognition and other speech processing experiments. This paper
    explains the major architecture of this software platform, several important functionalities
    which differentiate ESPnet from other open source ASR toolkits, and experimental
    results with major ASR benchmarks.'
author:
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Takaaki
  full_name: Hori, Takaaki
  last_name: Hori
- first_name: Shigeki
  full_name: Karita, Shigeki
  last_name: Karita
- first_name: Tomoki
  full_name: Hayashi, Tomoki
  last_name: Hayashi
- first_name: Jiro
  full_name: Nishitoba, Jiro
  last_name: Nishitoba
- first_name: Yuya
  full_name: Unno, Yuya
  last_name: Unno
- first_name: Nelson
  full_name: Enrique Yalta Soplin, Nelson
  last_name: Enrique Yalta Soplin
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Matthew
  full_name: Wiesner, Matthew
  last_name: Wiesner
- first_name: Nanxin
  full_name: Chen, Nanxin
  last_name: Chen
- first_name: Adithya
  full_name: Renduchintala, Adithya
  last_name: Renduchintala
- first_name: Tsubasa
  full_name: Ochiai, Tsubasa
  last_name: Ochiai
citation:
  ama: 'Watanabe S, Hori T, Karita S, et al. ESPnet: End-to-End Speech Processing
    Toolkit. In: <i>INTERSPEECH 2018, Hyderabad, India</i>. ; 2018:2207–2211. doi:<a
    href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>'
  apa: 'Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y.,
    Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A.,
    &#38; Ochiai, T. (2018). ESPnet: End-to-End Speech Processing Toolkit. <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2207–2211. <a href="https://doi.org/10.21437/Interspeech.2018-1456">https://doi.org/10.21437/Interspeech.2018-1456</a>'
  bibtex: '@inproceedings{Watanabe_Hori_Karita_Hayashi_Nishitoba_Unno_Enrique Yalta
    Soplin_Heymann_Wiesner_Chen_et al._2018, title={ESPnet: End-to-End Speech Processing
    Toolkit}, DOI={<a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>},
    booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Watanabe, Shinji and Hori,
    Takaaki and Karita, Shigeki and Hayashi, Tomoki and Nishitoba, Jiro and Unno,
    Yuya and Enrique Yalta Soplin, Nelson and Heymann, Jahn and Wiesner, Matthew and
    Chen, Nanxin and et al.}, year={2018}, pages={2207–2211} }'
  chicago: 'Watanabe, Shinji, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba,
    Yuya Unno, Nelson Enrique Yalta Soplin, et al. “ESPnet: End-to-End Speech Processing
    Toolkit.” In <i>INTERSPEECH 2018, Hyderabad, India</i>, 2207–2211, 2018. <a href="https://doi.org/10.21437/Interspeech.2018-1456">https://doi.org/10.21437/Interspeech.2018-1456</a>.'
  ieee: 'S. Watanabe <i>et al.</i>, “ESPnet: End-to-End Speech Processing Toolkit,”
    in <i>INTERSPEECH 2018, Hyderabad, India</i>, 2018, pp. 2207–2211, doi: <a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>.'
  mla: 'Watanabe, Shinji, et al. “ESPnet: End-to-End Speech Processing Toolkit.” <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018, pp. 2207–2211, doi:<a href="https://doi.org/10.21437/Interspeech.2018-1456">10.21437/Interspeech.2018-1456</a>.'
  short: 'S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique
    Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in:
    INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211.'
date_created: 2022-02-21T10:34:37Z
date_updated: 2023-01-11T11:23:19Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/Interspeech.2018-1456
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2022-02-23T08:03:13Z
  date_updated: 2022-02-23T08:03:13Z
  file_id: '29954'
  file_name: INTERSPEECH_2018_Heymann_Paper.pdf
  file_size: 288907
  relation: main_file
file_date_updated: 2022-02-23T08:03:13Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 2207–2211
publication: INTERSPEECH 2018, Hyderabad, India
status: public
title: 'ESPnet: End-to-End Speech Processing Toolkit'
type: conference
user_id: '59789'
year: '2018'
...
---
_id: '12899'
abstract:
- lang: eng
  text: This contribution presents a speech enhancement system for the CHiME-5 Dinner
    Party Scenario. The front-end employs multi-channel linear time-variant filtering
    and achieves its gains without the use of a neural network. We present an adaptation
    of blind source separation techniques to the CHiME-5 database which we call Guided
    Source Separation (GSS). Using the baseline acoustic and language model, the combination
    of Weighted Prediction Error based dereverberation, guided source separation,
    and beamforming reduces the WER by 10.54% (relative) for the single array track
    and by 21.12% (relative) on the multiple array track.
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Heitkaemper J, Schmalenstroeer J, Drude L, Heymann J, Haeb-Umbach
    R. Front-End Processing for the CHiME-5 Dinner Party Scenario. In: <i>Proc. CHiME
    2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India</i>.
    ; 2018.'
  apa: Boeddeker, C., Heitkaemper, J., Schmalenstroeer, J., Drude, L., Heymann, J.,
    &#38; Haeb-Umbach, R. (2018). Front-End Processing for the CHiME-5 Dinner Party
    Scenario. <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
    Hyderabad, India</i>.
  bibtex: '@inproceedings{Boeddeker_Heitkaemper_Schmalenstroeer_Drude_Heymann_Haeb-Umbach_2018,
    title={Front-End Processing for the CHiME-5 Dinner Party Scenario}, booktitle={Proc.
    CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
    India}, author={Boeddeker, Christoph and Heitkaemper, Jens and Schmalenstroeer,
    Joerg and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2018}
    }'
  chicago: Boeddeker, Christoph, Jens Heitkaemper, Joerg Schmalenstroeer, Lukas Drude,
    Jahn Heymann, and Reinhold Haeb-Umbach. “Front-End Processing for the CHiME-5
    Dinner Party Scenario.” In <i>Proc. CHiME 2018 Workshop on Speech Processing in
    Everyday Environments, Hyderabad, India</i>, 2018.
  ieee: C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, and
    R. Haeb-Umbach, “Front-End Processing for the CHiME-5 Dinner Party Scenario,”
    in <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
    Hyderabad, India</i>, 2018.
  mla: Boeddeker, Christoph, et al. “Front-End Processing for the CHiME-5 Dinner Party
    Scenario.” <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
    Hyderabad, India</i>, 2018.
  short: 'C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann,
    R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday
    Environments, Hyderabad, India, 2018.'
date_created: 2019-07-30T14:35:15Z
date_updated: 2023-10-26T08:14:15Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
  Hyderabad, India
quality_controlled: '1'
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Poster.pdf
status: public
title: Front-End Processing for the CHiME-5 Dinner Party Scenario
type: conference
user_id: '460'
year: '2018'
...
---
_id: '6859'
abstract:
- lang: eng
  text: 'Signal processing in WASNs is based on a software framework for hosting the
    algorithms as well as on a set of wirelessly connected devices representing the
    hardware. Each of the nodes contributes memory, processing power, communication
    bandwidth and some sensor information for the tasks to be solved on the network.
    In this paper we present our MARVELO framework for distributed signal processing.
    It is intended for transforming existing centralized implementations into distributed
    versions. To this end, the software only needs a block-oriented implementation,
    which MARVELO picks up and distributes on the network. Additionally, our sensor
    node hardware and the audio interfaces responsible for multi-channel recordings
    are presented.'
author:
- first_name: Haitham
  full_name: Afifi, Haitham
  id: '65718'
  last_name: Afifi
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Joerg
  full_name: Ullmann, Joerg
  id: '16256'
  last_name: Ullmann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Holger
  full_name: Karl, Holger
  id: '126'
  last_name: Karl
citation:
  ama: 'Afifi H, Schmalenstroeer J, Ullmann J, Haeb-Umbach R, Karl H. MARVELO - A
    Framework for Signal Processing in Wireless Acoustic Sensor Networks. In: <i>Speech
    Communication; 13th ITG-Symposium</i>. ; 2018:1-5.'
  apa: Afifi, H., Schmalenstroeer, J., Ullmann, J., Haeb-Umbach, R., &#38; Karl, H.
    (2018). MARVELO - A Framework for Signal Processing in Wireless Acoustic Sensor
    Networks. <i>Speech Communication; 13th ITG-Symposium</i>, 1–5.
  bibtex: '@inproceedings{Afifi_Schmalenstroeer_Ullmann_Haeb-Umbach_Karl_2018, title={MARVELO
    - A Framework for Signal Processing in Wireless Acoustic Sensor Networks}, booktitle={Speech
    Communication; 13th ITG-Symposium}, author={Afifi, Haitham and Schmalenstroeer,
    Joerg and Ullmann, Joerg and Haeb-Umbach, Reinhold and Karl, Holger}, year={2018},
    pages={1–5} }'
  chicago: Afifi, Haitham, Joerg Schmalenstroeer, Joerg Ullmann, Reinhold Haeb-Umbach,
    and Holger Karl. “MARVELO - A Framework for Signal Processing in Wireless Acoustic
    Sensor Networks.” In <i>Speech Communication; 13th ITG-Symposium</i>, 1–5, 2018.
  ieee: H. Afifi, J. Schmalenstroeer, J. Ullmann, R. Haeb-Umbach, and H. Karl, “MARVELO
    - A Framework for Signal Processing in Wireless Acoustic Sensor Networks,” in
    <i>Speech Communication; 13th ITG-Symposium</i>, 2018, pp. 1–5.
  mla: Afifi, Haitham, et al. “MARVELO - A Framework for Signal Processing in Wireless
    Acoustic Sensor Networks.” <i>Speech Communication; 13th ITG-Symposium</i>, 2018,
    pp. 1–5.
  short: 'H. Afifi, J. Schmalenstroeer, J. Ullmann, R. Haeb-Umbach, H. Karl, in: Speech
    Communication; 13th ITG-Symposium, 2018, pp. 1–5.'
date_created: 2019-01-17T15:47:35Z
date_updated: 2023-10-26T08:15:32Z
department:
- _id: '75'
- _id: '54'
language:
- iso: eng
page: 1-5
project:
- _id: '27'
  name: 'Akustische Sensornetzwerke - Teilprojekt "Verteilte akustische Signalverarbeitung
    über funkbasierte Sensornetzwerke"'
publication: Speech Communication; 13th ITG-Symposium
quality_controlled: '1'
status: public
title: MARVELO - A Framework for Signal Processing in Wireless Acoustic Sensor Networks
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11747'
abstract:
- lang: eng
  text: In this paper, we present a neural network based classification algorithm
    for the discrimination of moving from stationary targets in the sight of an automotive
    radar sensor. Compared to existing algorithms, the proposed algorithm can take
    into account multiple local radar targets instead of performing classification
    inference on each target individually, resulting in superior discrimination accuracy.
    This is especially suitable for non-rigid objects, like pedestrians, which in
    general have a wide velocity spread when multiple targets are detected.
author:
- first_name: Christopher
  full_name: Grimm, Christopher
  last_name: Grimm
- first_name: Tobias
  full_name: Breddermann, Tobias
  last_name: Breddermann
- first_name: Ridha
  full_name: Farhoud, Ridha
  last_name: Farhoud
- first_name: Tai
  full_name: Fei, Tai
  last_name: Fei
- first_name: Ernst
  full_name: Warsitz, Ernst
  last_name: Warsitz
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Grimm C, Breddermann T, Farhoud R, Fei T, Warsitz E, Haeb-Umbach R. Discrimination
    of Stationary from Moving Targets with Recurrent Neural Networks in Automotive
    Radar. In: <i>International Conference on Microwaves for Intelligent Mobility
    (ICMIM) 2018</i>. ; 2018.'
  apa: Grimm, C., Breddermann, T., Farhoud, R., Fei, T., Warsitz, E., &#38; Haeb-Umbach,
    R. (2018). Discrimination of Stationary from Moving Targets with Recurrent Neural
    Networks in Automotive Radar. <i>International Conference on Microwaves for Intelligent
    Mobility (ICMIM) 2018</i>.
  bibtex: '@inproceedings{Grimm_Breddermann_Farhoud_Fei_Warsitz_Haeb-Umbach_2018,
    title={Discrimination of Stationary from Moving Targets with Recurrent Neural
    Networks in Automotive Radar}, booktitle={International Conference on Microwaves
    for Intelligent Mobility (ICMIM) 2018}, author={Grimm, Christopher and Breddermann,
    Tobias and Farhoud, Ridha and Fei, Tai and Warsitz, Ernst and Haeb-Umbach, Reinhold},
    year={2018} }'
  chicago: Grimm, Christopher, Tobias Breddermann, Ridha Farhoud, Tai Fei, Ernst Warsitz,
    and Reinhold Haeb-Umbach. “Discrimination of Stationary from Moving Targets with
    Recurrent Neural Networks in Automotive Radar.” In <i>International Conference
    on Microwaves for Intelligent Mobility (ICMIM) 2018</i>, 2018.
  ieee: C. Grimm, T. Breddermann, R. Farhoud, T. Fei, E. Warsitz, and R. Haeb-Umbach,
    “Discrimination of Stationary from Moving Targets with Recurrent Neural Networks
    in Automotive Radar,” in <i>International Conference on Microwaves for Intelligent
    Mobility (ICMIM) 2018</i>, 2018.
  mla: Grimm, Christopher, et al. “Discrimination of Stationary from Moving Targets
    with Recurrent Neural Networks in Automotive Radar.” <i>International Conference
    on Microwaves for Intelligent Mobility (ICMIM) 2018</i>, 2018.
  short: 'C. Grimm, T. Breddermann, R. Farhoud, T. Fei, E. Warsitz, R. Haeb-Umbach,
    in: International Conference on Microwaves for Intelligent Mobility (ICMIM) 2018,
    2018.'
date_created: 2019-07-12T05:27:29Z
date_updated: 2023-11-20T16:37:39Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICMIM_2018_Haeb-Umbach_Paper.pdf
oa: '1'
publication: International Conference on Microwaves for Intelligent Mobility (ICMIM)
  2018
quality_controlled: '1'
status: public
title: Discrimination of Stationary from Moving Targets with Recurrent Neural Networks
  in Automotive Radar
type: conference
user_id: '242'
year: '2018'
...
---
_id: '11907'
abstract:
- lang: eng
  text: The invention of the Variational Autoencoder enables the application of Neural
    Networks to a wide range of tasks in unsupervised learning, including the field
    of Acoustic Unit Discovery (AUD). The recently proposed Hidden Markov Model Variational
    Autoencoder (HMMVAE) allows a joint training of a neural network based feature
    extractor and a structured prior for the latent space given by a Hidden Markov
    Model. It has been shown that the HMMVAE significantly outperforms pure GMM-HMM
    based systems on the AUD task. However, the HMMVAE cannot autonomously infer the
    number of acoustic units and thus relies on the GMM-HMM system for initialization.
    This paper introduces the Bayesian Hidden Markov Model Variational Autoencoder
    (BHMMVAE) which solves these issues by embedding the HMMVAE in a Bayesian framework
    with a Dirichlet Process Prior for the distribution of the acoustic units, and
    diagonal or full-covariance Gaussians as emission distributions. Experiments on
    TIMIT and Xitsonga show that the BHMMVAE is able to autonomously infer a reasonable
    number of acoustic units, can be initialized without supervision by a GMM-HMM
    system, achieves computationally efficient stochastic variational inference by
    using natural gradient descent, and, additionally, improves the AUD performance
    over the HMMVAE.
author:
- first_name: Thomas
  full_name: Glarner, Thomas
  id: '14169'
  last_name: Glarner
- first_name: Patrick
  full_name: Hanebrink, Patrick
  last_name: Hanebrink
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Glarner T, Hanebrink P, Ebbers J, Haeb-Umbach R. Full Bayesian Hidden Markov
    Model Variational Autoencoder for Acoustic Unit Discovery. In: <i>INTERSPEECH
    2018, Hyderabad, India</i>. ; 2018.'
  apa: Glarner, T., Hanebrink, P., Ebbers, J., &#38; Haeb-Umbach, R. (2018). Full
    Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.
    <i>INTERSPEECH 2018, Hyderabad, India</i>.
  bibtex: '@inproceedings{Glarner_Hanebrink_Ebbers_Haeb-Umbach_2018, title={Full Bayesian
    Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery}, booktitle={INTERSPEECH
    2018, Hyderabad, India}, author={Glarner, Thomas and Hanebrink, Patrick and Ebbers,
    Janek and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Glarner, Thomas, Patrick Hanebrink, Janek Ebbers, and Reinhold Haeb-Umbach.
    “Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.”
    In <i>INTERSPEECH 2018, Hyderabad, India</i>, 2018.
  ieee: T. Glarner, P. Hanebrink, J. Ebbers, and R. Haeb-Umbach, “Full Bayesian Hidden
    Markov Model Variational Autoencoder for Acoustic Unit Discovery,” in <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018.
  mla: Glarner, Thomas, et al. “Full Bayesian Hidden Markov Model Variational Autoencoder
    for Acoustic Unit Discovery.” <i>INTERSPEECH 2018, Hyderabad, India</i>, 2018.
  short: 'T. Glarner, P. Hanebrink, J. Ebbers, R. Haeb-Umbach, in: INTERSPEECH 2018,
    Hyderabad, India, 2018.'
date_created: 2019-07-12T05:30:34Z
date_updated: 2023-11-22T08:29:22Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Glarner_Paper.pdf
oa: '1'
publication: INTERSPEECH 2018, Hyderabad, India
quality_controlled: '1'
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Glarner_Slides.pdf
status: public
title: Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit
  Discovery
type: conference
user_id: '34851'
year: '2018'
...
---
_id: '11838'
abstract:
- lang: eng
  text: Distributed sensor data acquisition usually encompasses data sampling by the
    individual devices, where each of them has its own oscillator driving the local
    sampling process, resulting in slightly different sampling rates at the individual
    sensor nodes. Nevertheless, for certain downstream signal processing tasks it
    is important to compensate even for small sampling rate offsets. Aligning the
    sampling rates of oscillators which differ only by a few parts-per-million, is,
    however, challenging and quite different from traditional multirate signal processing
    tasks. In this paper we propose to transfer a precise but computationally demanding
    time domain approach, inspired by the Nyquist-Shannon sampling theorem, to an
    efficient frequency domain implementation. To this end a buffer control is employed
    which compensates for sampling offsets which are multiples of the sampling period,
    while a digital filter, realized by the well-known Overlap-Save method, handles
    the fractional part of the sampling phase offset. With experiments on artificially
    misaligned data we investigate the parametrization, the efficiency, and the induced
    distortions of the proposed resampling method. It is shown that a favorable compromise
    between residual distortion and computational complexity is achieved, compared
    to other sampling rate offset compensation techniques.
author:
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Schmalenstroeer J, Haeb-Umbach R. Efficient Sampling Rate Offset Compensation
    - An Overlap-Save Based Approach. In: <i>26th European Signal Processing Conference
    (EUSIPCO 2018)</i>. ; 2018.'
  apa: Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2018). Efficient Sampling Rate
    Offset Compensation - An Overlap-Save Based Approach. <i>26th European Signal
    Processing Conference (EUSIPCO 2018)</i>.
  bibtex: '@inproceedings{Schmalenstroeer_Haeb-Umbach_2018, title={Efficient Sampling
    Rate Offset Compensation - An Overlap-Save Based Approach}, booktitle={26th European
    Signal Processing Conference (EUSIPCO 2018)}, author={Schmalenstroeer, Joerg and
    Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Efficient Sampling Rate
    Offset Compensation - An Overlap-Save Based Approach.” In <i>26th European Signal
    Processing Conference (EUSIPCO 2018)</i>, 2018.
  ieee: J. Schmalenstroeer and R. Haeb-Umbach, “Efficient Sampling Rate Offset Compensation
    - An Overlap-Save Based Approach,” in <i>26th European Signal Processing Conference
    (EUSIPCO 2018)</i>, 2018.
  mla: Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Efficient Sampling Rate
    Offset Compensation - An Overlap-Save Based Approach.” <i>26th European Signal
    Processing Conference (EUSIPCO 2018)</i>, 2018.
  short: 'J. Schmalenstroeer, R. Haeb-Umbach, in: 26th European Signal Processing
    Conference (EUSIPCO 2018), 2018.'
date_created: 2019-07-12T05:29:14Z
date_updated: 2023-10-26T08:12:33Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/Eusipco_2018_Schmalenstroeer_Paper.pdf
oa: '1'
publication: 26th European Signal Processing Conference (EUSIPCO 2018)
quality_controlled: '1'
status: public
title: Efficient Sampling Rate Offset Compensation - An Overlap-Save Based Approach
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11876'
abstract:
- lang: eng
  text: This paper describes the systems for the single-array track and the multiple-array
    track of the 5th CHiME Challenge. The final system is a combination of multiple
    systems, using Confusion Network Combination (CNC). The different systems presented
    here utilize different front-ends and training sets for a Bidirectional
    Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced
    by enhancements provided by Paderborn University [1]. The back-end has been implemented
    using RASR [2] and RETURNN [3]. Additionally, a system combination including the
    hypothesis word graphs from the system of the submission [1] has been performed,
    which results in the final best system.
author:
- first_name: Markus
  full_name: Kitza, Markus
  last_name: Kitza
- first_name: Wilfried
  full_name: Michel, Wilfried
  last_name: Michel
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Tobias
  full_name: Menne, Tobias
  last_name: Menne
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Hermann
  full_name: Ney, Hermann
  last_name: Ney
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kitza M, Michel W, Boeddeker C, et al. The RWTH/UPB System Combination for
    the CHiME 2018 Workshop. In: <i>Proc. CHiME 2018 Workshop on Speech Processing
    in Everyday Environments, Hyderabad, India</i>. ; 2018.'
  apa: Kitza, M., Michel, W., Boeddeker, C., Heitkaemper, J., Menne, T., Schlüter,
    R., Ney, H., Schmalenstroeer, J., Drude, L., Heymann, J., &#38; Haeb-Umbach, R.
    (2018). The RWTH/UPB System Combination for the CHiME 2018 Workshop. <i>Proc.
    CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
    India</i>.
  bibtex: '@inproceedings{Kitza_Michel_Boeddeker_Heitkaemper_Menne_Schlüter_Ney_Schmalenstroeer_Drude_Heymann_et
    al._2018, title={The RWTH/UPB System Combination for the CHiME 2018 Workshop},
    booktitle={Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
    Hyderabad, India}, author={Kitza, Markus and Michel, Wilfried and Boeddeker, Christoph
    and Heitkaemper, Jens and Menne, Tobias and Schlüter, Ralf and Ney, Hermann and
    Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and et al.}, year={2018}
    }'
  chicago: Kitza, Markus, Wilfried Michel, Christoph Boeddeker, Jens Heitkaemper,
    Tobias Menne, Ralf Schlüter, Hermann Ney, et al. “The RWTH/UPB System Combination
    for the CHiME 2018 Workshop.” In <i>Proc. CHiME 2018 Workshop on Speech Processing
    in Everyday Environments, Hyderabad, India</i>, 2018.
  ieee: M. Kitza <i>et al.</i>, “The RWTH/UPB System Combination for the CHiME 2018
    Workshop,” 2018.
  mla: Kitza, Markus, et al. “The RWTH/UPB System Combination for the CHiME 2018 Workshop.”
    <i>Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
    India</i>, 2018.
  short: 'M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter,
    H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME
    2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India,
    2018.'
date_created: 2019-07-12T05:29:58Z
date_updated: 2023-10-26T08:12:14Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_RWTH_Paper.pdf
oa: '1'
publication: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
  Hyderabad, India
quality_controlled: '1'
status: public
title: The RWTH/UPB System Combination for the CHiME 2018 Workshop
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11836'
abstract:
- lang: eng
  text: Due to their distributed nature wireless acoustic sensor networks offer great
    potential for improved signal acquisition, processing and classification for applications
    such as monitoring and surveillance, home automation, or hands-free telecommunication.
    To reduce the communication demand with a central server and to raise the privacy
    level it is desirable to perform processing at node level. The limited processing
    and memory capabilities on a sensor node, however, stand in contrast to the compute
    and memory intensive deep learning algorithms used in modern speech and audio
    processing. In this work, we perform benchmarking of commonly used convolutional
    and recurrent neural network architectures on a Raspberry Pi based acoustic sensor
    node. We show that it is possible to run medium-sized neural network topologies
    used for speech enhancement and speech recognition in real time. For acoustic
    event recognition, where predictions in a lower temporal resolution are sufficient,
    it is even possible to run current state-of-the-art deep convolutional models
    with a real-time factor of 0.11.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Heitkaemper J, Schmalenstroeer J, Haeb-Umbach R. Benchmarking Neural
    Network Architectures for Acoustic Sensor Networks. In: <i>ITG 2018, Oldenburg,
    Germany</i>. ; 2018.'
  apa: Ebbers, J., Heitkaemper, J., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2018).
    Benchmarking Neural Network Architectures for Acoustic Sensor Networks. <i>ITG
    2018, Oldenburg, Germany</i>.
  bibtex: '@inproceedings{Ebbers_Heitkaemper_Schmalenstroeer_Haeb-Umbach_2018, title={Benchmarking
    Neural Network Architectures for Acoustic Sensor Networks}, booktitle={ITG 2018,
    Oldenburg, Germany}, author={Ebbers, Janek and Heitkaemper, Jens and Schmalenstroeer,
    Joerg and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Ebbers, Janek, Jens Heitkaemper, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach.
    “Benchmarking Neural Network Architectures for Acoustic Sensor Networks.” In <i>ITG
    2018, Oldenburg, Germany</i>, 2018.
  ieee: J. Ebbers, J. Heitkaemper, J. Schmalenstroeer, and R. Haeb-Umbach, “Benchmarking
    Neural Network Architectures for Acoustic Sensor Networks,” 2018.
  mla: Ebbers, Janek, et al. “Benchmarking Neural Network Architectures for Acoustic
    Sensor Networks.” <i>ITG 2018, Oldenburg, Germany</i>, 2018.
  short: 'J. Ebbers, J. Heitkaemper, J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018,
    Oldenburg, Germany, 2018.'
date_created: 2019-07-12T05:29:11Z
date_updated: 2023-10-26T08:12:40Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Ebbers_Paper.pdf
oa: '1'
publication: ITG 2018, Oldenburg, Germany
quality_controlled: '1'
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Ebbers_Poster.pdf
status: public
title: Benchmarking Neural Network Architectures for Acoustic Sensor Networks
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11839'
abstract:
- lang: eng
  text: It has been experimentally verified that sampling rate offsets (SROs) between
    the input channels of an acoustic beamformer have a detrimental effect on the
    achievable SNR gains. In this paper we derive an analytic model to study the impact
    of SRO on the estimation of the spatial noise covariance matrix used in MVDR beamforming.
    It is shown that a perfect compensation of the SRO is impossible if the noise
    covariance matrix is estimated by time averaging, even if the SRO is perfectly
    known. The SRO should therefore be compensated for prior to beamformer coefficient
    estimation. We present a novel scheme where SRO compensation and beamforming closely
    interact, saving some computational effort compared to separate SRO adjustment
    followed by acoustic beamforming.
author:
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Schmalenstroeer J, Haeb-Umbach R. Insights into the Interplay of Sampling
    Rate Offsets and MVDR Beamforming. In: <i>ITG 2018, Oldenburg, Germany</i>. ;
    2018.'
  apa: Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2018). Insights into the Interplay
    of Sampling Rate Offsets and MVDR Beamforming. <i>ITG 2018, Oldenburg, Germany</i>.
  bibtex: '@inproceedings{Schmalenstroeer_Haeb-Umbach_2018, title={Insights into the
    Interplay of Sampling Rate Offsets and MVDR Beamforming}, booktitle={ITG 2018,
    Oldenburg, Germany}, author={Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold},
    year={2018} }'
  chicago: Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Insights into the Interplay
    of Sampling Rate Offsets and MVDR Beamforming.” In <i>ITG 2018, Oldenburg, Germany</i>,
    2018.
  ieee: J. Schmalenstroeer and R. Haeb-Umbach, “Insights into the Interplay of Sampling
    Rate Offsets and MVDR Beamforming,” 2018.
  mla: Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Insights into the Interplay
    of Sampling Rate Offsets and MVDR Beamforming.” <i>ITG 2018, Oldenburg, Germany</i>,
    2018.
  short: 'J. Schmalenstroeer, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany, 2018.'
date_created: 2019-07-12T05:29:15Z
date_updated: 2023-10-26T08:12:22Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Schmalenstroeer_Paper.pdf
oa: '1'
publication: ITG 2018, Oldenburg, Germany
quality_controlled: '1'
status: public
title: Insights into the Interplay of Sampling Rate Offsets and MVDR Beamforming
type: conference
user_id: '460'
year: '2018'
...
