---
_id: '20753'
abstract:
- lang: eng
  text: 'In this paper we present our system for the detection and classification
    of acoustic scenes and events (DCASE) 2020 Challenge Task 4: Sound event detection
    and separation in domestic environments. We introduce two new models: the forward-backward
    convolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional
    neural network (CNN). The FBCRNN employs two recurrent neural network (RNN) classifiers
    sharing the same CNN for preprocessing. With one RNN processing a recording in
    forward direction and the other in backward direction, the two networks are trained
    to jointly predict audio tags, i.e., weak labels, at each time step within a recording,
    given that at each time step they have jointly processed the whole recording.
    The proposed training encourages the classifiers to tag events as soon as possible.
    Therefore, after training, the networks can be applied to shorter audio segments
    of, e.g., 200ms, allowing sound event detection (SED). Further, we propose a tag-conditioned
    CNN to complement SED. It is trained to predict strong labels while using (predicted)
    tags, i.e., weak labels, as additional input. For training, pseudo strong labels
    from an FBCRNN ensemble are used. The presented system achieved fourth place in
    the systems ranking and third place in the teams ranking. Subsequent improvements
    allow our system to even outperform the challenge baseline and winner systems
    on average by 18.0% and 2.2% event-based F1-score, respectively, on the validation
    set. Source code is publicly available at https://github.com/fgnt/pb_sed.'
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Haeb-Umbach R. Forward-Backward Convolutional Recurrent Neural Networks
    and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised
    Sound Event Detection. In: <i>Proceedings of the Detection and Classification
    of Acoustic Scenes and Events 2020 Workshop (DCASE2020)</i>. ; 2020.'
  apa: Ebbers, J., &#38; Haeb-Umbach, R. (2020). Forward-Backward Convolutional Recurrent
    Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled
    Semi-Supervised Sound Event Detection. <i>Proceedings of the Detection and Classification
    of Acoustic Scenes and Events 2020 Workshop (DCASE2020)</i>.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_2020, title={Forward-Backward Convolutional
    Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for
    Weakly Labeled Semi-Supervised Sound Event Detection}, booktitle={Proceedings
    of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop
    (DCASE2020)}, author={Ebbers, Janek and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: Ebbers, Janek, and Reinhold Haeb-Umbach. “Forward-Backward Convolutional
    Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for
    Weakly Labeled Semi-Supervised Sound Event Detection.” In <i>Proceedings of the
    Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020)</i>,
    2020.
  ieee: J. Ebbers and R. Haeb-Umbach, “Forward-Backward Convolutional Recurrent Neural
    Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled
    Semi-Supervised Sound Event Detection,” 2020.
  mla: Ebbers, Janek, and Reinhold Haeb-Umbach. “Forward-Backward Convolutional Recurrent
    Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled
    Semi-Supervised Sound Event Detection.” <i>Proceedings of the Detection and Classification
    of Acoustic Scenes and Events 2020 Workshop (DCASE2020)</i>, 2020.
  short: 'J. Ebbers, R. Haeb-Umbach, in: Proceedings of the Detection and Classification
    of Acoustic Scenes and Events 2020 Workshop (DCASE2020), 2020.'
date_created: 2020-12-16T08:55:27Z
date_updated: 2023-11-22T08:27:32Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-12-16T08:57:22Z
  date_updated: 2020-12-16T08:57:22Z
  file_id: '20754'
  file_name: DCASE2020Workshop_Ebbers_Paper.pdf
  file_size: 108326
  relation: main_file
file_date_updated: 2020-12-16T08:57:22Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proceedings of the Detection and Classification of Acoustic Scenes and
  Events 2020 Workshop (DCASE2020)
quality_controlled: '1'
status: public
title: Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned
  Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection
type: conference
user_id: '34851'
year: '2020'
...
---
_id: '20695'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Nakatani T, Kinoshita K, Haeb-Umbach R. Jointly Optimal Dereverberation
    and Beamforming. In: <i>ICASSP 2020 - 2020 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. ; 2020. doi:<a href="https://doi.org/10.1109/icassp40776.2020.9054393">10.1109/icassp40776.2020.9054393</a>'
  apa: Boeddeker, C., Nakatani, T., Kinoshita, K., &#38; Haeb-Umbach, R. (2020). Jointly
    Optimal Dereverberation and Beamforming. <i>ICASSP 2020 - 2020 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp40776.2020.9054393">https://doi.org/10.1109/icassp40776.2020.9054393</a>
  bibtex: '@inproceedings{Boeddeker_Nakatani_Kinoshita_Haeb-Umbach_2020, title={Jointly
    Optimal Dereverberation and Beamforming}, DOI={<a href="https://doi.org/10.1109/icassp40776.2020.9054393">10.1109/icassp40776.2020.9054393</a>},
    booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Boeddeker, Christoph and Nakatani, Tomohiro
    and Kinoshita, Keisuke and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: Boeddeker, Christoph, Tomohiro Nakatani, Keisuke Kinoshita, and Reinhold
    Haeb-Umbach. “Jointly Optimal Dereverberation and Beamforming.” In <i>ICASSP 2020
    - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, 2020. <a href="https://doi.org/10.1109/icassp40776.2020.9054393">https://doi.org/10.1109/icassp40776.2020.9054393</a>.
  ieee: 'C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, “Jointly Optimal
    Dereverberation and Beamforming,” 2020, doi: <a href="https://doi.org/10.1109/icassp40776.2020.9054393">10.1109/icassp40776.2020.9054393</a>.'
  mla: Boeddeker, Christoph, et al. “Jointly Optimal Dereverberation and Beamforming.”
    <i>ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>, 2020, doi:<a href="https://doi.org/10.1109/icassp40776.2020.9054393">10.1109/icassp40776.2020.9054393</a>.
  short: 'C. Boeddeker, T. Nakatani, K. Kinoshita, R. Haeb-Umbach, in: ICASSP 2020
    - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP), 2020.'
date_created: 2020-12-11T12:28:49Z
date_updated: 2024-11-14T09:17:32Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp40776.2020.9054393
file:
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2020-12-11T12:32:44Z
  date_updated: 2020-12-11T12:32:44Z
  file_id: '20698'
  file_name: convBF.pdf
  file_size: 200127
  relation: main_file
file_date_updated: 2020-12-11T12:32:44Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_identifier:
  isbn:
  - '9781509066315'
publication_status: published
status: public
title: Jointly Optimal Dereverberation and Beamforming
type: conference
user_id: '40767'
year: '2020'
...
---
_id: '17762'
abstract:
- lang: eng
  text: 'When acoustic signal processing is combined with automated learning: Communications
    engineers are working with multiple microphones and deep neural networks toward
    better speech recognition under the most adverse conditions. In the long term,
    digital voice assistants could also benefit from such sensor networks.'
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Haeb-Umbach R. Lektionen für Alexa &#38; Co?! <i>forschung</i>. 2019;44(1):12-15.
    doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>
  apa: Haeb-Umbach, R. (2019). Lektionen für Alexa &#38; Co?! <i>Forschung</i>, <i>44</i>(1),
    12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>
  bibtex: '@article{Haeb-Umbach_2019, title={Lektionen für Alexa &#38; Co?!}, volume={44},
    DOI={<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>},
    number={1}, journal={forschung}, author={Haeb-Umbach, Reinhold}, year={2019},
    pages={12–15} }'
  chicago: 'Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>Forschung</i>
    44, no. 1 (2019): 12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>.'
  ieee: R. Haeb-Umbach, “Lektionen für Alexa &#38; Co?!,” <i>forschung</i>, vol.
    44, no. 1, pp. 12–15, 2019.
  mla: Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>Forschung</i>,
    vol. 44, no. 1, 2019, pp. 12–15, doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>.
  short: R. Haeb-Umbach, Forschung 44 (2019) 12–15.
date_created: 2020-08-10T09:51:09Z
date_updated: 2022-01-06T06:53:19Z
department:
- _id: '54'
doi: 10.1002/fors.201970104
intvolume: '44'
issue: '1'
language:
- iso: eng
page: 12-15
publication: forschung
status: public
title: Lektionen für Alexa & Co?!
type: journal_article
user_id: '44006'
volume: 44
year: '2019'
...
---
_id: '19446'
abstract:
- lang: eng
  text: 'We present a multi-channel database of overlapping speech for training, evaluation,
    and detailed analysis of source separation and extraction algorithms: SMS-WSJ
    -- Spatialized Multi-Speaker Wall Street Journal. It consists of artificially
    mixed speech taken from the WSJ database, but unlike earlier databases we consider
    all WSJ0+1 utterances and take care of strictly separating the speaker sets present
    in the training, validation and test sets. When spatializing the data we ensure
    a high degree of randomness w.r.t. room size, array center and rotation, as well
    as speaker position. Furthermore, this paper offers a critical assessment of recently
    proposed measures of source separation performance. Alongside the code to generate
    the database we provide a source separation baseline and a Kaldi recipe with competitive
    word error rates to provide common ground for evaluation.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  last_name: Drude
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heitkaemper J, Boeddeker C, Haeb-Umbach R. SMS-WSJ: Database, performance
    measures, and baseline recipe for multi-channel source separation and recognition.
    <i>ArXiv e-prints</i>. 2019.'
  apa: 'Drude, L., Heitkaemper, J., Boeddeker, C., &#38; Haeb-Umbach, R. (2019). SMS-WSJ:
    Database, performance measures, and baseline recipe for multi-channel source separation
    and recognition. <i>ArXiv E-Prints</i>.'
  bibtex: '@article{Drude_Heitkaemper_Boeddeker_Haeb-Umbach_2019, title={SMS-WSJ:
    Database, performance measures, and baseline recipe for multi-channel source separation
    and recognition}, journal={ArXiv e-prints}, author={Drude, Lukas and Heitkaemper,
    Jens and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: 'Drude, Lukas, Jens Heitkaemper, Christoph Boeddeker, and Reinhold Haeb-Umbach.
    “SMS-WSJ: Database, Performance Measures, and Baseline Recipe for Multi-Channel
    Source Separation and Recognition.” <i>ArXiv E-Prints</i>, 2019.'
  ieee: 'L. Drude, J. Heitkaemper, C. Boeddeker, and R. Haeb-Umbach, “SMS-WSJ: Database,
    performance measures, and baseline recipe for multi-channel source separation
    and recognition,” <i>ArXiv e-prints</i>, 2019.'
  mla: 'Drude, Lukas, et al. “SMS-WSJ: Database, Performance Measures, and Baseline
    Recipe for Multi-Channel Source Separation and Recognition.” <i>ArXiv E-Prints</i>,
    2019.'
  short: L. Drude, J. Heitkaemper, C. Boeddeker, R. Haeb-Umbach, ArXiv E-Prints (2019).
date_created: 2020-09-16T07:59:46Z
date_updated: 2022-01-06T06:54:04Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-09-16T08:00:56Z
  date_updated: 2020-12-11T12:22:31Z
  file_id: '19448'
  file_name: ArXiv_2019_Drude.pdf
  file_size: 288594
  relation: main_file
file_date_updated: 2020-12-11T12:22:31Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ArXiv e-prints
status: public
title: 'SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel
  source separation and recognition'
type: journal_article
user_id: '40767'
year: '2019'
...
---
_id: '11965'
abstract:
- lang: eng
  text: 'We present an unsupervised training approach for a neural network-based mask
    estimator in an acoustic beamforming application. The network is trained to maximize
    a likelihood criterion derived from a spatial mixture model of the observations.
    It is trained from scratch without requiring any parallel data consisting of degraded
    input and clean training targets. Thus, training can be carried out on real recordings
    of noisy speech rather than simulated ones. In contrast to previous work on unsupervised
    training of neural mask estimators, our approach avoids the need for a possibly
    pre-trained teacher model entirely. We demonstrate the effectiveness of our approach
    by speech recognition experiments on two different datasets: one mainly deteriorated
    by noise (CHiME 4) and one by reverberation (REVERB). The results show that the
    performance of the proposed system is on par with a supervised system using oracle
    target masks for training and with a system trained using a model-based teacher.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heymann J, Haeb-Umbach R. Unsupervised training of neural mask-based
    beamforming. In: <i>INTERSPEECH 2019, Graz, Austria</i>. ; 2019.'
  apa: Drude, L., Heymann, J., &#38; Haeb-Umbach, R. (2019). Unsupervised training
    of neural mask-based beamforming. In <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Drude_Heymann_Haeb-Umbach_2019, title={Unsupervised training
    of neural mask-based beamforming}, booktitle={INTERSPEECH 2019, Graz, Austria},
    author={Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2019}
    }'
  chicago: Drude, Lukas, Jahn Heymann, and Reinhold Haeb-Umbach. “Unsupervised Training
    of Neural Mask-Based Beamforming.” In <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.
  ieee: L. Drude, J. Heymann, and R. Haeb-Umbach, “Unsupervised training of neural
    mask-based beamforming,” in <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  mla: Drude, Lukas, et al. “Unsupervised Training of Neural Mask-Based Beamforming.”
    <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  short: 'L. Drude, J. Heymann, R. Haeb-Umbach, in: INTERSPEECH 2019, Graz, Austria,
    2019.'
date_created: 2019-07-18T09:11:39Z
date_updated: 2022-01-06T06:51:14Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-13T06:36:44Z
  date_updated: 2019-08-13T06:41:35Z
  file_id: '12914'
  file_name: INTERSPEECH_2019_Drude_Paper.pdf
  file_size: 223413
  relation: main_file
file_date_updated: 2019-08-13T06:41:35Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: Unsupervised training of neural mask-based beamforming
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12874'
abstract:
- lang: eng
  text: We propose a training scheme to train neural network-based source separation
    algorithms from scratch when parallel clean data is unavailable. In particular,
    we demonstrate that an unsupervised spatial clustering algorithm is sufficient
    to guide the training of a deep clustering system. We argue that previous work
    on deep clustering requires strong supervision and elaborate on why this is a
    limitation. We demonstrate that (a) the single-channel deep clustering system
    trained according to the proposed scheme alone is able to achieve a similar performance
    as the multi-channel teacher in terms of word error rates and (b) initializing
    the spatial clustering approach with the deep clustering result yields a relative
    word error rate reduction of 26% over the unsupervised teacher.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Daniel
  full_name: Hasenklever, Daniel
  last_name: Hasenklever
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Hasenklever D, Haeb-Umbach R. Unsupervised Training of a Deep Clustering
    Model for Multichannel Blind Source Separation. In: <i>ICASSP 2019, Brighton,
    UK</i>. ; 2019.'
  apa: Drude, L., Hasenklever, D., &#38; Haeb-Umbach, R. (2019). Unsupervised Training
    of a Deep Clustering Model for Multichannel Blind Source Separation. In <i>ICASSP
    2019, Brighton, UK</i>.
  bibtex: '@inproceedings{Drude_Hasenklever_Haeb-Umbach_2019, title={Unsupervised
    Training of a Deep Clustering Model for Multichannel Blind Source Separation},
    booktitle={ICASSP 2019, Brighton, UK}, author={Drude, Lukas and Hasenklever, Daniel
    and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Drude, Lukas, Daniel Hasenklever, and Reinhold Haeb-Umbach. “Unsupervised
    Training of a Deep Clustering Model for Multichannel Blind Source Separation.”
    In <i>ICASSP 2019, Brighton, UK</i>, 2019.
  ieee: L. Drude, D. Hasenklever, and R. Haeb-Umbach, “Unsupervised Training of a
    Deep Clustering Model for Multichannel Blind Source Separation,” in <i>ICASSP
    2019, Brighton, UK</i>, 2019.
  mla: Drude, Lukas, et al. “Unsupervised Training of a Deep Clustering Model for
    Multichannel Blind Source Separation.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'L. Drude, D. Hasenklever, R. Haeb-Umbach, in: ICASSP 2019, Brighton, UK,
    2019.'
date_created: 2019-07-23T07:37:54Z
date_updated: 2022-01-06T06:51:21Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-14T07:19:13Z
  date_updated: 2019-08-14T07:19:13Z
  file_id: '12925'
  file_name: ICASSP_2019_Drude_Paper.pdf
  file_size: 368225
  relation: main_file
file_date_updated: 2019-08-14T07:19:13Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Unsupervised Training of a Deep Clustering Model for Multichannel Blind Source
  Separation
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12875'
abstract:
- lang: eng
  text: Signal dereverberation using the Weighted Prediction Error (WPE) method has
    been proven to be an effective means to raise the accuracy of far-field speech
    recognition. First proposed as an iterative algorithm, follow-up works have reformulated
    it as a recursive least squares algorithm and therefore enabled its use in online
    applications. For this algorithm, the estimation of the power spectral density
    (PSD) of the anechoic signal plays an important role and strongly influences its
    performance. Recently, we showed that using a neural network PSD estimator leads
    to improved performance for online automatic speech recognition. This, however,
    comes at a price. To train the network, we require parallel data, i.e., utterances
    simultaneously available in clean and reverberated form. Here we propose to overcome
    this limitation by training the network jointly with the acoustic model of the
    speech recognizer. To be specific, the gradients computed from the cross-entropy
    loss between the target senone sequence and the acoustic model network output
    are backpropagated through the complex-valued dereverberation filter estimation
    to the neural network for PSD estimation. Evaluation on two databases demonstrates
    improved performance for online processing scenarios while imposing fewer requirements
    on the available training data and thus widening the range of applications.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Joint Optimization
    of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online
    ASR. In: <i>ICASSP 2019, Brighton, UK</i>. ; 2019.'
  apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., &#38; Nakatani, T.
    (2019). Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
    Model for Robust Online ASR. In <i>ICASSP 2019, Brighton, UK</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2019, title={Joint
    Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
    Robust Online ASR}, booktitle={ICASSP 2019, Brighton, UK}, author={Heymann, Jahn
    and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
    Tomohiro}, year={2019} }'
  chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
    Tomohiro Nakatani. “Joint Optimization of Neural Network-Based WPE Dereverberation
    and Acoustic Model for Robust Online ASR.” In <i>ICASSP 2019, Brighton, UK</i>,
    2019.
  ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Joint
    Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
    Robust Online ASR,” in <i>ICASSP 2019, Brighton, UK</i>, 2019.
  mla: Heymann, Jahn, et al. “Joint Optimization of Neural Network-Based WPE Dereverberation
    and Acoustic Model for Robust Online ASR.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: ICASSP
    2019, Brighton, UK, 2019.'
date_created: 2019-07-23T07:42:26Z
date_updated: 2022-01-06T06:51:22Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-12-17T07:28:06Z
  date_updated: 2019-12-17T07:28:06Z
  file_id: '15334'
  file_name: ICASSP_2019_Heymann_Paper.pdf
  file_size: 199109
  relation: main_file
file_date_updated: 2019-12-17T07:28:06Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
  Model for Robust Online ASR
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12876'
abstract:
- lang: eng
  text: In this paper, we present libDirectional, a MATLAB library for directional
    statistics and directional estimation. It supports a variety of commonly used
    distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped
    Cauchy distributions. Furthermore, various distributions on higher-dimensional
    manifolds such as the unit hypersphere and the hypertorus are available. Based
    on these distributions, several recursive filtering algorithms in libDirectional
    allow estimation on these manifolds. The functionality is implemented in a clear,
    well-documented, and object-oriented structure that is both easy to use and easy
    to extend.
author:
- first_name: Gerhard
  full_name: Kurz, Gerhard
  last_name: Kurz
- first_name: Igor
  full_name: Gilitschenski, Igor
  last_name: Gilitschenski
- first_name: Florian
  full_name: Pfaff, Florian
  last_name: Pfaff
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Uwe D.
  full_name: Hanebeck, Uwe D.
  last_name: Hanebeck
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Roland Y.
  full_name: Siegwart, Roland Y.
  last_name: Siegwart
citation:
  ama: 'Kurz G, Gilitschenski I, Pfaff F, et al. Directional Statistics and Filtering
    Using libDirectional. In: <i>Journal of Statistical Software 89(4)</i>. ; 2019.'
  apa: Kurz, G., Gilitschenski, I., Pfaff, F., Drude, L., Hanebeck, U. D., Haeb-Umbach,
    R., &#38; Siegwart, R. Y. (2019). Directional Statistics and Filtering Using libDirectional.
    In <i>Journal of Statistical Software 89(4)</i>.
  bibtex: '@inproceedings{Kurz_Gilitschenski_Pfaff_Drude_Hanebeck_Haeb-Umbach_Siegwart_2019,
    title={Directional Statistics and Filtering Using libDirectional}, booktitle={Journal
    of Statistical Software 89(4)}, author={Kurz, Gerhard and Gilitschenski, Igor
    and Pfaff, Florian and Drude, Lukas and Hanebeck, Uwe D. and Haeb-Umbach, Reinhold
    and Siegwart, Roland Y.}, year={2019} }'
  chicago: Kurz, Gerhard, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck,
    Reinhold Haeb-Umbach, and Roland Y. Siegwart. “Directional Statistics and Filtering
    Using LibDirectional.” In <i>Journal of Statistical Software 89(4)</i>, 2019.
  ieee: G. Kurz <i>et al.</i>, “Directional Statistics and Filtering Using libDirectional,”
    in <i>Journal of Statistical Software 89(4)</i>, 2019.
  mla: Kurz, Gerhard, et al. “Directional Statistics and Filtering Using LibDirectional.”
    <i>Journal of Statistical Software 89(4)</i>, 2019.
  short: 'G. Kurz, I. Gilitschenski, F. Pfaff, L. Drude, U.D. Hanebeck, R. Haeb-Umbach,
    R.Y. Siegwart, in: Journal of Statistical Software 89(4), 2019.'
date_created: 2019-07-23T07:44:59Z
date_updated: 2022-01-06T06:51:22Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-14T07:16:05Z
  date_updated: 2019-08-14T07:16:05Z
  file_id: '12923'
  file_name: JournalofStatisticalSoftware_2019_Drude_Paper.pdf
  file_size: 1522964
  relation: main_file
file_date_updated: 2019-08-14T07:16:05Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: Journal of Statistical Software 89(4)
status: public
title: Directional Statistics and Filtering Using libDirectional
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12890'
abstract:
- lang: eng
  text: 'We formulate a generic framework for blind source separation (BSS), which
    allows integrating data-driven spectro-temporal methods, such as deep clustering
    and deep attractor networks, with physically motivated probabilistic spatial methods,
    such as complex angular central Gaussian mixture models. The integrated model
    exploits the complementary strengths of the two approaches to BSS: the strong
    modeling power of neural networks, which, however, is based on supervised learning,
    and the ease of unsupervised learning of the spatial mixture models whose few
    parameters can be estimated on as little as a single segment of a real mixture
    of speech. Experiments are carried out on both artificially mixed speech and true
    recordings of speech mixtures. The experiments verify that the integrated models
    consistently outperform the individual components. We further extend the models
    to cope with noisy, reverberant speech and introduce a cross-domain teacher–student
    training where the mixture model serves as the teacher to provide training targets
    for the student neural network.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Drude L, Haeb-Umbach R. Integration of Neural Networks and Probabilistic Spatial
    Models for Acoustic Blind Source Separation. <i>IEEE Journal of Selected Topics
    in Signal Processing</i>. 2019. doi:<a href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>
  apa: Drude, L., &#38; Haeb-Umbach, R. (2019). Integration of Neural Networks and
    Probabilistic Spatial Models for Acoustic Blind Source Separation. <i>IEEE Journal
    of Selected Topics in Signal Processing</i>. <a href="https://doi.org/10.1109/JSTSP.2019.2912565">https://doi.org/10.1109/JSTSP.2019.2912565</a>
  bibtex: '@article{Drude_Haeb-Umbach_2019, title={Integration of Neural Networks
    and Probabilistic Spatial Models for Acoustic Blind Source Separation}, DOI={<a
    href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>},
    journal={IEEE Journal of Selected Topics in Signal Processing}, author={Drude,
    Lukas and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks
    and Probabilistic Spatial Models for Acoustic Blind Source Separation.” <i>IEEE
    Journal of Selected Topics in Signal Processing</i>, 2019. <a href="https://doi.org/10.1109/JSTSP.2019.2912565">https://doi.org/10.1109/JSTSP.2019.2912565</a>.
  ieee: L. Drude and R. Haeb-Umbach, “Integration of Neural Networks and Probabilistic
    Spatial Models for Acoustic Blind Source Separation,” <i>IEEE Journal of Selected
    Topics in Signal Processing</i>, 2019.
  mla: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks and
    Probabilistic Spatial Models for Acoustic Blind Source Separation.” <i>IEEE Journal
    of Selected Topics in Signal Processing</i>, 2019, doi:<a href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>.
  short: L. Drude, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal Processing
    (2019).
date_created: 2019-07-26T08:38:46Z
date_updated: 2022-01-06T06:51:23Z
ddc:
- '050'
department:
- _id: '54'
doi: 10.1109/JSTSP.2019.2912565
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-07T07:12:21Z
  date_updated: 2019-08-14T07:11:22Z
  file_id: '12903'
  file_name: IEEE Jounal_2019_Drude_Paper.pdf
  file_size: 967424
  relation: main_file
file_date_updated: 2019-08-14T07:11:22Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: IEEE Journal of Selected Topics in Signal Processing
publication_identifier:
  eissn:
  - 1941-0484
status: public
title: Integration of Neural Networks and Probabilistic Spatial Models for Acoustic
  Blind Source Separation
type: journal_article
user_id: '11213'
year: '2019'
...
---
_id: '15812'
abstract:
- lang: eng
  text: Connectionist temporal classification (CTC) is a sequence-level loss that
    has been successfully applied to train recurrent neural network (RNN) models for
    automatic speech recognition. However, one major weakness of CTC is the conditional
    independence assumption that makes it difficult for the model to learn label dependencies.
    In this paper, we propose stimulated CTC, which uses stimulated learning to help
    CTC models learn label dependencies implicitly by using an auxiliary RNN to generate
    the appropriate stimuli. These stimuli come in the form of an additional stimulation
    loss term, which encourages the model to learn said label dependencies. The auxiliary
    network is only used during training and the inference model has the same structure
    as a standard CTC model. The proposed stimulated CTC model achieves about 35%
    relative character error rate improvements on a synthetic gesture keyboard recognition
    task and over 30% relative word error rate improvements on the Librispeech automatic
    speech recognition tasks over a baseline model trained with CTC only.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  last_name: Heymann
- first_name: Khe Chai
  full_name: Sim, Khe Chai
  last_name: Sim
- first_name: Bo
  full_name: Li, Bo
  last_name: Li
citation:
  ama: 'Heymann J, Sim KC, Li B. Improving CTC Using Stimulated Learning for Sequence
    Modeling. In: <i>ICASSP 2019, Brighton, UK</i>. ; 2019.'
  apa: Heymann, J., Sim, K. C., &#38; Li, B. (2019). Improving CTC Using Stimulated
    Learning for Sequence Modeling. In <i>ICASSP 2019, Brighton, UK</i>.
  bibtex: '@inproceedings{Heymann_Sim_Li_2019, title={Improving CTC Using Stimulated
    Learning for Sequence Modeling}, booktitle={ICASSP 2019, Brighton, UK}, author={Heymann,
    Jahn and Sim, Khe Chai and Li, Bo}, year={2019} }'
  chicago: Heymann, Jahn, Khe Chai Sim, and Bo Li. “Improving CTC Using Stimulated
    Learning for Sequence Modeling.” In <i>ICASSP 2019, Brighton, UK</i>, 2019.
  ieee: J. Heymann, K. C. Sim, and B. Li, “Improving CTC Using Stimulated Learning
    for Sequence Modeling,” in <i>ICASSP 2019, Brighton, UK</i>, 2019.
  mla: Heymann, Jahn, et al. “Improving CTC Using Stimulated Learning for Sequence
    Modeling.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'J. Heymann, K.C. Sim, B. Li, in: ICASSP 2019, Brighton, UK, 2019.'
date_created: 2020-02-06T07:22:47Z
date_updated: 2022-01-06T06:52:35Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:24:26Z
  date_updated: 2020-02-06T07:24:26Z
  file_id: '15813'
  file_name: ICASSP_2019_Heymann_1_Paper.pdf
  file_size: 239665
  relation: main_file
file_date_updated: 2020-02-06T07:24:26Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Improving CTC Using Stimulated Learning for Sequence Modeling
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '15816'
abstract:
- lang: eng
  text: 'Despite the strong modeling power of neural network acoustic models, speech
    enhancement has been shown to deliver additional word error rate improvements
    if multi-channel data is available. However, there has been a longstanding debate
    whether enhancement should also be carried out on the ASR training data. In an
    extensive experimental evaluation on the acoustically very challenging CHiME-5
    dinner party data we show that: (i) cleaning up the training data can lead to
    substantial error rate reductions, and (ii) enhancement in training is advisable
    as long as enhancement in test is at least as strong as in training. This approach
    stands in contrast and delivers larger gains than the common strategy reported
    in the literature to augment the training database with additional artificially
    degraded speech. Together with an acoustic model topology consisting of initial
    CNN layers followed by factorized TDNN layers, we achieve a new single-system
    state-of-the-art result on the CHiME-5 data with 41.6% and 43.2% WER on the DEV
    and EVAL test sets, respectively. This is an 8% relative improvement compared to
    the best word error rate published so far for a speech recognizer without system combination.
author:
- first_name: Catalin
  full_name: Zorila, Catalin
  last_name: Zorila
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Zorila C, Boeddeker C, Doddipatla R, Haeb-Umbach R. An Investigation Into
    the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party
    Transcription. In: <i>ASRU 2019, Sentosa, Singapore</i>. ; 2019.'
  apa: Zorila, C., Boeddeker, C., Doddipatla, R., &#38; Haeb-Umbach, R. (2019). An
    Investigation Into the Effectiveness of Enhancement in ASR Training and Test for
    Chime-5 Dinner Party Transcription. In <i>ASRU 2019, Sentosa, Singapore</i>.
  bibtex: '@inproceedings{Zorila_Boeddeker_Doddipatla_Haeb-Umbach_2019, title={An
    Investigation Into the Effectiveness of Enhancement in ASR Training and Test for
    Chime-5 Dinner Party Transcription}, booktitle={ASRU 2019, Sentosa, Singapore},
    author={Zorila, Catalin and Boeddeker, Christoph and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2019} }'
  chicago: Zorila, Catalin, Christoph Boeddeker, Rama Doddipatla, and Reinhold Haeb-Umbach.
    “An Investigation Into the Effectiveness of Enhancement in ASR Training and Test
    for Chime-5 Dinner Party Transcription.” In <i>ASRU 2019, Sentosa, Singapore</i>,
    2019.
  ieee: C. Zorila, C. Boeddeker, R. Doddipatla, and R. Haeb-Umbach, “An Investigation
    Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner
    Party Transcription,” in <i>ASRU 2019, Sentosa, Singapore</i>, 2019.
  mla: Zorila, Catalin, et al. “An Investigation Into the Effectiveness of Enhancement
    in ASR Training and Test for Chime-5 Dinner Party Transcription.” <i>ASRU 2019,
    Sentosa, Singapore</i>, 2019.
  short: 'C. Zorila, C. Boeddeker, R. Doddipatla, R. Haeb-Umbach, in: ASRU 2019, Sentosa,
    Singapore, 2019.'
date_created: 2020-02-06T07:35:08Z
date_updated: 2022-01-06T06:52:37Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:42:42Z
  date_updated: 2020-02-06T07:42:42Z
  file_id: '15817'
  file_name: ASRU_2019_Boeddeker_Paper.pdf
  file_size: 200256
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:42:55Z
  date_updated: 2020-02-06T07:42:55Z
  file_id: '15818'
  file_name: ASRU_2019_Boeddeker_Poster.pdf
  file_size: 123963
  relation: main_file
file_date_updated: 2020-02-06T07:42:55Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ASRU 2019, Sentosa, Singapore
status: public
title: An Investigation Into the Effectiveness of Enhancement in ASR Training and
  Test for Chime-5 Dinner Party Transcription
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14822'
abstract:
- lang: eng
  text: Multi-talker speech and moving speakers still pose a significant challenge
    to automatic speech recognition systems. Assuming an enrollment utterance of the
    target speaker is available, the so-called SpeakerBeam concept has recently been
    proposed to extract the target speaker from a speech mixture. If multi-channel
    input is available, spatial properties of the speaker can be exploited to support
    the source extraction. In this contribution we investigate different approaches
    to exploit such spatial information. In particular, we are interested in the question
    of how useful this information is if the target speaker changes his/her position.
    To this end, we present a SpeakerBeam-based source extraction network that is
    adapted to work on moving speakers by recursively updating the beamformer coefficients.
    Experimental results are presented on two data sets, one with artificially created
    room impulse responses, and one with real room impulse responses and noise recorded
    in a conference room. Interestingly, spatial features turn out to be advantageous
    even if the speaker position changes.
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Thomas
  full_name: Feher, Thomas
  last_name: Feher
- first_name: Michael
  full_name: Freitag, Michael
  last_name: Freitag
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Feher T, Freitag M, Haeb-Umbach R. A Study on Online Source
    Extraction in the Presence of Changing Speaker Positions. In: <i>International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia</i>.
    ; 2019.'
  apa: Heitkaemper, J., Feher, T., Freitag, M., &#38; Haeb-Umbach, R. (2019). A Study
    on Online Source Extraction in the Presence of Changing Speaker Positions. In
    <i>International Conference on Statistical Language and Speech Processing 2019,
    Ljubljana, Slovenia</i>.
  bibtex: '@inproceedings{Heitkaemper_Feher_Freitag_Haeb-Umbach_2019, title={A Study
    on Online Source Extraction in the Presence of Changing Speaker Positions}, booktitle={International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia},
    author={Heitkaemper, Jens and Feher, Thomas and Freitag, Michael and Haeb-Umbach,
    Reinhold}, year={2019} }'
  chicago: Heitkaemper, Jens, Thomas Feher, Michael Freitag, and Reinhold Haeb-Umbach.
    “A Study on Online Source Extraction in the Presence of Changing Speaker Positions.”
    In <i>International Conference on Statistical Language and Speech Processing 2019,
    Ljubljana, Slovenia</i>, 2019.
  ieee: J. Heitkaemper, T. Feher, M. Freitag, and R. Haeb-Umbach, “A Study on Online
    Source Extraction in the Presence of Changing Speaker Positions,” in <i>International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia</i>,
    2019.
  mla: Heitkaemper, Jens, et al. “A Study on Online Source Extraction in the Presence
    of Changing Speaker Positions.” <i>International Conference on Statistical Language
    and Speech Processing 2019, Ljubljana, Slovenia</i>, 2019.
  short: 'J. Heitkaemper, T. Feher, M. Freitag, R. Haeb-Umbach, in: International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia,
    2019.'
date_created: 2019-11-06T09:43:03Z
date_updated: 2022-01-06T06:52:06Z
ddc:
- '006'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:02:26Z
  date_updated: 2019-11-08T07:47:12Z
  file_id: '14823'
  file_name: SLSP_2019_Heitkaemper_Paper.pdf
  file_size: 578595
  relation: main_file
file_date_updated: 2019-11-08T07:47:12Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: International Conference on Statistical Language and Speech Processing
  2019, Ljubljana, Slovenia
status: public
title: A Study on Online Source Extraction in the Presence of Changing Speaker Positions
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14824'
abstract:
- lang: eng
  text: This paper deals with multi-channel speech recognition in scenarios with multiple
    speakers. Recently, the spectral characteristics of a target speaker, extracted
    from an adaptation utterance, have been used to guide a neural network mask estimator
    to focus on that speaker. In this work we present two variants of speaker-aware
    neural networks, which exploit both spectral and spatial information to allow
    better discrimination between target and interfering speakers. Thus, we introduce
    either a spatial preprocessing prior to the mask estimation or a spatial plus
    spectral speaker characterization block whose output is directly fed into the
    neural mask estimator. The target speaker’s spectral and spatial signature is
    extracted from an adaptation utterance recorded at the beginning of a session.
    We further adapt the architecture for low-latency processing by means of block-online
    beamforming that recursively updates the signal statistics. Experimental results
    show that the additional spatial information clearly improves source extraction,
    in particular in the same-gender case, and that our proposal achieves state-of-the-art
    performance in terms of distortion reduction and recognition accuracy.
author:
- first_name: Juan M.
  full_name: Martin-Donas, Juan M.
  last_name: Martin-Donas
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Angel M.
  full_name: Gomez, Angel M.
  last_name: Gomez
- first_name: Antonio M.
  full_name: Peinado, Antonio M.
  last_name: Peinado
citation:
  ama: 'Martin-Donas JM, Heitkaemper J, Haeb-Umbach R, Gomez AM, Peinado AM. Multi-Channel
    Block-Online Source Extraction based on Utterance Adaptation. In: <i>INTERSPEECH
    2019, Graz, Austria</i>. ; 2019.'
  apa: Martin-Donas, J. M., Heitkaemper, J., Haeb-Umbach, R., Gomez, A. M., &#38;
    Peinado, A. M. (2019). Multi-Channel Block-Online Source Extraction based on Utterance
    Adaptation. In <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Martin-Donas_Heitkaemper_Haeb-Umbach_Gomez_Peinado_2019,
    title={Multi-Channel Block-Online Source Extraction based on Utterance Adaptation},
    booktitle={INTERSPEECH 2019, Graz, Austria}, author={Martin-Donas, Juan M. and
    Heitkaemper, Jens and Haeb-Umbach, Reinhold and Gomez, Angel M. and Peinado, Antonio
    M.}, year={2019} }'
  chicago: Martin-Donas, Juan M., Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M.
    Gomez, and Antonio M. Peinado. “Multi-Channel Block-Online Source Extraction Based
    on Utterance Adaptation.” In <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  ieee: J. M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A. M. Gomez, and A. M.
    Peinado, “Multi-Channel Block-Online Source Extraction based on Utterance Adaptation,”
    in <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  mla: Martin-Donas, Juan M., et al. “Multi-Channel Block-Online Source Extraction
    Based on Utterance Adaptation.” <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  short: 'J.M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A.M. Gomez, A.M. Peinado,
    in: INTERSPEECH 2019, Graz, Austria, 2019.'
date_created: 2019-11-06T10:04:49Z
date_updated: 2022-01-06T06:52:07Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:07:15Z
  date_updated: 2019-11-08T07:46:37Z
  file_id: '14825'
  file_name: INTERSPEECH_2019_Heitkaemper_Paper.pdf
  file_size: 225689
  relation: main_file
file_date_updated: 2019-11-08T07:46:37Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: Multi-Channel Block-Online Source Extraction based on Utterance Adaptation
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14826'
abstract:
- lang: eng
  text: In this paper, we present Hitachi and Paderborn University’s joint effort
    for automatic speech recognition (ASR) in a dinner party scenario. The main challenges
    of ASR systems for dinner party recordings obtained by multiple microphone arrays
    are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural
    conversational content, and possibly (4) insufficient training data. As an example
    of a dinner party scenario, we have chosen the data presented during the CHiME-5
    speech recognition challenge, where the baseline ASR had a 73.3% word error rate
    (WER), and even the best performing system at the CHiME-5 challenge had a 46.1%
    WER. We extensively investigated a combination of the guided source separation-based
    speech enhancement technique and an already proposed strong ASR backend and found
    that a tight combination of these techniques provided substantial accuracy improvements.
    Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation
    data, respectively, both of which are the best published results for the dataset.
    We also experimented with additional training data on top of the official small
    dataset in the CHiME-5 corpus to assess the intrinsic difficulty of this ASR task.
author:
- first_name: Naoyuki
  full_name: Kanda, Naoyuki
  last_name: Kanda
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Yusuke
  full_name: Fujita, Yusuke
  last_name: Fujita
- first_name: Shota
  full_name: Horiguchi, Shota
  last_name: Horiguchi
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kanda N, Boeddeker C, Heitkaemper J, Fujita Y, Horiguchi S, Haeb-Umbach R.
    Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
    Joint Investigation for Dinner Party ASR. In: <i>INTERSPEECH 2019, Graz, Austria</i>.
    ; 2019.'
  apa: 'Kanda, N., Boeddeker, C., Heitkaemper, J., Fujita, Y., Horiguchi, S., &#38;
    Haeb-Umbach, R. (2019). Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn
    University Joint Investigation for Dinner Party ASR. In <i>INTERSPEECH 2019, Graz,
    Austria</i>.'
  bibtex: '@inproceedings{Kanda_Boeddeker_Heitkaemper_Fujita_Horiguchi_Haeb-Umbach_2019,
    title={Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn
    University Joint Investigation for Dinner Party ASR}, booktitle={INTERSPEECH 2019,
    Graz, Austria}, author={Kanda, Naoyuki and Boeddeker, Christoph and Heitkaemper,
    Jens and Fujita, Yusuke and Horiguchi, Shota and Haeb-Umbach, Reinhold}, year={2019}
    }'
  chicago: 'Kanda, Naoyuki, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita,
    Shota Horiguchi, and Reinhold Haeb-Umbach. “Guided Source Separation Meets a Strong
    ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party
    ASR.” In <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.'
  ieee: 'N. Kanda, C. Boeddeker, J. Heitkaemper, Y. Fujita, S. Horiguchi, and R. Haeb-Umbach,
    “Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
    Joint Investigation for Dinner Party ASR,” in <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.'
  mla: 'Kanda, Naoyuki, et al. “Guided Source Separation Meets a Strong ASR Backend:
    Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.” <i>INTERSPEECH
    2019, Graz, Austria</i>, 2019.'
  short: 'N. Kanda, C. Boeddeker, J. Heitkaemper, Y. Fujita, S. Horiguchi, R. Haeb-Umbach,
    in: INTERSPEECH 2019, Graz, Austria, 2019.'
date_created: 2019-11-06T10:08:49Z
date_updated: 2022-01-06T06:52:07Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:10:23Z
  date_updated: 2019-11-08T07:45:15Z
  file_id: '14827'
  file_name: INTERSPEECH_2019_Boeddeker_Paper.pdf
  file_size: 216202
  relation: main_file
file_date_updated: 2019-11-08T07:45:15Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: 'Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
  Joint Investigation for Dinner Party ASR'
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '13271'
abstract:
- lang: eng
  text: Automatic meeting analysis comprises the tasks of speaker counting, speaker
    diarization, and the separation of overlapped speech, followed by automatic speech
    recognition. This all has to be carried out on arbitrarily long sessions and,
    ideally, in an online or block-online manner. While significant progress has been
    made on individual tasks, this paper presents for the first time an all-neural
    approach to simultaneous speaker counting, diarization and source separation.
    The NN-based estimator operates in a block-online fashion and tracks speakers
    even if they remain silent for a number of time blocks, thus learning a stable
    output order for the separated sources. The neural network is recurrent over time
    as well as over the number of sources. The simulation experiments show that state
    of the art separation performance is achieved, while at the same time delivering
    good diarization and source counting results. It even generalizes well to an unseen
    large number of blocks.
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  last_name: von Neumann
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Shoko
  full_name: Araki, Shoko
  last_name: Araki
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Delcroix M, Araki S, Nakatani T, Haeb-Umbach R.
    All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis.
    In: <i>ICASSP 2019, Brighton, UK</i>. ; 2019.'
  apa: von Neumann, T., Kinoshita, K., Delcroix, M., Araki, S., Nakatani, T., &#38;
    Haeb-Umbach, R. (2019). All-neural Online Source Separation, Counting, and Diarization
    for Meeting Analysis. In <i>ICASSP 2019, Brighton, UK</i>.
  bibtex: '@inproceedings{von Neumann_Kinoshita_Delcroix_Araki_Nakatani_Haeb-Umbach_2019,
    title={All-neural Online Source Separation, Counting, and Diarization for Meeting
    Analysis}, booktitle={ICASSP 2019, Brighton, UK}, author={von Neumann, Thilo and
    Kinoshita, Keisuke and Delcroix, Marc and Araki, Shoko and Nakatani, Tomohiro
    and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Neumann, Thilo von, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro
    Nakatani, and Reinhold Haeb-Umbach. “All-Neural Online Source Separation, Counting,
    and Diarization for Meeting Analysis.” In <i>ICASSP 2019, Brighton, UK</i>, 2019.
  ieee: T. von Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, and R. Haeb-Umbach,
    “All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis,”
    in <i>ICASSP 2019, Brighton, UK</i>, 2019.
  mla: von Neumann, Thilo, et al. “All-Neural Online Source Separation, Counting,
    and Diarization for Meeting Analysis.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'T. von Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, R. Haeb-Umbach,
    in: ICASSP 2019, Brighton, UK, 2019.'
date_created: 2019-09-18T08:20:50Z
date_updated: 2022-01-06T06:51:31Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-09-18T08:28:39Z
  date_updated: 2019-09-19T07:05:57Z
  file_id: '13272'
  file_name: ICASSP_2019_Neumann_Paper.pdf
  file_size: 126453
  relation: main_file
file_date_updated: 2019-09-19T07:05:57Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: ICASSP 2019, Brighton, UK
status: public
title: All-neural Online Source Separation, Counting, and Diarization for Meeting
  Analysis
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '15814'
abstract:
- lang: eng
  text: Once a popular theme of futuristic science fiction or far-fetched technology
    forecasts, digital home assistants with a spoken language interface have become
    a ubiquitous commodity today. This success has been made possible by major advancements
    in signal processing and machine learning for so-called far-field speech recognition,
    where the commands are spoken at a distance from the sound capturing device. The
    challenges encountered are quite unique and different from many other use cases
    of automatic speech recognition. The purpose of this tutorial article is to describe,
    in a way amenable to the non-specialist, the key speech processing algorithms
    that enable reliable fully hands-free speech interaction with digital home assistants.
    These technologies include multi-channel acoustic echo cancellation, microphone
    array processing and dereverberation techniques for signal enhancement, reliable
    wake-up word and end-of-interaction detection, high-quality speech synthesis,
    as well as sophisticated statistical models for speech and language, learned from
    large amounts of heterogeneous training data. In all these fields, deep learning
    has occupied a critical role.
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Michiel
  full_name: Bacchiani, Michiel
  last_name: Bacchiani
- first_name: Bjoern
  full_name: Hoffmeister, Bjoern
  last_name: Hoffmeister
- first_name: Michael L.
  full_name: Seltzer, Michael L.
  last_name: Seltzer
- first_name: Heiga
  full_name: Zen, Heiga
  last_name: Zen
- first_name: Mehrez
  full_name: Souden, Mehrez
  last_name: Souden
citation:
  ama: 'Haeb-Umbach R, Watanabe S, Nakatani T, et al. Speech Processing for Digital
    Home Assistance: Combining Signal Processing With Deep-Learning Techniques. <i>IEEE
    Signal Processing Magazine</i>. 2019;36(6):111-124. doi:<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>'
  apa: 'Haeb-Umbach, R., Watanabe, S., Nakatani, T., Bacchiani, M., Hoffmeister, B.,
    Seltzer, M. L., Zen, H., &#38; Souden, M. (2019). Speech Processing for Digital
    Home Assistance: Combining Signal Processing With Deep-Learning Techniques. <i>IEEE
    Signal Processing Magazine</i>, <i>36</i>(6), 111–124. <a href="https://doi.org/10.1109/MSP.2019.2918706">https://doi.org/10.1109/MSP.2019.2918706</a>'
  bibtex: '@article{Haeb-Umbach_Watanabe_Nakatani_Bacchiani_Hoffmeister_Seltzer_Zen_Souden_2019,
    title={Speech Processing for Digital Home Assistance: Combining Signal Processing
    With Deep-Learning Techniques}, volume={36}, DOI={<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>},
    number={6}, journal={IEEE Signal Processing Magazine}, author={Haeb-Umbach, Reinhold
    and Watanabe, Shinji and Nakatani, Tomohiro and Bacchiani, Michiel and Hoffmeister,
    Bjoern and Seltzer, Michael L. and Zen, Heiga and Souden, Mehrez}, year={2019},
    pages={111–124} }'
  chicago: 'Haeb-Umbach, Reinhold, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani,
    Bjoern Hoffmeister, Michael L. Seltzer, Heiga Zen, and Mehrez Souden. “Speech
    Processing for Digital Home Assistance: Combining Signal Processing With Deep-Learning
    Techniques.” <i>IEEE Signal Processing Magazine</i> 36, no. 6 (2019): 111–24.
    <a href="https://doi.org/10.1109/MSP.2019.2918706">https://doi.org/10.1109/MSP.2019.2918706</a>.'
  ieee: 'R. Haeb-Umbach <i>et al.</i>, “Speech Processing for Digital Home Assistance:
    Combining Signal Processing With Deep-Learning Techniques,” <i>IEEE Signal Processing
    Magazine</i>, vol. 36, no. 6, pp. 111–124, 2019, doi: <a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>.'
  mla: 'Haeb-Umbach, Reinhold, et al. “Speech Processing for Digital Home Assistance:
    Combining Signal Processing With Deep-Learning Techniques.” <i>IEEE Signal Processing
    Magazine</i>, vol. 36, no. 6, 2019, pp. 111–24, doi:<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>.'
  short: R. Haeb-Umbach, S. Watanabe, T. Nakatani, M. Bacchiani, B. Hoffmeister, M.L.
    Seltzer, H. Zen, M. Souden, IEEE Signal Processing Magazine 36 (2019) 111–124.
date_created: 2020-02-06T07:26:20Z
date_updated: 2023-01-09T11:47:09Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/MSP.2019.2918706
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:28:26Z
  date_updated: 2020-02-06T07:28:26Z
  file_id: '15815'
  file_name: JournalIEEESignal ProcessingMagazine_2019_Haeb-Umbach_Paper.pdf
  file_size: 1085002
  relation: main_file
file_date_updated: 2020-02-06T07:28:26Z
has_accepted_license: '1'
intvolume: '        36'
issue: '6'
language:
- iso: eng
oa: '1'
page: 111-124
publication: IEEE Signal Processing Magazine
publication_identifier:
  issn:
  - 1558-0792
status: public
title: 'Speech Processing for Digital Home Assistance: Combining Signal Processing
  With Deep-Learning Techniques'
type: journal_article
user_id: '242'
volume: 36
year: '2019'
...
---
_id: '19450'
abstract:
- lang: eng
  text: 'When acoustic signal processing is combined with automated learning: communications
    engineers are working with multiple microphones and deep neural networks towards
    better speech recognition under the most adverse conditions. In the long term,
    digital voice assistants could also benefit from such sensor networks.'
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Haeb-Umbach R. Lektionen für Alexa &#38; Co?! <i>DFG forschung 1/2019</i>.
    Published online 2019:12-15. doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>
  apa: Haeb-Umbach, R. (2019). Lektionen für Alexa &#38; Co?! <i>DFG Forschung 1/2019</i>,
    12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>
  bibtex: '@article{Haeb-Umbach_2019, title={Lektionen für Alexa &#38; Co?!}, DOI={<a
    href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>}, journal={DFG
    forschung 1/2019}, author={Haeb-Umbach, Reinhold}, year={2019}, pages={12–15}
    }'
  chicago: Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>DFG Forschung
    1/2019</i>, 2019, 12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>.
  ieee: 'R. Haeb-Umbach, “Lektionen für Alexa &#38; Co?!,” <i>DFG forschung 1/2019</i>,
    pp. 12–15, 2019, doi: <a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>.'
  mla: Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>DFG Forschung 1/2019</i>,
    2019, pp. 12–15, doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>.
  short: R. Haeb-Umbach, DFG Forschung 1/2019 (2019) 12–15.
date_created: 2020-09-16T08:09:15Z
date_updated: 2023-01-11T11:24:57Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1002/fors.201970104
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-09-16T08:10:25Z
  date_updated: 2020-09-16T08:10:25Z
  file_id: '19451'
  file_name: Artikel_2019_haeb_umbach.pdf
  file_size: 337622
  relation: main_file
file_date_updated: 2020-09-16T08:10:25Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 12-15
publication: DFG forschung 1/2019
status: public
title: Lektionen für Alexa & Co?!
type: journal_article
user_id: '59789'
year: '2019'
...
---
_id: '15237'
abstract:
- lang: eng
  text: This paper presents an approach to voice conversion, which does neither
    require parallel data nor speaker or phone labels for training. It can convert
    between speakers which are not in the training set by employing the previously
    proposed concept of a factorized hierarchical variational autoencoder. Here,
    linguistic and speaker induced variations are separated upon the notion that
    content induced variations change at a much shorter time scale, i.e., at the
    segment level, than speaker induced variations, which vary at the longer utterance
    level. In this contribution we propose to employ convolutional instead of recurrent
    network layers in the encoder and decoder blocks, which is shown to achieve
    better phone recognition accuracy on the latent segment variables at frame level
    due to their better temporal resolution. For voice conversion the mean of the
    utterance variables is replaced with the respective estimated mean of the target
    speaker. The resulting log-mel spectra of the decoder output are used as local
    conditions of a WaveNet which is utilized for synthesis of the speech waveforms.
    Experiments show both good disentanglement properties of the latent space variables,
    and good voice conversion performance.
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Thomas
  full_name: Glarner, Thomas
  id: '14169'
  last_name: Glarner
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
citation:
  ama: 'Gburrek T, Glarner T, Ebbers J, Haeb-Umbach R, Wagner P. Unsupervised Learning
    of a Disentangled Speech Representation for Voice Conversion. In: <i>Proc. 10th
    ISCA Speech Synthesis Workshop</i>. ; 2019:81-86. doi:<a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>'
  apa: Gburrek, T., Glarner, T., Ebbers, J., Haeb-Umbach, R., &#38; Wagner, P. (2019).
    Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion.
    <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 81–86. <a href="https://doi.org/10.21437/SSW.2019-15">https://doi.org/10.21437/SSW.2019-15</a>
  bibtex: '@inproceedings{Gburrek_Glarner_Ebbers_Haeb-Umbach_Wagner_2019, title={Unsupervised
    Learning of a Disentangled Speech Representation for Voice Conversion}, DOI={<a
    href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>}, booktitle={Proc.
    10th ISCA Speech Synthesis Workshop}, author={Gburrek, Tobias and Glarner, Thomas
    and Ebbers, Janek and Haeb-Umbach, Reinhold and Wagner, Petra}, year={2019}, pages={81–86}
    }'
  chicago: Gburrek, Tobias, Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach, and
    Petra Wagner. “Unsupervised Learning of a Disentangled Speech Representation for
    Voice Conversion.” In <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 81–86,
    2019. <a href="https://doi.org/10.21437/SSW.2019-15">https://doi.org/10.21437/SSW.2019-15</a>.
  ieee: 'T. Gburrek, T. Glarner, J. Ebbers, R. Haeb-Umbach, and P. Wagner, “Unsupervised
    Learning of a Disentangled Speech Representation for Voice Conversion,” in <i>Proc.
    10th ISCA Speech Synthesis Workshop</i>, Vienna, 2019, pp. 81–86, doi: <a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>.'
  mla: Gburrek, Tobias, et al. “Unsupervised Learning of a Disentangled Speech Representation
    for Voice Conversion.” <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 2019,
    pp. 81–86, doi:<a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>.
  short: 'T. Gburrek, T. Glarner, J. Ebbers, R. Haeb-Umbach, P. Wagner, in: Proc.
    10th ISCA Speech Synthesis Workshop, 2019, pp. 81–86.'
conference:
  location: Vienna
  name: 10th ISCA Speech Synthesis Workshop
date_created: 2019-12-04T08:12:29Z
date_updated: 2023-11-17T06:20:39Z
department:
- _id: '54'
doi: 10.21437/SSW.2019-15
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-speech.org/archive/pdfs/ssw_2019/gburrek19_ssw.pdf
oa: '1'
page: 81-86
publication: Proc. 10th ISCA Speech Synthesis Workshop
quality_controlled: '1'
related_material:
  link:
  - description: Listening examples
    relation: supplementary_material
    url: http://go.upb.de/vcex
status: public
title: Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion
type: conference
user_id: '44006'
year: '2019'
...
---
_id: '15794'
abstract:
- lang: eng
  text: In this paper we present our audio tagging system for the DCASE 2019 Challenge
    Task 2. We propose a model consisting of a convolutional front end using log-mel-energies
    as input features, a recurrent neural network sequence encoder and a fully connected
    classifier network outputting an activity probability for each of the 80 considered
    event classes. Due to the recurrent neural network, which encodes a whole sequence
    into a single vector, our model is able to process sequences of varying lengths.
    The model is trained with only little manually labeled training data and a larger
    amount of automatically labeled web data, which hence suffers from label noise.
    To efficiently train the model with the provided data we use various data augmentation
    techniques to prevent overfitting and improve generalization. Our best submitted system achieves
    a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private
    test set which is an absolute improvement of 21.7% over the baseline. This system
    scored the second place in the teams ranking of the DCASE 2019 Challenge Task
    2 and the fifth place in the Kaggle competition “Freesound Audio Tagging 2019”
    with more than 400 participants. After the challenge ended, we further improved
    performance to 76.5% lwlrap, setting a new state-of-the-art on this dataset.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Haeb-Umbach R. Convolutional Recurrent Neural Network and Data Augmentation
    for Audio Tagging with Noisy Labels and Minimal Supervision. In: <i>DCASE2019
    Workshop, New York, USA</i>. ; 2019.'
  apa: Ebbers, J., &#38; Haeb-Umbach, R. (2019). Convolutional Recurrent Neural Network
    and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision.
    <i>DCASE2019 Workshop, New York, USA</i>.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_2019, title={Convolutional Recurrent
    Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal
    Supervision}, booktitle={DCASE2019 Workshop, New York, USA}, author={Ebbers, Janek
    and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Ebbers, Janek, and Reinhold Haeb-Umbach. “Convolutional Recurrent Neural
    Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal
    Supervision.” In <i>DCASE2019 Workshop, New York, USA</i>, 2019.
  ieee: J. Ebbers and R. Haeb-Umbach, “Convolutional Recurrent Neural Network and
    Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision,”
    2019.
  mla: Ebbers, Janek, and Reinhold Haeb-Umbach. “Convolutional Recurrent Neural Network
    and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision.”
    <i>DCASE2019 Workshop, New York, USA</i>, 2019.
  short: 'J. Ebbers, R. Haeb-Umbach, in: DCASE2019 Workshop, New York, USA, 2019.'
date_created: 2020-02-05T10:16:03Z
date_updated: 2023-11-22T08:30:12Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:18:06Z
  date_updated: 2020-02-05T10:18:06Z
  file_id: '15795'
  file_name: DCASE_2019_WS_Ebbers_Paper.pdf
  file_size: 184967
  relation: main_file
file_date_updated: 2020-02-05T10:18:06Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: DCASE2019 Workshop, New York, USA
quality_controlled: '1'
status: public
title: Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging
  with Noisy Labels and Minimal Supervision
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '15796'
abstract:
- lang: eng
  text: In this paper we consider human daily activity recognition using an acoustic
    sensor network (ASN) which consists of nodes distributed in a home environment.
    Assuming that the ASN is permanently recording, the vast majority of recordings
    is silence. Therefore, we propose to employ a computationally efficient two-stage
    sound recognition system, consisting of an initial sound activity detection (SAD)
    and a subsequent sound event classification (SEC), which is only activated once
    sound activity has been detected. We show how a low-latency activity detector
    with high temporal resolution can be trained from weak labels with low temporal
    resolution. We further demonstrate the advantage of using spatial features for
    the subsequent event classification task.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Andreas
  full_name: Brendel, Andreas
  last_name: Brendel
- first_name: Walter
  full_name: Kellermann, Walter
  last_name: Kellermann
citation:
  ama: 'Ebbers J, Drude L, Haeb-Umbach R, Brendel A, Kellermann W. Weakly Supervised
    Sound Activity Detection and Event Classification in Acoustic Sensor Networks.
    In: <i>CAMSAP 2019, Guadeloupe, West Indies</i>. ; 2019.'
  apa: Ebbers, J., Drude, L., Haeb-Umbach, R., Brendel, A., &#38; Kellermann, W. (2019).
    Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
    Sensor Networks. <i>CAMSAP 2019, Guadeloupe, West Indies</i>.
  bibtex: '@inproceedings{Ebbers_Drude_Haeb-Umbach_Brendel_Kellermann_2019, title={Weakly
    Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
    Networks}, booktitle={CAMSAP 2019, Guadeloupe, West Indies}, author={Ebbers, Janek
    and Drude, Lukas and Haeb-Umbach, Reinhold and Brendel, Andreas and Kellermann,
    Walter}, year={2019} }'
  chicago: Ebbers, Janek, Lukas Drude, Reinhold Haeb-Umbach, Andreas Brendel, and
    Walter Kellermann. “Weakly Supervised Sound Activity Detection and Event Classification
    in Acoustic Sensor Networks.” In <i>CAMSAP 2019, Guadeloupe, West Indies</i>,
    2019.
  ieee: J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, and W. Kellermann, “Weakly
    Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
    Networks,” 2019.
  mla: Ebbers, Janek, et al. “Weakly Supervised Sound Activity Detection and Event
    Classification in Acoustic Sensor Networks.” <i>CAMSAP 2019, Guadeloupe, West
    Indies</i>, 2019.
  short: 'J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, W. Kellermann, in: CAMSAP
    2019, Guadeloupe, West Indies, 2019.'
date_created: 2020-02-05T10:20:17Z
date_updated: 2023-11-22T08:29:58Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:21:39Z
  date_updated: 2020-02-05T10:21:39Z
  file_id: '15797'
  file_name: CAMSAP_2019_WS_Ebbers_Paper.pdf
  file_size: 311887
  relation: main_file
file_date_updated: 2020-02-05T10:21:39Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: CAMSAP 2019, Guadeloupe, West Indies
quality_controlled: '1'
status: public
title: Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
  Sensor Networks
type: conference
user_id: '34851'
year: '2019'
...
