---
_id: '12890'
abstract:
- lang: eng
  text: 'We formulate a generic framework for blind source separation (BSS), which
    allows integrating data-driven spectro-temporal methods, such as deep clustering
    and deep attractor networks, with physically motivated probabilistic spatial methods,
    such as complex angular central Gaussian mixture models. The integrated model
    exploits the complementary strengths of the two approaches to BSS: the strong
    modeling power of neural networks, which, however, is based on supervised learning,
    and the ease of unsupervised learning of the spatial mixture models whose few
    parameters can be estimated on as little as a single segment of a real mixture
    of speech. Experiments are carried out on both artificially mixed speech and true
    recordings of speech mixtures. The experiments verify that the integrated models
    consistently outperform the individual components. We further extend the models
    to cope with noisy, reverberant speech and introduce a cross-domain teacher–student
    training where the mixture model serves as the teacher to provide training targets
    for the student neural network.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Drude L, Haeb-Umbach R. Integration of Neural Networks and Probabilistic Spatial
    Models for Acoustic Blind Source Separation. <i>IEEE Journal of Selected Topics
    in Signal Processing</i>. 2019. doi:<a href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>
  apa: Drude, L., &#38; Haeb-Umbach, R. (2019). Integration of Neural Networks and
    Probabilistic Spatial Models for Acoustic Blind Source Separation. <i>IEEE Journal
    of Selected Topics in Signal Processing</i>. <a href="https://doi.org/10.1109/JSTSP.2019.2912565">https://doi.org/10.1109/JSTSP.2019.2912565</a>
  bibtex: '@article{Drude_Haeb-Umbach_2019, title={Integration of Neural Networks
    and Probabilistic Spatial Models for Acoustic Blind Source Separation}, DOI={<a
    href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>},
    journal={IEEE Journal of Selected Topics in Signal Processing}, author={Drude,
    Lukas and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks
    and Probabilistic Spatial Models for Acoustic Blind Source Separation.” <i>IEEE
    Journal of Selected Topics in Signal Processing</i>, 2019. <a href="https://doi.org/10.1109/JSTSP.2019.2912565">https://doi.org/10.1109/JSTSP.2019.2912565</a>.
  ieee: L. Drude and R. Haeb-Umbach, “Integration of Neural Networks and Probabilistic
    Spatial Models for Acoustic Blind Source Separation,” <i>IEEE Journal of Selected
    Topics in Signal Processing</i>, 2019.
  mla: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks and
    Probabilistic Spatial Models for Acoustic Blind Source Separation.” <i>IEEE Journal
    of Selected Topics in Signal Processing</i>, 2019, doi:<a href="https://doi.org/10.1109/JSTSP.2019.2912565">10.1109/JSTSP.2019.2912565</a>.
  short: L. Drude, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal Processing
    (2019).
date_created: 2019-07-26T08:38:46Z
date_updated: 2022-01-06T06:51:23Z
ddc:
- '050'
department:
- _id: '54'
doi: 10.1109/JSTSP.2019.2912565
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-08-07T07:12:21Z
  date_updated: 2019-08-14T07:11:22Z
  file_id: '12903'
  file_name: IEEE Jounal_2019_Drude_Paper.pdf
  file_size: 967424
  relation: main_file
file_date_updated: 2019-08-14T07:11:22Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: IEEE Journal of Selected Topics in Signal Processing
publication_identifier:
  eissn:
  - 1941-0484
status: public
title: Integration of Neural Networks and Probabilistic Spatial Models for Acoustic
  Blind Source Separation
type: journal_article
user_id: '11213'
year: '2019'
...
---
_id: '15816'
abstract:
- lang: eng
  text: 'Despite the strong modeling power of neural network acoustic models, speech
    enhancement has been shown to deliver additional word error rate improvements
    if multi-channel data is available. However, there has been a longstanding debate
    whether enhancement should also be carried out on the ASR training data. In an
    extensive experimental evaluation on the acoustically very challenging CHiME-5
    dinner party data we show that: (i) cleaning up the training data can lead to
    substantial error rate reductions, and (ii) enhancement in training is advisable
    as long as enhancement in test is at least as strong as in training. This approach
    stands in contrast to, and delivers larger gains than, the common strategy reported
    in the literature of augmenting the training database with additional artificially
    degraded speech. Together with an acoustic model topology consisting of initial
    CNN layers followed by factorized TDNN layers, we achieve 41.6% and 43.2%
    WER on the DEV and EVAL test sets, respectively, a new single-system state-of-the-art
    result on the CHiME-5 data. This is an 8% relative improvement compared to the
    best word error rate published so far for a speech recognizer without system combination.'
author:
- first_name: Catalin
  full_name: Zorila, Catalin
  last_name: Zorila
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Zorila C, Boeddeker C, Doddipatla R, Haeb-Umbach R. An Investigation Into
    the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party
    Transcription. In: <i>ASRU 2019, Sentosa, Singapore</i>. ; 2019.'
  apa: Zorila, C., Boeddeker, C., Doddipatla, R., &#38; Haeb-Umbach, R. (2019). An
    Investigation Into the Effectiveness of Enhancement in ASR Training and Test for
    Chime-5 Dinner Party Transcription. In <i>ASRU 2019, Sentosa, Singapore</i>.
  bibtex: '@inproceedings{Zorila_Boeddeker_Doddipatla_Haeb-Umbach_2019, title={An
    Investigation Into the Effectiveness of Enhancement in ASR Training and Test for
    Chime-5 Dinner Party Transcription}, booktitle={ASRU 2019, Sentosa, Singapore},
    author={Zorila, Catalin and Boeddeker, Christoph and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2019} }'
  chicago: Zorila, Catalin, Christoph Boeddeker, Rama Doddipatla, and Reinhold Haeb-Umbach.
    “An Investigation Into the Effectiveness of Enhancement in ASR Training and Test
    for Chime-5 Dinner Party Transcription.” In <i>ASRU 2019, Sentosa, Singapore</i>,
    2019.
  ieee: C. Zorila, C. Boeddeker, R. Doddipatla, and R. Haeb-Umbach, “An Investigation
    Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner
    Party Transcription,” in <i>ASRU 2019, Sentosa, Singapore</i>, 2019.
  mla: Zorila, Catalin, et al. “An Investigation Into the Effectiveness of Enhancement
    in ASR Training and Test for Chime-5 Dinner Party Transcription.” <i>ASRU 2019,
    Sentosa, Singapore</i>, 2019.
  short: 'C. Zorila, C. Boeddeker, R. Doddipatla, R. Haeb-Umbach, in: ASRU 2019, Sentosa,
    Singapore, 2019.'
date_created: 2020-02-06T07:35:08Z
date_updated: 2022-01-06T06:52:37Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:42:42Z
  date_updated: 2020-02-06T07:42:42Z
  file_id: '15817'
  file_name: ASRU_2019_Boeddeker_Paper.pdf
  file_size: 200256
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:42:55Z
  date_updated: 2020-02-06T07:42:55Z
  file_id: '15818'
  file_name: ASRU_2019_Boeddeker_Poster.pdf
  file_size: 123963
  relation: main_file
file_date_updated: 2020-02-06T07:42:55Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ASRU 2019, Sentosa, Singapore
status: public
title: An Investigation Into the Effectiveness of Enhancement in ASR Training and
  Test for Chime-5 Dinner Party Transcription
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14822'
abstract:
- lang: eng
  text: Multi-talker speech and moving speakers still pose a significant challenge
    to automatic speech recognition systems. Assuming an enrollment utterance of the
    target speaker is available, the so-called SpeakerBeam concept has been recently
    proposed to extract the target speaker from a speech mixture. If multi-channel
    input is available, spatial properties of the speaker can be exploited to support
    the source extraction. In this contribution we investigate different approaches
    to exploit such spatial information. In particular, we are interested in the question
    of how useful this information is if the target speaker changes his/her position.
    To this end, we present a SpeakerBeam-based source extraction network that is
    adapted to work on moving speakers by recursively updating the beamformer coefficients.
    Experimental results are presented on two data sets, one with artificially created
    room impulse responses, and one with real room impulse responses and noise recorded
    in a conference room. Interestingly, spatial features turn out to be advantageous
    even if the speaker position changes.
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Thomas
  full_name: Feher, Thomas
  last_name: Feher
- first_name: Michael
  full_name: Freitag, Michael
  last_name: Freitag
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Feher T, Freitag M, Haeb-Umbach R. A Study on Online Source
    Extraction in the Presence of Changing Speaker Positions. In: <i>International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia</i>.
    ; 2019.'
  apa: Heitkaemper, J., Feher, T., Freitag, M., &#38; Haeb-Umbach, R. (2019). A Study
    on Online Source Extraction in the Presence of Changing Speaker Positions. In
    <i>International Conference on Statistical Language and Speech Processing 2019,
    Ljubljana, Slovenia</i>.
  bibtex: '@inproceedings{Heitkaemper_Feher_Freitag_Haeb-Umbach_2019, title={A Study
    on Online Source Extraction in the Presence of Changing Speaker Positions}, booktitle={International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia},
    author={Heitkaemper, Jens and Feher, Thomas and Freitag, Michael and Haeb-Umbach,
    Reinhold}, year={2019} }'
  chicago: Heitkaemper, Jens, Thomas Feher, Michael Freitag, and Reinhold Haeb-Umbach.
    “A Study on Online Source Extraction in the Presence of Changing Speaker Positions.”
    In <i>International Conference on Statistical Language and Speech Processing 2019,
    Ljubljana, Slovenia</i>, 2019.
  ieee: J. Heitkaemper, T. Feher, M. Freitag, and R. Haeb-Umbach, “A Study on Online
    Source Extraction in the Presence of Changing Speaker Positions,” in <i>International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia</i>,
    2019.
  mla: Heitkaemper, Jens, et al. “A Study on Online Source Extraction in the Presence
    of Changing Speaker Positions.” <i>International Conference on Statistical Language
    and Speech Processing 2019, Ljubljana, Slovenia</i>, 2019.
  short: 'J. Heitkaemper, T. Feher, M. Freitag, R. Haeb-Umbach, in: International
    Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia,
    2019.'
date_created: 2019-11-06T09:43:03Z
date_updated: 2022-01-06T06:52:06Z
ddc:
- '006'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:02:26Z
  date_updated: 2019-11-08T07:47:12Z
  file_id: '14823'
  file_name: SLSP_2019_Heitkaemper_Paper.pdf
  file_size: 578595
  relation: main_file
file_date_updated: 2019-11-08T07:47:12Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: International Conference on Statistical Language and Speech Processing
  2019, Ljubljana, Slovenia
status: public
title: A Study on Online Source Extraction in the Presence of Changing Speaker Positions
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14824'
abstract:
- lang: eng
  text: This paper deals with multi-channel speech recognition in scenarios with multiple
    speakers. Recently, the spectral characteristics of a target speaker, extracted
    from an adaptation utterance, have been used to guide a neural network mask estimator
    to focus on that speaker. In this work we present two variants of speaker-aware
    neural networks, which exploit both spectral and spatial information to allow
    better discrimination between target and interfering speakers. Thus, we introduce
    either a spatial preprocessing prior to the mask estimation or a spatial plus
    spectral speaker characterization block whose output is directly fed into the
    neural mask estimator. The target speaker’s spectral and spatial signature is
    extracted from an adaptation utterance recorded at the beginning of a session.
    We further adapt the architecture for low-latency processing by means of block-online
    beamforming that recursively updates the signal statistics. Experimental results
    show that the additional spatial information clearly improves source extraction,
    in particular in the same-gender case, and that our proposal achieves state-of-the-art
    performance in terms of distortion reduction and recognition accuracy.
author:
- first_name: Juan M.
  full_name: Martin-Donas, Juan M.
  last_name: Martin-Donas
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Angel M.
  full_name: Gomez, Angel M.
  last_name: Gomez
- first_name: Antonio M.
  full_name: Peinado, Antonio M.
  last_name: Peinado
citation:
  ama: 'Martin-Donas JM, Heitkaemper J, Haeb-Umbach R, Gomez AM, Peinado AM. Multi-Channel
    Block-Online Source Extraction based on Utterance Adaptation. In: <i>INTERSPEECH
    2019, Graz, Austria</i>. ; 2019.'
  apa: Martin-Donas, J. M., Heitkaemper, J., Haeb-Umbach, R., Gomez, A. M., &#38;
    Peinado, A. M. (2019). Multi-Channel Block-Online Source Extraction based on Utterance
    Adaptation. In <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Martin-Donas_Heitkaemper_Haeb-Umbach_Gomez_Peinado_2019,
    title={Multi-Channel Block-Online Source Extraction based on Utterance Adaptation},
    booktitle={INTERSPEECH 2019, Graz, Austria}, author={Martin-Donas, Juan M. and
    Heitkaemper, Jens and Haeb-Umbach, Reinhold and Gomez, Angel M. and Peinado, Antonio
    M.}, year={2019} }'
  chicago: Martin-Donas, Juan M., Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M.
    Gomez, and Antonio M. Peinado. “Multi-Channel Block-Online Source Extraction Based
    on Utterance Adaptation.” In <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  ieee: J. M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A. M. Gomez, and A. M.
    Peinado, “Multi-Channel Block-Online Source Extraction based on Utterance Adaptation,”
    in <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  mla: Martin-Donas, Juan M., et al. “Multi-Channel Block-Online Source Extraction
    Based on Utterance Adaptation.” <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.
  short: 'J.M. Martin-Donas, J. Heitkaemper, R. Haeb-Umbach, A.M. Gomez, A.M. Peinado,
    in: INTERSPEECH 2019, Graz, Austria, 2019.'
date_created: 2019-11-06T10:04:49Z
date_updated: 2022-01-06T06:52:07Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:07:15Z
  date_updated: 2019-11-08T07:46:37Z
  file_id: '14825'
  file_name: INTERSPEECH_2019_Heitkaemper_Paper.pdf
  file_size: 225689
  relation: main_file
file_date_updated: 2019-11-08T07:46:37Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: Multi-Channel Block-Online Source Extraction based on Utterance Adaptation
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '14826'
abstract:
- lang: eng
  text: In this paper, we present Hitachi and Paderborn University’s joint effort
    for automatic speech recognition (ASR) in a dinner party scenario. The main challenges
    of ASR systems for dinner party recordings obtained by multiple microphone arrays
    are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural
    conversational content, and possibly (4) insufficient training data. As an example
    of a dinner party scenario, we have chosen the data presented during the CHiME-5
    speech recognition challenge, where the baseline ASR had a 73.3% word error rate
    (WER), and even the best performing system at the CHiME-5 challenge had a 46.1%
    WER. We extensively investigated a combination of the guided source separation-based
    speech enhancement technique and an already proposed strong ASR backend and found
    that a tight combination of these techniques provided substantial accuracy improvements.
    Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation
    data, respectively, both of which are the best published results for the dataset.
    We also investigated the effect of additional training data on the official small
    data in the CHiME-5 corpus to assess the intrinsic difficulty of this ASR task.
author:
- first_name: Naoyuki
  full_name: Kanda, Naoyuki
  last_name: Kanda
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Yusuke
  full_name: Fujita, Yusuke
  last_name: Fujita
- first_name: Shota
  full_name: Horiguchi, Shota
  last_name: Horiguchi
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kanda N, Boeddeker C, Heitkaemper J, Fujita Y, Horiguchi S, Haeb-Umbach R.
    Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
    Joint Investigation for Dinner Party ASR. In: <i>INTERSPEECH 2019, Graz, Austria</i>.
    ; 2019.'
  apa: 'Kanda, N., Boeddeker, C., Heitkaemper, J., Fujita, Y., Horiguchi, S., &#38;
    Haeb-Umbach, R. (2019). Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn
    University Joint Investigation for Dinner Party ASR. In <i>INTERSPEECH 2019, Graz,
    Austria</i>.'
  bibtex: '@inproceedings{Kanda_Boeddeker_Heitkaemper_Fujita_Horiguchi_Haeb-Umbach_2019,
    title={Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn
    University Joint Investigation for Dinner Party ASR}, booktitle={INTERSPEECH 2019,
    Graz, Austria}, author={Kanda, Naoyuki and Boeddeker, Christoph and Heitkaemper,
    Jens and Fujita, Yusuke and Horiguchi, Shota and Haeb-Umbach, Reinhold}, year={2019}
    }'
  chicago: 'Kanda, Naoyuki, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita,
    Shota Horiguchi, and Reinhold Haeb-Umbach. “Guided Source Separation Meets a Strong
    ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party
    ASR.” In <i>INTERSPEECH 2019, Graz, Austria</i>, 2019.'
  ieee: 'N. Kanda, C. Boeddeker, J. Heitkaemper, Y. Fujita, S. Horiguchi, and R. Haeb-Umbach,
    “Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
    Joint Investigation for Dinner Party ASR,” in <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.'
  mla: 'Kanda, Naoyuki, et al. “Guided Source Separation Meets a Strong ASR Backend:
    Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.” <i>INTERSPEECH
    2019, Graz, Austria</i>, 2019.'
  short: 'N. Kanda, C. Boeddeker, J. Heitkaemper, Y. Fujita, S. Horiguchi, R. Haeb-Umbach,
    in: INTERSPEECH 2019, Graz, Austria, 2019.'
date_created: 2019-11-06T10:08:49Z
date_updated: 2022-01-06T06:52:07Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-11-06T10:10:23Z
  date_updated: 2019-11-08T07:45:15Z
  file_id: '14827'
  file_name: INTERSPEECH_2019_Boeddeker_Paper.pdf
  file_size: 216202
  relation: main_file
file_date_updated: 2019-11-08T07:45:15Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: 'Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University
  Joint Investigation for Dinner Party ASR'
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '13271'
abstract:
- lang: eng
  text: Automatic meeting analysis comprises the tasks of speaker counting, speaker
    diarization, and the separation of overlapped speech, followed by automatic speech
    recognition. This all has to be carried out on arbitrarily long sessions and,
    ideally, in an online or block-online manner. While significant progress has been
    made on individual tasks, this paper presents for the first time an all-neural
    approach to simultaneous speaker counting, diarization and source separation.
    The NN-based estimator operates in a block-online fashion and tracks speakers
    even if they remain silent for a number of time blocks, thus learning a stable
    output order for the separated sources. The neural network is recurrent over time
    as well as over the number of sources. The simulation experiments show that state-of-the-art
    separation performance is achieved, while at the same time delivering
    good diarization and source counting results. It even generalizes well to an unseen
    large number of blocks.
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  last_name: von Neumann
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Shoko
  full_name: Araki, Shoko
  last_name: Araki
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Delcroix M, Araki S, Nakatani T, Haeb-Umbach R.
    All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis.
    In: <i>ICASSP 2019, Brighton, UK</i>. ; 2019.'
  apa: von Neumann, T., Kinoshita, K., Delcroix, M., Araki, S., Nakatani, T., &#38;
    Haeb-Umbach, R. (2019). All-neural Online Source Separation, Counting, and Diarization
    for Meeting Analysis. In <i>ICASSP 2019, Brighton, UK</i>.
  bibtex: '@inproceedings{von Neumann_Kinoshita_Delcroix_Araki_Nakatani_Haeb-Umbach_2019,
    title={All-neural Online Source Separation, Counting, and Diarization for Meeting
    Analysis}, booktitle={ICASSP 2019, Brighton, UK}, author={von Neumann, Thilo and
    Kinoshita, Keisuke and Delcroix, Marc and Araki, Shoko and Nakatani, Tomohiro
    and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Neumann, Thilo von, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro
    Nakatani, and Reinhold Haeb-Umbach. “All-Neural Online Source Separation, Counting,
    and Diarization for Meeting Analysis.” In <i>ICASSP 2019, Brighton, UK</i>, 2019.
  ieee: T. von Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, and R. Haeb-Umbach,
    “All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis,”
    in <i>ICASSP 2019, Brighton, UK</i>, 2019.
  mla: von Neumann, Thilo, et al. “All-Neural Online Source Separation, Counting,
    and Diarization for Meeting Analysis.” <i>ICASSP 2019, Brighton, UK</i>, 2019.
  short: 'T. von Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, R. Haeb-Umbach,
    in: ICASSP 2019, Brighton, UK, 2019.'
date_created: 2019-09-18T08:20:50Z
date_updated: 2022-01-06T06:51:31Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2019-09-18T08:28:39Z
  date_updated: 2019-09-19T07:05:57Z
  file_id: '13272'
  file_name: ICASSP_2019_Neumann_Paper.pdf
  file_size: 126453
  relation: main_file
file_date_updated: 2019-09-19T07:05:57Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: ICASSP 2019, Brighton, UK
status: public
title: All-neural Online Source Separation, Counting, and Diarization for Meeting
  Analysis
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '15814'
abstract:
- lang: eng
  text: Once a popular theme of futuristic science fiction or far-fetched technology
    forecasts, digital home assistants with a spoken language interface have become
    a ubiquitous commodity today. This success has been made possible by major advancements
    in signal processing and machine learning for so-called far-field speech recognition,
    where the commands are spoken at a distance from the sound capturing device. The
    challenges encountered are quite unique and different from many other use cases
    of automatic speech recognition. The purpose of this tutorial article is to describe,
    in a way amenable to the non-specialist, the key speech processing algorithms
    that enable reliable fully hands-free speech interaction with digital home assistants.
    These technologies include multi-channel acoustic echo cancellation, microphone
    array processing and dereverberation techniques for signal enhancement, reliable
    wake-up word and end-of-interaction detection, high-quality speech synthesis,
    as well as sophisticated statistical models for speech and language, learned from
    large amounts of heterogeneous training data. In all these fields, deep learning
    has occupied a critical role.
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Michiel
  full_name: Bacchiani, Michiel
  last_name: Bacchiani
- first_name: Bjoern
  full_name: Hoffmeister, Bjoern
  last_name: Hoffmeister
- first_name: Michael L.
  full_name: Seltzer, Michael L.
  last_name: Seltzer
- first_name: Heiga
  full_name: Zen, Heiga
  last_name: Zen
- first_name: Mehrez
  full_name: Souden, Mehrez
  last_name: Souden
citation:
  ama: 'Haeb-Umbach R, Watanabe S, Nakatani T, et al. Speech Processing for Digital
    Home Assistance: Combining Signal Processing With Deep-Learning Techniques. <i>IEEE
    Signal Processing Magazine</i>. 2019;36(6):111-124. doi:<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>'
  apa: 'Haeb-Umbach, R., Watanabe, S., Nakatani, T., Bacchiani, M., Hoffmeister, B.,
    Seltzer, M. L., Zen, H., &#38; Souden, M. (2019). Speech Processing for Digital
    Home Assistance: Combining Signal Processing With Deep-Learning Techniques. <i>IEEE
    Signal Processing Magazine</i>, <i>36</i>(6), 111–124. <a href="https://doi.org/10.1109/MSP.2019.2918706">https://doi.org/10.1109/MSP.2019.2918706</a>'
  bibtex: '@article{Haeb-Umbach_Watanabe_Nakatani_Bacchiani_Hoffmeister_Seltzer_Zen_Souden_2019,
    title={Speech Processing for Digital Home Assistance: Combining Signal Processing
    With Deep-Learning Techniques}, volume={36}, DOI={<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>},
    number={6}, journal={IEEE Signal Processing Magazine}, author={Haeb-Umbach, Reinhold
    and Watanabe, Shinji and Nakatani, Tomohiro and Bacchiani, Michiel and Hoffmeister,
    Bjoern and Seltzer, Michael L. and Zen, Heiga and Souden, Mehrez}, year={2019},
    pages={111–124} }'
  chicago: 'Haeb-Umbach, Reinhold, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani,
    Bjoern Hoffmeister, Michael L. Seltzer, Heiga Zen, and Mehrez Souden. “Speech
    Processing for Digital Home Assistance: Combining Signal Processing With Deep-Learning
    Techniques.” <i>IEEE Signal Processing Magazine</i> 36, no. 6 (2019): 111–24.
    <a href="https://doi.org/10.1109/MSP.2019.2918706">https://doi.org/10.1109/MSP.2019.2918706</a>.'
  ieee: 'R. Haeb-Umbach <i>et al.</i>, “Speech Processing for Digital Home Assistance:
    Combining Signal Processing With Deep-Learning Techniques,” <i>IEEE Signal Processing
    Magazine</i>, vol. 36, no. 6, pp. 111–124, 2019, doi: <a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>.'
  mla: 'Haeb-Umbach, Reinhold, et al. “Speech Processing for Digital Home Assistance:
    Combining Signal Processing With Deep-Learning Techniques.” <i>IEEE Signal Processing
    Magazine</i>, vol. 36, no. 6, 2019, pp. 111–24, doi:<a href="https://doi.org/10.1109/MSP.2019.2918706">10.1109/MSP.2019.2918706</a>.'
  short: R. Haeb-Umbach, S. Watanabe, T. Nakatani, M. Bacchiani, B. Hoffmeister, M.L.
    Seltzer, H. Zen, M. Souden, IEEE Signal Processing Magazine 36 (2019) 111–124.
date_created: 2020-02-06T07:26:20Z
date_updated: 2023-01-09T11:47:09Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/MSP.2019.2918706
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-06T07:28:26Z
  date_updated: 2020-02-06T07:28:26Z
  file_id: '15815'
  file_name: JournalIEEESignal ProcessingMagazine_2019_Haeb-Umbach_Paper.pdf
  file_size: 1085002
  relation: main_file
file_date_updated: 2020-02-06T07:28:26Z
has_accepted_license: '1'
intvolume: '36'
issue: '6'
language:
- iso: eng
oa: '1'
page: 111-124
publication: IEEE Signal Processing Magazine
publication_identifier:
  issn:
  - 1558-0792
status: public
title: 'Speech Processing for Digital Home Assistance: Combining Signal Processing
  With Deep-Learning Techniques'
type: journal_article
user_id: '242'
volume: 36
year: '2019'
...
---
_id: '19450'
abstract:
- lang: eng
  text: 'When acoustic signal processing is combined with automated learning: communications
    engineers are working with multiple microphones and deep neural networks on better
    speech recognition under the most adverse conditions. In the long term, digital
    voice assistants could also benefit from such sensor networks.'
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Haeb-Umbach R. Lektionen für Alexa &#38; Co?! <i>DFG forschung 1/2019</i>.
    Published online 2019:12-15. doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>
  apa: Haeb-Umbach, R. (2019). Lektionen für Alexa &#38; Co?! <i>DFG Forschung 1/2019</i>,
    12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>
  bibtex: '@article{Haeb-Umbach_2019, title={Lektionen für Alexa &#38; Co?!}, DOI={<a
    href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>}, journal={DFG
    forschung 1/2019}, author={Haeb-Umbach, Reinhold}, year={2019}, pages={12–15}
    }'
  chicago: Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>DFG Forschung
    1/2019</i>, 2019, 12–15. <a href="https://doi.org/10.1002/fors.201970104">https://doi.org/10.1002/fors.201970104</a>.
  ieee: 'R. Haeb-Umbach, “Lektionen für Alexa &#38; Co?!,” <i>DFG forschung 1/2019</i>,
    pp. 12–15, 2019, doi: <a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>.'
  mla: Haeb-Umbach, Reinhold. “Lektionen Für Alexa &#38; Co?!” <i>DFG Forschung 1/2019</i>,
    2019, pp. 12–15, doi:<a href="https://doi.org/10.1002/fors.201970104">10.1002/fors.201970104</a>.
  short: R. Haeb-Umbach, DFG Forschung 1/2019 (2019) 12–15.
date_created: 2020-09-16T08:09:15Z
date_updated: 2023-01-11T11:24:57Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1002/fors.201970104
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-09-16T08:10:25Z
  date_updated: 2020-09-16T08:10:25Z
  file_id: '19451'
  file_name: Artikel_2019_haeb_umbach.pdf
  file_size: 337622
  relation: main_file
file_date_updated: 2020-09-16T08:10:25Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 12-15
publication: DFG forschung 1/2019
status: public
title: Lektionen für Alexa & Co?!
type: journal_article
user_id: '59789'
year: '2019'
...
---
_id: '15237'
abstract:
- lang: eng
  text: This  paper  presents  an  approach  to  voice  conversion,  whichdoes neither
    require parallel data nor speaker or phone labels fortraining.  It can convert
    between speakers which are not in thetraining set by employing the previously
    proposed concept of afactorized hierarchical variational autoencoder. Here, linguisticand
    speaker induced variations are separated upon the notionthat content induced variations
    change at a much shorter timescale, i.e., at the segment level, than speaker induced
    variations,which vary at the longer utterance level. In this contribution wepropose
    to employ convolutional instead of recurrent networklayers  in  the  encoder  and  decoder  blocks,  which  is  shown  toachieve
    better phone recognition accuracy on the latent segmentvariables at frame-level
    due to their better temporal resolution.For voice conversion the mean of the utterance
    variables is re-placed with the respective estimated mean of the target speaker.The
    resulting log-mel spectra of the decoder output are used aslocal conditions of
    a WaveNet which is utilized for synthesis ofthe speech waveforms.  Experiments
    show both good disentan-glement properties of the latent space variables, and
    good voiceconversion performance.
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Thomas
  full_name: Glarner, Thomas
  id: '14169'
  last_name: Glarner
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
citation:
  ama: 'Gburrek T, Glarner T, Ebbers J, Haeb-Umbach R, Wagner P. Unsupervised Learning
    of a Disentangled Speech Representation for Voice Conversion. In: <i>Proc. 10th
    ISCA Speech Synthesis Workshop</i>. ; 2019:81-86. doi:<a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>'
  apa: Gburrek, T., Glarner, T., Ebbers, J., Haeb-Umbach, R., &#38; Wagner, P. (2019).
    Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion.
    <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 81–86. <a href="https://doi.org/10.21437/SSW.2019-15">https://doi.org/10.21437/SSW.2019-15</a>
  bibtex: '@inproceedings{Gburrek_Glarner_Ebbers_Haeb-Umbach_Wagner_2019, title={Unsupervised
    Learning of a Disentangled Speech Representation for Voice Conversion}, DOI={<a
    href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>}, booktitle={Proc.
    10th ISCA Speech Synthesis Workshop}, author={Gburrek, Tobias and Glarner, Thomas
    and Ebbers, Janek and Haeb-Umbach, Reinhold and Wagner, Petra}, year={2019}, pages={81–86}
    }'
  chicago: Gburrek, Tobias, Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach, and
    Petra Wagner. “Unsupervised Learning of a Disentangled Speech Representation for
    Voice Conversion.” In <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 81–86,
    2019. <a href="https://doi.org/10.21437/SSW.2019-15">https://doi.org/10.21437/SSW.2019-15</a>.
  ieee: 'T. Gburrek, T. Glarner, J. Ebbers, R. Haeb-Umbach, and P. Wagner, “Unsupervised
    Learning of a Disentangled Speech Representation for Voice Conversion,” in <i>Proc.
    10th ISCA Speech Synthesis Workshop</i>, Vienna, 2019, pp. 81–86, doi: <a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>.'
  mla: Gburrek, Tobias, et al. “Unsupervised Learning of a Disentangled Speech Representation
    for Voice Conversion.” <i>Proc. 10th ISCA Speech Synthesis Workshop</i>, 2019,
    pp. 81–86, doi:<a href="https://doi.org/10.21437/SSW.2019-15">10.21437/SSW.2019-15</a>.
  short: 'T. Gburrek, T. Glarner, J. Ebbers, R. Haeb-Umbach, P. Wagner, in: Proc.
    10th ISCA Speech Synthesis Workshop, 2019, pp. 81–86.'
conference:
  location: Vienna
  name: 10th ISCA Speech Synthesis Workshop
date_created: 2019-12-04T08:12:29Z
date_updated: 2023-11-17T06:20:39Z
department:
- _id: '54'
doi: 10.21437/SSW.2019-15
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-speech.org/archive/pdfs/ssw_2019/gburrek19_ssw.pdf
oa: '1'
page: 81-86
publication: Proc. 10th ISCA Speech Synthesis Workshop
quality_controlled: '1'
related_material:
  link:
  - description: Listening examples
    relation: supplementary_material
    url: http://go.upb.de/vcex
status: public
title: Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion
type: conference
user_id: '44006'
year: '2019'
...
---
_id: '15794'
abstract:
- lang: eng
  text: In this paper we present our audio tagging system for the DCASE 2019 Challenge
    Task 2. We propose a model consisting of a convolutional front end using log-mel-energies
    as input features, a recurrent neural network sequence encoder and a fully connected
    classifier network outputting an activity probability for each of the 80 considered
    event classes. Due to the recurrent neural network, which encodes a whole sequence
    into a single vector, our model is able to process sequences of varying lengths.
    The model is trained with only a small amount of manually labeled training data
    and a larger amount of automatically labeled web data, which hence suffers from
    label noise. To efficiently train the model with the provided data we use various
    data augmentation techniques to prevent overfitting and improve generalization.
    Our best submitted system achieves
    a label-weighted label-ranking average precision (lwlrap) of 75.5% on the private
    test set, which is an absolute improvement of 21.7% over the baseline. This system
    scored second place in the team ranking of the DCASE 2019 Challenge Task
    2 and fifth place in the Kaggle competition “Freesound Audio Tagging 2019”
    with more than 400 participants. After the challenge ended we further improved
    performance to 76.5% lwlrap, setting a new state of the art on this dataset.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Haeb-Umbach R. Convolutional Recurrent Neural Network and Data Augmentation
    for Audio Tagging with Noisy Labels and Minimal Supervision. In: <i>DCASE2019
    Workshop, New York, USA</i>. ; 2019.'
  apa: Ebbers, J., &#38; Haeb-Umbach, R. (2019). Convolutional Recurrent Neural Network
    and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision.
    <i>DCASE2019 Workshop, New York, USA</i>.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_2019, title={Convolutional Recurrent
    Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal
    Supervision}, booktitle={DCASE2019 Workshop, New York, USA}, author={Ebbers, Janek
    and Haeb-Umbach, Reinhold}, year={2019} }'
  chicago: Ebbers, Janek, and Reinhold Haeb-Umbach. “Convolutional Recurrent Neural
    Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal
    Supervision.” In <i>DCASE2019 Workshop, New York, USA</i>, 2019.
  ieee: J. Ebbers and R. Haeb-Umbach, “Convolutional Recurrent Neural Network and
    Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision,”
    2019.
  mla: Ebbers, Janek, and Reinhold Haeb-Umbach. “Convolutional Recurrent Neural Network
    and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision.”
    <i>DCASE2019 Workshop, New York, USA</i>, 2019.
  short: 'J. Ebbers, R. Haeb-Umbach, in: DCASE2019 Workshop, New York, USA, 2019.'
date_created: 2020-02-05T10:16:03Z
date_updated: 2023-11-22T08:30:12Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:18:06Z
  date_updated: 2020-02-05T10:18:06Z
  file_id: '15795'
  file_name: DCASE_2019_WS_Ebbers_Paper.pdf
  file_size: 184967
  relation: main_file
file_date_updated: 2020-02-05T10:18:06Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: DCASE2019 Workshop, New York, USA
quality_controlled: '1'
status: public
title: Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging
  with Noisy Labels and Minimal Supervision
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '15796'
abstract:
- lang: eng
  text: In this paper we consider human daily activity recognition using an acoustic
    sensor network (ASN) which consists of nodes distributed in a home environment.
    Assuming that the ASN is permanently recording, the vast majority of recordings
    is silence. Therefore, we propose to employ a computationally efficient two-stage
    sound recognition system, consisting of an initial sound activity detection (SAD)
    and a subsequent sound event classification (SEC), which is only activated once
    sound activity has been detected. We show how a low-latency activity detector
    with high temporal resolution can be trained from weak labels with low temporal
    resolution. We further demonstrate the advantage of using spatial features for
    the subsequent event classification task.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Andreas
  full_name: Brendel, Andreas
  last_name: Brendel
- first_name: Walter
  full_name: Kellermann, Walter
  last_name: Kellermann
citation:
  ama: 'Ebbers J, Drude L, Haeb-Umbach R, Brendel A, Kellermann W. Weakly Supervised
    Sound Activity Detection and Event Classification in Acoustic Sensor Networks.
    In: <i>CAMSAP 2019, Guadeloupe, West Indies</i>. ; 2019.'
  apa: Ebbers, J., Drude, L., Haeb-Umbach, R., Brendel, A., &#38; Kellermann, W. (2019).
    Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
    Sensor Networks. <i>CAMSAP 2019, Guadeloupe, West Indies</i>.
  bibtex: '@inproceedings{Ebbers_Drude_Haeb-Umbach_Brendel_Kellermann_2019, title={Weakly
    Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
    Networks}, booktitle={CAMSAP 2019, Guadeloupe, West Indies}, author={Ebbers, Janek
    and Drude, Lukas and Haeb-Umbach, Reinhold and Brendel, Andreas and Kellermann,
    Walter}, year={2019} }'
  chicago: Ebbers, Janek, Lukas Drude, Reinhold Haeb-Umbach, Andreas Brendel, and
    Walter Kellermann. “Weakly Supervised Sound Activity Detection and Event Classification
    in Acoustic Sensor Networks.” In <i>CAMSAP 2019, Guadeloupe, West Indies</i>,
    2019.
  ieee: J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, and W. Kellermann, “Weakly
    Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
    Networks,” 2019.
  mla: Ebbers, Janek, et al. “Weakly Supervised Sound Activity Detection and Event
    Classification in Acoustic Sensor Networks.” <i>CAMSAP 2019, Guadeloupe, West
    Indies</i>, 2019.
  short: 'J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, W. Kellermann, in: CAMSAP
    2019, Guadeloupe, West Indies, 2019.'
date_created: 2020-02-05T10:20:17Z
date_updated: 2023-11-22T08:29:58Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:21:39Z
  date_updated: 2020-02-05T10:21:39Z
  file_id: '15797'
  file_name: CAMSAP_2019_WS_Ebbers_Paper.pdf
  file_size: 311887
  relation: main_file
file_date_updated: 2020-02-05T10:21:39Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: CAMSAP 2019, Guadeloupe, West Indies
quality_controlled: '1'
status: public
title: Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
  Sensor Networks
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '15792'
abstract:
- lang: eng
  text: In this paper we highlight the privacy risks entailed in deep neural network
    feature extraction for domestic activity monitoring. We employ the baseline system
    proposed in Task 5 of the DCASE 2018 challenge and simulate a feature interception
    attack by an eavesdropper who wants to perform speaker identification. We then
    propose to reduce the aforementioned privacy risks by introducing a variational
    information feature extraction scheme that allows for good activity monitoring
    performance while at the same time minimizing the information of the feature representation,
    thus restricting speaker identification attempts. We analyze the resulting model’s
    composite loss function and the budget scaling factor used to control the balance
    between the performance of the trusted and attacker tasks. It is empirically demonstrated
    that the proposed method reduces speaker identification privacy risks without
    significantly degrading the performance of domestic activity monitoring tasks.
author:
- first_name: Alexandru
  full_name: Nelus, Alexandru
  last_name: Nelus
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Rainer
  full_name: Martin, Rainer
  last_name: Martin
citation:
  ama: 'Nelus A, Ebbers J, Haeb-Umbach R, Martin R. Privacy-preserving Variational
    Information Feature Extraction for Domestic Activity Monitoring Versus Speaker
    Identification. In: <i>INTERSPEECH 2019, Graz, Austria</i>. ; 2019.'
  apa: Nelus, A., Ebbers, J., Haeb-Umbach, R., &#38; Martin, R. (2019). Privacy-preserving
    Variational Information Feature Extraction for Domestic Activity Monitoring Versus
    Speaker Identification. <i>INTERSPEECH 2019, Graz, Austria</i>.
  bibtex: '@inproceedings{Nelus_Ebbers_Haeb-Umbach_Martin_2019, title={Privacy-preserving
    Variational Information Feature Extraction for Domestic Activity Monitoring Versus
    Speaker Identification}, booktitle={INTERSPEECH 2019, Graz, Austria}, author={Nelus,
    Alexandru and Ebbers, Janek and Haeb-Umbach, Reinhold and Martin, Rainer}, year={2019}
    }'
  chicago: Nelus, Alexandru, Janek Ebbers, Reinhold Haeb-Umbach, and Rainer Martin.
    “Privacy-Preserving Variational Information Feature Extraction for Domestic Activity
    Monitoring Versus Speaker Identification.” In <i>INTERSPEECH 2019, Graz, Austria</i>,
    2019.
  ieee: A. Nelus, J. Ebbers, R. Haeb-Umbach, and R. Martin, “Privacy-preserving Variational
    Information Feature Extraction for Domestic Activity Monitoring Versus Speaker
    Identification,” 2019.
  mla: Nelus, Alexandru, et al. “Privacy-Preserving Variational Information Feature
    Extraction for Domestic Activity Monitoring Versus Speaker Identification.” <i>INTERSPEECH
    2019, Graz, Austria</i>, 2019.
  short: 'A. Nelus, J. Ebbers, R. Haeb-Umbach, R. Martin, in: INTERSPEECH 2019, Graz,
    Austria, 2019.'
date_created: 2020-02-05T10:07:53Z
date_updated: 2023-11-22T08:27:55Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2020-02-05T10:11:40Z
  date_updated: 2020-02-05T10:11:40Z
  file_id: '15793'
  file_name: INTERSPEECH_2019_Ebbers_Paper.pdf
  file_size: 454600
  relation: main_file
file_date_updated: 2020-02-05T10:11:40Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: INTERSPEECH 2019, Graz, Austria
quality_controlled: '1'
status: public
title: Privacy-preserving Variational Information Feature Extraction for Domestic
  Activity Monitoring Versus Speaker Identification
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '11760'
abstract:
- lang: eng
  text: Acoustic event detection, i.e., the task of assigning a human interpretable
    label to a segment of audio, has only recently attracted increased interest in
    the research community. Driven by the DCASE challenges and the availability of
    large-scale audio datasets, the state-of-the-art has progressed rapidly with deep-learning-based
    classifiers dominating the field. Because several potential use cases favor
    a realization on distributed sensor nodes, e.g. ambient assisted living applications,
    habitat monitoring or surveillance, we are concerned with two issues here. Firstly
    the classification performance of such systems and secondly the computing resources
    required to achieve a certain performance considering node level feature extraction.
    In this contribution we look at the balance between the two criteria by employing
    traditional techniques and different deep learning architectures, including convolutional
    and recurrent models, in the context of real-life everyday audio recordings in
    realistic, yet challenging, multisource conditions.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Alexandru
  full_name: Nelus, Alexandru
  last_name: Nelus
- first_name: Rainer
  full_name: Martin, Rainer
  last_name: Martin
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Nelus A, Martin R, Haeb-Umbach R. Evaluation of Modulation-MFCC
    Features and DNN Classification for Acoustic Event Detection. In: <i>DAGA 2018,
    München</i>. ; 2018.'
  apa: Ebbers, J., Nelus, A., Martin, R., &#38; Haeb-Umbach, R. (2018). Evaluation
    of Modulation-MFCC Features and DNN Classification for Acoustic Event Detection.
    In <i>DAGA 2018, München</i>.
  bibtex: '@inproceedings{Ebbers_Nelus_Martin_Haeb-Umbach_2018, title={Evaluation
    of Modulation-MFCC Features and DNN Classification for Acoustic Event Detection},
    booktitle={DAGA 2018, München}, author={Ebbers, Janek and Nelus, Alexandru and
    Martin, Rainer and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Ebbers, Janek, Alexandru Nelus, Rainer Martin, and Reinhold Haeb-Umbach.
    “Evaluation of Modulation-MFCC Features and DNN Classification for Acoustic Event
    Detection.” In <i>DAGA 2018, München</i>, 2018.
  ieee: J. Ebbers, A. Nelus, R. Martin, and R. Haeb-Umbach, “Evaluation of Modulation-MFCC
    Features and DNN Classification for Acoustic Event Detection,” in <i>DAGA 2018,
    München</i>, 2018.
  mla: Ebbers, Janek, et al. “Evaluation of Modulation-MFCC Features and DNN Classification
    for Acoustic Event Detection.” <i>DAGA 2018, München</i>, 2018.
  short: 'J. Ebbers, A. Nelus, R. Martin, R. Haeb-Umbach, in: DAGA 2018, München,
    2018.'
date_created: 2019-07-12T05:27:43Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/Daga_2018_Ebbers_Paper.pdf
oa: '1'
publication: DAGA 2018, München
status: public
title: Evaluation of Modulation-MFCC Features and DNN Classification for Acoustic
  Event Detection
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11835'
abstract:
- lang: eng
  text: Signal dereverberation using the weighted prediction error (WPE) method has
    been proven to be an effective means to raise the accuracy of far-field speech
    recognition. But in its original formulation, WPE requires multiple iterations
    over a sufficiently long utterance, rendering it unsuitable for online low-latency
    applications. Recently, two methods have been proposed to overcome this limitation.
    One utilizes a neural network to estimate the power spectral density (PSD) of
    the target signal and works in a block-online fashion. The other method relies
    on a rather simple PSD estimation which smoothes the observed PSD and utilizes
    a recursive formulation which enables it to work on a frame-by-frame basis. In
    this paper, we integrate a deep neural network (DNN) based estimator into the
    recursive frame-online formulation. We evaluate the performance of the recursive
    system with different PSD estimators in comparison to the block-online and offline
    variants on two distinct corpora: the REVERB challenge data, where the signal is
    mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome
    to also consider (directed) noise sources. The results show that although smoothing
    works surprisingly well, the more sophisticated DNN based estimator shows promising
    improvements and shortens the performance gap between online and offline processing.
author:
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Frame-Online DNN-WPE
    Dereverberation. In: <i>IWAENC 2018, Tokio, Japan</i>. ; 2018.'
  apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., &#38; Nakatani, T.
    (2018). Frame-Online DNN-WPE Dereverberation. In <i>IWAENC 2018, Tokio, Japan</i>.
  bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2018, title={Frame-Online
    DNN-WPE Dereverberation}, booktitle={IWAENC 2018, Tokio, Japan}, author={Heymann,
    Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
    Tomohiro}, year={2018} }'
  chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
    Tomohiro Nakatani. “Frame-Online DNN-WPE Dereverberation.” In <i>IWAENC 2018,
    Tokio, Japan</i>, 2018.
  ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Frame-Online
    DNN-WPE Dereverberation,” in <i>IWAENC 2018, Tokio, Japan</i>, 2018.
  mla: Heymann, Jahn, et al. “Frame-Online DNN-WPE Dereverberation.” <i>IWAENC 2018,
    Tokio, Japan</i>, 2018.
  short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: IWAENC
    2018, Tokio, Japan, 2018.'
date_created: 2019-07-12T05:29:10Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Paper.pdf
oa: '1'
publication: IWAENC 2018, Tokio, Japan
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Poster.pdf
status: public
title: Frame-Online DNN-WPE Dereverberation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11837'
abstract:
- lang: eng
  text: We present a block-online multi-channel front end for automatic speech recognition
    in noisy and reverberated environments. It is an online version of our earlier
    proposed neural network supported acoustic beamformer, whose coefficients are
    calculated from noise and speech spatial covariance matrices which are estimated
    utilizing a neural mask estimator. However, the sparsity of speech in the STFT
    domain causes problems for the initial beamformer coefficients estimation in some
    frequency bins due to lack of speech observations. We propose two methods to mitigate
    this issue. The first is to lower the frequency resolution of the STFT, which
    comes with the additional advantage of a reduced time window, thus lowering the
    latency introduced by block processing. The second approach is to smooth beamforming
    coefficients along the frequency axis, thus exploiting their high interfrequency
    correlation. With both approaches the gap between offline and block-online beamformer
    performance, as measured by the word error rate achieved by a downstream speech
    recognizer, is significantly reduced. Experiments are carried out on two corpora,
    representing noisy (CHiME-4) and noisy reverberant (voiceHome) environments.
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Heymann J, Haeb-Umbach R. Smoothing along Frequency in Online
    Neural Network Supported Acoustic Beamforming. In: <i>ITG 2018, Oldenburg, Germany</i>.
    ; 2018.'
  apa: Heitkaemper, J., Heymann, J., &#38; Haeb-Umbach, R. (2018). Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming. In <i>ITG 2018,
    Oldenburg, Germany</i>.
  bibtex: '@inproceedings{Heitkaemper_Heymann_Haeb-Umbach_2018, title={Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming}, booktitle={ITG
    2018, Oldenburg, Germany}, author={Heitkaemper, Jens and Heymann, Jahn and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: Heitkaemper, Jens, Jahn Heymann, and Reinhold Haeb-Umbach. “Smoothing along
    Frequency in Online Neural Network Supported Acoustic Beamforming.” In <i>ITG
    2018, Oldenburg, Germany</i>, 2018.
  ieee: J. Heitkaemper, J. Heymann, and R. Haeb-Umbach, “Smoothing along Frequency
    in Online Neural Network Supported Acoustic Beamforming,” in <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.
  mla: Heitkaemper, Jens, et al. “Smoothing along Frequency in Online Neural Network
    Supported Acoustic Beamforming.” <i>ITG 2018, Oldenburg, Germany</i>, 2018.
  short: 'J. Heitkaemper, J. Heymann, R. Haeb-Umbach, in: ITG 2018, Oldenburg, Germany,
    2018.'
date_created: 2019-07-12T05:29:13Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Paper.pdf
oa: '1'
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Heitkaemper_Slides.pdf
status: public
title: Smoothing along Frequency in Online Neural Network Supported Acoustic Beamforming
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11872'
abstract:
- lang: eng
  text: 'The weighted prediction error (WPE) algorithm has proven to be a very successful
    dereverberation method for the REVERB challenge. Likewise, neural network based
    mask estimation for beamforming demonstrated very good noise suppression in the
    CHiME 3 and CHiME 4 challenges. Recently, it has been shown that this estimator
    can also be trained to perform dereverberation and denoising jointly. However,
    up to now a comparison of a neural beamformer and WPE is still missing, as is
    an investigation into a combination of the two. Therefore, we here provide an
    extensive evaluation of both and consequently propose variants to integrate deep
    neural network based beamforming with WPE. For these integrated variants we identify
    a consistent word error rate (WER) reduction on two distinct databases. In particular,
    our study shows that deep learning based beamforming benefits from a model-based
    dereverberation technique (i.e. WPE) and vice versa. Our key findings are: (a)
    The more channels and noise are present, the lower the WERs of neural beamforming
    in comparison to WPE. (b) Integration of WPE and a neural beamformer consistently
    outperforms all stand-alone systems.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Boeddeker C, Heymann J, et al. Integration of neural network based beamforming
    and weighted prediction error dereverberation. In: <i>INTERSPEECH 2018, Hyderabad,
    India</i>. ; 2018.'
  apa: Drude, L., Boeddeker, C., Heymann, J., Kinoshita, K., Delcroix, M., Nakatani,
    T., &#38; Haeb-Umbach, R. (2018). Integration of neural network based beamforming
    and weighted prediction error dereverberation. In <i>INTERSPEECH 2018, Hyderabad,
    India</i>.
  bibtex: '@inproceedings{Drude_Boeddeker_Heymann_Kinoshita_Delcroix_Nakatani_Haeb-Umbach_2018,
    title={Integration of neural network based beamforming and weighted prediction error
    dereverberation}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Drude,
    Lukas and Boeddeker, Christoph and Heymann, Jahn and Kinoshita, Keisuke and Delcroix,
    Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Christoph Boeddeker, Jahn Heymann, Keisuke Kinoshita, Marc
    Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Integration of Neural Network
    Based Beamforming and Weighted Prediction Error Dereverberation.” In <i>INTERSPEECH
    2018, Hyderabad, India</i>, 2018.
  ieee: L. Drude <i>et al.</i>, “Integration of neural network based beamforming and
    weighted prediction error dereverberation,” in <i>INTERSPEECH 2018, Hyderabad,
    India</i>, 2018.
  mla: Drude, Lukas, et al. “Integration of Neural Network Based Beamforming and Weighted
    Prediction Error Dereverberation.” <i>INTERSPEECH 2018, Hyderabad, India</i>,
    2018.
  short: 'L. Drude, C. Boeddeker, J. Heymann, K. Kinoshita, M. Delcroix, T. Nakatani,
    R. Haeb-Umbach, in: INTERSPEECH 2018, Hyderabad, India, 2018.'
date_created: 2019-07-12T05:29:53Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2018, Hyderabad, India
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Slides.pdf
status: public
title: Integration of neural network based beamforming and weighted prediction error
  dereverberation
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11873'
abstract:
- lang: eng
  text: NARA-WPE is a Python software package providing implementations of the weighted
    prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly
    effective tool for speech dereverberation, thus improving the perceptual quality
    of the signal and improving the recognition performance of downstream automatic
    speech recognition (ASR). It is suitable both for single-channel and multi-channel
    applications. The package consists of (1) a Numpy implementation which can easily
    be integrated into a custom Python toolchain, and (2) a TensorFlow implementation
    which allows integration into larger computational graphs and enables backpropagation
    through WPE to train more advanced front-ends. This package comprises an iterative
    offline (batch) version, a block-online version, and a frame-online version which
    can be used in moderately low latency applications, e.g. digital speech assistants.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Jahn
  full_name: Heymann, Jahn
  id: '9168'
  last_name: Heymann
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Heymann J, Boeddeker C, Haeb-Umbach R. NARA-WPE: A Python package
    for weighted prediction error dereverberation in Numpy and Tensorflow for online
    and offline processing. In: <i>ITG 2018, Oldenburg, Germany</i>. ; 2018.'
  apa: 'Drude, L., Heymann, J., Boeddeker, C., &#38; Haeb-Umbach, R. (2018). NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing. In <i>ITG 2018, Oldenburg, Germany</i>.'
  bibtex: '@inproceedings{Drude_Heymann_Boeddeker_Haeb-Umbach_2018, title={NARA-WPE:
    A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing}, booktitle={ITG 2018, Oldenburg, Germany},
    author={Drude, Lukas and Heymann, Jahn and Boeddeker, Christoph and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: 'Drude, Lukas, Jahn Heymann, Christoph Boeddeker, and Reinhold Haeb-Umbach.
    “NARA-WPE: A Python Package for Weighted Prediction Error Dereverberation in Numpy
    and Tensorflow for Online and Offline Processing.” In <i>ITG 2018, Oldenburg,
    Germany</i>, 2018.'
  ieee: 'L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach, “NARA-WPE: A Python
    package for weighted prediction error dereverberation in Numpy and Tensorflow
    for online and offline processing,” in <i>ITG 2018, Oldenburg, Germany</i>, 2018.'
  mla: 'Drude, Lukas, et al. “NARA-WPE: A Python Package for Weighted Prediction Error
    Dereverberation in Numpy and Tensorflow for Online and Offline Processing.” <i>ITG
    2018, Oldenburg, Germany</i>, 2018.'
  short: 'L. Drude, J. Heymann, C. Boeddeker, R. Haeb-Umbach, in: ITG 2018, Oldenburg,
    Germany, 2018.'
date_created: 2019-07-12T05:29:54Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ITG 2018, Oldenburg, Germany
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Poster.pdf
status: public
title: 'NARA-WPE: A Python package for weighted prediction error dereverberation in
  Numpy and Tensorflow for online and offline processing'
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11916'
abstract:
- lang: eng
  text: We present an experimental comparison of seven state-of-the-art machine learning
    algorithms for the task of semantic analysis of spoken input, with a special emphasis
    on applications for dysarthric speech. Dysarthria is a motor speech disorder,
    which is characterized by poor articulation of phonemes. In order to cater for
    these noncanonical phoneme realizations, we employed an unsupervised learning
    approach to estimate the acoustic models for speech recognition, which does not
    require a literal transcription of the training data. Even for the subsequent
    task of semantic analysis, only weak supervision is employed, whereby the training
    utterance is accompanied by a semantic label only, rather than a literal transcription.
    Results on two databases, one of them containing dysarthric speech, are presented
    showing that Markov logic networks and conditional random fields substantially
    outperform other machine learning approaches. Markov logic networks have proved
    to be especially robust to recognition errors, which are caused by imprecise articulation
    in dysarthric speech.
author:
- first_name: Vladimir
  full_name: Despotovic, Vladimir
  last_name: Despotovic
- first_name: Oliver
  full_name: Walter, Oliver
  last_name: Walter
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Despotovic V, Walter O, Haeb-Umbach R. Machine learning techniques for semantic
    analysis of dysarthric speech: An experimental study. <i>Speech Communication
    99 (2018) 242-251 (Elsevier B.V.)</i>. 2018.'
  apa: 'Despotovic, V., Walter, O., &#38; Haeb-Umbach, R. (2018). Machine learning
    techniques for semantic analysis of dysarthric speech: An experimental study.
    <i>Speech Communication 99 (2018) 242-251 (Elsevier B.V.)</i>.'
  bibtex: '@article{Despotovic_Walter_Haeb-Umbach_2018, title={Machine learning techniques
    for semantic analysis of dysarthric speech: An experimental study}, journal={Speech
    Communication 99 (2018) 242-251 (Elsevier B.V.)}, author={Despotovic, Vladimir
    and Walter, Oliver and Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: 'Despotovic, Vladimir, Oliver Walter, and Reinhold Haeb-Umbach. “Machine
    Learning Techniques for Semantic Analysis of Dysarthric Speech: An Experimental
    Study.” <i>Speech Communication 99 (2018) 242-251 (Elsevier B.V.)</i>, 2018.'
  ieee: 'V. Despotovic, O. Walter, and R. Haeb-Umbach, “Machine learning techniques
    for semantic analysis of dysarthric speech: An experimental study,” <i>Speech
    Communication 99 (2018) 242-251 (Elsevier B.V.)</i>, 2018.'
  mla: 'Despotovic, Vladimir, et al. “Machine Learning Techniques for Semantic Analysis
    of Dysarthric Speech: An Experimental Study.” <i>Speech Communication 99 (2018)
    242-251 (Elsevier B.V.)</i>, 2018.'
  short: V. Despotovic, O. Walter, R. Haeb-Umbach, Speech Communication 99 (2018)
    242-251 (Elsevier B.V.) (2018).
date_created: 2019-07-12T05:30:44Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/SpeechCommunication_2018_Walter_Paper.pdf
oa: '1'
publication: Speech Communication 99 (2018) 242-251 (Elsevier B.V.)
status: public
title: 'Machine learning techniques for semantic analysis of dysarthric speech: An
  experimental study'
type: journal_article
user_id: '44006'
year: '2018'
...
---
_id: '12898'
abstract:
- lang: eng
  text: Deep clustering (DC) and deep attractor networks (DANs) are a data-driven
    approach to monaural blind source separation. Both approaches provide astonishing single
    channel performance but have not yet been generalized to block-online processing.
    When separating speech in a continuous stream with a block-online algorithm, it
    needs to be determined in each block which of the output streams belongs to whom.
    In this contribution we solve this block permutation problem by introducing an
    additional speaker identification embedding to the DAN model structure. We motivate
    this model decision by analyzing the embedding topology of DC and DANs and show
    that DC and DANs themselves are not sufficient for speaker identification. This
    model structure (a) improves the signal to distortion ratio (SDR) over a DAN baseline
    and (b) provides up to 61% and up to 34% relative reduction in permutation error
    rate and re-identification error rate compared to an i-vector baseline, respectively.
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Thilo
  full_name: von Neumann, Thilo
  last_name: von Neumann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, von Neumann T, Haeb-Umbach R. Deep Attractor Networks for Speaker
    Re-Identification and Blind Source Separation. In: <i>ICASSP 2018, Calgary, Canada</i>.
    ; 2018.'
  apa: Drude, L., von Neumann, T., &#38; Haeb-Umbach, R. (2018). Deep Attractor Networks
    for Speaker Re-Identification and Blind Source Separation. In <i>ICASSP 2018,
    Calgary, Canada</i>.
  bibtex: '@inproceedings{Drude_von Neumann_Haeb-Umbach_2018, title={Deep Attractor
    Networks for Speaker Re-Identification and Blind Source Separation}, booktitle={ICASSP
    2018, Calgary, Canada}, author={Drude, Lukas and von Neumann, Thilo and Haeb-Umbach,
    Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Thilo von Neumann, and Reinhold Haeb-Umbach. “Deep Attractor
    Networks for Speaker Re-Identification and Blind Source Separation.” In <i>ICASSP
    2018, Calgary, Canada</i>, 2018.
  ieee: L. Drude, T. von Neumann, and R. Haeb-Umbach, “Deep Attractor Networks for
    Speaker Re-Identification and Blind Source Separation,” in <i>ICASSP 2018, Calgary,
    Canada</i>, 2018.
  mla: Drude, Lukas, et al. “Deep Attractor Networks for Speaker Re-Identification
    and Blind Source Separation.” <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  short: 'L. Drude, T. von Neumann, R. Haeb-Umbach, in: ICASSP 2018, Calgary, Canada,
    2018.'
date_created: 2019-07-30T14:22:53Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
  link:
  - description: Slides
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Slides.pdf
status: public
title: Deep Attractor Networks for Speaker Re-Identification and Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '12900'
abstract:
- lang: eng
  text: 'Deep attractor networks (DANs) are a recently introduced method to blindly
    separate sources from spectral features of a monaural recording using bidirectional
    long short-term memory networks (BLSTMs). Due to the nature of BLSTMs, this is
    inherently not online-ready and resorting to operating on blocks yields a block
    permutation problem in that the index of each speaker may change between blocks.
    We here propose the joint modeling of spatial and spectral features to solve the
    block permutation problem and generalize DANs to multi-channel meeting recordings:
    The DAN acts as a spectral feature extractor for a subsequent model-based clustering
    approach. We first analyze different joint models in batch-processing scenarios
    and finally propose a block-online blind source separation algorithm. The efficacy
    of the proposed models is demonstrated on reverberant mixtures corrupted by real
    recordings of multi-channel background noise. We demonstrate that both the proposed
    batch-processing and the proposed block-online system outperform (a) a spatial-only
    model with a state-of-the-art frequency permutation solver and (b) a spectral-only
    model with an oracle block permutation solver in terms of signal to distortion
    ratio (SDR) gains.'
author:
- first_name: Lukas
  full_name: Drude, Lukas
  id: '11213'
  last_name: Drude
- first_name: Takuya
  full_name: Higuchi, Takuya
  last_name: Higuchi
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Drude L, Higuchi T, Kinoshita K, Nakatani T, Haeb-Umbach R. Dual Frequency-
    and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source
    Separation. In: <i>ICASSP 2018, Calgary, Canada</i>. ; 2018.'
  apa: Drude, L., Higuchi, T., Kinoshita, K., Nakatani, T., &#38; Haeb-Umbach,
    R. (2018). Dual Frequency- and Block-Permutation Alignment for Deep Learning Based
    Block-Online Blind Source Separation. In <i>ICASSP 2018, Calgary, Canada</i>.
  bibtex: '@inproceedings{Drude_Higuchi_Kinoshita_Nakatani_Haeb-Umbach_2018, title={Dual
    Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
    Blind Source Separation}, booktitle={ICASSP 2018, Calgary, Canada}, author={Drude,
    Lukas and Higuchi, Takuya and Kinoshita, Keisuke and Nakatani, Tomohiro and
    Haeb-Umbach, Reinhold}, year={2018} }'
  chicago: Drude, Lukas, Takuya Higuchi, Keisuke Kinoshita, Tomohiro Nakatani,
    and Reinhold Haeb-Umbach. “Dual Frequency- and Block-Permutation Alignment for
    Deep Learning Based Block-Online Blind Source Separation.” In <i>ICASSP 2018,
    Calgary, Canada</i>, 2018.
  ieee: L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, and R. Haeb-Umbach,
    “Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
    Blind Source Separation,” in <i>ICASSP 2018, Calgary, Canada</i>, 2018.
  mla: Drude, Lukas, et al. “Dual Frequency- and Block-Permutation Alignment for Deep
    Learning Based Block-Online Blind Source Separation.” <i>ICASSP 2018, Calgary,
    Canada</i>, 2018.
  short: 'L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, R. Haeb-Umbach, in:
    ICASSP 2018, Calgary, Canada, 2018.'
date_created: 2019-07-30T14:42:15Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
  link:
  - description: Poster
    relation: supplementary_material
    url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Poster.pdf
status: public
title: Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
  Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
