---
_id: '33816'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Gburrek T, Boeddeker C, von Neumann T, Cord-Landwehr T, Schmalenstroeer J,
    Haeb-Umbach R. <i>A Meeting Transcription System for an Ad-Hoc Acoustic Sensor
    Network</i>. arXiv; 2022. doi:<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>
  apa: Gburrek, T., Boeddeker, C., von Neumann, T., Cord-Landwehr, T., Schmalenstroeer,
    J., &#38; Haeb-Umbach, R. (2022). <i>A Meeting Transcription System for an Ad-Hoc
    Acoustic Sensor Network</i>. arXiv. <a href="https://doi.org/10.48550/ARXIV.2205.00944">https://doi.org/10.48550/ARXIV.2205.00944</a>
  bibtex: '@book{Gburrek_Boeddeker_von_Neumann_Cord-Landwehr_Schmalenstroeer_Haeb-Umbach_2022,
    title={A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network},
    DOI={<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>},
    publisher={arXiv}, author={Gburrek, Tobias and Boeddeker, Christoph and von Neumann,
    Thilo and Cord-Landwehr, Tobias and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold},
    year={2022} }'
  chicago: Gburrek, Tobias, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr,
    Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. <i>A Meeting Transcription System
    for an Ad-Hoc Acoustic Sensor Network</i>. arXiv, 2022. <a href="https://doi.org/10.48550/ARXIV.2205.00944">https://doi.org/10.48550/ARXIV.2205.00944</a>.
  ieee: T. Gburrek, C. Boeddeker, T. von Neumann, T. Cord-Landwehr, J. Schmalenstroeer,
    and R. Haeb-Umbach, <i>A Meeting Transcription System for an Ad-Hoc Acoustic Sensor
    Network</i>. arXiv, 2022.
  mla: Gburrek, Tobias, et al. <i>A Meeting Transcription System for an Ad-Hoc Acoustic
    Sensor Network</i>. arXiv, 2022, doi:<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>.
  short: T. Gburrek, C. Boeddeker, T. von Neumann, T. Cord-Landwehr, J. Schmalenstroeer,
    R. Haeb-Umbach, A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network,
    arXiv, 2022.
date_created: 2022-10-18T11:10:58Z
date_updated: 2025-02-12T09:03:42Z
ddc:
- '004'
department:
- _id: '54'
doi: 10.48550/ARXIV.2205.00944
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:42:04Z
  date_updated: 2023-11-17T06:42:04Z
  file_id: '48992'
  file_name: meeting_transcription_22.pdf
  file_size: 199006
  relation: main_file
file_date_updated: 2023-11-17T06:42:04Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publisher: arXiv
status: public
title: A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network
type: misc
user_id: '40767'
year: '2022'
...
---
_id: '33954'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Cord-Landwehr T, von Neumann T, Haeb-Umbach R. An Initialization
    Scheme for Meeting Separation with Spatial Mixture Models. In: <i>Interspeech
    2022</i>. ISCA; 2022. doi:<a href="https://doi.org/10.21437/interspeech.2022-10929">10.21437/interspeech.2022-10929</a>'
  apa: Boeddeker, C., Cord-Landwehr, T., von Neumann, T., &#38; Haeb-Umbach, R. (2022).
    An Initialization Scheme for Meeting Separation with Spatial Mixture Models. <i>Interspeech
    2022</i>. <a href="https://doi.org/10.21437/interspeech.2022-10929">https://doi.org/10.21437/interspeech.2022-10929</a>
  bibtex: '@inproceedings{Boeddeker_Cord-Landwehr_von_Neumann_Haeb-Umbach_2022, title={An
    Initialization Scheme for Meeting Separation with Spatial Mixture Models}, DOI={<a
    href="https://doi.org/10.21437/interspeech.2022-10929">10.21437/interspeech.2022-10929</a>},
    booktitle={Interspeech 2022}, publisher={ISCA}, author={Boeddeker, Christoph and
    Cord-Landwehr, Tobias and von Neumann, Thilo and Haeb-Umbach, Reinhold}, year={2022}
    }'
  chicago: Boeddeker, Christoph, Tobias Cord-Landwehr, Thilo von Neumann, and Reinhold
    Haeb-Umbach. “An Initialization Scheme for Meeting Separation with Spatial Mixture
    Models.” In <i>Interspeech 2022</i>. ISCA, 2022. <a href="https://doi.org/10.21437/interspeech.2022-10929">https://doi.org/10.21437/interspeech.2022-10929</a>.
  ieee: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, and R. Haeb-Umbach, “An Initialization
    Scheme for Meeting Separation with Spatial Mixture Models,” 2022, doi: <a href="https://doi.org/10.21437/interspeech.2022-10929">10.21437/interspeech.2022-10929</a>.'
  mla: Boeddeker, Christoph, et al. “An Initialization Scheme for Meeting Separation
    with Spatial Mixture Models.” <i>Interspeech 2022</i>, ISCA, 2022, doi:<a href="https://doi.org/10.21437/interspeech.2022-10929">10.21437/interspeech.2022-10929</a>.
  short: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, R. Haeb-Umbach, in: Interspeech
    2022, ISCA, 2022.'
date_created: 2022-10-28T10:53:56Z
date_updated: 2025-02-12T09:06:56Z
department:
- _id: '54'
doi: 10.21437/interspeech.2022-10929
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/interspeech_2022/boeddeker22_interspeech.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Interspeech 2022
publication_status: published
publisher: ISCA
status: public
title: An Initialization Scheme for Meeting Separation with Spatial Mixture Models
type: conference
user_id: '40767'
year: '2022'
...
---
_id: '33958'
abstract:
- lang: eng
  text: Recent speaker diarization studies showed that integration of end-to-end neural
    diarization (EEND) and clustering-based diarization is a promising approach for
    achieving state-of-the-art performance on various tasks. Such an approach first
    divides an observed signal into fixed-length segments, then performs segment-level
    local diarization based on an EEND module, and merges the segment-level results
    via clustering to form a final global diarization result. The segmentation is
    done to limit the number of speakers in each segment since the current EEND cannot
    handle a large number of speakers. In this paper, we argue that such an approach
    involving the segmentation has several issues; for example, it inevitably faces
    a dilemma that larger segment sizes increase both the context available for enhancing
    the performance and the number of speakers for the local EEND module to handle.
    To resolve such a problem, this paper proposes a novel framework that performs
    diarization without segmentation. However, it can still handle challenging data
    containing many speakers and a significant amount of overlapping speech. The proposed
    method can take an entire meeting for inference and perform utterance-by-utterance
    diarization that clusters utterance activities in terms of speakers. To this end,
    we leverage a neural network training scheme called Graph-PIT proposed recently
    for neural source separation. Experiments with simulated active-meeting-like data
    and CALLHOME data show the superiority of the proposed approach over the conventional
    methods.
author:
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kinoshita K, von Neumann T, Delcroix M, Boeddeker C, Haeb-Umbach R. Utterance-by-utterance
    overlap-aware neural diarization with Graph-PIT. In: <i>Proc. Interspeech 2022</i>.
    ISCA; 2022:1486-1490. doi:<a href="https://doi.org/10.21437/Interspeech.2022-11408">10.21437/Interspeech.2022-11408</a>'
  apa: Kinoshita, K., von Neumann, T., Delcroix, M., Boeddeker, C., &#38; Haeb-Umbach,
    R. (2022). Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
    <i>Proc. Interspeech 2022</i>, 1486–1490. <a href="https://doi.org/10.21437/Interspeech.2022-11408">https://doi.org/10.21437/Interspeech.2022-11408</a>
  bibtex: '@inproceedings{Kinoshita_von_Neumann_Delcroix_Boeddeker_Haeb-Umbach_2022,
    title={Utterance-by-utterance overlap-aware neural diarization with Graph-PIT},
    DOI={<a href="https://doi.org/10.21437/Interspeech.2022-11408">10.21437/Interspeech.2022-11408</a>},
    booktitle={Proc. Interspeech 2022}, publisher={ISCA}, author={Kinoshita, Keisuke
    and von Neumann, Thilo and Delcroix, Marc and Boeddeker, Christoph and Haeb-Umbach,
    Reinhold}, year={2022}, pages={1486–1490} }'
  chicago: Kinoshita, Keisuke, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker,
    and Reinhold Haeb-Umbach. “Utterance-by-Utterance Overlap-Aware Neural Diarization
    with Graph-PIT.” In <i>Proc. Interspeech 2022</i>, 1486–90. ISCA, 2022. <a href="https://doi.org/10.21437/Interspeech.2022-11408">https://doi.org/10.21437/Interspeech.2022-11408</a>.
  ieee: 'K. Kinoshita, T. von Neumann, M. Delcroix, C. Boeddeker, and R. Haeb-Umbach,
    “Utterance-by-utterance overlap-aware neural diarization with Graph-PIT,” in <i>Proc.
    Interspeech 2022</i>, 2022, pp. 1486–1490, doi: <a href="https://doi.org/10.21437/Interspeech.2022-11408">10.21437/Interspeech.2022-11408</a>.'
  mla: Kinoshita, Keisuke, et al. “Utterance-by-Utterance Overlap-Aware Neural Diarization
    with Graph-PIT.” <i>Proc. Interspeech 2022</i>, ISCA, 2022, pp. 1486–90, doi:<a
    href="https://doi.org/10.21437/Interspeech.2022-11408">10.21437/Interspeech.2022-11408</a>.
  short: 'K. Kinoshita, T. von Neumann, M. Delcroix, C. Boeddeker, R. Haeb-Umbach,
    in: Proc. Interspeech 2022, ISCA, 2022, pp. 1486–1490.'
conference:
  name: Interspeech 2022
date_created: 2022-10-28T12:07:57Z
date_updated: 2025-02-12T09:09:05Z
department:
- _id: '54'
doi: 10.21437/Interspeech.2022-11408
language:
- iso: eng
main_file_link:
- url: https://www.isca-archive.org/interspeech_2022/kinoshita22_interspeech.pdf
page: 1486-1490
publication: Proc. Interspeech 2022
publication_status: published
publisher: ISCA
quality_controlled: '1'
status: public
title: Utterance-by-utterance overlap-aware neural diarization with Graph-PIT
type: conference
user_id: '40767'
year: '2022'
...
---
_id: '21065'
abstract:
- lang: eng
  text: The machine recognition of speech spoken at a distance from the microphones,
    known as far-field automatic speech recognition (ASR), has received a significant
    increase of attention in science and industry, which caused or was caused by an
    equally significant improvement in recognition accuracy. Meanwhile it has entered
    the consumer market with digital home assistants with a spoken language interface
    being its most prominent application. Speech recorded at a distance is affected
    by various acoustic distortions and, consequently, quite different processing
    pipelines have emerged compared to ASR for close-talk speech. A signal enhancement
    front-end for dereverberation, source separation and acoustic beamforming is employed
    to clean up the speech, and the back-end ASR engine is robustified by multi-condition
    training and adaptation. We will also describe the so-called end-to-end approach
    to ASR, which is a new promising architecture that has recently been extended
    to the far-field scenario. This tutorial article gives an account of the algorithms
    used to enable accurate speech recognition from a distance, and it will be seen
    that, although deep learning has a significant share in the technological breakthroughs,
    a clever combination with traditional signal processing can lead to surprisingly
    effective solutions.
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Jahn
  full_name: Heymann, Jahn
  last_name: Heymann
- first_name: Lukas
  full_name: Drude, Lukas
  last_name: Drude
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
citation:
  ama: Haeb-Umbach R, Heymann J, Drude L, Watanabe S, Delcroix M, Nakatani T. Far-Field
    Automatic Speech Recognition. <i>Proceedings of the IEEE</i>. 2021;109(2):124-148.
    doi:<a href="https://doi.org/10.1109/JPROC.2020.3018668">10.1109/JPROC.2020.3018668</a>
  apa: Haeb-Umbach, R., Heymann, J., Drude, L., Watanabe, S., Delcroix, M., &#38;
    Nakatani, T. (2021). Far-Field Automatic Speech Recognition. <i>Proceedings of
    the IEEE</i>, <i>109</i>(2), 124–148. <a href="https://doi.org/10.1109/JPROC.2020.3018668">https://doi.org/10.1109/JPROC.2020.3018668</a>
  bibtex: '@article{Haeb-Umbach_Heymann_Drude_Watanabe_Delcroix_Nakatani_2021, title={Far-Field
    Automatic Speech Recognition}, volume={109}, DOI={<a href="https://doi.org/10.1109/JPROC.2020.3018668">10.1109/JPROC.2020.3018668</a>},
    number={2}, journal={Proceedings of the IEEE}, author={Haeb-Umbach, Reinhold and
    Heymann, Jahn and Drude, Lukas and Watanabe, Shinji and Delcroix, Marc and Nakatani,
    Tomohiro}, year={2021}, pages={124–148} }'
  chicago: 'Haeb-Umbach, Reinhold, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc
    Delcroix, and Tomohiro Nakatani. “Far-Field Automatic Speech Recognition.” <i>Proceedings
    of the IEEE</i> 109, no. 2 (2021): 124–48. <a href="https://doi.org/10.1109/JPROC.2020.3018668">https://doi.org/10.1109/JPROC.2020.3018668</a>.'
  ieee: R. Haeb-Umbach, J. Heymann, L. Drude, S. Watanabe, M. Delcroix, and T. Nakatani,
    “Far-Field Automatic Speech Recognition,” <i>Proceedings of the IEEE</i>, vol.
    109, no. 2, pp. 124–148, 2021.
  mla: Haeb-Umbach, Reinhold, et al. “Far-Field Automatic Speech Recognition.” <i>Proceedings
    of the IEEE</i>, vol. 109, no. 2, 2021, pp. 124–48, doi:<a href="https://doi.org/10.1109/JPROC.2020.3018668">10.1109/JPROC.2020.3018668</a>.
  short: R. Haeb-Umbach, J. Heymann, L. Drude, S. Watanabe, M. Delcroix, T. Nakatani,
    Proceedings of the IEEE 109 (2021) 124–148.
date_created: 2021-01-25T08:15:27Z
date_updated: 2022-01-06T06:54:44Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/JPROC.2020.3018668
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2021-01-25T08:17:23Z
  date_updated: 2021-01-25T08:17:23Z
  file_id: '21066'
  file_name: proceedings_2021_haebumbach_Paper.pdf
  file_size: 4173988
  relation: main_file
file_date_updated: 2021-01-25T08:17:23Z
has_accepted_license: '1'
intvolume: '109'
issue: '2'
language:
- iso: eng
oa: '1'
page: 124-148
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the IEEE
status: public
title: Far-Field Automatic Speech Recognition
type: journal_article
user_id: '59789'
volume: 109
year: '2021'
...
---
_id: '28256'
author:
- first_name: Wangyou
  full_name: Zhang, Wangyou
  last_name: Zhang
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tsubasa
  full_name: Ochiai, Tsubasa
  last_name: Ochiai
- first_name: Naoyuki
  full_name: Kamo, Naoyuki
  last_name: Kamo
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Yanmin
  full_name: Qian, Yanmin
  last_name: Qian
citation:
  ama: 'Zhang W, Boeddeker C, Watanabe S, et al. End-to-End Dereverberation, Beamforming,
    and Speech Recognition with Improved Numerical Stability and Advanced Frontend.
    In: <i>ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP)</i>. ; 2021. doi:<a href="https://doi.org/10.1109/icassp39728.2021.9414464">10.1109/icassp39728.2021.9414464</a>'
  apa: Zhang, W., Boeddeker, C., Watanabe, S., Nakatani, T., Delcroix, M., Kinoshita,
    K., Ochiai, T., Kamo, N., Haeb-Umbach, R., &#38; Qian, Y. (2021). End-to-End Dereverberation,
    Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced
    Frontend. <i>ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp39728.2021.9414464">https://doi.org/10.1109/icassp39728.2021.9414464</a>
  bibtex: '@inproceedings{Zhang_Boeddeker_Watanabe_Nakatani_Delcroix_Kinoshita_Ochiai_Kamo_Haeb-Umbach_Qian_2021,
    title={End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved
    Numerical Stability and Advanced Frontend}, DOI={<a href="https://doi.org/10.1109/icassp39728.2021.9414464">10.1109/icassp39728.2021.9414464</a>},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Zhang, Wangyou and Boeddeker, Christoph
    and Watanabe, Shinji and Nakatani, Tomohiro and Delcroix, Marc and Kinoshita,
    Keisuke and Ochiai, Tsubasa and Kamo, Naoyuki and Haeb-Umbach, Reinhold and Qian,
    Yanmin}, year={2021} }'
  chicago: Zhang, Wangyou, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani,
    Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach,
    and Yanmin Qian. “End-to-End Dereverberation, Beamforming, and Speech Recognition
    with Improved Numerical Stability and Advanced Frontend.” In <i>ICASSP 2021 -
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, 2021. <a href="https://doi.org/10.1109/icassp39728.2021.9414464">https://doi.org/10.1109/icassp39728.2021.9414464</a>.
  ieee: 'W. Zhang <i>et al.</i>, “End-to-End Dereverberation, Beamforming, and Speech
    Recognition with Improved Numerical Stability and Advanced Frontend,” 2021, doi:
    <a href="https://doi.org/10.1109/icassp39728.2021.9414464">10.1109/icassp39728.2021.9414464</a>.'
  mla: Zhang, Wangyou, et al. “End-to-End Dereverberation, Beamforming, and Speech
    Recognition with Improved Numerical Stability and Advanced Frontend.” <i>ICASSP
    2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, 2021, doi:<a href="https://doi.org/10.1109/icassp39728.2021.9414464">10.1109/icassp39728.2021.9414464</a>.
  short: 'W. Zhang, C. Boeddeker, S. Watanabe, T. Nakatani, M. Delcroix, K. Kinoshita,
    T. Ochiai, N. Kamo, R. Haeb-Umbach, Y. Qian, in: ICASSP 2021 - 2021 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.'
date_created: 2021-12-03T11:31:42Z
date_updated: 2022-01-13T08:31:27Z
department:
- _id: '54'
doi: 10.1109/icassp39728.2021.9414464
language:
- iso: eng
publication: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
status: public
title: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved
  Numerical Stability and Advanced Frontend
type: conference
user_id: '40767'
year: '2021'
...
---
_id: '24000'
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Valentin
  full_name: Ion, Valentin
  last_name: Ion
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Schmalenstroeer J, Ion V, Haeb-Umbach R. A Database for Research
    on Detection and Enhancement of Speech Transmitted over HF links. In: <i>Speech
    Communication; 14th ITG-Symposium</i>. ; 2021:1-5.'
  apa: Heitkaemper, J., Schmalenstroeer, J., Ion, V., &#38; Haeb-Umbach, R. (2021).
    A Database for Research on Detection and Enhancement of Speech Transmitted over
    HF links. <i>Speech Communication; 14th ITG-Symposium</i>, 1–5.
  bibtex: '@inproceedings{Heitkaemper_Schmalenstroeer_Ion_Haeb-Umbach_2021, title={A
    Database for Research on Detection and Enhancement of Speech Transmitted over
    HF links}, booktitle={Speech Communication; 14th ITG-Symposium}, author={Heitkaemper,
    Jens and Schmalenstroeer, Joerg and Ion, Valentin and Haeb-Umbach, Reinhold},
    year={2021}, pages={1–5} }'
  chicago: Heitkaemper, Jens, Joerg Schmalenstroeer, Valentin Ion, and Reinhold Haeb-Umbach.
    “A Database for Research on Detection and Enhancement of Speech Transmitted over
    HF Links.” In <i>Speech Communication; 14th ITG-Symposium</i>, 1–5, 2021.
  ieee: J. Heitkaemper, J. Schmalenstroeer, V. Ion, and R. Haeb-Umbach, “A Database
    for Research on Detection and Enhancement of Speech Transmitted over HF links,”
    in <i>Speech Communication; 14th ITG-Symposium</i>, 2021, pp. 1–5.
  mla: Heitkaemper, Jens, et al. “A Database for Research on Detection and Enhancement
    of Speech Transmitted over HF Links.” <i>Speech Communication; 14th ITG-Symposium</i>,
    2021, pp. 1–5.
  short: 'J. Heitkaemper, J. Schmalenstroeer, V. Ion, R. Haeb-Umbach, in: Speech Communication;
    14th ITG-Symposium, 2021, pp. 1–5.'
date_created: 2021-09-09T08:41:25Z
date_updated: 2023-10-26T08:06:57Z
department:
- _id: '54'
language:
- iso: eng
page: 1-5
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Speech Communication; 14th ITG-Symposium
quality_controlled: '1'
status: public
title: A Database for Research on Detection and Enhancement of Speech Transmitted
  over HF links
type: conference
user_id: '460'
year: '2021'
...
---
_id: '44843'
abstract:
- lang: eng
  text: Unsupervised blind source separation methods do not require a training phase
    and thus cannot suffer from a train-test mismatch, which is a common concern in
    neural network based source separation. The unsupervised techniques can be categorized
    in two classes, those building upon the sparsity of speech in the Short-Time Fourier
    transform domain and those exploiting non-Gaussianity or non-stationarity of the
    source signals. In this contribution, spatial mixture models which fall in the
    first category and independent vector analysis (IVA) as a representative of the
    second category are compared w.r.t. their separation performance and the performance
    of a downstream speech recognizer on a reverberant dataset of reasonable size.
    Furthermore, we introduce a serial concatenation of the two, where the result
    of the mixture model serves as initialization of IVA, which achieves significantly
    better WER performance than each algorithm individually and even approaches the
    performance of a much more complex neural network based technique.
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Frederik
  full_name: Rautenberg, Frederik
  id: '72602'
  last_name: Rautenberg
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Rautenberg F, Haeb-Umbach R. A Comparison and Combination of
    Unsupervised Blind Source Separation Techniques. In: <i>ITG Conference on Speech
    Communication</i>. ; 2021.'
  apa: Boeddeker, C., Rautenberg, F., &#38; Haeb-Umbach, R. (2021). A Comparison and
    Combination of Unsupervised Blind Source Separation Techniques. <i>ITG Conference
    on Speech Communication</i>. ITG Conference on Speech Communication, Kiel.
  bibtex: '@inproceedings{Boeddeker_Rautenberg_Haeb-Umbach_2021, title={A Comparison
    and Combination of Unsupervised Blind Source Separation Techniques}, booktitle={ITG
    Conference on Speech Communication}, author={Boeddeker, Christoph and Rautenberg,
    Frederik and Haeb-Umbach, Reinhold}, year={2021} }'
  chicago: Boeddeker, Christoph, Frederik Rautenberg, and Reinhold Haeb-Umbach. “A
    Comparison and Combination of Unsupervised Blind Source Separation Techniques.”
    In <i>ITG Conference on Speech Communication</i>, 2021.
  ieee: C. Boeddeker, F. Rautenberg, and R. Haeb-Umbach, “A Comparison and Combination
    of Unsupervised Blind Source Separation Techniques,” presented at the ITG Conference
    on Speech Communication, Kiel, 2021.
  mla: Boeddeker, Christoph, et al. “A Comparison and Combination of Unsupervised
    Blind Source Separation Techniques.” <i>ITG Conference on Speech Communication</i>,
    2021.
  short: 'C. Boeddeker, F. Rautenberg, R. Haeb-Umbach, in: ITG Conference on Speech
    Communication, 2021.'
conference:
  location: Kiel
  name: ITG Conference on Speech Communication
date_created: 2023-05-15T07:59:33Z
date_updated: 2023-11-15T15:29:32Z
ddc:
- '000'
department:
- _id: '54'
external_id:
  arxiv:
  - '2106.05627'
file:
- access_level: open_access
  content_type: application/pdf
  creator: frra
  date_created: 2023-05-16T08:37:31Z
  date_updated: 2023-11-15T15:29:32Z
  file_id: '44856'
  file_name: 2106.05627.pdf
  file_size: 295972
  relation: main_file
file_date_updated: 2023-11-15T15:29:32Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/pdf/2106.05627.pdf
oa: '1'
publication: ITG Conference on Speech Communication
status: public
title: A Comparison and Combination of Unsupervised Blind Source Separation Techniques
type: conference
user_id: '40767'
year: '2021'
...
---
_id: '28259'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Wangyou
  full_name: Zhang, Wangyou
  last_name: Zhang
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Tsubasa
  full_name: Ochiai, Tsubasa
  last_name: Ochiai
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Naoyuki
  full_name: Kamo, Naoyuki
  last_name: Kamo
- first_name: Yanmin
  full_name: Qian, Yanmin
  last_name: Qian
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Zhang W, Nakatani T, et al. Convolutive Transfer Function Invariant
    SDR Training Criteria for Multi-Channel Reverberant Speech Separation. In: <i>ICASSP
    2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. ; 2021. doi:<a href="https://doi.org/10.1109/icassp39728.2021.9414661">10.1109/icassp39728.2021.9414661</a>'
  apa: Boeddeker, C., Zhang, W., Nakatani, T., Kinoshita, K., Ochiai, T., Delcroix,
    M., Kamo, N., Qian, Y., &#38; Haeb-Umbach, R. (2021). Convolutive Transfer Function
    Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
    <i>ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp39728.2021.9414661">https://doi.org/10.1109/icassp39728.2021.9414661</a>
  bibtex: '@inproceedings{Boeddeker_Zhang_Nakatani_Kinoshita_Ochiai_Delcroix_Kamo_Qian_Haeb-Umbach_2021,
    title={Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel
    Reverberant Speech Separation}, DOI={<a href="https://doi.org/10.1109/icassp39728.2021.9414661">10.1109/icassp39728.2021.9414661</a>},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Boeddeker, Christoph and Zhang, Wangyou
    and Nakatani, Tomohiro and Kinoshita, Keisuke and Ochiai, Tsubasa and Delcroix,
    Marc and Kamo, Naoyuki and Qian, Yanmin and Haeb-Umbach, Reinhold}, year={2021}
    }'
  chicago: Boeddeker, Christoph, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita,
    Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, and Reinhold Haeb-Umbach.
    “Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel
    Reverberant Speech Separation.” In <i>ICASSP 2021 - 2021 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2021. <a href="https://doi.org/10.1109/icassp39728.2021.9414661">https://doi.org/10.1109/icassp39728.2021.9414661</a>.
  ieee: 'C. Boeddeker <i>et al.</i>, “Convolutive Transfer Function Invariant SDR
    Training Criteria for Multi-Channel Reverberant Speech Separation,” 2021, doi:
    <a href="https://doi.org/10.1109/icassp39728.2021.9414661">10.1109/icassp39728.2021.9414661</a>.'
  mla: Boeddeker, Christoph, et al. “Convolutive Transfer Function Invariant SDR Training
    Criteria for Multi-Channel Reverberant Speech Separation.” <i>ICASSP 2021 - 2021
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    2021, doi:<a href="https://doi.org/10.1109/icassp39728.2021.9414661">10.1109/icassp39728.2021.9414661</a>.
  short: 'C. Boeddeker, W. Zhang, T. Nakatani, K. Kinoshita, T. Ochiai, M. Delcroix,
    N. Kamo, Y. Qian, R. Haeb-Umbach, in: ICASSP 2021 - 2021 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP), 2021.'
date_created: 2021-12-03T12:00:16Z
date_updated: 2023-11-15T15:18:09Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp39728.2021.9414661
file:
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2021-12-03T12:01:20Z
  date_updated: 2023-11-15T15:18:08Z
  file_id: '28260'
  file_name: ICASSP2021_BSSEval.pdf
  file_size: 228717
  relation: main_file
file_date_updated: 2023-11-15T15:18:08Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
status: public
title: Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel
  Reverberant Speech Separation
type: conference
user_id: '40767'
year: '2021'
...
---
_id: '23998'
author:
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Ullmann, Joerg
  id: '16256'
  last_name: Ullmann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Schmalenstroeer J, Heitkaemper J, Ullmann J, Haeb-Umbach R. Open Range Pitch
    Tracking for Carrier Frequency Difference Estimation from HF Transmitted Speech.
    In: <i>29th European Signal Processing Conference (EUSIPCO)</i>. ; 2021:1-5.'
  apa: Schmalenstroeer, J., Heitkaemper, J., Ullmann, J., &#38; Haeb-Umbach, R. (2021).
    Open Range Pitch Tracking for Carrier Frequency Difference Estimation from HF
    Transmitted Speech. <i>29th European Signal Processing Conference (EUSIPCO)</i>,
    1–5.
  bibtex: '@inproceedings{Schmalenstroeer_Heitkaemper_Ullmann_Haeb-Umbach_2021, title={Open
    Range Pitch Tracking for Carrier Frequency Difference Estimation from HF Transmitted
    Speech}, booktitle={29th European Signal Processing Conference (EUSIPCO)}, author={Schmalenstroeer,
    Joerg and Heitkaemper, Jens and Ullmann, Joerg and Haeb-Umbach, Reinhold}, year={2021},
    pages={1–5} }'
  chicago: Schmalenstroeer, Joerg, Jens Heitkaemper, Joerg Ullmann, and Reinhold Haeb-Umbach.
    “Open Range Pitch Tracking for Carrier Frequency Difference Estimation from HF
    Transmitted Speech.” In <i>29th European Signal Processing Conference (EUSIPCO)</i>,
    1–5, 2021.
  ieee: J. Schmalenstroeer, J. Heitkaemper, J. Ullmann, and R. Haeb-Umbach, “Open
    Range Pitch Tracking for Carrier Frequency Difference Estimation from HF Transmitted
    Speech,” in <i>29th European Signal Processing Conference (EUSIPCO)</i>, 2021,
    pp. 1–5.
  mla: Schmalenstroeer, Joerg, et al. “Open Range Pitch Tracking for Carrier Frequency
    Difference Estimation from HF Transmitted Speech.” <i>29th European Signal Processing
    Conference (EUSIPCO)</i>, 2021, pp. 1–5.
  short: 'J. Schmalenstroeer, J. Heitkaemper, J. Ullmann, R. Haeb-Umbach, in: 29th
    European Signal Processing Conference (EUSIPCO), 2021, pp. 1–5.'
date_created: 2021-09-09T08:40:04Z
date_updated: 2023-11-15T14:56:38Z
department:
- _id: '54'
extern: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2103.01599
oa: '1'
page: 1-5
publication: 29th European Signal Processing Conference (EUSIPCO)
status: public
title: Open Range Pitch Tracking for Carrier Frequency Difference Estimation from
  HF Transmitted Speech
type: conference
user_id: '460'
year: '2021'
...
---
_id: '22528'
abstract:
- lang: eng
  text: Due to the ad hoc nature of wireless acoustic sensor networks, the position
    of the sensor nodes is typically unknown. This contribution proposes a technique
    to estimate the position and orientation of the sensor nodes from the recorded
    speech signals. The method assumes that a node comprises a microphone array with
    synchronously sampled microphones rather than a single microphone, but does not
    require the sampling clocks of the nodes to be synchronized. From the observed
    audio signals, the distances between the acoustic sources and arrays, as well
    as the directions of arrival, are estimated. They serve as input to a non-linear
    least squares problem, from which both the sensor nodes’ positions and orientations,
    as well as the source positions, are estimated alternately in an iterative process.
    Given one set of unknowns, i.e., either the source positions or the sensor nodes’
    geometry, the other set of unknowns can be computed in closed-form. The proposed
    approach is computationally efficient and the first to employ both distance
    and directional information for geometry calibration in a common cost function.
    Since both distance and direction of arrival measurements suffer from outliers,
    e.g., caused by strong reflections of the sound waves on the surfaces of the room,
    we introduce measures to deemphasize or remove unreliable measurements. Additionally,
    we discuss modifications of our previously proposed deep neural network-based
    acoustic distance estimator, to account not only for omnidirectional sources but
    also for directional sources. Simulation results show good positioning accuracy
    and compare very favorably with alternative approaches from the literature.
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Gburrek T, Schmalenstroeer J, Haeb-Umbach R. Geometry calibration in wireless
    acoustic sensor networks utilizing DoA and distance information. <i>EURASIP Journal
    on Audio, Speech, and Music Processing</i>. Published online 2021. doi:<a href="https://doi.org/10.1186/s13636-021-00210-x">10.1186/s13636-021-00210-x</a>
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2021). Geometry calibration
    in wireless acoustic sensor networks utilizing DoA and distance information. <i>EURASIP
    Journal on Audio, Speech, and Music Processing</i>. <a href="https://doi.org/10.1186/s13636-021-00210-x">https://doi.org/10.1186/s13636-021-00210-x</a>
  bibtex: '@article{Gburrek_Schmalenstroeer_Haeb-Umbach_2021, title={Geometry calibration
    in wireless acoustic sensor networks utilizing DoA and distance information},
    DOI={<a href="https://doi.org/10.1186/s13636-021-00210-x">10.1186/s13636-021-00210-x</a>},
    journal={EURASIP Journal on Audio, Speech, and Music Processing}, author={Gburrek,
    Tobias and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2021} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Geometry
    Calibration in Wireless Acoustic Sensor Networks Utilizing DoA and Distance Information.”
    <i>EURASIP Journal on Audio, Speech, and Music Processing</i>, 2021. <a href="https://doi.org/10.1186/s13636-021-00210-x">https://doi.org/10.1186/s13636-021-00210-x</a>.
  ieee: 'T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “Geometry calibration
    in wireless acoustic sensor networks utilizing DoA and distance information,”
    <i>EURASIP Journal on Audio, Speech, and Music Processing</i>, 2021, doi: <a href="https://doi.org/10.1186/s13636-021-00210-x">10.1186/s13636-021-00210-x</a>.'
  mla: Gburrek, Tobias, et al. “Geometry Calibration in Wireless Acoustic Sensor Networks
    Utilizing DoA and Distance Information.” <i>EURASIP Journal on Audio, Speech,
    and Music Processing</i>, 2021, doi:<a href="https://doi.org/10.1186/s13636-021-00210-x">10.1186/s13636-021-00210-x</a>.
  short: T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, EURASIP Journal on Audio,
    Speech, and Music Processing (2021).
date_created: 2021-07-05T05:30:15Z
date_updated: 2023-11-17T06:36:17Z
department:
- _id: '54'
doi: 10.1186/s13636-021-00210-x
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-021-00210-x
oa: '1'
publication: EURASIP Journal on Audio, Speech, and Music Processing
publication_identifier:
  issn:
  - 1687-4722
publication_status: published
quality_controlled: '1'
status: public
title: Geometry calibration in wireless acoustic sensor networks utilizing DoA and
  distance information
type: journal_article
user_id: '44006'
year: '2021'
...
---
_id: '23994'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Haeb-Umbach R. Iterative Geometry Calibration
    from Distance Estimates for Wireless Acoustic Sensor Networks. In: <i>ICASSP 2021
    - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. ; 2021. doi:<a href="https://doi.org/10.1109/icassp39728.2021.9413831">10.1109/icassp39728.2021.9413831</a>'
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2021). Iterative Geometry
    Calibration from Distance Estimates for Wireless Acoustic Sensor Networks. <i>ICASSP
    2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp39728.2021.9413831">https://doi.org/10.1109/icassp39728.2021.9413831</a>
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Haeb-Umbach_2021, title={Iterative
    Geometry Calibration from Distance Estimates for Wireless Acoustic Sensor Networks},
    DOI={<a href="https://doi.org/10.1109/icassp39728.2021.9413831">10.1109/icassp39728.2021.9413831</a>},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Gburrek, Tobias and Schmalenstroeer,
    Joerg and Haeb-Umbach, Reinhold}, year={2021} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Iterative
    Geometry Calibration from Distance Estimates for Wireless Acoustic Sensor Networks.”
    In <i>ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP)</i>, 2021. <a href="https://doi.org/10.1109/icassp39728.2021.9413831">https://doi.org/10.1109/icassp39728.2021.9413831</a>.
  ieee: 'T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “Iterative Geometry Calibration
    from Distance Estimates for Wireless Acoustic Sensor Networks,” 2021, doi: <a
    href="https://doi.org/10.1109/icassp39728.2021.9413831">10.1109/icassp39728.2021.9413831</a>.'
  mla: Gburrek, Tobias, et al. “Iterative Geometry Calibration from Distance Estimates
    for Wireless Acoustic Sensor Networks.” <i>ICASSP 2021 - 2021 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2021, doi:<a
    href="https://doi.org/10.1109/icassp39728.2021.9413831">10.1109/icassp39728.2021.9413831</a>.
  short: 'T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: ICASSP 2021 - 2021 IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP),
    2021.'
date_created: 2021-09-09T08:30:16Z
date_updated: 2023-11-17T06:30:12Z
ddc:
- '004'
department:
- _id: '54'
doi: 10.1109/icassp39728.2021.9413831
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:29:40Z
  date_updated: 2023-11-17T06:30:11Z
  file_id: '48988'
  file_name: icassp21.pdf
  file_size: 312400
  relation: main_file
file_date_updated: 2023-11-17T06:30:11Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
quality_controlled: '1'
status: public
title: Iterative Geometry Calibration from Distance Estimates for Wireless Acoustic
  Sensor Networks
type: conference
user_id: '44006'
year: '2021'
...
---
_id: '23999'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Haeb-Umbach R. On Source-Microphone Distance
    Estimation Using Convolutional Recurrent Neural Networks. In: <i>Speech Communication;
    14th ITG-Symposium</i>. ; 2021:1-5.'
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2021). On Source-Microphone
    Distance Estimation Using Convolutional Recurrent Neural Networks. <i>Speech Communication;
    14th ITG-Symposium</i>, 1–5.
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Haeb-Umbach_2021, title={On Source-Microphone
    Distance Estimation Using Convolutional Recurrent Neural Networks}, booktitle={Speech
    Communication; 14th ITG-Symposium}, author={Gburrek, Tobias and Schmalenstroeer,
    Joerg and Haeb-Umbach, Reinhold}, year={2021}, pages={1–5} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “On Source-Microphone
    Distance Estimation Using Convolutional Recurrent Neural Networks.” In <i>Speech
    Communication; 14th ITG-Symposium</i>, 1–5, 2021.
  ieee: T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “On Source-Microphone
    Distance Estimation Using Convolutional Recurrent Neural Networks,” in <i>Speech
    Communication; 14th ITG-Symposium</i>, 2021, pp. 1–5.
  mla: Gburrek, Tobias, et al. “On Source-Microphone Distance Estimation Using Convolutional
    Recurrent Neural Networks.” <i>Speech Communication; 14th ITG-Symposium</i>, 2021,
    pp. 1–5.
  short: 'T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: Speech Communication;
    14th ITG-Symposium, 2021, pp. 1–5.'
date_created: 2021-09-09T08:40:44Z
date_updated: 2023-11-17T06:32:20Z
ddc:
- '004'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:31:37Z
  date_updated: 2023-11-17T06:31:37Z
  file_id: '48989'
  file_name: dist_est.pdf
  file_size: 449694
  relation: main_file
file_date_updated: 2023-11-17T06:31:37Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 1-5
publication: Speech Communication; 14th ITG-Symposium
quality_controlled: '1'
status: public
title: On Source-Microphone Distance Estimation Using Convolutional Recurrent Neural
  Networks
type: conference
user_id: '44006'
year: '2021'
...
---
_id: '29304'
abstract:
- lang: eng
  text: 'In this work we address disentanglement of style and content in speech signals.
    We propose a fully convolutional variational autoencoder employing two encoders:
    a content encoder and a style encoder. To foster disentanglement, we propose adversarial
    contrastive predictive coding. This new disentanglement method requires neither
    parallel data nor any supervision. We show that the proposed technique is capable
    of separating speaker and content traits into the two different representations
    and show competitive speaker-content disentanglement performance compared to other
    unsupervised approaches. We further demonstrate an increased robustness of the
    content representation against a train-test mismatch compared to spectral features,
    when used for phone recognition.'
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Michael
  full_name: Kuhlmann, Michael
  id: '49871'
  last_name: Kuhlmann
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Kuhlmann M, Cord-Landwehr T, Haeb-Umbach R. Contrastive Predictive
    Coding Supported Factorized Variational Autoencoder for Unsupervised Learning
    of Disentangled Speech Representations. In: <i>Proceedings of the IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2021:3860–3864.'
  apa: Ebbers, J., Kuhlmann, M., Cord-Landwehr, T., &#38; Haeb-Umbach, R. (2021).
    Contrastive Predictive Coding Supported Factorized Variational Autoencoder for
    Unsupervised Learning of Disentangled Speech Representations. <i>Proceedings of
    the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    3860–3864.
  bibtex: '@inproceedings{Ebbers_Kuhlmann_Cord-Landwehr_Haeb-Umbach_2021, title={Contrastive
    Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised
    Learning of Disentangled Speech Representations}, booktitle={Proceedings of the
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    author={Ebbers, Janek and Kuhlmann, Michael and Cord-Landwehr, Tobias and Haeb-Umbach,
    Reinhold}, year={2021}, pages={3860–3864} }'
  chicago: Ebbers, Janek, Michael Kuhlmann, Tobias Cord-Landwehr, and Reinhold Haeb-Umbach.
    “Contrastive Predictive Coding Supported Factorized Variational Autoencoder for
    Unsupervised Learning of Disentangled Speech Representations.” In <i>Proceedings
    of the IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, 3860–3864, 2021.
  ieee: J. Ebbers, M. Kuhlmann, T. Cord-Landwehr, and R. Haeb-Umbach, “Contrastive
    Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised
    Learning of Disentangled Speech Representations,” in <i>Proceedings of the IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    2021, pp. 3860–3864.
  mla: Ebbers, Janek, et al. “Contrastive Predictive Coding Supported Factorized Variational
    Autoencoder for Unsupervised Learning of Disentangled Speech Representations.”
    <i>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>, 2021, pp. 3860–3864.
  short: 'J. Ebbers, M. Kuhlmann, T. Cord-Landwehr, R. Haeb-Umbach, in: Proceedings
    of the IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP), 2021, pp. 3860–3864.'
date_created: 2022-01-13T07:55:29Z
date_updated: 2023-11-22T08:29:42Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: ebbers
  date_created: 2022-01-13T07:56:30Z
  date_updated: 2022-01-13T08:19:19Z
  file_id: '29305'
  file_name: Template.pdf
  file_size: 236628
  relation: main_file
file_date_updated: 2022-01-13T08:19:19Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 3860–3864
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
quality_controlled: '1'
status: public
title: Contrastive Predictive Coding Supported Factorized Variational Autoencoder
  for Unsupervised Learning of Disentangled Speech Representations
type: conference
user_id: '34851'
year: '2021'
...
---
_id: '26770'
abstract:
- lang: eng
  text: "Automatic transcription of meetings requires handling of overlapped speech,
    which calls for continuous speech separation (CSS) systems. The uPIT criterion
    was proposed for utterance-level separation with neural networks and introduces
    the constraint that the total number of speakers must not exceed the number of
    output channels. When processing meeting-like data in a segment-wise manner, i.e.,
    by separating overlapping segments independently and stitching adjacent segments
    to continuous output streams, this constraint has to be fulfilled for any segment.
    In this contribution, we show that this constraint can be significantly relaxed.
    We propose a novel graph-based PIT criterion, which casts the assignment of utterances
    to output channels as a graph coloring problem. It only requires that the number
    of concurrently active speakers must not exceed the number of output channels.
    As a consequence, the system can process an arbitrary number of speakers and arbitrarily
    long segments and thus can handle more diverse scenarios.\r\nFurther, the stitching
    algorithm for obtaining a consistent output order in neighboring segments is of
    less importance and can even be eliminated completely, not least reducing
    the computational effort. Experiments on meeting-style WSJ data show improvements
    in recognition performance over using the uPIT criterion. "
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Boeddeker C, Delcroix M, Haeb-Umbach R. Graph-PIT:
    Generalized Permutation Invariant Training for Continuous Separation of Arbitrary
    Numbers of Speakers. In: <i>Interspeech 2021</i>. ; 2021. doi:<a href="https://doi.org/10.21437/interspeech.2021-1177">10.21437/interspeech.2021-1177</a>'
  apa: 'von Neumann, T., Kinoshita, K., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach,
    R. (2021). Graph-PIT: Generalized Permutation Invariant Training for Continuous
    Separation of Arbitrary Numbers of Speakers. <i>Interspeech 2021</i>. Interspeech.
    <a href="https://doi.org/10.21437/interspeech.2021-1177">https://doi.org/10.21437/interspeech.2021-1177</a>'
  bibtex: '@inproceedings{von Neumann_Kinoshita_Boeddeker_Delcroix_Haeb-Umbach_2021,
    title={Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
    of Arbitrary Numbers of Speakers}, DOI={<a href="https://doi.org/10.21437/interspeech.2021-1177">10.21437/interspeech.2021-1177</a>},
    booktitle={Interspeech 2021}, author={von Neumann, Thilo and Kinoshita, Keisuke
    and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach, Reinhold}, year={2021}
    }'
  chicago: 'Neumann, Thilo von, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix,
    and Reinhold Haeb-Umbach. “Graph-PIT: Generalized Permutation Invariant Training
    for Continuous Separation of Arbitrary Numbers of Speakers.” In <i>Interspeech
    2021</i>, 2021. <a href="https://doi.org/10.21437/interspeech.2021-1177">https://doi.org/10.21437/interspeech.2021-1177</a>.'
  ieee: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach,
    “Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
    of Arbitrary Numbers of Speakers,” presented at the Interspeech, 2021, doi: <a
    href="https://doi.org/10.21437/interspeech.2021-1177">10.21437/interspeech.2021-1177</a>.'
  mla: 'von Neumann, Thilo, et al. “Graph-PIT: Generalized Permutation Invariant Training
    for Continuous Separation of Arbitrary Numbers of Speakers.” <i>Interspeech 2021</i>,
    2021, doi:<a href="https://doi.org/10.21437/interspeech.2021-1177">10.21437/interspeech.2021-1177</a>.'
  short: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, R. Haeb-Umbach,
    in: Interspeech 2021, 2021.'
conference:
  name: Interspeech
date_created: 2021-10-25T08:50:01Z
date_updated: 2023-11-15T12:14:40Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/interspeech.2021-1177
file:
- access_level: open_access
  content_type: video/mp4
  creator: tvn
  date_created: 2021-12-06T10:39:13Z
  date_updated: 2021-12-06T10:48:30Z
  file_id: '28327'
  file_name: Interspeech 2021 voiceover-002-compressed.mp4
  file_size: 9550220
  relation: supplementary_material
  title: Video for INTERSPEECH 2021
- access_level: open_access
  content_type: application/vnd.openxmlformats-officedocument.presentationml.presentation
  creator: tvn
  date_created: 2021-12-06T10:47:01Z
  date_updated: 2021-12-06T10:47:01Z
  file_id: '28328'
  file_name: Graph-PIT-poster-presentation.pptx
  file_size: 1337297
  relation: slides
  title: Slides from INTERSPEECH 2021
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2021-12-06T10:48:21Z
  date_updated: 2021-12-06T10:48:21Z
  file_id: '28329'
  file_name: INTERSPEECH2021_Graph_PIT.pdf
  file_size: 226589
  relation: main_file
file_date_updated: 2021-12-06T10:48:30Z
has_accepted_license: '1'
keyword:
- Continuous speech separation
- automatic speech recognition
- overlapped speech
- permutation invariant training
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Interspeech 2021
publication_status: published
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/graph_pit
status: public
title: 'Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
  of Arbitrary Numbers of Speakers'
type: conference
user_id: '49870'
year: '2021'
...
---
_id: '29173'
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. Speeding
    Up Permutation Invariant Training for Source Separation. In: <i>Speech Communication;
    14th ITG Conference</i>. ; 2021.'
  apa: von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach,
    R. (2021). Speeding Up Permutation Invariant Training for Source Separation. <i>Speech
    Communication; 14th ITG Conference</i>. Speech Communication; 14th ITG Conference,
    Kiel.
  bibtex: '@inproceedings{von Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2021,
    title={Speeding Up Permutation Invariant Training for Source Separation}, booktitle={Speech
    Communication; 14th ITG Conference}, author={von Neumann, Thilo and Boeddeker,
    Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach, Reinhold},
    year={2021} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix,
    and Reinhold Haeb-Umbach. “Speeding Up Permutation Invariant Training for Source
    Separation.” In <i>Speech Communication; 14th ITG Conference</i>, 2021.
  ieee: T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach,
    “Speeding Up Permutation Invariant Training for Source Separation,” presented
    at the Speech Communication; 14th ITG Conference, Kiel, 2021.
  mla: von Neumann, Thilo, et al. “Speeding Up Permutation Invariant Training for
    Source Separation.” <i>Speech Communication; 14th ITG Conference</i>, 2021.
  short: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach,
    in: Speech Communication; 14th ITG Conference, 2021.'
conference:
  end_date: 2021-10-01
  location: Kiel
  name: Speech Communication; 14th ITG Conference
  start_date: 2021-09-29
date_created: 2022-01-07T10:40:56Z
date_updated: 2023-11-15T12:16:31Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-01-06T13:23:27Z
  date_updated: 2022-01-06T13:23:27Z
  file_id: '29180'
  file_name: poster.pdf
  file_size: 191938
  relation: poster
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-01-07T10:42:54Z
  date_updated: 2022-01-07T10:42:54Z
  file_id: '29181'
  file_name: ITG2021_Speeding_up_Permutation_Invariant_Training.pdf
  file_size: 236670
  relation: main_file
file_date_updated: 2022-01-07T10:42:54Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Speech Communication; 14th ITG Conference
quality_controlled: '1'
status: public
title: Speeding Up Permutation Invariant Training for Source Separation
type: conference
user_id: '49870'
year: '2021'
...
---
_id: '29308'
abstract:
- lang: eng
  text: 'In this paper we present our system for the Detection and Classification
    of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4: Sound Event Detection
    and Separation in Domestic Environments, where it achieved the fourth rank. Our
    solution is an advancement of the system we used in the previous edition
    of the task. We use a forward-backward convolutional recurrent neural network (FBCRNN)
    for tagging and pseudo labeling followed by tag-conditioned sound event detection
    (SED) models which are trained using strong pseudo labels provided by the FBCRNN.
    Our advancement over our earlier model is threefold. First, we introduce a strong
    label loss in the objective of the FBCRNN to take advantage of the strongly labeled
    synthetic data during training. Second, we perform multiple iterations of self-training
    for both the FBCRNN and tag-conditioned SED models. Third, while we used only
    tag-conditioned CNNs as our SED model in the previous edition, we here explore
    sophisticated tag-conditioned SED model architectures, namely, bidirectional CRNNs
    and bidirectional convolutional transformer neural networks (CTNNs), and combine
    them. With metric and class specific tuning of median filter lengths for post-processing,
    our final SED model, consisting of 6 submodels (2 of each architecture), achieves
    on the public evaluation set polyphonic sound event detection scores (PSDS) of
    0.455 for scenario 1 and 0.684 for scenario 2, as well as a collar-based F1-score
    of 0.596, outperforming the baselines and our model from the previous edition by
    far. Source code is publicly available at https://github.com/fgnt/pb_sed.'
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Haeb-Umbach R. Self-Trained Audio Tagging and Sound Event Detection
    in Domestic Environments. In: <i>Proceedings of the 6th Detection and Classification
    of Acoustic Scenes and Events 2021 Workshop (DCASE2021)</i>. ; 2021:226–230.'
  apa: Ebbers, J., &#38; Haeb-Umbach, R. (2021). Self-Trained Audio Tagging and Sound
    Event Detection in Domestic Environments. <i>Proceedings of the 6th Detection
    and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)</i>,
    226–230.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_2021, place={Barcelona, Spain}, title={Self-Trained
    Audio Tagging and Sound Event Detection in Domestic Environments}, booktitle={Proceedings
    of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop
    (DCASE2021)}, author={Ebbers, Janek and Haeb-Umbach, Reinhold}, year={2021}, pages={226–230}
    }'
  chicago: Ebbers, Janek, and Reinhold Haeb-Umbach. “Self-Trained Audio Tagging and
    Sound Event Detection in Domestic Environments.” In <i>Proceedings of the 6th
    Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)</i>,
    226–230. Barcelona, Spain, 2021.
  ieee: J. Ebbers and R. Haeb-Umbach, “Self-Trained Audio Tagging and Sound Event
    Detection in Domestic Environments,” in <i>Proceedings of the 6th Detection and
    Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)</i>, 2021,
    pp. 226–230.
  mla: Ebbers, Janek, and Reinhold Haeb-Umbach. “Self-Trained Audio Tagging and Sound
    Event Detection in Domestic Environments.” <i>Proceedings of the 6th Detection
    and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)</i>,
    2021, pp. 226–230.
  short: 'J. Ebbers, R. Haeb-Umbach, in: Proceedings of the 6th Detection and Classification
    of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain, 2021,
    pp. 226–230.'
date_created: 2022-01-13T08:07:47Z
date_updated: 2023-11-22T08:28:32Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: ebbers
  date_created: 2022-01-13T08:08:54Z
  date_updated: 2022-01-13T08:19:50Z
  file_id: '29309'
  file_name: template.pdf
  file_size: 239462
  relation: main_file
file_date_updated: 2022-01-13T08:19:50Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 226–230
place: Barcelona, Spain
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the 6th Detection and Classification of Acoustic Scenes
  and Events 2021 Workshop (DCASE2021)
publication_identifier:
  isbn:
  - 978-84-09-36072-7
quality_controlled: '1'
status: public
title: Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments
type: conference
user_id: '34851'
year: '2021'
...
---
_id: '29306'
abstract:
- lang: eng
  text: Recently, there has been a rising interest in sound recognition via Acoustic
    Sensor Networks to support applications such as ambient assisted living or environmental
    habitat monitoring. With state-of-the-art sound recognition being dominated by
    deep-learning-based approaches, there is a high demand for labeled training data.
    Despite the availability of large-scale data sets such as Google's AudioSet,
    acquiring training data matching a certain application environment is still often
    a problem. In this paper we are concerned with human activity monitoring in a
    domestic environment using an ASN consisting of multiple nodes each providing
    multichannel signals. We propose a self-training based domain adaptation approach,
    which only requires unlabeled data from the target environment. Here, a sound
    recognition system trained on AudioSet, the teacher, generates pseudo labels for
    data from the target environment on which a student network is trained. The student
    can furthermore glean information about the spatial arrangement of sensors and
    sound sources to further improve classification performance. It is shown that the
    student significantly improves recognition performance over the pre-trained teacher
    without relying on labeled data from the environment the system is deployed in.
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Moritz Curt
  full_name: Keyser, Moritz Curt
  last_name: Keyser
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Ebbers J, Keyser MC, Haeb-Umbach R. Adapting Sound Recognition to A New Environment
    Via Self-Training. In: <i>Proceedings of the 29th European Signal Processing Conference
    (EUSIPCO)</i>. ; 2021:1135–1139.'
  apa: Ebbers, J., Keyser, M. C., &#38; Haeb-Umbach, R. (2021). Adapting Sound Recognition
    to A New Environment Via Self-Training. <i>Proceedings of the 29th European Signal
    Processing Conference (EUSIPCO)</i>, 1135–1139.
  bibtex: '@inproceedings{Ebbers_Keyser_Haeb-Umbach_2021, title={Adapting Sound Recognition
    to A New Environment Via Self-Training}, booktitle={Proceedings of the 29th European
    Signal Processing Conference (EUSIPCO)}, author={Ebbers, Janek and Keyser, Moritz
    Curt and Haeb-Umbach, Reinhold}, year={2021}, pages={1135–1139} }'
  chicago: Ebbers, Janek, Moritz Curt Keyser, and Reinhold Haeb-Umbach. “Adapting
    Sound Recognition to A New Environment Via Self-Training.” In <i>Proceedings of
    the 29th European Signal Processing Conference (EUSIPCO)</i>, 1135–1139, 2021.
  ieee: J. Ebbers, M. C. Keyser, and R. Haeb-Umbach, “Adapting Sound Recognition to
    A New Environment Via Self-Training,” in <i>Proceedings of the 29th European Signal
    Processing Conference (EUSIPCO)</i>, 2021, pp. 1135–1139.
  mla: Ebbers, Janek, et al. “Adapting Sound Recognition to A New Environment Via
    Self-Training.” <i>Proceedings of the 29th European Signal Processing Conference
    (EUSIPCO)</i>, 2021, pp. 1135–1139.
  short: 'J. Ebbers, M.C. Keyser, R. Haeb-Umbach, in: Proceedings of the 29th European
    Signal Processing Conference (EUSIPCO), 2021, pp. 1135–1139.'
date_created: 2022-01-13T08:01:21Z
date_updated: 2023-11-22T08:28:50Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: ebbers
  date_created: 2022-01-13T08:03:26Z
  date_updated: 2022-01-13T08:19:35Z
  file_id: '29307'
  file_name: conference_101719.pdf
  file_size: 213938
  relation: main_file
file_date_updated: 2022-01-13T08:19:35Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
page: 1135–1139
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the 29th European Signal Processing Conference (EUSIPCO)
quality_controlled: '1'
status: public
title: Adapting Sound Recognition to A New Environment Via Self-Training
type: conference
user_id: '34851'
year: '2021'
...
---
_id: '24456'
abstract:
- lang: eng
  text: One objective of current research in explainable intelligent systems is to
    implement social aspects in order to increase the relevance of explanations. In
    this paper, we argue that a novel conceptual framework is needed to overcome shortcomings
    of existing AI systems with little attention to processes of interaction and learning.
    Drawing from research in interaction and development, we first outline the novel
    conceptual framework that pushes the design of AI systems toward true interactivity
    with an emphasis on the role of the partner and social relevance. We propose that
    AI systems will be able to provide a meaningful and relevant explanation only
    if the process of explaining is extended to active contribution of both partners
    that brings about dynamics that is modulated by different levels of analysis.
    Accordingly, our conceptual framework comprises monitoring and scaffolding as
    key concepts and claims that the process of explaining is not only modulated by
    the interaction between explainee and explainer but is embedded into a larger
    social context in which conventionalized and routinized behaviors are established.
    We discuss our conceptual framework in relation to the established objectives
    of transparency and autonomy that are raised for the design of explainable AI
    systems currently.
article_type: original
author:
- first_name: Katharina J.
  full_name: Rohlfing, Katharina J.
  id: '50352'
  last_name: Rohlfing
- first_name: Philipp
  full_name: Cimiano, Philipp
  last_name: Cimiano
- first_name: Ingrid
  full_name: Scharlau, Ingrid
  id: '451'
  last_name: Scharlau
  orcid: https://orcid.org/0000-0003-2364-9489
- first_name: Tobias
  full_name: Matzner, Tobias
  id: '65695'
  last_name: Matzner
- first_name: Heike M.
  full_name: Buhl, Heike M.
  id: '27152'
  last_name: Buhl
- first_name: Hendrik
  full_name: Buschmeier, Hendrik
  last_name: Buschmeier
- first_name: Elena
  full_name: Esposito, Elena
  last_name: Esposito
- first_name: Angela
  full_name: Grimminger, Angela
  id: '57578'
  last_name: Grimminger
- first_name: Barbara
  full_name: Hammer, Barbara
  last_name: Hammer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Ilona
  full_name: Horwath, Ilona
  id: '68836'
  last_name: Horwath
- first_name: Eyke
  full_name: Hüllermeier, Eyke
  id: '48129'
  last_name: Hüllermeier
- first_name: Friederike
  full_name: Kern, Friederike
  last_name: Kern
- first_name: Stefan
  full_name: Kopp, Stefan
  last_name: Kopp
- first_name: Kirsten
  full_name: Thommes, Kirsten
  id: '72497'
  last_name: Thommes
- first_name: Axel-Cyrille
  full_name: Ngonga Ngomo, Axel-Cyrille
  id: '65716'
  last_name: Ngonga Ngomo
- first_name: Carsten
  full_name: Schulte, Carsten
  id: '60311'
  last_name: Schulte
- first_name: Henning
  full_name: Wachsmuth, Henning
  id: '3900'
  last_name: Wachsmuth
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
- first_name: Britta
  full_name: Wrede, Britta
  last_name: Wrede
citation:
  ama: 'Rohlfing KJ, Cimiano P, Scharlau I, et al. Explanation as a Social Practice:
    Toward a Conceptual Framework for the Social Design of AI Systems. <i>IEEE Transactions
    on Cognitive and Developmental Systems</i>. 2021;13(3):717-728. doi:<a href="https://doi.org/10.1109/tcds.2020.3044366">10.1109/tcds.2020.3044366</a>'
  apa: 'Rohlfing, K. J., Cimiano, P., Scharlau, I., Matzner, T., Buhl, H. M., Buschmeier,
    H., Esposito, E., Grimminger, A., Hammer, B., Haeb-Umbach, R., Horwath, I., Hüllermeier,
    E., Kern, F., Kopp, S., Thommes, K., Ngonga Ngomo, A.-C., Schulte, C., Wachsmuth,
    H., Wagner, P., &#38; Wrede, B. (2021). Explanation as a Social Practice: Toward
    a Conceptual Framework for the Social Design of AI Systems. <i>IEEE Transactions
    on Cognitive and Developmental Systems</i>, <i>13</i>(3), 717–728. <a href="https://doi.org/10.1109/tcds.2020.3044366">https://doi.org/10.1109/tcds.2020.3044366</a>'
  bibtex: '@article{Rohlfing_Cimiano_Scharlau_Matzner_Buhl_Buschmeier_Esposito_Grimminger_Hammer_Haeb-Umbach_et
    al._2021, title={Explanation as a Social Practice: Toward a Conceptual Framework
    for the Social Design of AI Systems}, volume={13}, DOI={<a href="https://doi.org/10.1109/tcds.2020.3044366">10.1109/tcds.2020.3044366</a>},
    number={3}, journal={IEEE Transactions on Cognitive and Developmental Systems},
    author={Rohlfing, Katharina J. and Cimiano, Philipp and Scharlau, Ingrid and Matzner,
    Tobias and Buhl, Heike M. and Buschmeier, Hendrik and Esposito, Elena and Grimminger,
    Angela and Hammer, Barbara and Haeb-Umbach, Reinhold and et al.}, year={2021},
    pages={717–728} }'
  chicago: 'Rohlfing, Katharina J., Philipp Cimiano, Ingrid Scharlau, Tobias Matzner,
    Heike M. Buhl, Hendrik Buschmeier, Elena Esposito, et al. “Explanation as a Social
    Practice: Toward a Conceptual Framework for the Social Design of AI Systems.”
    <i>IEEE Transactions on Cognitive and Developmental Systems</i> 13, no. 3 (2021):
    717–28. <a href="https://doi.org/10.1109/tcds.2020.3044366">https://doi.org/10.1109/tcds.2020.3044366</a>.'
  ieee: 'K. J. Rohlfing <i>et al.</i>, “Explanation as a Social Practice: Toward a
    Conceptual Framework for the Social Design of AI Systems,” <i>IEEE Transactions
    on Cognitive and Developmental Systems</i>, vol. 13, no. 3, pp. 717–728, 2021,
    doi: <a href="https://doi.org/10.1109/tcds.2020.3044366">10.1109/tcds.2020.3044366</a>.'
  mla: 'Rohlfing, Katharina J., et al. “Explanation as a Social Practice: Toward a
    Conceptual Framework for the Social Design of AI Systems.” <i>IEEE Transactions
    on Cognitive and Developmental Systems</i>, vol. 13, no. 3, 2021, pp. 717–28,
    doi:<a href="https://doi.org/10.1109/tcds.2020.3044366">10.1109/tcds.2020.3044366</a>.'
  short: K.J. Rohlfing, P. Cimiano, I. Scharlau, T. Matzner, H.M. Buhl, H. Buschmeier,
    E. Esposito, A. Grimminger, B. Hammer, R. Haeb-Umbach, I. Horwath, E. Hüllermeier,
    F. Kern, S. Kopp, K. Thommes, A.-C. Ngonga Ngomo, C. Schulte, H. Wachsmuth, P.
    Wagner, B. Wrede, IEEE Transactions on Cognitive and Developmental Systems 13
    (2021) 717–728.
date_created: 2021-09-14T20:52:57Z
date_updated: 2023-12-05T10:15:02Z
ddc:
- '300'
department:
- _id: '603'
- _id: '749'
- _id: '424'
- _id: '67'
- _id: '574'
- _id: '184'
- _id: '757'
- _id: '54'
- _id: '178'
doi: 10.1109/tcds.2020.3044366
file:
- access_level: open_access
  content_type: application/pdf
  creator: haebumb
  date_created: 2023-11-20T16:33:51Z
  date_updated: 2023-11-20T16:33:51Z
  file_id: '49081'
  file_name: 2020-12-01_explainability_final_version.pdf
  file_size: 626217
  relation: main_file
file_date_updated: 2023-11-20T16:33:51Z
has_accepted_license: '1'
intvolume: '13'
issue: '3'
keyword:
- Explainability
- process of explaining and understanding
- explainable artificial systems
language:
- iso: eng
oa: '1'
page: 717-728
project:
- _id: '109'
  grant_number: '438445824'
  name: 'TRR 318: TRR 318 - Erklärbarkeit konstruieren'
publication: IEEE Transactions on Cognitive and Developmental Systems
publication_identifier:
  issn:
  - 2379-8920
  - 2379-8939
publication_status: published
quality_controlled: '1'
status: public
title: 'Explanation as a Social Practice: Toward a Conceptual Framework for the Social
  Design of AI Systems'
type: journal_article
user_id: '42933'
volume: 13
year: '2021'
...
---
_id: '17763'
author:
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Haeb-Umbach R. Sprachtechnologien für Digitale Assistenten. In: Böck R, Siegert
    I, Wendemuth A, eds. <i>Studientexte Zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung
    2020</i>. TUDpress, Dresden; 2020:227-234.'
  apa: 'Haeb-Umbach, R. (2020). Sprachtechnologien für Digitale Assistenten. In R.
    Böck, I. Siegert, &#38; A. Wendemuth (Eds.), <i>Studientexte zur Sprachkommunikation:
    Elektronische Sprachsignalverarbeitung 2020</i> (pp. 227–234). TUDpress, Dresden.'
  bibtex: '@inproceedings{Haeb-Umbach_2020, title={Sprachtechnologien für Digitale
    Assistenten}, booktitle={Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung
    2020}, publisher={TUDpress, Dresden}, author={Haeb-Umbach, Reinhold}, editor={Böck,
    Ronald and Siegert, Ingo and Wendemuth, Andreas}, year={2020}, pages={227–234}
    }'
  chicago: 'Haeb-Umbach, Reinhold. “Sprachtechnologien Für Digitale Assistenten.”
    In <i>Studientexte Zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung
    2020</i>, edited by Ronald Böck, Ingo Siegert, and Andreas Wendemuth, 227–34.
    TUDpress, Dresden, 2020.'
  ieee: 'R. Haeb-Umbach, “Sprachtechnologien für Digitale Assistenten,” in <i>Studientexte
    zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2020</i>, 2020,
    pp. 227–234.'
  mla: 'Haeb-Umbach, Reinhold. “Sprachtechnologien Für Digitale Assistenten.” <i>Studientexte
    Zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2020</i>, edited
    by Ronald Böck et al., TUDpress, Dresden, 2020, pp. 227–34.'
  short: 'R. Haeb-Umbach, in: R. Böck, I. Siegert, A. Wendemuth (Eds.), Studientexte
    Zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2020, TUDpress,
    Dresden, 2020, pp. 227–234.'
date_created: 2020-08-10T09:53:12Z
date_updated: 2022-01-06T06:53:19Z
department:
- _id: '54'
editor:
- first_name: Ronald
  full_name: Böck, Ronald
  last_name: Böck
- first_name: Ingo
  full_name: Siegert, Ingo
  last_name: Siegert
- first_name: Andreas
  full_name: Wendemuth, Andreas
  last_name: Wendemuth
keyword:
- Poster
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2020/ESSV_2020_haeb_umbach.pdf
oa: '1'
page: 227-234
publication: 'Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung
  2020'
publication_identifier:
  isbn:
  - 978-3-959081-93-1
publisher: TUDpress, Dresden
status: public
title: Sprachtechnologien für Digitale Assistenten
type: conference
user_id: '44006'
year: '2020'
...
---
_id: '20700'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Catalin
  full_name: Zorila, Catalin
  last_name: Zorila
- first_name: Daichi
  full_name: Hayakawa, Daichi
  last_name: Hayakawa
- first_name: Mohan
  full_name: Li, Mohan
  last_name: Li
- first_name: Min
  full_name: Liu, Min
  last_name: Liu
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Cord-Landwehr T, Heitkaemper J, et al. Towards a speaker diarization
    system for the CHiME 2020 dinner party transcription. In: <i>Proc. CHiME 2020
    Workshop on Speech Processing in Everyday Environments</i>. ; 2020.'
  apa: Boeddeker, C., Cord-Landwehr, T., Heitkaemper, J., Zorila, C., Hayakawa, D.,
    Li, M., … Haeb-Umbach, R. (2020). Towards a speaker diarization system for the
    CHiME 2020 dinner party transcription. In <i>Proc. CHiME 2020 Workshop on Speech
    Processing in Everyday Environments</i>.
  bibtex: '@inproceedings{Boeddeker_Cord-Landwehr_Heitkaemper_Zorila_Hayakawa_Li_Liu_Doddipatla_Haeb-Umbach_2020,
    title={Towards a speaker diarization system for the CHiME 2020 dinner party transcription},
    booktitle={Proc. CHiME 2020 Workshop on Speech Processing in Everyday Environments},
    author={Boeddeker, Christoph and Cord-Landwehr, Tobias and Heitkaemper, Jens and
    Zorila, Catalin and Hayakawa, Daichi and Li, Mohan and Liu, Min and Doddipatla,
    Rama and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: Boeddeker, Christoph, Tobias Cord-Landwehr, Jens Heitkaemper, Catalin Zorila,
    Daichi Hayakawa, Mohan Li, Min Liu, Rama Doddipatla, and Reinhold Haeb-Umbach.
    “Towards a Speaker Diarization System for the CHiME 2020 Dinner Party Transcription.”
    In <i>Proc. CHiME 2020 Workshop on Speech Processing in Everyday Environments</i>,
    2020.
  ieee: C. Boeddeker <i>et al.</i>, “Towards a speaker diarization system for the
    CHiME 2020 dinner party transcription,” in <i>Proc. CHiME 2020 Workshop on Speech
    Processing in Everyday Environments</i>, 2020.
  mla: Boeddeker, Christoph, et al. “Towards a Speaker Diarization System for the
    CHiME 2020 Dinner Party Transcription.” <i>Proc. CHiME 2020 Workshop on Speech
    Processing in Everyday Environments</i>, 2020.
  short: 'C. Boeddeker, T. Cord-Landwehr, J. Heitkaemper, C. Zorila, D. Hayakawa,
    M. Li, M. Liu, R. Doddipatla, R. Haeb-Umbach, in: Proc. CHiME 2020 Workshop on
    Speech Processing in Everyday Environments, 2020.'
date_created: 2020-12-11T12:49:13Z
date_updated: 2022-01-06T06:54:33Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2020-12-11T12:48:48Z
  date_updated: 2020-12-11T12:48:48Z
  file_id: '20702'
  file_name: template.pdf
  file_size: 115421
  relation: main_file
file_date_updated: 2020-12-11T12:48:48Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proc. CHiME 2020 Workshop on Speech Processing in Everyday Environments
status: public
title: Towards a speaker diarization system for the CHiME 2020 dinner party transcription
type: conference
user_id: '40767'
year: '2020'
...
