---
_id: '56273'
abstract:
- lang: eng
  text: "This paper presents the CHiME-8 DASR challenge, which carries on from the
    previous edition, CHiME-7 DASR (C7DASR), and the past CHiME-6 challenge. It focuses
    on joint multi-channel distant automatic speech recognition (DASR) and diarization
    with one or more, possibly heterogeneous, devices. The main goal is to spur research
    towards meeting transcription approaches that can generalize across an arbitrary
    number of speakers, diverse settings (formal vs. informal conversations), meeting
    durations, a wide variety of acoustic scenarios, and different recording configurations.
    Novelties with respect to C7DASR include: i) the addition of NOTSOFAR-1, an
    additional office/corporate meeting scenario, ii) a manually corrected Mixer
    6 development set, iii) a new track in which we allow the use of large language
    models (LLMs), and iv) a jury award mechanism to encourage participants to also
    explore more practical and innovative solutions. To lower the entry barrier for
    participants, we provide a standalone toolkit for downloading and preparing
    these datasets as well as for performing text normalization and scoring submissions.
    Furthermore, this year we also provide two baseline systems: one directly inherited
    from C7DASR and based on ESPnet, and another developed on NeMo and based on
    the NeMo team's submission to last year's C7DASR. Baseline system results suggest
    that the addition of the NOTSOFAR-1 scenario significantly increases the task's
    difficulty due to its high number of speakers and very short duration."
author:
- first_name: Samuele
  full_name: Cornell, Samuele
  last_name: Cornell
- first_name: Taejin
  full_name: Park, Taejin
  last_name: Park
- first_name: Steve
  full_name: Huang, Steve
  last_name: Huang
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Xuankai
  full_name: Chang, Xuankai
  last_name: Chang
- first_name: Matthew
  full_name: Maciejewski, Matthew
  last_name: Maciejewski
- first_name: Matthew
  full_name: Wiesner, Matthew
  last_name: Wiesner
- first_name: Paola
  full_name: Garcia, Paola
  last_name: Garcia
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
citation:
  ama: Cornell S, Park T, Huang S, et al. The CHiME-8 DASR Challenge for Generalizable
    and Array Agnostic Distant Automatic Speech Recognition and Diarization. <i>arXiv:2407.16447</i>.
    Published online 2024.
  apa: Cornell, S., Park, T., Huang, S., Boeddeker, C., Chang, X., Maciejewski, M.,
    Wiesner, M., Garcia, P., &#38; Watanabe, S. (2024). The CHiME-8 DASR Challenge
    for Generalizable and Array Agnostic Distant Automatic Speech Recognition and
    Diarization. In <i>arXiv:2407.16447</i>.
  bibtex: '@article{Cornell_Park_Huang_Boeddeker_Chang_Maciejewski_Wiesner_Garcia_Watanabe_2024,
    title={The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant
    Automatic Speech Recognition and Diarization}, journal={arXiv:2407.16447}, author={Cornell,
    Samuele and Park, Taejin and Huang, Steve and Boeddeker, Christoph and Chang,
    Xuankai and Maciejewski, Matthew and Wiesner, Matthew and Garcia, Paola and Watanabe,
    Shinji}, year={2024} }'
  chicago: Cornell, Samuele, Taejin Park, Steve Huang, Christoph Boeddeker, Xuankai
    Chang, Matthew Maciejewski, Matthew Wiesner, Paola Garcia, and Shinji Watanabe.
    “The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic
    Speech Recognition and Diarization.” <i>ArXiv:2407.16447</i>, 2024.
  ieee: S. Cornell <i>et al.</i>, “The CHiME-8 DASR Challenge for Generalizable and
    Array Agnostic Distant Automatic Speech Recognition and Diarization,” <i>arXiv:2407.16447</i>.
    2024.
  mla: Cornell, Samuele, et al. “The CHiME-8 DASR Challenge for Generalizable and
    Array Agnostic Distant Automatic Speech Recognition and Diarization.” <i>ArXiv:2407.16447</i>,
    2024.
  short: S. Cornell, T. Park, S. Huang, C. Boeddeker, X. Chang, M. Maciejewski, M.
    Wiesner, P. Garcia, S. Watanabe, ArXiv:2407.16447 (2024).
date_created: 2024-09-30T08:08:46Z
date_updated: 2024-09-30T08:09:40Z
department:
- _id: '54'
external_id:
  arxiv:
  - '2407.16447'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/pdf/2407.16447
oa: '1'
publication: arXiv:2407.16447
status: public
title: The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic
  Speech Recognition and Diarization
type: preprint
user_id: '40767'
year: '2024'
...
---
_id: '52958'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Aswin Shanmugam
  full_name: Subramanian, Aswin Shanmugam
  last_name: Subramanian
- first_name: Gordon
  full_name: Wichern, Gordon
  last_name: Wichern
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Jonathan
  full_name: Le Roux, Jonathan
  last_name: Le Roux
citation:
  ama: 'Boeddeker C, Subramanian AS, Wichern G, Haeb-Umbach R, Le Roux J. TS-SEP:
    Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings.
    <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>. 2024;32:1185-1197.
    doi:<a href="https://doi.org/10.1109/taslp.2024.3350887">10.1109/taslp.2024.3350887</a>'
  apa: 'Boeddeker, C., Subramanian, A. S., Wichern, G., Haeb-Umbach, R., &#38; Le
    Roux, J. (2024). TS-SEP: Joint Diarization and Separation Conditioned on Estimated
    Speaker Embeddings. <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>,
    <i>32</i>, 1185–1197. <a href="https://doi.org/10.1109/taslp.2024.3350887">https://doi.org/10.1109/taslp.2024.3350887</a>'
  bibtex: '@article{Boeddeker_Subramanian_Wichern_Haeb-Umbach_Le Roux_2024, title={TS-SEP:
    Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings},
    volume={32}, DOI={<a href="https://doi.org/10.1109/taslp.2024.3350887">10.1109/taslp.2024.3350887</a>},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, publisher={Institute
    of Electrical and Electronics Engineers (IEEE)}, author={Boeddeker, Christoph
    and Subramanian, Aswin Shanmugam and Wichern, Gordon and Haeb-Umbach, Reinhold
    and Le Roux, Jonathan}, year={2024}, pages={1185–1197} }'
  chicago: 'Boeddeker, Christoph, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold
    Haeb-Umbach, and Jonathan Le Roux. “TS-SEP: Joint Diarization and Separation Conditioned
    on Estimated Speaker Embeddings.” <i>IEEE/ACM Transactions on Audio, Speech, and
    Language Processing</i> 32 (2024): 1185–97. <a href="https://doi.org/10.1109/taslp.2024.3350887">https://doi.org/10.1109/taslp.2024.3350887</a>.'
  ieee: 'C. Boeddeker, A. S. Subramanian, G. Wichern, R. Haeb-Umbach, and J. Le Roux,
    “TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings,”
    <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>, vol. 32,
    pp. 1185–1197, 2024, doi: <a href="https://doi.org/10.1109/taslp.2024.3350887">10.1109/taslp.2024.3350887</a>.'
  mla: 'Boeddeker, Christoph, et al. “TS-SEP: Joint Diarization and Separation Conditioned
    on Estimated Speaker Embeddings.” <i>IEEE/ACM Transactions on Audio, Speech, and
    Language Processing</i>, vol. 32, Institute of Electrical and Electronics Engineers
    (IEEE), 2024, pp. 1185–97, doi:<a href="https://doi.org/10.1109/taslp.2024.3350887">10.1109/taslp.2024.3350887</a>.'
  short: C. Boeddeker, A.S. Subramanian, G. Wichern, R. Haeb-Umbach, J. Le Roux, IEEE/ACM
    Transactions on Audio, Speech, and Language Processing 32 (2024) 1185–1197.
date_created: 2024-03-26T16:11:54Z
date_updated: 2025-04-16T10:21:45Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/taslp.2024.3350887
file:
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2025-04-16T10:14:47Z
  date_updated: 2025-04-16T10:21:45Z
  file_id: '59602'
  file_name: main.pdf
  file_size: 3432879
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2025-04-16T10:15:08Z
  date_updated: 2025-04-16T10:21:45Z
  file_id: '59603'
  file_name: slides.pdf
  file_size: 2838635
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: cbj
  date_created: 2025-04-16T10:15:22Z
  date_updated: 2025-04-16T10:21:45Z
  file_id: '59604'
  file_name: poster.pdf
  file_size: 2038741
  relation: main_file
file_date_updated: 2025-04-16T10:21:45Z
has_accepted_license: '1'
intvolume: '        32'
keyword:
- Electrical and Electronic Engineering
- Acoustics and Ultrasonics
- Computer Science (miscellaneous)
- Computational Mathematics
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2303.03849
oa: '1'
page: 1185-1197
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: IEEE/ACM Transactions on Audio, Speech, and Language Processing
publication_identifier:
  issn:
  - 2329-9290
  - 2329-9304
publication_status: published
publisher: Institute of Electrical and Electronics Engineers (IEEE)
status: public
title: 'TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker
  Embeddings'
type: journal_article
user_id: '40767'
volume: 32
year: '2024'
...
---
_id: '56004'
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Cord-Landwehr T, Delcroix M, Haeb-Umbach R. Meeting
    Recognition with Continuous Speech Separation and Transcription-Supported Diarization.
    In: <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
    Workshops (ICASSPW)</i>. IEEE; 2024. doi:<a href="https://doi.org/10.1109/icasspw62465.2024.10625894">10.1109/icasspw62465.2024.10625894</a>'
  apa: von Neumann, T., Boeddeker, C., Cord-Landwehr, T., Delcroix, M., &#38; Haeb-Umbach,
    R. (2024). Meeting Recognition with Continuous Speech Separation and Transcription-Supported
    Diarization. <i>2024 IEEE International Conference on Acoustics, Speech, and Signal
    Processing Workshops (ICASSPW)</i>. <a href="https://doi.org/10.1109/icasspw62465.2024.10625894">https://doi.org/10.1109/icasspw62465.2024.10625894</a>
  bibtex: '@inproceedings{von Neumann_Boeddeker_Cord-Landwehr_Delcroix_Haeb-Umbach_2024,
    title={Meeting Recognition with Continuous Speech Separation and Transcription-Supported
    Diarization}, DOI={<a href="https://doi.org/10.1109/icasspw62465.2024.10625894">10.1109/icasspw62465.2024.10625894</a>},
    booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal
    Processing Workshops (ICASSPW)}, publisher={IEEE}, author={von Neumann, Thilo
    and Boeddeker, Christoph and Cord-Landwehr, Tobias and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2024} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix,
    and Reinhold Haeb-Umbach. “Meeting Recognition with Continuous Speech Separation
    and Transcription-Supported Diarization.” In <i>2024 IEEE International Conference
    on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)</i>. IEEE, 2024.
    <a href="https://doi.org/10.1109/icasspw62465.2024.10625894">https://doi.org/10.1109/icasspw62465.2024.10625894</a>.
  ieee: 'T. von Neumann, C. Boeddeker, T. Cord-Landwehr, M. Delcroix, and R. Haeb-Umbach,
    “Meeting Recognition with Continuous Speech Separation and Transcription-Supported
    Diarization,” 2024, doi: <a href="https://doi.org/10.1109/icasspw62465.2024.10625894">10.1109/icasspw62465.2024.10625894</a>.'
  mla: von Neumann, Thilo, et al. “Meeting Recognition with Continuous Speech Separation
    and Transcription-Supported Diarization.” <i>2024 IEEE International Conference
    on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)</i>, IEEE, 2024,
    doi:<a href="https://doi.org/10.1109/icasspw62465.2024.10625894">10.1109/icasspw62465.2024.10625894</a>.
  short: 'T. von Neumann, C. Boeddeker, T. Cord-Landwehr, M. Delcroix, R. Haeb-Umbach,
    in: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
    Workshops (ICASSPW), IEEE, 2024.'
date_created: 2024-09-04T07:26:02Z
date_updated: 2025-02-12T09:20:07Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icasspw62465.2024.10625894
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2024-09-04T07:34:30Z
  date_updated: 2024-09-04T07:34:30Z
  file_id: '56005'
  file_name: main.pdf
  file_size: 150432
  relation: main_file
file_date_updated: 2024-09-04T07:34:30Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
  Workshops (ICASSPW)
publication_status: published
publisher: IEEE
status: public
title: Meeting Recognition with Continuous Speech Separation and Transcription-Supported
  Diarization
type: conference
user_id: '40767'
year: '2024'
...
---
_id: '56272'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Cord-Landwehr T, Haeb-Umbach R. Once more Diarization: Improving
    meeting transcription systems through segment-level speaker reassignment. In:
    <i>Interspeech 2024</i>. ISCA; 2024. doi:<a href="https://doi.org/10.21437/interspeech.2024-1286">10.21437/interspeech.2024-1286</a>'
  apa: 'Boeddeker, C., Cord-Landwehr, T., &#38; Haeb-Umbach, R. (2024). Once more
    Diarization: Improving meeting transcription systems through segment-level speaker
    reassignment. <i>Interspeech 2024</i>. <a href="https://doi.org/10.21437/interspeech.2024-1286">https://doi.org/10.21437/interspeech.2024-1286</a>'
  bibtex: '@inproceedings{Boeddeker_Cord-Landwehr_Haeb-Umbach_2024, title={Once more
    Diarization: Improving meeting transcription systems through segment-level speaker
    reassignment}, DOI={<a href="https://doi.org/10.21437/interspeech.2024-1286">10.21437/interspeech.2024-1286</a>},
    booktitle={Interspeech 2024}, publisher={ISCA}, author={Boeddeker, Christoph and
    Cord-Landwehr, Tobias and Haeb-Umbach, Reinhold}, year={2024} }'
  chicago: 'Boeddeker, Christoph, Tobias Cord-Landwehr, and Reinhold Haeb-Umbach.
    “Once More Diarization: Improving Meeting Transcription Systems through Segment-Level
    Speaker Reassignment.” In <i>Interspeech 2024</i>. ISCA, 2024. <a href="https://doi.org/10.21437/interspeech.2024-1286">https://doi.org/10.21437/interspeech.2024-1286</a>.'
  ieee: 'C. Boeddeker, T. Cord-Landwehr, and R. Haeb-Umbach, “Once more Diarization:
    Improving meeting transcription systems through segment-level speaker reassignment,”
    2024, doi: <a href="https://doi.org/10.21437/interspeech.2024-1286">10.21437/interspeech.2024-1286</a>.'
  mla: 'Boeddeker, Christoph, et al. “Once More Diarization: Improving Meeting Transcription
    Systems through Segment-Level Speaker Reassignment.” <i>Interspeech 2024</i>,
    ISCA, 2024, doi:<a href="https://doi.org/10.21437/interspeech.2024-1286">10.21437/interspeech.2024-1286</a>.'
  short: 'C. Boeddeker, T. Cord-Landwehr, R. Haeb-Umbach, in: Interspeech 2024, ISCA,
    2024.'
date_created: 2024-09-30T08:04:47Z
date_updated: 2025-02-12T09:18:36Z
department:
- _id: '54'
doi: 10.21437/interspeech.2024-1286
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/interspeech_2024/boeddeker24_interspeech.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Interspeech 2024
publication_status: published
publisher: ISCA
status: public
title: 'Once more Diarization: Improving meeting transcription systems through segment-level
  speaker reassignment'
type: conference
user_id: '40767'
year: '2024'
...
---
_id: '57659'
author:
- first_name: Peter
  full_name: Vieting, Peter
  last_name: Vieting
- first_name: Simon
  full_name: Berger, Simon
  last_name: Berger
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Vieting P, Berger S, von Neumann T, Boeddeker C, Schlüter R, Haeb-Umbach R.
    Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for
    Meeting Transcription. In: <i>2024 IEEE Spoken Language Technology Workshop (SLT)</i>.
    2024.'
  apa: Vieting, P., Berger, S., von Neumann, T., Boeddeker, C., Schlüter, R., &#38;
    Haeb-Umbach, R. (2024). Combining TF-GridNet and Mixture Encoder for Continuous
    Speech Separation for Meeting Transcription. <i>2024 IEEE Spoken Language Technology
    Workshop (SLT)</i>.
  bibtex: '@inproceedings{Vieting_Berger_von Neumann_Boeddeker_Schlüter_Haeb-Umbach_2024,
    title={Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation
    for Meeting Transcription}, booktitle={2024 IEEE Spoken Language Technology Workshop
    (SLT)}, author={Vieting, Peter and Berger, Simon and von Neumann, Thilo and Boeddeker,
    Christoph and Schlüter, Ralf and Haeb-Umbach, Reinhold}, year={2024} }'
  chicago: Vieting, Peter, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf
    Schlüter, and Reinhold Haeb-Umbach. “Combining TF-GridNet and Mixture Encoder
    for Continuous Speech Separation for Meeting Transcription.” In <i>2024 IEEE Spoken
    Language Technology Workshop (SLT)</i>, 2024.
  ieee: P. Vieting, S. Berger, T. von Neumann, C. Boeddeker, R. Schlüter, and R. Haeb-Umbach,
    “Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for
    Meeting Transcription,” 2024.
  mla: Vieting, Peter, et al. “Combining TF-GridNet and Mixture Encoder for Continuous
    Speech Separation for Meeting Transcription.” <i>2024 IEEE Spoken Language Technology
    Workshop (SLT)</i>, 2024.
  short: 'P. Vieting, S. Berger, T. von Neumann, C. Boeddeker, R. Schlüter, R. Haeb-Umbach,
    in: 2024 IEEE Spoken Language Technology Workshop (SLT), 2024.'
date_created: 2024-12-09T11:46:18Z
date_updated: 2025-02-12T09:20:59Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www-i6.informatik.rwth-aachen.de/publications/download/1259/VietingPeterBergerSimonNeumannThilovonBoeddekerChristophSchl%FCterRalfHaeb-UmbachReinhold--CombiningTF-GridNetMixtureEncoderforContinuousSpeechSeparationforMeetingTranscription--2024.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 2024 IEEE Spoken Language Technology Workshop (SLT)
status: public
title: Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for
  Meeting Transcription
type: conference
user_id: '40767'
year: '2024'
...
---
_id: '57085'
abstract:
- lang: eng
  text: We propose an approach for simultaneous diarization and separation of meeting
    data. It consists of a complex Angular Central Gaussian Mixture Model (cACGMM)
    for speech source separation, and a von-Mises-Fisher Mixture Model (VMFMM) for
    diarization in a joint statistical framework. Through the integration, both spatial
    and spectral information are exploited for diarization and separation. We also
    develop a method for counting the number of active speakers in a segment of a
    meeting to support block-wise processing. While the total number of speakers in
    a meeting may be known, it is usually not known on a per-segment level. With the
    proposed speaker counting, joint diarization and source separation can be done
    segment-by-segment, and the permutation problem across segments is solved, thus
    allowing for block-online processing in the future. Experimental results on the
    LibriCSS meeting corpus show that the integrated approach outperforms a cascaded
    approach of diarization and speech enhancement in terms of WER, both on a per-segment
    and on a per-meeting level.
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Haeb-Umbach R. Simultaneous Diarization and
    Separation of Meetings through the Integration of Statistical Mixture Models.
    In: <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP)</i>. 2024. doi:<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., &#38; Haeb-Umbach, R. (2024). Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models. <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. 2025 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India. <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">https://doi.org/10.1109/ICASSP49660.2025.10888445</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Haeb-Umbach_2024, title={Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models}, DOI={<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>},
    booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Cord-Landwehr, Tobias and Boeddeker,
    Christoph and Haeb-Umbach, Reinhold}, year={2024} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models.” In <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, 2024. <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">https://doi.org/10.1109/ICASSP49660.2025.10888445</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, and R. Haeb-Umbach, “Simultaneous Diarization
    and Separation of Meetings through the Integration of Statistical Mixture Models,”
    presented at the 2025 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP), Hyderabad, India, 2024, doi: <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>.'
  mla: Cord-Landwehr, Tobias, et al. “Simultaneous Diarization and Separation of Meetings
    through the Integration of Statistical Mixture Models.” <i>ICASSP 2025 - 2025
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    2024, doi:<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, R. Haeb-Umbach, in: ICASSP 2025 - 2025 IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP),
    2024.'
conference:
  location: Hyderabad, India
  name: 2025 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)
date_created: 2024-11-14T09:32:38Z
date_updated: 2025-08-14T08:12:22Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/ICASSP49660.2025.10888445
file:
- access_level: closed
  content_type: application/pdf
  creator: cord
  date_created: 2025-08-14T08:11:57Z
  date_updated: 2025-08-14T08:11:57Z
  file_id: '60930'
  file_name: main.pdf
  file_size: 259907
  relation: main_file
  success: 1
file_date_updated: 2025-08-14T08:11:57Z
has_accepted_license: '1'
keyword:
- diarization
- source separation
- mixture model
- meeting
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/pdf/2410.21455
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
status: public
title: Simultaneous Diarization and Separation of Meetings through the Integration
  of Statistical Mixture Models
type: conference
user_id: '44393'
year: '2024'
...
---
_id: '53659'
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Cătălin
  full_name: Zorilă, Cătălin
  last_name: Zorilă
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Zorilă C, Doddipatla R, Haeb-Umbach R. Geodesic
    Interpolation of Frame-Wise Speaker Embeddings for the Diarization of Meeting
    Scenarios. In: <i>ICASSP 2024 - 2024 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. IEEE; 2024. doi:<a href="https://doi.org/10.1109/icassp48485.2024.10445911">10.1109/icassp48485.2024.10445911</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., Zorilă, C., Doddipatla, R., &#38; Haeb-Umbach,
    R. (2024). Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization
    of Meeting Scenarios. <i>ICASSP 2024 - 2024 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. 2024 IEEE International Conference
    on Acoustics, Speech, and Signal Processing (ICASSP), Seoul. <a href="https://doi.org/10.1109/icassp48485.2024.10445911">https://doi.org/10.1109/icassp48485.2024.10445911</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Zorilă_Doddipatla_Haeb-Umbach_2024,
    title={Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization
    of Meeting Scenarios}, DOI={<a href="https://doi.org/10.1109/icassp48485.2024.10445911">10.1109/icassp48485.2024.10445911</a>},
    booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={Cord-Landwehr, Tobias
    and Boeddeker, Christoph and Zorilă, Cătălin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2024} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla,
    and Reinhold Haeb-Umbach. “Geodesic Interpolation of Frame-Wise Speaker Embeddings
    for the Diarization of Meeting Scenarios.” In <i>ICASSP 2024 - 2024 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE, 2024.
    <a href="https://doi.org/10.1109/icassp48485.2024.10445911">https://doi.org/10.1109/icassp48485.2024.10445911</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, and R. Haeb-Umbach,
    “Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization of
    Meeting Scenarios,” presented at the 2024 IEEE International Conference on Acoustics,
    Speech, and Signal Processing (ICASSP), Seoul, 2024, doi: <a href="https://doi.org/10.1109/icassp48485.2024.10445911">10.1109/icassp48485.2024.10445911</a>.'
  mla: Cord-Landwehr, Tobias, et al. “Geodesic Interpolation of Frame-Wise Speaker
    Embeddings for the Diarization of Meeting Scenarios.” <i>ICASSP 2024 - 2024 IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    IEEE, 2024, doi:<a href="https://doi.org/10.1109/icassp48485.2024.10445911">10.1109/icassp48485.2024.10445911</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, R. Haeb-Umbach,
    in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2024.'
conference:
  location: Seoul
  name: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
    (ICASSP)
date_created: 2024-04-25T12:57:22Z
date_updated: 2025-08-14T08:11:07Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp48485.2024.10445911
file:
- access_level: closed
  content_type: application/pdf
  creator: cord
  date_created: 2025-08-14T08:09:52Z
  date_updated: 2025-08-14T08:09:52Z
  file_id: '60929'
  file_name: main.pdf
  file_size: 254478
  relation: main_file
  success: 1
file_date_updated: 2025-08-14T08:09:52Z
has_accepted_license: '1'
language:
- iso: eng
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
status: public
title: Geodesic Interpolation of Frame-Wise Speaker Embeddings for the Diarization
  of Meeting Scenarios
type: conference
user_id: '44393'
year: '2024'
...
---
_id: '48391'
author:
- first_name: Rohith
  full_name: Aralikatti, Rohith
  last_name: Aralikatti
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Gordon
  full_name: Wichern, Gordon
  last_name: Wichern
- first_name: Aswin
  full_name: Subramanian, Aswin
  last_name: Subramanian
- first_name: Jonathan
  full_name: Le Roux, Jonathan
  last_name: Le Roux
citation:
  ama: 'Aralikatti R, Boeddeker C, Wichern G, Subramanian A, Le Roux J. Reverberation
    as Supervision For Speech Separation. In: <i>ICASSP 2023 - 2023 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023.
    doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095022">10.1109/icassp49357.2023.10095022</a>'
  apa: Aralikatti, R., Boeddeker, C., Wichern, G., Subramanian, A., &#38; Le Roux,
    J. (2023). Reverberation as Supervision For Speech Separation. <i>ICASSP 2023
    - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp49357.2023.10095022">https://doi.org/10.1109/icassp49357.2023.10095022</a>
  bibtex: '@inproceedings{Aralikatti_Boeddeker_Wichern_Subramanian_Le Roux_2023, title={Reverberation
    as Supervision For Speech Separation}, DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10095022">10.1109/icassp49357.2023.10095022</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={Aralikatti, Rohith
    and Boeddeker, Christoph and Wichern, Gordon and Subramanian, Aswin and Le Roux,
    Jonathan}, year={2023} }'
  chicago: Aralikatti, Rohith, Christoph Boeddeker, Gordon Wichern, Aswin Subramanian,
    and Jonathan Le Roux. “Reverberation as Supervision For Speech Separation.” In
    <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10095022">https://doi.org/10.1109/icassp49357.2023.10095022</a>.
  ieee: 'R. Aralikatti, C. Boeddeker, G. Wichern, A. Subramanian, and J. Le Roux,
    “Reverberation as Supervision For Speech Separation,” 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10095022">10.1109/icassp49357.2023.10095022</a>.'
  mla: Aralikatti, Rohith, et al. “Reverberation as Supervision For Speech Separation.”
    <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>, IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095022">10.1109/icassp49357.2023.10095022</a>.
  short: 'R. Aralikatti, C. Boeddeker, G. Wichern, A. Subramanian, J. Le Roux, in:
    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP), IEEE, 2023.'
date_created: 2023-10-23T15:09:13Z
date_updated: 2023-10-23T15:10:16Z
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10095022
language:
- iso: eng
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
status: public
title: Reverberation as Supervision For Speech Separation
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '35602'
abstract:
- lang: eng
  text: "Continuous Speech Separation (CSS) has been proposed to address speech overlaps
    during the analysis of realistic meeting-like conversations by eliminating any
    overlaps before further processing.\r\nCSS separates a recording of arbitrarily
    many speakers into a small number of overlap-free output channels, where each
    output channel may contain speech of multiple speakers.\r\nThis is often done
    by applying a conventional separation model trained with Utterance-level Permutation
    Invariant Training (uPIT), which exclusively maps a speaker to an output channel,
    in a sliding window approach called stitching.\r\nRecently, we introduced an alternative
    training scheme called Graph-PIT that teaches the separation network to directly
    produce output streams in the required format without stitching.\r\nIt can handle
    an arbitrary number of speakers as long as the number of simultaneously active
    speakers never exceeds the number of output channels of the separator.\r\nIn this contribution, we further
    investigate the Graph-PIT training scheme.\r\nWe show in extended experiments
    that models trained with Graph-PIT also work in challenging reverberant conditions.\r\nModels
    trained in this way are able to perform segment-less CSS, i.e., without stitching,
    and achieve comparable and often better separation quality than the conventional
    CSS with uPIT and stitching.\r\nWe simplify the training schedule for Graph-PIT
    with the recently proposed Source Aggregated Signal-to-Distortion Ratio (SA-SDR)
    loss.\r\nIt eliminates unfavorable properties of the previously used A-SDR loss
    and thus enables training with Graph-PIT from scratch.\r\nGraph-PIT training relaxes
    the constraints w.r.t. the allowed numbers of speakers and speaking patterns, which
    allows using a larger variety of training data.\r\nFurthermore, we introduce novel
    signal-level evaluation metrics for meeting scenarios, namely the source-aggregated
    scale- and convolution-invariant Signal-to-Distortion Ratio (SA-SI-SDR and SA-CI-SDR),
    which are generalizations of the commonly used SDR-based metrics for the CSS case."
article_type: original
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Boeddeker C, Delcroix M, Haeb-Umbach R. Segment-Less
    Continuous Speech Separation of Meetings: Training and Evaluation Criteria. <i>IEEE/ACM
    Transactions on Audio, Speech, and Language Processing</i>. 2023;31:576-589. doi:<a
    href="https://doi.org/10.1109/taslp.2022.3228629">10.1109/taslp.2022.3228629</a>'
  apa: 'von Neumann, T., Kinoshita, K., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach,
    R. (2023). Segment-Less Continuous Speech Separation of Meetings: Training and
    Evaluation Criteria. <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>,
    <i>31</i>, 576–589. <a href="https://doi.org/10.1109/taslp.2022.3228629">https://doi.org/10.1109/taslp.2022.3228629</a>'
  bibtex: '@article{von Neumann_Kinoshita_Boeddeker_Delcroix_Haeb-Umbach_2023, title={Segment-Less
    Continuous Speech Separation of Meetings: Training and Evaluation Criteria}, volume={31},
    DOI={<a href="https://doi.org/10.1109/taslp.2022.3228629">10.1109/taslp.2022.3228629</a>},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, publisher={Institute
    of Electrical and Electronics Engineers (IEEE)}, author={von Neumann, Thilo and
    Kinoshita, Keisuke and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023}, pages={576–589} }'
  chicago: 'Neumann, Thilo von, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix,
    and Reinhold Haeb-Umbach. “Segment-Less Continuous Speech Separation of Meetings:
    Training and Evaluation Criteria.” <i>IEEE/ACM Transactions on Audio, Speech,
    and Language Processing</i> 31 (2023): 576–89. <a href="https://doi.org/10.1109/taslp.2022.3228629">https://doi.org/10.1109/taslp.2022.3228629</a>.'
  ieee: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach,
    “Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation
    Criteria,” <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>,
    vol. 31, pp. 576–589, 2023, doi: <a href="https://doi.org/10.1109/taslp.2022.3228629">10.1109/taslp.2022.3228629</a>.'
  mla: 'von Neumann, Thilo, et al. “Segment-Less Continuous Speech Separation of Meetings:
    Training and Evaluation Criteria.” <i>IEEE/ACM Transactions on Audio, Speech,
    and Language Processing</i>, vol. 31, Institute of Electrical and Electronics
    Engineers (IEEE), 2023, pp. 576–89, doi:<a href="https://doi.org/10.1109/taslp.2022.3228629">10.1109/taslp.2022.3228629</a>.'
  short: T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, R. Haeb-Umbach,
    IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023) 576–589.
date_created: 2023-01-09T17:24:17Z
date_updated: 2023-11-15T12:16:11Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/taslp.2022.3228629
file:
- access_level: open_access
  content_type: application/pdf
  creator: haebumb
  date_created: 2023-01-09T17:46:05Z
  date_updated: 2023-01-11T08:50:19Z
  file_id: '35607'
  file_name: main.pdf
  file_size: 7185077
  relation: main_file
file_date_updated: 2023-01-11T08:50:19Z
has_accepted_license: '1'
intvolume: '        31'
keyword:
- Continuous Speech Separation
- Source Separation
- Graph-PIT
- Dynamic Programming
- Permutation Invariant Training
language:
- iso: eng
oa: '1'
page: 576-589
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: IEEE/ACM Transactions on Audio, Speech, and Language Processing
publication_identifier:
  issn:
  - 2329-9290
  - 2329-9304
publication_status: published
publisher: Institute of Electrical and Electronics Engineers (IEEE)
quality_controlled: '1'
status: public
title: 'Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation
  Criteria'
type: journal_article
user_id: '49870'
volume: 31
year: '2023'
...
---
_id: '48281'
abstract:
- lang: eng
  text: "\tWe propose a general framework to compute the word error rate (WER) of
    ASR systems that process recordings containing multiple speakers at their input
    and that produce multiple output word sequences (MIMO).\r\n\tSuch ASR systems
    are typically required, e.g., for meeting transcription.\r\n\tWe provide an efficient
    implementation based on a dynamic programming search in a multi-dimensional Levenshtein
    distance tensor under the constraint that a reference utterance must be matched
    consistently with one hypothesis output. \r\n\tThis also results in an efficient
    implementation of the ORC WER which previously suffered from exponential complexity.\r\n\tWe
    give an overview of commonly used WER definitions for multi-speaker scenarios
    and show that they are specializations of the above MIMO WER tuned to particular
    application scenarios. \r\n\tWe conclude with a discussion of the pros and cons
    of the various WER definitions and a recommendation on when to use which."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. On Word
    Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech
    Recognition Systems. In: <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>'
  apa: von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach,
    R. (2023). On Word Error Rate Definitions and Their Efficient Computation for
    Multi-Speaker Speech Recognition Systems. <i>ICASSP 2023 - 2023 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>
  bibtex: '@inproceedings{von Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2023,
    title={On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems}, DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Boeddeker, Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix,
    and Reinhold Haeb-Umbach. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” In <i>ICASSP 2023 -
    2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>.
  ieee: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach,
    “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems,” 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.'
  mla: von Neumann, Thilo, et al. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” <i>ICASSP 2023 - 2023
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.
  short: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
date_created: 2023-10-19T07:38:31Z
date_updated: 2025-02-12T09:16:34Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10094784
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:39:57Z
  date_updated: 2023-10-19T07:41:56Z
  file_id: '48282'
  file_name: ICASSP_2023_Meeting_Evaluation.pdf
  file_size: 204994
  relation: main_file
file_date_updated: 2023-10-19T07:41:56Z
has_accepted_license: '1'
keyword:
- Word Error Rate
- Meeting Recognition
- Levenshtein Distance
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10094784
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
  Speech Recognition Systems
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48275'
abstract:
- lang: eng
  text: "MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription
    systems.\r\nIt provides a unified interface for the computation of commonly used
    Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER along with other
    WER definitions.\r\nWe extend the cpWER computation by a temporal constraint to
    ensure that words are identified as correct only when the temporal alignment is
    plausible.\r\nThis leads to a better quality of the matching of the hypothesis
    string to the reference string that more closely resembles the actual transcription
    quality, and a system is penalized if it provides poor time annotations.\r\nSince
    word-level timing information is often not available, we present a way to approximate
    exact word-level timings from segment-level timings (e.g., a sentence) and show
    that the approximation leads to a similar WER as a matching with exact word-level
    annotations.\r\nAt the same time, the time constraint leads to a speedup of the
    matching algorithm, which outweighs the additional overhead caused by processing
    the time stamps."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit
    for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.'
  apa: 'von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023).
    MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin.'
  bibtex: '@inproceedings{von Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems},
    booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments},
    author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: 'Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach.
    “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>,
    2023.'
  ieee: 'T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,”
    presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments,
    Dublin, 2023.'
  mla: 'von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error
    Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech
    Processing in Everyday Environments</i>, 2023.'
  short: 'T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME
    2023 Workshop on Speech Processing in Everyday Environments, 2023.'
conference:
  location: Dublin
  name: CHiME 2023 Workshop on Speech Processing in Everyday Environments
date_created: 2023-10-19T07:24:51Z
date_updated: 2025-02-12T09:12:05Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:19:59Z
  date_updated: 2023-10-19T07:19:59Z
  file_id: '48276'
  file_name: Chime_7__MeetEval.pdf
  file_size: 263744
  relation: main_file
file_date_updated: 2023-10-19T07:19:59Z
has_accepted_license: '1'
keyword:
- Speech Recognition
- Word Error Rate
- Meeting Transcription
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2307.11394
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: 'MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
  Systems'
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '47128'
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Cătălin
  full_name: Zorilă, Cătălin
  last_name: Zorilă
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Zorilă C, Doddipatla R, Haeb-Umbach R. Frame-Wise
    and Overlap-Robust Speaker Embeddings for Meeting Diarization. In: <i>ICASSP 2023
    - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., Zorilă, C., Doddipatla, R., &#38; Haeb-Umbach,
    R. (2023). Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization.
    <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. 2023 IEEE International Conference on Acoustics, Speech,
    and Signal Processing (ICASSP), Rhodes. <a href="https://doi.org/10.1109/icassp49357.2023.10095370">https://doi.org/10.1109/icassp49357.2023.10095370</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Zorilă_Doddipatla_Haeb-Umbach_2023,
    title={Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization},
    DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={Cord-Landwehr, Tobias
    and Boeddeker, Christoph and Zorilă, Cătălin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla,
    and Reinhold Haeb-Umbach. “Frame-Wise and Overlap-Robust Speaker Embeddings for
    Meeting Diarization.” In <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10095370">https://doi.org/10.1109/icassp49357.2023.10095370</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, and R. Haeb-Umbach,
    “Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization,” presented
    at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
    (ICASSP), Rhodes, 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>.'
  mla: Cord-Landwehr, Tobias, et al. “Frame-Wise and Overlap-Robust Speaker Embeddings
    for Meeting Diarization.” <i>ICASSP 2023 - 2023 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
conference:
  location: Rhodes
  name: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
    (ICASSP)
date_created: 2023-09-19T14:01:20Z
date_updated: 2025-02-12T09:14:45Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10095370
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:56:18Z
  date_updated: 2023-11-15T14:56:18Z
  file_id: '48932'
  file_name: teacher_student_embeddings.pdf
  file_size: 246306
  relation: main_file
file_date_updated: 2023-11-15T14:56:18Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
status: public
title: Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '47129'
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Cătălin
  full_name: Zorilă, Cătălin
  last_name: Zorilă
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Zorilă C, Doddipatla R, Haeb-Umbach R. A Teacher-Student
    Approach for Extracting Informative Speaker Embeddings From Speech Mixtures. In:
    <i>INTERSPEECH 2023</i>. ISCA; 2023. doi:<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., Zorilă, C., Doddipatla, R., &#38; Haeb-Umbach,
    R. (2023). A Teacher-Student Approach for Extracting Informative Speaker Embeddings
    From Speech Mixtures. <i>INTERSPEECH 2023</i>. <a href="https://doi.org/10.21437/interspeech.2023-1379">https://doi.org/10.21437/interspeech.2023-1379</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Zorilă_Doddipatla_Haeb-Umbach_2023,
    title={A Teacher-Student Approach for Extracting Informative Speaker Embeddings
    From Speech Mixtures}, DOI={<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>},
    booktitle={INTERSPEECH 2023}, publisher={ISCA}, author={Cord-Landwehr, Tobias
    and Boeddeker, Christoph and Zorilă, Cătălin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla,
    and Reinhold Haeb-Umbach. “A Teacher-Student Approach for Extracting Informative
    Speaker Embeddings From Speech Mixtures.” In <i>INTERSPEECH 2023</i>. ISCA, 2023.
    <a href="https://doi.org/10.21437/interspeech.2023-1379">https://doi.org/10.21437/interspeech.2023-1379</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, and R. Haeb-Umbach,
    “A Teacher-Student Approach for Extracting Informative Speaker Embeddings From
    Speech Mixtures,” 2023, doi: <a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>.'
  mla: Cord-Landwehr, Tobias, et al. “A Teacher-Student Approach for Extracting Informative
    Speaker Embeddings From Speech Mixtures.” <i>INTERSPEECH 2023</i>, ISCA, 2023,
    doi:<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, R. Haeb-Umbach,
    in: INTERSPEECH 2023, ISCA, 2023.'
date_created: 2023-09-19T14:34:37Z
date_updated: 2025-02-12T09:15:28Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/interspeech.2023-1379
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T15:00:02Z
  date_updated: 2023-11-15T15:00:02Z
  file_id: '48933'
  file_name: multispeaker_embeddings.pdf
  file_size: 303203
  relation: main_file
file_date_updated: 2023-11-15T15:00:02Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: INTERSPEECH 2023
publication_status: published
publisher: ISCA
status: public
title: A Teacher-Student Approach for Extracting Informative Speaker Embeddings From
  Speech Mixtures
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '54439'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Cord-Landwehr T, von Neumann T, Haeb-Umbach R. Multi-stage diarization
    refinement for the CHiME-7 DASR scenario. In: <i>7th International Workshop on
    Speech Processing in Everyday Environments (CHiME 2023)</i>. ISCA; 2023. doi:<a
    href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>'
  apa: Boeddeker, C., Cord-Landwehr, T., von Neumann, T., &#38; Haeb-Umbach, R. (2023).
    Multi-stage diarization refinement for the CHiME-7 DASR scenario. <i>7th International
    Workshop on Speech Processing in Everyday Environments (CHiME 2023)</i>. <a href="https://doi.org/10.21437/chime.2023-10">https://doi.org/10.21437/chime.2023-10</a>
  bibtex: '@inproceedings{Boeddeker_Cord-Landwehr_von Neumann_Haeb-Umbach_2023, title={Multi-stage
    diarization refinement for the CHiME-7 DASR scenario}, DOI={<a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>},
    booktitle={7th International Workshop on Speech Processing in Everyday Environments
    (CHiME 2023)}, publisher={ISCA}, author={Boeddeker, Christoph and Cord-Landwehr,
    Tobias and von Neumann, Thilo and Haeb-Umbach, Reinhold}, year={2023} }'
  chicago: Boeddeker, Christoph, Tobias Cord-Landwehr, Thilo von Neumann, and Reinhold
    Haeb-Umbach. “Multi-Stage Diarization Refinement for the CHiME-7 DASR Scenario.”
    In <i>7th International Workshop on Speech Processing in Everyday Environments
    (CHiME 2023)</i>. ISCA, 2023. <a href="https://doi.org/10.21437/chime.2023-10">https://doi.org/10.21437/chime.2023-10</a>.
  ieee: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, and R. Haeb-Umbach, “Multi-stage
    diarization refinement for the CHiME-7 DASR scenario,” 2023, doi: <a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>.'
  mla: Boeddeker, Christoph, et al. “Multi-Stage Diarization Refinement for the CHiME-7
    DASR Scenario.” <i>7th International Workshop on Speech Processing in Everyday
    Environments (CHiME 2023)</i>, ISCA, 2023, doi:<a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>.
  short: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, R. Haeb-Umbach, in: 7th
    International Workshop on Speech Processing in Everyday Environments (CHiME 2023),
    ISCA, 2023.'
date_created: 2024-05-23T15:16:15Z
date_updated: 2025-02-12T09:16:13Z
department:
- _id: '54'
doi: 10.21437/chime.2023-10
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/chime_2023/boeddeker23_chime.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 7th International Workshop on Speech Processing in Everyday Environments
  (CHiME 2023)
publication_status: published
publisher: ISCA
status: public
title: Multi-stage diarization refinement for the CHiME-7 DASR scenario
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48390'
author:
- first_name: Simon
  full_name: Berger, Simon
  last_name: Berger
- first_name: Peter
  full_name: Vieting, Peter
  last_name: Vieting
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Berger S, Vieting P, Boeddeker C, Schlüter R, Haeb-Umbach R. Mixture Encoder
    for Joint Speech Separation and Recognition. In: <i>INTERSPEECH 2023</i>. ISCA;
    2023. doi:<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>'
  apa: Berger, S., Vieting, P., Boeddeker, C., Schlüter, R., &#38; Haeb-Umbach, R.
    (2023). Mixture Encoder for Joint Speech Separation and Recognition. <i>INTERSPEECH
    2023</i>. <a href="https://doi.org/10.21437/interspeech.2023-1815">https://doi.org/10.21437/interspeech.2023-1815</a>
  bibtex: '@inproceedings{Berger_Vieting_Boeddeker_Schlüter_Haeb-Umbach_2023, title={Mixture
    Encoder for Joint Speech Separation and Recognition}, DOI={<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>},
    booktitle={INTERSPEECH 2023}, publisher={ISCA}, author={Berger, Simon and Vieting,
    Peter and Boeddeker, Christoph and Schlüter, Ralf and Haeb-Umbach, Reinhold},
    year={2023} }'
  chicago: Berger, Simon, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, and Reinhold
    Haeb-Umbach. “Mixture Encoder for Joint Speech Separation and Recognition.” In
    <i>INTERSPEECH 2023</i>. ISCA, 2023. <a href="https://doi.org/10.21437/interspeech.2023-1815">https://doi.org/10.21437/interspeech.2023-1815</a>.
  ieee: 'S. Berger, P. Vieting, C. Boeddeker, R. Schlüter, and R. Haeb-Umbach, “Mixture
    Encoder for Joint Speech Separation and Recognition,” 2023, doi: <a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>.'
  mla: Berger, Simon, et al. “Mixture Encoder for Joint Speech Separation and Recognition.”
    <i>INTERSPEECH 2023</i>, ISCA, 2023, doi:<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>.
  short: 'S. Berger, P. Vieting, C. Boeddeker, R. Schlüter, R. Haeb-Umbach, in: INTERSPEECH
    2023, ISCA, 2023.'
date_created: 2023-10-23T15:06:39Z
date_updated: 2025-02-12T09:11:30Z
department:
- _id: '54'
doi: 10.21437/interspeech.2023-1815
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/interspeech_2023/berger23_interspeech.pdf
oa: '1'
project:
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: INTERSPEECH 2023
publication_status: published
publisher: ISCA
status: public
title: Mixture Encoder for Joint Speech Separation and Recognition
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '33669'
abstract:
- lang: eng
  text: Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing
    attention in recent years. Most existing methods feature a signal processing frontend
    and an ASR backend. In realistic scenarios, these modules are usually trained
    separately or progressively, which suffers from either inter-module mismatch or
    a complicated training process. In this paper, we propose an end-to-end multi-channel
    model that jointly optimizes the speech enhancement (including speech dereverberation,
    denoising, and separation) frontend and the ASR backend as a single system. To
    the best of our knowledge, this is the first work that proposes to optimize dereverberation,
    beamforming, and multi-speaker ASR in a fully end-to-end manner. The frontend
    module consists of a weighted prediction error (WPE) based submodule for dereverberation
    and a neural beamformer for denoising and speech separation. For the backend,
    we adopt a widely used end-to-end (E2E) ASR architecture. It is worth noting that
    the entire model is differentiable and can be optimized in a fully end-to-end
    manner using only the ASR criterion, without the need for parallel signal-level
    labels. We evaluate the proposed model on several multi-speaker benchmark datasets,
    and experimental results show that the fully E2E ASR model can achieve competitive
    performance on both noisy and reverberant conditions, with over 30% relative word
    error rate (WER) reduction over the single-channel baseline systems.
author:
- first_name: Wangyou
  full_name: Zhang, Wangyou
  last_name: Zhang
- first_name: Xuankai
  full_name: Chang, Xuankai
  last_name: Chang
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tomohiro
  full_name: Nakatani, Tomohiro
  last_name: Nakatani
- first_name: Shinji
  full_name: Watanabe, Shinji
  last_name: Watanabe
- first_name: Yanmin
  full_name: Qian, Yanmin
  last_name: Qian
citation:
  ama: Zhang W, Chang X, Boeddeker C, Nakatani T, Watanabe S, Qian Y. End-to-End Dereverberation,
    Beamforming, and Speech Recognition in A Cocktail Party. <i>IEEE/ACM Transactions
    on Audio, Speech, and Language Processing</i>. Published online 2022. doi:<a href="https://doi.org/10.1109/TASLP.2022.3209942">10.1109/TASLP.2022.3209942</a>
  apa: Zhang, W., Chang, X., Boeddeker, C., Nakatani, T., Watanabe, S., &#38; Qian,
    Y. (2022). End-to-End Dereverberation, Beamforming, and Speech Recognition in
    A Cocktail Party. <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>.
    <a href="https://doi.org/10.1109/TASLP.2022.3209942">https://doi.org/10.1109/TASLP.2022.3209942</a>
  bibtex: '@article{Zhang_Chang_Boeddeker_Nakatani_Watanabe_Qian_2022, title={End-to-End
    Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party}, DOI={<a
    href="https://doi.org/10.1109/TASLP.2022.3209942">10.1109/TASLP.2022.3209942</a>},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, author={Zhang,
    Wangyou and Chang, Xuankai and Boeddeker, Christoph and Nakatani, Tomohiro and
    Watanabe, Shinji and Qian, Yanmin}, year={2022} }'
  chicago: Zhang, Wangyou, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani,
    Shinji Watanabe, and Yanmin Qian. “End-to-End Dereverberation, Beamforming, and
    Speech Recognition in A Cocktail Party.” <i>IEEE/ACM Transactions on Audio, Speech,
    and Language Processing</i>, 2022. <a href="https://doi.org/10.1109/TASLP.2022.3209942">https://doi.org/10.1109/TASLP.2022.3209942</a>.
  ieee: 'W. Zhang, X. Chang, C. Boeddeker, T. Nakatani, S. Watanabe, and Y. Qian,
    “End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail
    Party,” <i>IEEE/ACM Transactions on Audio, Speech, and Language Processing</i>,
    2022, doi: <a href="https://doi.org/10.1109/TASLP.2022.3209942">10.1109/TASLP.2022.3209942</a>.'
  mla: Zhang, Wangyou, et al. “End-to-End Dereverberation, Beamforming, and Speech
    Recognition in A Cocktail Party.” <i>IEEE/ACM Transactions on Audio, Speech, and
    Language Processing</i>, 2022, doi:<a href="https://doi.org/10.1109/TASLP.2022.3209942">10.1109/TASLP.2022.3209942</a>.
  short: W. Zhang, X. Chang, C. Boeddeker, T. Nakatani, S. Watanabe, Y. Qian, IEEE/ACM
    Transactions on Audio, Speech, and Language Processing (2022).
date_created: 2022-10-11T07:27:51Z
date_updated: 2022-12-05T12:35:31Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/TASLP.2022.3209942
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2022-10-11T07:23:13Z
  date_updated: 2022-10-11T07:23:13Z
  file_id: '33674'
  file_name: End-to-End_Dereverberation_Beamforming_and_Speech_Recognition_in_A_Cocktail_Party.pdf
  file_size: 6167931
  relation: main_file
file_date_updated: 2022-10-11T07:23:13Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: IEEE/ACM Transactions on Audio, Speech, and Language Processing
publication_identifier:
  issn:
  - 'Print ISSN: 2329-9290 Electronic ISSN: 2329-9304'
publication_status: published
related_material:
  link:
  - relation: confirmation
    url: https://ieeexplore.ieee.org/abstract/document/9904314
status: public
title: End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail
  Party
type: journal_article
user_id: '40767'
year: '2022'
...
---
_id: '33847'
abstract:
- lang: eng
  text: "The scope of speech enhancement has changed from a monolithic view of single,\r\nindependent
    tasks, to a joint processing of complex conversational speech\r\nrecordings. Training
    and evaluation of these single tasks requires synthetic\r\ndata with access to
    intermediate signals that is as close as possible to the\r\nevaluation scenario.
    As such data often is not available, many works instead\r\nuse specialized databases
    for the training of each system component, e.g.,\r\nWSJ0-mix for source separation.
    We present a Multi-purpose Multi-Speaker\r\nMixture Signal Generator (MMS-MSG)
    for generating a variety of speech mixture\r\nsignals based on any speech corpus,
    ranging from classical anechoic mixtures\r\n(e.g., WSJ0-mix) over reverberant
    mixtures (e.g., SMS-WSJ) to meeting-style\r\ndata. Its highly modular and flexible
    structure allows for the simulation of\r\ndiverse environments and dynamic mixing,
    while simultaneously enabling an easy\r\nextension and modification to generate
    new scenarios and mixture types. These\r\nmeetings can be used for prototyping,
    evaluation, or training purposes. We\r\nprovide example evaluation data and baseline
    results for meetings based on the\r\nWSJ corpus. Further, we demonstrate the usefulness
    for realistic scenarios by\r\nusing MMS-MSG to provide training data for the LibriCSS
    database."
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, von Neumann T, Boeddeker C, Haeb-Umbach R. MMS-MSG: A Multi-purpose
    Multi-Speaker Mixture Signal Generator. In: <i>2022 International Workshop on
    Acoustic Signal Enhancement (IWAENC)</i>. ; 2022.'
  apa: 'Cord-Landwehr, T., von Neumann, T., Boeddeker, C., &#38; Haeb-Umbach, R. (2022).
    MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator. <i>2022 International
    Workshop on Acoustic Signal Enhancement (IWAENC)</i>. 2022 International Workshop
    on Acoustic Signal Enhancement (IWAENC), Bamberg.'
  bibtex: '@inproceedings{Cord-Landwehr_von Neumann_Boeddeker_Haeb-Umbach_2022, title={MMS-MSG:
    A Multi-purpose Multi-Speaker Mixture Signal Generator}, booktitle={2022 International
    Workshop on Acoustic Signal Enhancement (IWAENC)}, author={Cord-Landwehr, Tobias
    and von Neumann, Thilo and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2022}
    }'
  chicago: 'Cord-Landwehr, Tobias, Thilo von Neumann, Christoph Boeddeker, and Reinhold
    Haeb-Umbach. “MMS-MSG: A Multi-Purpose Multi-Speaker Mixture Signal Generator.”
    In <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>,
    2022.'
  ieee: 'T. Cord-Landwehr, T. von Neumann, C. Boeddeker, and R. Haeb-Umbach, “MMS-MSG:
    A Multi-purpose Multi-Speaker Mixture Signal Generator,” presented at the 2022
    International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, 2022.'
  mla: 'Cord-Landwehr, Tobias, et al. “MMS-MSG: A Multi-Purpose Multi-Speaker Mixture
    Signal Generator.” <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>, 2022.'
  short: 'T. Cord-Landwehr, T. von Neumann, C. Boeddeker, R. Haeb-Umbach, in: 2022
    International Workshop on Acoustic Signal Enhancement (IWAENC), 2022.'
conference:
  location: Bamberg
  name: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
date_created: 2022-10-20T14:02:14Z
date_updated: 2023-11-15T14:55:14Z
ddc:
- '000'
department:
- _id: '54'
external_id:
  arxiv:
  - '2209.11494'
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:54:56Z
  date_updated: 2023-11-15T14:54:56Z
  file_id: '48931'
  file_name: mms_msg_camera_ready.pdf
  file_size: 177975
  relation: main_file
file_date_updated: 2023-11-15T14:54:56Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
quality_controlled: '1'
status: public
title: 'MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator'
type: conference
user_id: '44393'
year: '2022'
...
---
_id: '33848'
abstract:
- lang: eng
  text: "Impressive progress in neural network-based single-channel speech source\r\nseparation
    has been made in recent years. But those improvements have been\r\nmostly reported
    on anechoic data, a situation that is hardly met in practice.\r\nTaking the SepFormer
    as a starting point, which achieves state-of-the-art\r\nperformance on anechoic
    mixtures, we gradually modify it to optimize its\r\nperformance on reverberant
    mixtures. Although this leads to a word error rate\r\nimprovement by 7 percentage
    points compared to the standard SepFormer\r\nimplementation, the system ends up
    with only marginally better performance than\r\na PIT-BLSTM separation system
    that is optimized with rather straightforward\r\nmeans. This is surprising and
    at the same time sobering, challenging the\r\npractical usefulness of many improvements
    reported in recent years for monaural\r\nsource separation on nonreverberant data."
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Catalin
  full_name: Zorila, Catalin
  last_name: Zorila
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, von Neumann T, Zorila C, Doddipatla R, Haeb-Umbach
    R. Monaural source separation: From anechoic to reverberant environments. In:
    <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>. IEEE;
    2022.'
  apa: 'Cord-Landwehr, T., Boeddeker, C., von Neumann, T., Zorila, C., Doddipatla,
    R., &#38; Haeb-Umbach, R. (2022). Monaural source separation: From anechoic to
    reverberant environments. <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).'
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_von Neumann_Zorila_Doddipatla_Haeb-Umbach_2022,
    place={Bamberg}, title={Monaural source separation: From anechoic to reverberant
    environments}, booktitle={2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)}, publisher={IEEE}, author={Cord-Landwehr, Tobias and Boeddeker, Christoph
    and von Neumann, Thilo and Zorila, Catalin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2022} }'
  chicago: 'Cord-Landwehr, Tobias, Christoph Boeddeker, Thilo von Neumann, Catalin
    Zorila, Rama Doddipatla, and Reinhold Haeb-Umbach. “Monaural Source Separation:
    From Anechoic to Reverberant Environments.” In <i>2022 International Workshop
    on Acoustic Signal Enhancement (IWAENC)</i>. Bamberg: IEEE, 2022.'
  ieee: 'T. Cord-Landwehr, C. Boeddeker, T. von Neumann, C. Zorila, R. Doddipatla,
    and R. Haeb-Umbach, “Monaural source separation: From anechoic to reverberant
    environments,” presented at the 2022 International Workshop on Acoustic Signal
    Enhancement (IWAENC), 2022.'
  mla: 'Cord-Landwehr, Tobias, et al. “Monaural Source Separation: From Anechoic to
    Reverberant Environments.” <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>, IEEE, 2022.'
  short: 'T. Cord-Landwehr, C. Boeddeker, T. von Neumann, C. Zorila, R. Doddipatla,
    R. Haeb-Umbach, in: 2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC), IEEE, Bamberg, 2022.'
conference:
  name: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
date_created: 2022-10-20T14:07:28Z
date_updated: 2025-02-12T09:05:25Z
ddc:
- '000'
department:
- _id: '54'
external_id:
  arxiv:
  - '2111.07578'
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:52:16Z
  date_updated: 2023-11-15T14:52:16Z
  file_id: '48930'
  file_name: monaural_source_separation.pdf
  file_size: 212890
  relation: main_file
file_date_updated: 2023-11-15T14:52:16Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
place: Bamberg
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
publisher: IEEE
status: public
title: 'Monaural source separation: From anechoic to reverberant environments'
type: conference
user_id: '40767'
year: '2022'
...
---
_id: '33819'
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Boeddeker C, Delcroix M, Haeb-Umbach R. SA-SDR:
    A Novel Loss Function for Separation of Meeting Style Data. In: <i>ICASSP 2022
    - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE; 2022. doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>'
  apa: 'von Neumann, T., Kinoshita, K., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach,
    R. (2022). SA-SDR: A Novel Loss Function for Separation of Meeting Style Data.
    <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp43922.2022.9746757">https://doi.org/10.1109/icassp43922.2022.9746757</a>'
  bibtex: '@inproceedings{von Neumann_Kinoshita_Boeddeker_Delcroix_Haeb-Umbach_2022,
    title={SA-SDR: A Novel Loss Function for Separation of Meeting Style Data}, DOI={<a
    href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>},
    booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Kinoshita, Keisuke and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2022} }'
  chicago: 'Neumann, Thilo von, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix,
    and Reinhold Haeb-Umbach. “SA-SDR: A Novel Loss Function for Separation of Meeting
    Style Data.” In <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. IEEE, 2022. <a href="https://doi.org/10.1109/icassp43922.2022.9746757">https://doi.org/10.1109/icassp43922.2022.9746757</a>.'
  ieee: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach,
    “SA-SDR: A Novel Loss Function for Separation of Meeting Style Data,” 2022, doi:
    <a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>.'
  mla: 'von Neumann, Thilo, et al. “SA-SDR: A Novel Loss Function for Separation of
    Meeting Style Data.” <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, IEEE, 2022, doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>.'
  short: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2022.'
date_created: 2022-10-20T05:29:12Z
date_updated: 2025-02-12T09:08:14Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp43922.2022.9746757
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-10-20T05:33:10Z
  date_updated: 2022-10-20T05:33:10Z
  file_id: '33820'
  file_name: main.pdf
  file_size: 228069
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-10-20T05:35:32Z
  date_updated: 2022-10-20T05:35:32Z
  file_id: '33821'
  file_name: poster.pdf
  file_size: 229166
  relation: poster
file_date_updated: 2022-10-20T05:35:32Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: supplementary_material
    url: https://github.com/fgnt/graph_pit
status: public
title: 'SA-SDR: A Novel Loss Function for Separation of Meeting Style Data'
type: conference
user_id: '40767'
year: '2022'
...
---
_id: '33816'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Gburrek T, Boeddeker C, von Neumann T, Cord-Landwehr T, Schmalenstroeer J,
    Haeb-Umbach R. <i>A Meeting Transcription System for an Ad-Hoc Acoustic Sensor
    Network</i>. arXiv; 2022. doi:<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>
  apa: Gburrek, T., Boeddeker, C., von Neumann, T., Cord-Landwehr, T., Schmalenstroeer,
    J., &#38; Haeb-Umbach, R. (2022). <i>A Meeting Transcription System for an Ad-Hoc
    Acoustic Sensor Network</i>. arXiv. <a href="https://doi.org/10.48550/ARXIV.2205.00944">https://doi.org/10.48550/ARXIV.2205.00944</a>
  bibtex: '@book{Gburrek_Boeddeker_von Neumann_Cord-Landwehr_Schmalenstroeer_Haeb-Umbach_2022,
    title={A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network},
    DOI={<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>},
    publisher={arXiv}, author={Gburrek, Tobias and Boeddeker, Christoph and von Neumann,
    Thilo and Cord-Landwehr, Tobias and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold},
    year={2022} }'
  chicago: Gburrek, Tobias, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr,
    Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. <i>A Meeting Transcription System
    for an Ad-Hoc Acoustic Sensor Network</i>. arXiv, 2022. <a href="https://doi.org/10.48550/ARXIV.2205.00944">https://doi.org/10.48550/ARXIV.2205.00944</a>.
  ieee: T. Gburrek, C. Boeddeker, T. von Neumann, T. Cord-Landwehr, J. Schmalenstroeer,
    and R. Haeb-Umbach, <i>A Meeting Transcription System for an Ad-Hoc Acoustic Sensor
    Network</i>. arXiv, 2022.
  mla: Gburrek, Tobias, et al. <i>A Meeting Transcription System for an Ad-Hoc Acoustic
    Sensor Network</i>. arXiv, 2022, doi:<a href="https://doi.org/10.48550/ARXIV.2205.00944">10.48550/ARXIV.2205.00944</a>.
  short: T. Gburrek, C. Boeddeker, T. von Neumann, T. Cord-Landwehr, J. Schmalenstroeer,
    R. Haeb-Umbach, A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network,
    arXiv, 2022.
date_created: 2022-10-18T11:10:58Z
date_updated: 2025-02-12T09:03:42Z
ddc:
- '004'
department:
- _id: '54'
doi: 10.48550/ARXIV.2205.00944
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:42:04Z
  date_updated: 2023-11-17T06:42:04Z
  file_id: '48992'
  file_name: meeting_transcription_22.pdf
  file_size: 199006
  relation: main_file
file_date_updated: 2023-11-17T06:42:04Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publisher: arXiv
status: public
title: A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network
type: misc
user_id: '40767'
year: '2022'
...
