---
_id: '57085'
abstract:
- lang: eng
  text: We propose an approach for simultaneous diarization and separation of meeting
    data. It consists of a complex Angular Central Gaussian Mixture Model (cACGMM)
    for speech source separation, and a von-Mises-Fisher Mixture Model (VMFMM) for
    diarization in a joint statistical framework. Through the integration, both spatial
    and spectral information are exploited for diarization and separation. We also
    develop a method for counting the number of active speakers in a segment of a
    meeting to support block-wise processing. While the total number of speakers in
    a meeting may be known, it is usually not known on a per-segment level. With the
    proposed speaker counting, joint diarization and source separation can be done
    segment-by-segment, and the permutation problem across segments is solved, thus
    allowing for block-online processing in the future. Experimental results on the
    LibriCSS meeting corpus show that the integrated approach outperforms a cascaded
    approach of diarization and speech enhancement in terms of WER, both on a per-segment
    and on a per-meeting level.
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Haeb-Umbach R. Simultaneous Diarization and
    Separation of Meetings through the Integration of Statistical Mixture Models.
    In: <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP)</i>. ; 2025. doi:<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., &#38; Haeb-Umbach, R. (2025). Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models. <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. 2025 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India. <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">https://doi.org/10.1109/ICASSP49660.2025.10888445</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Haeb-Umbach_2025, title={Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models}, DOI={<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>},
    booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, author={Cord-Landwehr, Tobias and Boeddeker,
    Christoph and Haeb-Umbach, Reinhold}, year={2025} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Simultaneous
    Diarization and Separation of Meetings through the Integration of Statistical
    Mixture Models.” In <i>ICASSP 2025 - 2025 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, 2025. <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">https://doi.org/10.1109/ICASSP49660.2025.10888445</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, and R. Haeb-Umbach, “Simultaneous Diarization
    and Separation of Meetings through the Integration of Statistical Mixture Models,”
    presented at the 2025 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP), Hyderabad, India, 2025, doi: <a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>.'
  mla: Cord-Landwehr, Tobias, et al. “Simultaneous Diarization and Separation of Meetings
    through the Integration of Statistical Mixture Models.” <i>ICASSP 2025 - 2025
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    2025, doi:<a href="https://doi.org/10.1109/ICASSP49660.2025.10888445">10.1109/ICASSP49660.2025.10888445</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, R. Haeb-Umbach, in: ICASSP 2025 - 2025 IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP),
    2025.'
conference:
  location: Hyderabad, India
  name: 2025 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)
date_created: 2024-11-14T09:32:38Z
date_updated: 2025-08-14T08:12:22Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/ICASSP49660.2025.10888445
file:
- access_level: closed
  content_type: application/pdf
  creator: cord
  date_created: 2025-08-14T08:11:57Z
  date_updated: 2025-08-14T08:11:57Z
  file_id: '60930'
  file_name: main.pdf
  file_size: 259907
  relation: main_file
  success: 1
file_date_updated: 2025-08-14T08:11:57Z
has_accepted_license: '1'
keyword:
- diarization
- source separation
- mixture model
- meeting
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/pdf/2410.21455
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
status: public
title: Simultaneous Diarization and Separation of Meetings through the Integration
  of Statistical Mixture Models
type: conference
user_id: '44393'
year: '2025'
...
---
_id: '49109'
abstract:
- lang: eng
  text: "We propose a diarization system that estimates “who spoke when” based on
    spatial information, to be used as a front-end of a meeting transcription system
    running on the signals gathered from an acoustic sensor network (ASN). Although
    the spatial distribution of the microphones is advantageous, exploiting the spatial
    diversity for diarization and signal enhancement is challenging, because the
    microphones’ positions are typically unknown, and the recorded signals are in
    general initially unsynchronized. Here, we approach these issues by first blindly
    synchronizing the signals and then estimating time differences of arrival (TDOAs).
    The TDOA information is exploited to estimate the speakers’ activity, even in
    the presence of multiple simultaneously active speakers. This speaker activity
    information serves as a guide for a spatial mixture model, on the basis of which
    the individual speakers’ signals are extracted via beamforming. Finally, the
    extracted signals are forwarded to a speech recognizer. Additionally, a novel
    initialization scheme for spatial mixture models based on the TDOA estimates
    is proposed. Experiments conducted on real recordings from the LibriWASN data
    set have shown that our proposed system is advantageous compared to a system
    using a spatial mixture model which does not make use of external diarization
    information."
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Haeb-Umbach R. Spatial Diarization for Meeting
    Transcription with Ad-Hoc Acoustic Sensor Networks. In: <i>Proc. Asilomar Conference
    on Signals, Systems, and Computers</i>. ; 2023.'
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2023). Spatial Diarization
    for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks. <i>Proc. Asilomar
    Conference on Signals, Systems, and Computers</i>. 57th Asilomar Conference on
    Signals, Systems, and Computers.
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Haeb-Umbach_2023, title={Spatial
    Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks}, booktitle={Proc.
    Asilomar Conference on Signals, Systems, and Computers}, author={Gburrek, Tobias
    and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2023} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Spatial
    Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks.” In
    <i>Proc. Asilomar Conference on Signals, Systems, and Computers</i>, 2023.
  ieee: T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “Spatial Diarization for
    Meeting Transcription with Ad-Hoc Acoustic Sensor Networks,” presented at the
    57th Asilomar Conference on Signals, Systems, and Computers, 2023.
  mla: Gburrek, Tobias, et al. “Spatial Diarization for Meeting Transcription with
    Ad-Hoc Acoustic Sensor Networks.” <i>Proc. Asilomar Conference on Signals, Systems,
    and Computers</i>, 2023.
  short: 'T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: Proc. Asilomar Conference
    on Signals, Systems, and Computers, 2023.'
conference:
  end_date: 2023-11-01
  name: 57th Asilomar Conference on Signals, Systems, and Computers
  start_date: 2023-10-31
date_created: 2023-11-22T07:52:29Z
date_updated: 2023-11-22T07:58:49Z
ddc:
- '004'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: schmalen
  date_created: 2023-11-22T07:51:18Z
  date_updated: 2023-11-22T07:58:49Z
  file_id: '49110'
  file_name: asilomar.pdf
  file_size: 212317
  relation: main_file
file_date_updated: 2023-11-22T07:58:49Z
has_accepted_license: '1'
keyword:
- Diarization
- time difference of arrival
- ad-hoc acoustic sensor network
- meeting transcription
language:
- iso: eng
oa: '1'
publication: Proc. Asilomar Conference on Signals, Systems, and Computers
quality_controlled: '1'
status: public
title: Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks
type: conference
user_id: '460'
year: '2023'
...
---
_id: '48281'
abstract:
- lang: eng
  text: "We propose a general framework to compute the word error rate (WER) of
    ASR systems that process recordings containing multiple speakers at their input
    and that produce multiple output word sequences (MIMO). Such ASR systems are
    typically required, e.g., for meeting transcription. We provide an efficient
    implementation based on a dynamic programming search in a multi-dimensional
    Levenshtein distance tensor under the constraint that a reference utterance
    must be matched consistently with one hypothesis output. This also results in
    an efficient implementation of the ORC WER, which previously suffered from exponential
    complexity. We give an overview of commonly used WER definitions for multi-speaker
    scenarios and show that they are specializations of the above MIMO WER tuned
    to particular application scenarios. We conclude with a discussion of the pros
    and cons of the various WER definitions and a recommendation on when to use
    which."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. On Word
    Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech
    Recognition Systems. In: <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>'
  apa: von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach,
    R. (2023). On Word Error Rate Definitions and Their Efficient Computation for
    Multi-Speaker Speech Recognition Systems. <i>ICASSP 2023 - 2023 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>
  bibtex: '@inproceedings{von_Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2023,
    title={On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems}, DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Boeddeker, Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix,
    and Reinhold Haeb-Umbach. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” In <i>ICASSP 2023 -
    2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>.
  ieee: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach,
    “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems,” 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.'
  mla: von Neumann, Thilo, et al. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” <i>ICASSP 2023 - 2023
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.
  short: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
date_created: 2023-10-19T07:38:31Z
date_updated: 2025-02-12T09:16:34Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10094784
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:39:57Z
  date_updated: 2023-10-19T07:41:56Z
  file_id: '48282'
  file_name: ICASSP_2023_Meeting_Evaluation.pdf
  file_size: 204994
  relation: main_file
file_date_updated: 2023-10-19T07:41:56Z
has_accepted_license: '1'
keyword:
- Word Error Rate
- Meeting Recognition
- Levenshtein Distance
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10094784
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
  Speech Recognition Systems
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48275'
abstract:
- lang: eng
  text: "MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription
    systems. It provides a unified interface for the computation of commonly used
    Word Error Rates (WERs), specifically cpWER, ORC WER, and MIMO WER, along with
    other WER definitions. We extend the cpWER computation by a temporal constraint
    to ensure that words are only identified as correct when the temporal alignment
    is plausible. This leads to a matching of the hypothesis string to the reference
    string that more closely resembles the actual transcription quality, and a system
    is penalized if it provides poor time annotations. Since word-level timing information
    is often not available, we present a way to approximate exact word-level timings
    from segment-level timings (e.g., of a sentence) and show that the approximation
    leads to a similar WER as a matching with exact word-level annotations. At the
    same time, the time constraint leads to a speedup of the matching algorithm,
    which outweighs the additional overhead caused by processing the time stamps."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit
    for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.'
  apa: 'von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023).
    MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin.'
  bibtex: '@inproceedings{von_Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems},
    booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments},
    author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: 'Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach.
    “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>,
    2023.'
  ieee: 'T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,”
    presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments,
    Dublin, 2023.'
  mla: 'von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error
    Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech
    Processing in Everyday Environments</i>, 2023.'
  short: 'T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME
    2023 Workshop on Speech Processing in Everyday Environments, 2023.'
conference:
  location: Dublin
  name: CHiME 2023 Workshop on Speech Processing in Everyday Environments
date_created: 2023-10-19T07:24:51Z
date_updated: 2025-02-12T09:12:05Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:19:59Z
  date_updated: 2023-10-19T07:19:59Z
  file_id: '48276'
  file_name: Chime_7__MeetEval.pdf
  file_size: 263744
  relation: main_file
file_date_updated: 2023-10-19T07:19:59Z
has_accepted_license: '1'
keyword:
- Speech Recognition
- Word Error Rate
- Meeting Transcription
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2307.11394
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: 'MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
  Systems'
type: conference
user_id: '40767'
year: '2023'
...
