---
_id: '49109'
abstract:
- lang: eng
  text: "We propose a diarization system that estimates “who spoke when” based on
    spatial information, to be used as a front-end of a meeting transcription system
    running on the signals gathered from an acoustic sensor network (ASN). Although
    the spatial distribution of the microphones is advantageous, exploiting the
    spatial diversity for diarization and signal enhancement is challenging, because
    the microphones’ positions are typically unknown, and the recorded signals are
    initially unsynchronized in general. Here, we approach these issues by first blindly
    synchronizing the signals and then estimating time differences of arrival (TDOAs).
    The TDOA information is exploited to estimate the speakers’ activity, even in
    the presence of multiple speakers being simultaneously active. This speaker activity
    information serves as a guide for a spatial mixture model, on which basis the
    individual speaker’s signals are extracted via beamforming. Finally, the extracted
    signals are forwarded to a speech recognizer. Additionally, a novel initialization
    scheme for spatial mixture models based on the TDOA estimates is proposed. Experiments
    conducted on real recordings from the LibriWASN data set have shown that our proposed
    system is advantageous compared to a system using a spatial mixture model that
    does not make use of external diarization information."
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Haeb-Umbach R. Spatial Diarization for Meeting
    Transcription with Ad-Hoc Acoustic Sensor Networks. In: <i>Proc. Asilomar Conference
    on Signals, Systems, and Computers</i>. ; 2023.'
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2023). Spatial Diarization
    for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks. <i>Proc. Asilomar
    Conference on Signals, Systems, and Computers</i>. 57th Asilomar Conference on
    Signals, Systems, and Computers.
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Haeb-Umbach_2023, title={Spatial
    Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks}, booktitle={Proc.
    Asilomar Conference on Signals, Systems, and Computers}, author={Gburrek, Tobias
    and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2023} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Spatial
    Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks.” In
    <i>Proc. Asilomar Conference on Signals, Systems, and Computers</i>, 2023.
  ieee: T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “Spatial Diarization for
    Meeting Transcription with Ad-Hoc Acoustic Sensor Networks,” presented at the
    57th Asilomar Conference on Signals, Systems, and Computers, 2023.
  mla: Gburrek, Tobias, et al. “Spatial Diarization for Meeting Transcription with
    Ad-Hoc Acoustic Sensor Networks.” <i>Proc. Asilomar Conference on Signals, Systems,
    and Computers</i>, 2023.
  short: 'T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: Proc. Asilomar Conference
    on Signals, Systems, and Computers, 2023.'
conference:
  end_date: 2023-11-01
  name: 57th Asilomar Conference on Signals, Systems, and Computers
  start_date: 2023-10-31
date_created: 2023-11-22T07:52:29Z
date_updated: 2023-11-22T07:58:49Z
ddc:
- '004'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: schmalen
  date_created: 2023-11-22T07:51:18Z
  date_updated: 2023-11-22T07:58:49Z
  file_id: '49110'
  file_name: asilomar.pdf
  file_size: 212317
  relation: main_file
file_date_updated: 2023-11-22T07:58:49Z
has_accepted_license: '1'
keyword:
- Diarization
- time difference of arrival
- ad-hoc acoustic sensor network
- meeting transcription
language:
- iso: eng
oa: '1'
publication: Proc. Asilomar Conference on Signals, Systems, and Computers
quality_controlled: '1'
status: public
title: Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks
type: conference
user_id: '460'
year: '2023'
...
---
_id: '48275'
abstract:
- lang: eng
  text: "MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription
    systems. It provides a unified interface for the computation of commonly used
    Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER along with other
    WER definitions. We extend the cpWER computation by a temporal constraint to
    ensure that words are identified as correct only when the temporal alignment is
    plausible. This leads to a better quality of the matching of the hypothesis
    string to the reference string that more closely resembles the actual transcription
    quality, and a system is penalized if it provides poor time annotations. Since
    word-level timing information is often not available, we present a way to approximate
    exact word-level timings from segment-level timings (e.g., a sentence) and show
    that the approximation leads to a similar WER as a matching with exact word-level
    annotations. At the same time, the time constraint leads to a speedup of the
    matching algorithm, which outweighs the additional overhead caused by processing
    the time stamps."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit
    for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.'
  apa: 'von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023).
    MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin.'
  bibtex: '@inproceedings{von Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems},
    booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments},
    author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: 'Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach.
    “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>,
    2023.'
  ieee: 'T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,”
    presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments,
    Dublin, 2023.'
  mla: 'von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error
    Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech
    Processing in Everyday Environments</i>, 2023.'
  short: 'T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME
    2023 Workshop on Speech Processing in Everyday Environments, 2023.'
conference:
  location: Dublin
  name: CHiME 2023 Workshop on Speech Processing in Everyday Environments
date_created: 2023-10-19T07:24:51Z
date_updated: 2025-02-12T09:12:05Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:19:59Z
  date_updated: 2023-10-19T07:19:59Z
  file_id: '48276'
  file_name: Chime_7__MeetEval.pdf
  file_size: 263744
  relation: main_file
file_date_updated: 2023-10-19T07:19:59Z
has_accepted_license: '1'
keyword:
- Speech Recognition
- Word Error Rate
- Meeting Transcription
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2307.11394
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: 'MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
  Systems'
type: conference
user_id: '40767'
year: '2023'
...
