---
_id: '48281'
abstract:
- lang: eng
  text: "We propose a general framework to compute the word error rate (WER) of
    ASR systems that process recordings containing multiple speakers at their input
    and that produce multiple output word sequences (MIMO). Such ASR systems
    are typically required, e.g., for meeting transcription. We provide an efficient
    implementation based on a dynamic programming search in a multi-dimensional Levenshtein
    distance tensor under the constraint that a reference utterance must be matched
    consistently with one hypothesis output. This also results in an efficient
    implementation of the ORC WER, which previously suffered from exponential complexity. We
    give an overview of commonly used WER definitions for multi-speaker scenarios
    and show that they are specializations of the above MIMO WER tuned to particular
    application scenarios. We conclude with a discussion of the pros and cons
    of the various WER definitions and a recommendation of when to use which."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. On Word
    Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech
    Recognition Systems. In: <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>'
  apa: von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach,
    R. (2023). On Word Error Rate Definitions and Their Efficient Computation for
    Multi-Speaker Speech Recognition Systems. <i>ICASSP 2023 - 2023 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>
  bibtex: '@inproceedings{von Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2023,
    title={On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems}, DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Boeddeker, Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix,
    and Reinhold Haeb-Umbach. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” In <i>ICASSP 2023 -
    2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>.
  ieee: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach,
    “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems,” 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.'
  mla: von Neumann, Thilo, et al. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” <i>ICASSP 2023 - 2023
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.
  short: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
date_created: 2023-10-19T07:38:31Z
date_updated: 2025-02-12T09:16:34Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10094784
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:39:57Z
  date_updated: 2023-10-19T07:41:56Z
  file_id: '48282'
  file_name: ICASSP_2023_Meeting_Evaluation.pdf
  file_size: 204994
  relation: main_file
file_date_updated: 2023-10-19T07:41:56Z
has_accepted_license: '1'
keyword:
- Word Error Rate
- Meeting Recognition
- Levenshtein Distance
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10094784
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
  Speech Recognition Systems
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48275'
abstract:
- lang: eng
  text: "MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription
    systems. It provides a unified interface for the computation of commonly used
    Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER, along with other
    WER definitions. We extend the cpWER computation by a temporal constraint to
    ensure that words are identified as correct only when the temporal alignment is
    plausible. This leads to a better quality of the matching of the hypothesis
    string to the reference string that more closely resembles the actual transcription
    quality, and a system is penalized if it provides poor time annotations. Since
    word-level timing information is often not available, we present a way to approximate
    exact word-level timings from segment-level timings (e.g., a sentence) and show
    that the approximation leads to a similar WER as a matching with exact word-level
    annotations. At the same time, the time constraint leads to a speedup of the
    matching algorithm, which outweighs the additional overhead caused by processing
    the time stamps."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit
    for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.'
  apa: 'von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023).
    MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin.'
  bibtex: '@inproceedings{von Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems},
    booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments},
    author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: 'Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach.
    “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>,
    2023.'
  ieee: 'T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,”
    presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments,
    Dublin, 2023.'
  mla: 'von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error
    Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech
    Processing in Everyday Environments</i>, 2023.'
  short: 'T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME
    2023 Workshop on Speech Processing in Everyday Environments, 2023.'
conference:
  location: Dublin
  name: CHiME 2023 Workshop on Speech Processing in Everyday Environments
date_created: 2023-10-19T07:24:51Z
date_updated: 2025-02-12T09:12:05Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:19:59Z
  date_updated: 2023-10-19T07:19:59Z
  file_id: '48276'
  file_name: Chime_7__MeetEval.pdf
  file_size: 263744
  relation: main_file
file_date_updated: 2023-10-19T07:19:59Z
has_accepted_license: '1'
keyword:
- Speech Recognition
- Word Error Rate
- Meeting Transcription
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2307.11394
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: 'MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
  Systems'
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '11862'
abstract:
- lang: eng
  text: In this contribution we extend a previously proposed Bayesian approach for
    the enhancement of reverberant logarithmic mel power spectral coefficients for
    robust automatic speech recognition to the additional compensation of background
    noise. A recently proposed observation model is employed whose time-variant observation
    error statistics are obtained as a side product of the inference of the a posteriori
    probability density function of the clean speech feature vectors. Furthermore, a reduction
    of the computational effort and the memory requirements is achieved by using
    a recursive formulation of the observation model. The performance of the proposed
    algorithms is first experimentally studied on a connected digits recognition task
    with artificially created noisy reverberant data. It is shown that the use of
    the time-variant observation error model leads to a significant error rate reduction
    at low signal-to-noise ratios compared to a time-invariant model. Further experiments
    were conducted on a 5000-word task recorded in a reverberant and noisy environment.
    A significant word error rate reduction was obtained, demonstrating the effectiveness
    of the approach on real-world data.
author:
- first_name: Volker
  full_name: Leutnant, Volker
  last_name: Leutnant
- first_name: Alexander
  full_name: Krueger, Alexander
  last_name: Krueger
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Leutnant V, Krueger A, Haeb-Umbach R. Bayesian Feature Enhancement for Reverberation
    and Noise Robust Speech Recognition. <i>IEEE Transactions on Audio, Speech, and
    Language Processing</i>. 2013;21(8):1640-1652. doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>
  apa: Leutnant, V., Krueger, A., &#38; Haeb-Umbach, R. (2013). Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition. <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, <i>21</i>(8), 1640–1652. <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>
  bibtex: '@article{Leutnant_Krueger_Haeb-Umbach_2013, title={Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition}, volume={21}, DOI={<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>},
    number={8}, journal={IEEE Transactions on Audio, Speech, and Language Processing},
    author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2013},
    pages={1640–1652} }'
  chicago: 'Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian
    Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” <i>IEEE
    Transactions on Audio, Speech, and Language Processing</i> 21, no. 8 (2013): 1640–52.
    <a href="https://doi.org/10.1109/TASL.2013.2258013">https://doi.org/10.1109/TASL.2013.2258013</a>.'
  ieee: V. Leutnant, A. Krueger, and R. Haeb-Umbach, “Bayesian Feature Enhancement
    for Reverberation and Noise Robust Speech Recognition,” <i>IEEE Transactions on
    Audio, Speech, and Language Processing</i>, vol. 21, no. 8, pp. 1640–1652, 2013.
  mla: Leutnant, Volker, et al. “Bayesian Feature Enhancement for Reverberation and
    Noise Robust Speech Recognition.” <i>IEEE Transactions on Audio, Speech, and Language
    Processing</i>, vol. 21, no. 8, 2013, pp. 1640–52, doi:<a href="https://doi.org/10.1109/TASL.2013.2258013">10.1109/TASL.2013.2258013</a>.
  short: V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech,
    and Language Processing 21 (2013) 1640–1652.
date_created: 2019-07-12T05:29:42Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/TASL.2013.2258013
intvolume: '        21'
issue: '8'
keyword:
- Bayes methods
- compensation
- error statistics
- reverberation
- speech recognition
- Bayesian feature enhancement
- background noise
- clean speech feature vectors
- connected digits recognition task
- memory requirements
- noisy reverberant data
- posteriori probability density function
- recursive formulation
- reverberant logarithmic mel power spectral coefficients
- robust automatic speech recognition
- signal-to-noise ratios
- time-variant observation
- word error rate reduction
- Robust automatic speech recognition
- model-based Bayesian feature enhancement
- observation model for reverberant and noisy speech
- recursive observation model
language:
- iso: eng
page: 1640-1652
publication: IEEE Transactions on Audio, Speech, and Language Processing
status: public
title: Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
type: journal_article
user_id: '44006'
volume: 21
year: '2013'
...
