The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
S. Cornell, T. Park, S. Huang, C. Boeddeker, X. Chang, M. Maciejewski, M. Wiesner, P. Garcia, S. Watanabe, ArXiv:2407.16447 (2024).
Preprint | English
Author
Cornell, Samuele;
Park, Taejin;
Huang, Steve;
Boeddeker, Christoph;
Chang, Xuankai;
Maciejewski, Matthew;
Wiesner, Matthew;
Garcia, Paola;
Watanabe, Shinji
Abstract
This paper presents the CHiME-8 DASR challenge, which carries on from the
previous edition, CHiME-7 DASR (C7DASR), and the past CHiME-6 challenge. It
focuses on joint multi-channel distant speech recognition (DASR) and
diarization with one or more, possibly heterogeneous, devices. The main goal is
to spur research towards meeting transcription approaches that can generalize
across an arbitrary number of speakers, diverse settings (formal vs. informal
conversations), meeting durations, a wide variety of acoustic scenarios, and
different recording configurations. Novelties with respect to C7DASR include:
i) the addition of NOTSOFAR-1, an additional office/corporate meeting scenario,
ii) a manually corrected Mixer 6 development set, iii) a new track in which we
allow the use of large language models (LLMs), and iv) a jury award mechanism to
encourage participants to also explore more practical and innovative solutions.
To lower the entry barrier for participants, we provide a standalone toolkit
for downloading and preparing such datasets as well as performing text
normalization and scoring their submissions. Furthermore, this year we also
provide two baseline systems, one directly inherited from C7DASR and based on
ESPnet, and another developed on NeMo and based on the NeMo team's submission to
last year's C7DASR. Baseline system results suggest that the addition of the
NOTSOFAR-1 scenario significantly increases the task's difficulty due to its
high number of speakers and very short duration.
Publishing Year
2024
Journal Title
arXiv:2407.16447
LibreCat-ID
Cite this
Cornell S, Park T, Huang S, et al. The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization. arXiv:2407.16447. Published online 2024.
Cornell, S., Park, T., Huang, S., Boeddeker, C., Chang, X., Maciejewski, M., Wiesner, M., Garcia, P., & Watanabe, S. (2024). The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization. In arXiv:2407.16447.
@article{Cornell_Park_Huang_Boeddeker_Chang_Maciejewski_Wiesner_Garcia_Watanabe_2024, title={The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization}, journal={arXiv:2407.16447}, author={Cornell, Samuele and Park, Taejin and Huang, Steve and Boeddeker, Christoph and Chang, Xuankai and Maciejewski, Matthew and Wiesner, Matthew and Garcia, Paola and Watanabe, Shinji}, year={2024} }
Cornell, Samuele, Taejin Park, Steve Huang, Christoph Boeddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola Garcia, and Shinji Watanabe. “The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization.” ArXiv:2407.16447, 2024.
S. Cornell et al., “The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization,” arXiv:2407.16447. 2024.
Cornell, Samuele, et al. “The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization.” ArXiv:2407.16447, 2024.
Access Level
Closed Access