An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription

C. Zorila, C. Boeddeker, R. Doddipatla, R. Haeb-Umbach, in: ASRU 2019, Sentosa, Singapore, 2019.

Conference Paper | English
Author
Zorila, Catalin; Boeddeker, Christoph; Doddipatla, Rama; Haeb-Umbach, Reinhold
Abstract
Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available. However, there has been a longstanding debate whether enhancement should also be carried out on the ASR training data. In an extensive experimental evaluation on the acoustically very challenging CHiME-5 dinner party data we show that: (i) cleaning up the training data can lead to substantial error rate reductions, and (ii) enhancement in training is advisable as long as enhancement in test is at least as strong as in training. This approach stands in contrast to, and delivers larger gains than, the common strategy reported in the literature of augmenting the training database with additional artificially degraded speech. Together with an acoustic model topology consisting of initial CNN layers followed by factorized TDNN layers, we achieve 41.6% and 43.2% WER on the DEV and EVAL test sets, respectively, a new single-system state-of-the-art result on the CHiME-5 data. This is an 8% relative improvement compared to the best word error rate published so far for a speech recognizer without system combination.
Publishing Year
2019
Proceedings Title
ASRU 2019, Sentosa, Singapore

Cite this

Zorila C, Boeddeker C, Doddipatla R, Haeb-Umbach R. An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription. In: ASRU 2019, Sentosa, Singapore; 2019.
Zorila, C., Boeddeker, C., Doddipatla, R., & Haeb-Umbach, R. (2019). An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription. In ASRU 2019, Sentosa, Singapore.
@inproceedings{Zorila_Boeddeker_Doddipatla_Haeb-Umbach_2019, title={An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription}, booktitle={ASRU 2019, Sentosa, Singapore}, author={Zorila, Catalin and Boeddeker, Christoph and Doddipatla, Rama and Haeb-Umbach, Reinhold}, year={2019} }
Zorila, Catalin, Christoph Boeddeker, Rama Doddipatla, and Reinhold Haeb-Umbach. “An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription.” In ASRU 2019, Sentosa, Singapore, 2019.
C. Zorila, C. Boeddeker, R. Doddipatla, and R. Haeb-Umbach, “An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription,” in ASRU 2019, Sentosa, Singapore, 2019.
Zorila, Catalin, et al. “An Investigation Into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription.” ASRU 2019, Sentosa, Singapore, 2019.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
OA Open Access
Last Uploaded
2020-02-06T07:42:42Z
Access Level
OA Open Access
Last Uploaded
2020-02-06T07:42:55Z

