Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach

Beyerlein, P.; Aubert, X.; Haeb-Umbach, Reinhold; Harris, M.; Klakow, D.; Wendemuth, A.; Molau, S.; Ney, N.; Pitz, Michael; Sixtus, A.

P. Beyerlein, X. Aubert, R. Haeb-Umbach, M. Harris, D. Klakow, A. Wendemuth, S. Molau, N. Ney, M. Pitz, A. Sixtus, Speech Communication (2002) 109–131.

Download (ext.)

https://groups.uni-paderborn.de/nt/pubs/2002/BeAuHaHaKlWeMoNePiSi02.pdf

Journal Article | English

Author

Beyerlein, P.; Aubert, X.; Haeb-Umbach, Reinhold; Harris, M.; Klakow, D.; Wendemuth, A.; Molau, S.; Ney, N.; Pitz, Michael; Sixtus, A.

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Abstract

Automatic speech recognition of real-life broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogeneous BN task without incurring undue complexity or computational cost. These key efforts included:

- automatic segmentation of the audio signal into speech utterances;
- efficient one-pass trigram decoding using look-ahead techniques;
- optimal log-linear interpolation of a variety of acoustic and language models using discriminative model combination (DMC);
- handling short-range and weak longer-range correlations in natural speech and language by the use of phrases and of distance-language models;
- improving the acoustic modeling by robust feature extraction, channel normalization, and adaptation techniques, as well as automatic script selection and verification.

The starting point of the system development was the Philips 64k-NAB word-internal triphone trigram system. On the speaker-independent but microphone-dependent NAB task (transcription of read newspaper texts) we obtained a word error rate of about 10%. Now, at the conclusion of the system development, we have arrived at Philips at a DMC-interpolated phrase-based crossword-pentaphone 4-gram system. This system transcribes BN data with an overall word error rate of about 17%.

Publishing Year

2002

Journal Title

Speech Communication

Issue

Page

109-131

LibreCat-ID

11727

Cite this

Beyerlein P, Aubert X, Haeb-Umbach R, et al. Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach. Speech Communication. 2002;(37):109-131.

Beyerlein, P., Aubert, X., Haeb-Umbach, R., Harris, M., Klakow, D., Wendemuth, A., … Sixtus, A. (2002). Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach. Speech Communication, (37), 109–131.

@article{Beyerlein_Aubert_Haeb-Umbach_Harris_Klakow_Wendemuth_Molau_Ney_Pitz_Sixtus_2002, title={Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach}, number={37}, journal={Speech Communication}, author={Beyerlein, P. and Aubert, X. and Haeb-Umbach, Reinhold and Harris, M. and Klakow, D. and Wendemuth, A. and Molau, S. and Ney, N. and Pitz, Michael and Sixtus, A.}, year={2002}, pages={109–131} }

Beyerlein, P., X. Aubert, Reinhold Haeb-Umbach, M. Harris, D. Klakow, A. Wendemuth, S. Molau, N. Ney, Michael Pitz, and A. Sixtus. “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach.” Speech Communication, no. 37 (2002): 109–31.

P. Beyerlein et al., “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach,” Speech Communication, no. 37, pp. 109–131, 2002.

Beyerlein, P., et al. “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach.” Speech Communication, no. 37, 2002, pp. 109–31.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Access Level

Closed Access
