Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach
P. Beyerlein, X. Aubert, R. Haeb-Umbach, M. Harris, D. Klakow, A. Wendemuth, S. Molau, N. Ney, M. Pitz, A. Sixtus, Speech Communication (2002) 109–131.
Journal Article | English
Author
Beyerlein, P.;
Aubert, X.;
Haeb-Umbach, Reinhold;
Harris, M.;
Klakow, D.;
Wendemuth, A.;
Molau, S.;
Ney, N.;
Pitz, Michael;
Sixtus, A.
Abstract
Automatic speech recognition of real-life broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogeneous BN task without incurring undue complexity or computational cost. These key efforts included:
- automatic segmentation of the audio signal into speech utterances;
- efficient one-pass trigram decoding using look-ahead techniques;
- optimal log-linear interpolation of a variety of acoustic and language models using discriminative model combination (DMC);
- handling short-range and weak longer-range correlations in natural speech and language by the use of phrases and of distance-language models;
- improving the acoustic modeling by robust feature extraction, channel normalization, and adaptation techniques, as well as automatic script selection and verification.
The starting point of the system development was the Philips 64k-NAB word-internal triphone trigram system. On the speaker-independent but microphone-dependent NAB task (transcription of read newspaper texts) we obtained a word error rate of about 10%. Now, at the conclusion of the system development, we have arrived at Philips at a DMC-interpolated phrase-based crossword-pentaphone 4-gram system. This system transcribes BN data with an overall word error rate of about 17%.
Publishing Year
2002
Journal Title
Speech Communication
Issue
37
Page
109-131
LibreCat-ID
Cite this
Beyerlein P, Aubert X, Haeb-Umbach R, et al. Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach. Speech Communication. 2002;(37):109-131.
Beyerlein, P., Aubert, X., Haeb-Umbach, R., Harris, M., Klakow, D., Wendemuth, A., … Sixtus, A. (2002). Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach. Speech Communication, (37), 109–131.
@article{Beyerlein_Aubert_Haeb-Umbach_Harris_Klakow_Wendemuth_Molau_Ney_Pitz_Sixtus_2002, title={Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach}, number={37}, journal={Speech Communication}, author={Beyerlein, P. and Aubert, X. and Haeb-Umbach, Reinhold and Harris, M. and Klakow, D. and Wendemuth, A. and Molau, S. and Ney, N. and Pitz, Michael and Sixtus, A.}, year={2002}, pages={109–131} }
Beyerlein, P., X. Aubert, Reinhold Haeb-Umbach, M. Harris, D. Klakow, A. Wendemuth, S. Molau, N. Ney, Michael Pitz, and A. Sixtus. “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach.” Speech Communication, no. 37 (2002): 109–31.
P. Beyerlein et al., “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach,” Speech Communication, no. 37, pp. 109–131, 2002.
Beyerlein, P., et al. “Large Vocabulary Continuous Speech Recognition of Broadcast News - The Philips/RWTH Approach.” Speech Communication, no. 37, 2002, pp. 109–31.