An Overview of Noise-Robust Automatic Speech Recognition

Li, Jinyu; Deng, Li; Gong, Yifan; Haeb-Umbach, Reinhold

J. Li, L. Deng, Y. Gong, R. Haeb-Umbach, IEEE Transactions on Audio, Speech and Language Processing 22 (2014) 745–777.

Download (ext.)

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6732927

DOI

10.1109/TASLP.2014.2304637

Journal Article | English

Author

Li, Jinyu; Deng, Li; Gong, Yifan; Haeb-Umbach, Reinhold

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Abstract

New waves of consumer-centric applications, such as voice search and voice interaction with mobile devices and home entertainment systems, increasingly require automatic speech recognition (ASR) to be robust to the full range of real-world noise and other acoustic distorting conditions. Despite its practical importance, however, the inherent links between and distinctions among the myriad of methods for noise-robust ASR have yet to be carefully studied in order to advance the field further. To this end, it is critical to establish a solid, consistent, and common mathematical foundation for noise-robust ASR, which is lacking at present. This article is intended to fill this gap and to provide a thorough overview of modern noise-robust techniques for ASR developed over the past 30 years. We emphasize methods that are proven to be successful and that are likely to sustain or expand their future applicability. We distill key insights from our comprehensive overview in this field and take a fresh look at a few old problems, which nevertheless are still highly relevant today. Specifically, we have analyzed and categorized a wide range of noise-robust techniques using five different criteria: 1) feature-domain vs. model-domain processing, 2) the use of prior knowledge about the acoustic environment distortion, 3) the use of explicit environment-distortion models, 4) deterministic vs. uncertainty processing, and 5) the use of acoustic models trained jointly with the same feature enhancement or model adaptation process used in the testing stage. With this taxonomy-oriented review, we equip the reader with the insight to choose among techniques and with the awareness of the performance-complexity tradeoffs. The pros and cons of using different noise-robust ASR techniques in practical application scenarios are provided as a guide to interested practitioners. The current challenges and future research directions in this field are also carefully analyzed.

Keywords

Speech recognition; compensation; distortion modeling; joint model training; noise; robustness; uncertainty processing

Publishing Year

2014

Journal Title

IEEE Transactions on Audio, Speech and Language Processing

Volume

22

Issue

4

Page

745-777

LibreCat-ID

11867

Cite this

Li J, Deng L, Gong Y, Haeb-Umbach R. An Overview of Noise-Robust Automatic Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing. 2014;22(4):745-777. doi:10.1109/TASLP.2014.2304637

Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). An Overview of Noise-Robust Automatic Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing, 22(4), 745–777. https://doi.org/10.1109/TASLP.2014.2304637

@article{Li_Deng_Gong_Haeb-Umbach_2014, title={An Overview of Noise-Robust Automatic Speech Recognition}, volume={22}, DOI={10.1109/TASLP.2014.2304637}, number={4}, journal={IEEE Transactions on Audio, Speech and Language Processing}, author={Li, Jinyu and Deng, Li and Gong, Yifan and Haeb-Umbach, Reinhold}, year={2014}, pages={745–777} }

Li, Jinyu, Li Deng, Yifan Gong, and Reinhold Haeb-Umbach. “An Overview of Noise-Robust Automatic Speech Recognition.” IEEE Transactions on Audio, Speech and Language Processing 22, no. 4 (2014): 745–77. https://doi.org/10.1109/TASLP.2014.2304637.

J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach, “An Overview of Noise-Robust Automatic Speech Recognition,” IEEE Transactions on Audio, Speech and Language Processing, vol. 22, no. 4, pp. 745–777, 2014.

Li, Jinyu, et al. “An Overview of Noise-Robust Automatic Speech Recognition.” IEEE Transactions on Audio, Speech and Language Processing, vol. 22, no. 4, 2014, pp. 745–77, doi:10.1109/TASLP.2014.2304637.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)

URL

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6732927

Access Level

Closed Access
