---
_id: '11867'
abstract:
- lang: eng
  text: 'New waves of consumer-centric applications, such as voice search and voice
    interaction with mobile devices and home entertainment systems, increasingly require
    automatic speech recognition (ASR) to be robust to the full range of real-world
    noise and other acoustic distorting conditions. Despite its practical importance,
    however, the inherent links between and distinctions among the myriad of methods
    for noise-robust ASR have yet to be carefully studied in order to advance the
    field further. To this end, it is critical to establish a solid, consistent, and
    common mathematical foundation for noise-robust ASR, which is lacking at present.
    This article is intended to fill this gap and to provide a thorough overview of
    modern noise-robust techniques for ASR developed over the past 30 years. We emphasize
    methods that are proven to be successful and that are likely to sustain or expand
    their future applicability. We distill key insights from our comprehensive overview
    in this field and take a fresh look at a few old problems, which nevertheless
    are still highly relevant today. Specifically, we have analyzed and categorized
    a wide range of noise-robust techniques using five different criteria: 1) feature-domain
    vs. model-domain processing, 2) the use of prior knowledge about the acoustic
    environment distortion, 3) the use of explicit environment-distortion models,
    4) deterministic vs. uncertainty processing, and 5) the use of acoustic models
    trained jointly with the same feature enhancement or model adaptation process
    used in the testing stage. With this taxonomy-oriented review, we equip the reader
    with the insight to choose among techniques and with the awareness of the performance-complexity
    tradeoffs. The pros and cons of using different noise-robust ASR techniques in
    practical application scenarios are provided as a guide to interested practitioners.
    The current challenges and future research directions in this field are also carefully
    analyzed.'
author:
- first_name: Jinyu
  full_name: Li, Jinyu
  last_name: Li
- first_name: Li
  full_name: Deng, Li
  last_name: Deng
- first_name: Yifan
  full_name: Gong, Yifan
  last_name: Gong
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Li J, Deng L, Gong Y, Haeb-Umbach R. An Overview of Noise-Robust Automatic
    Speech Recognition. <i>IEEE Transactions on Audio, Speech and Language Processing</i>.
    2014;22(4):745-777. doi:<a href="https://doi.org/10.1109/TASLP.2014.2304637">10.1109/TASLP.2014.2304637</a>
  apa: Li, J., Deng, L., Gong, Y., &#38; Haeb-Umbach, R. (2014). An Overview of Noise-Robust
    Automatic Speech Recognition. <i>IEEE Transactions on Audio, Speech and Language
    Processing</i>, <i>22</i>(4), 745–777. <a href="https://doi.org/10.1109/TASLP.2014.2304637">https://doi.org/10.1109/TASLP.2014.2304637</a>
  bibtex: '@article{Li_Deng_Gong_Haeb-Umbach_2014, title={An Overview of Noise-Robust
    Automatic Speech Recognition}, volume={22}, DOI={<a href="https://doi.org/10.1109/TASLP.2014.2304637">10.1109/TASLP.2014.2304637</a>},
    number={4}, journal={IEEE Transactions on Audio, Speech and Language Processing},
    author={Li, Jinyu and Deng, Li and Gong, Yifan and Haeb-Umbach, Reinhold}, year={2014},
    pages={745–777} }'
  chicago: 'Li, Jinyu, Li Deng, Yifan Gong, and Reinhold Haeb-Umbach. “An Overview
    of Noise-Robust Automatic Speech Recognition.” <i>IEEE Transactions on Audio,
    Speech and Language Processing</i> 22, no. 4 (2014): 745–77. <a href="https://doi.org/10.1109/TASLP.2014.2304637">https://doi.org/10.1109/TASLP.2014.2304637</a>.'
  ieee: J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach, “An Overview of Noise-Robust
    Automatic Speech Recognition,” <i>IEEE Transactions on Audio, Speech and Language
    Processing</i>, vol. 22, no. 4, pp. 745–777, 2014.
  mla: Li, Jinyu, et al. “An Overview of Noise-Robust Automatic Speech Recognition.”
    <i>IEEE Transactions on Audio, Speech and Language Processing</i>, vol. 22, no.
    4, 2014, pp. 745–77, doi:<a href="https://doi.org/10.1109/TASLP.2014.2304637">10.1109/TASLP.2014.2304637</a>.
  short: J. Li, L. Deng, Y. Gong, R. Haeb-Umbach, IEEE Transactions on Audio, Speech
    and Language Processing 22 (2014) 745–777.
date_created: 2019-07-12T05:29:47Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
doi: 10.1109/TASLP.2014.2304637
intvolume: '        22'
issue: '4'
keyword:
- Speech recognition
- compensation
- distortion modeling
- joint model training
- noise
- robustness
- uncertainty processing
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6732927
oa: '1'
page: 745-777
publication: IEEE Transactions on Audio, Speech and Language Processing
status: public
title: An Overview of Noise-Robust Automatic Speech Recognition
type: journal_article
user_id: '44006'
volume: 22
year: '2014'
...
