---
res:
  bibo_abstract:
  - The rising interest in single-channel multi-speaker speech separation sparked
    development of End-to-End (E2E) approaches to multi-speaker speech recognition.
    However, up until now, state-of-the-art neural network–based time domain source
    separation has not yet been combined with E2E speech recognition. We here demonstrate
    how to combine a separation module based on a Convolutional Time domain Audio
    Separation Network (Conv-TasNet) with an E2E speech recognizer and how to train
    such a model jointly by distributing it over multiple GPUs or by approximating
    truncated back-propagation for the convolutional front-end. To put this work into
    perspective and illustrate the complexity of the design space, we provide a compact
    overview of single-channel multi-speaker recognition systems. Our experiments
    show a word error rate of 11.0% on WSJ0-2mix and indicate that our joint time
    domain model can yield substantial improvements over cascade DNN-HMM and monolithic
    E2E frequency domain systems proposed so far.@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Thilo
      foaf_name: von Neumann, Thilo
      foaf_surname: von Neumann
      foaf_workInfoHomepage: http://www.librecat.org/personId=49870
    orcid: https://orcid.org/0000-0002-7717-8670
  - foaf_Person:
      foaf_givenName: Keisuke
      foaf_name: Kinoshita, Keisuke
      foaf_surname: Kinoshita
  - foaf_Person:
      foaf_givenName: Lukas
      foaf_name: Drude, Lukas
      foaf_surname: Drude
  - foaf_Person:
      foaf_givenName: Christoph
      foaf_name: Boeddeker, Christoph
      foaf_surname: Boeddeker
      foaf_workInfoHomepage: http://www.librecat.org/personId=40767
  - foaf_Person:
      foaf_givenName: Marc
      foaf_name: Delcroix, Marc
      foaf_surname: Delcroix
  - foaf_Person:
      foaf_givenName: Tomohiro
      foaf_name: Nakatani, Tomohiro
      foaf_surname: Nakatani
  - foaf_Person:
      foaf_givenName: Reinhold
      foaf_name: Haeb-Umbach, Reinhold
      foaf_surname: Haeb-Umbach
      foaf_workInfoHomepage: http://www.librecat.org/personId=242
  bibo_doi: 10.1109/ICASSP40776.2020.9053461
  dct_date: 2020^xs_gYear
  dct_language: eng
  dct_title: End-to-End Training of Time Domain Audio Separation and Recognition@
...
