---
_id: '20504'
abstract:
- lang: eng
  text: 'In recent years time domain speech separation has excelled over frequency
    domain separation in single channel scenarios and noise-free environments. In
    this paper we dissect the gains of the time-domain audio separation network (TasNet)
    approach by gradually replacing components of an utterance-level permutation invariant
    training (u-PIT) based separation system in the frequency domain until the TasNet
    system is reached, thus blending components of frequency domain approaches with
    those of time domain approaches. Some of the intermediate variants achieve comparable
    signal-to-distortion ratio (SDR) gains to TasNet, but retain the advantage of
    frequency domain processing: compatibility with classic signal processing tools
    such as frequency-domain beamforming and the human interpretability of the masks.
    Furthermore, we show that the scale invariant signal-to-distortion ratio (si-SDR)
    criterion used as loss function in TasNet is related to a logarithmic mean square
    error criterion and that it is this criterion which contributes most reliably
    to the performance advantage of TasNet. Finally, we critically assess which gains
    in a noise-free single channel environment generalize to more realistic reverberant
    conditions.'
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Darius
  full_name: Jakobeit, Darius
  last_name: Jakobeit
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Lukas
  full_name: Drude, Lukas
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Jakobeit D, Boeddeker C, Drude L, Haeb-Umbach R. Demystifying
    TasNet: A Dissecting Approach. In: <i>ICASSP 2020 Virtual Barcelona Spain</i>.
    ; 2020.'
  apa: 'Heitkaemper, J., Jakobeit, D., Boeddeker, C., Drude, L., &#38; Haeb-Umbach,
    R. (2020). Demystifying TasNet: A Dissecting Approach. <i>ICASSP 2020 Virtual
    Barcelona Spain</i>.'
  bibtex: '@inproceedings{Heitkaemper_Jakobeit_Boeddeker_Drude_Haeb-Umbach_2020, title={Demystifying
    TasNet: A Dissecting Approach}, booktitle={ICASSP 2020 Virtual Barcelona Spain},
    author={Heitkaemper, Jens and Jakobeit, Darius and Boeddeker, Christoph and Drude,
    Lukas and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: 'Heitkaemper, Jens, Darius Jakobeit, Christoph Boeddeker, Lukas Drude,
    and Reinhold Haeb-Umbach. “Demystifying TasNet: A Dissecting Approach.” In <i>ICASSP
    2020 Virtual Barcelona Spain</i>, 2020.'
  ieee: 'J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude, and R. Haeb-Umbach,
    “Demystifying TasNet: A Dissecting Approach,” 2020.'
  mla: 'Heitkaemper, Jens, et al. “Demystifying TasNet: A Dissecting Approach.” <i>ICASSP
    2020 Virtual Barcelona Spain</i>, 2020.'
  short: 'J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude, R. Haeb-Umbach, in:
    ICASSP 2020 Virtual Barcelona Spain, 2020.'
date_created: 2020-11-25T14:56:53Z
date_updated: 2022-01-13T08:47:32Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: jensheit
  date_created: 2020-12-11T12:36:37Z
  date_updated: 2020-12-11T12:36:37Z
  file_id: '20699'
  file_name: ms.pdf
  file_size: 3871374
  relation: main_file
  success: 1
file_date_updated: 2020-12-11T12:36:37Z
has_accepted_license: '1'
keyword:
- speech separation
- time domain audio separation
- TasNet
- neural network
language:
- iso: eng
license: https://creativecommons.org/publicdomain/zero/1.0/
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: ICASSP 2020 Virtual Barcelona Spain
quality_controlled: '1'
status: public
title: 'Demystifying TasNet: A Dissecting Approach'
type: conference
user_id: '40767'
year: '2020'
...
---
_id: '20505'
abstract:
- lang: eng
  text: "Speech activity detection (SAD), which often rests on the fact that the noise
    is \"more\" stationary than speech, is particularly challenging in non-stationary
    environments, because the time variance of the acoustic scene makes it difficult
    to discriminate speech from noise. We propose two approaches to SAD, where one
    is based on statistical signal processing, while the other utilizes neural networks.
    The former employs sophisticated signal processing to track the noise and speech
    energies and is meant to support the case for a resource efficient, unsupervised
    signal processing approach.\r\nThe latter introduces a recurrent network layer
    that operates on short segments of the input speech to do temporal smoothing in
    the presence of non-stationary noise. The systems are tested on the Fearless Steps
    challenge database, which consists of the transmission data from the Apollo-11
    space mission.\r\nThe statistical SAD  achieves comparable detection performance
    to earlier proposed neural network based SADs, while the neural network based
    approach leads to a decision cost function of 1.07% on the evaluation set of the
    2020 Fearless Steps Challenge, which sets a new state of the art."
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Schmalenstroeer J, Haeb-Umbach R. Statistical and Neural Network
    Based Speech Activity Detection in Non-Stationary Acoustic Environments. In: <i>INTERSPEECH
    2020 Virtual Shanghai China</i>. ; 2020.'
  apa: Heitkaemper, J., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2020). Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments. <i>INTERSPEECH 2020 Virtual Shanghai China</i>.
  bibtex: '@inproceedings{Heitkaemper_Schmalenstroeer_Haeb-Umbach_2020, title={Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments}, booktitle={INTERSPEECH 2020 Virtual Shanghai China}, author={Heitkaemper,
    Jens and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: Heitkaemper, Jens, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments.” In <i>INTERSPEECH 2020 Virtual Shanghai China</i>, 2020.
  ieee: J. Heitkaemper, J. Schmalenstroeer, and R. Haeb-Umbach, “Statistical and Neural
    Network Based Speech Activity Detection in Non-Stationary Acoustic Environments,”
    2020.
  mla: Heitkaemper, Jens, et al. “Statistical and Neural Network Based Speech Activity
    Detection in Non-Stationary Acoustic Environments.” <i>INTERSPEECH 2020 Virtual
    Shanghai China</i>, 2020.
  short: 'J. Heitkaemper, J. Schmalenstroeer, R. Haeb-Umbach, in: INTERSPEECH 2020
    Virtual Shanghai China, 2020.'
date_created: 2020-11-25T15:03:19Z
date_updated: 2023-10-26T08:28:49Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: jensheit
  date_created: 2020-12-11T12:33:04Z
  date_updated: 2020-12-11T12:33:04Z
  file_id: '20697'
  file_name: ms.pdf
  file_size: 998706
  relation: main_file
  success: 1
file_date_updated: 2020-12-11T12:33:04Z
has_accepted_license: '1'
keyword:
- voice activity detection
- speech activity detection
- neural network
- statistical speech processing
language:
- iso: eng
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2020 Virtual Shanghai China
status: public
title: Statistical and Neural Network Based Speech Activity Detection in Non-Stationary
  Acoustic Environments
type: conference
user_id: '460'
year: '2020'
...
