---
_id: '20504'
abstract:
- lang: eng
  text: 'In recent years, time domain speech separation has excelled over frequency
    domain separation in single channel scenarios and noise-free environments. In
    this paper we dissect the gains of the time-domain audio separation network (TasNet)
    approach by gradually replacing components of an utterance-level permutation invariant
    training (u-PIT) based separation system in the frequency domain until the TasNet
    system is reached, thus blending components of frequency domain approaches with
    those of time domain approaches. Some of the intermediate variants achieve comparable
    signal-to-distortion ratio (SDR) gains to TasNet, but retain the advantage of
    frequency domain processing: compatibility with classic signal processing tools
    such as frequency-domain beamforming and the human interpretability of the masks.
    Furthermore, we show that the scale invariant signal-to-distortion ratio (si-SDR)
    criterion used as loss function in TasNet is related to a logarithmic mean square
    error criterion and that it is this criterion which contributes most reliably
    to the performance advantage of TasNet. Finally, we critically assess which gains
    in a noise-free single channel environment generalize to more realistic reverberant
    conditions.'
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Darius
  full_name: Jakobeit, Darius
  last_name: Jakobeit
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Lukas
  full_name: Drude, Lukas
  last_name: Drude
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Jakobeit D, Boeddeker C, Drude L, Haeb-Umbach R. Demystifying
    TasNet: A Dissecting Approach. In: <i>ICASSP 2020 Virtual Barcelona Spain</i>.
    ; 2020.'
  apa: 'Heitkaemper, J., Jakobeit, D., Boeddeker, C., Drude, L., &#38; Haeb-Umbach,
    R. (2020). Demystifying TasNet: A Dissecting Approach. <i>ICASSP 2020 Virtual
    Barcelona Spain</i>.'
  bibtex: '@inproceedings{Heitkaemper_Jakobeit_Boeddeker_Drude_Haeb-Umbach_2020, title={Demystifying
    TasNet: A Dissecting Approach}, booktitle={ICASSP 2020 Virtual Barcelona Spain},
    author={Heitkaemper, Jens and Jakobeit, Darius and Boeddeker, Christoph and Drude,
    Lukas and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: 'Heitkaemper, Jens, Darius Jakobeit, Christoph Boeddeker, Lukas Drude,
    and Reinhold Haeb-Umbach. “Demystifying TasNet: A Dissecting Approach.” In <i>ICASSP
    2020 Virtual Barcelona Spain</i>, 2020.'
  ieee: 'J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude, and R. Haeb-Umbach,
    “Demystifying TasNet: A Dissecting Approach,” 2020.'
  mla: 'Heitkaemper, Jens, et al. “Demystifying TasNet: A Dissecting Approach.” <i>ICASSP
    2020 Virtual Barcelona Spain</i>, 2020.'
  short: 'J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude, R. Haeb-Umbach, in:
    ICASSP 2020 Virtual Barcelona Spain, 2020.'
date_created: 2020-11-25T14:56:53Z
date_updated: 2022-01-13T08:47:32Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: jensheit
  date_created: 2020-12-11T12:36:37Z
  date_updated: 2020-12-11T12:36:37Z
  file_id: '20699'
  file_name: ms.pdf
  file_size: 3871374
  relation: main_file
  success: 1
file_date_updated: 2020-12-11T12:36:37Z
has_accepted_license: '1'
keyword:
- voice activity detection
- speech activity detection
- neural network
- statistical speech processing
language:
- iso: eng
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: ICASSP 2020 Virtual Barcelona Spain
quality_controlled: '1'
status: public
title: 'Demystifying TasNet: A Dissecting Approach'
type: conference
user_id: '40767'
year: '2020'
...
---
_id: '20505'
abstract:
- lang: eng
  text: "Speech activity detection (SAD), which often rests on the fact that the noise
    is \"more\" stationary than speech, is particularly challenging in non-stationary
    environments, because the time variance of the acoustic scene makes it difficult
    to discriminate speech from noise. We propose two approaches to SAD, where one
    is based on statistical signal processing, while the other utilizes neural networks.
    The former employs sophisticated signal processing to track the noise and speech
    energies and is meant to support the case for a resource efficient, unsupervised
    signal processing approach.\r\nThe latter introduces a recurrent network layer
    that operates on short segments of the input speech to do temporal smoothing in
    the presence of non-stationary noise. The systems are tested on the Fearless Steps
    challenge database, which consists of the transmission data from the Apollo-11
    space mission.\r\nThe statistical SAD achieves comparable detection performance
    to earlier proposed neural network based SADs, while the neural network based
    approach leads to a decision cost function of 1.07% on the evaluation set of the
    2020 Fearless Steps Challenge, which sets a new state of the art."
author:
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkaemper J, Schmalenstroeer J, Haeb-Umbach R. Statistical and Neural Network
    Based Speech Activity Detection in Non-Stationary Acoustic Environments. In: <i>INTERSPEECH
    2020 Virtual Shanghai China</i>. ; 2020.'
  apa: Heitkaemper, J., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2020). Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments. <i>INTERSPEECH 2020 Virtual Shanghai China</i>.
  bibtex: '@inproceedings{Heitkaemper_Schmalenstroeer_Haeb-Umbach_2020, title={Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments}, booktitle={INTERSPEECH 2020 Virtual Shanghai China}, author={Heitkaemper,
    Jens and Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2020} }'
  chicago: Heitkaemper, Jens, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Statistical
    and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic
    Environments.” In <i>INTERSPEECH 2020 Virtual Shanghai China</i>, 2020.
  ieee: J. Heitkaemper, J. Schmalenstroeer, and R. Haeb-Umbach, “Statistical and Neural
    Network Based Speech Activity Detection in Non-Stationary Acoustic Environments,”
    2020.
  mla: Heitkaemper, Jens, et al. “Statistical and Neural Network Based Speech Activity
    Detection in Non-Stationary Acoustic Environments.” <i>INTERSPEECH 2020 Virtual
    Shanghai China</i>, 2020.
  short: 'J. Heitkaemper, J. Schmalenstroeer, R. Haeb-Umbach, in: INTERSPEECH 2020
    Virtual Shanghai China, 2020.'
date_created: 2020-11-25T15:03:19Z
date_updated: 2023-10-26T08:28:49Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: jensheit
  date_created: 2020-12-11T12:33:04Z
  date_updated: 2020-12-11T12:33:04Z
  file_id: '20697'
  file_name: ms.pdf
  file_size: 998706
  relation: main_file
  success: 1
file_date_updated: 2020-12-11T12:33:04Z
has_accepted_license: '1'
keyword:
- voice activity detection
- speech activity detection
- neural network
- statistical speech processing
language:
- iso: eng
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2020 Virtual Shanghai China
status: public
title: Statistical and Neural Network Based Speech Activity Detection in Non-Stationary
  Acoustic Environments
type: conference
user_id: '460'
year: '2020'
...
---
_id: '11917'
abstract:
- lang: eng
  text: In this paper we present a speech presence probability (SPP) estimation algorithm
    which exploits both temporal and spectral correlations of speech. To this end, the SPP
    estimation is formulated as the posterior probability estimation of the states
    of a two-dimensional (2D) Hidden Markov Model (HMM). We derive an iterative algorithm
    to decode the 2D-HMM which is based on the turbo principle. The experimental results
    show that indeed the SPP estimates improve from iteration to iteration, and further
    clearly outperform another state-of-the-art SPP estimation algorithm.
author:
- first_name: Dang Hai Tran
  full_name: Vu, Dang Hai Tran
  last_name: Vu
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Vu DHT, Haeb-Umbach R. Using the turbo principle for exploiting temporal and
    spectral correlations in speech presence probability estimation. In: <i>38th International
    Conference on Acoustics, Speech and Signal Processing (ICASSP 2013)</i>. ; 2013:863-867.
    doi:<a href="https://doi.org/10.1109/ICASSP.2013.6637771">10.1109/ICASSP.2013.6637771</a>'
  apa: Vu, D. H. T., &#38; Haeb-Umbach, R. (2013). Using the turbo principle for exploiting
    temporal and spectral correlations in speech presence probability estimation.
    In <i>38th International Conference on Acoustics, Speech and Signal Processing
    (ICASSP 2013)</i> (pp. 863–867). <a href="https://doi.org/10.1109/ICASSP.2013.6637771">https://doi.org/10.1109/ICASSP.2013.6637771</a>
  bibtex: '@inproceedings{Vu_Haeb-Umbach_2013, title={Using the turbo principle for
    exploiting temporal and spectral correlations in speech presence probability estimation},
    DOI={<a href="https://doi.org/10.1109/ICASSP.2013.6637771">10.1109/ICASSP.2013.6637771</a>},
    booktitle={38th International Conference on Acoustics, Speech and Signal Processing
    (ICASSP 2013)}, author={Vu, Dang Hai Tran and Haeb-Umbach, Reinhold}, year={2013},
    pages={863–867} }'
  chicago: Vu, Dang Hai Tran, and Reinhold Haeb-Umbach. “Using the Turbo Principle
    for Exploiting Temporal and Spectral Correlations in Speech Presence Probability
    Estimation.” In <i>38th International Conference on Acoustics, Speech and Signal
    Processing (ICASSP 2013)</i>, 863–67, 2013. <a href="https://doi.org/10.1109/ICASSP.2013.6637771">https://doi.org/10.1109/ICASSP.2013.6637771</a>.
  ieee: D. H. T. Vu and R. Haeb-Umbach, “Using the turbo principle for exploiting
    temporal and spectral correlations in speech presence probability estimation,”
    in <i>38th International Conference on Acoustics, Speech and Signal Processing
    (ICASSP 2013)</i>, 2013, pp. 863–867.
  mla: Vu, Dang Hai Tran, and Reinhold Haeb-Umbach. “Using the Turbo Principle for
    Exploiting Temporal and Spectral Correlations in Speech Presence Probability Estimation.”
    <i>38th International Conference on Acoustics, Speech and Signal Processing (ICASSP
    2013)</i>, 2013, pp. 863–67, doi:<a href="https://doi.org/10.1109/ICASSP.2013.6637771">10.1109/ICASSP.2013.6637771</a>.
  short: 'D.H.T. Vu, R. Haeb-Umbach, in: 38th International Conference on Acoustics,
    Speech and Signal Processing (ICASSP 2013), 2013, pp. 863–867.'
date_created: 2019-07-12T05:30:45Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
doi: 10.1109/ICASSP.2013.6637771
keyword:
- correlation methods
- estimation theory
- hidden Markov models
- iterative methods
- probability
- spectral analysis
- speech processing
- 2D HMM
- SPP estimates
- iterative algorithm
- posterior probability estimation
- spectral correlation
- speech presence probability estimation
- state-of-the-art SPP estimation algorithm
- temporal correlation
- turbo principle
- two-dimensional hidden Markov model
- Correlation
- Decoding
- Estimation
- Iterative decoding
- Noise
- Speech
- Vectors
language:
- iso: eng
page: 863-867
publication: 38th International Conference on Acoustics, Speech and Signal Processing
  (ICASSP 2013)
publication_identifier:
  issn:
  - 1520-6149
status: public
title: Using the turbo principle for exploiting temporal and spectral correlations
  in speech presence probability estimation
type: conference
user_id: '44006'
year: '2013'
...
---
_id: '17278'
abstract:
- lang: eng
  text: This paper investigates the influence of feedback provided by an autonomous
    robot (BIRON) on users’ discursive behavior. A user study is described during
    which users show objects to the robot. The results of the experiment indicate
    that the robot’s verbal feedback utterances cause the humans to adapt their own
    way of speaking. The changes in users’ verbal behavior are due to their beliefs
    about the robot’s knowledge and abilities. In this paper these beliefs are identified and
    grouped. Moreover, the data implies variations in user behavior regarding gestures.
    Unlike for speech, the robot was not able to give feedback with gestures. Due to the
    lack of feedback, users did not seem to have a consistent mental representation
    of the robot’s abilities to recognize gestures. As a result, changes between different
    gestures are interpreted to be unconscious variations accompanying speech.
author:
- first_name: Manja
  full_name: Lohse, Manja
  last_name: Lohse
- first_name: Katharina
  full_name: Rohlfing, Katharina
  id: '50352'
  last_name: Rohlfing
- first_name: Britta
  full_name: Wrede, Britta
  last_name: Wrede
- first_name: Gerhard
  full_name: Sagerer, Gerhard
  last_name: Sagerer
citation:
  ama: 'Lohse M, Rohlfing K, Wrede B, Sagerer G. “Try something else!” — When users
    change their discursive behavior in human-robot interaction. In: ; 2008:3481-3486.
    doi:<a href="https://doi.org/10.1109/ROBOT.2008.4543743">10.1109/ROBOT.2008.4543743</a>'
  apa: Lohse, M., Rohlfing, K., Wrede, B., &#38; Sagerer, G. (2008). <i>“Try something
    else!” — When users change their discursive behavior in human-robot interaction</i>.
    3481–3486. <a href="https://doi.org/10.1109/ROBOT.2008.4543743">https://doi.org/10.1109/ROBOT.2008.4543743</a>
  bibtex: '@inproceedings{Lohse_Rohlfing_Wrede_Sagerer_2008, title={“Try something
    else!” — When users change their discursive behavior in human-robot interaction},
    DOI={<a href="https://doi.org/10.1109/ROBOT.2008.4543743">10.1109/ROBOT.2008.4543743</a>},
    author={Lohse, Manja and Rohlfing, Katharina and Wrede, Britta and Sagerer, Gerhard},
    year={2008}, pages={3481–3486} }'
  chicago: Lohse, Manja, Katharina Rohlfing, Britta Wrede, and Gerhard Sagerer. “‘Try
    Something Else!’ — When Users Change Their Discursive Behavior in Human-Robot
    Interaction,” 3481–86, 2008. <a href="https://doi.org/10.1109/ROBOT.2008.4543743">https://doi.org/10.1109/ROBOT.2008.4543743</a>.
  ieee: 'M. Lohse, K. Rohlfing, B. Wrede, and G. Sagerer, “‘Try something else!’ —
    When users change their discursive behavior in human-robot interaction,” 2008,
    pp. 3481–3486, doi: <a href="https://doi.org/10.1109/ROBOT.2008.4543743">10.1109/ROBOT.2008.4543743</a>.'
  mla: Lohse, Manja, et al. <i>“Try Something Else!” — When Users Change Their Discursive
    Behavior in Human-Robot Interaction</i>. 2008, pp. 3481–86, doi:<a href="https://doi.org/10.1109/ROBOT.2008.4543743">10.1109/ROBOT.2008.4543743</a>.
  short: 'M. Lohse, K. Rohlfing, B. Wrede, G. Sagerer, in: 2008, pp. 3481–3486.'
date_created: 2020-06-24T13:02:49Z
date_updated: 2023-02-01T13:08:20Z
department:
- _id: '749'
doi: 10.1109/ROBOT.2008.4543743
keyword:
- discursive behavior
- autonomous robot
- BIRON
- man-machine systems
- robot abilities
- robot knowledge
- user gestures
- robot verbal feedback utterance
- speech processing
- user verbal behavior
- service robots
- human-robot interaction
- human computer interaction
- gesture recognition
language:
- iso: eng
page: 3481-3486
publication_identifier:
  isbn:
  - 1050-4729
status: public
title: “Try something else!” — When users change their discursive behavior in human-robot
  interaction
type: conference
user_id: '14931'
year: '2008'
...
---
_id: '2433'
author:
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Simon
  full_name: Maurer, Simon
  last_name: Maurer
citation:
  ama: Plessl C, Maurer S. <i>Hardware/Software Codesign in Speech Compression Applications</i>.
    Computer Engineering and Networks Lab, ETH Zurich, Switzerland; 2000.
  apa: Plessl, C., &#38; Maurer, S. (2000). <i>Hardware/Software Codesign in Speech
    Compression Applications</i>. Computer Engineering and Networks Lab, ETH Zurich,
    Switzerland.
  bibtex: '@book{Plessl_Maurer_2000, title={Hardware/Software Codesign in Speech Compression
    Applications}, publisher={Computer Engineering and Networks Lab, ETH Zurich, Switzerland},
    author={Plessl, Christian and Maurer, Simon}, year={2000} }'
  chicago: Plessl, Christian, and Simon Maurer. <i>Hardware/Software Codesign in Speech
    Compression Applications</i>. Computer Engineering and Networks Lab, ETH Zurich,
    Switzerland, 2000.
  ieee: C. Plessl and S. Maurer, <i>Hardware/Software Codesign in Speech Compression
    Applications</i>. Computer Engineering and Networks Lab, ETH Zurich, Switzerland,
    2000.
  mla: Plessl, Christian, and Simon Maurer. <i>Hardware/Software Codesign in Speech
    Compression Applications</i>. Computer Engineering and Networks Lab, ETH Zurich,
    Switzerland, 2000.
  short: C. Plessl, S. Maurer, Hardware/Software Codesign in Speech Compression Applications,
    Computer Engineering and Networks Lab, ETH Zurich, Switzerland, 2000.
date_created: 2018-04-17T15:56:00Z
date_updated: 2022-01-06T06:56:17Z
department:
- _id: '518'
keyword:
- co-design
- speech processing
publisher: Computer Engineering and Networks Lab, ETH Zurich, Switzerland
status: public
title: Hardware/Software Codesign in Speech Compression Applications
type: mastersthesis
user_id: '24135'
year: '2000'
...
