---
_id: '49111'
abstract:
- lang: eng
  text: Due to the high variation in the application requirements of sound event detection
    (SED) systems, it is not sufficient to evaluate systems only in a single operating
    mode. Therefore, the community recently adopted the polyphonic sound detection
    score (PSDS) as an evaluation metric, which is the normalized area under the PSD
    receiver operating characteristic (PSD-ROC). It summarizes the system performance
    over a range of operating modes resulting from varying the decision threshold
    that is used to translate the system output scores into a binary detection output.
    Hence, it provides a more complete picture of the overall system behavior and
    is less biased by specific threshold tuning. However, besides the decision threshold,
    the post-processing can also be changed to enter another operating
    mode. In this paper, we propose the post-processing independent PSDS (piPSDS) as
    a generalization of the PSDS. Here, the post-processing independent PSD-ROC includes
    operating points from varying post-processings with varying decision thresholds.
    Thus, it summarizes even more operating modes of an SED system and allows systems
    to be compared without having to implement a post-processing and without
    a bias due to different post-processings. While piPSDS can in principle combine
    different types of post-processing, as a first step we present here median filter
    independent PSDS (miPSDS) results for the DCASE 2023 Challenge Task 4a systems.
    Source code is publicly available in our sed_scores_eval package (https://github.com/fgnt/sed_scores_eval).
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Romain
  full_name: Serizel, Romain
  last_name: Serizel
citation:
  ama: 'Ebbers J, Haeb-Umbach R, Serizel R. Post-Processing Independent Evaluation
    of Sound Event Detection Systems. In: <i>Proceedings of the 8th Detection and
    Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)</i>. ;
    2023:36–40.'
  apa: Ebbers, J., Haeb-Umbach, R., &#38; Serizel, R. (2023). Post-Processing Independent
    Evaluation of Sound Event Detection Systems. <i>Proceedings of the 8th Detection
    and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)</i>,
    36–40.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_Serizel_2023, place={Tampere, Finland},
    title={Post-Processing Independent Evaluation of Sound Event Detection Systems},
    booktitle={Proceedings of the 8th Detection and Classification of Acoustic Scenes
    and Events 2023 Workshop (DCASE2023)}, author={Ebbers, Janek and Haeb-Umbach,
    Reinhold and Serizel, Romain}, year={2023}, pages={36–40} }'
  chicago: Ebbers, Janek, Reinhold Haeb-Umbach, and Romain Serizel. “Post-Processing
    Independent Evaluation of Sound Event Detection Systems.” In <i>Proceedings of
    the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop
    (DCASE2023)</i>, 36–40. Tampere, Finland, 2023.
  ieee: J. Ebbers, R. Haeb-Umbach, and R. Serizel, “Post-Processing Independent Evaluation
    of Sound Event Detection Systems,” in <i>Proceedings of the 8th Detection and
    Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)</i>, 2023,
    pp. 36–40.
  mla: Ebbers, Janek, et al. “Post-Processing Independent Evaluation of Sound Event
    Detection Systems.” <i>Proceedings of the 8th Detection and Classification of
    Acoustic Scenes and Events 2023 Workshop (DCASE2023)</i>, 2023, pp. 36–40.
  short: 'J. Ebbers, R. Haeb-Umbach, R. Serizel, in: Proceedings of the 8th Detection
    and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), Tampere,
    Finland, 2023, pp. 36–40.'
date_created: 2023-11-22T08:20:26Z
date_updated: 2024-11-15T20:34:18Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: ebbers
  date_created: 2023-11-22T08:25:08Z
  date_updated: 2023-11-22T08:25:08Z
  file_id: '49112'
  file_name: dcase2023_ebbers.pdf
  file_size: 221875
  relation: main_file
  success: 1
file_date_updated: 2023-11-22T08:25:08Z
has_accepted_license: '1'
language:
- iso: eng
page: 36–40
place: Tampere, Finland
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the 8th Detection and Classification of Acoustic Scenes
  and Events 2023 Workshop (DCASE2023)
quality_controlled: '1'
status: public
title: Post-Processing Independent Evaluation of Sound Event Detection Systems
type: conference
user_id: '34851'
year: '2023'
...
---
_id: '57098'
author:
- first_name: Fritz
  full_name: Seebauer, Fritz
  last_name: Seebauer
- first_name: Michael
  full_name: Kuhlmann, Michael
  id: '49871'
  last_name: Kuhlmann
- first_name: Reinhold
  full_name: Häb-Umbach, Reinhold
  id: '242'
  last_name: Häb-Umbach
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
citation:
  ama: 'Seebauer F, Kuhlmann M, Häb-Umbach R, Wagner P. DISCERNING DIMENSIONS OF QUALITY
    FOR STATE OF THE ART SYNTHETIC SPEECH. In: <i>Proceedings of the 20th International
    Congress of Phonetic Sciences</i>. ; 2023.'
  apa: Seebauer, F., Kuhlmann, M., Häb-Umbach, R., &#38; Wagner, P. (2023). DISCERNING
    DIMENSIONS OF QUALITY FOR STATE OF THE ART SYNTHETIC SPEECH. <i>Proceedings of
    the 20th International Congress of Phonetic Sciences</i>. International Congress
    of Phonetic Sciences (ICPhS), Prague.
  bibtex: '@inproceedings{Seebauer_Kuhlmann_Häb-Umbach_Wagner_2023, title={DISCERNING
    DIMENSIONS OF QUALITY FOR STATE OF THE ART SYNTHETIC SPEECH}, booktitle={Proceedings
    of the 20th International Congress of Phonetic Sciences}, author={Seebauer, Fritz
    and Kuhlmann, Michael and Häb-Umbach, Reinhold and Wagner, Petra}, year={2023}
    }'
  chicago: Seebauer, Fritz, Michael Kuhlmann, Reinhold Häb-Umbach, and Petra Wagner.
    “DISCERNING DIMENSIONS OF QUALITY FOR STATE OF THE ART SYNTHETIC SPEECH.” In <i>Proceedings
    of the 20th International Congress of Phonetic Sciences</i>, 2023.
  ieee: F. Seebauer, M. Kuhlmann, R. Häb-Umbach, and P. Wagner, “DISCERNING DIMENSIONS
    OF QUALITY FOR STATE OF THE ART SYNTHETIC SPEECH,” presented at the International
    Congress of Phonetic Sciences (ICPhS), Prague, 2023.
  mla: Seebauer, Fritz, et al. “DISCERNING DIMENSIONS OF QUALITY FOR STATE OF THE
    ART SYNTHETIC SPEECH.” <i>Proceedings of the 20th International Congress of Phonetic
    Sciences</i>, 2023.
  short: 'F. Seebauer, M. Kuhlmann, R. Häb-Umbach, P. Wagner, in: Proceedings of the
    20th International Congress of Phonetic Sciences, 2023.'
conference:
  end_date: 2023-08-11
  location: Prague
  name: International Congress of Phonetic Sciences (ICPhS)
  start_date: 2023-08-07
date_created: 2024-11-15T06:49:27Z
date_updated: 2024-11-15T06:54:55Z
department:
- _id: '54'
language:
- iso: eng
publication: Proceedings of the 20th International Congress of Phonetic Sciences
publication_identifier:
  isbn:
  - 978-80-908114-2-3
status: public
title: DISCERNING DIMENSIONS OF QUALITY FOR STATE OF THE ART SYNTHETIC SPEECH
type: conference
user_id: '49871'
year: '2023'
...
---
_id: '48281'
abstract:
- lang: eng
  text: "We propose a general framework to compute the word error rate (WER) of
    ASR systems that process recordings containing multiple speakers at their input
    and that produce multiple output word sequences (MIMO). Such ASR systems
    are typically required, e.g., for meeting transcription. We provide an efficient
    implementation based on a dynamic programming search in a multi-dimensional Levenshtein
    distance tensor under the constraint that a reference utterance must be matched
    consistently with one hypothesis output. This also results in an efficient
    implementation of the ORC WER, which previously suffered from exponential complexity.
    We give an overview of commonly used WER definitions for multi-speaker scenarios
    and show that they are specializations of the above MIMO WER tuned to particular
    application scenarios. We conclude with a discussion of the pros and cons
    of the various WER definitions and a recommendation on when to use which."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. On Word
    Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech
    Recognition Systems. In: <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>'
  apa: von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach,
    R. (2023). On Word Error Rate Definitions and Their Efficient Computation for
    Multi-Speaker Speech Recognition Systems. <i>ICASSP 2023 - 2023 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>
  bibtex: '@inproceedings{von Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2023,
    title={On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems}, DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Boeddeker, Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix,
    and Reinhold Haeb-Umbach. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” In <i>ICASSP 2023 -
    2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10094784">https://doi.org/10.1109/icassp49357.2023.10094784</a>.
  ieee: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach,
    “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
    Speech Recognition Systems,” 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.'
  mla: von Neumann, Thilo, et al. “On Word Error Rate Definitions and Their Efficient
    Computation for Multi-Speaker Speech Recognition Systems.” <i>ICASSP 2023 - 2023
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>,
    IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10094784">10.1109/icassp49357.2023.10094784</a>.
  short: 'T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
date_created: 2023-10-19T07:38:31Z
date_updated: 2025-02-12T09:16:34Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10094784
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:39:57Z
  date_updated: 2023-10-19T07:41:56Z
  file_id: '48282'
  file_name: ICASSP_2023_Meeting_Evaluation.pdf
  file_size: 204994
  relation: main_file
file_date_updated: 2023-10-19T07:41:56Z
has_accepted_license: '1'
keyword:
- Word Error Rate
- Meeting Recognition
- Levenshtein Distance
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10094784
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker
  Speech Recognition Systems
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48275'
abstract:
- lang: eng
  text: "MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription
    systems. It provides a unified interface for the computation of commonly used
    Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER, along with other
    WER definitions. We extend the cpWER computation by a temporal constraint to
    ensure that words are only identified as correct when the temporal alignment is
    plausible. This improves the matching of the hypothesis string to the reference
    string so that it more closely reflects the actual transcription quality,
    and a system is penalized if it provides poor time annotations. Since
    word-level timing information is often not available, we present a way to approximate
    exact word-level timings from segment-level timings (e.g., a sentence) and show
    that the approximation leads to a WER similar to that of a matching with exact word-level
    annotations. At the same time, the time constraint leads to a speedup of the
    matching algorithm, which outweighs the additional overhead caused by processing
    the time stamps."
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit
    for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.'
  apa: 'von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023).
    MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>.
    CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin.'
  bibtex: '@inproceedings{von Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems},
    booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments},
    author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: 'Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach.
    “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
    Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>,
    2023.'
  ieee: 'T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval:
    A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,”
    presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments,
    Dublin, 2023.'
  mla: 'von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error
    Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech
    Processing in Everyday Environments</i>, 2023.'
  short: 'T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME
    2023 Workshop on Speech Processing in Everyday Environments, 2023.'
conference:
  location: Dublin
  name: CHiME 2023 Workshop on Speech Processing in Everyday Environments
date_created: 2023-10-19T07:24:51Z
date_updated: 2025-02-12T09:12:05Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2023-10-19T07:19:59Z
  date_updated: 2023-10-19T07:19:59Z
  file_id: '48276'
  file_name: Chime_7__MeetEval.pdf
  file_size: 263744
  relation: main_file
file_date_updated: 2023-10-19T07:19:59Z
has_accepted_license: '1'
keyword:
- Speech Recognition
- Word Error Rate
- Meeting Transcription
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2307.11394
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fgnt/meeteval
status: public
title: 'MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription
  Systems'
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '47128'
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Cătălin
  full_name: Zorilă, Cătălin
  last_name: Zorilă
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Zorilă C, Doddipatla R, Haeb-Umbach R. Frame-Wise
    and Overlap-Robust Speaker Embeddings for Meeting Diarization. In: <i>ICASSP 2023
    - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE; 2023. doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., Zorilă, C., Doddipatla, R., &#38; Haeb-Umbach,
    R. (2023). Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization.
    <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. 2023 IEEE International Conference on Acoustics, Speech,
    and Signal Processing (ICASSP), Rhodes. <a href="https://doi.org/10.1109/icassp49357.2023.10095370">https://doi.org/10.1109/icassp49357.2023.10095370</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Zorilă_Doddipatla_Haeb-Umbach_2023,
    title={Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization},
    DOI={<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>},
    booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={Cord-Landwehr, Tobias
    and Boeddeker, Christoph and Zorilă, Cătălin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla,
    and Reinhold Haeb-Umbach. “Frame-Wise and Overlap-Robust Speaker Embeddings for
    Meeting Diarization.” In <i>ICASSP 2023 - 2023 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE, 2023. <a href="https://doi.org/10.1109/icassp49357.2023.10095370">https://doi.org/10.1109/icassp49357.2023.10095370</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, and R. Haeb-Umbach,
    “Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization,” presented
    at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
    (ICASSP), Rhodes, 2023, doi: <a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>.'
  mla: Cord-Landwehr, Tobias, et al. “Frame-Wise and Overlap-Robust Speaker Embeddings
    for Meeting Diarization.” <i>ICASSP 2023 - 2023 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>, IEEE, 2023, doi:<a href="https://doi.org/10.1109/icassp49357.2023.10095370">10.1109/icassp49357.2023.10095370</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, R. Haeb-Umbach,
    in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2023.'
conference:
  location: Rhodes
  name: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
    (ICASSP)
date_created: 2023-09-19T14:01:20Z
date_updated: 2025-02-12T09:14:45Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp49357.2023.10095370
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:56:18Z
  date_updated: 2023-11-15T14:56:18Z
  file_id: '48932'
  file_name: teacher_student_embeddings.pdf
  file_size: 246306
  relation: main_file
file_date_updated: 2023-11-15T14:56:18Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
status: public
title: Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '47129'
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Cătălin
  full_name: Zorilă, Cătălin
  last_name: Zorilă
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, Zorilă C, Doddipatla R, Haeb-Umbach R. A Teacher-Student
    Approach for Extracting Informative Speaker Embeddings From Speech Mixtures. In:
    <i>INTERSPEECH 2023</i>. ISCA; 2023. doi:<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>'
  apa: Cord-Landwehr, T., Boeddeker, C., Zorilă, C., Doddipatla, R., &#38; Haeb-Umbach,
    R. (2023). A Teacher-Student Approach for Extracting Informative Speaker Embeddings
    From Speech Mixtures. <i>INTERSPEECH 2023</i>. <a href="https://doi.org/10.21437/interspeech.2023-1379">https://doi.org/10.21437/interspeech.2023-1379</a>
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_Zorilă_Doddipatla_Haeb-Umbach_2023,
    title={A Teacher-Student Approach for Extracting Informative Speaker Embeddings
    From Speech Mixtures}, DOI={<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>},
    booktitle={INTERSPEECH 2023}, publisher={ISCA}, author={Cord-Landwehr, Tobias
    and Boeddeker, Christoph and Zorilă, Cătălin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2023} }'
  chicago: Cord-Landwehr, Tobias, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla,
    and Reinhold Haeb-Umbach. “A Teacher-Student Approach for Extracting Informative
    Speaker Embeddings From Speech Mixtures.” In <i>INTERSPEECH 2023</i>. ISCA, 2023.
    <a href="https://doi.org/10.21437/interspeech.2023-1379">https://doi.org/10.21437/interspeech.2023-1379</a>.
  ieee: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, and R. Haeb-Umbach,
    “A Teacher-Student Approach for Extracting Informative Speaker Embeddings From
    Speech Mixtures,” 2023, doi: <a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>.'
  mla: Cord-Landwehr, Tobias, et al. “A Teacher-Student Approach for Extracting Informative
    Speaker Embeddings From Speech Mixtures.” <i>INTERSPEECH 2023</i>, ISCA, 2023,
    doi:<a href="https://doi.org/10.21437/interspeech.2023-1379">10.21437/interspeech.2023-1379</a>.
  short: 'T. Cord-Landwehr, C. Boeddeker, C. Zorilă, R. Doddipatla, R. Haeb-Umbach,
    in: INTERSPEECH 2023, ISCA, 2023.'
date_created: 2023-09-19T14:34:37Z
date_updated: 2025-02-12T09:15:28Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/interspeech.2023-1379
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T15:00:02Z
  date_updated: 2023-11-15T15:00:02Z
  file_id: '48933'
  file_name: multispeaker_embeddings.pdf
  file_size: 303203
  relation: main_file
file_date_updated: 2023-11-15T15:00:02Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: INTERSPEECH 2023
publication_status: published
publisher: ISCA
status: public
title: A Teacher-Student Approach for Extracting Informative Speaker Embeddings From
  Speech Mixtures
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '54439'
author:
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Boeddeker C, Cord-Landwehr T, von Neumann T, Haeb-Umbach R. Multi-stage diarization
    refinement for the CHiME-7 DASR scenario. In: <i>7th International Workshop on
    Speech Processing in Everyday Environments (CHiME 2023)</i>. ISCA; 2023. doi:<a
    href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>'
  apa: Boeddeker, C., Cord-Landwehr, T., von Neumann, T., &#38; Haeb-Umbach, R. (2023).
    Multi-stage diarization refinement for the CHiME-7 DASR scenario. <i>7th International
    Workshop on Speech Processing in Everyday Environments (CHiME 2023)</i>. <a href="https://doi.org/10.21437/chime.2023-10">https://doi.org/10.21437/chime.2023-10</a>
  bibtex: '@inproceedings{Boeddeker_Cord-Landwehr_von Neumann_Haeb-Umbach_2023, title={Multi-stage
    diarization refinement for the CHiME-7 DASR scenario}, DOI={<a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>},
    booktitle={7th International Workshop on Speech Processing in Everyday Environments
    (CHiME 2023)}, publisher={ISCA}, author={Boeddeker, Christoph and Cord-Landwehr,
    Tobias and von Neumann, Thilo and Haeb-Umbach, Reinhold}, year={2023} }'
  chicago: Boeddeker, Christoph, Tobias Cord-Landwehr, Thilo von Neumann, and Reinhold
    Haeb-Umbach. “Multi-Stage Diarization Refinement for the CHiME-7 DASR Scenario.”
    In <i>7th International Workshop on Speech Processing in Everyday Environments
    (CHiME 2023)</i>. ISCA, 2023. <a href="https://doi.org/10.21437/chime.2023-10">https://doi.org/10.21437/chime.2023-10</a>.
  ieee: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, and R. Haeb-Umbach, “Multi-stage
    diarization refinement for the CHiME-7 DASR scenario,” 2023, doi: <a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>.'
  mla: Boeddeker, Christoph, et al. “Multi-Stage Diarization Refinement for the CHiME-7
    DASR Scenario.” <i>7th International Workshop on Speech Processing in Everyday
    Environments (CHiME 2023)</i>, ISCA, 2023, doi:<a href="https://doi.org/10.21437/chime.2023-10">10.21437/chime.2023-10</a>.
  short: 'C. Boeddeker, T. Cord-Landwehr, T. von Neumann, R. Haeb-Umbach, in: 7th
    International Workshop on Speech Processing in Everyday Environments (CHiME 2023),
    ISCA, 2023.'
date_created: 2024-05-23T15:16:15Z
date_updated: 2025-02-12T09:16:13Z
department:
- _id: '54'
doi: 10.21437/chime.2023-10
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/chime_2023/boeddeker23_chime.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 7th International Workshop on Speech Processing in Everyday Environments
  (CHiME 2023)
publication_status: published
publisher: ISCA
status: public
title: Multi-stage diarization refinement for the CHiME-7 DASR scenario
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '48390'
author:
- first_name: Simon
  full_name: Berger, Simon
  last_name: Berger
- first_name: Peter
  full_name: Vieting, Peter
  last_name: Vieting
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Ralf
  full_name: Schlüter, Ralf
  last_name: Schlüter
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Berger S, Vieting P, Boeddeker C, Schlüter R, Haeb-Umbach R. Mixture Encoder
    for Joint Speech Separation and Recognition. In: <i>INTERSPEECH 2023</i>. ISCA;
    2023. doi:<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>'
  apa: Berger, S., Vieting, P., Boeddeker, C., Schlüter, R., &#38; Haeb-Umbach, R.
    (2023). Mixture Encoder for Joint Speech Separation and Recognition. <i>INTERSPEECH
    2023</i>. <a href="https://doi.org/10.21437/interspeech.2023-1815">https://doi.org/10.21437/interspeech.2023-1815</a>
  bibtex: '@inproceedings{Berger_Vieting_Boeddeker_Schlüter_Haeb-Umbach_2023, title={Mixture
    Encoder for Joint Speech Separation and Recognition}, DOI={<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>},
    booktitle={INTERSPEECH 2023}, publisher={ISCA}, author={Berger, Simon and Vieting,
    Peter and Boeddeker, Christoph and Schlüter, Ralf and Haeb-Umbach, Reinhold},
    year={2023} }'
  chicago: Berger, Simon, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, and Reinhold
    Haeb-Umbach. “Mixture Encoder for Joint Speech Separation and Recognition.” In
    <i>INTERSPEECH 2023</i>. ISCA, 2023. <a href="https://doi.org/10.21437/interspeech.2023-1815">https://doi.org/10.21437/interspeech.2023-1815</a>.
  ieee: 'S. Berger, P. Vieting, C. Boeddeker, R. Schlüter, and R. Haeb-Umbach, “Mixture
    Encoder for Joint Speech Separation and Recognition,” 2023, doi: <a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>.'
  mla: Berger, Simon, et al. “Mixture Encoder for Joint Speech Separation and Recognition.”
    <i>INTERSPEECH 2023</i>, ISCA, 2023, doi:<a href="https://doi.org/10.21437/interspeech.2023-1815">10.21437/interspeech.2023-1815</a>.
  short: 'S. Berger, P. Vieting, C. Boeddeker, R. Schlüter, R. Haeb-Umbach, in: INTERSPEECH
    2023, ISCA, 2023.'
date_created: 2023-10-23T15:06:39Z
date_updated: 2025-02-12T09:11:30Z
department:
- _id: '54'
doi: 10.21437/interspeech.2023-1815
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-archive.org/interspeech_2023/berger23_interspeech.pdf
oa: '1'
project:
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: INTERSPEECH 2023
publication_status: published
publisher: ISCA
status: public
title: Mixture Encoder for Joint Speech Separation and Recognition
type: conference
user_id: '40767'
year: '2023'
...
---
_id: '57086'
author:
- first_name: Michael
  full_name: Kuhlmann, Michael
  id: '49871'
  last_name: Kuhlmann
- first_name: Adrian Tobias
  full_name: Meise, Adrian Tobias
  id: '79268'
  last_name: Meise
- first_name: Fritz
  full_name: Seebauer, Fritz
  last_name: Seebauer
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
- first_name: Reinhold
  full_name: Häb-Umbach, Reinhold
  id: '242'
  last_name: Häb-Umbach
citation:
  ama: 'Kuhlmann M, Meise AT, Seebauer F, Wagner P, Häb-Umbach R. Investigating Speaker
    Embedding Disentanglement on Natural Read Speech. In: <i>Speech Communication;
    15th ITG Conference</i>. 2023:121–125.'
  apa: Kuhlmann, M., Meise, A. T., Seebauer, F., Wagner, P., &#38; Häb-Umbach, R.
    (2023). Investigating Speaker Embedding Disentanglement on Natural Read Speech.
    <i>Speech Communication; 15th ITG Conference</i>, 121–125.
  bibtex: '@inproceedings{Kuhlmann_Meise_Seebauer_Wagner_Häb-Umbach_2023, title={Investigating
    Speaker Embedding Disentanglement on Natural Read Speech}, booktitle={Speech Communication;
    15th ITG Conference}, author={Kuhlmann, Michael and Meise, Adrian Tobias and Seebauer,
    Fritz and Wagner, Petra and Häb-Umbach, Reinhold}, year={2023}, pages={121–125}
    }'
  chicago: Kuhlmann, Michael, Adrian Tobias Meise, Fritz Seebauer, Petra Wagner, and
    Reinhold Häb-Umbach. “Investigating Speaker Embedding Disentanglement on Natural
    Read Speech.” In <i>Speech Communication; 15th ITG Conference</i>, 121–125, 2023.
  ieee: M. Kuhlmann, A. T. Meise, F. Seebauer, P. Wagner, and R. Häb-Umbach, “Investigating
    Speaker Embedding Disentanglement on Natural Read Speech,” in <i>Speech Communication;
    15th ITG Conference</i>, 2023, pp. 121–125.
  mla: Kuhlmann, Michael, et al. “Investigating Speaker Embedding Disentanglement
    on Natural Read Speech.” <i>Speech Communication; 15th ITG Conference</i>, 2023,
    pp. 121–125.
  short: 'M. Kuhlmann, A.T. Meise, F. Seebauer, P. Wagner, R. Häb-Umbach, in: Speech
    Communication; 15th ITG Conference, 2023, pp. 121–125.'
date_created: 2024-11-14T09:45:03Z
date_updated: 2026-01-05T10:12:23Z
department:
- _id: '54'
language:
- iso: eng
page: 121–125
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Speech Communication; 15th ITG Conference
status: public
title: Investigating Speaker Embedding Disentanglement on Natural Read Speech
type: conference
user_id: '49871'
year: '2023'
...
---
_id: '33471'
abstract:
- lang: eng
  text: "The intelligibility of demodulated audio signals from analog high frequency
    transmissions, e.g., using single-sideband (SSB) modulation, can be severely
    degraded by channel distortions and/or a mismatch between modulation and demodulation
    carrier frequency. In this work a neural network (NN)-based approach for carrier
    frequency offset (CFO) estimation from demodulated SSB signals is proposed, whereby
    a task-specific architecture is presented. Additionally, a simulation framework
    for SSB signals is introduced and utilized for training the NNs. The CFO estimator
    is combined with a speech enhancement network to investigate its influence on
    the enhancement performance. The NN-based system is compared to a recently proposed
    pitch tracking based approach on publicly available data from real high frequency
    transmissions. Experiments show that the NN exhibits good CFO estimation properties
    and results in significant improvements in speech intelligibility, especially
    when combined with a noise reduction network."
author:
- first_name: Jens
  full_name: Heitkämper, Jens
  id: '27643'
  last_name: Heitkämper
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Heitkämper J, Schmalenstroeer J, Haeb-Umbach R. Neural Network Based Carrier
    Frequency Offset Estimation From Speech Transmitted Over High Frequency Channels.
    In: <i>Proceedings of the 30th European Signal Processing Conference (EUSIPCO)</i>.'
  apa: Heitkämper, J., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (n.d.). Neural Network
    Based Carrier Frequency Offset Estimation From Speech Transmitted Over High Frequency
    Channels. <i>Proceedings of the 30th European Signal Processing Conference (EUSIPCO)</i>.
    30th European Signal Processing Conference (EUSIPCO), Belgrad.
  bibtex: '@inproceedings{Heitkämper_Schmalenstroeer_Haeb-Umbach, place={Belgrad},
    title={Neural Network Based Carrier Frequency Offset Estimation From Speech Transmitted
    Over High Frequency Channels}, booktitle={Proceedings of the 30th European Signal
    Processing Conference (EUSIPCO)}, author={Heitkämper, Jens and Schmalenstroeer,
    Joerg and Haeb-Umbach, Reinhold} }'
  chicago: Heitkämper, Jens, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “Neural
    Network Based Carrier Frequency Offset Estimation From Speech Transmitted Over
    High Frequency Channels.” In <i>Proceedings of the 30th European Signal Processing
    Conference (EUSIPCO)</i>. Belgrad, n.d.
  ieee: J. Heitkämper, J. Schmalenstroeer, and R. Haeb-Umbach, “Neural Network Based
    Carrier Frequency Offset Estimation From Speech Transmitted Over High Frequency
    Channels,” presented at the 30th European Signal Processing Conference (EUSIPCO),
    Belgrad.
  mla: Heitkämper, Jens, et al. “Neural Network Based Carrier Frequency Offset Estimation
    From Speech Transmitted Over High Frequency Channels.” <i>Proceedings of the 30th
    European Signal Processing Conference (EUSIPCO)</i>.
  short: 'J. Heitkämper, J. Schmalenstroeer, R. Haeb-Umbach, in: Proceedings of the
    30th European Signal Processing Conference (EUSIPCO), Belgrad, n.d.'
conference:
  end_date: 2022-09-02
  location: Belgrad
  name: 30th European Signal Processing Conference (EUSIPCO)
  start_date: 2022-08-29
date_created: 2022-09-22T10:56:13Z
date_updated: 2023-10-26T08:15:57Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: jensheit
  date_created: 2022-09-22T10:48:31Z
  date_updated: 2022-09-22T10:48:31Z
  file_id: '33472'
  file_name: cfo.pdf
  file_size: 1231379
  relation: main_file
  success: 1
file_date_updated: 2022-09-22T10:48:31Z
has_accepted_license: '1'
language:
- iso: eng
place: Belgrad
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Proceedings of the 30th European Signal Processing Conference (EUSIPCO)
publication_status: accepted
quality_controlled: '1'
status: public
title: Neural Network Based Carrier Frequency Offset Estimation From Speech Transmitted
  Over High Frequency Channels
type: conference
user_id: '460'
year: '2022'
...
---
_id: '33847'
abstract:
- lang: eng
  text: "The scope of speech enhancement has changed from a monolithic view of single,
    independent tasks to a joint processing of complex conversational speech
    recordings. Training and evaluation of these single tasks require synthetic
    data with access to intermediate signals that is as close as possible to the
    evaluation scenario. As such data is often not available, many works instead
    use specialized databases for the training of each system component, e.g.,
    WSJ0-mix for source separation. We present a Multi-purpose Multi-Speaker
    Mixture Signal Generator (MMS-MSG) for generating a variety of speech mixture
    signals based on any speech corpus, ranging from classical anechoic mixtures
    (e.g., WSJ0-mix) over reverberant mixtures (e.g., SMS-WSJ) to meeting-style
    data. Its highly modular and flexible structure allows for the simulation of
    diverse environments and dynamic mixing, while simultaneously enabling easy
    extension and modification to generate new scenarios and mixture types. The
    generated meetings can be used for prototyping, evaluation, or training purposes.
    We provide example evaluation data and baseline results for meetings based on
    the WSJ corpus. Further, we demonstrate the usefulness for realistic scenarios
    by using MMS-MSG to provide training data for the LibriCSS database."
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, von Neumann T, Boeddeker C, Haeb-Umbach R. MMS-MSG: A Multi-purpose
    Multi-Speaker Mixture Signal Generator. In: <i>2022 International Workshop on
    Acoustic Signal Enhancement (IWAENC)</i>. 2022.'
  apa: 'Cord-Landwehr, T., von Neumann, T., Boeddeker, C., &#38; Haeb-Umbach, R. (2022).
    MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator. <i>2022 International
    Workshop on Acoustic Signal Enhancement (IWAENC)</i>. 2022 International Workshop
    on Acoustic Signal Enhancement (IWAENC), Bamberg.'
  bibtex: '@inproceedings{Cord-Landwehr_von_Neumann_Boeddeker_Haeb-Umbach_2022, title={MMS-MSG:
    A Multi-purpose Multi-Speaker Mixture Signal Generator}, booktitle={2022 International
    Workshop on Acoustic Signal Enhancement (IWAENC)}, author={Cord-Landwehr, Tobias
    and von Neumann, Thilo and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2022}
    }'
  chicago: 'Cord-Landwehr, Tobias, Thilo von Neumann, Christoph Boeddeker, and Reinhold
    Haeb-Umbach. “MMS-MSG: A Multi-Purpose Multi-Speaker Mixture Signal Generator.”
    In <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>,
    2022.'
  ieee: 'T. Cord-Landwehr, T. von Neumann, C. Boeddeker, and R. Haeb-Umbach, “MMS-MSG:
    A Multi-purpose Multi-Speaker Mixture Signal Generator,” presented at the 2022
    International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, 2022.'
  mla: 'Cord-Landwehr, Tobias, et al. “MMS-MSG: A Multi-Purpose Multi-Speaker Mixture
    Signal Generator.” <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>, 2022.'
  short: 'T. Cord-Landwehr, T. von Neumann, C. Boeddeker, R. Haeb-Umbach, in: 2022
    International Workshop on Acoustic Signal Enhancement (IWAENC), 2022.'
conference:
  location: Bamberg
  name: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
date_created: 2022-10-20T14:02:14Z
date_updated: 2023-11-15T14:55:14Z
ddc:
- '000'
department:
- _id: '54'
external_id:
  arxiv:
  - '2209.11494'
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:54:56Z
  date_updated: 2023-11-15T14:54:56Z
  file_id: '48931'
  file_name: mms_msg_camera_ready.pdf
  file_size: 177975
  relation: main_file
file_date_updated: 2023-11-15T14:54:56Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
quality_controlled: '1'
status: public
title: 'MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator'
type: conference
user_id: '44393'
year: '2022'
...
---
_id: '33807'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Haeb-Umbach R. On Synchronization of Wireless
    Acoustic Sensor Networks in the Presence of Time-Varying Sampling Rate Offsets
    and Speaker Changes. In: <i>ICASSP 2022 - 2022 IEEE International Conference on
    Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2022. doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746284">10.1109/icassp43922.2022.9746284</a>'
  apa: Gburrek, T., Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2022). On Synchronization
    of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling
    Rate Offsets and Speaker Changes. <i>ICASSP 2022 - 2022 IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp43922.2022.9746284">https://doi.org/10.1109/icassp43922.2022.9746284</a>
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Haeb-Umbach_2022, title={On Synchronization
    of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling
    Rate Offsets and Speaker Changes}, DOI={<a href="https://doi.org/10.1109/icassp43922.2022.9746284">10.1109/icassp43922.2022.9746284</a>},
    booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={Gburrek, Tobias and
    Schmalenstroeer, Joerg and Haeb-Umbach, Reinhold}, year={2022} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, and Reinhold Haeb-Umbach. “On Synchronization
    of Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling
    Rate Offsets and Speaker Changes.” In <i>ICASSP 2022 - 2022 IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE, 2022.
    <a href="https://doi.org/10.1109/icassp43922.2022.9746284">https://doi.org/10.1109/icassp43922.2022.9746284</a>.
  ieee: 'T. Gburrek, J. Schmalenstroeer, and R. Haeb-Umbach, “On Synchronization of
    Wireless Acoustic Sensor Networks in the Presence of Time-Varying Sampling Rate
    Offsets and Speaker Changes,” 2022, doi: <a href="https://doi.org/10.1109/icassp43922.2022.9746284">10.1109/icassp43922.2022.9746284</a>.'
  mla: Gburrek, Tobias, et al. “On Synchronization of Wireless Acoustic Sensor Networks
    in the Presence of Time-Varying Sampling Rate Offsets and Speaker Changes.” <i>ICASSP
    2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>, IEEE, 2022, doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746284">10.1109/icassp43922.2022.9746284</a>.
  short: 'T. Gburrek, J. Schmalenstroeer, R. Haeb-Umbach, in: ICASSP 2022 - 2022 IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP),
    IEEE, 2022.'
date_created: 2022-10-18T09:25:51Z
date_updated: 2023-11-17T06:39:28Z
ddc:
- '004'
department:
- _id: '54'
doi: 10.1109/icassp43922.2022.9746284
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:39:04Z
  date_updated: 2023-11-17T06:39:04Z
  file_id: '48990'
  file_name: gburrek_icassp22.pdf
  file_size: 358015
  relation: main_file
file_date_updated: 2023-11-17T06:39:04Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
status: public
title: On Synchronization of Wireless Acoustic Sensor Networks in the Presence of
  Time-Varying Sampling Rate Offsets and Speaker Changes
type: conference
user_id: '44006'
year: '2022'
...
---
_id: '33451'
abstract:
- lang: eng
  text: "We present an approach to automatically generate semantic labels for real
    recordings of automotive range-Doppler (RD) radar spectra. Such labels are required
    when training a neural network for object recognition from radar data. The automatic
    labeling approach rests on the simultaneous recording of camera and lidar data
    in addition to the radar spectrum. By warping radar spectra into the camera image,
    state-of-the-art object recognition algorithms can be applied to label relevant
    objects, such as cars, in the camera image. The warping operation is designed
    to be fully differentiable, which allows backpropagating the gradient computed
    on the camera image through the warping operation to the neural network operating
    on the radar data. As the warping operation relies on accurate scene flow estimation,
    we further propose a novel scene flow estimation algorithm which exploits information
    from camera, lidar and radar sensors. The proposed scene flow estimation approach
    is compared against a state-of-the-art scene flow algorithm, and it outperforms
    it by approximately 30% w.r.t. mean average error. The feasibility of the overall
    framework for automatic label generation for RD spectra is verified by evaluating
    the performance of neural networks trained with the proposed framework for Direction-of-Arrival
    estimation."
author:
- first_name: Christopher
  full_name: Grimm, Christopher
  last_name: Grimm
- first_name: Tai
  full_name: Fei, Tai
  last_name: Fei
- first_name: Ernst
  full_name: Warsitz, Ernst
  last_name: Warsitz
- first_name: Ridha
  full_name: Farhoud, Ridha
  last_name: Farhoud
- first_name: Tobias
  full_name: Breddermann, Tobias
  last_name: Breddermann
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Grimm C, Fei T, Warsitz E, Farhoud R, Breddermann T, Haeb-Umbach R. Warping
    of Radar Data Into Camera Image for Cross-Modal Supervision in Automotive Applications.
    <i>IEEE Transactions on Vehicular Technology</i>. 2022;71(9):9435-9449. doi:<a
    href="https://doi.org/10.1109/TVT.2022.3182411">10.1109/TVT.2022.3182411</a>
  apa: Grimm, C., Fei, T., Warsitz, E., Farhoud, R., Breddermann, T., &#38; Haeb-Umbach,
    R. (2022). Warping of Radar Data Into Camera Image for Cross-Modal Supervision
    in Automotive Applications. <i>IEEE Transactions on Vehicular Technology</i>,
    <i>71</i>(9), 9435–9449. <a href="https://doi.org/10.1109/TVT.2022.3182411">https://doi.org/10.1109/TVT.2022.3182411</a>
  bibtex: '@article{Grimm_Fei_Warsitz_Farhoud_Breddermann_Haeb-Umbach_2022, title={Warping
    of Radar Data Into Camera Image for Cross-Modal Supervision in Automotive Applications},
    volume={71}, DOI={<a href="https://doi.org/10.1109/TVT.2022.3182411">10.1109/TVT.2022.3182411</a>},
    number={9}, journal={IEEE Transactions on Vehicular Technology}, author={Grimm,
    Christopher and Fei, Tai and Warsitz, Ernst and Farhoud, Ridha and Breddermann,
    Tobias and Haeb-Umbach, Reinhold}, year={2022}, pages={9435–9449} }'
  chicago: 'Grimm, Christopher, Tai Fei, Ernst Warsitz, Ridha Farhoud, Tobias Breddermann,
    and Reinhold Haeb-Umbach. “Warping of Radar Data Into Camera Image for Cross-Modal
    Supervision in Automotive Applications.” <i>IEEE Transactions on Vehicular Technology</i>
    71, no. 9 (2022): 9435–49. <a href="https://doi.org/10.1109/TVT.2022.3182411">https://doi.org/10.1109/TVT.2022.3182411</a>.'
  ieee: 'C. Grimm, T. Fei, E. Warsitz, R. Farhoud, T. Breddermann, and R. Haeb-Umbach,
    “Warping of Radar Data Into Camera Image for Cross-Modal Supervision in Automotive
    Applications,” <i>IEEE Transactions on Vehicular Technology</i>, vol. 71, no.
    9, pp. 9435–9449, 2022, doi: <a href="https://doi.org/10.1109/TVT.2022.3182411">10.1109/TVT.2022.3182411</a>.'
  mla: Grimm, Christopher, et al. “Warping of Radar Data Into Camera Image for Cross-Modal
    Supervision in Automotive Applications.” <i>IEEE Transactions on Vehicular Technology</i>,
    vol. 71, no. 9, 2022, pp. 9435–49, doi:<a href="https://doi.org/10.1109/TVT.2022.3182411">10.1109/TVT.2022.3182411</a>.
  short: C. Grimm, T. Fei, E. Warsitz, R. Farhoud, T. Breddermann, R. Haeb-Umbach,
    IEEE Transactions on Vehicular Technology 71 (2022) 9435–9449.
date_created: 2022-09-21T07:26:19Z
date_updated: 2023-11-20T16:37:16Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/TVT.2022.3182411
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2022-09-22T07:00:29Z
  date_updated: 2022-09-22T07:00:29Z
  file_id: '33460'
  file_name: T-VT_AcceptedVersion.pdf
  file_size: 12117870
  relation: main_file
file_date_updated: 2022-09-22T07:00:29Z
has_accepted_license: '1'
intvolume: '        71'
issue: '9'
language:
- iso: eng
oa: '1'
page: 9435-9449
publication: IEEE Transactions on Vehicular Technology
quality_controlled: '1'
status: public
title: Warping of Radar Data Into Camera Image for Cross-Modal Supervision in Automotive
  Applications
type: journal_article
user_id: '242'
volume: 71
year: '2022'
...
---
_id: '33696'
author:
- first_name: Jana
  full_name: Wiechmann, Jana
  last_name: Wiechmann
- first_name: Thomas
  full_name: Glarner, Thomas
  last_name: Glarner
- first_name: Frederik
  full_name: Rautenberg, Frederik
  id: '72602'
  last_name: Rautenberg
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Wiechmann J, Glarner T, Rautenberg F, Wagner P, Haeb-Umbach R. Technically
    enabled explaining of voice characteristics. In: <i>18. Phonetik Und Phonologie
    Im Deutschsprachigen Raum (P&#38;P)</i>. 2022.'
  apa: Wiechmann, J., Glarner, T., Rautenberg, F., Wagner, P., &#38; Haeb-Umbach,
    R. (2022). Technically enabled explaining of voice characteristics. <i>18. Phonetik
    Und Phonologie Im Deutschsprachigen Raum (P&#38;P)</i>.
  bibtex: '@inproceedings{Wiechmann_Glarner_Rautenberg_Wagner_Haeb-Umbach_2022, title={Technically
    enabled explaining of voice characteristics}, booktitle={18. Phonetik und Phonologie
    im deutschsprachigen Raum (P&#38;P)}, author={Wiechmann, Jana and Glarner, Thomas
    and Rautenberg, Frederik and Wagner, Petra and Haeb-Umbach, Reinhold}, year={2022}
    }'
  chicago: Wiechmann, Jana, Thomas Glarner, Frederik Rautenberg, Petra Wagner, and
    Reinhold Haeb-Umbach. “Technically Enabled Explaining of Voice Characteristics.”
    In <i>18. Phonetik Und Phonologie Im Deutschsprachigen Raum (P&#38;P)</i>, 2022.
  ieee: J. Wiechmann, T. Glarner, F. Rautenberg, P. Wagner, and R. Haeb-Umbach, “Technically
    enabled explaining of voice characteristics,” Bielefeld, 2022.
  mla: Wiechmann, Jana, et al. “Technically Enabled Explaining of Voice Characteristics.”
    <i>18. Phonetik Und Phonologie Im Deutschsprachigen Raum (P&#38;P)</i>, 2022.
  short: 'J. Wiechmann, T. Glarner, F. Rautenberg, P. Wagner, R. Haeb-Umbach, in:
    18. Phonetik Und Phonologie Im Deutschsprachigen Raum (P&#38;P), 2022.'
conference:
  end_date: 2022-10-07
  location: Bielefeld
  start_date: 2022-10-06
date_created: 2022-10-12T07:10:03Z
date_updated: 2023-11-22T13:45:30Z
ddc:
- '000'
department:
- _id: '54'
- _id: '660'
file:
- access_level: open_access
  content_type: application/pdf
  creator: huesera
  date_created: 2022-10-12T07:09:26Z
  date_updated: 2022-10-12T07:09:26Z
  file_id: '33697'
  file_name: PP_2022_paper_8911.pdf
  file_size: 109294
  relation: main_file
file_date_updated: 2022-10-12T07:09:26Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '129'
  grant_number: '438445824'
  name: 'TRR 318 - C06: TRR 318 - Technisch unterstütztes Erklären von Stimmcharakteristika
    (Teilprojekt C06)'
publication: 18. Phonetik und Phonologie im deutschsprachigen Raum (P&P)
status: public
title: Technically enabled explaining of voice characteristics
type: conference
user_id: '72602'
year: '2022'
...
---
_id: '33857'
author:
- first_name: Michael
  full_name: Kuhlmann, Michael
  id: '49871'
  last_name: Kuhlmann
- first_name: Fritz
  full_name: Seebauer, Fritz
  last_name: Seebauer
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Petra
  full_name: Wagner, Petra
  last_name: Wagner
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Kuhlmann M, Seebauer F, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into
    Target Speaking Rate Adaptation for Voice Conversion. In: <i>Interspeech 2022</i>.
    ISCA; 2022. doi:<a href="https://doi.org/10.21437/interspeech.2022-10740">10.21437/interspeech.2022-10740</a>'
  apa: Kuhlmann, M., Seebauer, F., Ebbers, J., Wagner, P., &#38; Haeb-Umbach, R. (2022).
    Investigation into Target Speaking Rate Adaptation for Voice Conversion. <i>Interspeech
    2022</i>. <a href="https://doi.org/10.21437/interspeech.2022-10740">https://doi.org/10.21437/interspeech.2022-10740</a>
  bibtex: '@inproceedings{Kuhlmann_Seebauer_Ebbers_Wagner_Haeb-Umbach_2022, title={Investigation
    into Target Speaking Rate Adaptation for Voice Conversion}, DOI={<a href="https://doi.org/10.21437/interspeech.2022-10740">10.21437/interspeech.2022-10740</a>},
    booktitle={Interspeech 2022}, publisher={ISCA}, author={Kuhlmann, Michael and
    Seebauer, Fritz and Ebbers, Janek and Wagner, Petra and Haeb-Umbach, Reinhold},
    year={2022} }'
  chicago: Kuhlmann, Michael, Fritz Seebauer, Janek Ebbers, Petra Wagner, and Reinhold
    Haeb-Umbach. “Investigation into Target Speaking Rate Adaptation for Voice Conversion.”
    In <i>Interspeech 2022</i>. ISCA, 2022. <a href="https://doi.org/10.21437/interspeech.2022-10740">https://doi.org/10.21437/interspeech.2022-10740</a>.
  ieee: 'M. Kuhlmann, F. Seebauer, J. Ebbers, P. Wagner, and R. Haeb-Umbach, “Investigation
    into Target Speaking Rate Adaptation for Voice Conversion,” 2022, doi: <a href="https://doi.org/10.21437/interspeech.2022-10740">10.21437/interspeech.2022-10740</a>.'
  mla: Kuhlmann, Michael, et al. “Investigation into Target Speaking Rate Adaptation
    for Voice Conversion.” <i>Interspeech 2022</i>, ISCA, 2022, doi:<a href="https://doi.org/10.21437/interspeech.2022-10740">10.21437/interspeech.2022-10740</a>.
  short: 'M. Kuhlmann, F. Seebauer, J. Ebbers, P. Wagner, R. Haeb-Umbach, in: Interspeech
    2022, ISCA, 2022.'
date_created: 2022-10-21T06:50:59Z
date_updated: 2023-10-25T09:04:45Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/interspeech.2022-10740
file:
- access_level: closed
  content_type: application/pdf
  creator: mikuhl
  date_created: 2023-07-15T16:16:12Z
  date_updated: 2023-07-15T16:16:12Z
  file_id: '46070'
  file_name: kuhlmann22_interspeech.pdf
  file_size: 303863
  relation: main_file
  success: 1
file_date_updated: 2023-07-15T16:16:12Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.isca-speech.org/archive/pdfs/interspeech_2022/kuhlmann22_interspeech.pdf
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Interspeech 2022
publication_status: published
publisher: ISCA
quality_controlled: '1'
status: public
title: Investigation into Target Speaking Rate Adaptation for Voice Conversion
type: conference
user_id: '34851'
year: '2022'
...
---
_id: '33808'
author:
- first_name: Tobias
  full_name: Gburrek, Tobias
  id: '44006'
  last_name: Gburrek
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Jens
  full_name: Heitkaemper, Jens
  id: '27643'
  last_name: Heitkaemper
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Gburrek T, Schmalenstroeer J, Heitkaemper J, Haeb-Umbach R. Informed vs. Blind
    Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting Transcription. In:
    <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>. IEEE;
    2022. doi:<a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">10.1109/IWAENC53105.2022.9914772</a>'
  apa: Gburrek, T., Schmalenstroeer, J., Heitkaemper, J., &#38; Haeb-Umbach, R. (2022).
    Informed vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting
    Transcription. <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>.
    17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022), Bamberg,
    Germany. <a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">https://doi.org/10.1109/IWAENC53105.2022.9914772</a>
  bibtex: '@inproceedings{Gburrek_Schmalenstroeer_Heitkaemper_Haeb-Umbach_2022, title={Informed
    vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting Transcription},
    DOI={<a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">10.1109/IWAENC53105.2022.9914772</a>},
    booktitle={2022 International Workshop on Acoustic Signal Enhancement (IWAENC)},
    publisher={IEEE}, author={Gburrek, Tobias and Schmalenstroeer, Joerg and Heitkaemper,
    Jens and Haeb-Umbach, Reinhold}, year={2022} }'
  chicago: Gburrek, Tobias, Joerg Schmalenstroeer, Jens Heitkaemper, and Reinhold
    Haeb-Umbach. “Informed vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks
    for Meeting Transcription.” In <i>2022 International Workshop on Acoustic Signal
    Enhancement (IWAENC)</i>. IEEE, 2022. <a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">https://doi.org/10.1109/IWAENC53105.2022.9914772</a>.
  ieee: 'T. Gburrek, J. Schmalenstroeer, J. Heitkaemper, and R. Haeb-Umbach, “Informed
    vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting Transcription,”
    presented at the 17th International Workshop on Acoustic Signal Enhancement (IWAENC
    2022), Bamberg, Germany, 2022, doi: <a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">10.1109/IWAENC53105.2022.9914772</a>.'
  mla: Gburrek, Tobias, et al. “Informed vs. Blind Beamforming in Ad-Hoc Acoustic
    Sensor Networks for Meeting Transcription.” <i>2022 International Workshop on
    Acoustic Signal Enhancement (IWAENC)</i>, IEEE, 2022, doi:<a href="https://doi.org/10.1109/IWAENC53105.2022.9914772">10.1109/IWAENC53105.2022.9914772</a>.
  short: 'T. Gburrek, J. Schmalenstroeer, J. Heitkaemper, R. Haeb-Umbach, in: 2022
    International Workshop on Acoustic Signal Enhancement (IWAENC), IEEE, 2022.'
conference:
  end_date: 2022-09-08
  location: 'Bamberg, Germany'
  name: 17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)
  start_date: 2022-09-05
date_created: 2022-10-18T09:30:24Z
date_updated: 2023-11-17T06:40:58Z
ddc:
- '004'
department:
- _id: '54'
doi: 10.1109/IWAENC53105.2022.9914772
file:
- access_level: open_access
  content_type: application/pdf
  creator: tgburrek
  date_created: 2023-11-17T06:40:40Z
  date_updated: 2023-11-17T06:40:40Z
  file_id: '48991'
  file_name: iwaenc_22_camera_ready_ieee_check.pdf
  file_size: 266475
  relation: main_file
file_date_updated: 2023-11-17T06:40:40Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
publisher: IEEE
quality_controlled: '1'
status: public
title: Informed vs. Blind Beamforming in Ad-Hoc Acoustic Sensor Networks for Meeting
  Transcription
type: conference
user_id: '44006'
year: '2022'
...
---
_id: '34072'
abstract:
- lang: eng
  text: "Performing an adequate evaluation of sound event detection (SED) systems
    is far from trivial and is still subject to ongoing research. The recently proposed
    polyphonic sound detection (PSD)-receiver operating characteristic (ROC) and PSD
    score (PSDS) take an important step towards an evaluation of SED systems that
    is independent of a specific decision threshold. This yields a more complete
    picture of the overall system behavior that is less biased by threshold tuning.
    Yet, the PSD-ROC is currently only approximated using a finite set of thresholds.
    The choice of the thresholds used in the approximation, however, can have a
    severe impact on the resulting PSDS. In this paper we propose a method which
    allows for computing system performance on an evaluation set for all possible
    thresholds jointly, enabling accurate computation not only of the PSD-ROC and
    PSDS but also of other collar-based and intersection-based performance curves.
    It further allows selecting the threshold which best fulfills the requirements
    of a given application. Source code is publicly available in our SED evaluation
    package sed_scores_eval."
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
- first_name: Romain
  full_name: Serizel, Romain
  last_name: Serizel
citation:
  ama: 'Ebbers J, Haeb-Umbach R, Serizel R. Threshold Independent Evaluation of Sound
    Event Detection Scores. In: <i>Proceedings of the IEEE International Conference
    on Acoustics, Speech and Signal Processing (ICASSP)</i>. ; 2022.'
  apa: Ebbers, J., Haeb-Umbach, R., &#38; Serizel, R. (2022). Threshold Independent
    Evaluation of Sound Event Detection Scores. <i>Proceedings of the IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>.
  bibtex: '@inproceedings{Ebbers_Haeb-Umbach_Serizel_2022, title={Threshold Independent
    Evaluation of Sound Event Detection Scores}, booktitle={Proceedings of the IEEE
    International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    author={Ebbers, Janek and Haeb-Umbach, Reinhold and Serizel, Romain}, year={2022}
    }'
  chicago: Ebbers, Janek, Reinhold Haeb-Umbach, and Romain Serizel. “Threshold Independent
    Evaluation of Sound Event Detection Scores.” In <i>Proceedings of the IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, 2022.
  ieee: J. Ebbers, R. Haeb-Umbach, and R. Serizel, “Threshold Independent Evaluation
    of Sound Event Detection Scores,” 2022.
  mla: Ebbers, Janek, et al. “Threshold Independent Evaluation of Sound Event Detection
    Scores.” <i>Proceedings of the IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)</i>, 2022.
  short: 'J. Ebbers, R. Haeb-Umbach, R. Serizel, in: Proceedings of the IEEE International
    Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.'
date_created: 2022-11-14T12:17:03Z
date_updated: 2023-11-22T08:26:58Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
  content_type: application/pdf
  creator: ebbers
  date_created: 2022-11-14T12:19:55Z
  date_updated: 2022-11-14T12:19:55Z
  file_id: '34073'
  file_name: Template.pdf
  file_size: 214001
  relation: main_file
file_date_updated: 2022-11-14T12:19:55Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: Proceedings of the IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
quality_controlled: '1'
status: public
title: Threshold Independent Evaluation of Sound Event Detection Scores
type: conference
user_id: '34851'
year: '2022'
...
---
_id: '49113'
abstract:
- lang: eng
  text: 'In this report we present our system for the Detection and Classification
    of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 4: Sound Event Detection
    in Domestic Environments. As in previous editions of the Challenge, we use
    forward-backward convolutional recurrent neural networks (FBCRNNs) [1, 2] for
    weakly labeled and semi-supervised sound event detection (SED) and eventually
    generate strong pseudo labels for weakly labeled and unlabeled data. Then, (tag-conditioned)
    bidirectional CRNNs (Bi-CRNNs) [1, 2] are trained in a strongly supervised manner
    as our final SED models. In each of the training stages we use multiple iterations
    of self-training. Compared to previous editions, we improved our system performance
    by 1) some tweaks regarding data augmentation, pseudo labeling and inference, 2)
    using weakly labeled AudioSet data [3] for pretraining larger networks and 3)
    augmenting the DESED data [4] with strongly labeled AudioSet data [5] for finetuning
    of the networks. Source code is publicly available at https://github.com/fgnt/pb_sed.'
author:
- first_name: Janek
  full_name: Ebbers, Janek
  id: '34851'
  last_name: Ebbers
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Ebbers J, Haeb-Umbach R. <i>Pre-Training And Self-Training For Sound Event
    Detection In Domestic Environments</i>.; 2022.
  apa: Ebbers, J., &#38; Haeb-Umbach, R. (2022). <i>Pre-Training And Self-Training
    For Sound Event Detection In Domestic Environments</i>.
  bibtex: '@book{Ebbers_Haeb-Umbach_2022, title={Pre-Training And Self-Training For
    Sound Event Detection In Domestic Environments}, author={Ebbers, Janek and Haeb-Umbach,
    Reinhold}, year={2022} }'
  chicago: Ebbers, Janek, and Reinhold Haeb-Umbach. <i>Pre-Training And Self-Training
    For Sound Event Detection In Domestic Environments</i>, 2022.
  ieee: J. Ebbers and R. Haeb-Umbach, <i>Pre-Training And Self-Training For Sound
    Event Detection In Domestic Environments</i>. 2022.
  mla: Ebbers, Janek, and Reinhold Haeb-Umbach. <i>Pre-Training And Self-Training
    For Sound Event Detection In Domestic Environments</i>. 2022.
  short: J. Ebbers, R. Haeb-Umbach, Pre-Training And Self-Training For Sound Event
    Detection In Domestic Environments, 2022.
date_created: 2023-11-22T08:34:23Z
date_updated: 2024-11-15T20:34:52Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: closed
  content_type: application/pdf
  creator: ebbers
  date_created: 2023-11-22T08:35:23Z
  date_updated: 2023-11-22T08:35:23Z
  file_id: '49114'
  file_name: dcase2022_tech_report_ebbers.pdf
  file_size: 491650
  relation: main_file
  success: 1
file_date_updated: 2023-11-22T08:35:23Z
has_accepted_license: '1'
language:
- iso: eng
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
status: public
title: Pre-Training And Self-Training For Sound Event Detection In Domestic Environments
type: report
user_id: '34851'
year: '2022'
...
---
_id: '33848'
abstract:
- lang: eng
  text: "Impressive progress in neural network-based single-channel speech source
    separation has been made in recent years. However, these improvements have mostly
    been reported on anechoic data, a situation that is hardly met in practice.
    Taking the SepFormer, which achieves state-of-the-art performance on anechoic
    mixtures, as a starting point, we gradually modify it to optimize its
    performance on reverberant mixtures. Although this leads to a word error rate
    improvement of 7 percentage points compared to the standard SepFormer
    implementation, the system ends up with only marginally better performance than
    a PIT-BLSTM separation system that is optimized with rather straightforward
    means. This is surprising and at the same time sobering, challenging the
    practical usefulness of many improvements reported in recent years for monaural
    source separation on nonreverberant data."
author:
- first_name: Tobias
  full_name: Cord-Landwehr, Tobias
  id: '44393'
  last_name: Cord-Landwehr
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Catalin
  full_name: Zorila, Catalin
  last_name: Zorila
- first_name: Rama
  full_name: Doddipatla, Rama
  last_name: Doddipatla
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'Cord-Landwehr T, Boeddeker C, von Neumann T, Zorila C, Doddipatla R, Haeb-Umbach
    R. Monaural source separation: From anechoic to reverberant environments. In:
    <i>2022 International Workshop on Acoustic Signal Enhancement (IWAENC)</i>. IEEE;
    2022.'
  apa: 'Cord-Landwehr, T., Boeddeker, C., von Neumann, T., Zorila, C., Doddipatla,
    R., &#38; Haeb-Umbach, R. (2022). Monaural source separation: From anechoic to
    reverberant environments. <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).'
  bibtex: '@inproceedings{Cord-Landwehr_Boeddeker_von Neumann_Zorila_Doddipatla_Haeb-Umbach_2022,
    place={Bamberg}, title={Monaural source separation: From anechoic to reverberant
    environments}, booktitle={2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)}, publisher={IEEE}, author={Cord-Landwehr, Tobias and Boeddeker, Christoph
    and von Neumann, Thilo and Zorila, Catalin and Doddipatla, Rama and Haeb-Umbach,
    Reinhold}, year={2022} }'
  chicago: 'Cord-Landwehr, Tobias, Christoph Boeddeker, Thilo von Neumann, Catalin
    Zorila, Rama Doddipatla, and Reinhold Haeb-Umbach. “Monaural Source Separation:
    From Anechoic to Reverberant Environments.” In <i>2022 International Workshop
    on Acoustic Signal Enhancement (IWAENC)</i>. Bamberg: IEEE, 2022.'
  ieee: 'T. Cord-Landwehr, C. Boeddeker, T. von Neumann, C. Zorila, R. Doddipatla,
    and R. Haeb-Umbach, “Monaural source separation: From anechoic to reverberant
    environments,” presented at the 2022 International Workshop on Acoustic Signal
    Enhancement (IWAENC), 2022.'
  mla: 'Cord-Landwehr, Tobias, et al. “Monaural Source Separation: From Anechoic to
    Reverberant Environments.” <i>2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC)</i>, IEEE, 2022.'
  short: 'T. Cord-Landwehr, C. Boeddeker, T. von Neumann, C. Zorila, R. Doddipatla,
    R. Haeb-Umbach, in: 2022 International Workshop on Acoustic Signal Enhancement
    (IWAENC), IEEE, Bamberg, 2022.'
conference:
  name: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
date_created: 2022-10-20T14:07:28Z
date_updated: 2025-02-12T09:05:25Z
ddc:
- '000'
department:
- _id: '54'
external_id:
  arxiv:
  - '2111.07578'
file:
- access_level: open_access
  content_type: application/pdf
  creator: cord
  date_created: 2023-11-15T14:52:16Z
  date_updated: 2023-11-15T14:52:16Z
  file_id: '48930'
  file_name: monaural_source_separation.pdf
  file_size: 212890
  relation: main_file
file_date_updated: 2023-11-15T14:52:16Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
place: Bamberg
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
publisher: IEEE
status: public
title: 'Monaural source separation: From anechoic to reverberant environments'
type: conference
user_id: '40767'
year: '2022'
...
---
_id: '33819'
author:
- first_name: Thilo
  full_name: von Neumann, Thilo
  id: '49870'
  last_name: von Neumann
  orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Keisuke
  full_name: Kinoshita, Keisuke
  last_name: Kinoshita
- first_name: Christoph
  full_name: Boeddeker, Christoph
  id: '40767'
  last_name: Boeddeker
- first_name: Marc
  full_name: Delcroix, Marc
  last_name: Delcroix
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: 'von Neumann T, Kinoshita K, Boeddeker C, Delcroix M, Haeb-Umbach R. SA-SDR:
    A Novel Loss Function for Separation of Meeting Style Data. In: <i>ICASSP 2022
    - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP)</i>. IEEE; 2022. doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>'
  apa: 'von Neumann, T., Kinoshita, K., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach,
    R. (2022). SA-SDR: A Novel Loss Function for Separation of Meeting Style Data.
    <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP)</i>. <a href="https://doi.org/10.1109/icassp43922.2022.9746757">https://doi.org/10.1109/icassp43922.2022.9746757</a>'
  bibtex: '@inproceedings{von Neumann_Kinoshita_Boeddeker_Delcroix_Haeb-Umbach_2022,
    title={SA-SDR: A Novel Loss Function for Separation of Meeting Style Data}, DOI={<a
    href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>},
    booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
    and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo
    and Kinoshita, Keisuke and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach,
    Reinhold}, year={2022} }'
  chicago: 'Neumann, Thilo von, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix,
    and Reinhold Haeb-Umbach. “SA-SDR: A Novel Loss Function for Separation of Meeting
    Style Data.” In <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>. IEEE, 2022. <a href="https://doi.org/10.1109/icassp43922.2022.9746757">https://doi.org/10.1109/icassp43922.2022.9746757</a>.'
  ieee: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach,
    “SA-SDR: A Novel Loss Function for Separation of Meeting Style Data,” 2022, doi:
    <a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>.'
  mla: 'von Neumann, Thilo, et al. “SA-SDR: A Novel Loss Function for Separation of
    Meeting Style Data.” <i>ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
    Speech and Signal Processing (ICASSP)</i>, IEEE, 2022, doi:<a href="https://doi.org/10.1109/icassp43922.2022.9746757">10.1109/icassp43922.2022.9746757</a>.'
  short: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, R. Haeb-Umbach,
    in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP), IEEE, 2022.'
date_created: 2022-10-20T05:29:12Z
date_updated: 2025-02-12T09:08:14Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.1109/icassp43922.2022.9746757
file:
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-10-20T05:33:10Z
  date_updated: 2022-10-20T05:33:10Z
  file_id: '33820'
  file_name: main.pdf
  file_size: 228069
  relation: main_file
- access_level: open_access
  content_type: application/pdf
  creator: tvn
  date_created: 2022-10-20T05:35:32Z
  date_updated: 2022-10-20T05:35:32Z
  file_id: '33821'
  file_name: poster.pdf
  file_size: 229166
  relation: poster
file_date_updated: 2022-10-20T05:35:32Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
- _id: '508'
  grant_number: '448568305'
  name: Automatische Transkription von Gesprächssituationen
publication: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech
  and Signal Processing (ICASSP)
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: supplementary_material
    url: https://github.com/fgnt/graph_pit
status: public
title: 'SA-SDR: A Novel Loss Function for Separation of Meeting Style Data'
type: conference
user_id: '40767'
year: '2022'
...
