Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments

J. Ebbers, R. Haeb-Umbach, in: Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain, 2021, pp. 226–230.

Conference Paper | English
Abstract
In this paper we present our system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments, where it ranked fourth. The presented solution is an advancement of our system used in the previous edition of the task. We use a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo labeling, followed by tag-conditioned sound event detection (SED) models which are trained using strong pseudo labels provided by the FBCRNN. Our advancement over our earlier model is threefold. First, we introduce a strong label loss in the objective of the FBCRNN to take advantage of the strongly labeled synthetic data during training. Second, we perform multiple iterations of self-training for both the FBCRNN and the tag-conditioned SED models. Third, whereas we used only tag-conditioned CNNs as our SED model in the previous edition, we here explore more sophisticated tag-conditioned SED model architectures, namely bidirectional CRNNs and bidirectional convolutional transformer neural networks (CTNNs), and combine them. With metric- and class-specific tuning of median filter lengths for post-processing, our final SED model, consisting of 6 submodels (2 of each architecture), achieves polyphonic sound event detection scores (PSDS) of 0.455 for scenario 1 and 0.684 for scenario 2 on the public evaluation set, as well as a collar-based F1-score of 0.596, outperforming both the baselines and our model from the previous edition by far. Source code is publicly available at https://github.com/fgnt/pb_sed.
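The abstract mentions class-specific tuning of median filter lengths for post-processing frame-level SED scores. A minimal sketch of that idea is shown below, assuming frame-level class posteriors in a NumPy array; the class names and filter lengths are hypothetical placeholders, not the tuned values from the paper, and the actual system (see the linked pb_sed repository) is more elaborate.

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical per-class median filter lengths in frames (must be odd for
# scipy.signal.medfilt). The paper tunes these per class and per metric;
# the values below are illustrative only.
filter_lengths = {"Speech": 21, "Dog": 9, "Blender": 41}

def smooth_scores(scores, classes, lengths):
    """Apply a class-specific median filter along the time axis.

    scores:  (num_frames, num_classes) array of frame-level SED posteriors.
    classes: list of class names, one per column of `scores`.
    lengths: dict mapping class name -> odd median filter length in frames.
    """
    smoothed = np.empty_like(scores)
    for i, cls in enumerate(classes):
        # Median filtering removes short spurious activations/deactivations;
        # longer kernels suit long events (e.g. "Blender"), shorter kernels
        # suit short, impulsive events (e.g. "Dog").
        smoothed[:, i] = medfilt(scores[:, i], kernel_size=lengths[cls])
    return smoothed
```

Events would then be obtained by thresholding the smoothed scores and reading off onset/offset frames per class.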
Publishing Year
2021
Proceedings Title
Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)
Pages
226–230
Cite this

Ebbers J, Haeb-Umbach R. Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments. In: Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021). ; 2021:226–230.
Ebbers, J., & Haeb-Umbach, R. (2021). Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments. Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 226–230.
@inproceedings{Ebbers_Haeb-Umbach_2021, place={Barcelona, Spain}, title={Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments}, booktitle={Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)}, author={Ebbers, Janek and Haeb-Umbach, Reinhold}, year={2021}, pages={226–230} }
Ebbers, Janek, and Reinhold Haeb-Umbach. “Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments.” In Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 226–230. Barcelona, Spain, 2021.
J. Ebbers and R. Haeb-Umbach, “Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments,” in Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 2021, pp. 226–230.
Ebbers, Janek, and Reinhold Haeb-Umbach. “Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments.” Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 2021, pp. 226–230.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
template.pdf 239.46 KB
Access Level
Open Access
Last Uploaded
2022-01-13T08:19:50Z

