---
_id: '62163'
abstract:
- lang: eng
  text: Zero-shot classifiers based on Contrastive Language-Audio Pretraining (CLAP)
    models enable classification of given audio into classes defined at test time
    using text. These models are costly to run with respect to computation and memory
    requirements. In this work, we propose to build a specialized low-resource classifier
    for classes pre-defined using text, using a two-stage procedure consisting of
    zero-shot dataset pruning and model compression. First, relevant in-domain data
    is selected from a source dataset using class label embeddings obtained from a
    pre-trained CLAP model. This data is then used to distill the audio encoder of
    a CLAP model. The proposed compression method produces compact audio encoders
    with slightly reduced accuracy. Note that neither labeled nor unlabeled in-domain
    audio data is required for its development. We verify by cross-dataset tests that
    the resulting classifiers are indeed specialized to their task.
author:
- first_name: Alexander
  full_name: Werning, Alexander
  id: '62152'
  last_name: Werning
- first_name: Reinhold
  full_name: Häb-Umbach, Reinhold
  id: '242'
  last_name: Häb-Umbach
citation:
  ama: 'Werning A, Häb-Umbach R. A Fully Zero-Shot Approach to Obtaining Specialized
    and Compact Audio Tagging Models. In: Möller S, Gerkmann T, Kolossa D, eds. <i>Proceedings
    of the 16th ITG Conference on Speech Communication</i>. 2025:76-80.'
  apa: Werning, A., &#38; Häb-Umbach, R. (2025). A Fully Zero-Shot Approach to Obtaining
    Specialized and Compact Audio Tagging Models. In S. Möller, T. Gerkmann, &#38;
    D. Kolossa (Eds.), <i>Proceedings of the 16th ITG Conference on Speech Communication</i>
    (pp. 76–80).
  bibtex: '@inproceedings{Werning_Häb-Umbach_2025, place={Berlin}, title={A Fully
    Zero-Shot Approach to Obtaining Specialized and Compact Audio Tagging Models},
    booktitle={Proceedings of the 16th ITG Conference on Speech Communication}, author={Werning,
    Alexander and Häb-Umbach, Reinhold}, editor={Möller, Sebastian and Gerkmann, Timo
    and Kolossa, Dorothea}, year={2025}, pages={76–80} }'
  chicago: Werning, Alexander, and Reinhold Häb-Umbach. “A Fully Zero-Shot Approach
    to Obtaining Specialized and Compact Audio Tagging Models.” In <i>Proceedings
    of the 16th ITG Conference on Speech Communication</i>, edited by Sebastian Möller,
    Timo Gerkmann, and Dorothea Kolossa, 76–80. Berlin, 2025.
  ieee: A. Werning and R. Häb-Umbach, “A Fully Zero-Shot Approach to Obtaining Specialized
    and Compact Audio Tagging Models,” in <i>Proceedings of the 16th ITG Conference
    on Speech Communication</i>, Berlin, 2025, pp. 76–80.
  mla: Werning, Alexander, and Reinhold Häb-Umbach. “A Fully Zero-Shot Approach to
    Obtaining Specialized and Compact Audio Tagging Models.” <i>Proceedings of the
    16th ITG Conference on Speech Communication</i>, edited by Sebastian Möller et
    al., 2025, pp. 76–80.
  short: 'A. Werning, R. Häb-Umbach, in: S. Möller, T. Gerkmann, D. Kolossa (Eds.),
    Proceedings of the 16th ITG Conference on Speech Communication, Berlin, 2025,
    pp. 76–80.'
conference:
  end_date: 2025-09-26
  location: Berlin
  name: 16th ITG Conference on Speech Communication
  start_date: 2025-09-24
date_created: 2025-11-11T11:46:42Z
date_updated: 2025-11-28T13:20:17Z
department:
- _id: '54'
editor:
- first_name: Sebastian
  full_name: Möller, Sebastian
  last_name: Möller
- first_name: Timo
  full_name: Gerkmann, Timo
  last_name: Gerkmann
- first_name: Dorothea
  full_name: Kolossa, Dorothea
  last_name: Kolossa
language:
- iso: eng
page: 76-80
place: Berlin
project:
- _id: '512'
  name: WestAI - AI Service Center West
publication: Proceedings of the 16th ITG Conference on Speech Communication
publication_identifier:
  unknown:
  - 978-3-8007-6617-8
publication_status: published
quality_controlled: '1'
status: public
title: A Fully Zero-Shot Approach to Obtaining Specialized and Compact Audio Tagging
  Models
type: conference
user_id: '62152'
year: '2025'
...