---
res:
  bibo_abstract:
  - Zero-shot classifiers based on Contrastive Language-Audio Pretraining (CLAP)
    models enable the classification of audio into classes defined at test time
    via text. These models are, however, costly to run in terms of computation
    and memory. In this work, we propose to build a specialized low-resource
    classifier for classes pre-defined by text, using a two-stage procedure
    consisting of zero-shot dataset pruning and model compression. First,
    relevant in-domain data is selected from a source dataset using class label
    embeddings obtained from a pre-trained CLAP model. This data is then used
    to distill the audio encoder of the CLAP model. The proposed compression
    method produces compact audio encoders with only slightly reduced accuracy.
    Notably, neither labeled nor unlabeled in-domain audio data is required for
    its development. Cross-dataset tests verify that the resulting classifiers
    are indeed specialized to their task.@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Alexander
      foaf_name: Werning, Alexander
      foaf_surname: Werning
      foaf_workInfoHomepage: http://www.librecat.org/personId=62152
  - foaf_Person:
      foaf_givenName: Reinhold
      foaf_name: Häb-Umbach, Reinhold
      foaf_surname: Häb-Umbach
      foaf_workInfoHomepage: http://www.librecat.org/personId=242
  dct_date: 2025^xs_gYear
  dct_language: eng
  dct_title: A Fully Zero-Shot Approach to Obtaining Specialized and Compact Audio
    Tagging Models@eng
...
