---
res:
  bibo_abstract:
  - 'In this contribution we show how to exploit text data to support word discovery from audio input in an underresourced target language. Given audio, of which a certain amount is transcribed at the word level, and additional unrelated text data, the approach is able to learn a probabilistic mapping from acoustic units to characters and utilize it to segment the audio data into words without the need for a pronunciation dictionary. This is achieved by three components: an unsupervised acoustic unit discovery system, a supervisedly trained acoustic unit-to-grapheme converter, and a word discovery system, which is initialized with a language model trained on the text data. Experiments for multiple setups show that the initialization of the language model with text data improves the word segmentation performance by a large margin.@eng'
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Thomas
      foaf_name: Glarner, Thomas
      foaf_surname: Glarner
      foaf_workInfoHomepage: http://www.librecat.org/personId=14169
  - foaf_Person:
      foaf_givenName: Benedikt
      foaf_name: Boenninghoff, Benedikt
      foaf_surname: Boenninghoff
  - foaf_Person:
      foaf_givenName: Oliver
      foaf_name: Walter, Oliver
      foaf_surname: Walter
  - foaf_Person:
      foaf_givenName: Reinhold
      foaf_name: Haeb-Umbach, Reinhold
      foaf_surname: Haeb-Umbach
      foaf_workInfoHomepage: http://www.librecat.org/personId=242
  dct_date: 2017^xs_gYear
  dct_language: eng
  dct_title: Leveraging Text Data for Word Segmentation for Underresourced Languages@
...