{"oa":"1","citation":{"ama":"Glarner T, Boenninghoff B, Walter O, Haeb-Umbach R. Leveraging Text Data for Word Segmentation for Underresourced Languages. In: <i>INTERSPEECH 2017, Stockholm, Schweden</i>. ; 2017.","bibtex":"@inproceedings{Glarner_Boenninghoff_Walter_Haeb-Umbach_2017, title={Leveraging Text Data for Word Segmentation for Underresourced Languages}, booktitle={INTERSPEECH 2017, Stockholm, Schweden}, author={Glarner, Thomas and Boenninghoff, Benedikt and Walter, Oliver and Haeb-Umbach, Reinhold}, year={2017} }","chicago":"Glarner, Thomas, Benedikt Boenninghoff, Oliver Walter, and Reinhold Haeb-Umbach. “Leveraging Text Data for Word Segmentation for Underresourced Languages.” In <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","short":"T. Glarner, B. Boenninghoff, O. Walter, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Schweden, 2017.","ieee":"T. Glarner, B. Boenninghoff, O. Walter, and R. Haeb-Umbach, “Leveraging Text Data for Word Segmentation for Underresourced Languages,” in <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017.","apa":"Glarner, T., Boenninghoff, B., Walter, O., &#38; Haeb-Umbach, R. (2017). Leveraging Text Data for Word Segmentation for Underresourced Languages. In <i>INTERSPEECH 2017, Stockholm, Schweden</i>.","mla":"Glarner, Thomas, et al. “Leveraging Text Data for Word Segmentation for Underresourced Languages.” <i>INTERSPEECH 2017, Stockholm, Schweden</i>, 2017."},"user_id":"44006","department":[{"_id":"54"}],"abstract":[{"text":"In this contribution we show how to exploit text data to support word discovery from audio input in an underresourced target language. Given audio, of which a certain amount is transcribed at the word level, and additional unrelated text data, the approach is able to learn a probabilistic mapping from acoustic units to characters and utilize it to segment the audio data into words without the need of a pronunciation dictionary. 
This is achieved by three components: an unsupervised acoustic unit discovery system, a supervisedly trained acoustic unit-to-grapheme converter, and a word discovery system, which is initialized with a language model trained on the text data. Experiments for multiple setups show that the initialization of the language model with text data improves the word segmentation performance by a large margin.","lang":"eng"}],"main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Glarner_paper.pdf","open_access":"1"}],"_id":"11770","language":[{"iso":"eng"}],"year":"2017","status":"public","author":[{"full_name":"Glarner, Thomas","last_name":"Glarner","first_name":"Thomas","id":"14169"},{"full_name":"Boenninghoff, Benedikt","last_name":"Boenninghoff","first_name":"Benedikt"},{"first_name":"Oliver","last_name":"Walter","full_name":"Walter, Oliver"},{"first_name":"Reinhold","id":"242","last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold"}],"date_updated":"2022-01-06T06:51:08Z","related_material":{"link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Glarner_poster.pdf","relation":"supplementary_material","description":"Poster"}]},"publication":"INTERSPEECH 2017, Stockholm, Schweden","date_created":"2019-07-12T05:27:55Z","title":"Leveraging Text Data for Word Segmentation for Underresourced Languages","type":"conference"}