{"date_updated":"2022-01-06T06:51:12Z","language":[{"iso":"eng"}],"abstract":[{"text":"In this paper we show that recently developed algorithms for unsupervised word segmentation can be a valuable tool for the documentation of endangered languages. We applied an unsupervised word segmentation algorithm based on a nested Pitman-Yor language model to two Austronesian languages, Wooi and Waima'a. The algorithm was then modified and parameterized to cater to the needs of linguists for high precision of lexical discovery: We obtained a lexicon precision of 69.2\\% and 67.5\\% for Wooi and Waima'a, respectively, if single-letter words and words found less than three times were discarded. A comparison with an English word segmentation task showed comparable performance, verifying that the assumptions underlying the Pitman-Yor language model, the universality of Zipf's law and the power of n-gram structures, do also hold for languages as exotic as Wooi and Waima'a.","lang":"eng"}],"title":"Lexicon Discovery for Language Preservation using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01)","author":[{"first_name":"Oliver","last_name":"Walter","full_name":"Walter, Oliver"},{"id":"242","first_name":"Reinhold","last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold"},{"first_name":"Jan","full_name":"Strunk, Jan","last_name":"Strunk"},{"first_name":"Nikolaus ","last_name":"P. Himmelmann","full_name":"P. Himmelmann, Nikolaus "}],"year":"2015","main_file_link":[{"url":"https://groups.uni-paderborn.de/nt/pubs/2015/WaHaStHi.pdf","open_access":"1"}],"user_id":"44006","date_created":"2019-07-12T05:30:52Z","_id":"11923","status":"public","department":[{"_id":"54"}],"oa":"1","type":"report","citation":{"ama":"Walter O, Haeb-Umbach R, Strunk J, P. Himmelmann N. Lexicon Discovery for Language Preservation Using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01).; 2015.","ieee":"O. Walter, R. Haeb-Umbach, J. Strunk, and N. P. 
Himmelmann, Lexicon Discovery for Language Preservation using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01). 2015.","bibtex":"@book{Walter_Haeb-Umbach_Strunk_P. Himmelmann_2015, title={Lexicon Discovery for Language Preservation using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01)}, author={Walter, Oliver and Haeb-Umbach, Reinhold and Strunk, Jan and P. Himmelmann, Nikolaus }, year={2015} }","short":"O. Walter, R. Haeb-Umbach, J. Strunk, N. P. Himmelmann, Lexicon Discovery for Language Preservation Using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01), 2015.","apa":"Walter, O., Haeb-Umbach, R., Strunk, J., & P. Himmelmann, N. (2015). Lexicon Discovery for Language Preservation using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01).","chicago":"Walter, Oliver, Reinhold Haeb-Umbach, Jan Strunk, and Nikolaus P. Himmelmann. Lexicon Discovery for Language Preservation Using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01), 2015.","mla":"Walter, Oliver, et al. Lexicon Discovery for Language Preservation Using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01). 2015."}}