{"abstract":[{"lang":"eng","text":"In this paper we show that recently developed algorithms for unsupervised word segmentation can be a valuable tool for the documentation of endangered languages. We applied an unsupervised word segmentation algorithm based on a nested Pitman-Yor language model to two Austronesian languages, Wooi and Waima'a. The algorithm was then modified and parameterized to cater to the needs of linguists for high precision of lexical discovery: We obtained a lexicon precision of 69.2% and 67.5% for Wooi and Waima'a, respectively, if single-letter words and words found fewer than three times were discarded. A comparison with an English word segmentation task showed comparable performance, verifying that the assumptions underlying the Pitman-Yor language model, the universality of Zipf's law and the power of n-gram structures, also hold for languages as exotic as Wooi and Waima'a."}],"year":"2015","author":[{"full_name":"Walter, Oliver","first_name":"Oliver","last_name":"Walter"},{"last_name":"Haeb-Umbach","first_name":"Reinhold","full_name":"Haeb-Umbach, Reinhold"},{"last_name":"Strunk","first_name":"Jan","full_name":"Strunk, Jan"},{"last_name":"P. Himmelmann","full_name":"P. Himmelmann, Nikolaus","first_name":"Nikolaus"}],"title":"Lexicon Discovery for Language Preservation using Unsupervised Word Segmentation with Pitman-Yor Language Models (FGNT-2015-01)","type":"report"}