Unsupervised Word Discovery from Phonetic Input Using Nested Pitman-Yor Language Modeling

O. Walter, R. Haeb-Umbach, S. Chaudhuri, B. Raj, in: IEEE International Conference on Robotics and Automation (ICRA 2013), 2013.

Conference Paper | English
Author
Walter, Oliver; Haeb-Umbach, Reinhold; Chaudhuri, Sourish; Raj, Bhiksha
Abstract
In this paper we consider unsupervised word discovery from phonetic input. We employ a word segmentation algorithm which simultaneously develops a lexicon, i.e., the transcription of each word in terms of a phone sequence, learns an n-gram language model describing word and word sequence probabilities, and carries out the segmentation itself. The underlying statistical model is that of a Pitman-Yor process, a concept known from Bayesian non-parametrics, which allows for an a priori unknown and unlimited number of different words. Using a hierarchy of Pitman-Yor processes, language models of different orders can be employed, and nesting this hierarchy with another hierarchy of Pitman-Yor processes on the phone level allows unknown word unigrams to back off to phone m-grams. We present results on a large-vocabulary task, assuming an error-free phone sequence is given. We finish by discussing options for coping with noisy phone sequences.
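The way a Pitman-Yor process leaves the number of words open can be illustrated with its Chinese restaurant construction. The following is a minimal sketch, not the authors' implementation: it covers a single, non-nested process only and performs no segmentation. The function names and the parameter values (`discount`, `strength`) are assumptions chosen for illustration, and treating each table as one distinct word type is a simplification.

```python
import random


def pitman_yor_crp(n_customers, discount=0.5, strength=1.0, seed=0):
    """Seat customers one by one; return the list of table sizes.

    Hypothetical helper for illustration; each table stands in for
    one distinct word type (a simplifying assumption).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n_customers):
        # Existing table k has weight (c_k - discount); a new table has
        # weight (strength + discount * K). The weights sum to
        # strength + n, where n is the number of customers seated so far.
        weights = [count - discount for count in tables]
        weights.append(strength + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table: a previously unseen word
        else:
            tables[k] += 1
    return tables


def predictive_probs(tables, discount=0.5, strength=1.0):
    """Return (probabilities of existing tables, probability of a new table)."""
    n = sum(tables)
    existing = [(count - discount) / (strength + n) for count in tables]
    new = (strength + discount * len(tables)) / (strength + n)
    return existing, new


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000)
    _, p_new = predictive_probs(sizes)
    print(len(sizes), "tables; probability of a new table:", p_new)
```

Because the new-table probability stays positive however many customers have been seated, the model never commits to a fixed vocabulary size. In a hierarchical or nested setup, the new-table mass would be passed to a lower-order or phone-level model rather than treated as final.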
Publishing Year
2013
Proceedings Title
IEEE International Conference on Robotics and Automation (ICRA 2013)
LibreCat-ID

All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
Restricted Closed Access
External material:
Supplementary Material
Description
Poster
Supplementary Material
Description
Spotlight
