Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices

J. Heymann, O. Walter, R. Haeb-Umbach, B. Raj, in: 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.

Conference Paper | English
Author
Jahn Heymann, Oliver Walter, Reinhold Haeb-Umbach, Bhiksha Raj
Abstract
"In this paper we present an algorithm for the unsupervised segmentation of a lattice produced by a phoneme recognizer into words. Using a lattice rather than a single phoneme string accounts for the uncertainty of the recognizer about the true label sequence. An example application is the discovery of lexical units from the output of an error-prone phoneme recognizer in a zero-resource setting, where neither the lexicon nor the language model (LM) is known. We propose a computationally efficient iterative approach, which alternates between the following two steps: First, the most probable string is extracted from the lattice using a phoneme LM learned on the segmentation result of the previous iteration. Second, word segmentation is performed on the extracted string using a word and phoneme LM which is learned alongside the new segmentation. We present results on lattices produced by a phoneme recognizer on the WSJCAM0 dataset. We show that our approach delivers better segmentation performance than an earlier approach found in the literature, in particular for higher-order language models."
Publishing Year
2014
Proceedings Title
39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014)
LibreCat-ID

Cite this

Heymann J, Walter O, Haeb-Umbach R, Raj B. Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices. In: 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014). 2014.
Heymann, J., Walter, O., Haeb-Umbach, R., & Raj, B. (2014). Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices. In 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014).
@inproceedings{Heymann_Walter_Haeb-Umbach_Raj_2014, title={Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices}, booktitle={39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014)}, author={Heymann, Jahn and Walter, Oliver and Haeb-Umbach, Reinhold and Raj, Bhiksha}, year={2014} }
Heymann, Jahn, Oliver Walter, Reinhold Haeb-Umbach, and Bhiksha Raj. “Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices.” In 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.
J. Heymann, O. Walter, R. Haeb-Umbach, and B. Raj, “Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices,” in 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.
Heymann, Jahn, et al. “Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices.” 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
Restricted Closed Access
External material:
Supplementary Material
Description
Poster