Fusing Audio and Video Information for Online Speaker Diarization

J. Schmalenstroeer, M. Kelling, V. Leutnant, R. Haeb-Umbach, in: Interspeech 2009, 2009.

Conference Paper | English
Author
Joerg Schmalenstroeer, Martin Kelling, Volker Leutnant, Reinhold Haeb-Umbach
Abstract
In this paper we present a system for identifying and localizing speakers using distant microphone arrays and a steerable pan-tilt-zoom camera. Audio and video streams are processed in real time to obtain the diarization information “who speaks when and where” with low latency, to be used in advanced video conferencing systems or user-adaptive interfaces. A key feature of the proposed system is to first glean information about the speaker’s location and identity from the audio and visual data streams separately and then to fuse these data in a probabilistic framework employing the Viterbi algorithm. Here, visual evidence of a person is utilized through a priori state probabilities, while location and speaker change information are employed via time-variant transition probabilities. Experiments show that video information yields a substantial improvement compared to purely audio-based diarization.
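The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one hidden state per speaker, per-frame audio log-likelihoods, a per-frame video-derived prior that is simply multiplied into the state score, and one transition matrix per frame. The function name `viterbi_fusion` and all parameter names are hypothetical, and the sketch decodes a complete sequence offline, whereas the paper targets online, low-latency operation.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_fusion(audio_loglik, video_prior, trans):
    """Return the most likely speaker index for every frame.

    All three inputs are illustrative assumptions, not the paper's interface:
      audio_loglik[t][s]  log-likelihood of audio frame t under speaker s
      video_prior[t][s]   a priori probability of speaker s at frame t,
                          e.g. derived from visual evidence of a person
      trans[t][i][j]      time-variant probability of moving from speaker i
                          at frame t-1 to speaker j at frame t (trans[0] unused)
    """
    num_frames = len(audio_loglik)
    num_speakers = len(audio_loglik[0])

    # Initialisation: video prior combined with the first audio frame.
    score = [_log(video_prior[0][s]) + audio_loglik[0][s]
             for s in range(num_speakers)]
    backpointers = []

    for t in range(1, num_frames):
        new_score, pointers = [], []
        for j in range(num_speakers):
            # Best predecessor under the transition matrix of this frame.
            best_i = max(range(num_speakers),
                         key=lambda i: score[i] + _log(trans[t][i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i]
                             + _log(trans[t][best_i][j])
                             + _log(video_prior[t][j])
                             + audio_loglik[t][j])
        score = new_score
        backpointers.append(pointers)

    # Backtracking from the best final state.
    state = max(range(num_speakers), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, a frame whose audio scores are ambiguous can still be assigned to a speaker when the video prior favors that speaker, and lowering the self-transition probability at frames where a location or speaker change is detected makes a switch cheaper exactly there.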
Publishing Year
2009
Proceedings Title
Interspeech 2009
LibreCat-ID

Cite this

Schmalenstroeer J, Kelling M, Leutnant V, Haeb-Umbach R. Fusing Audio and Video Information for Online Speaker Diarization. In: Interspeech 2009. 2009.
Schmalenstroeer, J., Kelling, M., Leutnant, V., & Haeb-Umbach, R. (2009). Fusing Audio and Video Information for Online Speaker Diarization. In Interspeech 2009.
@inproceedings{Schmalenstroeer_Kelling_Leutnant_Haeb-Umbach_2009, title={Fusing Audio and Video Information for Online Speaker Diarization}, booktitle={Interspeech 2009}, author={Schmalenstroeer, Joerg and Kelling, Martin and Leutnant, Volker and Haeb-Umbach, Reinhold}, year={2009} }
Schmalenstroeer, Joerg, Martin Kelling, Volker Leutnant, and Reinhold Haeb-Umbach. “Fusing Audio and Video Information for Online Speaker Diarization.” In Interspeech 2009, 2009.
J. Schmalenstroeer, M. Kelling, V. Leutnant, and R. Haeb-Umbach, “Fusing Audio and Video Information for Online Speaker Diarization,” in Interspeech 2009, 2009.
Schmalenstroeer, Joerg, et al. “Fusing Audio and Video Information for Online Speaker Diarization.” Interspeech 2009, 2009.

Access Level
Restricted Closed Access
