---
_id: '11892'
abstract:
- lang: eng
  text: For an environment to be perceived as being smart, contextual information
    has to be gathered to adapt the system's behavior and its interface towards the
    user. Being a rich source of context information, speech can be acquired unobtrusively
    by microphone arrays and then processed to extract information about the user
    and his environment. In this paper, a system for joint temporal segmentation,
    speaker localization, and identification is presented, which is supported by face
    identification from video data obtained from a steerable camera. Special attention
    is paid to latency aspects and online processing capabilities, as they are important
    for the application under investigation, namely ambient communication. It describes
    the vision of terminal-less, session-less and multi-modal telecommunication with
    remote partners, where the user can move freely within his home while the communication
    follows him. The speaker diarization serves as a context source, which has been
    integrated in a service-oriented middleware architecture and provided to the application
    to select the most appropriate I/O device and to steer the camera towards the
    speaker during ambient communication.
author:
- first_name: Joerg
  full_name: Schmalenstroeer, Joerg
  id: '460'
  last_name: Schmalenstroeer
- first_name: Reinhold
  full_name: Haeb-Umbach, Reinhold
  id: '242'
  last_name: Haeb-Umbach
citation:
  ama: Schmalenstroeer J, Haeb-Umbach R. Online Diarization of Streaming Audio-Visual
    Data for Smart Environments. <i>IEEE Journal of Selected Topics in Signal Processing</i>.
    2010;4(5):845-856. doi:<a href="https://doi.org/10.1109/JSTSP.2010.2050519">10.1109/JSTSP.2010.2050519</a>
  apa: Schmalenstroeer, J., &#38; Haeb-Umbach, R. (2010). Online Diarization of Streaming
    Audio-Visual Data for Smart Environments. <i>IEEE Journal of Selected Topics in
    Signal Processing</i>, <i>4</i>(5), 845–856. <a href="https://doi.org/10.1109/JSTSP.2010.2050519">https://doi.org/10.1109/JSTSP.2010.2050519</a>
  bibtex: '@article{Schmalenstroeer_Haeb-Umbach_2010, title={Online Diarization of
    Streaming Audio-Visual Data for Smart Environments}, volume={4}, DOI={<a href="https://doi.org/10.1109/JSTSP.2010.2050519">10.1109/JSTSP.2010.2050519</a>},
    number={5}, journal={IEEE Journal of Selected Topics in Signal Processing}, author={Schmalenstroeer,
    Joerg and Haeb-Umbach, Reinhold}, year={2010}, pages={845–856} }'
  chicago: 'Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Online Diarization
    of Streaming Audio-Visual Data for Smart Environments.” <i>IEEE Journal of Selected
    Topics in Signal Processing</i> 4, no. 5 (2010): 845–56. <a href="https://doi.org/10.1109/JSTSP.2010.2050519">https://doi.org/10.1109/JSTSP.2010.2050519</a>.'
  ieee: 'J. Schmalenstroeer and R. Haeb-Umbach, “Online Diarization of Streaming Audio-Visual
    Data for Smart Environments,” <i>IEEE Journal of Selected Topics in Signal Processing</i>,
    vol. 4, no. 5, pp. 845–856, 2010, doi: <a href="https://doi.org/10.1109/JSTSP.2010.2050519">10.1109/JSTSP.2010.2050519</a>.'
  mla: Schmalenstroeer, Joerg, and Reinhold Haeb-Umbach. “Online Diarization of Streaming
    Audio-Visual Data for Smart Environments.” <i>IEEE Journal of Selected Topics
    in Signal Processing</i>, vol. 4, no. 5, 2010, pp. 845–56, doi:<a href="https://doi.org/10.1109/JSTSP.2010.2050519">10.1109/JSTSP.2010.2050519</a>.
  short: J. Schmalenstroeer, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal
    Processing 4 (2010) 845–856.
date_created: 2019-07-12T05:30:16Z
date_updated: 2023-10-26T08:10:18Z
department:
- _id: '54'
doi: 10.1109/JSTSP.2010.2050519
intvolume: '         4'
issue: '5'
keyword:
- audio streaming
- audio visual data streaming
- context information speech
- face identification
- face recognition
- image segmentation
- middleware
- multimodal telecommunication
- online diarization
- service oriented middleware architecture
- sessionless telecommunication
- software architecture
- speaker identification
- speaker localization
- speaker recognition
- steerable camera
- telecommunication computing
- temporal segmentation
- terminal-less telecommunication
- video streaming
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://groups.uni-paderborn.de/nt/pubs/2010/ScHa10.pdf
oa: '1'
page: 845-856
publication: IEEE Journal of Selected Topics in Signal Processing
quality_controlled: '1'
status: public
title: Online Diarization of Streaming Audio-Visual Data for Smart Environments
type: journal_article
user_id: '460'
volume: 4
year: '2010'
...