---
_id: '26770'
abstract:
- lang: eng
text: "Automatic transcription of meetings requires handling of overlapped speech,
which calls for continuous speech separation (CSS) systems. The uPIT criterion
was proposed for utterance-level separation with neural networks and introduces
the constraint that the total number of speakers must not exceed the number of
output channels. When processing meeting-like data in a segment-wise manner, i.e.,
by separating overlapping segments independently and stitching adjacent segments
to continuous output streams, this constraint has to be fulfilled for any segment.
In this contribution, we show that this constraint can be significantly relaxed.
We propose a novel graph-based PIT criterion, which casts the assignment of utterances
to output channels as a graph coloring problem. It only requires that the number
of concurrently active speakers must not exceed the number of output channels.
As a consequence, the system can process an arbitrary number of speakers and arbitrarily
long segments and thus can handle more diverse scenarios. Further, the stitching
algorithm for obtaining a consistent output order in neighboring segments is of
less importance and can even be eliminated completely, which not least reduces
the computational effort. Experiments on meeting-style WSJ data show improvements
in recognition performance over using the uPIT criterion."
author:
- first_name: Thilo
full_name: von Neumann, Thilo
id: '49870'
last_name: von Neumann
orcid: https://orcid.org/0000-0002-7717-8670
- first_name: Keisuke
full_name: Kinoshita, Keisuke
last_name: Kinoshita
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Marc
full_name: Delcroix, Marc
last_name: Delcroix
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'von Neumann T, Kinoshita K, Boeddeker C, Delcroix M, Haeb-Umbach R. Graph-PIT:
Generalized Permutation Invariant Training for Continuous Separation of Arbitrary
Numbers of Speakers. In: Interspeech 2021. 2021. doi:10.21437/interspeech.2021-1177'
apa: 'von Neumann, T., Kinoshita, K., Boeddeker, C., Delcroix, M., & Haeb-Umbach,
R. (2021). Graph-PIT: Generalized Permutation Invariant Training for Continuous
Separation of Arbitrary Numbers of Speakers. Interspeech 2021. Interspeech.
https://doi.org/10.21437/interspeech.2021-1177'
bibtex: '@inproceedings{von Neumann_Kinoshita_Boeddeker_Delcroix_Haeb-Umbach_2021,
title={Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
of Arbitrary Numbers of Speakers}, DOI={10.21437/interspeech.2021-1177},
booktitle={Interspeech 2021}, author={von Neumann, Thilo and Kinoshita, Keisuke
and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach, Reinhold}, year={2021}
}'
chicago: 'Neumann, Thilo von, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix,
and Reinhold Haeb-Umbach. “Graph-PIT: Generalized Permutation Invariant Training
for Continuous Separation of Arbitrary Numbers of Speakers.” In Interspeech
2021, 2021. https://doi.org/10.21437/interspeech.2021-1177.'
ieee: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach,
“Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
of Arbitrary Numbers of Speakers,” presented at the Interspeech, 2021, doi: 10.21437/interspeech.2021-1177.'
mla: 'von Neumann, Thilo, et al. “Graph-PIT: Generalized Permutation Invariant Training
for Continuous Separation of Arbitrary Numbers of Speakers.” Interspeech 2021,
2021, doi:10.21437/interspeech.2021-1177.'
short: 'T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix, R. Haeb-Umbach,
in: Interspeech 2021, 2021.'
conference:
name: Interspeech
date_created: 2021-10-25T08:50:01Z
date_updated: 2023-11-15T12:14:40Z
ddc:
- '000'
department:
- _id: '54'
doi: 10.21437/interspeech.2021-1177
file:
- access_level: open_access
content_type: video/mp4
creator: tvn
date_created: 2021-12-06T10:39:13Z
date_updated: 2021-12-06T10:48:30Z
file_id: '28327'
file_name: Interspeech 2021 voiceover-002-compressed.mp4
file_size: 9550220
relation: supplementary_material
title: Video for INTERSPEECH 2021
- access_level: open_access
content_type: application/vnd.openxmlformats-officedocument.presentationml.presentation
creator: tvn
date_created: 2021-12-06T10:47:01Z
date_updated: 2021-12-06T10:47:01Z
file_id: '28328'
file_name: Graph-PIT-poster-presentation.pptx
file_size: 1337297
relation: slides
title: Slides from INTERSPEECH 2021
- access_level: open_access
content_type: application/pdf
creator: tvn
date_created: 2021-12-06T10:48:21Z
date_updated: 2021-12-06T10:48:21Z
file_id: '28329'
file_name: INTERSPEECH2021_Graph_PIT.pdf
file_size: 226589
relation: main_file
file_date_updated: 2021-12-06T10:48:30Z
has_accepted_license: '1'
keyword:
- Continuous speech separation
- automatic speech recognition
- overlapped speech
- permutation invariant training
language:
- iso: eng
oa: '1'
project:
- _id: '52'
name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Interspeech 2021
publication_status: published
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/fgnt/graph_pit
status: public
title: 'Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation
of Arbitrary Numbers of Speakers'
type: conference
user_id: '49870'
year: '2021'
...