{"quality_controlled":"1","language":[{"iso":"eng"}],"abstract":[{"lang":"eng","text":"Recent speaker diarization studies showed that integration of end-to-end neural diarization (EEND) and clustering-based diarization is a promising approach for achieving state-of-the-art performance on various tasks. Such an approach first divides an observed signal into fixed-length segments, then performs {\\it segment-level} local diarization based on an EEND module, and merges the segment-level results via clustering to form a final global diarization result. The segmentation is done to limit the number of speakers in each segment since the current EEND cannot handle a large number of speakers. In this paper, we argue that such an approach involving the segmentation has several issues; for example, it inevitably faces a dilemma that larger segment sizes increase both the context available for enhancing the performance and the number of speakers for the local EEND module to handle. To resolve such a problem, this paper proposes a novel framework that performs diarization without segmentation. However, it can still handle challenging data containing many speakers and a significant amount of overlapping speech. The proposed method can take an entire meeting for inference and perform {\\it utterance-by-utterance} diarization that clusters utterance activities in terms of speakers. To this end, we leverage a neural network training scheme called Graph-PIT proposed recently for neural source separation. Experiments with simulated active-meeting-like data and CALLHOME data show the superiority of the proposed approach over the conventional methods."}],"doi":"10.21437/Interspeech.2022-11408","publication_status":"published","department":[{"_id":"54"}],"status":"public","author":[{"full_name":"Kinoshita, Keisuke","last_name":"Kinoshita","first_name":"Keisuke"},{"full_name":"von Neumann, Thilo","orcid":"https://orcid.org/0000-0002-7717-8670","last_name":"von Neumann","id":"49870","first_name":"Thilo"},{"last_name":"Delcroix","full_name":"Delcroix, Marc","first_name":"Marc"},{"first_name":"Christoph","id":"40767","last_name":"Boeddeker","full_name":"Boeddeker, Christoph"},{"first_name":"Reinhold","id":"242","last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold"}],"type":"conference","conference":{"name":"Interspeech 2022"},"date_created":"2022-10-28T12:07:57Z","date_updated":"2025-02-12T09:09:05Z","page":"1486-1490","year":"2022","title":"Utterance-by-utterance overlap-aware neural diarization with Graph-PIT","citation":{"ama":"Kinoshita K, von Neumann T, Delcroix M, Boeddeker C, Haeb-Umbach R. Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. In: Proc. Interspeech 2022. ISCA; 2022:1486-1490. doi:10.21437/Interspeech.2022-11408","apa":"Kinoshita, K., von Neumann, T., Delcroix, M., Boeddeker, C., & Haeb-Umbach, R. (2022). Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. Proc. Interspeech 2022, 1486–1490. https://doi.org/10.21437/Interspeech.2022-11408","chicago":"Kinoshita, Keisuke, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Utterance-by-Utterance Overlap-Aware Neural Diarization with Graph-PIT.” In Proc. Interspeech 2022, 1486–90. ISCA, 2022. https://doi.org/10.21437/Interspeech.2022-11408.","short":"K. Kinoshita, T. von Neumann, M. Delcroix, C. Boeddeker, R. Haeb-Umbach, in: Proc. Interspeech 2022, ISCA, 2022, pp. 1486–1490.","bibtex":"@inproceedings{Kinoshita_von Neumann_Delcroix_Boeddeker_Haeb-Umbach_2022, title={Utterance-by-utterance overlap-aware neural diarization with Graph-PIT}, DOI={10.21437/Interspeech.2022-11408}, booktitle={Proc. Interspeech 2022}, publisher={ISCA}, author={Kinoshita, Keisuke and von Neumann, Thilo and Delcroix, Marc and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2022}, pages={1486–1490} }","mla":"Kinoshita, Keisuke, et al. “Utterance-by-Utterance Overlap-Aware Neural Diarization with Graph-PIT.” Proc. Interspeech 2022, ISCA, 2022, pp. 1486–90, doi:10.21437/Interspeech.2022-11408.","ieee":"K. Kinoshita, T. von Neumann, M. Delcroix, C. Boeddeker, and R. Haeb-Umbach, “Utterance-by-utterance overlap-aware neural diarization with Graph-PIT,” in Proc. Interspeech 2022, 2022, pp. 1486–1490, doi: 10.21437/Interspeech.2022-11408."},"_id":"33958","user_id":"40767","publication":"Proc. Interspeech 2022","publisher":"ISCA","main_file_link":[{"url":"https://www.isca-archive.org/interspeech_2022/kinoshita22_interspeech.pdf"}]}