{"oa":"1","abstract":[{"lang":"eng","text":"Recently, source separation performance was greatly improved by time-domain audio source separation based on the dual-path recurrent neural network (DPRNN). DPRNN is a simple but effective model for long sequential data. While DPRNN is quite efficient in modeling sequential data of the length of an utterance, i.e., about 5 to 10 seconds, it is harder to apply it to longer sequences such as whole conversations consisting of multiple utterances. This is simply because, in such a case, the number of time steps consumed by its internal module called inter-chunk RNN becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN, that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (≥ 3) time-resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationships only within a phoneme, while the RNN sub-module handling the coarsest resolution may capture only the relationship between utterances such as speaker information. We perform experiments using simulated dialogue-like mixtures and show that MPRNN has greater model capacity, and it outperforms the current state-of-the-art DPRNN framework especially in online processing scenarios."}],"language":[{"iso":"eng"}],"year":"2020","title":"Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation","date_updated":"2023-01-11T11:24:31Z","file_date_updated":"2020-12-16T14:16:32Z","ddc":["000"],"has_accepted_license":"1","page":"2652-2656","doi":"10.21437/Interspeech.2020-2388","publication":"Proc. 
Interspeech 2020","status":"public","file":[{"creator":"huesera","date_updated":"2020-12-16T14:16:32Z","file_id":"20767","content_type":"application/pdf","file_size":1725219,"date_created":"2020-12-16T14:16:32Z","relation":"main_file","access_level":"open_access","file_name":"INTERSPEECH_2020_vonNeumann1_Paper.pdf"}],"user_id":"59789","citation":{"chicago":"Kinoshita, Keisuke, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and Its Application to Speaker Stream Separation.” In Proc. Interspeech 2020, 2652–56, 2020. https://doi.org/10.21437/Interspeech.2020-2388.","bibtex":"@inproceedings{Kinoshita_von Neumann_Delcroix_Nakatani_Haeb-Umbach_2020, title={Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation}, DOI={10.21437/Interspeech.2020-2388}, booktitle={Proc. Interspeech 2020}, author={Kinoshita, Keisuke and von Neumann, Thilo and Delcroix, Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2020}, pages={2652–2656} }","ama":"Kinoshita K, von Neumann T, Delcroix M, Nakatani T, Haeb-Umbach R. Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation. In: Proc. Interspeech 2020. ; 2020:2652-2656. doi:10.21437/Interspeech.2020-2388","apa":"Kinoshita, K., von Neumann, T., Delcroix, M., Nakatani, T., & Haeb-Umbach, R. (2020). Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation. Proc. Interspeech 2020, 2652–2656. https://doi.org/10.21437/Interspeech.2020-2388","short":"K. Kinoshita, T. von Neumann, M. Delcroix, T. Nakatani, R. Haeb-Umbach, in: Proc. Interspeech 2020, 2020, pp. 2652–2656.","ieee":"K. Kinoshita, T. von Neumann, M. Delcroix, T. Nakatani, and R. 
Haeb-Umbach, “Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation,” in Proc. Interspeech 2020, 2020, pp. 2652–2656, doi: 10.21437/Interspeech.2020-2388.","mla":"Kinoshita, Keisuke, et al. “Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and Its Application to Speaker Stream Separation.” Proc. Interspeech 2020, 2020, pp. 2652–56, doi:10.21437/Interspeech.2020-2388."},"type":"conference","author":[{"last_name":"Kinoshita","full_name":"Kinoshita, Keisuke","first_name":"Keisuke"},{"full_name":"von Neumann, Thilo","first_name":"Thilo","orcid":"https://orcid.org/0000-0002-7717-8670","id":"49870","last_name":"von Neumann"},{"first_name":"Marc","full_name":"Delcroix, Marc","last_name":"Delcroix"},{"first_name":"Tomohiro","full_name":"Nakatani, Tomohiro","last_name":"Nakatani"},{"last_name":"Haeb-Umbach","full_name":"Haeb-Umbach, Reinhold","first_name":"Reinhold","id":"242"}],"date_created":"2020-12-16T14:15:24Z","department":[{"_id":"54"}],"_id":"20766"}