Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR
T. von Neumann, C. Boeddeker, L. Drude, K. Kinoshita, M. Delcroix, T. Nakatani, R. Haeb-Umbach, in: Proc. Interspeech 2020, 2020, pp. 3097–3101.
            Conference Paper | English
        Author
      von Neumann, Thilo;
      Boeddeker, Christoph;
      Drude, Lukas;
      Kinoshita, Keisuke;
      Delcroix, Marc;
      Nakatani, Tomohiro;
      Haeb-Umbach, Reinhold
    Abstract
    Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among other results, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it saw during training, as shown in experiments with the WSJ0-4mix database.
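The abstract describes the system only at a high level. As a rough illustration of how iterative extraction can double as source counting, here is a minimal Python sketch: each iteration peels one speaker off the residual, and the number of iterations before stopping is the estimated speaker count. The stopping rule (thresholding the residual's energy relative to the mixture), the threshold values, and the names `iterative_extraction` and `make_oracle_extractor` are hypothetical choices for illustration, not the authors' counting mechanism; the oracle extractor is a toy stand-in for a trained separation network.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def energy(x: Signal) -> float:
    """Mean power of a signal (0.0 for an empty signal)."""
    return sum(v * v for v in x) / len(x) if x else 0.0


def iterative_extraction(
    mixture: Signal,
    extract_one: Callable[[Signal], Tuple[Signal, Signal]],
    stop_threshold: float,
    max_sources: int = 10,
) -> List[Signal]:
    """Extract one source per iteration until the residual looks empty.

    Hypothetical stopping rule: stop once the residual's energy, relative
    to the input mixture, falls below stop_threshold (must be > 0).
    The estimated speaker count is len(result).
    """
    mix_energy = energy(mixture)
    estimates: List[Signal] = []
    residual = mixture
    for _ in range(max_sources):
        if mix_energy == 0.0 or energy(residual) / mix_energy < stop_threshold:
            break
        source, residual = extract_one(residual)
        estimates.append(source)
    return estimates


def make_oracle_extractor(
    sources: List[Signal],
) -> Callable[[Signal], Tuple[Signal, Signal]]:
    """Toy stand-in for a trained network: removes the loudest remaining true source."""
    remaining = list(sources)

    def extract_one(residual: Signal) -> Tuple[Signal, Signal]:
        idx = max(range(len(remaining)), key=lambda i: energy(remaining[i]))
        source = remaining.pop(idx)
        return source, [r - s for r, s in zip(residual, source)]

    return extract_one


sources = [
    [1.0, -1.0, 1.0, -1.0],  # loud speaker
    [0.5, 0.5, -0.5, -0.5],  # medium speaker
    [0.2, 0.0, -0.2, 0.0],   # quiet speaker
]
mixture = [sum(samples) for samples in zip(*sources)]

print(len(iterative_extraction(mixture, make_oracle_extractor(sources), 0.01)))  # 3
print(len(iterative_extraction(mixture, make_oracle_extractor(sources), 0.05)))  # 2
```

Raising the threshold from 0.01 to 0.05 makes the loop stop before the quietest speaker is extracted, which shows why the choice of stopping criterion directly affects counting accuracy.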
    
  Publishing Year
    2020
  Proceedings Title
    Proc. Interspeech 2020
  Page
      3097-3101
    LibreCat-ID
    
  Cite this
von Neumann T, Boeddeker C, Drude L, et al. Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR. In: Proc. Interspeech 2020; 2020:3097-3101. doi:10.21437/Interspeech.2020-2519
    von Neumann, T., Boeddeker, C., Drude, L., Kinoshita, K., Delcroix, M., Nakatani, T., & Haeb-Umbach, R. (2020). Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR. Proc. Interspeech 2020, 3097–3101. https://doi.org/10.21437/Interspeech.2020-2519
    @inproceedings{vonNeumann_Boeddeker_Drude_Kinoshita_Delcroix_Nakatani_Haeb-Umbach_2020, title={Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR}, DOI={10.21437/Interspeech.2020-2519}, booktitle={Proc. Interspeech 2020}, author={von Neumann, Thilo and Boeddeker, Christoph and Drude, Lukas and Kinoshita, Keisuke and Delcroix, Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2020}, pages={3097--3101} }
    Neumann, Thilo von, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.” In Proc. Interspeech 2020, 3097–3101, 2020. https://doi.org/10.21437/Interspeech.2020-2519.
    T. von Neumann et al., “Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR,” in Proc. Interspeech 2020, 2020, pp. 3097–3101, doi: 10.21437/Interspeech.2020-2519.
    von Neumann, Thilo, et al. “Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.” Proc. Interspeech 2020, 2020, pp. 3097–101, doi:10.21437/Interspeech.2020-2519.
  All files available under the following license(s):
    Creative Commons Public Domain Dedication (CC0 1.0)
  Main File(s)
  File Name
    INTERSPEECH_2020_vonNeumann_Paper.pdf (267.89 KB)
  Access Level
    Open Access
  Last Uploaded
    2020-12-16T14:14:14Z