ESPnet: End-to-End Speech Processing Toolkit
S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in: INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211.
Conference Paper | English
Author
Watanabe, Shinji;
Hori, Takaaki;
Karita, Shigeki;
Hayashi, Tomoki;
Nishitoba, Jiro;
Unno, Yuya;
Enrique Yalta Soplin, Nelson;
Heymann, Jahn;
Wiesner, Matthew;
Chen, Nanxin;
Renduchintala, Adithya;
Ochiai, Tsubasa
Abstract
This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architecture of this software platform, several important functionalities, which differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks.
Publishing Year
2018
Proceedings Title
INTERSPEECH 2018, Hyderabad, India
Page
2207–2211
LibreCat-ID
Cite this
Watanabe S, Hori T, Karita S, et al. ESPnet: End-to-End Speech Processing Toolkit. In: INTERSPEECH 2018, Hyderabad, India. ; 2018:2207–2211. doi:10.21437/Interspeech.2018-1456
Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A., & Ochiai, T. (2018). ESPnet: End-to-End Speech Processing Toolkit. INTERSPEECH 2018, Hyderabad, India, 2207–2211. https://doi.org/10.21437/Interspeech.2018-1456
@inproceedings{Watanabe_Hori_Karita_Hayashi_Nishitoba_Unno_Enrique Yalta Soplin_Heymann_Wiesner_Chen_et al._2018, title={ESPnet: End-to-End Speech Processing Toolkit}, DOI={10.21437/Interspeech.2018-1456}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Watanabe, Shinji and Hori, Takaaki and Karita, Shigeki and Hayashi, Tomoki and Nishitoba, Jiro and Unno, Yuya and Enrique Yalta Soplin, Nelson and Heymann, Jahn and Wiesner, Matthew and Chen, Nanxin and et al.}, year={2018}, pages={2207–2211} }
Watanabe, Shinji, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, et al. “ESPnet: End-to-End Speech Processing Toolkit.” In INTERSPEECH 2018, Hyderabad, India, 2207–2211, 2018. https://doi.org/10.21437/Interspeech.2018-1456.
S. Watanabe et al., “ESPnet: End-to-End Speech Processing Toolkit,” in INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi: 10.21437/Interspeech.2018-1456.
Watanabe, Shinji, et al. “ESPnet: End-to-End Speech Processing Toolkit.” INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi:10.21437/Interspeech.2018-1456.
All files available under the following license(s):
Creative Commons Public Domain Dedication (CC0 1.0)
Main File(s)
File Name
INTERSPEECH_2018_Heymann_Paper.pdf
288.91 KB
Access Level
Open Access
Last Uploaded
2022-02-23T08:03:13Z