ESPnet: End-to-End Speech Processing Toolkit

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in: INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211.

OA INTERSPEECH_2018_Heymann_Paper.pdf 288.91 KB
Conference Paper | English
Watanabe, Shinji; Hori, Takaaki; Karita, Shigeki; Hayashi, Tomoki; Nishitoba, Jiro; Unno, Yuya; Enrique Yalta Soplin, Nelson; Heymann, Jahn; Wiesner, Matthew; Chen, Nanxin; Renduchintala, Adithya; Ochiai, Tsubasa
This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR) and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engines. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open source ASR toolkits, and experimental results on major ASR benchmarks.
Publishing Year
2018
Proceedings Title
INTERSPEECH 2018, Hyderabad, India

Cite this

Watanabe S, Hori T, Karita S, et al. ESPnet: End-to-End Speech Processing Toolkit. In: INTERSPEECH 2018, Hyderabad, India. 2018:2207–2211. doi:10.21437/Interspeech.2018-1456
Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A., & Ochiai, T. (2018). ESPnet: End-to-End Speech Processing Toolkit. INTERSPEECH 2018, Hyderabad, India, 2207–2211.
@inproceedings{Watanabe_et_al_2018, title={ESPnet: End-to-End Speech Processing Toolkit}, DOI={10.21437/Interspeech.2018-1456}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Watanabe, Shinji and Hori, Takaaki and Karita, Shigeki and Hayashi, Tomoki and Nishitoba, Jiro and Unno, Yuya and Enrique Yalta Soplin, Nelson and Heymann, Jahn and Wiesner, Matthew and Chen, Nanxin and Renduchintala, Adithya and Ochiai, Tsubasa}, year={2018}, pages={2207--2211} }
Watanabe, Shinji, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, et al. “ESPnet: End-to-End Speech Processing Toolkit.” In INTERSPEECH 2018, Hyderabad, India, 2207–2211, 2018.
S. Watanabe et al., “ESPnet: End-to-End Speech Processing Toolkit,” in INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi: 10.21437/Interspeech.2018-1456.
Watanabe, Shinji, et al. “ESPnet: End-to-End Speech Processing Toolkit.” INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi:10.21437/Interspeech.2018-1456.
Access Level
OA Open Access