ESPnet: End-to-End Speech Processing Toolkit

Watanabe, Shinji; Hori, Takaaki; Karita, Shigeki; Hayashi, Tomoki; Nishitoba, Jiro; Unno, Yuya; Enrique Yalta Soplin, Nelson; Heymann, Jahn; Wiesner, Matthew; Chen, Nanxin; Renduchintala, Adithya; Ochiai, Tsubasa

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in: INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211.

Download

INTERSPEECH_2018_Heymann_Paper.pdf 288.91 KB

DOI

10.21437/Interspeech.2018-1456

Conference Paper | English

Department

Nachrichtentechnik (NT) / Heinz Nixdorf Institut

Abstract

This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engines. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open source ASR toolkits, and experimental results on major ASR benchmarks.

Publishing Year

2018

Proceedings Title

INTERSPEECH 2018, Hyderabad, India

Page

2207–2211

LibreCat-ID

29923

Cite this

Watanabe S, Hori T, Karita S, et al. ESPnet: End-to-End Speech Processing Toolkit. In: INTERSPEECH 2018, Hyderabad, India; 2018:2207–2211. doi:10.21437/Interspeech.2018-1456

Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A., & Ochiai, T. (2018). ESPnet: End-to-End Speech Processing Toolkit. INTERSPEECH 2018, Hyderabad, India, 2207–2211. https://doi.org/10.21437/Interspeech.2018-1456

@inproceedings{Watanabe_Hori_Karita_Hayashi_Nishitoba_Unno_EnriqueYaltaSoplin_Heymann_Wiesner_Chen_etal_2018,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  DOI={10.21437/Interspeech.2018-1456},
  booktitle={INTERSPEECH 2018, Hyderabad, India},
  author={Watanabe, Shinji and Hori, Takaaki and Karita, Shigeki and Hayashi, Tomoki and Nishitoba, Jiro and Unno, Yuya and Enrique Yalta Soplin, Nelson and Heymann, Jahn and Wiesner, Matthew and Chen, Nanxin and Renduchintala, Adithya and Ochiai, Tsubasa},
  year={2018},
  pages={2207--2211}
}

Watanabe, Shinji, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, et al. “ESPnet: End-to-End Speech Processing Toolkit.” In INTERSPEECH 2018, Hyderabad, India, 2207–2211, 2018. https://doi.org/10.21437/Interspeech.2018-1456.

S. Watanabe et al., “ESPnet: End-to-End Speech Processing Toolkit,” in INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi: 10.21437/Interspeech.2018-1456.

Watanabe, Shinji, et al. “ESPnet: End-to-End Speech Processing Toolkit.” INTERSPEECH 2018, Hyderabad, India, 2018, pp. 2207–2211, doi:10.21437/Interspeech.2018-1456.

All files available under the following license(s):

Creative Commons Public Domain Dedication (CC0 1.0):

https://creativecommons.org/publicdomain/zero/1.0/
https://creativecommons.org/publicdomain/zero/1.0/legalcode

Main File(s)

File Name

INTERSPEECH_2018_Heymann_Paper.pdf 288.91 KB

Access Level

Open Access

Last Uploaded

2022-02-23T08:03:13Z