[{"file_date_updated":"2023-10-19T07:41:56Z","project":[{"name":"PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing","_id":"52"},{"_id":"508","name":"Automatische Transkription von Gesprächssituationen","grant_number":"448568305"}],"_id":"48281","user_id":"40767","department":[{"_id":"54"}],"status":"public","type":"conference","main_file_link":[{"url":"https://ieeexplore.ieee.org/document/10094784"}],"doi":"10.1109/icassp49357.2023.10094784","oa":"1","date_updated":"2025-02-12T09:16:34Z","author":[{"full_name":"von Neumann, Thilo","id":"49870","last_name":"von Neumann","orcid":"https://orcid.org/0000-0002-7717-8670","first_name":"Thilo"},{"first_name":"Christoph","full_name":"Boeddeker, Christoph","id":"40767","last_name":"Boeddeker"},{"first_name":"Keisuke","last_name":"Kinoshita","full_name":"Kinoshita, Keisuke"},{"full_name":"Delcroix, Marc","last_name":"Delcroix","first_name":"Marc"},{"first_name":"Reinhold","id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach"}],"citation":{"mla":"von Neumann, Thilo, et al. “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.” <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, IEEE, 2023, doi:<a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">10.1109/icassp49357.2023.10094784</a>.","short":"T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, R. Haeb-Umbach, in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2023.","bibtex":"@inproceedings{von Neumann_Boeddeker_Kinoshita_Delcroix_Haeb-Umbach_2023, title={On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems}, DOI={<a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">10.1109/icassp49357.2023.10094784</a>}, booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, publisher={IEEE}, author={von Neumann, Thilo and Boeddeker, Christoph and Kinoshita, Keisuke and Delcroix, Marc and Haeb-Umbach, Reinhold}, year={2023} }","apa":"von Neumann, T., Boeddeker, C., Kinoshita, K., Delcroix, M., &#38; Haeb-Umbach, R. (2023). On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems. <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. <a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">https://doi.org/10.1109/icassp49357.2023.10094784</a>","ama":"von Neumann T, Boeddeker C, Kinoshita K, Delcroix M, Haeb-Umbach R. On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems. In: <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE; 2023. doi:<a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">10.1109/icassp49357.2023.10094784</a>","chicago":"Neumann, Thilo von, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, and Reinhold Haeb-Umbach. “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.” In <i>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>. IEEE, 2023. <a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">https://doi.org/10.1109/icassp49357.2023.10094784</a>.","ieee":"T. von Neumann, C. Boeddeker, K. Kinoshita, M. Delcroix, and R. Haeb-Umbach, “On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems,” 2023, doi: <a href=\"https://doi.org/10.1109/icassp49357.2023.10094784\">10.1109/icassp49357.2023.10094784</a>."},"publication_status":"published","has_accepted_license":"1","related_material":{"link":[{"url":"https://github.com/fgnt/meeteval","relation":"software"}]},"ddc":["000"],"keyword":["Word Error Rate","Meeting Recognition","Levenshtein Distance"],"language":[{"iso":"eng"}],"abstract":[{"lang":"eng","text":"\tWe propose a general framework to compute the word error rate (WER) of ASR systems that process recordings containing multiple speakers at their input and that produce multiple output word sequences (MIMO).\r\n\tSuch ASR systems are typically required, e.g., for meeting transcription.\r\n\tWe provide an efficient implementation based on a dynamic programming search in a multi-dimensional Levenshtein distance tensor under the constraint that a reference utterance must be matched consistently with one hypothesis output. \r\n\tThis also results in an efficient implementation of the ORC WER which previously suffered from exponential complexity.\r\n\tWe give an overview of commonly used WER definitions for multi-speaker scenarios and show that they are specializations of the above MIMO WER tuned to particular application scenarios. \r\n\tWe conclude with a  discussion of the pros and cons of the various WER definitions and a recommendation when to use which."}],"file":[{"relation":"main_file","content_type":"application/pdf","file_id":"48282","file_name":"ICASSP_2023_Meeting_Evaluation.pdf","access_level":"open_access","file_size":204994,"creator":"tvn","date_created":"2023-10-19T07:39:57Z","date_updated":"2023-10-19T07:41:56Z"}],"publication":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","title":"On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems","publisher":"IEEE","date_created":"2023-10-19T07:38:31Z","year":"2023","quality_controlled":"1"},{"title":"MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems","date_created":"2023-10-19T07:24:51Z","year":"2023","quality_controlled":"1","language":[{"iso":"eng"}],"ddc":["000"],"keyword":["Speech Recognition","Word Error Rate","Meeting Transcription"],"file":[{"file_name":"Chime_7__MeetEval.pdf","access_level":"open_access","file_id":"48276","file_size":263744,"date_created":"2023-10-19T07:19:59Z","creator":"tvn","date_updated":"2023-10-19T07:19:59Z","relation":"main_file","content_type":"application/pdf"}],"abstract":[{"text":"MeetEval is an open-source toolkit to evaluate  all kinds of meeting transcription systems.\r\nIt provides a unified interface for the computation of commonly used Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER along other WER definitions.\r\nWe extend the cpWER computation by a temporal constraint to ensure that only words are identified as correct when the temporal alignment is plausible.\r\nThis leads to a better quality of the matching of the hypothesis string to the reference string that more closely resembles the actual transcription quality, and a system is penalized if it provides poor time annotations.\r\nSince word-level timing information is often not available, we present a way to approximate exact word-level timings from segment-level timings (e.g., a sentence) and show that the approximation leads to a similar WER as a matching with exact word-level annotations.\r\nAt the same time, the time constraint leads to a speedup of the matching algorithm, which outweighs the additional overhead caused by processing the time stamps.","lang":"eng"}],"publication":"Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments","main_file_link":[{"url":"https://arxiv.org/abs/2307.11394","open_access":"1"}],"conference":{"name":"CHiME 2023 Workshop on Speech Processing in Everyday Environments","location":"Dublin"},"author":[{"first_name":"Thilo","last_name":"von Neumann","orcid":"https://orcid.org/0000-0002-7717-8670","id":"49870","full_name":"von Neumann, Thilo"},{"full_name":"Boeddeker, Christoph","id":"40767","last_name":"Boeddeker","first_name":"Christoph"},{"full_name":"Delcroix, Marc","last_name":"Delcroix","first_name":"Marc"},{"first_name":"Reinhold","full_name":"Haeb-Umbach, Reinhold","id":"242","last_name":"Haeb-Umbach"}],"oa":"1","date_updated":"2025-02-12T09:12:05Z","citation":{"ama":"von Neumann T, Boeddeker C, Delcroix M, Haeb-Umbach R. MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems. In: <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. ; 2023.","ieee":"T. von Neumann, C. Boeddeker, M. Delcroix, and R. Haeb-Umbach, “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems,” presented at the CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin, 2023.","chicago":"Neumann, Thilo von, Christoph Boeddeker, Marc Delcroix, and Reinhold Haeb-Umbach. “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems.” In <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>, 2023.","bibtex":"@inproceedings{von Neumann_Boeddeker_Delcroix_Haeb-Umbach_2023, title={MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems}, booktitle={Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments}, author={von Neumann, Thilo and Boeddeker, Christoph and Delcroix, Marc and Haeb-Umbach, Reinhold}, year={2023} }","mla":"von Neumann, Thilo, et al. “MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems.” <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>, 2023.","short":"T. von Neumann, C. Boeddeker, M. Delcroix, R. Haeb-Umbach, in: Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments, 2023.","apa":"von Neumann, T., Boeddeker, C., Delcroix, M., &#38; Haeb-Umbach, R. (2023). MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems. <i>Proc. CHiME 2023 Workshop on Speech Processing in Everyday Environments</i>. CHiME 2023 Workshop on Speech Processing in Everyday Environments, Dublin."},"related_material":{"link":[{"relation":"software","url":"https://github.com/fgnt/meeteval"}]},"has_accepted_license":"1","file_date_updated":"2023-10-19T07:19:59Z","user_id":"40767","department":[{"_id":"54"}],"project":[{"_id":"52","name":"PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing"},{"name":"Automatische Transkription von Gesprächssituationen","_id":"508","grant_number":"448568305"}],"_id":"48275","status":"public","type":"conference"},{"year":"2013","citation":{"ama":"Leutnant V, Krueger A, Haeb-Umbach R. Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition. <i>IEEE Transactions on Audio, Speech, and Language Processing</i>. 2013;21(8):1640-1652. doi:<a href=\"https://doi.org/10.1109/TASL.2013.2258013\">10.1109/TASL.2013.2258013</a>","chicago":"Leutnant, Volker, Alexander Krueger, and Reinhold Haeb-Umbach. “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” <i>IEEE Transactions on Audio, Speech, and Language Processing</i> 21, no. 8 (2013): 1640–52. <a href=\"https://doi.org/10.1109/TASL.2013.2258013\">https://doi.org/10.1109/TASL.2013.2258013</a>.","ieee":"V. Leutnant, A. Krueger, and R. Haeb-Umbach, “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition,” <i>IEEE Transactions on Audio, Speech, and Language Processing</i>, vol. 21, no. 8, pp. 1640–1652, 2013.","short":"V. Leutnant, A. Krueger, R. Haeb-Umbach, IEEE Transactions on Audio, Speech, and Language Processing 21 (2013) 1640–1652.","bibtex":"@article{Leutnant_Krueger_Haeb-Umbach_2013, title={Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition}, volume={21}, DOI={<a href=\"https://doi.org/10.1109/TASL.2013.2258013\">10.1109/TASL.2013.2258013</a>}, number={8}, journal={IEEE Transactions on Audio, Speech, and Language Processing}, author={Leutnant, Volker and Krueger, Alexander and Haeb-Umbach, Reinhold}, year={2013}, pages={1640–1652} }","mla":"Leutnant, Volker, et al. “Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition.” <i>IEEE Transactions on Audio, Speech, and Language Processing</i>, vol. 21, no. 8, 2013, pp. 1640–52, doi:<a href=\"https://doi.org/10.1109/TASL.2013.2258013\">10.1109/TASL.2013.2258013</a>.","apa":"Leutnant, V., Krueger, A., &#38; Haeb-Umbach, R. (2013). Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition. <i>IEEE Transactions on Audio, Speech, and Language Processing</i>, <i>21</i>(8), 1640–1652. <a href=\"https://doi.org/10.1109/TASL.2013.2258013\">https://doi.org/10.1109/TASL.2013.2258013</a>"},"page":"1640-1652","intvolume":"        21","issue":"8","title":"Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition","doi":"10.1109/TASL.2013.2258013","date_updated":"2022-01-06T06:51:11Z","author":[{"full_name":"Leutnant, Volker","last_name":"Leutnant","first_name":"Volker"},{"first_name":"Alexander","full_name":"Krueger, Alexander","last_name":"Krueger"},{"id":"242","full_name":"Haeb-Umbach, Reinhold","last_name":"Haeb-Umbach","first_name":"Reinhold"}],"date_created":"2019-07-12T05:29:42Z","volume":21,"abstract":[{"lang":"eng","text":"In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Further a reduction of the computational effort and the memory requirements are achieved by using a recursive formulation of the observation model. The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000 word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained demonstrating the effectiveness of the approach on real-world data."}],"status":"public","type":"journal_article","publication":"IEEE Transactions on Audio, Speech, and Language Processing","keyword":["Bayes methods","compensation","error statistics","reverberation","speech recognition","Bayesian feature enhancement","background noise","clean speech feature vectors","compensation","connected digits recognition task","error statistics","memory requirements","noisy reverberant data","posteriori probability density function","recursive formulation","reverberant logarithmic mel power spectral coefficients","robust automatic speech recognition","signal-to-noise ratios","time-variant observation","word error rate reduction","Robust automatic speech recognition","model-based Bayesian feature enhancement","observation model for reverberant and noisy speech","recursive observation model"],"language":[{"iso":"eng"}],"_id":"11862","user_id":"44006","department":[{"_id":"54"}]}]
