ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian

R.H. Gusmita, A.F. Firmansyah, H.M. Zahera, A.-C. Ngonga Ngomo, Data & Knowledge Engineering 161 (2026) 102504.

Journal Article | English
Author
Gusmita, Ria Hari; Firmansyah, Asep Fajar; Zahera, Hamada M.; Ngonga Ngomo, Axel-Cyrille
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their effectiveness in low-resource languages remains underexplored, particularly in complex tasks such as end-to-end Entity Linking (EL), which requires both mention detection and disambiguation against a knowledge base (KB). In earlier work, we introduced IndEL — the first end-to-end EL benchmark dataset for the Indonesian language — covering both a general domain (news) and a specific domain (religious text from the Indonesian translation of the Quran), and evaluated four traditional end-to-end EL systems on this dataset. In this study, we propose ELEVATE-ID, a comprehensive evaluation framework for assessing LLM performance on end-to-end EL in Indonesian. The framework evaluates LLMs under both zero-shot and fine-tuned conditions, using multilingual and Indonesian monolingual models, with Wikidata as the target KB. Our experiments include performance benchmarking, generalization analysis across domains, and systematic error analysis. Results show that GPT-4 and GPT-3.5 achieve the highest accuracy in zero-shot and fine-tuned settings, respectively. However, even fine-tuned GPT-3.5 underperforms compared to DBpedia Spotlight — the weakest of the traditional model baselines — in the general domain. Interestingly, GPT-3.5 outperforms Babelfy in the specific domain. Generalization analysis indicates that fine-tuned GPT-3.5 adapts more effectively to cross-domain and mixed-domain scenarios. Error analysis uncovers persistent challenges that hinder LLM performance: difficulties with non-complete mentions, acronym disambiguation, and full-name recognition in formal contexts. These issues point to limitations in mention boundary detection and contextual grounding. 
Indonesian-pretrained LLMs, Komodo and Merak, reveal core weaknesses: template leakage and entity hallucination, respectively, underscoring architectural and training limitations in low-resource end-to-end EL. Code and dataset are available at https://github.com/dice-group/ELEVATE-ID.
Publishing Year
2026
Journal Title
Data & Knowledge Engineering
Volume
161
Page
102504

Cite this

Gusmita RH, Firmansyah AF, Zahera HM, Ngonga Ngomo A-C. ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian. Data & Knowledge Engineering. 2026;161:102504. doi:https://doi.org/10.1016/j.datak.2025.102504
Gusmita, R. H., Firmansyah, A. F., Zahera, H. M., & Ngonga Ngomo, A.-C. (2026). ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian. Data & Knowledge Engineering, 161, 102504. https://doi.org/10.1016/j.datak.2025.102504
@article{Gusmita_Firmansyah_Zahera_Ngonga_Ngomo_2026, title={ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian}, volume={161}, DOI={10.1016/j.datak.2025.102504}, journal={Data \& Knowledge Engineering}, author={Gusmita, Ria Hari and Firmansyah, Asep Fajar and Zahera, Hamada M. and Ngonga Ngomo, Axel-Cyrille}, year={2026}, pages={102504} }
Gusmita, Ria Hari, Asep Fajar Firmansyah, Hamada M. Zahera, and Axel-Cyrille Ngonga Ngomo. “ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian.” Data & Knowledge Engineering 161 (2026): 102504. https://doi.org/10.1016/j.datak.2025.102504.
R. H. Gusmita, A. F. Firmansyah, H. M. Zahera, and A.-C. Ngonga Ngomo, “ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian,” Data & Knowledge Engineering, vol. 161, p. 102504, 2026, doi: https://doi.org/10.1016/j.datak.2025.102504.
Gusmita, Ria Hari, et al. “ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian.” Data & Knowledge Engineering, vol. 161, 2026, p. 102504, doi:https://doi.org/10.1016/j.datak.2025.102504.
