---
_id: '56581'
abstract:
- lang: eng
  text: 'In recent years, there has been a surge in natural language processing research
    focused on low-resource languages (LrLs), underscoring the growing recognition
    that LrLs deserve the same attention as high-resource languages (HrLs). This shift
    is crucial for ensuring linguistic diversity and inclusivity in the digital age.
    Despite Indonesian ranking as the 11th most spoken language globally, it remains
    under-resourced in terms of computational tools and datasets. Within the semantic
    web domain, Entity Linking (EL) is pivotal, linking textual entity mentions to
    their corresponding entries in knowledge bases. This process is foundational for
    advanced information extraction tasks, including relation extraction and event
    detection. To bolster EL research in Indonesian, we introduce IndEL, the first
    benchmark dataset tailored for both general and specific domains. IndEL was manually
    curated using Wikidata, adhering to a rigorous set of annotation guidelines. We
    used two Named Entity Recognition (NER) benchmark datasets for entity extraction:
    NER UI for the general domain and IndQNER for the specific domain. IndQNER focuses
    on entities from the Indonesian translation of the Quran. IndEL comprises 4765
    entities in the general domain and 2453 in the specific domain. Using the GERBIL
    framework, we evaluate various EL systems on IndEL, including Babelfy, DBpedia
    Spotlight, MAG, OpenTapioca, and WAT. Our further investigation
    reveals that within Wikidata, a significant number of NIL entities remain unlinked
    due to the limited number of Indonesian labels and the use of acronyms. Especially
    in the specific domain, transliteration and translation processes performed to
    create the Indonesian translation of the Quran contribute to the presence of entities
    in a descriptive form and as synonyms.'
author:
- first_name: Ria Hari
  full_name: Gusmita, Ria Hari
  id: '71039'
  last_name: Gusmita
- first_name: Muhammad Faruq Amiral
  full_name: Abshar, Muhammad Faruq Amiral
  last_name: Abshar
- first_name: Diego
  full_name: Moussallem, Diego
  id: '71635'
  last_name: Moussallem
- first_name: Axel-Cyrille
  full_name: Ngonga Ngomo, Axel-Cyrille
  id: '65716'
  last_name: Ngonga Ngomo
citation:
  ama: 'Gusmita RH, Abshar MFA, Moussallem D, Ngonga Ngomo A-C. IndEL: Indonesian
    Entity Linking Benchmark Dataset for General and Specific Domains. In: <i>Lecture
    Notes in Computer Science</i>. Springer Nature Switzerland; 2024. doi:<a href="https://doi.org/10.1007/978-3-031-70239-6_34">10.1007/978-3-031-70239-6_34</a>'
  apa: 'Gusmita, R. H., Abshar, M. F. A., Moussallem, D., &#38; Ngonga Ngomo, A.-C.
    (2024). IndEL: Indonesian Entity Linking Benchmark Dataset for General and Specific
    Domains. In <i>Lecture Notes in Computer Science</i>. The 29th Annual International
    Conference on Natural Language &#38; Information Systems (NLDB 2024), Turin, Italy.
    Springer Nature Switzerland. <a href="https://doi.org/10.1007/978-3-031-70239-6_34">https://doi.org/10.1007/978-3-031-70239-6_34</a>'
  bibtex: '@inbook{Gusmita_Abshar_Moussallem_Ngonga_Ngomo_2024, place={Cham}, title={IndEL:
    Indonesian Entity Linking Benchmark Dataset for General and Specific Domains},
    DOI={<a href="https://doi.org/10.1007/978-3-031-70239-6_34">10.1007/978-3-031-70239-6_34</a>},
    booktitle={Lecture Notes in Computer Science}, publisher={Springer Nature Switzerland},
    author={Gusmita, Ria Hari and Abshar, Muhammad Faruq Amiral and Moussallem, Diego
    and Ngonga Ngomo, Axel-Cyrille}, year={2024} }'
  chicago: 'Gusmita, Ria Hari, Muhammad Faruq Amiral Abshar, Diego Moussallem, and
    Axel-Cyrille Ngonga Ngomo. “IndEL: Indonesian Entity Linking Benchmark Dataset
    for General and Specific Domains.” In <i>Lecture Notes in Computer Science</i>.
    Cham: Springer Nature Switzerland, 2024. <a href="https://doi.org/10.1007/978-3-031-70239-6_34">https://doi.org/10.1007/978-3-031-70239-6_34</a>.'
  ieee: 'R. H. Gusmita, M. F. A. Abshar, D. Moussallem, and A.-C. Ngonga Ngomo, “IndEL:
    Indonesian Entity Linking Benchmark Dataset for General and Specific Domains,”
    in <i>Lecture Notes in Computer Science</i>, Cham: Springer Nature Switzerland,
    2024.'
  mla: 'Gusmita, Ria Hari, et al. “IndEL: Indonesian Entity Linking Benchmark Dataset
    for General and Specific Domains.” <i>Lecture Notes in Computer Science</i>, Springer
    Nature Switzerland, 2024, doi:<a href="https://doi.org/10.1007/978-3-031-70239-6_34">10.1007/978-3-031-70239-6_34</a>.'
  short: 'R.H. Gusmita, M.F.A. Abshar, D. Moussallem, A.-C. Ngonga Ngomo, in: Lecture
    Notes in Computer Science, Springer Nature Switzerland, Cham, 2024.'
conference:
  end_date: 2024-06-27
  location: Turin, Italy
  name: The 29th Annual International Conference on Natural Language & Information
    Systems (NLDB 2024)
  start_date: 2024-06-25
date_created: 2024-10-10T14:29:08Z
date_updated: 2024-10-14T19:22:16Z
doi: 10.1007/978-3-031-70239-6_34
keyword:
- entity linking benchmark dataset
- Indonesian
- general and specific domains
language:
- iso: eng
place: Cham
publication: Lecture Notes in Computer Science
publication_identifier:
  isbn:
  - '9783031702389'
  - '9783031702396'
  issn:
  - 0302-9743
  - 1611-3349
publication_status: published
publisher: Springer Nature Switzerland
related_material:
  link:
  - relation: confirmation
    url: https://link.springer.com/chapter/10.1007/978-3-031-70239-6_34
status: public
title: 'IndEL: Indonesian Entity Linking Benchmark Dataset for General and Specific
  Domains'
type: book_chapter
user_id: '71039'
year: '2024'
...
