---
_id: '29027'
abstract:
- lang: eng
  text: Over the last years, the Linked Open Data (LOD) cloud has grown from a mere 12
    to more than 10,000 knowledge bases. These knowledge bases come from diverse
    domains including (but not limited to) publications, life sciences, social networking,
    government, media, and linguistics. Moreover, the LOD cloud also contains a large
    number of cross-domain knowledge bases such as DBpedia and Yago2. These knowledge
    bases are commonly managed in a decentralized fashion and contain partly overlapping
    information. This architectural choice has led to knowledge pertaining to the
    same domain being published by independent entities in the LOD cloud. For example,
    information on drugs can be found in Diseasome as well as DBpedia and Drugbank.
    Furthermore, certain knowledge bases such as DBLP have been published by several
    bodies, which in turn has led to duplicated content in the LOD. In addition,
    large amounts of geo-spatial information have been made available with the growth
    of the heterogeneous Web of Data. The concurrent publication of knowledge bases containing
    related information promises to become a phenomenon of increasing importance with
    the growth of the number of independent data providers. Enabling the joint use
    of the knowledge bases published by these providers for tasks such as federated
    queries, cross-ontology question answering and data integration is most commonly
    tackled by creating links between the resources described within these knowledge
    bases. Within this thesis, we spur the transition from isolated knowledge bases
    to enriched Linked Data sets where information can be easily integrated and processed.
    To achieve this goal, we provide concepts, approaches and use cases that facilitate
    the integration and enrichment of information with other data types that are already
    present on the Linked Data Web with a focus on geo-spatial data. The first challenge
    that motivates our work is the lack of measures that use the geographic data for
    linking geo-spatial knowledge bases. This is partly due to the geo-spatial resources
    being described by means of vector geometry. In particular, discrepancies
    in granularity and error measurements across knowledge bases render the selection
    of appropriate distance measures for geo-spatial resources difficult. We address
    this challenge by evaluating existing literature for pointset measures that can
    be used to measure the similarity of vector geometries. Then, we present and evaluate
    the ten measures that we derived from the literature on samples of three real
    knowledge bases. The second challenge we address in this thesis is the lack of
    automatic Link Discovery (LD) approaches capable of dealing with geo-spatial knowledge
    bases with missing and erroneous data. To this end, we present Colibri, an unsupervised
    approach that allows discovering links between knowledge bases while improving
    the quality of the instance data in these knowledge bases. A Colibri iteration
    begins by generating links between knowledge bases. Then, the approach makes use
    of these links to detect resources with probably erroneous or missing information.
    This erroneous or missing information detected by the approach is finally corrected
    or added. The third challenge we address is the lack of scalable LD approaches
    for tackling big geo-spatial knowledge bases. Thus, we present Deterministic Particle-Swarm
    Optimization (DPSO), a novel load balancing technique for LD on parallel hardware
    based on particle-swarm optimization. We combine this approach with the Orchid
    algorithm for geo-spatial linking and evaluate it on real and artificial data
    sets. The lack of approaches for automatic updating of links of an evolving knowledge
    base is our fourth challenge. This challenge is addressed in this thesis by the
    Wombat algorithm. Wombat is a novel approach for the discovery of links between
    knowledge bases that relies exclusively on positive examples. Wombat is based
    on generalisation via an upward refinement operator to traverse the space of Link
    Specifications (LS). We study the theoretical characteristics of Wombat and evaluate
    it on different benchmark data sets. The last challenge addressed herein is the
    lack of automatic approaches for geo-spatial knowledge base enrichment. Thus,
    we propose Deer, a supervised learning approach based on a refinement operator
    for enriching Resource Description Framework (RDF) data sets. We show how we can
    use exemplary descriptions of enriched resources to generate accurate enrichment
    pipelines. We evaluate our approach against manually defined enrichment pipelines
    and show that our approach can learn accurate pipelines even when provided with
    a small number of training examples. Each of the proposed approaches is implemented
    and evaluated against state-of-the-art approaches on real and/or artificial data
    sets. Moreover, all approaches are peer-reviewed and published in a conference
    or a journal paper. Throughout this thesis, we detail the ideas, implementation
    and the evaluation of each of the approaches. Moreover, we discuss each approach
    and present lessons learned. Finally, we conclude this thesis by presenting a
    set of possible future extensions and use cases for each of the proposed approaches.
author:
- first_name: Mohamed
  full_name: Sherif, Mohamed
  id: '67234'
  last_name: Sherif
  orcid: https://orcid.org/0000-0002-9927-2203
citation:
  ama: Sherif M. <i>Automating Geospatial RDF Dataset Integration and Enrichment</i>.
    University of Leipzig; 2016.
  apa: Sherif, M. (2016). <i>Automating Geospatial RDF Dataset Integration and Enrichment</i>.
    University of Leipzig.
  bibtex: '@book{Sherif_2016, place={Leipzig, Germany}, title={Automating Geospatial
    RDF Dataset Integration and Enrichment}, publisher={University of Leipzig}, author={Sherif,
    Mohamed}, year={2016} }'
  chicago: 'Sherif, Mohamed. <i>Automating Geospatial RDF Dataset Integration and
    Enrichment</i>. Leipzig, Germany: University of Leipzig, 2016.'
  ieee: 'M. Sherif, <i>Automating Geospatial RDF Dataset Integration and Enrichment</i>.
    Leipzig, Germany: University of Leipzig, 2016.'
  mla: Sherif, Mohamed. <i>Automating Geospatial RDF Dataset Integration and Enrichment</i>.
    University of Leipzig, 2016.
  short: M. Sherif, Automating Geospatial RDF Dataset Integration and Enrichment,
    University of Leipzig, Leipzig, Germany, 2016.
date_created: 2021-12-17T09:59:57Z
date_updated: 2024-05-08T10:40:28Z
keyword:
- 2016 group\_aksw sys:relevantFor:geoknow sys:relevantFor:infai sys:relevantFor:bis
  ngonga simba dice sherif group\_aksw geoknow deer lehmann MOLE
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.qucosa.de/landing-page/?tx_dlf%5Bid%5D=https%3A%2F%2Fwww.qucosa.de%2Fapi%2Fqucosa%253A15175%2Fmets%2F&cHash=22c1b49c76de010dc4fb42260d8a1cf6
oa: '1'
place: Leipzig, Germany
publisher: University of Leipzig
status: public
supervisor:
- first_name: 'Klaus-Peter'
  full_name: 'Fähnrich, Klaus-Peter'
  last_name: Fähnrich
- first_name: 'Jens'
  full_name: 'Lehmann, Jens'
  last_name: Lehmann
title: Automating Geospatial RDF Dataset Integration and Enrichment
type: dissertation
user_id: '67234'
year: '2016'
...
