---
_id: '1114'
abstract:
- lang: eng
  text: This paper presents a system that uses the domain name of a German business
    website to locate its information pages (e.g. company profile, contact page, imprint)
    and then identifies business-specific information. We concentrate on
    the extraction of characteristic vocabulary such as company names, addresses, contact
    details, CEOs, etc. Above all, we interpret the HTML structure of documents and
    analyze some contextual facts to transform the unstructured web pages into structured
    forms. Our approach is quite robust to variability in the DOM, is upgradeable, and
    keeps data up to date. The evaluation experiments show high efficiency of information
    access to the generated data. The developed technique is also adaptable to non-German
    websites with slight language-specific modifications, and experimental results
    on real-life websites confirm the feasibility of the approach.
author:
- first_name: Yeong Su
  full_name: Lee, Yeong Su
  last_name: Lee
- first_name: Michaela
  full_name: Geierhos, Michaela
  id: '42496'
  last_name: Geierhos
  orcid: 0000-0002-8180-5606
citation:
  ama: 'Lee YS, Geierhos M. Business Specific Online Information Extraction from German
    Websites. In: Aly R, Hauff C, Hiemstra D, Huibers TWC, de Jong FMG, eds. <i>Proceedings
    of the 9th Dutch-Belgian Information Retrieval Workshop</i>. Workshop Proceedings
    Series. Enschede, The Netherlands: Centre for Telematics and Information Technology
    (CTIT), University of Twente; 2009:79-86.'
  apa: 'Lee, Y. S., &#38; Geierhos, M. (2009). Business Specific Online Information
    Extraction from German Websites. In R. Aly, C. Hauff, D. Hiemstra, T. W. C. Huibers,
    &#38; F. M. G. de Jong (Eds.), <i>Proceedings of the 9th Dutch-Belgian Information
    Retrieval Workshop</i> (pp. 79–86). Enschede, The Netherlands: Centre for Telematics
    and Information Technology (CTIT), University of Twente.'
  bibtex: '@inproceedings{Lee_Geierhos_2009, place={Enschede, The Netherlands}, series={Workshop
    Proceedings Series}, title={Business Specific Online Information Extraction from
    German Websites}, booktitle={Proceedings of the 9th Dutch-Belgian Information
    Retrieval Workshop}, publisher={Centre for Telematics and Information Technology
    (CTIT), University of Twente}, author={Lee, Yeong Su and Geierhos, Michaela},
    editor={Aly, Robin and Hauff, C. and Hiemstra, Djoerd and Huibers, Theo W.C. and
    de Jong, Franciska M.G.}, year={2009}, pages={79–86}, collection={Workshop
    Proceedings Series} }'
  chicago: 'Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information
    Extraction from German Websites.” In <i>Proceedings of the 9th Dutch-Belgian Information
    Retrieval Workshop</i>, edited by Robin Aly, C. Hauff, Djoerd Hiemstra, Theo W.C.
    Huibers, and Franciska M.G. de Jong, 79–86. Workshop Proceedings Series. Enschede,
    The Netherlands: Centre for Telematics and Information Technology (CTIT), University
    of Twente, 2009.'
  ieee: Y. S. Lee and M. Geierhos, “Business Specific Online Information Extraction
    from German Websites,” in <i>Proceedings of the 9th Dutch-Belgian Information
    Retrieval Workshop</i>, Enschede, The Netherlands, 2009, pp. 79–86.
  mla: Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information
    Extraction from German Websites.” <i>Proceedings of the 9th Dutch-Belgian Information
    Retrieval Workshop</i>, edited by Robin Aly et al., Centre for Telematics and
    Information Technology (CTIT), University of Twente, 2009, pp. 79–86.
  short: 'Y.S. Lee, M. Geierhos, in: R. Aly, C. Hauff, D. Hiemstra, T.W.C. Huibers,
    F.M.G. de Jong (Eds.), Proceedings of the 9th Dutch-Belgian Information Retrieval
    Workshop, Centre for Telematics and Information Technology (CTIT), University
    of Twente, Enschede, The Netherlands, 2009, pp. 79–86.'
conference:
  end_date: 2009-02-03
  location: Enschede, The Netherlands
  name: 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009)
  start_date: 2009-02-02
date_created: 2018-01-29T13:50:39Z
date_updated: 2022-01-06T06:50:57Z
department:
- _id: '36'
- _id: '1'
- _id: '579'
editor:
- first_name: Robin
  full_name: Aly, Robin
  last_name: Aly
- first_name: C.
  full_name: Hauff, C.
  last_name: Hauff
- first_name: Djoerd
  full_name: Hiemstra, Djoerd
  last_name: Hiemstra
- first_name: Theo W.C.
  full_name: Huibers, Theo W.C.
  last_name: Huibers
- first_name: Franciska M.G.
  full_name: de Jong, Franciska M.G.
  last_name: de Jong
extern: '1'
keyword:
- company search
- information extraction
- sublanguage
language:
- iso: eng
page: 79-86
place: Enschede, The Netherlands
publication: Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop
publication_identifier:
  issn:
  - 0929-0672
publication_status: published
publisher: Centre for Telematics and Information Technology (CTIT), University of
  Twente
quality_controlled: '1'
series_title: Workshop Proceedings Series
status: public
title: Business Specific Online Information Extraction from German Websites
type: conference
user_id: '42496'
year: '2009'
...
