---
_id: '1114'
abstract:
- lang: eng
text: This paper presents a system that uses the domain name of a German business
website to locate its information pages (e.g. company profile, contact page, imprint)
and then identifies business specific information. We therefore concentrate on
the extraction of characteristic vocabulary like company names, addresses, contact
details, CEOs, etc. Above all, we interpret the HTML structure of documents and
analyze some contextual facts to transform the unstructured web pages into structured
forms. Our approach is robust to variability in the DOM, is upgradeable, and
keeps data up to date. The evaluation experiments show highly efficient information
access to the generated data. Moreover, the developed technique is adaptable to non-German
websites with slight language-specific modifications, and experimental results
on real-life websites confirm the feasibility of the approach.
author:
- first_name: Yeong Su
full_name: Lee, Yeong Su
last_name: Lee
- first_name: Michaela
full_name: Geierhos, Michaela
id: '42496'
last_name: Geierhos
orcid: 0000-0002-8180-5606
citation:
ama: 'Lee YS, Geierhos M. Business Specific Online Information Extraction from German
Websites. In: Aly R, Hauff C, Hiemstra D, Huibers TWC, de Jong FMG, eds. Proceedings
of the 9th Dutch-Belgian Information Retrieval Workshop. Workshop Proceedings
Series. Enschede, The Netherlands: Centre for Telematics and Information Technology
(CTIT), University of Twente; 2009:79-86.'
apa: 'Lee, Y. S., & Geierhos, M. (2009). Business Specific Online Information
Extraction from German Websites. In R. Aly, C. Hauff, D. Hiemstra, T. W. C. Huibers,
& F. M. G. de Jong (Eds.), Proceedings of the 9th Dutch-Belgian Information
Retrieval Workshop (pp. 79–86). Enschede, The Netherlands: Centre for Telematics
and Information Technology (CTIT), University of Twente.'
bibtex: '@inproceedings{Lee_Geierhos_2009, place={Enschede, The Netherlands}, series={Workshop
Proceedings Series}, title={Business Specific Online Information Extraction from
German Websites}, booktitle={Proceedings of the 9th Dutch-Belgian Information
Retrieval Workshop}, publisher={Centre for Telematics and Information Technology
(CTIT), University of Twente}, author={Lee, Yeong Su and Geierhos, Michaela},
editor={Aly, Robin and Hauff, C. and Hiemstra, Djoerd and Huibers, Theo W.C. and
de Jong, Franciska M.G.}, year={2009}, pages={79–86}, collection={Workshop
Proceedings Series} }'
chicago: 'Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information
Extraction from German Websites.” In Proceedings of the 9th Dutch-Belgian Information
Retrieval Workshop, edited by Robin Aly, C. Hauff, Djoerd Hiemstra, Theo W.C.
Huibers, and Franciska M.G. de Jong, 79–86. Workshop Proceedings Series. Enschede,
The Netherlands: Centre for Telematics and Information Technology (CTIT), University
of Twente, 2009.'
ieee: Y. S. Lee and M. Geierhos, “Business Specific Online Information Extraction
from German Websites,” in Proceedings of the 9th Dutch-Belgian Information
Retrieval Workshop, Enschede, The Netherlands, 2009, pp. 79–86.
mla: Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information
Extraction from German Websites.” Proceedings of the 9th Dutch-Belgian Information
Retrieval Workshop, edited by Robin Aly et al., Centre for Telematics and
Information Technology (CTIT), University of Twente, 2009, pp. 79–86.
short: 'Y.S. Lee, M. Geierhos, in: R. Aly, C. Hauff, D. Hiemstra, T.W.C. Huibers,
F.M.G. de Jong (Eds.), Proceedings of the 9th Dutch-Belgian Information Retrieval
Workshop, Centre for Telematics and Information Technology (CTIT), University
of Twente, Enschede, The Netherlands, 2009, pp. 79–86.'
conference:
end_date: 2009-02-03
location: Enschede, The Netherlands
name: 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009)
start_date: 2009-02-02
date_created: 2018-01-29T13:50:39Z
date_updated: 2022-01-06T06:50:57Z
department:
- _id: '36'
- _id: '1'
- _id: '579'
editor:
- first_name: Robin
full_name: Aly, Robin
last_name: Aly
- first_name: C.
full_name: Hauff, C.
last_name: Hauff
- first_name: Djoerd
full_name: Hiemstra, Djoerd
last_name: Hiemstra
- first_name: Theo W.C.
full_name: Huibers, Theo W.C.
last_name: Huibers
- first_name: Franciska M.G.
full_name: de Jong, Franciska M.G.
last_name: de Jong
extern: '1'
keyword:
- company search
- information extraction
- sublanguage
language:
- iso: eng
page: 79-86
place: Enschede, The Netherlands
publication: Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop
publication_identifier:
issn:
- 0929-0672
publication_status: published
publisher: Centre for Telematics and Information Technology (CTIT), University of
Twente
quality_controlled: '1'
series_title: Workshop Proceedings Series
status: public
title: Business Specific Online Information Extraction from German Websites
type: conference
user_id: '42496'
year: '2009'
...