---
_id: '1114'
abstract:
- lang: eng
  text: This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifies business specific information. We therefore concentrate on the extraction of characteristic vocabulary like company names, addresses, contact details, CEOs, etc. Above all, we interpret the HTML structure of documents and analyze some contextual facts to transform the unstructured web pages into structured forms. Our approach is quite robust in variability of the DOM, upgradeable and keeps data up-to-date. The evaluation experiments show high efficiency of information access to the generated data. Hence, the developed technique is adaptive to non-German websites with slight language-specific modifications, and experimental results on real-life websites confirm the feasibility of the approach.
author:
- first_name: Yeong Su
  full_name: Lee, Yeong Su
  last_name: Lee
- first_name: Michaela
  full_name: Geierhos, Michaela
  id: '42496'
  last_name: Geierhos
  orcid: 0000-0002-8180-5606
citation:
  ama: 'Lee YS, Geierhos M. Business Specific Online Information Extraction from German Websites. In: Aly R, Hauff C, Hiemstra D, Huibers TWC, de Jong FMG, eds. Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop. Workshop Proceedings Series. Enschede, The Netherlands: Centre for Telematics and Information Technology (CTIT), University of Twente; 2009:79-86.'
  apa: 'Lee, Y. S., & Geierhos, M. (2009). Business Specific Online Information Extraction from German Websites. In R. Aly, C. Hauff, D. Hiemstra, T. W. C. Huibers, & F. M. G. de Jong (Eds.), Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop (pp. 79–86). Enschede, The Netherlands: Centre for Telematics and Information Technology (CTIT), University of Twente.'
  bibtex: '@inproceedings{Lee_Geierhos_2009, place={Enschede, The Netherlands}, series={Workshop Proceedings Series}, title={Business Specific Online Information Extraction from German Websites}, booktitle={Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop}, publisher={Centre for Telematics and Information Technology (CTIT), University of Twente}, author={Lee, Yeong Su and Geierhos, Michaela}, editor={Aly, Robin and Hauff, C. and Hiemstra, Djoerd and Huibers, Theo W.C. and de Jong, Franciska M.G.}, year={2009}, pages={79–86}, collection={Workshop Proceedings Series} }'
  chicago: 'Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information Extraction from German Websites.” In Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop, edited by Robin Aly, C. Hauff, Djoerd Hiemstra, Theo W.C. Huibers, and Franciska M.G. de Jong, 79–86. Workshop Proceedings Series. Enschede, The Netherlands: Centre for Telematics and Information Technology (CTIT), University of Twente, 2009.'
  ieee: Y. S. Lee and M. Geierhos, “Business Specific Online Information Extraction from German Websites,” in Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop, Enschede, The Netherlands, 2009, pp. 79–86.
  mla: Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information Extraction from German Websites.” Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop, edited by Robin Aly et al., Centre for Telematics and Information Technology (CTIT), University of Twente, 2009, pp. 79–86.
  short: 'Y.S. Lee, M. Geierhos, in: R. Aly, C. Hauff, D. Hiemstra, T.W.C. Huibers, F.M.G. de Jong (Eds.), Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop, Centre for Telematics and Information Technology (CTIT), University of Twente, Enschede, The Netherlands, 2009, pp. 79–86.'
conference:
  end_date: 2009-02-03
  location: Enschede, The Netherlands
  name: 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009)
  start_date: 2009-02-02
date_created: 2018-01-29T13:50:39Z
date_updated: 2022-01-06T06:50:57Z
department:
- _id: '36'
- _id: '1'
- _id: '579'
editor:
- first_name: Robin
  full_name: Aly, Robin
  last_name: Aly
- first_name: C.
  full_name: Hauff, C.
  last_name: Hauff
- first_name: Djoerd
  full_name: Hiemstra, Djoerd
  last_name: Hiemstra
- first_name: Theo W.C.
  full_name: Huibers, Theo W.C.
  last_name: Huibers
- first_name: Franciska M.G.
  full_name: de Jong, Franciska M.G.
  last_name: de Jong
extern: '1'
keyword:
- company search
- information extraction
- sublanguage
language:
- iso: eng
page: 79-86
place: Enschede, The Netherlands
publication: Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop
publication_identifier:
  issn:
  - 0929-0672
publication_status: published
publisher: Centre for Telematics and Information Technology (CTIT), University of Twente
quality_controlled: '1'
series_title: Workshop Proceedings Series
status: public
title: Business Specific Online Information Extraction from German Websites
type: conference
user_id: '42496'
year: '2009'
...