Business Specific Online Information Extraction from German Websites
Y.S. Lee, M. Geierhos, in: A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings, Springer, Berlin, Germany, 2009, pp. 369–381.
Book Chapter | Published | English
Author
Lee, Yeong Su; Geierhos, Michaela

Book Editor
Gelbukh, Alexander
Abstract
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifies business specific information. We therefore concentrate on the extraction of characteristic vocabulary like company names, addresses, contact details, CEOs, etc. Above all, we interpret the HTML structure of documents and analyze some contextual facts to transform the unstructured web pages into structured forms. Our approach is quite robust in variability of the DOM, upgradeable and keeps data up-to-date. The evaluation experiments show high efficiency of information access to the generated data. Hence, the developed technique is adaptive to non-German websites with slight language-specific modifications, and experimental results on real-life websites confirm the feasibility of the approach.
Publishing Year
2009
Book Title
Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings
Series Title / Volume
Lecture Notes in Computer Science / 5449
Conference
10th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2009)
Conference Location
Mexico City, Mexico
Conference Date
2009-03-01 – 2009-03-07
Cite this
Lee YS, Geierhos M. Business Specific Online Information Extraction from German Websites. In: Gelbukh A, ed. Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings. Vol 5449. Lecture Notes in Computer Science. Berlin, Germany: Springer; 2009:369-381. doi:10.1007/978-3-642-00382-0_30
Lee, Y. S., & Geierhos, M. (2009). Business Specific Online Information Extraction from German Websites. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings (Vol. 5449, pp. 369–381). Berlin, Germany: Springer.
@inbook{Lee_Geierhos_2009, place={Berlin, Germany}, series={Lecture Notes in Computer Science}, title={Business Specific Online Information Extraction from German Websites}, volume={5449}, DOI={10.1007/978-3-642-00382-0_30}, booktitle={Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings}, publisher={Springer}, author={Lee, Yeong Su and Geierhos, Michaela}, editor={Gelbukh, Alexander}, year={2009}, pages={369–381}, collection={Lecture Notes in Computer Science} }
Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information Extraction from German Websites.” In Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings, edited by Alexander Gelbukh, 5449:369–81. Lecture Notes in Computer Science. Berlin, Germany: Springer, 2009.
Y. S. Lee and M. Geierhos, “Business Specific Online Information Extraction from German Websites,” in Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings, vol. 5449, A. Gelbukh, Ed. Berlin, Germany: Springer, 2009, pp. 369–381.
Lee, Yeong Su, and Michaela Geierhos. “Business Specific Online Information Extraction from German Websites.” Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009, Proceedings, edited by Alexander Gelbukh, vol. 5449, Springer, 2009, pp. 369–81, doi:10.1007/978-3-642-00382-0_30.