Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis

S. Heindorf, M. Potthast, B. Stein, G. Engels, in: Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831--834.

Conference Paper | English
Author
Heindorf, Stefan; Potthast, Martin; Stein, Benno; Engels, Gregor
Abstract
We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, and the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given today's importance of knowledge bases for information systems, this shows that public knowledge bases must be used with caution.
Publishing Year
2015
Proceedings Title
Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15)
Page
831--834
LibreCat-ID
239

Cite this

Heindorf S, Potthast M, Stein B, Engels G. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In: Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15). 2015:831--834. doi:10.1145/2766462.2767804
Heindorf, S., Potthast, M., Stein, B., & Engels, G. (2015). Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15) (pp. 831--834). https://doi.org/10.1145/2766462.2767804
@inproceedings{Heindorf_Potthast_Stein_Engels_2015, title={Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis}, DOI={10.1145/2766462.2767804}, booktitle={Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15)}, author={Heindorf, Stefan and Potthast, Martin and Stein, Benno and Engels, Gregor}, year={2015}, pages={831--834} }
Heindorf, Stefan, Martin Potthast, Benno Stein, and Gregor Engels. “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis.” In Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 831--834, 2015. https://doi.org/10.1145/2766462.2767804.
S. Heindorf, M. Potthast, B. Stein, and G. Engels, “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis,” in Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831--834.
Heindorf, Stefan, et al. “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis.” Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831--834, doi:10.1145/2766462.2767804.
Main File(s)
File Name
239-p831-heindorf.pdf 735.90 KB
Access Level
Restricted Closed Access
Last Uploaded
2018-03-21T10:29:18Z

