Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
S. Heindorf, M. Potthast, B. Stein, G. Engels, in: Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831–834.
Conference Paper | English
Author
Heindorf, Stefan;
Potthast, Martin;
Stein, Benno;
Engels, Gregor
Abstract
We report on the construction of the Wikidata Vandalism Corpus WDVC-2015, the first corpus for vandalism in knowledge bases. Our corpus is based on the entire revision history of Wikidata, the knowledge base underlying Wikipedia. Among Wikidata's 24 million manual revisions, we have identified more than 100,000 cases of vandalism. An in-depth corpus analysis lays the groundwork for research and development on automatic vandalism detection in public knowledge bases. Our analysis shows that 58% of the vandalism revisions can be found in the textual portions of Wikidata, and the remainder in structural content, e.g., subject-predicate-object triples. Moreover, we find that some vandals also target Wikidata content whose manipulation may impact content displayed on Wikipedia, revealing potential vulnerabilities. Given today's importance of knowledge bases for information systems, this shows that public knowledge bases must be used with caution.
Publishing Year
2015
Proceedings Title
Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15)
Page
831–834
Cite this
Heindorf S, Potthast M, Stein B, Engels G. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In: Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15). 2015:831–834. doi:10.1145/2766462.2767804
Heindorf, S., Potthast, M., Stein, B., & Engels, G. (2015). Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15) (pp. 831–834). https://doi.org/10.1145/2766462.2767804
@inproceedings{Heindorf_Potthast_Stein_Engels_2015, title={Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis}, DOI={10.1145/2766462.2767804}, booktitle={Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15)}, author={Heindorf, Stefan and Potthast, Martin and Stein, Benno and Engels, Gregor}, year={2015}, pages={831--834} }
Heindorf, Stefan, Martin Potthast, Benno Stein, and Gregor Engels. “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis.” In Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 831–834, 2015. https://doi.org/10.1145/2766462.2767804.
S. Heindorf, M. Potthast, B. Stein, and G. Engels, “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis,” in Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831–834.
Heindorf, Stefan, et al. “Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis.” Proceedings of the 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), 2015, pp. 831–834, doi:10.1145/2766462.2767804.
Main File(s)
File Name
239-p831-heindorf.pdf
735.90 KB
Access Level
Closed Access
Last Uploaded
2018-03-21T10:29:18Z