Ranking on Very Large Knowledge Graphs
Ranking plays a central role in a large number of applications driven by RDF knowledge graphs. Over the last years, many popular RDF knowledge graphs have grown so large that rankings for the facts they contain cannot be computed directly using the currently common 64-bit platforms. In this paper, we tackle two problems:
Computing ranks on such large knowledge bases efficiently and incrementally. First, we present D-HARE, a distributed approach for computing ranks on very large knowledge graphs. D-HARE assumes the random surfer model and relies on data partitioning to compute matrix multiplications and transpositions on disk for matrices of arbitrary size. Moreover, the data partitioning underlying D-HARE allows the execution of most of its steps in parallel.
As very large knowledge graphs are often updated periodically, we tackle the incremental computation of ranks on large knowledge bases as a second problem. We address this problem by presenting
I-HARE, an approximation technique for calculating the overall ranking scores of a knowledge without the need to recalculate the ranking from scratch at each new revision. We evaluate our approaches by calculating ranks on the 3 × 10^9 and 2.4 × 10^9 triples from Wikidata resp. LinkedGeoData. Our evaluation demonstrates
that D-HARE is the first holistic approach for computing ranks on very large RDF knowledge graphs. In addition, our incremental approach achieves a root mean squared error of less than 10E−7 in the best case. Both D-HARE
and I-HARE are open-source and are available at: https://github.com/dice-group/incrementalHARE.
163-171
163-171
ACM