First tests with SPARQL 1.1 Update

Hannah Bast edited this page Nov 21, 2024 · 4 revisions

First performance tests on 15.11.2024

As of 15.11.2024, QLever has basic support for SPARQL 1.1 Update, see https://github.com/ad-freiburg/qlever/wiki/QLever-support-for-SPARQL-1.1-Update. The following is a log of a first performance test, carried out a few days before the last missing pull request was merged into the QLever master.

To test the functionality, you need the access token used for starting the server. This is needed for all privileged operations, which an update of course is (normal users should only have read-only access, and not be able to modify the dataset). The following tests assume access-token=wikidata_R8VkeqQRYnlW, which is not the access token for the official instance at https://qlever.cs.uni-freiburg.de/wikidata.
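As a sketch (not part of the original test log), the shape of such a privileged request can be illustrated with a dry run that only prints the curl command instead of sending it; the endpoint and token below are the placeholders used on this page, and the INSERT DATA triple is a made-up example:

```shell
# Placeholders from this page, not real credentials.
ENDPOINT="https://qlever.cs.uni-freiburg.de/api/wikidata-prut"
TOKEN="wikidata_R8VkeqQRYnlW"
UPDATE='INSERT DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }'

# Dry run: print the command instead of executing it. Without a valid
# access-token, the server would reject the update.
echo curl -s "$ENDPOINT" \
  --data-urlencode "update=$UPDATE" \
  --data-urlencode "access-token=$TOKEN"
```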

The tests were run on our local machine indus (AMD Ryzen 9 7950X 16-Core, 7.1T NVMe SSD) on wikidata.ssd/index.2024-10-31.

TLDR

In the current (first) version, updates are processed at a speed of around 2 µs / triple. For each 1% change in the input data, query times slow down by about 2%.

Not bad for starters, and there is still a lot of room for improvement regarding both figures.
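As a quick sanity check of these figures (plain arithmetic, no QLever involved): 2 µs / triple corresponds to 0.5 M triples / second, so the 17,115,625 triples deleted in the test below should take around 34 s of pure update time:

```shell
# Back-of-the-envelope check of the TLDR figures.
awk 'BEGIN {
  us_per_triple = 2
  triples_per_s = 1e6 / us_per_triple        # 500000 triples/s
  total_s = 17115625 * us_per_triple / 1e6   # ~34.2 s
  printf "%d triples/s, %.1f s for 17115625 triples\n", triples_per_s, total_s
}'
```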

Remove random elements from wdt:P31 and check the correctness

Each of the following updates or queries clears the cache before executing and outputs the elapsed time in seconds. In addition, each update outputs "Update successful" and each query outputs the result size. Each update also clears the cache after executing.

Let us first start the server from scratch. This takes around 8s.

qlever start --kill-existing-with-same-port

The first query asks for the original size of the wdt:P31 predicate. It has 117,115,625 triples and the query takes around 1.5s.

qlever clear-cache 2> /dev/null && /usr/bin/time -f "Elapsed time: %es" curl -s https://qlever.cs.uni-freiburg.de/api/wikidata-prut -H "Accept: application/qlever-results+json" -d "send=0" --data-urlencode "query=PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?s ?o WHERE { ?s wdt:P31 ?o }" | jq .resultsize | numfmt --grouping
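Since only the result size is needed here, a SPARQL COUNT aggregate would return the same number without materializing all bindings. A sketch (shown as a dry run that only prints the command; this variant is not from the original test log):

```shell
# Dry run: an equivalent count via a SPARQL aggregate.
QUERY='PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT (COUNT(*) AS ?count) WHERE { ?s wdt:P31 ?o }'
echo curl -s https://qlever.cs.uni-freiburg.de/api/wikidata-prut \
  -H "Accept: application/qlever-results+json" \
  --data-urlencode "query=$QUERY"
```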

The second query selects 17,115,625 random triples from wdt:P31. It takes around 11s.

qlever clear-cache 2> /dev/null && /usr/bin/time -f "Elapsed time: %es" curl -s https://qlever.cs.uni-freiburg.de/api/wikidata-prut -H "Accept: application/qlever-results+json" -d "send=0" --data-urlencode "query=PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?s ?o WHERE { ?s wdt:P31 ?o } ORDER BY RAND() LIMIT 17115625" | jq .resultsize | numfmt --grouping

The third query is an update that removes 17,115,625 random triples from wdt:P31. It takes around 40s, depending on the random distribution of the triples over the predicate. Subtracting the time for the previous query, we can deduce that the update itself took around 30s for 17,115,625 triples, which is roughly 2 µs / triple or 0.5 M triples / second. Not bad for starters (there is still a lot of room for optimization over what we are currently doing).

qlever clear-cache 2> /dev/null && /usr/bin/time -f "\nElapsed time: %es" curl -s -H "Accept: application/qlever-results+json" https://qlever.cs.uni-freiburg.de/api/wikidata-prut --data-urlencode "update=PREFIX wdt: <http://www.wikidata.org/prop/direct/> DELETE { ?s wdt:P31 ?o } WHERE { { SELECT ?s ?o WHERE { ?s wdt:P31 ?o } ORDER BY RAND() LIMIT 17115625 } }" --data-urlencode "access-token=wikidata_R8VkeqQRYnlW" && qlever clear-cache 2> /dev/null
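The deduction above can be redone with plain arithmetic, assuming the measured ~40 s total and ~30 s of pure update time; the exact quotients are consistent with the rounded figures in the text:

```shell
# Update throughput deduced from the measured times.
awk 'BEGIN {
  update_s = 30        # ~40 s total minus ~11 s for the random sample
  triples  = 17115625
  printf "%.1f us/triple, %.2f M triples/s\n",
         update_s * 1e6 / triples, triples / update_s / 1e6
}'
```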

The fourth query is a repetition of the first query. It now says that the wdt:P31 predicate has 100,000,000 triples, which is correct. It takes around 2.0s, that is, around 30% more than the first query. This additional time is the overhead currently incurred by considering the delta triples. Note that the update affected around 15% of the data relevant for this query, which is a lot. Typical update queries affect only a small fraction of the data.

qlever clear-cache 2> /dev/null && /usr/bin/time -f "Elapsed time: %es" curl -s https://qlever.cs.uni-freiburg.de/api/wikidata-prut -H "Accept: application/qlever-results+json" -d "send=0" --data-urlencode "query=PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?s ?o WHERE { ?s wdt:P31 ?o }" | jq .resultsize | numfmt --grouping
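The expected result size and the reported overhead can be checked with plain arithmetic: 117,115,625 original triples minus the 17,115,625 deleted ones, and the slowdown of the repeated query (around 2.0 s vs. 1.5 s):

```shell
# Correctness and overhead check for the fourth query.
awk 'BEGIN {
  remaining = 117115625 - 17115625
  overhead  = (2.0 - 1.5) / 1.5 * 100
  printf "%d triples remain, ~%.0f%% query overhead\n", remaining, overhead
}'
```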