Performance degrades with the size of the API #26
To clarify: if I reduce the API in integration to one DataSet, without changing anything else, I also get the 1024 requests done in 12 seconds.
The initial description of how to reproduce the issue blended multiple scenarios that I had tested. Software under test: https://github.com/statistikstadtzuerich/stat.stadt-zuerich.ch
Start the API Backend:
Without any modifications to the source, the Shape-API only contains one Dataset (BEW-RAUM-ZEIT-HEL). The API description is in the folder api_apidev. Let's hit the API with curl:
To be able to update the API afterwards, the password for the SPARQL endpoint must be set as an environment variable. Endpoint and user are defined in api-config.apidev.js.
Now we will include more Datasets in the API. Comment out the WHITELISTING
Restart the backend and hit the API again with curl. Startup takes a bit longer this time (this is a known issue):
In my setup, the response time for the same dataset increased from 1.6 to 10.2 seconds.
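To make a before/after comparison like this repeatable, a single request can be timed from a small script. This is only a sketch and not what was used for the numbers above: the function name `time_request` and the default `Accept` header are assumptions, and the URL you pass in has to be whatever your local backend actually serves.

```python
import time
import urllib.request


def time_request(url, accept="application/ld+json", timeout=120.0):
    """Fetch `url` once and return (elapsed_seconds, body_size_in_bytes).

    The Accept header is an assumption; adjust it to what the API expects.
    """
    request = urllib.request.Request(url, headers={"Accept": accept})
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Read the whole body so that transfer time is part of the measurement.
        body = response.read()
    elapsed = time.perf_counter() - start
    return elapsed, len(body)
```

Calling `time_request(...)` once with the small API and once after enabling more datasets, against the same dataset URL, should show the same kind of difference as the curl timings.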
Performance seems to degrade as the API gets bigger.
In my test scenario, I compared an integration and a production environment:
I also compared performance when going through 1) the sparql-proxy and 2) hydraBox.
The query against the triplestore is the same in both cases.
With the given test data, the response is 40 KB of ld+json when retrieved via sparql-proxy and 48 KB when retrieved via hydraBox.
I also submitted requests in parallel (see results below). I noticed that retrieval via hydraBox with the larger API in integration maxes out only one CPU core on my multicore system. This is different with the smaller API in production, where multiple cores are used.
Test results for the given test data:
This shows the context of the test data:
This shows how I ran the requests in parallel. Switching targets (sparql-proxy or hydraBox) is done by uncommenting the respective line in multi.sh.
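For anyone who wants to try a comparable parallel run without the shell script, the following is a rough Python sketch of the same idea. It is not the original multi.sh. Both URLs in `TARGETS` are hypothetical placeholders, the worker count of 16 is an assumption, and the sketch assumes each target can be queried with a plain GET (the sparql-proxy URL would need to carry the query). The total of 1024 requests matches the figure mentioned earlier in this thread.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URLs: replace with the real sparql-proxy and hydraBox endpoints.
TARGETS = {
    "sparql-proxy": "http://localhost:8080/query",
    "hydrabox": "http://localhost:8080/api/dataset",
}


def fetch(url):
    """Fetch `url` once and return the number of bytes received."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=120) as response:
        return len(response.read())


def run_parallel(url, total=1024, workers=16, fetch_one=fetch):
    """Issue `total` requests using `workers` threads.

    Returns (wall_clock_seconds, total_bytes_received).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(fetch_one, [url] * total))
    return time.perf_counter() - start, sum(sizes)
```

Switching targets then means choosing a different key, for example `run_parallel(TARGETS["hydrabox"])`. Watching per-core CPU usage of the backend process while this runs should show whether one core is saturated, as described above.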