Hi!

I am doing research on the benefits of Profile-Guided Optimization (PGO) for different software (results are here). I optimized `drill` with PGO as well (via `cargo-pgo`) and want to share my results.
Test environment
Fedora 38
Linux kernel 6.3.7
AMD Ryzen 9 5900x
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Rustc: 1.70.0
drill version: the latest commit in `main` at the time of testing (dfd5548c8d4269d5fa8b73e81d616572e9a9d445)
Benchmark
As a benchmark, I used the server from `example/server` and ran `drill` with `drill --benchmark benchmark.yml --stats` (the only change to `benchmark.yml` was the iteration count, increased to 10000). I compared drill built in Release mode vs. Release + PGO. The same benchmark load was used as the profiling load (to collect the profile).
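For reference, the `cargo-pgo` workflow looks roughly like the sketch below (the instrumented binary path and the way the example server is started are assumptions for illustration, not an exact copy of my shell history):

```sh
# One-time setup (cargo-pgo and the LLVM profiling tools)
cargo install cargo-pgo
rustup component add llvm-tools-preview

# 1. Build drill with PGO instrumentation
cargo pgo build

# 2. In another terminal, start the Node.js server from example/server
#    (exact start command omitted; it depends on the example)

# 3. Run the instrumented binary under the benchmark load to collect profiles
#    (cargo-pgo builds into a target-triple subdirectory; this path assumes x86_64 Linux)
./target/x86_64-unknown-linux-gnu/release/drill --benchmark benchmark.yml --stats

# 4. Rebuild drill with the collected profiles applied
cargo pgo optimize
```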
Results
First, I want to highlight that the methodology is not ideal: the CPU core is not fully loaded, so I measured the "average" CPU load of `drill` on one core with `htop` and watched it by eye during every run (yes, some scripting over `top` could be used here, but right now I am quite lazy :). The lower the average CPU usage, the better. The method could be improved, but as a quick check it should be good enough. All measurements were done on the same hardware/software, with the same "quiet" background load, multiple times, in different orders, etc.; they are quite stable, at least on my machine.
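If someone wants to automate the measurement instead of eyeballing `htop`, it could be scripted roughly like this (a minimal sketch, assuming `pidstat` from the sysstat package is available and that the process shows up in the process table under the name `drill`):

```sh
#!/usr/bin/env bash
# Sketch: sample drill's CPU usage once per second for 60 seconds
# while the benchmark is running, then print the averages row.

# Pick the newest process named "drill" (assumes a single benchmark run is active)
PID=$(pgrep -n -x drill) || { echo "drill is not running" >&2; exit 1; }

# The trailing "Average:" row contains the mean %CPU over the sampling window
pidstat -u -p "$PID" 1 60 | grep '^Average'
```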
Here are the results for "Release", "Release + PGO", and "Instrumentation" modes (Instrumentation is included just for reference, so you can see how much slower drill is in Instrumentation mode):
- Release: average CPU load is ~9.0-9.7% (less frequently 10.3%)
- Release + PGO: average CPU load is ~7.8-8.4%
- Instrumentation: average CPU load is ~15.5%
At least in this test, I see an improvement in drill's performance with PGO. If we can come up with a setup where drill itself is the CPU bottleneck in a "near real-life" scenario, instead of the Node.js server, it would be great to test that as well.
These results could be important for people who want to maximize the benchmark tool's performance per core/CPU/machine, since it could postpone the moment when, for benchmarking purposes, multiple machines have to be spawned to create the required stress load, and/or allow spawning cheaper instances to create the same load.