Hi!
As I have done many times before, I decided to test Profile-Guided Optimization (PGO) to see how it affects the library's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other libraries a lot, I applied it to pingora to see whether a performance win (or loss) can be achieved. Here are my benchmark results.
This information may be useful to anyone who wants to get more performance out of the library in their use cases.
Test environment
Fedora 40
Linux kernel 6.10.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.79.0
pingora version: main branch at commit e288bfe8f036d995d74367acef4b2fa0f04ecf26
Turbo Boost disabled
Benchmark
For benchmarking, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. I collected the Release results with taskset -c 0 cargo bench --workspace. The PGO training phase is run with taskset -c 0 cargo pgo bench -- --workspace, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench -- --workspace.
taskset -c 0 pins the process to a single CPU core to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as far as I can guarantee).
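For anyone who wants to repeat the measurements, the three phases above can be sketched as a small script. This is only a sketch: the run helper, the DRY_RUN variable and the pgo_workflow function name are my own illustration and not part of pingora or cargo-pgo, and it assumes cargo-pgo and the llvm-tools-preview rustup component are already installed.

```shell
#!/bin/sh
set -eu

# Assumed prerequisites (install once, outside this script):
#   cargo install cargo-pgo
#   rustup component add llvm-tools-preview

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the three commands used in this post.
pgo_workflow() {
  # 1. Baseline: regular Release benchmarks, pinned to CPU core 0.
  run taskset -c 0 cargo bench --workspace
  # 2. PGO training: run the instrumented benchmarks to collect profiles.
  run taskset -c 0 cargo pgo bench -- --workspace
  # 3. PGO optimization: rebuild with the profiles and benchmark again.
  run taskset -c 0 cargo pgo optimize bench -- --workspace
}
```

Calling pgo_workflow with DRY_RUN=1 set only prints the commands, which is a cheap way to check the sequence before spending time on the real benchmark runs.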
Results
According to the results (comparing the "Release" and "PGO optimized" benchmark reports with a tool like diff), PGO measurably improves the library's performance in many cases, especially in the cache benchmarks.
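One possible way to do the comparison is to save each report to a file and diff the two files. The file names and the compare_reports function below are hypothetical, and the capture commands are shown only as comments; treat this as a sketch rather than the exact procedure I used.

```shell
#!/bin/sh
set -eu

# Hypothetical file names for the two captured benchmark reports.
RELEASE_REPORT="release.txt"
PGO_REPORT="pgo-optimized.txt"

# Capturing the reports (run manually, each one takes a while):
#   taskset -c 0 cargo bench --workspace 2>&1 | tee "$RELEASE_REPORT"
#   taskset -c 0 cargo pgo optimize bench -- --workspace 2>&1 | tee "$PGO_REPORT"

# Print a unified diff of two reports.
# diff exits with status 1 when the files differ, which is the expected
# case here, so only a status greater than 1 is treated as a failure.
compare_reports() {
  diff -u "$1" "$2" || [ "$?" -eq 1 ]
}
```

With both files in place, compare_reports "$RELEASE_REPORT" "$PGO_REPORT" shows the Release lines prefixed with - and the PGO optimized lines prefixed with +.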
Further steps
I understand that the steps above can be time-consuming and hard to implement in practice. At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about pingora performance in their workloads. Maybe a small note somewhere in the documentation (the README file?) will be enough to raise awareness about this work.
Also, Post-Link Optimization (PLO) can be tested after PGO. It can be done by applying tools like LLVM BOLT to Pingora-based apps.
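For a Pingora-based application, a PGO plus BOLT pipeline could look roughly like the sketch below. It relies on the cargo pgo build, cargo pgo bolt build and cargo pgo bolt optimize subcommands from cargo-pgo; the run helper, DRY_RUN, the function name and both binary paths are hypothetical, and I have not tested this pipeline on pingora itself. It also assumes llvm-bolt is available on the machine.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: path to the PGO-instrumented binary (cargo-pgo prints it after the build).
# $2: path to the BOLT-instrumented binary (also printed by cargo-pgo).
# Both paths depend on your application, so they are passed in as arguments.
pgo_bolt_workflow() {
  # 1. Build with PGO instrumentation and run a representative workload.
  #    For a proxy this means sending traffic to it and then stopping it.
  run cargo pgo build
  run "$1"
  # 2. Build a PGO-optimized binary with BOLT instrumentation and
  #    exercise it with the same workload to gather BOLT profiles.
  run cargo pgo bolt build --with-pgo
  run "$2"
  # 3. Produce the final binary optimized with both PGO and BOLT.
  run cargo pgo bolt optimize --with-pgo
}
```

Running it with DRY_RUN=1 and two placeholder paths prints the five steps in order without building anything.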
Thank you.