Skip to content

Latest commit

 

History

History
117 lines (92 loc) · 4.98 KB

perf.org

File metadata and controls

117 lines (92 loc) · 4.98 KB

Perf

setup

There are just some personal notes on using perf. Notes are influx, maybe never finished, and not authoritative:

More info is available at:
http://www.brendangregg.com/perf.html \ http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/ (3-part series) \ http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/ https://perf.wiki.kernel.org/

To get started you need to install (on Ubuntu) linux-tools-common, linux-tools-generic and linux-cloud-tools-generic. Then you need to reboot

To confirm everything works run sudo perf test and make sure everything passes. Note that all commands need to be run with sudo. It will work without it, but you will be missing features and things will act up (just try running perf test without sudo

list

sudo perf list

Will return a huge list of events that you can monitor on your particular CPU. This list is not exhaustive. You can open up your CPU data sheet and find additional event and reference them by hex number

stat

sudo perf stat my-command my-args

This will give you a general overview of your program’s performance.

  • CPUs utilized should be close to the number of threads you have
  • insn per cycle is instructions per cycle. This should be higher than 1 and closer to 2
sudo perf stat -d my-command my-args

With the -d flag you will get a bit more info about cache misses

sudo perf stat -e event-name1 event-name2 .. my-command my-args

With the -e flag you can track specific event (see: perf list) and how often they occur.

For examample to get a runtime you could do:

perf stat -e cpu-clock my-command

This is called the counting mode and it’s just counting how many times the event happened. It’s very low overhead and simple. It’s accurate, but it’s only giving you info about the program as a whole

record

sudo perf record -F 99 my-command my-args

This is generating an actual performance profile. Here it interrupts the process at a given frequency -F freq (her at 99Hz) and saves statistics on the stack it gets. The result is saved to a perf.data file.

To have useful stack frame information the program should be compiled with debug information - however this seems to often not be sufficient. Two methods exist for adding back the symbols to get useful results

sudo perf record -F 99 --call-graph lbr my-command my-args

This method didn’t work for me (or only worked partially)

sudo perf record -F 99 --call-graph dwarf my-command my-args

here: https://stackoverflow.com/questions/10933408/how-can-i-get-perf-to-find-symbols-in-my-program they run

record -g dwarf -F 97 /path/to/my/program

We can also collect samples on events

sudo perf record -e some-event1,some-event2 my-command my-args

Again these are listed in perf list

This is the sampling mode an it’s triggering a handler function every X number of events. This will be slower than the counting-mode, but you will get granularity when it comes to where perf issues are coming up

Here we have two choices of how to sample:

sampling frequency
sampling the event X times per second. This is like before and uses the -F frequency flag
fixed sampling
sampling at every X number of events. This uses the -c flag

You want more samples for the statistic, but you don’t want too many that it interferes with the caches and normal executation of the program. One of the linked articles suggests aiming for a 5% overhead at most.

If you have too many samples, as you may have with the default sampling period, then you will get a warning like

Processed 77354 events and lost 47 chunks!

Check IO/CPU overload!

[ perf record: Captured and wrote 537.245 MB perf.data (66655 samples) ]

The perf.data file can be examined with a special perf tool

report

sudo perf report

Will open a TUI application to examine the results stored in perf.data. It takes a while to read the data

sudo perf report --gtk

will launch a GTK application.. theoretically. On Ubuntu the requisite library doesn’t exist

If you record at a certain frequency then you will get a list of functions where the program spends the most time. If you recorded multiple events then the first thing you will select is the event you want to examine.

If you select a function and hit enter then choose to see an Annotated version of the code that will show you exactly where the code is running “hot”

evlist

sudo perf evlist

Is a helper to see what configuration was used to generate the perf.data - in case you forgot or want to double check

sudo perf evlist -F

Is handy b/c it will display the sample frequency - which sometimes is running at some default (if not specified by the user)