Recently I came across Brendan D. Gregg’s homepage. He is doing really awesome work on finding performance bottlenecks. It does not make sense to repeat everything here, from perf_events and probes to flame graphs, so I really recommend taking a look at his work.
Some Notes to myself:
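For stack traces in optimized builds, compile with -fno-omit-frame-pointer.
Debug builds do not require it: with debug info (-g), perf can unwind the stack from the DWARF data instead, e.g. perf record --call-graph dwarf.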
One-liner:
perf record -g
perf report -g "graph,0.5,caller"