Recently I came across Brendan D. Gregg's homepage. He is doing really awesome work on finding performance bottlenecks. It does not make sense to repeat everything here, from perf_events and probes to flame graphs, so I really recommend taking a look at his work.
Some Notes to myself:
For stack traces, compile optimized builds with -fno-omit-frame-pointer.
Debug builds with -g dwarf do not require it.
One-liners:
perf record -g
perf report -g "graph,0.5,caller"