Cache misses with perf

We use perf to collect hardware event counts. The quantity of interest is the number of cache misses per instruction; smaller is better. According to a Red Hat blog post, a value of 5% is critical for performance and warrants further investigation (https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2/).

Analysis

# allow unprivileged users to read the CPU performance counters
sudo sh -c 'echo 0 >/proc/sys/kernel/perf_event_paranoid'

# 729 cells
python3 example-scripts/finitevolumes-with-ExaHyPE2-benchmark.py -cs 0.1 -pdt 0.01 --type gpu-ats -f -et 0.01
# or ~60k cells
python3 example-scripts/finitevolumes-with-ExaHyPE2-benchmark.py -cs 0.01 -pdt 0.01 --type gpu-ats -f -et 0.01



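# pin the run to cores 0-3, discard the application's stdout and append perf's CSV output (written to stderr) to the file data
FUSENUM=500 FUSEMAX=10 perf stat -x "," -e task-clock,cycles,instructions,cache-references,cache-misses taskset -c 0-3 ./peano4 1> /dev/null 2>> data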

L3 cache misses divided by the total number of instructions

According to the Red Hat blog post, this ratio is a cause for concern if it exceeds 0.05.

L3 cache references divided by the total number of instructions

Here we see that about \(1.5\%\) of the instructions access the cache, which is a small share.

L3 cache misses divided by the cache references

Here we see that only a small fraction, about \(2\%\), of the cache accesses result in a cache miss.

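Taken together, the two ratios give roughly \(0.015 \times 0.02 = 0.0003\) cache misses per instruction, far below the 0.05 threshold. One would probably conclude that this is a cache-oblivious algorithm.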
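
The three ratios discussed above can be computed directly from the file data. The following is a minimal sketch in Python, assuming the usual field order of perf stat -x output (counter value, unit, event name, followed by further fields). The function names parse_perf_stat_csv and cache_ratios are made up for illustration and are not part of Peano or ExaHyPE. Because the perf command appends with 2>>, the counts of all runs in the file are summed. Lines whose first field is not a number are skipped, for example anything the application itself writes to stderr or entries such as "<not counted>".

```python
import csv
import io
from collections import defaultdict


def parse_perf_stat_csv(text):
    """Sum the counter values per event name over all runs found in text."""
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # blank line or unrelated output
        value, event = row[0], row[2]
        try:
            count = float(value)
        except ValueError:
            continue  # e.g. "<not counted>" or "<not supported>"
        # Drop event modifiers such as ":u" so that "instructions:u"
        # is accumulated under "instructions".
        totals[event.split(":")[0]] += count
    return dict(totals)


def cache_ratios(totals):
    """Return the three ratios discussed in the text."""
    instructions = totals["instructions"]
    references = totals["cache-references"]
    misses = totals["cache-misses"]
    return {
        "misses_per_instruction": misses / instructions,
        "references_per_instruction": references / instructions,
        "misses_per_reference": misses / references,
    }
```

A possible use is cache_ratios(parse_perf_stat_csv(open("data").read())), followed by a comparison of misses_per_instruction against the 0.05 threshold. Summing before dividing weights long runs more strongly than short ones. If per-run ratios are wanted instead, the file would have to be split into runs first.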

TODO L2