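We use perf to collect hardware event counts. The quantity of interest is the number of cache misses per instruction; the smaller, the better. A value of 5% is considered critical for performance and warrants further investigation (https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2/). First, lower perf_event_paranoid so that perf can read the counters without root privileges, then generate one of the two benchmark setups and run it under perf stat: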
sudo sh -c 'echo 0 >/proc/sys/kernel/perf_event_paranoid'
# 729 cells
python3 example-scripts/finitevolumes-with-ExaHyPE2-benchmark.py -cs 0.1 -pdt 0.01 --type gpu-ats -f -et 0.01
# or ~60k cells
python3 example-scripts/finitevolumes-with-ExaHyPE2-benchmark.py -cs 0.01 -pdt 0.01 --type gpu-ats -f -et 0.01
FUSENUM=500 FUSEMAX=10 perf stat -x "," -e task-clock,cycles,instructions,cache-references,cache-misses taskset -c 0-3 ./peano4 1> /dev/null 2>> data
According to the Red Hat blog, the ratio of cache-misses to instructions is worrying if it is larger than 0.05.
In our measurements, about \(1.5\%\) of the instructions access the cache (cache-references divided by instructions). That is not a lot.
Of these cache accesses, only about \(2\%\) result in a cache miss (cache-misses divided by cache-references).
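Taken together, this amounts to roughly \(0.015 \times 0.02 = 0.0003\) cache misses per instruction, far below the 0.05 threshold. One would probably conclude that this is a cache-oblivious algorithm.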
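
The three ratios discussed above can be computed from the file written by perf stat. The following is a minimal sketch, not part of the benchmark scripts. It assumes that each CSV line produced by `perf stat -x ","` carries the counter value in the first field and the event name in the third field, and it sums the counters over all runs that were appended to the file. The function names `parse_perf_csv` and `cache_ratios` are hypothetical, and the sample counter values are synthetic, chosen only to give ratios of the same magnitude as those quoted above.

```python
import csv


def parse_perf_csv(lines):
    """Sum the counter values per event from `perf stat -x ","` output.

    Assumption: field 0 is the counter value, field 2 the event name.
    Lines that do not fit (program output, "<not counted>") are skipped.
    """
    totals = {}
    for row in csv.reader(lines):
        if len(row) < 3:
            continue
        try:
            value = float(row[0])
        except ValueError:
            continue  # not a numeric counter value
        name = row[2].split(":")[0]  # drop modifiers such as ":u"
        totals[name] = totals.get(name, 0.0) + value
    return totals


def cache_ratios(totals):
    """Return the three ratios discussed in the text."""
    instructions = totals["instructions"]
    references = totals["cache-references"]
    misses = totals["cache-misses"]
    return {
        "misses_per_instruction": misses / instructions,
        "references_per_instruction": references / instructions,
        "misses_per_reference": misses / references,
    }


# Synthetic sample lines (illustrative values, not measured data).
sample = [
    "1000.0,msec,task-clock,1000000000,100.00,1.000,CPUs utilized",
    "2000000,,cycles,1000000000,100.00,2.000,GHz",
    "1000000,,instructions,1000000000,100.00,0.50,insn per cycle",
    "15000,,cache-references,1000000000,100.00,,",
    "300,,cache-misses,1000000000,100.00,2.000,of all cache refs",
]

ratios = cache_ratios(parse_perf_csv(sample))
for key, value in ratios.items():
    print(f"{key}: {value:.4%}")
```

To evaluate a real measurement, pass the opened file `data` to `parse_perf_csv` instead of `sample`. Because the perf command appends to `data`, the ratios are then computed over the summed counters of all recorded runs.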