Introduction

This is a condensed version of the material presented at the 2021 performance workshop, tailored to the Peano4 use case. Score-P provides profiling and event tracing through instrumentation of user code. We use Scalasca to run and analyse the Score-P instrumented executable.


General workflow for an autotools project

Score-P provides compiler wrappers that do the code instrumentation. The runtime behaviour of Score-P is generally controlled through environment variables prefixed with SCOREP_. This includes the output directory (SCOREP_EXPERIMENT_DIRECTORY). NOTE: Scalasca will take care of automatic output directory generation.
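As a small sketch of how such variables are set before a run: the three variables below are standard Score-P measurement settings, while the directory name scorep_peano4_manual is a made-up example.

```shell
# Profile only, no trace; these are the Score-P defaults made explicit.
export SCOREP_ENABLE_PROFILING=true
export SCOREP_ENABLE_TRACING=false
# Hypothetical directory name. Only needed when running without Scalasca,
# since scan chooses the experiment directory itself.
export SCOREP_EXPERIMENT_DIRECTORY=scorep_peano4_manual

# Show every SCOREP_ variable currently set in this shell.
env | grep '^SCOREP_' | sort
```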

There is a handy command that prints all available variables:

 scorep-info config-vars --full

In the simplest case, replacing gcc with scorep --user gcc in all compilation steps will be sufficient. For autotools (and CMake) projects, some more care needs to be taken.
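A minimal sketch of the simplest case, assuming a single hypothetical source file hello.c. The commands are only printed, not executed, and the wrapper is used for the link step as well, because linking is where the measurement libraries are added.

```shell
# Use the Score-P wrapper when it is installed, plain gcc otherwise.
if command -v scorep >/dev/null 2>&1; then
  CC="scorep --user gcc"
else
  CC="gcc"
fi

# hello.c is a hypothetical file; print the commands rather than run them.
echo "$CC -c hello.c -o hello.o"
echo "$CC hello.o -o hello"
```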

DINE build

It is very important to turn off the compiler wrapper during the configure step (SCOREP_WRAPPER=off) and to pass --disable-dependency-tracking.

module purge
# Intel: no <filesystem> support?
#module load intel_comp/2020-update2 intel_mpi/2020-update2
#module load scorep/7.0 - linker errors also with gnu10 get_location_from_adhoc_loc
module load gnu_comp/10.2.0 openmpi/4.0.5 scorep/6.0

SCOREP_WRAPPER=off CXX=scorep-g++ ./configure --with-mpi=scorep-mpicxx --with-multithreading=omp CXXFLAGS="-std=c++17 -fopenmp -march=native -O3" LDFLAGS="-fopenmp" --enable-loadbalancing --enable-exahype --enable-particles --enable-blockstructured --disable-dependency-tracking

make -j20

cd examples/exahype2/euler
export PYTHONPATH=$PWD/../../../python:$PYTHONPATH
python3 example-scripts/finitevolumes.py -cs 0.1 -f -et 0.0005

export SCOREP_TOTAL_MEMORY=16GB
scan mpiexec -np 2 ./peano4

square -s ./scorep_peano4_2xO_sum

scan -q -t mpiexec -np 2 ./peano4

NOTE: The scorep/7.0 module does not seem to work. It is very easy to install Score-P locally, though, as it tracks and installs its own dependencies for you:

wget http://perftools.pages.jsc.fz-juelich.de/cicd/scorep/tags/scorep-7.0/scorep-7.0.tar.gz
tar xzf scorep-7.0.tar.gz
cd scorep-7.0
./configure --prefix=$PWD/local
make install -j20

export PATH=$PWD/local/bin:$PATH
export LD_LIBRARY_PATH=$PWD/local/lib:$LD_LIBRARY_PATH

Typical steps

  1. Application compiled with score-p wrappers
  2. Run Scalasca scan in summary mode
  3. Run Scalasca square -s, extract hint on SCOREP_TOTAL_MEMORY from produced .score file
  4. export SCOREP_TOTAL_MEMORY=???
  5. Run Scalasca scan -q -t in trace mode
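Steps 2 to 5 can be sketched as a small script. This is a sketch under assumptions, not a tested recipe: it assumes that square -s leaves its report in scorep.score inside the experiment directory, and that the report contains a line of the form "Estimated memory requirements (SCOREP_TOTAL_MEMORY): 6GB". The helper name total_memory_hint is made up for illustration.

```shell
#!/bin/sh
# Print the SCOREP_TOTAL_MEMORY hint found in a score report.
# Assumption: the report has a line such as
#   Estimated memory requirements (SCOREP_TOTAL_MEMORY):   6GB
total_memory_hint() {
  awk '/\(SCOREP_TOTAL_MEMORY\):/ { print $NF; exit }' "$1"
}

# Steps 2 to 5; skipped when Scalasca or the executable is missing.
if command -v scan >/dev/null 2>&1 && [ -x ./peano4 ]; then
  scan mpiexec -np 2 ./peano4                 # 2. summary run
  square -s ./scorep_peano4_2xO_sum           # 3. produce the score report
  hint=$(total_memory_hint ./scorep_peano4_2xO_sum/scorep.score)
  export SCOREP_TOTAL_MEMORY="$hint"          # 4. consider rounding this up
  scan -q -t mpiexec -np 2 ./peano4           # 5. trace run
fi
```

The extracted value is the reported minimum, so it may be worth setting a larger value by hand instead of using it directly.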

NOTE: It is a good idea to generously overestimate SCOREP_TOTAL_MEMORY in order to avoid frustrating iterations. If there is not enough memory, Score-P will only complain (and fail to provide output) at the end of the program.

After running square, the summary will tell you how much memory you need to do a full trace run. If the reported minimum exceeds the available system memory, a filter file must be provided.

Filtering file

tbd
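As a starting point, a minimal sketch of a Score-P filter file and how it might be used. The file name peano4.filt and the region pattern are placeholders chosen for illustration, not a recommendation for Peano4. Which regions to exclude should be read off the score report.

```shell
# Write a filter file; the region pattern is a placeholder.
cat > peano4.filt <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    *tarch::la::*
SCOREP_REGION_NAMES_END
EOF

# Preview the effect of the filter on the memory estimate, then repeat
# the trace run with the filter applied. Skipped if the tools or the
# summary profile are missing.
if command -v scorep-score >/dev/null 2>&1 \
   && command -v scan >/dev/null 2>&1 \
   && [ -f scorep_peano4_2xO_sum/profile.cubex ]; then
  scorep-score -f peano4.filt scorep_peano4_2xO_sum/profile.cubex
  scan -q -t -f peano4.filt mpiexec -np 2 ./peano4
fi
```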