Memory leak analysis

Author

Marc Paterno

Description

We suspect a memory leak in the GPU integration modules. We are trying to determine what part of the code is leaking memory.

The data

We have the standard output of CosmoSIS jobs that were run with the CosmoSIS memory monitoring utility turned on.

We parse the data to capture the first memory report after each new sample starts.
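The exact format of the CosmoSIS memory reports is not reproduced here. The sketch below therefore assumes that each relevant output line has already been turned into a hypothetical event: `("sample", n)` when sample n starts, or `("memory", vms)` for a memory report. It then shows only the selection logic, keeping the first report after each sample starts. The event shape and the function name `first_report_per_sample` are illustrative, not part of CosmoSIS.

```python
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical pre-parsed event: ("sample", n) when sample n starts,
# or ("memory", vms) for one memory report. Not a real CosmoSIS format.
Event = Tuple[str, Union[int, float]]


def first_report_per_sample(events: Iterable[Event]) -> Iterator[Tuple[int, float]]:
    """Yield (sample, vms) for the first memory report after each sample starts."""
    current = None    # sample number currently running, if any
    captured = True   # whether a report was already taken for `current`
    for kind, value in events:
        if kind == "sample":
            current, captured = int(value), False
        elif kind == "memory" and not captured and current is not None:
            yield current, float(value)
            captured = True


# Made-up numbers, for illustration only.
events = [("memory", 6000.0), ("sample", 1), ("memory", 6605.0),
          ("memory", 6606.0), ("sample", 2), ("memory", 6613.0)]
print(list(first_report_per_sample(events)))  # [(1, 6605.0), (2, 6613.0)]
```

Reports that arrive before the first sample, and later reports within the same sample, are skipped.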

The memory reports collected by CosmoSIS are obtained using the Python module psutil. On Linux, psutil uses the /proc filesystem to obtain data about the process. The reported memory numbers are:

  1. RSS: the resident set size, which is the amount of non-swapped physical memory the process is currently using, and
  2. VMS: the virtual memory size, which is the total amount of virtual memory the process has requested, whether or not it is backed by physical memory.
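psutil exposes these two numbers, in bytes, as the `rss` and `vms` fields of `psutil.Process().memory_info()`. To our understanding, on Linux psutil computes them from `/proc/<pid>/statm`. The sketch below reads that file directly to show where the numbers come from. The helper name `memory_info_from_proc` is hypothetical, the code is Linux only, and the psutil cross-check runs only if psutil is installed.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # bytes per memory page on this system


def memory_info_from_proc(pid="self"):
    """Return (rss, vms) in bytes for a process, read from /proc/<pid>/statm.

    The first two statm fields are the total program size (VMS) and the
    resident set size (RSS), both counted in pages. Linux only.
    """
    with open(f"/proc/{pid}/statm") as f:
        size_pages, resident_pages = (int(x) for x in f.read().split()[:2])
    return resident_pages * PAGE_SIZE, size_pages * PAGE_SIZE


if __name__ == "__main__":
    rss, vms = memory_info_from_proc()
    print(f"/proc:  RSS = {rss / 2**20:.1f} MiB, VMS = {vms / 2**20:.1f} MiB")
    try:
        import psutil  # optional cross-check

        info = psutil.Process().memory_info()
        print(f"psutil: RSS = {info.rss / 2**20:.1f} MiB, VMS = {info.vms / 2**20:.1f} MiB")
    except ImportError:
        pass
```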

We’ll look at the VMS because we do not want our results confounded by any use of swap space, nor by Linux not actually allocating physical memory when arrays full of zeros are allocated.
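The second effect can be seen with a zero-filled anonymous memory mapping. The VMS grows by the full size as soon as the mapping is created, but the RSS grows only as pages are first written. This is a Linux-only sketch. It assumes that an `mmap` mapping is a fair stand-in for how large zero-filled arrays are typically allocated, which we have not checked for the modules in question. The function names are hypothetical.

```python
import mmap
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def rss_vms():
    """(rss, vms) in bytes for this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        size_pages, resident_pages = (int(x) for x in f.read().split()[:2])
    return resident_pages * PAGE_SIZE, size_pages * PAGE_SIZE


def zero_page_demo(nbytes=32 * 2**20):
    """Map nbytes of zero-filled anonymous memory, then write to every page.

    Returns ((d_rss, d_vms) after mapping, (d_rss, d_vms) after writing),
    each measured relative to before the mapping, in bytes.
    """
    rss0, vms0 = rss_vms()
    buf = mmap.mmap(-1, nbytes, flags=mmap.MAP_PRIVATE)  # anonymous, zero-filled
    rss1, vms1 = rss_vms()
    for offset in range(0, nbytes, PAGE_SIZE):
        buf[offset] = 1  # first write to a page faults in physical memory
    rss2, vms2 = rss_vms()
    buf.close()
    return (rss1 - rss0, vms1 - vms0), (rss2 - rss0, vms2 - vms0)


if __name__ == "__main__":
    mapped, touched = zero_page_demo()
    print("after mapping (d_rss, d_vms):", mapped)
    print("after writing (d_rss, d_vms):", touched)
```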

First we’ll plot the VMS memory size as a function of the sample number for the two runs. Run t6 is doing 1200 integrals per sample; run t7 is doing only 120 integrals per sample.

Determining the leak rate

To estimate the leak rate (how many bytes are leaked per sample), we fit a straight line to the data. We drop the first sample from the fit because long-lived data structures are still being set up during that sample, especially in modules that follow the integration module. In the plots below, the solid lines are the linear fits.
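A minimal sketch of such a fit, assuming an ordinary least-squares fit of VMS against sample number (the fitting tool actually used is not stated). The function name `fit_leak_rate` and its `skip` parameter are hypothetical. The usage below runs on synthetic data, not on the t6 or t7 measurements.

```python
from typing import Sequence, Tuple


def fit_leak_rate(samples: Sequence[float], vms: Sequence[float],
                  skip: int = 1) -> Tuple[float, float]:
    """Ordinary least-squares fit of vms = a + b * sample; returns (a, b).

    The first `skip` points are dropped, since long-lived data structures
    are still being set up during the first sample.
    """
    x, y = list(samples)[skip:], list(vms)[skip:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points after skipping")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: memory growth per sample
    a = y_mean - b * x_mean    # intercept
    return a, b


# Synthetic data: the first point is off the line and is excluded by skip=1.
print(fit_leak_rate([1, 2, 3, 4, 5], [6500.0, 6614.0, 6616.0, 6618.0, 6620.0]))
# (6610.0, 2.0)
```

The slope `b` is the quantity of interest: the memory growth per sample.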

The fitted function is \(y = a + b x\), where \(y\) is the VMS and \(x\) is the sample number. The fit parameters are:

Run   a (intercept)   b (slope)
t6    6611.280        1.2660665
t7    6612.251        0.6774389

If some data structure leaks once per sample, we would expect the slopes of the two lines to be equal. If some data structure leaks once per integral, we would expect the slope for t6 to be 10 times that of t7, because t6 does 1200 integrals per sample and t7 does 120. The ratio of the slopes is 1.87, which is consistent with neither hypothesis. We will need more investigation to find the problem.
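The comparison can be checked with the fitted slopes from the table above and the integral counts per sample. The function name `slope_ratio_check` is hypothetical.

```python
def slope_ratio_check(slope_t6, slope_t7, n_t6=1200, n_t7=120):
    """Compare the observed slope ratio with the two leak hypotheses."""
    observed = slope_t6 / slope_t7
    per_sample = 1.0            # one leak per sample: slopes are equal
    per_integral = n_t6 / n_t7  # one leak per integral: ratio of integral counts
    return observed, per_sample, per_integral


observed, per_sample, per_integral = slope_ratio_check(1.2660665, 0.6774389)
print(f"observed {observed:.2f}, per-sample {per_sample:.0f}, per-integral {per_integral:.0f}")
# prints: observed 1.87, per-sample 1, per-integral 10
```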