We use the original version of PandAna, with the kNumuCutND candidate selection criterion, to test its performance. Here we are interested in how the processing speed varies with the number of files being processed. The goal is to detect any unexpected performance defect, such as a per-file slowdown as more files are processed, and to determine whether there is any noticeable startup overhead.
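
To collect the measurements, we time one run of the selection for each file count. The sketch below shows a minimal driver for doing this; the run_selection.py script name and its --nfiles option are hypothetical stand-ins for however the selection program is actually invoked, and the child process's user and system CPU times are read from Python's resource module (Unix only).

```python
import resource
import subprocess
import time

# Time the selection for 1 through 20 input files. The script name and
# --nfiles option are placeholders for the actual PandAna-based program.
for n in range(1, 21):
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(["python", "run_selection.py", "--nfiles", str(n)], check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so the
    # before/after difference isolates the run we just made.
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    print(f"{n} {real:.3f} {user:.3f} {sys_time:.3f}")
```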

Our dataframe contains the number of files processed and the real, user, and system time (as reported by the time command) needed to run the program that processes the files.
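
Such a dataframe might be loaded as in the following sketch, assuming the timings were saved to a whitespace-separated file (the name timings.txt is illustrative):

```python
import pandas as pd

# Columns: n (number of files), real, user, sys (seconds, from `time`).
df = pd.read_csv("timings.txt", sep=r"\s+")
print(df)
```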

All tests were run on my laptop.

 n   real (s)   user (s)   sys (s)
 1      7.058      6.402     0.408
 2     10.470     10.018     0.414
 3     15.593     15.223     0.495
 4     20.743     20.292     0.570
 5     26.418     25.656     0.758
 6     30.928     30.175     0.871
 7     36.658     35.691     1.071
 8     41.951     40.777     1.216
 9     48.104     46.549     1.449
10     52.657     51.260     1.499
11     58.003     56.227     1.726
12     63.814     61.982     1.844
13     68.575     66.572     2.063
14     74.312     72.159     2.213
15     78.159     75.880     2.342
16     88.258     82.553     2.544
17     89.435     86.700     2.683
18     94.124     91.319     2.840
19     93.879     91.110     2.865
20     98.771     95.985     2.886

Visual inspection of the real time as a function of the number of files shows the linear behavior that indicates no unexpected defects:
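
The plot can be reproduced with something like the following sketch, again assuming the df dataframe loaded above:

```python
import matplotlib.pyplot as plt

# Real time versus number of files; linearity is judged by eye.
plt.plot(df["n"], df["real"], "o")
plt.xlabel("number of files processed")
plt.ylabel("real time (s)")
plt.show()
```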

Similarly, for user time:

Finally, for system time. In this case, the linear behavior is less clean. We would likely need many repeated measurements to determine whether there is any real non-linearity.
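
A simple way to quantify both the linearity and the startup overhead is a least-squares straight-line fit to each timing column: the slope is the marginal cost per file, and the intercept is a rough estimate of any fixed startup cost. A sketch, assuming the df dataframe from above:

```python
import numpy as np

# Fit time = slope * n + intercept for each timing column.
for col in ["real", "user", "sys"]:
    slope, intercept = np.polyfit(df["n"], df[col], 1)
    print(f"{col}: {slope:.3f} s/file, intercept {intercept:.3f} s")
```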