The objective of this document is to provide some quantification and visualization of the differences in processing times produced by parallelization of Katherine’s VisiMod::calculate_vi() tool. I ran every combination of the following:
Spatial scope: omnidirectional (“omnidir”), single directional (“directional_single”), and random directional (“directional_random”)
Number of cores: 1, 2, 4, 8
Number of points: 100, 200, 400, 800
Viewing distance: 100, 200, 400, 800
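As a quick check on the size of this design, the factor levels listed above can be enumerated as a grid. This is only a sketch: the object names (grid, n_parallel, n_original, n_total) are illustrative and not part of the benchmarking script, and the extra single-core runs of the original algorithm are the ones described in the next paragraph.

```r
# Factor levels exactly as listed above
grid <- expand.grid(
  scope     = c("omnidir", "directional_single", "directional_random"),
  npts      = c(100, 200, 400, 800),
  view.dist = c(100, 200, 400, 800),
  cores     = c(1, 2, 4, 8),
  stringsAsFactors = FALSE
)

n_parallel <- nrow(grid)            # runs of the parallel-ready version
n_original <- sum(grid$cores == 1)  # single-core runs repeated with the original version
n_total    <- n_parallel + n_original

c(parallel = n_parallel, original = n_original, total = n_total)
```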
For the two directional scopes, I fixed the azimuth and field of view parameters both to 90 degrees. For the single-core test, I also compared the new version of the algorithm (which is set up for parallelization, even if multiple cores aren’t used) to Katherine’s original version of the algorithm. In all, this resulted in 240 runs of calculate_vi(). Here is the full table, in case it’s useful:
library(knitr)
library(viridis)
Loading required package: viridisLite
# read in and print the processing time table
df <- read.csv("S:/ursa2/campbell/visimod/other/calculate_vi_proc_time.csv")
kable(df)
| scope | npts | view.dist | cores | alg | time | outnum |
|---|---|---|---|---|---|---|
| omnidir | 100 | 100 | 1 | original | 70.63772 | NA |
| directional_single | 100 | 100 | 1 | original | 70.59072 | NA |
| directional_random | 100 | 100 | 1 | original | 71.01162 | NA |
| omnidir | 200 | 100 | 1 | original | 142.55289 | NA |
| directional_single | 200 | 100 | 1 | original | 137.62270 | NA |
| directional_random | 200 | 100 | 1 | original | 144.32235 | NA |
| omnidir | 400 | 100 | 1 | original | 286.82741 | NA |
| directional_single | 400 | 100 | 1 | original | 278.07150 | NA |
| directional_random | 400 | 100 | 1 | original | 276.67465 | NA |
| omnidir | 800 | 100 | 1 | original | 587.48781 | NA |
| directional_single | 800 | 100 | 1 | original | 580.85630 | NA |
| directional_random | 800 | 100 | 1 | original | 576.40260 | NA |
| omnidir | 100 | 200 | 1 | original | 134.74830 | NA |
| directional_single | 100 | 200 | 1 | original | 120.64596 | NA |
| directional_random | 100 | 200 | 1 | original | 120.19371 | NA |
| omnidir | 200 | 200 | 1 | original | 272.83305 | NA |
| directional_single | 200 | 200 | 1 | original | 234.28282 | NA |
| directional_random | 200 | 200 | 1 | original | 232.01790 | NA |
| omnidir | 400 | 200 | 1 | original | 547.23781 | NA |
| directional_single | 400 | 200 | 1 | original | 484.65274 | NA |
| directional_random | 400 | 200 | 1 | original | 476.41584 | NA |
| omnidir | 800 | 200 | 1 | original | 1116.30866 | NA |
| directional_single | 800 | 200 | 1 | original | 1062.63343 | NA |
| directional_random | 800 | 200 | 1 | original | 1075.00032 | NA |
| omnidir | 100 | 400 | 1 | original | 339.46045 | NA |
| directional_single | 100 | 400 | 1 | original | 292.40580 | NA |
| directional_random | 100 | 400 | 1 | original | 274.60654 | NA |
| omnidir | 200 | 400 | 1 | original | 679.92421 | NA |
| directional_single | 200 | 400 | 1 | original | 530.52106 | NA |
| directional_random | 200 | 400 | 1 | original | 527.01095 | NA |
| omnidir | 400 | 400 | 1 | original | 1356.75858 | NA |
| directional_single | 400 | 400 | 1 | original | 1063.77346 | NA |
| directional_random | 400 | 400 | 1 | original | 1071.77791 | NA |
| omnidir | 800 | 400 | 1 | original | 2704.66966 | NA |
| directional_single | 800 | 400 | 1 | original | 2176.25649 | NA |
| directional_random | 800 | 400 | 1 | original | 2185.94928 | NA |
| omnidir | 100 | 800 | 1 | original | 883.16497 | NA |
| directional_single | 100 | 800 | 1 | original | 548.01438 | NA |
| directional_random | 100 | 800 | 1 | original | 535.72722 | NA |
| omnidir | 200 | 800 | 1 | original | 1798.41478 | NA |
| directional_single | 200 | 800 | 1 | original | 1084.36524 | NA |
| directional_random | 200 | 800 | 1 | original | 1096.45316 | NA |
| omnidir | 400 | 800 | 1 | original | 3572.46798 | NA |
| directional_single | 400 | 800 | 1 | original | 2189.92040 | NA |
| directional_random | 400 | 800 | 1 | original | 2146.70715 | NA |
| omnidir | 800 | 800 | 1 | original | 7153.52319 | NA |
| directional_single | 800 | 800 | 1 | original | 4395.78153 | NA |
| directional_random | 800 | 800 | 1 | original | 4385.84109 | NA |
| omnidir | 100 | 100 | 1 | parallel | 84.13921 | NA |
| directional_single | 100 | 100 | 1 | parallel | 79.54423 | NA |
| directional_random | 100 | 100 | 1 | parallel | 79.42195 | NA |
| omnidir | 200 | 100 | 1 | parallel | 162.00025 | NA |
| directional_single | 200 | 100 | 1 | parallel | 160.08555 | NA |
| directional_random | 200 | 100 | 1 | parallel | 157.43748 | NA |
| omnidir | 400 | 100 | 1 | parallel | 311.83589 | NA |
| directional_single | 400 | 100 | 1 | parallel | 334.15797 | NA |
| directional_random | 400 | 100 | 1 | parallel | 340.40427 | NA |
| omnidir | 800 | 100 | 1 | parallel | 642.13019 | NA |
| directional_single | 800 | 100 | 1 | parallel | 615.46636 | NA |
| directional_random | 800 | 100 | 1 | parallel | 674.60560 | NA |
| omnidir | 100 | 200 | 1 | parallel | 147.64638 | NA |
| directional_single | 100 | 200 | 1 | parallel | 142.67683 | NA |
| directional_random | 100 | 200 | 1 | parallel | 128.56106 | NA |
| omnidir | 200 | 200 | 1 | parallel | 284.76511 | NA |
| directional_single | 200 | 200 | 1 | parallel | 254.45493 | NA |
| directional_random | 200 | 200 | 1 | parallel | 253.42825 | NA |
| omnidir | 400 | 200 | 1 | parallel | 576.57082 | NA |
| directional_single | 400 | 200 | 1 | parallel | 496.06219 | NA |
| directional_random | 400 | 200 | 1 | parallel | 496.63654 | NA |
| omnidir | 800 | 200 | 1 | parallel | 1122.71923 | NA |
| directional_single | 800 | 200 | 1 | parallel | 1044.50807 | NA |
| directional_random | 800 | 200 | 1 | parallel | 1059.31634 | NA |
| omnidir | 100 | 400 | 1 | parallel | 332.94834 | NA |
| directional_single | 100 | 400 | 1 | parallel | 276.51486 | NA |
| directional_random | 100 | 400 | 1 | parallel | 266.60219 | NA |
| omnidir | 200 | 400 | 1 | parallel | 671.05632 | NA |
| directional_single | 200 | 400 | 1 | parallel | 525.14726 | NA |
| directional_random | 200 | 400 | 1 | parallel | 530.58606 | NA |
| omnidir | 400 | 400 | 1 | parallel | 1310.86202 | NA |
| directional_single | 400 | 400 | 1 | parallel | 1055.43139 | NA |
| directional_random | 400 | 400 | 1 | parallel | 1068.07023 | NA |
| omnidir | 800 | 400 | 1 | parallel | 2638.18057 | NA |
| directional_single | 800 | 400 | 1 | parallel | 1435.92782 | NA |
| directional_random | 800 | 400 | 1 | parallel | 1895.93681 | NA |
| omnidir | 100 | 800 | 1 | parallel | 863.87952 | NA |
| directional_single | 100 | 800 | 1 | parallel | 546.33587 | NA |
| directional_random | 100 | 800 | 1 | parallel | 545.92246 | NA |
| omnidir | 200 | 800 | 1 | parallel | 1751.58242 | NA |
| directional_single | 200 | 800 | 1 | parallel | 1067.34977 | NA |
| directional_random | 200 | 800 | 1 | parallel | 1070.31255 | NA |
| omnidir | 400 | 800 | 1 | parallel | 3458.91076 | NA |
| directional_single | 400 | 800 | 1 | parallel | 2116.47786 | NA |
| directional_random | 400 | 800 | 1 | parallel | 2144.41562 | NA |
| omnidir | 800 | 800 | 1 | parallel | 6972.79212 | NA |
| directional_single | 800 | 800 | 1 | parallel | 4328.40692 | NA |
| directional_random | 800 | 800 | 1 | parallel | 4318.53856 | NA |
| omnidir | 100 | 100 | 2 | parallel | 59.99200 | NA |
| directional_single | 100 | 100 | 2 | parallel | 58.24540 | NA |
| directional_random | 100 | 100 | 2 | parallel | 60.26649 | NA |
| omnidir | 200 | 100 | 2 | parallel | 109.43504 | NA |
| directional_single | 200 | 100 | 2 | parallel | 109.93555 | NA |
| directional_random | 200 | 100 | 2 | parallel | 108.18555 | NA |
| omnidir | 400 | 100 | 2 | parallel | 200.88993 | NA |
| directional_single | 400 | 100 | 2 | parallel | 205.77961 | NA |
| directional_random | 400 | 100 | 2 | parallel | 201.73757 | NA |
| omnidir | 800 | 100 | 2 | parallel | 388.03809 | NA |
| directional_single | 800 | 100 | 2 | parallel | 416.98202 | NA |
| directional_random | 800 | 100 | 2 | parallel | 406.06077 | NA |
| omnidir | 100 | 200 | 2 | parallel | 94.74341 | NA |
| directional_single | 100 | 200 | 2 | parallel | 89.43565 | NA |
| directional_random | 100 | 200 | 2 | parallel | 89.30568 | NA |
| omnidir | 200 | 200 | 2 | parallel | 176.18670 | NA |
| directional_single | 200 | 200 | 2 | parallel | 168.63403 | NA |
| directional_random | 200 | 200 | 2 | parallel | 170.22710 | NA |
| omnidir | 400 | 200 | 2 | parallel | 355.10987 | NA |
| directional_single | 400 | 200 | 2 | parallel | 336.18175 | NA |
| directional_random | 400 | 200 | 2 | parallel | 335.01714 | NA |
| omnidir | 800 | 200 | 2 | parallel | 688.51264 | NA |
| directional_single | 800 | 200 | 2 | parallel | 662.09964 | NA |
| directional_random | 800 | 200 | 2 | parallel | 647.20046 | NA |
| omnidir | 100 | 400 | 2 | parallel | 202.35913 | NA |
| directional_single | 100 | 400 | 2 | parallel | 174.21647 | NA |
| directional_random | 100 | 400 | 2 | parallel | 178.87415 | NA |
| omnidir | 200 | 400 | 2 | parallel | 404.28140 | NA |
| directional_single | 200 | 400 | 2 | parallel | 330.22755 | NA |
| directional_random | 200 | 400 | 2 | parallel | 333.93966 | NA |
| omnidir | 400 | 400 | 2 | parallel | 716.65385 | NA |
| directional_single | 400 | 400 | 2 | parallel | 643.34943 | NA |
| directional_random | 400 | 400 | 2 | parallel | 655.81239 | NA |
| omnidir | 800 | 400 | 2 | parallel | 725.90631 | NA |
| directional_single | 800 | 400 | 2 | parallel | 436.74711 | NA |
| directional_random | 800 | 400 | 2 | parallel | 438.07607 | NA |
| omnidir | 100 | 800 | 2 | parallel | 324.15754 | NA |
| directional_single | 100 | 800 | 2 | parallel | 331.80445 | NA |
| directional_random | 100 | 800 | 2 | parallel | 330.10323 | NA |
| omnidir | 200 | 800 | 2 | parallel | 976.79755 | NA |
| directional_single | 200 | 800 | 2 | parallel | 651.86488 | NA |
| directional_random | 200 | 800 | 2 | parallel | 664.58271 | NA |
| omnidir | 400 | 800 | 2 | parallel | 1969.99467 | NA |
| directional_single | 400 | 800 | 2 | parallel | 1333.62296 | NA |
| directional_random | 400 | 800 | 2 | parallel | 1328.21601 | NA |
| omnidir | 800 | 800 | 2 | parallel | 3814.53258 | NA |
| directional_single | 800 | 800 | 2 | parallel | 2604.21014 | NA |
| directional_random | 800 | 800 | 2 | parallel | 1126.86924 | 800 |
| omnidir | 100 | 100 | 4 | parallel | 60.38114 | 100 |
| directional_single | 100 | 100 | 4 | parallel | 63.77082 | 100 |
| directional_random | 100 | 100 | 4 | parallel | 59.48024 | 100 |
| omnidir | 200 | 100 | 4 | parallel | 94.19526 | 200 |
| directional_single | 200 | 100 | 4 | parallel | 99.79912 | 200 |
| directional_random | 200 | 100 | 4 | parallel | 101.36544 | 200 |
| omnidir | 400 | 100 | 4 | parallel | 173.83377 | 400 |
| directional_single | 400 | 100 | 4 | parallel | 190.61623 | 400 |
| directional_random | 400 | 100 | 4 | parallel | 182.90380 | 400 |
| omnidir | 800 | 100 | 4 | parallel | 325.28424 | 800 |
| directional_single | 800 | 100 | 4 | parallel | 348.43339 | 800 |
| directional_random | 800 | 100 | 4 | parallel | 351.58068 | 800 |
| omnidir | 100 | 200 | 4 | parallel | 85.16101 | 100 |
| directional_single | 100 | 200 | 4 | parallel | 90.31539 | 100 |
| directional_random | 100 | 200 | 4 | parallel | 88.30298 | 100 |
| omnidir | 200 | 200 | 4 | parallel | 166.10916 | 200 |
| directional_single | 200 | 200 | 4 | parallel | 164.77338 | 200 |
| directional_random | 200 | 200 | 4 | parallel | 161.39634 | 200 |
| omnidir | 400 | 200 | 4 | parallel | 296.51864 | 400 |
| directional_single | 400 | 200 | 4 | parallel | 315.14091 | 400 |
| directional_random | 400 | 200 | 4 | parallel | 312.31423 | 400 |
| omnidir | 800 | 200 | 4 | parallel | 545.89786 | 800 |
| directional_single | 800 | 200 | 4 | parallel | 597.52939 | 800 |
| directional_random | 800 | 200 | 4 | parallel | 590.50292 | 800 |
| omnidir | 100 | 400 | 4 | parallel | 153.08878 | 100 |
| directional_single | 100 | 400 | 4 | parallel | 142.57740 | 100 |
| directional_random | 100 | 400 | 4 | parallel | 138.93459 | 100 |
| omnidir | 200 | 400 | 4 | parallel | 283.32754 | 200 |
| directional_single | 200 | 400 | 4 | parallel | 273.25604 | 200 |
| directional_random | 200 | 400 | 4 | parallel | 275.44019 | 200 |
| omnidir | 400 | 400 | 4 | parallel | 554.90724 | 400 |
| directional_single | 400 | 400 | 4 | parallel | 532.79932 | 400 |
| directional_random | 400 | 400 | 4 | parallel | 532.02024 | 400 |
| omnidir | 800 | 400 | 4 | parallel | 1094.59979 | 800 |
| directional_single | 800 | 400 | 4 | parallel | 1013.34414 | 800 |
| directional_random | 800 | 400 | 4 | parallel | 1014.51034 | 800 |
| omnidir | 100 | 800 | 4 | parallel | 310.02373 | 100 |
| directional_single | 100 | 800 | 4 | parallel | 223.98841 | 100 |
| directional_random | 100 | 800 | 4 | parallel | 236.74394 | 100 |
| omnidir | 200 | 800 | 4 | parallel | 592.60422 | 200 |
| directional_single | 200 | 800 | 4 | parallel | 451.37142 | 200 |
| directional_random | 200 | 800 | 4 | parallel | 446.53854 | 200 |
| omnidir | 400 | 800 | 4 | parallel | 1175.90734 | 400 |
| directional_single | 400 | 800 | 4 | parallel | 892.24337 | 400 |
| directional_random | 400 | 800 | 4 | parallel | 886.15168 | 400 |
| omnidir | 800 | 800 | 4 | parallel | 2293.84276 | 800 |
| directional_single | 800 | 800 | 4 | parallel | 1739.03009 | 800 |
| directional_random | 800 | 800 | 4 | parallel | 1767.80408 | 800 |
| omnidir | 100 | 100 | 8 | parallel | 73.61730 | 100 |
| directional_single | 100 | 100 | 8 | parallel | 78.40073 | 100 |
| directional_random | 100 | 100 | 8 | parallel | 73.29307 | 100 |
| omnidir | 200 | 100 | 8 | parallel | 103.87823 | 200 |
| directional_single | 200 | 100 | 8 | parallel | 110.65896 | 200 |
| directional_random | 200 | 100 | 8 | parallel | 113.55110 | 200 |
| omnidir | 400 | 100 | 8 | parallel | 177.54109 | 400 |
| directional_single | 400 | 100 | 8 | parallel | 183.77147 | 400 |
| directional_random | 400 | 100 | 8 | parallel | 190.82922 | 400 |
| omnidir | 800 | 100 | 8 | parallel | 327.49503 | 800 |
| directional_single | 800 | 100 | 8 | parallel | 348.17692 | 800 |
| directional_random | 800 | 100 | 8 | parallel | 328.18255 | 800 |
| omnidir | 100 | 200 | 8 | parallel | 90.57976 | 100 |
| directional_single | 100 | 200 | 8 | parallel | 95.19723 | 100 |
| directional_random | 100 | 200 | 8 | parallel | 97.47222 | 100 |
| omnidir | 200 | 200 | 8 | parallel | 165.80856 | 200 |
| directional_single | 200 | 200 | 8 | parallel | 169.38807 | 200 |
| directional_random | 200 | 200 | 8 | parallel | 163.17122 | 200 |
| omnidir | 400 | 200 | 8 | parallel | 268.61135 | 400 |
| directional_single | 400 | 200 | 8 | parallel | 291.74348 | 400 |
| directional_random | 400 | 200 | 8 | parallel | 297.34216 | 400 |
| omnidir | 800 | 200 | 8 | parallel | 512.67468 | 800 |
| directional_single | 800 | 200 | 8 | parallel | 550.91199 | 800 |
| directional_random | 800 | 200 | 8 | parallel | 520.33828 | 800 |
| omnidir | 100 | 400 | 8 | parallel | 134.27153 | 100 |
| directional_single | 100 | 400 | 8 | parallel | 135.97793 | 100 |
| directional_random | 100 | 400 | 8 | parallel | 136.99969 | 100 |
| omnidir | 200 | 400 | 8 | parallel | 232.93261 | 200 |
| directional_single | 200 | 400 | 8 | parallel | 224.00346 | 200 |
| directional_random | 200 | 400 | 8 | parallel | 234.81328 | 200 |
| omnidir | 400 | 400 | 8 | parallel | 426.05954 | 400 |
| directional_single | 400 | 400 | 8 | parallel | 425.42579 | 400 |
| directional_random | 400 | 400 | 8 | parallel | 454.26180 | 400 |
| omnidir | 800 | 400 | 8 | parallel | 828.21894 | 800 |
| directional_single | 800 | 400 | 8 | parallel | 869.35064 | 800 |
| directional_random | 800 | 400 | 8 | parallel | 859.98512 | 800 |
| omnidir | 100 | 800 | 8 | parallel | 219.81920 | 100 |
| directional_single | 100 | 800 | 8 | parallel | 197.28584 | 100 |
| directional_random | 100 | 800 | 8 | parallel | 186.42941 | 100 |
| omnidir | 200 | 800 | 8 | parallel | 395.49037 | 200 |
| directional_single | 200 | 800 | 8 | parallel | 344.05725 | 200 |
| directional_random | 200 | 800 | 8 | parallel | 353.50756 | 200 |
| omnidir | 400 | 800 | 8 | parallel | 771.60772 | 400 |
| directional_single | 400 | 800 | 8 | parallel | 644.94691 | 400 |
| directional_random | 400 | 800 | 8 | parallel | 634.30165 | 400 |
| omnidir | 800 | 800 | 8 | parallel | 1516.28501 | 800 |
| directional_single | 800 | 800 | 8 | parallel | 1245.74822 | 800 |
| directional_random | 800 | 800 | 8 | parallel | 1252.79806 | 800 |
The time column is the processing time in seconds. The outnum column is something I added about two-thirds of the way through my processing, which is why it is NA for the earlier runs. I was running into a few errors and wanted to know whether they came from the underlying (original) algorithm or were an artifact of parallelization. To test this, I added error handling within the function that retries the same VI calculation up to 5 times until it completes without an error. The idea was that if the underlying algorithm were at fault, the error would persist through all 5 tries, whereas an error that clears on a retry can be attributed to the complexities of the parallelization process. If the error still persisted after 5 tries, the function would simply skip that point, and there would be one fewer record in the output dataset. So outnum lets us see whether the number of output records differs from the number of input points. In every run where outnum was recorded, it was equal to npts, which suggests that the errors were an artifact of parallelization rather than of the underlying algorithm, and that the built-in retry process resolves them.
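The retry logic described above can be sketched roughly as follows. This is not the code inside calculate_vi(); the helper with_retries() and the toy functions flaky_vi() and always_fails() are hypothetical stand-ins, and the vi value of 0.5 is a placeholder rather than a real result.

```r
# Hypothetical helper: call fun(...) up to max_tries times, returning the
# first successful result, or NULL if every attempt raises an error.
with_retries <- function(fun, ..., max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    out <- tryCatch(
      list(ok = TRUE, value = fun(...)),
      error = function(e) list(ok = FALSE, value = NULL)
    )
    if (out$ok) return(out$value)
  }
  NULL
}

# Toy stand-in for a per-point VI calculation that fails on its first two calls
calls <- 0
flaky_vi <- function(pt) {
  calls <<- calls + 1
  if (calls < 3) stop("transient failure")
  data.frame(pt = pt, vi = 0.5)  # placeholder value
}

# Toy stand-in for an error that never goes away
always_fails <- function(pt) stop("persistent failure")

results <- list(
  with_retries(flaky_vi, 1),      # succeeds on the third attempt
  with_retries(always_fails, 2)   # NULL after five failed attempts
)

# Points that never succeeded are dropped, so the output row count
# (the analogue of outnum) can be compared with the number of input points.
out <- do.call(rbind, Filter(Negate(is.null), results))
nrow(out)
```

In this toy case two points go in and one record comes out, which is the kind of mismatch outnum was added to detect.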
Single Core Comparison
To parallelize the script, I had to switch from for to foreach, which required a few additional changes to the underlying function’s code. foreach is known to be a bit slower than for, as it holds more information in memory throughout the looping process while it dynamically constructs merged outputs of the looped function. So the question becomes: how much slower is it? If it’s much slower, and most end users are going to use a single core for processing, then that would call into question the use of foreach. Let’s explore…
First, I’ll compare processing time between the original algorithm and the new algorithm by number of points.
# create subset of just single-core runs
df.1core <- df[df$cores == 1,]
# create factor
df.1core$alg[df.1core$alg == "original"] <- "old"
df.1core$alg[df.1core$alg == "parallel"] <- "new"
df.1core$x <- as.factor(paste0(df.1core$npts, "\n", df.1core$alg))
# set up plot
par(mar = c(5,5,1,1), las = 1)
# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "npts\nalgorithm", xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)
OK, so at least visually, the “new” (parallel) algorithm does not appear to increase processing time significantly in comparison to the “old” (sequential) algorithm when run on a single core, by number of points. Let’s do the same for viewing distance:
# create factor
df.1core$x <- as.factor(paste0(df.1core$view.dist, "\n", df.1core$alg))
# set up plot
par(mar = c(5,5,1,1), las = 1)
# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "view.dist\nalgorithm", xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)
Some very minor differences, but fair to say that when running on one core, on average, the old and new algorithms perform very similarly.
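To put rough numbers on the boxplots, here is a small sketch comparing single-core times for the smallest and largest jobs, using values copied from the table above (in the order omnidir, directional_single, directional_random). The variable names are illustrative, and six pairs of runs are only a spot check, not a summary of the whole table.

```r
# Single-core times (s), npts = 100, view.dist = 100
old_small <- c(70.63772, 70.59072, 71.01162)
new_small <- c(84.13921, 79.54423, 79.42195)

# Single-core times (s), npts = 800, view.dist = 800
old_large <- c(7153.52319, 4395.78153, 4385.84109)
new_large <- c(6972.79212, 4328.40692, 4318.53856)

# Ratio of new (foreach-based) to old (for-based) processing time;
# values above 1 mean the new version was slower
ratio_small <- new_small / old_small
ratio_large <- new_large / old_large

round(rbind(small = ratio_small, large = ratio_large), 3)
```

For these rows the new version is roughly 12 to 19% slower on the smallest jobs and about 2% faster on the largest, which would be consistent with a small fixed overhead that stops mattering as the workload grows.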
Number of Cores
Now let’s test the degree to which multiple cores improve processing times. First, by number of points:
# create subset data frame
df.par <- df[df$alg == "parallel",]
# create factor
df.par$x <- as.factor(paste0(df.par$npts, "\n", df.par$cores))
# set up plot
par(mar = c(5,5,1,1), las = 1)
# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "npts\ncores", xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)
A slightly odd trend for the npts == 800 case notwithstanding, the benefits of multiple cores are clear, both in reduced processing times and in reduced variation of processing times. That is, other parameters such as viewing distance (view.dist) and scope (scope) have comparatively smaller effects on processing time when running on multiple cores. Let’s look at viewing distance and number of cores:
# create factor
df.par$x <- as.factor(paste0(df.par$view.dist, "\n", df.par$cores))
# set up plot
par(mar = c(5,5,1,1), las = 1)
# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "view.dist\ncores", xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)
Very clear trend. In the cases of both npts and view.dist, it’s clear that the relationship between processing time and number of cores is not linear. In other words, doubling your cores doesn’t halve your processing time. So, it might be useful to see if there’s a somewhat predictable trend.
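As a concrete illustration of that non-linearity, here is a sketch that computes speedup and parallel efficiency for one configuration taken from the table above (omnidir, npts = 800, view.dist = 800, parallel algorithm). Speedup is the single-core time divided by the multi-core time, and efficiency is speedup divided by the number of cores; these are standard definitions rather than anything specific to calculate_vi().

```r
cores <- c(1, 2, 4, 8)
# Processing times (s) for omnidir, npts = 800, view.dist = 800
time <- c(6972.79212, 3814.53258, 2293.84276, 1516.28501)

speedup    <- time[1] / time   # how many times faster than one core
efficiency <- speedup / cores  # 1 would mean perfect linear scaling

round(data.frame(cores, time, speedup, efficiency), 2)
```

For this configuration, 8 cores give a speedup of about 4.6 rather than 8, and efficiency drops with every doubling of cores.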
After a little behind the scenes data exploration, I found that the relationship between the log of processing time, number of points, viewing distance, and number of cores can be pretty well explained with a linear model:
# run linear model
mod <- lm(log(time) ~ npts + view.dist + cores, data = df.par)
# set up plot
par(mar = c(5,5,1,1), las = 1)
# define x and y
x <- log(df.par$time)
y <- predict(mod)
ax.min <- min(c(y, x))
ax.max <- max(c(y, x))
ax.lim <- c(ax.min, ax.max)
# plot predicted vs. observed
col <- rep(2, nrow(df.par))
col[df.par$scope == "omnidir"] <- 4
plot(y ~ x, xlim = ax.lim, ylim = ax.lim, pch = 16, col = col, xlab = "log(Observed Time)", ylab = "log(Predicted Time)")
grid()
lines(x = c(-100000,100000), y = c(-100000,100000), col = "lightgray")
mod.pred.obs <- lm(y ~ x)
abline(mod.pred.obs, lwd = 2)
r2 <- paste0("rsq = ", round(summary(mod.pred.obs)$adj.r.squared, 2))
legend("bottomright", legend = r2, bty = "n", x.intersp = 0)
legend("topleft", legend = c("directional", "omnidir"), pch = 16, col = c(2,4))
Pretty impressive predictive power! But clearly the omnidirectional cases are being somewhat underpredicted. If I tested more directional parameters (e.g., wider FOVs), I could add that in as a predictor variable, but being able to explain 87% of variance without that distinction is still pretty impressive. So, we now have some semblance of a processing time predictive equation. Using the coefficients from that log regression, and applying the exponential function to both sides of the equation to get a prediction in seconds, we get:
\[t = e^{\alpha + \beta_{1}p + \beta_{2}d + \beta_{3}c}\]
where \(t\) is time in seconds, \(p\) is the number of points, \(d\) is the viewing distance in number of pixels, \(c\) is the number of cores, and \(\alpha\), \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) are all model coefficients, defined as follows:
\(\alpha\) = 4.4291754
\(\beta_{1}\) = 0.0024576
\(\beta_{2}\) = 0.0022904
\(\beta_{3}\) = -0.0868244
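As a worked example, the coefficients above can be plugged into the exponentiated model, t = exp(alpha + beta1 * p + beta2 * d + beta3 * c), by hand. This sketch hard-codes the rounded coefficients as printed rather than reading them from the fitted model object, and the names predict_time and per_core_factor are illustrative.

```r
# Coefficients as printed above
alpha <- 4.4291754
beta1 <- 0.0024576   # per point
beta2 <- 0.0022904   # per unit of viewing distance
beta3 <- -0.0868244  # per core

# Predicted processing time in seconds
predict_time <- function(p, d, cores) {
  exp(alpha + beta1 * p + beta2 * d + beta3 * cores)
}

# 500 points, viewing distance of 500, one core
predict_time(500, 500, 1)

# Because cores enters the exponent linearly, each added core multiplies
# the predicted time by the same constant factor
per_core_factor <- exp(beta3)
per_core_factor
```

Under this model the single-core prediction comes out at roughly 826 seconds, and each additional core trims the predicted time by about 8%, whatever the number of points or viewing distance.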
This can be used to make predictions, distilling down the approximate effects of adding cores with different input parameters. For example, if we had 500 points and a viewing distance of 500 (in the same units as \(d\) above), here is how adding cores might speed things up:
# create prediction function
pred.fun <- function(p,d,c){
  t <- exp(coef(mod)[[1]] + coef(mod)[[2]] * p + coef(mod)[[3]] * d + coef(mod)[[4]] * c)
  return(t)
}
# predict for 1-10 cores
cs <- seq(1,10)
ts <- pred.fun(500,500,cs)
# set up plot
par(mar = c(5,5,1,5), las = 1)
# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores", ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)
# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = TRUE)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n", ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)
We can see that the effect of adding cores tends to level off a bit. We can get into the extrapolation world and try to see at what point the relationship would truly flatten out… Let’s see if we leveraged all 36 cores of RDSH2 how things would play out:
# predict for 1-36 cores
cs <- seq(1,36)
ts <- pred.fun(500,500,cs)
# set up plot
par(mar = c(5,5,1,5), las = 1)
# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores", ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)
# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = TRUE)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n", ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)
Pretty marginal gains towards the high end there. In any case, improvement is still improvement!
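One property of the fitted model is worth spelling out here, with the caveat that it assumes the log-linear form keeps holding beyond the 1 to 8 cores actually tested, which these data cannot confirm. Writing the model as \(t(c) = e^{\alpha + \beta_{1}p + \beta_{2}d + \beta_{3}c}\), the percent improvement relative to a single core depends only on \(\beta_{3}\) and the core count, because the point and viewing distance terms cancel in the ratio:

\[
\frac{t(1) - t(c)}{t(1)} = 1 - \frac{e^{\alpha + \beta_{1}p + \beta_{2}d + \beta_{3}c}}{e^{\alpha + \beta_{1}p + \beta_{2}d + \beta_{3}}} = 1 - e^{\beta_{3}(c - 1)}
\]

With \(\beta_{3}\) = -0.0868244, this works out to roughly 54% at 10 cores and 95% at 36 cores. The improvement curve therefore approaches 100% asymptotically, and its shape would be the same for any choice of \(p\) and \(d\); only the absolute number of seconds saved changes.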
Conclusions
It seems pretty clear to me that parallelization is worthwhile for calculate_vi(), even if the gains are somewhat muted at high core counts.