Exploring calculate_vi() Parallel Processing Time

Introduction

The objective of this document is to provide some quantification and visualization of the differences in processing times produced by parallelization of Katherine’s VisiMod::calculate_vi() tool. I ran every combination of the following:

  • Spatial scope: omnidirectional (“omnidir”), single directional (“directional_single”), and random directional (“directional_random”)

  • Number of cores: 1, 2, 4, 8

  • Number of points: 100, 200, 400, 800

  • Viewing distance: 100, 200, 400, 800

For the two directional scopes, I fixed the azimuth and field of view parameters both to 90 degrees. For the single-core test, I also compared the new version of the algorithm (which is set up for parallelization, even if multiple cores aren’t used) to Katherine’s original version of the algorithm. In all, this resulted in 240 runs of calculate_vi(). Here is the full table, in case it’s useful:

library(knitr)
library(viridis)
Loading required package: viridisLite
# read in and print the processing time table
df <- read.csv("S:/ursa2/campbell/visimod/other/calculate_vi_proc_time.csv")
kable(df)
scope npts view.dist cores alg time outnum
omnidir 100 100 1 original 70.63772 NA
directional_single 100 100 1 original 70.59072 NA
directional_random 100 100 1 original 71.01162 NA
omnidir 200 100 1 original 142.55289 NA
directional_single 200 100 1 original 137.62270 NA
directional_random 200 100 1 original 144.32235 NA
omnidir 400 100 1 original 286.82741 NA
directional_single 400 100 1 original 278.07150 NA
directional_random 400 100 1 original 276.67465 NA
omnidir 800 100 1 original 587.48781 NA
directional_single 800 100 1 original 580.85630 NA
directional_random 800 100 1 original 576.40260 NA
omnidir 100 200 1 original 134.74830 NA
directional_single 100 200 1 original 120.64596 NA
directional_random 100 200 1 original 120.19371 NA
omnidir 200 200 1 original 272.83305 NA
directional_single 200 200 1 original 234.28282 NA
directional_random 200 200 1 original 232.01790 NA
omnidir 400 200 1 original 547.23781 NA
directional_single 400 200 1 original 484.65274 NA
directional_random 400 200 1 original 476.41584 NA
omnidir 800 200 1 original 1116.30866 NA
directional_single 800 200 1 original 1062.63343 NA
directional_random 800 200 1 original 1075.00032 NA
omnidir 100 400 1 original 339.46045 NA
directional_single 100 400 1 original 292.40580 NA
directional_random 100 400 1 original 274.60654 NA
omnidir 200 400 1 original 679.92421 NA
directional_single 200 400 1 original 530.52106 NA
directional_random 200 400 1 original 527.01095 NA
omnidir 400 400 1 original 1356.75858 NA
directional_single 400 400 1 original 1063.77346 NA
directional_random 400 400 1 original 1071.77791 NA
omnidir 800 400 1 original 2704.66966 NA
directional_single 800 400 1 original 2176.25649 NA
directional_random 800 400 1 original 2185.94928 NA
omnidir 100 800 1 original 883.16497 NA
directional_single 100 800 1 original 548.01438 NA
directional_random 100 800 1 original 535.72722 NA
omnidir 200 800 1 original 1798.41478 NA
directional_single 200 800 1 original 1084.36524 NA
directional_random 200 800 1 original 1096.45316 NA
omnidir 400 800 1 original 3572.46798 NA
directional_single 400 800 1 original 2189.92040 NA
directional_random 400 800 1 original 2146.70715 NA
omnidir 800 800 1 original 7153.52319 NA
directional_single 800 800 1 original 4395.78153 NA
directional_random 800 800 1 original 4385.84109 NA
omnidir 100 100 1 parallel 84.13921 NA
directional_single 100 100 1 parallel 79.54423 NA
directional_random 100 100 1 parallel 79.42195 NA
omnidir 200 100 1 parallel 162.00025 NA
directional_single 200 100 1 parallel 160.08555 NA
directional_random 200 100 1 parallel 157.43748 NA
omnidir 400 100 1 parallel 311.83589 NA
directional_single 400 100 1 parallel 334.15797 NA
directional_random 400 100 1 parallel 340.40427 NA
omnidir 800 100 1 parallel 642.13019 NA
directional_single 800 100 1 parallel 615.46636 NA
directional_random 800 100 1 parallel 674.60560 NA
omnidir 100 200 1 parallel 147.64638 NA
directional_single 100 200 1 parallel 142.67683 NA
directional_random 100 200 1 parallel 128.56106 NA
omnidir 200 200 1 parallel 284.76511 NA
directional_single 200 200 1 parallel 254.45493 NA
directional_random 200 200 1 parallel 253.42825 NA
omnidir 400 200 1 parallel 576.57082 NA
directional_single 400 200 1 parallel 496.06219 NA
directional_random 400 200 1 parallel 496.63654 NA
omnidir 800 200 1 parallel 1122.71923 NA
directional_single 800 200 1 parallel 1044.50807 NA
directional_random 800 200 1 parallel 1059.31634 NA
omnidir 100 400 1 parallel 332.94834 NA
directional_single 100 400 1 parallel 276.51486 NA
directional_random 100 400 1 parallel 266.60219 NA
omnidir 200 400 1 parallel 671.05632 NA
directional_single 200 400 1 parallel 525.14726 NA
directional_random 200 400 1 parallel 530.58606 NA
omnidir 400 400 1 parallel 1310.86202 NA
directional_single 400 400 1 parallel 1055.43139 NA
directional_random 400 400 1 parallel 1068.07023 NA
omnidir 800 400 1 parallel 2638.18057 NA
directional_single 800 400 1 parallel 1435.92782 NA
directional_random 800 400 1 parallel 1895.93681 NA
omnidir 100 800 1 parallel 863.87952 NA
directional_single 100 800 1 parallel 546.33587 NA
directional_random 100 800 1 parallel 545.92246 NA
omnidir 200 800 1 parallel 1751.58242 NA
directional_single 200 800 1 parallel 1067.34977 NA
directional_random 200 800 1 parallel 1070.31255 NA
omnidir 400 800 1 parallel 3458.91076 NA
directional_single 400 800 1 parallel 2116.47786 NA
directional_random 400 800 1 parallel 2144.41562 NA
omnidir 800 800 1 parallel 6972.79212 NA
directional_single 800 800 1 parallel 4328.40692 NA
directional_random 800 800 1 parallel 4318.53856 NA
omnidir 100 100 2 parallel 59.99200 NA
directional_single 100 100 2 parallel 58.24540 NA
directional_random 100 100 2 parallel 60.26649 NA
omnidir 200 100 2 parallel 109.43504 NA
directional_single 200 100 2 parallel 109.93555 NA
directional_random 200 100 2 parallel 108.18555 NA
omnidir 400 100 2 parallel 200.88993 NA
directional_single 400 100 2 parallel 205.77961 NA
directional_random 400 100 2 parallel 201.73757 NA
omnidir 800 100 2 parallel 388.03809 NA
directional_single 800 100 2 parallel 416.98202 NA
directional_random 800 100 2 parallel 406.06077 NA
omnidir 100 200 2 parallel 94.74341 NA
directional_single 100 200 2 parallel 89.43565 NA
directional_random 100 200 2 parallel 89.30568 NA
omnidir 200 200 2 parallel 176.18670 NA
directional_single 200 200 2 parallel 168.63403 NA
directional_random 200 200 2 parallel 170.22710 NA
omnidir 400 200 2 parallel 355.10987 NA
directional_single 400 200 2 parallel 336.18175 NA
directional_random 400 200 2 parallel 335.01714 NA
omnidir 800 200 2 parallel 688.51264 NA
directional_single 800 200 2 parallel 662.09964 NA
directional_random 800 200 2 parallel 647.20046 NA
omnidir 100 400 2 parallel 202.35913 NA
directional_single 100 400 2 parallel 174.21647 NA
directional_random 100 400 2 parallel 178.87415 NA
omnidir 200 400 2 parallel 404.28140 NA
directional_single 200 400 2 parallel 330.22755 NA
directional_random 200 400 2 parallel 333.93966 NA
omnidir 400 400 2 parallel 716.65385 NA
directional_single 400 400 2 parallel 643.34943 NA
directional_random 400 400 2 parallel 655.81239 NA
omnidir 800 400 2 parallel 725.90631 NA
directional_single 800 400 2 parallel 436.74711 NA
directional_random 800 400 2 parallel 438.07607 NA
omnidir 100 800 2 parallel 324.15754 NA
directional_single 100 800 2 parallel 331.80445 NA
directional_random 100 800 2 parallel 330.10323 NA
omnidir 200 800 2 parallel 976.79755 NA
directional_single 200 800 2 parallel 651.86488 NA
directional_random 200 800 2 parallel 664.58271 NA
omnidir 400 800 2 parallel 1969.99467 NA
directional_single 400 800 2 parallel 1333.62296 NA
directional_random 400 800 2 parallel 1328.21601 NA
omnidir 800 800 2 parallel 3814.53258 NA
directional_single 800 800 2 parallel 2604.21014 NA
directional_random 800 800 2 parallel 1126.86924 800
omnidir 100 100 4 parallel 60.38114 100
directional_single 100 100 4 parallel 63.77082 100
directional_random 100 100 4 parallel 59.48024 100
omnidir 200 100 4 parallel 94.19526 200
directional_single 200 100 4 parallel 99.79912 200
directional_random 200 100 4 parallel 101.36544 200
omnidir 400 100 4 parallel 173.83377 400
directional_single 400 100 4 parallel 190.61623 400
directional_random 400 100 4 parallel 182.90380 400
omnidir 800 100 4 parallel 325.28424 800
directional_single 800 100 4 parallel 348.43339 800
directional_random 800 100 4 parallel 351.58068 800
omnidir 100 200 4 parallel 85.16101 100
directional_single 100 200 4 parallel 90.31539 100
directional_random 100 200 4 parallel 88.30298 100
omnidir 200 200 4 parallel 166.10916 200
directional_single 200 200 4 parallel 164.77338 200
directional_random 200 200 4 parallel 161.39634 200
omnidir 400 200 4 parallel 296.51864 400
directional_single 400 200 4 parallel 315.14091 400
directional_random 400 200 4 parallel 312.31423 400
omnidir 800 200 4 parallel 545.89786 800
directional_single 800 200 4 parallel 597.52939 800
directional_random 800 200 4 parallel 590.50292 800
omnidir 100 400 4 parallel 153.08878 100
directional_single 100 400 4 parallel 142.57740 100
directional_random 100 400 4 parallel 138.93459 100
omnidir 200 400 4 parallel 283.32754 200
directional_single 200 400 4 parallel 273.25604 200
directional_random 200 400 4 parallel 275.44019 200
omnidir 400 400 4 parallel 554.90724 400
directional_single 400 400 4 parallel 532.79932 400
directional_random 400 400 4 parallel 532.02024 400
omnidir 800 400 4 parallel 1094.59979 800
directional_single 800 400 4 parallel 1013.34414 800
directional_random 800 400 4 parallel 1014.51034 800
omnidir 100 800 4 parallel 310.02373 100
directional_single 100 800 4 parallel 223.98841 100
directional_random 100 800 4 parallel 236.74394 100
omnidir 200 800 4 parallel 592.60422 200
directional_single 200 800 4 parallel 451.37142 200
directional_random 200 800 4 parallel 446.53854 200
omnidir 400 800 4 parallel 1175.90734 400
directional_single 400 800 4 parallel 892.24337 400
directional_random 400 800 4 parallel 886.15168 400
omnidir 800 800 4 parallel 2293.84276 800
directional_single 800 800 4 parallel 1739.03009 800
directional_random 800 800 4 parallel 1767.80408 800
omnidir 100 100 8 parallel 73.61730 100
directional_single 100 100 8 parallel 78.40073 100
directional_random 100 100 8 parallel 73.29307 100
omnidir 200 100 8 parallel 103.87823 200
directional_single 200 100 8 parallel 110.65896 200
directional_random 200 100 8 parallel 113.55110 200
omnidir 400 100 8 parallel 177.54109 400
directional_single 400 100 8 parallel 183.77147 400
directional_random 400 100 8 parallel 190.82922 400
omnidir 800 100 8 parallel 327.49503 800
directional_single 800 100 8 parallel 348.17692 800
directional_random 800 100 8 parallel 328.18255 800
omnidir 100 200 8 parallel 90.57976 100
directional_single 100 200 8 parallel 95.19723 100
directional_random 100 200 8 parallel 97.47222 100
omnidir 200 200 8 parallel 165.80856 200
directional_single 200 200 8 parallel 169.38807 200
directional_random 200 200 8 parallel 163.17122 200
omnidir 400 200 8 parallel 268.61135 400
directional_single 400 200 8 parallel 291.74348 400
directional_random 400 200 8 parallel 297.34216 400
omnidir 800 200 8 parallel 512.67468 800
directional_single 800 200 8 parallel 550.91199 800
directional_random 800 200 8 parallel 520.33828 800
omnidir 100 400 8 parallel 134.27153 100
directional_single 100 400 8 parallel 135.97793 100
directional_random 100 400 8 parallel 136.99969 100
omnidir 200 400 8 parallel 232.93261 200
directional_single 200 400 8 parallel 224.00346 200
directional_random 200 400 8 parallel 234.81328 200
omnidir 400 400 8 parallel 426.05954 400
directional_single 400 400 8 parallel 425.42579 400
directional_random 400 400 8 parallel 454.26180 400
omnidir 800 400 8 parallel 828.21894 800
directional_single 800 400 8 parallel 869.35064 800
directional_random 800 400 8 parallel 859.98512 800
omnidir 100 800 8 parallel 219.81920 100
directional_single 100 800 8 parallel 197.28584 100
directional_random 100 800 8 parallel 186.42941 100
omnidir 200 800 8 parallel 395.49037 200
directional_single 200 800 8 parallel 344.05725 200
directional_random 200 800 8 parallel 353.50756 200
omnidir 400 800 8 parallel 771.60772 400
directional_single 400 800 8 parallel 644.94691 400
directional_random 400 800 8 parallel 634.30165 400
omnidir 800 800 8 parallel 1516.28501 800
directional_single 800 800 8 parallel 1245.74822 800
directional_random 800 800 8 parallel 1252.79806 800

The time column refers to the processing time in seconds. The outnum column is something I added maybe 2/3 of the way into my processing. I was running into a few errors, and I was wondering if it was the result of the underlying (original) algorithm or if it was an artifact of parallelization. To test it, I added some error handling within the function that basically just retried the same VI calculation up to 5 times until it didn’t receive an error. The idea was that if it was the underlying algorithm, then the error would still occur even after 5 tries. If the error was resolved in one of the retries, then we can assume it was an artifact of the complexities of the parallelization process. If the error still persisted after 5 tries, then it would simply ignore that point, and there would be one fewer record in the output dataset. So outnum allows us to see if the number of output records were different than the number of input points. In all cases, the outnum was equal to npts, which says that there is no error in the underlying algorithm and that errors were simply an artifact of parallelization, but through the built-in retrying process, they can be resolved.

Single Core Comparison

In order to parallelize the script, I had to switch from using for to using foreach, which required a few additional changes to the underlying function’s code. foreach is known to be a bit slower than for, as it holds more information in memory throughout the looping process as it dynamically constructs merged outputs of the looped function. So the question becomes how much slower is it? If it’s way slower, and most end users are going to use a single core for processing, then that would call into question the use of foreach. Let’s explore…

First, I’ll compare processing time between the original algorithm and the new algorithm by number of points.

# create subset of just single-core runs
df.1core <- df[df$cores == 1,]

# create factor
df.1core$alg[df.1core$alg == "original"] <- "old"
df.1core$alg[df.1core$alg == "parallel"] <- "new"
df.1core$x <- as.factor(paste0(df.1core$npts, "\n", df.1core$alg))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "npts\nalgorithm",
        xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)

OK, so at least visually, the “new” (parallel) algorithm does not appear to increase processing time significantly in comparison to the “old” (sequential) algorithm when run on a single core, by number of points. Let’s do the same for viewing distance:

# create factor
df.1core$x <- as.factor(paste0(df.1core$view.dist, "\n", df.1core$alg))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "view.dist\nalgorithm",
        xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)

Some very minor differences, but fair to say that when running on one core, on average, the old and new algorithms perform very similarly.

Number of Cores

Now let’s test the degree to which multiple cores improves processing times. First, by number of points:

# create subset data frame
df.par <- df[df$alg == "parallel",]

# create factor
df.par$x <- as.factor(paste0(df.par$npts, "\n", df.par$cores))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "npts\ncores",
        xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)

A slightly odd trend for the npts == 800 case notwithstanding, the benefits of multiple cores are clear, both in terms of reducting in processing times, but also reduction in the variation of processing times – that is, the effects of other parameters such as viewing distance (view.dist) and scope (scope) have comparably lesser effects on processing time when multithreading. Let’s look at viewing distance and number of cores:

# create factor
df.par$x <- as.factor(paste0(df.par$view.dist, "\n", df.par$cores))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "view.dist\ncores",
        xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)

Very clear trend. In the cases of both npts and view.dist, it’s clear that the relationship between processing time and number of cores is not linear. In other words, doubling your cores doesn’t halve your processing time. So, it might be useful to see if there’s a somewhat predictable trend.

After a little behind the scenes data exploration, I found that the relationship between the log of processing time, number of points, viewing distance, and number of cores can be pretty well explained with a linear model:

# run linear model
mod <- lm(log(time) ~ npts + view.dist + cores, data = df.par)

# set up plot
par(mar = c(5,5,1,1), las = 1)

# define x and y
x <- log(df.par$time)
y <- predict(mod)
ax.min <- min(c(y, x))
ax.max <- max(c(y, x))
ax.lim <- c(ax.min, ax.max)

# plot predicted vs. observed
col <- rep(2, nrow(df.par))
col[df.par$scope == "omnidir"] <- 4
plot(y ~ x, xlim = ax.lim, ylim = ax.lim,
     pch = 16, col = col, xlab = "log(Observed Time)",
     ylab = "log(Predicted Time)")
grid()
lines(x = c(-100000,100000),
      y = c(-100000,100000), col = "lightgray")
mod.pred.obs <- lm(y ~ x)
abline(mod.pred.obs, lwd = 2)
r2 <- paste0("rsq = ", round(summary(mod.pred.obs)$adj.r.squared, 2))
legend("bottomright", legend = r2, bty = "n", x.intersp = 0)
legend("topleft", legend = c("directional", "omnidir"), pch = 16, col = c(2,4))

Pretty impressive predictive power! But clearly the omnidirectional cases are being somewhat underpredicted. If I tested more directional parameters (e.g., wider FOVs), I could add that in as a predictor variable, but being able to explain 87% of variance without that distinction is still pretty impressive. So, we now have some semblance of a processing time predictive equation. Using the coefficients from that log regression, and applying the exponential function to both sides of the equation to get a prediction in seconds, we get:

\[ t=e^{\alpha + \beta_{1}p+\beta_{2}d+\beta_{3}c} \]

where \(t\) is time in seconds, \(p\) is the number of points, \(d\) is the viewing distance in number of pixels, \(c\) is the number of cores, and \(\alpha\), \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) are all model coefficients, defined as follows:

  • \(\alpha\) = 4.4291754
  • \(\beta_{1}\) = 0.0024576
  • \(\beta_{2}\) = 0.0022904
  • \(\beta_{3}\) = -0.0868244

This can be used to make predictions, distilling down the approximate effects of adding cores with different input parameters. For example, if we had 500 points and a viewing distance of 500 meters, here is how adding cores might speed things up:

# create prediction function
pred.fun <- function(p,d,c){
  t <- exp(coef(mod)[[1]] + 
             coef(mod)[[2]] * p +
             coef(mod)[[3]] * d +
             coef(mod)[[4]] * c)
  return(t)
}

# predict for 1-10 cores
cs <- seq(1,10)
ts <- pred.fun(500,500,cs)

# set up plot
par(mar = c(5,5,1,5), las = 1)

# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores",
     ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)

# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = T)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n",
     ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)

We can see that the effect of adding cores tends to level off a bit. We can get into the extrapolation world and try to see at what point the relationship would truly flatten out… Let’s see if we leveraged all 36 cores of RDSH2 how things would play out:

# predict for 1-36 cores
cs <- seq(1,36)
ts <- pred.fun(500,500,cs)

# set up plot
par(mar = c(5,5,1,5), las = 1)

# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores",
     ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)

# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = T)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n",
     ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)

Pretty marginal gains towards the high end there. In any case, improvement is still improvement!

Conclusions

It seems pretty clear to me that parallelization is worthwhile for calculate_vi(), even if the gains are somewhat muted at high core counts.