Exploring calculate_vi() Parallel Processing Time

Introduction

The objective of this document is to provide some quantification and visualization of the differences in processing times produced by parallelization of Katherine’s VisiMod::calculate_vi() tool. I ran every combination of the following:

Spatial scope: omnidirectional (“omnidir”), single directional (“directional_single”), and random directional (“directional_random”)
Number of cores: 1, 2, 4, 8
Number of points: 100, 200, 400, 800
Viewing distance: 100, 200, 400, 800

For the two directional scopes, I fixed the azimuth and field of view parameters both to 90 degrees. For the single-core test, I also compared the new version of the algorithm (which is set up for parallelization, even if multiple cores aren’t used) to Katherine’s original version of the algorithm. In all, this resulted in 240 runs of calculate_vi(). Here is the full table, in case it’s useful:

library(knitr)
library(viridis)

Loading required package: viridisLite

# read in and print the processing time table
df <- read.csv("S:/ursa2/campbell/visimod/other/calculate_vi_proc_time.csv")
kable(df)

scope	npts	view.dist	cores	alg	time	outnum
omnidir	100	100	1	original	70.63772	NA
directional_single	100	100	1	original	70.59072	NA
directional_random	100	100	1	original	71.01162	NA
omnidir	200	100	1	original	142.55289	NA
directional_single	200	100	1	original	137.62270	NA
directional_random	200	100	1	original	144.32235	NA
omnidir	400	100	1	original	286.82741	NA
directional_single	400	100	1	original	278.07150	NA
directional_random	400	100	1	original	276.67465	NA
omnidir	800	100	1	original	587.48781	NA
directional_single	800	100	1	original	580.85630	NA
directional_random	800	100	1	original	576.40260	NA
omnidir	100	200	1	original	134.74830	NA
directional_single	100	200	1	original	120.64596	NA
directional_random	100	200	1	original	120.19371	NA
omnidir	200	200	1	original	272.83305	NA
directional_single	200	200	1	original	234.28282	NA
directional_random	200	200	1	original	232.01790	NA
omnidir	400	200	1	original	547.23781	NA
directional_single	400	200	1	original	484.65274	NA
directional_random	400	200	1	original	476.41584	NA
omnidir	800	200	1	original	1116.30866	NA
directional_single	800	200	1	original	1062.63343	NA
directional_random	800	200	1	original	1075.00032	NA
omnidir	100	400	1	original	339.46045	NA
directional_single	100	400	1	original	292.40580	NA
directional_random	100	400	1	original	274.60654	NA
omnidir	200	400	1	original	679.92421	NA
directional_single	200	400	1	original	530.52106	NA
directional_random	200	400	1	original	527.01095	NA
omnidir	400	400	1	original	1356.75858	NA
directional_single	400	400	1	original	1063.77346	NA
directional_random	400	400	1	original	1071.77791	NA
omnidir	800	400	1	original	2704.66966	NA
directional_single	800	400	1	original	2176.25649	NA
directional_random	800	400	1	original	2185.94928	NA
omnidir	100	800	1	original	883.16497	NA
directional_single	100	800	1	original	548.01438	NA
directional_random	100	800	1	original	535.72722	NA
omnidir	200	800	1	original	1798.41478	NA
directional_single	200	800	1	original	1084.36524	NA
directional_random	200	800	1	original	1096.45316	NA
omnidir	400	800	1	original	3572.46798	NA
directional_single	400	800	1	original	2189.92040	NA
directional_random	400	800	1	original	2146.70715	NA
omnidir	800	800	1	original	7153.52319	NA
directional_single	800	800	1	original	4395.78153	NA
directional_random	800	800	1	original	4385.84109	NA
omnidir	100	100	1	parallel	84.13921	NA
directional_single	100	100	1	parallel	79.54423	NA
directional_random	100	100	1	parallel	79.42195	NA
omnidir	200	100	1	parallel	162.00025	NA
directional_single	200	100	1	parallel	160.08555	NA
directional_random	200	100	1	parallel	157.43748	NA
omnidir	400	100	1	parallel	311.83589	NA
directional_single	400	100	1	parallel	334.15797	NA
directional_random	400	100	1	parallel	340.40427	NA
omnidir	800	100	1	parallel	642.13019	NA
directional_single	800	100	1	parallel	615.46636	NA
directional_random	800	100	1	parallel	674.60560	NA
omnidir	100	200	1	parallel	147.64638	NA
directional_single	100	200	1	parallel	142.67683	NA
directional_random	100	200	1	parallel	128.56106	NA
omnidir	200	200	1	parallel	284.76511	NA
directional_single	200	200	1	parallel	254.45493	NA
directional_random	200	200	1	parallel	253.42825	NA
omnidir	400	200	1	parallel	576.57082	NA
directional_single	400	200	1	parallel	496.06219	NA
directional_random	400	200	1	parallel	496.63654	NA
omnidir	800	200	1	parallel	1122.71923	NA
directional_single	800	200	1	parallel	1044.50807	NA
directional_random	800	200	1	parallel	1059.31634	NA
omnidir	100	400	1	parallel	332.94834	NA
directional_single	100	400	1	parallel	276.51486	NA
directional_random	100	400	1	parallel	266.60219	NA
omnidir	200	400	1	parallel	671.05632	NA
directional_single	200	400	1	parallel	525.14726	NA
directional_random	200	400	1	parallel	530.58606	NA
omnidir	400	400	1	parallel	1310.86202	NA
directional_single	400	400	1	parallel	1055.43139	NA
directional_random	400	400	1	parallel	1068.07023	NA
omnidir	800	400	1	parallel	2638.18057	NA
directional_single	800	400	1	parallel	1435.92782	NA
directional_random	800	400	1	parallel	1895.93681	NA
omnidir	100	800	1	parallel	863.87952	NA
directional_single	100	800	1	parallel	546.33587	NA
directional_random	100	800	1	parallel	545.92246	NA
omnidir	200	800	1	parallel	1751.58242	NA
directional_single	200	800	1	parallel	1067.34977	NA
directional_random	200	800	1	parallel	1070.31255	NA
omnidir	400	800	1	parallel	3458.91076	NA
directional_single	400	800	1	parallel	2116.47786	NA
directional_random	400	800	1	parallel	2144.41562	NA
omnidir	800	800	1	parallel	6972.79212	NA
directional_single	800	800	1	parallel	4328.40692	NA
directional_random	800	800	1	parallel	4318.53856	NA
omnidir	100	100	2	parallel	59.99200	NA
directional_single	100	100	2	parallel	58.24540	NA
directional_random	100	100	2	parallel	60.26649	NA
omnidir	200	100	2	parallel	109.43504	NA
directional_single	200	100	2	parallel	109.93555	NA
directional_random	200	100	2	parallel	108.18555	NA
omnidir	400	100	2	parallel	200.88993	NA
directional_single	400	100	2	parallel	205.77961	NA
directional_random	400	100	2	parallel	201.73757	NA
omnidir	800	100	2	parallel	388.03809	NA
directional_single	800	100	2	parallel	416.98202	NA
directional_random	800	100	2	parallel	406.06077	NA
omnidir	100	200	2	parallel	94.74341	NA
directional_single	100	200	2	parallel	89.43565	NA
directional_random	100	200	2	parallel	89.30568	NA
omnidir	200	200	2	parallel	176.18670	NA
directional_single	200	200	2	parallel	168.63403	NA
directional_random	200	200	2	parallel	170.22710	NA
omnidir	400	200	2	parallel	355.10987	NA
directional_single	400	200	2	parallel	336.18175	NA
directional_random	400	200	2	parallel	335.01714	NA
omnidir	800	200	2	parallel	688.51264	NA
directional_single	800	200	2	parallel	662.09964	NA
directional_random	800	200	2	parallel	647.20046	NA
omnidir	100	400	2	parallel	202.35913	NA
directional_single	100	400	2	parallel	174.21647	NA
directional_random	100	400	2	parallel	178.87415	NA
omnidir	200	400	2	parallel	404.28140	NA
directional_single	200	400	2	parallel	330.22755	NA
directional_random	200	400	2	parallel	333.93966	NA
omnidir	400	400	2	parallel	716.65385	NA
directional_single	400	400	2	parallel	643.34943	NA
directional_random	400	400	2	parallel	655.81239	NA
omnidir	800	400	2	parallel	725.90631	NA
directional_single	800	400	2	parallel	436.74711	NA
directional_random	800	400	2	parallel	438.07607	NA
omnidir	100	800	2	parallel	324.15754	NA
directional_single	100	800	2	parallel	331.80445	NA
directional_random	100	800	2	parallel	330.10323	NA
omnidir	200	800	2	parallel	976.79755	NA
directional_single	200	800	2	parallel	651.86488	NA
directional_random	200	800	2	parallel	664.58271	NA
omnidir	400	800	2	parallel	1969.99467	NA
directional_single	400	800	2	parallel	1333.62296	NA
directional_random	400	800	2	parallel	1328.21601	NA
omnidir	800	800	2	parallel	3814.53258	NA
directional_single	800	800	2	parallel	2604.21014	NA
directional_random	800	800	2	parallel	1126.86924	800
omnidir	100	100	4	parallel	60.38114	100
directional_single	100	100	4	parallel	63.77082	100
directional_random	100	100	4	parallel	59.48024	100
omnidir	200	100	4	parallel	94.19526	200
directional_single	200	100	4	parallel	99.79912	200
directional_random	200	100	4	parallel	101.36544	200
omnidir	400	100	4	parallel	173.83377	400
directional_single	400	100	4	parallel	190.61623	400
directional_random	400	100	4	parallel	182.90380	400
omnidir	800	100	4	parallel	325.28424	800
directional_single	800	100	4	parallel	348.43339	800
directional_random	800	100	4	parallel	351.58068	800
omnidir	100	200	4	parallel	85.16101	100
directional_single	100	200	4	parallel	90.31539	100
directional_random	100	200	4	parallel	88.30298	100
omnidir	200	200	4	parallel	166.10916	200
directional_single	200	200	4	parallel	164.77338	200
directional_random	200	200	4	parallel	161.39634	200
omnidir	400	200	4	parallel	296.51864	400
directional_single	400	200	4	parallel	315.14091	400
directional_random	400	200	4	parallel	312.31423	400
omnidir	800	200	4	parallel	545.89786	800
directional_single	800	200	4	parallel	597.52939	800
directional_random	800	200	4	parallel	590.50292	800
omnidir	100	400	4	parallel	153.08878	100
directional_single	100	400	4	parallel	142.57740	100
directional_random	100	400	4	parallel	138.93459	100
omnidir	200	400	4	parallel	283.32754	200
directional_single	200	400	4	parallel	273.25604	200
directional_random	200	400	4	parallel	275.44019	200
omnidir	400	400	4	parallel	554.90724	400
directional_single	400	400	4	parallel	532.79932	400
directional_random	400	400	4	parallel	532.02024	400
omnidir	800	400	4	parallel	1094.59979	800
directional_single	800	400	4	parallel	1013.34414	800
directional_random	800	400	4	parallel	1014.51034	800
omnidir	100	800	4	parallel	310.02373	100
directional_single	100	800	4	parallel	223.98841	100
directional_random	100	800	4	parallel	236.74394	100
omnidir	200	800	4	parallel	592.60422	200
directional_single	200	800	4	parallel	451.37142	200
directional_random	200	800	4	parallel	446.53854	200
omnidir	400	800	4	parallel	1175.90734	400
directional_single	400	800	4	parallel	892.24337	400
directional_random	400	800	4	parallel	886.15168	400
omnidir	800	800	4	parallel	2293.84276	800
directional_single	800	800	4	parallel	1739.03009	800
directional_random	800	800	4	parallel	1767.80408	800
omnidir	100	100	8	parallel	73.61730	100
directional_single	100	100	8	parallel	78.40073	100
directional_random	100	100	8	parallel	73.29307	100
omnidir	200	100	8	parallel	103.87823	200
directional_single	200	100	8	parallel	110.65896	200
directional_random	200	100	8	parallel	113.55110	200
omnidir	400	100	8	parallel	177.54109	400
directional_single	400	100	8	parallel	183.77147	400
directional_random	400	100	8	parallel	190.82922	400
omnidir	800	100	8	parallel	327.49503	800
directional_single	800	100	8	parallel	348.17692	800
directional_random	800	100	8	parallel	328.18255	800
omnidir	100	200	8	parallel	90.57976	100
directional_single	100	200	8	parallel	95.19723	100
directional_random	100	200	8	parallel	97.47222	100
omnidir	200	200	8	parallel	165.80856	200
directional_single	200	200	8	parallel	169.38807	200
directional_random	200	200	8	parallel	163.17122	200
omnidir	400	200	8	parallel	268.61135	400
directional_single	400	200	8	parallel	291.74348	400
directional_random	400	200	8	parallel	297.34216	400
omnidir	800	200	8	parallel	512.67468	800
directional_single	800	200	8	parallel	550.91199	800
directional_random	800	200	8	parallel	520.33828	800
omnidir	100	400	8	parallel	134.27153	100
directional_single	100	400	8	parallel	135.97793	100
directional_random	100	400	8	parallel	136.99969	100
omnidir	200	400	8	parallel	232.93261	200
directional_single	200	400	8	parallel	224.00346	200
directional_random	200	400	8	parallel	234.81328	200
omnidir	400	400	8	parallel	426.05954	400
directional_single	400	400	8	parallel	425.42579	400
directional_random	400	400	8	parallel	454.26180	400
omnidir	800	400	8	parallel	828.21894	800
directional_single	800	400	8	parallel	869.35064	800
directional_random	800	400	8	parallel	859.98512	800
omnidir	100	800	8	parallel	219.81920	100
directional_single	100	800	8	parallel	197.28584	100
directional_random	100	800	8	parallel	186.42941	100
omnidir	200	800	8	parallel	395.49037	200
directional_single	200	800	8	parallel	344.05725	200
directional_random	200	800	8	parallel	353.50756	200
omnidir	400	800	8	parallel	771.60772	400
directional_single	400	800	8	parallel	644.94691	400
directional_random	400	800	8	parallel	634.30165	400
omnidir	800	800	8	parallel	1516.28501	800
directional_single	800	800	8	parallel	1245.74822	800
directional_random	800	800	8	parallel	1252.79806	800

The time column refers to the processing time in seconds. The outnum column is something I added maybe 2/3 of the way into my processing. I was running into a few errors, and I was wondering if it was the result of the underlying (original) algorithm or if it was an artifact of parallelization. To test it, I added some error handling within the function that basically just retried the same VI calculation up to 5 times until it didn’t receive an error. The idea was that if it was the underlying algorithm, then the error would still occur even after 5 tries. If the error was resolved in one of the retries, then we can assume it was an artifact of the complexities of the parallelization process. If the error still persisted after 5 tries, then it would simply ignore that point, and there would be one fewer record in the output dataset. So outnum allows us to see if the number of output records were different than the number of input points. In all cases, the outnum was equal to npts, which says that there is no error in the underlying algorithm and that errors were simply an artifact of parallelization, but through the built-in retrying process, they can be resolved.

Single Core Comparison

In order to parallelize the script, I had to switch from using for to using foreach, which required a few additional changes to the underlying function’s code. foreach is known to be a bit slower than for, as it holds more information in memory throughout the looping process as it dynamically constructs merged outputs of the looped function. So the question becomes how much slower is it? If it’s way slower, and most end users are going to use a single core for processing, then that would call into question the use of foreach. Let’s explore…

First, I’ll compare processing time between the original algorithm and the new algorithm by number of points.

# create subset of just single-core runs
df.1core <- df[df$cores == 1,]

# create factor
df.1core$alg[df.1core$alg == "original"] <- "old"
df.1core$alg[df.1core$alg == "parallel"] <- "new"
df.1core$x <- as.factor(paste0(df.1core$npts, "\n", df.1core$alg))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "npts\nalgorithm",
        xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)

OK, so at least visually, the “new” (parallel) algorithm does not appear to increase processing time significantly in comparison to the “old” (sequential) algorithm when run on a single core, by number of points. Let’s do the same for viewing distance:

# create factor
df.1core$x <- as.factor(paste0(df.1core$view.dist, "\n", df.1core$alg))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
boxplot(time ~ x, data = df.1core, col = rep(c(2,4),4), xlab = "view.dist\nalgorithm",
        xaxt = "n")
axis(1, at = seq(1,8), labels = levels(df.1core$x), lwd = 0)

Some very minor differences, but fair to say that when running on one core, on average, the old and new algorithms perform very similarly.

Number of Cores

Now let’s test the degree to which multiple cores improves processing times. First, by number of points:

# create subset data frame
df.par <- df[df$alg == "parallel",]

# create factor
df.par$x <- as.factor(paste0(df.par$npts, "\n", df.par$cores))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "npts\ncores",
        xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)

A slightly odd trend for the npts == 800 case notwithstanding, the benefits of multiple cores are clear, both in terms of reducting in processing times, but also reduction in the variation of processing times – that is, the effects of other parameters such as viewing distance (view.dist) and scope (scope) have comparably lesser effects on processing time when multithreading. Let’s look at viewing distance and number of cores:

# create factor
df.par$x <- as.factor(paste0(df.par$view.dist, "\n", df.par$cores))

# set up plot
par(mar = c(5,5,1,1), las = 1)

# plot it out
cols <- viridis(4)
boxplot(time ~ x, data = df.par, col = rep(cols,4), xlab = "view.dist\ncores",
        xaxt = "n")
axis(1, at = seq(1,length(levels(df.par$x))), labels = levels(df.par$x), lwd = 0, cex.axis = 0.8)

Very clear trend. In the cases of both npts and view.dist, it’s clear that the relationship between processing time and number of cores is not linear. In other words, doubling your cores doesn’t halve your processing time. So, it might be useful to see if there’s a somewhat predictable trend.

After a little behind the scenes data exploration, I found that the relationship between the log of processing time, number of points, viewing distance, and number of cores can be pretty well explained with a linear model:

# run linear model
mod <- lm(log(time) ~ npts + view.dist + cores, data = df.par)

# set up plot
par(mar = c(5,5,1,1), las = 1)

# define x and y
x <- log(df.par$time)
y <- predict(mod)
ax.min <- min(c(y, x))
ax.max <- max(c(y, x))
ax.lim <- c(ax.min, ax.max)

# plot predicted vs. observed
col <- rep(2, nrow(df.par))
col[df.par$scope == "omnidir"] <- 4
plot(y ~ x, xlim = ax.lim, ylim = ax.lim,
     pch = 16, col = col, xlab = "log(Observed Time)",
     ylab = "log(Predicted Time)")
grid()
lines(x = c(-100000,100000),
      y = c(-100000,100000), col = "lightgray")
mod.pred.obs <- lm(y ~ x)
abline(mod.pred.obs, lwd = 2)
r2 <- paste0("rsq = ", round(summary(mod.pred.obs)$adj.r.squared, 2))
legend("bottomright", legend = r2, bty = "n", x.intersp = 0)
legend("topleft", legend = c("directional", "omnidir"), pch = 16, col = c(2,4))

Pretty impressive predictive power! But clearly the omnidirectional cases are being somewhat underpredicted. If I tested more directional parameters (e.g., wider FOVs), I could add that in as a predictor variable, but being able to explain 87% of variance without that distinction is still pretty impressive. So, we now have some semblance of a processing time predictive equation. Using the coefficients from that log regression, and applying the exponential function to both sides of the equation to get a prediction in seconds, we get:

\[ t=e^{\alpha + \beta_{1}p+\beta_{2}d+\beta_{3}c} \]

where \(t\) is time in seconds, \(p\) is the number of points, \(d\) is the viewing distance in number of pixels, \(c\) is the number of cores, and \(\alpha\), \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) are all model coefficients, defined as follows:

\(\alpha\) = 4.4291754
\(\beta_{1}\) = 0.0024576
\(\beta_{2}\) = 0.0022904
\(\beta_{3}\) = -0.0868244

This can be used to make predictions, distilling down the approximate effects of adding cores with different input parameters. For example, if we had 500 points and a viewing distance of 500 meters, here is how adding cores might speed things up:

# create prediction function
pred.fun <- function(p,d,c){
  t <- exp(coef(mod)[[1]] + 
             coef(mod)[[2]] * p +
             coef(mod)[[3]] * d +
             coef(mod)[[4]] * c)
  return(t)
}

# predict for 1-10 cores
cs <- seq(1,10)
ts <- pred.fun(500,500,cs)

# set up plot
par(mar = c(5,5,1,5), las = 1)

# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores",
     ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)

# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = T)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n",
     ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)

We can see that the effect of adding cores tends to level off a bit. We can get into the extrapolation world and try to see at what point the relationship would truly flatten out… Let’s see if we leveraged all 36 cores of RDSH2 how things would play out:

# predict for 1-36 cores
cs <- seq(1,36)
ts <- pred.fun(500,500,cs)

# set up plot
par(mar = c(5,5,1,5), las = 1)

# plot time vs. cores
plot(ts ~ cs, type = "l", col = 2, lwd = 2, xlab = "Cores",
     ylab = NA, yaxt = "n")
axis(2, col.axis = 2)
mtext("Time (s)", 2, 2.5, col = 2, las = 0)

# plot percent speed improvement vs. cores
imps <- 100 * ((ts[1] - ts) / ts[1])
par(new = T)
plot(imps ~ cs, type = "l", col = 4, lwd = 2, xlab = NA, xaxt = "n",
     ylab = NA, yaxt = "n")
axis(4, col.axis = 4)
mtext("% Improvement", 4, 2.5, col = 4, las = 0)

Pretty marginal gains towards the high end there. In any case, improvement is still improvement!

Conclusions

It seems pretty clear to me that parallelization is worthwhile for calculate_vi(), even if the gains are somewhat muted at high core counts.