Exploring Katherine’s VI data

Plot #1: Better performer according to R² by radius

First, I just want to get a site-by-site count of how many sites are doing better with the national-level model vs. the site-level model, by viewing radius, according to R².

# read in the data
df <- read.csv("S:/ursa/campbell/Other/viewshed_approximation/other/Mod_Vis_All_results_table.csv")

# change column names
colnames(df) <- c("region","biomass","sde","radius","obs.vi.mean","site.r2",
                  "national.r2","site.rrmse","national.rrmse","evt.lf","evt.class")

# create a field that classifies which model performed better (site vs.
# national), for each site
df$better.r2 <- "site"
df$better.r2[df$national.r2 > df$site.r2] <- "national"

# create summary table
df.summ <- table(df[,c("better.r2", "radius")])

# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = T, col = c("red", "blue"), ylab = "Count",
        xlab = "Radius")
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

So, national-level models are doing better at shorter radii, whereas site-level models are doing better at longer radii, at least according to R².

Plot #2: Better performer according to RRMSE by radius

Next, I’ll do the same thing, but focusing on RRMSE as the basis of comparison.

# create a field that classifies which model performed better (site vs.
# national), for each site
df$better.rrmse <- "site"
df$better.rrmse[df$national.rrmse < df$site.rrmse] <- "national"

# create summary table
df.summ <- table(df[,c("better.rrmse", "radius")])

# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = T, col = c("red", "blue"), ylab = "Count",
        xlab = "Radius")
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

According to RRMSE, site-level models are performing better at all radii, with increasing disparities at larger radii.

Plot #3: Better performer according to R² vs. biomass by radius

To add a layer of nuance into the analysis, I’ll now examine which sites did better (according to R²) by biomass and radius.

# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "Biomass",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

Clearly, the national model does much better at high-biomass sites than the site-level models at all radii. I’m already certain that these differences are statistically-significant, but if we need some numbers, we can run a Wilcoxon rank sum test.

# run wilcoxon rank sum test on each radius to test significance of biomass vs. site/national better performance
w.125 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 125],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 125],
                     alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 250],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 250],
                     alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 500],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 500],
                     alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 1000],
                      y = df$biomass[df$better.r2 == "site" & df$radius == 1000],
                      alternative = "greater")

Here are the p-values:

125m: 0.012
250m: 0
500m: 0.004
1000m: 0.004

Plot #4: Better performer according to R² vs. SDE by radius

Now I’ll do the same thing, but for SDE.

# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "SDE",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

…No noticeable differences.

Plot #5: Better performer according to R² vs. mean observed VI by radius

Now I’ll do the same thing, but for mean observed VI.

# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "Mean Observed VI",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

Some fairly stark differences emerge. Let’s test significance…

# run wilcoxon rank sum test on each radius to test significance of mean observed vi vs. site/national better performance
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 125],
            y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 125],
            alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 250],
            y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 250],
            alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 500],
            y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 500],
            alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 1000],
            y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 1000],
            alternative = "less")

Here are the p-values:

125m: 0.033
250m: 0.002
500m: 0.003
1000m: 0.006

Once again, all significant. But, if we’re willing to get into the p-value comparisons game (which may be statistically-iffy), then biomass has a stronger effect than observed VI.

Plot #6: Better performer according to RRMSE vs. biomass by radius

So, the last three plots were all based on R². But, the challenging narrative in the paper really comes from RRMSE. So, let’s repeat the last three plots, but focusing on RRMSE instead.

# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "Biomass",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

Once again, we can see that generally national-level models outperform site-level models according to RRMSE, but the relationship is not as stark as that with R². Let’s check the significance…

# run wilcoxon rank sum test on each radius to test significance of biomass vs. site/national better performance
w.125 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 125],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 125],
                     alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 250],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 250],
                     alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 500],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 500],
                     alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 1000],
                      y = df$biomass[df$better.rrmse == "site" & df$radius == 1000],
                      alternative = "greater")

Here are the p-values:

125m: 0.142
250m: 0.087
500m: 0.191
1000m: 0.038

Only significant at 1000m.

Plot #5: Better performer according to RRMSE vs. SDE by radius

# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "SDE",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

No major differences.

Plot #6: Better performer according to RRMSE vs. mean observed VI by radius

# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "Mean Observed VI",
        xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
       fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")

Once again, we see that national-level models do better at modeling lower-VI areas than site-level models. Are the differences significant?

# run wilcoxon rank sum test on each radius to test significance of mean observed vi vs. site/national better performance
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 125],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 125],
                     alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 250],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 250],
                     alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 500],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 500],
                     alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 1000],
                      y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 1000],
                      alternative = "less")

Here are the p-values:

125m: 0.08
250m: 0.09
500m: 0.133
1000m: 0.014

Once again, only significant at 1000m.

So what does this all tell us?

I guess it tells us a few things (that I think we already know…):

National-level models are able to explain more variance (have higher R²) than site-level models at shorter viewing radii, and vice versa at longer viewing radii.
National-level models have higher RRMSE at more sites across all radii, but the differences increase with increasing radius.
National-level models are better at modeling visibility in high-biomass, low-visibility sites
The differences in model performance according to R² are much clearer than those of RRMSE
However, even though the relationships are generally not statistically-significant, national-level models still tend to outperform site-level models at high-biomass, low-visibility sites.
SDE has no effect, across the board.

Exploring Katherine’s VI data

Mickey Campbell

2023-01-31

Plot #1: Better performer according to R² by radius

Plot #2: Better performer according to RRMSE by radius

Plot #3: Better performer according to R² vs. biomass by radius

Plot #4: Better performer according to R² vs. SDE by radius

Plot #5: Better performer according to R² vs. mean observed VI by radius

Plot #6: Better performer according to RRMSE vs. biomass by radius

Plot #5: Better performer according to RRMSE vs. SDE by radius

Plot #6: Better performer according to RRMSE vs. mean observed VI by radius

So what does this all tell us?

Exploring Katherine’s VI data

Mickey Campbell

2023-01-31

Plot #1: Better performer according to R2 by radius

Plot #2: Better performer according to RRMSE by radius

Plot #3: Better performer according to R2 vs. biomass by radius

Plot #4: Better performer according to R2 vs. SDE by radius

Plot #5: Better performer according to R2 vs. mean observed VI by radius

Plot #6: Better performer according to RRMSE vs. biomass by radius

Plot #5: Better performer according to RRMSE vs. SDE by radius

Plot #6: Better performer according to RRMSE vs. mean observed VI by radius

So what does this all tell us?

Plot #1: Better performer according to R² by radius

Plot #3: Better performer according to R² vs. biomass by radius

Plot #4: Better performer according to R² vs. SDE by radius

Plot #5: Better performer according to R² vs. mean observed VI by radius