Plot #1: Better performer according to R2 by radius

First, I just want a count of how many sites do better with the national-level model vs. the site-level model, by viewing radius, according to R2.

# read in the data
df <- read.csv("S:/ursa/campbell/Other/viewshed_approximation/other/Mod_Vis_All_results_table.csv")

# change column names
colnames(df) <- c("region","biomass","sde","radius","obs.vi.mean","site.r2",
                  "national.r2","site.rrmse","national.rrmse","evt.lf","evt.class")

# create a field that classifies which model performed better (site vs.
# national) at each site, according to R2; ties count as "site"
df$better.r2 <- "site"
df$better.r2[df$national.r2 > df$site.r2] <- "national"

# create summary table
df.summ <- table(df[,c("better.r2", "radius")])

# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = TRUE, col = c("red", "blue"), ylab = "Count",
        xlab = "Radius")
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

So, national-level models are doing better at shorter radii, whereas site-level models are doing better at longer radii, at least according to R2.

Plot #2: Better performer according to RRMSE by radius

Next, I’ll do the same thing, but focusing on RRMSE as the basis of comparison.

# create a field that classifies which model performed better (site vs.
# national) at each site, according to RRMSE (lower is better); ties count
# as "site"
df$better.rrmse <- "site"
df$better.rrmse[df$national.rrmse < df$site.rrmse] <- "national"

# create summary table
df.summ <- table(df[,c("better.rrmse", "radius")])

# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = TRUE, col = c("red", "blue"), ylab = "Count",
        xlab = "Radius")
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

According to RRMSE, site-level models are performing better at all radii, with increasing disparities at larger radii.

Plot #3: Better performer according to R2 vs. biomass by radius

To add a layer of nuance to the analysis, I'll now examine which model did better (according to R2) as a function of biomass, by radius.

# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.r2 + radius, data = df, xlab = "Radius",
             ylab = "Biomass", xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

Clearly, the sites where the national model wins are much higher in biomass than those where the site-level models win, at all radii. I'm already fairly certain these differences are statistically significant, but if we need some numbers, we can run a one-sided Wilcoxon rank sum test.

# run a one-sided wilcoxon rank sum test at each radius: is biomass higher
# at sites where the national model performs better?
w.125 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 125],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 125],
                     alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 250],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 250],
                     alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 500],
                     y = df$biomass[df$better.r2 == "site" & df$radius == 500],
                     alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 1000],
                      y = df$biomass[df$better.r2 == "site" & df$radius == 1000],
                      alternative = "greater")

Here are the p-values:
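One way to pull them all out at once (the same snippet works for each of the test blocks below, since the w.* names are reused):

# collect the four p-values into a named vector for easy comparison
p.vals <- c("125" = w.125$p.value, "250" = w.250$p.value,
            "500" = w.500$p.value, "1000" = w.1000$p.value)
round(p.vals, 4)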

Plot #4: Better performer according to R2 vs. SDE by radius

Now I’ll do the same thing, but for SDE.

# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.r2 + radius, data = df, xlab = "Radius",
             ylab = "SDE", xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

…No noticeable differences.

Plot #5: Better performer according to R2 vs. mean observed VI by radius

Now I’ll do the same thing, but for mean observed VI.

# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.r2 + radius, data = df, xlab = "Radius",
             ylab = "Mean Observed VI", xaxt = "n",
             col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

Some fairly stark differences emerge. Let’s test significance…

# run a one-sided wilcoxon rank sum test at each radius: is mean observed vi
# lower at sites where the national model performs better?
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 125],
                     y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 125],
                     alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 250],
                     y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 250],
                     alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 500],
                     y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 500],
                     alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 1000],
                      y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 1000],
                      alternative = "less")

Here are the p-values:

Once again, all significant. But if we're willing to get into the p-value comparison game (which may be statistically iffy), biomass has a stronger effect than observed VI.
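If we want a comparison on firmer footing than p-values, one option is a simple effect size. The W statistic returned by wilcox.test counts the (national, site) pairs in which the national value exceeds the site value, so W divided by the number of pairs estimates P(national > site), the probability of superiority, which is comparable across variables. A minimal sketch for one radius (the n.nat and n.site names are just illustrative):

# probability of superiority for mean observed vi at the 125 m radius;
# values near 0.5 indicate no effect, values near 0 or 1 a strong one
n.nat <- sum(df$better.r2 == "national" & df$radius == 125)
n.site <- sum(df$better.r2 == "site" & df$radius == 125)
unname(w.125$statistic) / (n.nat * n.site)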

Plot #6: Better performer according to RRMSE vs. biomass by radius

The last three plots were all based on R2, but the more challenging narrative in the paper really comes from RRMSE. So, let's repeat the last three plots, focusing on RRMSE instead.

# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.rrmse + radius, data = df, xlab = "Radius",
             ylab = "Biomass", xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

Once again, the sites where national-level models outperform site-level models tend to have higher biomass, but the relationship is not as stark as it was with R2. Let's check the significance…

# run a one-sided wilcoxon rank sum test at each radius: is biomass higher
# at sites where the national model performs better (according to rrmse)?
w.125 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 125],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 125],
                     alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 250],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 250],
                     alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 500],
                     y = df$biomass[df$better.rrmse == "site" & df$radius == 500],
                     alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 1000],
                      y = df$biomass[df$better.rrmse == "site" & df$radius == 1000],
                      alternative = "greater")

Here are the p-values:

Only significant at the 1000 m radius.

Plot #7: Better performer according to RRMSE vs. SDE by radius

# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.rrmse + radius, data = df, xlab = "Radius",
             ylab = "SDE", xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

No major differences.

Plot #8: Better performer according to RRMSE vs. mean observed VI by radius

# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.rrmse + radius, data = df, xlab = "Radius",
             ylab = "Mean Observed VI", xaxt = "n",
             col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5, 7.5, 2), labels = labs, padj = 0, tick = FALSE)
legend(x = par("usr")[2], y = mean(par("usr")[3:4]),
       legend = c("National", "Site"), fill = c("red", "blue"),
       xpd = NA, yjust = 0.5, bty = "n")

Once again, we see that national-level models do better at modeling lower-VI areas than site-level models. Are the differences significant?

# run a one-sided wilcoxon rank sum test at each radius: is mean observed vi
# lower at sites where the national model performs better (according to rrmse)?
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 125],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 125],
                     alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 250],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 250],
                     alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 500],
                     y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 500],
                     alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 1000],
                      y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 1000],
                      alternative = "less")

Here are the p-values:

Once again, only significant at the 1000 m radius.
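As an aside, each of these four-chunk test blocks could be collapsed into a single pass. A minimal sketch, equivalent to the mean-observed-VI block just above:

# run the one-sided test at every radius in one pass and collect the p-values
radii <- c(125, 250, 500, 1000)
p.vals <- sapply(radii, function(r) {
  wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == r],
              y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == r],
              alternative = "less")$p.value
})
names(p.vals) <- radii
round(p.vals, 4)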

So what does this all tell us?

I guess it tells us a few things (that I think we already know…):

  1. National-level models explain more variance (have higher R2) than site-level models at more sites at shorter viewing radii, and vice versa at longer viewing radii.
  2. National-level models have higher RRMSE (i.e., perform worse) at the majority of sites across all radii, and the disparity grows with radius.
  3. National-level models are better at modeling visibility at high-biomass, low-visibility sites.
  4. The differences in model performance according to R2 are much clearer than those according to RRMSE.
  5. However, even though the RRMSE-based relationships are generally not statistically significant (only at the 1000 m radius), national-level models still tend to outperform site-level models at high-biomass, low-visibility sites.
  6. SDE shows no relationship with which model performs better, across the board.