First, I just want to get a site-by-site count of how many sites are doing better with the national-level model vs. the site-level model, by viewing radius, according to R2.
# read in the data
df <- read.csv("S:/ursa/campbell/Other/viewshed_approximation/other/Mod_Vis_All_results_table.csv")
# change column names
colnames(df) <- c("region","biomass","sde","radius","obs.vi.mean","site.r2",
"national.r2","site.rrmse","national.rrmse","evt.lf","evt.class")
# create a field that classifies which model performed better (site vs.
# national), for each site
df$better.r2 <- "site"
df$better.r2[df$national.r2 > df$site.r2] <- "national"
# create summary table
df.summ <- table(df[,c("better.r2", "radius")])
# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = T, col = c("red", "blue"), ylab = "Count",
xlab = "Radius")
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
So, national-level models are doing better at shorter radii, whereas site-level models are doing better at longer radii, at least according to R2.
Next, I’ll do the same thing, but focusing on RRMSE as the basis of comparison.
# create a field that classifies which model performed better (site vs.
# national), for each site
df$better.rrmse <- "site"
df$better.rrmse[df$national.rrmse < df$site.rrmse] <- "national"
# create summary table
df.summ <- table(df[,c("better.rrmse", "radius")])
# create a bar plot that counts better performer by radius
par(mar = c(5,5,1,7), las = 1)
barplot(df.summ, beside = T, col = c("red", "blue"), ylab = "Count",
xlab = "Radius")
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
According to RRMSE, site-level models are performing better at all radii, with increasing disparities at larger radii.
To add a layer of nuance into the analysis, I’ll now examine which sites did better (according to R2) by biomass and radius.
# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "Biomass",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
Clearly, the national model does much better at high-biomass sites than the site-level models at all radii. I’m already certain that these differences are statistically-significant, but if we need some numbers, we can run a Wilcoxon rank sum test.
# run wilcoxon rank sum test on each radius to test significance of biomass vs. site/national better performance
w.125 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 125],
y = df$biomass[df$better.r2 == "site" & df$radius == 125],
alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 250],
y = df$biomass[df$better.r2 == "site" & df$radius == 250],
alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 500],
y = df$biomass[df$better.r2 == "site" & df$radius == 500],
alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.r2 == "national" & df$radius == 1000],
y = df$biomass[df$better.r2 == "site" & df$radius == 1000],
alternative = "greater")
Here are the p-values:
Now I’ll do the same thing, but for SDE.
# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "SDE",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
…No noticeable differences.
Now I’ll do the same thing, but for mean observed VI.
# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.r2 + radius, data = df, xlab = "Radius", ylab = "Mean Observed VI",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
Some fairly stark differences emerge. Let’s test significance…
# run wilcoxon rank sum test on each radius to test significance of mean observed vi vs. site/national better performance
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 125],
y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 125],
alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 250],
y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 250],
alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 500],
y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 500],
alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.r2 == "national" & df$radius == 1000],
y = df$obs.vi.mean[df$better.r2 == "site" & df$radius == 1000],
alternative = "less")
Here are the p-values:
Once again, all significant. But, if we’re willing to get into the p-value comparisons game (which may be statistically-iffy), then biomass has a stronger effect than observed VI.
So, the last three plots were all based on R2. But, the challenging narrative in the paper really comes from RRMSE. So, let’s repeat the last three plots, but focusing on RRMSE instead.
# create a boxplot that compares biomass by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(biomass ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "Biomass",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
Once again, we can see that generally national-level models outperform site-level models according to RRMSE, but the relationship is not as stark as that with R2. Let’s check the significance…
# run wilcoxon rank sum test on each radius to test significance of biomass vs. site/national better performance
w.125 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 125],
y = df$biomass[df$better.rrmse == "site" & df$radius == 125],
alternative = "greater")
w.250 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 250],
y = df$biomass[df$better.rrmse == "site" & df$radius == 250],
alternative = "greater")
w.500 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 500],
y = df$biomass[df$better.rrmse == "site" & df$radius == 500],
alternative = "greater")
w.1000 <- wilcox.test(x = df$biomass[df$better.rrmse == "national" & df$radius == 1000],
y = df$biomass[df$better.rrmse == "site" & df$radius == 1000],
alternative = "greater")
Here are the p-values:
Only significant at 1000m.
# create a boxplot that compares sde by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(sde ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "SDE",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
No major differences.
# create a boxplot that compares mean observed vi by better performer and radius
par(mar = c(5,5,1,7), las = 1)
b <- boxplot(obs.vi.mean ~ better.rrmse + radius, data = df, xlab = "Radius", ylab = "Mean Observed VI",
xaxt = "n", col = rep(c("red", "blue"), 4))
labs <- c("125", "250", "500", "1000")
axis(1, at = seq(1.5,7.5,2), labels = labs, padj = 0, tick = F)
legend(x = par("usr")[2],y = mean(par("usr")[3:4]),legend = c("National", "Site"),
fill = c("red", "blue"),xpd = NA, yjust = 0.5, bty = "n")
Once again, we see that national-level models do better at modeling lower-VI areas than site-level models. Are the differences significant?
# run wilcoxon rank sum test on each radius to test significance of mean observed vi vs. site/national better performance
w.125 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 125],
y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 125],
alternative = "less")
w.250 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 250],
y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 250],
alternative = "less")
w.500 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 500],
y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 500],
alternative = "less")
w.1000 <- wilcox.test(x = df$obs.vi.mean[df$better.rrmse == "national" & df$radius == 1000],
y = df$obs.vi.mean[df$better.rrmse == "site" & df$radius == 1000],
alternative = "less")
Here are the p-values:
Once again, only significant at 1000m.
I guess it tells us a few things (that I think we already know…):