knitr::opts_chunk$set(echo = FALSE, fig.height = 5, fig.width = 9, fig.align = "center")
setwd("G:/My Drive/CanopyHeights_Analysis")
### Libraries
library(dplyr)
library(ggplot2)
library(xgboost)
library(knitr)
library(plotly)
library(caret)
library(kableExtra)
library(hexbin)
library(modelr)
library(readr)
library(broom)
### Import data
setwd("G:/My Drive/CanopyHeights_Analysis")
d <- read.csv("SamplePts.csv")
#head(d)
Lidar<-d$Lidar_ht
GEDI<-d$GEDI_ht
NFIS<-d$NFIS_hT
HARM<-d$Harmonized.Landcover
harmName<-d$Harmorized.Landcover.Name
wetcls<-d$ABMI_Wetland_Class
Purpose: To assess two open source canopy height models against Alberta GOA lidar and ABMI Photoplots
In the following scatterplots, darker shades of blue indicate greater density of data. The dashed line is a linear regression trendline. The yellow line is a Loess regression line (smoothed non parametric regression) [-1]http://r-statistics.co/Loess-Regression-With-R.html
Both open source satellite based canopy height layers (GEDI and NFIS) tend to exceed corresponding lidar values up to about 12m, after which they level off and eventually lidar values >15m exceed the corresponding values from the other two layers.
densPlot <-function(xVar, yVar,titleVar ){
ggplot(d, aes_string(x=xVar, y=yVar)) + geom_jitter(alpha=0.3, colour = "lightblue") +
xlim(0, 25) +
ylim (0, 25) +
stat_density2d(aes(fill = stat(level)),geom = "polygon", alpha = 0.7) +
scale_fill_gradient(low = "lightblue", high = "darkblue")+
theme(legend.position = "none") +
geom_smooth(span = 10, colour = "lightyellow", weight = 0.5) +
geom_smooth(method = lm, colour = "black", weight = 0.01, linetype = "dashed")+
xlab (deparse(substitute(xVar)))+
ylab (deparse(substitute(yVar))) +
labs(
title = paste(titleVar)
)
}
densPlot(Lidar, GEDI,"GEDI Based Canopy Heights Compared to GOA Airborne Lidar" )
gedi.lm= lm(Lidar ~ GEDI, data = d)
gedi_r2<-summary(gedi.lm)$r.squared
print ( paste ("Lidar ~ GEDI Linear Regression Rsquared: ", formatC(gedi_r2, digits = 3, format = "f")))
## [1] "Lidar ~ GEDI Linear Regression Rsquared: 0.336"
densPlot(Lidar, NFIS,"NFIS Extrapolated Canopy Heights Compared to GOA Airborne Lidar " )
nfis.lm=lm(Lidar ~NFIS, data = d)
nfis_r2<-summary(nfis.lm)$r.squared
print ( paste ("Lidar ~NFIS Linear Regression Rsquared: ", formatC(nfis_r2, digits = 3, format = "f")))
## [1] "Lidar ~NFIS Linear Regression Rsquared: 0.272"
Surprisingly, the GEDI and NFIS canopy height layers are not closely correlated with each other either. While the densest data range, values <2m, shows most values from the two datasets are within the same range, both also have values which are very low in one datset and high in the other. Most of the rest of the data is bunched up in the 10-15m (GEDI) or 12-17m (NFIS) range. The descrepancy is surprising because both these datasets use Landsat data for extrapolation.
densPlot(GEDI, NFIS,"NFIS Extrapolated Canopy Heights Compared to GEDI Based Canopy Heights " )
both.lm=lm(GEDI ~NFIS, data = d)
both_r2<-summary(both.lm)$r.squared
print ( paste ("GEDI ~NFIS Linear Regression Rsquared: ", formatC(both_r2, digits = 3, format = "f")))
## [1] "GEDI ~NFIS Linear Regression Rsquared: 0.362"
Comments: The descrepancy between the GOA Lidar CHM and the two open source canopy height layers can probably be attributed at least partially to the temporal descrepancy. Tree growth will proceed at different rates in different ages and types of forest, so we should not expect a linear relationahip to be preserved over time. The methodological difference of resampling 1m data down to 30m by taking a percentile vs. modelled data can also explain some of the descrepancy.
The difference between the two modelled datasets is harder to explain without a detailed analysis of the input data and modelling approaches.
To examine the spatial distribution of the differences between the canopy height datasets, difference images were created for 1) GEDI minus Lidar heights 2) NFIS minus Lidar Heights, and 3) GEDI minus NFIS. An Optimized Hot Spot Analysis was run in ArcPro. Red areas show significant cluster of positive values and blue areas show significant clusters of negative values. Grey areas have no significant clustering, although there may be individual high and low values mixed together.
The GEDI-minus-Lidar layer showed the most pronounced spatial clustering, with the Foothills Region, Hawk Hills, and Birch Mountains being hot spot areas where GEDI > Lidar, while surrounding areas in the northwest of the province were cold spots with Lidar > GEDI. These hilly areas have a lot of shrub, according to the NFIS Landcover layer. The NFIS-Minus-Lidar and GEDI-minus-NFIS showed less pronounced spatial patterns.
GEDI-Minus-Lidar
GEDI-Minus-NFIS
NFIS-Minus-Lidar
The next phase of analysis focused on drilling down into the landcover, height, and density characteristics that might explain the spatial patterns. Density based on the Percent Above Mean variable turned out to be strongly positively correlated with height, so height class was the focus.
1. Pivot Table analysis in Excel showed that the average heights when broken down by landcover and wetland type usually made sense. For example, here the data is arranged by wetland class, and then the fen is futher broken down by Lidar height class. It’s observed that the GEDI and NFIS heights average higher than Lidar for the shorter height categories. Fen can be graminoid, shrub, or treed. When the 0-2m lidar height category is broken down further by NFIS landcover type, it suggests some of the fen in the group are actually treed, and these account for the higher GEDI and NFIS heights. Possibly, these areas have transitioned from graminoid to shrub or treed fen in the time between the lidar aquisition and the modelled layers. (Note that this graph does not show how many points are in each category; some, like Snow/Ice in Fen, are negligable.)
Pivot Chart with Wetland Types, Lidar Height Classes, and Landcover
2. The following density plots show the breakdown by 6 class Harmonized Landcover, ABMI Wetlands/Upland, and 13 class NFIS Landcover.
#Density Plots by Class
densPlot2 <-function(xVar, yVar,wrapVar ){
ggplot(d, aes_string(x=xVar, y=yVar)) + geom_point(alpha=1/5, colour = "lightblue") +
xlim(0, 25) +
ylim (0, 25) +
stat_density2d(aes(fill = stat(level)),geom = "polygon", alpha = 0.7) +
scale_fill_gradient(low = "lightblue", high = "darkblue")+
theme(legend.position = "none") +
geom_smooth(span = 10, colour = "lightyellow", weight = 0.5) +
xlab (deparse(substitute(xVar)))+
ylab (deparse(substitute(yVar))) +
facet_wrap(wrapVar)+
labs(title = paste((deparse(substitute(yVar))), "vs", (deparse(substitute(xVar))), "by",(deparse(substitute(wrapVar)))))
}
densPlot2(Lidar, GEDI,"~Harmorized.Landcover.Name")
densPlot2(Lidar, GEDI,"~ABMI_Wetland_Class")
densPlot2(Lidar, GEDI,"~NFIS_Landcover_Name")
densPlot2(Lidar, NFIS,"~Harmorized.Landcover.Name")
densPlot2(Lidar, NFIS,"~ABMI_Wetland_Class")
densPlot2(Lidar, NFIS,"~NFIS_Landcover_Name")
3. These boxplots show the difference between GEDI and Lidar by Harmonized Landcover class. As expected, the range is greater for landcovers with a greater range of values, while the landcover types without woody vegetation’s differences are mainly outliers. The box plots for NFIS minus Lidar (not shown) are similar.
ggplot(d,aes(harmName,d$G_Minus_L))+
geom_hline(yintercept = 0, color = "white", size = 2) +
geom_boxplot() +
scale_y_continuous(breaks = seq(-20, 20, 5))+
labs(title = "GEDI Minus Lidar by Harmonized Landcover Class")
#ggplot(d,aes(d$NFIS_Landcover_Name,d$G_Minus_L))+geom_boxplot() +
# scale_y_continuous(breaks = seq(-20, 20, 5))
4. These boxplots show the differences between GEDI and Lidar, and NFIS and Lidar, by Lidar height class. They follow a similar pattern, though the NFIS ranges tend to be larger. The positive differences in the shorter height ranges may be related to vegetation growth, while the negative differences in the taller ranges indicate that the modelled canopy heights tend to saturate out after about 15m. The sum of differences average out close to zero.
ggplot(d,aes(LidarHtCls,G_Minus_L))+
geom_hline(yintercept = 0, color = "white", size = 2) +geom_boxplot() +
labs(title = "GEDI Minus Lidar by Lidar Height Class")
ggplot(d,aes(LidarHtCls,N_Minus_L))+
geom_hline(yintercept = 0, color = "white", size = 2) +geom_boxplot() +
labs(title = "NFIS Minus Lidar by Lidar Height Class")
5.When the data is viewed organized by GEDI height class, there seems to be some omission; data in the 0-2 height range that averages 6-8m in Lidar heights and taller in NFIS heights, and is classifed as treed in the NFIS Landcover.
GEDI Height Classes by Landcover
These bar charts of GEDI heights broken down the Harmonized Landcover class also suggest at these missing heights. The coloured bars are GEDI heights and the black outline bars show the coresponding lidar heights for the same data.
ggplot(d) +
geom_bar(aes(wetcls, GEDI, fill = harmName),
position = "dodge", alpha = 0.9, width=0.75,
stat = "summary", fun.y = "mean") +
geom_bar(aes(wetcls, Lidar, group = interaction(wetcls,harmName)),
position = "dodge", alpha = 0.15, width=0.75, colour = "black", fill = NA,
stat = "summary", fun.y = "mean") +
facet_wrap(~GEDI_htClass,ncol = 2)+
labs(title = "GEDI mean heights by Wetland Class and Harmonized Landcover Class")
Here’s an example of a copse of trees in the Parkland Region visible on the 30m Lidar CHM but not on the modelled GEDI based layer (There were no NFIS canopy height data in this area.)
The same issue exists in the NFIS layer, but not always at the same locations.
Calculations Based on Pivot Tables, assuming the NFIS Landcover and Harmonized Landcover Shrub, Conifer, Broadlead, and Mixed Forest and Treed Wetland classes are correct, suggest that
If the two modelled canopy height layers can be used together to identify forest/nonforest, the omission can be reduced to 7% (producer’s accuracy) or about 2% consumer’s accuracy. Commission of heights above 3m is negligable.
However, the NFIS canopy height layer does not cover the Parkland and Grassland Natural Regions.
1. The photoplot height attribute is meant to represent the average height of the tallest overstory canopy in each polygon and is expressed as integers in 1m increments. ArcPro’s Zonal Statistics as Table tool was used to calculate the mean for each raster canopy height layer corresponding to all photoplot polygons of a certain height.
The photoplot heights may be the closest we have to a comprehensive “ground truth” dataset since they are closer in time to the modelled canopy height layers. In addition, the averaging process across polygons may be more forgiving than matching single point locations as in the previous analysis, making for a better match.
At any rate, the correlation between the layers is much closer in this format. The black line represents what would be a 1:1 relationship between photoplot heights and mean raster canopy heights. Note that there are relatively few stands over 25m, so the data points over 25m represent relatively few polygons compared to the others.
#first the three summarized by photplot site height
g <- read_csv("PhotoplotHts.csv")
ggplot(g, aes(x= SITE_HT)) +
scale_colour_manual(name="", values =c("GEDI"="red", "NFIS"="forestgreen", "Lidar"= 'blue')) +
geom_hline(yintercept = 0, color = "white", size = 2) +
geom_point(aes( y=GEDI, colour = "GEDI")) +
geom_line(aes( y=GEDI),colour = "pink", alpha = 0.7) +
geom_point(aes( y=NFIS, colour = "NFIS")) +
geom_line(aes( y=NFIS),colour = "lightgreen", alpha = 0.7) +
geom_point(aes( y=Lidar, colour = "Lidar")) +
geom_line(aes( y=Lidar),colour = "lightblue", alpha = 0.7) +
geom_line(aes(y = SITE_HT))+
scale_x_continuous(breaks = seq(0, 36, 2)) +
scale_y_continuous(breaks = seq(0, 36, 2)) +
xlab("Photoplot Polygon Height")+
ylab ("Mean Raster Canopy Height")
2. Regression Models. Although the y intercepts and slopes are offset from the 1:1 line, the mean raster canopy heights are all quite closely correlated with the photoplot heights. The GOA Lidar data shows a very strong relationship to the photoplot heights, which increases confidence in its relevance for evaluation of the modelled canopy heights.
get_formula <- function(model) {
broom::tidy(model)[, 1:2] %>%
mutate(sign = ifelse(sign(estimate) == 1, ' + ', ' - ')) %>% #coeff signs
mutate_if(is.numeric, ~ abs(round(., 2))) %>% #for improving formatting
mutate(a = ifelse(term == '(Intercept)', paste0('y ~ ', estimate), paste0(sign, estimate, 'x'))) %>%
summarise(formula = paste(a, collapse = '')) %>%
as.character
}
ppgedi.lm= lm(g$GEDI~ g$SITE_HT, data = d)
gedi_r2<-summary(ppgedi.lm)$r.squared
print ( paste ("GEDI ~ Photoplot Linear Regression : ", get_formula(ppgedi.lm)))
## [1] "GEDI ~ Photoplot Linear Regression : y ~ 4.85 + 0.23x"
print ( paste (" R squared: ", formatC(gedi_r2, digits = 3, format = "f")))
## [1] " R squared: 0.773"
ppnfis.lm=lm(g$NFIS ~g$SITE_HT, data = d)
nfis_r2<-summary(ppnfis.lm)$r.squared
print ( paste ("NFIS ~Photoplot Linear Regression : ", get_formula(ppnfis.lm)))
## [1] "NFIS ~Photoplot Linear Regression : y ~ 4.87 + 0.41x"
print ( paste (" R squared: ", formatC(nfis_r2, digits = 3, format = "f")))
## [1] " R squared: 0.894"
pplidar.lm=lm(g$Lidar ~g$SITE_HT, data = d)
lidar_r2<-summary(pplidar.lm)$r.squared
print ( paste ("Lidar ~Photoplot Linear Regression : ", get_formula(pplidar.lm)))
## [1] "Lidar ~Photoplot Linear Regression : y ~ 0.37 + 0.7x"
print ( paste (" R squared: ", formatC(lidar_r2, digits = 3, format = "f")))
## [1] " R squared: 0.978"
3. The relationship with the photoplot heights can perhaps be seen most clearly by examining the differences. All three mean raster canopy heights initially exceed the polygon heights by a few metres, then drop into a negative difference, confirming the pattern observed with the sample points. When we examined the sample points, the NFIS values often seemed too high, but now averaged across thousands of polygons, they fall closer to the photoplot heights than the GEDI based layer.
g$diffGEDI<-g$GEDI-g$SITE_HT
g$diffNFIS<-g$NFIS-g$SITE_HT
g$diffLidar<-g$Lidar-g$SITE_HT
#print(g)
ggplot(g, aes(x= SITE_HT)) +
scale_colour_manual(name="", values =c("Diff_GEDI"="red", "Diff_NFIS"="forestgreen", "Diff_Lidar"= 'blue')) +
geom_hline(yintercept = 0, color = "white", size = 2) +
geom_line(aes( y=diffGEDI, colour = "Diff_GEDI")) +
geom_line(aes( y=diffNFIS, colour = "Diff_NFIS")) +
geom_line(aes( y=diffLidar, colour = "Diff_Lidar")) +
xlim(0, 25) +
ylim (-25, 5) +
xlab("Photoplot Polygon Height")+
ylab ("Differences from Photoplot Height")
It should be noted that the NFIS product is based on a 95th percentile of airborne lidar. The GEDI based product is based on “RH95”, a 95th percentile of a 25m wide spaceborne lidar footprint, which Potapov et al (2021) found corresponded best to the 90th percentile of airborne lidar.
The Government of Alberta lidar sampled down to 30m is an imperfect evaluation layer for the two modelled canopy height layers, principally due to the different in age, and lack of coverage in certain natural regions (Canadian Shield, Parkland and Grassland). It is getting too dated to be used as “truth” for canopy height models, but it can serve as a rough guide after known disturbances in the intervening years are removed.
There is spatial variation in the areas where the three canopy height layers differ, with no one layer consistently higher or lower than the others across the province. These variations seem to be related to variables such as height, density, terrain, and vegetation type, which makes it hard to separate out effects of vegetation growth over time, methodology, and consistency of results.
An examination of the canopy height values against three landcover layers, two from NFIS and one produced by ABMI (5 wetland classes and Uplands), suggest that on average the height values match the classified cover type i.e treed upland and wetland cover types have heights above 3m.
The two modelled height layers are often higher than the GOA lidar for shorter, younger forest, suggesting a growth effect, but level out for forest above 15m in height. The three layers are closest to each other in the 10 to 15m range, which is where a large proportion of the trees are. Both layers completely omit a small percentage of shorter tree or shrub heights,
The NFIS canopy height layer has a few shortcomings for our uses
However, when averaged over polygons, the NFIS layer appeared to match photoplot heights more closely than the GEDI layer.
Both open source canopy height modelled layers could be useful and meaningful for many applications, but must be used with caution and a few caveats until more precise open source canopy height data is available.
Matasci, G., T. Hermosilla, M.A. Wulder, J.C. White, N.C. Coops, G.W. Hobart, H.S.J. Zald Large-area mapping of Canadian boreal forest cover, height, biomass and other structural attributes using Landsat composites and lidar plots Remote Sens. Environ, 209 (2018) https://doi.org/10.1016/j.rse.2017.12.020
Potapov, P., X. Li, A. Hernandez-Serna, A. Tyukavina, M.C. Hansen, A. Kommareddy, A. Pickens, S. Turubanova, H. Tang, C.E. Silva, J. Armston, R. Dubayah, J.B. Blair, M. Hofton Mapping global forest canopy height through integration of GEDI and Landsat data Remote Sens. Environ., 253(2021) https://doi.org/10.1016/j.rse.2020.112165