In the STRIDE manuscript, currently in review, that presents a new “universal” travel rate function trained and validated on three separate field experiments, we examined the aboveground height range within which lidar point normalized relative density (NRD) was most predictive of travel rates. We found that 0.85 - 1.20m was the best range. However, we also found that there were many similar height ranges that yielded similar (albeit slightly worse) performance. One might wonder why bother examining anything but the singular best range? The answer is that 0.85 - 1.20m is relatively narrow (roughly waist to chest height). While the fact that this range was selected makes intuitive sense (vegetation between your waist and chest is hard to avoid, therefore high densities in this range mean slower movement), its narrowness may limit its broad applicability. The narrower the height range, the less likely a lidar pulse is to interact with a surface in this range. For example, the highest NRD found throughout our 39 transects within this range was 6.4% – a very low number. With a wider range, lidar pulses are more likely to encounter vegetated surfaces within that range. This is particularly true with lower pulse density airborne lidar datasets. So, even if the statistical link to travel rates may be slightly less robust with a wider range than 0.85 - 1.20, the accuracy with which vegetation density is represented within the height range may be more robust.
Accordingly, the objective of this analysis is to reexamine the performance metrics of the STRIDE model at different NRD height ranges in the interest of identifying a range that is both wide and still highly predictive of travel rates.
Analysis
library(knitr)
Warning: package 'knitr' was built under R version 4.3.3
library(stringr)
Warning: package 'stringr' was built under R version 4.3.3
# read in the datadf <-read.csv("S:/ursa2/campbell/hvm_travel_rate/modeling/nrd_range_and_roughness_radius_perf_mets_with_urb.csv")kable(head(df))
nrd.col
rgh.col
r2
rmse
n.errors
n.warnings
nrd.005.030
roughness.1m
0.6524752
NA
1
0
nrd.005.035
roughness.1m
0.6962799
0.2099557
0
0
nrd.005.040
roughness.1m
0.6962457
0.2099745
0
0
nrd.005.045
roughness.1m
0.6963438
0.2099471
0
0
nrd.005.050
roughness.1m
0.6964302
0.2099225
0
0
nrd.005.055
roughness.1m
0.6965026
0.2099014
0
0
As you can see, this table contains 6 columns, only the first four of which really matter here:
nrd.col defines what NRD variable was used to build the model, with the two numbers (e.g., 005 and 030) representing the height range in cm. So, nrd.005.030 is the NRD between 5 and 30cm.
rgh.col defines what roughness variable was used to build the model. In addition to testing NRD height ranges, we also tested roughness focal radii (1, 2, 3, 4, and 5m), finding that 2m was the best.
r2 is the coefficient of determination between predicted and observed travel rates from a model built using the NRD and roughness variables shown in each row, in addition to slope.
rmse is the RMSE bewteen predictions and observations for each model.
We’ll focus on RMSE as the target measure of interest, and we’ll filter to 2m-radius roughness models, since those were found to be the best. Then, we’ll simpify the table to just the columns of interest. Lastly, we’ll pull the actual heights out from the nrd.col variable names, and calculate the range width.
# filter to just the 2m radius roughness modelsdf <- df[df$rgh.col =="roughness.2m",]rownames(df) <-1:nrow(df)# simplify the tabledf <- df[,c("nrd.col", "rmse")]# add columns for height range floor and ceilingdf$nrd.floor <-str_split_i(df$nrd.col, "\\.", 2) |>as.numeric()df$nrd.ceiling <-str_split_i(df$nrd.col, "\\.", 3) |>as.numeric()# add column for height range widthdf$nrd.width <- df$nrd.ceiling - df$nrd.floorkable(head(df))
nrd.col
rmse
nrd.floor
nrd.ceiling
nrd.width
nrd.005.030
0.2023519
5
30
25
nrd.005.035
0.2021093
5
35
30
nrd.005.040
0.2019042
5
40
35
nrd.005.045
0.2017205
5
45
40
nrd.005.050
0.2015654
5
50
45
nrd.005.055
0.2014443
5
55
50
OK, so now let’s compare the NRD height range width to RMSE, to see if we can find a range that is both fairly wide and fairly accurate.
# plot out width vs rmsepar(mar =c(5,5,1,1), las =1)plot(x = df$nrd.width,y = df$rmse,pch =16,xlab ="NRD Height Range Width (cm)",ylab ="Travel Rate Model Error (RMSE, m/s)")
Trippy! After a little exploration, I figured out that the weird line of high-error points are all associated with the height range floors of 5cm (see code and plot below). I attribute this to the fact that 5cm is too close to the ground to be useful, due to misclassification of ground points, vegetation that low not affecting movement, and the fact that NRD calculations won’t be very robust, since the denominator will only include a few more points, usually, than the numerator.
# plot out width vs rmsepar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$nrd.floor ==5] <-2plot(x = df$nrd.width,y = df$rmse,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Travel Rate Model Error (RMSE, m/s)")legend("bottomright",legend =c("5cm Floor", "Not 5cm Floor"),col =c(2,4),pch =16)
For visual simplicity, we can get rid of all of the 5cm floor models. We’ll also highlight the single best model (0.85 - 1.20m):
# remove 5cm floor modelsdf <- df[df$nrd.floor !=5,]# plot out width vs rmsepar(mar =c(5,5,1,1), las =1)plot(x = df$nrd.width,y = df$rmse,pch =16,col ="lightgray",xlab ="NRD Height Range Width (cm)",ylab ="Travel Rate Model Error (RMSE, m/s)")points(x = df$nrd.width[which.min(df$rmse)],y =min(df$rmse),pch =16,cex =1.5)text(x = df$nrd.width[which.min(df$rmse)],y =min(df$rmse),labels =paste0( df$nrd.floor[which.min(df$rmse)]," - ", df$nrd.ceiling[which.min(df$rmse)],"cm" ),pos =4,offset =0.5)legend("bottomright",legend =c("Best Model", "Not Best Models"),col =c("black", "lightgray"),pch =16,pt.cex =c(1.5, 1))
The question becomes “how much worse are we willing to go to get a wider range?” It’s hard to think about in terms of raw RMSE increase, and perhaps easier to do so in relative terms. In other words, saying that predictive capacity decreases by 0.005 m/s is harder to understand than predictive capacity decreases by 5%. To calculate relative decrease in model performance (RDMP), which is equivalent to the relative increase in RMSE, from the best model to each suboptimal models [i, …, n], I’ll use the following formula:
# calculate rdmp for each modeldf$rdmp <- (df$rmse -min(df$rmse)) / (min(df$rmse))# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)plot(x = df$nrd.width,y = df$rdmp,pch =16,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")
From here, we can apply a variety of thresholds, depending on our willingness to increase error (decrease model performance). Certainly decreasing model performance by 1% won’t matter, but are we willing to go as high as 5%, for example?
RDMP <= 1%
# define rdmp thresholdthres <-0.01# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$rdmp <= thres] <-2plot(x = df$nrd.width,y = df$rdmp,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")abline(h = thres, lty =2)text(x =250,y = thres,labels =paste0("RDMP <= ", thres),pos =1,offset =0.5)
# get the widest range within this thresholddf.1<- df[df$rdmp <= thres,]df.1<- df.1[which.max(df.1$nrd.width),]df.1$thres <- thres
RDMP <= 2%
# define rdmp thresholdthres <-0.02# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$rdmp <= thres] <-2plot(x = df$nrd.width,y = df$rdmp,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")abline(h = thres, lty =2)text(x =250,y = thres,labels =paste0("RDMP <= ", thres),pos =1,offset =0.5)
# get the widest range within this thresholddf.2<- df[df$rdmp <= thres,]df.2<- df.2[which.max(df.2$nrd.width),]df.2$thres <- thres
RDMP <= 3%
# define rdmp thresholdthres <-0.03# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$rdmp <= thres] <-2plot(x = df$nrd.width,y = df$rdmp,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")abline(h = thres, lty =2)text(x =250,y = thres,labels =paste0("RDMP <= ", thres),pos =1,offset =0.5)
# get the widest range within this thresholddf.3<- df[df$rdmp <= thres,]df.3<- df.3[which.max(df.3$nrd.width),]df.3$thres <- thres
RDMP <= 4%
# define rdmp thresholdthres <-0.04# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$rdmp <= thres] <-2plot(x = df$nrd.width,y = df$rdmp,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")abline(h = thres, lty =2)text(x =250,y = thres,labels =paste0("RDMP <= ", thres),pos =1,offset =0.5)
# get the widest range within this thresholddf.4<- df[df$rdmp <= thres,]df.4<- df.4[which.max(df.4$nrd.width),]df.4$thres <- thres
RDMP <= 5%
# define rdmp thresholdthres <-0.05# plot out rdmp versus height range widthpar(mar =c(5,5,1,1), las =1)cols <-rep(4, nrow(df))cols[df$rdmp <= thres] <-2plot(x = df$nrd.width,y = df$rdmp,pch =16,col = cols,xlab ="NRD Height Range Width (cm)",ylab ="Relative Decrease in Model Performance")abline(h = thres, lty =2)text(x =250,y = thres,labels =paste0("RDMP <= ", thres),pos =1,offset =0.5)
# get the widest range within this thresholddf.5<- df[df$rdmp <= thres,]df.5<- df.5[which.max(df.5$nrd.width),]df.5$thres <- thres
Now I’ll compile the results, and print it out for all to see.
And for comparison, the optimal height range (85 - 120cm) has an NRD width of 35cm. So, even by expanding to just an RDMP <= 1, we’ve already nearly tripled our height range of interest.
So, I guess the question is how low are we willing to go? Part of the answer to this might lie in how the lidar data look when mapped at these different height ranges.