NRD Height Range Analysis

Introduction

In the STRIDE manuscript, currently in review, that presents a new “universal” travel rate function trained and validated on three separate field experiments, we examined the aboveground height range within which lidar point normalized relative density (NRD) was most predictive of travel rates. We found that 0.85 - 1.20m was the best range. However, we also found that there were many similar height ranges that yielded similar (albeit slightly worse) performance. One might wonder why bother examining anything but the singular best range? The answer is that 0.85 - 1.20m is relatively narrow (roughly waist to chest height). While the fact that this range was selected makes intuitive sense (vegetation between your waist and chest is hard to avoid, therefore high densities in this range mean slower movement), its narrowness may limit its broad applicability. The narrower the height range, the less likely a lidar pulse is to interact with a surface in this range. For example, the highest NRD found throughout our 39 transects within this range was 6.4% – a very low number. With a wider range, lidar pulses are more likely to encounter vegetated surfaces within that range. This is particularly true with lower pulse density airborne lidar datasets. So, even if the statistical link to travel rates may be slightly less robust with a wider range than 0.85 - 1.20, the accuracy with which vegetation density is represented within the height range may be more robust.

Accordingly, the objective of this analysis is to reexamine the performance metrics of the STRIDE model at different NRD height ranges in the interest of identifying a range that is both wide and still highly predictive of travel rates.

Analysis

library(knitr)

Warning: package 'knitr' was built under R version 4.3.3

library(stringr)

Warning: package 'stringr' was built under R version 4.3.3

# read in the data
df <- read.csv("S:/ursa2/campbell/hvm_travel_rate/modeling/nrd_range_and_roughness_radius_perf_mets_with_urb.csv")
kable(head(df))

nrd.col	rgh.col	r2	rmse	n.errors
nrd.005.030	roughness.1m	0.6524752	NA	1
nrd.005.035	roughness.1m	0.6962799	0.2099557	0
nrd.005.040	roughness.1m	0.6962457	0.2099745	0
nrd.005.045	roughness.1m	0.6963438	0.2099471	0
nrd.005.050	roughness.1m	0.6964302	0.2099225	0
nrd.005.055	roughness.1m	0.6965026	0.2099014	0

As you can see, this table contains 6 columns, only the first four of which really matter here:

nrd.col defines what NRD variable was used to build the model, with the two numbers (e.g., 005 and 030) representing the height range in cm. So, nrd.005.030 is the NRD between 5 and 30cm.
rgh.col defines what roughness variable was used to build the model. In addition to testing NRD height ranges, we also tested roughness focal radii (1, 2, 3, 4, and 5m), finding that 2m was the best.
r2 is the coefficient of determination between predicted and observed travel rates from a model built using the NRD and roughness variables shown in each row, in addition to slope.
rmse is the RMSE bewteen predictions and observations for each model.

We’ll focus on RMSE as the target measure of interest, and we’ll filter to 2m-radius roughness models, since those were found to be the best. Then, we’ll simpify the table to just the columns of interest. Lastly, we’ll pull the actual heights out from the nrd.col variable names, and calculate the range width.

# filter to just the 2m radius roughness models
df <- df[df$rgh.col == "roughness.2m",]
rownames(df) <- 1:nrow(df)

# simplify the table
df <- df[,c("nrd.col", "rmse")]

# add columns for height range floor and ceiling
df$nrd.floor <- str_split_i(df$nrd.col, "\\.", 2) |> as.numeric()
df$nrd.ceiling <- str_split_i(df$nrd.col, "\\.", 3) |> as.numeric()

# add column for height range width
df$nrd.width <- df$nrd.ceiling - df$nrd.floor
kable(head(df))

nrd.col	rmse	nrd.floor	nrd.ceiling	nrd.width
nrd.005.030	0.2023519	5	30	25
nrd.005.035	0.2021093	5	35	30
nrd.005.040	0.2019042	5	40	35
nrd.005.045	0.2017205	5	45	40
nrd.005.050	0.2015654	5	50	45
nrd.005.055	0.2014443	5	55	50

OK, so now let’s compare the NRD height range width to RMSE, to see if we can find a range that is both fairly wide and fairly accurate.

# plot out width vs rmse
par(mar = c(5,5,1,1), las = 1)
plot(
  x = df$nrd.width,
  y = df$rmse,
  pch = 16,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Travel Rate Model Error (RMSE, m/s)"
)

Trippy! After a little exploration, I figured out that the weird line of high-error points are all associated with the height range floors of 5cm (see code and plot below). I attribute this to the fact that 5cm is too close to the ground to be useful, due to misclassification of ground points, vegetation that low not affecting movement, and the fact that NRD calculations won’t be very robust, since the denominator will only include a few more points, usually, than the numerator.

# plot out width vs rmse
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$nrd.floor == 5] <- 2
plot(
  x = df$nrd.width,
  y = df$rmse,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Travel Rate Model Error (RMSE, m/s)"
)
legend(
  "bottomright",
  legend = c("5cm Floor", "Not 5cm Floor"),
  col = c(2,4),
  pch = 16
)

For visual simplicity, we can get rid of all of the 5cm floor models. We’ll also highlight the single best model (0.85 - 1.20m):

# remove 5cm floor models
df <- df[df$nrd.floor != 5,]

# plot out width vs rmse
par(mar = c(5,5,1,1), las = 1)
plot(
  x = df$nrd.width,
  y = df$rmse,
  pch = 16,
  col = "lightgray",
  xlab = "NRD Height Range Width (cm)",
  ylab = "Travel Rate Model Error (RMSE, m/s)"
)
points(
  x = df$nrd.width[which.min(df$rmse)],
  y = min(df$rmse),
  pch = 16,
  cex = 1.5
)
text(
  x = df$nrd.width[which.min(df$rmse)],
  y = min(df$rmse),
  labels = paste0(
    df$nrd.floor[which.min(df$rmse)],
    " - ",
    df$nrd.ceiling[which.min(df$rmse)],
    "cm"
  ),
  pos = 4,
  offset = 0.5
)
legend(
  "bottomright",
  legend = c("Best Model", "Not Best Models"),
  col = c("black", "lightgray"),
  pch = 16,
  pt.cex = c(1.5, 1)
)

The question becomes “how much worse are we willing to go to get a wider range?” It’s hard to think about in terms of raw RMSE increase, and perhaps easier to do so in relative terms. In other words, saying that predictive capacity decreases by 0.005 m/s is harder to understand than predictive capacity decreases by 5%. To calculate relative decrease in model performance (RDMP), which is equivalent to the relative increase in RMSE, from the best model to each suboptimal models [i, …, n], I’ll use the following formula:

\[RDMP_{i}=\frac{RMSE_{i}-RMSE_{0.85-1.20m}}{RMSE_{0.85-1.20m}} \tag{1}\]

# calculate rdmp for each model
df$rdmp <- (df$rmse - min(df$rmse)) / (min(df$rmse))

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)

From here, we can apply a variety of thresholds, depending on our willingness to increase error (decrease model performance). Certainly decreasing model performance by 1% won’t matter, but are we willing to go as high as 5%, for example?

RDMP <= 1%

# define rdmp threshold
thres <- 0.01

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$rdmp <= thres] <- 2
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)
abline(h = thres, lty = 2)
text(
  x = 250,
  y = thres,
  labels = paste0("RDMP <= ", thres),
  pos = 1,
  offset = 0.5
)

# get the widest range within this threshold
df.1 <- df[df$rdmp <= thres,]
df.1 <- df.1[which.max(df.1$nrd.width),]
df.1$thres <- thres

RDMP <= 2%

# define rdmp threshold
thres <- 0.02

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$rdmp <= thres] <- 2
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)
abline(h = thres, lty = 2)
text(
  x = 250,
  y = thres,
  labels = paste0("RDMP <= ", thres),
  pos = 1,
  offset = 0.5
)

# get the widest range within this threshold
df.2 <- df[df$rdmp <= thres,]
df.2 <- df.2[which.max(df.2$nrd.width),]
df.2$thres <- thres

RDMP <= 3%

# define rdmp threshold
thres <- 0.03

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$rdmp <= thres] <- 2
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)
abline(h = thres, lty = 2)
text(
  x = 250,
  y = thres,
  labels = paste0("RDMP <= ", thres),
  pos = 1,
  offset = 0.5
)

# get the widest range within this threshold
df.3 <- df[df$rdmp <= thres,]
df.3 <- df.3[which.max(df.3$nrd.width),]
df.3$thres <- thres

RDMP <= 4%

# define rdmp threshold
thres <- 0.04

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$rdmp <= thres] <- 2
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)
abline(h = thres, lty = 2)
text(
  x = 250,
  y = thres,
  labels = paste0("RDMP <= ", thres),
  pos = 1,
  offset = 0.5
)

# get the widest range within this threshold
df.4 <- df[df$rdmp <= thres,]
df.4 <- df.4[which.max(df.4$nrd.width),]
df.4$thres <- thres

RDMP <= 5%

# define rdmp threshold
thres <- 0.05

# plot out rdmp versus height range width
par(mar = c(5,5,1,1), las = 1)
cols <- rep(4, nrow(df))
cols[df$rdmp <= thres] <- 2
plot(
  x = df$nrd.width,
  y = df$rdmp,
  pch = 16,
  col = cols,
  xlab = "NRD Height Range Width (cm)",
  ylab = "Relative Decrease in Model Performance"
)
abline(h = thres, lty = 2)
text(
  x = 250,
  y = thres,
  labels = paste0("RDMP <= ", thres),
  pos = 1,
  offset = 0.5
)

# get the widest range within this threshold
df.5 <- df[df$rdmp <= thres,]
df.5 <- df.5[which.max(df.5$nrd.width),]
df.5$thres <- thres

Now I’ll compile the results, and print it out for all to see.

# combine the individual rdmp threshold dfs
df.results <- rbind(
  df.1,
  df.2,
  df.3,
  df.4,
  df.5
)
kable(df.results)

	nrd.col	rmse	nrd.floor	nrd.ceiling	nrd.width	rdmp	thres
564	nrd.060.150	0.1698264	60	150	90	0.0095820	0.01
432	nrd.045.165	0.1714425	45	165	120	0.0191893	0.02
339	nrd.035.175	0.1732465	35	175	140	0.0299138	0.03
294	nrd.030.195	0.1748993	30	195	165	0.0397393	0.04
248	nrd.025.215	0.1765293	25	215	190	0.0494295	0.05

For reference:

25cm is approximately shin height
50cm is approximately knee height
85cm is approximately waist height
100cm is approximately stomach height
120cm is approximately chest height
140cm is approximately shoulder height
170cm is approximately top of head height

And for comparison, the optimal height range (85 - 120cm) has an NRD width of 35cm. So, even by expanding to just an RDMP <= 1, we’ve already nearly tripled our height range of interest.

So, I guess the question is how low are we willing to go? Part of the answer to this might lie in how the lidar data look when mapped at these different height ranges.