Two additional changes were made:
This updated Markdown document reflects the changes above.
After a meeting on 7/6/2023 where I shared the results of the initial site placement, we decided to make a few key changes to the sampling procedure:
This updated Markdown document reflects the changes above.
In July-August of 2023, we will be collecting field data in the Monroe Mountain area of Fishlake National Forest. The field data will consist of two primary components: (1) fuel structural measurements; and (2) mobile laser scanning. The data will be collected in support of Carlos Silva and Andy Hudak’s recently-awarded Joint Fire Science Program grant aimed at using multi-scale, multi-platform lidar to characterize fuel structure throughout the western US. The data will also be useful for the Fire and Smoke Model Evaluation Experiment (FASMEE) project, supporting fuel structure and consumption mapping efforts.
Field data will be collected in a series of new plots, distributed throughout Monroe Mountain and aimed at capturing a wide range of fuel types, structures, and conditions. Accordingly, the objective of this document and the associated analysis is to devise and execute a sampling strategy for placing field plots within the Monroe Mountain area of Fishlake National Forest. This procedure will be driven by three main types of data:
The sampling scheme will be driven by the recently-developed Structurally-Guided Sampling (sgsR) R library by Goodbody et al., which provides a range of spatially-explicit statistical sampling algorithms designed specifically for these types of endeavors.
In this step, I will generate a suite of vegetation structural metrics from height-normalized airborne lidar data. I will use a pretty standard set of metrics that I have been using for a number of years for various vegetation structural mapping projects. For more details on what these metrics represent, please see Table 1 in Campbell et al. (2021).
# load necessary libraries
library(moments)
library(lidR)
library(future)
library(terra)
library(rmarkdown)
library(randomForest)
library(RColorBrewer)
library(sgsR)
library(leaflet)
library(sf)
library(stringr)
library(foreign)
# define directory structure
main.dir <- "S:/ursa/campbell/fasmee/data"
als.dir <- file.path(main.dir, "lidar_data/monroe_mtn_lidar")
als.in.dir <- file.path(als.dir, "a04_height")
als.out.dir <- file.path(als.dir, "a05_rasters")
# define output file
out.file <- file.path(als.out.dir, "veg_mets.tif")
# check if this step has already been run
if (!file.exists(out.file)){
# begin parallelization
n.cores <- 24L
set_lidr_threads(n.cores)
plan(multisession, workers = n.cores)
# read in the las catalog
ctg <- readLAScatalog(als.in.dir)
# define pixel metrics function
pixel.metrics <- function(z,r,c){
n <- length(z)
n.first <- length(which(r == 1L))
n.first.bh <- length(which(r == 1L & z >= 1.37))
cc <- list(cc = n.first.bh / n.first)
n.bh <- length(which(z >= 1.37))
cd <- list(cd = n.bh / n)
ps <- seq(0,1,0.1)
p <- as.list(stats::quantile(z[c != 2L],ps))
names(p) <- paste0("p.",ps*100)
mh <- list(mh = mean(z))
mh.ag <- list(mh.ag = mean(z[c != 2L]))
sd <- list(sd = sd(z))
sd.ag <- list(sd.ag = sd(z[c != 2L]))
skew <- list(skew = moments::skewness(z))
skew.ag <- list(skew.ag = moments::skewness(z[c != 2L]))
kurt <- list(kurt = moments::kurtosis(z))
kurt.ag <- list(kurt.ag = moments::kurtosis(z[c != 2L]))
h <- c(0,0.5,1,2,4,8,16,32)
vrd <- list()
for (j in seq(1,length(h)-1)){
lo <- h[j]
hi <- h[j+1]
n.slice <- length(which(z >= lo & z < hi))
slice.name <- paste0("vrd.",lo * 100,".",hi * 100)
vrd[[slice.name]] <- n.slice / n
}
vnrd <- list()
for (j in seq(1,length(h)-1)){
lo <- h[j]
hi <- h[j+1]
n.slice <- length(which(z >= lo & z < hi))
n.slice.below <- length(which(z < hi))
slice.name <- paste0("vnrd.",lo * 100,".",hi * 100)
vnrd[[slice.name]] <- n.slice / n.slice.below
}
metrics <- c(cc, cd, p, mh, mh.ag, sd, sd.ag, skew, skew.ag, kurt, kurt.ag, vrd, vnrd)
return(metrics)
}
# generate metrics
veg.mets <- grid_metrics(ctg,
~pixel.metrics(Z,ReturnNumber,Classification),
res = 30)
veg.mets <- rast(veg.mets)
writeRaster(veg.mets, out.file, T)
# end parallelization
plan(sequential)
}
# read in the metrics as a rast
veg.mets <- rast(out.file)
Note that in the above code, the “best” way to generate the metrics would be to use the lidR::pixel_metrics() function, as it is the modern replacement for the lidR::grid_metrics() function that I wound up using. However, it seems there is a bug somewhere in either lidR::pixel_metrics() or my use of it that was producing erroneous outputs. No big deal – they are basically the same function; I just had to convert the raster object returned by grid_metrics() to a terra object to modernize the output.
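For reference, the newer call would have looked something like the following (not run here, given the issue described above); since pixel_metrics() returns a terra SpatRaster by default, the rast() conversion step would also have been unnecessary:
# equivalent call using the newer lidR API (not run)
# veg.mets <- pixel_metrics(ctg,
#                           ~pixel.metrics(Z, ReturnNumber, Classification),
#                           res = 30)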
OK, let’s take a look at the vegetation metrics results. In all, the resulting raster stack of metrics has 35 variables. Here are the variable names:
# print variable names
print(names(veg.mets))
## [1] "cc" "cd" "p.0" "p.10"
## [5] "p.20" "p.30" "p.40" "p.50"
## [9] "p.60" "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag" "sd"
## [17] "sd.ag" "skew" "skew.ag" "kurt"
## [21] "kurt.ag" "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600" "vrd.1600.3200"
## [29] "vnrd.0.50" "vnrd.50.100" "vnrd.100.200" "vnrd.200.400"
## [33] "vnrd.400.800" "vnrd.800.1600" "vnrd.1600.3200"
Let’s take a look at a handful of predictors:
# plot veg.mets
plot(veg.mets)
Cool, looks good. Let’s move on to vegetation spectral response…
As mentioned above, vegetation spectral response will be derived from a single, 2022 Landsat 9 OLI2 image. Ideally, we’d have 2023 data to work with but, to date, there are no available images from 2023 that are cloud-free over Monroe Mountain. I considered using an Earth Engine-derived summer composite, instead of a single scene, but nothing beats the spectral integrity of a single, cloud-free image acquired directly from the USGS. So, for the sake of deriving high-quality spectral information that is temporally consistent across the entire study area, I opted for a single scene:
This was the same image I used as the basis of identifying non-change areas (discussed later), so I already have the data processed to surface reflectance, a suite of vegetation indices derived, and everything subset to a bounding box around Monroe Mountain. Below, I’ll read the data in and plot out a few layers for illustrative purposes.
# define directory structure
ls.dir <- file.path(main.dir, "landsat")
ls.2022.dir <- file.path(ls.dir, "ls_20220610")
# list and print the individual landsat layers
ls.files <- list.files(ls.2022.dir, "*.tif$", full.names = T)
print(ls.files)
## [1] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b1.tif"
## [2] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b2.tif"
## [3] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b3.tif"
## [4] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b4.tif"
## [5] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b5.tif"
## [6] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b6.tif"
## [7] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b7.tif"
## [8] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_evi.tif"
## [9] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_msavi.tif"
## [10] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nbr.tif"
## [11] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nbr2.tif"
## [12] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_ndmi.tif"
## [13] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_ndvi.tif"
## [14] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nirv.tif"
## [15] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_savi.tif"
## [16] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcb.tif"
## [17] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcg.tif"
## [18] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcw.tif"
# read the data in
ls <- rast(ls.files)
# fix the layer names
rename.fun <- function(x){
splt <- strsplit(x, "/")[[1]]
last <- splt[length(splt)]
splt <- strsplit(last, "\\.")[[1]]
name <- splt[1]
return(name)
}
names <- unlist(lapply(ls.files, rename.fun))
names(ls) <- names
# plot out the first few layers
plot(ls)
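As a quick, optional sanity check, one of the pre-computed indices can be recomputed directly from the reflectance bands. This is purely illustrative and assumes the standard OLI-2 band numbering (b4 = red, b5 = NIR); small differences could arise if the pre-computed index layer used a different scaling.
# recompute NDVI from the red and NIR bands and compare to the pre-computed layer
red <- ls[["ls_20220610_b4"]]
nir <- ls[["ls_20220610_b5"]]
ndvi.check <- (nir - red) / (nir + red)
plot(ndvi.check - ls[["ls_20220610_ndvi"]], main = "NDVI difference check")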
Now, I’ll do some minor housekeeping… Eventually, all of these data will have to be combined into a single, multi-layer raster dataset. Accordingly, everything will have to be in the same coordinate system/projection and have the same extent and origin (pixel grid alignment). The ALS-derived vegetation metrics data are already generated at a desirable extent (a couple-hundred meter buffer around Monroe Mountain), so the Landsat data will have to be clipped to match that. Unfortunately, however, the ALS data have a somewhat peculiar projection, so I’ll first have to reproject them into a “vanilla” version of the projection they are currently in (NAD83 UTM Zone 12N, or EPSG:26912). Once the data are aligned, I’ll add the vegetation metrics and image data together into a single stack called sample.rast. As this takes a while to run, and I’m running this several times throughout the course of creating this document, I’ll couch all of this inside a conditional statement that checks whether the merging has already been done.
# define directory structure
field.dir <- file.path(main.dir, "field_data")
samp.dir <- file.path(field.dir, "sampling")
# check if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls.tif")
if (!file.exists(out.file)){
# reproject veg.mets
veg.mets <- project(veg.mets, "epsg:26912")
# create a cropping mask
m <- ifel(is.na(veg.mets[[1]]), NA, 1)
# reproject and crop ls
ls <- project(ls, m, align = T)
ls <- crop(ls, m, mask = T)
# reassign names (since they annoyingly get dropped)
names(ls) <- names
# merge veg.mets and ls
sample.rast <- c(veg.mets, ls)
# write to file
writeRaster(sample.rast, out.file, T)
}
# read the data back in and print the layer names
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw"
Excellent! Let’s move on to topography…
To capture a range of topographic conditions, I’ll grab DEM data from the USGS 3DEP. One might wonder “why not just use the ALS data?” which would be a very valid question. My answer is simply that the USGS is already using the ALS data to generate their DEMs (since the Monroe Mountain dataset is included in 3DEP), and they do a bunch of additional processing, QA/QC’ing of the data prior to producing their DEMs. So, I trust theirs more than I trust my own ability to generate a clean DEM. And, since we’re working at 30m resolution anyway, it doesn’t really benefit us to have the high spatial precision of the ALS data.
DEMs are available in 1x1 degree tiles, and Monroe Mountain sits right between two tiles:
So, I’ll download the latest version of those two tiles, at 1 arc-second (~30 m) resolution, from the National Map. I’ll proceed to mosaic and reproject them to match the projection of sample.rast. I’ll then crop the result to an extent somewhat larger than that of sample.rast, so as to minimize processing time while still avoiding edge effects for focal terrain metrics that rely on a neighborhood around each pixel (e.g., slope, topographic position, etc.).
# define directory structure
dem.dir <- file.path(main.dir, "dem/res_30m")
raw.dir <- file.path(dem.dir, "a01_raw")
# check if this has already been run
out.file <- file.path(dem.dir, "elevation.tif")
if (!file.exists(out.file)){
# mosaic the tiles together
tile.files <- list.files(raw.dir, "*.tif$", full.names = T)
tile.rasts <- lapply(tile.files, rast)
tile.sprc <- sprc(tile.rasts)
elev <- mosaic(tile.sprc)
# reproject
elev <- project(elev, sample.rast, align = T)
# crop it to a large buffer around the extent of sample.rast
m <- ifel(is.na(sample.rast[[1]]), NA, 1)
buff <- buffer(m, 1000)
elev <- crop(elev, buff)
# change name
names(elev) <- "elevation"
# write to file
writeRaster(elev, out.file, T)
}
# read in the data and plot it out
elev <- rast(out.file)
plot(elev)
Now, I’ll generate a suite of terrain metrics. Some of them are easily generated using the terra::terrain() function; others I will generate in a somewhat more manual way. Among the latter type, you will notice my creation of a “donut” function – this is aimed at calculating topographic position index (TPI), which is the difference between a cell’s elevation and the mean of a donut (or, more technically, “annulus”) surrounding that cell with a given inner and outer radius. For my donuts, the inner radius is always half the outer radius.
# slope
out.file <- file.path(dem.dir, "slope.tif")
if (!file.exists(out.file)){
slope <- terrain(elev, "slope")
names(slope) <- "slope"
writeRaster(slope, out.file, T)
}
slope <- rast(out.file)
# aspect
out.file <- file.path(dem.dir, "aspect.tif")
if (!file.exists(out.file)){
aspect <- terrain(elev, "aspect")
names(aspect) <- "aspect"
writeRaster(aspect, out.file, T)
}
aspect <- rast(out.file)
# cos(aspect)
out.file <- file.path(dem.dir, "aspect_cos.tif")
if (!file.exists(out.file)){
aspect.cos <- cos(aspect * pi / 180)
names(aspect.cos) <- "aspect_cos"
writeRaster(aspect.cos, out.file, T)
}
aspect.cos <- rast(out.file)
# sin(aspect)
out.file <- file.path(dem.dir, "aspect_sin.tif")
if (!file.exists(out.file)){
aspect.sin <- sin(aspect * pi / 180)
names(aspect.sin) <- "aspect_sin"
writeRaster(aspect.sin, out.file, T)
}
aspect.sin <- rast(out.file)
# slope x cos(aspect)
out.file <- file.path(dem.dir, "slope_aspect_cos.tif")
if (!file.exists(out.file)){
slope.aspect.cos <- slope * aspect.cos
names(slope.aspect.cos) <- "slope_aspect_cos"
writeRaster(slope.aspect.cos, out.file, T)
}
slope.aspect.cos <- rast(out.file)
# slope x sin(aspect)
out.file <- file.path(dem.dir, "slope_aspect_sin.tif")
if (!file.exists(out.file)){
slope.aspect.sin <- slope * aspect.sin
names(slope.aspect.sin) <- "slope_aspect_sin"
writeRaster(slope.aspect.sin, out.file, T)
}
slope.aspect.sin <- rast(out.file)
# tri.1
out.file <- file.path(dem.dir, "tri_1.tif")
if (!file.exists(out.file)){
tri.1 <- terrain(elev, "TRI")
names(tri.1) <- "tri_1"
writeRaster(tri.1, out.file, T)
}
tri.1 <- rast(out.file)
# tri.2
out.file <- file.path(dem.dir, "tri_2.tif")
if (!file.exists(out.file)){
tri.2 <- terrain(elev, "TRIriley")
names(tri.2) <- "tri_2"
writeRaster(tri.2, out.file, T)
}
tri.2 <- rast(out.file)
# tri.3
out.file <- file.path(dem.dir, "tri_3.tif")
if (!file.exists(out.file)){
tri.3 <- terrain(elev, "TRIrmsd")
names(tri.3) <- "tri_3"
writeRaster(tri.3, out.file, T)
}
tri.3 <- rast(out.file)
# roughness
out.file <- file.path(dem.dir, "roughness.tif")
if (!file.exists(out.file)){
roughness <- terrain(elev, "roughness")
names(roughness) <- "roughness"
writeRaster(roughness, out.file, T)
}
roughness <- rast(out.file)
# define focal donut function
donut <- function(inner,outer){
r <- rast(xmin = outer * (-1) - 0.5,
xmax = outer + 0.5,
ymin = outer * (-1) - 0.5,
ymax = outer + 0.5,
crs = "EPSG:5070",
resolution = 1)
values(r) <- 1
pt <- vect(data.frame(x = 0, y = 0), geom = c("x", "y"), crs = "EPSG:5070")
buff.in <- buffer(pt, inner)
buff.out <- buffer(pt, outer)
buff.out$rastval <- 1
don <- erase(buff.out, buff.in)
r <- crop(r, don, mask = T)
m <- as.matrix(r, wide = T)
m[is.nan(m)] <- NA
m
}
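# purely illustrative: peek at the focal weight matrix for the smallest donut
# used below (inner radius of 1 cell, outer radius of 2 cells); cells included
# in the annulus mean get a weight of 1, all others (including the center cell)
# are NA
print(donut(1, 2))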
# tpi.02
out.file <- file.path(dem.dir, "tpi_02.tif")
if (!file.exists(out.file)){
focal.mat <- donut(1,2)
tpi.02 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.02) <- "tpi_02"
writeRaster(tpi.02, out.file, T)
}
tpi.02 <- rast(out.file)
# tpi.04
out.file <- file.path(dem.dir, "tpi_04.tif")
if (!file.exists(out.file)){
focal.mat <- donut(2,4)
tpi.04 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.04) <- "tpi_04"
writeRaster(tpi.04, out.file, T)
}
tpi.04 <- rast(out.file)
# tpi.08
out.file <- file.path(dem.dir, "tpi_08.tif")
if (!file.exists(out.file)){
focal.mat <- donut(4,8)
tpi.08 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.08) <- "tpi_08"
writeRaster(tpi.08, out.file, T)
}
tpi.08 <- rast(out.file)
# tpi.16
out.file <- file.path(dem.dir, "tpi_16.tif")
if (!file.exists(out.file)){
focal.mat <- donut(8,16)
tpi.16 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.16) <- "tpi_16"
writeRaster(tpi.16, out.file, T)
}
tpi.16 <- rast(out.file)
# combine into a multi-layer raster and plot
topo <- c(elev, slope, aspect.cos, aspect.sin, slope.aspect.cos,
slope.aspect.sin, tri.1, tri.2, tri.3, roughness, tpi.02, tpi.04,
tpi.08, tpi.16)
plot(topo)
Lastly, I’ll crop the topographic predictor stack to match the extent
of sample.rast and merge the two.
# check if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo.tif")
if (!file.exists(out.file)){
# get list of names before they get lost
names <- names(topo)
# crop topo
m <- ifel(is.na(sample.rast[[1]]), NA, 1)
topo <- crop(topo, m, mask = T)
# reassign names (since they annoyingly get dropped)
names(topo) <- names
# merge sample.rast and topo
sample.rast <- c(sample.rast, topo)
# write to file
writeRaster(sample.rast, out.file, T)
}
# read the data back in and print the layer names
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw" "elevation"
## [55] "slope" "aspect_cos" "aspect_sin"
## [58] "slope_aspect_cos" "slope_aspect_sin" "tri_1"
## [61] "tri_2" "tri_3" "roughness"
## [64] "tpi_02" "tpi_04" "tpi_08"
## [67] "tpi_16"
Splendid! Moving on…
While recognizing that the primary goal of this document is to select a set of field plot locations, I wanted to quickly test something out that’s not entirely unrelated. Ryan McCarley sent me a dataset containing plot locations for 100 stand exam plots the US Forest Service collected, apparently in 2017. The data were shared with the intent of being useful for this site selection process, so I thought I’d give it a shot. Since they’re just a sparse sample of plots, they would only be directly useful if we were to try to collect the exact same plot locations. But, perhaps through some quick predictive modeling, we can use them to drive a wall-to-wall prediction of some sort that might add value to this site selection process.
With that in mind, I’ll first read in and explore the data a bit.
# define directory structure
usfs.dir <- file.path(main.dir, "usfs_boundaries")
ryan.dir <- file.path(main.dir, "from_ryan/for_mickey_062223")
# read in the data
plots <- vect(file.path(ryan.dir, "fs2017_ba.shp"))
# read in monroe mountain boundary for reference
mm <- vect(file.path(usfs.dir, "monroe_mountain_section.shp"))
# get everything in the same coordinate system
plots <- project(plots, sample.rast)
mm <- project(mm, sample.rast)
# plot the data out
plot(mm)
points(plots, pch = 21, bg = "blue", cex = 1)
# explore the data
paged_table(as.data.frame(plots))
Cool! So, I’m not exactly sure what the units are for basal area in these plots, but let’s assume they’re at least directly comparable to one another. Under that assumption, I think there are probably two maximally useful metrics we can derive here (and ultimately use to try to make predictions):
# define species ba columns
sp.cols <- names(plots)[!names(plots) %in% c("Comment", "Plot_Num")]
# calculate total basal area
plots$ba.total <- rowSums(plots[[sp.cols]], na.rm = T)
# remove rows without basal area
plots <- plots[plots$ba.total > 0,]
# get most abundant species
sp.dom <- unlist(apply(as.data.frame(plots[sp.cols]), 1, function(x) sp.cols[which.max(x)]))
plots$sp.dom <- sp.dom
# remove unnecessary fields
plots <- plots[,c("sp.dom", "ba.total")]
# print out the results
paged_table(as.data.frame(plots))
# plot out the results
hist(plots$ba.total)
barplot(table(plots$sp.dom), las = 2)
Clearly there’s a nice distribution of basal areas. Not a very even distribution of species dominance, as we might expect, but let’s try modeling both variables anyway. I’ll do this in a pretty “dumb” way, by just extracting pixel values from sample.rast at the plot locations and running an untuned random forest. This will at least provide a sense of whether there’s any hope whatsoever. Obviously some tweaks, such as model tuning, proper training/test splits, using Landsat data from when the plots were collected, etc., would improve results. But we can explore that later if it turns out this simple approach shows any promise.
First, I’ll try modeling species dominance…
# extract pixel values from the sample raster at each plot
model.df <- extract(sample.rast, plots, "mean", exact = T)
model.df <- cbind(plots[c("sp.dom", "ba.total")], model.df[,seq(2,ncol(model.df))])
# remove columns with nas
na.sums <- apply(as.data.frame(model.df), 2, function(x) sum(is.na(x)))
model.df <- model.df[,-which(na.sums > 0)]
# coerce sp.dom to factor
model.df$sp.dom <- as.factor(model.df$sp.dom)
#---------------sp.dom modeling
# create modeling data frame
x.cols <- seq(3,ncol(model.df))
y.col <- 1
mod.df.sp.dom <- as.data.frame(model.df[,c(y.col, x.cols)])
# build random forest model
rf.sp.dom <- randomForest(sp.dom ~ ., data = mod.df.sp.dom)
rf.sp.dom
##
## Call:
## randomForest(formula = sp.dom ~ ., data = mod.df.sp.dom)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 8
##
## OOB estimate of error rate: 37.78%
## Confusion matrix:
## ABCO ABLA JUSC2 PIED PIEN PIPU POTR5 PSME QUGA class.error
## ABCO 1 3 0 0 0 0 5 0 0 0.8888889
## ABLA 2 10 0 0 1 0 3 1 0 0.4117647
## JUSC2 0 0 6 0 0 0 0 0 0 0.0000000
## PIED 0 0 1 1 0 0 0 0 0 0.5000000
## PIEN 1 3 0 0 1 0 2 0 0 0.8571429
## PIPU 0 0 0 0 0 0 1 0 0 1.0000000
## POTR5 3 1 0 0 0 0 37 1 0 0.1190476
## PSME 1 2 0 0 0 0 2 0 0 1.0000000
## QUGA 0 0 1 0 0 0 0 0 0 1.0000000
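As a quick way to summarize that confusion matrix, the overall OOB accuracy can be pulled straight from the model object (a minimal sketch; this is just 1 minus the OOB error rate reported above):
# overall OOB accuracy from the confusion matrix (drop the class.error column)
cm <- rf.sp.dom$confusion[, -ncol(rf.sp.dom$confusion)]
round(sum(diag(cm)) / sum(cm), 3)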
Not altogether awful! Of course, not great by any means. Let’s try modeling total basal area:
#---------------ba.total modeling
# create modeling data frame
x.cols <- seq(3,ncol(model.df))
y.col <- 2
mod.df.ba.total <- as.data.frame(model.df[,c(y.col, x.cols)])
# build random forest model
rf.ba.total <- randomForest(ba.total ~ ., data = mod.df.ba.total)
rf.ba.total
##
## Call:
## randomForest(formula = ba.total ~ ., data = mod.df.ba.total)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 22
##
## Mean of squared residuals: 1847.465
## % Var explained: 58.24
# plot predicted vs. observed and get model performance metrics
x <- mod.df.ba.total$ba.total
y <- predict(rf.ba.total)
plot(y ~ x,
xlim = c(min(c(x,y)), max(c(x,y))),
ylim = c(min(c(x,y)), max(c(x,y))),
ylab = "Predicted", xlab = "Observed")
lines(x = c(-1000,1000), y = c(-1000,1000))
mod <- lm(y ~ x)
abline(mod, lwd = 2, col = "red")
r2 <- round(summary(mod)$adj.r.squared,2)
rmse <- round(sqrt(mean((y - x)^2)),2)
legend("topleft",
legend = c(paste0("R2 = ", r2),
paste0("RMSE = ", rmse)),
bty = "n", x.intersp = 0, text.col = "red")
Not bad! It’s not entirely clear how much these two predictive models might add to the site selection process, but for the sake of completeness, let’s make some predictive maps and add them to the suite of spatial variables.
# create a subset of sample.rast with just the non-na layers
sample.rast.nona <- sample.rast[[names(sample.rast) %in% names(model.df)]]
# check to see if this has already been run
out.file <- file.path(ryan.dir, "sp_dom.tif")
if (!file.exists(out.file)){
# make sp.dom prediction map
pf <- function(mod, dat, ...){
library(randomForest)
predict(mod, dat, ...)
}
sp.dom.rast <- terra::predict(sample.rast.nona, rf.sp.dom,
fun = pf, na.rm = T, cores = 8L)
names(sp.dom.rast) <- "sp_dom"
writeRaster(sp.dom.rast, out.file, T)
}
# read the data back in and plot it out
sp.dom.rast <- rast(out.file)
plot(sp.dom.rast, col = brewer.pal(9, "Set1"))
# check to see if this has already been run
out.file <- file.path(ryan.dir, "ba_total.tif")
if (!file.exists(out.file)){
# make ba.total prediction map
pf <- function(mod, dat, ...){
library(randomForest)
predict(mod, dat, ...)
}
ba.total.rast <- terra::predict(sample.rast.nona, rf.ba.total,
fun = pf, na.rm = T, cores = 8L)
names(ba.total.rast) <- "ba_total"
writeRaster(ba.total.rast, out.file, T)
}
# read the data back in and plot it out
ba.total.rast <- rast(out.file)
plot(ba.total.rast)
# add these two new layers to sample.rast
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo_pred.tif")
if (!file.exists(out.file)){
sample.rast <- c(sample.rast, sp.dom.rast, ba.total.rast)
writeRaster(sample.rast, out.file, T)
}
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw" "elevation"
## [55] "slope" "aspect_cos" "aspect_sin"
## [58] "slope_aspect_cos" "slope_aspect_sin" "tri_1"
## [61] "tri_2" "tri_3" "roughness"
## [64] "tpi_02" "tpi_04" "tpi_08"
## [67] "tpi_16" "sp_dom" "ba_total"
OK, so we now have this cool 69-layer predictor stack that should capture a wide range of variability in fuel types, conditions, and loads. Prior to running the structurally-guided sampling, we need to mask out areas that we do not want to or cannot sample. Here is a list of the landscape characteristics that we do want to sample:
Monroe Mountain has an extensive recent history of prescribed fire – partly through its association with FASMEE. In addition to recent past burns, there are a host of future burns planned in the area. This represents a valuable remeasurement opportunity to characterize fuel consumption. Accordingly, it will be useful to consider the extent of planned burn units in the placement of fuel plots. However, to only consider areas that are slated for future burns would be too limiting, as it would focus heavily on areas with high fuel loads and forested areas, perhaps ignoring valuable areas that have more “desirable” fuel conditions, such as those that have already burned recently. Accordingly, we will use a bounding box surrounding the planned burn units as a constraint, which will ensure that a large number of plots falls within the burn units, while also allowing plots to fall in nearby areas that represent a broader set of fuel structural conditions.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu.tif")
if (!file.exists(out.file)){
# define directory structure
burn.dir <- file.path(main.dir, "usfs_burn_units")
# read in the data
burn.file <- file.path(burn.dir, "usfs_burn_units_20230714_utm_box.shp")
burn.vect <- vect(burn.file)
# add rasterization field
burn.vect$rast.val <- 1
# rasterize burn vect
burn.rast <- rasterize(burn.vect, sample.rast, "rast.val")
# create a copy of it and call it 'combined.mask'
combined.mask <- burn.rast
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot it out
combined.mask <- rast(out.file)
plot(combined.mask)
This constraint (keeping only slopes below 30 degrees) is for accessibility purposes and the practicality of fuel plot measurement and MLS scanning. In a perfect world, we would sample steep slopes, as they might possess unique fuel conditions. But, in our imperfect, more pragmatic world, this is a necessary step. The mask can be easily derived from the existing slope layer of sample.rast.
# create a function for identifying the smallest combined extent between two rasts
min.ext.fun <- function(a,b){
xmin <- max(c(xmin(a), xmin(b)))
xmax <- min(c(xmax(a), xmax(b)))
ymin <- max(c(ymin(a), ymin(b)))
ymax <- min(c(ymax(a), ymax(b)))
e <- ext(xmin, xmax, ymin, ymax)
return(e)
}
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl.tif")
if (!file.exists(out.file)){
# isolate the slope layer from sample.rast
slope.rast <- sample.rast[["slope"]]
# threshold it
slope.rast <- ifel(slope.rast < 30, 1, NA)
# plot it out
plot(slope.rast)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(slope.rast, combined.mask)
slope.rast <- crop(slope.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- slope.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
This is for accessibility purposes – we don’t want plots too close to a road (so they must be more than 25 m away) but we also don’t want them too far from a road (so they must be within 175 m). Theoretically, this can be done from within the sgsR sampling functions, but since I’ll be doing a whole bunch of other masking, I’ll just lump it into this step. There are several potential roads datasets that can be used, including:
The first two appear to only have USFS roads (i.e., they miss main state highways, etc.). So I’ll go with the FSTopo roads. Since we’ll be using high-clearance vehicles to access plots, I’m going to be fairly liberal with what I’m willing to consider an accessible road. The only class I will remove is that of “trails”. All other roads will be considered passable (which is almost certainly not the case, but in my experience USFS road quality attribution is questionable at best anyway).
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd.tif")
if (!file.exists(out.file)){
# define directory structure
road.gdb <- file.path(main.dir, "usfs_roads/S_USA.FSTopo_Transport_LN.gdb")
# read in the data
roads <- vect(road.gdb, "FSTopo_Transport_LN_monroe")
# reproject to match sample.rast
roads <- project(roads, sample.rast)
# read in road classification lookup table and join to data
road.lut <- read.csv(file.path(main.dir, "usfs_roads/FCSubTypes-by-Feature-Class.csv"))
colnames(road.lut) <- toupper(colnames(road.lut))
roads <- merge(roads, road.lut)
# print out unique road types
print(unique(as.data.frame(roads[,c("FCSUBTYPE", "DESCRIPTION")])))
# remove trails, but keep all other road types
roads <- roads[roads$FCSUBTYPE != 107,]
# create inner and outer buffers, and get difference
buff.inner <- buffer(roads, 25)
buff.outer <- buffer(roads, 175)
buff.diff <- erase(buff.outer, buff.inner)
# convert to raster
buff.diff$rastid <- 1
buff.rast <- rasterize(buff.diff, sample.rast, "rastid")
# crop both rasters to the minimum extent
cropper <- min.ext.fun(buff.rast, combined.mask)
buff.rast <- crop(buff.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- buff.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
This is for accessibility and ensuring we’re not on any private inholdings within the USFS admin boundary. It will be derived from the 2023 USDA Forest Service Basic Ownership dataset.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl.tif")
if (!file.exists(out.file)){
# define directory structure
own.gdb <- file.path(main.dir, "usfs_land_ownership/S_USA.BasicOwnership.gdb")
# read in the data
own <- vect(own.gdb, "BasicOwnership_monroe")
# reproject to match sample.rast
own <- project(own, sample.rast)
# select only usfs ownership
own <- own[own$OWNERCLASSIFICATION == "USDA FOREST SERVICE",]
# convert to raster
own$rastid <- 1
own.rast <- rasterize(own, sample.rast, "rastid")
# crop both rasters to the minimum extent
cropper <- min.ext.fun(own.rast, combined.mask)
own.rast <- crop(own.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- own.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
We want to focus only on naturally vegetated areas, avoiding water features, developed areas, and cultivated areas (not that there will be much of the latter two in Monroe Mountain…). This will be derived from the 2022 LANDFIRE Existing Vegetation Type.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc.tif")
if (!file.exists(out.file)){
# define directory structure
lf.dir <- file.path(main.dir, "landfire/LF2022_EVT_230_CONUS")
# read in the data
lf <- rast(file.path(lf.dir, "LC22_EVT_230.tif"))
# reproject to match sample.rast
lf <- project(lf, sample.rast, align = T, method = "near")
# read in attribute table
lf.df <- read.dbf(file.path(lf.dir, "LF22_EVT_230.dbf"))
# get desirable land covers
phys <- c("Shrubland", "Conifer", "Conifer-Hardwood", "Hardwood")
keeps <- lf.df$VALUE[lf.df$EVT_PHYS %in% phys]
# select only natural, upland vegetation classes
lf <- ifel(lf %in% keeps, 1, NA)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(lf, combined.mask)
lf <- crop(lf, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- lf + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
Although the land cover (LANDFIRE EVT) analysis above should have eliminated most large water features, the spatial precision of its water classification is relatively low, and smaller water features that we would still not want to place plots in are likely missed. Accordingly, in this step, the National Hydrography Dataset will be used to further refine the mask by removing all areas identified as perennial streams or waterbodies.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc_hy.tif")
if (!file.exists(out.file)){
# define directory structure
nhd.dir <- file.path(main.dir, "nhd")
# define inputs
stream.file <- file.path(nhd.dir, "streams.shp")
wtrbdy.file <- file.path(nhd.dir, "waterbodies.shp")
# read in the data
stream <- vect(stream.file)
wtrbdy <- vect(wtrbdy.file)
# add raster value
stream$rast.val <- 1
wtrbdy$rast.val <- 1
# rasterize them
stream.rast <- rasterize(stream, sample.rast, "rast.val", background = 0)
wtrbdy.rast <- rasterize(wtrbdy, sample.rast, "rast.val", background = 0)
# combine them
water.rast <- stream.rast + wtrbdy.rast
# select only non-water areas
water.rast <- ifel(water.rast == 0, 1, NA)
# merge them
combined.mask <- water.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
I think it makes sense to avoid riparian areas, since they will tend to have poorly-represented vegetation types (e.g., cottonwoods, willows, etc.). Technically, there are classes in LANDFIRE EVT that should, in theory, capture these areas. But there is a cool, nationwide riparian map focused on landform-based classification of riparian areas, which will likely capture the unique vegetation assemblages found within these areas more precisely. It is the 2019 National Riparian Areas Base Map.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc_hy_rp.tif")
if (!file.exists(out.file)){
# define directory structure
rip.dir <- file.path(main.dir, "usfs_riparian")
# read in the data
rip <- rast(file.path(rip.dir, "usfs_riparian_areas_monroe.tif"))
# reproject to match sample.rast
rip <- project(rip, sample.rast, align = T, method = "near")
# select only non-riparian areas
rip <- ifel(is.na(rip), 1, NA)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(rip, combined.mask)
rip <- crop(rip, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- rip + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
Now that we’ve got our mask finalized, let’s use it to crop our
sample.rast. The result of this will be the input dataset
for our structurally guided sampling.
# check to see if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo_pred_mask.tif")
if (!file.exists(out.file)){
# crop both rasters to the minimum extent
cropper <- min.ext.fun(sample.rast, combined.mask)
sample.rast.mask <- crop(sample.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
sample.rast.mask <- mask(sample.rast.mask, combined.mask)
# write to file
writeRaster(sample.rast.mask, out.file, T)
}
# read the data back in and plot it out
sample.rast.mask <- rast(out.file)
plot(sample.rast.mask)
To see how well this sample area captures variability in important landscape characteristics throughout the entirety of Monroe Mountain, I’ll plot out some density distribution comparisons. For simplicity, I’ll just look at four variables:
# get density distributions
d.cc.full <- density(sample.rast[["cc"]], plot = F)
d.cc.samp <- density(sample.rast.mask[["cc"]], plot = F)
d.ch.full <- density(sample.rast[["p.100"]], plot = F)
d.ch.samp <- density(sample.rast.mask[["p.100"]], plot = F)
d.ba.full <- density(sample.rast[["ba_total"]], plot = F)
d.ba.samp <- density(sample.rast.mask[["ba_total"]], plot = F)
d.el.full <- density(sample.rast[["elevation"]], plot = F)
d.el.samp <- density(sample.rast.mask[["elevation"]], plot = F)
# define function to add alpha
add.alpha <- function(col, alpha=1){
apply(sapply(col, col2rgb)/255, 2,
function(x)
rgb(x[1], x[2], x[3], alpha=alpha))
}
# set up plot
par(mfrow = c(2,2), las = 1, mar = c(5,5,1,1))
cols.full <- c(2, add.alpha(2, 0.25))
cols.samp <- c(4, add.alpha(4, 0.25))
# cc plot
xmin <- min(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
xmax <- max(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
ymin <- min(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
ymax <- max(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Cover")
grid()
box()
polygon(d.cc.full[[1]]$x, d.cc.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.cc.full[[1]]$x, d.cc.full[[1]]$y, col = cols.full[1])
polygon(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, col = cols.samp[1])
# ch plot
xmin <- min(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
xmax <- max(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
ymin <- min(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
ymax <- max(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Height")
grid()
box()
polygon(d.ch.full[[1]]$x, d.ch.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ch.full[[1]]$x, d.ch.full[[1]]$y, col = cols.full[1])
polygon(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, col = cols.samp[1])
# add legend
legend("topright", legend = c("Full", "Sample"),
border = c(cols.full[1], cols.samp[1]),
fill = c(cols.full[2], cols.samp[2]))
# ba plot
xmin <- min(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
xmax <- max(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
ymin <- min(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
ymax <- max(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Basal Area")
grid()
box()
polygon(d.ba.full[[1]]$x, d.ba.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ba.full[[1]]$x, d.ba.full[[1]]$y, col = cols.full[1])
polygon(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, col = cols.samp[1])
# el plot
xmin <- min(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
xmax <- max(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
ymin <- min(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
ymax <- max(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Elevation")
grid()
box()
polygon(d.el.full[[1]]$x, d.el.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.el.full[[1]]$x, d.el.full[[1]]$y, col = cols.full[1])
polygon(d.el.samp[[1]]$x, d.el.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.el.samp[[1]]$x, d.el.samp[[1]]$y, col = cols.samp[1])
Awesome! Looks like the sample area, despite being a relatively small proportion of the full Monroe Mountain study area (4.9% to be precise), captures variability in some of the most important variables really nicely. If I were to nitpick, I would say that the low end of basal area is oversampled in this dataset. That kind of makes sense – these are likely non-forest areas that are more likely to be near roads, on moderate terrain, etc.
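For reference, here is one way the sampled-area percentage quoted above could be computed (a minimal sketch based on counting non-NA cells in the full and masked rasters):
# proportion of the full study area that remains available for sampling
n.full <- global(!is.na(sample.rast[[1]]), "sum")[1, 1]
n.samp <- global(!is.na(sample.rast.mask[[1]]), "sum")[1, 1]
round(100 * n.samp / n.full, 1)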
The moment we’ve all been waiting for – the actual site selection process. In the past, I have used two different sampling algorithms from the sgsR library:
I’m happy to explore both if there is interest, but given the capacity to provide “backup” samples, which may come in very handy in a study area with accessibility limitations, I am going to go with NC (nearest centroid) sampling for now.
Of course, the most important question for this part of the analysis is: how many plots to place within the sample area?
The current plan is to be in the field collecting data from 7/30 to 8/4 – 6 days in total. It sounded like there would be two fuel measurement crews. Without a detailed understanding of how long these fuel plots take, it is difficult to estimate how many plots each crew could collect in a single day. But let’s imagine a situation in which each plot takes 2 hours. Add in 30 minutes of travel time between plots for a total of 2.5 hours per plot. With long (10-hour) days, that could yield up to 4 plots per day per crew. So 6 days x 2 crews x 4 plots = 48 plots. To allow for a couple of the strata turning out to be undesirable, and to match the number of plots in North Kaibab, I will up this total to 50.
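Just to make that plot-budget arithmetic explicit, here is the same back-of-the-envelope calculation in code (all numbers are the assumptions stated above):
# back-of-the-envelope plot budget
hours.per.plot <- 2 + 0.5                             # measurement + travel time
plots.per.crew.per.day <- floor(10 / hours.per.plot)  # 10-hour field days
6 * 2 * plots.per.crew.per.day                        # 6 days x 2 crews = 48 plots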
For the sake of this sampling procedure, I will place 50 plots
I will place two points in each cluster, effectively providing one backup for each plot.
# check to see if this has already been run
out.file <- file.path(samp.dir, "sample_pts.shp")
if (!file.exists(out.file)){
# generate strata
set.seed(5757)
sample.rast.mask <- as.numeric(sample.rast.mask)
stratum.rast <- strat_kmeans(sample.rast.mask, 24)
# write to file
out.file <- file.path(samp.dir, "sample_strata.tif")
writeRaster(stratum.rast, out.file, overwrite = T)
# generate samples
set.seed(5757)
samps <- sample_nc(mraster = sample.rast.mask,
nSamp = 24, k = 2)
samps <- vect(samps)
# add plot ids
samps$plot.id <- paste0(samps$kcenter)
samps$plot.id[1:(nrow(samps)/2)] <- paste0(samps$plot.id[1:(nrow(samps)/2)], "A")
samps$plot.id[(nrow(samps)/2+1):(nrow(samps))] <- paste0(samps$plot.id[(nrow(samps)/2+1):(nrow(samps))], "B")
# change periods to underscores for shapefiles
names(samps) <- str_replace_all(names(samps), "\\.", "_")
# write to file
out.file <- file.path(samp.dir, "sample_pts.shp")
writeVector(samps, out.file, overwrite = T)
}
## K-means being performed on 69 layers with 24 centers.
## K-means being performed on 69 layers with 24 centers.
# read the data back in and plot it out
samps <- vect(out.file)
plot(mm)
plot(samps, add = T)
Just to see how well these plots capture the range of conditions present within the study area, I’ll regenerate those same density plots from above, but now I’ll add in vertical lines for each of the plot points. I’ll only add one of the two paired cluster centroid points.
# subset samples
samps.sub <- samps[1:(nrow(samps)/2)]
# set up plot
par(mfrow = c(2,2), las = 1, mar = c(5,5,1,1))
# cc plot
xmin <- min(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
xmax <- max(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
ymin <- min(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
ymax <- max(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Cover")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$cc[i], col = "gray")
}
polygon(d.cc.full[[1]]$x, d.cc.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.cc.full[[1]]$x, d.cc.full[[1]]$y, col = cols.full[1])
polygon(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, col = cols.samp[1])
# ch plot
xmin <- min(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
xmax <- max(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
ymin <- min(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
ymax <- max(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Height")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$p_100[i], col = "gray")
}
polygon(d.ch.full[[1]]$x, d.ch.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ch.full[[1]]$x, d.ch.full[[1]]$y, col = cols.full[1])
polygon(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, col = cols.samp[1])
# add legend
legend("topright", legend = c("Full", "Sample"),
border = c(cols.full[1], cols.samp[1]),
fill = c(cols.full[2], cols.samp[2]))
# ba plot
xmin <- min(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
xmax <- max(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
ymin <- min(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
ymax <- max(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Basal Area")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$ba_total[i], col = "gray")
}
polygon(d.ba.full[[1]]$x, d.ba.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ba.full[[1]]$x, d.ba.full[[1]]$y, col = cols.full[1])
polygon(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, col = cols.samp[1])
# el plot
xmin <- min(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
xmax <- max(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
ymin <- min(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
ymax <- max(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Elevation")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$elevation[i], col = "gray")
}
polygon(d.el.full[[1]]$x, d.el.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.el.full[[1]]$x, d.el.full[[1]]$y, col = cols.full[1])
polygon(d.el.samp[[1]]$x, d.el.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.el.samp[[1]]$x, d.el.samp[[1]]$y, col = cols.samp[1])
I want to give everyone the opportunity to look at these plots in a more dynamic way, rather than just these diagnostic, descriptive statistically-driven plots and maps. To that end, I’ll add the plots to a leaflet map with aerial imagery as the backdrop, enabling you to pan and zoom to each plot to get a feel for what types of landscape conditions they represent. All 48 plots will be added to the map, but the primary plots will be labeled with an A and the backups will be labeled with a B. I’ll also buffer the points by 12.5m to create polygons so that when you zoom in you’ll be able to see what a 12.5 m radius (25 m diameter) circular plot would look like on the ground.
# buffer points
buffs <- buffer(samps, 12.5)
buffs <- st_as_sf(buffs) %>%
st_transform(4326)
# create leaflet map
samps <- st_as_sf(samps) %>%
st_transform(4326)
m <- leaflet(samps) %>%
addProviderTiles(providers$Esri.WorldImagery) %>%
addMarkers(label = ~plot_id, group = "points") %>%
addPolygons(data = buffs, col = "red", group = "polygons", label = buffs$plot_id) %>%
groupOptions("points", zoomLevels = 1:15) %>%
groupOptions("polygons", zoomLevels = 15:20)
m
I wanted to provide two quick summaries of the resulting plot locations. First, I wanted to share how many of the plots fell within the planned burn units.
# define inputs
burn.dir <- file.path(main.dir, "usfs_burn_units")
samps <- vect(file.path(samp.dir, "sample_pts.shp"))
burns <- vect(file.path(burn.dir, "usfs_burn_units_20230714_utm.shp"))
## Warning: [vect] Z coordinates ignored
# split into a and b
samps.a <- samps[grep("A", samps$plot_id),]
samps.b <- samps[grep("B", samps$plot_id),]
# determine how many points are within planned burn units
samps.a.burn <- intersect(samps.a, burns)
samps.b.burn <- intersect(samps.b, burns)
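The counts and percentages reported below come straight from those intersections; for reference, they could be computed like so:
# number and percentage of plots falling within planned burn units
nrow(samps.a.burn); round(100 * nrow(samps.a.burn) / nrow(samps.a))
nrow(samps.b.burn); round(100 * nrow(samps.b.burn) / nrow(samps.b))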
Of the 24 plots in Group A (the primary plots), 7 of them fall within the planned burn units, representing a proportion of 29%.
Of the 24 plots in Group B (the backup plots), 5 of them fall within the planned burn units, representing a proportion of 21%.
Second, I wanted to compare plot locations to LANDFIRE Existing Vegetation Type to see the distribution of plots against vegetation types and life forms.
# define inputs
lf <- rast("S:/ursa/campbell/fasmee/data/landfire/LF2022_EVT_230_CONUS/LC22_EVT_230.tif")
# reproject landfire
lf <- project(lf, "epsg:26912", "near")
# extract pixel values
lf.vals.a <- terra::extract(lf, samps.a)
lf.tab.a <- table(lf.vals.a$EVT_NAME)
lf.tab.a <- lf.tab.a[lf.tab.a > 0]
lf.vals.b <- terra::extract(lf, samps.b)
lf.tab.b <- table(lf.vals.b$EVT_NAME)
lf.tab.b <- lf.tab.b[lf.tab.b > 0]
# join pixel values back to points
samps.a$evt_name <- lf.vals.a$EVT_NAME
samps.b$evt_name <- lf.vals.b$EVT_NAME
# write to file
out.file <- file.path(samp.dir, "sample_pts_a.shp")
writeVector(samps.a, out.file, overwrite = T)
out.file <- file.path(samp.dir, "sample_pts_b.shp")
writeVector(samps.b, out.file, overwrite = T)
# buffer to create plot polygons
buffs.a <- buffer(samps.a, 12.5, 50)
buffs.b <- buffer(samps.b, 12.5, 50)
# write to file
out.file <- file.path(samp.dir, "sample_polys_a.shp")
writeVector(buffs.a, out.file, overwrite = T)
out.file <- file.path(samp.dir, "sample_polys_b.shp")
writeVector(buffs.b, out.file, overwrite = T)
Here are the distributions of vegetation types in Group A
(the primary plots):
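One way to render these counts is with the same paged_table() helper used earlier (a sketch using the lf.tab.a table computed above):
# vegetation type counts for Group A plots
paged_table(as.data.frame(sort(lf.tab.a, decreasing = TRUE)))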
Here are the distributions of vegetation types in Group B
(the backup plots):
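And likewise for Group B (again a sketch, using the lf.tab.b table computed above):
# vegetation type counts for Group B plots
paged_table(as.data.frame(sort(lf.tab.b, decreasing = TRUE)))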