Two additional changes were made:
This updated Markdown document reflects the changes above.
After a meeting on 7/6/2023 where I shared the results of the initial site placement, we decided to make a few key changes to the sampling procedure:
This updated Markdown document reflects the changes above.
In July-August of 2023, we will be collecting field data in the Monroe Mountain area of Fishlake National Forest. The field data will consist of two primary components: (1) fuel structural measurements; and (2) mobile laser scanning. The data will be collected in support of Carlos Silva and Andy Hudak’s recently-awarded Joint Fire Science Program grant aimed at using multi-scale, multi-platform lidar to characterize fuel structure throughout the western US. The data will also be useful for the Fire and Smoke Model Evaluation Experiment (FASMEE) project, supporting fuel structure and consumption mapping efforts.
Field data will be collected in a series of new plots, distributed throughout Monroe Mountain and aimed at capturing a wide range of fuel types, structures, and conditions. Accordingly, the objective of this document and the associated analysis is to devise and execute a sampling strategy for placing field plots within the Monroe Mountain area of Fishlake National Forest. This procedure will be driven by three main types of data:
The sampling scheme will be driven by the recently-developed Structurally-Guided Sampling (sgsR) R library by Goodbody et al., which provides a range of spatially-explicit statistical sampling algorithms designed specifically for these types of endeavors.
In this step, I will generate a suite of vegetation structural metrics from height-normalized airborne lidar data. I will use a pretty standard set of metrics that I have been using for a number of years for various vegetation structural mapping projects. For more details on what these metrics represent, please see Table 1 in Campbell et al. (2021).
# load necessary libraries
library(moments)
library(lidR)
library(future)
library(terra)
library(rmarkdown)
library(randomForest)
library(RColorBrewer)
library(sgsR)
library(leaflet)
library(sf)
library(stringr)
library(foreign)
# define directory structure
main.dir <- "S:/ursa/campbell/fasmee/data"
als.dir <- file.path(main.dir, "lidar_data/monroe_mtn_lidar")
als.in.dir <- file.path(als.dir, "a04_height")
als.out.dir <- file.path(als.dir, "a05_rasters")
# define output file
out.file <- file.path(als.out.dir, "veg_mets.tif")
# check if this step has already been run
if (!file.exists(out.file)){
# begin parallelization
n.cores <- 24L
set_lidr_threads(n.cores)
plan(multisession, workers = n.cores)
# read in the las catalog
ctg <- readLAScatalog(als.in.dir)
# define pixel metrics function
pixel.metrics <- function(z,r,c){
n <- length(z)
n.first <- length(which(r == 1L))
n.first.bh <- length(which(r == 1L & z >= 1.37))
cc <- list(cc = n.first.bh / n.first)
n.bh <- length(which(z >= 1.37))
cd <- list(cd = n.bh / n)
ps <- seq(0,1,0.1)
p <- as.list(stats::quantile(z[c != 2L],ps))
names(p) <- paste0("p.",ps*100)
mh <- list(mh = mean(z))
mh.ag <- list(mh.ag = mean(z[c != 2L]))
sd <- list(sd = sd(z))
sd.ag <- list(sd.ag = sd(z[c != 2L]))
skew <- list(skew = moments::skewness(z))
skew.ag <- list(skew.ag = moments::skewness(z[c != 2L]))
kurt <- list(kurt = moments::kurtosis(z))
kurt.ag <- list(kurt.ag = moments::kurtosis(z[c != 2L]))
h <- c(0,0.5,1,2,4,8,16,32)
vrd <- list()
for (j in seq(1,length(h)-1)){
lo <- h[j]
hi <- h[j+1]
n.slice <- length(which(z >= lo & z < hi))
slice.name <- paste0("vrd.",lo * 100,".",hi * 100)
vrd[[slice.name]] <- n.slice / n
}
vnrd <- list()
for (j in seq(1,length(h)-1)){
lo <- h[j]
hi <- h[j+1]
n.slice <- length(which(z >= lo & z < hi))
n.slice.below <- length(which(z < hi))
slice.name <- paste0("vnrd.",lo * 100,".",hi * 100)
vnrd[[slice.name]] <- n.slice / n.slice.below
}
metrics <- c(cc, cd, p, mh, mh.ag, sd, sd.ag, skew, skew.ag, kurt, kurt.ag, vrd, vnrd)
return(metrics)
}
# generate metrics
veg.mets <- grid_metrics(ctg,
~pixel.metrics(Z,ReturnNumber,Classification),
res = 30)
veg.mets <- rast(veg.mets)
writeRaster(veg.mets, out.file, T)
# end parallelization
plan(sequential)
}
# read in the metrics as a rast
veg.mets <- rast(out.file)
Note that in the above code, the “best” way to generate the metrics would be to use the lidR::pixel_metrics() function, as it is the modern replacement for the lidR::grid_metrics() function that I wound up using. However, it seems there is a bug somewhere in either lidR::pixel_metrics() or my use of it that was producing erroneous outputs. No big deal – they are basically the same function; I just had to convert the raster object returned by grid_metrics() to a terra object to modernize the output.
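For reference, the newer call would have looked something like the following (not run here, given the issue described above); since pixel_metrics() returns a terra SpatRaster by default, the rast() conversion step would also have been unnecessary:
# equivalent call using the newer lidR API (not run)
# veg.mets <- pixel_metrics(ctg,
#                           ~pixel.metrics(Z, ReturnNumber, Classification),
#                           res = 30)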
OK, let’s take a look at the vegetation metrics results. In all, the resulting raster stack of metrics has 35 variables. Here are the variable names:
# print variable names
print(names(veg.mets))
## [1] "cc" "cd" "p.0" "p.10"
## [5] "p.20" "p.30" "p.40" "p.50"
## [9] "p.60" "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag" "sd"
## [17] "sd.ag" "skew" "skew.ag" "kurt"
## [21] "kurt.ag" "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600" "vrd.1600.3200"
## [29] "vnrd.0.50" "vnrd.50.100" "vnrd.100.200" "vnrd.200.400"
## [33] "vnrd.400.800" "vnrd.800.1600" "vnrd.1600.3200"
Let’s take a look at a handful of predictors:
# plot veg.mets
plot(veg.mets)
Cool, looks good. Let’s move on to vegetation spectral response…
As mentioned above, vegetation spectral response will be derived from a single, 2022 Landsat 9 OLI2 image. Ideally, we’d have 2023 data to work with but, to date, there are no available images from 2023 that are cloud-free over Monroe Mountain. I considered using an Earth Engine-derived summer composite, instead of a single scene, but nothing beats the spectral integrity of a single, cloud-free image acquired directly from the USGS. So, for the sake of deriving high-quality spectral information that is temporally consistent across the entire study area, I opted for a single scene:
This was the same image I used as the basis of identifying non-change areas (discussed later), so I already have the data processed to surface reflectance, a suite of vegetation indices derived, and everything subset to a bounding box around Monroe Mountain. Below, I’ll read the data in and plot out a few layers for illustrative purposes.
# define directory structure
ls.dir <- file.path(main.dir, "landsat")
ls.2022.dir <- file.path(ls.dir, "ls_20220610")
# list and print the individual landsat layers
ls.files <- list.files(ls.2022.dir, "*.tif$", full.names = T)
print(ls.files)
## [1] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b1.tif"
## [2] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b2.tif"
## [3] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b3.tif"
## [4] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b4.tif"
## [5] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b5.tif"
## [6] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b6.tif"
## [7] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_b7.tif"
## [8] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_evi.tif"
## [9] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_msavi.tif"
## [10] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nbr.tif"
## [11] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nbr2.tif"
## [12] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_ndmi.tif"
## [13] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_ndvi.tif"
## [14] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_nirv.tif"
## [15] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_savi.tif"
## [16] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcb.tif"
## [17] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcg.tif"
## [18] "S:/ursa/campbell/fasmee/data/landsat/ls_20220610/ls_20220610_tcw.tif"
# read the data in
ls <- rast(ls.files)
# fix the layer names
rename.fun <- function(x){
splt <- strsplit(x, "/")[[1]]
last <- splt[length(splt)]
splt <- strsplit(last, "\\.")[[1]]
name <- splt[1]
return(name)
}
names <- unlist(lapply(ls.files, rename.fun))
names(ls) <- names
# plot out the first few layers
plot(ls)
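As a quick, optional sanity check, one of the pre-computed indices can be recomputed directly from the reflectance bands. This is purely illustrative and assumes the standard OLI-2 band numbering (b4 = red, b5 = NIR); small differences could arise if the pre-computed index layer used a different scaling.
# recompute NDVI from the red and NIR bands and compare to the pre-computed layer
red <- ls[["ls_20220610_b4"]]
nir <- ls[["ls_20220610_b5"]]
ndvi.check <- (nir - red) / (nir + red)
plot(ndvi.check - ls[["ls_20220610_ndvi"]], main = "NDVI difference check")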
Now, I’ll do some minor housekeeping… Eventually, all of these data will have to be combined into a single, multi-layer raster dataset. Accordingly, everything will have to be in the same coordinate system/projection and have the same extent and origin (pixel grid alignment). The ALS-derived vegetation metrics data are already generated at a desirable extent (a couple-hundred meter buffer around Monroe Mountain), so the Landsat data will have to be clipped to match that. Unfortunately, however, the ALS data have a somewhat peculiar projection, so I’ll first have to reproject them into a “vanilla” version of the projection they are currently in (NAD83 UTM Zone 12N, or EPSG:26912). Once the data are aligned, I’ll add the vegetation metrics and image data together into a single stack called sample.rast. As this takes a while to run, and I’m running this several times throughout the course of creating this document, I’ll couch all of this inside a conditional statement that checks whether the merging has already been done.
# define directory structure
field.dir <- file.path(main.dir, "field_data")
samp.dir <- file.path(field.dir, "sampling")
# check if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls.tif")
if (!file.exists(out.file)){
# reproject veg.mets
veg.mets <- project(veg.mets, "epsg:26912")
# create a cropping mask
m <- ifel(is.na(veg.mets[[1]]), NA, 1)
# reproject and crop ls
ls <- project(ls, m, align = T)
ls <- crop(ls, m, mask = T)
# reassign names (since they annoyingly get dropped)
names(ls) <- names
# merge veg.mets and ls
sample.rast <- c(veg.mets, ls)
# write to file
writeRaster(sample.rast, out.file, T)
}
# read the data back in and print the layer names
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw"
Excellent! Let’s move on to topography…
To capture a range of topographic conditions, I’ll grab DEM data from the USGS 3DEP. One might wonder “why not just use the ALS data?” which would be a very valid question. My answer is simply that the USGS is already using the ALS data to generate their DEMs (since the Monroe Mountain dataset is included in 3DEP), and they do a bunch of additional processing, QA/QC’ing of the data prior to producing their DEMs. So, I trust theirs more than I trust my own ability to generate a clean DEM. And, since we’re working at 30m resolution anyway, it doesn’t really benefit us to have the high spatial precision of the ALS data.
DEMs are available in 1x1 degree tiles, and Monroe Mountain sits right between two tiles:
So, I’ll download the latest version of those two tiles, at 1 arc-second (~30 m) resolution, from the National Map. I’ll proceed to mosaic and reproject them to match the projection of sample.rast. I’ll then crop the result to an extent somewhat larger than that of sample.rast, so as to minimize processing time while still avoiding edge effects for focal terrain metrics that rely on a neighborhood around each pixel (e.g., slope, topographic position, etc.).
# define directory structure
dem.dir <- file.path(main.dir, "dem/res_30m")
raw.dir <- file.path(dem.dir, "a01_raw")
# check if this has already been run
out.file <- file.path(dem.dir, "elevation.tif")
if (!file.exists(out.file)){
# mosaic the tiles together
tile.files <- list.files(raw.dir, "*.tif$", full.names = T)
tile.rasts <- lapply(tile.files, rast)
tile.sprc <- sprc(tile.rasts)
elev <- mosaic(tile.sprc)
# reproject
elev <- project(elev, sample.rast, align = T)
# crop it to a large buffer around the extent of sample.rast
m <- ifel(is.na(sample.rast[[1]]), NA, 1)
buff <- buffer(m, 1000)
elev <- crop(elev, buff)
# change name
names(elev) <- "elevation"
# write to file
writeRaster(elev, out.file, T)
}
# read in the data and plot it out
elev <- rast(out.file)
plot(elev)
Now, I’ll generate a suite of terrain metrics. Some of them are easily generated using the terra::terrain() function; others I will generate in a somewhat more manual way. Among the latter type, you will notice my creation of a “donut” function – this is aimed at calculating topographic position index (TPI), which is the difference between a cell’s elevation and the mean of a donut (or, more technically, “annulus”) surrounding that cell with a given inner and outer radius. For my donuts, the inner radius is always half the outer radius.
# slope
out.file <- file.path(dem.dir, "slope.tif")
if (!file.exists(out.file)){
slope <- terrain(elev, "slope")
names(slope) <- "slope"
writeRaster(slope, out.file, T)
}
slope <- rast(out.file)
# aspect
out.file <- file.path(dem.dir, "aspect.tif")
if (!file.exists(out.file)){
aspect <- terrain(elev, "aspect")
names(aspect) <- "aspect"
writeRaster(aspect, out.file, T)
}
aspect <- rast(out.file)
# cos(aspect)
out.file <- file.path(dem.dir, "aspect_cos.tif")
if (!file.exists(out.file)){
aspect.cos <- cos(aspect * pi / 180)
names(aspect.cos) <- "aspect_cos"
writeRaster(aspect.cos, out.file, T)
}
aspect.cos <- rast(out.file)
# sin(aspect)
out.file <- file.path(dem.dir, "aspect_sin.tif")
if (!file.exists(out.file)){
aspect.sin <- sin(aspect * pi / 180)
names(aspect.sin) <- "aspect_sin"
writeRaster(aspect.sin, out.file, T)
}
aspect.sin <- rast(out.file)
# slope x cos(aspect)
out.file <- file.path(dem.dir, "slope_aspect_cos.tif")
if (!file.exists(out.file)){
slope.aspect.cos <- slope * aspect.cos
names(slope.aspect.cos) <- "slope_aspect_cos"
writeRaster(slope.aspect.cos, out.file, T)
}
slope.aspect.cos <- rast(out.file)
# slope x sin(aspect)
out.file <- file.path(dem.dir, "slope_aspect_sin.tif")
if (!file.exists(out.file)){
slope.aspect.sin <- slope * aspect.sin
names(slope.aspect.sin) <- "slope_aspect_sin"
writeRaster(slope.aspect.sin, out.file, T)
}
slope.aspect.sin <- rast(out.file)
# tri.1
out.file <- file.path(dem.dir, "tri_1.tif")
if (!file.exists(out.file)){
tri.1 <- terrain(elev, "TRI")
names(tri.1) <- "tri_1"
writeRaster(tri.1, out.file, T)
}
tri.1 <- rast(out.file)
# tri.2
out.file <- file.path(dem.dir, "tri_2.tif")
if (!file.exists(out.file)){
tri.2 <- terrain(elev, "TRIriley")
names(tri.2) <- "tri_2"
writeRaster(tri.2, out.file, T)
}
tri.2 <- rast(out.file)
# tri.3
out.file <- file.path(dem.dir, "tri_3.tif")
if (!file.exists(out.file)){
tri.3 <- terrain(elev, "TRIrmsd")
names(tri.3) <- "tri_3"
writeRaster(tri.3, out.file, T)
}
tri.3 <- rast(out.file)
# roughness
out.file <- file.path(dem.dir, "roughness.tif")
if (!file.exists(out.file)){
roughness <- terrain(elev, "roughness")
names(roughness) <- "roughness"
writeRaster(roughness, out.file, T)
}
roughness <- rast(out.file)
# define focal donut function
donut <- function(inner,outer){
r <- rast(xmin = outer * (-1) - 0.5,
xmax = outer + 0.5,
ymin = outer * (-1) - 0.5,
ymax = outer + 0.5,
crs = "EPSG:5070",
resolution = 1)
values(r) <- 1
pt <- vect(data.frame(x = 0, y = 0), geom = c("x", "y"), crs = "EPSG:5070")
buff.in <- buffer(pt, inner)
buff.out <- buffer(pt, outer)
buff.out$rastval <- 1
don <- erase(buff.out, buff.in)
r <- crop(r, don, mask = T)
m <- as.matrix(r, wide = T)
m[is.nan(m)] <- NA
m
}
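# purely illustrative: peek at the focal weight matrix for the smallest donut
# used below (inner radius of 1 cell, outer radius of 2 cells); cells included
# in the annulus mean get a weight of 1, all others (including the center cell)
# are NA
print(donut(1, 2))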
# tpi.02
out.file <- file.path(dem.dir, "tpi_02.tif")
if (!file.exists(out.file)){
focal.mat <- donut(1,2)
tpi.02 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.02) <- "tpi_02"
writeRaster(tpi.02, out.file, T)
}
tpi.02 <- rast(out.file)
# tpi.04
out.file <- file.path(dem.dir, "tpi_04.tif")
if (!file.exists(out.file)){
focal.mat <- donut(2,4)
tpi.04 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.04) <- "tpi_04"
writeRaster(tpi.04, out.file, T)
}
tpi.04 <- rast(out.file)
# tpi.08
out.file <- file.path(dem.dir, "tpi_08.tif")
if (!file.exists(out.file)){
focal.mat <- donut(4,8)
tpi.08 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.08) <- "tpi_08"
writeRaster(tpi.08, out.file, T)
}
tpi.08 <- rast(out.file)
# tpi.16
out.file <- file.path(dem.dir, "tpi_16.tif")
if (!file.exists(out.file)){
focal.mat <- donut(8,16)
tpi.16 <- elev - focal(elev, focal.mat, mean, na.rm = T)
names(tpi.16) <- "tpi_16"
writeRaster(tpi.16, out.file, T)
}
tpi.16 <- rast(out.file)
# combine into a multi-layer raster and plot
topo <- c(elev, slope, aspect.cos, aspect.sin, slope.aspect.cos,
slope.aspect.sin, tri.1, tri.2, tri.3, roughness, tpi.02, tpi.04,
tpi.08, tpi.16)
plot(topo)
Lastly, I’ll crop the topographic predictor stack to match the extent
of sample.rast and merge the two.
# check if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo.tif")
if (!file.exists(out.file)){
# get list of names before they get lost
names <- names(topo)
# crop topo
m <- ifel(is.na(sample.rast[[1]]), NA, 1)
topo <- crop(topo, m, mask = T)
# reassign names (since they annoyingly get dropped)
names(topo) <- names
# merge sample.rast and topo
sample.rast <- c(sample.rast, topo)
# write to file
writeRaster(sample.rast, out.file, T)
}
# read the data back in and print the layer names
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw" "elevation"
## [55] "slope" "aspect_cos" "aspect_sin"
## [58] "slope_aspect_cos" "slope_aspect_sin" "tri_1"
## [61] "tri_2" "tri_3" "roughness"
## [64] "tpi_02" "tpi_04" "tpi_08"
## [67] "tpi_16"
Splendid! Moving on…
While recognizing that the primary goal of this document is to select a set of field plot locations, I wanted to quickly test something out that’s not entirely unrelated. Ryan McCarley sent me a dataset containing plot locations for 100 stand exam plots the US Forest Service collected, apparently in 2017. The data were shared with the intent of being useful for this site selection process, so I thought I’d give it a shot. Since they’re just a sparse sample of plots, they would only be directly useful if we were to try to collect the exact same plot locations. But, perhaps through some quick predictive modeling, we can use them to drive a wall-to-wall prediction of some sort that might add value to this site selection process.
With that in mind, I’ll first read in and explore the data a bit.
# define directory structure
usfs.dir <- file.path(main.dir, "usfs_boundaries")
ryan.dir <- file.path(main.dir, "from_ryan/for_mickey_062223")
# read in the data
plots <- vect(file.path(ryan.dir, "fs2017_ba.shp"))
# read in monroe mountain boundary for reference
mm <- vect(file.path(usfs.dir, "monroe_mountain_section.shp"))
# get everything in the same coordinate system
plots <- project(plots, sample.rast)
mm <- project(mm, sample.rast)
# plot the data out
plot(mm)
points(plots, pch = 21, bg = "blue", cex = 1)
# explore the data
paged_table(as.data.frame(plots))
Cool! So, I’m not exactly sure what the units are for basal area in these plots, but let’s assume they’re at least directly comparable to one another. Under that assumption, I think there are probably two maximally useful metrics we can derive here (and ultimately use to try to make predictions):
# define species ba columns
sp.cols <- names(plots)[!names(plots) %in% c("Comment", "Plot_Num")]
# calculate total basal area
plots$ba.total <- rowSums(plots[[sp.cols]], na.rm = T)
# remove rows without basal area
plots <- plots[plots$ba.total > 0,]
# get most abundant species
sp.dom <- unlist(apply(as.data.frame(plots[sp.cols]), 1, function(x) sp.cols[which.max(x)]))
plots$sp.dom <- sp.dom
# remove unnecessary fields
plots <- plots[,c("sp.dom", "ba.total")]
# print out the results
paged_table(as.data.frame(plots))
# plot out the results
hist(plots$ba.total)
barplot(table(plots$sp.dom), las = 2)
Clearly there’s a nice distribution of basal areas. Not a very even distribution of species dominance, as we might expect, but let’s try modeling both variables anyway. I’ll do this in a pretty “dumb” way, by just extracting pixel values from sample.rast at the plot locations and running an untuned random forest. This will at least provide a sense of whether there’s any hope whatsoever. Obviously some tweaks, such as model tuning, proper training/test splits, using Landsat data from when the plots were collected, etc., would improve results. But we can explore that later if it turns out this simple approach shows any promise.
First, I’ll try modeling species dominance…
# extract pixel values from the sample raster at each plot
model.df <- extract(sample.rast, plots, "mean", exact = T)
model.df <- cbind(plots[c("sp.dom", "ba.total")], model.df[,seq(2,ncol(model.df))])
# remove columns with nas
na.sums <- apply(as.data.frame(model.df), 2, function(x) sum(is.na(x)))
model.df <- model.df[,-which(na.sums > 0)]
# coerce sp.dom to factor
model.df$sp.dom <- as.factor(model.df$sp.dom)
#---------------sp.dom modeling
# create modeling data frame
x.cols <- seq(3,ncol(model.df))
y.col <- 1
mod.df.sp.dom <- as.data.frame(model.df[,c(y.col, x.cols)])
# build random forest model
rf.sp.dom <- randomForest(sp.dom ~ ., data = mod.df.sp.dom)
rf.sp.dom
##
## Call:
## randomForest(formula = sp.dom ~ ., data = mod.df.sp.dom)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 8
##
## OOB estimate of error rate: 37.78%
## Confusion matrix:
## ABCO ABLA JUSC2 PIED PIEN PIPU POTR5 PSME QUGA class.error
## ABCO 1 3 0 0 0 0 5 0 0 0.8888889
## ABLA 2 10 0 0 1 0 3 1 0 0.4117647
## JUSC2 0 0 6 0 0 0 0 0 0 0.0000000
## PIED 0 0 1 1 0 0 0 0 0 0.5000000
## PIEN 1 3 0 0 1 0 2 0 0 0.8571429
## PIPU 0 0 0 0 0 0 1 0 0 1.0000000
## POTR5 3 1 0 0 0 0 37 1 0 0.1190476
## PSME 1 2 0 0 0 0 2 0 0 1.0000000
## QUGA 0 0 1 0 0 0 0 0 0 1.0000000
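As a quick way to summarize that confusion matrix, the overall OOB accuracy can be pulled straight from the model object (a minimal sketch; this is just 1 minus the OOB error rate reported above):
# overall OOB accuracy from the confusion matrix (drop the class.error column)
cm <- rf.sp.dom$confusion[, -ncol(rf.sp.dom$confusion)]
round(sum(diag(cm)) / sum(cm), 3)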
Not altogether awful! Of course, not great by any means. Let’s try modeling total basal area:
#---------------ba.total modeling
# create modeling data frame
x.cols <- seq(3,ncol(model.df))
y.col <- 2
mod.df.ba.total <- as.data.frame(model.df[,c(y.col, x.cols)])
# build random forest model
rf.ba.total <- randomForest(ba.total ~ ., data = mod.df.ba.total)
rf.ba.total
##
## Call:
## randomForest(formula = ba.total ~ ., data = mod.df.ba.total)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 22
##
## Mean of squared residuals: 1847.465
## % Var explained: 58.24
# plot predicted vs. observed and get model performance metrics
x <- mod.df.ba.total$ba.total
y <- predict(rf.ba.total)
plot(y ~ x,
xlim = c(min(c(x,y)), max(c(x,y))),
ylim = c(min(c(x,y)), max(c(x,y))),
ylab = "Predicted", xlab = "Observed")
lines(x = c(-1000,1000), y = c(-1000,1000))
mod <- lm(y ~ x)
abline(mod, lwd = 2, col = "red")
r2 <- round(summary(mod)$adj.r.squared,2)
rmse <- round(sqrt(mean((y - x)^2)),2)
legend("topleft",
legend = c(paste0("R2 = ", r2),
paste0("RMSE = ", rmse)),
bty = "n", x.intersp = 0, text.col = "red")
Not bad! It’s not entirely clear how much these two predictive models might add to the site selection process, but for the sake of completeness, let’s make some predictive maps and add them to the suite of spatial variables.
# create a subset of sample.rast with just the non-na layers
sample.rast.nona <- sample.rast[[names(sample.rast) %in% names(model.df)]]
# check to see if this has already been run
out.file <- file.path(ryan.dir, "sp_dom.tif")
if (!file.exists(out.file)){
# make sp.dom prediction map
pf <- function(mod, dat, ...){
library(randomForest)
predict(mod, dat, ...)
}
sp.dom.rast <- terra::predict(sample.rast.nona, rf.sp.dom,
fun = pf, na.rm = T, cores = 8L)
names(sp.dom.rast) <- "sp_dom"
writeRaster(sp.dom.rast, out.file, T)
}
# read the data back in and plot it out
sp.dom.rast <- rast(out.file)
plot(sp.dom.rast, col = brewer.pal(9, "Set1"))
# check to see if this has already been run
out.file <- file.path(ryan.dir, "ba_total.tif")
if (!file.exists(out.file)){
# make ba.total prediction map
pf <- function(mod, dat, ...){
library(randomForest)
predict(mod, dat, ...)
}
ba.total.rast <- terra::predict(sample.rast.nona, rf.ba.total,
fun = pf, na.rm = T, cores = 8L)
names(ba.total.rast) <- "ba_total"
writeRaster(ba.total.rast, out.file, T)
}
# read the data back in and plot it out
ba.total.rast <- rast(out.file)
plot(ba.total.rast)
# add these two new layers to sample.rast
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo_pred.tif")
if (!file.exists(out.file)){
sample.rast <- c(sample.rast, sp.dom.rast, ba.total.rast)
writeRaster(sample.rast, out.file, T)
}
sample.rast <- rast(out.file)
print(names(sample.rast))
## [1] "cc" "cd" "p.0"
## [4] "p.10" "p.20" "p.30"
## [7] "p.40" "p.50" "p.60"
## [10] "p.70" "p.80" "p.90"
## [13] "p.100" "mh" "mh.ag"
## [16] "sd" "sd.ag" "skew"
## [19] "skew.ag" "kurt" "kurt.ag"
## [22] "vrd.0.50" "vrd.50.100" "vrd.100.200"
## [25] "vrd.200.400" "vrd.400.800" "vrd.800.1600"
## [28] "vrd.1600.3200" "vnrd.0.50" "vnrd.50.100"
## [31] "vnrd.100.200" "vnrd.200.400" "vnrd.400.800"
## [34] "vnrd.800.1600" "vnrd.1600.3200" "ls_20220610_b1"
## [37] "ls_20220610_b2" "ls_20220610_b3" "ls_20220610_b4"
## [40] "ls_20220610_b5" "ls_20220610_b6" "ls_20220610_b7"
## [43] "ls_20220610_evi" "ls_20220610_msavi" "ls_20220610_nbr"
## [46] "ls_20220610_nbr2" "ls_20220610_ndmi" "ls_20220610_ndvi"
## [49] "ls_20220610_nirv" "ls_20220610_savi" "ls_20220610_tcb"
## [52] "ls_20220610_tcg" "ls_20220610_tcw" "elevation"
## [55] "slope" "aspect_cos" "aspect_sin"
## [58] "slope_aspect_cos" "slope_aspect_sin" "tri_1"
## [61] "tri_2" "tri_3" "roughness"
## [64] "tpi_02" "tpi_04" "tpi_08"
## [67] "tpi_16" "sp_dom" "ba_total"
OK, so we now have this cool 69-layer predictor stack that should capture a wide range of variability in fuel types, conditions, and loads. Prior to running the structurally-guided sampling, we need to mask out areas that we do not want to or cannot sample. Here is a list of the landscape characteristics that we do want to sample:
Monroe Mountain has an extensive recent history of prescribed fire – partly through its association with FASMEE. In addition to recent past burns, there are a host of future burns planned in the area. This represents a valuable remeasurement opportunity to characterize fuel consumption. Accordingly, it will be useful to consider the extent of planned burn units in the placement of fuel plots. However, to only consider areas that are slated for future burns would be too limiting, as it would focus heavily on areas with high fuel loads and forested areas, perhaps ignoring valuable areas that have more “desirable” fuel conditions, such as those that have already burned recently. Accordingly, we will use a bounding box surrounding the planned burn units as a constraint, which will ensure that a large number of plots falls within the burn units, while also allowing plots to fall in nearby areas that represent a broader set of fuel structural conditions.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu.tif")
if (!file.exists(out.file)){
# define directory structure
burn.dir <- file.path(main.dir, "usfs_burn_units")
# read in the data
burn.file <- file.path(burn.dir, "usfs_burn_units_20230714_utm_box.shp")
burn.vect <- vect(burn.file)
# add rasterization field
burn.vect$rast.val <- 1
# rasterize burn vect
burn.rast <- rasterize(burn.vect, sample.rast, "rast.val")
# create a copy of it and call it 'combined.mask'
combined.mask <- burn.rast
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot it out
combined.mask <- rast(out.file)
plot(combined.mask)
This constraint (keeping only slopes below 30 degrees) is for accessibility purposes and the practicality of fuel plot measurement and MLS scanning. In a perfect world, we would sample steep slopes, as they might possess unique fuel conditions. But, in our imperfect, more pragmatic world, this is a necessary step. The mask can be easily derived from the existing slope layer of sample.rast.
# create a function for identifying the smallest combined extent between two rasts
min.ext.fun <- function(a,b){
xmin <- max(c(xmin(a), xmin(b)))
xmax <- min(c(xmax(a), xmax(b)))
ymin <- max(c(ymin(a), ymin(b)))
ymax <- min(c(ymax(a), ymax(b)))
e <- ext(xmin, xmax, ymin, ymax)
return(e)
}
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl.tif")
if (!file.exists(out.file)){
# isolate the slope layer from sample.rast
slope.rast <- sample.rast[["slope"]]
# threshold it
slope.rast <- ifel(slope.rast < 30, 1, NA)
# plot it out
plot(slope.rast)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(slope.rast, combined.mask)
slope.rast <- crop(slope.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- slope.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
This is for accessibility purposes – we don’t want plots too close to a road (so they must be more than 25 m away) but we also don’t want them too far from a road (so they must be within 175 m). Theoretically, this can be done from within the sgsR sampling functions, but since I’ll be doing a whole bunch of other masking, I’ll just lump it into this step. There are several potential roads datasets that can be used, including:
The first two appear to only have USFS roads (i.e., they miss main state highways, etc.). So I’ll go with the FSTopo roads. Since we’ll be using high-clearance vehicles to access plots, I’m going to be fairly liberal with what I’m willing to consider an accessible road. The only class I will remove is that of “trails”. All other roads will be considered passable (which is almost certainly not the case, but in my experience USFS road quality attribution is questionable at best anyway).
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd.tif")
if (!file.exists(out.file)){
# define directory structure
road.gdb <- file.path(main.dir, "usfs_roads/S_USA.FSTopo_Transport_LN.gdb")
# read in the data
roads <- vect(road.gdb, "FSTopo_Transport_LN_monroe")
# reproject to match sample.rast
roads <- project(roads, sample.rast)
# read in road classification lookup table and join to data
road.lut <- read.csv(file.path(main.dir, "usfs_roads/FCSubTypes-by-Feature-Class.csv"))
colnames(road.lut) <- toupper(colnames(road.lut))
roads <- merge(roads, road.lut)
# print out unique road types
print(unique(as.data.frame(roads[,c("FCSUBTYPE", "DESCRIPTION")])))
# remove trails, but keep all other road types
roads <- roads[roads$FCSUBTYPE != 107,]
# create inner and outer buffers, and get difference
buff.inner <- buffer(roads, 25)
buff.outer <- buffer(roads, 175)
buff.diff <- erase(buff.outer, buff.inner)
# convert to raster
buff.diff$rastid <- 1
buff.rast <- rasterize(buff.diff, sample.rast, "rastid")
# crop both rasters to the minimum extent
cropper <- min.ext.fun(buff.rast, combined.mask)
buff.rast <- crop(buff.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- buff.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
This is for accessibility and ensuring we’re not on any private inholdings within the USFS admin boundary. It will be derived from the 2023 USDA Forest Service Basic Ownership dataset.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl.tif")
if (!file.exists(out.file)){
# define directory structure
own.gdb <- file.path(main.dir, "usfs_land_ownership/S_USA.BasicOwnership.gdb")
# read in the data
own <- vect(own.gdb, "BasicOwnership_monroe")
# reproject to match sample.rast
own <- project(own, sample.rast)
# select only usfs ownership
own <- own[own$OWNERCLASSIFICATION == "USDA FOREST SERVICE",]
# convert to raster
own$rastid <- 1
own.rast <- rasterize(own, sample.rast, "rastid")
# crop both rasters to the minimum extent
cropper <- min.ext.fun(own.rast, combined.mask)
own.rast <- crop(own.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- own.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
We want to focus only on naturally vegetated areas, avoiding water features, developed areas, and cultivated areas (not that there will be much of the latter two in Monroe Mountain…). This will be derived from the 2022 LANDFIRE Existing Vegetation Type.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc.tif")
if (!file.exists(out.file)){
# define directory structure
lf.dir <- file.path(main.dir, "landfire/LF2022_EVT_230_CONUS")
# read in the data
lf <- rast(file.path(lf.dir, "LC22_EVT_230.tif"))
# reproject to match sample.rast
lf <- project(lf, sample.rast, align = T, method = "near")
# read in attribute table
lf.df <- read.dbf(file.path(lf.dir, "LF22_EVT_230.dbf"))
# get desirable land covers
phys <- c("Shrubland", "Conifer", "Conifer-Hardwood", "Hardwood")
keeps <- lf.df$VALUE[lf.df$EVT_PHYS %in% phys]
# select only natural, upland vegetation classes
lf <- ifel(lf %in% keeps, 1, NA)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(lf, combined.mask)
lf <- crop(lf, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- lf + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
Although the land cover (LANDFIRE EVT) analysis above should have eliminated most large water features, the spatial precision of its water classification is relatively low, and smaller water features that we would still not want to place plots in are likely missed. Accordingly, in this step, the National Hydrography Dataset will be used to further refine the mask by removing all areas identified as perennial streams or waterbodies.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc_hy.tif")
if (!file.exists(out.file)){
# define directory structure
nhd.dir <- file.path(main.dir, "nhd")
# define inputs
stream.file <- file.path(nhd.dir, "streams.shp")
wtrbdy.file <- file.path(nhd.dir, "waterbodies.shp")
# read in the data
stream <- vect(stream.file)
wtrbdy <- vect(wtrbdy.file)
# add raster value
stream$rast.val <- 1
wtrbdy$rast.val <- 1
# rasterize them
stream.rast <- rasterize(stream, sample.rast, "rast.val", background = 0)
wtrbdy.rast <- rasterize(wtrbdy, sample.rast, "rast.val", background = 0)
# combine them
water.rast <- stream.rast + wtrbdy.rast
# select only non-water areas
water.rast <- ifel(water.rast == 0, 1, NA)
# merge them
combined.mask <- water.rast + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
I think it makes sense to avoid riparian areas, since they will tend to have poorly-represented vegetation types (e.g., cottonwoods, willows, etc.). Technically, there are classes in LANDFIRE EVT that should, in theory, capture these areas. But there is a cool, nationwide riparian map focused on landform-based classification of riparian areas, which will likely capture the unique vegetation assemblages found within these areas more precisely. It is the 2019 National Riparian Areas Base Map.
# check to see if this has already been run
out.file <- file.path(samp.dir, "mask_bu_sl_rd_pl_lc_hy_rp.tif")
if (!file.exists(out.file)){
# define directory structure
rip.dir <- file.path(main.dir, "usfs_riparian")
# read in the data
rip <- rast(file.path(rip.dir, "usfs_riparian_areas_monroe.tif"))
# reproject to match sample.rast
rip <- project(rip, sample.rast, align = T, method = "near")
# select only non-riparian areas
rip <- ifel(is.na(rip), 1, NA)
# crop both rasters to the minimum extent
cropper <- min.ext.fun(rip, combined.mask)
rip <- crop(rip, cropper)
combined.mask <- crop(combined.mask, cropper)
# merge them
combined.mask <- rip + combined.mask
combined.mask <- ifel(combined.mask == 2, 1, NA)
# write to file
writeRaster(combined.mask, out.file, T)
}
# read the data back in and plot
combined.mask <- rast(out.file)
plot(combined.mask)
Now that we’ve got our mask finalized, let’s use it to crop our
sample.rast. The result of this will be the input dataset
for our structurally guided sampling.
# check to see if this has already been run
out.file <- file.path(samp.dir, "sample_rast_vegmets_ls_topo_pred_mask.tif")
if (!file.exists(out.file)){
# crop both rasters to the minimum extent
cropper <- min.ext.fun(sample.rast, combined.mask)
sample.rast.mask <- crop(sample.rast, cropper)
combined.mask <- crop(combined.mask, cropper)
sample.rast.mask <- mask(sample.rast.mask, combined.mask)
# write to file
writeRaster(sample.rast.mask, out.file, T)
}
# read the data back in and plot it out
sample.rast.mask <- rast(out.file)
plot(sample.rast.mask)
To see how well this sample area captures variability in important landscape characteristics throughout the entirety of Monroe Mountain, I’ll plot out some density distribution comparisons. For simplicity, I’ll just look at four variables:
# get density distributions
d.cc.full <- density(sample.rast[["cc"]], plot = F)
d.cc.samp <- density(sample.rast.mask[["cc"]], plot = F)
d.ch.full <- density(sample.rast[["p.100"]], plot = F)
d.ch.samp <- density(sample.rast.mask[["p.100"]], plot = F)
d.ba.full <- density(sample.rast[["ba_total"]], plot = F)
d.ba.samp <- density(sample.rast.mask[["ba_total"]], plot = F)
d.el.full <- density(sample.rast[["elevation"]], plot = F)
d.el.samp <- density(sample.rast.mask[["elevation"]], plot = F)
# define function to add alpha
add.alpha <- function(col, alpha=1){
apply(sapply(col, col2rgb)/255, 2,
function(x)
rgb(x[1], x[2], x[3], alpha=alpha))
}
# set up plot
par(mfrow = c(2,2), las = 1, mar = c(5,5,1,1))
cols.full <- c(2, add.alpha(2, 0.25))
cols.samp <- c(4, add.alpha(4, 0.25))
# cc plot
xmin <- min(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
xmax <- max(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
ymin <- min(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
ymax <- max(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Cover")
grid()
box()
polygon(d.cc.full[[1]]$x, d.cc.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.cc.full[[1]]$x, d.cc.full[[1]]$y, col = cols.full[1])
polygon(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, col = cols.samp[1])
# ch plot
xmin <- min(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
xmax <- max(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
ymin <- min(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
ymax <- max(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Height")
grid()
box()
polygon(d.ch.full[[1]]$x, d.ch.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ch.full[[1]]$x, d.ch.full[[1]]$y, col = cols.full[1])
polygon(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, col = cols.samp[1])
# add legend
legend("topright", legend = c("Full", "Sample"),
border = c(cols.full[1], cols.samp[1]),
fill = c(cols.full[2], cols.samp[2]))
# ba plot
xmin <- min(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
xmax <- max(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
ymin <- min(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
ymax <- max(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Basal Area")
grid()
box()
polygon(d.ba.full[[1]]$x, d.ba.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ba.full[[1]]$x, d.ba.full[[1]]$y, col = cols.full[1])
polygon(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, col = cols.samp[1])
# el plot
xmin <- min(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
xmax <- max(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
ymin <- min(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
ymax <- max(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Elevation")
grid()
box()
polygon(d.el.full[[1]]$x, d.el.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.el.full[[1]]$x, d.el.full[[1]]$y, col = cols.full[1])
polygon(d.el.samp[[1]]$x, d.el.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.el.samp[[1]]$x, d.el.samp[[1]]$y, col = cols.samp[1])
Awesome! Looks like the sample area, despite being a relatively small proportion of the full Monroe Mountain study area (4.9% to be precise), captures variability in some of the most important variables really nicely. If I were to nitpick, I would say that the low end of basal area is oversampled in this dataset. That kind of makes sense – these are likely non-forest areas that are more likely to be near roads, on moderate terrain, etc.
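For reference, here is one way the sampled-area percentage quoted above could be computed (a minimal sketch based on counting non-NA cells in the full and masked rasters):
# proportion of the full study area that remains available for sampling
n.full <- global(!is.na(sample.rast[[1]]), "sum")[1, 1]
n.samp <- global(!is.na(sample.rast.mask[[1]]), "sum")[1, 1]
round(100 * n.samp / n.full, 1)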
The moment we’ve all been waiting for – the actual site selection process. In the past, I have used two different sampling algorithms from the sgsR library:
I’m happy to explore both if there is interest, but given the capacity to provide “backup” samples, which may come in very handy in a study area with accessibility limitations, I am going to go with NC (nearest centroid) sampling for now.
Of course, the most important question for this part of the analysis is: how many plots to place within the sample area?
The current plan is to be in the field collecting data from 7/30 to 8/4 – 6 days in total. It sounded like there would be two fuel measurement crews. Without a detailed understanding of how long these fuel plots take, it is difficult to estimate how many plots each crew could collect in a single day. But let’s imagine a situation in which each plot takes 2 hours. Add in 30 minutes of travel time between plots for a total of 2.5 hours per plot. With long (10-hour) days, that could yield up to 4 plots per day per crew. So 6 days x 2 crews x 4 plots = 48 plots. To allow for a couple of the strata turning out to be undesirable, and to match the number of plots in North Kaibab, I will up this total to 50.
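Just to make that plot-budget arithmetic explicit, here is the same back-of-the-envelope calculation in code (all numbers are the assumptions stated above):
# back-of-the-envelope plot budget
hours.per.plot <- 2 + 0.5                             # measurement + travel time
plots.per.crew.per.day <- floor(10 / hours.per.plot)  # 10-hour field days
6 * 2 * plots.per.crew.per.day                        # 6 days x 2 crews = 48 plots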
For the sake of this sampling procedure, I will place 50 plots
I will place two points in each cluster, effectively providing one backup for each plot.
# check to see if this has already been run
out.file <- file.path(samp.dir, "sample_pts.shp")
if (!file.exists(out.file)){
# generate strata
set.seed(5757)
sample.rast.mask <- as.numeric(sample.rast.mask)
stratum.rast <- strat_kmeans(sample.rast.mask, 24)
# write to file
out.file <- file.path(samp.dir, "sample_strata.tif")
writeRaster(stratum.rast, out.file, overwrite = T)
# generate samples
set.seed(5757)
samps <- sample_nc(mraster = sample.rast.mask,
nSamp = 24, k = 2)
samps <- vect(samps)
# add plot ids
samps$plot.id <- paste0(samps$kcenter)
samps$plot.id[1:(nrow(samps)/2)] <- paste0(samps$plot.id[1:(nrow(samps)/2)], "A")
samps$plot.id[(nrow(samps)/2+1):(nrow(samps))] <- paste0(samps$plot.id[(nrow(samps)/2+1):(nrow(samps))], "B")
# change periods to underscores for shapefiles
names(samps) <- str_replace_all(names(samps), "\\.", "_")
# write to file
out.file <- file.path(samp.dir, "sample_pts.shp")
writeVector(samps, out.file, overwrite = T)
}
## K-means being performed on 69 layers with 24 centers.
## K-means being performed on 69 layers with 24 centers.
# read the data back in and plot it out
samps <- vect(out.file)
plot(mm)
plot(samps, add = T)
Just to see how well these plots capture the range of conditions present within the study area, I’ll regenerate those same density plots from above, but now I’ll add in vertical lines for each of the plot points. I’ll only add one of the two paired cluster centroid points.
# subset samples
samps.sub <- samps[1:(nrow(samps)/2)]
# set up plot
par(mfrow = c(2,2), las = 1, mar = c(5,5,1,1))
# cc plot
xmin <- min(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
xmax <- max(c(d.cc.full[[1]]$x, d.cc.samp[[1]]$x))
ymin <- min(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
ymax <- max(c(d.cc.full[[1]]$y, d.cc.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Cover")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$cc[i], col = "gray")
}
polygon(d.cc.full[[1]]$x, d.cc.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.cc.full[[1]]$x, d.cc.full[[1]]$y, col = cols.full[1])
polygon(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.cc.samp[[1]]$x, d.cc.samp[[1]]$y, col = cols.samp[1])
# ch plot
xmin <- min(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
xmax <- max(c(d.ch.full[[1]]$x, d.ch.samp[[1]]$x))
ymin <- min(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
ymax <- max(c(d.ch.full[[1]]$y, d.ch.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Canopy Height")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$p_100[i], col = "gray")
}
polygon(d.ch.full[[1]]$x, d.ch.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ch.full[[1]]$x, d.ch.full[[1]]$y, col = cols.full[1])
polygon(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ch.samp[[1]]$x, d.ch.samp[[1]]$y, col = cols.samp[1])
# add legend
legend("topright", legend = c("Full", "Sample"),
border = c(cols.full[1], cols.samp[1]),
fill = c(cols.full[2], cols.samp[2]))
# ba plot
xmin <- min(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
xmax <- max(c(d.ba.full[[1]]$x, d.ba.samp[[1]]$x))
ymin <- min(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
ymax <- max(c(d.ba.full[[1]]$y, d.ba.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Basal Area")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$ba_total[i], col = "gray")
}
polygon(d.ba.full[[1]]$x, d.ba.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.ba.full[[1]]$x, d.ba.full[[1]]$y, col = cols.full[1])
polygon(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.ba.samp[[1]]$x, d.ba.samp[[1]]$y, col = cols.samp[1])
# el plot
xmin <- min(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
xmax <- max(c(d.el.full[[1]]$x, d.el.samp[[1]]$x))
ymin <- min(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
ymax <- max(c(d.el.full[[1]]$y, d.el.samp[[1]]$y))
plot(c(xmin, xmax), c(ymin, ymax), type = "n", ylab = "Density",
xlab = "Elevation")
grid()
box()
for (i in seq(1,(nrow(samps)/2))){
abline(v = samps.sub$elevation[i], col = "gray")
}
polygon(d.el.full[[1]]$x, d.el.full[[1]]$y, border = NA, col = cols.full[2])
lines(d.el.full[[1]]$x, d.el.full[[1]]$y, col = cols.full[1])
polygon(d.el.samp[[1]]$x, d.el.samp[[1]]$y, border = NA, col = cols.samp[2])
lines(d.el.samp[[1]]$x, d.el.samp[[1]]$y, col = cols.samp[1])
I want to give everyone the opportunity to look at these plots in a more dynamic way, rather than just these diagnostic, descriptive statistically-driven plots and maps. To that end, I’ll add the plots to a leaflet map with aerial imagery as the backdrop, enabling you to pan and zoom to each plot to get a feel for what types of landscape conditions they represent. All 48 plots will be added to the map, but the primary plots will be labeled with an A and the backups will be labeled with a B. I’ll also buffer the points by 12.5m to create polygons so that when you zoom in you’ll be able to see what a 12.5 m radius (25 m diameter) circular plot would look like on the ground.
# buffer points
buffs <- buffer(samps, 12.5)
buffs <- st_as_sf(buffs) %>%
st_transform(4326)
# create leaflet map
samps <- st_as_sf(samps) %>%
st_transform(4326)
m <- leaflet(samps) %>%
addProviderTiles(providers$Esri.WorldImagery) %>%
addMarkers(label = ~plot_id, group = "points") %>%
addPolygons(data = buffs, col = "red", group = "polygons", label = buffs$plot_id) %>%
groupOptions("points", zoomLevels = 1:15) %>%
groupOptions("polygons", zoomLevels = 15:20)
m
I wanted to provide two quick summaries of the resulting plot locations. First, I wanted to share how many of the plots fell within the planned burn units.
# define inputs
burn.dir <- file.path(main.dir, "usfs_burn_units")
samps <- vect(file.path(samp.dir, "sample_pts.shp"))
burns <- vect(file.path(burn.dir, "usfs_burn_units_20230714_utm.shp"))
## Warning: [vect] Z coordinates ignored
# split into a and b
samps.a <- samps[grep("A", samps$plot_id),]
samps.b <- samps[grep("B", samps$plot_id),]
# determine how many points are within planned burn units
samps.a.burn <- intersect(samps.a, burns)
samps.b.burn <- intersect(samps.b, burns)
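The counts and percentages reported below come straight from those intersections; for reference, they could be computed like so:
# number and percentage of plots falling within planned burn units
nrow(samps.a.burn); round(100 * nrow(samps.a.burn) / nrow(samps.a))
nrow(samps.b.burn); round(100 * nrow(samps.b.burn) / nrow(samps.b))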
Of the 24 plots in Group A (the primary plots), 7 of them fall within the planned burn units, representing a proportion of 29%.
Of the 24 plots in Group B (the backup plots), 5 of them fall within the planned burn units, representing a proportion of 21%.
Second, I wanted to compare plot locations to LANDFIRE Existing Vegetation Type to see the distribution of plots against vegetation types and life forms.
# define inputs
lf <- rast("S:/ursa/campbell/fasmee/data/landfire/LF2022_EVT_230_CONUS/LC22_EVT_230.tif")
# reproject landfire
lf <- project(lf, "epsg:26912", "near")
# extract pixel values
lf.vals.a <- terra::extract(lf, samps.a)
lf.tab.a <- table(lf.vals.a$EVT_NAME)
lf.tab.a <- lf.tab.a[lf.tab.a > 0]
lf.vals.b <- terra::extract(lf, samps.b)
lf.tab.b <- table(lf.vals.b$EVT_NAME)
lf.tab.b <- lf.tab.b[lf.tab.b > 0]
# join pixel values back to points
samps.a$evt_name <- lf.vals.a$EVT_NAME
samps.b$evt_name <- lf.vals.b$EVT_NAME
# write to file
out.file <- file.path(samp.dir, "sample_pts_a.shp")
writeVector(samps.a, out.file, overwrite = T)
out.file <- file.path(samp.dir, "sample_pts_b.shp")
writeVector(samps.b, out.file, overwrite = T)
# buffer to create plot polygons
buffs.a <- buffer(samps.a, 12.5, 50)
buffs.b <- buffer(samps.b, 12.5, 50)
# write to file
out.file <- file.path(samp.dir, "sample_polys_a.shp")
writeVector(buffs.a, out.file, overwrite = T)
out.file <- file.path(samp.dir, "sample_polys_b.shp")
writeVector(buffs.b, out.file, overwrite = T)
Here are the distributions of vegetation types in Group A
(the primary plots):
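One way to render these counts is with the same paged_table() helper used earlier (a sketch using the lf.tab.a table computed above):
# vegetation type counts for Group A plots
paged_table(as.data.frame(sort(lf.tab.a, decreasing = TRUE)))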
Here are the distributions of vegetation types in Group B
(the backup plots):
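And likewise for Group B (again a sketch, using the lf.tab.b table computed above):
# vegetation type counts for Group B plots
paged_table(as.data.frame(sort(lf.tab.b, decreasing = TRUE)))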