Introduction

I wanted to put together an R Markdown document to (1) maintain a degree of analytical openness between us and (2) have a second and third set of eyes on my process to ensure you two don’t see any red flags with my procedure. So, here goes…

Step 1. Housekeeping

First things first – I will load the necessary libraries and set the working directory. Very unexciting stuff.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(emo)

# set working directory
setwd("S:/ursa/campbell/bwa/data/fia/data_from_erin")

Step 2. Reading in and exporing the data

There are quite a few different tables to read in. Put another way, there are quite a few opportunities for me to mess something up. Each table brings something important to the… ahem… table, so basically the main point of this whole process is just figuring out how to strategically join these tables together. Here I’ll read in the tables one by one, provide a glimpse of the data within the tables, and highlight the tables’ usefulness. And, for the sake of consistency (and because I just don’t like capital letters) I will slightly modify all of the column names.

Table #1. Subplot condition table

This table contains “conditions” (essentially land cover types) found within each subplot within each plot in all of the states that overlap our study area (CO, ID, NV, UT, and WY). Modeling (so far) is done at the subplot level, so this will help refine which subplots should and shouldn’t be used in the modeling process.

# read in table
df.subp <- read.csv("BWA_SUBP_COND.csv")

# clean up column names
colnames(df.subp) <- tolower(colnames(df.subp))
colnames(df.subp) <- str_replace_all(colnames(df.subp), "_", ".")

# convert cn's to character
df.subp$plt.cn <- as.character(df.subp$plt.cn)

# check out first few rows
head(df.subp)
##            plt.cn subp condid cond.status.cd lcc cond.nonsample.reasn.cd
## 1 188761891020004    1      1              2  17                      NA
## 2 188761891020004    2      1              2  17                      NA
## 3 188761891020004    3      1              2  17                      NA
## 4 188761891020004    4      1              2  17                      NA
## 5 188761892020004    1      1              1  55                      NA
## 6 188761892020004    2      1              1  55                      NA
##   subpcond.prop
## 1             1
## 2             1
## 3             1
## 4             1
## 5             1
## 6             1

Summary of fields:

  • plt.cn is the plot sequence number
  • subp is the subplot number (1-4)
  • condid is the unique condition ID for that subplot, which is basically just a way to track how many conditions are found within a subplot and is useful for separating single-condition plots (better reference data) from multi-condition plots (worse reference data – think half forest/half grassland, for example).
  • cond.status.cd is either:
    • 1: accesible “forest” land (where “forest” >= 10% tree cover)
    • 2: “non-forest” land (where “non-forest” < 10% tree cover, but could still have tress present, a.k.a. “other wooded lands”)
    • 3: noncensus water (small water bodies)
    • 4: census water (larger water bodies)
    • 5: nonsampled, possibility of forest land (could be forest, but just wasn’t able to be accessed for some reason)
  • lcc: live canopy cover (%), which is useful for distinguishing between “other wooded lands”, where trees are present, but no tree measurements are available (lcc between 1% and 9%), and true absence subplots, such as grasslands, shrublands, barren, developed, etc. (lcc == 0%)
  • cond.nonsample.reasn.cd: if cond.status.cd was 5 this is why. Not super important, since I’ll be eliminating those subplots from consideration.
  • subpcond.prop: proportion of the subplot that has a given condition. So, if a subplot was half forest and half shrub, then there would (theoretically) be two records for that subplot, each wiht subpcond.prop equal to 0.5

Summary stats

  • There are 329924 records in this table
  • There are 328192 total unique subplot records in this table from 82048 total unique plot records.
  • There are 1717 plots with more than one value for cond.status.cd, meaning they have multiple conditions, leaving 328207 single condition subplots.

Table #2. Plot status table

This table contains general plot-level information, again for all plots within overlapping states.

# read in table
df.plot <- read.csv("BWA_PLOT_STATUS.csv")

# clean up column names
colnames(df.plot) <- tolower(colnames(df.plot))
colnames(df.plot) <- str_replace_all(colnames(df.plot), "_", ".")

# convert cn's to character
df.plot$plt.cn <- as.character(df.plot$plt.cn)
df.plot$prev.plt.cn <- as.character(df.plot$prev.plt.cn)

# check out first few rows
head(df.plot)
##           plt.cn prev.plt.cn invyr statecd unitcd countycd  plot plot.status.cd
## 1 37263834010690        <NA>  2010      35      3       51 86556              2
## 2 37258774010690        <NA>  2010      35      4        5 88590              2
## 3 37258810010690        <NA>  2010      35      4        5 93018              2
## 4 39636386010690        <NA>  2011      35      2       37 84800              3
## 5 39636388010690        <NA>  2011      35      2       37 90980              2
## 6 39636392010690        <NA>  2011      35      2       37 85665              2
##   measyear
## 1     2010
## 2     2010
## 3     2010
## 4     2012
## 5     2011
## 6     2011

Summary of fields:

  • plt.cn is the plot sequence number
  • prev.plt.cn is the previous plot sequence number (if multiple inventories are recorded at the same plot over time within the same database)
  • invyr is inventory year (importantly, not the same as the measurement year)
  • statecd is a unique identifier for each state
  • unitcd is a unique identifier for each survey unit (groups of counties)
  • countycd is a unique identifier for each county
  • plot is a unique identifier for each plot (though not strictly speaking “unique” – has to be paired with other fields to be truly unique)
  • plot.status.cd is either:
    • 1: sampled with at least one accessible forest land condition present on the plot
    • 2: sampled with no accessible forest land condition present on plot
    • 3: nonsampled.
    • These are less useful now that Erin has provided more detailed status information at the subplot level
  • measyear is the year of measurement

Summary stats

  • There are 79278 records in this table
  • There are 79278 total unique plot records in this table (should be the same as above)
  • Note that this differs from the number of unique plot records in the subplot condition table… Strange, but as we’ll see shortly, it doesn’t matter for our purposes.

Table #3. AOI plot table

This table identifies only those plots that fall within the AOI (a bounding box surrounding a 200 km buffer from UWC and Ashely NFs) and will be used to subset both the plot-level and subplot-level data from the previous two tables.

# read in table
df.aoip <- read.csv("plots_within_AOI.csv")

# clean up column names
colnames(df.aoip) <- tolower(colnames(df.aoip))
colnames(df.aoip) <- str_replace_all(colnames(df.aoip), "_", ".")

# convert cn's to character
df.aoip$plt.cn <- as.character(df.aoip$plt.cn)

# check out first few rows
head(df.aoip)
##            plt.cn measyea
## 1 352143526489998    2017
## 2 429485673489998    2019
## 3  40399059010690    2011
## 4 352144158489998    2017
## 5 429485755489998    2019
## 6 188777104020004    2015

Summary of fields:

  • plt.cn is the plot sequence number
  • measyear is the year of measurement

Summary stats

  • There are 18289 records in this table
  • There are 18289 total unique plot records in this table (should be the same as above)
  • Of these, 18289 are also in the plot status table, and 18289 are also in the subplot condition table. So, **even though the number of plots from the plot status table differed from the number of plots in the subplot condition table, both of those tables contain data for all of the plots within the AOI, which is really all that matters.

Table #4. Tree parameters table

This table contains summary structural data aggregated to the subplot level for each tree species measured for all plots within overlapping states.

# read in table
df.tree <- read.csv("tree_parameters_Campbell.csv")

# clean up column names
colnames(df.tree) <- tolower(colnames(df.tree))
colnames(df.tree) <- str_replace_all(colnames(df.tree), "_", ".")

# convert cn's to character
df.tree$plt.cn <- as.character(df.tree$plt.cn)

# check out first few rows
head(df.tree)
##            plt.cn subp dstrbcd1 dstrbyr1 condid spcd statuscd    tpa    ba
## 1 188761909020004    2        0       NA      2  106        1  24.07 13.13
## 2 188761409020004    2        0       NA      1  122        1  48.14 84.93
## 3 188761410020004    4       12     9999      1  202        1  24.07 29.15
## 4 431917369489998    3        0       NA      1  746        1 299.86 17.81
## 5 431917370489998    2        0       NA      2   65        2  24.07 18.91
## 6 431917373489998    1       12     2014      1   93        1 923.66 27.92

Summary of fields:

  • plt.cn is the plot sequence number
  • subp is the subplot number (1-4)
  • dstrbcd1 is type of disturbance since the last measurement
  • dstrbdyr1 is the year of said disturbance
  • condid is the tree-level condition code – identifies the condition on which the trees are located
  • spcd is the FIA species codes
  • statuscd is either:
    • 0: no status
    • 1: live tree (ultimately what I will filter to)
    • 2: dead tree
    • 3: removed tree
  • tpa is trees per acre
  • ba is basal area in square feet per acre

Summary stats

  • There are 107666 records in this table
  • There are 48856 total unique subplot records in this table from 14429 total unique plot records.
  • Of the 14429 total plots, 5378 are within the list of 18289 plots within the AOI
  • Of the 48856 total plots, 17969 are within the list of plots within the AOI
  • It’s important to note that these subplots only include those with tree measurements, so they would have had to have been visited and found to have >= 10% cover. Also note that these data were for all plots within all overlapping states. This is why the number of plots within the AOI is so small.

Table #5. Tree species code table

This is just a FIA species code lookup table so I could work with species abbreviations rather than the numeric species codes. I’ll remove all of the unnecessary fields first.

# read in table
df.spcd <- read.csv("2021 Master Species FGver9-1_9_2021_final.csv")

# clean up column names
colnames(df.spcd) <- tolower(colnames(df.spcd))
colnames(df.spcd) <- str_replace_all(colnames(df.spcd), "_", ".")

# remove unnecessary fields
df.spcd <- df.spcd[,c("fia.code", "plants.code")]

# check out first few rows
head(df.spcd)
##   fia.code plants.code
## 1       11        ABAM
## 2       12        ABBA
## 3       14        ABBR
## 4       15        ABCO
## 5       16        ABFR
## 6       17        ABGR

Summary of fields:

  • fia.code is numeric species code
  • plants.code is the USDA PLANTS database code

Table #6. Full state-level plot tables

Even though they won’t be used in any sort of modeling context, there is still some value in having the fuzzed plot coordinates, for diagnostic and plotting purposes. To get these, I downloaded the full state-level plot tables from the FIA DataMart. So, this step involves merging them all together and simplifying them.

# read in state-level plot data to get fuzzed coordinates
df.cord.co <- read.csv("S:/ursa/campbell/bwa/data/fia/CO_PLOT.csv")
df.cord.id <- read.csv("S:/ursa/campbell/bwa/data/fia/ID_PLOT.csv")
df.cord.nv <- read.csv("S:/ursa/campbell/bwa/data/fia/NV_PLOT.csv")
df.cord.ut <- read.csv("S:/ursa/campbell/bwa/data/fia/UT_PLOT.csv")
df.cord.wy <- read.csv("S:/ursa/campbell/bwa/data/fia/WY_PLOT.csv")
df.cord <- rbind(df.cord.co,
                 df.cord.id,
                 df.cord.nv,
                 df.cord.ut,
                 df.cord.wy)

# clean up column names
colnames(df.cord) <- tolower(colnames(df.cord))
colnames(df.cord) <- str_replace_all(colnames(df.cord), "_", ".")

# convert cn's to character
df.cord$cn <- as.character(df.cord$cn)

# simplify coordinate data
df.cord <- df.cord[,c("cn", "lat", "lon")]
colnames(df.cord)[1] <- "plt.cn"

# check out first few rows
head(df.cord)
##           plt.cn      lat       lon
## 1 35374046010690 40.59182 -108.8127
## 2 35374047010690 40.57703 -108.1593
## 3 35374048010690 40.57548 -107.7195
## 4 35374053010690 40.52328 -107.8978
## 5 35374054010690 40.53379 -107.5385
## 6 35374060010690 40.45123 -108.9959

Summary of fields:

The only fields that matter are:

  • plt.cn is the plot sequence number
  • lat is the fuzzed plot center latitude
  • lon is the fuzzed plot center longitude

Visual summary

# plot the data out
par(mar = c(5,5,1,1))
par(las = 1)
plot(lat ~ lon,
     data = df.cord,
     xlab = "longitude",
     ylab = "latitude",
     pch = 16,
     cex = 0.05)

Table #7. Predictors tables

Due to slightly different data extents and pixel alignments, and due to R’s finicky need to have all raster layers align perfectly, I delivered the massive batch of predictor layers in six different chunks. Erin took each of the subplots that overlaps our AOI and extracted spatially-overlapping pixel values for each subplot. Each chunk has its own table, so I will first merge them together.

# read in tables
df.pred.1 <- read.csv("Extracted_data/chunk_1.csv")
df.pred.2 <- read.csv("Extracted_data/chunk_2.csv")
df.pred.3 <- read.csv("Extracted_data/chunk_3.csv")
df.pred.4 <- read.csv("Extracted_data/chunk_4.csv")
df.pred.5 <- read.csv("Extracted_data/chunk_5.csv")
df.pred.6 <- read.csv("Extracted_data/chunk_6.csv")

# merge predictors together
df.pred <- merge(df.pred.1, df.pred.2)%>%
  merge(df.pred.3) %>%
  merge(df.pred.4) %>%
  merge(df.pred.5) %>%
  merge(df.pred.6)
  
# clean up column names
colnames(df.pred) <- tolower(colnames(df.pred))
colnames(df.pred) <- str_replace_all(colnames(df.pred), "_", ".")

# convert cn's to character
df.pred$plt.cn <- as.character(df.pred$plt.cn)

# check out first few rows and columns
head(df.pred[,1:10])
##           plt.cn subp measyear    id clim.20m.ahm clim.20m.bffp clim.20m.cmd
## 1 14546577020004    1     2017 34150          510           143          733
## 2 14546577020004    2     2017 34151          511           143          733
## 3 14546577020004    3     2017 34152          510           143          733
## 4 14546577020004    4     2017 34153          512           143          734
## 5 14546578020004    1     2019 33362          486           142          656
## 6 14546578020004    2     2019 33363          491           142          661
##   clim.20m.cmd.at clim.20m.cmd.sm clim.20m.cmd.sp
## 1             131             423             179
## 2             131             423             179
## 3             131             423             179
## 4             132             423             179
## 5             110             391             155
## 6             111             393             157

Summary of fields:

  • plt.cn is the plot sequence number
  • subp is the subplot number (1-4)
  • measyear is the measurement year
  • id is a unique identifier for each unique plot-subplot combination (i.e. each point)

…And then there are a whole bunch of predictors (I’ve only shown a few here)

Summary stats

  • There are 73152 records in this table
  • There are 626 columns in this table
  • There are 73152 total unique subplot records in this table from 18291 total unique plot records.
  • Of the 18291 total plots, 18289 are within the list of 18289 plots within the AOI
  • Of the 73152 total plots, 73150 are within the list of plots within the AOI