I wanted to put together an R Markdown document to (1) maintain a degree of analytical openness between us and (2) have a second and third set of eyes on my process to ensure you two don’t see any red flags with my procedure. So, here goes…
First things first – I will load the necessary libraries and set the working directory. Very unexciting stuff.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(emo)
# set working directory
setwd("S:/ursa/campbell/bwa/data/fia/data_from_erin")
There are quite a few different tables to read in. Put another way, there are quite a few opportunities for me to mess something up. Each table brings something important to the… ahem… table, so basically the main point of this whole process is just figuring out how to strategically join these tables together. Here I’ll read in the tables one by one, provide a glimpse of the data within the tables, and highlight the tables’ usefulness. And, for the sake of consistency (and because I just don’t like capital letters) I will slightly modify all of the column names.
This table contains “conditions” (essentially land cover types) found within each subplot within each plot in all of the states that overlap our study area (CO, ID, NV, UT, and WY). Modeling (so far) is done at the subplot level, so this will help refine which subplots should and shouldn’t be used in the modeling process.
# read in table
df.subp <- read.csv("BWA_SUBP_COND.csv")
# clean up column names
colnames(df.subp) <- tolower(colnames(df.subp))
colnames(df.subp) <- str_replace_all(colnames(df.subp), "_", ".")
# convert cn's to character
df.subp$plt.cn <- as.character(df.subp$plt.cn)
# check out first few rows
head(df.subp)
## plt.cn subp condid cond.status.cd lcc cond.nonsample.reasn.cd
## 1 188761891020004 1 1 2 17 NA
## 2 188761891020004 2 1 2 17 NA
## 3 188761891020004 3 1 2 17 NA
## 4 188761891020004 4 1 2 17 NA
## 5 188761892020004 1 1 1 55 NA
## 6 188761892020004 2 1 1 55 NA
## subpcond.prop
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
Summary of fields:
plt.cn is the plot sequence numbersubp is the subplot number (1-4)condid is the unique condition ID for that subplot,
which is basically just a way to track how many conditions are found
within a subplot and is useful for separating single-condition plots
(better reference data) from multi-condition plots (worse reference data
– think half forest/half grassland, for example).cond.status.cd is either:
1: accesible “forest” land (where “forest” >= 10%
tree cover)2: “non-forest” land (where “non-forest” < 10% tree
cover, but could still have tress present, a.k.a. “other wooded
lands”)3: noncensus water (small water bodies)4: census water (larger water bodies)5: nonsampled, possibility of forest land (could be
forest, but just wasn’t able to be accessed for some reason)lcc: live canopy cover (%), which is useful for
distinguishing between “other wooded lands”, where trees are present,
but no tree measurements are available (lcc between 1% and 9%), and
true absence subplots, such as grasslands,
shrublands, barren, developed, etc. (lcc == 0%)cond.nonsample.reasn.cd: if cond.status.cd
was 5 this is why. Not super important, since I’ll be
eliminating those subplots from consideration.subpcond.prop: proportion of the subplot that has a
given condition. So, if a subplot was half forest and half shrub, then
there would (theoretically) be two records for that subplot, each wiht
subpcond.prop equal to 0.5Summary stats
cond.status.cd, meaning they have multiple conditions,
leaving 328207 single condition subplots.This table contains general plot-level information, again for all plots within overlapping states.
# read in table
df.plot <- read.csv("BWA_PLOT_STATUS.csv")
# clean up column names
colnames(df.plot) <- tolower(colnames(df.plot))
colnames(df.plot) <- str_replace_all(colnames(df.plot), "_", ".")
# convert cn's to character
df.plot$plt.cn <- as.character(df.plot$plt.cn)
df.plot$prev.plt.cn <- as.character(df.plot$prev.plt.cn)
# check out first few rows
head(df.plot)
## plt.cn prev.plt.cn invyr statecd unitcd countycd plot plot.status.cd
## 1 37263834010690 <NA> 2010 35 3 51 86556 2
## 2 37258774010690 <NA> 2010 35 4 5 88590 2
## 3 37258810010690 <NA> 2010 35 4 5 93018 2
## 4 39636386010690 <NA> 2011 35 2 37 84800 3
## 5 39636388010690 <NA> 2011 35 2 37 90980 2
## 6 39636392010690 <NA> 2011 35 2 37 85665 2
## measyear
## 1 2010
## 2 2010
## 3 2010
## 4 2012
## 5 2011
## 6 2011
Summary of fields:
plt.cn is the plot sequence numberprev.plt.cn is the previous plot sequence number (if
multiple inventories are recorded at the same plot over time within the
same database)invyr is inventory year (importantly, not the same as
the measurement year)statecd is a unique identifier for each stateunitcd is a unique identifier for each survey unit
(groups of counties)countycd is a unique identifier for each countyplot is a unique identifier for each plot (though not
strictly speaking “unique” – has to be paired with other fields
to be truly unique)plot.status.cd is either:
1: sampled with at least one accessible forest land
condition present on the plot2: sampled with no accessible forest land condition
present on plot3: nonsampled.measyear is the year of measurementSummary stats
This table identifies only those plots that fall within the AOI (a bounding box surrounding a 200 km buffer from UWC and Ashely NFs) and will be used to subset both the plot-level and subplot-level data from the previous two tables.
# read in table
df.aoip <- read.csv("plots_within_AOI.csv")
# clean up column names
colnames(df.aoip) <- tolower(colnames(df.aoip))
colnames(df.aoip) <- str_replace_all(colnames(df.aoip), "_", ".")
# convert cn's to character
df.aoip$plt.cn <- as.character(df.aoip$plt.cn)
# check out first few rows
head(df.aoip)
## plt.cn measyea
## 1 352143526489998 2017
## 2 429485673489998 2019
## 3 40399059010690 2011
## 4 352144158489998 2017
## 5 429485755489998 2019
## 6 188777104020004 2015
Summary of fields:
plt.cn is the plot sequence numbermeasyear is the year of measurementSummary stats
This table contains summary structural data aggregated to the subplot level for each tree species measured for all plots within overlapping states.
# read in table
df.tree <- read.csv("tree_parameters_Campbell.csv")
# clean up column names
colnames(df.tree) <- tolower(colnames(df.tree))
colnames(df.tree) <- str_replace_all(colnames(df.tree), "_", ".")
# convert cn's to character
df.tree$plt.cn <- as.character(df.tree$plt.cn)
# check out first few rows
head(df.tree)
## plt.cn subp dstrbcd1 dstrbyr1 condid spcd statuscd tpa ba
## 1 188761909020004 2 0 NA 2 106 1 24.07 13.13
## 2 188761409020004 2 0 NA 1 122 1 48.14 84.93
## 3 188761410020004 4 12 9999 1 202 1 24.07 29.15
## 4 431917369489998 3 0 NA 1 746 1 299.86 17.81
## 5 431917370489998 2 0 NA 2 65 2 24.07 18.91
## 6 431917373489998 1 12 2014 1 93 1 923.66 27.92
Summary of fields:
plt.cn is the plot sequence numbersubp is the subplot number (1-4)dstrbcd1 is type of disturbance since the last
measurementdstrbdyr1 is the year of said disturbancecondid is the tree-level condition code – identifies
the condition on which the trees are locatedspcd is the FIA species codesstatuscd is either:
0: no status1: live tree (ultimately what I will filter to)2: dead tree3: removed treetpa is trees per acreba is basal area in square feet per acreSummary stats
This is just a FIA species code lookup table so I could work with species abbreviations rather than the numeric species codes. I’ll remove all of the unnecessary fields first.
# read in table
df.spcd <- read.csv("2021 Master Species FGver9-1_9_2021_final.csv")
# clean up column names
colnames(df.spcd) <- tolower(colnames(df.spcd))
colnames(df.spcd) <- str_replace_all(colnames(df.spcd), "_", ".")
# remove unnecessary fields
df.spcd <- df.spcd[,c("fia.code", "plants.code")]
# check out first few rows
head(df.spcd)
## fia.code plants.code
## 1 11 ABAM
## 2 12 ABBA
## 3 14 ABBR
## 4 15 ABCO
## 5 16 ABFR
## 6 17 ABGR
Summary of fields:
fia.code is numeric species codeplants.code is the USDA PLANTS database codeEven though they won’t be used in any sort of modeling context, there is still some value in having the fuzzed plot coordinates, for diagnostic and plotting purposes. To get these, I downloaded the full state-level plot tables from the FIA DataMart. So, this step involves merging them all together and simplifying them.
# read in state-level plot data to get fuzzed coordinates
df.cord.co <- read.csv("S:/ursa/campbell/bwa/data/fia/CO_PLOT.csv")
df.cord.id <- read.csv("S:/ursa/campbell/bwa/data/fia/ID_PLOT.csv")
df.cord.nv <- read.csv("S:/ursa/campbell/bwa/data/fia/NV_PLOT.csv")
df.cord.ut <- read.csv("S:/ursa/campbell/bwa/data/fia/UT_PLOT.csv")
df.cord.wy <- read.csv("S:/ursa/campbell/bwa/data/fia/WY_PLOT.csv")
df.cord <- rbind(df.cord.co,
df.cord.id,
df.cord.nv,
df.cord.ut,
df.cord.wy)
# clean up column names
colnames(df.cord) <- tolower(colnames(df.cord))
colnames(df.cord) <- str_replace_all(colnames(df.cord), "_", ".")
# convert cn's to character
df.cord$cn <- as.character(df.cord$cn)
# simplify coordinate data
df.cord <- df.cord[,c("cn", "lat", "lon")]
colnames(df.cord)[1] <- "plt.cn"
# check out first few rows
head(df.cord)
## plt.cn lat lon
## 1 35374046010690 40.59182 -108.8127
## 2 35374047010690 40.57703 -108.1593
## 3 35374048010690 40.57548 -107.7195
## 4 35374053010690 40.52328 -107.8978
## 5 35374054010690 40.53379 -107.5385
## 6 35374060010690 40.45123 -108.9959
Summary of fields:
The only fields that matter are:
plt.cn is the plot sequence numberlat is the fuzzed plot center latitudelon is the fuzzed plot center longitudeVisual summary
# plot the data out
par(mar = c(5,5,1,1))
par(las = 1)
plot(lat ~ lon,
data = df.cord,
xlab = "longitude",
ylab = "latitude",
pch = 16,
cex = 0.05)
Due to slightly different data extents and pixel alignments, and due to R’s finicky need to have all raster layers align perfectly, I delivered the massive batch of predictor layers in six different chunks. Erin took each of the subplots that overlaps our AOI and extracted spatially-overlapping pixel values for each subplot. Each chunk has its own table, so I will first merge them together.
# read in tables
df.pred.1 <- read.csv("Extracted_data/chunk_1.csv")
df.pred.2 <- read.csv("Extracted_data/chunk_2.csv")
df.pred.3 <- read.csv("Extracted_data/chunk_3.csv")
df.pred.4 <- read.csv("Extracted_data/chunk_4.csv")
df.pred.5 <- read.csv("Extracted_data/chunk_5.csv")
df.pred.6 <- read.csv("Extracted_data/chunk_6.csv")
# merge predictors together
df.pred <- merge(df.pred.1, df.pred.2)%>%
merge(df.pred.3) %>%
merge(df.pred.4) %>%
merge(df.pred.5) %>%
merge(df.pred.6)
# clean up column names
colnames(df.pred) <- tolower(colnames(df.pred))
colnames(df.pred) <- str_replace_all(colnames(df.pred), "_", ".")
# convert cn's to character
df.pred$plt.cn <- as.character(df.pred$plt.cn)
# check out first few rows and columns
head(df.pred[,1:10])
## plt.cn subp measyear id clim.20m.ahm clim.20m.bffp clim.20m.cmd
## 1 14546577020004 1 2017 34150 510 143 733
## 2 14546577020004 2 2017 34151 511 143 733
## 3 14546577020004 3 2017 34152 510 143 733
## 4 14546577020004 4 2017 34153 512 143 734
## 5 14546578020004 1 2019 33362 486 142 656
## 6 14546578020004 2 2019 33363 491 142 661
## clim.20m.cmd.at clim.20m.cmd.sm clim.20m.cmd.sp
## 1 131 423 179
## 2 131 423 179
## 3 131 423 179
## 4 132 423 179
## 5 110 391 155
## 6 111 393 157
Summary of fields:
plt.cn is the plot sequence numbersubp is the subplot number (1-4)measyear is the measurement yearid is a unique identifier for each unique plot-subplot
combination (i.e. each point)…And then there are a whole bunch of predictors (I’ve only shown a few here)
Summary stats