BWA Infestation Scoring

Introduction

I thought I would put together an R Markdown document as a convenient way to start sharing my work on the preliminary field data for the BWA project. My goal here to be transparent about how I’m conducting various components of the analysis, and share the raw(-ish) code with you so you can take a look, comment, make suggestions, etc.

Reading in the data

To be 100% transparent, I’ve already done some behind the scenes data cleanup stuff that I don’t think warrants sharing in an open environment like this. That said, if you want to know the nitty-gritty details of that process, I’m happy to put together another Markdown for data cleanup. The result of that data cleanup process is a single table, where plot-level data have been joined with tree-level data (flat file, rather than relational database). So, in the table below, you’ll see both plot-level stuff (repeated for each plot as many times as there are trees in the plot), and tree-level stuff (one row for each tree). Feel free to explore how I’ve distilled down the most important information into a single table below.

library(rmarkdown)

# set working directory
setwd("S:\\ursa\\campbell\\bwa\\data\\field_data\\field_data\\processed_data\\download_202110121039\\a03_csv")

# read in and display dataset
df <- read.csv("bwa_field_data_plots_20211013_cleaned.csv")
paged_table(df)

So, that brings us to our first interesting statistic of the day… Not including the drone trees, we have measured 2315 trees! And, our slightly less interesting second statistic is the number of plots we’ve collected: 31. We’ll get there…

Removing old dead trees

The ultimate goal of this script is to generate per-tree and per-plot infestation scores. Since we don’t have any infestation data for old dead trees, let’s just start by removing these. Again, we can discuss how they might be useful down the road, but at least for preliminary purposes, I don’t see a huge need.

# remove old dead trees
df <- df[df$tree.status != "old_dead",]

Minus dead trees, we have 2008 trees left over.

Some other fun descriptive statistics

Since we’re in R world, it’s super easy to generate some quick visualizations and descriptive statistics that provide us with a baseline understanding of our dataset. So, why the hell not?

# bar plot of trees by species
par(mar = c(5,5,1,1))
sp.counts <- table(df$tree.species)
barplot(sp.counts, las = 2)

# histogram of diameters
hist(df$tree.dbh, las = 1, xlab = "DBH (cm)", main = NULL)
mn <- round(mean(df$tree.dbh),1)
md <- round(median(df$tree.dbh),1)
abline(v = md, col = "blue", lwd = 2, lty = 2)
abline(v = mn, col = "red", lwd = 2, lty = 2)
legend("topright", legend = c(paste0("median (", md, " cm)"),
                              paste0("mean (", mn, " cm)")),
       col = c("blue", "red"),
       lwd = 2, lty = 2)

# table of median dbh by species
dbh.sp.mn <- aggregate(df$tree.dbh, list(df$tree.species), median)
colnames(dbh.sp.mn) <- c("species", "mean.dbh")
paged_table(dbh.sp.mn)

# bar plot of bwa scores
bwa.counts <- table(df$tree.bwa.score)
barplot(bwa.counts, las = 1, xlab = "BWA Scores")

Basal Area Calculation

One of our underlying assumptions (or maybe hypotheses is a better way to put it) has been that trees with larger DBHs (greater basal area) will have a greater effect on the spectral signal as compared to smaller trees. So, to generate infestation scores requires basal area calculations first. I’m going to do a few things here: (1) calculate basal area per tree; (2) calculate total basal area per plot; and then (3) calculate proportional basal area for each tree within each plot.

There are a TON of different ways to approach this infestation scoring process… Addition versus taking means. Raw scores versus normalized scores. Etc. Etc. Etc. In the end, I’m not super sure that it matters all that much – it will just affect the scale of the final scores, not necessarily the relative differences between scores. But, I think it’s worth having a detailed discussion about this at some point. In any case, this is my thinking here… If you have an infestation score, rather than weighting it by the raw basal area (e.g. 3.5 m^2), you can weight it by the proportion of basal area of that tree within the entire plot (e.g. 2%).

# calculate basal area per tree (square meters)
df$tree.ba <- (pi * (df$tree.dbh / 2) ^ 2) / 10000

# calculate basal area per plot (square meters per hectare)
plot.ba <- tapply(df$tree.ba, df$plot.id, sum)
df$plot.ba.sum <- rep(plot.ba, rle(df$plot.id)$lengths)
df$plot.ba.dens <- df$plot.ba.sum / 0.0706858347058

# calculate proportional basal area per tree
df$tree.ba.norm <- df$tree.ba / df$plot.ba.sum

# print out subset of the table
sub.cols <- c("plot.id", "plot.ba.sum", "tree.id", "tree.species", "tree.dbh", "tree.ba", "tree.ba.norm")
paged_table(df[,sub.cols])

Normalizing SCE

Very minor step here, but in keeping with the idea of normalization – that is, getting everything on the same scale, from 0 to 1 – I’m now going to convert SCE (which natively ranges from 0 to 100) to 0 to 1.

# calculate normalized sce
df$tree.sce.norm <- df$tree.sce / 100

Normalizing all of the infestation/damage metrics

Now onto some (slightly) more interesting stuff… As with SCE and basal area, I think it makes sense to normalize the individual damage/infestation metrics. At this point there’s some variability in terms of how these metrics are quantitatively stored…

par(mfrow = c(3,3))
par(mar = c(5,5,1,1))

# wool dens
tab.bwa.dens <- table(df$tree.bwa.dens)
barplot(tab.bwa.dens, las = 1, main = "Wool Dens")

# gout sev
tab.gout.sev <- table(df$tree.gout.sev)
barplot(tab.gout.sev, las = 1, main = "Gout Sev")

# crown def
tab.crown.def <- table(df$tree.crown.def)
barplot(tab.crown.def, las = 1, main = "Crown Def")

# dead top
tab.dead.top <- table(df$tree.top)
barplot(tab.dead.top, las = 1, main = "Dead Top")

# branch die
tab.branch.die <- table(df$tree.branch.die)
barplot(tab.branch.die, las = 1, main = "Branch Die")

# bwa score
tab.bwa.score <- table(df$tree.bwa.score)
barplot(tab.bwa.score, las = 1, main = "BWA Score")

# oth score
tab.oth.score <- table(df$tree.oth.score)
barplot(tab.oth.score, las = 1, main = "Other Score")

Most variables range from 0 to 3, but then there’s dead top (yes/no or 0/1) and branch dieback (0 to 100). So, I think it’s worthwhile to convert everything to a 0 to 1 scale. I’ll start with the non-gut-feel variables:

# create normalized scores (range from 0 to)
df$tree.bwa.dens.norm <- df$tree.bwa.dens / 3
df$tree.gout.sev.norm <- df$tree.gout.sev / 3
df$tree.top.norm <- 0
df$tree.top.norm[df$tree.top == "yes"] <- 1
df$tree.crown.def.norm <- df$tree.crown.def / 3
df$tree.branch.die.norm <- df$tree.branch.die / 100

Now, onto the gut-feel variables. Things get a little more sticky here, I think. We’ve discussed this in the field, but for the sake of posterity, let’s have the discussion again. We need, on some level, to be able to distinguish the effects of BWA versus other agents. So, let’s say a tree has some crown deformities, a dead top, and high branch dieback, but they’re all driven by some other agent (e.g. crack frost), and there is no evidence of BWA. Or, if a tree has severe symptoms, but is likely equally due to, say, bark beetles and BWA. We need to figure out how to handle that. My suggestion a while back was to do a pretty simple equation:

\(DS = BS + OS\)

\(BS_{normalized} = \frac{BS}{DS}\)

\(OS_{normalized} = \frac{OS}{DS}\)

where BS is our gut-feel BWA score, OS is our gut-feel other agent score, and DS is the combined damage score. The good thing here is that it allows for relative contributions of these damage symptoms. For example, if a tree had a BS of 3 and an OS of 1, then the normalized BS would be 0.75, which is to say that 3/4 of the damage seen is due to BWA. I think that’s a fair way to score things. The down side, however, is that if there is only one agent present (e.g. just BWA, no other agents), then it doesn’t really “care” if it’s severe or light. For example, if a tree had a BS of 3 and an OS of 0, then the normalized BS would be 1. Likewise, if a tree had a BS of 1 and an OS of 0, then the normalized BS would also be 1. One could argue that this is not a problem, however, since the infestation symptoms would correct for this. So, in effect, the BS and OS are only useful in terms of attributing the symptoms to an agent, and won’t be used otherwise. I think this is actually justifiable, since they are “gut feel” scores, and the symptom values are more quantitative. But, I’m open to more discussion here. Anyway, for now I’ll move forward with this approach:

# calculate normalized damage scores
df$tree.bwa.score.norm <- df$tree.bwa.score / (df$tree.bwa.score + df$tree.oth.score)
df$tree.oth.score.norm <- df$tree.oth.score / (df$tree.bwa.score + df$tree.oth.score)

Calculating per-tree symptom scores

The next big question is how to combine the individual symptom scores into a single, tree-level symptom score. The simplest solution (though perhaps not the best one?) would be to average them out. The pro here is conceptual simplicity. The con here is the assumption that all variables are equally representative of BWA infestation. We know, for example, that gouting is the most direct and perhaps most damaging symptom of BWA infestation. So, should it be weighted more heavily than, say, wool density, which as we’ve seen bears relatively little correspondence to apparent tree health? Another point for discussion at some point. But, for the sake of simplicity, let’s just assume that all of these variables should be treated equally.

# calculate a mean "symptom score"
mean.cols <- c("tree.bwa.dens.norm",
               "tree.gout.sev.norm",
               "tree.top.norm",
               "tree.crown.def.norm",
               "tree.branch.die.norm")
df$tree.sympt.score <- rowMeans(df[,mean.cols], na.rm = T)

# take a look at the distributions
hist(df$tree.sympt.score, las = 1, main = NULL, xlab = "Symptom Score")

…Looks about how we’d expect (mostly low/no symptoms, with a decreasing tail of higher symptoms).

We now have a symptom score (tree.sympt.score), independent of any quantification of what might have caused those symptoms. So the next step is to take our normalized BWA score (tree.bwa.score.norm) and multiply it by our symptom scores to get per-tree, BWA-driven symptom scores (tree.sympt.score.bwa). While we’re at it, we can also generate some additional, weighted scores. Recall that, at this point, we have done nothing to weight these scores by basal area or by SCE. So, in addition to the unweighted BWA symptom score, we can generate three additional symptom scores:

BWA-driven symptoms weighted by basal area (tree.sympt.score.bwa.ba)
BWA-driven symptoms weighted by SCE (tree.sympt.score.bwa.sce)
BWA-driven symptoms weighted by basal area and SCE (tree.sympt.score.bwa.ba.sce)

# get bwa-driven symptom scores
df$tree.sympt.score.bwa <- df$tree.sympt.score * df$tree.bwa.score.norm
df$tree.sympt.score.bwa.ba <- df$tree.sympt.score.bwa * df$tree.ba.norm
df$tree.sympt.score.bwa.sce <- df$tree.sympt.score.bwa * df$tree.sce.norm
df$tree.sympt.score.bwa.ba.sce <- df$tree.sympt.score.bwa.ba * df$tree.sce.norm

Let’s take a look at the ranges of these values:

# plot histograms
par(mfrow = c(2,2))
par(mar = c(5,5,1,1))
hist(df$tree.sympt.score.bwa, main = "Unweighted", xlab = "Score", las = 1)
hist(df$tree.sympt.score.bwa.ba, main = "BA-Weighted", xlab = "Score", las = 1)
hist(df$tree.sympt.score.bwa.sce, main = "SCE-Weighted", xlab = "Score", las = 1)
hist(df$tree.sympt.score.bwa.ba.sce, main = "BA & SCE-Weighted", xlab = "Score", las = 1)

Notice how the values of the two basal area-weighted scores get majorly-reduced. This is because, again, the basal areas have been normalized to represent the proportional, per-plot basal area of each tree. So, the numbers are very low (e.g. 0.01). Thus, when multiplying through, this drastically reduces the resulting scores. Is this a problem?? I don’t really know. I think that, relatively speaking, the numbers aren’t badly affected, but it just becomes difficult to understand what these values actually mean anymore (e.g. “what does a BA-weighted symptom score of 0.02 mean? Is that severe? Moderate? etc.”). Again, more fodder for discussion.

For now, let’s just move forward with this system. Very easy to adjust later if/as needed. I always like data visualizations, so let’s see how these various unweighted and weighted symptom scores compare to one another.

pairs(df[,c("tree.sympt.score.bwa",
            "tree.sympt.score.bwa.ba",
            "tree.sympt.score.bwa.sce",
            "tree.sympt.score.bwa.ba.sce")])

Quite a lot of variability!! Just goes to show you how significant the effects of massaging your data are on the story being told in the long run.

Calculating per-plot symptom scores

The last step in this process is to summarize the per-tree data at the plot-level. Once again, there are several ways to approach this… Do you add the tree scores together per plot? Take a mean? I think the answer for these variables depends on which variable we’re talking about.

Variable 1: Unweighted (tree.sympt.score.bwa)
- Again, this represents the average symptom score (wool density, gout, etc.) per tree, multiplied by the normalized BWA gut score, which should theoretically account for trees with symptoms that are due to non-BWA agents. I think the best approach here is to take an average per plot. In essence, this would tree all trees equally, regardless of size or synoptic, and take the average BWA-driven symptom score across all trees in the plot. So, the average makes sense here.
Variable 2: Basal Area-Weighted (tree.sympt.score.bwa.ba)
- Here, I would actually add the scores together, instead of taking a mean. Why? Because by multiplying by the proportional basal area each tree represents, we’re already taking the full, plot-level basal area, and subdividing it by tree. Therefore, to get a plot-level estimate, I think it makes sense to un-subdivide the symptom scores. Think about it this way – if you took a mean, and you had 100 evenly-sized trees in a plot, each would have a proportional basal area of 0.01 (1%), giving each score a relatively low value. Whereas if you had 10 evenly-sized trees in a plot, each would have propotional basal areas of 0.1 (10%), givin each score a relatively high value. So, even if those two plots had the same level of infestation/symptom manifestation, the results would be very different if you took a plot-level mean. If you add them together, however, you’d see that they were the same. So, I think the best approach here is to take a sum per plot.
Variable 3: SCE-Weighted (tree.sympt.score.bwa.sce)
- Here, I would go back to the mean. All we’re doing by adding in the SCE is saying how much effect this tree might have on the spectral signal in the end. So, it’s just increasing the weight of exposed trees and decreasing the weight of suppressed trees. We’re not trying to say what proportion of the plot is occupied by a given tree’s crown in the canopy (in which case I would suggest addition, as with basal area). We’re just trying to augment the per-tree estimates to downweight those trees that aren’t seen from above. So, I think the best approach here is to take a mean per plot.
Variable 4: Basal Area- and SCE-Weighted (tree.sympt.score.bwa.ba.sce)
- Since we’re back to the basal area weighting, I think we should likewise be back in the realm of addition, rather than averaging. So, I think the best approach here is to take a sum per plot.

Enough talk, let’s do it!

# create summary df
plot.sympt.score.bwa <- aggregate(df$tree.sympt.score.bwa, list(df$plot.id), mean, na.rm = T)
plot.sympt.score.bwa.ba <- aggregate(df$tree.sympt.score.bwa.ba, list(df$plot.id), sum, na.rm = T)
plot.sympt.score.bwa.sce <- aggregate(df$tree.sympt.score.bwa.sce, list(df$plot.id), mean, na.rm = T)
plot.sympt.score.bwa.ba.sce <- aggregate(df$tree.sympt.score.bwa.ba.sce, list(df$plot.id), sum, na.rm = T)

# change the column names to something meaningful
colnames(plot.sympt.score.bwa) <- c("plot.id", "sympt.score.bwa")
colnames(plot.sympt.score.bwa.ba) <- c("plot.id", "sympt.score.bwa.ba")
colnames(plot.sympt.score.bwa.sce) <- c("plot.id", "sympt.score.bwa.sce")
colnames(plot.sympt.score.bwa.ba.sce) <- c("plot.id", "sympt.score.bwa.ba.sce")

# merge them together into a single table
summary.df <- merge(plot.sympt.score.bwa, plot.sympt.score.bwa.ba)
summary.df <- merge(summary.df, plot.sympt.score.bwa.sce)
summary.df <- merge(summary.df, plot.sympt.score.bwa.ba.sce)

# take a look at the table
paged_table(summary.df)

colnames(summary.df)

## [1] "plot.id"                "sympt.score.bwa"        "sympt.score.bwa.ba"    
## [4] "sympt.score.bwa.sce"    "sympt.score.bwa.ba.sce"

# compare the resulting plot-level scores graphically
pairs(summary.df[,c("sympt.score.bwa",
                    "sympt.score.bwa.ba",
                    "sympt.score.bwa.sce",
                    "sympt.score.bwa.ba.sce")])

Interesting… So, once you aggregate at the plot level, some of those individual tree-level differences get washed out. But, there’s definitely some variability in there!

Create a final data table

With plot-level scores calculated, let’s merge these numbers back with the original, plot-level data (e.g. plot names, dates, coordinates, notes, etc.).

# reduce original data frame to just plot-level data, and get rid of tree-level repetition
plot.df <- unique(df[,1:12])

# join the summary stats to the plot-level data
summary.df <- merge(plot.df, summary.df)
paged_table(summary.df)

# export to csv
write.csv(summary.df, "plot_level_summaries_20211015.csv", row.names = F)

Map it out!

We’re geographers, why not? It never hurts to see how things look spatially, particularly given our knowledge of what areas had high/low infestation. Plus, we’re in R Markdown land, which is a fun place to mess with some mapping capabilities.

library(sp)

## Warning: package 'sp' was built under R version 4.0.3

# convert points to spatial points data frame and plot them
spdf <- SpatialPointsDataFrame(coords = summary.df[,c("longitude", "latitude")],
                               data = summary.df,
                               proj4string = CRS("+init=epsg:4326"))
plot(spdf)

…Looks about right, but not that useful. Let’s spice things up with leaflet.

library(leaflet)

## Warning: package 'leaflet' was built under R version 4.0.5

library(viridis)

## Loading required package: viridisLite

# create color palette
p <- colorNumeric(palette = magma(100), 
                    domain = c(0, max(c(spdf$sympt.score.bwa,
                                        spdf$sympt.score.bwa.ba,
                                        spdf$sympt.score.bwa.sce,
                                        spdf$sympt.score.bwa.ba.sce))))

# create leaflet map -- unweighted
m1 <- leaflet(spdf) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(radius = 8,
                   color = ~p(sympt.score.bwa),
                   stroke = F,
                   fillOpacity = 0.75) %>%
  addLegend("topright",
            pal = p,
            values = ~sympt.score.bwa,
            title = "Unweighted Score",
            opacity = 1)
m1

# create leaflet map -- ba-weighted
m2 <- leaflet(spdf) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(radius = 8,
                   color = ~p(sympt.score.bwa.ba),
                   stroke = F,
                   fillOpacity = 0.75) %>%
  addLegend("topright",
            pal = p,
            values = ~sympt.score.bwa.ba,
            title = "BA-Weighted Score",
            opacity = 1)
m2

# create leaflet map -- sce-weighted
m3 <- leaflet(spdf) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(radius = 8,
                   color = ~p(sympt.score.bwa.sce),
                   stroke = F,
                   fillOpacity = 0.75) %>%
  addLegend("topright",
            pal = p,
            values = ~sympt.score.bwa.sce,
            title = "SCE-Weighted Score",
            opacity = 1)
m3

# create leaflet map -- ba- and sce-weighted
m4 <- leaflet(spdf) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(radius = 8,
                   color = ~p(sympt.score.bwa.ba.sce),
                   stroke = F,
                   fillOpacity = 0.75) %>%
  addLegend("topright",
            pal = p,
            values = ~sympt.score.bwa.ba.sce,
            title = "BA- and SCE-Weighted Score",
            opacity = 1)
m4

…Looks about right to me!

Conclusions

Well, as you can see, there are an awful lot of dials to tweak and turn. But, I think this is a solid starting point. I’ve already done some preliminary modeling stuff, but in the interest of full transparency, I’ll throw together another Markdown document when I get a chance.

If you’ve made it this far, you’ve spent too much time reading this. :)