Modeling the effects of interspecies competition on the growth of trees

Monday, December 4th, 2017

What variables are being collected?

Data source: 2008 & 2014 tree censuses

Forestry variables

Diameter at breast height (DBH) used for "growth"
DBH -> basal area -> proxy for biomass
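
As a rough illustration of the DBH -> basal area step, the sketch below treats the stem as a circle. It assumes DBH is recorded in centimetres, which the slides do not state; the function name `basal_area_m2` is hypothetical.

```python
import math

def basal_area_m2(dbh_cm: float) -> float:
    """Cross-sectional stem area in m^2, assuming a circular stem.

    ASSUMPTION: dbh_cm is the diameter at breast height in centimetres.
    """
    radius_m = dbh_cm / 100.0 / 2.0
    return math.pi * radius_m ** 2

area_20 = basal_area_m2(20.0)
area_40 = basal_area_m2(40.0)
# Basal area scales with the square of the diameter.
ratio = area_40 / area_20
print(area_20, ratio)
```

Because area grows with the square of the diameter, a few large neighbors can outweigh many small ones once competitors are summarized by biomass.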

Outcome variable \(y\)

Observed average annual growth of trees 2008-2014
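
One plausible way to compute this outcome from the two censuses is sketched below. It assumes growth is the change in DBH divided by the six years between censuses; the slides do not spell out the exact definition, and `average_annual_growth` is a hypothetical name.

```python
def average_annual_growth(dbh_2008: float, dbh_2014: float, years: float = 6.0) -> float:
    """Average yearly change in DBH between the two censuses.

    ASSUMPTION: growth is measured on the DBH scale over a 6-year interval.
    """
    return (dbh_2014 - dbh_2008) / years

# Illustrative values only, not taken from the census data.
growth = average_annual_growth(20.0, 21.2)
print(round(growth, 3))
```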

Covariates

Biomass & number of "nearby" competitors (< 7.5 m)
Presence of spatial autocorrelation
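
The "nearby competitor" covariates could be computed along the following lines. This is a minimal sketch: it assumes planar tree coordinates in metres and one biomass value per tree, and `competitor_summary` is a hypothetical helper rather than the code behind the talk.

```python
import numpy as np

def competitor_summary(xy, biomass, radius=7.5):
    """Count and total biomass of each tree's competitors within `radius` metres."""
    xy = np.asarray(xy, dtype=float)
    biomass = np.asarray(biomass, dtype=float)
    # Pairwise Euclidean distances between all trees (n x n matrix).
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A competitor is any *other* tree strictly closer than `radius`.
    is_comp = (dist < radius) & ~np.eye(len(xy), dtype=bool)
    n_comp = is_comp.sum(axis=1)
    total_bm = is_comp.astype(float) @ biomass
    return n_comp, total_bm

# Illustrative layout: trees 0 and 1 are 5 m apart, tree 2 is isolated.
xy = [(0.0, 0.0), (3.0, 4.0), (20.0, 0.0)]
biomass = [1.0, 2.0, 4.0]
n_comp, total_bm = competitor_summary(xy, biomass)
print(n_comp, total_bm)
```

The full distance matrix needs memory quadratic in the number of trees; for a large plot a spatial index would be the more practical choice.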

Model for growth

We model average annual growth \(y_{ij}\) of tree \(i\) of species \(j\) for \(j=1,\ldots,k\).

\[ \begin{align} y_{ij} &= \beta_{0,j} + \left\{\text{focal info}\right\} + \left\{\text{competitor info}\right\} + \epsilon_{ij}\\ &= \beta_{0,j} + \left\{\beta_{\text{dbh},j}\text{dbh}_{ij}\right\} + \left\{\text{competitor info}\right\} + \epsilon_{ij} \end{align} \]

Two choices for \(\left\{\text{competitor info}\right\}\) where \(\text{BM}\) is biomass:

\[ \begin{array}{rr} \text{Choice 1:} & \beta_{\text{BM},j} \sum_{\text{comp trees}} \text{BM}\\ \text{Choice 2:} & \sum_{j'} \lambda_{j,j'} \sum_{\substack{\text{comp trees} \\ \text{of species} \\ j'}} \text{BM}\\ \end{array} \]

Competitor Info: First choice

Competitor Info: Second choice

Competitor Info

To model the competitive effect of neighboring trees on trees of species \(j\), should we

Lump all competitors together?
Distinguish between species?

That is, do we use \(k\) parameters or \(k \times k\) parameters?

\[ \begin{align} \boldsymbol{\beta} = \left(\beta_{\text{BM},1}, \ldots, \beta_{\text{BM},k}\right) \text{ vs. } \boldsymbol{\lambda} = \left( \begin{array}{ccc} \lambda_{1,1} & \ldots & \lambda_{1,k}\\ \vdots & \ddots & \vdots\\ \lambda_{k,1} & \ldots & \lambda_{k,k}\\ \end{array} \right) \end{align} \]
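
The two parameterizations are nested: setting \(\lambda_{j,j'} = \beta_{\text{BM},j}\) for every \(j'\) turns Choice 2 into Choice 1. The sketch below checks this numerically on made-up values. Species are coded as integers `0..k-1`, and `competitor_biomass_by_species` is a hypothetical helper.

```python
import numpy as np

def competitor_biomass_by_species(is_comp, biomass, species, k):
    """n x k matrix: entry (i, j') is the biomass of tree i's competitors of species j'."""
    onehot = np.eye(k)[species]  # n x k species indicator matrix
    return is_comp.astype(float) @ (onehot * biomass[:, None])

# Illustrative data: three trees that all compete with one another.
species = np.array([0, 1, 1])
biomass = np.array([1.0, 2.0, 4.0])
k = 2
is_comp = ~np.eye(3, dtype=bool)

B = competitor_biomass_by_species(is_comp, biomass, species, k)

beta = np.array([-0.1, -0.2])             # Choice 1: one parameter per focal species
lam = np.repeat(beta[:, None], k, axis=1)  # Choice 2 with every row held constant

choice_1 = beta[species] * B.sum(axis=1)
choice_2 = (lam[species] * B).sum(axis=1)
print(B, choice_1, choice_2)
```

With a constant row in \(\boldsymbol{\lambda}\) the two terms agree, so Model 2 can only differ from Model 1 when competitor species genuinely differ in their effect.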

Model selection

Fit Model 1 with \(\beta_{\text{BM},j}\) and Model 2 with \(\lambda_{j,j'}\)
Make predictions \(\widehat{y}_{ij}\) using both models
See if \(\text{MSE}_2 < \text{MSE}_1\), where \(\text{MSE}_m\) compares Model \(m\)'s predictions \(\widehat{y}_{ij}\) with the observed \(y_{ij}\)
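
The comparison step can be sketched as follows. The observed and predicted values are made up for illustration and are not results from the talk.

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between predictions and observations."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((y_hat - y) ** 2))

# Illustrative numbers only.
y = [0.1, 0.2, 0.3]
y_hat_1 = [0.2, 0.2, 0.2]    # hypothetical predictions from Model 1
y_hat_2 = [0.1, 0.25, 0.3]   # hypothetical predictions from Model 2

mse_1 = mse(y_hat_1, y)
mse_2 = mse(y_hat_2, y)
prefer_model_2 = mse_2 < mse_1
print(mse_1, mse_2, prefer_model_2)
```

The square root of each value gives an RMSE, which is on the same scale as \(y_{ij}\).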

Two components

Bayesian hierarchical models for posterior predictions
Spatial crossvalidation

Hierarchical models in a nutshell

Intuition: Posterior means

Prior \(\mu \sim\text{Normal}(\mu_0, \sigma^2_0)\)
Observations \(y_i \sim \text{Normal}(\mu, \sigma^2)\) then

\[ \begin{align} \mathbb{E}[\mu|y_1,\ldots,y_n] &= \left(\frac{\frac{1}{\sigma_0^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right) \mu_0 + \left(\frac{\frac{n}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right) \overline{y} \end{align} \]

Along these lines, the red oaks (smaller \(n\)) will "borrow" information from black oaks (larger \(n\)) via \(\mu_{\text{oaks}}\).
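
To make the "borrowing" concrete, the sketch below evaluates the posterior mean of the normal-normal model with known variances, where the prior mean gets weight proportional to the prior precision \(1/\sigma_0^2\) and the sample mean gets weight proportional to the data precision \(n/\sigma^2\). All numbers are made up for illustration.

```python
def posterior_mean(mu0, sigma0_sq, ybar, sigma_sq, n):
    """Precision-weighted average of the prior mean and the sample mean."""
    prior_precision = 1.0 / sigma0_sq
    data_precision = n / sigma_sq
    w_prior = prior_precision / (prior_precision + data_precision)
    return w_prior * mu0 + (1.0 - w_prior) * ybar

# Illustrative values: prior mean 0.3, sample mean 0.4.
small_n = posterior_mean(mu0=0.3, sigma0_sq=0.01, ybar=0.4, sigma_sq=0.04, n=4)
large_n = posterior_mean(mu0=0.3, sigma0_sq=0.01, ybar=0.4, sigma_sq=0.04, n=100)
print(small_n, large_n)
```

With \(n = 4\) the estimate sits halfway between the prior mean and the sample mean; with \(n = 100\) it is pulled almost entirely to the sample mean. A species with few trees therefore leans more heavily on the shared group-level mean.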

Posterior predictive distribution

To generate posterior predictions \(\widehat{y}_{ij}\) of \(y_{ij}\)

\[ \begin{align} p\left(\widehat{y}_{ij}\left|\boldsymbol{y}\right.\right) &= \int_{\boldsymbol{\lambda}}\int_{\boldsymbol{\beta}} p\left(\widehat{y}_{ij}\left|\boldsymbol{\beta}, \boldsymbol{\lambda},\boldsymbol{y}\right.\right) \times p\left(\boldsymbol{\beta}, \boldsymbol{\lambda}\left|\boldsymbol{y}\right.\right) d\boldsymbol{\beta}d\boldsymbol{\lambda} \end{align} \]

where samples from \(p\left(\boldsymbol{\beta}, \boldsymbol{\lambda}\left|\boldsymbol{y}\right.\right)\) are generated via Hamiltonian Monte Carlo (Stan, through the RStan interface).
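
In practice the integral is not evaluated analytically. Given \(S\) posterior draws, a standard Monte Carlo approximation (with \(S\) and the superscript \((s)\) introduced here for illustration) is

\[ \begin{align} p\left(\widehat{y}_{ij}\left|\boldsymbol{y}\right.\right) &\approx \frac{1}{S}\sum_{s=1}^{S} p\left(\widehat{y}_{ij}\left|\boldsymbol{\beta}^{(s)}, \boldsymbol{\lambda}^{(s)},\boldsymbol{y}\right.\right), \qquad \left(\boldsymbol{\beta}^{(s)}, \boldsymbol{\lambda}^{(s)}\right) \sim p\left(\boldsymbol{\beta}, \boldsymbol{\lambda}\left|\boldsymbol{y}\right.\right) \end{align} \]

Equivalently, drawing one \(\widehat{y}_{ij}\) from the model for each posterior draw yields a sample from the posterior predictive distribution.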

Two components:

~~Bayesian hierarchical model posterior predictions~~
Spatial crossvalidation

Recall: Spatial autocorrelation

Covariates: Spatial distribution of species and biomass
Residuals: Elevation and sunlight not measured

From Roberts (2017)

Spatial crossvalidation
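
One common way to set up spatial crossvalidation is to hold out whole spatial blocks rather than individual trees, so that test trees are not surrounded by training neighbors. The sketch below is an assumption about the general approach, not the exact scheme used in the talk: `spatial_block_folds`, the square grid, and the block size are all hypothetical, and buffer zones between blocks are omitted.

```python
import numpy as np

def spatial_block_folds(xy, block_size, n_folds, seed=0):
    """Assign each tree to a CV fold so that trees in one grid block share a fold."""
    xy = np.asarray(xy, dtype=float)
    # Integer grid-cell index of each tree.
    cells = np.floor((xy - xy.min(axis=0)) / block_size).astype(int)
    # Map each distinct cell to a single block id.
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    n_blocks = int(block_id.max()) + 1
    # Randomly spread the blocks across the folds.
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(n_blocks) % n_folds
    return fold_of_block[block_id]

# Illustrative coordinates: three well-separated pairs of trees.
xy = [(0, 0), (1, 1), (30, 0), (31, 1), (0, 30), (1, 31)]
folds = spatial_block_folds(xy, block_size=10.0, n_folds=3)
print(folds)
```

Each model is then fit on the trees outside a fold and evaluated on the trees inside it, which gives an error estimate less inflated by spatial autocorrelation.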

Results: Overall RMSE

Note: Quartiles of \(y_{ij}\): 0.053 / 0.122 / 0.249.

RMSE_1	RMSE_2
0.161	0.155

Results: RMSE split by species.

species	RMSE_1	RMSE_2	mean_growth
Red Oak	0.230	0.238	0.386
Black/Red Oak hybrid	0.220	0.217	0.331
Red Maple	0.196	0.180	0.227
Pignut Hickory	0.191	0.185	0.245
Black Oak	0.187	0.187	0.292
White Oak	0.182	0.181	0.242
Sassafras	0.160	0.170	0.269
Black Cherry	0.131	0.130	0.128
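
A quick tally of the reported RMSE values, using only the numbers in the two results tables:

```python
# RMSE values copied from the results tables: (RMSE_1, RMSE_2).
overall = (0.161, 0.155)
by_species = {
    "Red Oak": (0.230, 0.238),
    "Black/Red Oak hybrid": (0.220, 0.217),
    "Red Maple": (0.196, 0.180),
    "Pignut Hickory": (0.191, 0.185),
    "Black Oak": (0.187, 0.187),
    "White Oak": (0.182, 0.181),
    "Sassafras": (0.160, 0.170),
    "Black Cherry": (0.131, 0.130),
}

# Relative reduction in overall RMSE when moving from Model 1 to Model 2.
relative_reduction = (overall[0] - overall[1]) / overall[0]

model_2_lower = [s for s, (r1, r2) in by_species.items() if r2 < r1]
model_1_lower = [s for s, (r1, r2) in by_species.items() if r1 < r2]
ties = [s for s, (r1, r2) in by_species.items() if r1 == r2]

print(f"{relative_reduction:.1%}", model_2_lower, model_1_lower, ties)
```

Model 2 has the lower RMSE for five of the eight species, Model 1 for two (Red Oak and Sassafras), and the two tie for Black Oak. The overall reduction is about 3.7%.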