Monday, December 4th, 2017

What variables are being collected?

Data source: 2008 & 2014 tree censuses

Forestry variables

  • Diameter at breast height (DBH) used for "growth"
  • DBH -> basal area -> proxy for biomass
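
The DBH to basal area step can be sketched as follows. This is a minimal illustration that assumes DBH is recorded in centimeters and that the stem cross-section is treated as a circle; neither assumption is stated on the slide, and `basal_area_m2` is a hypothetical helper name.

```python
import math

def basal_area_m2(dbh_cm):
    """Cross-sectional stem area in square meters, treating the stem as a circle.

    Assumes DBH is given in centimeters (an assumption of this sketch).
    """
    radius_m = (dbh_cm / 100.0) / 2.0
    return math.pi * radius_m ** 2

# A hypothetical tree with a 30 cm DBH.
area = basal_area_m2(30.0)
print(round(area, 4))
```

Because basal area grows with the square of DBH, doubling the diameter quadruples the biomass proxy.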

Outcome variable \(y\)

Observed average annual growth of trees 2008-2014
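
One plausible way to compute this outcome, assuming growth is the change in DBH between the two censuses divided by the six elapsed years (the slides do not spell out the formula, and `average_annual_growth` is a hypothetical name):

```python
def average_annual_growth(dbh_2008, dbh_2014):
    """Average yearly change in DBH between the 2008 and 2014 censuses.

    Assumes exactly 6 years between measurements (an assumption of this sketch).
    """
    years = 2014 - 2008
    return (dbh_2014 - dbh_2008) / years

# Hypothetical tree: DBH of 20.0 in 2008 and 21.2 in 2014.
growth = average_annual_growth(20.0, 21.2)
```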

Covariates

  • Biomass & number of "nearby" competitors (< 7.5 m)
  • Presence of spatial autocorrelation
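
A minimal sketch of the competitor covariates, assuming each tree has planar coordinates in meters and a biomass value. The function name `competitor_summary` and the toy data are hypothetical; only the 7.5 m radius comes from the slide.

```python
import math

def competitor_summary(xy, biomass, radius=7.5):
    """For each focal tree, count the other trees strictly closer than
    `radius` meters and sum their biomass."""
    counts, totals = [], []
    for i, (xi, yi) in enumerate(xy):
        count, total = 0, 0.0
        for j, (xj, yj) in enumerate(xy):
            if i == j:
                continue  # a tree is not its own competitor
            if math.hypot(xi - xj, yi - yj) < radius:
                count += 1
                total += biomass[j]
        counts.append(count)
        totals.append(total)
    return counts, totals

# Three hypothetical trees: the first two are 5 m apart, the third is far away.
xy = [(0.0, 0.0), (3.0, 4.0), (20.0, 0.0)]
biomass = [1.0, 2.0, 4.0]
n_comp, comp_biomass = competitor_summary(xy, biomass)
```

The double loop is quadratic in the number of trees; for a full census a spatial index would likely be preferable, but the brute-force version keeps the definition explicit.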

Model for growth

We model average annual growth \(y_{ij}\) of tree \(i\) of species \(j\) for \(j=1,\ldots,k\).

\[ \begin{align} y_{ij} &= \beta_{0,j} + \left\{\text{focal info}\right\} + \left\{\text{competitor info}\right\} + \epsilon\\ &= \beta_{0,j} + \left\{\beta_{\text{dbh},j}\text{dbh}_{ij}\right\} + \left\{\text{competitor info}\right\} + \epsilon\\ \end{align} \]

Two choices for \(\left\{\text{competitor info}\right\}\) where \(\text{BM}\) is biomass:

\[ \begin{array}{rr} \text{Choice 1:} & \beta_{\text{BM},j} \sum_{\text{comp trees}} \text{BM}\\ \text{Choice 2:} & \sum_{j'} \lambda_{j,j'} \sum_{\substack{\text{comp trees} \\ \text{of species} \\ j'}} \text{BM}\\ \end{array} \]
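
The two competitor terms can be sketched for a single focal tree of species \(j\). The function names, species labels, and coefficient values below are hypothetical and chosen only to illustrate the arithmetic.

```python
def competitor_term_choice1(beta_bm_j, neighbors):
    """Choice 1: beta_{BM,j} times the total biomass of all competitors."""
    return beta_bm_j * sum(bm for _, bm in neighbors)

def competitor_term_choice2(lambda_j, neighbors):
    """Choice 2: sum over competitor species j' of lambda_{j,j'} times the
    total biomass of competitors of species j'."""
    biomass_by_species = {}
    for species, bm in neighbors:
        biomass_by_species[species] = biomass_by_species.get(species, 0.0) + bm
    return sum(lambda_j[sp] * total for sp, total in biomass_by_species.items())

# Hypothetical neighbors of one focal tree: (species, biomass) pairs.
neighbors = [("red_maple", 2.0), ("white_oak", 1.0), ("red_maple", 3.0)]

c1 = competitor_term_choice1(-0.1, neighbors)
c2 = competitor_term_choice2({"red_maple": -0.2, "white_oak": 0.1}, neighbors)
```

If every \(\lambda_{j,j'}\) in row \(j\) equals \(\beta_{\text{BM},j}\), Choice 2 collapses to Choice 1, so the first model is a special case of the second.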

Competitor Info: First choice

Competitor Info: Second choice

Competitor Info

To model the competitive effect of neighboring trees on trees of species \(j\), should we

  1. Lump all competitors together?
  2. Distinguish between species?

That is, do we use \(k\) parameters or \(k \times k\) parameters?

\[ \begin{align} \boldsymbol{\beta} = \left(\beta_{\text{BM},1}, \ldots, \beta_{\text{BM},k}\right) \text{ vs. } \boldsymbol{\lambda} = \left( \begin{array}{ccc} \lambda_{1,1} & \ldots & \lambda_{1,k}\\ \vdots & \ddots & \vdots\\ \lambda_{k,1} & \ldots & \lambda_{k,k}\\ \end{array} \right) \end{align} \]

Model selection

  1. Fit Model 1 with \(\beta_{\text{BM},j}\) and Model 2 with \(\lambda_{j,j'}\)
  2. Make predictions \(\widehat{y}_{ij}\) using both models
  3. See if \(\text{MSE}_2(\widehat{y}_{ij}, y_{ij}) < \text{MSE}_1(\widehat{y}_{ij}, y_{ij})\)
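
The comparison in step 3 can be sketched as below. The observed and predicted values are made up for illustration and are not results from either model.

```python
import math

def mse(y_hat, y):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(y_hat, y)) / len(y)

def rmse(y_hat, y):
    """Root mean squared error, on the same scale as y."""
    return math.sqrt(mse(y_hat, y))

# Hypothetical observed growth and predictions from the two models.
y = [0.10, 0.20, 0.30]
y_hat_1 = [0.15, 0.25, 0.20]
y_hat_2 = [0.12, 0.18, 0.27]

prefer_model_2 = mse(y_hat_2, y) < mse(y_hat_1, y)
```

Since the square root is monotone, ranking the models by RMSE gives the same answer as ranking them by MSE.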

Two components

  1. Bayesian hierarchical models for posterior predictions
  2. Spatial crossvalidation

Hierarchical models in a nutshell


Intuition: Posterior means

  1. Prior \(\mu \sim\text{Normal}(\mu_0, \sigma^2_0)\)
  2. Observations \(y_i \sim \text{Normal}(\mu, \sigma^2)\) then

\[ \begin{align} \mathbb{E}[\mu|y_1,\ldots,y_n] &= \left(\frac{\frac{1}{\sigma_0^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right) \mu_0 + \left(\frac{\frac{n}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right) \overline{y} \end{align} \]


Along these lines, the red oaks (smaller \(n\)) will "borrow" information from black oaks (larger \(n\)) via \(\mu_{\text{oaks}}\).
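
The shrinkage behind this borrowing can be sketched numerically with the standard conjugate result for a normal mean, where the prior variance \(\sigma_0^2\) and the observation variance \(\sigma^2\) are both treated as known. All numbers below are hypothetical.

```python
def posterior_mean(mu0, sigma0_sq, y, sigma_sq):
    """Posterior mean of mu under a Normal(mu0, sigma0_sq) prior and
    Normal(mu, sigma_sq) observations with known variances."""
    n = len(y)
    y_bar = sum(y) / n
    prior_precision = 1.0 / sigma0_sq
    data_precision = n / sigma_sq
    w_prior = prior_precision / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the sample mean.
    return w_prior * mu0 + (1.0 - w_prior) * y_bar

# Same prior and same sample mean (0.5), but different sample sizes.
small_n = posterior_mean(0.3, 0.01, [0.5] * 2, 0.04)
large_n = posterior_mean(0.3, 0.01, [0.5] * 50, 0.04)
```

With only two observations the posterior mean stays close to the prior mean of 0.3; with fifty it moves most of the way to the sample mean of 0.5. A species with few trees is pulled toward the group-level mean in the same way.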

Posterior predictive distribution

To generate posterior predictions \(\widehat{y}_{ij}\) of \(y_{ij}\), we average the predictive density over the posterior of the parameters:

\[ \begin{align} p\left(\widehat{y}_{ij}\left|\boldsymbol{y}\right.\right) &= \int_{\boldsymbol{\lambda}}\int_{\boldsymbol{\beta}} p\left(\widehat{y}_{ij}\left|\boldsymbol{\beta}, \boldsymbol{\lambda},\boldsymbol{y}\right.\right) \times p\left(\boldsymbol{\beta}, \boldsymbol{\lambda}\left|\boldsymbol{y}\right.\right) d\boldsymbol{\beta}d\boldsymbol{\lambda} \end{align} \]

where samples from \(p\left(\boldsymbol{\beta}, \boldsymbol{\lambda}\left|\boldsymbol{y}\right.\right)\) are generated via Hamiltonian MCMC (RStan).
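
In practice the integral is approximated by simulation: for each posterior draw of the parameters, simulate one \(\widehat{y}_{ij}\). The sketch below uses the Choice 1 form of the mean and assumes Gaussian errors. The "posterior draws" are hypothetical stand-ins generated with Python's `random` module, not output from RStan, and all names are illustrative.

```python
import random

def posterior_predictive_draws(param_draws, dbh, comp_biomass, rng):
    """Simulate one y-hat per parameter draw, assuming Normal errors."""
    preds = []
    for d in param_draws:
        mean = d["beta0"] + d["beta_dbh"] * dbh + d["beta_bm"] * comp_biomass
        preds.append(rng.gauss(mean, d["sigma"]))
    return preds

rng = random.Random(1)

# Stand-in for MCMC output: hypothetical draws, NOT real posterior samples.
param_draws = [
    {
        "beta0": rng.gauss(0.1, 0.01),
        "beta_dbh": rng.gauss(0.005, 0.0005),
        "beta_bm": rng.gauss(-0.02, 0.002),
        "sigma": 0.05,
    }
    for _ in range(2000)
]

preds = posterior_predictive_draws(param_draws, dbh=30.0, comp_biomass=2.0, rng=rng)

# One possible point prediction: the mean of the simulated values.
y_hat = sum(preds) / len(preds)
```

Averaging the simulated values gives a point prediction; the spread of `preds` reflects both parameter uncertainty and residual noise.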

Two components:

  1. Bayesian hierarchical model posterior predictions
  2. Spatial crossvalidation

Recall: Spatial autocorrelation

  1. Covariates: Spatial distribution of species and biomass
  2. Residuals: Elevation and sunlight not measured

From Roberts (2017)


Spatial crossvalidation
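
A minimal sketch of one common form of spatial cross-validation: partition the plot into square blocks and hold out one block at a time. The block size, coordinates, and the name `spatial_folds` are hypothetical, and the scheme used in the analysis may differ.

```python
def spatial_folds(xy, block_size):
    """Group tree indices by the square grid block containing each tree."""
    folds = {}
    for idx, (x, y) in enumerate(xy):
        block = (int(x // block_size), int(y // block_size))
        folds.setdefault(block, []).append(idx)
    return folds

# Four hypothetical trees on a plot divided into 50 m x 50 m blocks.
xy = [(1.0, 2.0), (4.0, 9.0), (55.0, 3.0), (60.0, 70.0)]
folds = spatial_folds(xy, block_size=50.0)

# Each block is held out once; the model is fit on all remaining trees.
splits = []
for block, test_idx in sorted(folds.items()):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(xy)) if i not in held_out]
    splits.append((train_idx, test_idx))
```

Holding out whole blocks keeps nearby, correlated trees from sitting on both sides of a train/test split. Trees near a block edge can still have neighbors in the training set, so a buffer zone around the held-out block is a possible refinement.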

Results: Overall RMSE

Note: Quartiles of \(y_{ij}\): 0.053 / 0.122 / 0.249.

RMSE_1  RMSE_2
0.161   0.155

Results: RMSE split by species.

species               RMSE_1  RMSE_2  mean_growth
Red Oak               0.230   0.238   0.386
Black/Red Oak hybrid  0.220   0.217   0.331
Red Maple             0.196   0.180   0.227
Pignut Hickory        0.191   0.185   0.245
Black Oak             0.187   0.187   0.292
White Oak             0.182   0.181   0.242
Sassafras             0.160   0.170   0.269
Black Cherry          0.131   0.130   0.128
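
The per-species comparison can be tallied directly from the reported values. The sketch below only re-reads the numbers in the table; the variable names are illustrative.

```python
# (RMSE_1, RMSE_2) per species, copied from the reported results.
rmse_by_species = {
    "Red Oak": (0.230, 0.238),
    "Black/Red Oak hybrid": (0.220, 0.217),
    "Red Maple": (0.196, 0.180),
    "Pignut Hickory": (0.191, 0.185),
    "Black Oak": (0.187, 0.187),
    "White Oak": (0.182, 0.181),
    "Sassafras": (0.160, 0.170),
    "Black Cherry": (0.131, 0.130),
}

model_2_better = [s for s, (r1, r2) in rmse_by_species.items() if r2 < r1]
model_1_better = [s for s, (r1, r2) in rmse_by_species.items() if r1 < r2]
ties = [s for s, (r1, r2) in rmse_by_species.items() if r1 == r2]

# Species with the largest drop in RMSE when moving from Model 1 to Model 2.
largest_gain = max(rmse_by_species, key=lambda s: rmse_by_species[s][0] - rmse_by_species[s][1])
```

Model 2 has the lower RMSE for five species, Model 1 for two (Red Oak and Sassafras), and Black Oak is tied; the largest improvement is for Red Maple.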

Predictive ability

Residuals