November 8th, 2016

Summary Image

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
  4. Results
  5. Conclusion

Background

  • Collaborative work with Prof. David Allen in Biology
  • 2008 and 2014 censuses of all trees in the Edwin S. George Reserve at the University of Michigan
    • 22634 trees alive on both dates
    • 34 different species

Diameter at Breast Height (DBH)

Important variable in forest ecology. Surrogate for age, shadow cast, tree height/size, basal area/biomass.

Drawing

Variables

Random sample of 6 trees:

species            x       y      dbh08    dbh14
Service Berry     -114.5   148.7   3.565    3.565
Pignut Hickory     216.9   129.6  35.014   36.065
Black Cherry       -21.7    12.7   5.634    5.570
Shagbark Hickory    -2.0    20.0  10.250   10.250
Shagbark Hickory    63.6   199.8   8.594    9.199
Black Oak          260.7   222.4  32.308   34.568

Variables

Convert the two DBH measurements to the outcome variable, annual growth (change in DBH per year over the 6 years between censuses):

species            x       y      annual growth
Service Berry     -114.5   148.7   0.000
Pignut Hickory     216.9   129.6   0.175
Black Cherry       -21.7    12.7  -0.011
Shagbark Hickory    -2.0    20.0   0.000
Shagbark Hickory    63.6   199.8   0.101
Black Oak          260.7   222.4   0.377
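
The conversion can be sketched in a few lines of Python. This assumes annual growth is the DBH difference divided by the 6 years between the 2008 and 2014 censuses, which reproduces the table above; the function and variable names are illustrative only.

```python
# Years between the two censuses (2008 and 2014).
YEARS_BETWEEN_CENSUSES = 2014 - 2008


def annual_growth(dbh08, dbh14, years=YEARS_BETWEEN_CENSUSES):
    """Average change in DBH per year between the two censuses."""
    return (dbh14 - dbh08) / years


# (species, dbh08, dbh14) for the six sampled trees shown above.
trees = [
    ("Service Berry", 3.565, 3.565),
    ("Pignut Hickory", 35.014, 36.065),
    ("Black Cherry", 5.634, 5.570),
    ("Shagbark Hickory", 10.250, 10.250),
    ("Shagbark Hickory", 8.594, 9.199),
    ("Black Oak", 32.308, 34.568),
]

# Round to three decimals to match the table on the slide.
rows = [(species, round(annual_growth(d08, d14), 3)) for species, d08, d14 in trees]

for species, growth in rows:
    print(f"{species:18s} {growth:7.3f}")
```

Note that growth can be negative (Black Cherry), since a measured DBH can shrink between censuses.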

Summary 1: Species and Location

Color represents top 15 species (in terms of biomass):

Summary 2: Annual Growth

Observed annual growth:

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
  4. Results
  5. Conclusion

Scientific Question

Can we model the annual growth of a tree while accounting for the following?

  • Basic information about it: species, current DBH, etc.
  • Information about its nearby competitors, in particular:
    • the number of competitors
    • the DBH/size/biomass of the competitors
    • nearness of the competitors
    • the species of the competitors

Statistical Model

This translates to a statistical model that needs to incorporate:

  • Population parameters \(\mu_i\): true unknown mean annual growth for species \(i=1, \ldots, 34\).
  • Relationship parameter \(\lambda_{ij}\): true unknown competitive effect of species \(i\) on species \(j\)
  • Sample data \(\vec{y}_{i}\): observed annual growth for all trees of species \(i\)
  • The point of statistics: using sample data to estimate unknown parameters

Competitive Effects

Effect of American Basswoods on growth of Witch Hazels via \(\lambda_{AB, WH}\). Aid or impede?

American Basswood Witch Hazel
Drawing Drawing

Quick-and-Dirty Estimates of \(\mu_i\)

We display 4 (of 34) estimates of \(\mu_i\) in red; each estimate is the sample mean \(\overline{y}_i\).

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
    1. Canham '04
    2. Bayesian Models In General
    3. Our Bayesian Model
  4. Results
  5. Conclusion

Model of Canham '04

At its root, the model assumes:

\[ \mbox{observed annual growth} \sim \mbox{Normal}\left(\mu_i, \sigma^2\right) \]

where \(\mu_i\) incorporates

  • all information specific to species \(i\)
  • all competitive information for species \(i\) vs. species \(j\)

Model of Canham '04

Example: Say \(\mu_{AB}=0.20\). The observed growth of 1000 American Basswood trees might be:
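
A minimal sketch of how such a sample could be simulated. The mean 0.20 comes from the example; the standard deviation `SIGMA` is a hypothetical value chosen only for illustration, since the talk does not state \(\sigma\).

```python
import random
import statistics

random.seed(2016)  # fixed seed so the sketch is reproducible

MU_AB = 0.20    # mean annual growth from the example above
SIGMA = 0.10    # hypothetical standard deviation (not given in the talk)
N_TREES = 1000

# Draw one annual growth value per tree from Normal(MU_AB, SIGMA^2).
simulated_growth = [random.gauss(MU_AB, SIGMA) for _ in range(N_TREES)]

sample_mean = statistics.mean(simulated_growth)
sample_sd = statistics.stdev(simulated_growth)
print(f"sample mean = {sample_mean:.3f}, sample sd = {sample_sd:.3f}")
```

With 1000 trees the sample mean should land close to 0.20, although individual trees vary widely and some simulated growths are negative.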

Model of Canham '04

Drawing

Issues

  • Model is complicated:
    • Lots of parameters to estimate: \(\lambda_{ij}\), \(\alpha\), \(\beta\), \(\gamma\), \(a_1\), \(a_2\), \(a_3\), etc.
    • Who says this functional form is even right?
  • Practical problem: maximum likelihood solver crashes for certain species!
  • Biggie: Distribution of sample sizes is uneven

First Order Problem

Small sample size for certain species…

Second Order Problem

… but also for pairs of species. How can we estimate \(\lambda_{ij}\)?

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
    1. Canham '04
    2. Bayesian Models In General
    3. Our Bayesian Model
  4. Results
  5. Conclusion

Background: Bayesian Inference

To make inferences about \(\mu\) based on observations \(\vec{y}\):

  • Prior \(\pi(\mu)\): prior belief about \(\mu\)
  • Data generating mechanism: \(p(\vec{y}|\mu)\)
  • Posterior \(\pi(\mu | \vec{y})\): updated belief about \(\mu\)
  • Put it together: \[\pi(\mu | \vec{y}) \propto p(\vec{y}| \mu) \times \pi(\mu)\]

Posterior Mean as a Weighted Average

Who cares? See blackboard

  • If \(n_i\) is small, more weight is given to the prior mean
  • If \(n_i\) is large, more weight is given to the data (via sample mean)
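
The blackboard derivation is not part of these slides. As a stand-in, the sketch below uses the standard Normal-Normal conjugate result, assuming a known data variance \(\sigma^2\) and a prior \(\mu \sim \mbox{Normal}(\mu_0, \tau^2)\): the posterior mean is \(w\,\overline{y} + (1-w)\,\mu_0\) with \(w = \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\). All numeric values are hypothetical.

```python
def posterior_mean(sample_mean, n, sigma2, prior_mean, tau2):
    """Posterior mean of mu for Normal data (known variance) and a Normal prior.

    Returns the posterior mean and the weight placed on the sample mean.
    """
    data_precision = n / sigma2      # grows with the sample size
    prior_precision = 1.0 / tau2     # fixed, set by the prior
    weight_on_data = data_precision / (data_precision + prior_precision)
    mean = weight_on_data * sample_mean + (1.0 - weight_on_data) * prior_mean
    return mean, weight_on_data


# Hypothetical values: same sample mean and prior, two different sample sizes.
SAMPLE_MEAN, PRIOR_MEAN = 0.30, 0.10
SIGMA2, TAU2 = 0.04, 0.01

post_small, w_small = posterior_mean(SAMPLE_MEAN, 2, SIGMA2, PRIOR_MEAN, TAU2)
post_large, w_large = posterior_mean(SAMPLE_MEAN, 200, SIGMA2, PRIOR_MEAN, TAU2)

print(f"n = 2:   weight on data = {w_small:.3f}, posterior mean = {post_small:.3f}")
print(f"n = 200: weight on data = {w_large:.3f}, posterior mean = {post_large:.3f}")
```

With the small sample the posterior mean stays near the prior mean; with the large sample it sits almost on top of the sample mean.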

Back to Trees: Different Sample Sizes

Drawing

Observed Annual Growths are Normal

Drawing

What about \(\mu_i\)'s?

Drawing

Key Idea: Let them be Normal as Well

Drawing

Hierarchical Models

  • \(\mu_{oak}\) is a hyperparameter, a parameter to rule them all: it governs the \(\mu_i\) of the individual oak species
  • Using this, we can borrow information from other similar species when the sample size is small
  • If the sample size is large, less borrowing
  • Bayesian methods lend themselves well to this hierarchical structuring

Can we take advantage of…

binomial classification   family        species
Quercus alba              Fagaceae      White Oak
Quercus coccinea          Fagaceae      Scarlet Oak
Quercus ellipsoidalis     Fagaceae      Northern Pin Oak
Quercus rubra             Fagaceae      Red Oak
Quercus velutina          Fagaceae      Black Oak
Acer rubrum               Sapindaceae   Red Maple
Acer saccharum            Sapindaceae   Sugar Maple

Recall

We display 4 (of 34) estimates \(\overline{y}_i\) of \(\mu_i\) in red.

Histogram of 34 Estimates of \(\mu_i\)

In blue we mark the estimate of the hyperparameter \(\mu\):

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
    1. Canham '04
    2. Bayesian Models In General
    3. Our Bayesian Model
  4. Results
  5. Conclusion

Our (Simple) Bayesian Model

The model for the observed annual growth of a specific tree, so far, incorporates:

  1. DBH of the tree itself
  2. Biomass of competitors: competition for resources
  3. Number of competitors: also competition for resources

We don't distinguish the species of the competitors yet.

Our (Simple) Bayesian Model

For species \(i\), we model

\[ \mbox{observed annual growth} \sim \mbox{Normal}\left(\mu_i, \sigma^2\right) \]

  • where \[ \mu_{i} = \beta_{0,i} + \beta_{1, i} \mbox{dbh} + \beta_{2,i}\mbox{biomass} + \beta_{3,i}\mbox{num competitors} \]
  • Each of the parameters \((\beta_{0,i}, \beta_{1, i}, \beta_{2,i}, \beta_{3,i})\) comes from a distribution with its own hyperparameter \((\beta_{0}, \beta_{1}, \beta_{2}, \beta_{3})\).
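
The two-level structure can be sketched as a generative simulation. This is only an illustration: it assumes the species-level coefficients are Normal around the hyperparameters (the slides say only that they come from distributions with hyperparameters), and every numeric value except the Red Maple median DBH is hypothetical.

```python
import random

random.seed(8)  # fixed seed so the sketch is reproducible

# Hypothetical hyperparameters (beta_0, ..., beta_3) and their spreads.
# These are NOT estimates from the talk.
HYPER_MEANS = [0.10, 0.005, -0.002, -0.010]
HYPER_SDS = [0.05, 0.002, 0.001, 0.005]
SIGMA = 0.05  # hypothetical residual standard deviation


def draw_species_coefficients():
    """Draw (beta_0i, ..., beta_3i) for one species around the hyperparameters."""
    return [random.gauss(m, s) for m, s in zip(HYPER_MEANS, HYPER_SDS)]


def mean_growth(beta, dbh, biomass, num_competitors):
    """Linear predictor mu_i for one tree of species i."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * dbh + b2 * biomass + b3 * num_competitors


def simulate_growth(beta, dbh, biomass, num_competitors):
    """One observed annual growth: Normal(mu_i, SIGMA^2)."""
    return random.gauss(mean_growth(beta, dbh, biomass, num_competitors), SIGMA)


# One coefficient vector per species, all tied to the same hyperparameters.
species_betas = {name: draw_species_coefficients() for name in ("Red Maple", "Red Oak")}

# dbh is the Red Maple median from the talk; the other covariates are hypothetical.
tree = {"dbh": 5.09, "biomass": 2.0, "num_competitors": 4}

for name, beta in species_betas.items():
    print(f"{name}: mu = {mean_growth(beta, **tree):.3f}, "
          f"simulated growth = {simulate_growth(beta, **tree):.3f}")
```

Fitting the model runs this logic in reverse: given the observed growths, infer the species-level coefficients and the hyperparameters jointly.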

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
    1. Canham '04
    2. Bayesian Models In General
    3. Our Bayesian Model
  4. Results
  5. Conclusion

Focal Species

We focus only on two species for now:

species     count   median dbh   total biomass
Red Maple   7584     5.09        55.01
Red Oak      160    31.08        17.12

Parameter Estimates

Hyperparameter Estimates

Outline

  1. Data and Context
  2. Scientific Question
  3. Models
    1. Canham '04
    2. Bayesian Models In General
    3. Our Bayesian Model
  4. Results
  5. Conclusion

Future Work

  • Implement this for all species at once and get all estimates.
  • Understand methods for comparing how well different models fit the data.
  • Think of clever ways to build hierarchy.
  • We only considered growth models. What about birth/death?

Executive Summary

Even when \(n_i\) is small, Bayesian hierarchical models can exploit the inherent contextual structure in the data to obtain reliable estimates of all parameters.

Getting the Posterior

There are in general two ways to derive a distribution:

Analytically (Pen & Paper) Via Simulation
Drawing Drawing
  • In many cases, \(\pi(\mu|\vec{y})\) is too hard to derive by hand, so we rely on Markov Chain Monte Carlo methods to simulate draws from this distribution.

Getting the Posterior

So say the unknown distribution \(\pi(\mu|\vec{y})\) is this. Without knowing the true shape of this distribution…

Getting the Posterior

… we simulate random samples using MCMC and approximate all quantities.
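
A minimal random-walk Metropolis sketch of this idea, one of the simplest MCMC methods and not necessarily the sampler used for the results in this talk. The data, prior, and tuning values are all hypothetical. The toy posterior is Normal-Normal, so the exact posterior mean is available to check the simulation against.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical setup: Normal data with known sd, Normal prior on mu.
SIGMA = 0.10
PRIOR_MEAN, PRIOR_SD = 0.10, 0.10
observed = [0.18, 0.25, 0.12, 0.30, 0.21]  # made-up annual growths


def log_posterior(mu):
    """Log of prior times likelihood, up to an additive constant."""
    log_prior = -0.5 * ((mu - PRIOR_MEAN) / PRIOR_SD) ** 2
    log_lik = sum(-0.5 * ((y - mu) / SIGMA) ** 2 for y in observed)
    return log_prior + log_lik


def metropolis(n_draws, start=0.0, step=0.05):
    """Random-walk Metropolis: propose a nearby mu, accept with prob min(1, ratio)."""
    draws = []
    current, current_lp = start, log_posterior(start)
    for _ in range(n_draws):
        proposal = current + random.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if random.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


# Discard the first draws as burn-in, then summarize the rest.
draws = metropolis(20000)[2000:]
posterior_mean_estimate = statistics.mean(draws)

# Exact conjugate answer for comparison (available only in this toy case).
n = len(observed)
w = (n / SIGMA**2) / (n / SIGMA**2 + 1 / PRIOR_SD**2)
exact_posterior_mean = w * statistics.mean(observed) + (1 - w) * PRIOR_MEAN

print(f"MCMC estimate: {posterior_mean_estimate:.3f}, exact: {exact_posterior_mean:.3f}")
```

The sampler only ever evaluates the unnormalized posterior, which is why it still works when the distribution cannot be derived by hand.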