Explorations

Average annual growth from 2008 to 2014 by species, sorted by median and color-coded by rough family.

Does the raw average annual growth look normally distributed within each species? It is not unreasonable to assume so (plots sorted in descending order of sample size).

Normality Assumption

The observations themselves look approximately normal, which suggests that the true population distribution for each species is normal. We therefore assume that the sampling distributions of the \(i=1,\ldots,34\) sample means are normal with

  • mean = \(\overline{x}_i\)
  • standard error = \(\frac{s_i}{\sqrt{n_i}}\), where \(s_i\) is the sample standard deviation and \(n_i\) the sample size of species \(i\)
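
The two quantities above could be computed per species along the following lines. This is a minimal sketch in base R, not the code used for this analysis: the data frame `growth` and its columns are made-up stand-ins for the real data.

```r
# Hypothetical toy data: one row per tree, with its average annual growth.
growth <- data.frame(
  species = c("A", "A", "A", "A", "B", "B", "B"),
  growth  = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.70, 0.90)
)

# Per-species sample size, sample mean, and sample standard deviation.
n_i    <- tapply(growth$growth, growth$species, length)
xbar_i <- tapply(growth$growth, growth$species, mean)
s_i    <- tapply(growth$growth, growth$species, sd)

# Standard error of each sample mean: s_i / sqrt(n_i).
summary_tbl <- data.frame(
  species = names(n_i),
  n       = as.vector(n_i),
  mean    = as.vector(xbar_i),
  SE      = as.vector(s_i / sqrt(n_i))
)
print(summary_tbl)
```

Species with few trees get a large SE, which is what makes their sampling distributions wide.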

Comparison of Sampling Distributions

Here is a plot of the 34 (assumed normal) sampling distributions
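
A plot of this kind could be drawn roughly as follows. This is only a sketch: the three sample means are taken from the species table later in this document, but the SE values are hypothetical placeholders, since the SEs are not listed here.

```r
# Sample means from the species table in this document; the SE values are
# HYPOTHETICAL placeholders chosen only for illustration.
xbar <- c("Witch Hazel" = 0.003, "Red Maple" = 0.224, "Sugar Maple" = 0.459)
se   <- c("Witch Hazel" = 0.002, "Red Maple" = 0.004, "Sugar Maple" = 0.080)

# Evaluate each assumed-normal sampling density on a common grid.
grid <- seq(-0.2, 0.8, length.out = 2001)
dens <- sapply(names(xbar), function(sp) {
  dnorm(grid, mean = xbar[[sp]], sd = se[[sp]])
})

# One curve per species; tall, narrow curves correspond to small SEs.
if (interactive()) {
  matplot(grid, dens, type = "l", lty = 1,
          xlab = "average annual growth", ylab = "density")
  legend("topright", legend = colnames(dens), col = seq_along(xbar), lty = 1)
}
```

Because the densities differ greatly in height, a faceted plot with free scales may be easier to read than overlaid curves.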

Posterior Means

We run RStan to compute posterior means and compare them to the original means.

term  estimate  std.error
mu       0.208      0.024
tau      0.124      0.021

species                       n    mean  post_mean
Black Walnut                  2  -0.058      0.036
Witch Hazel                2659   0.003      0.003
Flowering Dogwood           298   0.054      0.054
Musclewood                    3   0.058      0.063
Autumn Olive                164   0.062      0.064
Service Berry              1324   0.091      0.091
Choke Cherry                 30   0.103      0.106
Black Cherry               7238   0.109      0.109
Hophornbeam                 233   0.161      0.161
Bitternut Hickory            45   0.168      0.169
American Elm                539   0.221      0.221
Red Maple                  5287   0.224      0.224
Pignut Hickory             1089   0.236      0.236
Shagbark Hickory            143   0.241      0.240
White Oak                  1090   0.246      0.246
Sassafras                   503   0.259      0.258
Black/Northern Pin hybrid   152   0.262      0.260
White Ash                    10   0.278      0.260
American Basswood            65   0.299      0.290
American Beech               60   0.320      0.312
Black Oak                   933   0.321      0.318
Black/Red Oak hybrid        536   0.331      0.327
Big Tooth Aspen              31   0.366      0.349
Red Oak                     149   0.382      0.377
Sugar Maple                  10   0.459      0.371
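
In the table above, species with few trees (Black Walnut, Sugar Maple) move noticeably toward mu, while species with thousands of trees barely move. That is the pattern partial pooling produces. The sketch below assumes a normal-normal hierarchical model, which the mu and tau terms suggest but which this document does not state, so treat it as an illustration rather than the model actually fit. It also plugs in point estimates of mu and tau, whereas RStan averages over their posterior. The function name `shrink_mean` and the SE values are hypothetical.

```r
# Conditional posterior mean of a species effect under an ASSUMED
# normal-normal model, given fixed mu and tau: a precision-weighted
# average of the sample mean and the overall mean.
shrink_mean <- function(xbar, se, mu, tau) {
  w <- (1 / se^2) / (1 / se^2 + 1 / tau^2)  # weight on the sample mean
  w * xbar + (1 - w) * mu
}

mu  <- 0.208  # estimate of mu from the table above
tau <- 0.124  # estimate of tau from the table above

# HYPOTHETICAL standard errors: a large one (few trees), a small one (many).
few_trees  <- shrink_mean(xbar = -0.058, se = 0.100, mu = mu, tau = tau)
many_trees <- shrink_mean(xbar =  0.109, se = 0.002, mu = mu, tau = tau)
print(c(few_trees = few_trees, many_trees = many_trees))
```

The weight `w` approaches 1 as the standard error shrinks, so well-sampled species keep essentially their own sample mean.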
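
library(dplyr)    # %>%, bind_rows(), n_distinct(), tibble()
library(ggplot2)

n_species <- n_distinct(species_summary$species)
n_sim <- 10000

# For each species, draw n_sim values from a normal distribution centered at
# the posterior mean. Note that the spread used here is the sampling standard
# error (SE) of the species, not a posterior standard deviation.
sim <- lapply(seq_len(n_species), function(i) {
  tibble(
    species = species_summary$species[i],
    mean = species_summary$mean[i],
    value = rnorm(n_sim, species_summary$post_mean[i], species_summary$SE[i])
  )
}) %>%
  bind_rows()

# One histogram per species; the red line marks the original sample mean.
ggplot(data = sim, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~species, scales = "free") +
  geom_vline(aes(xintercept = mean), color = "red")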

For Next Time

  • Lay out the Bayesian model, including the parameter/hyperparameter structure
  • Incorporate
    • dbh (diameter at breast height) of the focal tree
    • biomass of neighbors within the neighborhood, rather than the number of neighbors. Think of one giant oak versus 10 small shrubs.
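
To make the oak-versus-shrubs point concrete, here is a small sketch comparing a neighbor count with a size-based measure. The dbh values are invented, and summed basal area is used only as a crude stand-in for biomass (an assumption, not an allometric model).

```r
# HYPOTHETICAL neighborhoods: dbh (diameter at breast height) in cm.
giant_oak  <- c(80)
ten_shrubs <- rep(3, 10)

# Basal area of a stem with circular cross-section: pi * (dbh / 2)^2.
basal_area <- function(dbh) pi * (dbh / 2)^2

neighbors <- data.frame(
  neighborhood = c("one giant oak", "ten small shrubs"),
  n_neighbors  = c(length(giant_oak), length(ten_shrubs)),
  basal_area   = c(sum(basal_area(giant_oak)), sum(basal_area(ten_shrubs)))
)
print(neighbors)
```

The count ranks the shrub neighborhood as the more crowded one, while the size-based measure ranks the single oak far higher.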