Interspecies Differences in Tree Growth

Explorations

Average annual growth from 2008 to 2014 by species, sorted by median, color coded by rough family.

Does the raw average annual growth look normal for the different species? It is not unreasonable to assume that growth is normally distributed within each species (plots sorted in descending order of sample size).

Normality Assumption

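The observations within each species look approximately normal, which suggests that the underlying population distributions are normal. We therefore assume that the sampling distribution of each of the \(i=1,\ldots,34\) sample means is normal with

mean = \(\overline{x}_i\)
standard error = \(\frac{s_i}{\sqrt{n_i}}\)

where \(s_i\) is the sample standard deviation and \(n_i\) is the number of trees for species \(i\).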
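As a minimal sketch of how the per-species mean and standard error might be computed, assume a tree-level data frame with one row per tree. The name `trees`, the column `growth`, and the toy values are hypothetical and only for illustration; the output columns `n`, `mean`, and `SE` mirror those used in `species_summary` later on.

```r
library(dplyr)

# Hypothetical tree-level data: one row per tree (values are illustrative only)
trees <- tibble(
  species = c("A", "A", "A", "A", "B", "B"),
  growth  = c(0.10, 0.20, 0.30, 0.40, 0.05, 0.15)
)

# Per-species sample size, sample mean, and standard error of the mean
species_summary <- trees %>%
  group_by(species) %>%
  summarise(
    n    = n(),
    SE   = sd(growth) / sqrt(n),  # s_i / sqrt(n_i)
    mean = mean(growth),
    .groups = "drop"
  )
```

Species with only a handful of trees get a large `SE`, which is what makes their sample means unreliable on their own.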

Comparison of Sampling Distributions

Here is a plot of the 34 (assumed normal) sampling distributions.
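A plot of this kind could be drawn roughly as follows. This is only a sketch: the three-row `species_summary` below is a made-up stand-in for the real summary table, and the grid limits are arbitrary.

```r
library(dplyr)
library(ggplot2)

# Hypothetical summary table: one row per species (values are illustrative only)
species_summary <- tibble(
  species = c("A", "B", "C"),
  mean    = c(0.05, 0.22, 0.38),
  SE      = c(0.010, 0.004, 0.030)
)

# Evaluate each species' assumed normal sampling density on a common grid
grid <- seq(-0.1, 0.6, length.out = 500)
densities <- bind_rows(lapply(seq_len(nrow(species_summary)), function(i) {
  tibble(
    species = species_summary$species[i],
    x       = grid,
    density = dnorm(grid, mean = species_summary$mean[i], sd = species_summary$SE[i])
  )
}))

# One density curve per species; call print(p) to draw it
p <- ggplot(densities, aes(x = x, y = density, color = species)) +
  geom_line() +
  labs(x = "Average annual growth", y = "Density")
```

Narrow, tall curves correspond to species with many trees; wide, flat curves correspond to species with few.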

Posterior Means

We run RStan to compute posterior means and compare them to the original means.

term	estimate	std.error
mu	0.208	0.024
tau	0.124	0.021

species	n	mean	post_mean
Black Walnut	2	-0.058	0.036
Witch Hazel	2659	0.003	0.003
Flowering Dogwood	298	0.054	0.054
Musclewood	3	0.058	0.063
Autumn Olive	164	0.062	0.064
Service Berry	1324	0.091	0.091
Choke Cherry	30	0.103	0.106
Black Cherry	7238	0.109	0.109
Hophornbeam	233	0.161	0.161
Bitternut Hickory	45	0.168	0.169
American Elm	539	0.221	0.221
Red Maple	5287	0.224	0.224
Pignut Hickory	1089	0.236	0.236
Shagbark Hickory	143	0.241	0.240
White Oak	1090	0.246	0.246
Sassafras	503	0.259	0.258
Black/Northern Pin hybrid	152	0.262	0.260
White Ash	10	0.278	0.260
American Basswood	65	0.299	0.290
American Beech	60	0.320	0.312
Black Oak	933	0.321	0.318
Black/Red Oak hybrid	536	0.331	0.327
Big Tooth Aspen	31	0.366	0.349
Red Oak	149	0.382	0.377
Sugar Maple	10	0.459	0.371
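
In the table, species with few trees (Black Walnut, Sugar Maple, White Ash) move toward `mu`, while species with large samples barely change. This is the pattern partial pooling produces. As a rough sketch, assume the RStan model is a normal-normal hierarchical model; the model is not spelled out here, so this is an assumption. Then the posterior mean for species \(i\), conditional on `mu` and `tau`, is a precision-weighted average of \(\overline{x}_i\) and `mu`. The helper name `shrink_mean` and the standard errors passed to it are hypothetical.

```r
# Conditional posterior mean under an ASSUMED normal-normal model:
#   xbar_i ~ Normal(theta_i, se_i^2),  theta_i ~ Normal(mu, tau^2)
shrink_mean <- function(xbar, se, mu, tau) {
  w <- (1 / se^2) / (1 / se^2 + 1 / tau^2)  # weight on the sample mean
  w * xbar + (1 - w) * mu
}

mu  <- 0.208  # estimate of mu from the table above
tau <- 0.124  # estimate of tau from the table above

# Illustrative standard errors (not taken from the data)
precise <- shrink_mean(xbar = 0.109,  se = 0.002, mu = mu, tau = tau)  # stays near 0.109
noisy   <- shrink_mean(xbar = -0.058, se = 0.120, mu = mu, tau = tau)  # pulled toward mu
```

A full Bayesian fit also averages over the uncertainty in `mu` and `tau`, so these conditional values would not be expected to reproduce the `post_mean` column exactly.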

library(dplyr)
library(ggplot2)

# species_summary is assumed to hold one row per species
n_species <- nrow(species_summary)
n_sim <- 10000

# For each species, draw n_sim values from a normal centered at the
# posterior mean, using the standard error of the sample mean as the spread
sim <- bind_rows(lapply(seq_len(n_species), function(i) {
  tibble(
    species = species_summary$species[i],
    mean = species_summary$mean[i],
    value = rnorm(n_sim, species_summary$post_mean[i], species_summary$SE[i])
  )
}))

# Histogram of simulated values per species; the red line marks the original sample mean
ggplot(data = sim, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~species, scales = "free") +
  geom_vline(aes(xintercept = mean), color = "red")

For Next Time

Lay out the Bayesian model, including the parameter/hyperparameter structure.
Incorporate:
- dbh of the focal tree
- biomass of neighbors within the neighborhood, rather than the number of neighbors. Think of one giant oak versus 10 small shrubs.