The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?
# use this chunk to answer question 1
library(lme4)
## Loading required package: Matrix
fit1 <- lmer(resp2 ~ elevation + slope + (1|stand))

summary(fit1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

The estimate of b0 is -21.314628b2 while the initial is -1. The estimate of b1 is 0.0206 while the initial is .005 but estimate of b2 is 0.095 while the initial is 0.1. The estimated parameters are not similar to the intial parameters.

  1. Create a function for your model and run 1000 simulations of that model.
# use this chunk to answer question 2
library(purrr)
model <- function(nstand=5, nplot=4, b0=-1, b1=0.005, b2=0.1, sigma_s=2, sigma=1) {
        stand <- rep(LETTERS[1:nstand], each = nplot)
        standeff <- rep(rnorm(nstand, 0, sigma_s), each = nplot)
        ploteff <- rnorm(nstand*nplot, 0, sigma)
        elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
        slope <- runif(nstand*nplot, 2, 75)
        resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 
        fit2 <- lmer(resp2 ~ elevation + slope + (1|stand))
}

set.seed(16)
sims <- replicate(n = 1000, expr = model())
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
# use this chunk to answer question 3
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ readr   1.3.1
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ ggplot2 3.2.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(broom)
library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
stand_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="stand")
stand_var$id <- rownames(stand_var)

res_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="Residual")
res_var$id <- rownames(res_var)

dat <- merge(stand_var[c(1:6),], res_var[c(1:6),], by="id", all=T)
dat %>% kable() %>% kable_styling()
id term.x group.x estimate.x term.y group.y estimate.y
1 var_(Intercept).stand stand 1.207522 var_Observation.Residual Residual 1.3578703
2 var_(Intercept).stand stand 5.557048 var_Observation.Residual Residual 0.9513796
3 var_(Intercept).stand stand 2.610489 var_Observation.Residual Residual 1.1057024
4 var_(Intercept).stand stand 9.731346 var_Observation.Residual Residual 1.3649953
5 var_(Intercept).stand stand 0.826860 var_Observation.Residual Residual 0.9137930
6 var_(Intercept).stand stand 9.587962 var_Observation.Residual Residual 1.3157805
  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?
# use this chunk to answer question 4
library(ggplot2)
stand_sims <- c(5,20,100) %>%
  set_names() %>%
  map(~rerun(1000, model(nstand=.x)))
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
stand_vars <- stand_sims %>%
  modify_depth(2, ~tidy(.x, effects="ran_pars", scales="vcov")) %>%
  map_dfr(bind_rows, .id="num") %>%
  filter(group=="stand")
stand_vars <- mutate(stand_vars, num=fct_inorder(num))

groupmed <- stand_vars %>%
  group_by(num) %>%
  summarise(medvar=median(estimate))

f1 <- ggplot(stand_vars, aes(x=estimate))+
  geom_density(fill="orange", alpha=0.5)+
  facet_wrap(~num, labeller=as_labeller(function(x) paste("Number of stands:", x, sep=" ")))+
  geom_vline(aes(xintercept=sds^2, linetype="True Variance"), size=0.4)+
  geom_vline(data=groupmed, aes(xintercept=medvar, linetype="Median Variance"), size=0.4)+
  scale_linetype_manual(name="", values=c(2,1))+
  labs(x="Estimated Variance", y="Density")+
  theme_bw()
f1

As sample size increases, variance decreases.

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
# use this chunk to answer question 5
coef <- sims %>%
  map_dfr(tidy, effects="fixed") %>%
  filter(term %in% c("elevation", "slope"))
coef$sequence <- rep(1:1000, each=2)

coefmed <- coef %>%
  group_by(term) %>%
  summarise(med=median(estimate))

f2 <- ggplot(coef, aes(sequence, estimate))+
  geom_point(size=0.2)+
  facet_wrap(~term)+
  geom_hline(data=coefmed, aes(yintercept=med), size=0.4)+
  labs(x="Sequence number", y="Estimated coefficients")+
  theme_bw()
f2

The model may not be a good fit asthe estimated coefficients are much scattered.

  1. Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.