The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand <- 5; nplot <- 4
b0 <- -1; b1 <- 0.005; b2 <- 0.1
sds <- 2; sd <- 1

Simulate other variables:

set.seed(16)
stand <- rep(LETTERS[1:nstand], each = nplot)
standeff <- rep(rnorm(nstand, 0, sds), each = nplot)
ploteff <- rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
slope <- runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

 

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the initial parameters as we defined them?
fit2 <- lmer(resp2 ~ elevation + slope + (1|stand)) #the linear mixed model
options(scipen=100,digits=4)
tidy(fit2) %>% kable() %>% kable_styling()
term estimate std.error statistic group
(Intercept) -21.3146 6.6021 -3.228 fixed
elevation 0.0206 0.0049 4.190 fixed
slope 0.0951 0.0164 5.785 fixed
sd_(Intercept).stand 1.0989 NA NA stand
sd_Observation.Residual 1.1653 NA NA Residual

 

ANSWER: Estimated β2 (0.0951) is close to the initial parameter 0.1, but estimated β0 (-21.3146) and estimated β1 (0.0206) is both not close to the initial parameters -1 and 0.005 respectively.

 

  1. Create a function for your model and run 1000 simulations of that model.
linearmixed_fun <- function(nstand=5, nplot=4, b0=-1, b1=0.005, b2=0.1, sigma_s=2, sigma=1) {
        stand <- rep(LETTERS[1:nstand], each = nplot)
        standeff <- rep(rnorm(nstand, 0, sigma_s), each = nplot)
        ploteff <- rnorm(nstand*nplot, 0, sigma)
        elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
        slope <- runif(nstand*nplot, 2, 75)
        resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 
        fit2 <- lmer(resp2 ~ elevation + slope + (1|stand))
}

times <- 1000
set.seed(16)
sims <- rerun(times, linearmixed_fun()) #omit messages "boundary (singular) fit: see ?isSingular"

 

  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
stand_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="stand")
stand_var$sequence <- rownames(stand_var)

res_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="Residual")
res_var$sequence <- rownames(res_var)

dat <- merge(stand_var[c(1:6),], res_var[c(1:6),], by="sequence", all=T)
dat %>% kable() %>% kable_styling()
sequence term.x group.x estimate.x term.y group.y estimate.y
1 var_(Intercept).stand stand 1.2075 var_Observation.Residual Residual 1.3579
2 var_(Intercept).stand stand 5.5570 var_Observation.Residual Residual 0.9514
3 var_(Intercept).stand stand 2.6105 var_Observation.Residual Residual 1.1057
4 var_(Intercept).stand stand 9.7313 var_Observation.Residual Residual 1.3650
5 var_(Intercept).stand stand 0.8269 var_Observation.Residual Residual 0.9138
6 var_(Intercept).stand stand 9.5880 var_Observation.Residual Residual 1.3158

 

  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?
stand_sims <- c(5,20,100) %>%
  set_names() %>%
  map(~rerun(times, linearmixed_fun(nstand=.x)))

stand_vars <- stand_sims %>%
  modify_depth(2, ~tidy(.x, effects="ran_pars", scales="vcov")) %>%
  map_dfr(bind_rows, .id="num") %>%
  filter(group=="stand")
stand_vars <- mutate(stand_vars, num=fct_inorder(num))

groupmed <- stand_vars %>%
  group_by(num) %>%
  summarise(medvar=median(estimate))

f1 <- ggplot(stand_vars, aes(x=estimate))+
  geom_density(fill="orange", alpha=0.5)+
  facet_wrap(~num, labeller=as_labeller(function(x) paste("Number of stands:", x, sep=" ")))+
  geom_vline(aes(xintercept=sds^2, linetype="True Variance"), size=0.4)+
  geom_vline(data=groupmed, aes(xintercept=medvar, linetype="Median Variance"), size=0.4)+
  scale_linetype_manual(name="", values=c(2,1))+
  labs(x="Estimated Variance", y="Density")+
  theme_bw()
f1

 

ANSWER: Sample size is respectively chosen as 5, 20, 100. As sample size increases, variance will decrease.

 

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
coef <- sims %>%
  map_dfr(tidy, effects="fixed") %>%
  filter(term %in% c("elevation", "slope"))
coef$sequence <- rep(1:times, each=2)

coefmed <- coef %>%
  group_by(term) %>%
  summarise(med=median(estimate))

f2 <- ggplot(coef, aes(sequence, estimate))+
  geom_point(size=0.2)+
  facet_wrap(~term)+
  geom_hline(data=coefmed, aes(yintercept=med), size=0.4)+
  labs(x="Sequence number", y="Estimated coefficients")+
  theme_bw()
f2

 

ANSWER: The median values of the estimated coefficients are shown as line whose values 0.00472 and 0.0998 are close to the initial parameters 0.005 and 0.1. However, the estimated coefficients β1 and β2 are much scattered in each plot. Some estimated β1 are even negative. The model may not fit well in this case.

 

  1. Submit a link to this document in R Pubs to your Moodle.
    ANSWER: This document is also available at http://rpubs.com/sherloconan/491673.