ANLY 505 - Problem Set #2

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand <- 5; nplot <- 4
b0 <- -1; b1 <- 0.005; b2 <- 0.1
sds <- 2; sd <- 1

Simulate other variables:

set.seed(16)
stand <- rep(LETTERS[1:nstand], each = nplot)
standeff <- rep(rnorm(nstand, 0, sds), each = nplot)
ploteff <- rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
slope <- runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the initial parameters as we defined them?

fit2 <- lmer(resp2 ~ elevation + slope + (1|stand)) #the linear mixed model
options(scipen=100,digits=4)
tidy(fit2) %>% kable() %>% kable_styling()

term	estimate	std.error	statistic	group
(Intercept)	-21.3146	6.6021	-3.228	fixed
elevation	0.0206	0.0049	4.190	fixed
slope	0.0951	0.0164	5.785	fixed
sd_(Intercept).stand	1.0989	NA	NA	stand
sd_Observation.Residual	1.1653	NA	NA	Residual

ANSWER: Estimated β2 (0.0951) is close to the initial parameter 0.1, but estimated β0 (-21.3146) and estimated β1 (0.0206) is both not close to the initial parameters -1 and 0.005 respectively.

Create a function for your model and run 1000 simulations of that model.

linearmixed_fun <- function(nstand=5, nplot=4, b0=-1, b1=0.005, b2=0.1, sigma_s=2, sigma=1) {
        stand <- rep(LETTERS[1:nstand], each = nplot)
        standeff <- rep(rnorm(nstand, 0, sigma_s), each = nplot)
        ploteff <- rnorm(nstand*nplot, 0, sigma)
        elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
        slope <- runif(nstand*nplot, 2, 75)
        resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 
        fit2 <- lmer(resp2 ~ elevation + slope + (1|stand))
}

times <- 1000
set.seed(16)
sims <- rerun(times, linearmixed_fun()) #omit messages "boundary (singular) fit: see ?isSingular"

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

stand_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="stand")
stand_var$sequence <- rownames(stand_var)

res_var <- sims %>%
  map_dfr(tidy, effects="ran_pars", scales="vcov") %>%
  filter(group=="Residual")
res_var$sequence <- rownames(res_var)

dat <- merge(stand_var[c(1:6),], res_var[c(1:6),], by="sequence", all=T)
dat %>% kable() %>% kable_styling()

sequence	term.x	group.x	estimate.x	term.y	group.y	estimate.y
1	var_(Intercept).stand	stand	1.2075	var_Observation.Residual	Residual	1.3579
2	var_(Intercept).stand	stand	5.5570	var_Observation.Residual	Residual	0.9514
3	var_(Intercept).stand	stand	2.6105	var_Observation.Residual	Residual	1.1057
4	var_(Intercept).stand	stand	9.7313	var_Observation.Residual	Residual	1.3650
5	var_(Intercept).stand	stand	0.8269	var_Observation.Residual	Residual	0.9138
6	var_(Intercept).stand	stand	9.5880	var_Observation.Residual	Residual	1.3158

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

stand_sims <- c(5,20,100) %>%
  set_names() %>%
  map(~rerun(times, linearmixed_fun(nstand=.x)))

stand_vars <- stand_sims %>%
  modify_depth(2, ~tidy(.x, effects="ran_pars", scales="vcov")) %>%
  map_dfr(bind_rows, .id="num") %>%
  filter(group=="stand")
stand_vars <- mutate(stand_vars, num=fct_inorder(num))

groupmed <- stand_vars %>%
  group_by(num) %>%
  summarise(medvar=median(estimate))

f1 <- ggplot(stand_vars, aes(x=estimate))+
  geom_density(fill="orange", alpha=0.5)+
  facet_wrap(~num, labeller=as_labeller(function(x) paste("Number of stands:", x, sep=" ")))+
  geom_vline(aes(xintercept=sds^2, linetype="True Variance"), size=0.4)+
  geom_vline(data=groupmed, aes(xintercept=medvar, linetype="Median Variance"), size=0.4)+
  scale_linetype_manual(name="", values=c(2,1))+
  labs(x="Estimated Variance", y="Density")+
  theme_bw()
f1

ANSWER: Sample size is respectively chosen as 5, 20, 100. As sample size increases, variance will decrease.

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

coef <- sims %>%
  map_dfr(tidy, effects="fixed") %>%
  filter(term %in% c("elevation", "slope"))
coef$sequence <- rep(1:times, each=2)

coefmed <- coef %>%
  group_by(term) %>%
  summarise(med=median(estimate))

f2 <- ggplot(coef, aes(sequence, estimate))+
  geom_point(size=0.2)+
  facet_wrap(~term)+
  geom_hline(data=coefmed, aes(yintercept=med), size=0.4)+
  labs(x="Sequence number", y="Estimated coefficients")+
  theme_bw()
f2

ANSWER: The median values of the estimated coefficients are shown as line whose values 0.00472 and 0.0998 are close to the initial parameters 0.005 and 0.1. However, the estimated coefficients β1 and β2 are much scattered in each plot. Some estimated β1 are even negative. The model may not fit well in this case.

Submit a link to this document in R Pubs to your Moodle.
ANSWER: This document is also available at http://rpubs.com/sherloconan/491673.

ANLY 505 - Problem Set #2

Simulation of Linear Mixed Models

Zhengxiao Wei

2019-04-25