ANLY 505 - Problem Set #2

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?

library(lme4)

## Loading required package: Matrix

library(lmerTest)

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

fit2 = lmer(resp2 ~ elevation + slope + (1|stand))
fit2

## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## REML criterion at convergence: 81.9874
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.099   
##  Residual             1.165   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   -21.31463      0.02060      0.09511

Create a function for your model and run 1000 simulations of that model.

set.seed(16)
fit2model_fun = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = .1, sds = 2, sd = 1) {
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand*nplot, 0, sd)
  
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand*nplot, 2, 75)
  resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff
  dframe = data.frame(resp2, elevation, slope, stand)
  fit2 = lmer(resp2 ~ elevation + slope + (1|stand), data = dframe)
 
}

set.seed(16)
sim_fit2model = replicate(1000 , expr = fit2model_fun())

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.1     ✔ purrr   0.2.5
## ✔ tibble  2.0.0     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0

## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(broom)
library(broom.mixed)

## 
## Attaching package: 'broom.mixed'

## The following object is masked from 'package:broom':
## 
##     tidyMCMC

library(purrr)

sv = sim_fit2model %>%
  map_dfr(tidy, effects = "ran_pars",scales = "vcov") 
 
head(sv)

## # A tibble: 6 x 4
##   effect   group    term             estimate
##   <chr>    <chr>    <chr>               <dbl>
## 1 ran_pars stand    var__(Intercept)    1.21 
## 2 ran_pars Residual var__Observation    1.36 
## 3 ran_pars stand    var__(Intercept)    5.56 
## 4 ran_pars Residual var__Observation    0.951
## 5 ran_pars stand    var__(Intercept)    2.61 
## 6 ran_pars Residual var__Observation    1.11

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

library(ggplot2)
library(furrr)

## Loading required package: future

stand_sims = c(5, 20, 100) %>%
  set_names(c("sample size 5","sample size 20","sample size 100")) %>%
  map(~replicate(1000, fit2model_fun(nstand = .x)))

## boundary (singular) fit: see ?isSingular

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

stand_vars = stand_sims %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "stand_num") %>%
  filter(group == "stand")

stand_vars = mutate(stand_vars, stand_num = fct_inorder(stand_num))

add_prefix = function(string) {
  paste("# of stands:" , string, sep = " ")
}

groupmed = stand_vars %>%
  group_by(stand_num) %>%
  summarise(mvar = median(estimate))

ggplot(stand_vars, aes(x = estimate))+
  geom_density(fill = "orange", alpha = 0.25) +
  facet_wrap(~stand_num, labeller = as_labeller(add_prefix)) +
  geom_vline(aes(xintercept = 4 , linetype = "True Variance"), size = 0.5) +
  geom_vline(data = groupmed, aes(xintercept = mvar, linetype = "Median Variance"), size = 0.5) +
  theme_bw() +
  scale_linetype_manual(name = "", values = c(2,1)) +
  theme(legend.position = "bottom", legend.key.width = unit(.1, "cm")) +
  labs(x = "Estimated Variance", y = "Density")

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

library(sjstats)

## 
## Attaching package: 'sjstats'

## The following object is masked from 'package:broom':
## 
##     bootstrap

library(dplyr)

coefplot = sim_fit2model %>%
  map_dfr(tidy,  effects = "fixed") %>%
  filter(term %in% c("elevation","slope")) %>%
  group_by(term) 
coefplot

## # A tibble: 2,000 x 7
## # Groups:   term [2]
##    effect term      estimate std.error statistic    df     p.value
##    <chr>  <chr>        <dbl>     <dbl>     <dbl> <dbl>       <dbl>
##  1 fixed  elevation  0.0206    0.00492     4.19   3.11 0.0230     
##  2 fixed  slope      0.0951    0.0164      5.78  15.9  0.0000288  
##  3 fixed  elevation -0.00546   0.00927    -0.589  3.03 0.597      
##  4 fixed  slope      0.0868    0.0140      6.21  14.2  0.0000214  
##  5 fixed  elevation  0.0152    0.00426     3.57   2.99 0.0377     
##  6 fixed  slope      0.121     0.0157      7.73  15.5  0.00000107 
##  7 fixed  elevation  0.0156    0.0117      1.33   2.99 0.276      
##  8 fixed  slope      0.124     0.0166      7.48  14.3  0.00000257 
##  9 fixed  elevation  0.0130    0.00336     3.87   3.01 0.0304     
## 10 fixed  slope      0.0937    0.0100      9.32  14.7  0.000000151
## # … with 1,990 more rows

mean = coefplot %>%
  summarise_at("estimate", funs( mean))
mean

## # A tibble: 2 x 2
##   term      estimate
##   <chr>        <dbl>
## 1 elevation  0.00492
## 2 slope      0.0997

plot = ggplot(coefplot, aes(x=rep(1:1000, each=2),y=estimate))+
  geom_point(size=0.5)+
  facet_wrap(~term) +
  geom_hline(data=mean,aes(yintercept=mean$estimate,color = term), size = 0.9) +
 labs(x="Replicated numbers", y="Estimate of coefficients")
plot

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

ANLY 505 - Problem Set #2

Simulation of Linear Mixed Models

Runhao Wang

2019-10-22