ANLY 505 - Problem Set #2

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep(rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep(runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Loading the required packages

library(lme4)

## Loading required package: Matrix

library(purrr)
library(tidyverse)

## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0       v readr   1.3.1  
## v tibble  2.1.1       v dplyr   0.8.0.1
## v tidyr   0.8.3       v stringr 1.4.0  
## v ggplot2 3.1.0       v forcats 0.4.0

## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(broom)
library(broom.mixed)

## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.2.17
## Current Matrix version is 1.2.15
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

## 
## Attaching package: 'broom.mixed'

## The following object is masked from 'package:broom':
## 
##     tidyMCMC

library(ggplot2)
library(dplyr)
library(furrr)

## Loading required package: future

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?

model <- lmer(resp2 ~ 1 + elevation + slope + (1|stand))
model

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
## REML criterion at convergence: 81.9874
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.099   
##  Residual             1.165   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   -21.31463      0.02060      0.09511

print(paste('b0 = ', b0))

## [1] "b0 =  -1"

print(paste('b1 = ', b1))

## [1] "b1 =  0.005"

print(paste('b2 = ', b2))

## [1] "b2 =  0.1"

The intercept b0 was initially set to -1 whereas the parameter from the model gave us intercept value of -21, which is quite different. The coefficient of elevation b1 was initially set to 0.005 whereas the coefficient of elevation we got from the model is 0.02 that is quite different. The coefficient of slope b2 was initially set to 0.1 whereas the coefficient of elevation we got from the model is 0.095 that is approximately 0.1, which is similar.

Create a function for your model and run 1000 simulations of that model.

Creating the function

model_sim = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand * nplot, 0, sd)
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand * nplot, 2, 75)
  resp2 = b0 + b1 * elevation + b2 * slope + standeff + ploteff
  data = data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data = data)
}
model_sim()

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: data
## REML criterion at convergence: 85.1321
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 2.992   
##  Residual             1.077   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##    7.365232    -0.002138     0.102090

Running 1000 simulations

thousand_sims <- replicate(n = 1000, expr = model_sim())

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00494375
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00426347
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00287986
## (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00314794
## (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00789635
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0162996
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00307489
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

variances <- thousand_sims %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
head(variances)

## # A tibble: 6 x 4
##   effect   group    term             estimate
##   <chr>    <chr>    <chr>               <dbl>
## 1 ran_pars stand    var__(Intercept)    2.78 
## 2 ran_pars Residual var__Observation    1.20 
## 3 ran_pars stand    var__(Intercept)    2.91 
## 4 ran_pars Residual var__Observation    0.771
## 5 ran_pars stand    var__(Intercept)    1.72 
## 6 ran_pars Residual var__Observation    1.82

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

s_sims = c(5, 15, 300) %>%
  set_names() %>%
  map(~replicate(1000, model_sim(nstand = .x)))

## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00446168
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00235588
## (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00538489
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00436266
## (tol = 0.002, component 1)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00238533
## (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.017266
## (tol = 0.002, component 1)

s_vars = s_sims %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "id") %>%
  filter(group == "stand")

ggplot(s_vars, aes(x = estimate)) +
  geom_density(fill = "blue", alpha = "0.25") +
  facet_wrap(~id) +
  geom_vline(xintercept = 4)

We can clearly see that with a higher sample size the estimate accuracy increases.

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

sim <- thousand_sims %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

sim %>% 
  filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(val = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = val, color = term), linetype = 4, size = 0.5) +
  theme_bw()

Mean elevation is 0.005 and mean slope is 0.1, which are the initial values we set for these parameters. Thus it indicates that with a higher no of samples like 1000 in this case the mean of the samples for both elevation and slope are close to the values we initialized with.

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

ANLY 505 - Problem Set #2

Simulation of Linear Mixed Models

Neeraj Bhatt

2019-08-26