The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?

As we can see from the output, β^1=0.005, which is much smaller than the intial parameter 0.005. β^2=0.1 is close to the initial parameter we define. The estimated intercept -21.31 varies from the real value of -1.

#install.packages("lme4")
library(lme4)
## Warning: package 'lme4' was built under R version 3.6.1
## Loading required package: Matrix
library(Matrix)
LMMfit = lmer(resp2 ~ elevation + slope + (1|stand))
summary(LMMfit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148
  1. Create a function for your model and run 1000 simulations of that model.
model_function = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand * nplot, 0, sd)
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand * nplot, 2, 75)
  resp2 = b0 + b1 * elevation + b2 * slope + standeff + ploteff
  data.re = data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ elevation + slope + (1|stand), data = data.re)
}
model_function()
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
##    Data: data.re
## REML criterion at convergence: 80.9781
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 2.3573  
##  Residual             0.9754  
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   10.584601    -0.005464     0.086839
library(purrr)
sim<-rerun(1000, model_function())
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00350336
## (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0045514
## (tol = 0.002, component 1)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00363544
## (tol = 0.002, component 1)
## boundary (singular) fit: see ?isSingular
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00439561
## (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00239513
## (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00219495
## (tol = 0.002, component 1)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00322495
## (tol = 0.002, component 1)
## boundary (singular) fit: see ?isSingular
  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
library(broom)
var <- sim %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
var[1:6,]
## # A tibble: 6 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand       2.61 
## 2 var_Observation.Residual Residual    1.11 
## 3 var_(Intercept).stand    stand       9.73 
## 4 var_Observation.Residual Residual    1.36 
## 5 var_(Intercept).stand    stand       0.827
## 6 var_Observation.Residual Residual    0.914
  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

From the graphs, we can see that the estimated variance reaches true variance as the sample size increases. Although if the sample size is too small, the results may not be truly reflective of the sample. Hence we will need to verify the trend with larger samples and see if they hold true.

#install.packages("ggplot2")
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stand_sims = c(10, 40, 200) %>%
     set_names() %>%
     map(~replicate(1000, model_function(nstand = .x) ) )
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00848934
## (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00617958
## (tol = 0.002, component 1)
## boundary (singular) fit: see ?isSingular
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00257766
## (tol = 0.002, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0173916
## (tol = 0.002, component 1)
stand_vars = stand_sims %>%
     modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov") ) %>%
     map_dfr(bind_rows, .id = "stand_num") %>%
     filter(group == "stand")
head(stand_vars)
## # A tibble: 6 x 4
##   stand_num term                  group estimate
##   <chr>     <chr>                 <chr>    <dbl>
## 1 10        var_(Intercept).stand stand    3.76 
## 2 10        var_(Intercept).stand stand    3.46 
## 3 10        var_(Intercept).stand stand    2.51 
## 4 10        var_(Intercept).stand stand    4.11 
## 5 10        var_(Intercept).stand stand    0.838
## 6 10        var_(Intercept).stand stand    3.77
ggplot(stand_vars, aes(x = estimate) ) +
     geom_density(fill = "blue", alpha = .25) +
     facet_wrap(~stand_num) +
     geom_vline(xintercept = 4)

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs. As we can see from the graph, the estimated coeffieicnts from the model can vary but eventually approach the true vale as the mean when iterated several times.
#install.packages("furrr")
library(furrr)
## Warning: package 'furrr' was built under R version 3.6.1
## Loading required package: future
## Warning: package 'future' was built under R version 3.6.1
coeff <- sim %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

coeff %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(real_value = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = real_value, color = term), linetype = 4, size = 0.5) +
  theme_bw()

  1. Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.