The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?
library(Matrix)
library(lme4)

resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 

m1 <- lmer(resp2 ~ elevation + slope +(1|stand))
m1
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## REML criterion at convergence: 81.9874
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.099   
##  Residual             1.165   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   -21.31463      0.02060      0.09511
summary(m1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

Defined intercept(b0) is -1 which is totally different than estimated intercept -21.314628. Defined Elevation(b1) is .005 which is much smaller than estimated elevation 0.020600. Defined slope(b1) is 0.1 which is quite close to estimated slope 0.095105

  1. Create a function for your model and run 1000 simulations of that model.
set.seed(16)

m1_function = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = .1, sds = 2, sd = 1) 
{
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand*nplot, 0, sd)
  
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand*nplot, 2, 75)
  resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff
  dframe = data.frame(resp2, elevation, slope, stand)
  m1 = lmer(resp2 ~ elevation + slope + (1|stand), data = dframe)
 
}

set.seed(16)
m1_simul = replicate (1000, expr = m1_function())
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.1
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(purrr)
library(tidyr)
library(broom)
tidy(m1, effects = "ran_pars", scales = "vcov")
## # A tibble: 2 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand        1.21
## 2 var_Observation.Residual Residual     1.36
library("broom")
head(map_dfr(m1_simul, broom::tidy, .id = "model", effects = "ran_pars"))
## # A tibble: 6 x 4
##   model term                    group    estimate
##   <chr> <chr>                   <chr>       <dbl>
## 1 1     sd_(Intercept).stand    stand       1.10 
## 2 1     sd_Observation.Residual Residual    1.17 
## 3 2     sd_(Intercept).stand    stand       2.36 
## 4 2     sd_Observation.Residual Residual    0.975
## 5 3     sd_(Intercept).stand    stand       1.62 
## 6 3     sd_Observation.Residual Residual    1.05
  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?
m1_sims <- c(5, 20, 100) %>% set_names() %>% map(~replicate (1000, m1_function(nstand = .x)))
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
library(dplyr)


var_stand = m1_sims %>% modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>% map_dfr(bind_rows, .id = "stand_number") %>% filter (group == "stand")

head(var_stand)
## # A tibble: 6 x 4
##   stand_number term                  group estimate
##   <chr>        <chr>                 <chr>    <dbl>
## 1 5            var_(Intercept).stand stand     6.91
## 2 5            var_(Intercept).stand stand     1.92
## 3 5            var_(Intercept).stand stand    12.0 
## 4 5            var_(Intercept).stand stand     6.84
## 5 5            var_(Intercept).stand stand     2.89
## 6 5            var_(Intercept).stand stand     2.52
library(forcats)

var_stand = mutate(var_stand, stand_number = fct_inorder(stand_number))
add_prefix <- function(string) { paste(" #Stands= ", string, sep = " ")}

med_group = var_stand %>% group_by(stand_number) %>% summarise(mvar = median(estimate))

library(ggplot2)

ggplot(var_stand, aes(x = estimate)) + geom_density(fill = "blue", alpha = 0.25) + facet_wrap(~stand_number, labeller = as_labeller(add_prefix)) + geom_vline(aes(xintercept = 4, linetype = "True Variance"), size = 0.6) + geom_vline(data = med_group, aes(xintercept = mvar, linetype = "Median Variance"), size = 0.5) + theme_bw() + scale_linetype_manual(name = "", values = c(2,1)) + theme(legend.position = "bottom", legend.key.width = unit(.1, "cm")) + labs(x = "Estimated Variance", y = "Density (Probability)")

When sample size increases, variance decreases. As the sample size increases from 5 to 20 its more evident but not so much when it increases from 20 to 100. Hence the threshold of the relationship between true variance and estimated is 20. So increase in sample will not have any effect on accuracy of variance estimate.

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
coeff = m1_simul %>%
  map_dfr(tidy,  effects = "fixed") %>%
  filter(term %in% c("elevation","slope")) %>%
  group_by(term) 
coeff
## # A tibble: 2,000 x 4
## # Groups:   term [2]
##    term      estimate std.error statistic
##    <chr>        <dbl>     <dbl>     <dbl>
##  1 elevation  0.0206    0.00492     4.19 
##  2 slope      0.0951    0.0164      5.78 
##  3 elevation -0.00546   0.00927    -0.589
##  4 slope      0.0868    0.0140      6.21 
##  5 elevation  0.0152    0.00426     3.57 
##  6 slope      0.121     0.0157      7.73 
##  7 elevation  0.0156    0.0117      1.33 
##  8 slope      0.124     0.0166      7.48 
##  9 elevation  0.0130    0.00336     3.87 
## 10 slope      0.0937    0.0100      9.32 
## # … with 1,990 more rows
library(dplyr)
mean = coeff %>%
  summarise_at("estimate", funs( mean))
mean
## # A tibble: 2 x 2
##   term      estimate
##   <chr>        <dbl>
## 1 elevation  0.00492
## 2 slope      0.0997
plot = ggplot(coeff, aes(x=rep(1:1000, each=2),y=estimate))+ geom_point(size=0.3) + facet_wrap(~term) + geom_hline(data=mean,aes(yintercept=mean$estimate,color = term), size = 0.9) + labs(x=" Numbers that replicated", y="Evaluated coefficients") 
plot

The mean estimates of coefficient (elevation and slope) are 0.004 and 0.099 respectively which are close to the initial parameters of 0.005 and 0.1.

  1. Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.