The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)
stand
##  [1] "A" "A" "A" "A" "B" "B" "B" "B" "C" "C" "C" "C" "D" "D" "D" "D" "E"
## [18] "E" "E" "E"
standeff
##  [1]  0.9528268  0.9528268  0.9528268  0.9528268 -0.2507600 -0.2507600
##  [7] -0.2507600 -0.2507600  2.1924324  2.1924324  2.1924324  2.1924324
## [13] -2.8884581 -2.8884581 -2.8884581 -2.8884581  2.2956586  2.2956586
## [19]  2.2956586  2.2956586
ploteff
##  [1] -0.46841204 -1.00595059  0.06356268  1.02497260  0.57314202
##  [6]  1.84718210  0.11193337 -0.74603732  1.65821366  0.72172057
## [11] -1.66308050  0.57590953  0.47276012 -0.54273166  1.12768707
## [16] -1.64779762 -0.31417395 -0.18268157  1.47047849 -0.86589878

Simulate elevation and slope:

#reminder: runif(n, min=? and max=?)
elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)
elevation
##  [1] 1468.339 1468.339 1468.339 1468.339 1271.581 1271.581 1271.581
##  [8] 1271.581 1427.050 1427.050 1427.050 1427.050 1166.014 1166.014
## [15] 1166.014 1166.014 1424.256 1424.256 1424.256 1424.256
slope
##  [1] 48.45477 60.37014 59.58588 44.76939 61.88313 20.29559 71.51617
##  [8] 42.35035 63.67044 36.43613 10.58778 14.62304 35.11192 13.19697
## [15]  9.29946 46.56462 34.68245 53.61456 37.00606 42.30044

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 
resp2
##  [1] 11.671585 12.325584 13.316671 12.796432 11.868602  8.983889 12.370697
##  [8]  8.596145 16.352939 12.693014  7.723378 10.365894  5.925563  2.718576
## [15]  3.999244  4.950275 11.571009 13.595712 13.588022 11.781083

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the initial parameters as we defined them?
library(lme4)
## Loading required package: Matrix
ModelStdRandom = lmer(resp2 ~ 1 + elevation + slope + (1|stand))
results<-summary(ModelStdRandom)
results
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148
results$coefficients
##                 Estimate  Std. Error   t value
## (Intercept) -21.31462843 6.602052944 -3.228485
## elevation     0.02060019 0.004916391  4.190104
## slope         0.09510535 0.016441240  5.784560

As we can see the estimates, for the coefficients bo, b1 and b2 can be quite different from our set values, except for b2: bo= -21.31462843 compared to -1 defined before b1 (for elevation)= 0.02060019 vs.0.005 defined before *b2 (for slope) = 0.09510535 vs. 0.1 defined before (quite close)

  1. Create a function for your model and run 1000 simulations of that model.
#We re-use exactly the same steps as described before, applied specifically to the "ModelStdRandom" fitted in the previous question, as a function:
FctModelStdRandom <- function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand <- rep(LETTERS[1:nstand], each = nplot)
  standeff <- rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff <- rnorm(nstand * nplot, 0, sd)
  elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
  slope <- runif(nstand * nplot, 2, 75)
  resp2 <- b0 + b1 * elevation + b2 * slope + standeff + ploteff
  dataVariables <- data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data=dataVariables)
}
FctModelStdRandom()
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: dataVariables
## REML criterion at convergence: 80.9781
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 2.3573  
##  Residual             0.9754  
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   10.584601    -0.005464     0.086839
Fct1_results <- replicate(n = 1000, FctModelStdRandom())
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

comments: why do I have this comments?

length(Fct1_results)
## [1] 1000
  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ tidyr::unpack() masks Matrix::unpack()
library(broom)
library(broom.mixed)
## Registered S3 methods overwritten by 'broom.mixed':
##   method         from 
##   augment.lme    broom
##   augment.merMod broom
##   glance.lme     broom
##   glance.merMod  broom
##   glance.stanreg broom
##   tidy.brmsfit   broom
##   tidy.gamlss    broom
##   tidy.lme       broom
##   tidy.merMod    broom
##   tidy.rjags     broom
##   tidy.stanfit   broom
##   tidy.stanreg   broom
## 
## Attaching package: 'broom.mixed'
## The following object is masked from 'package:broom':
## 
##     tidyMCMC
library(purrr)

The stand and residual variances can be extracted via “scales”, with effect “ran_pars”

ResVariances_Std <- Fct1_results %>%
  map_dfr(tidy, effects = "ran_pars", scales = "vcov")
head(ResVariances_Std)
## # A tibble: 6 x 4
##   effect   group    term             estimate
##   <chr>    <chr>    <chr>               <dbl>
## 1 ran_pars stand    var__(Intercept)    2.61 
## 2 ran_pars Residual var__Observation    1.11 
## 3 ran_pars stand    var__(Intercept)    9.73 
## 4 ran_pars Residual var__Observation    1.36 
## 5 ran_pars stand    var__(Intercept)    0.827
## 6 ran_pars Residual var__Observation    0.914
  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size.
    Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly.
    What do these graphs say about the relationship between sample size and variance?
library(ggplot2)
library(future)
library(dplyr)
library(furrr)

#We chose 3 sample sizes: 10, 50 and 150

plan(multiprocess)
DiffSampleResults <- c(10, 50, 150) %>% 
  set_names(c("Sample Size = 10", "Sample Size = 50", "Sample Size = 150")) %>% 
  future_map(function(.size) replicate(n = 1000, expr = FctModelStdRandom(nstand = .size)))

compVar_DiffSampleResults <- DiffSampleResults %>% 
  modify_depth(.depth = 2, function(x) tidy(x, effects = "ran_pars", scales = "vcov")) %>% 
  map_dfr(bind_rows, .id = "id") %>% 
  filter(group == "stand")

compVar_DiffSampleResults_Median <- compVar_DiffSampleResults %>%
  group_by(id) %>%
  summarise(medianvar=median(estimate))

graphComp<-ggplot(compVar_DiffSampleResults, aes(x = estimate) ) +
  geom_density() +
  facet_wrap(~ id) +
  geom_vline(aes(xintercept=sds^2, linetype="True Variance = 4"))+
  geom_vline(data=compVar_DiffSampleResults_Median,aes(xintercept=medianvar, color=id, linetype="Median Variance"))+
  theme_bw()

graphComp

Apologies, the sample size are not in order (I couldn’t make work the code to modify this).

The 3 graphs show that the greater the sample size is,the closest the estimated variance from the true variance (4) is: the range of estimated variances tightens around the true variance (see that for the smallest sample=10, the variance ranges from 0 to 15, and gets better with the sample size).

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
Coeff_Estimates <- Fct1_results %>% 
  map_dfr(tidy,  effects = "fixed") %>%
  bind_rows()

Coeff_Estimates
## # A tibble: 3,000 x 5
##    effect term        estimate std.error statistic
##    <chr>  <chr>          <dbl>     <dbl>     <dbl>
##  1 fixed  (Intercept) -15.9      5.63       -2.83 
##  2 fixed  elevation     0.0152   0.00426     3.57 
##  3 fixed  slope         0.121    0.0157      7.73 
##  4 fixed  (Intercept) -13.9     13.3        -1.05 
##  5 fixed  elevation     0.0156   0.0117      1.33 
##  6 fixed  slope         0.124    0.0166      7.48 
##  7 fixed  (Intercept) -12.3      4.45       -2.75 
##  8 fixed  elevation     0.0130   0.00336     3.87 
##  9 fixed  slope         0.0937   0.0100      9.32 
## 10 fixed  (Intercept) -11.9     21.4        -0.556
## # … with 2,990 more rows
Coeff_Estimates %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(TrueValues = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = TrueValues, color=term), linetype = 2, size = 1) +
  theme_bw()

The individual estimated coefficients fluctuate, but seem to be fluctuating around the true values of elevation and slope (which are 0.005 and 0.01, respectively).

  1. Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.