The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

Let’s define the parameters:

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?
if (!require("lme4")) {
install.packages("lme4")
library(lme4)
}
## Loading required package: lme4
## Loading required package: Matrix
fit = lmer(resp2 ~ elevation + slope + (1|stand))
fit
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## REML criterion at convergence: 81.9874
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.099   
##  Residual             1.165   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   -21.31463      0.02060      0.09511
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602050  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

Answer: The fitted linear model have an estimated b2 at 0.095 which is close to the initial value 0.1, however the estimated b1 and b0 are different the initial values.

  1. Create a function for your model and run 1000 simulations of that model.
times = 1000
sims_func = function(nstand = 5, nplot = 4, b0 = -1, b1 = .005, b2 = 0.1, sds = 2, sd = 1) {
        stand = rep(LETTERS[1:nstand], each = nplot)
        standeff = rep(rnorm(nstand, 0, sds), each = nplot)
        ploteff = rnorm(nstand*nplot, 0, sd)
        elevation = rep(runif(nstand, 1000, 1500), each = nplot)
        slope = runif(nstand*nplot, 2, 75)
        resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 
        fit2 = lmer(resp2 ~ elevation + slope + (1|stand))
}
sims1000 = replicate(n = times, expr = sims_func())
  1. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
if (!require("broom")) {
install.packages("broom")
library(broom)
}
## Loading required package: broom
## Warning: package 'broom' was built under R version 3.5.2
if (!require("tidyverse")) {
install.packages("tidyverse")
library(tidyverse)
}
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 3.5.2
## -- Attaching packages ---------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.3.0     v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.5.2
## -- Conflicts ------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
variances = sims1000 %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
variances %>% print(n = 6)
## # A tibble: 2,000 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand       5.56 
## 2 var_Observation.Residual Residual    0.951
## 3 var_(Intercept).stand    stand       2.61 
## 4 var_Observation.Residual Residual    1.11 
## 5 var_(Intercept).stand    stand       9.73 
## 6 var_Observation.Residual Residual    1.36 
## # ... with 1,994 more rows
  1. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

Answer: With the sample size increases the variance will decrease.

  1. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
if (!require("furrr")) {
install.packages("furrr")
library(furrr)
}
## Loading required package: furrr
## Warning: package 'furrr' was built under R version 3.5.3
## Loading required package: future
## Warning: package 'future' was built under R version 3.5.3
coef = sims1000 %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

  med_elevation = median(coef$estimate[coef$term=="elevation"])
  med_elevation
## [1] 0.00469506
  med_slope = median(coef$estimate[coef$term=="slope"])
  med_slope
## [1] 0.09985177
  mean_elevation = mean(coef$estimate[coef$term=="elevation"])
  mean_elevation
## [1] 0.004896685
  mean_slope = mean(coef$estimate[coef$term=="slope"])
  mean_slope
## [1] 0.09969903
coef %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(series = 1 : times) %>%
  ungroup() %>%
  mutate(med_intercept = ifelse(term == "elevation", med_elevation, med_slope)) %>% 
  mutate(mean_intercept = ifelse(term == "elevation", mean_elevation, mean_slope)) %>% 
 

ggplot(aes(x = series, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = med_intercept, color = term), linetype = 4, size = 0.3) +
  geom_hline(aes(yintercept = mean_intercept, color = term), size = 0.3) +
  labs(x="Series", y="Estimated Coefficients") +
  theme_bw()

Answer: Mean elevation is 0.004897, mean slope is 0.099699, median elevation is 0.004695, and median slope us 0.099852. Both mean and median value of 1000 times of simulation are close to the initial value or 0.005 and 0.1. However, their distribution are various and scattered with some extreme values.

  1. Submit a link to this document in R Pubs to your Moodle.

Sure