ANLY 505 - Simulated Data and Linear Mixed Models

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?

library(Matrix)
library(lme4)


fit1 <- lmer(resp2 ~ elevation + slope + (1|stand))
options(scipen=100,digits=4)
summary(fit1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6558 -0.6247 -0.0169  0.5367  1.4174 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.21     1.10    
##  Residual             1.36     1.17    
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) -21.31463    6.60205   -3.23
## elevation     0.02060    0.00492    4.19
## slope         0.09511    0.01644    5.78
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

Only the slope coefficient is close to the initial one: for the intercept and the elevation, the estimated values are rather different. Specifically, for both the intercept slope, the estimated values in the regression are over 2 st dev away from the expected ones.

Create a function for your model and run 1000 simulations of that model.

library(purrr)
sim_fun <- function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand <- rep(LETTERS[1:nstand], each = nplot)
  standeff <- rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff <- rnorm(nstand * nplot, 0, sd)
  elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
  slope <- runif(nstand * nplot, 2, 75)
  resp2 <- b0 + b1 * elevation + b2 * slope + standeff + ploteff
  dat <- data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data = dat)
}
sim_fun()

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: dat
## REML criterion at convergence: 80.98
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 2.357   
##  Residual             0.975   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##    10.58460     -0.00546      0.08684

sims_1000 = rerun(1000, sim_fun())

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

library(tidyverse)

## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --

## √ ggplot2 3.1.0       √ readr   1.3.1  
## √ tibble  2.1.1       √ dplyr   0.8.0.1
## √ tidyr   0.8.3       √ stringr 1.3.1  
## √ ggplot2 3.1.0       √ forcats 0.4.0

## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(broom)
Res <- sims_1000 %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
Res %>% print(n = 6)

## # A tibble: 2,000 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand       2.61 
## 2 var_Observation.Residual Residual    1.11 
## 3 var_(Intercept).stand    stand       9.73 
## 4 var_Observation.Residual Residual    1.36 
## 5 var_(Intercept).stand    stand       0.827
## 6 var_Observation.Residual Residual    0.914
## # ... with 1,994 more rows

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

#insert code

This code crashes the knitting part but DOES generate the required output: not sure what the problem is. sims_sizes = c(5, 10, 100) %>%set_names(c(“sample_5”,“sample_10”,“sample_100”)) %>%map(~replicate(1000, sim_fun(nstand = .x)))

library(broom) sims_vars = sims_sizes %>%modify_depth(2, ~tidy(.x, effects = “ran_pars”, scales = “vcov”)) %>%map_dfr(bind_rows, .id = “stand_num”) %>%filter(group == “stand”) sims_vars = mutate(sims_vars, stand_num = fct_inorder(stand_num)) add_prefix = function(string) { paste(“# of stands:” , string, sep = " “) } groupmed = sims_vars %>%group_by(stand_num) %>%summarise(mvar = median(estimate)) library(ggplot2) ggplot(sims_vars, aes(x = estimate)) + geom_density(fill =”yellow“, alpha = 0.5) + facet_wrap(~stand_num, labeller = as_labeller(add_prefix)) + geom_vline(aes(xintercept = 4 , linetype =”True Variance“), size = 1) + geom_vline(data = groupmed, aes(xintercept = mvar, linetype =”Median Variance“), size = 1) + theme_bw() + scale_linetype_manual(name =”“, values = c(2,1)) + theme(legend.position =”bottom“, legend.key.width = unit(.1,”cm“)) + labs(x =”Estimated Variance“, y =”Density")

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

library(ggplot2)
library(dplyr)
library(furrr)

## Loading required package: future

sims_est <- sims_1000 %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

sims_est %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(real_value = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = real_value, color = term), size = 0.8) +
  theme_bw()

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

ANLY 505 - Simulated Data and Linear Mixed Models

Week 2

Krzysztof Herman

2019-11-11