ANLY 505 - Simulated Data and Linear Mixed Models

library(Matrix)
#install.packages('lmerTest')
library(lmerTest)

## Loading required package: lme4

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

library(lme4)
library(broom)
library(tm)

## Loading required package: NLP

library(tidytext)
library(stringr)
library(tidyr)

## 
## Attaching package: 'tidyr'

## The following objects are masked from 'package:Matrix':
## 
##     expand, pack, unpack

library(topicmodels)
library(tidyverse)

## ── Attaching packages ──────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::annotate() masks NLP::annotate()
## x tidyr::expand()     masks Matrix::expand()
## x dplyr::filter()     masks stats::filter()
## x dplyr::lag()        masks stats::lag()
## x tidyr::pack()       masks Matrix::pack()
## x tidyr::unpack()     masks Matrix::unpack()

library(broom.mixed)

## Registered S3 methods overwritten by 'broom.mixed':
##   method         from 
##   augment.lme    broom
##   augment.merMod broom
##   glance.lme     broom
##   glance.merMod  broom
##   glance.stanreg broom
##   tidy.brmsfit   broom
##   tidy.gamlss    broom
##   tidy.lme       broom
##   tidy.merMod    broom
##   tidy.rjags     broom
##   tidy.stanfit   broom
##   tidy.stanreg   broom

## 
## Attaching package: 'broom.mixed'

## The following object is masked from 'package:broom':
## 
##     tidyMCMC

library(purrr)
library(furrr)

## Loading required package: future

#install.packages("dplyr")
library(dplyr)

The statistical model:

\(y_t = \beta_0 + \beta_1 \cdot (\text{Elevation}_s)_t + \beta_2 \cdot \text{Slope}_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.
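
The list above leaves the two random terms of the model undefined. Reading them off the simulation code used later in this document (an interpretation, not something the assignment states explicitly), they appear to be a stand-level random effect and a plot-level error, both normal with mean 0. The symbols \(\sigma_s\) and \(\sigma\) are notation introduced here for illustration; they correspond to the `sds` and `sd` arguments in the code.

\((b_s)_t \sim N(0, \sigma_s^2)\), drawn once per stand and shared by all plots in that stand
\(\epsilon_t \sim N(0, \sigma^2)\), drawn independently for every plot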

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1
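
To see what these values imply, the fixed part of the model can be evaluated by hand. The sketch below is only an illustration: the elevation of 1250 and slope of 38.5 are arbitrary example inputs (the midpoints of the ranges simulated later), and the names `elev_example`, `slope_example`, `expected_resp` and `stand_share` are hypothetical.

```r
# Parameter values as defined above
b0 <- -1
b1 <- 0.005
b2 <- 0.1
sds <- 2   # standard deviation of the stand effect
sd <- 1    # standard deviation of the plot-level error

# Hypothetical example inputs (midpoints of the simulated ranges)
elev_example <- 1250
slope_example <- 38.5

# Expected response from the fixed effects only: -1 + 6.25 + 3.85
expected_resp <- b0 + b1 * elev_example + b2 * slope_example
expected_resp

# Share of the random variation that sits at the stand level: 4 / (4 + 1)
stand_share <- sds^2 / (sds^2 + sd^2)
stand_share
```

With these settings the expected response is 9.1 and four fifths of the random variance comes from differences between stands.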

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff
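
Before fitting anything, it can help to confirm that the simulated data have the nested structure the model assumes. The following is a minimal sketch that repeats the simulation steps above with the same seed and then counts distinct values per stand; `dat`, `n_elev` and `n_slope` are names chosen here for illustration.

```r
nstand <- 5
nplot <- 4

set.seed(16)
stand <- rep(LETTERS[1:nstand], each = nplot)
# The random effects are drawn (but not used below) so that the random
# number stream matches the order used in the simulation above
standeff <- rep(rnorm(nstand, 0, 2), each = nplot)
ploteff <- rnorm(nstand * nplot, 0, 1)
elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
slope <- runif(nstand * nplot, 2, 75)

dat <- data.frame(stand, elevation, slope)

# Number of distinct values of each predictor within each stand
n_elev <- tapply(dat$elevation, dat$stand, function(x) length(unique(x)))
n_slope <- tapply(dat$slope, dat$stand, function(x) length(unique(x)))

n_elev    # one elevation per stand
n_slope   # one slope per plot, so four per stand
```

Elevation therefore carries only five independent values in this design, while slope carries twenty.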

Your tasks (complete each task in its own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the initial parameters as we defined them?

# use this chunk to answer question 1

fit <- lmer(resp2 ~ elevation + slope + (1|stand))
options(scipen=100,digits=4)
summary(fit)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6558 -0.6247 -0.0169  0.5367  1.4174 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.21     1.10    
##  Residual             1.36     1.17    
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) -21.31463    6.60205   3.00131   -3.23    0.048 *  
## elevation     0.02060    0.00492   3.11348    4.19    0.023 *  
## slope         0.09511    0.01644  15.86803    5.78 0.000029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

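The slope estimate (0.0951) is close to the true value of 0.1, but the intercept (-21.3 versus -1) and the elevation coefficient (0.0206 versus 0.005) are far from the values we defined. Both rest on only five stand-level elevation values, and their estimates are almost perfectly correlated (-0.991), so they are imprecise (the standard error of the intercept is 6.6). The stand variance is also underestimated (1.21 versus sds^2 = 4), while the residual variance (1.36) is reasonably close to the true value of 1.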
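
One likely reason the intercept is so unstable is that it describes a plot at elevation 0, far outside the simulated range of 1000 to 1500. The sketch below, which is an addition and not part of the assignment, refits the same model with elevation centred on its mean. The names `dat`, `elevation_c`, `fit_raw` and `fit_ctr` are hypothetical, and the data are regenerated with the same seed and draw order as above.

```r
library(lme4)

set.seed(16)
nstand <- 5
nplot <- 4
stand <- rep(LETTERS[1:nstand], each = nplot)
standeff <- rep(rnorm(nstand, 0, 2), each = nplot)
ploteff <- rnorm(nstand * nplot, 0, 1)
elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
slope <- runif(nstand * nplot, 2, 75)
resp2 <- -1 + 0.005 * elevation + 0.1 * slope + standeff + ploteff

dat <- data.frame(resp2, elevation, slope, stand)
# Centre elevation so the intercept refers to a stand at average elevation
dat$elevation_c <- dat$elevation - mean(dat$elevation)

fit_raw <- lmer(resp2 ~ elevation + slope + (1 | stand), data = dat)
fit_ctr <- lmer(resp2 ~ elevation_c + slope + (1 | stand), data = dat)

# The elevation coefficient is unchanged; only the intercept moves
fixef(fit_raw)
fixef(fit_ctr)

# Standard errors of the two intercepts
se_raw <- sqrt(diag(as.matrix(vcov(fit_raw))))[["(Intercept)"]]
se_ctr <- sqrt(diag(as.matrix(vcov(fit_ctr))))[["(Intercept)"]]
c(raw = se_raw, centred = se_ctr)
```

Centring does not change the fitted model, only how the intercept is parameterised, so it does not make the elevation coefficient itself any more precise.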

Create a function for your model and run 1000 simulations of that model.

# use this chunk to answer question 2
library(purrr)
set.seed(16)

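# Simulate one data set from the model and fit the mixed model to it
fitmod = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  # LETTERS has only 26 entries and would give NA labels when nstand > 26
  stand = rep(paste0("stand", seq_len(nstand)), each = nplot)
  # One random effect per stand (repeated for its plots), one error per plot
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand * nplot, 0, sd)
  # Elevation varies by stand; slope varies by plot
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand * nplot, 2, 75)
  resp2 = b0 + b1 * elevation + b2 * slope + standeff + ploteff

  dat = data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data = dat)
}
fitmod()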

## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: dat
## REML criterion at convergence: 81.99
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.10    
##  Residual             1.17    
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##    -21.3146       0.0206       0.0951

sim_R = replicate(n = 1000, expr = fitmod(), simplify = FALSE)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
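
The repeated `boundary (singular) fit` messages indicate that, for some simulated data sets, the estimated stand variance collapsed to (essentially) zero. A minimal sketch of how such fits could be counted with `lme4::isSingular()` follows. It uses a smaller run of 50 simulations to stay quick, and `sim_fit`, `fits` and `singular` are hypothetical names; `sim_fit` mirrors the simulation function above rather than reusing it.

```r
library(lme4)

# Hypothetical stand-alone copy of the simulation-and-fit step
sim_fit <- function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1,
                    sds = 2, sd = 1) {
  stand <- rep(LETTERS[1:nstand], each = nplot)
  standeff <- rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff <- rnorm(nstand * nplot, 0, sd)
  elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
  slope <- runif(nstand * nplot, 2, 75)
  resp2 <- b0 + b1 * elevation + b2 * slope + standeff + ploteff
  dat <- data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ elevation + slope + (1 | stand), data = dat)
}

set.seed(16)
fits <- replicate(50, sim_fit(), simplify = FALSE)

# TRUE where the fit sits on the boundary of the parameter space
singular <- vapply(fits, isSingular, logical(1))

sum(singular)    # number of singular fits
mean(singular)   # proportion of singular fits
```

Applied to the full run, the same idea would be `sum(vapply(sim_R, isSingular, logical(1)))`, assuming `sim_R` is a list of fitted models.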

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

variances = sim_R %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")

variances %>% print(n = 6)

## # A tibble: 2,000 x 4
##   effect   group    term             estimate
##   <chr>    <chr>    <chr>               <dbl>
## 1 ran_pars stand    var__(Intercept)    5.56 
## 2 ran_pars Residual var__Observation    0.951
## 3 ran_pars stand    var__(Intercept)    2.61 
## 4 ran_pars Residual var__Observation    1.11 
## 5 ran_pars stand    var__(Intercept)    9.73 
## 6 ran_pars Residual var__Observation    1.36 
## # … with 1,994 more rows
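
The long format above alternates stand and residual rows, so a natural next step is to summarise the estimates by group and compare them with the true variances (sds^2 = 4 and sd^2 = 1). The sketch below does this in base R for the six printed rows only; `first_rows` and `mean_by_group` are hypothetical names, and three simulations are far too few to draw conclusions from.

```r
# The six rows printed above, typed in by hand
first_rows <- data.frame(
  group = rep(c("stand", "Residual"), times = 3),
  estimate = c(5.56, 0.951, 2.61, 1.11, 9.73, 1.36)
)

# Mean estimated variance per group
mean_by_group <- tapply(first_rows$estimate, first_rows$group, mean)
mean_by_group

# True values for comparison: stand variance 4, residual variance 1
c(stand = 2^2, Residual = 1^2)
```

The same `tapply()` call should work on the full `variances` tibble, where all 1000 simulations contribute to each mean.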

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

# use this chunk to answer question 4

stand_sims <- c(20, 60, 200) %>%
  set_names(c("sample1_20", "sample2_60", "sample3_200")) %>%
  map(~replicate(n = 1000, expr = fitmod(nstand = .x)))

# Keep only the stand variance, so that the reference line at 4 (sds^2) applies
stand_vars = stand_sims %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "id") %>%
  dplyr::filter(group == "stand")

ggplot(stand_vars, aes(x = estimate)) +
  geom_density(fill = "blue", alpha = 0.25) +
  facet_wrap(~id) +
  geom_vline(xintercept = 4) +
  labs(x = "Estimated stand variance", y = "Density")
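
The relationship between the number of stands and the spread of a variance estimate can also be illustrated without fitting any mixed models. The sketch below is a deliberately simplified stand-in: it treats the stand effects as directly observed and uses the ordinary sample variance, not the REML estimate from `lmer`, so its numbers are not comparable to the plots above. `var_sims`, `centre` and `spread` are hypothetical names.

```r
set.seed(16)
sds <- 2                          # true stand standard deviation, so variance 4
sample_sizes <- c(20, 60, 200)    # number of stands, as in the simulations above

# For each number of stands: draw the stand effects 1000 times and
# record the sample variance of each draw
var_sims <- lapply(sample_sizes, function(n) {
  replicate(1000, var(rnorm(n, mean = 0, sd = sds)))
})
names(var_sims) <- paste0("nstand_", sample_sizes)

# Centre and spread of the 1000 variance estimates per sample size
centre <- sapply(var_sims, mean)
spread <- sapply(var_sims, sd)

centre   # all close to the true variance of 4
spread   # shrinks as the number of stands grows
```

In this simplified setting the estimates stay centred on 4 for every sample size, and only their spread changes.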

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

# use this chunk to answer question 5

simsest <- sim_R %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

simsest %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>%
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(real_value = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = real_value, color = term), linetype = 4, size = 0.5) +
  theme_bw()

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

Week 2

Kirti Tambe

2020-05-17