ANLY 505 - Simulated Data and Linear Mixed Models

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them? From the result, the b1 =0.005 and it is much smaller than the initial Parameter.The b2=0.1 and it is close to the initial parameter.The estimated intercept -21.31 is complete different from real value -1.

library(lme4)

## Loading required package: Matrix

Q1 <- lmer(resp2 ~ 1 + elevation + slope + (1|stand))
summary(Q1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

library(lmerTest)

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

cat("b0 = ", b0, sep = "")

## b0 = -1

cat("b1 = ", b1, sep = "")

## b1 = 0.005

cat("b2 = ", b2, sep = "")

## b2 = 0.1

Create a function for your model and run 1000 simulations of that model.

library(purrr)
mix_fun = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand * nplot, 0, sd)
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand * nplot, 2, 75)
  resp2 = b0 + b1 * elevation + b2 * slope + standeff + ploteff
  dat = data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data = dat)
}
mix_fun()

## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: dat
## REML criterion at convergence: 80.9781
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 2.3573  
##  Residual             0.9754  
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   10.584601    -0.005464     0.086839

simsmix = rerun(1000, mix_fun())

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

library(tidyverse)

## -- Attaching packages -------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --

## √ ggplot2 3.2.0     √ readr   1.3.1
## √ tibble  2.1.3     √ dplyr   0.8.3
## √ tidyr   0.8.3     √ stringr 1.4.0
## √ ggplot2 3.2.0     √ forcats 0.4.0

## -- Conflicts ----------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(broom)
library(broom.mixed)

## Registered S3 methods overwritten by 'broom.mixed':
##   method         from 
##   augment.lme    broom
##   augment.merMod broom
##   glance.lme     broom
##   glance.merMod  broom
##   glance.stanreg broom
##   tidy.brmsfit   broom
##   tidy.gamlss    broom
##   tidy.lme       broom
##   tidy.merMod    broom
##   tidy.rjags     broom
##   tidy.stanfit   broom
##   tidy.stanreg   broom

## 
## Attaching package: 'broom.mixed'

## The following object is masked from 'package:broom':
## 
##     tidyMCMC

variances <- simsmix %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
variances %>% print(n = 6)

## # A tibble: 2,000 x 4
##   effect   group    term             estimate
##   <chr>    <chr>    <chr>               <dbl>
## 1 ran_pars stand    var__(Intercept)    2.61 
## 2 ran_pars Residual var__Observation    1.11 
## 3 ran_pars stand    var__(Intercept)    9.73 
## 4 ran_pars Residual var__Observation    1.36 
## 5 ran_pars stand    var__(Intercept)    0.827
## 6 ran_pars Residual var__Observation    0.914
## # ... with 1,994 more rows

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance? The real variance of stand is 4. The estimate becomes more accurate as the sample size increases as We can see from the plot.The variance decrease as the sample size increases

library(ggplot2)
library(dplyr)
stand_sims = c(10, 50, 200) %>%
  set_names() %>%
  map(~replicate(1000, mix_fun(nstand = .x)))

## boundary (singular) fit: see ?isSingular

stand_vars = stand_sims %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "id") %>%
  filter(group == "stand")

ggplot(stand_vars, aes(x = estimate)) +
  geom_density(fill = "orange", alpha = "0.25") +
  facet_wrap(~id) +
  geom_vline(xintercept = 4)

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs. From the graph ,we can tell Defined Elevation(0.005) and slope(0.1) and mean of Elevation is exact same.

library(furrr)

## Loading required package: future

simsest <- simsmix %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()

simsest %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(x = 1 : 1000) %>%
  ungroup() %>% 
  mutate(real_value = ifelse(term == "elevation", 0.005, 0.1)) %>% 
  ggplot(aes(x = x, y = estimate)) +
  geom_line() +
  facet_wrap(~term) +
  geom_hline(aes(yintercept = real_value, color = term), linetype = 4, size = 0.5) +
  theme_bw()

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

ANLY 505 - Simulated Data and Linear Mixed Models

Week 2

Hua Qiu

2019-11-09