The statistical model:

\(y_t = \beta_0 + \beta_1 \cdot (\mathrm{Elevation}_s)_t + \beta_2 \cdot \mathrm{Slope}_t + (b_s)_t + \epsilon_t\)

where \(b_s\) is the stand-level random effect (simulated with standard deviation sds) and \(\epsilon_t\) is the plot-level error (simulated with standard deviation sd).

Let's define the parameters:

nstand = 5   # number of stands
nplot = 4    # number of plots per stand
b0 = -1      # intercept (beta_0)
b1 = .005    # coefficient for elevation (beta_1)
b2 = .1      # coefficient for slope (beta_2)
sds = 2      # standard deviation of the stand random effect
sd = 1       # residual (plot-level) standard deviation

Simulate the stand labels and the stand- and plot-level random effects:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)          # stand labels A-E
standeff = rep( rnorm(nstand, 0, sds), each = nplot)  # stand-level random effects
ploteff = rnorm(nstand*nplot, 0, sd)                   # plot-level errors

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)  # elevation varies by stand
slope = runif(nstand*nplot, 2, 75)                         # slope varies by plot

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff 
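
To eyeball the simulated data before fitting anything, one option (not part of the assignment tasks) is to collect the variables into a data frame and inspect the first rows:

dat = data.frame(stand, elevation, slope, resp2)
head(dat)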

Your tasks (complete each task in its own code chunk, and make sure to use echo=TRUE so I can see your code):

  1. Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the initial parameters as we defined them?
# use this chunk to answer question 1
library(lme4)
## Loading required package: Matrix
ini <- lmer(resp2 ~ elevation + slope + (1|stand))
summary(ini)
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148
# The estimated intercept (-21.31) is quite different from the initial parameter b0 = -1, and the elevation coefficient (0.021) differs from b1 = 0.005. The slope coefficient (0.095) is very close to b2 = 0.1. The stand standard deviation (1.099) is below the initial sds = 2, while the residual standard deviation (1.165) is close to the initial sd = 1.
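For a more compact comparison, the fitted fixed effects and variance components can also be pulled out directly with lme4's accessors (an optional check):

fixef(ini)    # compare with b0, b1 and b2
VarCorr(ini)  # compare the stand and residual standard deviations with sds and sd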
  2. Create a function for your model and run 1000 simulations of that model.
# use this chunk to answer question 2
new2 = function(nstand = 5, nplot = 4, b0 = -1, b1 = 0.005, b2 = 0.1, sds = 2, sd = 1) {
  stand = rep(LETTERS[1:nstand], each = nplot)
  standeff = rep(rnorm(nstand, 0, sds), each = nplot)
  ploteff = rnorm(nstand * nplot, 0, sd)
  elevation = rep(runif(nstand, 1000, 1500), each = nplot)
  slope = runif(nstand * nplot, 2, 75)
  resp2 = b0 + b1 * elevation + b2 * slope + standeff + ploteff
  dat = data.frame(resp2, elevation, slope, stand)
  lmer(resp2 ~ 1 + elevation + slope + (1|stand), data = dat)
}
summary(new2())
## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ 1 + elevation + slope + (1 | stand)
##    Data: dat
## 
## REML criterion at convergence: 81
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.68812 -0.55536 -0.06602  0.60836  1.30264 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 5.5570   2.3573  
##  Residual             0.9514   0.9754  
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 10.584601  10.265872   1.031
## elevation   -0.005464   0.009270  -0.589
## slope        0.086839   0.013993   6.206
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.994       
## slope     -0.153  0.112
sti1000 <- replicate(1000, expr = new2())
## boundary (singular) fit: see ?isSingular
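Several of the 1000 fits triggered the singular-fit message above (the stand variance was estimated at the boundary of zero). If you want to know how many, lme4's isSingular() can be applied to each fit (an optional check):

sum(sapply(sti1000, isSingular))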
  3. Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.
# use this chunk to answer question 3
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ tidyr::unpack() masks Matrix::unpack()
library(broom)

variances <- sti1000 %>% map_dfr(tidy, effects = "ran_pars", scales = "vcov")
variances %>% print(n = 6)
## # A tibble: 2,000 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand       2.61 
## 2 var_Observation.Residual Residual    1.11 
## 3 var_(Intercept).stand    stand       9.73 
## 4 var_Observation.Residual Residual    1.36 
## 5 var_(Intercept).stand    stand       0.827
## 6 var_Observation.Residual Residual    0.914
## # … with 1,994 more rows
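Averaging the extracted variances by group gives a quick check against the values used in the simulation, sds^2 = 4 for the stand variance and sd^2 = 1 for the residual variance (an optional summary):

variances %>%
  group_by(group) %>%
  summarise(mean_variance = mean(estimate))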
  4. Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?
# use this chunk to answer question 4
library(ggplot2)
library(dplyr)
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
library(purrr)
library(dplyr)
library(furrr)
## Loading required package: future
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
sti_3_sample = c(10, 50, 100) %>%
  set_names(c("sample size=10", "sample size=50", "sample size=100")) %>%
  map(~replicate(1000, new2(nstand = .x)))
## boundary (singular) fit: see ?isSingular
extract_var_3_sample <- sti_3_sample %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "id")

subset_var <- subset(extract_var_3_sample, group == "stand")

mean(subset_var$estimate)
## [1] 4.010765
median(subset_var$estimate)
## [1] 3.805362
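# For reference, the true stand variance used in every simulation is sds^2 = 2^2 = 4,
# so the pooled mean (4.01) and median (3.81) of the estimates sit close to that value.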
ggplot(subset_var, aes(x = estimate)) +
  geom_density(fill = "red", alpha = 0.5) +
  facet_wrap(group ~ id) +
  geom_vline(xintercept = c(mean(subset_var$estimate), median(subset_var$estimate))) +
  labs(x = "Estimated stand variance", y = "Density",
       title = "Distributions of the variances for each of the 3 sample sizes")
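# Interpretation (assuming the plots look as expected for this simulation): as the number of
# stands increases, the distribution of the estimated stand variance becomes narrower and more
# concentrated around the true value of 4, i.e., larger samples give more precise variance estimates.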

  5. Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.
# use this chunk to answer question 5
sim_1000_coeff <- sti1000 %>% 
  future_map(tidy, effects = "fixed") %>% 
  bind_rows()


df_graph <- sim_1000_coeff %>% 
  dplyr::filter(term %in% c("elevation", "slope")) %>% 
  group_by(term) %>% 
  mutate(range = 1 : 1000) %>%
  ungroup()

# compute the mean slope and elevation estimates
mean_slope <- mean(df_graph$estimate[df_graph$term == "slope"])
mean_elevation <- mean(df_graph$estimate[df_graph$term == "elevation"])

# add the term mean estimate for slope and elevation
df_graph_revised <- df_graph %>%
  mutate(term_mean_est = ifelse(term == "slope", mean_slope, mean_elevation))
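
A plot of these estimates can then be drawn from df_graph_revised; below is a sketch with ggplot2 that puts the simulation index on the x-axis (1000 values, as the hint suggests) and adds a red reference line at each term's mean estimate:

ggplot(df_graph_revised, aes(x = range, y = estimate)) +
  geom_point(alpha = 0.3) +
  geom_hline(aes(yintercept = term_mean_est), colour = "red") +
  facet_wrap(~ term, scales = "free_y") +
  labs(x = "Simulation number", y = "Coefficient estimate",
       title = "Estimates of elevation and slope across 1000 simulations")

Assuming the plot looks as expected, the slope estimates should cluster tightly around the true value of 0.1, while the elevation estimates scatter much more widely around 0.005 because elevation only varies at the stand level (five stands per simulated data set).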
  6. Submit a link to this document on RPubs to your Moodle. This assignment is worth 25 points.