ANLY 505 - Simulated Data and Linear Mixed Models

The statistical model:

\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)

Where:

\(\beta_0\) is the mean response when both Elevation and Slope are 0
\(\beta_1\) is the change in mean response for a 1-unit change in elevation. Elevation is measured at the stand level, so all plots in a stand share a single value in elevation.
\(\beta_2\) is the change in mean response for a 1-unit change in slope. Slope is measured at the plot level, so every plot potentially has a unique value of slope.

Let’s define the parameters:

the intercept \((\beta_0)\) will be -1
the coefficient for elevation \((\beta_1)\) will be set to 0.005
the coefficient for slope \((\beta_2)\) will be set to 0.1

nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1

Simulate other variables:

set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)

Simulate elevation and slope:

elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)

Simulate response variable:

resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff

Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):

Fit a linear mixed model with the response variable as a function of elevation and slope with stand as a random effect. Are the estimated parameters similar to the intial parameters as we defined them?

library("lme4")

## Loading required package: Matrix

model <- lmer(resp2 ~ elevation + slope + (1|stand))
model

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## REML criterion at convergence: 81.9874
## Random effects:
##  Groups   Name        Std.Dev.
##  stand    (Intercept) 1.099   
##  Residual             1.165   
## Number of obs: 20, groups:  stand, 5
## Fixed Effects:
## (Intercept)    elevation        slope  
##   -21.31463      0.02060      0.09511

summary(model)

## Linear mixed model fit by REML ['lmerMod']
## Formula: resp2 ~ elevation + slope + (1 | stand)
## 
## REML criterion at convergence: 82
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.65583 -0.62467 -0.01693  0.53669  1.41736 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  stand    (Intercept) 1.208    1.099   
##  Residual             1.358    1.165   
## Number of obs: 20, groups:  stand, 5
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -21.314628   6.602053  -3.228
## elevation     0.020600   0.004916   4.190
## slope         0.095105   0.016441   5.785
## 
## Correlation of Fixed Effects:
##           (Intr) elevtn
## elevation -0.991       
## slope      0.049 -0.148

Create a function for your model and run 1000 simulations of that model.

function1 <- function(nstand=5, nplot=4, b0=-1, b1=0.005, b2=0.1, sigma_s=2, sigma=1) {
        stand <- rep(LETTERS[1:nstand], each = nplot)
        standeff <- rep(rnorm(nstand, 0, sigma_s), each = nplot)
        ploteff <- rnorm(nstand*nplot, 0, sigma)
        elevation <- rep(runif(nstand, 1000, 1500), each = nplot)
        slope <- runif(nstand*nplot, 2, 75)
        resp2 <- b0 + b1*elevation + b2*slope + standeff + ploteff 
        fit2 <- lmer(resp2 ~ elevation + slope + (1|stand))
}

times <- 1000
set.seed(16)
result <- replicate(times, function1())

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Extract the stand and residual variances from this simulation run. Print the first 6 rows of the data.

library("purrr")
library("tidyr")

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:Matrix':
## 
##     expand

library(broom)
tidy(model, effects = "ran_pars", scales = "vcov")

## # A tibble: 2 x 3
##   term                     group    estimate
##   <chr>                    <chr>       <dbl>
## 1 var_(Intercept).stand    stand        1.21
## 2 var_Observation.Residual Residual     1.36

head(map_dfr(result, broom::tidy, .id = "model", effects = "ran_pars"))

## # A tibble: 6 x 4
##   model term                    group    estimate
##   <chr> <chr>                   <chr>       <dbl>
## 1 1     sd_(Intercept).stand    stand       1.10 
## 2 1     sd_Observation.Residual Residual    1.17 
## 3 2     sd_(Intercept).stand    stand       2.36 
## 4 2     sd_Observation.Residual Residual    0.975
## 5 3     sd_(Intercept).stand    stand       1.62 
## 6 3     sd_Observation.Residual Residual    1.05

Choose three different sample sizes (your choice) and run 1000 model simulations with each sample size. Create 3 visualizations that compare distributions of the variances for each of the 3 sample sizes. Make sure that the axes are labelled correctly. What do these graphs say about the relationship between sample size and variance?

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("forcats")
stand_sims <- c(5, 20, 100) %>%
  set_names() %>%
  map(~replicate (1000, function1(nstand = .x)))

## boundary (singular) fit: see ?isSingular

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

stand_vars = stand_sims %>%
  modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
  map_dfr(bind_rows, .id = "stand_number") %>%
  filter (group == "stand")
head(stand_vars)

## # A tibble: 6 x 4
##   stand_number term                  group estimate
##   <chr>        <chr>                 <chr>    <dbl>
## 1 5            var_(Intercept).stand stand     6.91
## 2 5            var_(Intercept).stand stand     1.92
## 3 5            var_(Intercept).stand stand    12.0 
## 4 5            var_(Intercept).stand stand     6.84
## 5 5            var_(Intercept).stand stand     2.89
## 6 5            var_(Intercept).stand stand     2.52

stand_vars = mutate(stand_vars, stand_number = fct_inorder(stand_number))
add_prefix <- function(string) {
  paste("Number of Stands:", string, sep = " ")
}
groupmed = stand_vars %>%
  group_by(stand_number) %>%
  summarise(mvar = median(estimate))
library("ggplot2")
ggplot(stand_vars, aes(x = estimate)) +
  geom_density(fill = "orange", alpha = 0.25) +
  facet_wrap(~stand_number, labeller = as_labeller(add_prefix)) +
  geom_vline(aes(xintercept = 4, linetype = "True Variance"), size = 0.5) +
  geom_vline(data = groupmed, aes(xintercept = mvar, linetype = "Median Variance"), size = 0.5) + theme_bw() +
  scale_linetype_manual(name = "", values = c(2,1)) +
  theme(legend.position = "bottom", legend.key.width = unit(.1, "cm")) +
  labs(x = "Estimated Variance", y = "Probability Density")

Plot the coefficients of the estimates of elevation and slope. Hint: the x-axis should have 1000 values. Discuss the graphs.

coef <- result %>%
  map_dfr(tidy, effects="fixed") %>%
  filter(term %in% c("elevation", "slope"))
coef$sequence <- rep(1:times, each=2)

coefmed <- coef %>%
  group_by(term) %>%
  summarise(med=median(estimate))

f2 <- ggplot(coef, aes(sequence, estimate))+
  geom_point(size=0.2)+
  facet_wrap(~term)+
  geom_hline(data=coefmed, aes(yintercept=med), size=0.4)+
  labs(x="Sequence number", y="Estimated coefficients")+
  theme_bw()
f2

#The model may not fit well in this case.

Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.

ANLY 505 - Simulated Data and Linear Mixed Models

Week 2

Shuai Wang

2020-03-27