The statistical model:
\(y_t = \beta_0 + \beta_1 * (Elevation_s)_t + \beta_2 * Slope_t + (b_s)_t + \epsilon_t\)
Where:
Let’s define the parameters:
nstand = 5
nplot = 4
b0 = -1
b1 = .005
b2 = .1
sds = 2
sd = 1
Simulate other variables:
set.seed(16)
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep( rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)
Simulate elevation and slope:
elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)
Simulate response variable:
resp2 = b0 + b1*elevation + b2*slope + standeff + ploteff
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(broom)
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
Your tasks (complete each task in its’ own code chunk, make sure to use echo=TRUE so I can see your code):
# use this chunk to answer question 1
fit_1 = lm(resp2 ~ stand) # fit the model
summary(fit_1)
##
## Call:
## lm(formula = resp2 ~ stand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0604 -1.1517 0.0334 0.9560 4.5691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.5276 1.0187 12.297 3.09e-09 ***
## standB -2.0727 1.4407 -1.439 0.171
## standC -0.7438 1.4407 -0.516 0.613
## standD -8.1292 1.4407 -5.643 4.68e-05 ***
## standE 0.1064 1.4407 0.074 0.942
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.037 on 15 degrees of freedom
## Multiple R-squared: 0.753, Adjusted R-squared: 0.6871
## F-statistic: 11.43 on 4 and 15 DF, p-value: 0.0001855
# use this chunk to answer question 2
fit_1_fun = function(nstand = 5, b0 = -1, b1 = 0.005, b2 = .1, sigma = 2) {
stand = rep(LETTERS[1:nstand], each = nplot)
standeff = rep(rnorm(nstand, 0, sds), each = nplot)
ploteff = rnorm(nstand*nplot, 0, sd)
elevation = rep( runif(nstand, 1000, 1500), each = nplot)
slope = runif(nstand*nplot, 2, 75)
resp = b0 + b1*elevation + b2*slope + standeff + ploteff
dat = data.frame(stand, standeff, ploteff)
fit1 = lmer(resp ~ elevation + slope + (1|stand), data = dat)
}
sims <- rerun(1000, fit_1_fun())
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
# use this chunk to answer question 3
stand_vars = sims %>%
map_dfr(tidy, effects = "ran_pars",scales = "vcov")
head(stand_vars)
## # A tibble: 6 x 3
## term group estimate
## <chr> <chr> <dbl>
## 1 var_(Intercept).stand stand 5.56
## 2 var_Observation.Residual Residual 0.951
## 3 var_(Intercept).stand stand 2.61
## 4 var_Observation.Residual Residual 1.11
## 5 var_(Intercept).stand stand 9.73
## 6 var_Observation.Residual Residual 1.36
Ans. Sample size chosen are 10, 25, 100. Increasing in sample size leads to variance decreases.
# use this chunk to answer question 4
stand_sims = c(10, 25, 100) %>%
set_names() %>%
map(~rerun(1000, fit_1_fun(nstand = .x)))
## boundary (singular) fit: see ?isSingular
stand_vars = stand_sims %>%
modify_depth(2, ~tidy(.x, effects = "ran_pars", scales = "vcov")) %>%
map_dfr(bind_rows, .id = "stand_num") %>%
filter(group == "stand")
head(stand_vars)
## # A tibble: 6 x 4
## stand_num term group estimate
## <chr> <chr> <chr> <dbl>
## 1 10 var_(Intercept).stand stand 5.60
## 2 10 var_(Intercept).stand stand 7.11
## 3 10 var_(Intercept).stand stand 1.68
## 4 10 var_(Intercept).stand stand 2.74
## 5 10 var_(Intercept).stand stand 3.16
## 6 10 var_(Intercept).stand stand 7.19
stand_vars = mutate(stand_vars, stand_num = fct_inorder(stand_num))
add_prefix = function(string) {
paste("Number of stands:", string, sep = " ") #label plots
}
groupmed = stand_vars %>%
group_by(stand_num) %>%
summarise(mvar = median(estimate)) # add median of each distribution
ggplot(stand_vars, aes(x = estimate)) +
geom_density(fill = "pink", alpha = 0.25) +
facet_wrap(~stand_num, labeller = as_labeller(add_prefix)) +
geom_vline(aes(xintercept = 4, linetype = "True Variance"), size = 0.5) +
geom_vline(data = groupmed, aes(xintercept = mvar, linetype = "Median Variance"), size = 0.5) +
theme_bw()+
scale_linetype_manual(name = "", values = c(2,1)) +
theme(legend.position = "bottom", legend.key.width = unit(.1, "cm")) +
labs(x = "Estimated Variance", y = "Density")
Ans. The median estimates of coefficients for elevation and slope are 0.0047 and 0.1006, respectively, which is considered pretty close to the initial set parameters of 0.005 and 0.1. Although the estimate b1 and b2 are scattered in the plot, and some estimates of b1 are negative, considering the model predicts the two important factor pretty well, I’d say the model is a good fit.
# use this chunk to answer question 5
coef = sims %>%
map_dfr(tidy, effects="fixed") %>%
filter(term %in% c("elevation", "slope"))
coef
## # A tibble: 2,000 x 4
## term estimate std.error statistic
## <chr> <dbl> <dbl> <dbl>
## 1 elevation -0.00546 0.00927 -0.589
## 2 slope 0.0868 0.0140 6.21
## 3 elevation 0.0152 0.00426 3.57
## 4 slope 0.121 0.0157 7.73
## 5 elevation 0.0156 0.0117 1.33
## 6 slope 0.124 0.0166 7.48
## 7 elevation 0.0130 0.00336 3.87
## 8 slope 0.0937 0.0100 9.32
## 9 elevation 0.0135 0.0157 0.856
## 10 slope 0.0767 0.0152 5.05
## # … with 1,990 more rows
coef_med = coef %>%
group_by(term) %>%
summarise(med=median(estimate))
coef_med
## # A tibble: 2 x 2
## term med
## <chr> <dbl>
## 1 elevation 0.00470
## 2 slope 0.0999
coef_plot = ggplot(coef, aes(x=rep(1:1000, each=2), y=estimate))+
geom_point(size=0.5)+
facet_wrap(~term)+
geom_hline(data=coef_med, aes(yintercept=med, color=term), size=0.8)+
labs(x="Replicated number", y="Coefficients of the estimates")
coef_plot
6. Submit a link to this document in R Pubs to your Moodle. This assignment is worth 25 points.