Load in packages
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2 v purrr 0.3.4
v tibble 3.0.3 v dplyr 1.0.2
v tidyr 1.1.1 v stringr 1.4.0
v readr 1.3.1 v forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(lme4)
Loading required package: Matrix
Attaching package: 㤼㸱Matrix㤼㸲
The following objects are masked from 㤼㸱package:tidyr㤼㸲:
expand, pack, unpack
Load in dataset
schoolsmoke <- read_csv("~/EDUS 651/Module 4/schoolsmoke.csv")
Parsed with column specification:
cols(
school = col_double(),
thk = col_double(),
prethk = col_double(),
cc = col_double(),
gprethk = col_double(),
d = col_double(),
d2 = col_double(),
d3 = col_double(),
d4 = col_double(),
d5 = col_double()
)
glimpse(schoolsmoke)
Rows: 1,600
Columns: 10
$ school <dbl> 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, ...
$ thk <dbl> 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3, 4...
$ prethk <dbl> 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3, 2...
$ cc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ gprethk <dbl> 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153...
$ d <dbl> 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1, 2...
$ d2 <dbl> 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1, 1...
$ d3 <dbl> 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3, 2...
$ d4 <dbl> 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2, 2...
$ d5 <dbl> 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1, 2...
Practice precleaning
schoolsmoke.1 <- schoolsmoke %>%
mutate(., cc.factor = as.factor(cc))
glimpse(schoolsmoke.1)
Rows: 1,600
Columns: 11
$ school <dbl> 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193...
$ thk <dbl> 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3,...
$ prethk <dbl> 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3,...
$ cc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ gprethk <dbl> 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.1...
$ d <dbl> 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1,...
$ d2 <dbl> 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1,...
$ d3 <dbl> 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3,...
$ d4 <dbl> 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2,...
$ d5 <dbl> 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1,...
$ cc.factor <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
Changed to a factor because that data is categorical.
model.1 <- lmer(thk ~ prethk + cc.factor + gprethk + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ prethk + cc.factor + gprethk + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4712.6 4744.9 -2350.3 4700.6 1594
Scaled residuals:
Min 1Q Median 3Q Max
-2.23332 -0.79567 0.02374 0.79254 2.16704
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.01734 0.1317
Residual 1.09334 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.07609 0.25478 4.224
prethk 0.21260 0.02138 9.944
cc.factor1 0.43075 0.07554 5.702
gprethk 0.41835 0.11930 3.507
Correlation of Fixed Effects:
(Intr) prethk cc.fc1
prethk 0.000
cc.factor1 -0.288 0.000
gprethk -0.963 -0.179 0.148
This model is the same as we ran for Module 4. previous knowledge about tobacco health (prethk, t = 9.944), whether the school had the prevention curriculum (cc.factor1, t = 5.702), and the schoolwide previous knowledge about tobacco (gprethk, t = 3.507) are all positively and significantly related to students knowledge about tobacco and health. Further, a one unit increase in previous knowledge is predicted to result in a 0.21 unit increase in tobacco and health knowledge when adjusting for whether the school had the prevention curriculum and the school’s previous knowledge about tobacco. Additionally, whether the school had the prevention curriculum and the schoolwide previous tobacco knowledge have similar impacts on tobacco knowledge. If the students recieved the prevention curriculum there is a predicted 0.43 increase in tobacco knowledge while adjusting for the individual’s previous knowledge and the school’s knowledge. Lastly, a one unit increase in the school’s previous knowledge is predicted to result in a 0.42 increase in knowledge while adjusting for the curriculum recieved and the individual’s previous knowledge.
The AIC and BIC are 4712.6 and 4744.9 respectively.
model.2 <- lmer(thk ~ prethk + cc.factor + gprethk + (prethk|school), REML = FALSE, data = schoolsmoke.1)
summary(model.2)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ prethk + cc.factor + gprethk + (prethk | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4715.5 4758.6 -2349.8 4699.5 1592
Scaled residuals:
Min 1Q Median 3Q Max
-2.32829 -0.81670 0.02938 0.81294 2.14320
Random effects:
Groups Name Variance Std.Dev. Corr
school (Intercept) 0.049006 0.22137
prethk 0.002063 0.04542 -0.97
Residual 1.090740 1.04438
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.90319 0.24823 3.639
prethk 0.21136 0.02323 9.097
cc.factor1 0.42343 0.07337 5.771
gprethk 0.50405 0.11449 4.403
Correlation of Fixed Effects:
(Intr) prethk cc.fc1
prethk -0.065
cc.factor1 -0.295 -0.004
gprethk -0.951 -0.174 0.160
The fixed estimate for prethk is significant (t = 9.097). Thus, for every one unit increase in previous knowledge we predict tobacco knowledge to increase by 0.211 while adjusting for the presense of the curriculum and the school’s previous knowledge. This is almost exactly what the previuos model predicted as well. The random variance for prethk is very small, 0.002. This that increase of 0.211 may actually be an increase of 0.209 or 0.213 for some individuals or inbetween those two.
lmerTest::rand(model.2)
ANOVA-like table for random-effects: Single term deletions
Model:
thk ~ prethk + cc.factor + gprethk + (prethk | school)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 8 -2349.8 4715.5
prethk in (prethk | school) 6 -2350.3 4712.6 1.0762 2 0.5839
From these results we can see that with the random slope of prethk AIC goes up, although only slightly, which means the model WITHOUT the random slope is a better fit. However, this difference between the two models is not significant. I do not think the inclusion of the random slope for prethk benefits this model, it complicates it without evidence that it improves our model fit.
4.Finally, test a cross-level interaction between prethk and gprethk (don’t forget to create new variables representing the centered versions of prethk and gprethk). Interpret the results of the interaction effect. Use margins (Stata) or interplot (R) to obtain predicted slopes for gprethk for students with prethk (centered) scores from -2 to 2. Visualize and explain your results.
schoolsmoke.2 <- schoolsmoke.1 %>%
mutate(.,
prethk_cent = prethk - mean(prethk),
gprethk_cent = gprethk - mean(gprethk))
psych::describe(schoolsmoke.2, fast = TRUE)
no non-missing arguments to min; returning Infno non-missing arguments to max; returning -Inf
model.3 <- lmer(thk ~ cc.factor + prethk_cent + gprethk_cent + (1|school), REML = FALSE, data = schoolsmoke.2)
summary(model.3)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ cc.factor + prethk_cent + gprethk_cent + (1 | school)
Data: schoolsmoke.2
AIC BIC logLik deviance df.resid
4712.6 4744.9 -2350.3 4700.6 1594
Scaled residuals:
Min 1Q Median 3Q Max
-2.23332 -0.79567 0.02374 0.79254 2.16704
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.01734 0.1317
Residual 1.09334 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.38176 0.05235 45.499
cc.factor1 0.43075 0.07554 5.702
prethk_cent 0.21260 0.02138 9.944
gprethk_cent 0.41835 0.11930 3.507
Correlation of Fixed Effects:
(Intr) cc.fc1 prthk_
cc.factor1 -0.701
prethk_cent 0.000 0.000
gprethk_cnt -0.123 0.148 -0.179
model.4 <- lmer(thk ~ cc.factor + prethk_cent + gprethk_cent + prethk_cent:gprethk_cent + (1|school), REML = FALSE, data = schoolsmoke.2)
summary(model.4)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
thk ~ cc.factor + prethk_cent + gprethk_cent + prethk_cent:gprethk_cent +
(1 | school)
Data: schoolsmoke.2
AIC BIC logLik deviance df.resid
4702.9 4740.5 -2344.4 4688.9 1593
Scaled residuals:
Min 1Q Median 3Q Max
-2.24644 -0.81395 0.04367 0.80525 2.07026
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.0165 0.1284
Residual 1.0857 1.0420
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.35901 0.05205 45.322
cc.factor1 0.43001 0.07453 5.770
prethk_cent 0.20797 0.02135 9.743
gprethk_cent 0.36513 0.11890 3.071
prethk_cent:gprethk_cent 0.22898 0.06663 3.437
Correlation of Fixed Effects:
(Intr) cc.fc1 prthk_ gprth_
cc.factor1 -0.695
prethk_cent 0.008 0.000
gprethk_cnt -0.104 0.148 -0.171
prthk_cnt:_ -0.128 -0.003 -0.063 -0.130
The interaction between prethk_cent and gprethk_cent is significant (t = 3.437) and positive meaning when one increases so does the other.
Double check and compare
anova(model.3, model.4)
Data: schoolsmoke.2
Models:
model.3: thk ~ cc.factor + prethk_cent + gprethk_cent + (1 | school)
model.4: thk ~ cc.factor + prethk_cent + gprethk_cent + prethk_cent:gprethk_cent +
model.4: (1 | school)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model.3 6 4712.6 4744.9 -2350.3 4700.6
model.4 7 4702.9 4740.5 -2344.4 4688.9 11.764 1 0.000604 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
With the interaction effect the model has a better fit which can be seen from the decrease in the AIC and the difference between models is significant.
interplot::interplot(model.4, var1 = "prethk_cent", var2 = "gprethk_cent")
This is asking how does the slope of individual previous knowledge change according to the schoolwide previous knowledge. In my model the moderating variable is the group level variable, because I want to consider how the context influences the individual. From this plot we can see that the schoolwide previous knowledge affects the slope of individual’s previous knowledge. When schoolwide previous knowledge is lowest an individual’s previous knowledge has a very small impact on tobacco knowledge. When schoolwide knowledge is highest an individual’s previous knowledge has a larger impact on tobacco knowledge.