This week, we will use schoolsmoke dataset that we worked with in class. The data are from the Television, School, and Family Smoking Prevention and Cessation Project (Flay et al. 1988; Rabe-Hesketh and Skrondal 2012, chap. 11), a schoolwide anti-smoking intervention where schools were randomly assigned into a schoolwide curricular intervention (cc
= 1) or a control condition (cc
= 0). 1,600 students are nested within 28 schools.
The dependent variable is the tobacco and health knowledge (THK) scale, which runs from 1-4, with higher numbers suggesting more knowledge about the risks of tobacco. We also have a group-averaged prior measure of THK (gprethk
), which represents the average tobacco knowledge score across a given school. Finally, we have a set of 5 items (peer1
-peer5
) that are individual item responses to a 5-question Likert-type measure of each student’s perceived pressure from peers to smoke. Higher scores on these items suggest higher levels of perceived pressure from peers.
suppressPackageStartupMessages(library(tidyverse))
package ‘ggplot2’ was built under R version 3.6.2package ‘tibble’ was built under R version 3.6.2package ‘tidyr’ was built under R version 3.6.2package ‘purrr’ was built under R version 3.6.2package ‘dplyr’ was built under R version 3.6.2
suppressPackageStartupMessages(library(lme4))
package ‘lme4’ was built under R version 3.6.2
suppressPackageStartupMessages(library(psych))
glimpse(schoolsmoke)
Rows: 1,600
Columns: 10
$ school [3m[38;5;246m<dbl>[39m[23m 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 1…
$ thk [3m[38;5;246m<dbl>[39m[23m 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3, 4, 2, 1, 3, 1, 3, 2, 4…
$ prethk [3m[38;5;246m<dbl>[39m[23m 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3, 2, 1, 3, 0, 1, 0, 1, 3…
$ cc [3m[38;5;246m<dbl>[39m[23m 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ gprethk [3m[38;5;246m<dbl>[39m[23m 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.1538…
$ peer1 [3m[38;5;246m<dbl>[39m[23m 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1, 2, 4, 3, 3, 2, 1, 2, 2…
$ peer2 [3m[38;5;246m<dbl>[39m[23m 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1, 1, 1, 2, 2, 1, 2, 1, 3…
$ peer3 [3m[38;5;246m<dbl>[39m[23m 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 1, 4, 2, 2…
$ peer4 [3m[38;5;246m<dbl>[39m[23m 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 0, 1, 1, 4, 1…
$ peer5 [3m[38;5;246m<dbl>[39m[23m 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1, 2, 0, 2, 1, 1, 3, 3, 2…
Just a little pre-cleaning here: I want to change the variable cc
into a factor. It is coded as 0 and 1, so it can he used as is, but it really is a categorical variable, so I want R to know that when I run my models.
schoolsmoke.1 <- schoolsmoke %>%
mutate(., cc.factor = as_factor(cc))
glimpse(schoolsmoke.1)
Rows: 1,600
Columns: 11
$ school [3m[38;5;246m<dbl>[39m[23m 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193,…
$ thk [3m[38;5;246m<dbl>[39m[23m 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3, 4, 2, 1, 3, 1, 3, 2,…
$ prethk [3m[38;5;246m<dbl>[39m[23m 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3, 2, 1, 3, 0, 1, 0, 1,…
$ cc [3m[38;5;246m<dbl>[39m[23m 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ gprethk [3m[38;5;246m<dbl>[39m[23m 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.15…
$ peer1 [3m[38;5;246m<dbl>[39m[23m 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1, 2, 4, 3, 3, 2, 1, 2,…
$ peer2 [3m[38;5;246m<dbl>[39m[23m 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1, 1, 1, 2, 2, 1, 2, 1,…
$ peer3 [3m[38;5;246m<dbl>[39m[23m 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 1, 4, 2,…
$ peer4 [3m[38;5;246m<dbl>[39m[23m 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 0, 1, 1, 4,…
$ peer5 [3m[38;5;246m<dbl>[39m[23m 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1, 2, 0, 2, 1, 1, 3, 3,…
$ cc.factor [3m[38;5;246m<fct>[39m[23m 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
See above: the variable cc.factor
is just cc
converted into a factor, which is great.
Calculate a composite scale score using the 5 peer pressure items (peer1-peer5). Estimate Cronbach’s alpha for this scale and report and interpret the results. Should we incorporate this measure into our analysis going forward?
peer_items <- schoolsmoke.1 %>%
select(., peer1:peer5)
alpha(peer_items)
Some items were negatively correlated with the total scale and probably
should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Some items ( peer3 peer4 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
NaNs produced
Reliability analysis
Call: alpha(x = peer_items)
raw_alpha <dbl> | std.alpha <dbl> | G6(smc) <dbl> | average_r <dbl> | S/N <dbl> | ase <dbl> | mean <dbl> | ||
---|---|---|---|---|---|---|---|---|
-0.07008044 | -0.06942871 | -0.05112059 | -0.01315507 | -0.0649213 | 0.04230182 | 1.986875 |
lower alpha upper 95% confidence boundaries
-0.15 -0.07 0.01
Reliability if an item is dropped:
raw_alpha <dbl> | std.alpha <dbl> | G6(smc) <dbl> | average_r <dbl> | S/N <dbl> | alpha se <dbl> | ||
---|---|---|---|---|---|---|---|
peer1 | -0.03913683 | -0.038972919 | -0.026723214 | -0.0094665262 | -0.037511006 | 0.04240944 | |
peer2 | -0.08177458 | -0.081228564 | -0.057324613 | -0.0191410415 | -0.075126173 | 0.04416833 | |
peer3 | 0.00147113 | 0.001988348 | 0.004132794 | 0.0004978293 | 0.001992309 | 0.04078108 | |
peer4 | -0.07196642 | -0.071719431 | -0.050868479 | -0.0170146471 | -0.066919969 | 0.04375667 | |
peer5 | -0.08870233 | -0.088059383 | -0.061271710 | -0.0206509625 | -0.080932516 | 0.04446153 |
Item statistics
n <dbl> | raw.r <dbl> | std.r <dbl> | r.cor <dbl> | r.drop <dbl> | mean <dbl> | sd <dbl> | |
---|---|---|---|---|---|---|---|
peer1 | 1600 | 0.4285559 | 0.4251198 | NaN | -0.038285759 | 1.988750 | 1.101423 |
peer2 | 1600 | 0.4471966 | 0.4517904 | NaN | -0.008595944 | 1.975000 | 1.081715 |
peer3 | 1600 | 0.4004108 | 0.3976501 | NaN | -0.067150784 | 1.961875 | 1.098829 |
peer4 | 1600 | 0.4465759 | 0.4459284 | NaN | -0.015582733 | 2.015000 | 1.095114 |
peer5 | 1600 | 0.4536514 | 0.4559530 | NaN | -0.003961319 | 1.993750 | 1.087175 |
Non missing response frequency for each item
0 1 2 3 4 miss
peer1 0.10 0.22 0.38 0.20 0.10 0
peer2 0.11 0.18 0.43 0.18 0.09 0
peer3 0.10 0.23 0.38 0.18 0.10 0
peer4 0.10 0.20 0.40 0.20 0.10 0
peer5 0.10 0.20 0.40 0.20 0.10 0
Wow, this alpha looks terrible! Pretty much 0. I would definitely NOT recommend using these items as a composite scale score for peer pressure. Notice, above, that you get a warning that some items were negatively correlated with the total scale. Sometimes, this happens when items are not reversed. You can add the check.keys=TRUE
option to the alpha
function to check this. But here, not knowing the individual item stems, and just based on the overall crumminess of the alpha score, I would chalk this up to being a bad scale and move on!
Treating “thk” as the DV, and “school” as the level-2 clustering variable (we will ignore classroom clustering for now- stay tuned!), estimate a null model. Compute, report, and interpret the ICC, as well as the likelihood ratio test at the bottom of the output. Is it worth conducting MLM on these data?
model.null <- lmer(thk ~ (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.null)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4833.0 4849.1 -2413.5 4827.0 1597
Scaled residuals:
Min 1Q Median 3Q Max
-1.9039 -0.7734 0.0693 0.9240 1.7430
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.09438 0.3072
Residual 1.16219 1.0780
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.60750 0.06538 39.88
We can pull the Level 1 and Level 2 variance estimates from the table above to calculate an ICC:
null.ICC <- 0.09438 /(0.09438 + 1.16219 )
null.ICC
[1] 0.07510923
The estimated ICC was .08. This is above the .05 threshold we typically set for requiring MLM. This means that 8% of the variation in tobacco prevention knowledge occurs between schools, and 98% occurs within schools.
lmerTest::rand(model.null)
ANOVA-like table for random-effects: Single term deletions
Model:
thk ~ (1 | school)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 3 -2413.5 4833.0
(1 | school) 2 -2445.6 4895.2 64.16 1 1.147e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The LRT confirms the need for MLM - the model that includes a random intercept for school represents a significant improvement (p<.001) over the model that does not include a random intercept.
Interpret the estimated fixed effect of the intercept in the null model. What is this number measuring- what does it mean?
The intercept in the null model was estimated to be 2.61. In this case, we are measuring the average student score on the tobacco prevention scale, after adjusting for the nesting of students within schools.
3. Now, add “prethk” as a level one covariate. Interpret the coefficient estimate. Report what happens to the level-1 and level-2 variances of our random terms. Do they go up or down, and by how much? What does this mean?
model.1 <- lmer(thk ~ prethk + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ prethk + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4733.5 4755.0 -2362.7 4725.5 1596
Scaled residuals:
Min 1Q Median 3Q Max
-2.12273 -0.82727 0.03082 0.83756 2.15165
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.07519 0.2742
Residual 1.09321 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.15164 0.07414 29.02
prethk 0.21747 0.02122 10.25
Correlation of Fixed Effects:
(Intr)
prethk -0.598
The estimate for prethk
= 0.22, and it is statistically significant (t = 10.25). In practical terms, this means that as a student’s pretest tobacco prevention score increases by 1 unit, we would predict their post score (thk
) to increase by about .22 points.
Both the level 1 and level 2 variances decreased at least a bit compared to the null model. The L1 (student) variance decreased from 1.16 to 1.09, and the L2 (school) variance decreased from .09 to about .08. What does this mean? As significant predictors are added to the model, we expect the variance to decrease- predictors explain previously unexplained variance in our outcome, making these estimates smaller.
4. Finally, add “cc” and “gprethk” as level-2 predictors to your model. Interpret the results. Explain how the estimates for prethk and gprethk are related to one another, and also explain what this suggests about the impact of school context on an individual student’s knowledge. Report what happens to the level-1 and level-2 variances of our random terms. Do they go up or down, and by how much?
model.2 <- lmer(thk ~ prethk + cc.factor + gprethk + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.2)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: thk ~ prethk + cc.factor + gprethk + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4712.6 4744.9 -2350.3 4700.6 1594
Scaled residuals:
Min 1Q Median 3Q Max
-2.23332 -0.79567 0.02374 0.79254 2.16704
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.01734 0.1317
Residual 1.09334 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.07609 0.25478 4.224
prethk 0.21260 0.02138 9.944
cc.factor1 0.43075 0.07554 5.702
gprethk 0.41835 0.11930 3.507
Correlation of Fixed Effects:
(Intr) prethk cc.fc1
prethk 0.000
cc.factor1 -0.288 0.000
gprethk -0.963 -0.179 0.148
The estimates for cc
(t = 5.70) and gprethk
(t = 3.51) are both positive and statisically significant. They are also very similar in magnitude. Students who attend schools that received the tobacco prevention curriculum (cc
= 1) are predicted to have tobacco prevention knowledge scores that are about .43 points higher compared to students who attend schools that did not receive the curriculum. Students who attend schools that have one unit higher school average tobacco prevention scores (maybe a stronger school culture around tobacco prevention) are predicted to score about .42 points higher on their individual thk
score.
This suggests that school context may be quite important in preventing tobacco use. Schools with prevention curricula, as well as schools where students on average tend to know more about tobacco prevention, both seem to have a positive impact on tobaco prevention knowledge for individual students.
Only the 2 variance decreased compared to model 1. The L1 (student) variance remained at about 1.09, and the L2 (school) variance decreased from .09 to about .02. This is a significant decrease in school level variability (that’s a good thing). Why no change at L1? The addition of L2 predictors like cc
can only explain variation at the school level (they have no within-school variability).
modelsummary
and broom.mixed
Packages to Organize Your Results:library(modelsummary)
package ‘modelsummary’ was built under R version 3.6.2
Attaching package: ‘modelsummary’
The following object is masked from ‘package:psych’:
SD
library(broom.mixed)
package ‘broom.mixed’ was built under R version 3.6.2Package version inconsistency detected.
TMB was built with Matrix version 1.2.18
Current Matrix version is 1.2.17
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' packageRegistered S3 method overwritten by 'broom.mixed':
method from
tidy.gamlss broom
models <- list(model.null, model.1, model.2)
modelsummary(models, output = "markdown")
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
(Intercept) | 2.607 | 2.152 | 1.076 |
(0.065) | (0.074) | (0.255) | |
sd__(Intercept) | 0.307 | 0.274 | 0.132 |
sd__Observation | 1.078 | 1.046 | 1.046 |
prethk | 0.217 | 0.213 | |
(0.021) | (0.021) | ||
cc.factor1 | 0.431 | ||
(0.076) | |||
gprethk | 0.418 | ||
(0.119) | |||
AIC | 4833.0 | 4733.5 | 4712.6 |
BIC | 4849.1 | 4755.0 | 4744.9 |
Log.Lik. | -2413.500 | -2362.747 | -2350.308 |
Not required, but a summary table is a great way to tie everything together!