This week, we are going to use data from Gavin and Hofmann (2002), a study on organizational climate and attitudes published in Leadership Quarterly. Here, we have individual soldiers nested within companies. This is the same dataset that Garson uses in Chapter 6, so you can recreate his analysis.
library(tidyverse)
library(psych)
library(lme4)
library(lmerTest)
library(haven)
schoolsmoke <- read_csv("schoolsmoke.csv")
Parsed with column specification:
cols(
school = col_double(),
thk = col_double(),
prethk = col_double(),
cc = col_double(),
gprethk = col_double(),
d = col_double(),
d2 = col_double(),
d3 = col_double(),
d4 = col_double(),
d5 = col_double()
)
schoolsmoke.clean <- schoolsmoke
glimpse(schoolsmoke)
Rows: 1,600
Columns: 10
$ school <dbl> 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193,...
$ thk <dbl> 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3, 4, 2, 1, 3, 1, 3, 2, 4, 1...
$ prethk <dbl> 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3, 2, 1, 3, 0, 1, 0, 1, 3, 2...
$ cc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ gprethk <dbl> 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846,...
$ d <dbl> 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1, 2, 4, 3, 3, 2, 1, 2, 2, 2...
$ d2 <dbl> 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1, 1, 1, 2, 2, 1, 2, 1, 3, 2...
$ d3 <dbl> 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 1, 4, 2, 2, 3...
$ d4 <dbl> 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 0, 1, 1, 4, 1, 2...
$ d5 <dbl> 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1, 2, 0, 2, 1, 1, 3, 3, 2, 4...
Remember our old friend describe() from the psych package, which we met a few weeks ago? It is a great way to get a detailed summary of all the variables in a dataset.
psych::describe(schoolsmoke,
fast = TRUE)
This week, we are also going to get a crash course in estimating reliability with Cronbach’s alpha. The alpha() function in the psych package is a quick and easy way to do this:
peer_items <- schoolsmoke %>%
  select(d, d2, d3, d4, d5)
alpha(peer_items, check.keys=TRUE)
Some items were negatively correlated with total scale and were automatically reversed.
This is indicated by a negative sign for the variable name.
Reliability analysis
Call: alpha(x = peer_items, check.keys = TRUE)
lower alpha upper 95% confidence boundaries
-0.01 0.06 0.14
Reliability if an item is dropped:
Item statistics
Non missing response frequency for each item
0 1 2 3 4 miss
d 0.10 0.22 0.38 0.20 0.10 0
d2 0.11 0.18 0.43 0.18 0.09 0
d3 0.10 0.23 0.38 0.18 0.10 0
d4 0.10 0.20 0.40 0.20 0.10 0
d5 0.10 0.20 0.40 0.20 0.10 0
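For intuition about what alpha() is doing, Cronbach’s alpha can also be computed by hand from the item variances and the variance of the total score. A minimal sketch, assuming the peer_items data frame from above (this uses the items as-is, without the reverse-keying that check.keys = TRUE applied, so it reflects raw rather than reversed scoring):
# Cronbach's alpha by hand: (k / (k - 1)) * (1 - sum of item variances / variance of total score)
k <- ncol(peer_items)
item_vars <- apply(peer_items, 2, var)
total_var <- var(rowSums(peer_items))
alpha_by_hand <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)
alpha_by_hand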
Now, we can create a new variable, peer, that is constructed by taking the mean of our 5 peer items.
my.keys.list <- list(peer = c("d","d2","d3", "d4","d5"))
my.scales <- scoreItems(my.keys.list, schoolsmoke, impute = "none")
NaNs produced
print(my.scales, short = FALSE)
Call: scoreItems(keys = my.keys.list, items = schoolsmoke, impute = "none")
(Standardized) Alpha:
peer
alpha -0.07
Standard errors of unstandardized Alpha:
peer
ASE 0.041
Standardized Alpha of observed scales:
peer
[1,] -0.07
Average item correlation:
peer
average.r -0.013
Median item correlation:
peer
-0.011
Guttman 6* reliability:
peer
Lambda.6 -0.052
Signal/Noise based upon av.r :
peer
Signal/Noise -0.065
Scale intercorrelations corrected for attenuation
raw correlations below the diagonal, alpha on the diagonal
corrected correlations above the diagonal:
Note that these are the correlations of the complete scales based on the correlation matrix,
not the observed scales based on the raw items.
peer
peer -0.07
Item by scale correlations:
corrected for item overlap and scale reliability
peer
d NaN
d2 NaN
d3 NaN
d4 NaN
d5 NaN
Non missing response frequency for each item
0 1 2 3 4 miss
d 0.10 0.22 0.38 0.20 0.10 0
d2 0.11 0.18 0.43 0.18 0.09 0
d3 0.10 0.23 0.38 0.18 0.10 0
d4 0.10 0.20 0.40 0.20 0.10 0
d5 0.10 0.20 0.40 0.20 0.10 0
my.scores <- as_tibble(my.scales$scores)
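As a sanity check, the same composite can also be computed directly as the row mean of the five items. A minimal sketch for comparison with the scoreItems() result (the object name peer_check is just illustrative):
# Direct row-mean version of the peer composite, for comparison with my.scores$peer
peer_check <- schoolsmoke %>%
  select(d, d2, d3, d4, d5) %>%
  rowMeans()
head(cbind(peer_check, my.scores$peer))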
Part 1: Calculating Scale Scores and Checking Reliability
Calculate a composite scale score using the 5 peer pressure items (peer1-peer5). Estimate Cronbach’s alpha for this scale and report and interpret the results. Should we incorporate this measure into our analysis going forward?
The calculated composite scale scores are given above. The estimated Cronbach’s alpha is -0.07. Since alpha is far below the conventional 0.70 cutoff, the scale is unreliable and should not be incorporated into our analysis going forward.
schoolsmoke.1 <- bind_cols(schoolsmoke, my.scores)
glimpse(schoolsmoke.1)
Rows: 1,600
Columns: 11
$ school <dbl> 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193, 193,...
$ thk <dbl> 1, 2, 3, 1, 2, 2, 3, 3, 2, 2, 4, 2, 2, 3, 4, 4, 1, 3, 4, 2, 1, 3, 1, 3, 2, 4, 1...
$ prethk <dbl> 5, 3, 4, 0, 1, 3, 1, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3, 3, 2, 1, 3, 0, 1, 0, 1, 3, 2...
$ cc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ gprethk <dbl> 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846, 2.153846,...
$ d <dbl> 1, 1, 2, 3, 0, 2, 2, 2, 1, 2, 2, 4, 0, 2, 1, 0, 2, 1, 2, 4, 3, 3, 2, 1, 2, 2, 2...
$ d2 <dbl> 2, 4, 1, 2, 3, 1, 2, 1, 1, 2, 3, 3, 2, 2, 2, 2, 4, 1, 1, 1, 2, 2, 1, 2, 1, 3, 2...
$ d3 <dbl> 2, 0, 3, 2, 3, 1, 4, 2, 3, 3, 3, 0, 2, 1, 2, 2, 2, 3, 2, 1, 3, 2, 1, 4, 2, 2, 3...
$ d4 <dbl> 3, 1, 3, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 0, 1, 1, 4, 1, 2...
$ d5 <dbl> 0, 3, 3, 2, 2, 0, 0, 4, 1, 0, 2, 1, 2, 4, 3, 4, 1, 1, 2, 0, 2, 1, 1, 3, 3, 2, 4...
$ peer <dbl> 1.6, 1.8, 2.4, 2.2, 2.0, 1.2, 2.0, 2.4, 1.6, 2.2, 2.6, 1.8, 1.6, 2.2, 2.0, 2.0,...
Let’s take a look at some of the key variables for our analysis. We have three variables, all of which are scale scores constructed just as in the example above.
hist(schoolsmoke.1$peer,
main = "Distribution of Peer Scores",
sub = "N = 1600 Observations.",
xlab = "Scaled Peer Score")
models <- list()
model.0 <- lmer(thk ~ (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.0)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: thk ~ (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4833.0 4849.1 -2413.5 4827.0 1597
Scaled residuals:
Min 1Q Median 3Q Max
-1.9039 -0.7734 0.0693 0.9240 1.7430
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.09438 0.3072
Residual 1.16219 1.0780
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.60750 0.06538 25.83292 39.88 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here’s a quick and easy way to get our friend the ICC!
null.ICC <- 0.09438/(0.09438 + 1.16219)
null.ICC
[1] 0.07510923
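Rather than hard-coding the variance estimates, you can also pull them straight from the fitted model with VarCorr(). A minimal sketch (the object name vc is just illustrative):
# Extract variance components from model.0 and compute the ICC programmatically
vc <- as.data.frame(VarCorr(model.0))
icc <- vc$vcov[vc$grp == "school"] / sum(vc$vcov)
icc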
Using lmerTest to evaluate the random effect:
lmerTest::rand(model.0)
ANOVA-like table for random-effects: Single term deletions
Model:
thk ~ (1 | school)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 3 -2413.5 4833.0
(1 | school) 2 -2445.6 4895.2 64.16 1 1.147e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The ICC = 0.0751. Because this ICC is greater than 0.05, we should move forward with MLM. The likelihood ratio test shows that including the random intercept for school significantly improves the model, so it is worth conducting MLM on these data.
The estimated fixed intercept in the null model is 2.60750. This is the overall mean of thk (the outcome) across schools, and the test shows it is significantly different from zero.
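As a quick sanity check, we can compare the model-based intercept to the raw grand mean of thk. A minimal sketch; the two will be close but not identical, because the mixed model weights schools by their precision rather than simply pooling all students:
# Raw grand mean of the outcome vs. the fixed intercept from the null model
mean(schoolsmoke.1$thk)
fixef(model.0)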
model.1 <- lmer(thk ~ prethk + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.1)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: thk ~ prethk + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4733.5 4755.0 -2362.7 4725.5 1596
Scaled residuals:
Min 1Q Median 3Q Max
-2.12273 -0.82727 0.03082 0.83756 2.15165
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.07519 0.2742
Residual 1.09321 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.152e+00 7.414e-02 6.142e+01 29.02 <2e-16 ***
prethk 2.175e-01 2.122e-02 1.599e+03 10.25 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
prethk -0.598
The coefficient estimate for prethk is 0.2175. The school-level intercept variance decreases from 0.09438 to 0.07519, and the residual variance decreases from 1.16219 to 1.09321. This implies that some of the variation is explained by prethk, which is significant. The AIC and BIC both decreased as well.
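One way to quantify this is a proportional reduction in variance (a pseudo-R-squared) at each level, comparing model.1 to the null model. A minimal sketch using the variance estimates reported in the two summaries above:
# Proportional reduction in variance after adding prethk (relative to the null model)
prv_level1 <- (1.16219 - 1.09321) / 1.16219  # residual (within-school) variance
prv_level2 <- (0.09438 - 0.07519) / 0.09438  # school intercept variance
c(level1 = prv_level1, level2 = prv_level2)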
model.2 <- lmer(thk ~ prethk + cc + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.2)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: thk ~ prethk + cc + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4721.3 4748.2 -2355.6 4711.3 1595
Scaled residuals:
Min 1Q Median 3Q Max
-2.20705 -0.80300 0.04029 0.80179 2.15135
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.03333 0.1826
Residual 1.09412 1.0460
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.945e+00 7.699e-02 4.993e+01 25.265 < 2e-16 ***
prethk 2.226e-01 2.113e-02 1.598e+03 10.535 < 2e-16 ***
cc 3.921e-01 8.944e-02 2.346e+01 4.385 0.000208 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) prethk
prethk -0.585
cc -0.580 0.023
model.3 <- lmer(thk ~ prethk + gprethk + cc + (1|school), REML = FALSE, data = schoolsmoke.1)
summary(model.3)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: thk ~ prethk + gprethk + cc + (1 | school)
Data: schoolsmoke.1
AIC BIC logLik deviance df.resid
4712.6 4744.9 -2350.3 4700.6 1594
Scaled residuals:
Min 1Q Median 3Q Max
-2.23332 -0.79567 0.02374 0.79254 2.16704
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.01734 0.1317
Residual 1.09334 1.0456
Number of obs: 1600, groups: school, 28
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.076e+00 2.548e-01 3.024e+01 4.224 0.000203 ***
prethk 2.126e-01 2.138e-02 1.571e+03 9.944 < 2e-16 ***
gprethk 4.183e-01 1.193e-01 3.387e+01 3.507 0.001301 **
cc 4.308e-01 7.554e-02 2.330e+01 5.702 7.94e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) prethk gprthk
prethk 0.000
gprethk -0.963 -0.179
cc -0.288 0.000 0.148
gprethk is the school-level (group) mean of prethk, whereas prethk is measured at the individual level. gprethk is significant, which implies that school context has an impact on an individual student’s knowledge over and above the student’s own pre-test score. The school-level intercept variance drops again (from 0.03333 to 0.01734) while the residual variance is essentially unchanged, which implies that the added group-level predictor is explaining between-school variation.
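gprethk already comes with the dataset, but for reference, a group-mean predictor like this is typically built by averaging the level-1 variable within each cluster. A minimal sketch that recreates it (the column name gprethk_check is hypothetical, added only to compare against the supplied gprethk):
# Recreate the school-level mean of prethk; it should match gprethk if that variable is the school mean
schoolsmoke.check <- schoolsmoke.1 %>%
  group_by(school) %>%
  mutate(gprethk_check = mean(prethk)) %>%
  ungroup()
schoolsmoke.check %>%
  select(school, prethk, gprethk, gprethk_check) %>%
  head()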