This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
plot(cars)
library(tidyverse)
library(lme4)
STAR_long <- read_dta("STAR_long.dta")
View(STAR_long)
STAR_long.clean <- STAR_long %>%
mutate(.,
race.fac <- as_factor(race),
gender.fac <- as_factor(gender),
grade_0 = grade - 3)
glimpse(STAR_long.clean)
Rows: 8,826
Columns: 10
$ stdntid <dbl> 10023, 10023, 10023, 10023, 1...
$ grade <dbl> 3, 4, 5, 6, 7, 8, 3, 4, 5, 6,...
$ race <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ gender <dbl+lbl> 1, 1, 1, 1, 1, 1, 2, 2, 2...
$ ac_mot <dbl> 50, 50, 50, 50, 50, 50, 50, 5...
$ ever_lunch <dbl+lbl> 2, 2, 2, 2, 2, 2, 1, 1, 1...
$ science <dbl> 632, 756, 729, 762, 778, 783,...
$ `race.fac <- as_factor(race)` <fct> WHITE, WHITE, WHITE, WHITE, W...
$ `gender.fac <- as_factor(gender)` <fct> MALE, MALE, MALE, MALE, MALE,...
$ grade_0 <dbl> 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,...
Using a time variable that starts at 0 it helps us interpret the intercept more easily.
model.null <- lmer(science ~ (1|stdntid), REML = FALSE, data = STAR_long.clean)
summary(model.null)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ (1 | stdntid)
Data: STAR_long.clean
AIC BIC logLik deviance df.resid
97947.4 97968.7 -48970.7 97941.4 8823
Scaled residuals:
Min 1Q Median 3Q Max
-3.1895 -0.3887 0.2381 0.6371 2.7355
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 728.5 26.99
Residual 3361.7 57.98
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 730.074 0.936 780
null.ICC <- 728.5/ (728.5 + 3361.7)
glimpse(null.ICC)
num 0.178
model.growth.null <- lmer(science ~ grade_0 + (1|stdntid), REML = FALSE, data = STAR_long.clean)
summary(model.growth.null)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ grade_0 + (1 | stdntid)
Data: STAR_long.clean
AIC BIC logLik deviance df.resid
90959.4 90987.7 -45475.7 90951.4 8822
Scaled residuals:
Min 1Q Median 3Q Max
-4.0254 -0.6361 0.0105 0.6367 4.0226
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 1072 32.74
Residual 1300 36.05
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 669.3920 1.0916 613.2
grade_0 24.2729 0.2247 108.0
Correlation of Fixed Effects:
(Intr)
grade_0 -0.515
growth.null.ICC <- 1072/(1072 + 1300)
glimpse(growth.null.ICC)
num 0.452
In the null model without the growth component (time) 17% of the variability was between students. However, when we add the growth component then 45% of the variability is between students. When we add time we are adding a variable that accounts for a lot of the variability.
model.1 <- lmer(science ~ grade_0 + race + ac_mot + gender + (1|stdntid), REML = FALSE, data = STAR_long.clean)
summary(model.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ grade_0 + race + ac_mot + gender + (1 | stdntid)
Data: STAR_long.clean
AIC BIC logLik deviance df.resid
90870.6 90920.1 -45428.3 90856.6 8819
Scaled residuals:
Min 1Q Median 3Q Max
-4.0758 -0.6326 0.0103 0.6356 3.9934
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 991.8 31.49
Residual 1299.6 36.05
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 717.9111 11.2699 63.702
grade_0 24.2729 0.2247 108.029
race -18.4500 2.0243 -9.114
ac_mot -0.3384 0.2291 -1.477
gender -6.4345 1.8647 -3.451
Correlation of Fixed Effects:
(Intr) grad_0 race ac_mot
grade_0 -0.050
race -0.194 0.000
ac_mot -0.937 0.000 -0.034
gender -0.048 0.000 0.056 -0.218
For some reason when I tried to use my factor variables for gender and race it did not recognize them.
anova(model.growth.null, model.1)
Data: STAR_long.clean
Models:
model.growth.null: science ~ grade_0 + (1 | stdntid)
model.1: science ~ grade_0 + race + ac_mot + gender + (1 | stdntid)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model.growth.null 4 90959 90988 -45476 90951
model.1 7 90871 90920 -45428 90857 94.816 3 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Both AIC and BIC go down when we add gender, race, and academic motivation to the growth model, meaning this model is a better fit. However, ac_mot is not significant. The average science score at 5th grade is 24.27 with girls scoring 6.43 points less. I’m not sure how we interpret the race fixed effect because we have Black, White, and other. Multilevel R Squared = .03, which is small. Target model reduced the within-group variance by 0.01% and the between-group variance by 7.48%. This model is slightly better at reducing between group variance.
Hmisc::describe(STAR_long.clean)
Error in proxy[i, ..., drop = FALSE] : incorrect number of dimensions
I’m not sure what this means or how I’m doing this wrong.
model.2 <- lmer(science ~ grade_0 + race + ac_mot + gender + grade_0:ac_mot + (1|stdntid), REML = FALSE, data = STAR_long.clean)
summary(model.2)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ grade_0 + race + ac_mot + gender + grade_0:ac_mot +
(1 | stdntid)
Data: STAR_long.clean
AIC BIC logLik deviance df.resid
90872.5 90929.2 -45428.3 90856.5 8818
Scaled residuals:
Min 1Q Median 3Q Max
-4.0736 -0.6323 0.0099 0.6358 3.9931
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 991.8 31.49
Residual 1299.6 36.05
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 718.663903 13.170020 54.568
grade_0 23.971719 2.735083 8.765
race -18.450015 2.024300 -9.114
ac_mot -0.353688 0.267717 -1.321
gender -6.434471 1.864692 -3.451
grade_0:ac_mot 0.006121 0.055410 0.110
Correlation of Fixed Effects:
(Intr) grad_0 race ac_mot gender
grade_0 -0.519
race -0.166 0.000
ac_mot -0.954 0.516 -0.029
gender -0.041 0.000 0.056 -0.187
grad_0:c_mt 0.517 -0.997 0.000 -0.517 0.000
The interaction effect is not significant this means students do not differ in their growth by their level of academic achievement.
model.3 <- lmer(science ~ grade_0 + race + ac_mot + gender + (grade_0|stdntid), REML = FALSE, data = STAR_long.clean)
boundary (singular) fit: see ?isSingular
summary(model.3)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ grade_0 + race + ac_mot + gender + (grade_0 | stdntid)
Data: STAR_long.clean
AIC BIC logLik deviance df.resid
90873.2 90937.0 -45427.6 90855.2 8817
Scaled residuals:
Min 1Q Median 3Q Max
-4.0451 -0.6347 0.0098 0.6352 4.0099
Random effects:
Groups Name Variance Std.Dev. Corr
stdntid (Intercept) 9.497e+02 30.8165
grade_0 7.331e-02 0.2708 1.00
Residual 1.299e+03 36.0462
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 717.8706 11.2643 63.730
grade_0 24.2729 0.2248 107.987
race -18.3997 2.0235 -9.093
ac_mot -0.3417 0.2290 -1.492
gender -6.3402 1.8640 -3.401
Correlation of Fixed Effects:
(Intr) grad_0 race ac_mot
grade_0 -0.048
race -0.194 0.000
ac_mot -0.937 0.000 -0.034
gender -0.048 0.000 0.056 -0.218
convergence code: 0
boundary (singular) fit: see ?isSingular
The random slope for year_0 means that slope for effect of growth on science scores can vary for each student (clusters).
lmerTest::rand(model.3)
ANOVA-like table for random-effects: Single term deletions
Model:
science ~ grade_0 + race + ac_mot + gender + (grade_0 | stdntid)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 9 -45428 90873
grade_0 in (grade_0 | stdntid) 7 -45428 90871 1.3099 2 0.5195
Without the random slope for grad_0 the AIC actually goes down showing this model is not a better fit but it also is not significant. I would say there is not proof that the slope for grade_0 is random.