Part 1: Reshape the Data Into Long Format
Part 2: Test for Nonlinear Growth Parameters
Create a new variable, grade_0, which goes from 0-5 instead of 3-8 like our current grade variable does (just like last week). (See below)
Run and interpret three models, all with science scores (science) as the DV, and all with scores clustered within students (stdntid): 1) a null growth model, 2) a growth model with linear and quadratic growth, and 3) a growth model with linear, quadratic, and cubic growth. Include a random slope for grade_0 in all models (R folks - you can disregard this. The software is being too difficult! :)). Use AIC and BIC to choose which model has the best fit. Does this dataset show evidence of nonlinear growth in science scores?
The AIC for the null growth model, growth model with linear and quadratic growth, and growth model for liear, quadratic, and cubic are as follows, respectively: 90959.4, 88570, 87223 The BIC for the null growth model, growth model with linear and quadratic growth, and growth model for liear, quadratic, and cubic are as follows, respectively: 90987.7, 88605, 87266. The data shows evidence of nonlinear growth in science scores.
Part 3: Use a Repeated Measures (Heterogeneous Variance) Structure
Using your preferred model from #3 above, test a heterogeneous level-1 variance structure using the residuals()option in Stata, or the lme function in the nlme package for R users. R users - do not include a random slope for grade_0 at the student level (this was causing convergence issues). Does the use of this structure improve the model fit? No, the BIC is 89951. This is not an improvement from the previous models.
In your own words, how is the repeated measures structure different from the one we typically use in MLM? Why might it be important to use? Repeated Measures is a heterogeneous variance structure. The purpose of the repeated measures to to capture nonlinearity in the dataset. By doing so, it will allow you to develop a better model for predicting.
library(tidyverse)
library(lme4)
library(haven)
starwide <- haven::read_dta("STAR_wide.dta")
glimpse(starwide)
Rows: 1,471
Columns: 11
$ stdntid <dbl> 10023, 10042, 10056, 10082, 10097, 101...
$ race <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2...
$ gender <dbl+lbl> 1, 2, 2, 1, 1, 1, 2, 1, 1, 2, 2, 1...
$ ac_mot <dbl> 50, 50, 52, 49, 51, 50, 51, 57, 40, 48...
$ ever_lunch <dbl+lbl> 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 1, 1...
$ science3 <dbl> 632, 618, 618, 649, 577, 655, 662, 627...
$ science4 <dbl> 756, 710, 719, 722, 688, 719, 767, 709...
$ science5 <dbl> 729, 738, 760, 729, 681, 738, 749, 754...
$ science6 <dbl> 762, 740, 776, 753, 720, 710, 782, 755...
$ science7 <dbl> 778, 734, 782, 809, 698, 766, 763, 729...
$ science8 <dbl> 783, 771, 772, 806, 745, 774, 804, 793...
pivot_longer to reshape from wide to long….starwide.long <- starwide %>%
pivot_longer(.,
cols = starts_with("science"),
names_to = "year",
values_to = "science",
names_prefix = "science",
)
starwide.long.clean <- starwide.long %>%
mutate(.,
race.fac = as_factor(race),
gender.fac = as_factor(gender),
ever_lunch.fac = as_factor(ever_lunch),
grade_0 = as.numeric(year)-3,
grade_sq = grade_0^2,
grade_cube = grade_0^3)%>%
na.omit()
glimpse(starwide.long.clean)
Rows: 8,826
Columns: 13
$ stdntid <dbl> 10023, 10023, 10023, 10023, 10023...
$ race <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ gender <dbl+lbl> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,...
$ ac_mot <dbl> 50, 50, 50, 50, 50, 50, 50, 50, 5...
$ ever_lunch <dbl+lbl> 2, 2, 2, 2, 2, 2, 1, 1, 1, 1,...
$ year <chr> "3", "4", "5", "6", "7", "8", "3"...
$ science <dbl> 632, 756, 729, 762, 778, 783, 618...
$ race.fac <fct> WHITE, WHITE, WHITE, WHITE, WHITE...
$ gender.fac <fct> MALE, MALE, MALE, MALE, MALE, MAL...
$ ever_lunch.fac <fct> NON-FREE LUNCH, NON-FREE LUNCH, N...
$ grade_0 <dbl> 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, ...
$ grade_sq <dbl> 0, 1, 4, 9, 16, 25, 0, 1, 4, 9, 1...
$ grade_cube <dbl> 0, 1, 8, 27, 64, 125, 0, 1, 8, 27...
model.null.growth <- lmer(science ~ grade_0 + (1|stdntid), REML=FALSE, data = starwide.long.clean)
summary(model.null.growth)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: science ~ grade_0 + (1 | stdntid)
Data: starwide.long.clean
AIC BIC logLik deviance df.resid
90959.4 90987.7 -45475.7 90951.4 8822
Scaled residuals:
Min 1Q Median 3Q Max
-4.0254 -0.6361 0.0105 0.6367 4.0226
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 1072 32.74
Residual 1300 36.05
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 669.3920 1.0916 613.2
grade_0 24.2729 0.2247 108.0
Correlation of Fixed Effects:
(Intr)
grade_0 -0.515
null.growth.icc <- 1073/(1073 + 1300)
null.growth.icc
[1] 0.4521702
model.square <- lmer(science ~ grade_0 + grade_sq + (1|stdntid), data = starwide.long.clean)
summary(model.square)
Linear mixed model fit by REML ['lmerMod']
Formula: science ~ grade_0 + grade_sq + (1 | stdntid)
Data: starwide.long.clean
REML criterion at convergence: 88562.1
Scaled residuals:
Min 1Q Median 3Q Max
-4.6561 -0.5428 -0.0099 0.5538 4.5436
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 1133.2 33.66
Residual 939.1 30.65
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 646.2231 1.1379 567.92
grade_0 59.0262 0.6812 86.65
grade_sq -6.9507 0.1308 -53.15
Correlation of Fixed Effects:
(Intr) grad_0
grade_0 -0.485
grade_sq 0.383 -0.960
model.cube <- lmer(science ~ grade_0 + grade_cube + (1|stdntid), data = starwide.long.clean)
summary(model.cube)
Linear mixed model fit by REML ['lmerMod']
Formula: science ~ grade_0 + grade_cube + (1 | stdntid)
Data: starwide.long.clean
REML criterion at convergence: 89219.7
Scaled residuals:
Min 1Q Median 3Q Max
-4.4401 -0.5735 -0.0205 0.5745 4.4795
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 1119 33.45
Residual 1026 32.04
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 651.90787 1.13224 575.77
grade_0 43.18746 0.47165 91.57
grade_cube -0.79473 0.01795 -44.27
Correlation of Fixed Effects:
(Intr) grad_0
grade_0 -0.503
grade_cube 0.349 -0.906
model.both <- lmer(science ~ grade_0 + grade_sq + grade_cube + (1|stdntid), REML = FALSE, data = starwide.long.clean)
summary(model.both)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
science ~ grade_0 + grade_sq + grade_cube + (1 | stdntid)
Data: starwide.long.clean
AIC BIC logLik deviance df.resid
87223.4 87265.9 -43605.7 87211.4 8820
Scaled residuals:
Min 1Q Median 3Q Max
-5.2663 -0.4985 0.0027 0.5236 4.4394
Random effects:
Groups Name Variance Std.Dev.
stdntid (Intercept) 1158.5 34.04
Residual 781.6 27.96
Number of obs: 8826, groups: stdntid, 1471
Fixed effects:
Estimate Std. Error t value
(Intercept) 635.77161 1.13923 558.07
grade_0 106.75456 1.38750 76.94
grade_sq -33.07934 0.68954 -47.97
grade_cube 3.48382 0.09055 38.47
Correlation of Fixed Effects:
(Intr) grad_0 grd_sq
grade_0 -0.411
grade_sq 0.295 -0.955
grade_cube -0.238 0.894 -0.985
anova to test…anova(model.square, model.both)
refitting model(s) with ML (instead of REML)
Data: starwide.long.clean
Models:
model.square: science ~ grade_0 + grade_sq + (1 | stdntid)
model.both: science ~ grade_0 + grade_sq + grade_cube + (1 | stdntid)
npar AIC BIC logLik deviance Chisq Df
model.square 5 88570 88605 -44280 88560
model.both 6 87223 87266 -43606 87211 1348.6 1
Pr(>Chisq)
model.square
model.both < 2.2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
model.null.growth.het <- lme(science ~ grade_0,
random = ~ 1|stdntid,
data = starwide.long.clean,
method = "ML",
weights = varIdent(form = ~ 1 | grade_0)
)
summary(model.null.growth.het)
Linear mixed-effects model fit by maximum likelihood
Data: starwide.long.clean
Random effects:
Formula: ~1 | stdntid
(Intercept) Residual
StdDev: 34.32642 67.40924
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | grade_0
Parameter estimates:
0 1 2 3 4 5
1.0000000 0.4987552 0.4549923 0.4852018 0.3421510 0.3497103
Fixed effects: science ~ grade_0
Correlation:
(Intr)
grade_0 -0.593
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-4.91503937 -0.71613675 -0.05974243 0.51662541 4.30180514
Number of Observations: 8826
Number of Groups: 1471