Part 1: Running Our “Final” 2-Level Model
Part 2: Check Those Assumptions!
Based on the Shapiro-Wilk test, the residuals do not appear to follow a normal distribution: W = 0.989 with p < 2.2e-16.
There also appear to be multicollinearity concerns with this dataset, and the RVF plot shows a relationship between the residuals and the predicted math scores.
The observations we should consider removing are those with Cook’s distances above the 0.03 threshold.
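For concreteness, a minimal sketch of how those observations can be flagged and counted, assuming the diagnostics data frame built with broom.mixed::augment() in the code below:

flagged <- diagnostics %>%
  filter(., .cooksd > .03) # observations above the 0.03 cutoff
nrow(flagged) # how many observations are flagged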
Part 3: Compare the Results
I was not able to get a result from rlmer(). However, I was able to identify which outliers should be removed based on Cook’s distance. I removed those outliers; since there were so few of them, doing so did not change the results much, which suggests the assumption violations are only moderate. The estimate for gkclasstype is highly significant in the model.
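To quantify “did not change the results much”, the fixed effects of the two fits can be compared directly. A minimal sketch, assuming model.0 and model.trimmed from the code below; indexing by name keeps the coefficients aligned even though the two formulas list the terms in different orders:

round(fixef(model.trimmed)[names(fixef(model.0))] - fixef(model.0), 3) # change in each fixed effect after trimming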
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lme4))
suppressPackageStartupMessages(library(lmerTest)) # for the Satterthwaite t-tests shown in summary()
suppressPackageStartupMessages(library(psych))
library(haven)
projectSTAR <- haven::read_dta("projectSTAR.dta")
glimpse(projectSTAR)
Rows: 6,325
Columns: 27
$ stdntid <dbl> 10001, 10133, 10246, 10263, 10266, 10275, 10281, 10282, 10285, 10286, 102...
$ race <dbl+lbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...
$ gender <dbl+lbl> 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, ...
$ FLAGSGK <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ flaggk <dbl+lbl> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, ...
$ gkclasstype <dbl+lbl> 3, 3, 3, 1, 2, 3, 1, 3, 1, 2, 3, 3, 2, 3, 3, 3, 1, 2, 3, 3, 2, 1, 1, ...
$ gkschid <dbl> 169229, 169280, 218562, 205492, 257899, 161176, 189382, 189382, 201449, 2...
$ gksurban <dbl+lbl> 2, 2, 4, 2, 3, 3, 2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3, 2, 1, 2, ...
$ gktchid <dbl> 16922904, 16928003, 21856202, 20549204, 25789904, 16117602, 18938204, 189...
$ gktgen <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
$ gktrace <dbl+lbl> 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, ...
$ gkthighdegree <dbl+lbl> 2, 2, 3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 2, 2, 2, 2, ...
$ gktcareer <dbl+lbl> 4, 4, 4, 4, 4, NA, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...
$ gktyears <dbl> 5, 7, 8, 3, 12, 2, 7, 14, 4, 6, 11, 16, 12, 5, 17, 10, 6, 10, 13, 9, 18, ...
$ gkclasssize <dbl> 24, 22, 17, 17, 24, 24, 13, 24, 14, 23, 23, 22, 20, 24, 23, 27, 17, 24, 2...
$ gkfreelunch <dbl+lbl> 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, ...
$ gkrepeat <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ gkspeced <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
$ gkspecin <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
$ gkpresent <dbl> 161, 175, 132, 178, 170, 94, 160, 154, 172, 95, 163, 172, 180, 149, 173, ...
$ gkabsent <dbl> 19, 5, 28, 2, 10, 3, 2, 7, 8, 2, 17, 8, 0, 31, 7, NA, 20, 7, 2, 19, 24, 0...
$ gktreadss <dbl> NA, 427, 450, 483, 456, 411, 443, 448, 463, 472, 428, 545, 408, 422, 472,...
$ gktmathss <dbl> NA, 478, 494, 513, 513, 468, 473, 449, 520, 536, 484, 626, 454, 439, 528,...
$ gktlistss <dbl> NA, 509, 549, 554, 520, 571, 595, 540, 565, 595, NA, 622, 474, 536, 578, ...
$ gkwordskillss <dbl> NA, 418, 444, 431, 468, 396, 444, 444, 480, 486, 423, 524, 410, 423, 458,...
$ gkmotivraw <dbl> 23, 24, 28, 27, 25, 24, NA, NA, 26, 27, 24, 24, 23, 28, 24, NA, 26, 25, N...
$ gkselfconcraw <dbl> 52, 53, 56, 61, 54, 55, NA, NA, 52, 61, 55, 49, 49, 59, 50, NA, 58, 45, N...
star.clean <- projectSTAR %>%
  mutate(.,
    schoolid = gkschid,
    classid = gktchid,
    read = gktreadss,
    selfcon = gkselfconcraw,
    high_degree = as_factor(gkthighdegree),
    classtype = as_factor(gkclasstype),
    years_exp = gktyears,
    urbanicity = as_factor(gksurban),
    math = gktmathss,
    stu_frl = as_factor(gkfreelunch)
  ) %>%
  group_by(classid) %>% # Class-level averages of FRL and math:
  mutate(.,
    class_frl = mean(gkfreelunch, na.rm = TRUE) - 1, # gkfreelunch is coded 1/2, so -1 rescales the mean to a 0-1 proportion
    class_math = mean(math, na.rm = TRUE) - 1
  ) %>%
  ungroup() %>%
  group_by(schoolid) %>% # School-level averages of FRL and math:
  mutate(.,
    school_frl = mean(gkfreelunch, na.rm = TRUE) - 1,
    school_math = mean(math, na.rm = TRUE) - 1
  ) %>%
  ungroup() %>%
  select(.,
    schoolid,
    classid,
    read,
    math,
    classtype,
    years_exp,
    urbanicity,
    high_degree,
    selfcon,
    stu_frl,
    class_frl,
    school_frl,
    class_math,
    school_math
  )
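As a quick sanity check (a sketch), the rescaled lunch variables should now behave like proportions, falling between 0 and 1:

star.clean %>%
  summarise(across(c(class_frl, school_frl),
                   list(min = ~ min(.x, na.rm = TRUE),
                        max = ~ max(.x, na.rm = TRUE)))) # all values should lie in [0, 1]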
glimpse(star.clean)
Rows: 6,325
Columns: 14
$ schoolid <dbl> 169229, 169280, 218562, 205492, 257899, 161176, 189382, 189382, 201449, 230...
$ classid <dbl> 16922904, 16928003, 21856202, 20549204, 25789904, 16117602, 18938204, 18938...
$ read <dbl> NA, 427, 450, 483, 456, 411, 443, 448, 463, 472, 428, 545, 408, 422, 472, N...
$ math <dbl> NA, 478, 494, 513, 513, 468, 473, 449, 520, 536, 484, 626, 454, 439, 528, N...
$ classtype <fct> REGULAR + AIDE CLASS, REGULAR + AIDE CLASS, REGULAR + AIDE CLASS, SMALL CLA...
$ years_exp <dbl> 5, 7, 8, 3, 12, 2, 7, 14, 4, 6, 11, 16, 12, 5, 17, 10, 6, 10, 13, 9, 18, 1,...
$ urbanicity <fct> SUBURBAN, SUBURBAN, URBAN, SUBURBAN, RURAL, RURAL, SUBURBAN, SUBURBAN, RURA...
$ high_degree <fct> BACHELORS, BACHELORS, MASTERS, BACHELORS, MASTERS, BACHELORS, MASTERS, MAST...
$ selfcon <dbl> 52, 53, 56, 61, 54, 55, NA, NA, 52, 61, 55, 49, 49, 59, 50, NA, 58, 45, NA,...
$ stu_frl <fct> NON-FREE LUNCH, NON-FREE LUNCH, NON-FREE LUNCH, FREE LUNCH, NON-FREE LUNCH,...
$ class_frl <dbl> 0.8750000, 0.9545455, 0.7647059, 0.2352941, 0.4583333, 0.6086957, 0.8461538...
$ school_frl <dbl> 0.86206897, 0.86206897, 0.71666667, 0.15189873, 0.62857143, 0.63291139, 0.8...
$ class_math <dbl> 491.0476, 494.1000, 487.0000, 498.2500, 481.0000, 477.5833, 467.0000, 494.0...
$ school_math <dbl> 497.8768, 485.9811, 500.1071, 501.6133, 475.2800, 475.7821, 478.4062, 478.4...
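Before fitting the two-level model, it is worth confirming how many schools there are and how students are distributed across them, since a random intercept is estimated for each school. A minimal sketch:

star.clean %>%
  count(schoolid) %>% # students per school
  summarise(n_schools = n(),
            min_size = min(n),
            median_size = median(n),
            max_size = max(n))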
model.0 <- lmer(math ~ selfcon + high_degree + classtype + (1 | schoolid), REML = FALSE, data = star.clean)
summary(model.0)
Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: math ~ selfcon + high_degree + classtype + (1 | schoolid)
Data: star.clean
AIC BIC logLik deviance df.resid
49202.0 49260.2 -24592.0 49184.0 4747
Scaled residuals:
Min 1Q Median 3Q Max
-3.3594 -0.6586 -0.0782 0.5862 4.3409
Random effects:
Groups Name Variance Std.Dev.
schoolid (Intercept) 465.1 21.57
Residual 1736.4 41.67
Number of obs: 4756, groups: schoolid, 73
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 466.0099 7.3119 2558.6410 63.733 < 2e-16 ***
selfcon 0.4775 0.1202 4708.7817 3.972 7.22e-05 ***
high_degreeMASTERS -0.1069 1.5930 4720.2357 -0.067 0.947
high_degreeMASTERS + -5.2694 4.2195 4755.2835 -1.249 0.212
high_degreeSPECIALIST 21.2655 8.6539 4737.0385 2.457 0.014 *
classtypeREGULAR CLASS -9.1386 1.5455 4705.3062 -5.913 3.59e-09 ***
classtypeREGULAR + AIDE CLASS -9.2654 1.5292 4699.2676 -6.059 1.48e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) selfcn h_MAST h_MAS+ h_SPEC cREGUC
selfcon -0.924
hgh_MASTERS -0.028 -0.036
hg_MASTERS+ -0.003 -0.018 0.143
h_SPECIALIS 0.000 -0.036 0.099 0.014
cREGULARCLA -0.156 0.055 -0.083 -0.033 0.115
cREGULAR+AC -0.150 0.046 -0.078 -0.013 0.114 0.538
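From the variance components above (school intercept variance 465.1, residual variance 1736.4), roughly 21% of the variation in math scores lies between schools. A sketch of the intraclass correlation computed by hand:

vc <- as.data.frame(VarCorr(model.0)) # one row per variance component
vc$vcov[1] / sum(vc$vcov) # ICC = 465.1 / (465.1 + 1736.4) ~ 0.21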
Use the augment() Function from the broom.mixed Package to Get Predictions, Residuals, Cook’s Distances, Etc.:
library(broom.mixed)
diagnostics <- augment(model.0)
ggplot(data = diagnostics, mapping = aes(x = .resid)) +
  geom_histogram(binwidth = .50) + theme_classic() +
  labs(title = "Histogram of Residuals",
       x = "Residual Value") +
  geom_vline(xintercept = c(-2.5, 2.5), linetype = "dotted")
shapiro.test(diagnostics$.resid)
Shapiro-Wilk normality test
data: diagnostics$.resid
W = 0.98919, p-value < 2.2e-16
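With roughly 4,756 residuals, even a tiny departure from normality produces a tiny Shapiro-Wilk p-value, so the test statistic itself (W = 0.989, close to 1) and a Q-Q plot are useful complements. A sketch using the same diagnostics frame:

ggplot(data = diagnostics, mapping = aes(sample = .resid)) +
  stat_qq() + stat_qq_line(linetype = "dotted") +
  labs(title = "Q-Q Plot of Residuals") + theme_classic()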
ggplot(data = diagnostics, mapping = aes(x = .fitted, y = .resid)) +
  geom_point() + labs(title = "RVF Plot",
                      x = "Predicted Value, math scores",
                      y = "Residual Value") + theme_classic()
ggplot(data = diagnostics, mapping = aes(x = .fitted, y = .cooksd, label = schoolid)) +
  geom_point() + geom_text(nudge_x = .25) + theme_classic() +
  labs(title = "Cook's Distance Plot",
       x = "Predicted Value, math scores",
       y = "Cook's Distance") +
  geom_hline(yintercept = 4/816, linetype = "dotted")
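To see which observations drive the Cook’s distance plot, a sketch that lists the largest values:

diagnostics %>%
  arrange(desc(.cooksd)) %>% # most influential observations first
  select(., schoolid, .fitted, .resid, .cooksd) %>%
  head(10)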
library(robustlmm)
package 'robustlmm' was built under R version 4.0.3
model.robust <- rlmer(math ~ selfcon + classtype + high_degree + (1|schoolid), REML = FALSE, data = star.clean)
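Since rlmer() never returned a result here (it is computationally heavy at this sample size), one option is a guarded version of the call above, so the rest of the script still runs if it errors; a sketch, though this will not help if the call simply hangs:

model.robust <- tryCatch(
  rlmer(math ~ selfcon + classtype + high_degree + (1 | schoolid), REML = FALSE, data = star.clean),
  error = function(e) {
    message("rlmer failed: ", conditionMessage(e)) # report the failure and move on
    NULL
  }
)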
prod.trimmed <- diagnostics %>%
  filter(., .cooksd < .03) # keep only observations below the Cook's distance cutoff
model.trimmed <- lmer(math ~ selfcon + classtype + high_degree + (1 | schoolid), REML = FALSE, data = prod.trimmed)
summary(model.trimmed)
Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: math ~ selfcon + classtype + high_degree + (1 | schoolid)
Data: star.clean
AIC BIC logLik deviance df.resid
49202.0 49260.2 -24592.0 49184.0 4747
Scaled residuals:
Min 1Q Median 3Q Max
-3.3594 -0.6586 -0.0782 0.5862 4.3409
Random effects:
Groups Name Variance Std.Dev.
schoolid (Intercept) 465.1 21.57
Residual 1736.4 41.67
Number of obs: 4756, groups: schoolid, 73
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 466.0099 7.3119 2558.6410 63.733 < 2e-16 ***
selfcon 0.4775 0.1202 4708.7817 3.972 7.22e-05 ***
classtypeREGULAR CLASS -9.1386 1.5455 4705.3062 -5.913 3.59e-09 ***
classtypeREGULAR + AIDE CLASS -9.2654 1.5292 4699.2676 -6.059 1.48e-09 ***
high_degreeMASTERS -0.1069 1.5930 4720.2357 -0.067 0.947
high_degreeMASTERS + -5.2694 4.2195 4755.2835 -1.249 0.212
high_degreeSPECIALIST 21.2655 8.6539 4737.0385 2.457 0.014 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) selfcn cREGUC cRE+AC h_MAST h_MAS+
selfcon -0.924
cREGULARCLA -0.156 0.055
cREGULAR+AC -0.150 0.046 0.538
hgh_MASTERS -0.028 -0.036 -0.083 -0.078
hg_MASTERS+ -0.003 -0.018 -0.033 -0.013 0.143
h_SPECIALIS 0.000 -0.036 0.115 0.114 0.099 0.014
suppressPackageStartupMessages(library(modelsummary))
models <- list(model.0, model.trimmed)
modelsummary(models, output = "markdown")
|                               | Model 1    | Model 2    |
|:------------------------------|:-----------|:-----------|
| (Intercept)                   | 466.010    | 466.010    |
|                               | (7.312)    | (7.312)    |
| classtypeREGULAR + AIDE CLASS | -9.265     | -9.265     |
|                               | (1.529)    | (1.529)    |
| classtypeREGULAR CLASS        | -9.139     | -9.139     |
|                               | (1.545)    | (1.545)    |
| high_degreeMASTERS            | -0.107     | -0.107     |
|                               | (1.593)    | (1.593)    |
| high_degreeMASTERS +          | -5.269     | -5.269     |
|                               | (4.220)    | (4.220)    |
| high_degreeSPECIALIST         | 21.265     | 21.265     |
|                               | (8.654)    | (8.654)    |
| selfcon                       | 0.477      | 0.477      |
|                               | (0.120)    | (0.120)    |
| sd__(Intercept)               | 21.567     | 21.567     |
| sd__Observation               | 41.670     | 41.670     |
| AIC                           | 49202.0    | 49202.0    |
| BIC                           | 49260.2    | 49260.2    |
| Log.Lik.                      | -24592.022 | -24592.022 |