Random effect dataset
https://rpubs.com/rslbliss/r_mlm_ws
https://methods101.com.au/docs/soci832_11_3_multilevel_models_week_11/
Dataset is students from 10 handpicked schools, representing a subset of students and schools from a US survey of eight-grade students at 1000 schools (800 public 200 private).
There are quite a lot of variables in the dataset, but we are going start by using just three:
# Variable Type Len Pos Label
-----------------------------------------------------------------------------------------------
15 cstr Num 8 112
5 homework Num 8 32 Time spent on math homework each week
11 math Num 8 80 Math score
4 meanses Num 8 24 Mean SES for the school
7 parented Num 8 48 Parents highest education level
10 percmin Num 8 72 Percent minority in school
8 public Num 8 56 Public school: 1=public, 0=non-public
13 race Num 8 96 race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American
9 ratio Num 8 64 Student-Teacher ratio
18 region Num 8 136
1 schid Num 8 0 School ID
19 schnum Num 8 144 group(schid)
16 scsize Num 8 120
14 sctype Num 8 104 Type of school, 1=public, 2=catholic, 3=Private
other religious, 4=Private non-r
3 ses Num 8 16 Socioecnonomic Status
12 sex Num 8 88 Sex: 1=male, 2=female
2 stuid Num 8 8 Student ID
17 urban Num 8 128
6 white Num 8 40 Race: 1=white, 0=non-white
Here,
homework: number of hours of homework - Level 1 variable (associated with students) math: score in a math test - expanatory variable schnum: a number for each school - Level 2 variable (associated with school) scsize: school size - Level 2 variable public: school type - Level 2 variable
library(haven)
imm10_data <- read_dta("https://stats.idre.ucla.edu/stat/examples/imm/imm10.dta")
data visualization
Summary data
hist(imm10_data$math)
table(imm10_data$homework)
##
## 0 1 2 3 4 5 6 7
## 27 105 48 27 25 25 2 1
table(imm10_data$schnum)
##
## 1 2 3 4 5 6 7 8 9 10
## 23 20 24 22 22 20 67 21 21 20
table(imm10_data$public)
##
## 0 1
## 67 193
Random intercept - two groups
imm10 <- imm10_data
imm10
## # A tibble: 260 × 19
## schid stuid ses meanses homework white parented public ratio percmin math
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7472 3 -0.130 -0.483 1 1 2 1 19 0 48
## 2 7472 8 -0.390 -0.483 0 1 2 1 19 0 48
## 3 7472 13 -0.800 -0.483 0 1 2 1 19 0 53
## 4 7472 17 -0.720 -0.483 1 1 2 1 19 0 42
## 5 7472 27 -0.740 -0.483 2 1 2 1 19 0 43
## 6 7472 28 -0.580 -0.483 1 1 2 1 19 0 57
## 7 7472 30 -0.830 -0.483 5 1 2 1 19 0 33
## 8 7472 36 -0.510 -0.483 1 1 3 1 19 0 64
## 9 7472 37 -0.560 -0.483 1 1 2 1 19 0 36
## 10 7472 42 0.210 -0.483 2 1 3 1 19 0 56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## # scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>
Visualization of models
Simple linear regression would be
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='School')
When we look into binary variable,
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(public))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
Store simple linear regression
model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]
Multiple linear regression with binary variable
model2 <- lm(math ~ homework + public, data = imm10)
imm10$FEPredictions <- fitted(model2)
mulm_est <- coef(summary(model2))[ , "Estimate"]
mulm_se <- coef(summary(model2))[ , "Std. Error"]
This plots would respresent the result of adjusted association (math test result ~ homework + school type )
library(moderndive)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
geom_smooth(method = "lm") +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
print(gg)
Simple vs multiple (binary)
slm_est %>% summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.572 13.697 23.823 23.823 33.948 44.074
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school') +
ggtitle("Modeling with Simple linear regression")
print(gg)
Making random effect model
library(lmerTest)
library(lme4)
model2 <- lmer(data = imm10, math ~ homework + (1|public))
summary(model2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ homework + (1 | public)
## Data: imm10
##
## REML criterion at convergence: 1851.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.46023 -0.61670 -0.01182 0.58184 2.58110
##
## Random effects:
## Groups Name Variance Std.Dev.
## public (Intercept) 74.50 8.631
## Residual 71.79 8.473
## Number of obs: 260, groups: public, 2
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 50.3850 6.2050 1.0415 8.120 0.0721 .
## homework 1.9051 0.3882 257.9447 4.908 1.64e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## homework -0.152
Mixed effect model vs multiple (binary)
imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school') +
ggtitle("Modeling with randeom effect (multi level analysis)")
print(gg)
With effect modification…
Multiple linear regression with effect modification (binary)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = "lm", aes(col = factor(imm10$public)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
print(gg)
Simple LM + interaction term
Simple vs Multiple linear regression with effect modification (binary)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_smooth(method = "lm", aes(col = factor(imm10$public)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
print(gg)
Random effect
Calculation of random slope
model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|public))
summary(model2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | public)
## Data: imm10
##
## REML criterion at convergence: 1848.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.69605 -0.63601 -0.02561 0.63863 2.56869
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## public (Intercept) 122.7580 11.0796
## homework 0.8922 0.9446 -1.00
## Residual 70.9827 8.4251
## Number of obs: 260, groups: public, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 51.2419 7.9209 6.469
## homework 1.7881 0.7738 2.311
##
## Correlation of Fixed Effects:
## (Intr)
## homework -0.917
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
lmerTest::lmer(data = imm10, math ~ homework + (homework|public)) %>% summary
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ homework + (homework | public)
## Data: imm10
##
## REML criterion at convergence: 1848.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.69605 -0.63601 -0.02561 0.63863 2.56869
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## public (Intercept) 122.7580 11.0796
## homework 0.8922 0.9446 -1.00
## Residual 70.9827 8.4251
## Number of obs: 260, groups: public, 2
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 51.2419 7.9209 0.9955 6.469 0.0984 .
## homework 1.7881 0.7738 1.0612 2.311 0.2483
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## homework -0.917
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Mixed model with random slope vs Multiple linear regression with effect modification (binary)
imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est
## (Intercept) homework
## 51.241945 1.788118
gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$public))) +
geom_smooth(method = "lm", se = F) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
print(gg)
# Comparison - all models at once
ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$public))) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) + #geom_smooth(method = "lm", se = F) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_smooth(method = "lm", se = F) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Public school')
The sample analysis can be done for a categorical variable
Data filtering
Visualization of models - with more complicated case
Here we are going to consider random effect by different schools
Schools are limited to school2, 3 and 4
imm10 <- imm10_data %>% filter(schnum == 2 | schnum == 3 | schnum == 4 )
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework")
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='School')
model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]
With adjusted model, we can make 3 different models, showing correlation between x and y, adjusted by school (but across all the schools)
library(moderndive)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
To compare with result from simple linear regression, there is difference in slope, between adjusted model and simple linear model.
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School') +
ggtitle("Modeling with Simple linear regression")
print(gg)
With schools as random effects? (accounting the schools as average values)
library(lmerTest)
library(lme4)
model2 <- lmer(data = imm10, math ~ homework + (1|schnum))
imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School') +
ggtitle("Modeling with randeom effect (multi level analysis)")
print(gg)
With effect modification…
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Simple LM + interaction term
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Random effect
imm10$schnum <- as.factor(imm10$schnum)
model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|schnum))
summary(model2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | schnum)
## Data: imm10
##
## REML criterion at convergence: 448.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.93689 -0.52976 -0.02434 0.54695 2.45935
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schnum (Intercept) 49.48 7.035
## homework 31.06 5.573 -0.88
## Residual 48.05 6.932
## Number of obs: 66, groups: schnum, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 40.509 4.358 9.296
## homework 3.603 3.287 1.096
##
## Correlation of Fixed Effects:
## (Intr)
## homework -0.867
imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est
## (Intercept) homework
## 40.508687 3.602677
gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$schnum))) +
geom_smooth(method = "lm", se = F) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Categorical variable with more factors
Data filtering
imm10 <- imm10_data
imm10
## # A tibble: 260 × 19
## schid stuid ses meanses homework white parented public ratio percmin math
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7472 3 -0.130 -0.483 1 1 2 1 19 0 48
## 2 7472 8 -0.390 -0.483 0 1 2 1 19 0 48
## 3 7472 13 -0.800 -0.483 0 1 2 1 19 0 53
## 4 7472 17 -0.720 -0.483 1 1 2 1 19 0 42
## 5 7472 27 -0.740 -0.483 2 1 2 1 19 0 43
## 6 7472 28 -0.580 -0.483 1 1 2 1 19 0 57
## 7 7472 30 -0.830 -0.483 5 1 2 1 19 0 33
## 8 7472 36 -0.510 -0.483 1 1 3 1 19 0 64
## 9 7472 37 -0.560 -0.483 1 1 2 1 19 0 36
## 10 7472 42 0.210 -0.483 2 1 3 1 19 0 56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## # scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>
Visualization of models
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
ylab("Hours spent on homework") +
labs(color='School')
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='School')
model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]
library(moderndive)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School') +
ggtitle("Modeling with Simple linear regression")
print(gg)
library(lmerTest)
library(lme4)
model2 <- lmer(data = imm10, math ~ homework + (1|schnum))
imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School') +
ggtitle("Modeling with randeom effect (multi level analysis)")
print(gg)
With effect modification…
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Simple LM + interaction term
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Random effect
imm10$schnum <- as.factor(imm10$schnum)
model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|schnum))
summary(model2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | schnum)
## Data: imm10
##
## REML criterion at convergence: 1764
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5110 -0.5357 0.0175 0.6121 2.5708
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schnum (Intercept) 69.31 8.325
## homework 22.45 4.738 -0.81
## Residual 43.07 6.563
## Number of obs: 260, groups: schnum, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 44.771 2.744 16.318
## homework 2.040 1.554 1.313
##
## Correlation of Fixed Effects:
## (Intr)
## homework -0.804
imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est
## (Intercept) homework
## 44.770593 2.040465
gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$schnum))) +
geom_smooth(method = "lm", se = F) +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Simple LM + interaction term
gg <- ggplot(imm10, aes(y = math, x = homework)) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Continuous random effect - scool size
Data filtering
imm10 <- imm10_data
imm10
## # A tibble: 260 × 19
## schid stuid ses meanses homework white parented public ratio percmin math
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7472 3 -0.130 -0.483 1 1 2 1 19 0 48
## 2 7472 8 -0.390 -0.483 0 1 2 1 19 0 48
## 3 7472 13 -0.800 -0.483 0 1 2 1 19 0 53
## 4 7472 17 -0.720 -0.483 1 1 2 1 19 0 42
## 5 7472 27 -0.740 -0.483 2 1 2 1 19 0 43
## 6 7472 28 -0.580 -0.483 1 1 2 1 19 0 57
## 7 7472 30 -0.830 -0.483 5 1 2 1 19 0 33
## 8 7472 36 -0.510 -0.483 1 1 3 1 19 0 64
## 9 7472 37 -0.560 -0.483 1 1 2 1 19 0 36
## 10 7472 42 0.210 -0.483 2 1 3 1 19 0 56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## # scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>
The same analysis but with a continuous variable
Visualization of models
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
ylab("Hours spent on homework") +
labs(color='School')
ggplot(imm10, aes(y = math, x = homework)) +
geom_smooth(method = lm, color = "black") +
geom_point(size = 1.5, alpha = 0.8, aes(colour = meanses)) +
theme_classic() +
scale_color_viridis_c() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(color='Mean socio economic status (by schoool)')
model1 <- lm(math ~ homework + meanses, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]
gg <- ggplot(imm10, aes(y = math, x = homework)) +
facet_wrap(~meanses) +
geom_smooth(method = "lm", se =F) +
geom_point(size = 1.5, alpha = 0.8) +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Grey is regular simple linear regression, and red is regular mixed linear regression
model2 <- lmer(data = imm10, math ~ homework + (1|meanses))
imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
gg <- ggplot(imm10, aes(y = math, x = homework)) +
facet_wrap(~meanses) +
geom_smooth(method = "lm", se =F) +
geom_point(size = 1.5, alpha = 0.8) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1, col = "grey") +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1, col = "red") +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
The ultimate model
multivariate, random mixed effect model
imm10 <- imm10_data
model5 <- lmer(math ~ homework + ses + meanses + (homework|schnum) + (meanses|region), REML=FALSE, data = imm10)
summary(model5)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: math ~ homework + ses + meanses + (homework | schnum) + (meanses |
## region)
## Data: imm10
##
## AIC BIC logLik deviance df.resid
## 1758.0 1797.2 -868.0 1736.0 249
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.65746 -0.67499 0.03456 0.63597 2.65138
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## schnum (Intercept) 4.920e+01 7.014486
## homework 1.708e+01 4.132667 -0.99
## region (Intercept) 0.000e+00 0.000000
## meanses 1.666e-06 0.001291 NaN
## Residual 4.132e+01 6.428340
## Number of obs: 260, groups: schnum, 10; region, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 48.0222 2.3525 9.9707 20.413 1.83e-09 ***
## homework 1.8024 1.3600 9.6920 1.325 0.215476
## ses 2.3765 0.6359 239.2027 3.737 0.000233 ***
## meanses 6.2304 1.0196 11.6230 6.110 6.02e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) homwrk ses
## homework -0.973
## ses 0.042 -0.037
## meanses 0.066 -0.009 -0.611
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
imm10$REPredictions <- fitted(model5)
ml_est <- coef(summary(model5))[ , "Estimate"]
ml_se <- coef(summary(model5))[ , "Std. Error"]
ml_est
## (Intercept) homework ses meanses
## 48.022201 1.802384 2.376458 6.230416
gg <- ggplot(imm10, aes(y = math, x = homework)) +
facet_wrap(~meanses) +
geom_smooth(method = "lm", se =F) +
geom_point(size = 1.5, alpha = 0.8) +
geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1, col = "grey") +
geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1, col = "red") +
theme_classic() +
ylab("Math test result") +
xlab("Hours spent on homework") +
labs(col='School')
print(gg)
Rather then homework, the socio economic status is more important.
Bibliography
## Computing. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also 'citation("pkgname")' for citing R packages.
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
## J, reikoch, Beasley W, O'Connor B, Warnes GR, Quinn M, Kamvar ZN, Gao C (2024). yaml: Methods to Convert R Data to YAML and Back_. R package version 2.3.10, <https://CRAN.R-project.org/package=yaml>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.
## version 0.4.4, <https://CRAN.R-project.org/package=reactable>.