COD_week11_2_MGK_BTE3207

Minsik Kim

2024-11-18

Random effect dataset

https://stats.idre.ucla.edu/

https://methods101.com.au/docs/soci832_11_3_multilevel_models_week_11/

Dataset is students from 10 handpicked schools, representing a subset of students and schools from a US survey of eight-grade students at 1000 schools (800 public 200 private).

There are quite a lot of variables in the dataset, but we are going start by using just three:


 # Variable Type Len Pos Label
-----------------------------------------------------------------------------------------------
15 cstr     Num    8 112
5 homework Num    8  32 Time spent on math homework each week
11 math     Num    8  80 Math score
4 meanses  Num    8  24 Mean SES for the school
7 parented Num    8  48 Parents highest education level

10 percmin  Num    8  72 Percent minority in school
 8 public   Num    8  56 Public school: 1=public, 0=non-public
13 race     Num    8  96 race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American
 9 ratio    Num    8  64 Student-Teacher ratio
18 region   Num    8 136
 1 schid    Num    8   0 School ID
19 schnum   Num    8 144 group(schid)
16 scsize   Num    8 120
14 sctype   Num    8 104 Type of school, 1=public, 2=catholic, 3=Private
                         other religious, 4=Private non-r
 3 ses      Num    8  16 Socioecnonomic Status
12 sex      Num    8  88 Sex: 1=male, 2=female
 2 stuid    Num    8   8 Student ID
17 urban    Num    8 128
 6 white    Num    8  40 Race: 1=white, 0=non-white

Here,

homework: number of hours of homework - Level 1 variable (associated with students) math: score in a math test - expanatory variable schnum: a number for each school - Level 2 variable (associated with school) scsize: school size - Level 2 variable public: school type - Level 2 variable

library(haven)

imm10_data <- read_dta("https://stats.idre.ucla.edu/stat/examples/imm/imm10.dta")

data visualization

Summary data

hist(imm10_data$math)

table(imm10_data$homework)

## 
##   0   1   2   3   4   5   6   7 
##  27 105  48  27  25  25   2   1

table(imm10_data$schnum)

## 
##  1  2  3  4  5  6  7  8  9 10 
## 23 20 24 22 22 20 67 21 21 20

table(imm10_data$public)

## 
##   0   1 
##  67 193

Random intercept - two groups

imm10 <- imm10_data 
imm10

## # A tibble: 260 × 19
##    schid stuid    ses meanses homework white parented public ratio percmin  math
##    <dbl> <dbl>  <dbl>   <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl>
##  1  7472     3 -0.130  -0.483        1     1        2      1    19       0    48
##  2  7472     8 -0.390  -0.483        0     1        2      1    19       0    48
##  3  7472    13 -0.800  -0.483        0     1        2      1    19       0    53
##  4  7472    17 -0.720  -0.483        1     1        2      1    19       0    42
##  5  7472    27 -0.740  -0.483        2     1        2      1    19       0    43
##  6  7472    28 -0.580  -0.483        1     1        2      1    19       0    57
##  7  7472    30 -0.830  -0.483        5     1        2      1    19       0    33
##  8  7472    36 -0.510  -0.483        1     1        3      1    19       0    64
##  9  7472    37 -0.560  -0.483        1     1        2      1    19       0    36
## 10  7472    42  0.210  -0.483        2     1        3      1    19       0    56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## #   scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>

Visualization of models

Simple linear regression would be

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() + 
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='School')

When we look into binary variable,

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(public))) +
        theme_classic() + 
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school')

Store simple linear regression

model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]

Multiple linear regression with binary variable

model2 <- lm(math ~ homework + public, data = imm10)
imm10$FEPredictions <- fitted(model2)
mulm_est <- coef(summary(model2))[ , "Estimate"]
mulm_se <- coef(summary(model2))[ , "Std. Error"]

This plots would respresent the result of adjusted association (math test result ~ homework + school type )

library(moderndive)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
        geom_smooth(method = "lm") +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school') 

        
print(gg)

Simple vs multiple (binary)

slm_est %>% summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.572  13.697  23.823  23.823  33.948  44.074

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school') +
        ggtitle("Modeling with Simple linear regression")

        
print(gg)

Making random effect model

library(lmerTest)
library(lme4)

model2 <- lmer(data = imm10, math ~ homework + (1|public))

summary(model2)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ homework + (1 | public)
##    Data: imm10
## 
## REML criterion at convergence: 1851.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.46023 -0.61670 -0.01182  0.58184  2.58110 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  public   (Intercept) 74.50    8.631   
##  Residual             71.79    8.473   
## Number of obs: 260, groups:  public, 2
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  50.3850     6.2050   1.0415   8.120   0.0721 .  
## homework      1.9051     0.3882 257.9447   4.908 1.64e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## homework -0.152

Mixed effect model vs multiple (binary)

imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]


gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school') +
        ggtitle("Modeling with randeom effect (multi level analysis)")

        
print(gg)

With effect modification…

Multiple linear regression with effect modification (binary)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = "lm", aes(col = factor(imm10$public)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school')

        
print(gg)

Simple LM + interaction term

Simple vs Multiple linear regression with effect modification (binary)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_smooth(method = "lm", aes(col = factor(imm10$public)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$public))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school') 

        
print(gg)

Random effect

Calculation of random slope

model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|public)) 

summary(model2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | public)
##    Data: imm10
## 
## REML criterion at convergence: 1848.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.69605 -0.63601 -0.02561  0.63863  2.56869 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  public   (Intercept) 122.7580 11.0796       
##           homework      0.8922  0.9446  -1.00
##  Residual              70.9827  8.4251       
## Number of obs: 260, groups:  public, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  51.2419     7.9209   6.469
## homework      1.7881     0.7738   2.311
## 
## Correlation of Fixed Effects:
##          (Intr)
## homework -0.917
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

lmerTest::lmer(data = imm10, math ~ homework + (homework|public)) %>% summary

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ homework + (homework | public)
##    Data: imm10
## 
## REML criterion at convergence: 1848.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.69605 -0.63601 -0.02561  0.63863  2.56869 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  public   (Intercept) 122.7580 11.0796       
##           homework      0.8922  0.9446  -1.00
##  Residual              70.9827  8.4251       
## Number of obs: 260, groups:  public, 2
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)  51.2419     7.9209  0.9955   6.469   0.0984 .
## homework      1.7881     0.7738  1.0612   2.311   0.2483  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## homework -0.917
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Mixed model with random slope vs Multiple linear regression with effect modification (binary)

imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est

## (Intercept)    homework 
##   51.241945    1.788118

gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$public))) +
        geom_smooth(method = "lm", se = F) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school') 
        

print(gg)

# Comparison - all models at once

ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$public))) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$public)), se =F) + #geom_smooth(method = "lm", se = F) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_smooth(method = "lm", se = F) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Public school')

The sample analysis can be done for a categorical variable

Data filtering

Visualization of models - with more complicated case

Here we are going to consider random effect by different schools

Schools are limited to school2, 3 and 4

imm10 <- imm10_data %>% filter(schnum == 2 | schnum == 3 | schnum == 4 )

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() + 
        ylab("Math test result") +
        xlab("Hours spent on homework")

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(schnum))) +
        theme_classic() + 
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='School')

model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]

With adjusted model, we can make 3 different models, showing correlation between x and y, adjusted by school (but across all the schools)

library(moderndive)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

To compare with result from simple linear regression, there is difference in slope, between adjusted model and simple linear model.

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') +
        ggtitle("Modeling with Simple linear regression")

        
print(gg)

With schools as random effects? (accounting the schools as average values)

library(lmerTest)
library(lme4)

model2 <- lmer(data = imm10, math ~ homework + (1|schnum))

imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]


gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') +
        ggtitle("Modeling with randeom effect (multi level analysis)")

        
print(gg)

With effect modification…

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

Simple LM + interaction term

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

Random effect

imm10$schnum <- as.factor(imm10$schnum)

model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|schnum)) 

summary(model2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | schnum)
##    Data: imm10
## 
## REML criterion at convergence: 448.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.93689 -0.52976 -0.02434  0.54695  2.45935 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  schnum   (Intercept) 49.48    7.035         
##           homework    31.06    5.573    -0.88
##  Residual             48.05    6.932         
## Number of obs: 66, groups:  schnum, 3
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   40.509      4.358   9.296
## homework       3.603      3.287   1.096
## 
## Correlation of Fixed Effects:
##          (Intr)
## homework -0.867

imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est

## (Intercept)    homework 
##   40.508687    3.602677

gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$schnum))) +
        geom_smooth(method = "lm", se = F) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 
        

print(gg)

Categorical variable with more factors

Data filtering

imm10 <- imm10_data
imm10

## # A tibble: 260 × 19
##    schid stuid    ses meanses homework white parented public ratio percmin  math
##    <dbl> <dbl>  <dbl>   <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl>
##  1  7472     3 -0.130  -0.483        1     1        2      1    19       0    48
##  2  7472     8 -0.390  -0.483        0     1        2      1    19       0    48
##  3  7472    13 -0.800  -0.483        0     1        2      1    19       0    53
##  4  7472    17 -0.720  -0.483        1     1        2      1    19       0    42
##  5  7472    27 -0.740  -0.483        2     1        2      1    19       0    43
##  6  7472    28 -0.580  -0.483        1     1        2      1    19       0    57
##  7  7472    30 -0.830  -0.483        5     1        2      1    19       0    33
##  8  7472    36 -0.510  -0.483        1     1        3      1    19       0    64
##  9  7472    37 -0.560  -0.483        1     1        2      1    19       0    36
## 10  7472    42  0.210  -0.483        2     1        3      1    19       0    56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## #   scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>

Visualization of models

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() + 
        ylab("Math test result") +
        ylab("Hours spent on homework") +
        labs(color='School')

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8, aes(colour = as.factor(schnum))) +
        theme_classic() + 
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='School')

model1 <- lm(math ~ homework, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]

library(moderndive)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') +
        ggtitle("Modeling with Simple linear regression")

        
print(gg)

library(lmerTest)
library(lme4)

model2 <- lmer(data = imm10, math ~ homework + (1|schnum))

imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]


gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_parallel_slopes(aes(col = factor(imm10$schnum)), se =F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') +
        ggtitle("Modeling with randeom effect (multi level analysis)")

        
print(gg)

With effect modification…

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

Simple LM + interaction term

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

Random effect

imm10$schnum <- as.factor(imm10$schnum)

model2 <- lme4::lmer(data = imm10, math ~ homework + (homework|schnum)) 

summary(model2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ homework + (homework | schnum)
##    Data: imm10
## 
## REML criterion at convergence: 1764
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5110 -0.5357  0.0175  0.6121  2.5708 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  schnum   (Intercept) 69.31    8.325         
##           homework    22.45    4.738    -0.81
##  Residual             43.07    6.563         
## Number of obs: 260, groups:  schnum, 10
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   44.771      2.744  16.318
## homework       2.040      1.554   1.313
## 
## Correlation of Fixed Effects:
##          (Intr)
## homework -0.804

imm10$REPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]
ml_est

## (Intercept)    homework 
##   44.770593    2.040465

gg <- ggplot(imm10, aes(y = math, x = homework, col=factor(imm10$schnum))) +
        geom_smooth(method = "lm", se = F) +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1) +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 
        

print(gg)

Simple LM + interaction term

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1) +
        geom_smooth(method = "lm", aes(col = factor(imm10$schnum)), se = F) +
        geom_point(size = 1.5, alpha = 0.8, aes(color=factor(imm10$schnum))) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 

        
print(gg)

Continuous random effect - scool size

Data filtering

imm10 <- imm10_data
imm10

## # A tibble: 260 × 19
##    schid stuid    ses meanses homework white parented public ratio percmin  math
##    <dbl> <dbl>  <dbl>   <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl>
##  1  7472     3 -0.130  -0.483        1     1        2      1    19       0    48
##  2  7472     8 -0.390  -0.483        0     1        2      1    19       0    48
##  3  7472    13 -0.800  -0.483        0     1        2      1    19       0    53
##  4  7472    17 -0.720  -0.483        1     1        2      1    19       0    42
##  5  7472    27 -0.740  -0.483        2     1        2      1    19       0    43
##  6  7472    28 -0.580  -0.483        1     1        2      1    19       0    57
##  7  7472    30 -0.830  -0.483        5     1        2      1    19       0    33
##  8  7472    36 -0.510  -0.483        1     1        3      1    19       0    64
##  9  7472    37 -0.560  -0.483        1     1        2      1    19       0    36
## 10  7472    42  0.210  -0.483        2     1        3      1    19       0    56
## # ℹ 250 more rows
## # ℹ 8 more variables: sex <dbl>, race <dbl>, sctype <dbl>, cstr <dbl>,
## #   scsize <dbl>, urban <dbl>, region <dbl>, schnum <dbl>

The same analysis but with a continuous variable

Visualization of models

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() + 
        ylab("Math test result") +
        ylab("Hours spent on homework") +
        labs(color='School')

ggplot(imm10, aes(y = math, x = homework)) +
        geom_smooth(method = lm, color = "black") +
        geom_point(size = 1.5, alpha = 0.8, aes(colour = meanses)) +
        theme_classic() +
        scale_color_viridis_c() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(color='Mean socio economic status (by schoool)')

model1 <- lm(math ~ homework + meanses, data = imm10)
imm10$FEPredictions <- fitted(model1)
slm_est <- coef(summary(model1))[ , "Estimate"]
slm_se <- coef(summary(model1))[ , "Std. Error"]

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        facet_wrap(~meanses) + 
        geom_smooth(method = "lm", se =F) +
        geom_point(size = 1.5, alpha = 0.8) +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 
print(gg)

Grey is regular simple linear regression, and red is regular mixed linear regression

model2 <- lmer(data = imm10, math ~ homework + (1|meanses))

imm10$FEPredictions <- fitted(model2)
ml_est <- coef(summary(model2))[ , "Estimate"]
ml_se <- coef(summary(model2))[ , "Std. Error"]



gg <- ggplot(imm10, aes(y = math, x = homework)) +
        facet_wrap(~meanses) + 
        geom_smooth(method = "lm", se =F) +
        geom_point(size = 1.5, alpha = 0.8) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1, col = "grey") +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1, col = "red") +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 
print(gg)

The ultimate model

multivariate, random mixed effect model

imm10 <- imm10_data 
model5 <- lmer(math ~ homework + ses + meanses + (homework|schnum) + (meanses|region), REML=FALSE, data = imm10)
summary(model5)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: math ~ homework + ses + meanses + (homework | schnum) + (meanses |  
##     region)
##    Data: imm10
## 
##      AIC      BIC   logLik deviance df.resid 
##   1758.0   1797.2   -868.0   1736.0      249 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.65746 -0.67499  0.03456  0.63597  2.65138 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  schnum   (Intercept) 4.920e+01 7.014486      
##           homework    1.708e+01 4.132667 -0.99
##  region   (Intercept) 0.000e+00 0.000000      
##           meanses     1.666e-06 0.001291  NaN 
##  Residual             4.132e+01 6.428340      
## Number of obs: 260, groups:  schnum, 10; region, 3
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  48.0222     2.3525   9.9707  20.413 1.83e-09 ***
## homework      1.8024     1.3600   9.6920   1.325 0.215476    
## ses           2.3765     0.6359 239.2027   3.737 0.000233 ***
## meanses       6.2304     1.0196  11.6230   6.110 6.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr) homwrk ses   
## homework -0.973              
## ses       0.042 -0.037       
## meanses   0.066 -0.009 -0.611
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

imm10$REPredictions <- fitted(model5)
ml_est <- coef(summary(model5))[ , "Estimate"]
ml_se <- coef(summary(model5))[ , "Std. Error"]
ml_est

## (Intercept)    homework         ses     meanses 
##   48.022201    1.802384    2.376458    6.230416

gg <- ggplot(imm10, aes(y = math, x = homework)) +
        facet_wrap(~meanses) + 
        geom_smooth(method = "lm", se =F) +
        geom_point(size = 1.5, alpha = 0.8) +
        geom_abline(slope = slm_est[2], intercept = slm_est[1], size=1, col = "grey") +
        geom_abline(slope = ml_est[2], intercept = ml_est[1], size=1, col = "red") +
        theme_classic() +
        ylab("Math test result") +
        xlab("Hours spent on homework") +
        labs(col='School') 
print(gg)

Rather then homework, the socio economic status is more important.

Bibliography

## Computing. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also 'citation("pkgname")' for citing R packages.
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
## J, reikoch, Beasley W, O'Connor B, Warnes GR, Quinn M, Kamvar ZN, Gao C (2024). yaml: Methods to Convert R Data to YAML and Back_. R package version 2.3.10, <https://CRAN.R-project.org/package=yaml>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.
## version 0.4.4, <https://CRAN.R-project.org/package=reactable>.