Data Modeling and Imputation for Composite AERA Study

Nirmal Ghimire, Ph.D. https://www.linkedin.com/in/nirmal-ghimire-5b96a034/ (K-16 Literacy Center at University of Texas at Tyler)https://www.uttyler.edu/education/literacy-center/
2023-10-10
Show code
library(haven)
library(ggplot2)
library(tidyverse)
library(dplyr)
library(janitor)
# library(car)# For Levene's Test
library(compute.es) # For effect sizes
library(effects) # for adjusted means
library(pastecs) # for descriptive statistics
# library(multcomp) # For post hoc test
# library(WRS2) # for robust tests
# library(lme4) # To conduct Hierarchical Modeling
# library(lmerTest) # To provide p-values in type I, II, or III anova summary tables for linear mixed models
library(geomtextpath)
library(lavaan)
library(lavaanPlot)
library(tidySEM)
library(semTable)
library(mice)
library(ggmice)
library(broom.mixed)

A. Loading Data and Getting them Ready for Analysis

1. School, Student Background, and Student Home ICT Data Together

Show code
# Loading Student and School Data
ss_data <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/SS_final_data.csv",
  as.is = TRUE
)
# Changing character vectors to factors
ss_data <- ss_data %>%
  mutate_if(is.character, as.factor) |>
  dplyr::select(GENDER, ETHNICITY, LANGN, COB_MOM, COB_DAD, MISCED, FISCED, COM_HOM, INTERNET, ICTHOME, SCH_LOCA, SCH_TYPE, FRPL, SCH_FRPL, SCH_ELS, SCH_DISA, EXT_CLAS, READ_SCR)

# Getting rid of unnecessary variables


# names(ss_data)
str(ss_data)
'data.frame':   4838 obs. of  18 variables:
 $ GENDER   : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 1 1 2 ...
 $ ETHNICITY: Factor w/ 6 levels "Asian","Black or African American",..: 5 6 6 6 6 6 5 6 3 6 ...
 $ LANGN    : Factor w/ 3 levels "Another language (USA)",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ COB_MOM  : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ COB_DAD  : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 1 2 ...
 $ MISCED   : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 5 3 3 2 5 3 2 4 5 4 ...
 $ FISCED   : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 3 3 4 3 4 3 3 5 3 4 ...
 $ COM_HOM  : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
 $ INTERNET : Factor w/ 3 levels "No","Yes, and I use it",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ ICTHOME  : int  10 5 9 6 12 11 9 10 6 9 ...
 $ SCH_LOCA : Factor w/ 5 levels "city","large_city",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ SCH_TYPE : Factor w/ 2 levels "Private","Public": 2 2 2 2 2 2 2 2 2 2 ...
 $ FRPL     : Factor w/ 5 levels "<10% FRPL students",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ SCH_FRPL : int  79 79 79 79 79 79 79 79 79 79 ...
 $ SCH_ELS  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ SCH_DISA : int  37 37 37 37 37 37 37 37 37 37 ...
 $ EXT_CLAS : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ READ_SCR : num  544 432 504 438 536 ...
Show code
summary(ss_data)
    GENDER                         ETHNICITY   
 Female:2376   Asian                    : 284  
 Male  :2462   Black or African American: 769  
               Hispanic or Latino       :1221  
               Other                    :  47  
               Two or More Race         : 367  
               White, not Hispanic      :2097  
               NA's                     :  53  
                    LANGN               COB_MOM    
 Another language (USA): 219   Other country:1214  
 English               :4054   United States:3524  
 Spanish               : 517   NA's         : 100  
 NA's                  :  48                       
                                                   
                                                   
                                                   
          COB_DAD                   MISCED    
 Other country:1254   ISCED 1          : 104  
 United States:3449   ISCED 2          : 387  
 NA's         : 135   ISCED 3A, ISCED 4:1447  
                      ISCED 5A, 6      :2045  
                      ISCED 5B         : 688  
                      None             :  74  
                      NA's             :  93  
               FISCED     COM_HOM                        INTERNET   
 ISCED 1          : 118   No  : 604   No                     : 182  
 ISCED 2          : 448   Yes :4170   Yes, and I use it      :4261  
 ISCED 3A, ISCED 4:1784   NA's:  64   Yes, but I don’t use it: 139  
 ISCED 5A, 6      :1674               NA's                   : 256  
 ISCED 5B         : 535                                             
 None             :  93                                             
 NA's             : 186                                             
    ICTHOME             SCH_LOCA       SCH_TYPE   
 Min.   : 1.000   city      :1161   Private: 176  
 1st Qu.: 8.000   large_city: 655   Public :4206  
 Median :10.000   rural     : 302   NA's   : 456  
 Mean   : 9.413   small_town: 846                 
 3rd Qu.:11.000   town      :1453                 
 Max.   :12.000   NA's      : 421                 
 NA's   :192                                      
                     FRPL         SCH_FRPL         SCH_ELS     
 <10% FRPL students    : 125   Min.   :  1.00   Min.   : 0.00  
 >75% FRPL students    : 903   1st Qu.: 29.00   1st Qu.: 4.00  
 10-24.9% FRPL students: 505   Median : 45.00   Median :10.00  
 25-50% FRPL students  :1416   Mean   : 46.62   Mean   :19.29  
 50-74.9% FRPL students:1143   3rd Qu.: 65.00   3rd Qu.:24.00  
 NA's                  : 746   Max.   :100.00   Max.   :92.00  
                               NA's   :538      NA's   :942    
    SCH_DISA      EXT_CLAS       READ_SCR    
 Min.   :  0.00   No  :3025   Min.   :157.3  
 1st Qu.: 10.00   Yes :1187   1st Qu.:425.9  
 Median : 13.00   NA's: 626   Median :504.8  
 Mean   : 13.59               Mean   :500.6  
 3rd Qu.: 19.00               3rd Qu.:578.4  
 Max.   :100.00               Max.   :810.5  
 NA's   :553                                 

Checking for Assumptions of Multivariate Normality and Outliers for SEM Study

Mardia’s Test of Multivariate Normality

Show code
ss_num <- dplyr::select(ss_data, SCH_ELS, SCH_FRPL, ICTHOME, READ_SCR, SCH_DISA)
library(QuantPsyc)
mult.norm(ss_num)$mult.test
          Beta-hat       kappa        p-val
Skewness  4.538961 2830.042115 0.000000e+00
Kurtosis 37.072662    7.576058 3.552714e-14
Show code
ss_num$ss_maha <- mahalanobis(ss_num, colMeans(ss_num), cov(ss_num))
ss_num$maha_p <- pchisq(ss_num$ss_maha, df = 17, lower.tail = FALSE)
ss_high_maha <- ss_num %>%
  filter(maha_p <= .001) %>%
  arrange(maha_p)

ss_high_maha
[1] SCH_ELS  SCH_FRPL ICTHOME  READ_SCR SCH_DISA ss_maha  maha_p  
<0 rows> (or 0-length row.names)
Show code
nrow(ss_high_maha)
[1] 0

Checking missing data pattern

Show code
VIM::aggr(ss_data, col = c("dodgerblue", "maroon"), numbers = TRUE, sortVars = TRUE, labels = names(ss_data), cex.axis = .7, gap = 3, ylab = c("Histogram of missing data", "Pattern"))


 Variables sorted by number of missings: 
  Variable       Count
   SCH_ELS 0.194708557
      FRPL 0.154195949
  EXT_CLAS 0.129392311
  SCH_DISA 0.114303431
  SCH_FRPL 0.111202976
  SCH_TYPE 0.094253824
  SCH_LOCA 0.087019430
  INTERNET 0.052914427
   ICTHOME 0.039685821
    FISCED 0.038445639
   COB_DAD 0.027904093
   COB_MOM 0.020669698
    MISCED 0.019222819
   COM_HOM 0.013228607
 ETHNICITY 0.010954940
     LANGN 0.009921455
    GENDER 0.000000000
  READ_SCR 0.000000000
Show code
# summary(checking_missing_data, plot=FALSE)

naniar::mcar_test(ss_data)
# A tibble: 1 × 4
  statistic    df p.value missing.patterns
      <dbl> <dbl>   <dbl>            <int>
1     7307.  1561       0              118
Show code
# Imputing
new_ssdata <- mice::mice(ss_data[], m = 5, maxit = 50, method = "pmm", seed = 0411, printFlag = FALSE)

Plotting the imputed vs original data

Show code
densityplot(new_ssdata, ~COB_MOM)
Show code
ggmice::ggmice(
  new_ssdata,
  ggplot2::aes(x = SCH_FRPL, y = READ_SCR)
) +
  ggplot2::geom_jitter(width = 0.25) +
  ggplot2::labs(x = "School FRPL Percentage")

Pooled {lm} on New SS Data

Show code
# Running lm model on Pooled data
ss_pooled <- with(new_ssdata, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
summary(pool(ss_pooled))
                                 term     estimate  std.error
1                         (Intercept) 517.54175286 19.4968704
2                             SCH_ELS   0.01737201  0.1052718
3              FRPL>75% FRPL students -46.39849259 19.1809653
4          FRPL10-24.9% FRPL students  -8.96139826 10.2666687
5            FRPL25-50% FRPL students -25.86839079 10.7722291
6          FRPL50-74.9% FRPL students -32.49891922 13.7992211
7                         EXT_CLASYes  -3.84912894  3.5885261
8                            SCH_DISA  -0.37179565  0.2233463
9                            SCH_FRPL  -0.13717641  0.1952642
10                     SCH_TYPEPublic   3.23893828  8.4101723
11                 SCH_LOCAlarge_city  -4.98429830  4.9124647
12                      SCH_LOCArural -32.94104357  6.4002975
13                 SCH_LOCAsmall_town -13.41288070  4.6306869
14                       SCH_LOCAtown  -7.16386438  3.9293962
15          INTERNETYes, and I use it  41.31088320  8.2614903
16    INTERNETYes, but I don’t use it -38.45477873 11.2191608
17                            ICTHOME  -2.60909168  0.8246360
18                      FISCEDISCED 2   6.38531305 10.4530367
19            FISCEDISCED 3A, ISCED 4  12.34183379 10.0896338
20                  FISCEDISCED 5A, 6  26.66722281 10.5152958
21                     FISCEDISCED 5B   8.35050527 11.1840184
22                         FISCEDNone   8.75753429 13.6444762
23               COB_DADUnited States  -6.79950822  4.9662579
24               COB_MOMUnited States -15.57301062  5.1544082
25                      MISCEDISCED 2 -13.89954941 11.0372505
26            MISCEDISCED 3A, ISCED 4   8.97243343 10.5666228
27                  MISCEDISCED 5A, 6  15.61112110 10.7935044
28                     MISCEDISCED 5B   6.50082429 11.1914206
29                         MISCEDNone   7.51668266 14.8870925
30                         COM_HOMYes  30.15765844  4.7544997
31 ETHNICITYBlack or African American -78.47958781  7.9074566
32        ETHNICITYHispanic or Latino -47.34367146  7.6148080
33                     ETHNICITYOther -69.09352857 15.0653499
34          ETHNICITYTwo or More Race -40.96166776  8.6407991
35       ETHNICITYWhite, not Hispanic -17.01495577  7.6685743
36                       LANGNEnglish  34.87258385  7.8448429
37                       LANGNSpanish   9.04209931  9.3149949
38                         GENDERMale -25.16645529  2.7340136
    statistic         df       p.value
1  26.5448630 1271.08465 6.840684e-124
2   0.1650206   18.33173  8.707339e-01
3  -2.4189863   13.49010  3.035289e-02
4  -0.8728633   64.08763  3.859963e-01
5  -2.4013963   45.78419  2.045066e-02
6  -2.3551271   25.67038  2.644750e-02
7  -1.0726211  119.77129  2.855976e-01
8  -1.6646602  108.57361  9.886392e-02
9  -0.7025171   15.57549  4.927234e-01
10  0.3851215  184.08598  7.005920e-01
11 -1.0146227  126.20056  3.122270e-01
12 -5.1467988  769.82281  3.366554e-07
13 -2.8965208  317.46104  4.035257e-03
14 -1.8231464  383.76699  6.905905e-02
15  5.0004154   74.77914  3.668608e-06
16 -3.4275985  277.27873  7.013392e-04
17 -3.1639315   93.20280  2.102218e-03
18  0.6108572  389.48504  5.416504e-01
19  1.2232192  308.04933  2.221818e-01
20  2.5360411  225.78335  1.188723e-02
21  0.7466462  143.96211  4.564947e-01
22  0.6418373  486.47837  5.212811e-01
23 -1.3691412 2620.46898  1.710725e-01
24 -3.0212994 1095.06671  2.575359e-03
25 -1.2593308 1010.33267  2.082017e-01
26  0.8491297 1040.40930  3.960044e-01
27  1.4463441  844.93751  1.484516e-01
28  0.5808757  442.11776  5.616202e-01
29  0.5049127  956.17903  6.137365e-01
30  6.3429719  474.19808  5.263918e-10
31 -9.9247574 2397.39575  8.874992e-23
32 -6.2173165 4189.59123  5.551071e-10
33 -4.5862545 4202.97596  4.643989e-06
34 -4.7404953 1088.78288  2.414457e-06
35 -2.2187900  583.47482  2.688502e-02
36  4.4452877 2116.11973  9.231248e-06
37  0.9707036 1677.68645  3.318358e-01
38 -9.2049488 3110.79788  6.105696e-20

Breaking Imputed Data in Individual Sets

Show code
ss_data_1 <- complete(new_ssdata, 1)
ss_data_2 <- complete(new_ssdata, 2)
ss_data_3 <- complete(new_ssdata, 3)
ss_data_4 <- complete(new_ssdata, 4)
ss_data_5 <- complete(new_ssdata, 5)

LM on Dataset 1

Show code
ss_1 <- with(ss_data_1, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_1)

LM on Dataset 2

Show code
ss_2 <- with(ss_data_2, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_2)

LM on Dataset 3

Show code
ss_3 <- with(ss_data_3, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_3)

LM on Dataset 4

Show code
ss_4 <- with(ss_data_4, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_4)

LM on Dataset 5

Show code
ss_5 <- with(ss_data_5, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_5)
summary(ss_data_5)
    GENDER                         ETHNICITY   
 Female:2376   Asian                    : 285  
 Male  :2462   Black or African American: 785  
               Hispanic or Latino       :1229  
               Other                    :  49  
               Two or More Race         : 370  
               White, not Hispanic      :2120  
                    LANGN               COB_MOM    
 Another language (USA): 220   Other country:1246  
 English               :4096   United States:3592  
 Spanish               : 522                       
                                                   
                                                   
                                                   
          COB_DAD                   MISCED    
 Other country:1293   ISCED 1          : 107  
 United States:3545   ISCED 2          : 396  
                      ISCED 3A, ISCED 4:1470  
                      ISCED 5A, 6      :2083  
                      ISCED 5B         : 708  
                      None             :  74  
               FISCED     COM_HOM                       INTERNET   
 ISCED 1          : 123   No : 618   No                     : 207  
 ISCED 2          : 470   Yes:4220   Yes, and I use it      :4477  
 ISCED 3A, ISCED 4:1860              Yes, but I don’t use it: 154  
 ISCED 5A, 6      :1731                                            
 ISCED 5B         : 556                                            
 None             :  98                                            
    ICTHOME             SCH_LOCA       SCH_TYPE   
 Min.   : 1.000   city      :1278   Private: 203  
 1st Qu.: 8.000   large_city: 740   Public :4635  
 Median :10.000   rural     : 330                 
 Mean   : 9.411   small_town: 911                 
 3rd Qu.:11.000   town      :1579                 
 Max.   :12.000                                   
                     FRPL         SCH_FRPL         SCH_ELS     
 <10% FRPL students    : 145   Min.   :  1.00   Min.   : 0.00  
 >75% FRPL students    :1046   1st Qu.: 29.00   1st Qu.: 4.00  
 10-24.9% FRPL students: 615   Median : 45.00   Median :10.00  
 25-50% FRPL students  :1655   Mean   : 47.01   Mean   :18.92  
 50-74.9% FRPL students:1377   3rd Qu.: 65.00   3rd Qu.:23.50  
                               Max.   :100.00   Max.   :92.00  
    SCH_DISA      EXT_CLAS      READ_SCR    
 Min.   :  0.00   No :3471   Min.   :157.3  
 1st Qu.: 10.00   Yes:1367   1st Qu.:425.9  
 Median : 13.00              Median :504.8  
 Mean   : 13.56              Mean   :500.6  
 3rd Qu.: 19.00              3rd Qu.:578.4  
 Max.   :100.00              Max.   :810.5  
Show code
str(ss_data_5)
'data.frame':   4838 obs. of  18 variables:
 $ GENDER   : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 1 1 2 ...
 $ ETHNICITY: Factor w/ 6 levels "Asian","Black or African American",..: 5 6 6 6 6 6 5 6 3 6 ...
 $ LANGN    : Factor w/ 3 levels "Another language (USA)",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ COB_MOM  : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ COB_DAD  : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 1 2 ...
 $ MISCED   : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 5 3 3 2 5 3 2 4 5 4 ...
 $ FISCED   : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 3 3 4 3 4 3 3 5 3 4 ...
 $ COM_HOM  : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
 $ INTERNET : Factor w/ 3 levels "No","Yes, and I use it",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ ICTHOME  : int  10 5 9 6 12 11 9 10 6 9 ...
 $ SCH_LOCA : Factor w/ 5 levels "city","large_city",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ SCH_TYPE : Factor w/ 2 levels "Private","Public": 2 2 2 2 2 2 2 2 2 2 ...
 $ FRPL     : Factor w/ 5 levels "<10% FRPL students",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ SCH_FRPL : int  79 79 79 79 79 79 79 79 79 79 ...
 $ SCH_ELS  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ SCH_DISA : int  37 37 37 37 37 37 37 37 37 37 ...
 $ EXT_CLAS : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ READ_SCR : num  544 432 504 438 536 ...
Show code
write.csv(ss_data_5, file = "s_data.csv", row.names = FALSE)
# save(ss_data_5, file = "s_data.RData")

2. Teacher Data

Show code
teacher_final <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/t_final.csv",
  as.is = TRUE
)
# names(teacher_final)
# Changing class of character variables
teacher_final <- data.frame(lapply(teacher_final, function(b) if (is.character(b)) as.factor(b) else b))

# Getting rid of unnecessary column
teacher_final <- teacher_final[, -c(1, 8, 12, 13)]

# summary(teacher_final)
# str(teacher_final)

# Teacher Education Data
teacher_education <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/teacher_education.csv",
  as.is = TRUE
)
options(scipen = 999)
# Getting rid of unnecessary column
teach_edu <- teacher_education[, -1]
# Changing the class of teacher_education
# teach_edu$teacher_education <- as.factor(teach_edu$teacher_education)


# Getting Rid of First Four Digits from 'teacher_id' and 'school_id'
teach_edu$school_id <- stringr::str_sub(teach_edu$CNTSCHID, -4, -1)
teach_edu$teacher_id <- stringr::str_sub(teach_edu$CNTTCHID, -4, -1)

# Getting Rid of Leading Zeros
teach_edu$school_id <- stringr::str_remove(teach_edu$school_id, "^0+")
teach_edu$teacher_id <- stringr::str_remove(teach_edu$teacher_id, "^0+")

# Changing the class
teach_edu <- teach_edu |>
  dplyr::select(school_id, teacher_id, teacher_education) |>
  mutate(
    school_id = as.integer(school_id),
    teacher_id = as.integer(teacher_id),
    highest_education = as.factor(teacher_education)
  )

# str(teach_edu)

# Merging the data sets
t_data <- merge(teach_edu, teacher_final, by = c("school_id", "teacher_id"))
t_data <- t_data |>
  dplyr::select(school_id, teacher_id, gender, age, teacher_type, full_time, highest_education, teaching_exp, teacher_edu, initial_qual, prof_dev, reading_score)

str(t_data)
'data.frame':   3526 obs. of  12 variables:
 $ school_id        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_id       : int  1269 1293 135 1379 1400 1445 1631 1829 2114 2115 ...
 $ gender           : Factor w/ 2 levels "Female","Male": NA 2 1 2 2 1 2 1 1 1 ...
 $ age              : int  NA 55 45 45 33 60 49 50 56 44 ...
 $ teacher_type     : Factor w/ 2 levels "General Teacher",..: NA 2 1 2 1 2 1 1 1 2 ...
 $ full_time        : Factor w/ 2 levels "Full-Time","Part-Time": NA 1 1 1 1 1 1 1 1 1 ...
 $ highest_education: Factor w/ 5 levels "Associate’s degree",..: NA 5 5 2 5 5 5 5 2 5 ...
 $ teaching_exp     : int  NA 24 16 16 11 32 20 21 29 23 ...
 $ teacher_edu      : Factor w/ 3 levels "None. No Teacher Education",..: NA 3 3 3 3 3 3 3 3 3 ...
 $ initial_qual     : Factor w/ 5 levels "In-service: Teacher Training Program",..: NA 3 3 3 3 3 3 3 3 3 ...
 $ prof_dev         : Factor w/ 2 levels "Didn't Participate",..: NA 2 2 2 2 2 2 2 2 2 ...
 $ reading_score    : num  510 510 510 510 510 ...
Show code
summary(t_data)
   school_id        teacher_id        gender          age       
 Min.   :  1.00   Min.   :   1.0   Female:1784   Min.   :20.00  
 1st Qu.: 47.00   1st Qu.: 942.2   Male  :1019   1st Qu.:34.00  
 Median : 88.00   Median :1888.5   NA's  : 723   Median :42.00  
 Mean   : 88.29   Mean   :1889.0                 Mean   :42.87  
 3rd Qu.:131.00   3rd Qu.:2837.8                 3rd Qu.:51.00  
 Max.   :175.00   Max.   :3779.0                 Max.   :70.00  
                                                 NA's   :700    
                   teacher_type      full_time   
 General Teacher         :1812   Full-Time:2773  
 Reading-Language Teacher:1049   Part-Time:  60  
 NA's                    : 665   NA's     : 693  
                                                 
                                                 
                                                 
                                                 
                               highest_education  teaching_exp  
 Associate’s degree                     :  12    Min.   : 0.00  
 Bachelor’s degree                      :1013    1st Qu.: 7.00  
 Doctoral or professional degree        :  94    Median :14.00  
 High school and/or some college courses:  11    Mean   :14.95  
 Master’s degree                        :1700    3rd Qu.:21.00  
 NA's                                   : 696    Max.   :50.00  
                                                 NA's   :741    
                          teacher_edu  
 None. No Teacher Education     : 229  
 Program of 1 Year or Less      : 577  
 Yes, Program Longer than 1 Year:2022  
 NA's                           : 698  
                                       
                                       
                                       
                                          initial_qual 
 In-service: Teacher Training Program           : 113  
 Other                                          : 142  
 Pre-service: Strandard Teacher Training Program:2303  
 Training on Another Pedagogical Profession     :  31  
 Work-Based Teacher Training                    : 241  
 NA's                                           : 696  
                                                       
               prof_dev    reading_score  
 Didn't Participate:  51   Min.   :260.3  
 Participated      :2753   1st Qu.:468.2  
 NA's              : 722   Median :502.7  
                           Mean   :505.1  
                           3rd Qu.:537.1  
                           Max.   :608.0  
                                          

Mardia Test for Teacher Data

Show code
teach_numeric <- dplyr::select(t_data, age, teaching_exp, reading_score)
mult.norm(teach_numeric)$mult.test
         Beta-hat    kappa p-val
Skewness  2.75807 1276.527     0
Kurtosis 17.33922   11.253     0
Show code
## Mahalanobis Distance
teach_numeric$mahal <- mahalanobis(teach_numeric, colMeans(teach_numeric), cov(teach_numeric))
teach_numeric$p <- pchisq(teach_numeric$mahal, df = 3, lower.tail = FALSE)
high_maha_teacher <- teach_numeric %>%
  dplyr::filter(p <= .001) %>%
  arrange(p)
high_maha_teacher
[1] age           teaching_exp  reading_score mahal        
[5] p            
<0 rows> (or 0-length row.names)
Show code
# rowsum(high_maha_teacher)

Imputing the data in teacher_education column

Show code
options(scipen = 999)
# Plotting the missing pattern
# plot_pattern(t_data)
# Or
checking_missing_data <- VIM::aggr(t_data, col = c("dodgerblue", "maroon"), numbers = TRUE, sortVars = TRUE, labels = names(t_data), cex.axis = .7, gap = 3, ylab = c("Histogram of missing data", "Pattern"))


 Variables sorted by number of missings: 
          Variable     Count
      teaching_exp 0.2101531
            gender 0.2050482
          prof_dev 0.2047646
               age 0.1985252
       teacher_edu 0.1979580
 highest_education 0.1973908
      initial_qual 0.1973908
         full_time 0.1965400
      teacher_type 0.1885990
         school_id 0.0000000
        teacher_id 0.0000000
     reading_score 0.0000000
Show code
summary(checking_missing_data, plot = FALSE)

 Missings per variable: 
          Variable Count
         school_id     0
        teacher_id     0
            gender   723
               age   700
      teacher_type   665
         full_time   693
 highest_education   696
      teaching_exp   741
       teacher_edu   698
      initial_qual   696
          prof_dev   722
     reading_score     0

 Missings in combinations of variables: 
            Combinations Count     Percent
 0:0:0:0:0:0:0:0:0:0:0:0  2709 76.82926829
 0:0:0:0:0:0:0:0:0:0:1:0    28  0.79410096
 0:0:0:0:0:0:0:0:0:1:0:0     2  0.05672150
 0:0:0:0:0:0:0:0:1:0:0:0     2  0.05672150
 0:0:0:0:0:0:0:0:1:1:0:0     1  0.02836075
 0:0:0:0:0:0:0:1:0:0:0:0    47  1.33295519
 0:0:0:0:0:0:1:1:0:0:0:0     1  0.02836075
 0:0:0:0:0:0:1:1:1:1:1:0     1  0.02836075
 0:0:0:0:0:1:0:0:0:0:0:0     1  0.02836075
 0:0:0:1:0:0:0:0:0:0:0:0     6  0.17016449
 0:0:0:1:0:0:1:1:1:1:1:0     1  0.02836075
 0:0:0:1:0:1:0:0:0:0:0:0     1  0.02836075
 0:0:0:1:0:1:1:1:1:1:1:0     3  0.08508225
 0:0:1:0:0:0:0:0:0:0:0:0    30  0.85082246
 0:0:1:0:0:0:0:0:0:0:1:0     1  0.02836075
 0:0:1:0:0:0:0:0:1:0:0:0     1  0.02836075
 0:0:1:0:0:0:1:0:0:0:0:0     1  0.02836075
 0:0:1:0:0:1:1:0:1:1:1:0     1  0.02836075
 0:0:1:1:0:0:0:0:0:0:0:0     1  0.02836075
 0:0:1:1:0:0:1:1:1:0:0:0     1  0.02836075
 0:0:1:1:0:1:1:1:1:1:1:0    22  0.62393647
 0:0:1:1:1:1:1:1:1:1:1:0   665 18.85989790
Show code
naniar::mcar_test(t_data)
# A tibble: 1 × 4
  statistic    df  p.value missing.patterns
      <dbl> <dbl>    <dbl>            <int>
1      372.   185 1.01e-14               22
Show code
# Imputing
new_data <- mice::mice(t_data, m = 5, maxit = 50, method = "pmm", seed = 0410, printFlag = FALSE)
# summary(new_data)

Plotting the imputed vs original data

Show code
densityplot(new_data, ~highest_education)
Show code
ggmice::ggmice(
  new_data,
  ggplot2::aes(x = teaching_exp, y = reading_score)
) +
  ggplot2::geom_jitter(width = 0.25) +
  ggplot2::labs(x = "Teaching Experience")

Pooled Linear Model

Show code
pooled_lm <- with(new_data, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
## Pooled Statistics
summary(pool(pooled_lm))
                                                          term
1                                                  (Intercept)
2                                                   genderMale
3                                                          age
4                         teacher_typeReading-Language Teacher
5                                           full_timePart-Time
6                           highest_educationBachelor’s degree
7             highest_educationDoctoral or professional degree
8     highest_educationHigh school and/or some college courses
9                             highest_educationMaster’s degree
10                                                teaching_exp
11                        teacher_eduProgram of 1 Year or Less
12                  teacher_eduYes, Program Longer than 1 Year
13                                           initial_qualOther
14 initial_qualPre-service: Strandard Teacher Training Program
15      initial_qualTraining on Another Pedagogical Profession
16                     initial_qualWork-Based Teacher Training
17                                        prof_devParticipated
      estimate  std.error  statistic        df
1  491.0451642 19.6442728 24.9968614  20.78529
2    2.1475017  2.2357278  0.9605381  24.53414
3   -0.2872031  0.1238626 -2.3187235 249.00975
4    1.7009438  1.7738840  0.9588811 976.33670
5   35.2711991  5.7960439  6.0853920 573.00128
6    3.4827457 18.0067609  0.1934132  13.36907
7    6.1955769 19.0161891  0.3258054  13.56622
8   41.1263804 25.6156879  1.6055154  15.04531
9   14.2131938 17.7222506  0.8019971  14.11398
10   0.6097394  0.1476246  4.1303377 277.90985
11  10.1418112  4.1287233  2.4564037  54.99358
12   9.2704997  3.5876879  2.5839761 194.65525
13   7.8319378  7.1415849  1.0966666  25.10695
14   7.0007912  5.1494828  1.3595135  37.07149
15  -1.3017070  9.5126237 -0.1368400 173.30645
16   3.7636249  6.0943549  0.6175592  30.79034
17 -10.2256630  7.4555208 -1.3715558  32.82142
                    p.value
1  0.0000000000000000544673
2  0.3461560809803291283160
3  0.0212203213226792135326
4  0.3378560504549662502072
5  0.0000000021273780051914
6  0.8495429823132213398296
7  0.7495425607854081517800
8  0.1291609585854332775678
9  0.4358505832686117020813
10 0.0000479633897133009042
11 0.0172189944262561359101
12 0.0104991628257080065606
13 0.2831948580983822405521
14 0.1821964668061148984179
15 0.8913161358591843441701
16 0.5414067739100731913027
17 0.1795059863826796719977

Saving all the imputed data set as individual dataset

Show code
teacher_data_1 <- complete(new_data, 1)
teacher_data_2 <- complete(new_data, 2)
teacher_data_3 <- complete(new_data, 3)
teacher_data_4 <- complete(new_data, 4)
teacher_data_5 <- complete(new_data, 5)

Using Imputed 1

Show code
lm_one <- with(teacher_data_1, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_one)

Using Imputed 2

Show code
lm_two <- with(teacher_data_2, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
summary(lm_two)

Call:
lm(formula = reading_score ~ gender + age + teacher_type + full_time + 
    highest_education + teaching_exp + teacher_edu + initial_qual + 
    prof_dev)

Residuals:
     Min       1Q   Median       3Q      Max 
-248.124  -33.554   -1.097   32.854  117.293 

Coefficients:
                                                            Estimate
(Intercept)                                                 474.0430
genderMale                                                    2.7521
age                                                          -0.3036
teacher_typeReading-Language Teacher                          1.2004
full_timePart-Time                                           35.9434
highest_educationBachelor’s degree                           17.1941
highest_educationDoctoral or professional degree             21.1999
highest_educationHigh school and/or some college courses     59.3143
highest_educationMaster’s degree                             27.0218
teaching_exp                                                  0.6181
teacher_eduProgram of 1 Year or Less                          7.6686
teacher_eduYes, Program Longer than 1 Year                    8.5648
initial_qualOther                                             2.0063
initial_qualPre-service: Strandard Teacher Training Program   6.6670
initial_qualTraining on Another Pedagogical Profession       -1.8108
initial_qualWork-Based Teacher Training                       4.7709
prof_devParticipated                                         -4.2699
                                                            Std. Error
(Intercept)                                                    14.2008
genderMale                                                      1.7374
age                                                             0.1141
teacher_typeReading-Language Teacher                            1.7332
full_timePart-Time                                              5.4026
highest_educationBachelor’s degree                             11.2460
highest_educationDoctoral or professional degree               12.0063
highest_educationHigh school and/or some college courses       16.6828
highest_educationMaster’s degree                               11.2416
teaching_exp                                                    0.1377
teacher_eduProgram of 1 Year or Less                            3.4808
teacher_eduYes, Program Longer than 1 Year                      3.2469
initial_qualOther                                               5.5414
initial_qualPre-service: Strandard Teacher Training Program     4.2635
initial_qualTraining on Another Pedagogical Profession          9.3680
initial_qualWork-Based Teacher Training                         4.8794
prof_devParticipated                                            6.2167
                                                            t value
(Intercept)                                                  33.381
genderMale                                                    1.584
age                                                          -2.661
teacher_typeReading-Language Teacher                          0.693
full_timePart-Time                                            6.653
highest_educationBachelor’s degree                            1.529
highest_educationDoctoral or professional degree              1.766
highest_educationHigh school and/or some college courses      3.555
highest_educationMaster’s degree                              2.404
teaching_exp                                                  4.490
teacher_eduProgram of 1 Year or Less                          2.203
teacher_eduYes, Program Longer than 1 Year                    2.638
initial_qualOther                                             0.362
initial_qualPre-service: Strandard Teacher Training Program   1.564
initial_qualTraining on Another Pedagogical Profession       -0.193
initial_qualWork-Based Teacher Training                       0.978
prof_devParticipated                                         -0.687
                                                            Pr(>|t|)
(Intercept)                                                  < 2e-16
genderMale                                                  0.113268
age                                                         0.007833
teacher_typeReading-Language Teacher                        0.488614
full_timePart-Time                                          3.32e-11
highest_educationBachelor’s degree                          0.126376
highest_educationDoctoral or professional degree            0.077528
highest_educationHigh school and/or some college courses    0.000382
highest_educationMaster’s degree                            0.016280
teaching_exp                                                7.35e-06
teacher_eduProgram of 1 Year or Less                        0.027650
teacher_eduYes, Program Longer than 1 Year                  0.008381
initial_qualOther                                           0.717329
initial_qualPre-service: Strandard Teacher Training Program 0.117972
initial_qualTraining on Another Pedagogical Profession      0.846739
initial_qualWork-Based Teacher Training                     0.328251
prof_devParticipated                                        0.492230
                                                               
(Intercept)                                                 ***
genderMale                                                     
age                                                         ** 
teacher_typeReading-Language Teacher                           
full_timePart-Time                                          ***
highest_educationBachelor’s degree                             
highest_educationDoctoral or professional degree            .  
highest_educationHigh school and/or some college courses    ***
highest_educationMaster’s degree                            *  
teaching_exp                                                ***
teacher_eduProgram of 1 Year or Less                        *  
teacher_eduYes, Program Longer than 1 Year                  ** 
initial_qualOther                                              
initial_qualPre-service: Strandard Teacher Training Program    
initial_qualTraining on Another Pedagogical Profession         
initial_qualWork-Based Teacher Training                        
prof_devParticipated                                           
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.09 on 3509 degrees of freedom
Multiple R-squared:  0.04298,   Adjusted R-squared:  0.03861 
F-statistic: 9.849 on 16 and 3509 DF,  p-value: < 2.2e-16
Show code
# names(teacher_data_2)
str(teacher_data_2)
'data.frame':   3526 obs. of  12 variables:
 $ school_id        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_id       : int  1269 1293 135 1379 1400 1445 1631 1829 2114 2115 ...
 $ gender           : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 1 1 ...
 $ age              : int  34 55 45 45 33 60 49 50 56 44 ...
 $ teacher_type     : Factor w/ 2 levels "General Teacher",..: 1 2 1 2 1 2 1 1 1 2 ...
 $ full_time        : Factor w/ 2 levels "Full-Time","Part-Time": 1 1 1 1 1 1 1 1 1 1 ...
 $ highest_education: Factor w/ 5 levels "Associate’s degree",..: 5 5 5 2 5 5 5 5 2 5 ...
 $ teaching_exp     : int  14 24 16 16 11 32 20 21 29 23 ...
 $ teacher_edu      : Factor w/ 3 levels "None. No Teacher Education",..: 2 3 3 3 3 3 3 3 3 3 ...
 $ initial_qual     : Factor w/ 5 levels "In-service: Teacher Training Program",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ prof_dev         : Factor w/ 2 levels "Didn't Participate",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ reading_score    : num  510 510 510 510 510 ...
Show code
# summary(teacher_data_2)
# The decision is to use the second set of imputed data to run further analysis
# write.csv(teacher_data_2, file = "t_data.csv", row.names = FALSE)

Using Imputed 3

Show code
lm_three <- with(teacher_data_3, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_three)

Using Imputed 4

Show code
lm_four <- with(teacher_data_4, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_four)

Using Imputed 5

Show code
lm_five <- with(teacher_data_5, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_five)

Average Reading Scores Based on Student and Teacher Characteristics: Categorical Variables Having More than Two Groups Only

Teacher Characteristics

Teacher Highest Education

Teacher Education

Initial Qualification

Show code
teacher_data <- teacher_data |>
  mutate(init_qual = recode(initial_qual,
    "Other" = "Training on Other Pedagogical Profession or Other",
    "Training on Another Pedagogical Profession" = "Training on Other Pedagogical Profession or Other"
  ))
### Getting Rid of initial_qual variable
teacher_data$initial_qual <- NULL

# Reading Score
teacher_data |>
  dplyr::group_by(init_qual) |>
  summarise(
    mean_score = mean(reading_score),
    sd_score = sd(reading_score)
  )
# A tibble: 4 × 3
  init_qual                                        mean_score sd_score
  <fct>                                                 <dbl>    <dbl>
1 In-service: Teacher Training Program                   495.     52.3
2 Training on Other Pedagogical Profession or Oth…       494.     49.1
3 Pre-service: Strandard Teacher Training Program        507.     48.3
4 Work-Based Teacher Training                            501.     52.8

Regression for Teacher

Show code
t_model <- lm(reading_score ~ high_education + init_qual + gender + age + teacher_type + full_time + teaching_exp + teacher_edu + prof_dev, data = teacher_data)
summary(t_model)

Call:
lm(formula = reading_score ~ high_education + init_qual + gender + 
    age + teacher_type + full_time + teaching_exp + teacher_edu + 
    prof_dev, data = teacher_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-248.336  -33.626   -0.947   32.888  117.907 

Coefficients:
                                                           Estimate
(Intercept)                                                499.0622
high_educationBachelor’s degree                             -9.0625
high_educationDoctoral or professional degree               -5.2077
high_educationMaster’s degree                                0.7532
init_qualTraining on Other Pedagogical Profession or Other   1.8320
init_qualPre-service: Strandard Teacher Training Program     7.0112
init_qualWork-Based Teacher Training                         5.5072
genderMale                                                   2.8360
age                                                         -0.2966
teacher_typeReading-Language Teacher                         1.2924
full_timePart-Time                                          34.9945
teaching_exp                                                 0.6180
teacher_eduProgram of 1 Year or Less                         8.0099
teacher_eduYes, Program Longer than 1 Year                   8.8726
prof_devParticipated                                        -4.0144
                                                           Std. Error
(Intercept)                                                   12.3660
high_educationBachelor’s degree                                8.5086
high_educationDoctoral or professional degree                  9.4650
high_educationMaster’s degree                                  8.4988
init_qualTraining on Other Pedagogical Profession or Other     5.3568
init_qualPre-service: Strandard Teacher Training Program       4.2689
init_qualWork-Based Teacher Training                           4.8824
genderMale                                                     1.7399
age                                                            0.1142
teacher_typeReading-Language Teacher                           1.7349
full_timePart-Time                                             5.4043
teaching_exp                                                   0.1378
teacher_eduProgram of 1 Year or Less                           3.4828
teacher_eduYes, Program Longer than 1 Year                     3.2487
prof_devParticipated                                           6.2231
                                                           t value
(Intercept)                                                 40.358
high_educationBachelor’s degree                             -1.065
high_educationDoctoral or professional degree               -0.550
high_educationMaster’s degree                                0.089
init_qualTraining on Other Pedagogical Profession or Other   0.342
init_qualPre-service: Strandard Teacher Training Program     1.642
init_qualWork-Based Teacher Training                         1.128
genderMale                                                   1.630
age                                                         -2.596
teacher_typeReading-Language Teacher                         0.745
full_timePart-Time                                           6.475
teaching_exp                                                 4.483
teacher_eduProgram of 1 Year or Less                         2.300
teacher_eduYes, Program Longer than 1 Year                   2.731
prof_devParticipated                                        -0.645
                                                           Pr(>|t|)
(Intercept)                                                 < 2e-16
high_educationBachelor’s degree                             0.28690
high_educationDoctoral or professional degree               0.58221
high_educationMaster’s degree                               0.92939
init_qualTraining on Other Pedagogical Profession or Other  0.73238
init_qualPre-service: Strandard Teacher Training Program    0.10060
init_qualWork-Based Teacher Training                        0.25941
genderMale                                                  0.10319
age                                                         0.00946
teacher_typeReading-Language Teacher                        0.45635
full_timePart-Time                                         1.08e-10
teaching_exp                                               7.59e-06
teacher_eduProgram of 1 Year or Less                        0.02152
teacher_eduYes, Program Longer than 1 Year                  0.00634
prof_devParticipated                                        0.51891
                                                              
(Intercept)                                                ***
high_educationBachelor’s degree                               
high_educationDoctoral or professional degree                 
high_educationMaster’s degree                                 
init_qualTraining on Other Pedagogical Profession or Other    
init_qualPre-service: Strandard Teacher Training Program      
init_qualWork-Based Teacher Training                          
genderMale                                                    
age                                                        ** 
teacher_typeReading-Language Teacher                          
full_timePart-Time                                         ***
teaching_exp                                               ***
teacher_eduProgram of 1 Year or Less                       *  
teacher_eduYes, Program Longer than 1 Year                 ** 
prof_devParticipated                                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.17 on 3511 degrees of freedom
Multiple R-squared:  0.03951,   Adjusted R-squared:  0.03568 
F-statistic: 10.32 on 14 and 3511 DF,  p-value: < 2.2e-16
Show code
# QuantPsyc::lm.beta(t_model)

Student/School Characteristics

Show code
s_data <- ss_data_5 |>
  dplyr::mutate(student_race = recode(ETHNICITY,
    "Other" = "Two or More Race/Other",
    "Two or More Race" = "Two or More Race/Other"
  ))
### Getting Rid of ETHNICITY
s_data$ETHNICITY <- NULL

## Reading Scores
s_data |>
  dplyr::group_by(student_race) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 5 × 3
  student_race              mean_score sd_score
  <fct>                          <dbl>    <dbl>
1 Asian                           559.    100. 
2 Black or African American       447.     99.4
3 Hispanic or Latino              479.     97.5
4 Two or More Race/Other          495.     98.9
5 White, not Hispanic             526.    101. 

Mother Education(MISCED)

Show code
s_data <- s_data |>
  dplyr::mutate(mom_education = recode(MISCED,
    "ISCED 1" = "ISCED 1/None",
    "None" = "ISCED 1/None"
  ))
### Getting Rid of MISCED
s_data$MISCED <- NULL

## Reading Scores
s_data |>
  dplyr::group_by(mom_education) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 5 × 3
  mom_education     mean_score sd_score
  <fct>                  <dbl>    <dbl>
1 ISCED 1/None            469.     98.4
2 ISCED 2                 453.     89.3
3 ISCED 3A, ISCED 4       493.     99.6
4 ISCED 5A, 6             521.    109. 
5 ISCED 5B                490.     99.0

Father Education(FISCED)

Show code
s_data <- s_data |>
  dplyr::mutate(dad_education = recode(FISCED,
    "ISCED 1" = "ISCED 1/None",
    "None" = "ISCED 1/None"
  ))
### Getting Rid of MISCED
s_data$FISCED <- NULL

## Reading Scores
s_data |>
  dplyr::group_by(dad_education) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 5 × 3
  dad_education     mean_score sd_score
  <fct>                  <dbl>    <dbl>
1 ISCED 1/None            466.     95.3
2 ISCED 2                 463.     89.4
3 ISCED 3A, ISCED 4       491.     96.4
4 ISCED 5A, 6             529.    113. 
5 ISCED 5B                489.    102. 

School Location (SCH_LOCA)

Show code
s_data <- s_data |>
  dplyr::mutate(school_location = recode(SCH_LOCA,
    "rural" = "small_town/rural",
    "small_town" = "small_town/rural"
  ))
### Getting Rid of MISCED
s_data$SCH_LOCA <- NULL

## Reading Scores
s_data |>
  dplyr::group_by(school_location) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 4 × 3
  school_location  mean_score sd_score
  <fct>                 <dbl>    <dbl>
1 city                   504.     104.
2 large_city             496.     108.
3 small_town/rural       489.     103.
4 town                   509.     105.

Home Langauge

Show code
## Reading Scores
s_data |>
  dplyr::group_by(LANGN) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 3 × 3
  LANGN                  mean_score sd_score
  <fct>                       <dbl>    <dbl>
1 Another language (USA)       514.    109. 
2 English                      504.    105. 
3 Spanish                      464.     98.1

Home Internet

Show code
## Reading Scores
s_data |>
  dplyr::group_by(INTERNET) |>
  summarise(
    mean_score = mean(READ_SCR),
    sd_score = sd(READ_SCR)
  )
# A tibble: 3 × 3
  INTERNET                mean_score sd_score
  <fct>                        <dbl>    <dbl>
1 No                            442.    105. 
2 Yes, and I use it             507.    103. 
3 Yes, but I don’t use it       400.     90.7

Free or Reduced Price Lunch Status

Show code
## Reading Scores
s_data|>
  dplyr::group_by(FRPL)|>
  summarise(mean_score = mean(READ_SCR),
            sd_score = sd(READ_SCR))
# A tibble: 5 × 3
  FRPL                   mean_score sd_score
  <fct>                       <dbl>    <dbl>
1 <10% FRPL students           564.     92.2
2 >75% FRPL students           454.     99.1
3 10-24.9% FRPL students       545.    100. 
4 25-50% FRPL students         515.    100. 
5 50-74.9% FRPL students       492.    103. 
Show code
summary(s_data)
    GENDER                        LANGN               COB_MOM    
 Female:2376   Another language (USA): 220   Other country:1246  
 Male  :2462   English               :4096   United States:3592  
               Spanish               : 522                       
                                                                 
                                                                 
                                                                 
          COB_DAD     COM_HOM                       INTERNET   
 Other country:1293   No : 618   No                     : 207  
 United States:3545   Yes:4220   Yes, and I use it      :4477  
                                 Yes, but I don’t use it: 154  
                                                               
                                                               
                                                               
    ICTHOME          SCH_TYPE                        FRPL     
 Min.   : 1.000   Private: 203   <10% FRPL students    : 145  
 1st Qu.: 8.000   Public :4635   >75% FRPL students    :1046  
 Median :10.000                  10-24.9% FRPL students: 615  
 Mean   : 9.411                  25-50% FRPL students  :1655  
 3rd Qu.:11.000                  50-74.9% FRPL students:1377  
 Max.   :12.000                                               
    SCH_FRPL         SCH_ELS         SCH_DISA      EXT_CLAS  
 Min.   :  1.00   Min.   : 0.00   Min.   :  0.00   No :3471  
 1st Qu.: 29.00   1st Qu.: 4.00   1st Qu.: 10.00   Yes:1367  
 Median : 45.00   Median :10.00   Median : 13.00             
 Mean   : 47.01   Mean   :18.92   Mean   : 13.56             
 3rd Qu.: 65.00   3rd Qu.:23.50   3rd Qu.: 19.00             
 Max.   :100.00   Max.   :92.00   Max.   :100.00             
    READ_SCR                        student_race 
 Min.   :157.3   Asian                    : 285  
 1st Qu.:425.9   Black or African American: 785  
 Median :504.8   Hispanic or Latino       :1229  
 Mean   :500.6   Two or More Race/Other   : 419  
 3rd Qu.:578.4   White, not Hispanic      :2120  
 Max.   :810.5                                   
           mom_education            dad_education 
 ISCED 1/None     : 181   ISCED 1/None     : 221  
 ISCED 2          : 396   ISCED 2          : 470  
 ISCED 3A, ISCED 4:1470   ISCED 3A, ISCED 4:1860  
 ISCED 5A, 6      :2083   ISCED 5A, 6      :1731  
 ISCED 5B         : 708   ISCED 5B         : 556  
                                                  
         school_location
 city            :1278  
 large_city      : 740  
 small_town/rural:1241  
 town            :1579  
                        
                        

Regression SS

A. Combined Model

Show code
model_ss <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + ICTHOME + school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA + EXT_CLAS, data = s_data)
summary(model_ss)

Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + 
    COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + 
    ICTHOME + school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA + 
    EXT_CLAS, data = s_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-339.61  -62.66    3.26   67.45  279.68 

Coefficients:
                                       Estimate Std. Error t value
(Intercept)                           533.27388   17.02337  31.326
GENDERMale                            -25.77556    2.69965  -9.548
student_raceBlack or African American -77.95811    7.75320 -10.055
student_raceHispanic or Latino        -48.30198    7.58599  -6.367
student_raceTwo or More Race/Other    -44.28149    8.20029  -5.400
student_raceWhite, not Hispanic       -18.14613    7.36535  -2.464
LANGNEnglish                           34.55963    7.71833   4.478
LANGNSpanish                            8.08795    9.14157   0.885
COB_MOMUnited States                  -14.32337    5.01035  -2.859
COB_DADUnited States                   -5.91159    4.90204  -1.206
mom_educationISCED 2                  -20.66005    9.11039  -2.268
mom_educationISCED 3A, ISCED 4          2.30731    8.62465   0.268
mom_educationISCED 5A, 6                8.91699    8.82278   1.011
mom_educationISCED 5B                  -0.92099    9.08838  -0.101
dad_educationISCED 2                    2.08418    8.33776   0.250
dad_educationISCED 3A, ISCED 4          8.06241    7.86603   1.025
dad_educationISCED 5A, 6               22.57427    8.20059   2.753
dad_educationISCED 5B                   2.93681    8.70621   0.337
COM_HOMYes                             28.85008    4.53423   6.363
INTERNETYes, and I use it              39.55543    7.28902   5.427
INTERNETYes, but I don’t use it       -39.73948   10.42477  -3.812
ICTHOME                                -2.09122    0.72873  -2.870
school_locationlarge_city              -8.50075    4.41113  -1.927
school_locationsmall_town/rural       -19.15112    4.04880  -4.730
school_locationtown                    -8.16566    3.72009  -2.195
SCH_TYPEPublic                         -0.02858    6.88496  -0.004
FRPL>75% FRPL students                -64.92083    8.85290  -7.333
FRPL10-24.9% FRPL students            -15.09950    8.72813  -1.730
FRPL25-50% FRPL students              -33.90767    8.28079  -4.095
FRPL50-74.9% FRPL students            -46.29508    8.55092  -5.414
SCH_ELS                                 0.10570    0.07521   1.405
SCH_DISA                               -0.32412    0.19983  -1.622
EXT_CLASYes                            -4.49647    3.20358  -1.404
                                      Pr(>|t|)    
(Intercept)                            < 2e-16 ***
GENDERMale                             < 2e-16 ***
student_raceBlack or African American  < 2e-16 ***
student_raceHispanic or Latino        2.10e-10 ***
student_raceTwo or More Race/Other    6.99e-08 ***
student_raceWhite, not Hispanic        0.01379 *  
LANGNEnglish                          7.72e-06 ***
LANGNSpanish                           0.37634    
COB_MOMUnited States                   0.00427 ** 
COB_DADUnited States                   0.22790    
mom_educationISCED 2                   0.02339 *  
mom_educationISCED 3A, ISCED 4         0.78908    
mom_educationISCED 5A, 6               0.31222    
mom_educationISCED 5B                  0.91929    
dad_educationISCED 2                   0.80262    
dad_educationISCED 3A, ISCED 4         0.30543    
dad_educationISCED 5A, 6               0.00593 ** 
dad_educationISCED 5B                  0.73589    
COM_HOMYes                            2.17e-10 ***
INTERNETYes, and I use it             6.02e-08 ***
INTERNETYes, but I don’t use it        0.00014 ***
ICTHOME                                0.00413 ** 
school_locationlarge_city              0.05402 .  
school_locationsmall_town/rural       2.31e-06 ***
school_locationtown                    0.02821 *  
SCH_TYPEPublic                         0.99669    
FRPL>75% FRPL students                2.62e-13 ***
FRPL10-24.9% FRPL students             0.08370 .  
FRPL25-50% FRPL students              4.30e-05 ***
FRPL50-74.9% FRPL students            6.46e-08 ***
SCH_ELS                                0.15997    
SCH_DISA                               0.10487    
EXT_CLASYes                            0.16051    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 92.92 on 4805 degrees of freedom
Multiple R-squared:  0.2216,    Adjusted R-squared:  0.2165 
F-statistic: 42.76 on 32 and 4805 DF,  p-value: < 2.2e-16
Show code
# QuantPsyc::lm.beta(model_ss)

B. School Model

Show code
options(scipen = 999)
model_school <- lm(READ_SCR ~ school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA + EXT_CLAS, data = s_data)
summary(model_school)

Call:
lm(formula = READ_SCR ~ school_location + SCH_TYPE + FRPL + SCH_ELS + 
    SCH_DISA + EXT_CLAS, data = s_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-337.31  -68.77    4.17   71.79  292.06 

Coefficients:
                                  Estimate Std. Error t value
(Intercept)                      584.22857   11.34561  51.494
school_locationlarge_city        -13.39899    4.71422  -2.842
school_locationsmall_town/rural  -19.43819    4.14748  -4.687
school_locationtown               -5.66410    3.93780  -1.438
SCH_TYPEPublic                   -11.40530    7.36441  -1.549
FRPL>75% FRPL students          -110.42296    9.23061 -11.963
FRPL10-24.9% FRPL students       -17.14007    9.36533  -1.830
FRPL25-50% FRPL students         -47.78556    8.83150  -5.411
FRPL50-74.9% FRPL students       -71.91088    9.05253  -7.944
SCH_ELS                            0.11541    0.07514   1.536
SCH_DISA                          -0.09699    0.21333  -0.455
EXT_CLASYes                       -6.55133    3.42714  -1.912
                                            Pr(>|t|)    
(Intercept)                     < 0.0000000000000002 ***
school_locationlarge_city                     0.0045 ** 
school_locationsmall_town/rural  0.00000285225375143 ***
school_locationtown                           0.1504    
SCH_TYPEPublic                                0.1215    
FRPL>75% FRPL students          < 0.0000000000000002 ***
FRPL10-24.9% FRPL students                    0.0673 .  
FRPL25-50% FRPL students         0.00000006577429293 ***
FRPL50-74.9% FRPL students       0.00000000000000242 ***
SCH_ELS                                       0.1246    
SCH_DISA                                      0.6494    
EXT_CLASYes                                   0.0560 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 100.1 on 4826 degrees of freedom
Multiple R-squared:  0.09295,   Adjusted R-squared:  0.09089 
F-statistic: 44.96 on 11 and 4826 DF,  p-value: < 0.00000000000000022

C. Student and ICT Model

Show code
model_stu_ICT <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + ICTHOME, data = s_data)
summary(model_stu_ICT)

Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + 
    COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + 
    ICTHOME, data = s_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-343.42  -62.88    3.77   69.09  289.57 

Coefficients:
                                      Estimate Std. Error t value
(Intercept)                           474.7782    12.9391  36.693
GENDERMale                            -25.8235     2.7350  -9.442
student_raceBlack or African American -91.9820     7.6813 -11.975
student_raceHispanic or Latino        -51.5029     7.6349  -6.746
student_raceTwo or More Race/Other    -49.2520     8.2957  -5.937
student_raceWhite, not Hispanic       -21.4457     7.4088  -2.895
LANGNEnglish                           34.9356     7.8199   4.468
LANGNSpanish                            3.7848     9.2430   0.409
COB_MOMUnited States                  -17.3768     5.0566  -3.436
COB_DADUnited States                   -7.5449     4.9610  -1.521
mom_educationISCED 2                  -19.6681     9.2277  -2.131
mom_educationISCED 3A, ISCED 4          5.1157     8.7306   0.586
mom_educationISCED 5A, 6               14.9053     8.9208   1.671
mom_educationISCED 5B                   1.1089     9.2079   0.120
dad_educationISCED 2                    4.0194     8.4451   0.476
dad_educationISCED 3A, ISCED 4          9.9565     7.9626   1.250
dad_educationISCED 5A, 6               29.5824     8.2820   3.572
dad_educationISCED 5B                   6.3804     8.8201   0.723
COM_HOMYes                             35.4891     4.5604   7.782
INTERNETYes, and I use it              39.1149     7.3835   5.298
INTERNETYes, but I don’t use it       -44.5599    10.5627  -4.219
ICTHOME                                -1.9921     0.7384  -2.698
                                      Pr(>|t|)    
(Intercept)                            < 2e-16 ***
GENDERMale                             < 2e-16 ***
student_raceBlack or African American  < 2e-16 ***
student_raceHispanic or Latino        1.70e-11 ***
student_raceTwo or More Race/Other    3.11e-09 ***
student_raceWhite, not Hispanic       0.003813 ** 
LANGNEnglish                          8.09e-06 ***
LANGNSpanish                          0.682210    
COB_MOMUnited States                  0.000594 ***
COB_DADUnited States                  0.128362    
mom_educationISCED 2                  0.033106 *  
mom_educationISCED 3A, ISCED 4        0.557937    
mom_educationISCED 5A, 6              0.094817 .  
mom_educationISCED 5B                 0.904146    
dad_educationISCED 2                  0.634138    
dad_educationISCED 3A, ISCED 4        0.211212    
dad_educationISCED 5A, 6              0.000358 ***
dad_educationISCED 5B                 0.469472    
COM_HOMYes                            8.68e-15 ***
INTERNETYes, and I use it             1.23e-07 ***
INTERNETYes, but I don’t use it       2.50e-05 ***
ICTHOME                               0.006999 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 94.34 on 4816 degrees of freedom
Multiple R-squared:  0.1958,    Adjusted R-squared:  0.1923 
F-statistic: 55.85 on 21 and 4816 DF,  p-value: < 2.2e-16

D. ICT Model

Show code
ict_only_model <- lm(READ_SCR ~ COM_HOM + INTERNET + ICTHOME, data = s_data)
summary(ict_only_model)

Call:
lm(formula = READ_SCR ~ COM_HOM + INTERNET + ICTHOME, data = s_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-297.667  -70.736    3.899   74.143  298.670 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                     418.5152     8.0786  51.805  < 2e-16
COM_HOMYes                       58.4775     4.7622  12.280  < 2e-16
INTERNETYes, and I use it        43.1563     7.8793   5.477 4.54e-08
INTERNETYes, but I don’t use it -55.4009    11.2778  -4.912 9.29e-07
ICTHOME                          -0.7574     0.7721  -0.981    0.327
                                   
(Intercept)                     ***
COM_HOMYes                      ***
INTERNETYes, and I use it       ***
INTERNETYes, but I don’t use it ***
ICTHOME                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 100.9 on 4833 degrees of freedom
Multiple R-squared:  0.07664,   Adjusted R-squared:  0.07587 
F-statistic: 100.3 on 4 and 4833 DF,  p-value: < 2.2e-16

D. Student Only Model

Show code
student_only_model <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education, data = s_data)
summary(student_only_model)

Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + 
    COB_DAD + mom_education + dad_education, data = s_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-342.95  -64.25    5.48   70.35  308.64 

Coefficients:
                                      Estimate Std. Error t value
(Intercept)                            522.539     11.129  46.953
GENDERMale                             -28.085      2.785 -10.085
student_raceBlack or African American  -98.765      7.841 -12.596
student_raceHispanic or Latino         -54.536      7.804  -6.988
student_raceTwo or More Race/Other     -53.471      8.479  -6.306
student_raceWhite, not Hispanic        -21.580      7.576  -2.848
LANGNEnglish                            34.660      7.997   4.334
LANGNSpanish                             1.171      9.450   0.124
COB_MOMUnited States                   -20.894      5.158  -4.050
COB_DADUnited States                    -9.505      5.069  -1.875
mom_educationISCED 2                   -17.592      9.431  -1.865
mom_educationISCED 3A, ISCED 4          12.466      8.902   1.400
mom_educationISCED 5A, 6                24.202      9.077   2.666
mom_educationISCED 5B                    9.376      9.381   1.000
dad_educationISCED 2                    -1.423      8.628  -0.165
dad_educationISCED 3A, ISCED 4           9.822      8.138   1.207
dad_educationISCED 5A, 6                30.628      8.445   3.627
dad_educationISCED 5B                    4.852      9.005   0.539
                                      Pr(>|t|)    
(Intercept)                            < 2e-16 ***
GENDERMale                             < 2e-16 ***
student_raceBlack or African American  < 2e-16 ***
student_raceHispanic or Latino        3.17e-12 ***
student_raceTwo or More Race/Other    3.12e-10 ***
student_raceWhite, not Hispanic        0.00441 ** 
LANGNEnglish                          1.49e-05 ***
LANGNSpanish                           0.90141    
COB_MOMUnited States                  5.19e-05 ***
COB_DADUnited States                   0.06082 .  
mom_educationISCED 2                   0.06219 .  
mom_educationISCED 3A, ISCED 4         0.16145    
mom_educationISCED 5A, 6               0.00769 ** 
mom_educationISCED 5B                  0.31760    
dad_educationISCED 2                   0.86899    
dad_educationISCED 3A, ISCED 4         0.22748    
dad_educationISCED 5A, 6               0.00029 ***
dad_educationISCED 5B                  0.59007    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 96.49 on 4820 degrees of freedom
Multiple R-squared:  0.1581,    Adjusted R-squared:  0.1552 
F-statistic: 53.25 on 17 and 4820 DF,  p-value: < 2.2e-16