A. Loading Data and Getting them Ready for Analysis
1. School, Student Background, and Student Home ICT Data Together
Show code
# Loading Student and School Data
ss_data <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/SS_final_data.csv",
as.is = TRUE
)
# Changing character vectors to factors
ss_data <- ss_data %>%
mutate_if(is.character, as.factor) |>
dplyr::select(GENDER, ETHNICITY, LANGN, COB_MOM, COB_DAD, MISCED, FISCED, COM_HOM, INTERNET, ICTHOME, SCH_LOCA, SCH_TYPE, FRPL, SCH_FRPL, SCH_ELS, SCH_DISA, EXT_CLAS, READ_SCR)
# Getting rid of unnecessary variables
# names(ss_data)
str(ss_data)
'data.frame': 4838 obs. of 18 variables:
$ GENDER : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 1 1 2 ...
$ ETHNICITY: Factor w/ 6 levels "Asian","Black or African American",..: 5 6 6 6 6 6 5 6 3 6 ...
$ LANGN : Factor w/ 3 levels "Another language (USA)",..: 2 2 2 2 2 2 2 2 2 2 ...
$ COB_MOM : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 2 2 ...
$ COB_DAD : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 1 2 ...
$ MISCED : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 5 3 3 2 5 3 2 4 5 4 ...
$ FISCED : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 3 3 4 3 4 3 3 5 3 4 ...
$ COM_HOM : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
$ INTERNET : Factor w/ 3 levels "No","Yes, and I use it",..: 2 2 2 2 2 2 2 2 2 2 ...
$ ICTHOME : int 10 5 9 6 12 11 9 10 6 9 ...
$ SCH_LOCA : Factor w/ 5 levels "city","large_city",..: 5 5 5 5 5 5 5 5 5 5 ...
$ SCH_TYPE : Factor w/ 2 levels "Private","Public": 2 2 2 2 2 2 2 2 2 2 ...
$ FRPL : Factor w/ 5 levels "<10% FRPL students",..: 2 2 2 2 2 2 2 2 2 2 ...
$ SCH_FRPL : int 79 79 79 79 79 79 79 79 79 79 ...
$ SCH_ELS : int 5 5 5 5 5 5 5 5 5 5 ...
$ SCH_DISA : int 37 37 37 37 37 37 37 37 37 37 ...
$ EXT_CLAS : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
$ READ_SCR : num 544 432 504 438 536 ...
Show code
GENDER ETHNICITY
Female:2376 Asian : 284
Male :2462 Black or African American: 769
Hispanic or Latino :1221
Other : 47
Two or More Race : 367
White, not Hispanic :2097
NA's : 53
LANGN COB_MOM
Another language (USA): 219 Other country:1214
English :4054 United States:3524
Spanish : 517 NA's : 100
NA's : 48
COB_DAD MISCED
Other country:1254 ISCED 1 : 104
United States:3449 ISCED 2 : 387
NA's : 135 ISCED 3A, ISCED 4:1447
ISCED 5A, 6 :2045
ISCED 5B : 688
None : 74
NA's : 93
FISCED COM_HOM INTERNET
ISCED 1 : 118 No : 604 No : 182
ISCED 2 : 448 Yes :4170 Yes, and I use it :4261
ISCED 3A, ISCED 4:1784 NA's: 64 Yes, but I don’t use it: 139
ISCED 5A, 6 :1674 NA's : 256
ISCED 5B : 535
None : 93
NA's : 186
ICTHOME SCH_LOCA SCH_TYPE
Min. : 1.000 city :1161 Private: 176
1st Qu.: 8.000 large_city: 655 Public :4206
Median :10.000 rural : 302 NA's : 456
Mean : 9.413 small_town: 846
3rd Qu.:11.000 town :1453
Max. :12.000 NA's : 421
NA's :192
FRPL SCH_FRPL SCH_ELS
<10% FRPL students : 125 Min. : 1.00 Min. : 0.00
>75% FRPL students : 903 1st Qu.: 29.00 1st Qu.: 4.00
10-24.9% FRPL students: 505 Median : 45.00 Median :10.00
25-50% FRPL students :1416 Mean : 46.62 Mean :19.29
50-74.9% FRPL students:1143 3rd Qu.: 65.00 3rd Qu.:24.00
NA's : 746 Max. :100.00 Max. :92.00
NA's :538 NA's :942
SCH_DISA EXT_CLAS READ_SCR
Min. : 0.00 No :3025 Min. :157.3
1st Qu.: 10.00 Yes :1187 1st Qu.:425.9
Median : 13.00 NA's: 626 Median :504.8
Mean : 13.59 Mean :500.6
3rd Qu.: 19.00 3rd Qu.:578.4
Max. :100.00 Max. :810.5
NA's :553
Checking for Assumptions of Multivariate Normality and Outliers for SEM Study
Mardia’s Test of Multivariate Normality
Show code
ss_num <- dplyr::select(ss_data, SCH_ELS, SCH_FRPL, ICTHOME, READ_SCR, SCH_DISA)
library(QuantPsyc)
mult.norm(ss_num)$mult.test
Beta-hat kappa p-val
Skewness 4.538961 2830.042115 0.000000e+00
Kurtosis 37.072662 7.576058 3.552714e-14
Show code
[1] SCH_ELS SCH_FRPL ICTHOME READ_SCR SCH_DISA ss_maha maha_p
<0 rows> (or 0-length row.names)
Show code
[1] 0
Checking missing data pattern
Show code
VIM::aggr(ss_data, col = c("dodgerblue", "maroon"), numbers = TRUE, sortVars = TRUE, labels = names(ss_data), cex.axis = .7, gap = 3, ylab = c("Histogram of missing data", "Pattern"))

Variables sorted by number of missings:
Variable Count
SCH_ELS 0.194708557
FRPL 0.154195949
EXT_CLAS 0.129392311
SCH_DISA 0.114303431
SCH_FRPL 0.111202976
SCH_TYPE 0.094253824
SCH_LOCA 0.087019430
INTERNET 0.052914427
ICTHOME 0.039685821
FISCED 0.038445639
COB_DAD 0.027904093
COB_MOM 0.020669698
MISCED 0.019222819
COM_HOM 0.013228607
ETHNICITY 0.010954940
LANGN 0.009921455
GENDER 0.000000000
READ_SCR 0.000000000
Show code
# summary(checking_missing_data, plot=FALSE)
naniar::mcar_test(ss_data)
# A tibble: 1 × 4
statistic df p.value missing.patterns
<dbl> <dbl> <dbl> <int>
1 7307. 1561 0 118
Show code
# Imputing
new_ssdata <- mice::mice(ss_data[], m = 5, maxit = 50, method = "pmm", seed = 0411, printFlag = FALSE)
Plotting the imputed vs original data
Show code
Show code
ggmice::ggmice(
new_ssdata,
ggplot2::aes(x = SCH_FRPL, y = READ_SCR)
) +
ggplot2::geom_jitter(width = 0.25) +
ggplot2::labs(x = "School FRPL Percentage")

Pooled {lm} on New SS Data
Show code
# Running lm model on Pooled data
ss_pooled <- with(new_ssdata, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
summary(pool(ss_pooled))
term estimate std.error
1 (Intercept) 517.54175286 19.4968704
2 SCH_ELS 0.01737201 0.1052718
3 FRPL>75% FRPL students -46.39849259 19.1809653
4 FRPL10-24.9% FRPL students -8.96139826 10.2666687
5 FRPL25-50% FRPL students -25.86839079 10.7722291
6 FRPL50-74.9% FRPL students -32.49891922 13.7992211
7 EXT_CLASYes -3.84912894 3.5885261
8 SCH_DISA -0.37179565 0.2233463
9 SCH_FRPL -0.13717641 0.1952642
10 SCH_TYPEPublic 3.23893828 8.4101723
11 SCH_LOCAlarge_city -4.98429830 4.9124647
12 SCH_LOCArural -32.94104357 6.4002975
13 SCH_LOCAsmall_town -13.41288070 4.6306869
14 SCH_LOCAtown -7.16386438 3.9293962
15 INTERNETYes, and I use it 41.31088320 8.2614903
16 INTERNETYes, but I don’t use it -38.45477873 11.2191608
17 ICTHOME -2.60909168 0.8246360
18 FISCEDISCED 2 6.38531305 10.4530367
19 FISCEDISCED 3A, ISCED 4 12.34183379 10.0896338
20 FISCEDISCED 5A, 6 26.66722281 10.5152958
21 FISCEDISCED 5B 8.35050527 11.1840184
22 FISCEDNone 8.75753429 13.6444762
23 COB_DADUnited States -6.79950822 4.9662579
24 COB_MOMUnited States -15.57301062 5.1544082
25 MISCEDISCED 2 -13.89954941 11.0372505
26 MISCEDISCED 3A, ISCED 4 8.97243343 10.5666228
27 MISCEDISCED 5A, 6 15.61112110 10.7935044
28 MISCEDISCED 5B 6.50082429 11.1914206
29 MISCEDNone 7.51668266 14.8870925
30 COM_HOMYes 30.15765844 4.7544997
31 ETHNICITYBlack or African American -78.47958781 7.9074566
32 ETHNICITYHispanic or Latino -47.34367146 7.6148080
33 ETHNICITYOther -69.09352857 15.0653499
34 ETHNICITYTwo or More Race -40.96166776 8.6407991
35 ETHNICITYWhite, not Hispanic -17.01495577 7.6685743
36 LANGNEnglish 34.87258385 7.8448429
37 LANGNSpanish 9.04209931 9.3149949
38 GENDERMale -25.16645529 2.7340136
statistic df p.value
1 26.5448630 1271.08465 6.840684e-124
2 0.1650206 18.33173 8.707339e-01
3 -2.4189863 13.49010 3.035289e-02
4 -0.8728633 64.08763 3.859963e-01
5 -2.4013963 45.78419 2.045066e-02
6 -2.3551271 25.67038 2.644750e-02
7 -1.0726211 119.77129 2.855976e-01
8 -1.6646602 108.57361 9.886392e-02
9 -0.7025171 15.57549 4.927234e-01
10 0.3851215 184.08598 7.005920e-01
11 -1.0146227 126.20056 3.122270e-01
12 -5.1467988 769.82281 3.366554e-07
13 -2.8965208 317.46104 4.035257e-03
14 -1.8231464 383.76699 6.905905e-02
15 5.0004154 74.77914 3.668608e-06
16 -3.4275985 277.27873 7.013392e-04
17 -3.1639315 93.20280 2.102218e-03
18 0.6108572 389.48504 5.416504e-01
19 1.2232192 308.04933 2.221818e-01
20 2.5360411 225.78335 1.188723e-02
21 0.7466462 143.96211 4.564947e-01
22 0.6418373 486.47837 5.212811e-01
23 -1.3691412 2620.46898 1.710725e-01
24 -3.0212994 1095.06671 2.575359e-03
25 -1.2593308 1010.33267 2.082017e-01
26 0.8491297 1040.40930 3.960044e-01
27 1.4463441 844.93751 1.484516e-01
28 0.5808757 442.11776 5.616202e-01
29 0.5049127 956.17903 6.137365e-01
30 6.3429719 474.19808 5.263918e-10
31 -9.9247574 2397.39575 8.874992e-23
32 -6.2173165 4189.59123 5.551071e-10
33 -4.5862545 4202.97596 4.643989e-06
34 -4.7404953 1088.78288 2.414457e-06
35 -2.2187900 583.47482 2.688502e-02
36 4.4452877 2116.11973 9.231248e-06
37 0.9707036 1677.68645 3.318358e-01
38 -9.2049488 3110.79788 6.105696e-20
Breaking Imputed Data in Individual Sets
LM on Dataset 1
Show code
ss_1 <- with(ss_data_1, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_1)
LM on Dataset 2
Show code
ss_2 <- with(ss_data_2, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_2)
LM on Dataset 3
Show code
ss_3 <- with(ss_data_3, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_3)
LM on Dataset 4
Show code
ss_4 <- with(ss_data_4, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_4)
LM on Dataset 5
Show code
ss_5 <- with(ss_data_5, lm(READ_SCR ~ SCH_ELS + FRPL + EXT_CLAS + SCH_DISA + SCH_FRPL + SCH_TYPE + SCH_LOCA + INTERNET + ICTHOME + FISCED + COB_DAD + COB_MOM + MISCED + COM_HOM + ETHNICITY + LANGN + GENDER))
# summary(ss_5)
summary(ss_data_5)
GENDER ETHNICITY
Female:2376 Asian : 285
Male :2462 Black or African American: 785
Hispanic or Latino :1229
Other : 49
Two or More Race : 370
White, not Hispanic :2120
LANGN COB_MOM
Another language (USA): 220 Other country:1246
English :4096 United States:3592
Spanish : 522
COB_DAD MISCED
Other country:1293 ISCED 1 : 107
United States:3545 ISCED 2 : 396
ISCED 3A, ISCED 4:1470
ISCED 5A, 6 :2083
ISCED 5B : 708
None : 74
FISCED COM_HOM INTERNET
ISCED 1 : 123 No : 618 No : 207
ISCED 2 : 470 Yes:4220 Yes, and I use it :4477
ISCED 3A, ISCED 4:1860 Yes, but I don’t use it: 154
ISCED 5A, 6 :1731
ISCED 5B : 556
None : 98
ICTHOME SCH_LOCA SCH_TYPE
Min. : 1.000 city :1278 Private: 203
1st Qu.: 8.000 large_city: 740 Public :4635
Median :10.000 rural : 330
Mean : 9.411 small_town: 911
3rd Qu.:11.000 town :1579
Max. :12.000
FRPL SCH_FRPL SCH_ELS
<10% FRPL students : 145 Min. : 1.00 Min. : 0.00
>75% FRPL students :1046 1st Qu.: 29.00 1st Qu.: 4.00
10-24.9% FRPL students: 615 Median : 45.00 Median :10.00
25-50% FRPL students :1655 Mean : 47.01 Mean :18.92
50-74.9% FRPL students:1377 3rd Qu.: 65.00 3rd Qu.:23.50
Max. :100.00 Max. :92.00
SCH_DISA EXT_CLAS READ_SCR
Min. : 0.00 No :3471 Min. :157.3
1st Qu.: 10.00 Yes:1367 1st Qu.:425.9
Median : 13.00 Median :504.8
Mean : 13.56 Mean :500.6
3rd Qu.: 19.00 3rd Qu.:578.4
Max. :100.00 Max. :810.5
Show code
'data.frame': 4838 obs. of 18 variables:
$ GENDER : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 1 1 2 ...
$ ETHNICITY: Factor w/ 6 levels "Asian","Black or African American",..: 5 6 6 6 6 6 5 6 3 6 ...
$ LANGN : Factor w/ 3 levels "Another language (USA)",..: 2 2 2 2 2 2 2 2 2 2 ...
$ COB_MOM : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 2 2 ...
$ COB_DAD : Factor w/ 2 levels "Other country",..: 2 2 2 2 2 2 2 2 1 2 ...
$ MISCED : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 5 3 3 2 5 3 2 4 5 4 ...
$ FISCED : Factor w/ 6 levels "ISCED 1","ISCED 2",..: 3 3 4 3 4 3 3 5 3 4 ...
$ COM_HOM : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
$ INTERNET : Factor w/ 3 levels "No","Yes, and I use it",..: 2 2 2 2 2 2 2 2 2 2 ...
$ ICTHOME : int 10 5 9 6 12 11 9 10 6 9 ...
$ SCH_LOCA : Factor w/ 5 levels "city","large_city",..: 5 5 5 5 5 5 5 5 5 5 ...
$ SCH_TYPE : Factor w/ 2 levels "Private","Public": 2 2 2 2 2 2 2 2 2 2 ...
$ FRPL : Factor w/ 5 levels "<10% FRPL students",..: 2 2 2 2 2 2 2 2 2 2 ...
$ SCH_FRPL : int 79 79 79 79 79 79 79 79 79 79 ...
$ SCH_ELS : int 5 5 5 5 5 5 5 5 5 5 ...
$ SCH_DISA : int 37 37 37 37 37 37 37 37 37 37 ...
$ EXT_CLAS : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
$ READ_SCR : num 544 432 504 438 536 ...
Show code
write.csv(ss_data_5, file = "s_data.csv", row.names = FALSE)
# save(ss_data_5, file = "s_data.RData")
2. Teacher Data
Show code
teacher_final <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/t_final.csv",
as.is = TRUE
)
# names(teacher_final)
# Changing class of character variables
teacher_final <- data.frame(lapply(teacher_final, function(b) if (is.character(b)) as.factor(b) else b))
# Getting rid of unnecessary column
teacher_final <- teacher_final[, -c(1, 8, 12, 13)]
# summary(teacher_final)
# str(teacher_final)
# Teacher Education Data
teacher_education <- read.csv("C:/Users/nghimire/OneDrive - University of Texas at Tyler/Documents/edsurvey_PISA_USA/teacher_education.csv",
as.is = TRUE
)
options(scipen = 999)
# Getting rid of unnecessary column
teach_edu <- teacher_education[, -1]
# Changing the class of teacher_education
# teach_edu$teacher_education <- as.factor(teach_edu$teacher_education)
# Getting Rid of First Four Digits from 'teacher_id' and 'school_id'
teach_edu$school_id <- stringr::str_sub(teach_edu$CNTSCHID, -4, -1)
teach_edu$teacher_id <- stringr::str_sub(teach_edu$CNTTCHID, -4, -1)
# Getting Rid of Leading Zeros
teach_edu$school_id <- stringr::str_remove(teach_edu$school_id, "^0+")
teach_edu$teacher_id <- stringr::str_remove(teach_edu$teacher_id, "^0+")
# Changing the class
teach_edu <- teach_edu |>
dplyr::select(school_id, teacher_id, teacher_education) |>
mutate(
school_id = as.integer(school_id),
teacher_id = as.integer(teacher_id),
highest_education = as.factor(teacher_education)
)
# str(teach_edu)
# Merging the data sets
t_data <- merge(teach_edu, teacher_final, by = c("school_id", "teacher_id"))
t_data <- t_data |>
dplyr::select(school_id, teacher_id, gender, age, teacher_type, full_time, highest_education, teaching_exp, teacher_edu, initial_qual, prof_dev, reading_score)
str(t_data)
'data.frame': 3526 obs. of 12 variables:
$ school_id : int 1 1 1 1 1 1 1 1 1 1 ...
$ teacher_id : int 1269 1293 135 1379 1400 1445 1631 1829 2114 2115 ...
$ gender : Factor w/ 2 levels "Female","Male": NA 2 1 2 2 1 2 1 1 1 ...
$ age : int NA 55 45 45 33 60 49 50 56 44 ...
$ teacher_type : Factor w/ 2 levels "General Teacher",..: NA 2 1 2 1 2 1 1 1 2 ...
$ full_time : Factor w/ 2 levels "Full-Time","Part-Time": NA 1 1 1 1 1 1 1 1 1 ...
$ highest_education: Factor w/ 5 levels "Associate’s degree",..: NA 5 5 2 5 5 5 5 2 5 ...
$ teaching_exp : int NA 24 16 16 11 32 20 21 29 23 ...
$ teacher_edu : Factor w/ 3 levels "None. No Teacher Education",..: NA 3 3 3 3 3 3 3 3 3 ...
$ initial_qual : Factor w/ 5 levels "In-service: Teacher Training Program",..: NA 3 3 3 3 3 3 3 3 3 ...
$ prof_dev : Factor w/ 2 levels "Didn't Participate",..: NA 2 2 2 2 2 2 2 2 2 ...
$ reading_score : num 510 510 510 510 510 ...
Show code
school_id teacher_id gender age
Min. : 1.00 Min. : 1.0 Female:1784 Min. :20.00
1st Qu.: 47.00 1st Qu.: 942.2 Male :1019 1st Qu.:34.00
Median : 88.00 Median :1888.5 NA's : 723 Median :42.00
Mean : 88.29 Mean :1889.0 Mean :42.87
3rd Qu.:131.00 3rd Qu.:2837.8 3rd Qu.:51.00
Max. :175.00 Max. :3779.0 Max. :70.00
NA's :700
teacher_type full_time
General Teacher :1812 Full-Time:2773
Reading-Language Teacher:1049 Part-Time: 60
NA's : 665 NA's : 693
highest_education teaching_exp
Associate’s degree : 12 Min. : 0.00
Bachelor’s degree :1013 1st Qu.: 7.00
Doctoral or professional degree : 94 Median :14.00
High school and/or some college courses: 11 Mean :14.95
Master’s degree :1700 3rd Qu.:21.00
NA's : 696 Max. :50.00
NA's :741
teacher_edu
None. No Teacher Education : 229
Program of 1 Year or Less : 577
Yes, Program Longer than 1 Year:2022
NA's : 698
initial_qual
In-service: Teacher Training Program : 113
Other : 142
Pre-service: Strandard Teacher Training Program:2303
Training on Another Pedagogical Profession : 31
Work-Based Teacher Training : 241
NA's : 696
prof_dev reading_score
Didn't Participate: 51 Min. :260.3
Participated :2753 1st Qu.:468.2
NA's : 722 Median :502.7
Mean :505.1
3rd Qu.:537.1
Max. :608.0
Mardia Test for Teacher Data
Show code
teach_numeric <- dplyr::select(t_data, age, teaching_exp, reading_score)
mult.norm(teach_numeric)$mult.test
Beta-hat kappa p-val
Skewness 2.75807 1276.527 0
Kurtosis 17.33922 11.253 0
Show code
## Mahalanobis Distance
teach_numeric$mahal <- mahalanobis(teach_numeric, colMeans(teach_numeric), cov(teach_numeric))
teach_numeric$p <- pchisq(teach_numeric$mahal, df = 3, lower.tail = FALSE)
high_maha_teacher <- teach_numeric %>%
dplyr::filter(p <= .001) %>%
arrange(p)
high_maha_teacher
[1] age teaching_exp reading_score mahal
[5] p
<0 rows> (or 0-length row.names)
Show code
# rowsum(high_maha_teacher)
Imputing the data in teacher_education column
Show code
options(scipen = 999)
# Plotting the missing pattern
# plot_pattern(t_data)
# Or
checking_missing_data <- VIM::aggr(t_data, col = c("dodgerblue", "maroon"), numbers = TRUE, sortVars = TRUE, labels = names(t_data), cex.axis = .7, gap = 3, ylab = c("Histogram of missing data", "Pattern"))

Variables sorted by number of missings:
Variable Count
teaching_exp 0.2101531
gender 0.2050482
prof_dev 0.2047646
age 0.1985252
teacher_edu 0.1979580
highest_education 0.1973908
initial_qual 0.1973908
full_time 0.1965400
teacher_type 0.1885990
school_id 0.0000000
teacher_id 0.0000000
reading_score 0.0000000
Show code
summary(checking_missing_data, plot = FALSE)
Missings per variable:
Variable Count
school_id 0
teacher_id 0
gender 723
age 700
teacher_type 665
full_time 693
highest_education 696
teaching_exp 741
teacher_edu 698
initial_qual 696
prof_dev 722
reading_score 0
Missings in combinations of variables:
Combinations Count Percent
0:0:0:0:0:0:0:0:0:0:0:0 2709 76.82926829
0:0:0:0:0:0:0:0:0:0:1:0 28 0.79410096
0:0:0:0:0:0:0:0:0:1:0:0 2 0.05672150
0:0:0:0:0:0:0:0:1:0:0:0 2 0.05672150
0:0:0:0:0:0:0:0:1:1:0:0 1 0.02836075
0:0:0:0:0:0:0:1:0:0:0:0 47 1.33295519
0:0:0:0:0:0:1:1:0:0:0:0 1 0.02836075
0:0:0:0:0:0:1:1:1:1:1:0 1 0.02836075
0:0:0:0:0:1:0:0:0:0:0:0 1 0.02836075
0:0:0:1:0:0:0:0:0:0:0:0 6 0.17016449
0:0:0:1:0:0:1:1:1:1:1:0 1 0.02836075
0:0:0:1:0:1:0:0:0:0:0:0 1 0.02836075
0:0:0:1:0:1:1:1:1:1:1:0 3 0.08508225
0:0:1:0:0:0:0:0:0:0:0:0 30 0.85082246
0:0:1:0:0:0:0:0:0:0:1:0 1 0.02836075
0:0:1:0:0:0:0:0:1:0:0:0 1 0.02836075
0:0:1:0:0:0:1:0:0:0:0:0 1 0.02836075
0:0:1:0:0:1:1:0:1:1:1:0 1 0.02836075
0:0:1:1:0:0:0:0:0:0:0:0 1 0.02836075
0:0:1:1:0:0:1:1:1:0:0:0 1 0.02836075
0:0:1:1:0:1:1:1:1:1:1:0 22 0.62393647
0:0:1:1:1:1:1:1:1:1:1:0 665 18.85989790
Show code
# A tibble: 1 × 4
statistic df p.value missing.patterns
<dbl> <dbl> <dbl> <int>
1 372. 185 1.01e-14 22
Show code
# Imputing
new_data <- mice::mice(t_data, m = 5, maxit = 50, method = "pmm", seed = 0410, printFlag = FALSE)
# summary(new_data)
Plotting the imputed vs original data
Show code
Show code
ggmice::ggmice(
new_data,
ggplot2::aes(x = teaching_exp, y = reading_score)
) +
ggplot2::geom_jitter(width = 0.25) +
ggplot2::labs(x = "Teaching Experience")

Pooled Linear Model
Show code
pooled_lm <- with(new_data, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
## Pooled Statistics
summary(pool(pooled_lm))
term
1 (Intercept)
2 genderMale
3 age
4 teacher_typeReading-Language Teacher
5 full_timePart-Time
6 highest_educationBachelor’s degree
7 highest_educationDoctoral or professional degree
8 highest_educationHigh school and/or some college courses
9 highest_educationMaster’s degree
10 teaching_exp
11 teacher_eduProgram of 1 Year or Less
12 teacher_eduYes, Program Longer than 1 Year
13 initial_qualOther
14 initial_qualPre-service: Strandard Teacher Training Program
15 initial_qualTraining on Another Pedagogical Profession
16 initial_qualWork-Based Teacher Training
17 prof_devParticipated
estimate std.error statistic df
1 491.0451642 19.6442728 24.9968614 20.78529
2 2.1475017 2.2357278 0.9605381 24.53414
3 -0.2872031 0.1238626 -2.3187235 249.00975
4 1.7009438 1.7738840 0.9588811 976.33670
5 35.2711991 5.7960439 6.0853920 573.00128
6 3.4827457 18.0067609 0.1934132 13.36907
7 6.1955769 19.0161891 0.3258054 13.56622
8 41.1263804 25.6156879 1.6055154 15.04531
9 14.2131938 17.7222506 0.8019971 14.11398
10 0.6097394 0.1476246 4.1303377 277.90985
11 10.1418112 4.1287233 2.4564037 54.99358
12 9.2704997 3.5876879 2.5839761 194.65525
13 7.8319378 7.1415849 1.0966666 25.10695
14 7.0007912 5.1494828 1.3595135 37.07149
15 -1.3017070 9.5126237 -0.1368400 173.30645
16 3.7636249 6.0943549 0.6175592 30.79034
17 -10.2256630 7.4555208 -1.3715558 32.82142
p.value
1 0.0000000000000000544673
2 0.3461560809803291283160
3 0.0212203213226792135326
4 0.3378560504549662502072
5 0.0000000021273780051914
6 0.8495429823132213398296
7 0.7495425607854081517800
8 0.1291609585854332775678
9 0.4358505832686117020813
10 0.0000479633897133009042
11 0.0172189944262561359101
12 0.0104991628257080065606
13 0.2831948580983822405521
14 0.1821964668061148984179
15 0.8913161358591843441701
16 0.5414067739100731913027
17 0.1795059863826796719977
Saving all the imputed data set as individual dataset
Using Imputed 1
Show code
lm_one <- with(teacher_data_1, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_one)
Using Imputed 2
Show code
lm_two <- with(teacher_data_2, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
summary(lm_two)
Call:
lm(formula = reading_score ~ gender + age + teacher_type + full_time +
highest_education + teaching_exp + teacher_edu + initial_qual +
prof_dev)
Residuals:
Min 1Q Median 3Q Max
-248.124 -33.554 -1.097 32.854 117.293
Coefficients:
Estimate
(Intercept) 474.0430
genderMale 2.7521
age -0.3036
teacher_typeReading-Language Teacher 1.2004
full_timePart-Time 35.9434
highest_educationBachelor’s degree 17.1941
highest_educationDoctoral or professional degree 21.1999
highest_educationHigh school and/or some college courses 59.3143
highest_educationMaster’s degree 27.0218
teaching_exp 0.6181
teacher_eduProgram of 1 Year or Less 7.6686
teacher_eduYes, Program Longer than 1 Year 8.5648
initial_qualOther 2.0063
initial_qualPre-service: Strandard Teacher Training Program 6.6670
initial_qualTraining on Another Pedagogical Profession -1.8108
initial_qualWork-Based Teacher Training 4.7709
prof_devParticipated -4.2699
Std. Error
(Intercept) 14.2008
genderMale 1.7374
age 0.1141
teacher_typeReading-Language Teacher 1.7332
full_timePart-Time 5.4026
highest_educationBachelor’s degree 11.2460
highest_educationDoctoral or professional degree 12.0063
highest_educationHigh school and/or some college courses 16.6828
highest_educationMaster’s degree 11.2416
teaching_exp 0.1377
teacher_eduProgram of 1 Year or Less 3.4808
teacher_eduYes, Program Longer than 1 Year 3.2469
initial_qualOther 5.5414
initial_qualPre-service: Strandard Teacher Training Program 4.2635
initial_qualTraining on Another Pedagogical Profession 9.3680
initial_qualWork-Based Teacher Training 4.8794
prof_devParticipated 6.2167
t value
(Intercept) 33.381
genderMale 1.584
age -2.661
teacher_typeReading-Language Teacher 0.693
full_timePart-Time 6.653
highest_educationBachelor’s degree 1.529
highest_educationDoctoral or professional degree 1.766
highest_educationHigh school and/or some college courses 3.555
highest_educationMaster’s degree 2.404
teaching_exp 4.490
teacher_eduProgram of 1 Year or Less 2.203
teacher_eduYes, Program Longer than 1 Year 2.638
initial_qualOther 0.362
initial_qualPre-service: Strandard Teacher Training Program 1.564
initial_qualTraining on Another Pedagogical Profession -0.193
initial_qualWork-Based Teacher Training 0.978
prof_devParticipated -0.687
Pr(>|t|)
(Intercept) < 2e-16
genderMale 0.113268
age 0.007833
teacher_typeReading-Language Teacher 0.488614
full_timePart-Time 3.32e-11
highest_educationBachelor’s degree 0.126376
highest_educationDoctoral or professional degree 0.077528
highest_educationHigh school and/or some college courses 0.000382
highest_educationMaster’s degree 0.016280
teaching_exp 7.35e-06
teacher_eduProgram of 1 Year or Less 0.027650
teacher_eduYes, Program Longer than 1 Year 0.008381
initial_qualOther 0.717329
initial_qualPre-service: Strandard Teacher Training Program 0.117972
initial_qualTraining on Another Pedagogical Profession 0.846739
initial_qualWork-Based Teacher Training 0.328251
prof_devParticipated 0.492230
(Intercept) ***
genderMale
age **
teacher_typeReading-Language Teacher
full_timePart-Time ***
highest_educationBachelor’s degree
highest_educationDoctoral or professional degree .
highest_educationHigh school and/or some college courses ***
highest_educationMaster’s degree *
teaching_exp ***
teacher_eduProgram of 1 Year or Less *
teacher_eduYes, Program Longer than 1 Year **
initial_qualOther
initial_qualPre-service: Strandard Teacher Training Program
initial_qualTraining on Another Pedagogical Profession
initial_qualWork-Based Teacher Training
prof_devParticipated
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.09 on 3509 degrees of freedom
Multiple R-squared: 0.04298, Adjusted R-squared: 0.03861
F-statistic: 9.849 on 16 and 3509 DF, p-value: < 2.2e-16
Show code
# names(teacher_data_2)
str(teacher_data_2)
'data.frame': 3526 obs. of 12 variables:
$ school_id : int 1 1 1 1 1 1 1 1 1 1 ...
$ teacher_id : int 1269 1293 135 1379 1400 1445 1631 1829 2114 2115 ...
$ gender : Factor w/ 2 levels "Female","Male": 2 2 1 2 2 1 2 1 1 1 ...
$ age : int 34 55 45 45 33 60 49 50 56 44 ...
$ teacher_type : Factor w/ 2 levels "General Teacher",..: 1 2 1 2 1 2 1 1 1 2 ...
$ full_time : Factor w/ 2 levels "Full-Time","Part-Time": 1 1 1 1 1 1 1 1 1 1 ...
$ highest_education: Factor w/ 5 levels "Associate’s degree",..: 5 5 5 2 5 5 5 5 2 5 ...
$ teaching_exp : int 14 24 16 16 11 32 20 21 29 23 ...
$ teacher_edu : Factor w/ 3 levels "None. No Teacher Education",..: 2 3 3 3 3 3 3 3 3 3 ...
$ initial_qual : Factor w/ 5 levels "In-service: Teacher Training Program",..: 3 3 3 3 3 3 3 3 3 3 ...
$ prof_dev : Factor w/ 2 levels "Didn't Participate",..: 2 2 2 2 2 2 2 2 2 2 ...
$ reading_score : num 510 510 510 510 510 ...
Show code
# summary(teacher_data_2)
# The decision is to use the second set of imputed data to run further analysis
# write.csv(teacher_data_2, file = "t_data.csv", row.names = FALSE)
Using Imputed 3
Show code
lm_three <- with(teacher_data_3, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_three)
Using Imputed 4
Show code
lm_four <- with(teacher_data_4, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_four)
Using Imputed 5
Show code
lm_five <- with(teacher_data_5, lm(reading_score ~ gender + age + teacher_type + full_time + highest_education + teaching_exp + teacher_edu + initial_qual + prof_dev))
# summary(lm_five)
Average Reading Scores Based on Student and Teacher Characteristics: Categorical Variables Having More than Two Groups Only
Teacher Characteristics
Teacher Highest Education
Teacher Education
Initial Qualification
Show code
teacher_data <- teacher_data |>
mutate(init_qual = recode(initial_qual,
"Other" = "Training on Other Pedagogical Profession or Other",
"Training on Another Pedagogical Profession" = "Training on Other Pedagogical Profession or Other"
))
### Getting Rid of initial_qual variable
teacher_data$initial_qual <- NULL
# Reading Score
teacher_data |>
dplyr::group_by(init_qual) |>
summarise(
mean_score = mean(reading_score),
sd_score = sd(reading_score)
)
# A tibble: 4 × 3
init_qual mean_score sd_score
<fct> <dbl> <dbl>
1 In-service: Teacher Training Program 495. 52.3
2 Training on Other Pedagogical Profession or Oth… 494. 49.1
3 Pre-service: Strandard Teacher Training Program 507. 48.3
4 Work-Based Teacher Training 501. 52.8
Regression for Teacher
Show code
t_model <- lm(reading_score ~ high_education + init_qual + gender + age + teacher_type + full_time + teaching_exp + teacher_edu + prof_dev, data = teacher_data)
summary(t_model)
Call:
lm(formula = reading_score ~ high_education + init_qual + gender +
age + teacher_type + full_time + teaching_exp + teacher_edu +
prof_dev, data = teacher_data)
Residuals:
Min 1Q Median 3Q Max
-248.336 -33.626 -0.947 32.888 117.907
Coefficients:
Estimate
(Intercept) 499.0622
high_educationBachelor’s degree -9.0625
high_educationDoctoral or professional degree -5.2077
high_educationMaster’s degree 0.7532
init_qualTraining on Other Pedagogical Profession or Other 1.8320
init_qualPre-service: Strandard Teacher Training Program 7.0112
init_qualWork-Based Teacher Training 5.5072
genderMale 2.8360
age -0.2966
teacher_typeReading-Language Teacher 1.2924
full_timePart-Time 34.9945
teaching_exp 0.6180
teacher_eduProgram of 1 Year or Less 8.0099
teacher_eduYes, Program Longer than 1 Year 8.8726
prof_devParticipated -4.0144
Std. Error
(Intercept) 12.3660
high_educationBachelor’s degree 8.5086
high_educationDoctoral or professional degree 9.4650
high_educationMaster’s degree 8.4988
init_qualTraining on Other Pedagogical Profession or Other 5.3568
init_qualPre-service: Strandard Teacher Training Program 4.2689
init_qualWork-Based Teacher Training 4.8824
genderMale 1.7399
age 0.1142
teacher_typeReading-Language Teacher 1.7349
full_timePart-Time 5.4043
teaching_exp 0.1378
teacher_eduProgram of 1 Year or Less 3.4828
teacher_eduYes, Program Longer than 1 Year 3.2487
prof_devParticipated 6.2231
t value
(Intercept) 40.358
high_educationBachelor’s degree -1.065
high_educationDoctoral or professional degree -0.550
high_educationMaster’s degree 0.089
init_qualTraining on Other Pedagogical Profession or Other 0.342
init_qualPre-service: Strandard Teacher Training Program 1.642
init_qualWork-Based Teacher Training 1.128
genderMale 1.630
age -2.596
teacher_typeReading-Language Teacher 0.745
full_timePart-Time 6.475
teaching_exp 4.483
teacher_eduProgram of 1 Year or Less 2.300
teacher_eduYes, Program Longer than 1 Year 2.731
prof_devParticipated -0.645
Pr(>|t|)
(Intercept) < 2e-16
high_educationBachelor’s degree 0.28690
high_educationDoctoral or professional degree 0.58221
high_educationMaster’s degree 0.92939
init_qualTraining on Other Pedagogical Profession or Other 0.73238
init_qualPre-service: Strandard Teacher Training Program 0.10060
init_qualWork-Based Teacher Training 0.25941
genderMale 0.10319
age 0.00946
teacher_typeReading-Language Teacher 0.45635
full_timePart-Time 1.08e-10
teaching_exp 7.59e-06
teacher_eduProgram of 1 Year or Less 0.02152
teacher_eduYes, Program Longer than 1 Year 0.00634
prof_devParticipated 0.51891
(Intercept) ***
high_educationBachelor’s degree
high_educationDoctoral or professional degree
high_educationMaster’s degree
init_qualTraining on Other Pedagogical Profession or Other
init_qualPre-service: Strandard Teacher Training Program
init_qualWork-Based Teacher Training
genderMale
age **
teacher_typeReading-Language Teacher
full_timePart-Time ***
teaching_exp ***
teacher_eduProgram of 1 Year or Less *
teacher_eduYes, Program Longer than 1 Year **
prof_devParticipated
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48.17 on 3511 degrees of freedom
Multiple R-squared: 0.03951, Adjusted R-squared: 0.03568
F-statistic: 10.32 on 14 and 3511 DF, p-value: < 2.2e-16
Show code
# QuantPsyc::lm.beta(t_model)
Student/School Characteristics
Show code
s_data <- ss_data_5 |>
dplyr::mutate(student_race = recode(ETHNICITY,
"Other" = "Two or More Race/Other",
"Two or More Race" = "Two or More Race/Other"
))
### Getting Rid of ETHNICITY
s_data$ETHNICITY <- NULL
## Reading Scores
s_data |>
dplyr::group_by(student_race) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 5 × 3
student_race mean_score sd_score
<fct> <dbl> <dbl>
1 Asian 559. 100.
2 Black or African American 447. 99.4
3 Hispanic or Latino 479. 97.5
4 Two or More Race/Other 495. 98.9
5 White, not Hispanic 526. 101.
Mother Education(MISCED)
Show code
s_data <- s_data |>
dplyr::mutate(mom_education = recode(MISCED,
"ISCED 1" = "ISCED 1/None",
"None" = "ISCED 1/None"
))
### Getting Rid of MISCED
s_data$MISCED <- NULL
## Reading Scores
s_data |>
dplyr::group_by(mom_education) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 5 × 3
mom_education mean_score sd_score
<fct> <dbl> <dbl>
1 ISCED 1/None 469. 98.4
2 ISCED 2 453. 89.3
3 ISCED 3A, ISCED 4 493. 99.6
4 ISCED 5A, 6 521. 109.
5 ISCED 5B 490. 99.0
Father Education(FISCED)
Show code
s_data <- s_data |>
dplyr::mutate(dad_education = recode(FISCED,
"ISCED 1" = "ISCED 1/None",
"None" = "ISCED 1/None"
))
### Getting Rid of MISCED
s_data$FISCED <- NULL
## Reading Scores
s_data |>
dplyr::group_by(dad_education) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 5 × 3
dad_education mean_score sd_score
<fct> <dbl> <dbl>
1 ISCED 1/None 466. 95.3
2 ISCED 2 463. 89.4
3 ISCED 3A, ISCED 4 491. 96.4
4 ISCED 5A, 6 529. 113.
5 ISCED 5B 489. 102.
School Location (SCH_LOCA)
Show code
s_data <- s_data |>
dplyr::mutate(school_location = recode(SCH_LOCA,
"rural" = "small_town/rural",
"small_town" = "small_town/rural"
))
### Getting Rid of MISCED
s_data$SCH_LOCA <- NULL
## Reading Scores
s_data |>
dplyr::group_by(school_location) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 4 × 3
school_location mean_score sd_score
<fct> <dbl> <dbl>
1 city 504. 104.
2 large_city 496. 108.
3 small_town/rural 489. 103.
4 town 509. 105.
Home Langauge
Show code
## Reading Scores
s_data |>
dplyr::group_by(LANGN) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 3 × 3
LANGN mean_score sd_score
<fct> <dbl> <dbl>
1 Another language (USA) 514. 109.
2 English 504. 105.
3 Spanish 464. 98.1
Home Internet
Show code
## Reading Scores
s_data |>
dplyr::group_by(INTERNET) |>
summarise(
mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR)
)
# A tibble: 3 × 3
INTERNET mean_score sd_score
<fct> <dbl> <dbl>
1 No 442. 105.
2 Yes, and I use it 507. 103.
3 Yes, but I don’t use it 400. 90.7
Free or Reduced Price Lunch Status
Show code
## Reading Scores
s_data|>
dplyr::group_by(FRPL)|>
summarise(mean_score = mean(READ_SCR),
sd_score = sd(READ_SCR))
# A tibble: 5 × 3
FRPL mean_score sd_score
<fct> <dbl> <dbl>
1 <10% FRPL students 564. 92.2
2 >75% FRPL students 454. 99.1
3 10-24.9% FRPL students 545. 100.
4 25-50% FRPL students 515. 100.
5 50-74.9% FRPL students 492. 103.
Show code
GENDER LANGN COB_MOM
Female:2376 Another language (USA): 220 Other country:1246
Male :2462 English :4096 United States:3592
Spanish : 522
COB_DAD COM_HOM INTERNET
Other country:1293 No : 618 No : 207
United States:3545 Yes:4220 Yes, and I use it :4477
Yes, but I don’t use it: 154
ICTHOME SCH_TYPE FRPL
Min. : 1.000 Private: 203 <10% FRPL students : 145
1st Qu.: 8.000 Public :4635 >75% FRPL students :1046
Median :10.000 10-24.9% FRPL students: 615
Mean : 9.411 25-50% FRPL students :1655
3rd Qu.:11.000 50-74.9% FRPL students:1377
Max. :12.000
SCH_FRPL SCH_ELS SCH_DISA EXT_CLAS
Min. : 1.00 Min. : 0.00 Min. : 0.00 No :3471
1st Qu.: 29.00 1st Qu.: 4.00 1st Qu.: 10.00 Yes:1367
Median : 45.00 Median :10.00 Median : 13.00
Mean : 47.01 Mean :18.92 Mean : 13.56
3rd Qu.: 65.00 3rd Qu.:23.50 3rd Qu.: 19.00
Max. :100.00 Max. :92.00 Max. :100.00
READ_SCR student_race
Min. :157.3 Asian : 285
1st Qu.:425.9 Black or African American: 785
Median :504.8 Hispanic or Latino :1229
Mean :500.6 Two or More Race/Other : 419
3rd Qu.:578.4 White, not Hispanic :2120
Max. :810.5
mom_education dad_education
ISCED 1/None : 181 ISCED 1/None : 221
ISCED 2 : 396 ISCED 2 : 470
ISCED 3A, ISCED 4:1470 ISCED 3A, ISCED 4:1860
ISCED 5A, 6 :2083 ISCED 5A, 6 :1731
ISCED 5B : 708 ISCED 5B : 556
school_location
city :1278
large_city : 740
small_town/rural:1241
town :1579
Regression SS
A. Combined Model
Show code
model_ss <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + ICTHOME + school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA + EXT_CLAS, data = s_data)
summary(model_ss)
Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM +
COB_DAD + mom_education + dad_education + COM_HOM + INTERNET +
ICTHOME + school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA +
EXT_CLAS, data = s_data)
Residuals:
Min 1Q Median 3Q Max
-339.61 -62.66 3.26 67.45 279.68
Coefficients:
Estimate Std. Error t value
(Intercept) 533.27388 17.02337 31.326
GENDERMale -25.77556 2.69965 -9.548
student_raceBlack or African American -77.95811 7.75320 -10.055
student_raceHispanic or Latino -48.30198 7.58599 -6.367
student_raceTwo or More Race/Other -44.28149 8.20029 -5.400
student_raceWhite, not Hispanic -18.14613 7.36535 -2.464
LANGNEnglish 34.55963 7.71833 4.478
LANGNSpanish 8.08795 9.14157 0.885
COB_MOMUnited States -14.32337 5.01035 -2.859
COB_DADUnited States -5.91159 4.90204 -1.206
mom_educationISCED 2 -20.66005 9.11039 -2.268
mom_educationISCED 3A, ISCED 4 2.30731 8.62465 0.268
mom_educationISCED 5A, 6 8.91699 8.82278 1.011
mom_educationISCED 5B -0.92099 9.08838 -0.101
dad_educationISCED 2 2.08418 8.33776 0.250
dad_educationISCED 3A, ISCED 4 8.06241 7.86603 1.025
dad_educationISCED 5A, 6 22.57427 8.20059 2.753
dad_educationISCED 5B 2.93681 8.70621 0.337
COM_HOMYes 28.85008 4.53423 6.363
INTERNETYes, and I use it 39.55543 7.28902 5.427
INTERNETYes, but I don’t use it -39.73948 10.42477 -3.812
ICTHOME -2.09122 0.72873 -2.870
school_locationlarge_city -8.50075 4.41113 -1.927
school_locationsmall_town/rural -19.15112 4.04880 -4.730
school_locationtown -8.16566 3.72009 -2.195
SCH_TYPEPublic -0.02858 6.88496 -0.004
FRPL>75% FRPL students -64.92083 8.85290 -7.333
FRPL10-24.9% FRPL students -15.09950 8.72813 -1.730
FRPL25-50% FRPL students -33.90767 8.28079 -4.095
FRPL50-74.9% FRPL students -46.29508 8.55092 -5.414
SCH_ELS 0.10570 0.07521 1.405
SCH_DISA -0.32412 0.19983 -1.622
EXT_CLASYes -4.49647 3.20358 -1.404
Pr(>|t|)
(Intercept) < 2e-16 ***
GENDERMale < 2e-16 ***
student_raceBlack or African American < 2e-16 ***
student_raceHispanic or Latino 2.10e-10 ***
student_raceTwo or More Race/Other 6.99e-08 ***
student_raceWhite, not Hispanic 0.01379 *
LANGNEnglish 7.72e-06 ***
LANGNSpanish 0.37634
COB_MOMUnited States 0.00427 **
COB_DADUnited States 0.22790
mom_educationISCED 2 0.02339 *
mom_educationISCED 3A, ISCED 4 0.78908
mom_educationISCED 5A, 6 0.31222
mom_educationISCED 5B 0.91929
dad_educationISCED 2 0.80262
dad_educationISCED 3A, ISCED 4 0.30543
dad_educationISCED 5A, 6 0.00593 **
dad_educationISCED 5B 0.73589
COM_HOMYes 2.17e-10 ***
INTERNETYes, and I use it 6.02e-08 ***
INTERNETYes, but I don’t use it 0.00014 ***
ICTHOME 0.00413 **
school_locationlarge_city 0.05402 .
school_locationsmall_town/rural 2.31e-06 ***
school_locationtown 0.02821 *
SCH_TYPEPublic 0.99669
FRPL>75% FRPL students 2.62e-13 ***
FRPL10-24.9% FRPL students 0.08370 .
FRPL25-50% FRPL students 4.30e-05 ***
FRPL50-74.9% FRPL students 6.46e-08 ***
SCH_ELS 0.15997
SCH_DISA 0.10487
EXT_CLASYes 0.16051
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 92.92 on 4805 degrees of freedom
Multiple R-squared: 0.2216, Adjusted R-squared: 0.2165
F-statistic: 42.76 on 32 and 4805 DF, p-value: < 2.2e-16
Show code
# QuantPsyc::lm.beta(model_ss)
B. School Model
Show code
options(scipen = 999)
model_school <- lm(READ_SCR ~ school_location + SCH_TYPE + FRPL + SCH_ELS + SCH_DISA + EXT_CLAS, data = s_data)
summary(model_school)
Call:
lm(formula = READ_SCR ~ school_location + SCH_TYPE + FRPL + SCH_ELS +
SCH_DISA + EXT_CLAS, data = s_data)
Residuals:
Min 1Q Median 3Q Max
-337.31 -68.77 4.17 71.79 292.06
Coefficients:
Estimate Std. Error t value
(Intercept) 584.22857 11.34561 51.494
school_locationlarge_city -13.39899 4.71422 -2.842
school_locationsmall_town/rural -19.43819 4.14748 -4.687
school_locationtown -5.66410 3.93780 -1.438
SCH_TYPEPublic -11.40530 7.36441 -1.549
FRPL>75% FRPL students -110.42296 9.23061 -11.963
FRPL10-24.9% FRPL students -17.14007 9.36533 -1.830
FRPL25-50% FRPL students -47.78556 8.83150 -5.411
FRPL50-74.9% FRPL students -71.91088 9.05253 -7.944
SCH_ELS 0.11541 0.07514 1.536
SCH_DISA -0.09699 0.21333 -0.455
EXT_CLASYes -6.55133 3.42714 -1.912
Pr(>|t|)
(Intercept) < 0.0000000000000002 ***
school_locationlarge_city 0.0045 **
school_locationsmall_town/rural 0.00000285225375143 ***
school_locationtown 0.1504
SCH_TYPEPublic 0.1215
FRPL>75% FRPL students < 0.0000000000000002 ***
FRPL10-24.9% FRPL students 0.0673 .
FRPL25-50% FRPL students 0.00000006577429293 ***
FRPL50-74.9% FRPL students 0.00000000000000242 ***
SCH_ELS 0.1246
SCH_DISA 0.6494
EXT_CLASYes 0.0560 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 100.1 on 4826 degrees of freedom
Multiple R-squared: 0.09295, Adjusted R-squared: 0.09089
F-statistic: 44.96 on 11 and 4826 DF, p-value: < 0.00000000000000022
C. Student and ICT Model
Show code
model_stu_ICT <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education + COM_HOM + INTERNET + ICTHOME, data = s_data)
summary(model_stu_ICT)
Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM +
COB_DAD + mom_education + dad_education + COM_HOM + INTERNET +
ICTHOME, data = s_data)
Residuals:
Min 1Q Median 3Q Max
-343.42 -62.88 3.77 69.09 289.57
Coefficients:
Estimate Std. Error t value
(Intercept) 474.7782 12.9391 36.693
GENDERMale -25.8235 2.7350 -9.442
student_raceBlack or African American -91.9820 7.6813 -11.975
student_raceHispanic or Latino -51.5029 7.6349 -6.746
student_raceTwo or More Race/Other -49.2520 8.2957 -5.937
student_raceWhite, not Hispanic -21.4457 7.4088 -2.895
LANGNEnglish 34.9356 7.8199 4.468
LANGNSpanish 3.7848 9.2430 0.409
COB_MOMUnited States -17.3768 5.0566 -3.436
COB_DADUnited States -7.5449 4.9610 -1.521
mom_educationISCED 2 -19.6681 9.2277 -2.131
mom_educationISCED 3A, ISCED 4 5.1157 8.7306 0.586
mom_educationISCED 5A, 6 14.9053 8.9208 1.671
mom_educationISCED 5B 1.1089 9.2079 0.120
dad_educationISCED 2 4.0194 8.4451 0.476
dad_educationISCED 3A, ISCED 4 9.9565 7.9626 1.250
dad_educationISCED 5A, 6 29.5824 8.2820 3.572
dad_educationISCED 5B 6.3804 8.8201 0.723
COM_HOMYes 35.4891 4.5604 7.782
INTERNETYes, and I use it 39.1149 7.3835 5.298
INTERNETYes, but I don’t use it -44.5599 10.5627 -4.219
ICTHOME -1.9921 0.7384 -2.698
Pr(>|t|)
(Intercept) < 2e-16 ***
GENDERMale < 2e-16 ***
student_raceBlack or African American < 2e-16 ***
student_raceHispanic or Latino 1.70e-11 ***
student_raceTwo or More Race/Other 3.11e-09 ***
student_raceWhite, not Hispanic 0.003813 **
LANGNEnglish 8.09e-06 ***
LANGNSpanish 0.682210
COB_MOMUnited States 0.000594 ***
COB_DADUnited States 0.128362
mom_educationISCED 2 0.033106 *
mom_educationISCED 3A, ISCED 4 0.557937
mom_educationISCED 5A, 6 0.094817 .
mom_educationISCED 5B 0.904146
dad_educationISCED 2 0.634138
dad_educationISCED 3A, ISCED 4 0.211212
dad_educationISCED 5A, 6 0.000358 ***
dad_educationISCED 5B 0.469472
COM_HOMYes 8.68e-15 ***
INTERNETYes, and I use it 1.23e-07 ***
INTERNETYes, but I don’t use it 2.50e-05 ***
ICTHOME 0.006999 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 94.34 on 4816 degrees of freedom
Multiple R-squared: 0.1958, Adjusted R-squared: 0.1923
F-statistic: 55.85 on 21 and 4816 DF, p-value: < 2.2e-16
D. ICT Model
Show code
ict_only_model <- lm(READ_SCR ~ COM_HOM + INTERNET + ICTHOME, data = s_data)
summary(ict_only_model)
Call:
lm(formula = READ_SCR ~ COM_HOM + INTERNET + ICTHOME, data = s_data)
Residuals:
Min 1Q Median 3Q Max
-297.667 -70.736 3.899 74.143 298.670
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 418.5152 8.0786 51.805 < 2e-16
COM_HOMYes 58.4775 4.7622 12.280 < 2e-16
INTERNETYes, and I use it 43.1563 7.8793 5.477 4.54e-08
INTERNETYes, but I don’t use it -55.4009 11.2778 -4.912 9.29e-07
ICTHOME -0.7574 0.7721 -0.981 0.327
(Intercept) ***
COM_HOMYes ***
INTERNETYes, and I use it ***
INTERNETYes, but I don’t use it ***
ICTHOME
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 100.9 on 4833 degrees of freedom
Multiple R-squared: 0.07664, Adjusted R-squared: 0.07587
F-statistic: 100.3 on 4 and 4833 DF, p-value: < 2.2e-16
D. Student Only Model
Show code
student_only_model <- lm(READ_SCR ~ GENDER + student_race + LANGN + COB_MOM + COB_DAD + mom_education + dad_education, data = s_data)
summary(student_only_model)
Call:
lm(formula = READ_SCR ~ GENDER + student_race + LANGN + COB_MOM +
COB_DAD + mom_education + dad_education, data = s_data)
Residuals:
Min 1Q Median 3Q Max
-342.95 -64.25 5.48 70.35 308.64
Coefficients:
Estimate Std. Error t value
(Intercept) 522.539 11.129 46.953
GENDERMale -28.085 2.785 -10.085
student_raceBlack or African American -98.765 7.841 -12.596
student_raceHispanic or Latino -54.536 7.804 -6.988
student_raceTwo or More Race/Other -53.471 8.479 -6.306
student_raceWhite, not Hispanic -21.580 7.576 -2.848
LANGNEnglish 34.660 7.997 4.334
LANGNSpanish 1.171 9.450 0.124
COB_MOMUnited States -20.894 5.158 -4.050
COB_DADUnited States -9.505 5.069 -1.875
mom_educationISCED 2 -17.592 9.431 -1.865
mom_educationISCED 3A, ISCED 4 12.466 8.902 1.400
mom_educationISCED 5A, 6 24.202 9.077 2.666
mom_educationISCED 5B 9.376 9.381 1.000
dad_educationISCED 2 -1.423 8.628 -0.165
dad_educationISCED 3A, ISCED 4 9.822 8.138 1.207
dad_educationISCED 5A, 6 30.628 8.445 3.627
dad_educationISCED 5B 4.852 9.005 0.539
Pr(>|t|)
(Intercept) < 2e-16 ***
GENDERMale < 2e-16 ***
student_raceBlack or African American < 2e-16 ***
student_raceHispanic or Latino 3.17e-12 ***
student_raceTwo or More Race/Other 3.12e-10 ***
student_raceWhite, not Hispanic 0.00441 **
LANGNEnglish 1.49e-05 ***
LANGNSpanish 0.90141
COB_MOMUnited States 5.19e-05 ***
COB_DADUnited States 0.06082 .
mom_educationISCED 2 0.06219 .
mom_educationISCED 3A, ISCED 4 0.16145
mom_educationISCED 5A, 6 0.00769 **
mom_educationISCED 5B 0.31760
dad_educationISCED 2 0.86899
dad_educationISCED 3A, ISCED 4 0.22748
dad_educationISCED 5A, 6 0.00029 ***
dad_educationISCED 5B 0.59007
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 96.49 on 4820 degrees of freedom
Multiple R-squared: 0.1581, Adjusted R-squared: 0.1552
F-statistic: 53.25 on 17 and 4820 DF, p-value: < 2.2e-16