Introduction

Education in adolescence is an important resource that can equip students with the skills they need to succeed in adult life. Proficiency in language and mathematics is valuable for both individual and societal success. A mixture of individual and organizational factors is associated with student success, and we intend to examine the influence of both types of factors on student test scores.

Description of the Data

The Raw Data

The original dataset is a collection of data from Snijders and Bosker (2012), adapted from the raw data of a 1989 study by H. P. Brandsma and W. M. Knuver. It contains information on 4106 pupils at 216 schools and is available in the R mice package (1). The 14 variables of the adapted dataset are listed below. They include demographic information on the students and schools as well as the students' pre- and post-test scores for language and mathematics. According to the information on the original study (2), a random sample of 250 Dutch primary schools was selected, and all seventh-grade students within those schools were tested on their proficiency in Dutch language and mathematics before and after an interval of one year. Information was also gathered on student backgrounds and school-related factors, with the intention of measuring the effects of school and classroom characteristics on the students' progress in these subjects.

sch - School number (numeric)

pup - Pupil ID (numeric)

iqv - IQ verbal (numeric)

iqp - IQ performal (numeric)

sex - Sex of pupil (categorical)

ses - SES score of pupil (numeric)

min - Minority member 0/1 (categorical)

rpg - Number of repeated groups, 0, 1, 2 (categorical)

lpr - language score PRE (numeric)

lpo - language score POST (numeric)

apr - Arithmetic score PRE (numeric)

apo - Arithmetic score POST (numeric)

den - Denomination classification 1-4 - at school level (categorical)

ssi - School SES indicator - at school level (numeric)

The Analytic Dataset

Over the course of the past two analyses, we examined this dataset through an initial exploratory analysis of the individual features and their relationships (3), followed by data cleaning to create a final analytic dataset (4). We noticed skewness among the continuous features and sparse categories among the categorical features. During feature engineering, we addressed these through transformations and meaningful regrouping of categories, respectively. Missing data were prevalent and were imputed through multiple imputation using the R mice package. When examining relationships between features, we noticed some correlation among the continuous features, so we examined PCA to check its relevance for future analysis. We also considered clustering algorithms but found no clear evidence in their favor. Each continuous feature was standardized with a view to improving the effectiveness of distance-based algorithms.

The final analytic dataset consists of 4106 observations of 15 variables, listed as follows:

sch - School number (numeric)

pup - Pupil ID (numeric)

tiqv - Transformed, standardized IQ verbal (numeric)

iqp - Standardized IQ performal (numeric)

sex - Sex of pupil (categorical)

tses - Transformed, standardized SES score of pupil (numeric)

min - Minority member 0/1 (categorical)

has_repeated_group - Whether or not there was a repeated group 0/1 (categorical)

tlpr - Transformed, standardized language score PRE (numeric)

tlpo - Transformed, standardized language score POST (numeric)

apr - Standardized Arithmetic score PRE (numeric)

tapo - Transformed, standardized Arithmetic score POST (numeric)

den - Denomination classification 1-4 - at school level (categorical)

tssi - Transformed, standardized School SES indicator - at school level (numeric)

tpost - Transformed, standardized total post-test score (lpo + apo, summed before transforming and standardizing) (numeric)

A summary of the variables can be observed as follows:

## # A tibble: 2 × 2
##   l_chg_pos     n
##   <lgl>     <int>
## 1 FALSE       646
## 2 TRUE       3460
## # A tibble: 2 × 2
##   a_chg_pos     n
##   <lgl>     <int>
## 1 FALSE       431
## 2 TRUE       3675
##       pup            sch        sex      min      den           tlpr         
##  Min.   :   1   Min.   :  1.0   0:2100   0:3868   1:1271   Min.   :-2.81814  
##  1st Qu.:1073   1st Qu.: 66.0   1:2006   1: 238   2:1600   1st Qu.:-0.70954  
##  Median :2128   Median :132.0                     3:1038   Median : 0.03653  
##  Mean   :2121   Mean   :129.2                     4: 197   Mean   : 0.00000  
##  3rd Qu.:3170   3rd Qu.:187.0                              3rd Qu.: 0.69505  
##  Max.   :4214   Max.   :259.0                              Max.   : 2.57173  
##       tses                tssi               tiqv               tlpo          
##  Min.   :-2.171494   Min.   :-2.47302   Min.   :-3.14261   Min.   :-2.589462  
##  1st Qu.:-0.579751   1st Qu.:-0.58918   1st Qu.:-0.67684   1st Qu.:-0.764652  
##  Median : 0.001848   Median :-0.08691   Median : 0.03873   Median : 0.005185  
##  Mean   : 0.000000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.807671   3rd Qu.: 0.79825   3rd Qu.: 0.53429   3rd Qu.: 0.754953  
##  Max.   : 1.752968   Max.   : 2.24266   Max.   : 3.47449   Max.   : 2.183419  
##       tapo                apr                iqp               tpost          
##  Min.   :-2.258335   Min.   :-3.13107   Min.   :-3.03163   Min.   :-2.562547  
##  1st Qu.:-0.882041   1st Qu.:-0.84354   1st Qu.:-0.77656   1st Qu.:-0.742193  
##  Median : 0.002267   Median : 0.01428   Median :-0.02488   Median :-0.006606  
##  Mean   : 0.000000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.805056   3rd Qu.: 0.87210   3rd Qu.: 0.72380   3rd Qu.: 0.800893  
##  Max.   : 1.658299   Max.   : 2.30180   Max.   : 2.98187   Max.   : 2.092595  
##  has_repeated_group l_chg    a_chg   
##  0:3572             0: 646   0: 431  
##  1: 534             1:3460   1:3675  
##                                      
##                                      
##                                      
## 

As we can see, none of the variables has missing values any longer. All of the standardized numeric variables are centered at 0; pup and sch are identifiers and were not standardized. We can take a quick look at their distributions as follows:

The transformed distributions, while not perfect, are generally unimodal and roughly symmetric around a mean of 0.

Research Questions

With the analytic dataset created, the goal of this report is to answer two research questions. Language and arithmetic proficiency are often considered together when assessing an individual's academic success. Combined, they are an especially important overall measure of the academic well-being of young adolescent students. As such, in the analytic dataset, the language and arithmetic post-test scores were combined into a total post-test score, which will be used as a response variable for modeling purposes.

Another question we want to consider is whether, on an individual basis, each student improved over the year between the pre-test and the post-test. We will create two binary response variables, one for language and one for arithmetic. Each indicates whether the student improved over the year, defined as a post-test score higher than the pre-test score.

We will use models to answer the following two research questions:

  1. What factors, individual and on a school level, are associated with higher total post-test scores (language and arithmetic added together)?
  2. What factors, individual and on a school level, are associated with increases in test scores for language and arithmetic after one year?

Methodology

Multiple linear regression is a statistical method that uses several explanatory variables to predict the outcome of a continuous response variable. It is an extension of simple linear regression, and its coefficients are typically estimated by ordinary least squares. It fits a linear relationship between the explanatory variables and the response. The assumptions for multiple regression include the following:
- the errors (and hence the response, conditional on the predictors) are normally distributed with constant variance;
- the explanatory variables are treated as fixed (nonrandom);
- the explanatory variables are not highly correlated with one another, since strong multicollinearity inflates the standard errors of the estimates;
- the explanatory variables have a linear relationship with the response;
- the observations are randomly collected and independent.
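In equation form, the model for pupil i with p explanatory variables can be written as below. The notation is ours, not taken from the original analysis. The normality and constant-variance assumptions refer to the error term ε_i.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n
```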

Logistic regression is a statistical method for modeling a binary response variable as a function of one or more independent variables. It is therefore appropriate in situations where linear regression cannot be used because the response is dichotomous. The assumptions for logistic regression include the following:
- the dependent variable is binary;
- the independent variables are not highly correlated with one another;
- each continuous independent variable has a linear relationship with the log odds (the logit of the probability);
- the sample size is sufficiently large;
- the observations are independent.

Unlike multiple linear regression, logistic regression does not require a normally distributed response with constant variance, since the response is categorical. Outliers can have a significant effect on the results of both linear and logistic regression models. They should therefore be examined before and after each model is fitted.
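Correspondingly, the logistic model relates the probability π_i that pupil i has outcome 1 (for example, an improved score) to the predictors through the logit link. Exponentiating a coefficient gives the multiplicative change in the odds of the outcome for a one-unit increase in that predictor, holding the others fixed. The notation is again ours.

```latex
\log\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad y_i \sim \operatorname{Bernoulli}(\pi_i),
\qquad \mathrm{OR}_j = e^{\beta_j}
```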

K-fold cross-validation is a resampling technique for assessing the predictive performance of a model. The data are split into k folds, and each fold in turn is held out as test data while the model is trained on the remaining k - 1 folds; the k test errors are then averaged. Every observation is used for testing exactly once. Averaging across folds reduces the variability of the performance estimate compared with a single train/test split. Because performance is measured on held-out data, it also helps reveal overfitting. We will use 5-fold cross-validation to test the predictive performance of both the multiple linear regression and logistic regression models.
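The cross-validation procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind the results in this report. The output shown later appears to come from the caret package, and the helper kfold_cv_lm() and the simulated data frame toy are hypothetical. The sketch averages the per-fold RMSE and MAE. Pooling all held-out errors before taking the root would give a slightly different RMSE.

```r
# k-fold cross-validation for a linear model, using base R only.
# Hypothetical helper kfold_cv_lm(); the data below are simulated.
kfold_cv_lm <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  rmse <- numeric(k)
  mae <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, , drop = FALSE]
    test <- data[folds == i, , drop = FALSE]
    fit <- lm(formula, data = train)  # train on the other k - 1 folds
    err <- test[[response]] - predict(fit, newdata = test)
    rmse[i] <- sqrt(mean(err^2))
    mae[i] <- mean(abs(err))
  }
  # Report the average of the per-fold metrics
  c(RMSE = mean(rmse), MAE = mean(mae))
}

# Simulated stand-in data: noise standard deviation 0.6
set.seed(42)
toy <- data.frame(tlpr = rnorm(500), apr = rnorm(500))
toy$tpost <- 0.3 * toy$tlpr + 0.25 * toy$apr + rnorm(500, sd = 0.6)
cv_result <- kfold_cv_lm(tpost ~ tlpr + apr, data = toy, k = 5)
print(round(cv_result, 3))
```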

Linear Regression

We intend to create a linear regression model to identify factors that affect the total post-score of the students and to quantify their effects. The predictors identified in the initial round of feature selection in the previous analysis were apr, tlpr, tiqv, iqp, tses, rpg (recoded as has_repeated_group in the analytic dataset), tssi, den, min, and sex. We will look at the pairwise correlation plots of the numeric variables below.

Some correlation is observable between several of the identified predictors. We will check later whether this causes problems with our regression assumptions; if necessary, PCA will be used to create uncorrelated principal components. We also observe that all of the correlations are identified

Main effect models

We will create a full model with all the selected predictors and then perform stepwise selection, which removes predictors whose exclusion lowers the model's AIC.
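Below is a minimal sketch of fitting a full main-effects model and then running AIC-based stepwise selection with step() from R's stats package. The data frame sim is simulated and only borrows some column names from the analytic dataset. Which terms survive here therefore says nothing about the real data.

```r
# Full main-effects model followed by AIC-based stepwise selection with step().
# Simulated data; column names only mimic the analytic dataset.
set.seed(1)
n <- 1000
sim <- data.frame(
  sex  = factor(sample(0:1, n, replace = TRUE)),
  min  = factor(sample(0:1, n, replace = TRUE, prob = c(0.94, 0.06))),
  tlpr = rnorm(n),
  apr  = rnorm(n)
)
# min is given no true effect on the simulated response
sim$tpost <- 0.07 * (sim$sex == "1") + 0.33 * sim$tlpr + 0.25 * sim$apr +
  rnorm(n, sd = 0.6)

full_model <- lm(tpost ~ sex + min + tlpr + apr, data = sim)
# direction = "both" lets step() drop terms and re-add them; trace = 0 hides the log
step_model <- step(full_model, direction = "both", trace = 0)
print(formula(step_model))
```

step() compares models with extractAIC(), which for a linear model is n·log(SSE/n) + 2·edf. It also respects marginality, so it will not drop a main effect while an interaction containing it remains in the model.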

knitr::kable(summary(full_model)$coefficients, caption = "Full Main Effects Linear Model Parameter Estimates")
Full Main Effects Linear Model Parameter Estimates
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1308347 0.0200139 -6.537187 0.0000000
sex1 0.0668590 0.0189540 3.527429 0.0004242
min1 0.0524918 0.0412210 1.273421 0.2029408
den2 0.2517373 0.0225912 11.143168 0.0000000
den3 0.0911318 0.0248985 3.660127 0.0002553
den4 0.1054136 0.0467413 2.255254 0.0241700
tlpr 0.3266385 0.0131490 24.841355 0.0000000
tses 0.0926378 0.0112352 8.245346 0.0000000
tssi 0.0619850 0.0106659 5.811492 0.0000000
tiqv 0.1672796 0.0127085 13.162781 0.0000000
apr 0.2524611 0.0118620 21.283210 0.0000000
iqp 0.1275108 0.0113208 11.263382 0.0000000
has_repeated_group1 -0.2388497 0.0288936 -8.266537 0.0000000

Stepwise selection removed only min, which indicates whether the student belongs to a minority group. The initial exploratory analysis had identified the same feature as having a nonsignificant p-value when a linear model for the post-test score was fitted with the same predictors on the mice-imputed dataset.

We can now check the assumptions of both models, looking for evidence of non-normality, heteroscedasticity, and multicollinearity.

GVIF Df GVIF^(1/(2*Df))
sex 1.051641 1 1.025495
min 1.086965 1 1.042576
den 1.104577 3 1.016715
tlpr 2.025026 1 1.423034
tses 1.478446 1 1.215914
tssi 1.332431 1 1.154310
tiqv 1.891633 1 1.375366
apr 1.648014 1 1.283750
iqp 1.501076 1 1.225184
has_repeated_group 1.106545 1 1.051924
GVIF Df GVIF^(1/(2*Df))
sex 1.051535 1 1.025443
den 1.091854 3 1.014754
tlpr 2.024256 1 1.422764
tses 1.457989 1 1.207472
tssi 1.329791 1 1.153166
tiqv 1.870996 1 1.367844
apr 1.645849 1 1.282906
iqp 1.501039 1 1.225169
has_repeated_group 1.105350 1 1.051356
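For a predictor with a single degree of freedom, the GVIF in the tables above equals the ordinary VIF, and GVIF^(1/(2*Df)) is its square root. For example, 2.025026^(1/2) ≈ 1.423 for tlpr. The sketch below computes ordinary VIFs in base R from the definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The tables themselves appear to come from car::vif(), which also handles multi-level factors such as den through the generalized VIF. The helper vif_by_hand() and the simulated predictors are hypothetical.

```r
# Ordinary VIF for numeric predictors: VIF_j = 1 / (1 - R^2_j),
# where R^2_j comes from regressing predictor j on all the others.
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(7)
n <- 1000
tlpr <- rnorm(n)
tiqv <- 0.6 * tlpr + rnorm(n, sd = 0.8)  # correlated with tlpr by construction (r = 0.6)
iqp  <- rnorm(n)                          # independent of the others
vifs <- vif_by_hand(data.frame(tlpr, tiqv, iqp))
print(round(vifs, 2))
```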

The Q-Q plot is largely linear and therefore shows evidence of normality. The VIF values do not show evidence of multicollinearity that requires adjustment to the predictors in either model. However, the residuals vs. fitted plots show a “football” shape: the spread is smaller at both ends and much larger in the middle, rather than a random scatter around the horizontal line at zero. This could be evidence against constant variance, which would violate the homoscedasticity assumption. Heteroscedasticity does not bias the coefficient estimates themselves, but it can make their standard errors, and hence the p-values and confidence intervals, unreliable.

However, with a large dataset, it’s also possible that the narrowness observed at each end is due to there being far fewer observations there. Sparse regions naturally show a smaller visual spread of residuals than regions with many more observations, even when all observations technically have the same variance. We will proceed with the rest of the linear regression analysis with caution.

We will compare the performance of these two candidate models using 5-fold cross-validation.

## Linear Regression 
## 
## 4106 samples
##   10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3284, 3285 
## Resampling results:
## 
##   RMSE       Rsquared   MAE     
##   0.5926082  0.6489202  0.473492
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
## Linear Regression 
## 
## 4106 samples
##    9 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3285, 3284 
## Resampling results:
## 
##   RMSE      Rsquared   MAE      
##   0.592711  0.6489602  0.4737837
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Comparing the RMSE, R-squared, and MAE, we observe very small differences between the performances of the full and stepwise models. The test RMSE of around 0.593 for both models indicates that predictions of the standardized total post-score typically missed by about 0.593 standard deviations. We can convert this back into the meaningful units of the original test score using the standard deviation of the transformed post-test score (118.3467) and the optimal lambda (1.515152) used to transform its values. On that scale, the predicted total post-test scores were typically off by about 8.85 points. Because the power transformation is nonlinear, this conversion is only approximate.
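For reference, the two error metrics in the cross-validation output are defined below. Here y_i and ŷ_i are the observed and predicted standardized scores of the n pupils in a held-out fold. Squaring gives large errors more weight, so RMSE is never smaller than MAE, consistent with the reported values of about 0.593 and 0.474.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```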

With such similar performances between the two models, the principle of parsimony suggests the stepwise model as the best linear model for predicting the total post-test score.

Interaction models

To continue our investigation, we will follow a similar process with a full model that includes all two-way interactions. We will then run a stepwise variable selection algorithm on it to create a second interaction model.
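The full interaction model can be specified compactly with the ^2 operator in an R model formula. This operator expands a set of predictors into all of their main effects and all two-way interactions. The sketch below only builds and inspects the formula, so it needs no data. With the ten predictors used here, it yields 10 main effects and 45 interactions, matching the 55 terms listed in the full interaction model's GVIF table. The data frame name analytic in the comments is hypothetical.

```r
# Expand ten predictors into all main effects plus all two-way interactions.
predictors <- c("sex", "min", "den", "tlpr", "tses", "tssi",
                "tiqv", "apr", "iqp", "has_repeated_group")
inter_formula <- as.formula(
  paste("tpost ~ (", paste(predictors, collapse = " + "), ")^2")
)
term_labels <- attr(terms(inter_formula), "term.labels")
print(length(term_labels))                          # main effects + interactions
print(head(grep(":", term_labels, value = TRUE)))  # first few interaction terms

# With a data frame `analytic` (hypothetical name), the models could be fitted as:
#   full_int <- lm(inter_formula, data = analytic)
#   step_int <- step(full_int, direction = "both", trace = 0)
```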

Full Model Parameter Estimates (including 2-way Interaction Terms)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1064979 0.0274184 -3.8841685 0.0001043
sex1 0.0834443 0.0359217 2.3229521 0.0202313
min1 0.0301980 0.0857932 0.3519856 0.7248675
den2 0.2680596 0.0347261 7.7192615 0.0000000
den3 0.0693947 0.0375079 1.8501350 0.0643672
den4 0.0061992 0.0911167 0.0680359 0.9457604
tlpr 0.3567131 0.0289933 12.3032765 0.0000000
tses 0.0867820 0.0244123 3.5548412 0.0003825
tssi 0.0811018 0.0221348 3.6639911 0.0002515
tiqv 0.1636655 0.0276889 5.9108769 0.0000000
apr 0.2416855 0.0255163 9.4718254 0.0000000
iqp 0.1369337 0.0246568 5.5535907 0.0000000
has_repeated_group1 -0.1968657 0.0613606 -3.2083392 0.0013455
sex1:min1 -0.0736986 0.0862125 -0.8548486 0.3926858
sex1:den2 -0.0186334 0.0464428 -0.4012110 0.6882861
sex1:den3 0.0299292 0.0511305 0.5853495 0.5583455
sex1:den4 -0.1077445 0.0987973 -1.0905609 0.2755313
sex1:tlpr -0.0003232 0.0267767 -0.0120692 0.9903710
sex1:tses -0.0068189 0.0232842 -0.2928536 0.7696491
sex1:tssi 0.0028004 0.0219202 0.1277525 0.8983512
sex1:tiqv -0.0193043 0.0259309 -0.7444518 0.4566466
sex1:apr -0.0134254 0.0243814 -0.5506387 0.5819119
sex1:iqp 0.0476249 0.0230989 2.0617852 0.0392922
sex1:has_repeated_group1 -0.0661562 0.0609397 -1.0856009 0.2777205
min1:den2 -0.0329569 0.1106262 -0.2979121 0.7657856
min1:den3 0.2261507 0.1075253 2.1032317 0.0355074
min1:den4 -0.2282251 0.2883784 -0.7914081 0.4287524
min1:tlpr 0.0079044 0.0608150 0.1299743 0.8965933
min1:tses -0.0586789 0.0437015 -1.3427185 0.1794388
min1:tssi -0.0196607 0.0405581 -0.4847537 0.6278774
min1:tiqv -0.0335875 0.0532779 -0.6304205 0.5284553
min1:apr 0.0358244 0.0555244 0.6452014 0.5188334
min1:iqp -0.0324949 0.0557085 -0.5833024 0.5597224
min1:has_repeated_group1 -0.0706659 0.1069767 -0.6605724 0.5089243
den2:tlpr -0.0500320 0.0324534 -1.5416539 0.1232362
den3:tlpr -0.0166327 0.0354607 -0.4690444 0.6390633
den4:tlpr -0.0009627 0.0662051 -0.0145406 0.9883994
den2:tses 0.0072369 0.0273919 0.2641977 0.7916411
den3:tses 0.0057750 0.0305631 0.1889528 0.8501394
den4:tses 0.0348950 0.0596058 0.5854295 0.5582917
den2:tssi -0.0592745 0.0251548 -2.3563879 0.0185012
den3:tssi 0.0005565 0.0295975 0.0188021 0.9849999
den4:tssi 0.1420138 0.0616173 2.3047714 0.0212300
den2:tiqv 0.0495857 0.0319241 1.5532374 0.1204449
den3:tiqv 0.0307880 0.0337703 0.9116894 0.3619868
den4:tiqv -0.0150837 0.0621569 -0.2426714 0.8082723
den2:apr 0.0540228 0.0289048 1.8689948 0.0616960
den3:apr -0.0033687 0.0322360 -0.1045000 0.9167778
den4:apr 0.0446393 0.0601225 0.7424725 0.4578444
den2:iqp -0.0494948 0.0279247 -1.7724345 0.0763980
den3:iqp -0.0551015 0.0307824 -1.7900305 0.0735241
den4:iqp -0.0677145 0.0572862 -1.1820404 0.2372594
den2:has_repeated_group1 -0.1269193 0.0687168 -1.8469906 0.0648218
den3:has_repeated_group1 -0.1092569 0.0819053 -1.3339425 0.1822981
den4:has_repeated_group1 -0.0224686 0.1555317 -0.1444630 0.8851421
tlpr:tses -0.0086364 0.0161707 -0.5340745 0.5933194
tlpr:tssi 0.0152900 0.0149937 1.0197620 0.3079026
tlpr:tiqv -0.0258014 0.0133845 -1.9277052 0.0539620
tlpr:apr -0.0033486 0.0148403 -0.2256402 0.8214927
tlpr:iqp -0.0130072 0.0157016 -0.8283983 0.4074940
tlpr:has_repeated_group1 -0.0830017 0.0413268 -2.0084202 0.0446653
tses:tssi -0.0172101 0.0104507 -1.6467857 0.0996801
tses:tiqv -0.0052787 0.0154559 -0.3415336 0.7327197
tses:apr 0.0085199 0.0145080 0.5872540 0.5570661
tses:iqp 0.0170547 0.0137498 1.2403640 0.2149130
tses:has_repeated_group1 0.0004933 0.0354527 0.0139157 0.9888979
tssi:tiqv -0.0076874 0.0147291 -0.5219199 0.6017548
tssi:apr -0.0175288 0.0139338 -1.2580080 0.2084618
tssi:iqp -0.0135971 0.0134064 -1.0142289 0.3105344
tssi:has_repeated_group1 -0.0143241 0.0331753 -0.4317696 0.6659320
tiqv:apr -0.0166028 0.0157760 -1.0524064 0.2926762
tiqv:iqp 0.0029304 0.0147826 0.1982330 0.8428728
tiqv:has_repeated_group1 -0.0407572 0.0406055 -1.0037374 0.3155654
apr:iqp 0.0009278 0.0131381 0.0706159 0.9437069
apr:has_repeated_group1 -0.0351618 0.0363686 -0.9668194 0.3336923
iqp:has_repeated_group1 0.0249920 0.0353212 0.7075648 0.4792565
Stepwise Selected Model Parameter Estimates (including 2-way Interaction Terms)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0938870 0.0214198 -4.3831819 0.0000120
sex1 0.0673288 0.0188566 3.5705675 0.0003603
min1 0.0196275 0.0670235 0.2928444 0.7696560
den2 0.2432223 0.0231820 10.4918435 0.0000000
den3 0.0749242 0.0257478 2.9099217 0.0036347
den4 -0.0477896 0.0696477 -0.6861614 0.4926503
tlpr 0.3374911 0.0135927 24.8288267 0.0000000
tses 0.0924705 0.0115384 8.0141386 0.0000000
tssi 0.0746634 0.0163702 4.5609465 0.0000052
tiqv 0.1721524 0.0126799 13.5768219 0.0000000
apr 0.2550977 0.0118231 21.5761823 0.0000000
iqp 0.1087985 0.0142298 7.6458219 0.0000000
has_repeated_group1 -0.2947357 0.0337451 -8.7341685 0.0000000
sex1:iqp 0.0325983 0.0186337 1.7494267 0.0802924
min1:den2 -0.0932103 0.1021298 -0.9126656 0.3614723
min1:den3 0.1838061 0.1017406 1.8066155 0.0708959
min1:den4 -0.2264644 0.2773336 -0.8165775 0.4142176
min1:tses -0.0737487 0.0353519 -2.0861338 0.0370285
den2:tssi -0.0533074 0.0221998 -2.4012563 0.0163834
den3:tssi 0.0023204 0.0260210 0.0891761 0.9289463
den4:tssi 0.1402723 0.0532475 2.6343438 0.0084619
tlpr:tiqv -0.0389074 0.0084378 -4.6111006 0.0000041
tlpr:has_repeated_group1 -0.1138520 0.0314174 -3.6238571 0.0002938
tses:tssi -0.0145763 0.0095299 -1.5295377 0.1262087
tssi:apr -0.0157276 0.0096419 -1.6311683 0.1029320

The following main and interaction effects have p-values below an alpha level of 0.05 in the full interaction model (for multi-level terms, at least one level):

Main effects: sex, den, tlpr, tses, tssi, tiqv, apr, iqp, has_repeated_group

Interaction effects: sex:iqp, min:den, den:tssi, tlpr:has_repeated_group

All of the main effects retained in the stepwise main effects model were also retained in the stepwise interaction model. min was included as well, since it appears in the retained min:den and min:tses interactions. The following two-way interaction effects were kept in the stepwise interaction model: sex:iqp, min:den, min:tses, den:tssi, tlpr:tiqv, tlpr:has_repeated_group, tses:tssi, and tssi:apr. Those significant at the 0.05 alpha level are min:tses, den:tssi, tlpr:tiqv, and tlpr:has_repeated_group.

Once again, we will look at the residual plots and VIFs to check for violations of the assumptions of linear regression.

## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
GVIF Df GVIF^(1/(2*Df))
sex 3.809247 1 1.951729
min 4.748375 1 2.179077
den 23.809346 3 1.696125
tlpr 9.928978 1 3.151028
tses 7.039262 1 2.653161
tssi 5.787093 1 2.405638
tiqv 9.055627 1 3.009257
apr 7.690275 1 2.773134
iqp 7.180935 1 2.679727
has_repeated_group 5.032781 1 2.243386
sex:min 2.248965 1 1.499655
sex:den 19.999647 3 1.647544
sex:tlpr 4.071754 1 2.017859
sex:tses 3.155137 1 1.776271
sex:tssi 2.725603 1 1.650940
sex:tiqv 3.767442 1 1.940990
sex:apr 3.162665 1 1.778388
sex:iqp 2.883590 1 1.698114
sex:has_repeated_group 1.984716 1 1.408799
min:den 3.469597 3 1.230401
min:tlpr 4.163131 1 2.040375
min:tses 2.830600 1 1.682439
min:tssi 2.008851 1 1.417339
min:tiqv 4.360568 1 2.088197
min:apr 2.538613 1 1.593302
min:iqp 2.627520 1 1.620963
min:has_repeated_group 1.978804 1 1.406699
den:tlpr 27.074722 3 1.732849
den:tses 11.067287 3 1.492818
den:tssi 15.308579 3 1.575757
den:tiqv 22.032438 3 1.674340
den:apr 17.173784 3 1.606242
den:iqp 12.455987 3 1.522520
den:has_repeated_group 6.129867 3 1.352826
tlpr:tses 3.270337 1 1.808407
tlpr:tssi 2.853894 1 1.689347
tlpr:tiqv 2.955831 1 1.719253
tlpr:apr 2.890235 1 1.700069
tlpr:iqp 3.311464 1 1.819743
tlpr:has_repeated_group 3.110993 1 1.763801
tses:tssi 1.437101 1 1.198792
tses:tiqv 3.235540 1 1.798761
tses:apr 2.599010 1 1.612144
tses:iqp 2.309144 1 1.519587
tses:has_repeated_group 2.260765 1 1.503584
tssi:tiqv 2.822146 1 1.679924
tssi:apr 2.341525 1 1.530204
tssi:iqp 2.108531 1 1.452078
tssi:has_repeated_group 1.749666 1 1.322749
tiqv:apr 3.332620 1 1.825546
tiqv:iqp 3.171794 1 1.780953
tiqv:has_repeated_group 3.115000 1 1.764936
apr:iqp 2.238348 1 1.496111
apr:has_repeated_group 2.216783 1 1.488887
iqp:has_repeated_group 2.066145 1 1.437409
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
GVIF Df GVIF^(1/(2*Df))
sex 1.054132 1 1.026709
min 2.910283 1 1.705955
den 2.957540 3 1.198087
tlpr 2.191604 1 1.480407
tses 1.579220 1 1.256670
tssi 3.178745 1 1.782903
tiqv 1.907127 1 1.380988
apr 1.658111 1 1.287677
iqp 2.401859 1 1.549793
has_repeated_group 1.528596 1 1.236364
sex:iqp 1.884477 1 1.372762
min:den 2.565254 3 1.170007
min:tses 1.860164 1 1.363878
den:tssi 7.024258 3 1.383885
tlpr:tiqv 1.179702 1 1.086141
tlpr:has_repeated_group 1.805575 1 1.343717
tses:tssi 1.200084 1 1.095483
tssi:apr 1.125975 1 1.061119

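The residual plots and the stepwise interaction model's VIFs look much the same as those of the main effect full and stepwise models. The stepwise interaction model's adjusted values, GVIF^(1/(2*Df)), all remain below about 1.8. The full interaction model's VIFs are considerably higher, with GVIF up to about 27 for den:tlpr. This is partly expected when interaction terms are included alongside the main effects they are built from.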

Finally, we will perform 5-fold cross-validation with the interaction models.

## Linear Regression 
## 
## 4106 samples
##   10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3284, 3285 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.5944228  0.6469267  0.4742027
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
## Linear Regression 
## 
## 4106 samples
##   10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3285, 3284 
## Resampling results:
## 
##   RMSE       Rsquared  MAE      
##   0.5903401  0.651934  0.4711809
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Once again, the results appear similar to the main effect models.

Model Selection

We will assess the four models with a few goodness-of-fit measures.

Goodness-of-fit Measures of Candidate Models
SSE R.sq R.adj AIC BIC
Full Main Effect Model 1434.527 0.6505416 0.6495171 -4291.929 -4209.766
Stepwise Main Effect Model 1435.095 0.6504032 0.6494639 -4292.302 -4216.460
Full Interaction Model 1400.588 0.6588094 0.6524597 -4264.239 -3783.903
Stepwise Interaction Model 1412.310 0.6559537 0.6539304 -4332.016 -4174.011

Once again, the statistics are all quite similar. Combined with the similar 5-fold cross-validation performance, this suggests that the four models have similar predictive potential. The adjusted R-squared and AIC values favor the stepwise interaction model, while the BIC favors the stepwise main effect model.
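The goodness-of-fit table can be assembled with a small helper like the hypothetical gof_table() below. Its AIC and BIC columns use extractAIC(), n·log(SSE/n) + k·edf with k = 2 or log(n). The values in the table above appear to be on that scale rather than that of AIC(): for the full main effect model, 4106·log(1434.527/4106) + 2·13 ≈ -4291.9. The two models compared in the sketch are fitted to simulated data.

```r
# Collect SSE, R^2, adjusted R^2, AIC and BIC for a named list of lm fits.
gof_table <- function(models) {
  rows <- lapply(models, function(m) {
    s <- summary(m)
    n <- nobs(m)
    data.frame(SSE   = deviance(m),                   # residual sum of squares
               R.sq  = s$r.squared,
               R.adj = s$adj.r.squared,
               AIC   = extractAIC(m)[2],              # n*log(SSE/n) + 2*edf
               BIC   = extractAIC(m, k = log(n))[2])  # penalty log(n) per parameter
  })
  out <- do.call(rbind, rows)
  rownames(out) <- names(models)
  out
}

# Simulated data with one true two-way interaction
set.seed(3)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 0.5 * d$x1 + 0.3 * d$x2 + 0.2 * d$x1 * d$x2 + rnorm(n, sd = 0.6)
fits <- list(main = lm(y ~ x1 + x2 + x3, data = d),
             interaction = lm(y ~ (x1 + x2 + x3)^2, data = d))
gof <- gof_table(fits)
print(gof)
```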

As a final check, we will try one more reduced model. Starting from the stepwise interaction model, we remove nonsignificant interaction effects one at a time, beginning with the term with the greatest p-value and refitting after each removal. We continue until every remaining interaction term has a p-value below the alpha of 0.05.

This gives the following predictors:

Main effects: sex, min, den, tlpr, tses, tssi, tiqv, apr, iqp, has_repeated_group

Interaction effects: den:tssi, tlpr:tiqv, tlpr:has_repeated_group, tssi:apr
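The elimination procedure described above can be sketched with drop1() and partial F-tests. In the hypothetical helper backward_by_p(), only interaction terms are candidates for removal, so main effects are always kept. This is an illustration on simulated data, not necessarily how the simplified model was produced. For single-degree-of-freedom terms, these F-test p-values equal the t-test p-values from summary(). A multi-level interaction such as den:tssi is instead tested jointly.

```r
# Backward elimination of interaction terms by p-value (partial F-tests).
backward_by_p <- function(model, alpha = 0.05) {
  repeat {
    tests <- drop1(model, test = "F")
    pvals <- tests[["Pr(>F)"]][-1]                # first row is "<none>"
    names(pvals) <- rownames(tests)[-1]
    pvals <- pvals[grepl(":", names(pvals))]      # only interactions are candidates
    if (length(pvals) == 0 || max(pvals) < alpha) break
    worst <- names(which.max(pvals))              # largest p-value goes first
    model <- update(model, as.formula(paste(". ~ . -", worst)))  # refit without it
  }
  model
}

# Simulated data: only the a:b interaction is real
set.seed(11)
n <- 800
d2 <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
d2$y <- 0.5 * d2$a + 0.4 * d2$b + 0.3 * d2$a * d2$b + rnorm(n, sd = 0.7)
start <- lm(y ~ (a + b + c)^2, data = d2)
final <- backward_by_p(start)
print(attr(terms(final), "term.labels"))
```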

## Linear Regression 
## 
## 4106 samples
##   10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3284, 3284, 3285, 3286 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.5889761  0.6528472  0.4701559
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

This model has a slightly smaller RMSE than the other models, but the measures are still very similar. Once again, we can compare its goodness-of-fit statistics with those of our other candidate models.

Goodness-of-fit Measures of Candidate Models
SSE R.sq R.adj AIC BIC
Full Main Effect Model 1434.527 0.6505416 0.6495171 -4291.929 -4209.766
Stepwise Main Effect Model 1435.095 0.6504032 0.6494639 -4292.302 -4216.460
Full Interaction Model 1400.588 0.6588094 0.6524597 -4264.239 -3783.903
Stepwise Interaction Model 1412.310 0.6559537 0.6539304 -4332.016 -4174.011
Simplified Stepwise Interaction Model 1417.166 0.6547708 0.6532503 -4329.923 -4209.839

The adjusted R-squared and AIC values still favor the original stepwise interaction model. However, the BIC of the simplified stepwise interaction model is better than that of the original stepwise interaction model. It is also better, by a negligible margin, than that of the full main effects model. The stepwise main effects model still has the best BIC overall.

The models all perform very similarly. As such, our first recommendation is the simplest model, the stepwise main effects model, which has the fewest predictors and no interaction effects. The differences in performance appear small enough that it should not differ substantially from the other models for predictive purposes. However, for analytic purposes, and to examine in more depth how the features affect the response, our second recommendation is the stepwise interaction model or the simplified stepwise interaction model. These include significant interaction terms, which allow the effect of one feature on the response to change depending on the value of another.

The parameter estimates of the stepwise selected main effects model are as follows:

Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1261515 0.0196746 -6.411893 0.0000000
sex1 0.0666164 0.0189545 3.514540 0.0004453
den2 0.2486549 0.0224628 11.069615 0.0000000
den3 0.0895137 0.0248680 3.599555 0.0003225
den4 0.1039741 0.0467312 2.224940 0.0261396
tlpr 0.3263121 0.0131475 24.819363 0.0000000
tses 0.0909549 0.0111580 8.151532 0.0000000
tssi 0.0613804 0.0106562 5.760079 0.0000000
tiqv 0.1655893 0.0126400 13.100441 0.0000000
apr 0.2530087 0.0118551 21.341785 0.0000000
iqp 0.1274391 0.0113215 11.256336 0.0000000
has_repeated_group1 -0.2376406 0.0288801 -8.228511 0.0000000

All of the main effects from the original data are included except for min, the binary indicator of whether the student belongs to a minority group. The model is fitted on the transformed, standardized values of the variables, so the magnitudes of the estimates are not in the original units. However, assuming the transformations preserve the ordering of values, the signs of the estimates still indicate the direction of each association. Holding the other predictors constant, the following are all positively associated with the total post-test score:
- higher pre-test scores in arithmetic and language;
- higher individual socioeconomic scores and school socioeconomic status;
- higher IQ scores.

For the categorical features, pupils with sex = 1 had higher average post-test scores. Pupils at schools with denomination classifications 2-4 scored higher than those at schools classified as 1. The only feature with a negative estimate is has_repeated_group: having a repeated group is negatively associated with the post-test score.

Logistic Regression

Our second research question asks whether an individual student improved from the pre-test to the post-test. We created two binary response variables indicating whether a student had a positive (greater than 0) change in score from the pre-test to the post-test, one for arithmetic and one for language. We will use these variables to create two final logistic models.
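The binary responses and a main-effects logistic model can be sketched as follows. The column names lpr, lpo, apr, and apo follow the raw data, but the values are simulated. The model formula is a simplified stand-in: language improvement regressed on sex and the arithmetic scores, mirroring how the language model below uses the other subject's scores. exp() of each coefficient gives an odds ratio.

```r
# Build binary improvement indicators and fit a logistic model (simulated data).
set.seed(5)
n <- 600
scores <- data.frame(
  lpr = rnorm(n), apr = rnorm(n),
  sex = factor(sample(0:1, n, replace = TRUE))
)
scores$lpo <- scores$lpr + rnorm(n, mean = 0.5)
scores$apo <- scores$apr + rnorm(n, mean = 0.7)
# 1 if the post-test score is greater than the pre-test score, 0 otherwise
scores$l_chg <- as.integer(scores$lpo - scores$lpr > 0)
scores$a_chg <- as.integer(scores$apo - scores$apr > 0)
print(table(l_chg = scores$l_chg, a_chg = scores$a_chg))

lang_fit <- glm(l_chg ~ sex + apr + apo, family = binomial, data = scores)
# exp(coefficient) = multiplicative change in the odds of improvement
print(round(exp(coef(lang_fit)), 3))
```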

Language Score Improvement: Yes or No
l_chg n
0 646
1 3460
Arithmetic Score Improvement: Yes or No
a_chg n
0 431
1 3675
Combined Score Improvement
l_chg a_chg n
0 0 156
0 1 490
1 0 275
1 1 3185

Looking across these tables, we see that most students improved between their pre- and post-test scores in both arithmetic and language, which is expected. The table of combinations of l_chg and a_chg shows the following breakdown:
- 156 students showed no improvement in either subject;
- 490 improved in arithmetic but not language;
- 275 improved in language but not arithmetic;
- 3185 improved in both subjects.

Main Effect Models

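We’ll begin by fitting two full main effect logistic regression models. Each model excludes the pre- and post-test scores of its own subject, from which its response is derived, but includes those of the other subject.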

Language Improvement Logistic Full Main Effects Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6361975 0.0957575 17.086893 0.0000000
sex1 0.6883649 0.0957025 7.192757 0.0000000
min1 0.3101520 0.1864894 1.663107 0.0962909
den2 0.2228432 0.1139980 1.954799 0.0506068
den3 -0.1227161 0.1161427 -1.056597 0.2906954
den4 0.1635197 0.2613480 0.625678 0.5315262
tses 0.1077922 0.0566879 1.901501 0.0572364
tssi 0.0652368 0.0536795 1.215302 0.2242508
tiqv 0.1769375 0.0589754 3.000193 0.0026981
apr -0.1188408 0.0622530 -1.908999 0.0562622
iqp -0.1289585 0.0581090 -2.219252 0.0264696
has_repeated_group1 -0.3020738 0.1189820 -2.538819 0.0111227
tapo 0.8997798 0.0726812 12.379821 0.0000000
Arithmetic Improvement Logistic Full Main Effects Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.0768835 0.1396567 22.0317616 0.0000000
sex1 -0.5606782 0.1185947 -4.7276856 0.0000023
min1 0.2706972 0.2094831 1.2922149 0.1962827
den2 0.2330302 0.1378617 1.6903179 0.0909671
den3 0.0932450 0.1440809 0.6471712 0.5175212
den4 0.3407571 0.3798616 0.8970559 0.3696891
tses 0.0545382 0.0703005 0.7757869 0.4378748
tssi 0.2195656 0.0653789 3.3583529 0.0007841
tiqv -0.0549398 0.0824169 -0.6666082 0.5050224
tlpr -0.0926403 0.0907471 -1.0208619 0.3073199
iqp 0.4949656 0.0699995 7.0709867 0.0000000
has_repeated_group1 -0.3500899 0.1382767 -2.5318076 0.0113476
tlpo 1.1531362 0.1001138 11.5182524 0.0000000

Just like the linear regression modeling process, we will continue by conducting stepwise feature selection for both models.
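The report does not state which stepwise criterion or direction was used. The Python sketch below shows one common variant, assuming greedy backward elimination under an AIC-style criterion (lower is better). At each step it drops the single term whose removal lowers the criterion most, and it stops when no removal helps. The `toy_aic` function and its deviance reductions are hypothetical stand-ins for refitting the logistic model. Each term is also treated as one parameter, although den actually contributes three.

```python
from typing import Callable, FrozenSet, Iterable

def backward_eliminate(terms: Iterable[str],
                       criterion: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Greedy backward elimination: repeatedly drop the single term whose removal
    lowers the criterion the most, and stop when no removal lowers it."""
    current = frozenset(terms)
    best = criterion(current)
    while current:
        # Score every model that drops exactly one of the current terms
        score, term = min((criterion(current - {t}), t) for t in sorted(current))
        if score >= best:
            break
        current, best = current - {term}, score
    return current

# Hypothetical deviance reductions; each term costs 2 (one parameter), as in AIC
reduction = {"tapo": 150.0, "sex": 50.0, "iqp": 5.0, "tssi": 1.5, "den": 0.5}

def toy_aic(terms):
    return 1000.0 - sum(reduction[t] for t in terms) + 2 * len(terms)

selected = backward_eliminate(reduction, toy_aic)
print(sorted(selected))
```

Under this toy criterion, a term survives only if it reduces the deviance by more than its penalty of 2. That is why tssi and den are dropped here.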

Language Improvement Logistic Stepwise Main Effects Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6299280 0.0954738 17.0719897 0.0000000
sex1 0.6888521 0.0956709 7.2002278 0.0000000
min1 0.2992200 0.1860192 1.6085436 0.1077162
den2 0.2265830 0.1138930 1.9894375 0.0466529
den3 -0.1187971 0.1159940 -1.0241653 0.3057572
den4 0.2255486 0.2563477 0.8798541 0.3789384
tses 0.1363646 0.0516008 2.6426827 0.0082252
tiqv 0.1756527 0.0589409 2.9801527 0.0028810
apr -0.1207157 0.0621988 -1.9408044 0.0522820
iqp -0.1317355 0.0580559 -2.2691140 0.0232614
has_repeated_group1 -0.2921902 0.1185932 -2.4638033 0.0137472
tapo 0.9095235 0.0722238 12.5931189 0.0000000
Arithmetic Improvement Logistic Stepwise Main Effects Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2057049 0.1125474 28.483163 0.0000000
sex1 -0.5561029 0.1176502 -4.726747 0.0000023
tssi 0.2329103 0.0577382 4.033902 0.0000549
tlpr -0.1207150 0.0853155 -1.414924 0.1570906
iqp 0.4885369 0.0686741 7.113842 0.0000000
has_repeated_group1 -0.3327376 0.1362473 -2.442160 0.0145997
tlpo 1.1613681 0.0957747 12.126043 0.0000000

The stepwise feature selection algorithm selected the following effects:

Language Improvement Prediction Model: sex, min, den, tses, tiqv, apr, iqp, has_repeated_group, tapo
Arithmetic Improvement Prediction Model: sex, tssi, tlpr, iqp, has_repeated_group, tlpo

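Both stepwise models retained the post-test and pre-test scores for the other subject. The other subject’s post-test score is a strong positive predictor of improvement (z > 12 in both models). Its pre-test score has a negative estimate that is not significant at the 0.05 level (p ≈ 0.052 for apr and p ≈ 0.157 for tlpr). Taken together, a high post-test score relative to the pre-test score in one subject reflects a gain in that subject, so these estimates suggest that students who improve in one subject also tend to improve in the other. Sex, performal IQ, and the presence of a repeated group were also selected by both stepwise algorithms. The signs of the sex and performal IQ effects differ between subjects. Sex = 1 is associated with higher odds of improving in language and lower odds of improving in arithmetic, while higher performal IQ shows the reverse pattern.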

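Once again, we use 5-fold cross-validation to compare the full and reduced models. The four outputs below are, in order, for the full and stepwise language models and the full and stepwise arithmetic models (with 10, 9, 10, and 6 predictors, respectively).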
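The training-set sizes reported by caret (3284 or 3285) follow directly from splitting 4106 pupils into 5 folds. One fold receives 822 pupils and the other four receive 821, so each training set holds the remaining 3284 or 3285. The Python sketch below uses a plain random split. It is an illustration only: caret may stratify folds by the outcome class, and this sketch does not.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(4106, 5)
# Each model is trained on everything outside the held-out fold
train_sizes = [4106 - len(f) for f in folds]
print(sorted(train_sizes))
```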

## Generalized Linear Model 
## 
## 4106 samples
##   10 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3284, 3285 
## Resampling results:
## 
##   Accuracy   Kappa     
##   0.8429122  0.09970755
## Generalized Linear Model 
## 
## 4106 samples
##    9 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3284, 3285, 3285, 3285, 3285 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8412087  0.0853706
## Generalized Linear Model 
## 
## 4106 samples
##   10 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3284, 3285, 3285, 3285, 3285 
## Resampling results:
## 
##   Accuracy   Kappa   
##   0.8933257  0.147092
## Generalized Linear Model 
## 
## 4106 samples
##    6 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3285, 3284 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8950313  0.1562784

The accuracy of predictions is similar across all of these models: about 84% for the language models and about 89% for the arithmetic models. However, the kappa values are all very low. Cohen’s kappa measures how much better the model’s agreement with the observed classes is than the agreement expected by chance alone. The response classes are imbalanced, with many more students improving over the course of the year than not. As a result, a model that assigns every student to the “improved” category would already be correct for about 84.3% of students in language (3460 of 4106) and about 89.5% in arithmetic (3675 of 4106). The cross-validated accuracies essentially match, and in some cases fall slightly below, these baseline rates.
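The Python sketch below computes Cohen's kappa from a 2x2 confusion matrix, with "improved" as the positive class. It applies kappa to the baseline that predicts "improved" for everyone, using the class counts reported above. That baseline reaches the accuracies quoted in the text yet has a kappa of exactly 0. High accuracy combined with near-zero kappa is therefore what an imbalanced outcome produces.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (positive class = 'improved')."""
    n = tp + fn + fp + tn
    p_obs = (tp + tn) / n
    # Chance agreement: product of actual and predicted marginals, summed over classes
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Baseline that predicts "improved" for every student
for subject, improved, total in [("language", 3460, 4106), ("arithmetic", 3675, 4106)]:
    not_improved = total - improved
    accuracy = improved / total
    kappa = cohens_kappa(tp=improved, fn=0, fp=not_improved, tn=0)
    print(subject, round(accuracy, 4), kappa)
```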

Specificity and Sensitivity Comparisons of Main Effect Logistic Candidate Models
Specificity Sensitivity
Language Full Model 0.0804954 0.9852601
Language Stepwise Model 0.0712074 0.9849711
Arithmetic Full Model 0.1136891 0.9847619
Arithmetic Stepwise Model 0.1183295 0.9861224

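In all models, the specificity (the proportion of students who did not improve that the model correctly identifies) is much lower than the sensitivity (the proportion of students who improved that the model correctly identifies). This is largely because of the imbalance between the two groups being predicted. Many more students improved over the course of the year than did not, so the models rarely predict “not improved” and correctly identify fewer than 12% of the students who did not improve.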
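The two metrics can be computed as in the Python sketch below. It treats 1 (improved) as the positive class, which is consistent with the high sensitivities in the table. The toy classifier is hypothetical: it predicts "improved" for nine of ten students. It catches every improver but only half of the non-improvers, which is the same qualitative pattern seen above.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Sensitivity: share of actual positives predicted positive.
    Specificity: share of actual negatives predicted negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes (1 = improved) and predictions from an "almost always improved" model
actual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```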

Interaction models

Next, we consider candidate models that include all two-way interactions among the main effects.
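The number of rows in the full interaction table can be predicted before fitting. Including all two-way interactions among the 10 language predictors is equivalent to an R formula of the form `(sex + min + ... + tapo)^2`. This gives 45 variable pairs. Because den is a four-level factor coded with 3 dummy columns, every pair involving den contributes 3 coefficients. The Python sketch below does this bookkeeping. It reproduces the 76 coefficients (intercept, 12 main effect columns, and 63 interaction columns) in the Language full interaction table.

```python
from itertools import combinations

# Model columns contributed by each language-model predictor (den: 4 levels -> 3 dummies)
columns = {"sex": 1, "min": 1, "den": 3, "tses": 1, "tssi": 1,
           "tiqv": 1, "apr": 1, "iqp": 1, "has_repeated_group": 1, "tapo": 1}

pairs = list(combinations(columns, 2))
# Each pair contributes (columns of a) x (columns of b) product terms
interaction_cols = sum(columns[a] * columns[b] for a, b in pairs)
n_coefficients = 1 + sum(columns.values()) + interaction_cols
print(len(pairs), interaction_cols, n_coefficients)
```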

Language Improvement Logistic Full Interaction Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.7062976 0.1394097 12.2394432 0.0000000
sex1 0.8634790 0.1954405 4.4181172 0.0000100
min1 0.3198667 0.4646885 0.6883465 0.4912346
den2 0.2438101 0.1807563 1.3488329 0.1773906
den3 -0.1696851 0.1844776 -0.9198145 0.3576697
den4 -0.0422157 0.5086950 -0.0829883 0.9338608
tses 0.0342911 0.1224336 0.2800792 0.7794167
tssi 0.1214235 0.1132944 1.0717519 0.2838315
tiqv 0.3985457 0.1259812 3.1635338 0.0015587
apr -0.1118514 0.1340454 -0.8344287 0.4040394
iqp -0.2524666 0.1275772 -1.9789323 0.0478236
has_repeated_group1 -0.3416819 0.2619239 -1.3045080 0.1920604
tapo 1.0653578 0.1553065 6.8597100 0.0000000
sex1:min1 0.1609503 0.4173068 0.3856883 0.6997275
sex1:den2 -0.4393610 0.2422003 -1.8140400 0.0696716
sex1:den3 -0.0244622 0.2512248 -0.0973719 0.9224310
sex1:den4 -0.9519515 0.5674422 -1.6776185 0.0934216
sex1:tses 0.0359980 0.1233606 0.2918108 0.7704313
sex1:tssi -0.0308302 0.1164553 -0.2647385 0.7912109
sex1:tiqv -0.0571562 0.1258223 -0.4542617 0.6496405
sex1:apr 0.2430774 0.1325842 1.8333811 0.0667459
sex1:iqp 0.0747956 0.1249368 0.5986671 0.5493949
sex1:has_repeated_group1 -0.3092365 0.2544738 -1.2151995 0.2242900
sex1:tapo -0.2576516 0.1526308 -1.6880706 0.0913977
min1:den2 0.1846860 0.5325324 0.3468071 0.7287362
min1:den3 0.7325912 0.5418983 1.3518979 0.1764080
min1:den4 -0.7722948 1.3846286 -0.5577631 0.5770062
min1:tses 0.4555638 0.2414286 1.8869504 0.0591670
min1:tssi -0.4578091 0.2001644 -2.2871658 0.0221861
min1:tiqv 0.0288928 0.2194117 0.1316832 0.8952349
min1:apr 0.1793195 0.2703232 0.6633521 0.5071051
min1:iqp -0.1010233 0.2628319 -0.3843647 0.7007082
min1:has_repeated_group1 0.4115781 0.4444190 0.9261037 0.3543921
min1:tapo 0.1137495 0.3220799 0.3531717 0.7239597
den2:tses -0.0468959 0.1478047 -0.3172827 0.7510291
den3:tses 0.0249514 0.1508217 0.1654366 0.8686004
den4:tses 0.8345710 0.3631425 2.2981917 0.0215509
den2:tssi -0.1310320 0.1337752 -0.9794940 0.3273360
den3:tssi -0.0274568 0.1470471 -0.1867210 0.8518794
den4:tssi 0.4204708 0.3665275 1.1471739 0.2513098
den2:tiqv -0.2145140 0.1515525 -1.4154435 0.1569385
den3:tiqv -0.2141464 0.1534821 -1.3952533 0.1629395
den4:tiqv -0.3290181 0.3745718 -0.8783845 0.3797351
den2:apr -0.2435519 0.1587055 -1.5346151 0.1248784
den3:apr 0.0552162 0.1609949 0.3429687 0.7316220
den4:apr -0.0314004 0.3884972 -0.0808253 0.9355809
den2:iqp 0.1660909 0.1487873 1.1162972 0.2642949
den3:iqp -0.0573170 0.1558758 -0.3677090 0.7130902
den4:iqp -0.1157497 0.3426259 -0.3378311 0.7354905
den2:has_repeated_group1 -0.0458396 0.2871392 -0.1596423 0.8731628
den3:has_repeated_group1 -0.0938449 0.3190394 -0.2941482 0.7686446
den4:has_repeated_group1 1.2326111 0.9065194 1.3597183 0.1739191
den2:tapo 0.0922433 0.1814866 0.5082653 0.6112673
den3:tapo 0.1703408 0.1952135 0.8725871 0.3828882
den4:tapo 0.0833544 0.4274201 0.1950176 0.8453792
tses:tssi -0.2155143 0.0528084 -4.0810585 0.0000448
tses:tiqv 0.0108522 0.0778450 0.1394083 0.8891275
tses:apr 0.0219100 0.0808791 0.2708982 0.7864694
tses:iqp 0.0887965 0.0740750 1.1987370 0.2306302
tses:has_repeated_group1 0.2303632 0.1513631 1.5219246 0.1280280
tses:tapo 0.1368794 0.0935307 1.4634703 0.1433387
tssi:tiqv 0.0648836 0.0737792 0.8794290 0.3791687
tssi:apr 0.0461141 0.0765111 0.6027111 0.5467009
tssi:iqp -0.0212389 0.0731393 -0.2903905 0.7715175
tssi:has_repeated_group1 -0.2166419 0.1407285 -1.5394319 0.1236989
tssi:tapo -0.1345021 0.0856194 -1.5709305 0.1161988
tiqv:apr -0.0290393 0.0779735 -0.3724250 0.7095765
tiqv:iqp -0.0336456 0.0719880 -0.4673784 0.6402292
tiqv:has_repeated_group1 -0.1621035 0.1524055 -1.0636330 0.2874950
tiqv:tapo 0.0220366 0.0902475 0.2441797 0.8070917
apr:iqp -0.1361049 0.0755065 -1.8025585 0.0714576
apr:has_repeated_group1 0.1910030 0.1605133 1.1899515 0.2340655
apr:tapo 0.2178975 0.0822817 2.6481897 0.0080924
iqp:has_repeated_group1 0.2198454 0.1527205 1.4395283 0.1500009
iqp:tapo 0.0503895 0.0845277 0.5961296 0.5510887
has_repeated_group1:tapo -0.3164266 0.1935579 -1.6347906 0.1020930
Arithmetic Improvement Logistic Full Interaction Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.4695761 0.2606815 13.3096374 0.0000000
sex1 -0.8617315 0.2739196 -3.1459286 0.0016556
min1 -0.3416063 0.5332428 -0.6406206 0.5217692
den2 0.3328552 0.3095574 1.0752617 0.2822576
den3 -0.0281453 0.3158084 -0.0891214 0.9289854
den4 2.7795977 1.5085784 1.8425278 0.0653980
tses -0.0287365 0.1831552 -0.1568970 0.8753260
tssi 0.6044787 0.1744104 3.4658401 0.0005286
tiqv -0.1624788 0.2162353 -0.7513981 0.4524131
tlpr -0.0176622 0.2302605 -0.0767053 0.9388580
iqp 0.8027781 0.1903237 4.2179614 0.0000247
has_repeated_group1 -0.2672107 0.3907214 -0.6838907 0.4940442
tlpo 1.6252616 0.2344123 6.9333459 0.0000000
sex1:min1 -0.2168854 0.4318151 -0.5022645 0.6154814
sex1:den2 0.0395357 0.3003371 0.1316378 0.8952707
sex1:den3 0.0934059 0.3156058 0.2959576 0.7672624
sex1:den4 -0.4829058 0.9578003 -0.5041822 0.6141334
sex1:tses 0.1783996 0.1538894 1.1592713 0.2463456
sex1:tssi -0.4004225 0.1470836 -2.7224152 0.0064807
sex1:tiqv -0.0028840 0.1786224 -0.0161459 0.9871180
sex1:tlpr 0.0737025 0.1940536 0.3798051 0.7040901
sex1:iqp 0.0783898 0.1534750 0.5107658 0.6095150
sex1:has_repeated_group1 0.0515487 0.2929535 0.1759621 0.8603237
sex1:tlpo -0.3506656 0.2134852 -1.6425758 0.1004707
min1:den2 -0.5114155 0.5479291 -0.9333607 0.3506338
min1:den3 -0.8148272 0.5161926 -1.5785334 0.1144431
min1:den4 -1.0389231 1.6972414 -0.6121245 0.5404554
min1:tses -0.2961879 0.2259620 -1.3107861 0.1899300
min1:tssi -0.3154094 0.2075310 -1.5198183 0.1285566
min1:tiqv 0.3646741 0.2791589 1.3063315 0.1914399
min1:tlpr -0.3967096 0.3520100 -1.1269838 0.2597493
min1:iqp -0.3001086 0.2627899 -1.1420096 0.2534500
min1:has_repeated_group1 0.2100001 0.4662606 0.4503921 0.6524278
min1:tlpo -0.0511923 0.3938281 -0.1299863 0.8965773
den2:tses -0.1176445 0.1760279 -0.6683286 0.5039238
den3:tses 0.2602258 0.1828672 1.4230315 0.1547270
den4:tses 0.2724586 0.5682346 0.4794827 0.6315953
den2:tssi -0.1528418 0.1612343 -0.9479486 0.3431556
den3:tssi -0.0525908 0.1743992 -0.3015540 0.7629921
den4:tssi -1.3008416 0.6886695 -1.8889200 0.0589025
den2:tiqv 0.2136252 0.2115856 1.0096398 0.3126679
den3:tiqv -0.0255183 0.2095768 -0.1217612 0.9030882
den4:tiqv 0.0530526 0.6544739 0.0810614 0.9353931
den2:tlpr 0.0466693 0.2281831 0.2045255 0.8379428
den3:tlpr 0.1561803 0.2331556 0.6698546 0.5029505
den4:tlpr -0.8203246 0.7657520 -1.0712667 0.2840495
den2:iqp -0.1913753 0.1765179 -1.0841697 0.2782896
den3:iqp -0.2280933 0.1831837 -1.2451614 0.2130724
den4:iqp -0.7141710 0.6223839 -1.1474766 0.2511847
den2:has_repeated_group1 -0.4869359 0.3340924 -1.4574888 0.1449815
den3:has_repeated_group1 0.0427078 0.3762529 0.1135083 0.9096276
den4:has_repeated_group1 -1.6764096 0.9917218 -1.6904032 0.0909509
den2:tlpo 0.0667916 0.2534120 0.2635694 0.7921117
den3:tlpo -0.2134310 0.2560447 -0.8335693 0.4045237
den4:tlpo 1.4258712 1.0386258 1.3728440 0.1698008
tses:tssi -0.0305961 0.0643780 -0.4752560 0.6346045
tses:tiqv 0.1790616 0.1103070 1.6233024 0.1045248
tses:tlpr -0.0280786 0.1160395 -0.2419749 0.8087996
tses:iqp -0.1284964 0.0888481 -1.4462477 0.1481077
tses:has_repeated_group1 0.0025836 0.1766905 0.0146221 0.9883337
tses:tlpo 0.0672296 0.1239356 0.5424560 0.5875044
tssi:tiqv -0.2102156 0.1007570 -2.0863634 0.0369457
tssi:tlpr 0.1236997 0.1074341 1.1514012 0.2495672
tssi:iqp 0.0374776 0.0869173 0.4311867 0.6663326
tssi:has_repeated_group1 -0.1660776 0.1619886 -1.0252423 0.3052489
tssi:tlpo 0.0072497 0.1209184 0.0599550 0.9521915
tiqv:tlpr 0.0089603 0.1154480 0.0776131 0.9381358
tiqv:iqp -0.0704031 0.1007303 -0.6989262 0.4845981
tiqv:has_repeated_group1 -0.1085234 0.1990527 -0.5451993 0.5856164
tiqv:tlpo -0.0532668 0.1341620 -0.3970336 0.6913426
tlpr:iqp 0.0050247 0.1103647 0.0455278 0.9636866
tlpr:has_repeated_group1 -0.0269426 0.2283576 -0.1179842 0.9060802
tlpr:tlpo 0.0986089 0.1188317 0.8298203 0.4066404
iqp:has_repeated_group1 0.2046135 0.1727716 1.1843011 0.2362939
iqp:tlpo 0.3279312 0.1308045 2.5070331 0.0121749
has_repeated_group1:tlpo -0.0909160 0.2595510 -0.3502818 0.7261272
Language Improvement Logistic Stepwise Interaction Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6668519 0.1206726 13.8130119 0.0000000
sex1 0.8012155 0.1821726 4.3981118 0.0000109
min1 0.6000305 0.2970773 2.0197792 0.0434063
den2 0.3308813 0.1547745 2.1378279 0.0325307
den3 -0.1669390 0.1541003 -1.0833138 0.2786692
den4 0.4690459 0.3766675 1.2452520 0.2130391
tses 0.1523072 0.0921371 1.6530493 0.0983208
tssi 0.0632335 0.0589607 1.0724690 0.2835094
tiqv 0.1813492 0.0598108 3.0320468 0.0024290
apr -0.1179500 0.0843279 -1.3987071 0.1619008
iqp -0.2032185 0.0687764 -2.9547729 0.0031290
has_repeated_group1 -0.5020179 0.1679044 -2.9899030 0.0027907
tapo 1.1202457 0.0991842 11.2946015 0.0000000
sex1:den2 -0.4570286 0.2316355 -1.9730511 0.0484897
sex1:den3 -0.0099298 0.2402417 -0.0413326 0.9670308
sex1:den4 -0.9236706 0.5360329 -1.7231604 0.0848595
sex1:apr 0.2017032 0.1237583 1.6298150 0.1031406
sex1:tapo -0.2318731 0.1311260 -1.7683219 0.0770071
min1:tses 0.5087286 0.2096014 2.4271237 0.0152191
min1:tssi -0.4270360 0.1814359 -2.3536470 0.0185903
den2:tses -0.1988805 0.1120345 -1.7751723 0.0758694
den3:tses -0.0570852 0.1207752 -0.4726566 0.6364582
den4:tses 0.7884426 0.3036920 2.5961917 0.0094263
tses:tssi -0.1829103 0.0497905 -3.6735983 0.0002392
tses:iqp 0.0836591 0.0580539 1.4410583 0.1495682
tses:tapo 0.1047601 0.0672718 1.5572660 0.1194073
apr:iqp -0.1359372 0.0610653 -2.2260960 0.0260078
apr:tapo 0.2024796 0.0692151 2.9253671 0.0034405
iqp:has_repeated_group1 0.2358161 0.1409126 1.6734917 0.0942305
has_repeated_group1:tapo -0.3160075 0.1664020 -1.8990601 0.0575566
Arithmetic Improvement Logistic Stepwise Interaction Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.4515175 0.1338034 25.7954357 0.0000000
sex1 -0.6895429 0.1274397 -5.4107382 0.0000001
min1 -0.6877967 0.3081668 -2.2318976 0.0256217
tses 0.0764993 0.0915006 0.8360533 0.4031249
tssi 0.4589999 0.1098708 4.1776319 0.0000295
tiqv -0.0011900 0.0902710 -0.0131827 0.9894820
tlpr -0.0192352 0.0960728 -0.2002151 0.8413124
iqp 0.7341440 0.1015723 7.2277991 0.0000000
has_repeated_group1 -0.3901779 0.1431474 -2.7257064 0.0064164
tlpo 1.3593522 0.1135831 11.9679130 0.0000000
sex1:tssi -0.3362162 0.1167376 -2.8801022 0.0039755
min1:tses -0.4223782 0.1818217 -2.3230348 0.0201773
min1:tlpr -0.3643695 0.2094919 -1.7393007 0.0819819
tses:tiqv 0.1438352 0.0801324 1.7949700 0.0726585
tses:iqp -0.1290262 0.0738014 -1.7482891 0.0804140
tssi:tiqv -0.1777920 0.0849924 -2.0918571 0.0364513
tssi:tlpr 0.1171160 0.0808115 1.4492493 0.1472680
tssi:has_repeated_group1 -0.2431223 0.1338241 -1.8167311 0.0692583
iqp:tlpo 0.3471903 0.0853256 4.0690027 0.0000472

The main and interaction effects chosen for each model by stepwise variable selection are as follows:

Language:
- Main effects: sex, min, den, tses, tssi, tiqv, apr, iqp, has_repeated_group, tapo
- Interaction effects: sex:den, sex:apr, sex:tapo, min:tses, min:tssi, den:tses, tses:tssi, tses:iqp, tses:tapo, apr:iqp, apr:tapo, iqp:has_repeated_group, has_repeated_group:tapo

Arithmetic:
- Main effects: sex, min, tses, tssi, tiqv, tlpr, iqp, has_repeated_group, tlpo
- Interaction effects: sex:tssi, min:tses, min:tlpr, tses:tiqv, tses:iqp, tssi:tiqv, tssi:tlpr, tssi:has_repeated_group, iqp:tlpo

Again, we conduct 5-fold cross-validation on the interaction models.

## Generalized Linear Model 
## 
## 4106 samples
##   10 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3284, 3285 
## Resampling results:
## 
##   Accuracy  Kappa    
##   0.838773  0.1259755
## Generalized Linear Model 
## 
## 4106 samples
##   10 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3284, 3285, 3285, 3285, 3285 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8426706  0.1240706
## Generalized Linear Model 
## 
## 4106 samples
##   10 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3284, 3285, 3285, 3285, 3285 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8891862  0.1188952
## Generalized Linear Model 
## 
## 4106 samples
##    9 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3285, 3285, 3285, 3285, 3284 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8943005  0.1253494

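The accuracy and kappa results for the interaction models are quite similar to those of the main effect models. Accuracy stays close to the rate obtained by predicting improvement for every student, and kappa remains below 0.13.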

We can also find the specificity and sensitivity for each of the interaction models.

Specificity and Sensitivity Comparisons of Interaction Logistic Candidate Models
Specificity Sensitivity
Language Full Model 0.1145511 0.9739884
Language Stepwise Model 0.1037152 0.9806358
Arithmetic Full Model 0.0974478 0.9820408
Arithmetic Stepwise Model 0.0928074 0.9882993

Model Selection

To begin model selection, we compare the ROC curves for all of the language models and for all of the arithmetic models.

The ROC curves for the Language interaction models look considerably better than those for the Language main effect models. Within each model type, there is no obvious difference in performance between the full and stepwise versions. In contrast, all of the Arithmetic ROC curves look very similar.
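The area under an empirical ROC curve has a simple interpretation. It is the probability that a randomly chosen student who improved receives a higher predicted probability than a randomly chosen student who did not, with ties counted as one half. This is the normalized Mann-Whitney statistic. The Python sketch below computes it directly on hypothetical predicted probabilities.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen improver gets a higher
    predicted probability than a randomly chosen non-improver (ties count 1/2)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities, not taken from the fitted models
improvers = [0.9, 0.8, 0.4]
non_improvers = [0.7, 0.3]
print(auc(improvers, non_improvers))
```

The pairwise loop is O(n·m), which is fine for a sketch. A rank-based formula gives the same value faster on the full 4106 pupils.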

We can examine the area under the ROC curves (AUC) in more depth in the tables below, which summarize the AUC across the five cross-validation folds:

5-fold Cross-Validated AUC for Language candidate models
Min. 1st Qu. Median Mean 3rd Qu. Max.
Full Main Effect Model 0.7451 0.7461 0.7479 0.76004 0.7790 0.7821
Stepwise Main Effect Model 0.7437 0.7459 0.7485 0.76014 0.7792 0.7834
Full Interaction Model 0.7996 0.8198 0.8367 0.83366 0.8434 0.8688
Stepwise Interaction Model 0.7981 0.8185 0.8332 0.83272 0.8439 0.8699
5-fold Cross-Validated AUC for Arithmetic candidate models
Min. 1st Qu. Median Mean 3rd Qu. Max.
Full Main Effect Model 0.8343 0.8465 0.8496 0.84720 0.8517 0.8539
Stepwise Main Effect Model 0.8372 0.8442 0.8473 0.84670 0.8487 0.8561
Full Interaction Model 0.8362 0.8595 0.8602 0.86036 0.8697 0.8762
Stepwise Interaction Model 0.8268 0.8460 0.8468 0.84900 0.8583 0.8671
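With exactly five folds, R's default (type 7) quartiles fall on the 2nd and 4th order statistics. The Min., 1st Qu., Median, 3rd Qu., and Max. columns are therefore the five fold-level AUCs themselves, and the Mean column equals their average in every row. The Python sketch below uses the values recovered this way for the stepwise models. It reproduces the summaries and computes the mean AUC gain from adding interactions in each subject.

```python
from statistics import mean, quantiles

# Fold-level AUCs recovered from the summary tables above (stepwise models)
fold_auc = {
    ("Language", "Stepwise Main Effect"): [0.7437, 0.7459, 0.7485, 0.7792, 0.7834],
    ("Language", "Stepwise Interaction"): [0.7981, 0.8185, 0.8332, 0.8439, 0.8699],
    ("Arithmetic", "Stepwise Main Effect"): [0.8372, 0.8442, 0.8473, 0.8487, 0.8561],
    ("Arithmetic", "Stepwise Interaction"): [0.8268, 0.8460, 0.8468, 0.8583, 0.8671],
}

for key, vals in fold_auc.items():
    # method="inclusive" matches R's default type 7 quantiles
    q1, med, q3 = quantiles(vals, n=4, method="inclusive")
    print(key, min(vals), q1, med, q3, max(vals), round(mean(vals), 5))

gains = {
    subject: mean(fold_auc[(subject, "Stepwise Interaction")])
    - mean(fold_auc[(subject, "Stepwise Main Effect")])
    for subject in ("Language", "Arithmetic")
}
print(gains)
```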

As expected from the plots, the difference in mean AUC between the interaction and main effect models is much larger for Language (roughly 0.83 vs. 0.76) than for Arithmetic (0.849 to 0.860 vs. about 0.847).

The stepwise and full models performed similarly enough on all of the metrics considered that their differences in predictive performance can be treated as negligible. On the principle of parsimony, the stepwise models are therefore preferred over the full models for both model types and both subjects. Based on the AUC, the gain from adding interactions for Language (a mean AUC of about 0.83 versus 0.76) justifies selecting the interaction model. The corresponding gain for Arithmetic (about 0.013 at most) does not seem large enough to justify the added complexity. Therefore, the final logistic models for the two subjects are the following:

Language: Stepwise Interaction Model
Arithmetic: Stepwise Main Effects Model

Further Specifications

For each selected model, we can locate the point on its ROC curve that best balances sensitivity and specificity, and then report the sensitivity and specificity at the corresponding cutoff probability.

We can also plot the accuracy, specificity, and sensitivity of each of these models across a range of cutoff probabilities.

From these graphs, a cutoff probability of about 0.77 for the language model and about 0.87 for the arithmetic model gives the best balance between sensitivity and specificity.
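The report reads the cutoffs off the graphs. One common way to formalize "best balance" is Youden's J (sensitivity + specificity - 1); using it here is an assumption, not necessarily the criterion behind the graphs. The Python sketch below sweeps a grid of cutoffs and returns the one that maximizes J. It predicts "improved" whenever the predicted probability is at or above the cutoff. The probabilities and outcomes are hypothetical.

```python
def metrics_at(cutoff, probs, actual):
    """Accuracy, sensitivity (class 1) and specificity (class 0) when a student
    is predicted to improve whenever the predicted probability is >= cutoff."""
    pred = [int(p >= cutoff) for p in probs]
    tp = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 0)
    pos = sum(actual)
    neg = len(actual) - pos
    return (tp + tn) / len(actual), tp / pos, tn / neg

def youden_cutoff(probs, actual, grid):
    """Grid cutoff that maximizes sensitivity + specificity - 1 (Youden's J)."""
    def youden_j(c):
        _, sens, spec = metrics_at(c, probs, actual)
        return sens + spec - 1
    return max(grid, key=youden_j)  # first cutoff among ties

# Hypothetical predicted probabilities and outcomes (1 = improved)
probs = [0.95, 0.92, 0.90, 0.88, 0.85, 0.70, 0.80, 0.60, 0.50]
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0]
grid = [i / 100 for i in range(1, 100)]

best = youden_cutoff(probs, actual, grid)
print(best, metrics_at(best, probs, actual))
```

With imbalanced classes like those in this study, the maximizing cutoff typically lies well above 0.5. Raising the cutoff trades a little sensitivity for a large gain in specificity.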

We take a final look at the parameter estimates for the two selected models.

Language Improvement Logistic Stepwise Interaction Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6668519 0.1206726 13.8130119 0.0000000
sex1 0.8012155 0.1821726 4.3981118 0.0000109
min1 0.6000305 0.2970773 2.0197792 0.0434063
den2 0.3308813 0.1547745 2.1378279 0.0325307
den3 -0.1669390 0.1541003 -1.0833138 0.2786692
den4 0.4690459 0.3766675 1.2452520 0.2130391
tses 0.1523072 0.0921371 1.6530493 0.0983208
tssi 0.0632335 0.0589607 1.0724690 0.2835094
tiqv 0.1813492 0.0598108 3.0320468 0.0024290
apr -0.1179500 0.0843279 -1.3987071 0.1619008
iqp -0.2032185 0.0687764 -2.9547729 0.0031290
has_repeated_group1 -0.5020179 0.1679044 -2.9899030 0.0027907
tapo 1.1202457 0.0991842 11.2946015 0.0000000
sex1:den2 -0.4570286 0.2316355 -1.9730511 0.0484897
sex1:den3 -0.0099298 0.2402417 -0.0413326 0.9670308
sex1:den4 -0.9236706 0.5360329 -1.7231604 0.0848595
sex1:apr 0.2017032 0.1237583 1.6298150 0.1031406
sex1:tapo -0.2318731 0.1311260 -1.7683219 0.0770071
min1:tses 0.5087286 0.2096014 2.4271237 0.0152191
min1:tssi -0.4270360 0.1814359 -2.3536470 0.0185903
den2:tses -0.1988805 0.1120345 -1.7751723 0.0758694
den3:tses -0.0570852 0.1207752 -0.4726566 0.6364582
den4:tses 0.7884426 0.3036920 2.5961917 0.0094263
tses:tssi -0.1829103 0.0497905 -3.6735983 0.0002392
tses:iqp 0.0836591 0.0580539 1.4410583 0.1495682
tses:tapo 0.1047601 0.0672718 1.5572660 0.1194073
apr:iqp -0.1359372 0.0610653 -2.2260960 0.0260078
apr:tapo 0.2024796 0.0692151 2.9253671 0.0034405
iqp:has_repeated_group1 0.2358161 0.1409126 1.6734917 0.0942305
has_repeated_group1:tapo -0.3160075 0.1664020 -1.8990601 0.0575566
Arithmetic Improvement Logistic Stepwise Main Effects Model Parameter Estimates
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2057049 0.1125474 28.483163 0.0000000
sex1 -0.5561029 0.1176502 -4.726747 0.0000023
tssi 0.2329103 0.0577382 4.033902 0.0000549
tlpr -0.1207150 0.0853155 -1.414924 0.1570906
iqp 0.4885369 0.0686741 7.113842 0.0000000
has_repeated_group1 -0.3327376 0.1362473 -2.442160 0.0145997
tlpo 1.1613681 0.0957747 12.126043 0.0000000

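Sex, performal IQ, the presence of a repeated group, and the post-test score for the other subject are significant features in both final models. School socioeconomic status (tssi) is a significant main effect in the arithmetic model. In the language model, its main effect is not significant (p ≈ 0.28), but it appears in significant interactions with minority status and with individual socioeconomic score. Because the language model includes interactions, each of its main effect estimates describes a feature's effect when the features it interacts with are at zero (their standardized means) or at their reference levels. In the language model, significant interactions (p < 0.05) appear between:
- sex and denomination group (den2)
- denomination group (den4) and individual socioeconomic score (tses)
- minority status and tses
- minority status and school socioeconomic score (tssi)
- tses and tssi
- arithmetic pre-test score and performal IQ
- arithmetic pre-test and post-test scores
Each of these contributes to predicting a positive change from the language pre-test to the post-test score.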
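The arithmetic model has no interactions, so its logit coefficients translate directly into odds ratios via exp(estimate). The Python sketch below uses the estimates copied from the Arithmetic stepwise main effects table. Each odds ratio is the multiplicative change in the odds of improving for a one standard deviation increase in the (transformed, standardized) predictor, or relative to the reference level for sex and has_repeated_group. The intercept gives the predicted probability of improving for a student at the reference levels with all standardized predictors at their means.

```python
import math

# Estimates from the Arithmetic Improvement stepwise main effects table above
estimates = {
    "(Intercept)": 3.2057049,
    "sex1": -0.5561029,
    "tssi": 0.2329103,
    "tlpr": -0.1207150,
    "iqp": 0.4885369,
    "has_repeated_group1": -0.3327376,
    "tlpo": 1.1613681,
}

# Inverse logit of the intercept: baseline probability of improving
baseline = 1 / (1 + math.exp(-estimates["(Intercept)"]))

# Odds ratios: values below 1 lower the odds of improving, values above 1 raise them
odds_ratios = {k: math.exp(v) for k, v in estimates.items() if k != "(Intercept)"}

print(round(baseline, 3))
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 3))
```

In the language model, the same exponentiation applies only conditionally. There, each main effect odds ratio holds at the zero or reference values of its interacting features.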

Summary and Discussion

The final models chosen were the following:

Multiple Linear Regression: The stepwise main effect model was chosen for its efficiency and predictive performance. There were no obvious benefits to using any of the more complex models to predict a student’s total combined post-test score for Arithmetic and Language. There appeared to be possible violations of the homogeneity of variance assumption for multiple linear regression. However, this may be attributable to the sparsity of observations at extreme values compared with the bulk of the observations in the middle. The stepwise reduced interaction models were recommended for further insight into how the school-related and individual features relate to each other in predicting the post-test score.

Logistic Regression: Logistic regression models were fit to predict whether each student’s score improved in each subject, Language and Arithmetic. The stepwise reduced main effect model was recommended for predicting a positive change in arithmetic score. The stepwise reduced interaction model was recommended for predicting a positive change in language score. Subject-specific cutoff probabilities (about 0.77 for language and about 0.87 for arithmetic) were recommended to balance sensitivity and specificity.

References

  1. https://xiang-a.github.io/sta552-project1/
  2. https://xiang-a.github.io/sta552-hw2/