What Influences SAT Math Scores?

It is important to think about the relationship between PSAT scores, high school GPA and SAT scores, because understanding it can help students prepare ahead of time. The PSAT 8/9 is a relatively new exam intended to help eighth- and ninth-grade students plan a path to college. It is similar to other tests in the SAT Suite of Assessments, but it is not the same exam as the PSAT 10 or the PSAT/National Merit Scholarship Qualifying Test (https://www.usnews.com/education/blogs/college-admissions-playbook/articles/2017-04-03/3-important-facts-about-the-psat-8-9). Do the results of these exams predict students’ aptitude to perform well on college entrance exams such as the SAT?

This question is investigated in this analysis using a small dataset from R’s online data repository. The goal of this assignment is to compare two important predictors: first, PSAT Math scores, and second, high school GPA. There are more articles available on the internet in support of these tests than about the importance of high school GPA, which raises a more important question: which of these two variables is the more reliable predictor of a student’s aptitude?

In this analysis, we examine the two independent variables “PSAT Math Scores” and “High School GPA” and their effect on the dependent variable “SAT Math Scores”. Basically, we want to see which of these two explanatory variables is the stronger predictor of a higher SAT Math score. We also examine interaction effects with other important variables, such as gender and the grade received in a high school math course.

In this data, gender is a binary variable with two values: 1 = Male and 0 = Female. To check the interaction effect for female students, we created a new variable, Female, coded 1 for female and 0 for male. Similarly, it is conceptually understandable, and has been reported in published research, that course success (the grade received) is also an important indicator of higher SAT scores. With that in mind, we also use another binary variable, “CourseSuccess”, which is 1 if a student received a “B” or better in the high school math course and 0 if the student received less than a “B”.
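The dataset already provides CourseSuccess, so this analysis does not need to recode anything. For readers starting from raw letter grades, the sketch below shows one way such an indicator could be built. The `grades` vector is made up, and treating a B- as below B is an assumption:

```r
# Hypothetical letter grades for seven students (made up for illustration)
grades <- c("A", "B-", "C+", "B", "A-", NA, "D")

# Assumption: "B or better" means B, B+, A-, A or A+ (a B- counts as below B)
b_or_better <- c("A+", "A", "A-", "B+", "B")

# 1 = B or better, 0 = below B, NA when no grade was recorded
CourseSuccess <- ifelse(is.na(grades), NA, as.integer(grades %in% b_or_better))
CourseSuccess  # 1 0 0 1 1 NA 0
```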

Creating one additional variable “Female”

This variable will be used in the regression models to test whether the effects on SAT Math score differ when the student is female.
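In R, the recoding is a single line. Because Gender is 0/1, `1 - Gender` reverses the coding and leaves missing values as missing. The toy data frame `mathtest_demo` below is hypothetical. With the real data, the same assignment would be applied to `mathtest`:

```r
# Toy stand-in for the dataset; Gender is coded 1 = male, 0 = female, NA = unknown
mathtest_demo <- data.frame(Gender = c(1, 0, NA, 0, 1))

# Female = 1 - Gender flips the coding (1 = female, 0 = male) and keeps NA as NA
mathtest_demo$Female <- 1 - mathtest_demo$Gender

mathtest_demo
```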

Figure#1- PSAT Math Score vs SAT Math Score

The figure below plots PSAT Math scores against SAT Math scores for the high school students. There is a positive, roughly linear relationship between PSAT Math and SAT Math scores.

## Warning: Removed 2167 rows containing non-finite values (stat_lm).
## Warning: Removed 2175 rows containing missing values (geom_point).
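These warnings mean that many rows lack at least one of the plotted values, and those rows are dropped from the plot. `lm()` drops the same kind of rows, which is why the model summaries below report thousands of observations "deleted due to missingness". Counting complete cases before plotting or fitting makes this visible. A minimal sketch on made-up data; `demo` is hypothetical, and on the real data the same call would use `mathtest`:

```r
# Toy data with missing values (made up; the real dataset is much larger)
demo <- data.frame(
  PSATM = c(50, NA, 62, 45, NA, 70),
  SATM  = c(60, 58, NA, 55, NA, 68)
)

# A row is usable only if both PSATM and SATM are observed
usable  <- sum(complete.cases(demo[, c("PSATM", "SATM")]))
dropped <- sum(!complete.cases(demo[, c("PSATM", "SATM")]))
cat("usable rows:", usable, "dropped rows:", dropped, "\n")
```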

Figure#2- PSAT Math Score vs SAT Math Score by Course Success

The figure below shows PSAT Math scores and SAT Math scores by high school math course success. Course Success corresponds to the grade the student received in the high school math course: 1 = B or better, and 0 = less than B. The plot shows that students who received a B or better (shown in blue) and who also have higher PSAT Math scores tend to have higher SAT Math scores as well.

## Warning: Removed 2167 rows containing non-finite values (stat_lm).
## Warning: Removed 2172 rows containing missing values (geom_point).

Figure#3- PSAT Math Score vs ACT Math Score

The figure below uses another standardized college admission test, the ACT. Each ACT section is scored from 1 to 36, and this data contains only the math section scores. PSAT and ACT Math scores also show a positive linear relationship.

## Warning: Removed 1686 rows containing non-finite values (stat_lm).
## Warning: Removed 1692 rows containing missing values (geom_point).

Figure#4- PSAT Math Score vs ACT Math Score by gender

The figure below shows the same data as Figure 3. Here, however, the points are jittered and coloured “light blue” and “dark blue” to represent gender. The grey points represent cases for which gender is missing. The figure suggests that male students tend to score higher on the ACT math section than female students. This conclusion is doubtful, however, because gender information is missing for many of the cases.

## Warning: Removed 1686 rows containing non-finite values (stat_lm).
## Warning: Removed 1691 rows containing missing values (geom_point).
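When gender is missing for many students, it helps to count the missing cases explicitly and to compare group means with "Missing" shown as its own group. Otherwise those students silently drop out of the comparison. A minimal sketch on made-up values; `demo` and its numbers are hypothetical, not taken from the dataset:

```r
# Made-up example: Gender is 1 = male, 0 = female, NA = not recorded
demo <- data.frame(
  Gender = c(1, 0, NA, 1, NA, 0, NA, 1),
  ACTM   = c(27, 24, 25, 29, 22, 26, 23, 28)
)

# Label unknown gender explicitly instead of letting it drop out silently
demo$GenderLabel <- ifelse(is.na(demo$Gender), "Missing",
                           ifelse(demo$Gender == 1, "Male", "Female"))

table(demo$GenderLabel)                    # how many students fall in each group
tapply(demo$ACTM, demo$GenderLabel, mean)  # mean ACT Math score per group
```

If the "Missing" group is large, the male and female means describe only part of the sample.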

Figure#5- GPA vs SAT Math Score by Gender

The figure below plots high school GPA (x-axis) against SAT Math score (y-axis). The point colours represent gender: red = male, blue = female, and grey = missing gender information. As in Figure 4, female test takers scored lower than their male counterparts. In addition, male test takers with high GPAs also scored higher on the SAT Math exam.

## Warning: Removed 1468 rows containing non-finite values (stat_smooth).
## Warning: Removed 1603 rows containing missing values (geom_point).

Figure#6- GPA vs SAT Math Score by Course Grade.

The figure below gives a more detailed picture of GPA versus SAT Math scores. This time, the colours represent the different grades students received in the math course. The visualization is cluttered, but it shows a pattern: students who received an A+, A, or A- scored higher on the SAT test, and these students also have higher GPAs than other students.

## Warning: Removed 1468 rows containing non-finite values (stat_smooth).
## Warning: Removed 1602 rows containing missing values (geom_point).

Figure#7- GPA vs ACT Math Score by Gender (Female = 1)

The figure below shows GPA and ACT Math scores by gender, with coloured jittered points separating female and male respondents. It again shows a difference between male and female ACT scores: female students tend to have higher grade point averages but lower ACT Math scores, whereas male students with lower GPAs have higher ACT Math scores.

## Warning: Removed 334 rows containing non-finite values (stat_smooth).
## Warning: Removed 335 rows containing missing values (geom_point).

MODEL-1: PSAT MATH SCORES and SAT MATH SCORES (MODEL: SATM ~ PSATM, DATA = mathtest)

Model#1 shows the association between PSAT Math scores and SAT Math scores. First, the R-squared tells us that 11.02% of the variance of the response variable, SAT Math score, is explained by our regression model. The model can be interpreted as: EACH ONE-POINT INCREASE IN PSAT MATH SCORE IS ASSOCIATED WITH A 0.19-POINT INCREASE IN SAT MATH SCORE.

The intercept, or constant, is 52.03, which is the predicted SAT Math score when the PSAT Math score is 0. The association is significant (p < 0.001).

## 
## Call:
## lm(formula = SATM ~ PSATM, data = mathtest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.9165  -4.2986   0.0835   4.5104  24.9729 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 52.02711    1.40841  36.940  < 2e-16 ***
## PSATM        0.19104    0.02327   8.208 1.64e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.103 on 544 degrees of freedom
##   (2150 observations deleted due to missingness)
## Multiple R-squared:  0.1102, Adjusted R-squared:  0.1086 
## F-statistic: 67.37 on 1 and 544 DF,  p-value: 1.637e-15
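The fitted line can be used directly for prediction. The sketch below plugs the Model 1 coefficients into the equation SATM = 52.03 + 0.19 × PSATM. The PSAT values 50 and 60 are arbitrary illustration inputs:

```r
# Coefficients copied from the Model 1 output above
b0 <- 52.02711   # intercept
b1 <- 0.19104    # slope for PSATM

# Predicted SAT Math score for a given PSAT Math score
predict_satm <- function(psatm) b0 + b1 * psatm

predict_satm(60)                     # about 63.49
predict_satm(60) - predict_satm(50)  # about 1.91: ten more PSAT points -> ~1.9 SAT points
```

The intercept is only meaningful as a prediction if a PSAT Math score of 0 lies within the range of the observed data.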

Model-2: ACT MATH SCORE AND PSAT MATH SCORE

Model#2 shows the association between ACT Math score and PSAT Math score. This model can be interpreted in the same way: EACH ONE-POINT INCREASE IN PSAT MATH SCORE IS ASSOCIATED WITH A 0.14-POINT INCREASE IN ACT MATH SCORE. The intercept, or constant, is the predicted ACT Math score when the PSAT Math score is 0, and equals 19.59. This model is also significant (p < 0.001), and PSAT Math scores explain 16.7% of the variance in ACT Math scores.

## 
## Call:
## lm(formula = ACTM ~ PSATM, data = mathtest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3185  -2.0473  -0.0473   2.0882  16.4095 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.590467   0.557988   35.11   <2e-16 ***
## PSATM        0.135579   0.009448   14.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.357 on 1027 degrees of freedom
##   (1667 observations deleted due to missingness)
## Multiple R-squared:  0.167,  Adjusted R-squared:  0.1662 
## F-statistic: 205.9 on 1 and 1027 DF,  p-value: < 2.2e-16

Model-3: High School GPA and SAT Math Scores

In model#3, we use GPA (the variable GPAadj in the data) as the independent variable instead of the PSAT score, because the goal of this analysis is to find the better predictor of SAT Math score. The model would be interpreted as: EACH ONE-UNIT INCREASE IN GPA IS ASSOCIATED WITH A 0.53-POINT INCREASE IN SAT MATH SCORE. The intercept, or constant, for this model represents the predicted SAT Math score when GPA is equal to zero and is 43.64. The model is significant (p < 0.001) and explains 10.42% of the variance in SAT Math scores.

## 
## Call:
## lm(formula = SATM ~ GPAadj, data = mathtest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.7926  -4.3943   0.1396   4.9892  17.2752 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 43.64011    1.59976   27.28   <2e-16 ***
## GPAadj       0.53390    0.04467   11.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.106 on 1228 degrees of freedom
##   (1466 observations deleted due to missingness)
## Multiple R-squared:  0.1042, Adjusted R-squared:  0.1035 
## F-statistic: 142.9 on 1 and 1228 DF,  p-value: < 2.2e-16
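PSATM and GPAadj are on different scales, so their raw slopes (0.19 and 0.53) cannot be compared directly. In a simple regression with one predictor, the correlation between predictor and response equals the square root of R-squared, with the sign of the slope. This gives a scale-free comparison. A short calculation from the R-squared values reported for Models 1 to 3:

```r
# R-squared values copied from the Model 1, 2 and 3 outputs above
r2 <- c(
  SATM_on_PSATM  = 0.1102,  # Model 1
  ACTM_on_PSATM  = 0.1670,  # Model 2
  SATM_on_GPAadj = 0.1042   # Model 3
)

# For a simple regression |r| = sqrt(R^2); all three slopes are positive, so r > 0
r <- sqrt(r2)
round(r, 3)  # 0.332 0.409 0.323
```

By this measure, PSAT Math (r ≈ 0.33) and GPA (r ≈ 0.32) are about equally related to SAT Math. Models 1 and 3 were, however, fit on different students (546 versus 1,230 observations).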

Model-4: High School GPA + PSAT MATH Score and SAT SCORE

In model#4, we use both variables together, so the model has two independent variables, PSAT Math score and GPA, to predict the SAT Math score. This model would be interpreted as: “EACH UNIT INCREASE IN PSAT SCORE INCREASES THE PREDICTED SAT MATH SCORE BY 0.16 POINTS, WHEN OTHER VARIABLES IN THE MODEL ARE HELD CONSTANT”; SIMILARLY FOR GPA: “EACH UNIT INCREASE IN GPA INCREASES THE PREDICTED SAT MATH SCORE BY 0.65 POINTS, WHEN OTHER VARIABLES IN THE MODEL ARE HELD CONSTANT”. The intercept, or constant, in this model is the predicted SAT score when PSAT and GPA are both 0, and equals 30.11. Both predictors are significant (p < 0.001), and together they explain 21.85% of the variance in SAT Math scores, roughly twice as much as either predictor alone in Models 1 and 3.

## 
## Call:
## lm(formula = SATM ~ PSATM + GPAadj, data = mathtest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.8637  -4.1657  -0.0063   3.8915  22.2199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.11408    2.85295  10.555  < 2e-16 ***
## PSATM        0.16275    0.02209   7.367 6.56e-13 ***
## GPAadj       0.64760    0.07469   8.671  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.668 on 542 degrees of freedom
##   (2151 observations deleted due to missingness)
## Multiple R-squared:  0.2185, Adjusted R-squared:  0.2157 
## F-statistic: 75.79 on 2 and 542 DF,  p-value: < 2.2e-16
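Models 1, 3 and 4 drop different rows because of missing values. A fairer PSAT versus GPA comparison fits all three models on the same complete cases. The sketch below does this on simulated data. The `sim` data frame and its distributions are hypothetical; the generating slopes are borrowed from Model 4 for illustration only:

```r
set.seed(1)
n <- 200

# Simulated stand-in for mathtest; every value here is made up for illustration
sim <- data.frame(PSATM  = rnorm(n, mean = 50, sd = 10),
                  GPAadj = rnorm(n, mean = 35, sd = 6))
sim$SATM <- 30 + 0.16 * sim$PSATM + 0.65 * sim$GPAadj + rnorm(n, sd = 6.7)

# Knock out some values so PSATM and GPAadj are missing on different rows
sim$PSATM[sample(n, 60)]  <- NA
sim$GPAadj[sample(n, 30)] <- NA

# Keep only rows where the response and both predictors are observed
common <- sim[complete.cases(sim[, c("SATM", "PSATM", "GPAadj")]), ]

fit_psat <- lm(SATM ~ PSATM, data = common)
fit_gpa  <- lm(SATM ~ GPAadj, data = common)
fit_both <- lm(SATM ~ PSATM + GPAadj, data = common)

# All three fits now use the same students, so their R-squared values are comparable
r2_common <- sapply(list(psat = fit_psat, gpa = fit_gpa, both = fit_both),
                    function(m) summary(m)$r.squared)
r2_common
```

On the real data, the same `complete.cases()` filter would be applied to `mathtest` before refitting Models 1 and 3.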

Model-5: PSAT + GPA with interaction term Gender to predict SAT Math Score.

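In this model, we add interaction terms between Gender and each predictor, which allows the PSAT and GPA slopes to differ for male and female students. The R-squared tells us that 62.24% of the variance of the response variable “SAT math score” is explained by our regression model, although this model is fit on only the 138 students with complete data. Let’s interpret the key findings of this model, which involve the interaction terms. First, the Gender coefficient (-1.38) is the predicted difference between male and female students when PSAT Math score and GPA are both 0, and it is not statistically significant (p = 0.88). Second, the PSATM:Gender interaction (0.28, p = 0.021) means that the PSAT slope is 0.28 points steeper for male students: “Each one-point increase in PSAT Math score is associated with a 0.58-point increase in SAT Math score for female students and a 0.58 + 0.28 = 0.87-point increase for male students, while the other variables in the model are held constant.” Finally, the interaction between Gender and GPA (-0.40, p = 0.12) shows that the GPA slope is 0.40 points smaller for male students (0.46 for female students versus 0.46 - 0.40 = 0.06 for male students), but this difference is not statistically significant.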

## 
## Call:
## lm(formula = SATM ~ PSATM * Gender + GPAadj * Gender, data = mathtest)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.555  -3.019  -0.322   3.962  10.743 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.45821    6.73601   1.404   0.1626    
## PSATM          0.58265    0.07902   7.373 1.64e-11 ***
## Gender        -1.38248    9.13678  -0.151   0.8800    
## GPAadj         0.46019    0.19975   2.304   0.0228 *  
## PSATM:Gender   0.28432    0.12184   2.334   0.0211 *  
## Gender:GPAadj -0.39899    0.25832  -1.545   0.1248    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.963 on 132 degrees of freedom
##   (2558 observations deleted due to missingness)
## Multiple R-squared:  0.6224, Adjusted R-squared:  0.6081 
## F-statistic: 43.52 on 5 and 132 DF,  p-value: < 2.2e-16
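With interaction terms, each group's slope is the main effect plus the interaction multiplied by the group indicator. The sketch below computes the female (Gender = 0) and male (Gender = 1) intercepts and slopes from the Model 5 coefficients. The helper name `slopes` is only for illustration:

```r
# Coefficients copied from the Model 5 output (Gender: 1 = male, 0 = female)
b <- c(
  intercept     = 9.45821,
  PSATM         = 0.58265,
  Gender        = -1.38248,
  GPAadj        = 0.46019,
  PSATM_Gender  = 0.28432,
  Gender_GPAadj = -0.39899
)

# Group-specific intercept and slopes: main effect + gender * interaction
slopes <- function(gender) {
  c(
    intercept = unname(b["intercept"] + gender * b["Gender"]),
    PSATM     = unname(b["PSATM"] + gender * b["PSATM_Gender"]),
    GPAadj    = unname(b["GPAadj"] + gender * b["Gender_GPAadj"])
  )
}

rbind(female = slopes(0), male = slopes(1))
```

The male row (intercept 8.08, PSATM 0.87, GPAadj 0.06) matches the main effects reported for Model 6, where male students are the reference group.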

Model-6: PSAT + GPA with interaction term Female and SAT Math Scores.

In this model, instead of the Gender variable (1 = male, 0 = female), we use the Female variable, a binary variable with the values 1 = female and 0 = male. This was done to double-check whether being female had a significant effect on SAT Math score. Because Female = 1 - Gender, this is the same model as Model 5 with male students as the reference group: the residuals, R2, and F-statistic are identical, and only the coefficients are rearranged. The Female coefficient itself (1.38, p = 0.88) is not significant. The most important finding concerns the PSAT slope. According to this model, “EACH UNIT INCREASE IN PSAT MATH SCORE INCREASES A MALE STUDENT’S PREDICTED SAT MATH SCORE BY 0.87 POINTS, WHILE OTHER VARIABLES ARE HELD CONSTANT”. The PSATM:Female coefficient (-0.28, p = 0.021) shows that the PSAT slope is 0.28 points smaller for female students (0.87 - 0.28 = 0.58, the female slope from Model 5). The R2 again shows that 62.24% of the variance of the response variable is explained by this regression model. Because the two models are equivalent, no further interpretation is required.

## 
## Call:
## lm(formula = SATM ~ PSATM * Female + GPAadj * Female, data = mathtest)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.555  -3.019  -0.322   3.962  10.743 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.07573    6.17308   1.308   0.1931    
## PSATM          0.86697    0.09274   9.348 2.98e-16 ***
## Female         1.38248    9.13678   0.151   0.8800    
## GPAadj         0.06120    0.16379   0.374   0.7093    
## PSATM:Female  -0.28432    0.12184  -2.334   0.0211 *  
## Female:GPAadj  0.39899    0.25832   1.545   0.1248    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.963 on 132 degrees of freedom
##   (2558 observations deleted due to missingness)
## Multiple R-squared:  0.6224, Adjusted R-squared:  0.6081 
## F-statistic: 43.52 on 5 and 132 DF,  p-value: < 2.2e-16
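The claim that Models 5 and 6 are the same fit can be checked directly. Reversing the coding of a 0/1 indicator changes only which group is the reference, so fitted values and R-squared are unchanged. A sketch on simulated data; the data frame `d` and its generating values are hypothetical:

```r
set.seed(42)
n <- 150

# Simulated data (hypothetical), with Gender coded 1 = male, 0 = female
d <- data.frame(Gender = rbinom(n, 1, 0.5),
                PSATM  = rnorm(n, mean = 50, sd = 10),
                GPAadj = rnorm(n, mean = 35, sd = 6))
d$SATM <- 10 + 0.6 * d$PSATM + 0.3 * d$Gender * d$PSATM + 0.4 * d$GPAadj + rnorm(n, sd = 5)
d$Female <- 1 - d$Gender

m_gender <- lm(SATM ~ PSATM * Gender + GPAadj * Gender, data = d)
m_female <- lm(SATM ~ PSATM * Female + GPAadj * Female, data = d)

# Same fitted values and R-squared: only the reference group changes
all.equal(fitted(m_gender), fitted(m_female))
c(gender_model = summary(m_gender)$r.squared, female_model = summary(m_female)$r.squared)

# The PSATM main effect in the Female model equals the male slope from the Gender model
coef(m_female)[["PSATM"]]
coef(m_gender)[["PSATM"]] + coef(m_gender)[["PSATM:Gender"]]
```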

Model-7: PSAT Score + GPA with interaction term course success and SAT Math Score.

In this model, we use another interaction variable, “CourseSuccess”, which indicates whether the student received a B or better in the high school math course. The R-squared for this model is 33.64%, which means that 33.64% of the variance of SAT Math scores is explained by this regression model. The model can be interpreted through the CourseSuccess variable and its interactions with GPA and PSAT score. First, the CourseSuccess coefficient (10.82) is the predicted difference in SAT Math score between students with a B or better and other students when PSAT Math score and GPA are both 0. This association is not significant (p = 0.107). Another finding, which is significant but may seem contrary to the literature, concerns the interaction between PSAT score and course success (-0.35, p < 0.001). It does not mean that higher PSAT scores lower SAT scores. Rather, the PSAT slope is smaller for students who received a B or better: “EACH ONE-POINT INCREASE IN PSAT MATH SCORE IS ASSOCIATED WITH A 0.53-POINT INCREASE IN SAT MATH SCORE FOR OTHER STUDENTS, BUT ONLY A 0.53 - 0.35 = 0.18-POINT INCREASE FOR STUDENTS WHO RECEIVED A B OR BETTER GRADE”. The last interaction, between GPA and course success, is also significant (0.36, p = 0.040). The model shows “EACH UNIT INCREASE IN GPA IS ASSOCIATED WITH A 0.20 + 0.36 = 0.56-POINT INCREASE IN SAT MATH SCORE IF YOU RECEIVED A B OR BETTER GRADE IN YOUR HIGH SCHOOL MATH COURSE, COMPARED WITH 0.20 POINTS (NOT SIGNIFICANT) OTHERWISE”. This holds while the other variables in the model are held constant.

## 
## Call:
## lm(formula = SATM ~ PSATM * CourseSuccess + GPAadj * CourseSuccess, 
##     data = mathtest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.8953  -3.8604   0.2053   3.7909  30.3989 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          22.16613    5.23081   4.238 2.75e-05 ***
## PSATM                 0.52795    0.06607   7.991 1.18e-14 ***
## CourseSuccess        10.81858    6.69203   1.617   0.1067    
## GPAadj                0.20129    0.13573   1.483   0.1388    
## PSATM:CourseSuccess  -0.34559    0.07227  -4.782 2.37e-06 ***
## CourseSuccess:GPAadj  0.35702    0.17340   2.059   0.0401 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.407 on 441 degrees of freedom
##   (2249 observations deleted due to missingness)
## Multiple R-squared:  0.3364, Adjusted R-squared:  0.3288 
## F-statistic:  44.7 on 5 and 441 DF,  p-value: < 2.2e-16
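The same main-effect-plus-interaction arithmetic gives the slopes for each CourseSuccess group. It also shows why the CourseSuccess coefficient alone is not the overall gap between the groups. A sketch using the Model 7 coefficients; the helper names `psat_slope`, `gpa_slope` and `cs_gap` are only for illustration:

```r
# Coefficients copied from the Model 7 output above
b <- c(
  PSATM                  = 0.52795,
  GPAadj                 = 0.20129,
  CourseSuccess          = 10.81858,
  PSATM_x_CourseSuccess  = -0.34559,
  CourseSuccess_x_GPAadj = 0.35702
)

# Slope of each predictor for students below B (cs = 0) or with B or better (cs = 1)
psat_slope <- function(cs) b[["PSATM"]] + cs * b[["PSATM_x_CourseSuccess"]]
gpa_slope  <- function(cs) b[["GPAadj"]] + cs * b[["CourseSuccess_x_GPAadj"]]

# Predicted SATM gap (B or better minus below B) for students with the same PSATM and GPAadj;
# the CourseSuccess coefficient on its own equals this gap only when both scores are 0
cs_gap <- function(psatm, gpa) {
  b[["CourseSuccess"]] + b[["PSATM_x_CourseSuccess"]] * psatm + b[["CourseSuccess_x_GPAadj"]] * gpa
}

round(c(below_B = psat_slope(0), B_or_better = psat_slope(1)), 3)  # PSAT slopes: 0.528, 0.182
round(c(below_B = gpa_slope(0), B_or_better = gpa_slope(1)), 3)    # GPA slopes: 0.201, 0.558
```

For example, `cs_gap(0, 0)` returns 10.82, the CourseSuccess coefficient itself. At other values the gap shrinks by 0.35 for each PSAT point and grows by 0.36 for each GPA unit.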

A Combined Table For Regression Models.

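In the end, I would conclude that PSAT scores on their own are weak predictors of SAT Math scores: in Model 1 they explain only about 11% of the variance. High school GPA alone (Model 3) explains a similar share, about 10%, and the two together (Model 4) explain about 22%. Model 7, however, suggests that performing well in high school math matters. For students who received a B or better in their high school math course, GPA has a steeper association with SAT Math score (about 0.56 points per unit, versus 0.20 for other students), while the PSAT slope is smaller (about 0.18 versus 0.53). Therefore, for these students it appears more reliable to predict using high school GPA: students who have a high GPA, especially those who also earn a B or better in their high school math course, are more likely to score higher on the SAT Math exam. Because the models were fit on different subsets of students (from 138 to 1,230 observations), these comparisons should be read with caution.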

htmlreg(list(m1, m2, m3, m4, m5, m6, m7))
Statistical models
                      Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7
Response              SATM      ACTM      SATM      SATM      SATM      SATM      SATM
(Intercept)           52.03***  19.59***  43.64***  30.11***  9.46      8.08      22.17***
                      (1.41)    (0.56)    (1.60)    (2.85)    (6.74)    (6.17)    (5.23)
PSATM                 0.19***   0.14***             0.16***   0.58***   0.87***   0.53***
                      (0.02)    (0.01)              (0.02)    (0.08)    (0.09)    (0.07)
GPAadj                                    0.53***   0.65***   0.46*     0.06      0.20
                                          (0.04)    (0.07)    (0.20)    (0.16)    (0.14)
Gender                                                        -1.38
                                                              (9.14)
PSATM:Gender                                                  0.28*
                                                              (0.12)
Gender:GPAadj                                                 -0.40
                                                              (0.26)
Female                                                                  1.38
                                                                        (9.14)
PSATM:Female                                                            -0.28*
                                                                        (0.12)
Female:GPAadj                                                           0.40
                                                                        (0.26)
CourseSuccess                                                                     10.82
                                                                                  (6.69)
PSATM:CourseSuccess                                                               -0.35***
                                                                                  (0.07)
CourseSuccess:GPAadj                                                              0.36*
                                                                                  (0.17)
R2                    0.11      0.17      0.10      0.22      0.62      0.62      0.34
Adj. R2               0.11      0.17      0.10      0.22      0.61      0.61      0.33
Num. obs.             546       1029      1230      545       138       138       447
RMSE                  7.10      3.36      7.11      6.67      4.96      4.96      6.41
***p < 0.001; **p < 0.01; *p < 0.05