It is important to think about the relationship between PSAT scores, high school GPA and SAT scores, because understanding it can help students prepare ahead of time. The PSAT 8/9 is a relatively new exam intended to help eighth- and ninth-grade students plan a path to college. It is similar to the other tests in the SAT Suite of Assessments, but it is not the same exam as the PSAT 10 or the PSAT/National Merit Scholarship Qualifying Test (https://www.usnews.com/education/blogs/college-admissions-playbook/articles/2017-04-03/3-important-facts-about-the-psat-8-9). Do the results of these exams predict students' ability to perform well on college entrance exams such as the SAT?
That question is investigated in this analysis using a small dataset from R’s online data repository. The goal of this assignment is to compare two important predictors: first, PSAT Math scores, and second, high school GPA. There are many articles on the internet in support of these tests, but far fewer on the importance of high school GPA, which raises a more important question: which of these two variables is the more reliable predictor of a student's aptitude? In this analysis, we examine the two independent variables “PSAT Math Scores” and “High School GPA” and their effect on the dependent variable “SAT Math Scores”. In short, we want to see which of these two explanatory variables is the stronger predictor of a higher SAT Math score. We also look at interaction effects with other important variables, such as gender and the grade received in the high school math course.
In this data, gender is a binary variable with two values: 1 = Male and 0 = Female. To check the interaction effect of being female, we created a new variable, Female, coded 1 for female and 0 for male. Similarly, it is conceptually plausible, and reported in published research, that success in coursework is also an important indicator of higher SAT scores. We therefore use another binary variable, “CourseSuccess”, which equals 1 if a student received a grade of B or better in the high school math course and 0 if the student received a grade below B.
These variables will be used in the regression models to examine interaction effects on SAT Math score, for example whether the effect of PSAT score or GPA differs for female students.
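A minimal R sketch of the two recodings, run on a small made-up data frame. The column name `MathGrade` and its letter-grade values are hypothetical (the real grade variable is not shown in this report), and treating B- as below B is an assumption.

```r
# Hypothetical toy data: MathGrade and its letter-grade values are assumptions,
# not the real column in mathtest.
students <- data.frame(
  Gender    = c(1, 0, 0, 1, NA),          # 1 = male, 0 = female, NA = unknown
  MathGrade = c("A", "B-", "C+", "B", "A-")
)

# Female mirrors Gender (1 = female, 0 = male); unknown gender stays NA.
students$Female <- 1 - students$Gender

# CourseSuccess = 1 for a grade of B or better, 0 otherwise.
# Assumption: B- counts as below B.
b_or_better <- c("A+", "A", "A-", "B+", "B")
students$CourseSuccess <- as.integer(students$MathGrade %in% b_or_better)

students
```

Computing Female as `1 - Gender` keeps missing gender as `NA`, so those students are still dropped by `lm()` rather than silently counted as female.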
The figure below shows the PSAT Math scores and SAT Math scores of the high school students. There appears to be a linear relationship between PSAT Math scores and SAT Math scores.
The figure below shows PSAT Math score and SAT Math score by success in the high school math course. Course success corresponds to the grade the student received in the math course during high school: 1 = B or better, 0 = below B. The plot shows that students who received a B or better (shown in blue) and who also scored higher on the PSAT Math exam tend to have higher SAT scores as well.
In the figure below, we use another standardized college admission test, the ACT. ACT scores range from 1 to 36 on each section, and this data contains only the math section scores. Comparing PSAT and ACT math scores, we again see a linear relationship.
The figure below shows the same relationship as figure 3, but here the light blue and dark blue jittered points represent gender. The grey points represent cases where gender is missing. The figure suggests that male students tend to score higher on the ACT math section than female students. However, this conclusion is uncertain, because gender information is missing for many of the cases.
In the figure below, X = high school GPA and Y = SAT Math score. The coloured points represent gender: red = male, blue = female, and grey = missing gender information. As in figure 4, female test takers scored lower than their male counterparts. In addition, male test takers with a high GPA also scored higher on the SAT Math exam.
The figure below gives a more detailed picture of GPA versus SAT Math score; this time the colours represent the grades students received. The visualization is cluttered, but a pattern is visible: students who received an A+, A or A- scored higher on the SAT, and these students also have higher GPAs than the other students.
The figure below shows GPA and ACT math scores by gender, with coloured jittered points separating female and male respondents. It again shows a difference between male and female ACT scores: female students tend to have higher grade point averages but lower ACT math scores, whereas male students with lower GPAs still have relatively high ACT math scores.
Model 1 shows the association between PSAT Math scores and SAT Math scores. The R-squared shows that 11.02% of the variance in SAT Math score is explained by this regression model. The model can be interpreted as: “Each one-point increase in PSAT Math score is associated with a 0.19-point increase in SAT Math score.”
The intercept is 52.03, the predicted SAT Math score when the PSAT Math score is 0. The association is statistically significant (p < 0.001).
##
## Call:
## lm(formula = SATM ~ PSATM, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.9165 -4.2986 0.0835 4.5104 24.9729
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.02711 1.40841 36.940 < 2e-16 ***
## PSATM 0.19104 0.02327 8.208 1.64e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.103 on 544 degrees of freedom
## (2150 observations deleted due to missingness)
## Multiple R-squared: 0.1102, Adjusted R-squared: 0.1086
## F-statistic: 67.37 on 1 and 544 DF, p-value: 1.637e-15
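To make the Model 1 interpretation concrete, the sketch below plugs the reported coefficients into the fitted line. The PSAT values 50 and 60 are illustrative, not taken from the data.

```r
# Coefficients copied from the Model 1 output above.
b0 <- 52.02711  # intercept
b1 <- 0.19104   # PSATM slope

# Predicted SAT Math score on the fitted Model 1 line.
predict_m1 <- function(psatm) b0 + b1 * psatm

# Two illustrative PSAT Math scores, 10 points apart.
pred <- predict_m1(c(50, 60))
pred
diff(pred)  # equals 10 * b1
```

Because the model is linear, the predicted gap depends only on the difference in PSAT scores, not on where on the scale the two students sit.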
Model 2 shows the association between ACT Math score and PSAT Math score. It can be interpreted as: “Each one-point increase in PSAT Math score is associated with a 0.14-point increase in ACT Math score.” The intercept, the predicted ACT Math score when the PSAT Math score is 0, is 19.59. This model is also significant (p < 0.001), and PSAT Math score explains 16.7% of the variance in ACT Math score.
##
## Call:
## lm(formula = ACTM ~ PSATM, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3185 -2.0473 -0.0473 2.0882 16.4095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.590467 0.557988 35.11 <2e-16 ***
## PSATM 0.135579 0.009448 14.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.357 on 1027 degrees of freedom
## (1667 observations deleted due to missingness)
## Multiple R-squared: 0.167, Adjusted R-squared: 0.1662
## F-statistic: 205.9 on 1 and 1027 DF, p-value: < 2.2e-16
In Model 3, instead of PSAT score we use GPA as the independent variable, because the goal of this analysis is to find the better predictor of SAT Math score. The model can be interpreted as: “Each one-unit increase in GPA is associated with a 0.53-point increase in SAT Math score.” The intercept, the predicted SAT Math score when GPA equals 0, is 43.64. The model is significant (p < 0.001), with an R-squared of 10.42%, similar to the 11.02% for PSAT in Model 1.
##
## Call:
## lm(formula = SATM ~ GPAadj, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.7926 -4.3943 0.1396 4.9892 17.2752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.64011 1.59976 27.28 <2e-16 ***
## GPAadj 0.53390 0.04467 11.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.106 on 1228 degrees of freedom
## (1466 observations deleted due to missingness)
## Multiple R-squared: 0.1042, Adjusted R-squared: 0.1035
## F-statistic: 142.9 on 1 and 1228 DF, p-value: < 2.2e-16
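Models 1 and 3 are fit on different subsets of students (546 versus 1,230 observations), because `lm()` drops rows with missing values separately for each model, so their R-squared values are not strictly comparable. A sketch of a like-for-like comparison restricted to students with all three scores is below. It runs on simulated stand-in data, not the real data, and the helper name `compare_on_common_cases` is hypothetical.

```r
# Fit both single-predictor models on the SAME complete cases so that their
# R-squared values can be compared directly.
compare_on_common_cases <- function(df) {
  common <- df[complete.cases(df[, c("SATM", "PSATM", "GPAadj")]), ]
  r2 <- function(f) summary(lm(f, data = common))$r.squared
  c(n = nrow(common),
    r2_psat = r2(SATM ~ PSATM),
    r2_gpa  = r2(SATM ~ GPAadj))
}

# Simulated stand-in for mathtest (NOT the real data), with missing PSAT scores.
set.seed(1)
n <- 200
sim <- data.frame(PSATM = rnorm(n, 50, 10), GPAadj = rnorm(n, 35, 5))
sim$SATM <- 30 + 0.2 * sim$PSATM + 0.5 * sim$GPAadj + rnorm(n, 0, 6)
sim$PSATM[sample(n, 60)] <- NA

res <- compare_on_common_cases(sim)
res
```

With the real data, `compare_on_common_cases(mathtest)` would restrict both models to the students who have SAT, PSAT and GPA values.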
In Model 4, we use both variables together, so the model has two independent variables, PSAT and GPA, predicting SAT Math score. It can be interpreted as: “Each one-point increase in PSAT score is associated with a 0.16-point increase in SAT Math score, holding GPA constant”; similarly for GPA: “Each one-unit increase in GPA is associated with a 0.65-point increase in SAT Math score, holding PSAT score constant.” The intercept, the predicted SAT Math score when both PSAT and GPA are 0, is 30.11. Together, the two predictors explain 21.85% of the variance in SAT Math score, about twice as much as either predictor alone.
##
## Call:
## lm(formula = SATM ~ PSATM + GPAadj, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.8637 -4.1657 -0.0063 3.8915 22.2199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.11408 2.85295 10.555 < 2e-16 ***
## PSATM 0.16275 0.02209 7.367 6.56e-13 ***
## GPAadj 0.64760 0.07469 8.671 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.668 on 542 degrees of freedom
## (2151 observations deleted due to missingness)
## Multiple R-squared: 0.2185, Adjusted R-squared: 0.2157
## F-statistic: 75.79 on 2 and 542 DF, p-value: < 2.2e-16
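The “held constant” interpretation of Model 4 can be checked by changing one predictor while fixing the other. This sketch uses the reported coefficients; the starting values 50 and 30 are illustrative only.

```r
# Coefficients copied from the Model 4 output above.
coef_m4 <- c(intercept = 30.11408, PSATM = 0.16275, GPAadj = 0.64760)

predict_m4 <- function(psatm, gpaadj) {
  coef_m4[["intercept"]] + coef_m4[["PSATM"]] * psatm + coef_m4[["GPAadj"]] * gpaadj
}

# Change one predictor by 10 units while holding the other fixed
# (the starting values 50 and 30 are illustrative only).
d_psat <- predict_m4(60, 30) - predict_m4(50, 30)  # 10 * 0.16275
d_gpa  <- predict_m4(50, 40) - predict_m4(50, 30)  # 10 * 0.64760
c(d_psat = d_psat, d_gpa = d_gpa)
```

The two slopes are in different units (PSAT points versus GPA units), so the larger GPA coefficient alone does not show that GPA is the stronger predictor.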
Model 5 adds interaction terms between gender (1 = male, 0 = female) and each predictor. The R-squared shows that 62.24% of the variance in SAT Math score is explained by this model, although it is based on only 138 students because 2,558 observations were dropped for missing values. The key findings are as follows. First, the Gender coefficient (-1.38) is the male-minus-female difference in predicted SAT score when PSAT and GPA are both 0, and it is not significant (p = 0.88). Second, the PSATM:Gender interaction (0.28, p = 0.021) means the PSAT slope is 0.28 points larger for male students: each one-point increase in PSAT Math score is associated with a 0.58-point increase in SAT Math score for female students and a 0.58 + 0.28 = 0.87-point increase for male students, holding GPA constant. Finally, the Gender:GPAadj interaction (-0.40) suggests a flatter GPA slope for male students (0.46 - 0.40 = 0.06, versus 0.46 for female students), but this difference is not significant (p = 0.12).
##
## Call:
## lm(formula = SATM ~ PSATM * Gender + GPAadj * Gender, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.555 -3.019 -0.322 3.962 10.743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.45821 6.73601 1.404 0.1626
## PSATM 0.58265 0.07902 7.373 1.64e-11 ***
## Gender -1.38248 9.13678 -0.151 0.8800
## GPAadj 0.46019 0.19975 2.304 0.0228 *
## PSATM:Gender 0.28432 0.12184 2.334 0.0211 *
## Gender:GPAadj -0.39899 0.25832 -1.545 0.1248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.963 on 132 degrees of freedom
## (2558 observations deleted due to missingness)
## Multiple R-squared: 0.6224, Adjusted R-squared: 0.6081
## F-statistic: 43.52 on 5 and 132 DF, p-value: < 2.2e-16
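The group-specific intercepts and slopes implied by Model 5 can be computed directly from the reported coefficients: each interaction term is added to its main effect when Gender = 1. The male values this produces (8.08, 0.87 and 0.06) are the intercept, PSATM and GPAadj estimates reported for Model 6, which is a useful check on the interpretation.

```r
# Coefficients copied from the Model 5 output above (Gender: 1 = male, 0 = female).
b <- c(intercept = 9.45821, PSATM = 0.58265, Gender = -1.38248,
       GPAadj = 0.46019, PSATM_Gender = 0.28432, Gender_GPAadj = -0.39899)

# Group-specific lines: the interaction is the male-minus-female difference
# in slope, so it is added to the main effect only when Gender = 1.
slopes <- data.frame(
  group     = c("female", "male"),
  intercept = b[["intercept"]] + c(0, 1) * b[["Gender"]],
  psat      = b[["PSATM"]]     + c(0, 1) * b[["PSATM_Gender"]],
  gpa       = b[["GPAadj"]]    + c(0, 1) * b[["Gender_GPAadj"]]
)
slopes
```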
In Model 6, instead of the Gender variable (1 = male, 0 = female), we use Female, a binary variable with 1 = female and 0 = male. This was done to double-check whether there is a significant effect on SAT Math score for female students. Because Female is simply 1 - Gender, this is the same model with a different reference group: the residuals, R-squared (62.24%) and F-statistic are identical to Model 5, and each Female coefficient is the negative of the corresponding Gender coefficient. The main effects now describe male students (the Female = 0 group). For example: “Each one-point increase in PSAT Math score is associated with a 0.87-point increase in SAT Math score for male students, holding the other variables constant,” which matches 0.58 + 0.28 from Model 5. Since the two models are the same fit, no further interpretation is required.
##
## Call:
## lm(formula = SATM ~ PSATM * Female + GPAadj * Female, data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.555 -3.019 -0.322 3.962 10.743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.07573 6.17308 1.308 0.1931
## PSATM 0.86697 0.09274 9.348 2.98e-16 ***
## Female 1.38248 9.13678 0.151 0.8800
## GPAadj 0.06120 0.16379 0.374 0.7093
## PSATM:Female -0.28432 0.12184 -2.334 0.0211 *
## Female:GPAadj 0.39899 0.25832 1.545 0.1248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.963 on 132 degrees of freedom
## (2558 observations deleted due to missingness)
## Multiple R-squared: 0.6224, Adjusted R-squared: 0.6081
## F-statistic: 43.52 on 5 and 132 DF, p-value: < 2.2e-16
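The claim that Models 5 and 6 are the same fit can be demonstrated on simulated data. The sketch below generates a hypothetical stand-in data set (not the real mathtest data), fits both parameterizations with `lm()`, and compares them.

```r
# Simulated data (NOT the real mathtest data) to show that recoding
# Gender into Female = 1 - Gender gives the same fitted model.
set.seed(42)
n <- 150
sim <- data.frame(
  PSATM  = rnorm(n, 50, 10),
  GPAadj = rnorm(n, 35, 5),
  Gender = rbinom(n, 1, 0.5)
)
sim$SATM   <- 10 + 0.6 * sim$PSATM + 0.4 * sim$GPAadj + rnorm(n, 0, 5)
sim$Female <- 1 - sim$Gender

m_gender <- lm(SATM ~ PSATM * Gender + GPAadj * Gender, data = sim)
m_female <- lm(SATM ~ PSATM * Female + GPAadj * Female, data = sim)

# Same fitted values and the same R-squared ...
all.equal(fitted(m_gender), fitted(m_female))
c(summary(m_gender)$r.squared, summary(m_female)$r.squared)

# ... and the PSATM slope in the Female model (male baseline) equals the
# female slope plus the interaction from the Gender model.
unname(coef(m_female)["PSATM"])
unname(coef(m_gender)["PSATM"] + coef(m_gender)["PSATM:Gender"])
```

Both design matrices span the same columns, which is why only the labels and reference group of the coefficients change.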
Model 7 uses another interaction variable, CourseSuccess, which indicates the grade the student received in the high school math course. The R-squared for this model is 33.64%, meaning that 33.64% of the variance in SAT Math score is explained by the model. The model can be interpreted through the CourseSuccess variable and its interactions with GPA and PSAT score. First, CourseSuccess itself: “Receiving a B or better in the high school math course is associated with a 10.82-point higher SAT Math score when PSAT and GPA are 0”; however, this association is not significant (p = 0.11). Another finding, which is significant (p < 0.001) but contrary to the literature, concerns the interaction between PSAT score and course success: the PSAT slope is 0.35 points smaller for students with a B or better, so each one-point increase in PSAT Math score is associated with a 0.53-point increase in SAT Math score for students below B but only a 0.53 - 0.35 = 0.18-point increase for students with a B or better. The last interaction, between GPA and course success, is also significant (p = 0.040): “Each one-unit increase in GPA is associated with an additional 0.36-point increase in SAT Math score if the student received a B or better in the high school math course,” that is, a GPA slope of 0.20 + 0.36 = 0.56 versus 0.20 for students below B, holding the other variables constant.
##
## Call:
## lm(formula = SATM ~ PSATM * CourseSuccess + GPAadj * CourseSuccess,
## data = mathtest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.8953 -3.8604 0.2053 3.7909 30.3989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.16613 5.23081 4.238 2.75e-05 ***
## PSATM 0.52795 0.06607 7.991 1.18e-14 ***
## CourseSuccess 10.81858 6.69203 1.617 0.1067
## GPAadj 0.20129 0.13573 1.483 0.1388
## PSATM:CourseSuccess -0.34559 0.07227 -4.782 2.37e-06 ***
## CourseSuccess:GPAadj 0.35702 0.17340 2.059 0.0401 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.407 on 441 degrees of freedom
## (2249 observations deleted due to missingness)
## Multiple R-squared: 0.3364, Adjusted R-squared: 0.3288
## F-statistic: 44.7 on 5 and 441 DF, p-value: < 2.2e-16
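The slopes for each course-success group in Model 7 follow the same rule as in Model 5: main effect plus interaction when CourseSuccess = 1. A short sketch using the reported coefficients:

```r
# Coefficients copied from the Model 7 output above.
b7 <- c(PSATM = 0.52795, GPAadj = 0.20129,
        PSATM_CS = -0.34559, CS_GPAadj = 0.35702)

# Slope = main effect + interaction * CourseSuccess (0 = below B, 1 = B or better).
cs <- c(below_B = 0, B_or_better = 1)
psat_slope <- b7[["PSATM"]]  + cs * b7[["PSATM_CS"]]
gpa_slope  <- b7[["GPAadj"]] + cs * b7[["CS_GPAadj"]]
rbind(psat_slope, gpa_slope)
```

For students with a B or better, the GPA slope (about 0.56) is roughly three times the PSAT slope (about 0.18), while the opposite holds for students below B.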
In conclusion, PSAT scores alone are weak predictors of SAT Math scores: Model 1 explains only 11% of the variance. Models 3 and 7 show that a high GPA, together with performing well in the high school math course, is associated with a higher SAT Math score. Therefore, I would argue that high school GPA is the more reliable predictor: students with a high GPA, especially those who also earned a B or better in their high school math course, are more likely to score higher on the SAT Math exam.
```r
library(texreg)  # provides htmlreg()
htmlreg(list(m1, m2, m3, m4, m5, m6, m7))
```
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| (Intercept) | 52.03*** | 19.59*** | 43.64*** | 30.11*** | 9.46 | 8.08 | 22.17*** |
| | (1.41) | (0.56) | (1.60) | (2.85) | (6.74) | (6.17) | (5.23) |
| PSATM | 0.19*** | 0.14*** | | 0.16*** | 0.58*** | 0.87*** | 0.53*** |
| | (0.02) | (0.01) | | (0.02) | (0.08) | (0.09) | (0.07) |
| GPAadj | | | 0.53*** | 0.65*** | 0.46* | 0.06 | 0.20 |
| | | | (0.04) | (0.07) | (0.20) | (0.16) | (0.14) |
| Gender | | | | | -1.38 | | |
| | | | | | (9.14) | | |
| PSATM:Gender | | | | | 0.28* | | |
| | | | | | (0.12) | | |
| Gender:GPAadj | | | | | -0.40 | | |
| | | | | | (0.26) | | |
| Female | | | | | | 1.38 | |
| | | | | | | (9.14) | |
| PSATM:Female | | | | | | -0.28* | |
| | | | | | | (0.12) | |
| Female:GPAadj | | | | | | 0.40 | |
| | | | | | | (0.26) | |
| CourseSuccess | | | | | | | 10.82 |
| | | | | | | | (6.69) |
| PSATM:CourseSuccess | | | | | | | -0.35*** |
| | | | | | | | (0.07) |
| CourseSuccess:GPAadj | | | | | | | 0.36* |
| | | | | | | | (0.17) |
| R2 | 0.11 | 0.17 | 0.10 | 0.22 | 0.62 | 0.62 | 0.34 |
| Adj. R2 | 0.11 | 0.17 | 0.10 | 0.22 | 0.61 | 0.61 | 0.33 |
| Num. obs. | 546 | 1029 | 1230 | 545 | 138 | 138 | 447 |
| RMSE | 7.10 | 3.36 | 7.11 | 6.67 | 4.96 | 4.96 | 6.41 |

*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.
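The objects `m1` to `m7` passed to `htmlreg()` are not defined in the code shown. A sketch that rebuilds them from the formulas in the `Call:` lines above follows. The helper names `model_formulas` and `fit_all` are hypothetical, and the demo runs on simulated stand-in data rather than the real data.

```r
# Model formulas copied from the "Call:" lines of the outputs above.
model_formulas <- list(
  m1 = SATM ~ PSATM,
  m2 = ACTM ~ PSATM,
  m3 = SATM ~ GPAadj,
  m4 = SATM ~ PSATM + GPAadj,
  m5 = SATM ~ PSATM * Gender + GPAadj * Gender,
  m6 = SATM ~ PSATM * Female + GPAadj * Female,
  m7 = SATM ~ PSATM * CourseSuccess + GPAadj * CourseSuccess
)

# Fit every model on the same data frame; lm() drops missing rows per model.
fit_all <- function(df) lapply(model_formulas, function(f) lm(f, data = df))

# Simulated stand-in data (NOT the real mathtest data), just to show the call runs.
set.seed(7)
n <- 100
demo <- data.frame(PSATM = rnorm(n, 50, 10), GPAadj = rnorm(n, 35, 5),
                   Gender = rbinom(n, 1, 0.5), CourseSuccess = rbinom(n, 1, 0.6))
demo$Female <- 1 - demo$Gender
demo$SATM <- 20 + 0.3 * demo$PSATM + 0.4 * demo$GPAadj + rnorm(n, 0, 6)
demo$ACTM <- 15 + 0.1 * demo$PSATM + rnorm(n, 0, 3)

models <- fit_all(demo)
sapply(models, function(m) summary(m)$r.squared)

# With the real data and the texreg package loaded:
# htmlreg(unname(fit_all(mathtest)))
```

Keeping the formulas in one list makes it easy to refit all seven models on a different subset, for example the complete cases only.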