This is a case study to practice model-building techniques using logistic regression.
The data set contains several parameters that are considered important when applying to master's programs. There are 400 observations and 9 variables in total. This data set is from Kaggle (https://www.kaggle.com/datasets/mohansacharya/graduate-admissions), and the way the data were collected has not been specified. The data were also uploaded to GitHub (https://raw.githubusercontent.com/JackRoss10089/STA-321/main/Admission_Predict.csv). Many factors are considered when applying to graduate school. For this simple logistic regression, we focus on the relationship between research experience and undergraduate GPA. The parameters included are:
The practical question is how a graduate school applicant's research experience relates to their undergraduate GPA.
To begin the analysis, we first evaluate the variables in the data set and choose which ones can be used to build the model.
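The following is a minimal sketch of the variable-selection step. It assumes Python and a header row that names the columns `CGPA` and `Research`, as in this report. The helper `load_cgpa_research` is a hypothetical name: it keeps only those two columns and checks that `Research` is coded 0/1. Header names are stripped of surrounding whitespace in case some column names carry trailing spaces. This is an assumption about the file, not something the report states.

```python
import csv
import io


def load_cgpa_research(csv_text):
    """Hypothetical helper: keep only CGPA (explanatory) and Research (binary response)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: some header names may carry stray spaces, so strip them before lookup
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    cgpa, research = [], []
    for row in reader:
        r = int(row["Research"])
        if r not in (0, 1):
            raise ValueError(f"Research must be coded 0/1, got {r}")
        cgpa.append(float(row["CGPA"]))
        research.append(r)
    return cgpa, research
```

To apply it to the GitHub copy of the data, read the file with `urllib.request.urlopen(url).read().decode("utf-8")` and pass the resulting text to the function.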
Now that the data contain only the binary response variable Research and the explanatory variable CGPA (undergraduate GPA), we can build the simple logistic regression model. First, we complete the exploratory data analysis by checking the distribution of CGPA for extreme skewness.
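A simple numeric companion to a histogram is the moment-based sample skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. The Python sketch below computes it. The cutoff of 1 for "extreme" skew is a common rule of thumb assumed here, not a threshold taken from this analysis. The helper names are illustrative.

```python
def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def needs_transformation(values, threshold=1.0):
    """Assumed rule of thumb: treat |skewness| above `threshold` as extreme."""
    return abs(sample_skewness(values)) > threshold
```

For example, `needs_transformation(cgpa)` could be applied to the CGPA column as a numeric check alongside the histogram.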
The histogram of CGPA shows no extreme skewness, so no transformation of CGPA is required, and we can proceed with building the model.
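The report does not show its fitting code. As a minimal sketch of what the fitting step computes, the Python code below finds the maximum-likelihood estimates of a simple logistic regression by Newton-Raphson. For the logit link this is equivalent to the iteratively reweighted least squares used by standard GLM software. The function `fit_simple_logistic` and its helpers are hypothetical names. The standard errors come from the inverse information matrix, which is what the Std. Error column reports for a GLM fit.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score_and_information(x, y, b0, b1):
    """Score U = X'(y - p) and information I = X'WX with W = diag(p(1 - p))."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_simple_logistic(x, y, tol=1e-10, max_iter=50):
    """Hypothetical helper: MLE of (b0, b1) and their standard errors.

    Assumes the data are not perfectly separated (otherwise no finite MLE exists).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: solve the 2x2 system I * delta = U
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard errors: square roots of the diagonal of I^{-1} at the final estimate
    _, (h00, h01, h11) = _score_and_information(x, y, b0, b1)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)
```

Calling `fit_simple_logistic(cgpa, research)` on the same 400 rows should give estimates and standard errors close to those in the table below, though that has not been verified here.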
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -19.881775 | 2.1977304 | -9.046503 | < 0.001 |
| CGPA | 2.342411 | 0.2567435 | 9.123544 | < 0.001 |
| | Estimate | Std. Error | z value | Pr(>\|z\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| (Intercept) | -19.881775 | 2.1977304 | -9.046503 | < 0.001 | -24.386295 | -15.753195 |
| CGPA | 2.342411 | 0.2567435 | 9.123544 | < 0.001 | 1.860313 | 2.868861 |
From the table above, CGPA is positively associated with having research experience: β1 = 2.342411 with a p-value close to 0, and the 95% confidence interval for β1, [1.860313, 2.868861], lies entirely above 0. This pattern is commonly observed in academic settings, as students who achieve a higher GPA tend to be more likely to pursue extracurricular research opportunities.
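The 95% interval for β1 in the table is not symmetric about the estimate. It extends about 0.482 below 2.342411 and about 0.526 above it, so it is most likely a profile-likelihood interval rather than the simpler Wald interval, estimate ± z·SE. For comparison, the Python sketch below computes the 95% Wald interval from the Estimate and Std. Error columns. It gives roughly [1.839, 2.846], which also excludes 0 and differs from the tabled interval by about 0.02 at each end. The function name `wald_interval` is illustrative.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Wald interval: estimate +/- z * SE, with z the standard normal quantile for `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return estimate - z * std_error, estimate + z * std_error


# CGPA coefficient and its standard error, taken from the fitted-model table
lo, hi = wald_interval(2.342411, 0.2567435)
print(f"95% Wald interval for beta1: [{lo:.3f}, {hi:.3f}]")
```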
A common way to interpret this model is through the odds ratio, obtained by exponentiating each estimated regression coefficient. Next, we convert the estimated regression coefficients to odds ratios.
| | Estimate | Std. Error | z value | Pr(>\|z\|) | Odds ratio |
|---|---|---|---|---|---|
| (Intercept) | -19.881775 | 2.1977304 | -9.046503 | < 0.001 | 2.32e-09 |
| CGPA | 2.342411 | 0.2567435 | 9.123544 | < 0.001 | 10.40629 |
The odds ratio associated with CGPA is about 10.406. Each one-unit increase in CGPA therefore multiplies the odds of having research experience by about 10.4, which is an increase in the odds of roughly 941%. Because the p-value is close to 0, this association between CGPA and research experience is statistically significant in this sample.
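The arithmetic behind the odds-ratio interpretation is short. The odds ratio is exp(β1), and the percentage change in the odds per one-unit increase in CGPA is 100·(exp(β1) − 1). Because exp is increasing, exponentiating the endpoints of the 95% interval for β1 also gives a 95% interval for the odds ratio. The sketch below is a minimal Python version using the estimates reported in the tables.

```python
import math

BETA1 = 2.342411                  # CGPA coefficient from the fitted model
CI_BETA1 = (1.860313, 2.868861)   # 95% interval for beta1 from the table

odds_ratio = math.exp(BETA1)                          # about 10.406
pct_change_in_odds = 100 * (odds_ratio - 1)           # about 940.6 (percent)
ci_odds_ratio = tuple(math.exp(b) for b in CI_BETA1)  # about (6.43, 17.62)

print(f"odds ratio: {odds_ratio:.3f}")
print(f"change in odds per unit CGPA: {pct_change_in_odds:.1f}%")
print(f"95% CI for odds ratio: ({ci_odds_ratio[0]:.2f}, {ci_odds_ratio[1]:.2f})")
```

The odds ratio describes a multiplicative change in odds, not in probability. The change in probability per unit of CGPA depends on where along the CGPA scale the increase happens.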
Next, we generate the fitted probability curve and the rate of change in the success probability (the probability of having research experience).
The left-hand plot in the figure above is the standard S-curve, showing how the probability of having research experience increases as CGPA increases. Plotting the rate of change of this probability with respect to CGPA gives the curve on the right-hand side. The rate of change increases while CGPA is below about 8.5 and decreases once CGPA exceeds 8.5, so the turning point is at approximately CGPA = 8.5. This is the CGPA at which the fitted probability equals 0.5, namely CGPA = −β0/β1 = 19.881775/2.342411 ≈ 8.49.
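The shape of the right-hand curve follows from the logistic model itself. With p(x) = 1 / (1 + exp(−(β0 + β1x))), the derivative is dp/dx = β1·p(x)·(1 − p(x)). This is largest where p(x) = 0.5, that is, at x = −β0/β1. The sketch below is a short Python version using the fitted coefficients from the table. The function names are illustrative, not taken from the original analysis.

```python
import math

B0, B1 = -19.881775, 2.342411  # intercept and CGPA slope from the fitted model


def prob_research(cgpa):
    """Fitted probability of having research experience at a given CGPA."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * cgpa)))


def rate_of_change(cgpa):
    """Derivative dp/dCGPA = B1 * p * (1 - p) of the fitted probability curve."""
    p = prob_research(cgpa)
    return B1 * p * (1.0 - p)


turning_point = -B0 / B1  # CGPA where p = 0.5 and the slope is largest
max_slope = B1 / 4.0      # value of the derivative at the turning point

for cgpa in (7.5, 8.0, turning_point, 9.0, 9.5):
    print(f"CGPA {cgpa:.2f}: p = {prob_research(cgpa):.3f}, "
          f"dp/dCGPA = {rate_of_change(cgpa):.3f}")
```

With these estimates the turning point is about 8.49, and the largest rate of change is β1/4 ≈ 0.586 per unit of CGPA.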
Overall, this case study models real-world data on graduate school applicants. In particular, the model assesses the relationship between research experience and undergraduate GPA. It shows a statistically significant positive association between these two variables: each one-unit increase in GPA multiplies the odds of having research experience by about 10.4. This is a statement about the odds of having research experience, not a claim that a student is 10 times more likely to have it.