I use a dataset from Kaggle that provides applicants' academic scores and their estimated chance of graduate admission to universities. The variables include Research Experience (yes/no), Academic Scores (GRE, TOEFL, etc.) and University Rating (out of 5). Today, I am including the following variables in the analysis: GRE Score, University Rating, Research Experience (0 or 1), and Chance of Admit (ranging from 0 to 1: the estimated probability, expressed as a proportion, that a person is admitted to the graduate program based on his or her academic record). I would like to examine whether having research experience significantly increases one's chances of graduate admission. My working hypothesis is: applicants with research experience have a higher probability of admission than those without research experience. By constructing three models, I will further examine the effect of research experience, along with other factors, on the chances of graduate admission.
First, I downloaded the data and imported it into R.
Grad_ad <- read.csv ("C:/Users/Marcy/Documents/soc 712/Grad_ad.csv")
head (Grad_ad)
Model 1: This model looks at the relationship between the chance of graduate admission and research experience.
mod1 <- glm(Chance.of.Admit ~ Research , family = "binomial", data = Grad_ad)
non-integer #successes in a binomial glm!
summary (mod1)
Call:
glm(formula = Chance.of.Admit ~ Research, family = "binomial",
data = Grad_ad)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.92754 -0.15370 0.02479 0.20781 0.58073
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.5533 0.1400 3.951 7.77e-05 ***
Research 0.7714 0.2028 3.803 0.000143 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 52.519 on 499 degrees of freedom
Residual deviance: 37.827 on 498 degrees of freedom
AIC: 391.41
Number of Fisher Scoring iterations: 4
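The coefficients are on the log-odds (logit) scale, so they have to be transformed before they can be read as probabilities. The predicted chance of admission without research experience is plogis(0.5533) ≈ 0.63; with research experience it rises to plogis(0.5533 + 0.7714) ≈ 0.79. Equivalently, research experience multiplies the odds of admission by exp(0.7714) ≈ 2.16.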
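To make the conversion from log-odds to probabilities concrete, here is a minimal sketch that uses the rounded Model 1 estimates copied from the summary output. The values are hard-coded so the snippet runs without the data file, and the names `b0` and `b1` are chosen here only for illustration.

```r
# Rounded Model 1 estimates copied from the summary output (log-odds scale)
b0 <- 0.5533  # intercept: applicants without research experience
b1 <- 0.7714  # Research: shift in log-odds for applicants with research

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_no_research <- plogis(b0)
p_research    <- plogis(b0 + b1)

# exp() of a logit coefficient gives an odds ratio
odds_ratio <- exp(b1)

round(c(no_research = p_no_research,
        research    = p_research,
        odds_ratio  = odds_ratio), 2)
```

With the fitted model in memory, `predict(mod1, newdata = data.frame(Research = c(0, 1)), type = "response")` should give the same two probabilities without rounding error.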
Model 2 examines the relationship between the probability of admission and two predictors, research experience and University Rating (a rating that the university assigns to every applicant based on their academic record), to see whether each is significant.
mod2 <- glm(Chance.of.Admit ~ Research + University.Rating, family = "binomial", data = Grad_ad)
non-integer #successes in a binomial glm!
summary (mod2)
Call:
glm(formula = Chance.of.Admit ~ Research + University.Rating,
family = "binomial", data = Grad_ad)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.70020 -0.10693 0.04256 0.16499 0.57909
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3736 0.2898 -1.289 0.197376
Research 0.4373 0.2227 1.963 0.049601 *
University.Rating 0.3685 0.1024 3.597 0.000322 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 52.519 on 499 degrees of freedom
Residual deviance: 24.414 on 497 degrees of freedom
AIC: 380.95
Number of Fisher Scoring iterations: 4
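The coefficients are again log-odds, not probabilities. The intercept (-0.37, not significant) is the log-odds of admission for an applicant with no research experience and a university rating of 0, which corresponds to a probability of about 0.41 rather than a negative chance. Holding university rating constant, research experience multiplies the odds of admission by exp(0.4373) ≈ 1.55. Holding research experience constant, each additional point of university rating multiplies the odds by exp(0.3685) ≈ 1.45. Both effects are statistically significant, although the research effect is smaller than in Model 1 and only marginally significant (p ≈ 0.05).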
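As a rough illustration of how the Model 2 coefficients translate into odds ratios and predicted chances, the sketch below again hard-codes the rounded estimates from the summary output. The 1 to 5 range used for the rating grid is an assumption made for the example, and `b` and `grid` are illustrative names.

```r
# Rounded Model 2 estimates copied from the summary output (log-odds scale)
b <- c(intercept = -0.3736, research = 0.4373, rating = 0.3685)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(b[c("research", "rating")])

# Predicted chance of admission for each Research x University.Rating
# combination (rating grid 1..5 is an assumption for illustration)
grid <- expand.grid(Research = c(0, 1), University.Rating = 1:5)
grid$predicted <- plogis(b[["intercept"]] +
                         b[["research"]] * grid$Research +
                         b[["rating"]]   * grid$University.Rating)

round(odds_ratios, 2)
grid
```

Reading the grid row by row shows the two effects adding up on the log-odds scale: within each rating, the row with `Research = 1` has the higher predicted chance.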
Model 3 adds GRE score and an interaction between research experience and university rating.
mod3 <- glm(Chance.of.Admit ~ GRE.Score + Research * University.Rating, family = "binomial", data = Grad_ad)
non-integer #successes in a binomial glm!
summary (mod3)
Call:
glm(formula = Chance.of.Admit ~ GRE.Score + Research * University.Rating,
family = "binomial", data = Grad_ad)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.64438 -0.09417 0.02689 0.12391 0.38687
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.89072 3.93440 -2.768 0.00564 **
GRE.Score 0.03622 0.01301 2.784 0.00538 **
Research -0.37918 0.62619 -0.606 0.54482
University.Rating 0.10143 0.15605 0.650 0.51569
Research:University.Rating 0.18937 0.20818 0.910 0.36302
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 52.519 on 499 degrees of freedom
Residual deviance: 14.880 on 495 degrees of freedom
AIC: 372.43
Number of Fisher Scoring iterations: 5
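Each of the three glm() calls above prints "non-integer #successes in a binomial glm!" because the outcome is a proportion between 0 and 1 rather than a count of successes. One common way to fit the same mean model without that warning is the quasibinomial family. This is not what the analysis here does; the sketch below only illustrates the idea on synthetic data (the `sim` data frame is made up and is not the Kaggle file).

```r
# Synthetic stand-in for Grad_ad (NOT the Kaggle data): a 0/1 predictor
# and a continuous outcome strictly between 0 and 1
set.seed(712)
n <- 200
sim <- data.frame(Research = rbinom(n, 1, 0.5))
sim$Chance.of.Admit <- plogis(0.5 + 0.8 * sim$Research + rnorm(n, sd = 0.5))

# Same logit mean model, but the dispersion is estimated instead of fixed at 1
fit_quasi <- glm(Chance.of.Admit ~ Research, family = quasibinomial, data = sim)

summary(fit_quasi)$dispersion  # estimated dispersion parameter
coef(fit_quasi)                # coefficients are still on the log-odds scale
```

The trade-off is that quasi families do not define a likelihood, so AIC, BIC and a likelihood ratio test are not available for such a fit; nested models would instead be compared with something like `anova(..., test = "F")`.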
To see which model fits best for examining my hypothesis, I run an analysis of deviance (anova) and a likelihood ratio test (lmtest).
anova(mod1, mod2, mod3, test= "Chisq")
Analysis of Deviance Table
Model 1: Chance.of.Admit ~ Research
Model 2: Chance.of.Admit ~ Research + University.Rating
Model 3: Chance.of.Admit ~ GRE.Score + Research * University.Rating
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 498 37.827
2 497 24.414 1 13.4122 0.000250 ***
3 495 14.880 2 9.5339 0.008506 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Likelihood Ratio Test
library(lmtest)
lmtest::lrtest(mod1, mod2, mod3)
Likelihood ratio test
Model 1: Chance.of.Admit ~ Research
Model 2: Chance.of.Admit ~ Research + University.Rating
Model 3: Chance.of.Admit ~ GRE.Score + Research * University.Rating
#Df LogLik Df Chisq Pr(>Chisq)
1 2 -193.71
2 3 -187.47 1 12.466 0.0004144 ***
3 5 -181.22 2 12.512 0.0019193 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
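The likelihood ratio statistics in this table can be reproduced by hand from the log-likelihoods. The sketch below uses the rounded values printed above, so the results differ from the table in the second decimal place; the vector names are illustrative.

```r
# Log-likelihoods and parameter counts copied from the lrtest() output
loglik <- c(mod1 = -193.71, mod2 = -187.47, mod3 = -181.22)
n_par  <- c(mod1 = 2, mod2 = 3, mod3 = 5)

# Test statistic: twice the gain in log-likelihood between successive models
chisq <- 2 * diff(loglik)
df    <- diff(n_par)

# Upper-tail chi-squared probability
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

data.frame(comparison = c("mod1 vs mod2", "mod2 vs mod3"),
           chisq = round(chisq, 2), df = df, p_value = signif(p_value, 3))
```

Both p-values fall well below 0.01, which matches the significance stars reported by lrtest().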
Models 2 and 3 each fit significantly better than the model before them. To choose the best-fitting model, I compare AIC and BIC.
library (texreg)
htmlreg(list(mod1,mod2,mod3), doctype=FALSE)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 0.55*** | -0.37 | -10.89** |
| | (0.14) | (0.29) | (3.93) |
| Research | 0.77*** | 0.44* | -0.38 |
| | (0.20) | (0.22) | (0.63) |
| University.Rating | | 0.37*** | 0.10 |
| | | (0.10) | (0.16) |
| GRE.Score | | | 0.04** |
| | | | (0.01) |
| Research:University.Rating | | | 0.19 |
| | | | (0.21) |
| AIC | 391.41 | 380.95 | 372.43 |
| BIC | 399.84 | 393.59 | 393.51 |
| Log Likelihood | -193.71 | -187.47 | -181.22 |
| Deviance | 37.83 | 24.41 | 14.88 |
| Num. obs. | 500 | 500 | 500 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | | | |
Lower AIC and BIC values indicate a better fit. Model 3 has the lowest AIC, and its BIC is essentially tied with Model 2's (393.51 vs. 393.59), so we are going to plot Model 3, which also provides more variables to examine.
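For reference, both criteria follow directly from the log-likelihood, the number of estimated coefficients, and the sample size. The sketch below recomputes them from the rounded values in the table, so the last digit can differ slightly; the variable names are illustrative.

```r
# Values copied from the regression table
loglik <- c(mod1 = -193.71, mod2 = -187.47, mod3 = -181.22)
k      <- c(mod1 = 2, mod2 = 3, mod3 = 5)   # estimated coefficients
n      <- 500                               # observations

# AIC penalises each parameter by 2, BIC by log(n)
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k

round(rbind(AIC = aic, BIC = bic), 2)
```

Because log(500) is about 6.2, BIC charges roughly three times as much per extra coefficient as AIC, which is why Model 3's advantage nearly disappears under BIC.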
library(visreg)
Plotting the effect of research experience on the probability of admission at different GRE scores of the applicants.
visreg(mod3, "Research", by ='GRE.Score', scale="response")
If all applicants are sorted into groups with lower, average, and higher GRE scores (as the graph shows), having research experience slightly increases the predicted chance of graduate admission in all three groups. The effect is also stronger for applicants with lower GRE scores.
Now, plotting the effect of University Rating by GRE score.
visreg(mod3, "University.Rating", by ='GRE.Score', scale="response")
Regardless of the applicant's GRE score, higher university ratings are associated with a markedly higher predicted probability of admission. It is a strong positive relationship.
Now, plotting the effect of research experience by university rating.
visreg(mod3, "Research", by ='University.Rating', scale="response")
Thus, to conclude, research experience is related to the probability of admission to graduate school, but ONLY when another factor, University Rating, is taken into account. Research experience slightly increases the chance of admission when the university rating on an applicant's record is 3 or higher. It has essentially no impact on the chance of admission when the university rating is 2 or below. Moreover, when the university rating is 4 or higher, research experience increases the predicted probability of admission more noticeably. One caveat: in Model 3 neither the Research coefficient (p = 0.54) nor the Research:University.Rating interaction (p = 0.36) is statistically significant, so these differences describe patterns in the fitted values rather than firmly established effects.
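The way the research effect changes with university rating can be read off the Model 3 coefficients: on the log-odds scale it is the Research main effect plus the interaction coefficient times the rating. The sketch below works this out from the rounded estimates in the summary output. The 1 to 5 rating grid is an assumption, the names are illustrative, and the results are point estimates without any measure of uncertainty.

```r
# Rounded Model 3 estimates copied from the summary output (log-odds scale)
b_research    <- -0.37918  # Research main effect (at University.Rating = 0)
b_interaction <-  0.18937  # Research:University.Rating

rating <- 1:5  # assumed rating grid for illustration

# Effect of research experience on the log-odds of admission at each rating
research_effect <- b_research + b_interaction * rating

data.frame(University.Rating = rating,
           log_odds_shift    = round(research_effect, 2),
           odds_ratio        = round(exp(research_effect), 2))
```

The shift is close to zero at a rating of 2 and grows with each additional rating point, which is the pattern described in the conclusion.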