Graduate Admission Prediction

Analysis of Factors Contributing to Successful Admission

Wei Zhang s3759607

Last updated: 01 June, 2019

Introduction

This dataset was analysed to identify factors that contribute to successful graduate admission, and so to give undergraduate students guidance on preparing for a graduate program.

Aim

Variables

Method

Data

library(readr)  # read_csv()
library(dplyr)  # %>%, filter(), group_by(), summarise()
admission <- read_csv("Admission_Predict_Ver1.1.csv")
admission$Research <- factor(admission$Research, levels = c(1, 0), labels = c("Yes", "No"))

This dataset is from https://www.kaggle.com/mohansacharya/graduate-admissions

It is openly available under a Creative Commons licence (CC0: Public Domain).

This dataset has a mixture of qualitative and quantitative variables. This analysis uses three of them: Research (qualitative), and Chance of Admit and CGPA (quantitative).

Descriptive Statistics and Visualisation

sum(is.na(admission$`Chance of Admit`))
## [1] 0
sum(is.na(admission$CGPA))
## [1] 0
boxplot(admission$`Chance of Admit`, main="Box Plot of Admission Chance", ylab="Chance of Admit")$out

## [1] 0.34 0.34
outlier <- filter(admission, `Chance of Admit`==0.34)
outlier
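The `$out` component of boxplot() returns the points drawn beyond the whiskers. The rule behind it can be sketched on made-up numbers, assuming the default whisker coefficient of 1.5. The vector `chance` below is illustrative and is not taken from the admissions dataset.

```r
# Illustrative values only; not taken from the admissions dataset.
chance <- c(0.10, 0.62, 0.65, 0.70, 0.72, 0.75, 0.78, 0.80, 0.85, 0.90)

# boxplot() uses Tukey's hinges (fivenum) as the box edges.
hinges <- fivenum(chance)  # min, lower hinge, median, upper hinge, max
box_height <- hinges[4] - hinges[2]

# Points more than 1.5 box heights beyond a hinge are flagged as outliers.
lower_fence <- hinges[2] - 1.5 * box_height
upper_fence <- hinges[4] + 1.5 * box_height
flagged <- chance[chance < lower_fence | chance > upper_fence]

# Same points as reported by boxplot.stats(), which boxplot() relies on.
flagged
boxplot.stats(chance)$out
```

A flagged point is only unusual relative to the bulk of the data; whether to keep it is a separate judgement.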

Descriptive Statistics Cont.

The table below summarises admission chance by research experience.

admission %>% group_by(Research) %>% summarise(Min = min(`Chance of Admit`, na.rm = TRUE),
Q1 = quantile(`Chance of Admit`, probs = 0.25, na.rm = TRUE),
Median = median(`Chance of Admit`, na.rm=TRUE), 
Q3 = quantile(`Chance of Admit`, probs = 0.75, na.rm = TRUE),
Max = max(`Chance of Admit`, na.rm = TRUE),
Mean = mean(`Chance of Admit`, na.rm = TRUE),
SD = sd(`Chance of Admit`, na.rm = TRUE),
n = n(), Missing = sum(is.na(`Chance of Admit`))) -> table1
knitr::kable(table1)
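|Research |  Min|     Q1| Median|     Q3|  Max|      Mean|        SD|   n| Missing|
|:--------|----:|------:|------:|------:|----:|---------:|---------:|---:|-------:|
|Yes      | 0.36| 0.7200|  0.800| 0.8925| 0.97| 0.7899643| 0.1232083| 280|       0|
|No       | 0.34| 0.5675|  0.645| 0.7100| 0.89| 0.6349091| 0.1119177| 220|       0|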
admission %>% boxplot(`Chance of Admit` ~ Research, data = ., ylab = "Chance of Admit")

Check Assumption

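Three assumptions need to be checked for the two-sample t-test: independence of observations, normality of admission chance within each group, and equal variances between the two groups.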

library(car)  # qqPlot(), leveneTest()
admission_research <- admission %>% filter(Research == "Yes")
admission_research$`Chance of Admit` %>% qqPlot(dist = "norm")

## [1] 37 18
admission_noresearch <- admission %>% filter(Research == "No")
admission_noresearch$`Chance of Admit` %>% qqPlot(dist = "norm") 

## [1]  37 171
leveneTest(`Chance of Admit` ~ Research, data = admission)
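By default, leveneTest() from the car package centres each group on its median. The idea can be sketched in base R as a one-way ANOVA on absolute deviations from the group medians. The data below are simulated (only the group sizes mirror the summary table), so the numbers do not reproduce the test on the admissions data.

```r
set.seed(1)  # simulated data only; not the admissions dataset

# Two hypothetical groups with similar spread.
sim <- data.frame(
  group = factor(rep(c("Yes", "No"), times = c(280, 220))),
  score = c(rnorm(280, mean = 0.79, sd = 0.12),
            rnorm(220, mean = 0.63, sd = 0.11))
)

# Step 1: absolute deviation of each score from its own group median.
sim$abs_dev <- abs(sim$score - ave(sim$score, sim$group, FUN = median))

# Step 2: one-way ANOVA on those deviations; its F-test is the Levene statistic.
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand

# A large p-value gives no evidence against equal variances.
p_value <- levene_by_hand[["Pr(>F)"]][1]
```

If the p-value were below 0.05, a Welch t-test (var.equal = FALSE) would be the usual alternative to the pooled-variance test.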

Hypothesis Testing 1

Hypothesis:

H0: μ1 - μ2 = 0 (the two groups have the same mean)

Ha: μ1 - μ2 > 0 (the group with research experience has a higher mean than the group without)

The two-sample t-test gives a one-tailed p-value below 0.05 (t = 14.539, df = 498, p < 2.2e-16), so H0 is rejected. Students with research experience have a statistically significantly higher mean admission chance than students without research experience.

t.test( `Chance of Admit` ~ Research, data = admission, var.equal = TRUE, alternative = "greater")
## 
##  Two Sample t-test
## 
## data:  Chance of Admit by Research
## t = 14.539, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.1374803       Inf
## sample estimates:
## mean in group Yes  mean in group No 
##         0.7899643         0.6349091
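As a cross-check, the pooled-variance t statistic can be recomputed by hand from the group means, standard deviations and sizes in the summary table by research experience. This is a sketch of the textbook formula, not a replacement for t.test(); the variable names are illustrative.

```r
# Summary statistics copied from the table by research experience.
mean_yes <- 0.7899643; sd_yes <- 0.1232083; n_yes <- 280
mean_no  <- 0.6349091; sd_no  <- 0.1119177; n_no  <- 220

# Pooled variance, as used when var.equal = TRUE.
deg_free <- n_yes + n_no - 2
pooled_var <- ((n_yes - 1) * sd_yes^2 + (n_no - 1) * sd_no^2) / deg_free

# Standard error of the difference in means, then the t statistic.
se_diff <- sqrt(pooled_var * (1 / n_yes + 1 / n_no))
t_stat <- (mean_yes - mean_no) / se_diff

# One-sided p-value and lower 95% confidence bound for alternative = "greater".
p_value <- pt(t_stat, deg_free, lower.tail = FALSE)
lower_bound <- (mean_yes - mean_no) - qt(0.95, deg_free) * se_diff

round(c(t = t_stat, df = deg_free, lower = lower_bound), 4)
```

The result should agree with the t.test() output above to within rounding of the inputs.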

Visualise the relationship

A scatter plot is used to visualise the relationship between admission chance and GPA. No outliers are apparent in the plot, which supports the earlier decision to keep the two observations flagged by the box plot. The scatter plot also suggests a possible linear relationship between admission chance and GPA. This will be tested later.

plot(`Chance of Admit` ~ CGPA, data=admission, ylab = "Admission Chance", xlab = "Undergraduate GPA", main = "Admission chance affected by GPA")
model <- lm(`Chance of Admit` ~ CGPA, data=admission)
summary(model)
## 
## Call:
## lm(formula = `Chance of Admit` ~ CGPA, data = admission)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.276592 -0.028169  0.006619  0.038483  0.176961 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.04434    0.04230  -24.69   <2e-16 ***
## CGPA         0.20592    0.00492   41.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06647 on 498 degrees of freedom
## Multiple R-squared:  0.7787, Adjusted R-squared:  0.7782 
## F-statistic:  1752 on 1 and 498 DF,  p-value: < 2.2e-16
abline(model, col= "red")

Hypothesis Testing Results

H0: The data do not fit the linear regression model

HA: The data fit the linear regression model

An F-test is performed. Based on the output, F = 1752 on 1 and 498 degrees of freedom with p-value < 2.2e-16, so H0 is rejected. It is reasonable to assume a linear relationship between the two variables.

            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.04434    0.04230  -24.69   <2e-16
CGPA         0.20592    0.00492   41.85   <2e-16

As per the above results, p < 0.05 for both the intercept and the slope. Therefore H0 (the coefficient equals zero) is rejected for both, and it is safe to conclude that the intercept and slope are non-zero.
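The coefficient tests can be retraced from the printed estimates and standard errors. The sketch below uses only numbers copied from the summary(model) output; the variable names are illustrative.

```r
# Values copied from the summary(model) coefficient table.
est <- c(intercept = -1.04434, slope = 0.20592)
se  <- c(intercept =  0.04230, slope = 0.00492)
resid_df <- 498

# Each t value is the estimate divided by its standard error.
t_val <- est / se

# Two-sided p-values for H0: coefficient = 0.
p_val <- 2 * pt(abs(t_val), resid_df, lower.tail = FALSE)

# With a single predictor, the overall F statistic equals the squared slope t value.
f_stat <- unname(t_val["slope"])^2

round(t_val, 2)
f_stat
```

This reproduces the reported t values and comes within rounding error of the reported F-statistic of 1752, which is why the F-test and the slope t-test reach the same conclusion in simple linear regression.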

Linear Regression Assumptions

The research design ensured independence between students.

Linearity is checked and confirmed as per previous slides.

Residuals fall close to the line in the “Normal Q-Q” graph.

The trend line is roughly flat at 0 in the “Residuals vs Fitted” graph. The red line is close to flat in the “Scale-Location” graph, and the variance of the square root of the standardised residuals is roughly constant across predicted values. In “Residuals vs Leverage”, no values fall outside the bands, so there are no influential cases.

plot(model)

Prediction & Correlation

The estimated linear regression model is: Chance of Admit = -1.044 + 0.206 * CGPA
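A minimal sketch of using this equation for prediction follows. The helper `predict_chance` and the CGPA values are hypothetical and chosen only for illustration; the coefficients are copied from the summary(model) output.

```r
# Coefficients copied from the summary(model) output.
intercept <- -1.04434
slope <- 0.20592

# Hypothetical helper: predicted admission chance for a given CGPA.
predict_chance <- function(cgpa) intercept + slope * cgpa

# Hypothetical CGPA values chosen for illustration.
cgpa_values <- c(8.0, 8.5, 9.0, 9.5)
data.frame(CGPA = cgpa_values,
           predicted = round(predict_chance(cgpa_values), 3))
```

With the fitted object available, predict(model, newdata = data.frame(CGPA = cgpa_values)) gives the same predictions without retyping the coefficients. Predictions are only meaningful within the range of CGPA values observed in the data.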

A hypothesis test for r has the following statistical hypotheses:

H0: r = 0

HA: r ≠ 0

R reports the correlation between admission chance and GPA as r = 0.88, with a p-value that prints as 0 (p < .001).

H0 is rejected. There was a statistically significant positive correlation between admission chance and GPA.

library(Hmisc)  # rcorr()
# Create a matrix of the variables to be correlated
bivariate <- as.matrix(dplyr::select(admission, `Chance of Admit`, CGPA))
rcorr(bivariate, type = "pearson")
##                 Chance of Admit CGPA
## Chance of Admit            1.00 0.88
## CGPA                       0.88 1.00
## 
## n= 500 
## 
## 
## P
##                 Chance of Admit CGPA
## Chance of Admit                  0  
## CGPA             0
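The p-value printed as 0 can be made concrete with the standard t-test for a Pearson correlation. The sketch below uses the rounded r and n from the rcorr() output, so its results are approximate.

```r
# Values copied from the rcorr() output.
r <- 0.88
n <- 500

# Test statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value; far smaller than the digits rcorr() prints, hence the 0.
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Squared correlation, comparable to Multiple R-squared in summary(model).
r_squared <- r^2

c(t = t_stat, p = p_value, r_squared = r_squared)
```

The rounded r = 0.88 gives a squared correlation of about 0.774, close to the Multiple R-squared of 0.7787 from the regression; the small gap comes from rounding r to two decimals.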

Discussion

Conclusion

Advantages and Limitations

Recommendations

References

Data Sourced from:

https://www.kaggle.com/mohansacharya/graduate-admissions

Mohan S Acharya, Asfia Armaan, Aneeta S Antony: A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science, 2019

Data licensed under CC0: Public Domain