Introduction

This dataset was analysed to identify factors that contribute to successful graduate admission, and thereby give undergraduate students ideas on how to work towards a graduate program.

Aim

Find out if research experience will improve the graduate admission chance
Find out the relationship between undergraduate GPA and admission chance

Variables

GRE Scores (out of 340)
TOEFL Scores (out of 120)
University Rating (out of 5)
Statement of Purpose (out of 5)
Letter of Recommendation Strength (out of 5)
Undergraduate GPA (out of 10)
Research Experience (either 0 or 1)
Chance of Admit (ranging from 0 to 1)

Method

An independent-samples t-test is used to compare the mean chance of admission between two groups: students with research experience and students without research experience.
Correlation and simple linear regression are used to examine the relationship between undergraduate GPA and chances of admission.

Data

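library(readr); library(dplyr)   # read_csv(), filter(), group_by(), summarise()
library(car); library(Hmisc)     # qqPlot(), leveneTest(); rcorr()
admission <- read_csv("Admission_Predict_Ver1.1.csv")
admission$Research <- factor(admission$Research, levels = c(1, 0), labels = c("Yes", "No"))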

This dataset is from https://www.kaggle.com/mohansacharya/graduate-admissions

It is openly available under a Creative Commons licence (CC0: Public Domain).

This dataset has a mixture of qualitative and quantitative variables. This analysis uses the following variables.

Undergraduate GPA: numeric out of 10
Research Experience: categorical, 0 or 1. 0 represents students with no research experience; 1 represents students with research experience
Chance of Admit: numeric, ranging from 0 to 1

Descriptive Statistics and Visualisation

There are no missing values for the variables Chance of Admit and CGPA.
A box plot is used to detect outliers in “Chance of Admit”. The output shows two outliers with the same value (Chance of Admit = 0.34). This value is not far below the lower fence of the box plot, and the two filtered rows look like genuine cases with a low admission chance rather than data errors. Therefore, I have decided to keep these values in the analysis.

sum(is.na(admission$`Chance of Admit`))

## [1] 0

sum(is.na(admission$CGPA))

## [1] 0

boxplot(admission$`Chance of Admit`, main="Box Plot of Admission Chance", ylab="Chance of Admit")$out

## [1] 0.34 0.34

outlier <- filter(admission, `Chance of Admit`==0.34)
outlier
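
The 1.5 × IQR rule behind the box plot can be sketched in base R as follows. The vector `toy` is made-up illustration data, not taken from the admissions file, and `flag_outliers` is a hypothetical helper name; it is meant to mirror what `boxplot.stats()` does with its default `coef = 1.5`.

```r
# Hypothetical helper: flag points lying more than coef * IQR beyond the hinges,
# which is the rule boxplot() applies by default.
flag_outliers <- function(x, coef = 1.5) {
  hinges <- fivenum(x)[c(2, 4)]            # lower and upper hinge
  spread <- diff(hinges)                   # hinge-based interquartile range
  lower_fence <- hinges[1] - coef * spread
  upper_fence <- hinges[2] + coef * spread
  x[x < lower_fence | x > upper_fence]
}

# Made-up values for illustration only (not the admissions data)
toy <- c(0.34, 0.62, 0.65, 0.70, 0.72, 0.75, 0.78, 0.80, 0.85, 0.90)

flag_outliers(toy)
boxplot.stats(toy)$out   # base R's own result, for comparison
```

Comparing the flagged value with the fences, rather than only with the rest of the data, is what supports a judgement such as "not too far from the lower fence".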

Descriptive Statistics Cont.

The table below summarises admission chance by research experience.

admission %>% group_by(Research) %>% summarise(Min = min(`Chance of Admit`, na.rm = TRUE),
Q1 = quantile(`Chance of Admit`, probs = 0.25, na.rm = TRUE),
Median = median(`Chance of Admit`, na.rm=TRUE), 
Q3 = quantile(`Chance of Admit`, probs = 0.75, na.rm = TRUE),
Max = max(`Chance of Admit`, na.rm = TRUE),
Mean = mean(`Chance of Admit`, na.rm = TRUE),
SD = sd(`Chance of Admit`, na.rm = TRUE),
n = n(), Missing = sum(is.na(`Chance of Admit`))) -> table1
knitr::kable(table1)

Research	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Yes	0.36	0.7200	0.800	0.8925	0.97	0.7899643	0.1232083	280	0
No	0.34	0.5675	0.645	0.7100	0.89	0.6349091	0.1119177	220	0

admission %>% boxplot(`Chance of Admit` ~ Research, data = ., ylab = "Chance of Admit")

Check Assumption

There are 3 assumptions to be checked

The two groups (with and without research experience) are independent of each other
According to the qqPlot results, the data points mostly fall within the blue confidence bands. In addition, both groups have large sample sizes (n = 280 and n = 220), so by the Central Limit Theorem the sampling distribution of the mean can be treated as approximately normal.
Homogeneity of variance is tested using Levene’s test. The p-value for Levene’s test of equal variance in admission chance between the two groups was p = 0.13 > 0.05. Therefore, we fail to reject H0 (H0: σ1^2 = σ2^2), and equal variances can be assumed.

admission_research <- admission %>% filter(Research == "Yes")
admission_research$`Chance of Admit` %>% qqPlot(dist="norm")

## [1] 37 18

admission_noresearch <- admission %>% filter(Research == "No")
admission_noresearch$`Chance of Admit` %>% qqPlot(dist = "norm")

## [1]  37 171

leveneTest(`Chance of Admit` ~ Research, data = admission)
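
As a rough illustration of what Levene’s test does, the sketch below runs a one-way ANOVA on the absolute deviations from each group’s median, which is the median-centred form that `car::leveneTest()` uses by default. The data are simulated and the helper name `levene_by_hand` is hypothetical; the group sizes, means and standard deviations only mimic those in the summary table.

```r
# Hypothetical helper: median-centred Levene test computed by hand
levene_by_hand <- function(y, group) {
  group <- factor(group)
  # absolute deviation of each observation from its own group's median
  abs_dev <- abs(y - ave(y, group, FUN = median))
  # a one-way ANOVA on those deviations gives the Levene F statistic
  anova(lm(abs_dev ~ group))
}

# Simulated data for illustration only (not the admissions data)
set.seed(1)
sim <- data.frame(
  chance   = c(rnorm(280, mean = 0.79, sd = 0.12),
               rnorm(220, mean = 0.63, sd = 0.11)),
  research = rep(c("Yes", "No"), times = c(280, 220))
)

res <- levene_by_hand(sim$chance, sim$research)
res
```

A large p-value in the `group` row means there is no evidence that the spread differs between groups, which is the condition for setting `var.equal = TRUE` in `t.test()`.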

Hypothesis Testing 1

Hypothesis:

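H0: μ1 - μ2 = 0 (the two groups have the same mean, where μ1 is the mean for students with research experience and μ2 the mean for students without)

Ha: μ1 - μ2 > 0 (the group with research experience has a higher mean than the group without)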

According to the two-sample t-test results, t(498) = 14.539 with a one-tailed p-value < 2.2e-16, which is below 0.05, so H0 is rejected. Students with research experience have a statistically significantly higher mean admission chance than students without research experience.

t.test( `Chance of Admit` ~ Research, data = admission, var.equal = TRUE, alternative = "greater")

## 
##  Two Sample t-test
## 
## data:  Chance of Admit by Research
## t = 14.539, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.1374803       Inf
## sample estimates:
## mean in group Yes  mean in group No 
##         0.7899643         0.6349091
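
The test statistic can be cross-checked from the group summaries alone. This is a minimal sketch that copies the means, standard deviations and group sizes from the descriptive table and applies the pooled-variance formula; the variable names are illustrative.

```r
# Summary values copied from the descriptive statistics table
mean_yes <- 0.7899643; sd_yes <- 0.1232083; n_yes <- 280
mean_no  <- 0.6349091; sd_no  <- 0.1119177; n_no  <- 220

df <- n_yes + n_no - 2

# Pooled variance (valid under the equal-variance assumption)
pooled_var <- ((n_yes - 1) * sd_yes^2 + (n_no - 1) * sd_no^2) / df

# Standard error of the difference in means
se_diff <- sqrt(pooled_var * (1 / n_yes + 1 / n_no))

t_stat      <- (mean_yes - mean_no) / se_diff
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- (mean_yes - mean_no) - qt(0.95, df) * se_diff

c(t = t_stat, df = df, ci_lower = ci_lower)
```

The values should agree with the `t.test()` output up to rounding of the inputs.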

Visualise the relationship

A scatter plot is used to visualise the relationship between admission chance and GPA. No additional outliers are apparent in the plot, which supports the earlier decision to keep the two low values. The scatter plot also suggests a linear relationship between admission chance and GPA. This will be tested later.

plot(`Chance of Admit` ~ CGPA, data=admission, ylab = "Admission Chance", xlab = "Undergraduate GPA", main = "Admission chance affected by GPA")
model <- lm(`Chance of Admit` ~ CGPA, data=admission)
summary(model)

## 
## Call:
## lm(formula = `Chance of Admit` ~ CGPA, data = admission)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.276592 -0.028169  0.006619  0.038483  0.176961 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.04434    0.04230  -24.69   <2e-16 ***
## CGPA         0.20592    0.00492   41.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06647 on 498 degrees of freedom
## Multiple R-squared:  0.7787, Adjusted R-squared:  0.7782 
## F-statistic:  1752 on 1 and 498 DF,  p-value: < 2.2e-16

abline(model, col= "red")

Hypothesis Testing Results

Linear Regression Overall Model

H0: The data do not fit the linear regression model
HA: The data fit the linear regression model

An F-test is performed. Based on the output, F(1, 498) = 1752 with p-value < 2.2e-16, so H0 is rejected. It’s safe to assume a linear relationship between the two variables.

Linear Regression - Testing Model Parameters

Intercept: H0: a = 0, Ha: a != 0
Slope: H0: b = 0, Ha: b != 0

Term	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	-1.04434	0.04230	-24.69	<2e-16
CGPA	0.20592	0.00492	41.85	<2e-16

As per the above results, p < 0.05 for both the intercept and the slope. Therefore, H0 is rejected for both, and it’s safe to assume the intercept and slope are non-zero.

Linear Regression Assumptions

Independence

The research design ensured independence among students.

Linearity

Linearity is checked and confirmed as per previous slides.

Normality of residuals

Residuals fall close to the line in the “Normal Q-Q” graph.

Homoscedasticity

The trend line is roughly flat at 0 in the “Residuals vs Fitted” graph. The red line is close to flat in the “Scale-Location” graph, and the spread of the square root of the standardised residuals is roughly consistent across predicted values. In “Residuals vs Leverage”, no values fall outside the bands, so there are no influential cases.

plot(model)

Prediction & Correlation

Prediction

The estimated linear regression model can be used for prediction: Chance of Admit = -1.044 + 0.206*CGPA
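
As a small worked example, the fitted equation can be wrapped in a function. The coefficients are copied from the `summary(model)` output; the function name `predict_chance` and the example CGPA values are illustrative assumptions.

```r
# Hypothetical helper using the coefficients from summary(model)
predict_chance <- function(cgpa, intercept = -1.04434, slope = 0.20592) {
  intercept + slope * cgpa
}

# Example CGPA values chosen for illustration
predict_chance(c(8.0, 9.0))

# The line is not bounded to [0, 1]: at the top of the scale it exceeds 1
predict_chance(10)
```

With the fitted object available, `predict(model, newdata = data.frame(CGPA = 9))` gives the same kind of prediction. Because a straight line can return values above 1 near the top of the CGPA scale, predictions at the extremes should be read with caution.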

Correlation

A hypothesis test for r has the following statistical hypotheses:

H0:r=0

HA:r!=0

R reports the correlation between admission chance and GPA as r = 0.88, with a p-value displayed as 0 (that is, p < .001).

H0 is rejected. There was a statistically significant positive correlation between admission chance and GPA.

bivariate<-as.matrix(dplyr::select(admission, `Chance of Admit`,CGPA)) #Create a matrix of the variables to be correlated
rcorr(bivariate, type = "pearson")

##                 Chance of Admit CGPA
## Chance of Admit            1.00 0.88
## CGPA                       0.88 1.00
## 
## n= 500 
## 
## 
## P
##                 Chance of Admit CGPA
## Chance of Admit                  0  
## CGPA             0
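
In simple linear regression the correlation and the regression output are tied together: r is the square root of R-squared (with the sign of the slope), and the t statistic for r equals the t statistic for the slope. The sketch below checks this using only numbers reported in the outputs; the variable names are illustrative, and small differences are expected because R-squared is rounded to four decimals.

```r
r_squared <- 0.7787   # Multiple R-squared from summary(model)
n         <- 500      # sample size reported by rcorr()

# Positive root, because the estimated slope is positive
r <- sqrt(r_squared)

# t statistic for H0: r = 0, with n - 2 degrees of freedom
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value, matching HA: r != 0
p_r <- 2 * pt(t_r, df = n - 2, lower.tail = FALSE)

round(c(r = r, t = t_r, t_squared = t_r^2), 3)
```

The resulting t is close to the slope’s t value of 41.85, and its square is close to the F statistic of 1752, so the correlation test and the regression tests are telling the same story.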

Discussion

Conclusion

Two-sample t-test: Students with research experience have a statistically significantly higher mean admission chance than students without research experience.
Correlation: There is a statistically significant positive correlation between admission chance and GPA. (r = 0.88)
Linear Regression: There is a linear relationship between admission chance and GPA. GPA can be used to make predictions for admission chance: Chance of Admit = -1.044 + 0.206*CGPA

Advantages and Limitations

Advantages: The sample size is large enough to support this analysis.
Limitations: It’s unclear from this data source how “Chance of Admit” was measured or collected. If it is a subjective measurement, the results could be biased. Also, universities with different ratings are likely to have different requirements and therefore different admission standards, which the tests used here do not account for. Lastly, university majors are not included in this dataset, although major could be an important factor in admission chance.

Recommendations

Directions for future investigation: A new variable such as University Major could be added to this dataset so that more case-specific conclusions can be drawn. ANOVA could be used to compare admission chance across universities or majors. Due to time limitations, I did not analyse every variable in the dataset; more complex analyses could be performed to explore the relationships among the other variables.

References

Data Sourced from:

https://www.kaggle.com/mohansacharya/graduate-admissions

Mohan S Acharya, Asfia Armaan, Aneeta S Antony: A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science, 2019

Data licensed under CC0: Public Domain

Graduate Admission Prediction

Analysis of Factors Contributing to Successful Admission
