Chances of Admission using Regression
#Let's first take a look at the data.
str(admission)
## 'data.frame': 400 obs. of 8 variables:
## $ GRE.Score : int 337 324 316 322 314 330 321 308 302 323 ...
## $ TOEFL.Score : int 118 107 104 110 103 115 109 101 102 108 ...
## $ University.Rating: int 4 4 3 3 2 5 3 2 1 3 ...
## $ SOP : num 4.5 4 3 3.5 2 4.5 3 3 2 3.5 ...
## $ LOR : num 4.5 4.5 3.5 2.5 3 3 4 4 1.5 3 ...
## $ CGPA : num 9.65 8.87 8 8.67 8.21 9.34 8.2 7.9 8 8.6 ...
## $ Research : int 1 1 1 1 0 1 1 0 0 0 ...
## $ Admit : num 0.92 0.76 0.72 0.8 0.65 0.9 0.75 0.68 0.5 0.45 ...
#As we can see, the variables are on different scales, so let's build a function that applies min-max normalization to every column.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
admission_norm <- as.data.frame(lapply(admission, normalize))
summary(admission_norm)
## GRE.Score TOEFL.Score University.Rating SOP
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.3600 1st Qu.:0.3929 1st Qu.:0.2500 1st Qu.:0.375
## Median :0.5400 Median :0.5357 Median :0.5000 Median :0.625
## Mean :0.5362 Mean :0.5504 Mean :0.5219 Mean :0.600
## 3rd Qu.:0.7000 3rd Qu.:0.7143 3rd Qu.:0.7500 3rd Qu.:0.750
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## LOR CGPA Research Admit
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.5000 1st Qu.:0.4391 1st Qu.:0.0000 1st Qu.:0.4762
## Median :0.6250 Median :0.5801 Median :1.0000 Median :0.6190
## Mean :0.6131 Mean :0.5766 Mean :0.5475 Mean :0.6101
## 3rd Qu.:0.7500 3rd Qu.:0.7252 3rd Qu.:1.0000 3rd Qu.:0.7778
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
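#Because Admit is normalized too, anything a model predicts from admission_norm is on the 0 to 1 min-max scale, not on the original chance-of-admit scale. The sketch below shows one way to map values back. It uses a toy vector holding the first five Admit values from the str() output, and denormalize() is a hypothetical helper name that is not part of the original analysis.

```r
# Same min-max function as in the text.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical inverse helper (illustrative name): maps min-max scaled
# values back to the scale of the original vector.
denormalize <- function(x_norm, original) {
  x_norm * (max(original) - min(original)) + min(original)
}

# Toy input: the first five Admit values shown by str(admission).
admit_toy <- c(0.92, 0.76, 0.72, 0.80, 0.65)

admit_scaled <- normalize(admit_toy)
admit_back <- denormalize(admit_scaled, admit_toy)

print(range(admit_scaled))               # smallest and largest scaled value
print(all.equal(admit_back, admit_toy))  # round trip recovers the input
```

#The inverse needs the minimum and maximum of the original column, so those should be taken from admission, not from admission_norm.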
#Now all the values are normalized, so we can go ahead with the regression model.
#We want to predict the chances of admission based on all the other features.
#Let's first visualize the data and look at the correlations between the variables.
library(psych) #pairs.panels() comes from the psych package
pairs.panels(admission_norm)
cor(admission_norm)
## GRE.Score TOEFL.Score University.Rating SOP LOR
## GRE.Score 1.0000000 0.8359768 0.6689759 0.6128307 0.5575545
## TOEFL.Score 0.8359768 1.0000000 0.6955898 0.6579805 0.5677209
## University.Rating 0.6689759 0.6955898 1.0000000 0.7345228 0.6601235
## SOP 0.6128307 0.6579805 0.7345228 1.0000000 0.7295925
## LOR 0.5575545 0.5677209 0.6601235 0.7295925 1.0000000
## CGPA 0.8330605 0.8284174 0.7464787 0.7181440 0.6702113
## Research 0.5803906 0.4898579 0.4477825 0.4440288 0.3968593
## Admit 0.8026105 0.7915940 0.7112503 0.6757319 0.6698888
## CGPA Research Admit
## GRE.Score 0.8330605 0.5803906 0.8026105
## TOEFL.Score 0.8284174 0.4898579 0.7915940
## University.Rating 0.7464787 0.4477825 0.7112503
## SOP 0.7181440 0.4440288 0.6757319
## LOR 0.6702113 0.3968593 0.6698888
## CGPA 1.0000000 0.5216542 0.8732891
## Research 0.5216542 1.0000000 0.5532021
## Admit 0.8732891 0.5532021 1.0000000
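#The full matrix is easier to read if we keep only the Admit column and sort it. In the matrix above this puts CGPA (0.873) first, followed by GRE.Score (0.803) and TOEFL.Score (0.792). The sketch below shows the idea on a small synthetic data frame; the toy columns and the coefficients used to generate them are assumptions for illustration only, not the admission data.

```r
# Synthetic stand-in for admission_norm (assumed values, illustration only).
set.seed(1)
n <- 200
cgpa <- runif(n)
gre <- 0.8 * cgpa + 0.2 * runif(n)
research <- rbinom(n, 1, 0.5)
admit <- 0.6 * cgpa + 0.3 * gre + 0.05 * research + rnorm(n, sd = 0.05)
toy <- data.frame(GRE.Score = gre, CGPA = cgpa, Research = research, Admit = admit)

# Correlation matrix, then the Admit column without its own diagonal entry,
# sorted from the strongest to the weakest correlation.
cor_mat <- cor(toy)
admit_cor <- sort(cor_mat[rownames(cor_mat) != "Admit", "Admit"], decreasing = TRUE)

print(round(admit_cor, 3))
```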
admission_regg=lm(Admit~.,data=admission_norm)
summary(admission_regg)
##
## Call:
## lm(formula = Admit ~ ., data = admission_norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41680 -0.03338 0.01595 0.05758 0.25282
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01043 0.01790 0.583 0.56050
## GRE.Score 0.13789 0.04745 2.906 0.00387 **
## TOEFL.Score 0.12976 0.04842 2.680 0.00768 **
## University.Rating 0.03630 0.03029 1.198 0.23150
## SOP -0.02099 0.03531 -0.594 0.55267
## LOR 0.14192 0.03518 4.034 6.6e-05 ***
## CGPA 0.58903 0.06052 9.734 < 2e-16 ***
## Research 0.03893 0.01263 3.081 0.00221 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1012 on 392 degrees of freedom
## Multiple R-squared: 0.8035, Adjusted R-squared: 0.8
## F-statistic: 228.9 on 7 and 392 DF, p-value: < 2.2e-16
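#The model explains about 80% of the variance in Admit (multiple R-squared = 0.8035), so it fits the data well. This is consistent with the pairs plot and the correlation matrix, where every predictor is positively correlated with Admit. CGPA, LOR, Research, GRE.Score and TOEFL.Score are significant at the 0.01 level, while University.Rating and SOP are not. Let us now see how an artificial neural network (ANN) fits the data.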
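#For a linear model with an intercept, the multiple R-squared equals the squared correlation between the fitted and the observed values. An R-squared of 0.8035 therefore corresponds to a correlation of about 0.90 (the square root of 0.8035). The sketch below checks this identity on synthetic data; the variables x1, x2 and y are assumptions for illustration, not the admission data.

```r
# Synthetic data (assumed values, illustration only).
set.seed(42)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
y <- 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.1)
toy <- data.frame(x1, x2, y)

# Ordinary least squares with an intercept, as in lm(Admit ~ ., ...).
fit <- lm(y ~ ., data = toy)

r_squared <- summary(fit)$r.squared
cor_fitted <- cor(fitted(fit), toy$y)

# The two numbers agree up to floating-point error.
print(c(r_squared = r_squared, cor_squared = cor_fitted^2))
```

#This matters whenever a regression R-squared is compared with a correlation between predicted and actual values: the two are on different scales unless one of them is squared or square-rooted first.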
Chances of Admission using Artificial Neural Network
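#We have 400 observations; let's use the first 300 for the training set and the remaining 100 for the test set.
library(neuralnet)
admission_train=admission_norm[1:300,]
admission_test=admission_norm[301:400,]
#Fit the network on the training set only, so that the test rows stay unseen.
admission_ann=neuralnet(Admit~.,data=admission_train,hidden=1)
plot(admission_ann,rep="best")
#Pass only the predictor columns to compute().
model_results = compute(admission_ann, admission_test[,names(admission_test)!="Admit"])
predicted_admit = model_results$net.result
cor(predicted_admit, admission_test$Admit)
#Note: the output below comes from the original run, which fitted the network on all of admission_norm and tested on rows 300 to 400, so rerunning this code will give a somewhat different value.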
## [,1]
## [1,] 0.9098734
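#The correlation between the predicted and the actual values on the test set is about 0.91. This number is not directly comparable with the R-squared of 0.80 from the regression: an R-squared of 0.8035 corresponds to a correlation of about 0.90 between fitted and actual values. On this measure the ANN with just one hidden node in its single hidden layer does slightly better than the linear regression.
#Based on these results, the ANN appears to be at least as suitable as linear regression for predicting the chances of admission in this case.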
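#To compare the two models on equal terms, the regression can be scored on the same held-out rows and with the same measure as the ANN. The sketch below does this for lm() with a random 300/100 split on synthetic data; the toy data frame, the seed and the RMSE measure are assumptions for illustration and were not part of the original analysis.

```r
# Synthetic stand-in for the normalized data (assumed values).
set.seed(123)
n <- 400
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- 0.6 * toy$x1 + 0.3 * toy$x2 + rnorm(n, sd = 0.1)

# Random split: 300 rows for training, the remaining 100 for testing.
train_idx <- sample(seq_len(n), size = 300)
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# Fit on the training rows only, then predict the unseen test rows.
fit <- lm(y ~ ., data = toy_train)
pred <- predict(fit, newdata = toy_test)

# Same measure as used for the ANN, plus the root mean squared error.
test_cor <- cor(pred, toy_test$y)
test_rmse <- sqrt(mean((pred - toy_test$y)^2))

print(c(cor = test_cor, rmse = test_rmse))
```

#A random split avoids depending on the order of the rows in the file, and setting the seed makes the split reproducible.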