Predicting the Chances of Admission at a University Using Regression and an Artificial Neural Network

Chances of Admission using Regression

#Let's first introduce the data.

str(admission)

## 'data.frame':    400 obs. of  8 variables:
##  $ GRE.Score        : int  337 324 316 322 314 330 321 308 302 323 ...
##  $ TOEFL.Score      : int  118 107 104 110 103 115 109 101 102 108 ...
##  $ University.Rating: int  4 4 3 3 2 5 3 2 1 3 ...
##  $ SOP              : num  4.5 4 3 3.5 2 4.5 3 3 2 3.5 ...
##  $ LOR              : num  4.5 4.5 3.5 2.5 3 3 4 4 1.5 3 ...
##  $ CGPA             : num  9.65 8.87 8 8.67 8.21 9.34 8.2 7.9 8 8.6 ...
##  $ Research         : int  1 1 1 1 0 1 1 0 0 0 ...
##  $ Admit            : num  0.92 0.76 0.72 0.8 0.65 0.9 0.75 0.68 0.5 0.45 ...

#As we can see, the columns are on different scales, so let's build a function that applies min-max normalization to every column.

normalize=function(x)
{
 return ((x-min(x))/(max(x)-min(x)))
}
admission_norm=as.data.frame(lapply(admission,normalize))
summary(admission_norm)

##    GRE.Score       TOEFL.Score     University.Rating      SOP       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000    Min.   :0.000  
##  1st Qu.:0.3600   1st Qu.:0.3929   1st Qu.:0.2500    1st Qu.:0.375  
##  Median :0.5400   Median :0.5357   Median :0.5000    Median :0.625  
##  Mean   :0.5362   Mean   :0.5504   Mean   :0.5219    Mean   :0.600  
##  3rd Qu.:0.7000   3rd Qu.:0.7143   3rd Qu.:0.7500    3rd Qu.:0.750  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000    Max.   :1.000  
##       LOR              CGPA           Research          Admit       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.5000   1st Qu.:0.4391   1st Qu.:0.0000   1st Qu.:0.4762  
##  Median :0.6250   Median :0.5801   Median :1.0000   Median :0.6190  
##  Mean   :0.6131   Mean   :0.5766   Mean   :0.5475   Mean   :0.6101  
##  3rd Qu.:0.7500   3rd Qu.:0.7252   3rd Qu.:1.0000   3rd Qu.:0.7778  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
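
#Because the model is fitted on scaled values, its predictions are also on the 0 to 1 scale. The sketch below repeats the min-max function and adds an inverse, denormalize(), which is a hypothetical helper that does not appear in the original analysis. The toy vector is just the first ten GRE scores from the str() output, not the full column.

```r
# Min-max scaling: the smallest value maps to 0 and the largest to 1.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical helper (an assumption, not part of the original analysis):
# map scaled values back to the units of the vector that defined min and max.
denormalize <- function(scaled, original) {
  scaled * (max(original) - min(original)) + min(original)
}

# First ten GRE scores from the str() output, used only as a toy vector.
gre <- c(337, 324, 316, 322, 314, 330, 321, 308, 302, 323)

gre_scaled <- normalize(gre)
range(gre_scaled)                         # lower and upper end of the scaled values

gre_back <- denormalize(gre_scaled, gre)
all.equal(gre_back, gre)                  # the round trip recovers the scores
```

#One caveat: a column in which every value is the same has max(x) equal to min(x), so normalize() would divide zero by zero and return NaN for that column.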

#Now every column lies between 0 and 1, so we can go ahead with the regression model.
#We want to predict the chances of admission based on all the other features.
#Let's first look at a visualization of the data and think about the correlations.

library(psych) # provides pairs.panels()
pairs.panels(admission_norm)

cor(admission_norm)

##                   GRE.Score TOEFL.Score University.Rating       SOP       LOR
## GRE.Score         1.0000000   0.8359768         0.6689759 0.6128307 0.5575545
## TOEFL.Score       0.8359768   1.0000000         0.6955898 0.6579805 0.5677209
## University.Rating 0.6689759   0.6955898         1.0000000 0.7345228 0.6601235
## SOP               0.6128307   0.6579805         0.7345228 1.0000000 0.7295925
## LOR               0.5575545   0.5677209         0.6601235 0.7295925 1.0000000
## CGPA              0.8330605   0.8284174         0.7464787 0.7181440 0.6702113
## Research          0.5803906   0.4898579         0.4477825 0.4440288 0.3968593
## Admit             0.8026105   0.7915940         0.7112503 0.6757319 0.6698888
##                        CGPA  Research     Admit
## GRE.Score         0.8330605 0.5803906 0.8026105
## TOEFL.Score       0.8284174 0.4898579 0.7915940
## University.Rating 0.7464787 0.4477825 0.7112503
## SOP               0.7181440 0.4440288 0.6757319
## LOR               0.6702113 0.3968593 0.6698888
## CGPA              1.0000000 0.5216542 0.8732891
## Research          0.5216542 1.0000000 0.5532021
## Admit             0.8732891 0.5532021 1.0000000
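
#Min-max normalization is a positive linear rescaling of each column, so it leaves Pearson correlations unchanged: the matrix above would be the same on the raw data. A minimal sketch on a toy data frame built from the first ten rows shown in the str() output (three columns only, so its numbers will differ from the full matrix):

```r
# Same min-max function as in the analysis.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy data: first ten rows of three columns, copied from the str() output.
toy <- data.frame(
  GRE.Score = c(337, 324, 316, 322, 314, 330, 321, 308, 302, 323),
  CGPA      = c(9.65, 8.87, 8, 8.67, 8.21, 9.34, 8.2, 7.9, 8, 8.6),
  Admit     = c(0.92, 0.76, 0.72, 0.8, 0.65, 0.9, 0.75, 0.68, 0.5, 0.45)
)
toy_norm <- as.data.frame(lapply(toy, normalize))

# Correlations before and after scaling agree up to floating-point error.
same_cor <- all.equal(cor(toy), cor(toy_norm))
same_cor

# Rank the columns by their correlation with the target.
sort(cor(toy_norm)[, "Admit"], decreasing = TRUE)
```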

admission_regg=lm(Admit~.,data=admission_norm)
summary(admission_regg)

## 
## Call:
## lm(formula = Admit ~ ., data = admission_norm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41680 -0.03338  0.01595  0.05758  0.25282 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.01043    0.01790   0.583  0.56050    
## GRE.Score          0.13789    0.04745   2.906  0.00387 ** 
## TOEFL.Score        0.12976    0.04842   2.680  0.00768 ** 
## University.Rating  0.03630    0.03029   1.198  0.23150    
## SOP               -0.02099    0.03531  -0.594  0.55267    
## LOR                0.14192    0.03518   4.034  6.6e-05 ***
## CGPA               0.58903    0.06052   9.734  < 2e-16 ***
## Research           0.03893    0.01263   3.081  0.00221 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1012 on 392 degrees of freedom
## Multiple R-squared:  0.8035, Adjusted R-squared:    0.8 
## F-statistic: 228.9 on 7 and 392 DF,  p-value: < 2.2e-16

#This model explains about 80% of the variance in Admit (R-squared = 0.8035), so it fits the data well. This is not surprising, because the pairs plot and the correlation matrix already show that every predictor is positively correlated with Admit, CGPA most strongly (0.87). Let us now see how an ANN fits the data.
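
#For a linear model with an intercept, the R-squared reported by summary() equals the squared correlation between the fitted and the observed values. This matters whenever an R-squared is set next to a plain correlation: an R-squared of 0.8035 corresponds to a correlation of about 0.896. A minimal sketch on toy data (the first ten rows from the str() output, two predictors only, so the numbers are not those of the full model):

```r
# Toy data: first ten rows of three columns, copied from the str() output.
toy <- data.frame(
  GRE.Score = c(337, 324, 316, 322, 314, 330, 321, 308, 302, 323),
  CGPA      = c(9.65, 8.87, 8, 8.67, 8.21, 9.34, 8.2, 7.9, 8, 8.6),
  Admit     = c(0.92, 0.76, 0.72, 0.8, 0.65, 0.9, 0.75, 0.68, 0.5, 0.45)
)

fit <- lm(Admit ~ GRE.Score + CGPA, data = toy)

r_squared <- summary(fit)$r.squared          # as printed by summary()
r_fitted  <- cor(fitted(fit), toy$Admit)     # correlation of fitted vs observed

# The two quantities agree up to floating-point error.
all.equal(r_squared, r_fitted^2)

# Converting the R-squared of the full model above to a correlation.
sqrt(0.8035)
```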

Chances of Admission using Artificial Neural Network

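#We have 400 observations, so let's use the first 300 for the training set and the remaining 100 for the test set.

library(neuralnet) # provides neuralnet() and compute()
admission_train=admission_norm[1:300,]
admission_test=admission_norm[301:400,]
# Fit on the training rows only, so the test rows stay unseen
admission_ann=neuralnet(Admit~.,data=admission_train,hidden=1)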
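
#Taking the first rows for training is fine when the rows are in no particular order. If that is in doubt, a random split is a common alternative. The sketch below is an assumption rather than what the analysis above does: it uses a stand-in data frame in place of admission_norm and an arbitrary seed so that the split is repeatable.

```r
set.seed(42)  # arbitrary seed (an assumption) so the split can be repeated

n <- 400
demo <- data.frame(id = seq_len(n))  # stand-in for admission_norm

# Draw 300 distinct row numbers for training; the rest form the test set.
train_idx  <- sample(seq_len(n), size = 300)
demo_train <- demo[train_idx, , drop = FALSE]
demo_test  <- demo[-train_idx, , drop = FALSE]

nrow(demo_train)
nrow(demo_test)
```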

plot(admission_ann,rep="best")

# Pass only the seven predictor columns; Admit is the target
model_results = compute(admission_ann, admission_test[,1:7])
predicted_admit = model_results$net.result
cor(predicted_admit, admission_test$Admit)

##           [,1]
## [1,] 0.9098734
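
#To compare two models fairly, the same metric should be computed on the same held-out rows for both. A minimal sketch, where evaluate() is a hypothetical helper and the data are synthetic (generated here only so the block runs on its own):

```r
# Hypothetical helper: correlation and root mean squared error
# between predictions and actual values.
evaluate <- function(predicted, actual) {
  c(correlation = cor(predicted, actual),
    rmse        = sqrt(mean((predicted - actual)^2)))
}

# Synthetic data, not the admission data.
set.seed(1)
x    <- runif(200)
demo <- data.frame(x = x, y = 0.2 + 0.6 * x + rnorm(200, sd = 0.05))

train <- demo[1:150, ]
test  <- demo[151:200, ]

# Fit on the training rows, score on the held-out rows.
fit     <- lm(y ~ x, data = train)
metrics <- evaluate(predict(fit, newdata = test), test$y)
metrics
```

#With the objects from this analysis, the same helper could be applied to as.vector(predicted_admit) and to the predictions of a regression refitted on admission_train, both scored against admission_test$Admit.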

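#The correlation between the predicted and the actual chances of admission on the test rows is 0.91. Note that this is a correlation, while the 80% of the regression is an R-squared that was computed on all 400 rows. Squaring 0.91 gives about 0.83, so the gain over the regression is a few percentage points rather than eleven. Even so, an Artificial Neural Network with just one hidden node in its single hidden layer does at least as well as the regression here.

#Based on these results, the ANN is the more suitable algorithm in this case.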