Introduction and Exploratory Data Analysis

The goal of this analysis is to build a model that predicts whether a person is admitted to graduate school based on their GRE (Graduate Record Examination) score, GPA (Grade Point Average), and the rank of their undergraduate alma mater. There are four ranks, with rank 1 the highest. The dependent variable is admit, and the predictors are gre, gpa, and rank.

library(readxl)
logData <- read_excel("D:/KARIR/R Portofolio/logisticregression/gre.xlsx")
head(logData)
## # A tibble: 6 x 4
##   admit   gre   gpa  rank
##   <dbl> <dbl> <dbl> <dbl>
## 1     0   600  3.4      3
## 2     0   340  2.9      1
## 3     1   520  3.19     3
## 4     0   500  2.98     3
## 5     1   620  3.45     2
## 6     1   580  2.86     4

Inspect the structure of the data.

str(logData)
## tibble [300 x 4] (S3: tbl_df/tbl/data.frame)
##  $ admit: num [1:300] 0 0 1 0 1 1 0 0 1 0 ...
##  $ gre  : num [1:300] 600 340 520 500 620 580 520 460 700 580 ...
##  $ gpa  : num [1:300] 3.4 2.9 3.19 2.98 3.45 2.86 2.85 3.07 3.52 3.46 ...
##  $ rank : num [1:300] 3 1 3 3 2 4 3 2 4 4 ...

As the output shows, admit and rank are stored as numeric rather than factor variables, so we convert both to factors.

logData$admit <- as.factor(logData$admit)
logData$rank <- as.factor(logData$rank)
str(logData)
## tibble [300 x 4] (S3: tbl_df/tbl/data.frame)
##  $ admit: Factor w/ 2 levels "0","1": 1 1 2 1 2 2 1 1 2 1 ...
##  $ gre  : num [1:300] 600 340 520 500 620 580 520 460 700 580 ...
##  $ gpa  : num [1:300] 3.4 2.9 3.19 2.98 3.45 2.86 2.85 3.07 3.52 3.46 ...
##  $ rank : Factor w/ 4 levels "1","2","3","4": 3 1 3 3 2 4 3 2 4 4 ...
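
The conversion matters most for rank. As a number, rank would enter a model as a single slope. As a factor, R expands it into one indicator column per level other than the first, which serves as the reference. The sketch below illustrates this on a toy vector (the first six rank values shown by head(logData)), not on the full data set. The names rank_toy and X are chosen here for illustration only.

```r
# Toy vector: the first six rank values printed by head(logData)
rank_toy <- factor(c(3, 1, 3, 3, 2, 4), levels = 1:4)

# Design matrix R would build for a formula containing this factor
X <- model.matrix(~ rank_toy)

# One intercept column plus one indicator per non-reference level
colnames(X)

# First observation has rank 3, so only the rank 3 indicator is 1
X[1, ]
```

The reference level (rank 1) has no column of its own. Its effect is absorbed into the intercept.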

admit is a binary response variable with two classes: class 0 means not admitted and class 1 means admitted. Let us check the class distribution of the response variable.

table(logData$admit)
## 
##   0   1 
## 173 127
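
The counts can also be read as proportions to judge how balanced the classes are. This is a minimal sketch that re-enters the two counts printed above by hand. With the data loaded, prop.table(table(logData$admit)) should give the same result.

```r
# Class counts copied from the table() output above
counts <- c("0" = 173, "1" = 127)

# Share of each class in the 300 observations
props <- prop.table(counts)
round(props, 3)
```

Roughly 58% of applicants were not admitted and 42% were admitted, so the classes are only mildly imbalanced.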

Logistic Model

logModel <- glm(admit ~ gre + gpa + rank, data = logData, family = "binomial")
summary(logModel)
## 
## Call:
## glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
##     data = logData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8847  -0.7765  -0.3580   0.7054   2.8140  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -10.376416   1.656450  -6.264 3.75e-10 ***
## gre           0.012637   0.001937   6.525 6.80e-11 ***
## gpa           1.159424   0.416520   2.784  0.00538 ** 
## rank2        -0.647369   0.429061  -1.509  0.13135    
## rank3        -1.275603   0.457073  -2.791  0.00526 ** 
## rank4        -1.737568   0.520840  -3.336  0.00085 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 408.81  on 299  degrees of freedom
## Residual deviance: 288.33  on 294  degrees of freedom
## AIC: 300.33
## 
## Number of Fisher Scoring iterations: 5

We have to be careful in interpreting the output of the logistic model because it is on the link (logit) scale: the coefficients describe changes in the log-odds of admission. The coefficients of gre and gpa are positive, which indicates that the probability of admission increases with gre and gpa. On the other hand, the rank coefficients are negative, so applicants from institutions of rank 2 to 4 have lower odds of admission than those from rank 1. Exponentiating the coefficients gives the odds ratios.
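
Written out, the fitted model has the following form. The symbols p (probability of admission), \eta (linear predictor), and \beta_0, ..., \beta_5 (coefficients in the order of the summary table) are notation introduced here, not taken from the R output.

```latex
\eta = \log\frac{p}{1-p}
     = \beta_0 + \beta_1\,\mathrm{gre} + \beta_2\,\mathrm{gpa}
       + \beta_3\,\mathrm{rank2} + \beta_4\,\mathrm{rank3} + \beta_5\,\mathrm{rank4},
\qquad
p = \frac{1}{1 + e^{-\eta}}

% Increasing one predictor x_j by one unit, with the others held fixed,
% multiplies the odds p/(1-p) by a constant factor:
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}
```

Here rank2, rank3, and rank4 are 0/1 indicators, so for a first-rank applicant all three rank terms drop out.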

exp(coefficients(logModel)[2:6])
##       gre       gpa     rank2     rank3     rank4 
## 1.0127168 3.1880974 0.5234212 0.2792626 0.1759478

According to the odds ratio of gre, each 1-point increase in gre multiplies the odds of admission by about 1.013 (a 1.3% increase in the odds), holding gpa and rank fixed. Likewise, each 1-unit increase in gpa multiplies the odds of admission by about 3.188. For rank, the first rank is the reference level, so each odds ratio compares that rank with rank 1: an applicant whose alma mater is second rank has about 0.523 times the odds of admission of a comparable first-rank applicant, and so on for ranks 3 and 4. Note that these are ratios of odds, not of probabilities, and that the rank2 coefficient is not statistically significant at the 5% level (p = 0.131).
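
A 1-point change in gre is a very small step, so it can help to rescale the odds ratio to a larger one and to attach an uncertainty range. The sketch below re-enters the gre and gpa estimates and standard errors from the summary output by hand. The 100-point step and the use of approximate 95% Wald intervals are choices made here for illustration, not part of the original analysis.

```r
# Estimates and standard errors copied from the summary(logModel) output
est <- c(gre = 0.012637, gpa = 1.159424)
se  <- c(gre = 0.001937, gpa = 0.416520)

# Odds ratio for a 100-point increase in gre (illustrative step size):
# the log-odds change is 100 * beta, so the odds ratio is exp(100 * beta)
or_gre_100 <- exp(100 * est[["gre"]])
or_gre_100

# Approximate 95% Wald intervals on the odds-ratio scale:
# exp(estimate -/+ z * standard error)
z <- qnorm(0.975)
wald_ci <- exp(cbind(lower = est - z * se, upper = est + z * se))
wald_ci
```

Under these assumptions, a 100-point higher GRE score corresponds to odds of admission roughly 3.5 times as large. With the fitted model available, exp(confint.default(logModel)) should produce Wald intervals of the same kind for all coefficients.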

Prediction

In this example, we predict the probability of admission for a person with a GRE score of 700, a GPA of 3.50, and a first-rank alma mater.

predict(logModel, newdata = list(gre=700, gpa=3.50, rank="1"), type = "response")
##         1 
## 0.9260248

So, the predicted probability of admission for this person is about 92.6%.
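
The prediction can be checked by hand from the coefficient table. This is a minimal sketch that re-enters the rounded estimates from the summary output, so the result agrees with predict() only up to rounding. Because rank 1 is the reference level, no rank term is added. The names b, eta, and p are chosen here for illustration.

```r
# Rounded estimates copied from the summary(logModel) output
b <- c(intercept = -10.376416, gre = 0.012637, gpa = 1.159424)

# Linear predictor (log-odds) for gre = 700, gpa = 3.50, rank = 1
eta <- b[["intercept"]] + b[["gre"]] * 700 + b[["gpa"]] * 3.50

# Inverse logit turns log-odds into a probability
p <- plogis(eta)
round(p, 3)
```

The log-odds come out at about 2.53, which the inverse logit maps to a probability of about 0.926, in line with the predict() output above.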