library(aod)
library(ggplot2)

Example A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, effect admission into graduate school. The response variable, admit/don’t admit, is a binary variable.

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
head(mydata)
  admit gre  gpa rank
1     0 380 3.61    3
2     1 660 3.67    3
3     1 800 4.00    1
4     1 640 3.19    4
5     0 520 2.93    4
6     1 760 3.00    2
summary(mydata)
     admit             gre             gpa             rank      
 Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000  
 Median :0.0000   Median :580.0   Median :3.395   Median :2.000  
 Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485  
 3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000  
 Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000  

Notice that the mean of admit is 0.3175 which is less than 0.5. Hence, from this fact we can say that there are greater number of reject than acceptance.

sapply(mydata, sd)
      admit         gre         gpa        rank 
  0.4660867 115.5165364   0.3805668   0.9444602 

Here, we see that there is a low standard deviation in the admit column which means that the values are close to its mean given that we only have 0 and 1.

xtabs(~admit + rank, data = mydata)
     rank
admit  1  2  3  4
    0 28 97 93 55
    1 33 54 28 12

Since there is no empty cell, we say that the admits are distributed well enough in each rank.

Now,we run our logit function. First, we convert rank variable into integer factor to indicated that it should be treated as categorical variable.

mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)

Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
    data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6268  -0.8662  -0.6388   1.1490   2.0790  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4

From the table above, we have the deviance residuals which are measure of model fit which shows the distribution of the deviance residuals for individual cases used in the model.

Next we have the coefficients which give the change in the log odds of the outcome for a one unit increase in the predictor variable (gre, gpa and rank). Thus,
\(\bullet\) in each one unit change in gre will increase the log odds of getting admit by 0.002264, and its p-value implies that it is somewhat significant in determining the admission;
\(\bullet\) in each unit increase in gpa increases the log odds of getting admit by 0.804038 and p-value implies that it is somewhat significant in determining the admission.
\(\bullet\) Now for the rank, when going to rank 2 college from rank 1 college will decrease the log odds of getting admit by -0.675443. Going from rank 2 to rank 3 will decrease it by -1.340204 and going from rank 3 to rank 4 will also decrease it by -1.551464.

Null deviance is the value when you only have intercept in your equation with no variables and Residual deviance is the value when you are taking all the variables into account.The greater the difference, the better the model. Now, from the table above it tells us that the model is a good fit.

confint(mylogit)
                    2.5 %       97.5 %
(Intercept) -6.2716202334 -1.792547080
gre          0.0001375921  0.004435874
gpa          0.1602959439  1.464142727
rank2       -1.3008888002 -0.056745722
rank3       -2.0276713127 -0.670372346
rank4       -2.4000265384 -0.753542605
confint.default(mylogit)
                    2.5 %       97.5 %
(Intercept) -6.2242418514 -1.755716295
gre          0.0001202298  0.004408622
gpa          0.1536836760  1.454391423
rank2       -1.2957512650 -0.055134591
rank3       -2.0169920597 -0.663415773
rank4       -2.3703986294 -0.732528724

Using confint() we get the confidence interval of our coefficient estimate based on the profiled log-likelihood function while using confint.default() we get the confidence interval based on just the standard errors. Observe that the coefficient estimates in our logit function lie between these intervals.

wald.test(b = coef(mylogit), Sigma = vcov(mylogit), Terms = 4:6)
Wald test:
----------

Chi-squared test:
X2 = 20.9, df = 3, P(> X2) = 0.00011

Wald test measures to confirm whether a set of independent variables are collectively significant for a model or not. Referring to our summary table above, the term 4:6 refers to the varibale rank, thus we are going to test if the rank is significant in the admission. Observe that using \(\alpha=0.05\), rank is significant since its p-value is 0.00011 which is less than \(\alpha=0.05\).

l <- cbind(0, 0, 0, 1, -1, 0)
wald.test(b = coef(mylogit), Sigma = vcov(mylogit), L = l)
Wald test:
----------

Chi-squared test:
X2 = 5.5, df = 1, P(> X2) = 0.019

Here, we specifically test the difference between rank 2 and rank 3 and notice that it has a chi-squared statistic of 5.5, a degree of freedom equal to 1 and a p-value of 0.019 which is less than 0.05. Therefor, at \(\alpha=0.05\), the difference between rank 2 and 3 is significant.

exp(coef(mylogit))
(Intercept)         gre         gpa       rank2       rank3       rank4 
  0.0185001   1.0022670   2.2345448   0.5089310   0.2617923   0.2119375 
exp(cbind(OR = coef(mylogit), confint(mylogit)))
                   OR       2.5 %    97.5 %
(Intercept) 0.0185001 0.001889165 0.1665354
gre         1.0022670 1.000137602 1.0044457
gpa         2.2345448 1.173858216 4.3238349
rank2       0.5089310 0.272289674 0.9448343
rank3       0.2617923 0.131641717 0.5115181
rank4       0.2119375 0.090715546 0.4706961

From the code above, we exponentiate the coefficients and interpret it as odds-ratios. From the table, we see that for every 1 unit increase of gre, the odds of being admitted is increased by a factor of 1.0022670 and for every 1 unit increase of gpa, the odds of being admitted is increased by a factor of 2.2345448. Note also that being in rank 2 has a higher odds of being admitted than in rank 3 and rank 4.

Now, we calculate the predicted probability of admission at each value of rank, holding gre and gpa at their means. Creating a new data frame, we have

newdata1 <- with(mydata, data.frame(gre = mean(gre), gpa = mean(gpa), rank = factor(1:4)))
newdata1
    gre    gpa rank
1 587.7 3.3899    1
2 587.7 3.3899    2
3 587.7 3.3899    3
4 587.7 3.3899    4

Then, create a new variable in the dataset (data frame) newdata1 called rankP, the rest of the command tells R that the values of rankP should be predictions made using the predict( ) function

newdata1$rankP <- predict(mylogit, newdata = newdata1, type = "response")
newdata1
    gre    gpa rank     rankP
1 587.7 3.3899    1 0.5166016
2 587.7 3.3899    2 0.3522846
3 587.7 3.3899    3 0.2186120
4 587.7 3.3899    4 0.1846684

From the table above, the predicted probability of being accepted into a graduate program is 0.5166016 for students from the highest prestige undergraduate institutions (rank 1), 0.3522846 for the students from institution of rank 2, 0.2186120 for the students from institution of rank 3 and 0.18 for students from the lowest ranked institutions (rank 4), holding gre and gpa at their means.

Now, we want to predict probabilities varying the value of gre and rank creating 100 values of gre between 200 and 800, at each value of rank (i.e., 1, 2, 3, and 4).

newdata2 <- with(mydata, data.frame(gre = rep(seq(from = 200, to = 800, length.out = 100),
    4), gpa = mean(gpa), rank = factor(rep(1:4, each = 100))))
newdata3 <- cbind(newdata2, predict(mylogit, newdata = newdata2, type = "link",
    se = TRUE))
newdata3 <- within(newdata3, {
    PredictedProb <- plogis(fit)
    LL <- plogis(fit - (1.96 * se.fit))
    UL <- plogis(fit + (1.96 * se.fit))
})
head(newdata3)
       gre    gpa rank        fit    se.fit residual.scale        UL        LL
1 200.0000 3.3899    1 -0.8114870 0.5147714              1 0.5492064 0.1393812
2 206.0606 3.3899    1 -0.7977632 0.5090986              1 0.5498513 0.1423880
3 212.1212 3.3899    1 -0.7840394 0.5034491              1 0.5505074 0.1454429
4 218.1818 3.3899    1 -0.7703156 0.4978239              1 0.5511750 0.1485460
5 224.2424 3.3899    1 -0.7565919 0.4922237              1 0.5518545 0.1516973
6 230.3030 3.3899    1 -0.7428681 0.4866494              1 0.5525464 0.1548966
  PredictedProb
1     0.3075737
2     0.3105042
3     0.3134499
4     0.3164108
5     0.3193867
6     0.3223773

Here, we see the predicted probabilities of the first 6 of the 100 values holding gpa at its mean and varying gre and rank. To better understand this table, we use a graph for visualizations.

ggplot(newdata3, aes(x = gre, y = PredictedProb)) + geom_ribbon(aes(ymin = LL,
    ymax = UL, fill = rank), alpha = 0.2) + geom_line(aes(colour = rank),
    size = 1)

From this graph, we can tell that a higher gre and having a rank 1 undergraduate institution has a higher probability to be admitted in the school. Additionally, higher gre is also an advantage since it can increase the probability of being admitted.However, the type of school has more advantage as seen in the graph.

Now,to test the significance of our model we want to test whether the model with predictors fits significantly better than a model with just an intercept (i.e., a null model).

with(mylogit, null.deviance - deviance)
[1] 41.45903
with(mylogit, df.null - df.residual)
[1] 5
with(mylogit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
[1] 7.578194e-08

The chi-square of 41.46 with 5 degrees of freedom and an associated p-value of less than 0.001 tells us that our model as a whole fits significantly better than an empty model.

To see the model’s log likelihood, we have

logLik(mylogit)
'log Lik.' -229.2587 (df=6)