Including interaction terms in logit regression models raises several issues that stem from the model’s nonlinearity. Unlike in a linear model, the coefficient on an interaction term in a logit model has no simple, direct interpretation. Because the model is nonlinear, the effect of each independent variable on the probability of the outcome depends on the values of the other variables in the model. In addition, the sign of the interaction coefficient may not match the direction of the conditional effect that motivated its inclusion, and its standard error gives no clear indication of whether that effect is statistically significant. In short, interaction terms in logit models cannot be read the way they are in linear models, and they can make it harder to understand how the variables jointly influence the probability of an outcome.
A simulation-based approach can help resolve these issues by providing more precise results. One reason is that the approach implicitly corrects for bias in the formula typically used to calculate predicted probabilities. Another is that, unlike traditional methods, which rely on complex calculus-based approximations, simulation uses straightforward numerical techniques. This makes the results more intuitive for researchers and readers who may not be familiar with multivariate calculus, and so easier to understand and interpret.
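The idea can be illustrated with a minimal sketch. It is not part of the analysis in this report: the data are simulated, and the names `sim_data`, `x1`, `x2`, and `scenarios` are hypothetical. The sketch draws coefficient vectors from the estimated sampling distribution of a fitted logit model, converts each draw into predicted probabilities, and summarizes how the effect of `x1` on the probability differs across levels of `x2`.

```r
# Hypothetical simulated data; not the graduation data used in this report
library(MASS)  # provides mvrnorm()

set.seed(712)
n  <- 2000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
y  <- rbinom(n, 1, plogis(-1 + 0.8 * x1 + 0.5 * x2 + 0.6 * x1 * x2))
sim_data <- data.frame(y, x1, x2)

# Logit model with an interaction term
fit <- glm(y ~ x1 * x2, family = binomial, data = sim_data)

# Draw 5000 coefficient vectors from the estimated sampling distribution
draws <- mvrnorm(5000, mu = coef(fit), Sigma = vcov(fit))

# Design rows in coefficient order: intercept, x1, x2, x1:x2
scenarios <- rbind(
  x1_0_x2_0 = c(1, 0, 0, 0),
  x1_1_x2_0 = c(1, 1, 0, 0),
  x1_0_x2_1 = c(1, 0, 1, 0),
  x1_1_x2_1 = c(1, 1, 1, 1)
)

# Predicted probability for every draw (rows) and scenario (columns)
eta_draws <- draws %*% t(scenarios)
probs <- 1 / (1 + exp(-eta_draws))

# Effect of x1 on the probability at each level of x2, and their difference
effect_at_x2_0 <- probs[, "x1_1_x2_0"] - probs[, "x1_0_x2_0"]
effect_at_x2_1 <- probs[, "x1_1_x2_1"] - probs[, "x1_0_x2_1"]
second_diff    <- effect_at_x2_1 - effect_at_x2_0

# Median and 95% simulation interval for each quantity
sim_summary <- rbind(
  effect_at_x2_0 = quantile(effect_at_x2_0, c(0.025, 0.5, 0.975)),
  effect_at_x2_1 = quantile(effect_at_x2_1, c(0.025, 0.5, 0.975)),
  second_diff    = quantile(second_diff,    c(0.025, 0.5, 0.975))
)
print(round(sim_summary, 3))
```

The interval for `second_diff` speaks to the interaction on the probability scale directly, which the interaction coefficient and its standard error do not.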
# Import the data
library(readr)
DATA <- read_csv("C:/Users/dijan/Documents/DATA 712/graduation_data.csv", show_col_types = FALSE)
# Renaming variables
names(DATA)[names(DATA) == "Cohort Year"] <- "Cohort_year"
names(DATA)[names(DATA) == "% Grads"] <- "Grad_percentage"
# Convert Cohort_year from numeric to character
DATA$Cohort_year <- as.character(DATA$Cohort_year)
# Removing the rows where the borough is "District 79"
library(dplyr)
DATA <- DATA %>%
  filter(!grepl("District 79", Borough))
# Convert graduation percentage to a binary variable (1 if the graduation percentage is 70 or above, 0 if the graduation percentage is below 70)
DATA$Grad_binary <- ifelse(DATA$Grad_percentage >= 70, 1, 0)
# Model predicting graduation binary outcome by borough
m1 <- glm(Grad_binary ~ Borough, family = binomial, data = DATA)
summary(m1)
##
## Call:
## glm(formula = Grad_binary ~ Borough, family = binomial, data = DATA)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.9095 0.3788 -5.041 4.64e-07 ***
## BoroughBrooklyn 1.8450 0.4562 4.044 5.24e-05 ***
## BoroughManhattan 2.4376 0.4611 5.286 1.25e-07 ***
## BoroughQueens 2.5786 0.4642 5.554 2.79e-08 ***
## BoroughStaten Island 4.3432 0.6009 7.228 4.90e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 424.58 on 309 degrees of freedom
## Residual deviance: 329.49 on 305 degrees of freedom
## AIC: 339.49
##
## Number of Fisher Scoring iterations: 5
# Model predicting graduation binary outcome by borough and cohort year
m2 <- glm(Grad_binary ~ Borough + Cohort_year, family = binomial, data = DATA)
summary(m2)
##
## Call:
## glm(formula = Grad_binary ~ Borough + Cohort_year, family = binomial,
## data = DATA)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.361e+00 1.525e+00 -5.483 4.18e-08 ***
## BoroughBrooklyn 4.200e+00 1.098e+00 3.827 0.000130 ***
## BoroughManhattan 5.338e+00 1.129e+00 4.727 2.28e-06 ***
## BoroughQueens 5.603e+00 1.137e+00 4.930 8.23e-07 ***
## BoroughStaten Island 8.542e+00 1.294e+00 6.600 4.10e-11 ***
## Cohort_year2002 2.052e-15 1.354e+00 0.000 1.000000
## Cohort_year2003 8.404e-01 1.310e+00 0.641 0.521232
## Cohort_year2004 1.546e+00 1.286e+00 1.202 0.229392
## Cohort_year2005 1.851e+00 1.218e+00 1.520 0.128574
## Cohort_year2006 3.197e+00 1.185e+00 2.698 0.006985 **
## Cohort_year2007 3.761e+00 1.199e+00 3.138 0.001704 **
## Cohort_year2008 3.475e+00 1.190e+00 2.919 0.003513 **
## Cohort_year2009 3.761e+00 1.199e+00 3.138 0.001704 **
## Cohort_year2010 4.393e+00 1.228e+00 3.576 0.000348 ***
## Cohort_year2011 5.218e+00 1.299e+00 4.018 5.87e-05 ***
## Cohort_year2012 7.336e+00 1.618e+00 4.535 5.76e-06 ***
## Cohort_year2013 8.829e+00 1.729e+00 5.105 3.31e-07 ***
## Cohort_year2014 8.447e+00 1.754e+00 4.817 1.46e-06 ***
## Cohort_year2015 8.447e+00 2.000e+00 4.225 2.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 424.58 on 309 degrees of freedom
## Residual deviance: 188.35 on 291 degrees of freedom
## AIC: 226.35
##
## Number of Fisher Scoring iterations: 7
# Model predicting graduation binary outcome by borough and cohort year, including an interaction between borough and cohort year
m3 <- glm(Grad_binary ~ Borough * Cohort_year, family = binomial, data = DATA)
summary(m3)
##
## Call:
## glm(formula = Grad_binary ~ Borough * Cohort_year, family = binomial,
## data = DATA)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.057e+01 1.024e+04 -0.002 0.998
## BoroughBrooklyn 4.395e-06 1.448e+04 0.000 1.000
## BoroughManhattan 6.513e-06 1.448e+04 0.000 1.000
## BoroughQueens 4.940e-06 1.448e+04 0.000 1.000
## BoroughStaten Island 2.126e+01 1.024e+04 0.002 0.998
## Cohort_year2002 5.226e-06 1.448e+04 0.000 1.000
## Cohort_year2003 5.262e-06 1.448e+04 0.000 1.000
## Cohort_year2004 5.257e-06 1.448e+04 0.000 1.000
## Cohort_year2005 5.291e-06 1.354e+04 0.000 1.000
## Cohort_year2006 5.252e-06 1.295e+04 0.000 1.000
## Cohort_year2007 5.356e-06 1.295e+04 0.000 1.000
## Cohort_year2008 5.323e-06 1.295e+04 0.000 1.000
## Cohort_year2009 5.559e-06 1.295e+04 0.000 1.000
## Cohort_year2010 5.244e-06 1.295e+04 0.000 1.000
## Cohort_year2011 5.177e-06 1.295e+04 0.000 1.000
## Cohort_year2012 2.016e+01 1.024e+04 0.002 0.998
## Cohort_year2013 2.097e+01 1.024e+04 0.002 0.998
## Cohort_year2014 2.057e+01 1.024e+04 0.002 0.998
## Cohort_year2015 2.057e+01 1.024e+04 0.002 0.998
## BoroughBrooklyn:Cohort_year2002 -4.398e-06 2.047e+04 0.000 1.000
## BoroughManhattan:Cohort_year2002 -6.516e-06 2.047e+04 0.000 1.000
## BoroughQueens:Cohort_year2002 -4.944e-06 2.047e+04 0.000 1.000
## BoroughStaten Island:Cohort_year2002 -5.226e-06 1.448e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2003 -4.435e-06 2.047e+04 0.000 1.000
## BoroughManhattan:Cohort_year2003 -6.553e-06 2.047e+04 0.000 1.000
## BoroughQueens:Cohort_year2003 1.987e+01 1.773e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2003 -5.262e-06 1.448e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2004 -4.429e-06 2.047e+04 0.000 1.000
## BoroughManhattan:Cohort_year2004 1.987e+01 1.773e+04 0.001 0.999
## BoroughQueens:Cohort_year2004 1.987e+01 1.773e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2004 -5.257e-06 1.448e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2005 -4.464e-06 1.915e+04 0.000 1.000
## BoroughManhattan:Cohort_year2005 2.057e+01 1.698e+04 0.001 0.999
## BoroughQueens:Cohort_year2005 1.947e+01 1.698e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2005 4.055e-01 1.354e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2006 1.918e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2006 2.097e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2006 2.097e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2006 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2007 2.016e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2007 2.097e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2007 2.195e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2007 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2008 2.016e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2008 2.097e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2008 2.097e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2008 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2009 2.097e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2009 2.097e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2009 2.097e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2009 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2010 2.097e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2010 2.195e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2010 2.195e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2010 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2011 2.195e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2011 2.195e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2011 4.113e+01 1.831e+04 0.002 0.998
## BoroughStaten Island:Cohort_year2011 1.987e+01 1.518e+04 0.001 0.999
## BoroughBrooklyn:Cohort_year2012 1.792e+00 1.448e+04 0.000 1.000
## BoroughManhattan:Cohort_year2012 2.097e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2012 2.097e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2012 -2.877e-01 1.295e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2013 2.016e+01 1.651e+04 0.001 0.999
## BoroughManhattan:Cohort_year2013 2.016e+01 1.651e+04 0.001 0.999
## BoroughQueens:Cohort_year2013 2.016e+01 1.651e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2013 -1.099e+00 1.295e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2014 2.057e+01 1.698e+04 0.001 0.999
## BoroughManhattan:Cohort_year2014 2.057e+01 1.698e+04 0.001 0.999
## BoroughQueens:Cohort_year2014 2.057e+01 1.698e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2014 -6.932e-01 1.354e+04 0.000 1.000
## BoroughBrooklyn:Cohort_year2015 2.057e+01 1.915e+04 0.001 0.999
## BoroughManhattan:Cohort_year2015 2.057e+01 1.915e+04 0.001 0.999
## BoroughQueens:Cohort_year2015 2.057e+01 1.915e+04 0.001 0.999
## BoroughStaten Island:Cohort_year2015 -6.932e-01 1.619e+04 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 424.58 on 309 degrees of freedom
## Residual deviance: 172.11 on 235 degrees of freedom
## AIC: 322.11
##
## Number of Fisher Scoring iterations: 19
# Likelihood ratio test
anova(m1, m2, m3, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: Grad_binary ~ Borough
## Model 2: Grad_binary ~ Borough + Cohort_year
## Model 3: Grad_binary ~ Borough * Cohort_year
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 305 329.49
## 2 291 188.35 14 141.133 <2e-16 ***
## 3 235 172.11 56 16.241 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
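As a check on the table above, the likelihood ratio tests can be reproduced by hand from the reported residual deviances and degrees of freedom. This is a small sketch that uses the rounded values printed in the output, so the results may differ slightly from those of `anova()`; the variable names are illustrative.

```r
# Residual deviances and residual degrees of freedom copied from the table above
dev      <- c(m1 = 329.49, m2 = 188.35, m3 = 172.11)
df_resid <- c(m1 = 305,    m2 = 291,    m3 = 235)

# Each test statistic is the drop in deviance between consecutive nested models
lr_stat <- -diff(dev)
lr_df   <- -diff(df_resid)

# Upper-tail chi-squared probability for each comparison
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(data.frame(lr_stat, lr_df, p_value))
```

The first row compares Model 2 with Model 1 and the second compares Model 3 with Model 2.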
# AIC and BIC for Model 1
AIC_1 <- AIC(m1)
BIC_1 <- BIC(m1)
# AIC and BIC for Model 2
AIC_2 <- AIC(m2)
BIC_2 <- BIC(m2)
# AIC and BIC for Model 3
AIC_3 <- AIC(m3)
BIC_3 <- BIC(m3)
# Display the results
cat("AIC and BIC for Model 1: AIC =", AIC_1, ", BIC =", BIC_1, "\n")
## AIC and BIC for Model 1: AIC = 339.4874 , BIC = 358.1703
cat("AIC and BIC for Model 2: AIC =", AIC_2, ", BIC =", BIC_2, "\n")
## AIC and BIC for Model 2: AIC = 226.3544 , BIC = 297.3493
cat("AIC and BIC for Model 3: AIC =", AIC_3, ", BIC =", BIC_3, "\n")
## AIC and BIC for Model 3: AIC = 322.1136 , BIC = 602.3565
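Based on the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, Model 2, which includes both borough and cohort year, is the best model. It has the lowest AIC value (226.35) and the lowest BIC value (297.35) among the three models, indicating the best balance between model fit and complexity. The likelihood ratio test points the same way: adding cohort year to Model 1 significantly reduces the deviance (141.13 on 14 degrees of freedom, p < 2e-16), whereas adding the borough-by-cohort-year interaction in Model 3 does not (16.24 on 56 degrees of freedom, p = 1).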
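The AIC and BIC values can also be recovered from the residual deviances, which shows how each criterion penalizes the number of parameters. The sketch below assumes that, for an ungrouped binary outcome, the residual deviance equals minus twice the log-likelihood. It takes n = 310 from the null deviance's 309 degrees of freedom and the parameter counts from the residual degrees of freedom, and it uses the rounded deviances from the output.

```r
n   <- 310                          # 309 null degrees of freedom + 1
dev <- c(m1 = 329.49, m2 = 188.35, m3 = 172.11)
k   <- n - c(m1 = 305, m2 = 291, m3 = 235)  # estimated parameters: 5, 19, 75

# AIC adds 2 per parameter; BIC adds log(n) per parameter
aic <- dev + 2 * k
bic <- dev + k * log(n)

print(round(rbind(k, aic, bic), 2))
```

Because log(310) is well above 2, BIC penalizes the 75 parameters of Model 3 much more heavily than AIC does, which is why Model 3 has by far the largest BIC even though its deviance is the smallest.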
# Model 2
summary(m2)
##
## Call:
## glm(formula = Grad_binary ~ Borough + Cohort_year, family = binomial,
## data = DATA)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.361e+00 1.525e+00 -5.483 4.18e-08 ***
## BoroughBrooklyn 4.200e+00 1.098e+00 3.827 0.000130 ***
## BoroughManhattan 5.338e+00 1.129e+00 4.727 2.28e-06 ***
## BoroughQueens 5.603e+00 1.137e+00 4.930 8.23e-07 ***
## BoroughStaten Island 8.542e+00 1.294e+00 6.600 4.10e-11 ***
## Cohort_year2002 2.052e-15 1.354e+00 0.000 1.000000
## Cohort_year2003 8.404e-01 1.310e+00 0.641 0.521232
## Cohort_year2004 1.546e+00 1.286e+00 1.202 0.229392
## Cohort_year2005 1.851e+00 1.218e+00 1.520 0.128574
## Cohort_year2006 3.197e+00 1.185e+00 2.698 0.006985 **
## Cohort_year2007 3.761e+00 1.199e+00 3.138 0.001704 **
## Cohort_year2008 3.475e+00 1.190e+00 2.919 0.003513 **
## Cohort_year2009 3.761e+00 1.199e+00 3.138 0.001704 **
## Cohort_year2010 4.393e+00 1.228e+00 3.576 0.000348 ***
## Cohort_year2011 5.218e+00 1.299e+00 4.018 5.87e-05 ***
## Cohort_year2012 7.336e+00 1.618e+00 4.535 5.76e-06 ***
## Cohort_year2013 8.829e+00 1.729e+00 5.105 3.31e-07 ***
## Cohort_year2014 8.447e+00 1.754e+00 4.817 1.46e-06 ***
## Cohort_year2015 8.447e+00 2.000e+00 4.225 2.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 424.58 on 309 degrees of freedom
## Residual deviance: 188.35 on 291 degrees of freedom
## AIC: 226.35
##
## Number of Fisher Scoring iterations: 7
The results of Model 2 suggest that both borough and cohort year are significantly associated with whether a cohort of students achieved a graduation rate of 70% or higher. Compared to the Bronx (the reference borough), cohorts from Brooklyn, Manhattan, Queens, and Staten Island were more likely to reach that benchmark. Staten Island showed the largest increase, with a coefficient of 8.542 on the log-odds scale. Cohort year, which refers to the year a cohort began ninth grade in a given school, also showed a positive trend. Relative to the reference (omitted) cohort year, cohorts from 2006 to 2015 had significantly higher odds of reaching the 70% graduation rate, whereas the differences for the 2002 to 2005 cohorts were not statistically significant. This suggests that graduation rates improved over time. The improvement could have been influenced by factors such as educational policies, school resources, and community engagement, but the model does not test these explanations, so further investigation is required to understand what drove the changes in graduation rates.
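Because the coefficients are on the log-odds scale, it can help to convert a few of them into odds ratios and predicted probabilities. The sketch below uses three rounded estimates from the Model 2 output; the vector `b` and the scenario names are illustrative only.

```r
# Rounded Model 2 estimates (log-odds scale)
b <- c(intercept = -8.361, staten_island = 8.542, cohort_2015 = 8.447)

# Odds ratio for Staten Island relative to the Bronx, holding cohort year fixed
odds_ratio_si <- exp(b[["staten_island"]])

# Predicted probabilities of reaching the 70% benchmark
p_bronx_ref <- plogis(b[["intercept"]])                         # Bronx, reference cohort
p_si_ref    <- plogis(b[["intercept"]] + b[["staten_island"]])  # Staten Island, reference cohort
p_si_2015   <- plogis(sum(b))                                   # Staten Island, 2015 cohort

print(c(odds_ratio_si = odds_ratio_si, p_bronx_ref = p_bronx_ref,
        p_si_ref = p_si_ref, p_si_2015 = p_si_2015))
```

The same shift in log-odds changes the probability by very different amounts depending on the starting point, which is the nonlinearity described at the start of this report.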
GitHub repository link: https://github.com/Dijana12/Assignment-4