Dropping out of high school can have a significant impact on an individual’s success and well-being. High school dropouts often face higher unemployment rates, earn lower salaries, and are at higher risk of incarceration. These consequences are not only personal; they extend to the community and the economy.
This analysis examines how the number of high school dropouts within a cohort varies with the cohort’s expected graduation timeline and the year the cohort began high school. Specifically, it compares cohorts expected to graduate in 4 years with those expected to graduate in 5 or 6 years. It also examines how the number of dropouts changed across cohorts that started high school between 2001 and 2015.
# Loading the packages used below and importing the data
library(readr); library(dplyr); library(clarify); library(AER)
Graduation <- read_csv("C:/Users/dijan/Documents/DATA 712/graduation_data.csv", show_col_types = FALSE)
# Looking at the data
dplyr::glimpse(Graduation)
## Rows: 327
## Columns: 27
## $ Borough <chr> "Bronx", "Bronx", "Bronx", "Bro…
## $ Category <chr> "All Students", "All Students",…
## $ `Cohort Year` <dbl> 2015, 2014, 2013, 2012, 2011, 2…
## $ Cohort <chr> "4 year August", "4 year August…
## $ `# Total Cohort` <dbl> 13891, 13951, 13730, 13838, 142…
## $ `# Grads` <dbl> 9752, 9398, 9102, 8985, 8821, 8…
## $ `% Grads` <dbl> 70.2, 67.4, 66.3, 64.9, 61.8, 5…
## $ `# Total Regents` <dbl> 8446, 8246, 8105, 8149, 8073, 7…
## $ `% Total Regents of Cohort` <dbl> 60.8, 59.1, 59.0, 58.9, 56.5, 5…
## $ `% Total Regents of Grads` <dbl> 86.6, 87.7, 89.0, 90.7, 91.5, 9…
## $ `# Advanced Regents` <dbl> 1579, 1584, 1548, 1505, 1494, 1…
## $ `% Advanced Regents of Cohort` <dbl> 11.4, 11.4, 11.3, 10.9, 10.5, 1…
## $ `% Advanced Regents of Grads` <dbl> 16.2, 16.9, 17.0, 16.8, 16.9, 1…
## $ `# Regents without Advanced` <dbl> 6867, 6662, 6557, 6644, 6579, 6…
## $ `% Regents without Advanced of Cohort` <dbl> 49.4, 47.8, 47.8, 48.0, 46.1, 4…
## $ `% Regents without Advanced of Grads` <dbl> 70.4, 70.9, 72.0, 73.9, 74.6, 7…
## $ `# Local` <dbl> 1306, 1152, 997, 836, 748, 710,…
## $ `% Local of Cohort` <dbl> 9.4, 8.3, 7.3, 6.0, 5.2, 5.0, 4…
## $ `% Local of Grads` <dbl> 13.4, 12.3, 11.0, 9.3, 8.5, 8.4…
## $ `# Still Enrolled` <dbl> 2124, 2632, 2742, 2876, 3243, 3…
## $ `% Still Enrolled` <dbl> 15.3, 18.9, 20.0, 20.8, 22.7, 2…
## $ `# Dropout` <dbl> 1759, 1693, 1606, 1757, 1866, 2…
## $ `% Dropout` <dbl> 12.7, 12.1, 11.7, 12.7, 13.1, 1…
## $ `# SACC (IEP Diploma)` <dbl> 80, 64, 118, 100, 207, 240, 273…
## $ `% SACC (IEP Diploma) of Cohort` <dbl> 0.6, 0.5, 0.9, 0.7, 1.4, 1.7, 1…
## $ `# TASC (GED)` <dbl> 175, 164, 151, 110, 126, 144, 1…
## $ `% TASC (GED) of Cohort` <dbl> 1.3, 1.2, 1.1, 0.8, 0.9, 1.0, 1…
# Renaming variables
Graduation <- Graduation %>%
  rename(Dropout = `# Dropout`,
         Cohort_year = `Cohort Year`)
# Removing the rows where the borough is "District 79"
Graduation <- Graduation %>%
  filter(Borough != "District 79")
# Convert 'Cohort' into a binary variable:
# 0 = cohorts expected to graduate in 4 years, 1 = cohorts expected to graduate in 5 or 6 years
Graduation <- Graduation %>%
  mutate(Expected_graduation = case_when(
    Cohort %in% c("4 year August", "4 year June") ~ 0,
    TRUE ~ 1
  ))
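The recoding rule above can be checked on a small example. The following is a minimal base R sketch on a toy vector rather than the report's data; the labels "5 year" and "6 year" are hypothetical stand-ins, since only the two 4-year labels appear in the code above.

```r
# Toy cohort labels; "5 year" and "6 year" are hypothetical stand-ins
cohort <- c("4 year August", "4 year June", "5 year", "6 year", "4 year June")

# Same rule as the case_when() call: 4-year cohorts map to 0, everything else to 1
expected_graduation <- ifelse(cohort %in% c("4 year August", "4 year June"), 0, 1)

# Cross-tabulate the labels against the new indicator to confirm the mapping
recode_check <- table(cohort, expected_graduation)
print(recode_check)
```

Because the rule assigns 1 to everything that is not a 4-year label, any misspelled or missing cohort label would also be coded 1, so a cross-tabulation like this is a cheap safeguard.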
# Model predicting dropout outcome by expected graduation
m1 <- glm(Dropout ~ Expected_graduation, family = poisson, data = Graduation)
summary(m1)
##
## Call:
## glm(formula = Dropout ~ Expected_graduation, family = poisson,
## data = Graduation)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.378984 0.002191 3367.5 <2e-16 ***
## Expected_graduation 0.443781 0.002651 167.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 216902 on 309 degrees of freedom
## Residual deviance: 187515 on 308 degrees of freedom
## AIC: 190395
##
## Number of Fisher Scoring iterations: 4
This Poisson regression model shows that expected graduation time is strongly associated with the number of students who drop out. The coefficient for expected graduation is 0.4438 on the log scale, which corresponds to a multiplicative factor of exp(0.4438) ≈ 1.56: cohorts expected to graduate in 5 or 6 years had about 56% more dropouts than cohorts expected to graduate in 4 years. The p-value is below 2e-16, indicating strong statistical significance.
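Because a Poisson model uses a log link, its coefficients are easier to read after exponentiation. The sketch below uses only the coefficient printed in the summary above; the variable names are illustrative.

```r
# Coefficient for Expected_graduation, copied from the Model 1 summary
b_expected <- 0.443781

# Exponentiate to move from the log scale to a multiplicative ratio of expected counts
rate_ratio <- exp(b_expected)

# Express the ratio as a percentage difference relative to 4-year cohorts
pct_more <- (rate_ratio - 1) * 100

cat(sprintf("Ratio of expected counts: %.3f (%.0f%% more dropouts)\n", rate_ratio, pct_more))
```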
# Average Marginal Effects for expected graduation using Model 1
sim_coefs1 <- sim(m1)
sim_est1 <- sim_ame(sim_coefs1, var = "Expected_graduation",
                    contrast = "rd")
summary(sim_est1)
## Estimate 2.5 % 97.5 %
## E[Y(0)] 1602 1595 1609
## E[Y(1)] 2497 2490 2505
## RD 895 884 905
These simulation-based estimates give the expected number of dropouts under each expected graduation time. For cohorts expected to graduate in 4 years, the expected number of dropouts is 1602 (95% interval 1595 to 1609). For cohorts expected to graduate in 5 or 6 years, it is 2497 (2490 to 2505). The difference between the two groups is 895 dropouts, and its 95% interval (884 to 905) excludes zero, so the difference is statistically significant.
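With a single binary predictor, these expected counts can be recovered directly from the Model 1 coefficients by inverting the log link. The following sketch does that arithmetic with the printed estimates; it is a consistency check, not a replacement for the simulation-based intervals.

```r
# Coefficients copied from the Model 1 summary
b0 <- 7.378984   # intercept
b1 <- 0.443781   # Expected_graduation

# Inverse of the log link: exponentiate the linear predictor
ey0 <- exp(b0)        # expected dropouts when Expected_graduation = 0
ey1 <- exp(b0 + b1)   # expected dropouts when Expected_graduation = 1
rd  <- ey1 - ey0      # difference on the count scale

print(round(c(EY0 = ey0, EY1 = ey1, RD = rd)))
```

The rounded values agree with the E[Y(0)], E[Y(1)], and RD rows of the table above.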
# Model predicting dropout outcome by expected graduation and cohort year
m2 <- glm(Dropout ~ Expected_graduation + Cohort_year, family = poisson, data = Graduation)
summary(m2)
##
## Call:
## glm(formula = Dropout ~ Expected_graduation + Cohort_year, family = poisson,
## data = Graduation)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 112.684562 0.634614 177.6 <2e-16 ***
## Expected_graduation 0.398569 0.002664 149.6 <2e-16 ***
## Cohort_year -0.052432 0.000316 -165.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 216902 on 309 degrees of freedom
## Residual deviance: 160046 on 307 degrees of freedom
## AIC: 162928
##
## Number of Fisher Scoring iterations: 4
This Poisson regression model shows that both expected graduation time and cohort year are significantly associated with the number of students who drop out. The coefficient for expected graduation is 0.3986 (exp(0.3986) ≈ 1.49), meaning that cohorts expected to graduate in 5 or 6 years had about 49% more dropouts than those expected to graduate in 4 years, holding cohort year constant. The coefficient for cohort year is -0.0524 (exp(-0.0524) ≈ 0.949), which corresponds to a decrease of about 5% in the number of dropouts with each subsequent cohort year. The p-values for both variables are below 2e-16, indicating strong statistical significance.
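The cohort year coefficient describes a constant proportional change per year, which compounds across the study period. The sketch below works this out from the printed Model 2 estimate; the 14-year span is simply 2015 minus 2001.

```r
# Cohort_year coefficient, copied from the Model 2 summary
b_year <- -0.052432

# Multiplicative change in expected dropouts for one additional cohort year
per_year_factor <- exp(b_year)
pct_per_year <- (1 - per_year_factor) * 100

# Compounded change between the first (2001) and last (2015) cohorts
span_years <- 2015 - 2001
total_factor <- exp(b_year * span_years)

cat(sprintf("Per-year decline: %.1f%%\n", pct_per_year))
cat(sprintf("2015 expected count relative to 2001: %.2f\n", total_factor))
```

A total factor near 0.48 means the model implies that the expected number of dropouts roughly halved between the 2001 and 2015 cohorts.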
# Average Marginal Effects for expected graduation using Model 2
sim_coefs2 <- sim(m2)
sim_est2 <- sim_ame(sim_coefs2, var = "Expected_graduation",
                    contrast = "rd")
summary(sim_est2)
## Estimate 2.5 % 97.5 %
## E[Y(0)] 1645 1638 1652
## E[Y(1)] 2450 2444 2458
## RD 806 796 816
These simulation-based estimates give the expected number of dropouts by expected graduation time after adjusting for cohort year. For cohorts expected to graduate in 4 years, the expected number of dropouts is 1645 (95% interval 1638 to 1652). For cohorts expected to graduate in 5 or 6 years, it is 2450 (2444 to 2458). The difference between the two groups is 806 dropouts (796 to 816), somewhat smaller than the 895 estimated by Model 1 because this model accounts for changes in dropout patterns over time. The interval excludes zero, so the difference is statistically significant.
# Effect of cohort year (no `contrast` argument: it is ignored for a continuous variable)
sim_est2a <- sim_ame(sim_coefs2, var = "Cohort_year")
summary(sim_est2a)
## Estimate 2.5 % 97.5 %
## E[dY/d(Cohort_year)] -111 -113 -110
This estimate is the average marginal effect of cohort year on the expected number of dropouts. On average, each one-year increase in cohort year is associated with 111 fewer expected dropouts (95% interval 110 to 113 fewer). Later cohorts therefore experienced fewer dropouts, and because the interval excludes zero the result is statistically significant.
# Dose-Response relationship prediction for cohort year
sim_est2b <- sim_adrf(sim_coefs2, var = "Cohort_year",
                      contrast = "adrf")
summary(sim_est2b)
## Estimate 2.5 % 97.5 %
## E[Y(2001)] 3036 3022 3051
## E[Y(2001.7)] 2927 2914 2940
## E[Y(2002.4)] 2822 2810 2833
## E[Y(2003.1)] 2720 2710 2730
## E[Y(2003.8)] 2622 2613 2631
## E[Y(2004.5)] 2527 2520 2536
## E[Y(2005.2)] 2436 2430 2444
## E[Y(2005.9)] 2348 2343 2355
## E[Y(2006.6)] 2264 2259 2270
## E[Y(2007.3)] 2182 2177 2188
## E[Y(2008)] 2104 2099 2109
## E[Y(2008.7)] 2028 2023 2033
## E[Y(2009.4)] 1955 1950 1960
## E[Y(2010.1)] 1884 1879 1890
## E[Y(2010.8)] 1816 1811 1822
## E[Y(2011.5)] 1751 1745 1757
## E[Y(2012.2)] 1688 1682 1694
## E[Y(2012.9)] 1627 1620 1634
## E[Y(2013.6)] 1568 1561 1576
## E[Y(2014.3)] 1512 1505 1519
## E[Y(2015)] 1457 1450 1465
plot(sim_est2b)
These estimates give the expected number of dropouts at evenly spaced values of cohort year. The expected number declines steadily, from 3036 (95% interval 3022 to 3051) in 2001 to 1457 (1450 to 1465) in 2015, so the expected number of dropouts roughly halved over the period. The intervals are narrow and those for the first and last years are far apart, so the decline is statistically significant.
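The fractional years in the table (2001.7, 2002.4, and so on) are evaluation points on a grid, not actual cohorts. The sketch below reproduces that grid under the assumption that the dose-response function is evaluated at 21 evenly spaced points across the observed range, which matches the 21 rows printed above.

```r
# Assumption: 21 evenly spaced evaluation points between the first and last cohort years
eval_years <- seq(2001, 2015, length.out = 21)

# Spacing between neighbouring evaluation points
step <- diff(eval_years)[1]

print(eval_years[1:4])
cat("Spacing between points:", step, "years\n")
```

Only the whole-number points correspond to real cohorts; the values in between are model-based interpolations along the fitted curve.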
# Dose-Response relationship effect for cohort year
sim_est2b <- sim_adrf(sim_coefs2, var = "Cohort_year",
                      contrast = "amef")
summary(sim_est2b)
## Estimate 2.5 % 97.5 %
## E[dY/d(Cohort_year)|2001] -159.2 -161.8 -156.6
## E[dY/d(Cohort_year)|2001.7] -153.5 -155.9 -151.0
## E[dY/d(Cohort_year)|2002.4] -147.9 -150.2 -145.6
## E[dY/d(Cohort_year)|2003.1] -142.6 -144.7 -140.5
## E[dY/d(Cohort_year)|2003.8] -137.5 -139.4 -135.5
## E[dY/d(Cohort_year)|2004.5] -132.5 -134.4 -130.6
## E[dY/d(Cohort_year)|2005.2] -127.7 -129.5 -126.0
## E[dY/d(Cohort_year)|2005.9] -123.1 -124.7 -121.5
## E[dY/d(Cohort_year)|2006.6] -118.7 -120.2 -117.2
## E[dY/d(Cohort_year)|2007.3] -114.4 -115.8 -113.0
## E[dY/d(Cohort_year)|2008] -110.3 -111.6 -109.0
## E[dY/d(Cohort_year)|2008.7] -106.3 -107.5 -105.1
## E[dY/d(Cohort_year)|2009.4] -102.5 -103.6 -101.3
## E[dY/d(Cohort_year)|2010.1] -98.8 -99.9 -97.7
## E[dY/d(Cohort_year)|2010.8] -95.2 -96.2 -94.3
## E[dY/d(Cohort_year)|2011.5] -91.8 -92.7 -90.9
## E[dY/d(Cohort_year)|2012.2] -88.5 -89.3 -87.7
## E[dY/d(Cohort_year)|2012.9] -85.3 -86.1 -84.5
## E[dY/d(Cohort_year)|2013.6] -82.2 -82.9 -81.5
## E[dY/d(Cohort_year)|2014.3] -79.3 -79.9 -78.6
## E[dY/d(Cohort_year)|2015] -76.4 -77.0 -75.8
plot(sim_est2b)
These estimates give the marginal effect of cohort year, that is, the change in the expected number of dropouts per additional cohort year, evaluated at different cohort years. The yearly decrease shrinks from 159.2 dropouts in 2001 to 76.4 dropouts in 2015. Because the model uses a log link, the proportional decline is constant at about 5% per year, so the absolute decrease becomes smaller as the expected number of dropouts falls. The 95% interval excludes zero at every evaluated year, so the effect is statistically significant throughout.
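Under a log link, the derivative of the expected count with respect to cohort year equals the coefficient times the expected count at that year. The sketch below applies this to three predictions copied from the dose-response table for Model 2, as a check on the marginal effects printed above.

```r
# Cohort_year coefficient, copied from the Model 2 summary
b_year <- -0.052432

# Expected dropouts at selected years, copied from the prediction table
predicted <- c(`2001` = 3036, `2008` = 2104, `2015` = 1457)

# d/dx exp(a + b * x) = b * exp(a + b * x): slope is proportional to the level
marginal_effect <- b_year * predicted

print(round(marginal_effect, 1))
```

The products match the printed marginal effects at 2001, 2008, and 2015 to one decimal place, which confirms that the shrinking absolute effect reflects a falling baseline rather than a weakening proportional trend.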
# Overdispersion test for Model 2
dispersiontest(m2)
##
## Overdispersion test
##
## data: m2
## z = 13.39, p-value < 2.2e-16
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 409.1987
The overdispersion test for the Poisson regression model indicates that the dispersion is significantly greater than 1 (z = 13.39, p < 2.2e-16). The estimated dispersion is 409.2, meaning that the variance of the outcome is roughly 409 times what the Poisson distribution implies, since a Poisson variable has equal mean and variance. A Poisson model is therefore a poor fit for these data, and its standard errors are too small. Alternative models, such as negative binomial regression, are more appropriate.
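To see what overdispersion looks like, the sketch below simulates two outcomes with the same means, one Poisson and one negative binomial, and computes the Pearson dispersion statistic for a Poisson fit to each. The data, means, and seed are invented for illustration, and the Pearson statistic is a related but different estimator from the one reported by `dispersiontest()`.

```r
set.seed(712)  # arbitrary seed so the sketch is reproducible

n <- 300
group <- rep(c(0, 1), each = n / 2)
mu <- exp(7.4 + 0.4 * group)             # illustrative means, not the report's data

y_pois <- rpois(n, lambda = mu)          # variance equals the mean
y_nb   <- rnbinom(n, mu = mu, size = 3)  # variance = mu + mu^2 / 3

# Pearson chi-square divided by residual degrees of freedom; about 1 if Poisson holds
pearson_dispersion <- function(y, x) {
  fit <- glm(y ~ x, family = poisson)
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_pois <- pearson_dispersion(y_pois, group)
disp_nb   <- pearson_dispersion(y_nb, group)

print(round(c(poisson_data = disp_pois, negbin_data = disp_nb), 2))
```

The statistic stays near 1 for the Poisson outcome and is in the hundreds for the negative binomial outcome, the same order of magnitude as the dispersion estimated for Model 2.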
# Negative binomial regression
m3 <- MASS::glm.nb(Dropout ~ Expected_graduation + Cohort_year, data = Graduation)
summary(m3)
##
## Call:
## MASS::glm.nb(formula = Dropout ~ Expected_graduation + Cohort_year,
## data = Graduation, init.theta = 3.011356575, link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 117.304644 16.923254 6.932 4.16e-12 ***
## Expected_graduation 0.397519 0.066815 5.950 2.69e-09 ***
## Cohort_year -0.054733 0.008424 -6.497 8.20e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(3.0114) family taken to be 1)
##
## Null deviance: 411.35 on 309 degrees of freedom
## Residual deviance: 327.08 on 307 degrees of freedom
## AIC: 5193.2
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 3.011
## Std. Err.: 0.230
##
## 2 x log-likelihood: -5185.157
This negative binomial regression model shows that both expected graduation time and cohort year are significantly associated with the number of dropouts. The coefficient for expected graduation is 0.3975 (exp(0.3975) ≈ 1.49), meaning that cohorts expected to graduate in 5 or 6 years had about 49% more dropouts than those expected to graduate in 4 years. The coefficient for cohort year is -0.0547 (exp(-0.0547) ≈ 0.947), indicating a decrease of about 5% in the number of dropouts with each subsequent cohort year. The point estimates are close to those of the Poisson model, but the standard errors are about 25 to 27 times larger because the model accounts for overdispersion. The AIC falls from 162928 to 5193.2, indicating a much better fit than the Poisson model.
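The comparison between the two models can be made concrete with the figures printed in their summaries. The sketch below computes the AIC difference, the ratio of standard errors, and the variance implied by the estimated theta; the mean of 2000 used in the last step is an illustrative round number, not an estimate from the data.

```r
# Values copied from the Model 2 (Poisson) and Model 3 (negative binomial) summaries
aic <- c(poisson = 162928, negbin = 5193.2)
se_expected <- c(poisson = 0.002664, negbin = 0.066815)
theta <- 3.011

# Lower AIC indicates better fit after penalising the extra parameter
aic_drop <- aic[["poisson"]] - aic[["negbin"]]

# How much wider the uncertainty is once overdispersion is modelled
se_ratio <- se_expected[["negbin"]] / se_expected[["poisson"]]

# Negative binomial variance function: Var(Y) = mu + mu^2 / theta
nb_variance <- function(mu, theta) mu + mu^2 / theta
var_ratio <- nb_variance(2000, theta) / 2000  # relative to a Poisson with mean 2000

print(round(c(aic_drop = aic_drop, se_ratio = se_ratio, var_ratio = var_ratio), 1))
```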
# Effect of expected graduation
sim_coefs3 <- sim(m3)
sim_est3 <- sim_ame(sim_coefs3, var = "Expected_graduation",
                    contrast = "rd")
summary(sim_est3)
## Estimate 2.5 % 97.5 %
## E[Y(0)] 1648 1488 1828
## E[Y(1)] 2452 2256 2665
## RD 804 553 1074
These simulation-based estimates give the expected number of dropouts by expected graduation time under the negative binomial model. For cohorts expected to graduate in 4 years, the expected number of dropouts is 1648 (95% interval 1488 to 1828). For those expected to graduate in 5 or 6 years, it is 2452 (2256 to 2665). The difference between the two groups is 804 dropouts (553 to 1074). The interval is much wider than under the Poisson model but still excludes zero, so the difference is statistically significant.
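The practical consequence of modelling overdispersion shows up in the interval widths. The sketch below compares the 95% intervals for the group difference from Model 2 and Model 3, using the values printed in the two tables; the helper name `width` is illustrative.

```r
# Difference in expected dropouts (5/6-year minus 4-year cohorts), copied from the tables
rd_poisson <- c(estimate = 806, lower = 796, upper = 816)
rd_negbin  <- c(estimate = 804, lower = 553, upper = 1074)

# Width of a 95% interval
width <- function(ci) ci[["upper"]] - ci[["lower"]]

width_poisson <- width(rd_poisson)
width_negbin  <- width(rd_negbin)
widening <- width_negbin / width_poisson

print(c(poisson = width_poisson, negbin = width_negbin, ratio = widening))
```

The point estimates differ by only 2 dropouts, while the negative binomial interval is about 26 times wider, which is in line with the inflation of the standard errors.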
# Effect of cohort year (no `contrast` argument: it is ignored for a continuous variable)
sim_est3a <- sim_ame(sim_coefs3, var = "Cohort_year")
summary(sim_est3a)
## Estimate 2.5 % 97.5 %
## E[dY/d(Cohort_year)] -116 -153 -79
This estimate is the average marginal effect of cohort year on the expected number of dropouts under the negative binomial model. On average, each subsequent cohort year is associated with 116 fewer expected dropouts (95% interval 79 to 153 fewer). The interval excludes zero, so the result is statistically significant.
# Dose-response relationship: prediction
sim_est3b <- sim_adrf(sim_coefs3, var = "Cohort_year",
                      contrast = "adrf")
summary(sim_est3b)
## Estimate 2.5 % 97.5 %
## E[Y(2001)] 3085 2698 3531
## E[Y(2001.7)] 2969 2622 3362
## E[Y(2002.4)] 2857 2548 3204
## E[Y(2003.1)] 2750 2480 3057
## E[Y(2003.8)] 2646 2407 2914
## E[Y(2004.5)] 2547 2334 2780
## E[Y(2005.2)] 2451 2265 2661
## E[Y(2005.9)] 2359 2195 2540
## E[Y(2006.6)] 2270 2119 2426
## E[Y(2007.3)] 2185 2040 2334
## E[Y(2008)] 2103 1971 2239
## E[Y(2008.7)] 2024 1897 2154
## E[Y(2009.4)] 1948 1821 2082
## E[Y(2010.1)] 1875 1743 2018
## E[Y(2010.8)] 1804 1668 1956
## E[Y(2011.5)] 1736 1599 1897
## E[Y(2012.2)] 1671 1528 1842
## E[Y(2012.9)] 1608 1457 1793
## E[Y(2013.6)] 1548 1389 1744
## E[Y(2014.3)] 1490 1328 1693
## E[Y(2015)] 1434 1268 1646
plot(sim_est3b)
These estimates give the expected number of dropouts at evenly spaced values of cohort year under the negative binomial model. The expected number declines gradually, from 3085 (95% interval 2698 to 3531) in 2001 to 1434 (1268 to 1646) in 2015. The number of dropouts therefore declined steadily across cohorts, and the intervals for the first and last years do not overlap.
# Dose-response relationship: effect
sim_est3b <- sim_adrf(sim_coefs3, var = "Cohort_year",
                      contrast = "amef")
summary(sim_est3b)
## Estimate 2.5 % 97.5 %
## E[dY/d(Cohort_year)|2001] -168.8 -243.4 -103.3
## E[dY/d(Cohort_year)|2001.7] -162.5 -231.7 -100.6
## E[dY/d(Cohort_year)|2002.4] -156.4 -220.6 -98.0
## E[dY/d(Cohort_year)|2003.1] -150.5 -210.1 -95.4
## E[dY/d(Cohort_year)|2003.8] -144.8 -200.2 -92.8
## E[dY/d(Cohort_year)|2004.5] -139.4 -190.7 -90.3
## E[dY/d(Cohort_year)|2005.2] -134.2 -181.6 -87.9
## E[dY/d(Cohort_year)|2005.9] -129.1 -173.0 -85.5
## E[dY/d(Cohort_year)|2006.6] -124.3 -165.0 -83.2
## E[dY/d(Cohort_year)|2007.3] -119.6 -157.1 -81.0
## E[dY/d(Cohort_year)|2008] -115.1 -149.7 -78.8
## E[dY/d(Cohort_year)|2008.7] -110.8 -142.6 -76.8
## E[dY/d(Cohort_year)|2009.4] -106.6 -136.1 -74.7
## E[dY/d(Cohort_year)|2010.1] -102.6 -129.6 -72.8
## E[dY/d(Cohort_year)|2010.8] -98.7 -123.3 -70.9
## E[dY/d(Cohort_year)|2011.5] -95.0 -117.2 -69.0
## E[dY/d(Cohort_year)|2012.2] -91.5 -111.7 -67.2
## E[dY/d(Cohort_year)|2012.9] -88.0 -106.4 -65.5
## E[dY/d(Cohort_year)|2013.6] -84.7 -101.4 -63.8
## E[dY/d(Cohort_year)|2014.3] -81.5 -96.8 -62.1
## E[dY/d(Cohort_year)|2015] -78.5 -92.4 -60.5
plot(sim_est3b)
These estimates give the marginal effect of cohort year at different cohort years under the negative binomial model. In 2001, each additional cohort year is associated with 168.8 fewer expected dropouts; by 2015, the decrease is 78.5 per cohort year. The absolute effect shrinks as the expected number of dropouts falls, but the decline is consistent throughout the period. The 95% interval excludes zero at every evaluated year, so the effect is statistically significant across all years.
The goal of this analysis was to investigate how expected graduation time and cohort year relate to the number of high school dropouts. The results show significant differences in the number of dropouts based on both factors. Under the negative binomial model, cohorts expected to graduate in 5 or 6 years had more dropouts than those expected to graduate in 4 years, with a difference of 804 dropouts (95% interval 553 to 1074). This finding suggests that longer graduation timelines are associated with more dropouts, possibly due to disengagement over time. The analysis also revealed a consistent decline in the number of dropouts across cohorts from 2001 to 2015: the expected number of dropouts decreased from 3085 in 2001 to 1434 in 2015.
This decline may reflect broader reforms in education or increased engagement over the years. The statistical significance of these findings suggests that both expected graduation time and cohort year play important roles in dropout trends. Future research should explore additional factors that may influence these trends, such as changes in educational policies, school resources, and community engagement.