Chronic absenteeism in high school can substantially harm students’ academic success. It is defined as missing 10% or more of school days in a single academic year, counting excused absences, unexcused absences, and suspensions. Chronically absent students tend to have lower test scores, participate less in class, and show an overall decline in academic performance. These consequences extend beyond the short term and can affect a student’s future opportunities and career.
This analysis examines how chronic absenteeism rates vary across high school grade levels (9th grade through 12th grade). The hypothesis is that 12th graders exhibit the highest levels of chronic absenteeism. Such a pattern could be driven by factors such as senioritis, a decline in motivation, and disengagement from school. These factors are often more prevalent among students in their final year of high school, who may be more focused on post-graduation plans than on current academic responsibilities.
The primary research question guiding this analysis is:
How do chronic absenteeism rates vary across high school grade levels (9th grade through 12th grade), and do 12th graders exhibit higher rates of chronic absenteeism than students in earlier grades?
This question targets differences in chronic absenteeism rates between high school grades, with particular attention to the hypothesis that 12th graders have the highest rates. The results could reveal patterns of chronic absenteeism and inform interventions targeted at different stages of high school.
This analysis uses the 2013-2019 Attendance Results dataset provided by the Department of Education (DOE) and distributed through NYC Open Data. The dataset reports attendance for cohorts of students defined by school, grade, and academic year. The analysis is restricted to grades 9 through 12 and focuses on the proportion of students in each cohort who are considered chronically absent.
The relevant variables for this analysis are:
Grade: The grade level of students (9th, 10th, 11th, or 12th).
Chronically Absent Percentage: The percentage of students who were chronically absent for each cohort.
A beta regression is used to analyze this data because the dependent variable, the proportion of chronically absent students, is a continuous proportion bounded by 0 and 1. Beta regression models outcomes on the open interval (0, 1), which makes it an appropriate method for relating this proportion to grade level once any exact 0 and 1 values have been adjusted.
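For reference, the model described above can be written out explicitly. The block below is a sketch of the mean-precision form of the beta distribution that `betareg()` uses, with a logit link for the mean. The indicator names G10, G11, and G12 are notation introduced here for illustration, assuming 9th grade is the reference level.

```latex
\begin{aligned}
y_i &\sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1 - \mu_i)\,\phi\bigr), \qquad 0 < y_i < 1, \\
\mathrm{logit}(\mu_i) &= \log\frac{\mu_i}{1 - \mu_i}
  = \beta_0 + \beta_1\,\mathrm{G10}_i + \beta_2\,\mathrm{G11}_i + \beta_3\,\mathrm{G12}_i, \\
\mathrm{Var}(y_i) &= \frac{\mu_i\,(1 - \mu_i)}{1 + \phi}.
\end{aligned}
```

In this form, a larger precision parameter phi implies less dispersion around the mean for a given value of mu.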
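# Loading the packages used in this analysis
library(tidyverse)  # read_csv(), dplyr verbs, and the %>% pipe
library(betareg)    # betareg()
library(clarify)    # sim() and sim_ame()
# Importing the data
DATA <- read_csv("2013-2019_Attendance_Results_-_School_20250330.csv", show_col_types = FALSE)
# Looking at the data
dplyr::glimpse(DATA)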
## Rows: 824,146
## Columns: 13
## $ DBN <chr> "01M015", "01M015", "01M015", "01M015"…
## $ `School Name` <chr> "P.S. 015 Roberto Clemente", "P.S. 015…
## $ Grade <chr> "All Grades", "All Grades", "All Grade…
## $ Year <chr> "2013-14", "2014-15", "2015-16", "2016…
## $ `Demographic Category` <chr> "All Students", "All Students", "All S…
## $ `Demographic Variable` <chr> "All Students", "All Students", "All S…
## $ `# Total Days` <dbl> 34803, 33455, 29840, 30601, 33264, 308…
## $ `# Days Absent` <chr> "2783", "2374", "2071", "1994", "2078"…
## $ `# Days Present` <chr> "32020", "31081", "27769", "28607", "3…
## $ `% Attendance` <chr> "92.0", "92.9", "93.1", "93.5", "93.8"…
## $ `# Contributing 20+ Total Days` <chr> "216", "197", "186", "193", "195", "18…
## $ `# Chronically Absent` <chr> "58", "46", "51", "48", "37", "45", "1…
## $ `% Chronically Absent` <chr> "26.9", "23.4", "27.4", "24.9", "19.0"…
# Renaming variable (values are still percentages stored as text at this point)
DATA <- DATA %>%
  rename(Chronically_absent_proportion = `% Chronically Absent`)
# Removing the rows where the grade level is not 9, 10, 11, or 12
DATA <- DATA %>%
  filter(Grade %in% c("9", "10", "11", "12"))
# Removing the rows where the demographic category is not "All Students"
DATA <- DATA %>%
  filter(`Demographic Category` == "All Students")
# Removing the rows where the chronically absent proportion is "s"
DATA <- DATA %>%
  filter(Chronically_absent_proportion != "s")
# Renaming grade levels and putting them in order
DATA <- DATA %>%
  mutate(Grade = factor(paste0(Grade, "th Grade"),
                        levels = c("9th Grade", "10th Grade", "11th Grade", "12th Grade")))
# Converting chronically absent proportion from character to numeric
DATA$Chronically_absent_proportion <- as.numeric(DATA$Chronically_absent_proportion)
# Scale chronically absent proportion to a proportion between 0 and 1
DATA$Chronically_absent_proportion <- DATA$Chronically_absent_proportion / 100
# Adjust the 0 and 1 values to fall between (0, 1)
DATA <- DATA %>%
  mutate(
    Chronically_absent_proportion = case_when(
      Chronically_absent_proportion == 0 ~ 0.0001,
      Chronically_absent_proportion == 1 ~ 0.9999,
      TRUE ~ Chronically_absent_proportion
    )
  )
The Chronically_absent_proportion variable is converted from a percentage (ranging from 0 to 100) to a proportion between 0 and 1 by dividing the values by 100. Because beta regression requires a response strictly between 0 and 1, exact 0 and 1 values are then replaced with 0.0001 and 0.9999, respectively, so that all values lie within the open interval (0, 1).
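The effect of this rescaling and boundary adjustment can be illustrated on a small vector. The sketch below uses base R only; the values in `pct` and the name `eps` are made up for illustration and are not taken from the dataset.

```r
# Toy percentages, including the boundary cases 0 and 100 (illustrative values only)
pct <- c(0, 26.9, 50, 100)

# Convert percentages to proportions
prop <- pct / 100

# Replace exact 0 and 1 with values just inside the open interval (0, 1)
eps <- 0.0001
adj <- ifelse(prop == 0, eps,
       ifelse(prop == 1, 1 - eps, prop))

print(adj)

# Every adjusted value now lies strictly between 0 and 1
stopifnot(all(adj > 0 & adj < 1))
```

Only the boundary values change; interior proportions pass through untouched.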
# Beta regression model
beta <- betareg(Chronically_absent_proportion ~ Grade, data = DATA)
summary(beta)
##
## Call:
## betareg(formula = Chronically_absent_proportion ~ Grade, data = DATA)
##
## Quantile residuals:
## Min 1Q Median 3Q Max
## -4.8665 -0.5604 0.0549 0.5935 6.8980
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.56651 0.01614 -35.089 < 2e-16 ***
## Grade10th Grade 0.09291 0.02271 4.092 4.28e-05 ***
## Grade11th Grade 0.02733 0.02295 1.190 0.234
## Grade12th Grade 0.42759 0.02282 18.736 < 2e-16 ***
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 4.71642 0.05903 79.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 2441 on 5 Df
## Pseudo R-squared: 0.03258
## Number of iterations: 19 (BFGS) + 1 (Fisher scoring)
The beta regression model indicates that grade level is associated with the proportion of chronically absent students. The coefficients are on the logit scale, with 9th grade as the reference level. The coefficient for 10th grade is 0.09291 (p = 4.28e-05), a small but statistically significant increase in the expected proportion relative to 9th grade. The coefficient for 12th grade is 0.42759 (p < 2e-16), a much larger and also highly significant increase. The 11th grade coefficient (0.02733) is not statistically significant (p = 0.234), so the model does not detect a difference in chronic absenteeism between 9th and 11th graders. The precision parameter phi is estimated at 4.72; its significance test only indicates that phi differs from zero and says little about how well the model fits. The pseudo R-squared of 0.03258 shows that grade level alone accounts for only a small share of the variation in chronic absenteeism across cohorts.
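Because the coefficients are on the logit scale, they are easier to read after being mapped back to proportions. The following sketch does this by hand with the inverse logit, `plogis()`, using the estimates copied from the `summary(beta)` output; the object names `b0`, `shift`, `eta`, `mu`, and `odds_ratio` are introduced here for illustration.

```r
# Coefficients copied from the summary(beta) output (logit scale, 9th grade is the reference)
b0 <- -0.56651
shift <- c("9th Grade" = 0, "10th Grade" = 0.09291,
           "11th Grade" = 0.02733, "12th Grade" = 0.42759)

# Linear predictor (log-odds) for each grade
eta <- b0 + shift

# The inverse logit maps log-odds back to expected proportions
mu <- plogis(eta)
print(round(mu, 3))

# Odds ratios relative to 9th grade
odds_ratio <- exp(shift[-1])
print(round(odds_ratio, 2))
```

The resulting proportions are about 0.362, 0.384, 0.368, and 0.465, and the odds ratios are about 1.10, 1.03, and 1.53. The 12th grade coefficient therefore corresponds to roughly 53% higher odds of chronic absenteeism than in 9th grade.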
# Simulate coefficients from the fitted Beta regression model
beta_sim <- sim(beta)
# Compute average adjusted predictions for each grade level
# (`contrast` is omitted: it is ignored when the focal variable has more than two levels)
ame_beta <- sim_ame(beta_sim, var = "Grade")
# Display the average adjusted predictions
ame_beta
## A `clarify_est` object (from `sim_ame()`)
## - Average adjusted predictions for `Grade`
## - 1000 simulated values
## - 4 quantities estimated:
## E[Y(9th Grade)] 0.3620431
## E[Y(10th Grade)] 0.3837645
## E[Y(11th Grade)] 0.3683782
## E[Y(12th Grade)] 0.4653256
# Plot the simulated distributions of the average adjusted predictions
plot(ame_beta)
The sim_ame() output reports average adjusted predictions for the Beta regression model, that is, the expected proportion of chronically absent students at each grade level. Because Grade has four levels, these are predictions rather than marginal effects. The expected proportion is 0.362 for 9th grade, 0.384 for 10th grade, 0.368 for 11th grade, and 0.465 for 12th grade. 12th graders therefore have the highest expected proportion of chronic absenteeism, followed by 10th graders, 11th graders, and then 9th graders, who have the lowest. The pattern is not a steady increase across grades: 9th through 11th grade differ by about two percentage points at most, while 12th grade sits roughly ten percentage points above 9th grade.
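The gaps between grades can be made explicit with a little arithmetic on the reported predictions. The sketch below copies the four point estimates from the `sim_ame()` output and compares each grade with 9th grade; the names `pred`, `diff_pp`, and `ratio` are introduced here, and the calculation covers point estimates only, with no measure of uncertainty.

```r
# Average adjusted predictions copied from the sim_ame() output
pred <- c("9th Grade" = 0.3620431, "10th Grade" = 0.3837645,
          "11th Grade" = 0.3683782, "12th Grade" = 0.4653256)

# Difference from 9th grade, in percentage points
diff_pp <- (pred - pred[["9th Grade"]]) * 100
print(round(diff_pp, 1))

# Ratio of each prediction to the 9th grade prediction
ratio <- pred / pred[["9th Grade"]]
print(round(ratio, 2))
```

The differences come out to about 2.2, 0.6, and 10.3 percentage points for 10th, 11th, and 12th grade. Interval estimates for these contrasts would have to come from the simulated values, for example via the `transform()` and `summary()` methods that clarify provides for `clarify_est` objects.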
This analysis explored chronic absenteeism across high school grade levels, with a particular focus on whether 12th graders exhibit higher rates of chronic absenteeism than students in earlier grades. The beta regression results indicate that grade level is significantly associated with chronic absenteeism, with 12th graders showing the highest proportion, followed by 10th graders. The 10th grade coefficient was statistically significant but small, while 11th grade did not differ significantly from 9th grade.
The average adjusted predictions reinforce this pattern: 12th graders have the highest expected proportion of chronic absenteeism, which is consistent with the hypothesis that absenteeism peaks in the final year of high school. The analysis does not, however, identify the factors behind this increase.
These findings can inform school policies and interventions aimed at reducing chronic absenteeism, particularly among 12th graders. Given the effect chronic absenteeism can have on academic performance and future opportunities, these insights could be used to tailor interventions to the challenges students face at different grade levels. Further research into the factors that lead to chronic absenteeism is needed to understand it better and to find ways to reduce it across all grade levels.