DATA 712 HW#6
Introduction
Hate crimes remain a pressing social concern in the United States, with significant variation across time and demographic groups. Understanding patterns in the frequency of hate crime incidents is essential for informing public policy and community interventions. This analysis draws on hate crime data from the FBI to explore how the number of reported hate crime incidents varies by year and by the race of the offender. Using count regression models, including Poisson and Negative Binomial models, the study investigates whether these variables significantly predict incident counts and whether the assumptions of different modeling approaches are appropriate for the structure of the data.
Load data
colnames(DATA) <- c("Year", "Pug Agency Name", "State", "State Name", "Incident Date",
"Offender Race", "Offender Ethnicity", "Offense", "Reason/Bias")
print(colnames(DATA))## [1] "Year" "Pug Agency Name" "State"
## [4] "State Name" "Incident Date" "Offender Race"
## [7] "Offender Ethnicity" "Offense" "Reason/Bias"
DATA <- DATA %>%
filter(!is.na(Year), !is.na(`Offender Race`))
grouped_data <- DATA %>%
group_by(Year, `Offender Race`) %>%
summarise(incident_count = n(), .groups = "drop") %>%
mutate(
Year = as.factor(Year),
`Offender Race` = as.factor(`Offender Race`)
)
head(grouped_data)## # A tibble: 6 × 3
## Year `Offender Race` incident_count
## <fct> <fct> <int>
## 1 1991 American Indian or Alaska Native 12
## 2 1991 Asian 47
## 3 1991 Black or African American 775
## 4 1991 Multiple 77
## 5 1991 Unknown 1989
## 6 1991 White 1689
I grouped the data by Year and
Offender Race to create a count variable representing the
number of incidents in each group. This prepares the data for Poisson
and Negative Binomial modeling (if necessary).
Poisson Model
poisson_model <- glm(incident_count ~ Year + `Offender Race`,
family = poisson, data = grouped_data)
dispersion <- sum(residuals(poisson_model, type = "pearson")^2) / poisson_model$df.residual
dispersion ## [1] 44.58803
The Poisson regression model is to predict the number of hate crime incidents by year and offender race. The dispersion statistic was 44.59, which is much greater than 1. This indicates severe over dispersion, meaning that the variance in the data far exceeds what the Poisson model assumes. As a result, the standard errors and significance tests from the Poisson model may be unreliable. Therefore, I proceeded to fit a Negative Binomial model, which adjusts for overdispersion by including a dispersion parameter.
Negative Binomial Model
nb_model <- glm.nb(incident_count ~ Year + `Offender Race`, data = grouped_data)
screenreg(list(Poisson = poisson_model, NegBin = nb_model))##
## ====================================================================================
## Poisson NegBin
## ------------------------------------------------------------------------------------
## (Intercept) 3.33 *** 3.27 ***
## (0.03) (0.10)
## Year1992 0.37 *** 0.27 *
## (0.02) (0.13)
## Year1993 0.51 *** 0.48 ***
## (0.02) (0.13)
## Year1994 0.26 *** 0.20
## (0.02) (0.13)
## Year1995 0.55 *** 0.57 ***
## (0.02) (0.13)
## Year1996 0.65 *** 0.57 ***
## (0.02) (0.13)
## Year1997 0.57 *** 0.53 ***
## (0.02) (0.13)
## Year1998 0.54 *** 0.44 ***
## (0.02) (0.13)
## Year1999 0.55 *** 0.45 ***
## (0.02) (0.13)
## Year2000 0.58 *** 0.50 ***
## (0.02) (0.13)
## Year2001 0.75 *** 0.61 ***
## (0.02) (0.13)
## Year2002 0.49 *** 0.44 ***
## (0.02) (0.13)
## Year2003 0.50 *** 0.51 ***
## (0.02) (0.13)
## Year2004 0.52 *** 0.43 **
## (0.02) (0.13)
## Year2005 0.48 *** 0.46 ***
## (0.02) (0.13)
## Year2006 0.52 *** 0.55 ***
## (0.02) (0.13)
## Year2007 0.51 *** 0.45 ***
## (0.02) (0.13)
## Year2008 0.56 *** 0.48 ***
## (0.02) (0.13)
## Year2009 0.37 *** 0.36 **
## (0.02) (0.13)
## Year2010 0.37 *** 0.41 **
## (0.02) (0.13)
## Year2011 0.32 *** 0.35 **
## (0.02) (0.13)
## Year2012 0.36 *** 0.43 **
## (0.02) (0.13)
## Year2013 0.27 *** 0.26 *
## (0.02) (0.13)
## Year2014 0.20 *** 0.24
## (0.02) (0.13)
## Year2015 0.24 *** 0.39 **
## (0.02) (0.13)
## Year2016 0.31 *** 0.38 **
## (0.02) (0.13)
## Year2017 0.25 *** 0.38 **
## (0.02) (0.13)
## Year2018 0.23 *** 0.52 ***
## (0.02) (0.13)
## Year2019 0.33 *** 0.54 ***
## (0.02) (0.13)
## Year2020 0.56 *** 0.76 ***
## (0.02) (0.12)
## Year2021 0.66 *** 1.07 ***
## (0.02) (0.12)
## Year2022 0.73 *** 1.08 ***
## (0.02) (0.12)
## Year2023 0.74 *** 0.99 ***
## (0.02) (0.12)
## `Offender Race`Asian 0.30 *** 0.29 ***
## (0.03) (0.06)
## `Offender Race`Black or African American 3.13 *** 3.15 ***
## (0.03) (0.06)
## `Offender Race`Multiple 1.27 *** 1.29 ***
## (0.03) (0.06)
## `Offender Race`Native Hawaiian or Other Pacific Islander -1.00 *** -1.20 ***
## (0.08) (0.11)
## `Offender Race`Not Specified 3.65 *** 3.40 ***
## (0.03) (0.10)
## `Offender Race`Unknown 4.21 *** 4.27 ***
## (0.03) (0.06)
## `Offender Race`White 4.22 *** 4.24 ***
## (0.03) (0.06)
## ------------------------------------------------------------------------------------
## AIC 9896.98 2591.89
## BIC 10031.99 2730.28
## Log Likelihood -4908.49 -1254.95
## Deviance 8167.01 235.06
## Num. obs. 216 216
## ====================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
The Negative Binomial model produced a much better fit than the Poisson model, as indicated by the lower AIC (2591.89 vs. 9896.98) and substantially lower deviance (235.06 vs. 8167.01). This confirms that the Negative Binomial model better accounts for the overdispersion in the data. The regression coefficients remain relatively stable across both models, but the standard errors are more reliable in the Negative Binomial model. Therefore, I use this model for all further simulations and interpretation of results.
Interpreting the NegBin Model Using clarify
# Simulate parameters from NB model
sim_nb <- sim(nb_model, n = 1000, vcov = "robust")
# Set reference values
set_vals <- sim_setx(sim_nb, x = list(
Year = "2020",
`Offender Race` = "White"
))# Predicted count for a reference profile
predicted_count <- sim_apply(sim_nb, FUN = predict, setx = set_vals)
summary(predicted_count)## Estimate 2.5 % 97.5 %
## 1 3.27 3.27 3.27
## 2 3.56 3.56 3.56
## 3 6.42 6.42 6.42
## 4 4.56 4.56 4.56
## 5 7.54 7.54 7.54
## 6 7.51 7.51 7.51
## 7 3.54 3.54 3.54
## 8 3.83 3.83 3.83
## 9 6.69 6.69 6.69
## 10 4.83 4.83 4.83
## 11 7.81 7.81 7.81
## 12 7.78 7.78 7.78
## 13 3.75 3.75 3.75
## 14 4.04 4.04 4.04
## 15 6.90 6.90 6.90
## 16 5.03 5.03 5.03
## 17 8.01 8.01 8.01
## 18 7.98 7.98 7.98
## 19 3.46 3.46 3.46
## 20 3.76 3.76 3.76
## 21 6.62 6.62 6.62
## 22 4.75 4.75 4.75
## 23 7.73 7.73 7.73
## 24 7.70 7.70 7.70
## 25 3.84 3.84 3.84
## 26 4.13 4.13 4.13
## 27 6.99 6.99 6.99
## 28 5.13 5.13 5.13
## 29 8.11 8.11 8.11
## 30 8.08 8.08 8.08
## 31 3.83 3.83 3.83
## 32 4.13 4.13 4.13
## 33 6.99 6.99 6.99
## 34 5.12 5.12 5.12
## 35 8.10 8.10 8.10
## 36 8.07 8.07 8.07
## 37 3.80 3.80 3.80
## 38 4.09 4.09 4.09
## 39 6.95 6.95 6.95
## 40 5.09 5.09 5.09
## 41 8.07 8.07 8.07
## 42 8.04 8.04 8.04
## 43 3.71 3.71 3.71
## 44 4.00 4.00 4.00
## 45 6.86 6.86 6.86
## 46 5.00 5.00 5.00
## 47 7.98 7.98 7.98
## 48 7.95 7.95 7.95
## 49 3.71 3.71 3.71
## 50 4.01 4.01 4.01
## 51 6.87 6.87 6.87
## 52 5.00 5.00 5.00
## 53 7.98 7.98 7.98
## 54 7.95 7.95 7.95
## 55 3.77 3.77 3.77
## 56 4.06 4.06 4.06
## 57 6.92 6.92 6.92
## 58 5.06 5.06 5.06
## 59 8.04 8.04 8.04
## 60 8.01 8.01 8.01
## 61 3.87 3.87 3.87
## 62 4.17 4.17 4.17
## 63 7.03 7.03 7.03
## 64 5.16 5.16 5.16
## 65 8.14 8.14 8.14
## 66 8.11 8.11 8.11
## 67 3.71 3.71 3.71
## 68 4.00 4.00 4.00
## 69 6.86 6.86 6.86
## 70 5.00 5.00 5.00
## 71 7.98 7.98 7.98
## 72 7.95 7.95 7.95
## 73 3.78 3.78 3.78
## 74 4.07 4.07 4.07
## 75 6.93 6.93 6.93
## 76 5.07 5.07 5.07
## 77 8.05 8.05 8.05
## 78 8.02 8.02 8.02
## 79 3.70 3.70 3.70
## 80 3.99 3.99 3.99
## 81 6.85 6.85 6.85
## 82 4.99 4.99 4.99
## 83 7.97 7.97 7.97
## 84 7.94 7.94 7.94
## 85 3.73 3.73 3.73
## 86 4.02 4.02 4.02
## 87 6.88 6.88 6.88
## 88 5.02 5.02 5.02
## 89 8.00 8.00 8.00
## 90 7.97 7.97 7.97
## 91 3.82 3.82 3.82
## 92 4.11 4.11 4.11
## 93 6.97 6.97 6.97
## 94 5.11 5.11 5.11
## 95 8.09 8.09 8.09
## 96 8.06 8.06 8.06
## 97 3.72 3.72 3.72
## 98 4.01 4.01 4.01
## 99 6.87 6.87 6.87
## 100 5.00 5.00 5.00
## 101 7.98 7.98 7.98
## 102 7.95 7.95 7.95
## 103 3.75 3.75 3.75
## 104 4.04 4.04 4.04
## 105 6.90 6.90 6.90
## 106 5.04 5.04 5.04
## 107 8.02 8.02 8.02
## 108 7.99 7.99 7.99
## 109 3.63 3.63 3.63
## 110 3.92 3.92 3.92
## 111 6.78 6.78 6.78
## 112 4.92 4.92 4.92
## 113 7.90 7.90 7.90
## 114 7.87 7.87 7.87
## 115 3.68 3.68 3.68
## 116 3.97 3.97 3.97
## 117 6.83 6.83 6.83
## 118 4.97 4.97 4.97
## 119 7.94 7.94 7.94
## 120 7.91 7.91 7.91
## 121 3.62 3.62 3.62
## 122 3.91 3.91 3.91
## 123 6.77 6.77 6.77
## 124 4.91 4.91 4.91
## 125 7.89 7.89 7.89
## 126 7.86 7.86 7.86
## 127 3.70 3.70 3.70
## 128 3.99 3.99 3.99
## 129 6.85 6.85 6.85
## 130 4.98 4.98 4.98
## 131 7.96 7.96 7.96
## 132 7.93 7.93 7.93
## 133 3.53 3.53 3.53
## 134 3.82 3.82 3.82
## 135 6.68 6.68 6.68
## 136 4.82 4.82 4.82
## 137 2.33 2.33 2.33
## 138 7.80 7.80 7.80
## 139 7.77 7.77 7.77
## 140 3.51 3.51 3.51
## 141 3.80 3.80 3.80
## 142 6.66 6.66 6.66
## 143 4.80 4.80 4.80
## 144 2.31 2.31 2.31
## 145 7.78 7.78 7.78
## 146 7.75 7.75 7.75
## 147 3.65 3.65 3.65
## 148 3.94 3.94 3.94
## 149 6.81 6.81 6.81
## 150 4.94 4.94 4.94
## 151 2.45 2.45 2.45
## 152 7.92 7.92 7.92
## 153 7.89 7.89 7.89
## 154 3.65 3.65 3.65
## 155 3.94 3.94 3.94
## 156 6.81 6.81 6.81
## 157 4.94 4.94 4.94
## 158 2.45 2.45 2.45
## 159 7.92 7.92 7.92
## 160 7.89 7.89 7.89
## 161 3.65 3.65 3.65
## 162 3.94 3.94 3.94
## 163 6.81 6.81 6.81
## 164 4.94 4.94 4.94
## 165 2.45 2.45 2.45
## 166 7.05 7.05 7.05
## 167 7.92 7.92 7.92
## 168 7.89 7.89 7.89
## 169 3.79 3.79 3.79
## 170 4.08 4.08 4.08
## 171 6.94 6.94 6.94
## 172 5.08 5.08 5.08
## 173 2.59 2.59 2.59
## 174 7.19 7.19 7.19
## 175 8.06 8.06 8.06
## 176 8.03 8.03 8.03
## 177 3.81 3.81 3.81
## 178 4.10 4.10 4.10
## 179 6.97 6.97 6.97
## 180 5.10 5.10 5.10
## 181 2.61 2.61 2.61
## 182 7.21 7.21 7.21
## 183 8.08 8.08 8.08
## 184 8.05 8.05 8.05
## 185 4.03 4.03 4.03
## 186 4.32 4.32 4.32
## 187 7.18 7.18 7.18
## 188 5.32 5.32 5.32
## 189 2.83 2.83 2.83
## 190 7.43 7.43 7.43
## 191 8.30 8.30 8.30
## 192 8.27 8.27 8.27
## 193 4.33 4.33 4.33
## 194 4.63 4.63 4.63
## 195 7.49 7.49 7.49
## 196 5.62 5.62 5.62
## 197 3.13 3.13 3.13
## 198 7.73 7.73 7.73
## 199 8.60 8.60 8.60
## 200 8.57 8.57 8.57
## 201 4.34 4.34 4.34
## 202 4.64 4.64 4.64
## 203 7.50 7.50 7.50
## 204 5.63 5.63 5.63
## 205 3.14 3.14 3.14
## 206 7.74 7.74 7.74
## 207 8.61 8.61 8.61
## 208 8.58 8.58 8.58
## 209 4.26 4.26 4.26
## 210 4.55 4.55 4.55
## 211 7.41 7.41 7.41
## 212 5.55 5.55 5.55
## 213 3.06 3.06 3.06
## 214 7.66 7.66 7.66
## 215 8.53 8.53 8.53
## 216 8.50 8.50 8.50
variables <- c("Year", "Offender Race")
fd_results <- map(variables, ~ sim_ame(sim_nb, var = .x, contrast = "difference"))
names(fd_results) <- variables
fd_results## $Year
## A `clarify_est` object (from `sim_ame()`)
## - Average adjusted predictions for `Year`
## - 1000 simulated values
## - 33 quantities estimated:
## E[Y(1991)] 708.2904
## E[Y(1992)] 927.0628
## E[Y(1993)] 1142.2909
## E[Y(1994)] 862.4233
## E[Y(1995)] 1255.5797
## E[Y(1996)] 1248.0694
## # ... and 27 more
##
## $`Offender Race`
## A `clarify_est` object (from `sim_ame()`)
## - Average adjusted predictions for `Offender Race`
## - 1000 simulated values
## - 8 quantities estimated:
## E[Y(American Indian or Alaska Native)] 44.69348
## E[Y(Asian)] 59.81220
## E[Y(Black or African American)] 1046.70712
## E[Y(Multiple)] 162.22248
## E[Y(Native Hawaiian or Other Pacific Islander)] 13.45159
## E[Y(Not Specified)] 1339.16568
## # ... and 2 more
Based on simulations from the Negative Binomial model, the predicted number of hate crimes committed by White offenders in 2020 is approximately 3.27 incidents. This value is stable across simulations (as shown by identical 2.5% and 97.5% confidence bounds), reflecting low uncertainty for this specific group and time point. This serves as a baseline against which differences across years and racial categories can be compared.
The average predicted number of hate crimes varied significantly across years and offender racial categories. For instance, the adjusted expected count for “Black or African American” offenders was 1,046.71 incidents, while the predicted count for “White” offenders was around 3.27. Similarly, “Not Specified” race had an even higher predicted count of 1,339.17, likely due to a large amount of unclassified or incomplete data. These differences highlight strong associations between offender race and the number of reported hate crime incidents, though they may also reflect reporting biases or structural factors in the data.
# Extract original estimates from clarify object
race_preds <- attr(fd_results$`Offender Race`, "original")
# Convert to clean data frame
race_df <- tibble(
Race = gsub("E\\[Y\\((.*?)\\)\\]", "\\1", names(race_preds)),
Predicted_Count = as.numeric(race_preds)
)
race_df## # A tibble: 8 × 2
## Race Predicted_Count
## <chr> <dbl>
## 1 American Indian or Alaska Native 44.7
## 2 Asian 59.8
## 3 Black or African American 1047.
## 4 Multiple 162.
## 5 Native Hawaiian or Other Pacific Islander 13.5
## 6 Not Specified 1339.
## 7 Unknown 3193.
## 8 White 3098.
# Plot: Predicted counts by offender race
ggplot(race_df, aes(x = reorder(Race, Predicted_Count), y = Predicted_Count)) +
geom_col(fill = "steelblue") +
coord_flip() +
labs(
title = "Predicted Hate Crime Counts by Offender Race",
x = "Offender Race",
y = "Predicted Count"
) +
theme_minimal()# Extract original year estimates
year_preds <- attr(fd_results$Year, "original")
year_df <- tibble(
Year = gsub("E\\[Y\\((.*?)\\)\\]", "\\1", names(year_preds)), # Clean year labels
Predicted_Count = as.numeric(year_preds)
) %>%
mutate(Year = as.integer(Year)) %>%
arrange(Year)
# View table
year_df## # A tibble: 33 × 2
## Year Predicted_Count
## <int> <dbl>
## 1 1991 708.
## 2 1992 927.
## 3 1993 1142.
## 4 1994 862.
## 5 1995 1256.
## 6 1996 1248.
## 7 1997 1205.
## 8 1998 1103.
## 9 1999 1106.
## 10 2000 1169.
## # ℹ 23 more rows
# Plot: Predicted counts by year
ggplot(year_df, aes(x = Year, y = Predicted_Count)) +
geom_line(color = "darkgreen", size = 1) +
geom_point(color = "darkgreen") +
labs(
title = "Predicted Hate Crime Counts by Year",
x = "Year",
y = "Predicted Count"
) +
theme_minimal()## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Conclusion
For this analysis, I applied count regression models to examine patterns in reported hate crime incidents in the United States, focusing on changes over time and differences by offender race. The initial Poisson model revealed substantial over dispersion (44.59), making the model’s assumptions and standard errors unreliable. To make up for thid, a Negative Binomial model was used, offering a significantly improved fit with a lower AIC (2591.89 vs. 9896.98) and reduced deviance (235.06 vs. 8167.01).
Using simulation-based inference from the clarify
package, I estimated the average predicted counts of hate crime
incidents across offender race and year. The results showed disparities.
For example, offenders categorized as “Unknown” were predicted to
account for over 3,200 incidents, and those identified as “White” were
associated with approximately 3,100. In contrast, predicted counts for
“Black or African American” offenders were around 1,000, while
categories such as “Asian” and “Native Hawaiian or Other Pacific
Islander” were associated with far fewer predicted
incidents—approximately 60 and 13. These results highlight both real
disparities and potential issues in data reporting or
classification.
The line plot of predicted counts over time further emphasized shifting trends. While the number of incidents remained relatively stable from the mid-1990s through the mid-2010s, a steap increase began around 2019. Predicted counts peaked in 2021 and 2022, reaching over 2,000 incidents per year—nearly double the average from the previous decade. This pattern likely reflects broader social and political dynamics during that period, including increased polarization, the COVID-19 pandemic, and racial justice movements.
Overall, the results show how both factors related to time and demographic factors shape hate crime trends in the U.S., and how simulation-based modeling can offer a clearer, more nuanced understanding of patterns in over dispersed count data.