DATA 712 HW#6

Introduction

Hate crimes remain a pressing social concern in the United States, with significant variation across time and demographic groups. Understanding patterns in the frequency of hate crime incidents is essential for informing public policy and community interventions. This analysis draws on hate crime data from the FBI to explore how the number of reported hate crime incidents varies by year and by the race of the offender. Using count regression models, including Poisson and Negative Binomial models, the study investigates whether these variables significantly predict incident counts and whether the assumptions of different modeling approaches are appropriate for the structure of the data.

knitr::opts_knit$set(root.dir = "/Users/ruthiemaurer/Desktop/ESS")
library(readxl)
library(dplyr)
library(ggplot2)
library(MASS)       
library(clarify)    
library(texreg)     
library(tidyr)
library(purrr)

Load data

DATA <- read_excel("/Users/ruthiemaurer/Desktop/ESS/hate_crime copy.xlsx")

colnames(DATA) <- c("Year", "Pug Agency Name", "State", "State Name", "Incident Date", 
                    "Offender Race", "Offender Ethnicity", "Offense", "Reason/Bias")
print(colnames(DATA))

## [1] "Year"               "Pug Agency Name"    "State"             
## [4] "State Name"         "Incident Date"      "Offender Race"     
## [7] "Offender Ethnicity" "Offense"            "Reason/Bias"

DATA <- DATA %>%
  filter(!is.na(Year), !is.na(`Offender Race`))

grouped_data <- DATA %>%
  group_by(Year, `Offender Race`) %>%
  summarise(incident_count = n(), .groups = "drop") %>%
  mutate(
    Year = as.factor(Year),
    `Offender Race` = as.factor(`Offender Race`)
  )

head(grouped_data)

## # A tibble: 6 × 3
##   Year  `Offender Race`                  incident_count
##   <fct> <fct>                                     <int>
## 1 1991  American Indian or Alaska Native             12
## 2 1991  Asian                                        47
## 3 1991  Black or African American                   775
## 4 1991  Multiple                                     77
## 5 1991  Unknown                                    1989
## 6 1991  White                                      1689

I grouped the data by Year and Offender Race to create a count variable representing the number of incidents in each group. This prepares the data for Poisson and Negative Binomial modeling (if necessary).

Poisson Model

poisson_model <- glm(incident_count ~ Year + `Offender Race`, 
                     family = poisson, data = grouped_data)

dispersion <- sum(residuals(poisson_model, type = "pearson")^2) / poisson_model$df.residual
dispersion

## [1] 44.58803

The Poisson regression model is to predict the number of hate crime incidents by year and offender race. The dispersion statistic was 44.59, which is much greater than 1. This indicates severe over dispersion, meaning that the variance in the data far exceeds what the Poisson model assumes. As a result, the standard errors and significance tests from the Poisson model may be unreliable. Therefore, I proceeded to fit a Negative Binomial model, which adjusts for overdispersion by including a dispersion parameter.

Negative Binomial Model

nb_model <- glm.nb(incident_count ~ Year + `Offender Race`, data = grouped_data)

screenreg(list(Poisson = poisson_model, NegBin = nb_model))

## 
## ====================================================================================
##                                                           Poisson       NegBin      
## ------------------------------------------------------------------------------------
## (Intercept)                                                   3.33 ***      3.27 ***
##                                                              (0.03)        (0.10)   
## Year1992                                                      0.37 ***      0.27 *  
##                                                              (0.02)        (0.13)   
## Year1993                                                      0.51 ***      0.48 ***
##                                                              (0.02)        (0.13)   
## Year1994                                                      0.26 ***      0.20    
##                                                              (0.02)        (0.13)   
## Year1995                                                      0.55 ***      0.57 ***
##                                                              (0.02)        (0.13)   
## Year1996                                                      0.65 ***      0.57 ***
##                                                              (0.02)        (0.13)   
## Year1997                                                      0.57 ***      0.53 ***
##                                                              (0.02)        (0.13)   
## Year1998                                                      0.54 ***      0.44 ***
##                                                              (0.02)        (0.13)   
## Year1999                                                      0.55 ***      0.45 ***
##                                                              (0.02)        (0.13)   
## Year2000                                                      0.58 ***      0.50 ***
##                                                              (0.02)        (0.13)   
## Year2001                                                      0.75 ***      0.61 ***
##                                                              (0.02)        (0.13)   
## Year2002                                                      0.49 ***      0.44 ***
##                                                              (0.02)        (0.13)   
## Year2003                                                      0.50 ***      0.51 ***
##                                                              (0.02)        (0.13)   
## Year2004                                                      0.52 ***      0.43 ** 
##                                                              (0.02)        (0.13)   
## Year2005                                                      0.48 ***      0.46 ***
##                                                              (0.02)        (0.13)   
## Year2006                                                      0.52 ***      0.55 ***
##                                                              (0.02)        (0.13)   
## Year2007                                                      0.51 ***      0.45 ***
##                                                              (0.02)        (0.13)   
## Year2008                                                      0.56 ***      0.48 ***
##                                                              (0.02)        (0.13)   
## Year2009                                                      0.37 ***      0.36 ** 
##                                                              (0.02)        (0.13)   
## Year2010                                                      0.37 ***      0.41 ** 
##                                                              (0.02)        (0.13)   
## Year2011                                                      0.32 ***      0.35 ** 
##                                                              (0.02)        (0.13)   
## Year2012                                                      0.36 ***      0.43 ** 
##                                                              (0.02)        (0.13)   
## Year2013                                                      0.27 ***      0.26 *  
##                                                              (0.02)        (0.13)   
## Year2014                                                      0.20 ***      0.24    
##                                                              (0.02)        (0.13)   
## Year2015                                                      0.24 ***      0.39 ** 
##                                                              (0.02)        (0.13)   
## Year2016                                                      0.31 ***      0.38 ** 
##                                                              (0.02)        (0.13)   
## Year2017                                                      0.25 ***      0.38 ** 
##                                                              (0.02)        (0.13)   
## Year2018                                                      0.23 ***      0.52 ***
##                                                              (0.02)        (0.13)   
## Year2019                                                      0.33 ***      0.54 ***
##                                                              (0.02)        (0.13)   
## Year2020                                                      0.56 ***      0.76 ***
##                                                              (0.02)        (0.12)   
## Year2021                                                      0.66 ***      1.07 ***
##                                                              (0.02)        (0.12)   
## Year2022                                                      0.73 ***      1.08 ***
##                                                              (0.02)        (0.12)   
## Year2023                                                      0.74 ***      0.99 ***
##                                                              (0.02)        (0.12)   
## `Offender Race`Asian                                          0.30 ***      0.29 ***
##                                                              (0.03)        (0.06)   
## `Offender Race`Black or African American                      3.13 ***      3.15 ***
##                                                              (0.03)        (0.06)   
## `Offender Race`Multiple                                       1.27 ***      1.29 ***
##                                                              (0.03)        (0.06)   
## `Offender Race`Native Hawaiian or Other Pacific Islander     -1.00 ***     -1.20 ***
##                                                              (0.08)        (0.11)   
## `Offender Race`Not Specified                                  3.65 ***      3.40 ***
##                                                              (0.03)        (0.10)   
## `Offender Race`Unknown                                        4.21 ***      4.27 ***
##                                                              (0.03)        (0.06)   
## `Offender Race`White                                          4.22 ***      4.24 ***
##                                                              (0.03)        (0.06)   
## ------------------------------------------------------------------------------------
## AIC                                                        9896.98       2591.89    
## BIC                                                       10031.99       2730.28    
## Log Likelihood                                            -4908.49      -1254.95    
## Deviance                                                   8167.01        235.06    
## Num. obs.                                                   216           216       
## ====================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05

The Negative Binomial model produced a much better fit than the Poisson model, as indicated by the lower AIC (2591.89 vs. 9896.98) and substantially lower deviance (235.06 vs. 8167.01). This confirms that the Negative Binomial model better accounts for the overdispersion in the data. The regression coefficients remain relatively stable across both models, but the standard errors are more reliable in the Negative Binomial model. Therefore, I use this model for all further simulations and interpretation of results.

Interpreting the NegBin Model Using `clarify`

# Simulate parameters from NB model
sim_nb <- sim(nb_model, n = 1000, vcov = "robust")

# Set reference values
set_vals <- sim_setx(sim_nb, x = list(
  Year = "2020",
  `Offender Race` = "White"
))

# Predicted count for a reference profile
predicted_count <- sim_apply(sim_nb, FUN = predict, setx = set_vals)
summary(predicted_count)

##     Estimate 2.5 % 97.5 %
## 1       3.27  3.27   3.27
## 2       3.56  3.56   3.56
## 3       6.42  6.42   6.42
## 4       4.56  4.56   4.56
## 5       7.54  7.54   7.54
## 6       7.51  7.51   7.51
## 7       3.54  3.54   3.54
## 8       3.83  3.83   3.83
## 9       6.69  6.69   6.69
## 10      4.83  4.83   4.83
## 11      7.81  7.81   7.81
## 12      7.78  7.78   7.78
## 13      3.75  3.75   3.75
## 14      4.04  4.04   4.04
## 15      6.90  6.90   6.90
## 16      5.03  5.03   5.03
## 17      8.01  8.01   8.01
## 18      7.98  7.98   7.98
## 19      3.46  3.46   3.46
## 20      3.76  3.76   3.76
## 21      6.62  6.62   6.62
## 22      4.75  4.75   4.75
## 23      7.73  7.73   7.73
## 24      7.70  7.70   7.70
## 25      3.84  3.84   3.84
## 26      4.13  4.13   4.13
## 27      6.99  6.99   6.99
## 28      5.13  5.13   5.13
## 29      8.11  8.11   8.11
## 30      8.08  8.08   8.08
## 31      3.83  3.83   3.83
## 32      4.13  4.13   4.13
## 33      6.99  6.99   6.99
## 34      5.12  5.12   5.12
## 35      8.10  8.10   8.10
## 36      8.07  8.07   8.07
## 37      3.80  3.80   3.80
## 38      4.09  4.09   4.09
## 39      6.95  6.95   6.95
## 40      5.09  5.09   5.09
## 41      8.07  8.07   8.07
## 42      8.04  8.04   8.04
## 43      3.71  3.71   3.71
## 44      4.00  4.00   4.00
## 45      6.86  6.86   6.86
## 46      5.00  5.00   5.00
## 47      7.98  7.98   7.98
## 48      7.95  7.95   7.95
## 49      3.71  3.71   3.71
## 50      4.01  4.01   4.01
## 51      6.87  6.87   6.87
## 52      5.00  5.00   5.00
## 53      7.98  7.98   7.98
## 54      7.95  7.95   7.95
## 55      3.77  3.77   3.77
## 56      4.06  4.06   4.06
## 57      6.92  6.92   6.92
## 58      5.06  5.06   5.06
## 59      8.04  8.04   8.04
## 60      8.01  8.01   8.01
## 61      3.87  3.87   3.87
## 62      4.17  4.17   4.17
## 63      7.03  7.03   7.03
## 64      5.16  5.16   5.16
## 65      8.14  8.14   8.14
## 66      8.11  8.11   8.11
## 67      3.71  3.71   3.71
## 68      4.00  4.00   4.00
## 69      6.86  6.86   6.86
## 70      5.00  5.00   5.00
## 71      7.98  7.98   7.98
## 72      7.95  7.95   7.95
## 73      3.78  3.78   3.78
## 74      4.07  4.07   4.07
## 75      6.93  6.93   6.93
## 76      5.07  5.07   5.07
## 77      8.05  8.05   8.05
## 78      8.02  8.02   8.02
## 79      3.70  3.70   3.70
## 80      3.99  3.99   3.99
## 81      6.85  6.85   6.85
## 82      4.99  4.99   4.99
## 83      7.97  7.97   7.97
## 84      7.94  7.94   7.94
## 85      3.73  3.73   3.73
## 86      4.02  4.02   4.02
## 87      6.88  6.88   6.88
## 88      5.02  5.02   5.02
## 89      8.00  8.00   8.00
## 90      7.97  7.97   7.97
## 91      3.82  3.82   3.82
## 92      4.11  4.11   4.11
## 93      6.97  6.97   6.97
## 94      5.11  5.11   5.11
## 95      8.09  8.09   8.09
## 96      8.06  8.06   8.06
## 97      3.72  3.72   3.72
## 98      4.01  4.01   4.01
## 99      6.87  6.87   6.87
## 100     5.00  5.00   5.00
## 101     7.98  7.98   7.98
## 102     7.95  7.95   7.95
## 103     3.75  3.75   3.75
## 104     4.04  4.04   4.04
## 105     6.90  6.90   6.90
## 106     5.04  5.04   5.04
## 107     8.02  8.02   8.02
## 108     7.99  7.99   7.99
## 109     3.63  3.63   3.63
## 110     3.92  3.92   3.92
## 111     6.78  6.78   6.78
## 112     4.92  4.92   4.92
## 113     7.90  7.90   7.90
## 114     7.87  7.87   7.87
## 115     3.68  3.68   3.68
## 116     3.97  3.97   3.97
## 117     6.83  6.83   6.83
## 118     4.97  4.97   4.97
## 119     7.94  7.94   7.94
## 120     7.91  7.91   7.91
## 121     3.62  3.62   3.62
## 122     3.91  3.91   3.91
## 123     6.77  6.77   6.77
## 124     4.91  4.91   4.91
## 125     7.89  7.89   7.89
## 126     7.86  7.86   7.86
## 127     3.70  3.70   3.70
## 128     3.99  3.99   3.99
## 129     6.85  6.85   6.85
## 130     4.98  4.98   4.98
## 131     7.96  7.96   7.96
## 132     7.93  7.93   7.93
## 133     3.53  3.53   3.53
## 134     3.82  3.82   3.82
## 135     6.68  6.68   6.68
## 136     4.82  4.82   4.82
## 137     2.33  2.33   2.33
## 138     7.80  7.80   7.80
## 139     7.77  7.77   7.77
## 140     3.51  3.51   3.51
## 141     3.80  3.80   3.80
## 142     6.66  6.66   6.66
## 143     4.80  4.80   4.80
## 144     2.31  2.31   2.31
## 145     7.78  7.78   7.78
## 146     7.75  7.75   7.75
## 147     3.65  3.65   3.65
## 148     3.94  3.94   3.94
## 149     6.81  6.81   6.81
## 150     4.94  4.94   4.94
## 151     2.45  2.45   2.45
## 152     7.92  7.92   7.92
## 153     7.89  7.89   7.89
## 154     3.65  3.65   3.65
## 155     3.94  3.94   3.94
## 156     6.81  6.81   6.81
## 157     4.94  4.94   4.94
## 158     2.45  2.45   2.45
## 159     7.92  7.92   7.92
## 160     7.89  7.89   7.89
## 161     3.65  3.65   3.65
## 162     3.94  3.94   3.94
## 163     6.81  6.81   6.81
## 164     4.94  4.94   4.94
## 165     2.45  2.45   2.45
## 166     7.05  7.05   7.05
## 167     7.92  7.92   7.92
## 168     7.89  7.89   7.89
## 169     3.79  3.79   3.79
## 170     4.08  4.08   4.08
## 171     6.94  6.94   6.94
## 172     5.08  5.08   5.08
## 173     2.59  2.59   2.59
## 174     7.19  7.19   7.19
## 175     8.06  8.06   8.06
## 176     8.03  8.03   8.03
## 177     3.81  3.81   3.81
## 178     4.10  4.10   4.10
## 179     6.97  6.97   6.97
## 180     5.10  5.10   5.10
## 181     2.61  2.61   2.61
## 182     7.21  7.21   7.21
## 183     8.08  8.08   8.08
## 184     8.05  8.05   8.05
## 185     4.03  4.03   4.03
## 186     4.32  4.32   4.32
## 187     7.18  7.18   7.18
## 188     5.32  5.32   5.32
## 189     2.83  2.83   2.83
## 190     7.43  7.43   7.43
## 191     8.30  8.30   8.30
## 192     8.27  8.27   8.27
## 193     4.33  4.33   4.33
## 194     4.63  4.63   4.63
## 195     7.49  7.49   7.49
## 196     5.62  5.62   5.62
## 197     3.13  3.13   3.13
## 198     7.73  7.73   7.73
## 199     8.60  8.60   8.60
## 200     8.57  8.57   8.57
## 201     4.34  4.34   4.34
## 202     4.64  4.64   4.64
## 203     7.50  7.50   7.50
## 204     5.63  5.63   5.63
## 205     3.14  3.14   3.14
## 206     7.74  7.74   7.74
## 207     8.61  8.61   8.61
## 208     8.58  8.58   8.58
## 209     4.26  4.26   4.26
## 210     4.55  4.55   4.55
## 211     7.41  7.41   7.41
## 212     5.55  5.55   5.55
## 213     3.06  3.06   3.06
## 214     7.66  7.66   7.66
## 215     8.53  8.53   8.53
## 216     8.50  8.50   8.50

variables <- c("Year", "Offender Race")

fd_results <- map(variables, ~ sim_ame(sim_nb, var = .x, contrast = "difference"))

names(fd_results) <- variables
fd_results

## $Year
## A `clarify_est` object (from `sim_ame()`)
##  - Average adjusted predictions for `Year`
##  - 1000 simulated values
##  - 33 quantities estimated:                     
##  E[Y(1991)]  708.2904
##  E[Y(1992)]  927.0628
##  E[Y(1993)] 1142.2909
##  E[Y(1994)]  862.4233
##  E[Y(1995)] 1255.5797
##  E[Y(1996)] 1248.0694
## # ... and 27 more
## 
## $`Offender Race`
## A `clarify_est` object (from `sim_ame()`)
##  - Average adjusted predictions for `Offender Race`
##  - 1000 simulated values
##  - 8 quantities estimated:                                                           
##  E[Y(American Indian or Alaska Native)]            44.69348
##  E[Y(Asian)]                                       59.81220
##  E[Y(Black or African American)]                 1046.70712
##  E[Y(Multiple)]                                   162.22248
##  E[Y(Native Hawaiian or Other Pacific Islander)]   13.45159
##  E[Y(Not Specified)]                             1339.16568
## # ... and 2 more

Based on simulations from the Negative Binomial model, the predicted number of hate crimes committed by White offenders in 2020 is approximately 3.27 incidents. This value is stable across simulations (as shown by identical 2.5% and 97.5% confidence bounds), reflecting low uncertainty for this specific group and time point. This serves as a baseline against which differences across years and racial categories can be compared.

The average predicted number of hate crimes varied significantly across years and offender racial categories. For instance, the adjusted expected count for “Black or African American” offenders was 1,046.71 incidents, while the predicted count for “White” offenders was around 3.27. Similarly, “Not Specified” race had an even higher predicted count of 1,339.17, likely due to a large amount of unclassified or incomplete data. These differences highlight strong associations between offender race and the number of reported hate crime incidents, though they may also reflect reporting biases or structural factors in the data.

# Extract original estimates from clarify object
race_preds <- attr(fd_results$`Offender Race`, "original")

# Convert to clean data frame
race_df <- tibble(
  Race = gsub("E\\[Y\\((.*?)\\)\\]", "\\1", names(race_preds)),  
  Predicted_Count = as.numeric(race_preds)
)
race_df

## # A tibble: 8 × 2
##   Race                                      Predicted_Count
##   <chr>                                               <dbl>
## 1 American Indian or Alaska Native                     44.7
## 2 Asian                                                59.8
## 3 Black or African American                          1047. 
## 4 Multiple                                            162. 
## 5 Native Hawaiian or Other Pacific Islander            13.5
## 6 Not Specified                                      1339. 
## 7 Unknown                                            3193. 
## 8 White                                              3098.

# Plot: Predicted counts by offender race
ggplot(race_df, aes(x = reorder(Race, Predicted_Count), y = Predicted_Count)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Predicted Hate Crime Counts by Offender Race",
    x = "Offender Race",
    y = "Predicted Count"
  ) +
  theme_minimal()

# Extract original year estimates
year_preds <- attr(fd_results$Year, "original")

year_df <- tibble(
  Year = gsub("E\\[Y\\((.*?)\\)\\]", "\\1", names(year_preds)),  # Clean year labels
  Predicted_Count = as.numeric(year_preds)
) %>%
  mutate(Year = as.integer(Year)) %>%
  arrange(Year)

# View table
year_df

## # A tibble: 33 × 2
##     Year Predicted_Count
##    <int>           <dbl>
##  1  1991            708.
##  2  1992            927.
##  3  1993           1142.
##  4  1994            862.
##  5  1995           1256.
##  6  1996           1248.
##  7  1997           1205.
##  8  1998           1103.
##  9  1999           1106.
## 10  2000           1169.
## # ℹ 23 more rows

# Plot: Predicted counts by year
ggplot(year_df, aes(x = Year, y = Predicted_Count)) +
  geom_line(color = "darkgreen", size = 1) +
  geom_point(color = "darkgreen") +
  labs(
    title = "Predicted Hate Crime Counts by Year",
    x = "Year",
    y = "Predicted Count"
  ) +
  theme_minimal()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Conclusion

For this analysis, I applied count regression models to examine patterns in reported hate crime incidents in the United States, focusing on changes over time and differences by offender race. The initial Poisson model revealed substantial over dispersion (44.59), making the model’s assumptions and standard errors unreliable. To make up for thid, a Negative Binomial model was used, offering a significantly improved fit with a lower AIC (2591.89 vs. 9896.98) and reduced deviance (235.06 vs. 8167.01).

Using simulation-based inference from the clarify package, I estimated the average predicted counts of hate crime incidents across offender race and year. The results showed disparities. For example, offenders categorized as “Unknown” were predicted to account for over 3,200 incidents, and those identified as “White” were associated with approximately 3,100. In contrast, predicted counts for “Black or African American” offenders were around 1,000, while categories such as “Asian” and “Native Hawaiian or Other Pacific Islander” were associated with far fewer predicted incidents—approximately 60 and 13. These results highlight both real disparities and potential issues in data reporting or classification.

The line plot of predicted counts over time further emphasized shifting trends. While the number of incidents remained relatively stable from the mid-1990s through the mid-2010s, a steap increase began around 2019. Predicted counts peaked in 2021 and 2022, reaching over 2,000 incidents per year—nearly double the average from the previous decade. This pattern likely reflects broader social and political dynamics during that period, including increased polarization, the COVID-19 pandemic, and racial justice movements.

Overall, the results show how both factors related to time and demographic factors shape hate crime trends in the U.S., and how simulation-based modeling can offer a clearer, more nuanced understanding of patterns in over dispersed count data.

DATA 712 HW#6

DATA 712 HW#6

Introduction

Load data

Poisson Model

Negative Binomial Model

Interpreting the NegBin Model Using clarify

Conclusion

Interpreting the NegBin Model Using `clarify`