Overview

“Higher Rates Of Hate Crimes Are Tied To Income Inequality”

The article, published by ABC News' FiveThirtyEight, looks for relationships between hate crime rates and key socioeconomic factors in each state. It uses multivariate linear regression to determine which of these factors are predictors of hate incidents across the 50 states and the District of Columbia.
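A multivariate linear regression of this kind can be sketched in R with `lm()`. This is a minimal sketch, not the article's model: the helper name `fit_hate_crime_model` is hypothetical, and using all nine socioeconomic columns listed under "About the data" as predictors is an assumption (the article's exact specification may differ).

```r
# Hypothetical helper: regress one hate crime rate on the socioeconomic
# factors in the FiveThirtyEight hate_crimes.csv file.
# Assumption: all nine factors are used as predictors; the article's
# actual model may use a different subset.
fit_hate_crime_model <- function(df, outcome = "avg_hatecrimes_per_100k_fbi") {
  predictors <- c(
    "median_household_income",
    "share_unemployed_seasonal",
    "share_population_in_metro_areas",
    "share_population_with_high_school_degree",
    "share_non_citizen",
    "share_white_poverty",
    "gini_index",
    "share_non_white",
    "share_voters_voted_trump"
  )
  # Build "outcome ~ x1 + x2 + ..." from the column names
  model_formula <- reformulate(predictors, response = outcome)
  # Rows with a missing value in any model variable are dropped
  lm(model_formula, data = df, na.action = na.omit)
}

# Usage (requires network access):
# hate <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")
# summary(fit_hate_crime_model(hate))
# summary(fit_hate_crime_model(hate, outcome = "hate_crimes_per_100k_splc"))
```

Passing the outcome column as a parameter lets the same sketch be run once for the FBI rate and once for the SPLC rate.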

About the data

The dataset used can be found here. Hate incidents are reported through two organizations: the FBI and the Southern Poverty Law Center (SPLC). The reporting processes of the two differ in important ways. First, the FBI relies on reports submitted voluntarily by law enforcement agencies, so its data cannot be considered comprehensive. The FBI also collects data only on prosecutable hate crimes, leaving out many other hate incidents. Second, the SPLC figures cover only a short window after the 2016 presidential election (Nov. 9-18, 2016), so they are not directly comparable to the FBI's annual averages for 2010-2015.

The dataset contains the variables listed in the table below. The Gini index (the independent variable) summarizes income inequality. The hate crime rate (the dependent variable) is given in two forms: average annual hate crimes per 100,000 population reported by the FBI, and hate crimes per 100,000 population reported by the SPLC.
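For intuition about what the Gini index measures, here is a minimal sketch that computes it from a vector of individual incomes using the common sorted-rank formula. The function `gini` and the toy incomes are illustrative only; the state values in the dataset were produced by the data's original source, not by this code.

```r
# Illustrative Gini index for a vector of non-negative incomes.
# Sorted-rank formula: G = sum((2i - n - 1) * x_(i)) / (n * sum(x)),
# where x_(i) is the i-th smallest income.
# 0 means everyone earns the same; with n earners the largest possible
# value is (n - 1) / n, which approaches 1 as n grows.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

gini(c(40000, 40000, 40000, 40000))  # perfect equality: 0
gini(c(0, 0, 0, 100000))             # one earner holds everything: 0.75
```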

Header                                    Definition
state                                     State name
median_household_income                   Median household income, 2016
share_unemployed_seasonal                 Share of the population that is unemployed (seasonally adjusted), Sept. 2016
share_population_in_metro_areas           Share of the population that lives in metropolitan areas, 2015
share_population_with_high_school_degree  Share of adults 25 and older with a high-school degree, 2009
share_non_citizen                         Share of the population that are not U.S. citizens, 2015
share_white_poverty                       Share of white residents who are living in poverty, 2015
gini_index                                Gini Index, 2015
share_non_white                           Share of the population that is not white, 2015
share_voters_voted_trump                  Share of 2016 U.S. presidential voters who voted for Donald Trump
hate_crimes_per_100k_splc                 Hate crimes per 100,000 population, Southern Poverty Law Center, Nov. 9-18, 2016
avg_hatecrimes_per_100k_fbi               Average annual hate crimes per 100,000 population, FBI, 2010-2015

The variable headers in the table appear exactly as they do in the dataset. Besides income inequality, they cover several other socioeconomic factors that may be tied to hate crimes.

#Load necessary packages
library(tidyverse)
library(ggplot2)
library(dplyr)


#Load Data from Github
Hate_crime_dataset <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")

Subset the data

Since the article focuses on the relationship between income inequality (represented by the Gini index) and the hate crime rate per 100,000 population, we filter the dataset to drop the other variables, keeping only the state, the Gini index, and the hate crime rates reported by the FBI and the SPLC.

# Subset: keep only the state, Gini index and the two hate crime rates

Hate_crime_subset <- select(Hate_crime_dataset,
                            state, gini_index, hate_crimes_per_100k_splc, avg_hatecrimes_per_100k_fbi)

# Rename columns (order matches the select() call above)

colnames(Hate_crime_subset) <- c("State", "Gini_index", "Hate_crime_rate_SPLC", "Hate_crime_rate_FBI")

Looking at the Dataset

The article reports that income inequality, represented in our dataset by the Gini index, was the main factor associated with higher rates of hate crimes. Below we create two plots to visualize the relationship between income inequality and hate crime rates. A Gini index of 0 represents perfect equality, while an index of 1 represents perfect inequality. Note also that some states have missing values because they did not report data.
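ggplot2 drops any state with a missing rate and reports only how many rows it removed, so it can help to list those states explicitly. A minimal sketch, assuming the renamed `Hate_crime_subset` columns from the subset step; the helper `missing_states` is hypothetical.

```r
# Hypothetical helper: list the states with no reported value in `rate_col`.
missing_states <- function(df, rate_col) {
  df$State[is.na(df[[rate_col]])]
}

# Usage with the subset built above:
# colSums(is.na(Hate_crime_subset))                        # missing values per column
# missing_states(Hate_crime_subset, "Hate_crime_rate_SPLC")
# missing_states(Hate_crime_subset, "Hate_crime_rate_FBI")
```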

In the code below, we trim the dataset for easier viewing and create two plots that support the article's findings:


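ggplot(data = Hate_crime_subset, mapping = aes(x = Gini_index, y = Hate_crime_rate_FBI)) +
  geom_point(size = 4) +                                     # one point per state
  geom_line(colour = "red") +                                # connects states in order of Gini index
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # least-squares trend line
  labs(title = "Hate Crime to Gini Index (FBI)")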
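Only the FBI plot's code is shown above; the SPLC plot presumably follows the same pattern. A minimal sketch that wraps that pattern in a reusable function, assuming the renamed subset columns; the helper name `plot_gini_vs_rate` is hypothetical, and the red connecting line is left out.

```r
library(ggplot2)

# Hypothetical helper: scatter plot of a hate crime rate against the Gini
# index with a least-squares trend line. na.rm = TRUE drops states with a
# missing rate without printing a warning.
plot_gini_vs_rate <- function(df, rate_col, title) {
  ggplot(df, aes(x = Gini_index, y = .data[[rate_col]])) +
    geom_point(size = 4, na.rm = TRUE) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE) +
    labs(title = title, x = "Gini index", y = "Hate crimes per 100,000 population")
}

# Usage with the subset built above:
# plot_gini_vs_rate(Hate_crime_subset, "Hate_crime_rate_SPLC", "Hate Crime to Gini Index (SPLC)")
# plot_gini_vs_rate(Hate_crime_subset, "Hate_crime_rate_FBI", "Hate Crime to Gini Index (FBI)")
```

Selecting the y column by name with the `.data[[...]]` pronoun keeps the two plots identical apart from the data source.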

Conclusion:

Both plots above show a clear positive correlation between the Gini index and the hate crime rate. However, correlation alone does not establish causation, and given the limitations of both reporting processes, data from additional reporting channels would be needed to support a causal link.
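A plot alone does not say how strong the association is. A minimal sketch that puts a number on it with base R's `cor.test()` (Pearson correlation), assuming the renamed subset columns; `gini_rate_correlation` is a hypothetical helper. `cor.test()` drops states with a missing rate before computing the coefficient.

```r
# Hypothetical helper: Pearson correlation between the Gini index and one
# hate crime rate column, with its p-value and the number of states used.
gini_rate_correlation <- function(df, rate_col) {
  test <- cor.test(df$Gini_index, df[[rate_col]], method = "pearson")
  c(
    r = unname(test$estimate),
    p_value = test$p.value,
    n = sum(complete.cases(df$Gini_index, df[[rate_col]]))
  )
}

# Usage with the subset built above:
# gini_rate_correlation(Hate_crime_subset, "Hate_crime_rate_FBI")
# gini_rate_correlation(Hate_crime_subset, "Hate_crime_rate_SPLC")
```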