FiveThirtyEight published “Higher Rates Of Hate Crimes Are Tied To Income Inequality” which uses data on hate crimes, provided by the FBI and the Southern Poverty Law Center (SPLC), as well as socioeconomic factors, aggregated from multiple sources, to determine what socioeconomic factor or factors are linked to hate crimes.
To determine that hate crimes are tied to income inequality, FiveThirtyEight performed multivariate linear regression. Unfortunately, the article only shows one visualization, which is two maps displaying average hate crimes per 100,000 residents, one from 2010-2015 and another from the 10 days following the 2016 election. The article then stated that their findings held for both pre-election and post-election.
While the findings of the article are interesting, they provide no visualizations supporting their actual findings. In order to extend the work of the article, new visualizations were created to represent the findings of the article.
The first step is to load the data from the FiveThirtyEight github repository on hate crimes and socioeconomic factors.
<-read.csv(
hate_data"https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")
glimpse
can be used to get a basic overview of the
data.
glimpse(hate_data)
## Rows: 51
## Columns: 12
## $ state <chr> "Alabama", "Alaska", "Arizona…
## $ median_household_income <int> 42278, 67629, 49254, 44922, 6…
## $ share_unemployed_seasonal <dbl> 0.060, 0.064, 0.063, 0.052, 0…
## $ share_population_in_metro_areas <dbl> 0.64, 0.63, 0.90, 0.69, 0.97,…
## $ share_population_with_high_school_degree <dbl> 0.821, 0.914, 0.842, 0.824, 0…
## $ share_non_citizen <dbl> 0.02, 0.04, 0.10, 0.04, 0.13,…
## $ share_white_poverty <dbl> 0.12, 0.06, 0.09, 0.12, 0.09,…
## $ gini_index <dbl> 0.472, 0.422, 0.455, 0.458, 0…
## $ share_non_white <dbl> 0.35, 0.42, 0.49, 0.26, 0.61,…
## $ share_voters_voted_trump <dbl> 0.63, 0.53, 0.50, 0.60, 0.33,…
## $ hate_crimes_per_100k_splc <dbl> 0.12583893, 0.14374012, 0.225…
## $ avg_hatecrimes_per_100k_fbi <dbl> 1.8064105, 1.6567001, 3.41392…
To understand what each column actually means, the README.md
was consulted. Based on the README.md, most of the column names are
intuitive. However, the gini_index
was ambiguous because
not everyone is familiar with what it is. According to the U.S. Census
Bureau, the Gini Index represents income inequality, it is a coefficent
ranging from 0 to 1 where 0 indicates everyone’s income is equal and 1
indicates that one person or group has all the income (https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html).
To make what this column represents more clear, the column name was
renamed to share_income_inequality
.
colnames(hate_data)[which(names(hate_data) == "gini_index")] <-
"share_income_inequality"
The other column names that were unclear was the
hate_crimes_per_100k_splc
and
avg_hatecrimes_per_100k_fbi
. They are unclear because what
they do not mention is that the FBI data was data collected from
2010-2015, which FiveThirtyEight used to represent the hate crimes
leading up to the 2016 presidential election and the SPLC data was
collected from November 9-18, 2016 which FiveThirtyEight used to
represent hate crimes after the 2016 presidential election. The article
states that they had to use these two different sources of data because
the FBI had not yet released it’s 2016 data and the SPLC did not collect
data prior to the 2016 election. Later in the article, it states that
their findings hold true both prior to and after the election. Despite
the FBI data having shortcomings in that it only counts prosecutable
hate crimes, not all hate incidents, and it is only hate crimes
voluntarily reported - it does represent a much longer time period than
the SPLC. For this extended analysis due to the longer time period, the
hate_crimes_per_100k_fbi
was used and the
hate_crimes_per_100k_splc
was removed. Furthermore, because
the conclusion was just that income inequality was linked to hate
crimes, only share_income_inequality
and
avg_hatecrimes_per_100k_fbi
as well as state
were saved.
<- select(hate_data, share_income_inequality,
hate_data avg_hatecrimes_per_100k_fbi, state)
Now to show the result, that higher income inequality is tied to hate crimes, these variables are plotted along with the line of best fit.
ggplot(data = hate_data, aes(x = share_income_inequality,
y = avg_hatecrimes_per_100k_fbi)) +
geom_point() + geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
ggtitle("Income Inequality and Hate Crimes per 100,000 people") +
xlab("Share of Income Inequality") +
ylab("Average Hate Crimes per 100,000 people")
Another interesting question is if certain regions of the United States then had more hate crimes than others. The states were broken up by the regions that are designated by the United States Census Bureau.
<- subset(hate_data, subset = state %in% c("Connecticut", "Maine",
hate_data_ne "Massachusetts", "New Hampshire", "Rhode Island", "New Jersey", "New York",
"Pennsylvania"))
<- subset(hate_data, subset = state %in% c("Indiana", "Illinois",
hate_data_mw "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas", "Minnesota", "Missouri",
"Nebraska", "North Dakota", "South Dakota"))
<- subset(hate_data, subset = state %in% c("Delaware",
hate_data_s "District of Columbia", "Florida", "Georgia", "Maryland", "North Carolina",
"South Carolina", "Virginia", "West Virginia", "Alabama", "Kentucky",
"Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"))
<- subset(hate_data, subset = state %in% c("Arizona", "Colorado",
hate_data_w "Idaho", "New Mexico", "Montana", "Utah" ,"Nevada", "Wyoming", "Alaska",
"California", "Hawaii", "Oregon", "Washington"))
Once broken up, the hate crimes were plotted by region.
boxplot(hate_data_ne$avg_hatecrimes_per_100k_fbi,
$avg_hatecrimes_per_100k_fbi,
hate_data_mw$avg_hatecrimes_per_100k_fbi,
hate_data_s$avg_hatecrimes_per_100k_fbi,
hate_data_wmain = "Hate Crimes per 100,00 people by Region of United States",
names = c("North East", "Mid West", "South", "West"))
From this plot, it indicates that the most hate crimes occur in the North East and the least in the South.
To extend the work of FiveThirtyEight, a plot was created to show the linear positive relationship between income inequality and hate crimes. To go a step further, the hate crimes were also displayed by region of the United States.