Week 1 Assignment

Overview

“Higher Rates Of Hate Crimes Are Tied To Income Inequality”

The article, published by FiveThirtyEight (ABC News), looks for trends in hate crimes across key socioeconomic factors for each state. Multivariate linear regression was used to analyze which of these factors are indicators of hate incidents across the states and the District of Columbia.

About the data

The dataset used can be found here. The hate incidents are reported through two organizations: the FBI and the Southern Poverty Law Center (SPLC). It is important to note how each of them collects its reports. First, the FBI relies on reports submitted voluntarily by law enforcement agencies, so its data cannot be comprehensive. The FBI also collects data only on prosecutable hate crimes, which leaves out many more hate incidents. Secondly, SPLC

The dataset contains the variables listed in the table below. The Gini index (independent variable) is a summary measure of income inequality. The hate crime rate (dependent variable) is represented by two measures: average annual hate crimes per 100,000 population reported by the FBI, and hate crimes per 100,000 population reported by the Southern Poverty Law Center.

Header	Definition
`state`	State name
`median_household_income`	Median household income, 2016
`share_unemployed_seasonal`	Share of the population that is unemployed (seasonally adjusted), Sept. 2016
`share_population_in_metro_areas`	Share of the population that lives in metropolitan areas, 2015
`share_population_with_high_school_degree`	Share of adults 25 and older with a high-school degree, 2009
`share_non_citizen`	Share of the population that are not U.S. citizens, 2015
`share_white_poverty`	Share of white residents who are living in poverty, 2015
`gini_index`	Gini Index, 2015
`share_non_white`	Share of the population that is not white, 2015
`share_voters_voted_trump`	Share of 2016 U.S. presidential voters who voted for Donald Trump
`hate_crimes_per_100k_splc`	Hate crimes per 100,000 population, Southern Poverty Law Center, Nov. 9-18, 2016
`avg_hatecrimes_per_100k_fbi`	Average annual hate crimes per 100,000 population, FBI, 2010-2015

The variable headers are as they appear in the table above and represent many more socioeconomic factors that can be tied to hate crimes.

#Load necessary packages
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(dplyr)


#Load Data from Github
Hate_crime_dataset <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")
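
The overview mentions that the article relied on multivariate linear regression. As a rough sketch of what such a fit could look like on this dataset, the helper below (a hypothetical name, `fit_hate_crime_model`) regresses the FBI hate crime rate on a handful of the socioeconomic columns. The choice of predictors is an assumption made for illustration and is not the article's exact model specification.

```r
# Sketch only: the predictor set below is an assumption for illustration,
# not the article's exact model specification.
fit_hate_crime_model <- function(df) {
  # lm() drops rows with missing values by default (na.action = na.omit)
  lm(
    avg_hatecrimes_per_100k_fbi ~ gini_index + median_household_income +
      share_unemployed_seasonal + share_population_with_high_school_degree,
    data = df
  )
}

# Usage, once Hate_crime_dataset has been loaded as above
if (exists("Hate_crime_dataset")) {
  fbi_model <- fit_hate_crime_model(Hate_crime_dataset)
  print(summary(fbi_model))
}
```

The coefficient table printed by `summary()` shows which predictors are associated with the hate crime rate once the others are held fixed. Swapping the response for `hate_crimes_per_100k_splc` would give the corresponding SPLC model.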

Subset the data

Since the article discusses the relationship between income inequality, represented by the Gini index, and the hate crime rate per 100,000 population, let’s filter the dataset to remove unwanted variables. We will remove all variables except the state, the Gini index, and the hate crime rates reported by the FBI and SPLC.

#Subset and remove excess variables 

# dplyr::select() takes the column names directly; `select =` is an argument of base subset()
Hate_crime_subset <- select(Hate_crime_dataset,
                            state, gini_index, hate_crimes_per_100k_splc, avg_hatecrimes_per_100k_fbi)

#rename columns

colnames(Hate_crime_subset) <- c("State",  "Gini_index", "Hate_crime_rate_SPLC","Hate_crime_rate_FBI" )
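
Because some states have no reported rate, it can help to count the missing values before plotting. The following is a minimal sketch; `count_missing` is a hypothetical helper name, not part of any package.

```r
# Count missing (NA) entries in each column of a data frame
count_missing <- function(df) {
  colSums(is.na(df))
}

# Usage, once Hate_crime_subset has been created as above
if (exists("Hate_crime_subset")) {
  print(count_missing(Hate_crime_subset))
  # Rows (states) with at least one missing value
  print(Hate_crime_subset[!complete.cases(Hate_crime_subset), ])
}
```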

Looking at the Dataset

The article states that the factor affecting the likelihood of hate crimes is income inequality, which is represented in our dataset by the Gini index. Below we create two plots to visualise the correlation between income inequality and hate crime rates. Note that a Gini index of 0 represents perfect equality, while an index of 1 implies perfect inequality. Also note that the hate crime rates are missing for some states, where no values were reported.
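
To make the 0-to-1 scale of the Gini index concrete, here is a small sketch that computes it from a vector of incomes using the mean absolute difference between all pairs. The function name `gini_coefficient` and the income values are made up for illustration; the dataset itself already ships the index per state, so this is not how the `gini_index` column was produced.

```r
# Gini coefficient of a non-negative numeric vector, using the mean absolute
# difference between all pairs: G = sum|x_i - x_j| / (2 * n^2 * mean(x))
gini_coefficient <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Toy incomes, for illustration only
gini_coefficient(c(10, 10, 10, 10))  # everyone earns the same: 0
gini_coefficient(c(0, 0, 0, 40))     # one person earns everything: 0.75
```

With a finite group of n people the maximum is (n - 1) / n, which is why the second example gives 0.75 rather than 1; the value approaches 1 as the group grows.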

In the code below, we use the trimmed dataset to create two plots that support the article’s findings:

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 4 rows containing missing values (`geom_point()`).

ggplot(data = Hate_crime_subset, mapping = aes(x = Gini_index, y = Hate_crime_rate_FBI)) +
  geom_point(size = 4) +
  geom_line(colour = "red") +
  # Fit a straight line by least squares, without the confidence band
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Hate Crime to Gini Index (FBI)")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 1 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).
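
Only the code for the FBI plot appears above. The SPLC plot could be produced along the same lines; the sketch below is an assumption about what that code might look like, not the code that was actually run. It wraps the plot in a hypothetical helper, `plot_splc`, and leaves out the red connecting line as a simplification.

```r
library(ggplot2)

# Sketch of the SPLC counterpart to the FBI plot; assumes a data frame with
# the renamed columns Gini_index and Hate_crime_rate_SPLC
plot_splc <- function(df) {
  ggplot(data = df, mapping = aes(x = Gini_index, y = Hate_crime_rate_SPLC)) +
    geom_point(size = 4) +
    geom_smooth(method = "lm", se = FALSE) +
    labs(title = "Hate Crime to Gini Index (SPLC)")
}

# Usage, once Hate_crime_subset has been created as above
if (exists("Hate_crime_subset")) {
  print(plot_splc(Hate_crime_subset))
}
```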

Conclusion:

Looking at the two plots above, a correlation between the Gini index and the hate crime rate is visible in both datasets. However, correlation alone does not establish causation. Considering the limitations of both reporting processes, data from more reporting channels would be needed to support a causal claim.
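
The visual impression could also be put into numbers. As a minimal sketch, the hypothetical helper `gini_correlation` below computes the Pearson correlation between the Gini index and one of the rate columns, skipping states where either value is missing.

```r
# Pearson correlation between Gini_index and a chosen rate column,
# ignoring rows where either value is NA
gini_correlation <- function(df, rate_col) {
  cor(df$Gini_index, df[[rate_col]], use = "complete.obs")
}

# Usage, once Hate_crime_subset has been created as above
if (exists("Hate_crime_subset")) {
  print(gini_correlation(Hate_crime_subset, "Hate_crime_rate_FBI"))
  print(gini_correlation(Hate_crime_subset, "Hate_crime_rate_SPLC"))
}
```

A value near 0 would indicate little linear association, while values closer to 1 or -1 indicate a stronger one. `cor.test()` could be used in the same way to obtain a p-value and confidence interval.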