A hate crime can be defined as a “criminal act that demonstrates an accused’s prejudice based on the actual or perceived race, color, religion, national origin, sex, age, marital status, personal appearance, sexual orientation, gender identity or expression, family responsibility, homelessness, physical disability, matriculation, or political affiliation of a victim of the subject designated act.” This definition comes from the DC Gov website.
FiveThirtyEight conducted a study of the number of hate crimes per state in relation to household income. The study compared data from before and after the 2016 election to see whether there were differences in the number of hate crimes between states. It found that hate crimes and household income were correlated, but that the election had no major effect on the number of hate crimes.
I wanted to take this dataset, which is available on Kaggle, further and analyze hate crimes based on other factors. Specifically, I compared hate crimes per state with the importance of religion, beliefs about homosexuality, and the share of the population that are not U.S. citizens.
My hypotheses were:
1. A greater share of non-citizens in a state will cause more hate crimes in that state.
2. The more important religion is in a state, the more hate crimes will occur in that state.
3. A greater percentage of the population who believe that homosexuality should be discouraged will cause a higher hate crime rate in that state.
For the hate crime data, I used the dataset from the FiveThirtyEight article, which is available on Kaggle. It provides FBI data on average annual hate crimes per 100,000 people from 2010 to 2015. These population-adjusted figures are based on police records, that is, crimes known to the police. According to the FBI's Criminal Justice Information Services Division, every recorded crime was motivated by the “offender’s bias against a race, gender, identity, religion, disability, sexual orientation, or ethnicity.” Law enforcement must provide sufficient evidence “to conclude that the offender’s actions were motivated.” The data does not include hate incidents that occurred without police knowledge or record. More information about the methodology behind the FBI data can be found here. I also used the 2015 share of the population that are not U.S. citizens from the Kaggle dataset.
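Because the hate crime figures are population-adjusted, states of very different sizes can be compared directly. Below is a minimal base-R sketch of what a per-100,000 rate means, using made-up counts and populations for one hypothetical state. It assumes the average annual rate is a simple mean of the yearly rates; that is an illustration, not the documented FBI or FiveThirtyEight procedure.

```r
# Hypothetical yearly hate crime counts and populations for one made-up state
counts      <- c(30, 24, 36)
populations <- c(1000000, 1000000, 1200000)

# Rate per 100,000 residents for each year
yearly_rate <- counts / populations * 100000

# Average annual rate (ASSUMPTION: a simple mean of the yearly rates)
avg_rate <- mean(yearly_rate)
print(avg_rate)
```

Here the yearly rates are 3.0, 2.4, and 3.0, so the average annual rate is 2.8 hate crimes per 100,000 people, even though the third year has the most incidents.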
To examine the importance of religion and beliefs about homosexuality, I used state-level survey data from the Pew Research Center. The “importance of religion in one’s life” by state dataset was compiled in 2014 and includes the responses: very important, somewhat important, not too important, not at all, and don’t know. The “views about homosexuality” by state dataset was also compiled in 2014 and includes the responses: should be accepted, should be discouraged, neither/both equally, and don’t know. For further explanation of how the Pew Research Center measures religious data, a Pew Research article can be found here.
To analyze the factors behind hate crimes, I installed and loaded the following packages in RStudio:
library(tidyverse)
library(tidytext)
library(ggplot2)
library(sf)
library(readxl)
library(readr)
For the Kaggle hate crime dataset, I used the read.csv function to load the CSV file.
originalData <- read.csv("hate_crimes.csv")
# Later chunks refer to the same data frame as hatecrimes
hatecrimes <- originalData
For the Pew Research Center data, I copied and pasted the tables into Excel and imported them with read_excel. The code for “Importance of Religion” is shown below.
library(readxl)
importanceofreligion <- read_excel("importanceofreligion.xlsx")
# The views-about-homosexuality table was imported the same way, as homosexualitybeliefs
To map average hate crimes across the United States, I loaded a shapefile of the United States from GitHub. I filtered the data to exclude Puerto Rico, Hawaii, and Alaska for mapping purposes.
# Read the 2017 Census cartographic boundary shapefile of U.S. states
usa <- st_read("usa/cb_2017_us_state_20m.shp")
usa_50 <- usa %>%
  filter(!NAME %in% c("Puerto Rico", "Hawaii", "Alaska"))
To merge the shapefile with the hate crime dataset, I renamed the shapefile's state-name column to match the state column in the hate crime dataset. I filtered the District of Columbia out because its average annual hate crime rate was much higher than that of any state. Hawaii has no recorded hate crimes in the dataset, so I filtered it out as well, and I excluded Alaska for mapping purposes. After filtering, I inner joined the two datasets and plotted the result on a map.
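# Rename the shapefile's NAME column (column 6) to match the hate crime data
usa_50 <- usa_50 %>% rename(state = NAME)
hatenoDC <- originalData %>%
  filter(!state %in% c("District of Columbia", "Hawaii"))
# Put the sf object first so the joined result stays an sf object for geom_sf()
hateState <- inner_join(usa_50, hatenoDC, by = "state")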
ggplot() +
  geom_sf(data = hateState, aes(fill = avg_hatecrimes_per_100k_fbi)) +
  # Hide axis text, ticks, and background rectangles for a clean map
  theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
        axis.ticks = element_blank(), rect = element_blank()) +
  scale_fill_viridis_c("Hate Crimes", option = "magma", direction = -1) +
  ggtitle("Average Hate Crimes Per State")
I created bar charts to display the five states with the highest and the five with the lowest average annual hate crime rates. The District of Columbia has by far the highest rate overall, and since it is not a state, I filtered it out of the dataset again. The average annual hate crime rates range from 0.26 to 10.95 per 100,000 people.
hatenoDC <- hatecrimes %>% filter(!state %in% "District of Columbia")
# Sort from highest to lowest rate and keep the top five states
hatenoDC %>%
  arrange(desc(avg_hatecrimes_per_100k_fbi)) %>%
  head(5) -> top5Hate
# reorder() draws the bars from highest to lowest rate
ggplot(top5Hate, aes(reorder(state, -avg_hatecrimes_per_100k_fbi), avg_hatecrimes_per_100k_fbi, fill = avg_hatecrimes_per_100k_fbi)) +
  geom_col(show.legend = FALSE) + xlab("State") + ylab("Average Hate Crimes") + ggtitle("Highest Average Annual Hate Crimes")
# Sort from lowest to highest rate; states with missing (NA) rates sort last
hatecrimes %>%
  arrange(avg_hatecrimes_per_100k_fbi) %>%
  head(5) -> bottom5Hate
ggplot(bottom5Hate, aes(reorder(state, avg_hatecrimes_per_100k_fbi), avg_hatecrimes_per_100k_fbi, fill = avg_hatecrimes_per_100k_fbi)) +
  geom_col(show.legend = FALSE) + xlab("State") + ylab("Average Hate Crimes") + ggtitle("Lowest Average Annual Hate Crimes")
The states with the highest average annual hate crime rates are, in descending order, Massachusetts, North Dakota, New Jersey, Kentucky, and Washington. The states with the lowest rates include Wyoming, Georgia, Iowa, and Mississippi. The District of Columbia has the highest population-adjusted rate of all. I found this striking because DC's rate is more than double that of Massachusetts, the state with the highest rate. I think DC may have more hate crimes because it is the capital of the United States. Additionally, the Washingtonian reported that hate crimes in DC have increased by over 62 percent since 2017. Because DC's rate was the highest overall, I wanted to see whether it influenced the relationships between hate crime rates and the share of non-U.S. citizens, the importance of religion, and the percentage of people who believe homosexuality should be discouraged.
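The top-five chart depends on sorting in descending order; an ascending sort returns the lowest rates instead. The minimal base-R sketch below uses order() rather than dplyr's arrange() and made-up states and rates (not the FBI data) to show both sorts, and how a missing rate is handled.

```r
# Hypothetical toy data: state names and rates are made up for illustration
rates <- data.frame(
  state = c("A", "B", "C", "D", "E", "F", "G"),
  rate  = c(2.1, 0.3, 4.8, NA, 1.2, 3.6, 0.9),
  stringsAsFactors = FALSE
)

# Highest five: sort in decreasing order (order() puts NA last by default)
top5 <- head(rates[order(rates$rate, decreasing = TRUE), ], 5)

# Lowest five: sort in increasing order (NA is still placed last)
bottom5 <- head(rates[order(rates$rate), ], 5)

print(top5$state)
print(bottom5$state)
```

Because missing values sort last in both directions, a state with no recorded rate never appears in either list as long as at least five states have values.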
I wanted to show both plots, with and without DC, to demonstrate how the District of Columbia is an outlier that can still influence the graphs. The graph that includes DC's hate crime rate appears to show a noticeable positive correlation between the share of non-citizens and hate crimes. When the outlier is removed, no real correlation can be seen. A correlation test on the data without DC gives a correlation coefficient of 0.1600652, a very weak positive correlation that is close to zero. Overall, states with a higher share of non-U.S. citizens do not have noticeably more hate crimes.
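To illustrate why a single extreme value like DC can change the picture, here is a minimal base-R sketch of Pearson's correlation coefficient, the statistic cor.test() reports by default, computed on made-up numbers rather than the study data. The pearson_r function is a hypothetical helper written only for this example.

```r
# Pearson correlation computed from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Hypothetical toy data: six "states" with no clear pattern
x <- c(2, 3, 4, 5, 6, 7)
y <- c(2, 1, 3, 2, 1, 3)

# Add one extreme point that plays the role of an outlier like DC
x_out <- c(x, 20)
y_out <- c(y, 11)

r_without <- pearson_r(x, y)
r_with    <- pearson_r(x_out, y_out)

# The hand-written formula agrees with base R's cor()
stopifnot(isTRUE(all.equal(r_without, cor(x, y))))

cat("r without outlier:", round(r_without, 3), "\n")
cat("r with outlier:   ", round(r_with, 3), "\n")
```

On these toy numbers, r is about 0.24 without the extra point and about 0.95 with it: one observation far from the rest dominates the sums in the formula, which is why comparing results with and without DC matters.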
Many hate crimes are motivated by religion, such as the shooting at a Pittsburgh synagogue in October 2018. I think that if religion is very important in a state, there will be more hate crimes in that state. Because religion is more important there, residents may attend religious services more often and spread their religion, which may provoke hate crimes towards that religion.
I created two scatterplots to see whether the importance of religion is correlated with hate crimes. The first scatterplot includes the District of Columbia, while the second does not. I also ran correlation tests on both data frames to measure how strongly the two variables are correlated.
hate_and_religion <- inner_join(importanceofreligion, hatecrimes)
# Scatterplot with a least-squares line, DC included
ggplot(hate_and_religion, aes(very_important, avg_hatecrimes_per_100k_fbi)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE) + theme_minimal() +
  xlab("Importance of Religion") + ylab("Average Hate Crimes") + ggtitle("Importance of Religion and Hate Crimes With DC")
hate_and_religionNODC <- hate_and_religion %>%
  filter(!state %in% "District of Columbia")
ggplot(hate_and_religionNODC, aes(very_important, avg_hatecrimes_per_100k_fbi)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE) + theme_minimal() +
  xlab("Importance of Religion") + ylab("Average Hate Crimes") + ggtitle("Importance of Religion and Hate Crimes")
# Pearson correlation tests: with DC, then without DC
cor.test(hate_and_religion$very_important, hate_and_religion$avg_hatecrimes_per_100k_fbi)
cor.test(hate_and_religionNODC$very_important, hate_and_religionNODC$avg_hatecrimes_per_100k_fbi)
Both graphs display a negative correlation, meaning that one variable decreases as the other increases: as the importance of religion increases, the hate crime rate actually decreases slightly. The correlation coefficient for the data without DC is -0.3084713, a weak negative correlation. This disagrees with my hypothesis, because more religious states do not have more hate crimes.
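The cor.test() calls report a p-value alongside the coefficient. As a sketch of where that p-value comes from for a Pearson correlation, the base-R code below recomputes cor.test()'s test statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, on made-up data and checks it against the built-in result.

```r
# Hypothetical toy data (not the study data)
x <- c(10, 12, 15, 18, 20, 23, 25, 28, 30, 33)
y <- c(3.1, 2.4, 2.9, 2.2, 2.6, 1.8, 2.5, 1.7, 2.3, 1.6)

n <- length(x)
r <- cor(x, y)                                # Pearson correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t statistic, n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

# Compare with base R's built-in test
ct <- cor.test(x, y)
stopifnot(isTRUE(all.equal(unname(ct$statistic), t_stat)),
          isTRUE(all.equal(ct$p.value, p_value)))
```

Because t grows with sqrt(n - 2), the same coefficient (for example, r = -0.31) can be statistically significant in a larger sample and not in a smaller one, so the p-value from each cor.test() call is worth reporting alongside r.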
Another common motivation for hate crimes is bias based on sexual orientation. I think that if more people in a state do not accept homosexuality, there will be more hate crimes against people who are homosexual. Several states have a high percentage of adults who believe that homosexuality should be discouraged.
After renaming columns so that they matched, I merged the two datasets. I created two scatterplots to see whether beliefs about homosexuality are correlated with hate crimes. The first scatterplot includes the District of Columbia, while the second does not. I also ran correlation tests on both data frames to measure how strongly the two variables are correlated.
# Match the state column name and give the "should be discouraged" column a usable name
colnames(homosexualitybeliefs)[1] <- "state"
colnames(homosexualitybeliefs)[3] <- "should_be_discouraged"
hate_and_sexuality <- inner_join(homosexualitybeliefs, hatecrimes, by = "state")
ggplot(hate_and_sexuality, aes(should_be_discouraged, avg_hatecrimes_per_100k_fbi)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE) + theme_minimal() +
  xlab("Percentage of People Who Discourage Homosexuality") + ylab("Average Hate Crimes") +
  ggtitle("Beliefs About Homosexuality and Hate Crimes With DC")
# hatenoDC (defined above) excludes only the District of Columbia
hate_and_sexualityNODC <- inner_join(homosexualitybeliefs, hatenoDC, by = "state")
ggplot(hate_and_sexualityNODC, aes(should_be_discouraged, avg_hatecrimes_per_100k_fbi)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE) + theme_minimal() +
  xlab("Percentage of People Who Discourage Homosexuality") + ylab("Average Hate Crimes") +
  ggtitle("Beliefs About Homosexuality and Hate Crimes")
# Pearson correlation tests: with DC, then without DC
cor.test(hate_and_sexuality$should_be_discouraged, hate_and_sexuality$avg_hatecrimes_per_100k_fbi)
cor.test(hate_and_sexualityNODC$should_be_discouraged, hate_and_sexualityNODC$avg_hatecrimes_per_100k_fbi)
Both graphs display a negative correlation between the percentage of people who believe homosexuality should be discouraged and the hate crime rate: as that percentage increases, the hate crime rate decreases. This disagrees with my hypothesis. The correlation coefficient for the data without DC is -0.3089947. As with the importance of religion, this indicates only a weak negative correlation between the two variables.
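The trend lines in these scatterplots come from geom_smooth(method = "lm"), which fits an ordinary least-squares line of y on x. A short base-R sketch on made-up data (not the study data) shows how that line relates to the correlation coefficient: its slope equals r * sd(y) / sd(x), so it always slopes in the same direction as r, and its R-squared equals r squared.

```r
# Hypothetical toy data (not the study data)
x <- c(25, 30, 35, 40, 45, 50, 55)
y <- c(3.0, 2.2, 2.8, 1.9, 2.4, 1.5, 2.0)

# Same least-squares line that geom_smooth(method = "lm") draws (formula y ~ x)
fit <- lm(y ~ x)
slope_lm <- unname(coef(fit)["x"])

# Slope recovered from the correlation coefficient and standard deviations
slope_from_r <- cor(x, y) * sd(y) / sd(x)
stopifnot(isTRUE(all.equal(slope_lm, slope_from_r)))

# For a one-predictor model, R-squared equals the squared correlation
stopifnot(isTRUE(all.equal(summary(fit)$r.squared, cor(x, y)^2)))

# Applied to the coefficient reported above (data without DC)
r_reported <- -0.3089947
r_reported^2
```

The last line gives about 0.095: a straight line in the percentage who believe homosexuality should be discouraged accounts for roughly 10 percent of the variation in hate crime rates across these states.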
After analyzing several factors that might influence the number of hate crimes in a state, I found that none of them was strongly correlated with hate crime rates. I expected that states whose populations include more non-U.S. citizens would have more hate crimes. There is a very slight positive correlation, but the coefficient is too small to indicate a real relationship. I also expected that the more important religion was in a state, the higher its hate crime rate would be. The scatterplots instead showed a slight negative correlation: more religious states do not have more hate crimes. Lastly, I expected that states with a higher percentage of people who believe homosexuality should be discouraged would have higher hate crime rates, but this relationship was also slightly negative. Overall, these three factors do not appear to have a meaningful influence on the number of hate crimes in a state. For further research, I would analyze other factors that could influence hate crime rates, such as the share of minorities in each state or the share of people who support the NRA versus those who don’t. It would also be interesting to investigate hate crimes in the District of Columbia to understand why its rate is so much higher than any state's, despite its relatively small population.