1) Introduction

Hate crimes and racism are among the biggest social issues the world faces. In the US, there was a surge in extremist groups and hate crimes after the election of Donald Trump as president in November 2016. In this project I look at hate-crime data from 9 to 18 Nov 2016 and examine whether hate crimes are related to whether a state chose Donald Trump and to the presence of minority groups.

Research question - Is there a significant relationship between the rate of hate crimes in the US during 9 to 18 Nov 2016 and a state's percentage of Trump voters and presence of minorities?

2) Data Display

Required libraries

library(readr)
library(ggplot2)
library(DT)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data repository

Get the data from the FiveThirtyEight GitHub repository

initial <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")
## Parsed with column specification:
## cols(
##   state = col_character(),
##   median_household_income = col_double(),
##   share_unemployed_seasonal = col_double(),
##   share_population_in_metro_areas = col_double(),
##   share_population_with_high_school_degree = col_double(),
##   share_non_citizen = col_double(),
##   share_white_poverty = col_double(),
##   gini_index = col_double(),
##   share_non_white = col_double(),
##   share_voters_voted_trump = col_double(),
##   hate_crimes_per_100k_splc = col_double(),
##   avg_hatecrimes_per_100k_fbi = col_double()
## )
summary(initial)
##     state           median_household_income share_unemployed_seasonal
##  Length:51          Min.   :35521           Min.   :0.02800          
##  Class :character   1st Qu.:48657           1st Qu.:0.04200          
##  Mode  :character   Median :54916           Median :0.05100          
##                     Mean   :55224           Mean   :0.04957          
##                     3rd Qu.:60719           3rd Qu.:0.05750          
##                     Max.   :76165           Max.   :0.07300          
##                                                                      
##  share_population_in_metro_areas share_population_with_high_school_degree
##  Min.   :0.3100                  Min.   :0.7990                          
##  1st Qu.:0.6300                  1st Qu.:0.8405                          
##  Median :0.7900                  Median :0.8740                          
##  Mean   :0.7502                  Mean   :0.8691                          
##  3rd Qu.:0.8950                  3rd Qu.:0.8980                          
##  Max.   :1.0000                  Max.   :0.9180                          
##                                                                          
##  share_non_citizen share_white_poverty   gini_index     share_non_white 
##  Min.   :0.01000   Min.   :0.04000     Min.   :0.4190   Min.   :0.0600  
##  1st Qu.:0.03000   1st Qu.:0.07500     1st Qu.:0.4400   1st Qu.:0.1950  
##  Median :0.04500   Median :0.09000     Median :0.4540   Median :0.2800  
##  Mean   :0.05458   Mean   :0.09176     Mean   :0.4538   Mean   :0.3157  
##  3rd Qu.:0.08000   3rd Qu.:0.10000     3rd Qu.:0.4665   3rd Qu.:0.4200  
##  Max.   :0.13000   Max.   :0.17000     Max.   :0.5320   Max.   :0.8100  
##  NA's   :3                                                              
##  share_voters_voted_trump hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi
##  Min.   :0.040            Min.   :0.06745           Min.   : 0.2669            
##  1st Qu.:0.415            1st Qu.:0.14271           1st Qu.: 1.2931            
##  Median :0.490            Median :0.22620           Median : 1.9871            
##  Mean   :0.490            Mean   :0.30409           Mean   : 2.3676            
##  3rd Qu.:0.575            3rd Qu.:0.35694           3rd Qu.: 3.1843            
##  Max.   :0.700            Max.   :1.52230           Max.   :10.9535            
##                           NA's   :4                 NA's   :1
head(initial)
## # A tibble: 6 x 12
##   state median_househol~ share_unemploye~ share_populatio~ share_populatio~
##   <chr>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 Alab~            42278            0.06              0.64            0.821
## 2 Alas~            67629            0.064             0.63            0.914
## 3 Ariz~            49254            0.063             0.9             0.842
## 4 Arka~            44922            0.052             0.69            0.824
## 5 Cali~            60487            0.059             0.97            0.806
## 6 Colo~            60940            0.04              0.8             0.893
## # ... with 7 more variables: share_non_citizen <dbl>,
## #   share_white_poverty <dbl>, gini_index <dbl>, share_non_white <dbl>,
## #   share_voters_voted_trump <dbl>, hate_crimes_per_100k_splc <dbl>,
## #   avg_hatecrimes_per_100k_fbi <dbl>

Variables

Dependent variable - hate_crimes_per_100k_splc (hate crimes per 100,000 population, as reported by the Southern Poverty Law Center), which is numerical

Independent variables - i) whether the state was Blue or Red, which is categorical; ii) the non-white share of the state's population (share_non_white), which is numerical

Data cleaning

The dataset has 51 cases, 4 of which have NA for the variable hate_crimes_per_100k_splc, so only the remaining 47 cases are considered.
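As a quick check of counts like these, the sketch below tallies missing values in an outcome column and drops the incomplete rows. It uses a small made-up data frame (`toy`, with hypothetical values) rather than the real `initial` tibble, so the numbers it prints are illustrative only.

```r
# Hypothetical toy data standing in for `initial`; the values are made up.
toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  hate_crimes_per_100k_splc = c(0.12, NA, 0.31, NA, 0.08)
)

# Count rows with a missing outcome and rows that remain usable.
n_missing  <- sum(is.na(toy$hate_crimes_per_100k_splc))
n_complete <- sum(!is.na(toy$hate_crimes_per_100k_splc))

# Base-R way of keeping only the rows with a recorded outcome.
toy_clean <- toy[!is.na(toy$hate_crimes_per_100k_splc), ]

n_missing
n_complete
nrow(toy_clean)
```

Applied to `initial`, the same two `sum()` calls should agree with the `NA's :4` entry and the length of 51 shown in the `summary(initial)` output.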

Add new columns

Convert the dataset to a data frame and add two columns. ElectTrump is 1 where Trump's vote share exceeds 50% and 0 where it is below 50%; a state at exactly 50% gets NA. Choice labels the state's choice of candidate as Trump, Clinton, or Split (exactly 50%).

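hate_crimes <- data.frame(
  initial %>%
    select(state, share_non_white, share_voters_voted_trump, hate_crimes_per_100k_splc) %>%
    # Keep rows with a recorded SPLC rate; rows where it is NA are dropped here
    filter(hate_crimes_per_100k_splc > 0) %>%
    mutate(
      # 1 if Trump's share exceeds 50%, 0 if below; exactly 50% is left as NA
      ElectTrump = case_when(share_voters_voted_trump > 0.5 ~ 1,
                             share_voters_voted_trump < 0.5 ~ 0),
      # Same split as a label, with "Split" for exactly 50%
      Choice = case_when(share_voters_voted_trump > 0.5 ~ "Trump",
                         share_voters_voted_trump < 0.5 ~ "Clinton",
                         TRUE ~ "Split")
    )
)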
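The two `case_when()` calls treat a vote share of exactly 0.5 differently, which matters later when the model drops rows with a missing predictor. The sketch below shows that behaviour on a small made-up data frame (`toy` and its three vote shares are hypothetical, chosen only to hit each branch).

```r
library(dplyr)

# Hypothetical vote shares chosen to hit each branch, including exactly 0.5.
toy <- data.frame(share_voters_voted_trump = c(0.42, 0.50, 0.63))

toy <- toy %>%
  mutate(
    # No fallback branch: a share of exactly 0.5 matches nothing and yields NA.
    ElectTrump = case_when(
      share_voters_voted_trump > 0.5 ~ 1,
      share_voters_voted_trump < 0.5 ~ 0
    ),
    # Fallback branch via TRUE: a share of exactly 0.5 yields "Split".
    Choice = case_when(
      share_voters_voted_trump > 0.5 ~ "Trump",
      share_voters_voted_trump < 0.5 ~ "Clinton",
      TRUE ~ "Split"
    )
  )

toy
```

Because `lm()` omits rows with NA in any model variable by default, a state with an NA in ElectTrump would not contribute to a regression that uses that column.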

View data using datatable

datatable(hate_crimes)

Visualize the data using histograms

Overall data

The plot below shows that the hate-crime rate is right-skewed and bimodal, with a few outliers.

ggplot(hate_crimes, aes(x=hate_crimes_per_100k_splc)) + geom_histogram(bins = 30)

States that voted for Trump

The plot below shows that the distribution is nearly normal in the states that voted for Trump

hate_crimes_trump <- filter(hate_crimes, ElectTrump==1)
ggplot(hate_crimes_trump, aes(x=hate_crimes_per_100k_splc)) + geom_histogram(color="black", fill="grey")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

States that did not vote for Trump

The distribution appears right-skewed

# Separate name so the Trump-state subset above is not overwritten
hate_crimes_not_trump <- filter(hate_crimes, ElectTrump==0)
ggplot(hate_crimes_not_trump, aes(x=hate_crimes_per_100k_splc)) + geom_histogram(color="black", fill="grey")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
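The two group histograms can also be drawn in one figure with a shared x-axis, which makes the shapes easier to compare. This is a minimal sketch on made-up data (`toy` and its values are hypothetical); the bin count of 10 is an arbitrary choice, set explicitly so the `stat_bin()` message does not appear.

```r
library(ggplot2)

# Hypothetical toy data; the real analysis would use `hate_crimes`.
toy <- data.frame(
  hate_crimes_per_100k_splc = c(0.10, 0.15, 0.22, 0.30, 0.12, 0.25, 0.40, 0.65),
  Choice = rep(c("Trump", "Clinton"), each = 4)
)

# One panel per group, stacked vertically so both share the same x-axis.
p <- ggplot(toy, aes(x = hate_crimes_per_100k_splc)) +
  geom_histogram(bins = 10, color = "black", fill = "grey") +
  facet_wrap(~ Choice, ncol = 1) +
  labs(x = "Hate crimes per 100k (SPLC)", y = "Number of states")

p
```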

Crime versus non white share

Most states have a hate-crime rate under 0.4 per 100k, and the states with a rate above 0.4 have a non-white share under 0.3

ggplot(hate_crimes, aes(x=share_non_white, y=hate_crimes_per_100k_splc)) + geom_point()

Visualize with a boxplot

From the boxplot it appears that the hate-crime rate is higher in states where the Democratic vote share was higher.

ggplot(hate_crimes) + 
  geom_boxplot(mapping = aes(x = as.factor(Choice), y = hate_crimes_per_100k_splc)) +
  labs(x = "", y = "Hate crimes per 100k (SPLC)")

3) Data Analysis

Linear Regression

model <- lm(hate_crimes_per_100k_splc ~ share_non_white + ElectTrump, hate_crimes)
summary(model)
## 
## Call:
## lm(formula = hate_crimes_per_100k_splc ~ share_non_white + ElectTrump, 
##     data = hate_crimes)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31347 -0.10560 -0.05053  0.03507  1.13203 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.395280   0.094782   4.170 0.000145 ***
## share_non_white -0.007953   0.244279  -0.033 0.974180    
## ElectTrump      -0.190565   0.072604  -2.625 0.011958 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.242 on 43 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1408, Adjusted R-squared:  0.1009 
## F-statistic: 3.525 on 2 and 43 DF,  p-value: 0.03824

Hypothesis Tests

\(H_0\) : Hate-crime rate is independent of Trump vote share and presence of minorities in the state

\(H_1\) : Hate-crime rate is dependent on Trump vote share and presence of minorities in the state

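From the model output we see R-squared = 0.14 and an overall p-value of 0.038, which is less than 0.05. We have sufficient evidence to reject our null hypothesis (\(H_0\)), which means the hate-crime rate is associated with at least one of the two predictors. Looking at the individual coefficients, ElectTrump is significant (p = 0.012) while share_non_white is not (p = 0.974), so the evidence points to the Trump vote indicator rather than the presence of minorities in the state.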
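The overall p-value comes from the F-test reported on the last line of the `summary(model)` output. As a sketch of how the reported numbers fit together, the code below recomputes the F statistic from R-squared and the p-value from the F distribution, using only values copied from that output (the variable names are my own).

```r
# Values copied from the summary(model) output.
r_squared <- 0.1408
f_stat    <- 3.525
df_model  <- 2    # number of predictors
df_resid  <- 43   # residual degrees of freedom

# F statistic implied by R-squared and the degrees of freedom.
f_from_r2 <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the overall p-value.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

f_from_r2
p_value
```

Both results should agree with the reported F-statistic of 3.525 and p-value of 0.03824 up to rounding. This test asks whether the predictors jointly explain any variation; it does not say which predictor is responsible.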

4) Conclusion

From the data visualization we see a higher hate-crime rate in the states with a higher Democratic vote share.

From the model output it appears that, holding the non-white share constant, states where Trump won more than 50% of the vote had on average 0.190565 fewer hate crimes per 100k population than states where he won less than 50%. Similarly, holding ElectTrump constant, a 10-percentage-point increase in the non-white share of the state's population is associated with a decrease of about 0.0008 in the hate-crime rate per 100k, although this coefficient is not statistically significant (p = 0.974).
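To put the coefficients on an interpretable scale, the sketch below computes a 95% confidence interval for the ElectTrump coefficient and rescales the share_non_white coefficient to a 10-percentage-point change. It uses only the estimates, standard error, and degrees of freedom copied from the `summary(model)` output; the variable names are my own. Note that ElectTrump is a 0/1 indicator and share_non_white is a proportion between 0 and 1.

```r
# Estimates and standard error copied from the summary(model) output.
est_trump    <- -0.190565
se_trump     <- 0.072604
est_nonwhite <- -0.007953
df_resid     <- 43

# t statistic and 95% confidence interval for the ElectTrump coefficient.
t_trump  <- est_trump / se_trump
t_crit   <- qt(0.975, df_resid)
ci_trump <- est_trump + c(-1, 1) * t_crit * se_trump

# share_non_white is a proportion, so a 10-percentage-point increase
# corresponds to a change of 0.10 in the predictor.
effect_10pp <- est_nonwhite * 0.10

t_trump
ci_trump
effect_10pp
```

On the fitted model object itself, `confint(model)` should give the same interval without copying numbers by hand.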