Hate crimes and racism are among the biggest social problems the world faces. In the US there was a surge in extremist groups and hate crimes after the election of Donald Trump as US president in November 2016. In this project I look at hate-crime data for 9 to 18 November 2016 and check for a relationship between the hate-crime rate in a state, whether the state chose Donald Trump, and the presence of minority groups.
Research question - Is there a significant relationship between the rate of hate crimes in US states during 9 to 18 November 2016 and (i) the share of Trump voters and (ii) the presence of minority groups?
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Get the data from the FiveThirtyEight GitHub repository
initial <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/hate-crimes/hate_crimes.csv")
## Parsed with column specification:
## cols(
## state = col_character(),
## median_household_income = col_double(),
## share_unemployed_seasonal = col_double(),
## share_population_in_metro_areas = col_double(),
## share_population_with_high_school_degree = col_double(),
## share_non_citizen = col_double(),
## share_white_poverty = col_double(),
## gini_index = col_double(),
## share_non_white = col_double(),
## share_voters_voted_trump = col_double(),
## hate_crimes_per_100k_splc = col_double(),
## avg_hatecrimes_per_100k_fbi = col_double()
## )
## state median_household_income share_unemployed_seasonal
## Length:51 Min. :35521 Min. :0.02800
## Class :character 1st Qu.:48657 1st Qu.:0.04200
## Mode :character Median :54916 Median :0.05100
## Mean :55224 Mean :0.04957
## 3rd Qu.:60719 3rd Qu.:0.05750
## Max. :76165 Max. :0.07300
##
## share_population_in_metro_areas share_population_with_high_school_degree
## Min. :0.3100 Min. :0.7990
## 1st Qu.:0.6300 1st Qu.:0.8405
## Median :0.7900 Median :0.8740
## Mean :0.7502 Mean :0.8691
## 3rd Qu.:0.8950 3rd Qu.:0.8980
## Max. :1.0000 Max. :0.9180
##
## share_non_citizen share_white_poverty gini_index share_non_white
## Min. :0.01000 Min. :0.04000 Min. :0.4190 Min. :0.0600
## 1st Qu.:0.03000 1st Qu.:0.07500 1st Qu.:0.4400 1st Qu.:0.1950
## Median :0.04500 Median :0.09000 Median :0.4540 Median :0.2800
## Mean :0.05458 Mean :0.09176 Mean :0.4538 Mean :0.3157
## 3rd Qu.:0.08000 3rd Qu.:0.10000 3rd Qu.:0.4665 3rd Qu.:0.4200
## Max. :0.13000 Max. :0.17000 Max. :0.5320 Max. :0.8100
## NA's :3
## share_voters_voted_trump hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi
## Min. :0.040 Min. :0.06745 Min. : 0.2669
## 1st Qu.:0.415 1st Qu.:0.14271 1st Qu.: 1.2931
## Median :0.490 Median :0.22620 Median : 1.9871
## Mean :0.490 Mean :0.30409 Mean : 2.3676
## 3rd Qu.:0.575 3rd Qu.:0.35694 3rd Qu.: 3.1843
## Max. :0.700 Max. :1.52230 Max. :10.9535
## NA's :4 NA's :1
## # A tibble: 6 x 12
## state median_househol~ share_unemploye~ share_populatio~ share_populatio~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alab~ 42278 0.06 0.64 0.821
## 2 Alas~ 67629 0.064 0.63 0.914
## 3 Ariz~ 49254 0.063 0.9 0.842
## 4 Arka~ 44922 0.052 0.69 0.824
## 5 Cali~ 60487 0.059 0.97 0.806
## 6 Colo~ 60940 0.04 0.8 0.893
## # ... with 7 more variables: share_non_citizen <dbl>,
## # share_white_poverty <dbl>, gini_index <dbl>, share_non_white <dbl>,
## # share_voters_voted_trump <dbl>, hate_crimes_per_100k_splc <dbl>,
## # avg_hatecrimes_per_100k_fbi <dbl>
Dependent variable - hate_crimes_per_100k_splc (hate crimes per 100,000 population, as reported by the Southern Poverty Law Center), which is numerical.
Independent variables - i) whether the state voted Red or Blue (ElectTrump), which is categorical; ii) the share of the state's population that is non-white (share_non_white), which is numerical.
There are 51 cases in total, of which 4 have NA for hate_crimes_per_100k_splc, so we consider only the remaining 47 cases.
Create a data frame from the above dataset and add two columns. ElectTrump is 1 if Trump's vote share exceeds 50% and 0 if it is below 50%; a state at exactly 50% matches neither condition and gets NA. Choice names the state's choice of candidate: "Trump", "Clinton", or "Split" for a state at exactly 50%.
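The behaviour of `case_when()` at exactly 50% matters later, because a row with `ElectTrump = NA` is dropped by `lm()`. The sketch below uses a small hypothetical data frame (`toy`, with made-up vote shares chosen only to hit each branch) to show what the two-condition and three-condition versions return.

```r
library(dplyr)

# Hypothetical vote shares, not taken from the dataset:
# one below 50%, one at exactly 50%, one above 50%.
toy <- data.frame(share_voters_voted_trump = c(0.40, 0.50, 0.60))

toy <- toy %>%
  mutate(
    # No fallback branch: a share of exactly 0.5 matches nothing and becomes NA.
    ElectTrump = case_when(
      share_voters_voted_trump > 0.5 ~ 1,
      share_voters_voted_trump < 0.5 ~ 0
    ),
    # TRUE ~ "Split" acts as the fallback for anything not matched above.
    Choice = case_when(
      share_voters_voted_trump > 0.5 ~ "Trump",
      share_voters_voted_trump < 0.5 ~ "Clinton",
      TRUE ~ "Split"
    )
  )

print(toy)
```

If a tied state should stay in the regression, one option would be to add a fallback branch to `ElectTrump` as well; which value to assign is a modelling decision the report does not make.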
hate_crimes <- data.frame(
  initial %>%
    select(state, share_non_white, share_voters_voted_trump, hate_crimes_per_100k_splc) %>%
    # Keeps rows with a positive rate; rows where the rate is NA are dropped too
    filter(hate_crimes_per_100k_splc > 0) %>%
    mutate(
      # 1 = Trump share above 50%, 0 = below 50%, NA at exactly 50%
      ElectTrump = case_when(
        share_voters_voted_trump > 0.5 ~ 1,
        share_voters_voted_trump < 0.5 ~ 0
      ),
      Choice = case_when(
        share_voters_voted_trump > 0.5 ~ "Trump",
        share_voters_voted_trump < 0.5 ~ "Clinton",
        TRUE ~ "Split"
      )
    )
)
The plot below shows that the distribution of hate_crimes_per_100k_splc is right-skewed and bimodal, with a few outliers.
The plot below shows that the distribution is nearly normal in the states that voted for Trump.
hate_crimes_trump <- filter(hate_crimes, ElectTrump == 1)
ggplot(hate_crimes_trump, aes(x = hate_crimes_per_100k_splc)) + geom_histogram(color = "black", fill = "grey")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In the states where Trump received less than 50% of the vote, the distribution is right-skewed.
# ElectTrump == 0 selects the states where Trump's share was below 50%
hate_crimes_not_trump <- filter(hate_crimes, ElectTrump == 0)
ggplot(hate_crimes_not_trump, aes(x = hate_crimes_per_100k_splc)) + geom_histogram(color = "black", fill = "grey")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
##
## Call:
## lm(formula = hate_crimes_per_100k_splc ~ share_non_white + ElectTrump,
## data = hate_crimes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31347 -0.10560 -0.05053 0.03507 1.13203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.395280 0.094782 4.170 0.000145 ***
## share_non_white -0.007953 0.244279 -0.033 0.974180
## ElectTrump -0.190565 0.072604 -2.625 0.011958 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.242 on 43 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1408, Adjusted R-squared: 0.1009
## F-statistic: 3.525 on 2 and 43 DF, p-value: 0.03824
\(H_0\) : The hate-crime rate is independent of Trump vote share and the presence of minorities in the state
\(H_1\) : The hate-crime rate depends on Trump vote share and the presence of minorities in the state
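From the model output we see a multiple R-squared of 0.14 and an overall F-test p-value of 0.038, which is less than 0.05. We therefore have sufficient evidence to reject the null hypothesis (\(H_0\)) at the 5% level: the two predictors together explain some of the variation in the hate-crime rate, although only about 14% of it. The individual coefficients show that this evidence comes from ElectTrump (p = 0.012); share_non_white is not significant on its own (p = 0.97).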
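As a consistency check, the overall F-statistic and its p-value can be recomputed from the R-squared and degrees of freedom printed in the model summary. This is a minimal sketch in base R; the numbers are copied from the output above and the variable names are illustrative.

```r
# Values copied from the printed model summary
r_squared <- 0.1408  # Multiple R-squared
df_model  <- 2       # number of predictors
df_resid  <- 43      # residual degrees of freedom

# F = (R^2 / df_model) / ((1 - R^2) / df_resid)
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the overall p-value
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

print(c(f_stat = f_stat, p_value = p_value))
```

Because the inputs are rounded, the results agree with the reported F-statistic (3.525) and p-value (0.03824) only to about two or three significant figures.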
From the data visualization we see a higher hate-crime rate in the states with a higher Democratic vote share.
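From the model output, holding share_non_white constant, states where Trump received more than 50% of the vote (ElectTrump = 1) have on average 0.190565 fewer hate crimes per 100k population than states where he received less than 50%. ElectTrump is a 0/1 indicator, so this coefficient compares the two groups of states; it does not measure the effect of a one-point increase in vote share. Similarly, holding ElectTrump constant, the coefficient of -0.007953 on share_non_white is the predicted change in the hate-crime rate per 100k when the non-white share of the population goes from 0 to 1. For a 10-percentage-point increase in the non-white share, that is a decrease of about 0.0008, and the coefficient is not statistically distinguishable from zero (p = 0.97).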
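To make the interpretation concrete, the fitted equation can be evaluated by hand. The sketch below copies the three coefficients from the model summary and compares two states with the same non-white share (0.28, the median reported in the data summary) that differ only in ElectTrump. The helper `predict_rate()` is a hypothetical name introduced for illustration.

```r
# Coefficients copied from the printed model summary
b_intercept <- 0.395280
b_non_white <- -0.007953
b_trump     <- -0.190565

# Hypothetical helper: fitted hate crimes per 100k for given predictor values
predict_rate <- function(share_non_white, elect_trump) {
  b_intercept + b_non_white * share_non_white + b_trump * elect_trump
}

# Same non-white share (the median, 0.28), different ElectTrump
rate_not_trump <- predict_rate(0.28, 0)
rate_trump     <- predict_rate(0.28, 1)

# Effect of a 10-percentage-point rise in non-white share, ElectTrump fixed
effect_10_points <- predict_rate(0.38, 0) - predict_rate(0.28, 0)

print(c(rate_not_trump = rate_not_trump,
        rate_trump = rate_trump,
        difference = rate_trump - rate_not_trump,
        effect_10_points = effect_10_points))
```

The difference between the two fitted rates equals the ElectTrump coefficient regardless of the non-white share chosen, which is what "holding the other predictor constant" means in this model.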