606 Project

Part 1 - Introduction

In the past couple of years, hate crimes have been reported more often in the media. Hate crimes are offenses motivated by bias against a particular race, religion, ethnicity, gender, age, disability, ancestry, national origin, or sexual orientation. The increase may be due to the “new normal” caused by the pandemic. The pandemic also caused many people to lose their jobs, which raises two questions: does unemployment drive hate crimes as well? Is there a relationship between the unemployment rate and hate crimes?

Part 2 - Data

The data come from the FBI and the Southern Poverty Law Center (SPLC).

The FBI Uniform Crime Reporting (UCR) Program collects hate crime data from law enforcement agencies. The UCR Program collects data only on prosecutable hate crimes, which make up a fraction of hate incidents (these also include non-prosecutable offenses, such as the circulation of white nationalist recruitment materials on college campuses).

The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation.

To define what counts as a high unemployment rate, I use the median as the dividing line.

If share_unemployed_seasonal is higher than the median, high_unemployed is TRUE; otherwise it is FALSE.
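The derived columns that appear in the summary could be built along these lines. This is a minimal sketch, not the original code: the toy data frame `hate_toy` and its values are hypothetical, and computing `hate_crimes_combine` as the sum of the SPLC and FBI rates is an assumption (it is consistent with the means in the summary, where 0.30409 + 2.342 is about 2.6460).

```r
# Hypothetical toy data; the real analysis uses one row per state.
hate_toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  share_unemployed_seasonal = c(0.030, 0.045, 0.052, 0.058, 0.070),
  hate_crimes_per_100k_splc = c(0.10, 0.25, 0.20, 0.35, 0.15),
  avg_hatecrimes_per_100k_fbi = c(1.2, 3.0, 2.0, 1.5, 0.9)
)

# Median share of unemployed across states, used as the dividing line.
cutoff <- median(hate_toy$share_unemployed_seasonal)

# TRUE only when a state is strictly above the median.
hate_toy$high_unemployed <- hate_toy$share_unemployed_seasonal > cutoff

# Express the unemployment share per 100,000 people.
hate_toy$share_unemployed_seasonal_100k <-
  hate_toy$share_unemployed_seasonal * 100000

# Assumed definition: combined rate = SPLC rate + FBI rate.
hate_toy$hate_crimes_combine <-
  hate_toy$hate_crimes_per_100k_splc + hate_toy$avg_hatecrimes_per_100k_fbi

table(hate_toy$high_unemployed)
```

With a strict `>` comparison, states exactly at the median are counted as FALSE, which is one way to end up with an uneven split between the two groups.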

Here is the summary of the data:

##     state           median_household_income share_unemployed_seasonal
##  Length:47          Min.   :35521           Min.   :0.02900          
##  Class :character   1st Qu.:47630           1st Qu.:0.04350          
##  Mode  :character   Median :54310           Median :0.05200          
##                     Mean   :54802           Mean   :0.05087          
##                     3rd Qu.:60598           3rd Qu.:0.05800          
##                     Max.   :76165           Max.   :0.07300          
##  hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi hate_crimes_combine
##  Min.   :0.06745           Min.   : 0.412              Min.   : 0.5324    
##  1st Qu.:0.14271           1st Qu.: 1.304              1st Qu.: 1.4788    
##  Median :0.22620           Median : 1.937              Median : 2.2272    
##  Mean   :0.30409           Mean   : 2.342              Mean   : 2.6460    
##  3rd Qu.:0.35693           3rd Qu.: 3.119              3rd Qu.: 3.4408    
##  Max.   :1.52230           Max.   :10.953              Max.   :12.4758    
##  high_unemployed share_unemployed_seasonal_100k
##  Mode :logical   Min.   :2900                  
##  FALSE:27        1st Qu.:4350                  
##  TRUE :20        Median :5200                  
##                  Mean   :5087                  
##                  3rd Qu.:5800                  
##                  Max.   :7300

Part 3 - Exploratory data analysis

First, let's check the distribution and summary of the unemployment rate. The data appear slightly left-skewed to me.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2900    4350    5200    5087    5800    7300

Now let's check the distribution and summary of the combined hate crime rate. Unlike the previous result, the data appear right-skewed, and the plot clearly shows one outlier.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5324  1.4788  2.2272  2.6460  3.4408 12.4758

Because of this outlier, we remove it from the data.

After the removal, the mean drops from 2.6460 to 2.4323.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5324  1.4648  2.2042  2.4323  3.4073  5.4327
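One common way to flag such an outlier is the 1.5 × IQR rule. The sketch below is an assumption about how the filtering could be done, not necessarily the method used in this analysis. The quartiles are copied from the summary above, and the vector `rates` holds hypothetical values.

```r
# Quartiles reported in the summary of hate_crimes_combine (before removal).
q1 <- 1.4788
q3 <- 3.4408

# Upper fence of the 1.5 * IQR rule.
upper_fence <- q3 + 1.5 * (q3 - q1)

# The reported maximum lies above the fence; the maximum after removal does not.
12.4758 > upper_fence
5.4327 > upper_fence

# Applying the same rule to a hypothetical vector of rates.
rates <- c(0.5, 1.4, 2.2, 2.6, 3.4, 12.5)
fence <- unname(quantile(rates, 0.75)) + 1.5 * IQR(rates)
rates_clean <- rates[rates <= fence]

# The mean before and after dropping the flagged value.
mean(rates)
mean(rates_clean)
```

Under this rule only the reported maximum of 12.4758 would be flagged, which matches the single value removed here.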

After removing the outlier, let's compare the two groups: states with an unemployment rate above the median and states at or below it. More states fall into the lower-unemployment group.

However, those states tend to have a higher hate crime rate.

library(ggplot2)

# Count of states in each unemployment group
ggplot(new_hate_url, aes(x = high_unemployed)) +
  geom_bar(fill = "blue") +
  labs(
    x = "", y = "",
    title = "Does your state have a higher-than-median unemployment rate?"
  ) +
  coord_flip()

# Combined hate crime rate, split by unemployment group
boxplot(hate_crimes_combine ~ high_unemployed, data = new_hate_url,
        main = "Hate crime rate by high-unemployment group",
        ylab = "hate crimes per 100k", xlab = "high unemployment rate",
        col = "blue")
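The bar chart and boxplot can be backed up with numbers: the size of each group and the median rate within it. A minimal sketch on hypothetical data (`toy` and its values are made up for illustration; in the analysis the same calls would be applied to `new_hate_url`).

```r
# Hypothetical toy data standing in for new_hate_url.
toy <- data.frame(
  high_unemployed = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  hate_crimes_combine = c(2.0, 3.5, 2.9, 1.8, 2.4)
)

# Number of states in each group.
group_counts <- table(toy$high_unemployed)

# Median combined hate crime rate in each group.
group_medians <- tapply(toy$hate_crimes_combine, toy$high_unemployed, median)

group_counts
group_medians
```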

Part 4 - Inference

hate_model <- lm(new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k )
summary(hate_model)

## 
## Call:
## lm(formula = new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8308 -0.8845 -0.2797  0.9434  2.9586 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  2.8991798  0.9601032   3.020
## new_hate_url$share_unemployed_seasonal_100k -0.0000924  0.0001866  -0.495
##                                             Pr(>|t|)   
## (Intercept)                                   0.0042 **
## new_hate_url$share_unemployed_seasonal_100k   0.6229   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.233 on 44 degrees of freedom
## Multiple R-squared:  0.005542,   Adjusted R-squared:  -0.01706 
## F-statistic: 0.2452 on 1 and 44 DF,  p-value: 0.6229

\[ \hat{y} = 2.8991798 - 0.0000924 \times share\_unemployed\_seasonal\_100k \]
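To get a feel for how small this slope is, the fitted equation can be evaluated at the lowest, median, and highest unemployment values from the data summary. A minimal sketch; the helper `predict_rate` is a hypothetical name introduced only for illustration.

```r
# Coefficients copied from the regression summary.
intercept <- 2.8991798
slope <- -0.0000924

# Hypothetical helper: predicted combined hate crime rate per 100k.
predict_rate <- function(unemployed_per_100k) {
  intercept + slope * unemployed_per_100k
}

# Minimum, median, and maximum of share_unemployed_seasonal_100k.
x <- c(min = 2900, median = 5200, max = 7300)
predicted <- predict_rate(x)
predicted

# Change in the prediction across the whole observed range.
range_change <- unname(predicted["max"] - predicted["min"])
range_change
```

Across the full observed range the prediction moves by only about 0.41 hate crimes per 100k, which is small next to the residual standard error of 1.233.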

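The p-value for the slope is 0.6229, far above the usual 0.05 threshold, so the relationship is not statistically significant. Because the sample is small (46 states after removing the outlier) rather than huge, the p-value is not being pushed toward significance by sample size alone, so the high p-value is meaningful, although a small sample also limits the power to detect a weak effect.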

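Another thing that caught my attention is the R-squared of 0.005542: the unemployment rate explains only about 0.6% of the variation in the hate crime rate, so the effect size is very small. This also shows in the scatter plot, where the points lie far from the regression line.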

## `geom_smooth()` using formula 'y ~ x'
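The figures in the regression summary can be cross-checked from the slope estimate and its standard error alone. The sketch below uses only base R and values copied from the summary output; the 95% confidence interval is an addition that the original output does not report.

```r
# Values copied from the regression summary.
estimate <- -0.0000924
std_error <- 0.0001866
df_resid <- 44

# t statistic and two-sided p-value for the slope.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# In a simple regression, F = t^2 and R^2 = t^2 / (t^2 + residual df).
f_stat <- t_value^2
r_squared <- f_stat / (f_stat + df_resid)

# 95% confidence interval for the slope.
ci <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

round(c(t = t_value, p = p_value, F = f_stat, R2 = r_squared), 4)
ci
```

The confidence interval spans zero, which is another way of seeing that the data cannot distinguish this slope from no relationship at all.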

The residuals also look heavy-tailed, since there are a few extreme positive and negative residuals.

ggplot(data = hate_model, aes(sample = .resid)) +
  stat_qq(colour = "blue", size = 1) + stat_qq_line(colour = "red", size = 1)

Part 5 - Conclusion

Maimuna Majumder found that higher rates of hate crimes are tied to income inequality. By contrast, based on the plots and summaries above, I do not think the data show a strong relationship between the unemployment rate and the hate crime rate.

There is too much variability around the regression line, and the high p-value shows that the model is not statistically significant. I believe that breaking the data down by zip code might lead to a more accurate conclusion, since people within a zip code group share more similar cultural values. I also think political socialization drives the results. It may be fairer to study a single state instead of the whole US.

https://fivethirtyeight.com/features/higher-rates-of-hate-crimes-are-tied-to-income-inequality/