In recent years, we have heard hate crimes more often from the media like New York Times. Hate Crimes are offenses that are motivated by a particular race, religion, ethnicity, gender, age, disability, ancestry, national origin or sexual orientation. Rising hate crimes may be due to “the new normal” that was caused by the pandemic.
From the acticle “Higher Rates Of Hate Crimes Are Tied To Income Inequality” by Maimuna Majumder. Maimuna point out that income inequality likely serves as a catalyzing condition for hate incidents.
From the earlier study, Jay wanted to see if there is relationship between the unemployment rate and hate crimes. We (Benson and Jay) both found that we develop a stronger interest in this topic after couple discussion. We really want to go deeper with this topic.
Furthermore, we want to know if household income and education level drive hate crimes as well besides income inequality and unemployment rate.
We create 2 new models which is household income and education level model and compare with unployment model to see there is any new finding or relatetionship. Also we want to see if New York Times API can support our statement of hearing hate more often.
As we said, we have hread more and more hate crimes in recent years. From the NYT archive API, we found hate crime related articles increase 300% from 2019 to 2020 under New York Times Archive API.
nyturl19 <- paste("https://api.nytimes.com/svc/archive/v1/2019/1.json?api-key=", apikey, sep='')
hateData19 <- fromJSON(nyturl19) #Look at what is useful
hateData19 <- hateData19 %>%
as.data.frame() %>%
select(-response.docs.multimedia) %>%
filter(str_detect(response.docs.lead_paragraph, "hate crime"))
nyturl20 <- paste("https://api.nytimes.com/svc/archive/v1/2020/1.json?api-key=", apikey, sep='')
hateData20 <- fromJSON(nyturl20) #Look at what is useful
hateData20 <- hateData20 %>%
as.data.frame() %>%
select(-response.docs.multimedia) %>%
filter(str_detect(response.docs.lead_paragraph, "hate crime"))
hateData19 <- hateData19 %>%
select(response.docs.pub_date,response.docs.lead_paragraph)
hateData20 <- hateData20 %>%
select(response.docs.pub_date,response.docs.lead_paragraph)
print(hateData19)
## response.docs.pub_date
## 1 2019-01-29T18:15:15+0000
## response.docs.lead_paragraph
## 1 Jussie Smollett, one of the stars of the Fox television show “Empire,” was attacked in Chicago early Tuesday morning by two people who yelled racial and homophobic slurs and wrapped a rope around his neck, according to the police, who said they were investigating the incident as “a possible hate crime.”
print(hateData20)
## response.docs.pub_date
## 1 2020-01-03T10:00:23+0000
## 2 2020-01-06T10:00:14+0000
## 3 2020-01-21T10:00:13+0000
## response.docs.lead_paragraph
## 1 When slurs were spray-painted on a kosher market in West Los Angeles, it was classified by law enforcement as an anti-Semitic hate crime. But when a swastika was carved into a park picnic table or spray-painted on a stop sign, it was classified as an anti-religious hate crime.
## 2 Authorities in Oregon charged a woman with hate crimes for pulling off another woman’s hijab, trying to strangle her with it and “intentionally desecrating” it, prosecutors said on Friday.
## 3 Today, even as the 75th anniversary of the liberation of Auschwitz, on Jan. 27, approaches, our news feeds are teeming with more and more reports of hate crimes and extreme ideologies. I am alive because my paternal grandfather’s Spidey sense had him frantically looking for ways out of Germany in 1933. “When madmen are elected, it’s time to leave the country,” he said. Now I, and many others I’m sure, worry that a catastrophe is looming, and wonder how we can guard against it. Schoolchildren are now taught: “Be an upstander, not a bystander!” History, we’re told, shows that, as Edmund Burke supposedly said, “The only thing necessary for the triumph of evil is that good men do nothing.” Or in the words of historian Ian Kershaw, “The road to Auschwitz was built by hate but paved with indifference.” These aphorisms pithily conjure up images of wise men stroking their long white beards, but they’re wrong.
The data are from FBI, Education Attainment in the U.S. and Southern Poverty Law Center. The FBI Uniform Crime Reporting Program collects hate crime data from law enforcement agencies. The UCR Program collects data on only prosecutable hate crimes, which make up a fraction of hate incidents (which includes non-prosecutable offenses, such as circulation of white nationalist recruitment materials on college campuses). The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation. The Education Attainment is from U.S. Census Bureau.
The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation.
We want to define what is high unemployed rate housegold income and education level, so we use median as a dividing line.
If the share_unemployed_seasonal is higher than median, then the reply under high_unemployed is true, else is false.
If the median_household_income is higher than median, then the reply under high_housegoldincome is true, else is false.
If the PercentBachelorsOrHigher is higher than median, then the reply under high_education is true, else is false.
Here is the summary of the data:
## state median_household_income share_unemployed_seasonal
## Length:47 Min. :35521 Min. :0.02900
## Class :character 1st Qu.:47630 1st Qu.:0.04350
## Mode :character Median :54310 Median :0.05200
## Mean :54802 Mean :0.05087
## 3rd Qu.:60598 3rd Qu.:0.05800
## Max. :76165 Max. :0.07300
## hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi hate_crimes_combine
## Min. :0.06745 Min. : 0.412 Min. : 0.5324
## 1st Qu.:0.14271 1st Qu.: 1.304 1st Qu.: 1.4788
## Median :0.22620 Median : 1.937 Median : 2.2272
## Mean :0.30409 Mean : 2.342 Mean : 2.6460
## 3rd Qu.:0.35693 3rd Qu.: 3.119 3rd Qu.: 3.4408
## Max. :1.52230 Max. :10.953 Max. :12.4758
## high_unemployed high_housegoldincome share_unemployed_seasonal_100k
## Mode :logical Mode :logical Min. :2900
## FALSE:27 FALSE:24 1st Qu.:4350
## TRUE :20 TRUE :23 Median :5200
## Mean :5087
## 3rd Qu.:5800
## Max. :7300
First of all, lets check the the unemployement level, hate crime level, and high education status in each states. Then go thru the data distribution , summary of unemployed rate, household income, and education level.
#mapping
plot_usmap(data = joined_tibble, values = "hate_crimes_per_100k_splc", color = "red") +
scale_fill_continuous(
low = "white", high = "red", name = "Hate Crimes SPLC source", label = scales::comma
) + theme(legend.position = "right")
plot_usmap(data = joined_tibble, values = "avg_hatecrimes_per_100k_fbi", color = "red") +
scale_fill_continuous(
low = "white", high = "red", name = "Hate Crimes fbi source", label = scales::comma
) + theme(legend.position = "right")
plot_usmap(data = hate_urlforplot, values = "median_household_income", color = "purple") +
scale_fill_continuous(
low = "white", high = "purple", name = "Household income arcoss US", label = scales::comma
) + theme(legend.position = "right")
plot_usmap(data = hate_urlforplot, values = "share_unemployed_seasonal", color = "brown") +
scale_fill_continuous(
low = "white", high = "brown", name = "unemployment arcoss US", label = scales::comma
) + theme(legend.position = "right")
plot_usmap(data = joined_tibble, values = "PercentBachelorsOrHigher", color = "blue") +
scale_fill_continuous(
low = "white", high = "blue", name = "Bachelors Or Higher", label = scales::comma
) + theme(legend.position = "right")
plot_usmap(data = joined_tibble, values = "PercentHighSchoolOrHigher", color = "blue") +
scale_fill_continuous(
low = "white", high = "blue", name = "HighSchool Or Higher", label = scales::comma
) + theme(legend.position = "right")
Now lets look at the density of hate crime, umployment rate and household income.
lets check the data density and summary of unemployed rate. The unemployment data appear a little left skewed for me.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2900 4350 5200 5087 5800 7300
Now lets check the data distribution and summary of hate crimes case. Hate crimes data is different from previous result, the data appear right skewed for me, and clearly we can see one outliers from the Now lets check the data distribution and summary of crimes case. It is different from previous result, the data appear right skewed for me, and clearly we can see one outliers from the plot.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5324 1.4788 2.2272 2.6460 3.4408 12.4758
Because we see the outliers, we would like to clear it.
Now we can see the mean drop from 2.6460 to 2.4323, and max drop from 12.4758 to 5.4327 .
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5324 1.4648 2.2042 2.4323 3.4073 5.4327
For household income, we have similar left skewed results, however, there is no outliner this time.
ggplot(hate_url, aes(median_household_income)) +
geom_histogram(aes(y = ..density..),bins=20) +
stat_function(fun = dnorm,
args = list(mean = mean(hate_url$median_household_income),
sd = sd(hate_url$median_household_income)),
col = "#1b98e0",
size = 1)
summary(hate_url$median_household_income)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35521 47630 54310 54802 60598 76165
After removing the outliner of hate crimes, let’s compare the result of both group of higher than avg unemployed rate and lower than avg unemployed rate. There is more states has lower unemployed rate.
However, the states has lower unemployed rate tend to have a higher hate crime cases.Higer household income has a higher hate crimes as well. It is a shocking result.
boxplot(new_hate_url$hate_crimes_combine ~ new_hate_url$high_unemployed, main = "Boxplot with hate crimes case of high unemployed rate", ylab = "hate crimes case", xlab = "high unemployed rate",col="blue")
boxplot(new_hate_url$hate_crimes_combine ~ new_hate_url$high_housegoldincome, main = "Boxplot with hate crimes case of high household income", ylab = "hate crimes case", xlab = "high unemployed rate",col="blue")
If unployment and lower income does not drive hate crimes, how about education?
Also, the states has High education level tend to have a higher hate crime cases !!!???
boxplot(joined_tibble$avg_hatecrimes_per_100k_fbi ~ joined_tibble$high_education, main = "Boxplot with hate crimes case of high education rate", ylab = "hate crimes case", xlab = "high education rate",col="blue")
We started thinking if these 3 models are statistically significant.
hate_model <- lm(new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k )
summary(hate_model)
##
## Call:
## lm(formula = new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8308 -0.8845 -0.2797 0.9434 2.9586
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 2.8991798 0.9601032 3.020
## new_hate_url$share_unemployed_seasonal_100k -0.0000924 0.0001866 -0.495
## Pr(>|t|)
## (Intercept) 0.0042 **
## new_hate_url$share_unemployed_seasonal_100k 0.6229
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.233 on 44 degrees of freedom
## Multiple R-squared: 0.005542, Adjusted R-squared: -0.01706
## F-statistic: 0.2452 on 1 and 44 DF, p-value: 0.6229
hate_model2 <- lm(new_hate_url$hate_crimes_combine ~ new_hate_url$median_household_income )
summary(hate_model2)
##
## Call:
## lm(formula = new_hate_url$hate_crimes_combine ~ new_hate_url$median_household_income)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7398 -1.0429 -0.1728 1.0988 2.6949
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.057e-01 1.075e+00 0.470 0.6405
## new_hate_url$median_household_income 3.535e-05 1.946e-05 1.816 0.0761 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.192 on 44 degrees of freedom
## Multiple R-squared: 0.06975, Adjusted R-squared: 0.0486
## F-statistic: 3.299 on 1 and 44 DF, p-value: 0.07614
The p-value is 0.6229 which mean the unemployment model is not statistically significant
also, one more thing bring my attention which is R-squared is 0.005542 which is low.
The p-value of household income model is 0.07614 which is still show that the model is not statistically significant.
R-squared is is low for household income model too, it is 0.06975. However, base on the p-value and R-squared, household income model has a stronger relationship than employment although it is not good to use.
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Also, the distribution is heavy tailed since there is few extreme prositve and negative residuals for both model.
ggplot(data = hate_model, aes(sample = .resid)) +
stat_qq(colour = "blue", size = 1) + stat_qq_line(colour = "red", size = 1)
ggplot(data = hate_model2, aes(sample = .resid)) +
stat_qq(colour = "blue", size = 1) + stat_qq_line(colour = "red", size = 1)
Which mean both unemployment and household model was not good.
How about education level model?
##
## Call:
## lm(formula = PercentHighSchoolOrHigher ~ avg_hatecrimes_per_100k_fbi,
## data = joined_tibble)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.066658 -0.018123 0.005416 0.020261 0.045239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.833e-01 8.043e-03 109.82 <2e-16 ***
## avg_hatecrimes_per_100k_fbi 5.583e+02 3.227e+02 1.73 0.0902 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02676 on 47 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.05985, Adjusted R-squared: 0.03985
## F-statistic: 2.992 on 1 and 47 DF, p-value: 0.09023
##
## Call:
## lm(formula = PercentBachelorsOrHigher ~ avg_hatecrimes_per_100k_fbi,
## data = joined_tibble)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.104531 -0.037579 0.003512 0.025589 0.101360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2776 0.0149 18.633 <2e-16 ***
## avg_hatecrimes_per_100k_fbi 1591.7418 597.7276 2.663 0.0106 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04957 on 47 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1311, Adjusted R-squared: 0.1126
## F-statistic: 7.092 on 1 and 47 DF, p-value: 0.01057
##
## Call:
## lm(formula = PercentHighSchoolOrHigher ~ hate_crimes_per_100k_splc,
## data = joined_tibble)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.062138 -0.014798 0.001038 0.015985 0.044748
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.764e-01 6.862e-03 127.717 < 2e-16 ***
## hate_crimes_per_100k_splc 6.145e+03 2.088e+03 2.942 0.00518 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0249 on 44 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1644, Adjusted R-squared: 0.1454
## F-statistic: 8.658 on 1 and 44 DF, p-value: 0.00518
##
## Call:
## lm(formula = PercentBachelorsOrHigher ~ hate_crimes_per_100k_splc,
## data = joined_tibble)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.110438 -0.036228 -0.005639 0.041764 0.110488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.798e-01 1.369e-02 20.435 < 2e-16 ***
## hate_crimes_per_100k_splc 1.235e+04 4.168e+03 2.964 0.00489 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04969 on 44 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1664, Adjusted R-squared: 0.1475
## F-statistic: 8.784 on 1 and 44 DF, p-value: 0.004889
Based on the regression line from the plot, the highest education level had the highest hate crime rate.
Also, the P-value is low, 0.004889, and it replicates. However, the data show a low Multiple R-squared: 0.1664, which means the education level has not much related to the crime rate.
ggplot(data = joined_tibble, aes(x = avg_hatecrimes_per_100k_fbi, y = PercentHighSchoolOrHigher)) +
geom_jitter() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
hist(m1$residuals)
qqnorm(m1$residuals)
qqline(m1$residuals)
ggplot(data = joined_tibble, aes(x = avg_hatecrimes_per_100k_fbi, y = PercentBachelorsOrHigher)) +
geom_jitter() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
hist(m2$residuals)
qqnorm(m2$residuals)
qqline(m2$residuals)
ggplot(data = joined_tibble, aes(x = hate_crimes_per_100k_splc, y = PercentHighSchoolOrHigher)) +
geom_jitter() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 5 rows containing non-finite values (stat_smooth).
## Warning: Removed 5 rows containing missing values (geom_point).
hist(m3$residuals)
qqnorm(m3$residuals)
qqline(m3$residuals)
ggplot(data = joined_tibble, aes(x = hate_crimes_per_100k_splc, y = PercentBachelorsOrHigher)) +
geom_jitter() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 5 rows containing non-finite values (stat_smooth).
## Warning: Removed 5 rows containing missing values (geom_point).
hist(m4$residuals)
qqnorm(m4$residuals)
qqline(m4$residuals)
Unlike our prediction that higher rates of hate crimes are tied to household income, unemployment rate and education level. Our analysis show that is not a strong relationship between unemployment rate, household income, education level and hate crimes case from all the plot and summary.
Although education and household income has a stronger relation than unemployment. However, there is still too much variability in the model and High p value shows that the model is not statistically significant.
Maybe hate crimes has higher chance that it is cause by what you see or read from media instead of the income/employment and education. it is just like article Hate Speech on Twitter Predicts Frequency of Real-life Hate Crimes point out.
https://www.nyu.edu/about/news-publications/news/2019/june/hate-speech-on-twitter-predicts-frequency-of-real-life-hate-crim.html https://fivethirtyeight.com/features/higher-rates-of-hate-crimes-are-tied-to-income-inequality/ https://www.census.gov/data/tables/2018/demo/education-attainment/cps-detailed-tables.html