607 final project

look at the data and pull the useful info

nyturl19 <- paste("https://api.nytimes.com/svc/archive/v1/2019/1.json?api-key=", apikey, sep='')
hateData19 <- fromJSON(nyturl19) #Look at what is useful
hateData19 <- hateData19 %>% 
as.data.frame() %>%
select(-response.docs.multimedia) %>%
filter(str_detect(response.docs.lead_paragraph, "hate crime"))
nyturl20 <- paste("https://api.nytimes.com/svc/archive/v1/2020/1.json?api-key=", apikey, sep='')
hateData20 <- fromJSON(nyturl20) #Look at what is useful
hateData20 <- hateData20 %>% 
as.data.frame() %>%
select(-response.docs.multimedia) %>%
filter(str_detect(response.docs.lead_paragraph, "hate crime"))
hateData19 <- hateData19 %>% 
select(response.docs.pub_date,response.docs.lead_paragraph)
hateData20 <- hateData20 %>% 
select(response.docs.pub_date,response.docs.lead_paragraph)
print(hateData19)

##     response.docs.pub_date
## 1 2019-01-29T18:15:15+0000
##                                                                                                                                                                                                                                                                                       response.docs.lead_paragraph
## 1 Jussie Smollett, one of the stars of the Fox television show “Empire,” was attacked in Chicago early Tuesday morning by two people who yelled racial and homophobic slurs and wrapped a rope around his neck, according to the police, who said they were investigating the incident as “a possible hate crime.”

print(hateData20)

##     response.docs.pub_date
## 1 2020-01-03T10:00:23+0000
## 2 2020-01-06T10:00:14+0000
## 3 2020-01-21T10:00:13+0000
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         response.docs.lead_paragraph
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             When slurs were spray-painted on a kosher market in West Los Angeles, it was classified by law enforcement as an anti-Semitic hate crime. But when a swastika was carved into a park picnic table or spray-painted on a stop sign, it was classified as an anti-religious hate crime. 
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Authorities in Oregon charged a woman with hate crimes for pulling off another woman’s hijab, trying to strangle her with it and “intentionally desecrating” it, prosecutors said on Friday. 
## 3 Today, even as the 75th anniversary of the liberation of Auschwitz, on Jan. 27, approaches, our news feeds are teeming with more and more reports of hate crimes and extreme ideologies. I am alive because my paternal grandfather’s Spidey sense had him frantically looking for ways out of Germany in 1933. “When madmen are elected, it’s time to leave the country,” he said. Now I, and many others I’m sure, worry that a catastrophe is looming, and wonder how we can guard against it. Schoolchildren are now taught: “Be an upstander, not a bystander!” History, we’re told, shows that, as Edmund Burke supposedly said, “The only thing necessary for the triumph of evil is that good men do nothing.” Or in the words of historian Ian Kershaw, “The road to Auschwitz was built by hate but paved with indifference.” These aphorisms pithily conjure up images of wise men stroking their long white beards, but they’re wrong.

Data

The data are from FBI, Education Attainment in the U.S. and Southern Poverty Law Center. The FBI Uniform Crime Reporting Program collects hate crime data from law enforcement agencies. The UCR Program collects data on only prosecutable hate crimes, which make up a fraction of hate incidents (which includes non-prosecutable offenses, such as circulation of white nationalist recruitment materials on college campuses). The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation. The Education Attainment is from U.S. Census Bureau.

The Southern Poverty Law Center uses media accounts and people’s self-reports to assess the situation.

We want to define what is high unemployed rate housegold income and education level, so we use median as a dividing line.

If the share_unemployed_seasonal is higher than median, then the reply under high_unemployed is true, else is false.

If the median_household_income is higher than median, then the reply under high_housegoldincome is true, else is false.

If the PercentBachelorsOrHigher is higher than median, then the reply under high_education is true, else is false.

Here is the summary of the data:

##     state           median_household_income share_unemployed_seasonal
##  Length:47          Min.   :35521           Min.   :0.02900          
##  Class :character   1st Qu.:47630           1st Qu.:0.04350          
##  Mode  :character   Median :54310           Median :0.05200          
##                     Mean   :54802           Mean   :0.05087          
##                     3rd Qu.:60598           3rd Qu.:0.05800          
##                     Max.   :76165           Max.   :0.07300          
##  hate_crimes_per_100k_splc avg_hatecrimes_per_100k_fbi hate_crimes_combine
##  Min.   :0.06745           Min.   : 0.412              Min.   : 0.5324    
##  1st Qu.:0.14271           1st Qu.: 1.304              1st Qu.: 1.4788    
##  Median :0.22620           Median : 1.937              Median : 2.2272    
##  Mean   :0.30409           Mean   : 2.342              Mean   : 2.6460    
##  3rd Qu.:0.35693           3rd Qu.: 3.119              3rd Qu.: 3.4408    
##  Max.   :1.52230           Max.   :10.953              Max.   :12.4758    
##  high_unemployed high_housegoldincome share_unemployed_seasonal_100k
##  Mode :logical   Mode :logical        Min.   :2900                  
##  FALSE:27        FALSE:24             1st Qu.:4350                  
##  TRUE :20        TRUE :23             Median :5200                  
##                                       Mean   :5087                  
##                                       3rd Qu.:5800                  
##                                       Max.   :7300

Data analysis

First of all, lets check the the unemployement level, hate crime level, and high education status in each states. Then go thru the data distribution , summary of unemployed rate, household income, and education level.

#mapping 
plot_usmap(data = joined_tibble, values = "hate_crimes_per_100k_splc", color = "red") + 
  scale_fill_continuous(
    low = "white", high = "red", name = "Hate Crimes SPLC source", label = scales::comma
  ) + theme(legend.position = "right")

plot_usmap(data = joined_tibble, values = "avg_hatecrimes_per_100k_fbi", color = "red") + 
  scale_fill_continuous(
    low = "white", high = "red", name = "Hate Crimes fbi source", label = scales::comma
  ) + theme(legend.position = "right")

plot_usmap(data = hate_urlforplot, values = "median_household_income", color = "purple") + 
  scale_fill_continuous(
    low = "white", high = "purple", name = "Household income arcoss US", label = scales::comma
  ) + theme(legend.position = "right")

plot_usmap(data = hate_urlforplot, values = "share_unemployed_seasonal", color = "brown") + 
  scale_fill_continuous(
    low = "white", high = "brown", name = "unemployment arcoss US", label = scales::comma
  ) + theme(legend.position = "right")

plot_usmap(data = joined_tibble, values = "PercentBachelorsOrHigher", color = "blue") + 
  scale_fill_continuous(
    low = "white", high = "blue", name = "Bachelors Or Higher", label = scales::comma
  ) + theme(legend.position = "right")

plot_usmap(data = joined_tibble, values = "PercentHighSchoolOrHigher", color = "blue") + 
  scale_fill_continuous(
    low = "white", high = "blue", name = "HighSchool Or Higher", label = scales::comma
  ) + theme(legend.position = "right")

Now lets look at the density of hate crime, umployment rate and household income.

lets check the data density and summary of unemployed rate. The unemployment data appear a little left skewed for me.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2900    4350    5200    5087    5800    7300

Now lets check the data distribution and summary of hate crimes case. Hate crimes data is different from previous result, the data appear right skewed for me, and clearly we can see one outliers from the Now lets check the data distribution and summary of crimes case. It is different from previous result, the data appear right skewed for me, and clearly we can see one outliers from the plot.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5324  1.4788  2.2272  2.6460  3.4408 12.4758

Because we see the outliers, we would like to clear it.

Now we can see the mean drop from 2.6460 to 2.4323, and max drop from 12.4758 to 5.4327 .

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5324  1.4648  2.2042  2.4323  3.4073  5.4327

For household income, we have similar left skewed results, however, there is no outliner this time.

ggplot(hate_url, aes(median_household_income)) +        
  geom_histogram(aes(y = ..density..),bins=20) +
  stat_function(fun = dnorm,
                args = list(mean = mean(hate_url$median_household_income),
                            sd = sd(hate_url$median_household_income)),
                col = "#1b98e0",
                size = 1)

summary(hate_url$median_household_income)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35521   47630   54310   54802   60598   76165

After removing the outliner of hate crimes, let’s compare the result of both group of higher than avg unemployed rate and lower than avg unemployed rate. There is more states has lower unemployed rate.

However, the states has lower unemployed rate tend to have a higher hate crime cases.Higer household income has a higher hate crimes as well. It is a shocking result.

boxplot(new_hate_url$hate_crimes_combine ~ new_hate_url$high_unemployed, main = "Boxplot with hate crimes case of high unemployed rate", ylab = "hate crimes case", xlab = "high unemployed rate",col="blue")

boxplot(new_hate_url$hate_crimes_combine ~ new_hate_url$high_housegoldincome, main = "Boxplot with hate crimes case of high household income", ylab = "hate crimes case", xlab = "high unemployed rate",col="blue")

If unployment and lower income does not drive hate crimes, how about education?

Also, the states has High education level tend to have a higher hate crime cases !!!???

boxplot(joined_tibble$avg_hatecrimes_per_100k_fbi ~ joined_tibble$high_education, main = "Boxplot with hate crimes case of high education rate", ylab = "hate crimes case", xlab = "high education rate",col="blue")

We started thinking if these 3 models are statistically significant.

hate_model <- lm(new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k )
summary(hate_model)

## 
## Call:
## lm(formula = new_hate_url$hate_crimes_combine ~ new_hate_url$share_unemployed_seasonal_100k)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8308 -0.8845 -0.2797  0.9434  2.9586 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  2.8991798  0.9601032   3.020
## new_hate_url$share_unemployed_seasonal_100k -0.0000924  0.0001866  -0.495
##                                             Pr(>|t|)   
## (Intercept)                                   0.0042 **
## new_hate_url$share_unemployed_seasonal_100k   0.6229   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.233 on 44 degrees of freedom
## Multiple R-squared:  0.005542,   Adjusted R-squared:  -0.01706 
## F-statistic: 0.2452 on 1 and 44 DF,  p-value: 0.6229

hate_model2 <- lm(new_hate_url$hate_crimes_combine ~ new_hate_url$median_household_income )
summary(hate_model2)

## 
## Call:
## lm(formula = new_hate_url$hate_crimes_combine ~ new_hate_url$median_household_income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7398 -1.0429 -0.1728  1.0988  2.6949 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                          5.057e-01  1.075e+00   0.470   0.6405  
## new_hate_url$median_household_income 3.535e-05  1.946e-05   1.816   0.0761 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.192 on 44 degrees of freedom
## Multiple R-squared:  0.06975,    Adjusted R-squared:  0.0486 
## F-statistic: 3.299 on 1 and 44 DF,  p-value: 0.07614

The p-value is 0.6229 which mean the unemployment model is not statistically significant

also, one more thing bring my attention which is R-squared is 0.005542 which is low.

The p-value of household income model is 0.07614 which is still show that the model is not statistically significant.

R-squared is is low for household income model too, it is 0.06975. However, base on the p-value and R-squared, household income model has a stronger relationship than employment although it is not good to use.

## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

Also, the distribution is heavy tailed since there is few extreme prositve and negative residuals for both model.

ggplot(data = hate_model, aes(sample = .resid)) +
  stat_qq(colour = "blue", size = 1) + stat_qq_line(colour = "red", size = 1)

ggplot(data = hate_model2, aes(sample = .resid)) +
  stat_qq(colour = "blue", size = 1) + stat_qq_line(colour = "red", size = 1)

Which mean both unemployment and household model was not good.

How about education level model?

## 
## Call:
## lm(formula = PercentHighSchoolOrHigher ~ avg_hatecrimes_per_100k_fbi, 
##     data = joined_tibble)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.066658 -0.018123  0.005416  0.020261  0.045239 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 8.833e-01  8.043e-03  109.82   <2e-16 ***
## avg_hatecrimes_per_100k_fbi 5.583e+02  3.227e+02    1.73   0.0902 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02676 on 47 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.05985,    Adjusted R-squared:  0.03985 
## F-statistic: 2.992 on 1 and 47 DF,  p-value: 0.09023

## 
## Call:
## lm(formula = PercentBachelorsOrHigher ~ avg_hatecrimes_per_100k_fbi, 
##     data = joined_tibble)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.104531 -0.037579  0.003512  0.025589  0.101360 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    0.2776     0.0149  18.633   <2e-16 ***
## avg_hatecrimes_per_100k_fbi 1591.7418   597.7276   2.663   0.0106 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04957 on 47 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1311, Adjusted R-squared:  0.1126 
## F-statistic: 7.092 on 1 and 47 DF,  p-value: 0.01057

## 
## Call:
## lm(formula = PercentHighSchoolOrHigher ~ hate_crimes_per_100k_splc, 
##     data = joined_tibble)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.062138 -0.014798  0.001038  0.015985  0.044748 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               8.764e-01  6.862e-03 127.717  < 2e-16 ***
## hate_crimes_per_100k_splc 6.145e+03  2.088e+03   2.942  0.00518 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0249 on 44 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.1644, Adjusted R-squared:  0.1454 
## F-statistic: 8.658 on 1 and 44 DF,  p-value: 0.00518

## 
## Call:
## lm(formula = PercentBachelorsOrHigher ~ hate_crimes_per_100k_splc, 
##     data = joined_tibble)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.110438 -0.036228 -0.005639  0.041764  0.110488 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               2.798e-01  1.369e-02  20.435  < 2e-16 ***
## hate_crimes_per_100k_splc 1.235e+04  4.168e+03   2.964  0.00489 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04969 on 44 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.1664, Adjusted R-squared:  0.1475 
## F-statistic: 8.784 on 1 and 44 DF,  p-value: 0.004889

Based on the regression line from the plot, the highest education level had the highest hate crime rate.
Also, the P-value is low, 0.004889, and it replicates. However, the data show a low Multiple R-squared: 0.1664, which means the education level has not much related to the crime rate.

ggplot(data = joined_tibble, aes(x = avg_hatecrimes_per_100k_fbi, y = PercentHighSchoolOrHigher)) +
  geom_jitter() + 
  geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).

hist(m1$residuals)

qqnorm(m1$residuals)
qqline(m1$residuals)

ggplot(data = joined_tibble, aes(x = avg_hatecrimes_per_100k_fbi, y = PercentBachelorsOrHigher)) +
  geom_jitter() + 
  geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (stat_smooth).

## Warning: Removed 2 rows containing missing values (geom_point).

hist(m2$residuals)

qqnorm(m2$residuals)
qqline(m2$residuals)

ggplot(data = joined_tibble, aes(x = hate_crimes_per_100k_splc, y = PercentHighSchoolOrHigher)) +
  geom_jitter() + 
  geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 5 rows containing non-finite values (stat_smooth).

## Warning: Removed 5 rows containing missing values (geom_point).

hist(m3$residuals)

qqnorm(m3$residuals)
qqline(m3$residuals)

ggplot(data = joined_tibble, aes(x = hate_crimes_per_100k_splc, y = PercentBachelorsOrHigher)) +
  geom_jitter() + 
  geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 5 rows containing non-finite values (stat_smooth).

## Warning: Removed 5 rows containing missing values (geom_point).

hist(m4$residuals)

qqnorm(m4$residuals)
qqline(m4$residuals)

Conclusion

Unlike our prediction that higher rates of hate crimes are tied to household income, unemployment rate and education level. Our analysis show that is not a strong relationship between unemployment rate, household income, education level and hate crimes case from all the plot and summary.

Although education and household income has a stronger relation than unemployment. However, there is still too much variability in the model and High p value shows that the model is not statistically significant.

Maybe hate crimes has higher chance that it is cause by what you see or read from media instead of the income/employment and education. it is just like article Hate Speech on Twitter Predicts Frequency of Real-life Hate Crimes point out.

https://www.nyu.edu/about/news-publications/news/2019/june/hate-speech-on-twitter-predicts-frequency-of-real-life-hate-crim.html https://fivethirtyeight.com/features/higher-rates-of-hate-crimes-are-tied-to-income-inequality/ https://www.census.gov/data/tables/2018/demo/education-attainment/cps-detailed-tables.html

607 final project

Jay Lee and Benson Toi

5/15/2022

Introduction

New York Times API

load libary and apikey in backend

look at the data and pull the useful info

Data

Data analysis

Conclusion