Load necessary packages:
library(ggplot2)
library(dplyr)
library(broom)
library(knitr)
library(fivethirtyeight)
# Create a categorical variable of Trump support level based on the numerical
# variable of share of 2016 U.S. presidential voters who voted for Donald Trump.
# Each of the low, medium, and high levels has roughly one third of states.
hate_crimes <- hate_crimes %>%
  mutate(
    trump_support_level = cut_number(share_vote_trump, 3, labels = c("low", "medium", "high"))
  )
The new variable trump_support_level is now a column of hate_crimes.
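cut_number() from ggplot2 splits a numeric variable into n groups holding roughly equal numbers of observations. The following is a minimal base-R sketch of the same idea. It assumes the cut points are the sample tertiles from quantile(), which is close to what cut_number() does but is not guaranteed to match it exactly. The vote shares are hypothetical values, not taken from hate_crimes:

```r
# Hypothetical vote shares (NOT from hate_crimes), only to illustrate the split
vote_share <- c(0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70)

# Tertile cut points: minimum, 1/3 quantile, 2/3 quantile, maximum
breaks <- quantile(vote_share, probs = seq(0, 1, length.out = 4))

# include.lowest = TRUE keeps the minimum value inside the first bin
support_level <- cut(vote_share, breaks = breaks, include.lowest = TRUE,
                     labels = c("low", "medium", "high"))

table(support_level)
```

With nine evenly spaced values, each level receives three of them. With real data, ties and uneven spacing can make the group sizes differ slightly.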
Run ?hate_crimes to read the dataset's help file, then run View(hate_crimes) and explore the dataset.
Write your answers here:
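Let’s model the relationship, both visually and via regression, between \(y\): hate crimes per 100K individuals in the 10 days after the 2016 US election (hate_crimes_per_100k_splc) and \(x\): Trump support level (trump_support_level).
Create a visual model of this data (do not forget to include appropriate axis labels and a title):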
# Write code to plot this model below:
ggplot(hate_crimes, aes(x = trump_support_level, y = hate_crimes_per_100k_splc)) +
  geom_boxplot() +
  labs(x = "Trump Support Level",
       y = "Hate crimes per 100K individuals in the 10 days after the 2016 US election",
       title = "Hate crimes vs. Trump support")
Output the regression table and interpret the results:
# Write code to generate a regression table below:
lm(hate_crimes_per_100k_splc ~ trump_support_level, data = hate_crimes) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) 0.4601833 0.05294126 8.692337 4.182015e-11
## 2 trump_support_levelmedium -0.2378850 0.07852458 -3.029433 4.091171e-03
## 3 trump_support_levelhigh -0.2691408 0.08003966 -3.362593 1.606977e-03
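From the regression table, I found a negative relationship between the level of Trump support and the hate crime rate. States with low Trump support (the baseline group) averaged about 0.46 hate crimes per 100K individuals, medium-support states averaged about 0.24 fewer (roughly 0.22), and high-support states averaged about 0.27 fewer (roughly 0.19); both differences have p-values below 0.01. In other words, the higher the Trump support level, the lower the average hate crime rate.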
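With a categorical predictor, lm() uses the first factor level (here "low") as the baseline. The intercept is the baseline group's mean, and each other coefficient is that group's mean minus the baseline mean, which is what lets the table above be read as group averages. Here is a small check on hypothetical numbers (not the hate_crimes data):

```r
# Hypothetical outcome values for three support levels (NOT the hate_crimes data)
df <- data.frame(
  level = factor(rep(c("low", "medium", "high"), each = 3),
                 levels = c("low", "medium", "high")),
  y = c(0.50, 0.40, 0.45, 0.20, 0.25, 0.22, 0.18, 0.20, 0.19)
)

fit <- lm(y ~ level, data = df)
group_means <- tapply(df$y, df$level, mean)

# Intercept: mean of the baseline ("low") group
intercept_ok <- isTRUE(all.equal(coef(fit)[["(Intercept)"]], group_means[["low"]]))

# Other coefficients: group mean minus the baseline mean
medium_ok <- isTRUE(all.equal(coef(fit)[["levelmedium"]],
                              group_means[["medium"]] - group_means[["low"]]))
high_ok <- isTRUE(all.equal(coef(fit)[["levelhigh"]],
                            group_means[["high"]] - group_means[["low"]]))

c(intercept_ok, medium_ok, high_ok)
```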
Write your answers here:
Create two separate visualizations (do not forget to include appropriate axis labels and titles) and run two separate simple linear regressions (using only one predictor) for \(y\): Hate crimes per 100K individuals in the 10 days after the 2016 US election, first with \(x\): the Gini index (gini_index) and then with \(x\): the share of adults with a high-school degree (share_pop_hs),
and interpret any slope values.
# Write code to plot this model below:
ggplot(hate_crimes, aes(x = gini_index, y = hate_crimes_per_100k_splc)) +
  geom_point() +
  labs(x = "The Gini Index",
       y = "Hate crimes per 100K individuals in the 10 days after the 2016 US election",
       title = "Hate crimes vs. Gini index") +
  geom_smooth(method = "lm", se = FALSE)
# Write code to generate a regression table below:
lm(hate_crimes_per_100k_splc ~ gini_index, data = hate_crimes) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) -1.527463 0.7833043 -1.950025 0.05741966
## 2 gini_index 4.020510 1.7177215 2.340606 0.02374447
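One way to read a simple-regression slope is through its closed form: the least-squares slope equals cov(x, y) / var(x), and the intercept equals mean(y) - slope * mean(x). Because gini_index is on a 0 to 1 scale, the reported slope of 4.020510 is the difference in predicted hate crimes per 100K for a full one-unit difference in the index; for a 0.01 difference it is about 0.04. The check below uses hypothetical values, not hate_crimes:

```r
# Hypothetical predictor and outcome values (NOT from hate_crimes)
x <- c(0.41, 0.43, 0.45, 0.47, 0.49, 0.53)
y <- c(0.10, 0.20, 0.15, 0.30, 0.35, 0.60)

fit <- lm(y ~ x)

# Least-squares slope and intercept from their closed-form formulas
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() reports the same two numbers, up to floating-point tolerance
formula_ok <- isTRUE(all.equal(unname(coef(fit)), c(intercept, slope)))
formula_ok
```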
# Write code to plot this model below:
ggplot(hate_crimes, aes(x = share_pop_hs, y = hate_crimes_per_100k_splc)) +
  geom_point() +
  labs(x = "Share of adults with a high-school degree",
       y = "Hate crimes per 100K individuals in the 10 days after the 2016 US election",
       title = "Hate crimes vs. High school education level") +
  geom_smooth(method = "lm", se = FALSE)
# Write code to generate a regression table below:
lm(hate_crimes_per_100k_splc ~ share_pop_hs, data = hate_crimes) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) -1.705274 0.9228076 -1.847919 0.07119297
## 2 share_pop_hs 2.320228 1.0647852 2.179057 0.03460305
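share_pop_hs is stored as a proportion (for example 0.85 rather than 85%), so its slope of 2.320228 describes a change from 0 to 1, that is, from 0% to 100% of adults. Rescaling the predictor to percentage points divides the slope by 100 (about 0.023 more hate crimes per 100K per percentage point) without changing the fit. A sketch on hypothetical values:

```r
# Hypothetical shares of adults with a high-school degree, as proportions
share <- c(0.80, 0.83, 0.85, 0.88, 0.90, 0.92)
y     <- c(0.10, 0.18, 0.20, 0.31, 0.33, 0.45)

# Slope per 1.00 change in the proportion
slope_prop <- coef(lm(y ~ share))[["share"]]

# Same predictor expressed in percentage points
share_pct <- share * 100
slope_pct <- coef(lm(y ~ share_pct))[["share_pct"]]

# The slope shrinks by exactly the rescaling factor
rescale_ok <- isTRUE(all.equal(slope_pct, slope_prop / 100))
rescale_ok
```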
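Run a multiple regression for \(y\): Hate crimes per 100K individuals in the 10 days after the 2016 US election, with the predictors \(x_1\): share of adults with a high-school degree (share_pop_hs) and \(x_2\): Gini index (gini_index),
and interpret both slope coefficients.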
# Write code to generate a regression table below. No need for a visualization
# here:
lm(hate_crimes_per_100k_splc ~ share_pop_hs + gini_index, data = hate_crimes) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) -8.211991 1.418930 -5.787453 6.921925e-07
## 2 share_pop_hs 5.255865 1.002924 5.240545 4.340152e-06
## 3 gini_index 8.702370 1.629755 5.339678 3.117468e-06
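In a multiple regression, each slope describes the association with one predictor while holding the other fixed, which is why the share_pop_hs slope moves from 2.320228 on its own to 5.255865 once gini_index is included. The Frisch-Waugh-Lovell result makes this concrete: the multiple-regression slope for x1 equals the simple-regression slope of y on the part of x1 that x2 cannot linearly explain. A check on hypothetical values:

```r
# Hypothetical values (NOT from hate_crimes): x1 plays the high-school share,
# x2 plays the Gini index
x1 <- c(0.80, 0.82, 0.85, 0.86, 0.88, 0.90, 0.91)
x2 <- c(0.45, 0.41, 0.47, 0.43, 0.50, 0.44, 0.49)
y  <- c(0.20, 0.05, 0.35, 0.15, 0.50, 0.25, 0.45)

fit_multi <- lm(y ~ x1 + x2)

# Part of x1 that x2 does not explain (residuals of x1 regressed on x2)
x1_resid <- resid(lm(x1 ~ x2))

# Simple regression of y on that leftover part of x1
fit_partial <- lm(y ~ x1_resid)

fwl_ok <- isTRUE(all.equal(coef(fit_multi)[["x1"]], coef(fit_partial)[["x1_resid"]]))
fwl_ok
```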
Write your interpretation below:
Create two new data frames:
- hate_crimes_no_new_york: the hate_crimes dataset without New York
- hate_crimes_no_DC: the hate_crimes dataset without the District of Columbia
Repeat the multiple regression from Question 3 on each one and indicate which removal, New York or the District of Columbia, has a bigger impact on the analysis. Why do you think this is?
# Write code to generate regression tables below:
hate_crimes_no_new_york <- hate_crimes %>%
  filter(state != "New York")
lm(hate_crimes_per_100k_splc ~ share_pop_hs + gini_index, data = hate_crimes_no_new_york) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) -8.655118 1.448044 -5.977109 3.947701e-07
## 2 share_pop_hs 5.399269 1.001034 5.393695 2.763474e-06
## 3 gini_index 9.414876 1.706481 5.517131 1.833887e-06
hate_crimes_no_DC <- hate_crimes %>%
  filter(state != "District of Columbia")
lm(hate_crimes_per_100k_splc ~ share_pop_hs + gini_index, data = hate_crimes_no_DC) %>%
  tidy()
## term estimate std.error statistic p.value
## 1 (Intercept) -3.989258 1.5083121 -2.644849 0.011365187
## 2 share_pop_hs 3.284001 0.9432964 3.481409 0.001157652
## 3 gini_index 3.135572 1.8352715 1.708506 0.094752663
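One standard way to quantify how much a single observation drives a regression is Cook's distance, which measures how far the fitted values shift when that observation is removed; base R computes it with cooks.distance(). Points that sit far from the others in the predictors (high leverage) and also off the overall trend tend to score highest. A sketch on hypothetical data, where the last point is such an outlier:

```r
# Hypothetical data (NOT from hate_crimes): seven points near a flat trend plus
# one point far out in both x and y
df <- data.frame(
  x = c(0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.53),
  y = c(0.30, 0.25, 0.28, 0.22, 0.27, 0.24, 0.26, 1.50)
)

fit_all <- lm(y ~ x, data = df)
cd <- cooks.distance(fit_all)

# Row with the largest influence on the fit
most_influential <- unname(which.max(cd))
most_influential

# Refit without that row and compare the slopes
fit_drop <- lm(y ~ x, data = df[-most_influential, ])
c(all = coef(fit_all)[["x"]], dropped = coef(fit_drop)[["x"]])
```

On the real data, one could call cooks.distance() on the full multiple-regression model and check which row scores highest; that result is not reported here.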
Write your response here: