Data 606 - Project Proposal

Data Preparation

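library(tidyverse)   # readr, dplyr, stringr, ggplot2
library(magrittr)    # %<>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

data <- read_csv("https://raw.githubusercontent.com/baroncurtin2/data606/master/project/data/hate_crimes.csv")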

## Parsed with column specification:
## cols(
##   state = col_character(),
##   median_household_income = col_integer(),
##   share_unemployed_seasonal = col_double(),
##   share_population_in_metro_areas = col_double(),
##   share_population_with_high_school_degree = col_double(),
##   share_non_citizen = col_double(),
##   share_white_poverty = col_double(),
##   gini_index = col_double(),
##   share_non_white = col_double(),
##   share_voters_voted_trump = col_double(),
##   hate_crimes_per_100k_splc = col_double(),
##   avg_hatecrimes_per_100k_fbi = col_double()
## )

region_mapping <- read_csv("https://raw.githubusercontent.com/baroncurtin2/data606/master/project/data/region_mapping.csv") %>%
  # convert all headers to lowercase (funs() is deprecated, so pass the function directly)
  rename_all(str_to_lower)

## Parsed with column specification:
## cols(
##   State = col_character(),
##   `State Code` = col_character(),
##   Region = col_character(),
##   Division = col_character()
## )

Adding Qualitative Variables

data %<>%
  left_join(region_mapping, by = "state")
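
A left join keeps every row of the left table even when no match exists, so an unmatched state would silently receive NA for its region. The sketch below shows one way to check for that with anti_join(). The helper name unmatched_states and the toy tables are assumptions made for illustration; they are not part of the project code.

```r
library(dplyr)

# Hypothetical helper: rows of `left` that have no partner in `right`
unmatched_states <- function(left, right, key = "state") {
  anti_join(left, right, by = key)
}

# Toy inputs for illustration only (not the project data)
toy_data <- tibble(state = c("Maryland", "Hawaii", "District of Columbia"))
toy_map  <- tibble(state  = c("Maryland", "Hawaii"),
                   region = c("South", "West"))

unmatched <- unmatched_states(toy_data, toy_map)
print(unmatched$state)
```

Applied to the project objects, unmatched_states(data, region_mapping) would be expected to return zero rows if all 51 cases were mapped to a region.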

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Are there more annual hate crimes per 100,000 population in states where a greater share of voters voted for Trump in 2016?

Cases

What are the cases, and how many are there?

There are 51 cases: the 50 US states and the District of Columbia. Each case has statistics on hate crimes and on vote results from the 2016 election.

Data collection

Describe the method of data collection.

The data were downloaded as a CSV file from FiveThirtyEight’s GitHub repository. FiveThirtyEight gathered them from several sources, including the Kaiser Family Foundation, the Census Bureau, the United States Elections Project, the Southern Poverty Law Center, and the FBI.

Type of study

What type of study is this (observational/experiment)?

This is an observational study: it analyzes data on events that have already occurred, and no treatment was assigned.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

Data
Region Mapping

Response

What is the response variable, and what type is it (numerical/categorical)?

The response variable is the average annual number of hate crimes per 100,000 population (avg_hatecrimes_per_100k_fbi). It is a numerical variable.

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The explanatory variable is the share of voters in the state who voted for Trump in 2016 (share_voters_voted_trump). This is also a numerical variable.

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(data)

##     state           median_household_income share_unemployed_seasonal
##  Length:51          Min.   :35521           Min.   :0.02800          
##  Class :character   1st Qu.:48657           1st Qu.:0.04200          
##  Mode  :character   Median :54916           Median :0.05100          
##                     Mean   :55224           Mean   :0.04957          
##                     3rd Qu.:60719           3rd Qu.:0.05750          
##                     Max.   :76165           Max.   :0.07300          
##                                                                      
##  share_population_in_metro_areas share_population_with_high_school_degree
##  Min.   :0.3100                  Min.   :0.7990                          
##  1st Qu.:0.6300                  1st Qu.:0.8405                          
##  Median :0.7900                  Median :0.8740                          
##  Mean   :0.7502                  Mean   :0.8691                          
##  3rd Qu.:0.8950                  3rd Qu.:0.8980                          
##  Max.   :1.0000                  Max.   :0.9180                          
##                                                                          
##  share_non_citizen share_white_poverty   gini_index     share_non_white 
##  Min.   :0.01000   Min.   :0.04000     Min.   :0.4190   Min.   :0.0600  
##  1st Qu.:0.03000   1st Qu.:0.07500     1st Qu.:0.4400   1st Qu.:0.1950  
##  Median :0.04500   Median :0.09000     Median :0.4540   Median :0.2800  
##  Mean   :0.05458   Mean   :0.09176     Mean   :0.4538   Mean   :0.3157  
##  3rd Qu.:0.08000   3rd Qu.:0.10000     3rd Qu.:0.4665   3rd Qu.:0.4200  
##  Max.   :0.13000   Max.   :0.17000     Max.   :0.5320   Max.   :0.8100  
##  NA's   :3                                                              
##  share_voters_voted_trump hate_crimes_per_100k_splc
##  Min.   :0.040            Min.   :0.06745          
##  1st Qu.:0.415            1st Qu.:0.14271          
##  Median :0.490            Median :0.22620          
##  Mean   :0.490            Mean   :0.30409          
##  3rd Qu.:0.575            3rd Qu.:0.35694          
##  Max.   :0.700            Max.   :1.52230          
##                           NA's   :4                
##  avg_hatecrimes_per_100k_fbi  state code           region         
##  Min.   : 0.2669             Length:51          Length:51         
##  1st Qu.: 1.2931             Class :character   Class :character  
##  Median : 1.9871             Mode  :character   Mode  :character  
##  Mean   : 2.3676                                                  
##  3rd Qu.: 3.1843                                                  
##  Max.   :10.9535                                                  
##  NA's   :1                                                        
##    division        
##  Length:51         
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##
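
Since both the response and the explanatory variable are numerical, the statistics most relevant to the research question are the sample size, means, standard deviations, and correlation of that pair, computed on rows where both values are present. The sketch below shows one way to compute them. The helper name pair_summary and the toy numbers are made up for illustration and are not the project data.

```r
library(dplyr)

# Hypothetical helper: complete-case count, means, SDs, and Pearson correlation
# for two numeric columns given by name
pair_summary <- function(df, x, y) {
  df %>%
    filter(!is.na(.data[[x]]), !is.na(.data[[y]])) %>%
    summarise(
      n_complete = n(),
      mean_x = mean(.data[[x]]),
      sd_x   = sd(.data[[x]]),
      mean_y = mean(.data[[y]]),
      sd_y   = sd(.data[[y]]),
      r      = cor(.data[[x]], .data[[y]])
    )
}

# Made-up toy values, one of them missing
toy <- tibble(share = c(0.3, 0.4, 0.5, 0.6),
              rate  = c(4, 3, NA, 1))

res <- pair_summary(toy, "share", "rate")
print(res)
```

On the project data the call would be pair_summary(data, "share_voters_voted_trump", "avg_hatecrimes_per_100k_fbi"), which drops the one case with a missing FBI rate.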

one <- data %>%
  select(state, median_household_income, share_voters_voted_trump) %>%
  arrange(desc(median_household_income)) %>%
  head(5)

# top 5 median incomes
kable(one, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

state	median_household_income	share_voters_voted_trump
Maryland	76165	0.35
New Hampshire	73397	0.47
Hawaii	71223	0.30
Connecticut	70161	0.41
District of Columbia	68277	0.04

Visualizations

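ggplot(data, aes(x = share_voters_voted_trump, y = avg_hatecrimes_per_100k_fbi, col = region)) +
  geom_point(aes(size = avg_hatecrimes_per_100k_fbi), alpha = .6, shape = 16) +
  # geom_abline() with no arguments draws y = x; fit one least-squares line instead
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "black")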
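
A note on geom_abline(): called without arguments it draws the line with intercept 0 and slope 1, not a fitted line. To overlay a regression line with it, the fitted coefficients have to be passed explicitly. The sketch below illustrates this on mtcars, a dataset that ships with R and stands in for the project data here; the variable choice is an assumption made only for the example.

```r
library(ggplot2)

# mtcars is built into R; it is a stand-in for the project data
fit <- lm(mpg ~ wt, data = mtcars)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = .6, shape = 16) +
  # pass the least-squares intercept and slope explicitly
  geom_abline(intercept = unname(coef(fit)[1]),
              slope     = unname(coef(fit)[2]))

print(p)
```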

Linear Model

model <- lm(avg_hatecrimes_per_100k_fbi ~ share_voters_voted_trump, data = data)
summary(model)

## 
## Call:
## lm(formula = avg_hatecrimes_per_100k_fbi ~ share_voters_voted_trump, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1081 -1.1586 -0.0971  0.8863  5.2238 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                6.0260     0.9281   6.493 4.41e-08 ***
## share_voters_voted_trump  -7.4087     1.8300  -4.049 0.000187 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.495 on 48 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2546, Adjusted R-squared:  0.239 
## F-statistic: 16.39 on 1 and 48 DF,  p-value: 0.0001869
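
To read the coefficients, it can help to plug values into the fitted line. The sketch below copies the intercept and slope from the output above and computes the predicted rate at a Trump vote share of 0.49 (the median share in the summary table) and the predicted change for a 10 percentage point increase in share. The function name predict_rate is an assumption for illustration, and because the study is observational these are associations, not causal effects.

```r
# Coefficients copied from the summary(model) output
intercept <- 6.0260
slope     <- -7.4087

# Hypothetical helper: predicted FBI hate crimes per 100k at a given vote share
predict_rate <- function(trump_share) intercept + slope * trump_share

at_median <- predict_rate(0.49)   # about 2.40 per 100,000
per_10pts <- slope * 0.10         # about -0.74 per 100,000

print(at_median)
print(per_10pts)
```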

Multiplot

# geom_abline() with no arguments draws y = x, so each panel uses a fitted
# least-squares line (geom_smooth) instead
p1 <- data %>%
  filter(region == "Northeast") %>%
  ggplot(aes(x = share_voters_voted_trump, y = avg_hatecrimes_per_100k_fbi)) +
  geom_point(aes(size = avg_hatecrimes_per_100k_fbi), alpha = .6, shape = 16) +
  geom_smooth(method = "lm", se = FALSE)

p2 <- data %>%
  filter(region == "South") %>%
  ggplot(aes(x = share_voters_voted_trump, y = avg_hatecrimes_per_100k_fbi)) +
  geom_point(aes(size = avg_hatecrimes_per_100k_fbi), alpha = .6, shape = 16) +
  geom_smooth(method = "lm", se = FALSE)

p3 <- data %>%
  filter(region == "Midwest") %>%
  ggplot(aes(x = share_voters_voted_trump, y = avg_hatecrimes_per_100k_fbi)) +
  geom_point(aes(size = avg_hatecrimes_per_100k_fbi), alpha = .6, shape = 16) +
  geom_smooth(method = "lm", se = FALSE)

p4 <- data %>%
  filter(region == "West") %>%
  ggplot(aes(x = share_voters_voted_trump, y = avg_hatecrimes_per_100k_fbi)) +
  geom_point(aes(size = avg_hatecrimes_per_100k_fbi), alpha = .6, shape = 16) +
  geom_smooth(method = "lm", se = FALSE)

p1

p2

p3

p4
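
The four regional plots share the same structure, so a single faceted plot is a compact alternative. The sketch below uses facet_wrap() with a per-panel least-squares line. It runs on mtcars (built into R) as a stand-in, with cyl playing the role of region; those choices are assumptions made only for the example.

```r
library(ggplot2)

# mtcars stands in for the project data; cyl plays the role of region
p_facets <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = .6, shape = 16) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # one fitted line per panel
  facet_wrap(~ cyl)

print(p_facets)
```

With the project data, the equivalent would facet on region and map share_voters_voted_trump and avg_hatecrimes_per_100k_fbi to x and y.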

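There appears to be a fairly weak positive relationship in each region's plot. The pooled linear model above, however, estimates a negative slope (-7.41, R-squared 0.25), so the regional and overall patterns should be compared before drawing a conclusion.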
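
A visual impression of the direction of a relationship within groups can be checked by fitting a separate least-squares line per group and comparing the slopes. The sketch below does this in base R on mtcars (built into R), with cyl standing in for region; the dataset and variable names are assumptions made only for the example.

```r
# Fit mpg ~ wt separately within each cyl group and collect the slopes
slopes <- sapply(
  split(mtcars, mtcars$cyl),
  function(d) coef(lm(mpg ~ wt, data = d))[["wt"]]
)

print(slopes)
```

For the project data, the same pattern would split data by region and fit avg_hatecrimes_per_100k_fbi ~ share_voters_voted_trump within each piece.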