DATA 333 Analysis of Continuous Data

Variable Selection and Research Question

I will examine the relationship between race and feelings towards police in 2017 in the voter dataset. Race is my independent variable; specifically, I will compare Black and White respondents. Feelings towards police, measured by ft_police_2017, is my dependent variable. I hypothesize that White respondents will, on average, rate police more favorably than Black respondents.

Data Prep

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
voterdata <- read.csv("/Users/Nazija/Downloads/Abbreviated Voter Dataset Labeled.csv")
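data <- voterdata%>%
  select(race,ft_police_2017)%>%
  # %in% keeps every row whose race is White or Black; `==` against a
  # length-2 vector recycles it and silently drops matching rows
  filter(race %in% c("White","Black"))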
head(data)

##    race ft_police_2017
## 1 White             76
## 2 White             78
## 3 White             94
## 4 White             60
## 5 White             NA
## 6 White             98

Comparison of Means

Table

data%>%
  group_by(race)%>%
  summarize(avg = mean(ft_police_2017, na.rm = TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 2 x 2
##   race    avg
##   <fct> <dbl>
## 1 Black  59.5
## 2 White  77.6

Visualization

data%>%
  group_by(race)%>%
  summarize(avg = mean(ft_police_2017, na.rm = TRUE))%>%
  ggplot()+
  geom_col(aes(x = race, y = avg, fill = race))

## `summarise()` ungrouping output (override with `.groups` argument)

Interpretation

The difference in mean ratings shows that, overall, Black respondents feel less positively towards police than White respondents: the average rating among Black respondents (59.5) is about 18 points lower than among White respondents (77.6).

Comparison of Distributions

Visualization

data%>%
  ggplot()+
  geom_histogram(aes(x = ft_police_2017, fill = race))+
  facet_wrap(~race)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 1356 rows containing non-finite values (stat_bin).

Only Black Voters

data%>%
  filter(race == "Black")%>%
  ggplot()+
  geom_histogram(aes(x = ft_police_2017))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 145 rows containing non-finite values (stat_bin).

Only White Voters

data%>%
  filter(race == "White")%>%
  ggplot()+
  geom_histogram(aes(x = ft_police_2017))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 1211 rows containing non-finite values (stat_bin).

Interpretation

The histograms show that the most common rating among White voters was 100, while ratings from Black voters were far more spread out, with the most common rating at 50. This is consistent with the comparison of means: White voters tend to rate police more highly than Black voters do.
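The claims above about the most common rating and the spread of each group can also be checked numerically instead of being read off the histograms. The following is a minimal sketch in base R. The data frame `toy` and the helper `stat_mode` are hypothetical names, and the values are invented for illustration only; they are not taken from the voter dataset.

```r
# Toy data, invented for illustration only (not from the voter dataset)
toy <- data.frame(
  race = c("Black", "Black", "Black", "Black",
           "White", "White", "White", "White"),
  ft_police_2017 = c(50, 50, 30, NA, 100, 100, 85, 70)
)

# Hypothetical helper: the most frequent non-missing value.
# If several values tie, the smallest one is returned.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# One summary per race group: mode, median, and standard deviation
modes   <- tapply(toy$ft_police_2017, toy$race, stat_mode)
medians <- tapply(toy$ft_police_2017, toy$race, median, na.rm = TRUE)
sds     <- tapply(toy$ft_police_2017, toy$race, sd, na.rm = TRUE)

print(modes)
print(medians)
print(sds)
```

Running the same three `tapply()` calls on `data` in place of `toy` would give the group modes and spreads for the actual ratings.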

Sampling Distribution & T-test

Sampling Distribution

Black_data<-data%>%
  filter(race == "Black")
White_data<-data%>%
  filter(race == "White")

Black_sampling<-replicate(10000,
          sample(Black_data$ft_police_2017, 40)%>%
            mean(na.rm = TRUE))%>%
  data.frame()%>%
  rename("mean" = 1)

White_sampling<-replicate(10000,
          sample(White_data$ft_police_2017, 40)%>%
            mean(na.rm = TRUE))%>%
  data.frame()%>%
  rename("mean" = 1)

Black_sampling%>%
  ggplot()+
  geom_histogram(aes(x = mean), fill = "blue")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

White_sampling%>%
  ggplot()+
  geom_histogram(aes(x = mean), fill = "red")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot()+
  geom_histogram(data = Black_sampling, aes(x = mean), fill = "blue")+
  geom_histogram(data = White_sampling, aes(x = mean), fill = "red")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

T-test

data%>%
  summarize(ft_police_2017 = mean(ft_police_2017, na.rm = TRUE))

##   ft_police_2017
## 1         75.989

t.test(ft_police_2017~race, data = data)

## 
##  Welch Two Sample t-test
## 
## data:  ft_police_2017 by race
## t = -8.3238, df = 216.41, p-value = 9.617e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -22.42499 -13.83825
## sample estimates:
## mean in group Black mean in group White 
##            59.46114            77.59276

The null hypothesis is that there is no difference in mean feelings towards police between the two groups; the alternative hypothesis is that the means differ. The p-value (9.617e-15) is far below 0.05, and the 95 percent confidence interval for the difference (-22.42 to -13.84) does not include 0, so I reject the null hypothesis. The t-test shows a statistically significant difference between Black voters and White voters in their mean feelings towards police, with Black voters rating police about 18 points lower on average.
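To make the Welch t-test output easier to read, the sketch below computes the t statistic, the degrees of freedom, and the two-sided p-value by hand and compares them with what `t.test()` reports. The vectors `black` and `white` are made-up toy ratings used only for illustration, not values from the voter dataset.

```r
# Toy ratings, invented for illustration only (not from the voter dataset)
black <- c(50, 30, 65, 40, 55, 70, 45)
white <- c(85, 100, 70, 90, 75, 95, 80, 60)

n1 <- length(black)
n2 <- length(white)

# Squared standard error of each group mean
v1 <- var(black) / n1
v2 <- var(white) / n2

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(black) - mean(white)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# t.test() runs the Welch test by default (var.equal = FALSE)
result <- t.test(black, white)

print(c(t = t_stat, df = df, p = p_value))
print(result)
```

The degrees of freedom are generally not a whole number, which is why the output above reports df = 216.41: the Welch test does not assume equal variances in the two groups and adjusts the degrees of freedom accordingly.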