Data Prep
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Abbreviated_Voter_Dataset_Labeled <- read_csv("/Users/chelsyrodriguez/Downloads/Abbreviated Voter Dataset Labeled.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## NumChildren = col_double(),
## Immigr_Economy_GiveTake = col_double(),
## ft_fem_2017 = col_double(),
## ft_immig_2017 = col_double(),
## ft_police_2017 = col_double(),
## ft_dem_2017 = col_double(),
## ft_rep_2017 = col_double(),
## ft_evang_2017 = col_double(),
## ft_muslim_2017 = col_double(),
## ft_jew_2017 = col_double(),
## ft_christ_2017 = col_double(),
## ft_gays_2017 = col_double(),
## ft_unions_2017 = col_double(),
## ft_altright_2017 = col_double(),
## ft_black_2017 = col_double(),
## ft_white_2017 = col_double(),
## ft_hisp_2017 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
Response Summary
I hypothesize that there is a relationship between areatype and GunOwnership. This is important to study because we want to identify whether respondents are able, or even allowed, to own a gun depending on where they currently reside. Specifically, my analysis will investigate whether the type of area the respondent resides in (DV) depends on whether there is a gun in their household (IV).
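The hypothesis can be written formally as Pearson's chi-square test of independence, which is the test run later in this analysis. In the sketch below, $O_{ij}$ and $E_{ij}$ are the observed and expected counts for area type $i$ and gun-ownership response $j$, with $r = 5$ area types (City, Other, Rural Area, Suburb, Town) and $c = 3$ responses (Dont Know, Gun in Household, No Gun in Household). The statistic is symmetric in the two variables, so labeling one as IV and the other as DV does not change its value.

```latex
\begin{aligned}
H_0 &: \text{areatype and GunOwnership are independent} \\
H_1 &: \text{areatype and GunOwnership are associated} \\
\chi^2 &= \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad df = (r-1)(c-1) = (5-1)(3-1) = 8
\end{aligned}
```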
IV Response Summary
# Proportion of respondents in each GunOwnership category (IV)
table(Abbreviated_Voter_Dataset_Labeled$GunOwnership) %>%
  prop.table()
##
## Dont Know Gun in Household No Gun in Household
## 0.06677648 0.39039534 0.54282818
DV Response Summary
# Proportion of respondents in each areatype category (DV)
table(Abbreviated_Voter_Dataset_Labeled$areatype) %>%
  prop.table()
##
## City Other Rural Area Suburb Town
## 0.284652218 0.006426411 0.186869960 0.378276210 0.143775202
Primary Investigation
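Expected Values
# Select the expected-count matrix from the chi-square test result by name
# (equivalent to the 7th list element, [7])
chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype, Abbreviated_Voter_Dataset_Labeled$GunOwnership)["expected"]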
## Warning in chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype,
## Abbreviated_Voter_Dataset_Labeled$GunOwnership): Chi-squared approximation may
## be incorrect
## $expected
##
## Dont Know Gun in Household No Gun in Household
## City 149.60910 872.52938 1212.86152
## Other 3.34696 19.51967 27.13337
## Rural Area 97.32959 567.63209 789.03832
## Suburb 198.67552 1158.68779 1610.63669
## Town 75.03883 437.63107 608.33010
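Each expected count is what a cell would hold if areatype and GunOwnership were independent: the row total $R_i$ times the column total $C_j$, divided by the grand total $N$. Using the totals from the observed table (City row = 144 + 684 + 1407 = 2235; Gun in Household column = 3056; Other row = 50; Dont Know column = 524; N = 7828), the formula reproduces the City value above and shows which cell triggers R's warning about the approximation:

```latex
\begin{aligned}
E_{ij} &= \frac{R_i\,C_j}{N} \\
E_{\text{City, Gun}} &= \frac{2235 \times 3056}{7828} \approx 872.53 \\
E_{\text{Other, Dont Know}} &= \frac{50 \times 524}{7828} \approx 3.35 < 5
\end{aligned}
```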
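Observed Values
# Select the observed-count matrix from the chi-square test result by name
# (equivalent to the 6th list element, [6])
chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype, Abbreviated_Voter_Dataset_Labeled$GunOwnership)["observed"]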
## Warning in chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype,
## Abbreviated_Voter_Dataset_Labeled$GunOwnership): Chi-squared approximation may
## be incorrect
## $observed
##
## Dont Know Gun in Household No Gun in Household
## City 144 684 1407
## Other 0 21 29
## Rural Area 122 841 491
## Suburb 173 1053 1742
## Town 85 457 579
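Because the observed counts are printed above, the test can be reproduced without the CSV file. The following is a minimal base-R sketch, assuming the counts are copied exactly from the `$observed` output; the object names `obs`, `expected_manual`, and `res` are hypothetical. It rebuilds the contingency table, computes the expected counts by hand as row total × column total / grand total, and checks them against what `chisq.test()` returns.

```r
# Observed counts copied from the $observed output
# (rows = areatype, columns = GunOwnership); `obs` is a hypothetical name.
obs <- matrix(
  c(144,  684, 1407,
      0,   21,   29,
    122,  841,  491,
    173, 1053, 1742,
     85,  457,  579),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    areatype     = c("City", "Other", "Rural Area", "Suburb", "Town"),
    GunOwnership = c("Dont Know", "Gun in Household", "No Gun in Household")
  )
)

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected_manual <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# chisq.test() on a matrix runs Pearson's test of independence;
# suppressWarnings() hides the same small-expected-count warning shown above.
res <- suppressWarnings(chisq.test(obs))

print(round(expected_manual, 2))
print(res)
```

Working from the printed table keeps the check independent of the file path and makes the arithmetic behind `$expected` explicit.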
I’ll be comparing respondents who live in a city with respondents who live in a suburb. I expected about 873 city respondents to have a gun in their household, but I observed only 684. Similarly, I expected about 1,159 suburban respondents to have a gun in their household, but I observed 1,053. In both area types, fewer respondents reported a gun in the household than independence would predict.
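The city and suburb comparison can be expressed as observed minus expected counts for the "Gun in Household" column. A minimal sketch, again assuming the counts are copied from the `$observed` output (the names `obs`, `res`, and `comparison` are hypothetical); it also pulls `res$stdres`, the standardized residuals that `chisq.test()` returns, which put both gaps on a common scale.

```r
# Observed counts copied from the $observed output (hypothetical name `obs`)
obs <- matrix(
  c(144,  684, 1407,
      0,   21,   29,
    122,  841,  491,
    173, 1053, 1742,
     85,  457,  579),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    areatype     = c("City", "Other", "Rural Area", "Suburb", "Town"),
    GunOwnership = c("Dont Know", "Gun in Household", "No Gun in Household")
  )
)
res <- suppressWarnings(chisq.test(obs))

rows <- c("City", "Suburb")
col  <- "Gun in Household"

# Observed vs expected counts, their difference, and the standardized residual
comparison <- data.frame(
  observed     = obs[rows, col],
  expected     = round(res$expected[rows, col], 2),
  difference   = round(obs[rows, col] - res$expected[rows, col], 2),
  std_residual = round(res$stdres[rows, col], 2)
)
print(comparison)
```

Negative differences and residuals in both rows correspond to the shortfalls described in the paragraph above.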
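Abbreviated_Voter_Dataset_Labeled %>%
  # Count respondents in each areatype x GunOwnership combination
  group_by(areatype, GunOwnership) %>%
  summarize(n = n(), .groups = "drop_last") %>%
  # Still grouped by areatype, so percent is the share within each area type
  mutate(percent = n / sum(n)) %>%
  ggplot() +
  geom_col(aes(x = areatype, y = percent, fill = GunOwnership)) +
  labs(x = "Area type", y = "Share of respondents", fill = "Gun ownership")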

Chi-Square Test
chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype, Abbreviated_Voter_Dataset_Labeled$GunOwnership)
## Warning in chisq.test(Abbreviated_Voter_Dataset_Labeled$areatype,
## Abbreviated_Voter_Dataset_Labeled$GunOwnership): Chi-squared approximation may
## be incorrect
##
## Pearson's Chi-squared test
##
## data: Abbreviated_Voter_Dataset_Labeled$areatype and Abbreviated_Voter_Dataset_Labeled$GunOwnership
## X-squared = 353.36, df = 8, p-value < 2.2e-16
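According to the chi-square test, the p-value is reported as < 2.2e-16 (R's floor for printing very small p-values), which is far smaller than .05. There is therefore a statistically significant relationship between areatype and GunOwnership, and these results reject the null hypothesis that the two variables are independent. R also warns that the chi-squared approximation may be incorrect because at least one expected count is below 5 (about 3.35 for Other / Dont Know), so that sparse cell should be interpreted with caution.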
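One way to check that the small-expected-count warning does not change the conclusion is a Monte Carlo p-value, which `chisq.test()` supports through its `simulate.p.value` and `B` arguments. A minimal sketch, assuming the counts are copied from the `$observed` output; the seed and `B = 2000` are arbitrary choices, not part of the original analysis.

```r
# Observed counts copied from the $observed output (hypothetical name `obs`)
obs <- matrix(
  c(144,  684, 1407,
      0,   21,   29,
    122,  841,  491,
    173, 1053, 1742,
     85,  457,  579),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    areatype     = c("City", "Other", "Rural Area", "Suburb", "Town"),
    GunOwnership = c("Dont Know", "Gun in Household", "No Gun in Household")
  )
)

set.seed(2017)  # arbitrary seed so the simulated p-value is reproducible

# Simulate tables with the same margins instead of relying on the
# chi-squared approximation; no small-count warning is raised in this mode.
sim <- chisq.test(obs, simulate.p.value = TRUE, B = 2000)
print(sim)
```

With simulation, the smallest p-value R can report is 1 / (B + 1), so a larger `B` is needed to resolve smaller p-values; here any value below .05 supports the same conclusion.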