I hypothesize that there is a relationship between political party (variable: PartyRegistration) and gun ownership (variable: GunOwnership). I will be analyzing responses to the voter data dataset in order to test this hypothesis.
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
data<-read_csv("/Users/rebeccagibble/Downloads/abbreviated voter dataset .csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## NumChildren = col_double(),
## Immigr_Economy_GiveTake = col_double(),
## ft_fem_2017 = col_double(),
## ft_immig_2017 = col_double(),
## ft_police_2017 = col_double(),
## ft_dem_2017 = col_double(),
## ft_rep_2017 = col_double(),
## ft_evang_2017 = col_double(),
## ft_muslim_2017 = col_double(),
## ft_jew_2017 = col_double(),
## ft_christ_2017 = col_double(),
## ft_gays_2017 = col_double(),
## ft_unions_2017 = col_double(),
## ft_altright_2017 = col_double(),
## ft_black_2017 = col_double(),
## ft_white_2017 = col_double(),
## ft_hisp_2017 = col_double()
## )
## See spec(...) for full column specifications.
head(data)
## # A tibble: 6 x 53
## gender race education familyincome children region urbancity Vote2012
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Female White 4-year Prefer not … No West Suburb Barack …
## 2 Female White Some Col… $60K-$69,999 No West Rural Ar… Mitt Ro…
## 3 Male White High Sch… $50K-$59,999 No Midwe… City Mitt Ro…
## 4 Male White Some Col… $70K-$79,999 No South City Barack …
## 5 Male White 4-year $40K-$49,999 No South Suburb Mitt Ro…
## 6 Female White 2-year $30K-$39,999 No West Suburb Barack …
## # … with 45 more variables: Vote2016 <chr>, TrumpSanders <chr>,
## # PartyRegistration <chr>, PartyIdentification <chr>,
## # PartyIdentification2 <chr>, PartyIdentification3 <chr>,
## # NewsPublicAffairs <chr>, DemPrimary <chr>, RepPrimary <chr>,
## # ImmigrantContributions <chr>, ImmigrantNaturalization <chr>,
## # ImmigrationShouldBe <chr>, Abortion <chr>, GayMarriage <chr>,
## # DeathPenalty <chr>, DeathPenaltyFreq <chr>, TaxWealthy <chr>,
## # Healthcare <chr>, GlobWarmExist <chr>, GlobWarmingSerious <chr>,
## # AffirmativeAction <chr>, Religion <chr>, ReligiousImportance <chr>,
## # ChurchAttendance <chr>, PrayerFrequency <chr>, NumChildren <dbl>,
## # areatype <chr>, GunOwnership <chr>, EconomyBetterWorse <chr>,
## # Immigr_Economy_GiveTake <dbl>, ft_fem_2017 <dbl>, ft_immig_2017 <dbl>,
## # ft_police_2017 <dbl>, ft_dem_2017 <dbl>, ft_rep_2017 <dbl>,
## # ft_evang_2017 <dbl>, ft_muslim_2017 <dbl>, ft_jew_2017 <dbl>,
## # ft_christ_2017 <dbl>, ft_gays_2017 <dbl>, ft_unions_2017 <dbl>,
## # ft_altright_2017 <dbl>, ft_black_2017 <dbl>, ft_white_2017 <dbl>,
## # ft_hisp_2017 <dbl>
Select variables necessary for analysis Filter to keep only those categories of interest in your analysis Store prepared data in a new object
data<-data%>%
select(PartyRegistration, GunOwnership)%>%
filter(PartyRegistration %in% c("Democrat","Republican"),
GunOwnership %in% c("No Gun in Household","Gun in Household"))
The table below shows the actual % of responses given for each category of GunOwnership.
table(data$PartyRegistration)%>%
prop.table()%>%
round(2)
##
## Democrat Republican
## 0.57 0.43
The table below shows the actual % of responses given for each category of PartyRegistration.
table(data$GunOwnership)%>%
prop.table()%>%
round(2)
##
## Gun in Household No Gun in Household
## 0.4 0.6
Below are the values that we would expect to observe in a crosstab if the two variables were completely independent of eachother. This is what we might consider the “null hypothesis”.
VariableA//Response1 0.57 * VariableB Response1 0.4 = 0.228 or 22.8%
VariableA//Response2 0.43 * VariableB Response1 0.4 = 0.172 or 17.2%
VariableA//Response1 0.57 * VariableB Response 0.6 = 0.342 or 34.2%
VariableA//Response2 0.43 * VariableB Response 0.6 = 0.258 or 25.8%
My scenario only has 2 cateorgies per variable, hence why these are the only calculations.
The table below shows the actual % of responses for each category combination. A crosstab showing table %. These values are very different from the expected observations from the null hypothesis.
data%>%
group_by(PartyRegistration,GunOwnership)%>%
summarize(n=n())%>%
mutate(percent=n/sum(n))
## `summarise()` regrouping output by 'PartyRegistration' (override with `.groups` argument)
## # A tibble: 4 x 4
## # Groups: PartyRegistration [2]
## PartyRegistration GunOwnership n percent
## <chr> <chr> <int> <dbl>
## 1 Democrat Gun in Household 519 0.298
## 2 Democrat No Gun in Household 1225 0.702
## 3 Republican Gun in Household 702 0.525
## 4 Republican No Gun in Household 636 0.475
The table below shows row% to highlight the relationship of interest. My independent variable is represented in the rows of my table, so I calculated row %.
table(data$PartyRegistration, data$GunOwnership)%>%
prop.table(1)
##
## Gun in Household No Gun in Household
## Democrat 0.2975917 0.7024083
## Republican 0.5246637 0.4753363
data%>%
group_by(PartyRegistration, GunOwnership)%>%
summarize(n=n())%>%
mutate(percent=n/sum(n))%>%
filter(GunOwnership=="Gun in Household")%>%
ggplot()+
geom_col(aes(x=PartyRegistration, y=percent, fill=GunOwnership))
## `summarise()` regrouping output by 'PartyRegistration' (override with `.groups` argument)
Based upon the table and visualization we can see that Republicans are more likely to have a gun in their household than Democrats are. Approximately 30% of Democrats have a gun in the house whereas approximately 50% of Republicans do based off of this data.
Below are the results of the chi-squared test for independence. This tells us whether there is a statistically significant relationship between the variables.
The results below indicate that there is a statistically significant relationship between Gun Ownership and Party Registration.
chisq.test(data$PartyRegistration, data$GunOwnership)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$PartyRegistration and data$GunOwnership
## X-squared = 162.24, df = 1, p-value < 2.2e-16
There is a statistically significant relationship between registered political parties (Democrat and Republican) and whether or not they own a gun. We know this because the p-value is < 0.05. In fact, the p-value is way smaller than 0.05 which means there is a strong significance.