Rebecca Gibble Chi-Squared Test for Independence

Research Question

I hypothesize that there is a relationship between political party (variable: PartyRegistration) and gun ownership (variable: GunOwnership). I will be analyzing responses to the voter data dataset in order to test this hypothesis.

Import Data

library(readr)
library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.6.2

data<-read_csv("/Users/rebeccagibble/Downloads/abbreviated voter dataset .csv")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   NumChildren = col_double(),
##   Immigr_Economy_GiveTake = col_double(),
##   ft_fem_2017 = col_double(),
##   ft_immig_2017 = col_double(),
##   ft_police_2017 = col_double(),
##   ft_dem_2017 = col_double(),
##   ft_rep_2017 = col_double(),
##   ft_evang_2017 = col_double(),
##   ft_muslim_2017 = col_double(),
##   ft_jew_2017 = col_double(),
##   ft_christ_2017 = col_double(),
##   ft_gays_2017 = col_double(),
##   ft_unions_2017 = col_double(),
##   ft_altright_2017 = col_double(),
##   ft_black_2017 = col_double(),
##   ft_white_2017 = col_double(),
##   ft_hisp_2017 = col_double()
## )

## See spec(...) for full column specifications.

head(data)

## # A tibble: 6 x 53
##   gender race  education familyincome children region urbancity Vote2012
##   <chr>  <chr> <chr>     <chr>        <chr>    <chr>  <chr>     <chr>   
## 1 Female White 4-year    Prefer not … No       West   Suburb    Barack …
## 2 Female White Some Col… $60K-$69,999 No       West   Rural Ar… Mitt Ro…
## 3 Male   White High Sch… $50K-$59,999 No       Midwe… City      Mitt Ro…
## 4 Male   White Some Col… $70K-$79,999 No       South  City      Barack …
## 5 Male   White 4-year    $40K-$49,999 No       South  Suburb    Mitt Ro…
## 6 Female White 2-year    $30K-$39,999 No       West   Suburb    Barack …
## # … with 45 more variables: Vote2016 <chr>, TrumpSanders <chr>,
## #   PartyRegistration <chr>, PartyIdentification <chr>,
## #   PartyIdentification2 <chr>, PartyIdentification3 <chr>,
## #   NewsPublicAffairs <chr>, DemPrimary <chr>, RepPrimary <chr>,
## #   ImmigrantContributions <chr>, ImmigrantNaturalization <chr>,
## #   ImmigrationShouldBe <chr>, Abortion <chr>, GayMarriage <chr>,
## #   DeathPenalty <chr>, DeathPenaltyFreq <chr>, TaxWealthy <chr>,
## #   Healthcare <chr>, GlobWarmExist <chr>, GlobWarmingSerious <chr>,
## #   AffirmativeAction <chr>, Religion <chr>, ReligiousImportance <chr>,
## #   ChurchAttendance <chr>, PrayerFrequency <chr>, NumChildren <dbl>,
## #   areatype <chr>, GunOwnership <chr>, EconomyBetterWorse <chr>,
## #   Immigr_Economy_GiveTake <dbl>, ft_fem_2017 <dbl>, ft_immig_2017 <dbl>,
## #   ft_police_2017 <dbl>, ft_dem_2017 <dbl>, ft_rep_2017 <dbl>,
## #   ft_evang_2017 <dbl>, ft_muslim_2017 <dbl>, ft_jew_2017 <dbl>,
## #   ft_christ_2017 <dbl>, ft_gays_2017 <dbl>, ft_unions_2017 <dbl>,
## #   ft_altright_2017 <dbl>, ft_black_2017 <dbl>, ft_white_2017 <dbl>,
## #   ft_hisp_2017 <dbl>

Prepare Data

Select variables necessary for analysis Filter to keep only those categories of interest in your analysis Store prepared data in a new object

data<-data%>%
  select(PartyRegistration, GunOwnership)%>%
  filter(PartyRegistration %in% c("Democrat","Republican"),
         GunOwnership %in% c("No Gun in Household","Gun in Household"))

Null Hypothesis

The table below shows the actual % of responses given for each category of GunOwnership.

table(data$PartyRegistration)%>%
  prop.table()%>%
  round(2)

## 
##   Democrat Republican 
##       0.57       0.43

The table below shows the actual % of responses given for each category of PartyRegistration.

table(data$GunOwnership)%>%
  prop.table()%>%
  round(2)

## 
##    Gun in Household No Gun in Household 
##                 0.4                 0.6

Below are the values that we would expect to observe in a crosstab if the two variables were completely independent of eachother. This is what we might consider the “null hypothesis”.

VariableA//Response1 0.57 * VariableB Response1 0.4 = 0.228 or 22.8%
VariableA//Response2 0.43 * VariableB Response1 0.4 = 0.172 or 17.2%
VariableA//Response1 0.57 * VariableB Response 0.6 = 0.342 or 34.2%
VariableA//Response2 0.43 * VariableB Response 0.6 = 0.258 or 25.8%

My scenario only has 2 cateorgies per variable, hence why these are the only calculations.

Actual Observations

The table below shows the actual % of responses for each category combination. A crosstab showing table %. These values are very different from the expected observations from the null hypothesis.

data%>%
  group_by(PartyRegistration,GunOwnership)%>%
  summarize(n=n())%>%
  mutate(percent=n/sum(n))

## `summarise()` regrouping output by 'PartyRegistration' (override with `.groups` argument)

## # A tibble: 4 x 4
## # Groups:   PartyRegistration [2]
##   PartyRegistration GunOwnership            n percent
##   <chr>             <chr>               <int>   <dbl>
## 1 Democrat          Gun in Household      519   0.298
## 2 Democrat          No Gun in Household  1225   0.702
## 3 Republican        Gun in Household      702   0.525
## 4 Republican        No Gun in Household   636   0.475

Relationship of Interest: Table

The table below shows row% to highlight the relationship of interest. My independent variable is represented in the rows of my table, so I calculated row %.

table(data$PartyRegistration, data$GunOwnership)%>%
  prop.table(1)

##             
##              Gun in Household No Gun in Household
##   Democrat          0.2975917           0.7024083
##   Republican        0.5246637           0.4753363

Relationship of Interest: Visualization

data%>%
  group_by(PartyRegistration, GunOwnership)%>%
  summarize(n=n())%>%
  mutate(percent=n/sum(n))%>%
  filter(GunOwnership=="Gun in Household")%>%
  ggplot()+
  geom_col(aes(x=PartyRegistration, y=percent, fill=GunOwnership))

## `summarise()` regrouping output by 'PartyRegistration' (override with `.groups` argument)

Based upon the table and visualization we can see that Republicans are more likely to have a gun in their household than Democrats are. Approximately 30% of Democrats have a gun in the house whereas approximately 50% of Republicans do based off of this data.

Chi-Squared Test

Below are the results of the chi-squared test for independence. This tells us whether there is a statistically significant relationship between the variables.

The results below indicate that there is a statistically significant relationship between Gun Ownership and Party Registration.

chisq.test(data$PartyRegistration, data$GunOwnership)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$PartyRegistration and data$GunOwnership
## X-squared = 162.24, df = 1, p-value < 2.2e-16

There is a statistically significant relationship between registered political parties (Democrat and Republican) and whether or not they own a gun. We know this because the p-value is < 0.05. In fact, the p-value is way smaller than 0.05 which means there is a strong significance.