I hypothesize that there is a relationship between Gender and GayMarriage: that is, male and female respondents differ in whether they favor or oppose gay marriage. I will take the following steps to test this hypothesis. Here, Gender is the independent variable and GayMarriage is the dependent variable.
Loading the necessary packages.
library(readr)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Importing the data into R and naming it Voter_Data.
Voter_Data = read_csv("/Users/sakif/Downloads/Abbreviated Voter Dataset Labeled.csv")
##
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## NumChildren = col_double(),
## Immigr_Economy_GiveTake = col_double(),
## ft_fem_2017 = col_double(),
## ft_immig_2017 = col_double(),
## ft_police_2017 = col_double(),
## ft_dem_2017 = col_double(),
## ft_rep_2017 = col_double(),
## ft_evang_2017 = col_double(),
## ft_muslim_2017 = col_double(),
## ft_jew_2017 = col_double(),
## ft_christ_2017 = col_double(),
## ft_gays_2017 = col_double(),
## ft_unions_2017 = col_double(),
## ft_altright_2017 = col_double(),
## ft_black_2017 = col_double(),
## ft_white_2017 = col_double(),
## ft_hisp_2017 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
head(Voter_Data)
## # A tibble: 6 x 53
## gender race education familyincome children region urbancity Vote2012
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Female White 4-year Prefer not … No West Suburb Barack …
## 2 Female White Some Col… $60K-$69,999 No West Rural Ar… Mitt Ro…
## 3 Male White High Sch… $50K-$59,999 No Midwe… City Mitt Ro…
## 4 Male White Some Col… $70K-$79,999 No South City Barack …
## 5 Male White 4-year $40K-$49,999 No South Suburb Mitt Ro…
## 6 Female White 2-year $30K-$39,999 No West Suburb Barack …
## # … with 45 more variables: Vote2016 <chr>, TrumpSanders <chr>,
## # PartyRegistration <chr>, PartyIdentification <chr>,
## # PartyIdentification2 <chr>, PartyIdentification3 <chr>,
## # NewsPublicAffairs <chr>, DemPrimary <chr>, RepPrimary <chr>,
## # ImmigrantContributions <chr>, ImmigrantNaturalization <chr>,
## # ImmigrationShouldBe <chr>, Abortion <chr>, GayMarriage <chr>,
## # DeathPenalty <chr>, DeathPenaltyFreq <chr>, TaxWealthy <chr>,
## # Healthcare <chr>, GlobWarmExist <chr>, GlobWarmingSerious <chr>,
## # AffirmativeAction <chr>, Religion <chr>, ReligiousImportance <chr>,
## # ChurchAttendance <chr>, PrayerFrequency <chr>, NumChildren <dbl>,
## # areatype <chr>, GunOwnership <chr>, EconomyBetterWorse <chr>,
## # Immigr_Economy_GiveTake <dbl>, ft_fem_2017 <dbl>, ft_immig_2017 <dbl>,
## # ft_police_2017 <dbl>, ft_dem_2017 <dbl>, ft_rep_2017 <dbl>,
## # ft_evang_2017 <dbl>, ft_muslim_2017 <dbl>, ft_jew_2017 <dbl>,
## # ft_christ_2017 <dbl>, ft_gays_2017 <dbl>, ft_unions_2017 <dbl>,
## # ft_altright_2017 <dbl>, ft_black_2017 <dbl>, ft_white_2017 <dbl>,
## # ft_hisp_2017 <dbl>
Selecting the two categorical variables, Gender and GayMarriage, and storing them in a new data frame named Gay_Marriage. It shows whether each male or female respondent favors gay marriage, opposes it, or is not sure.
Gay_Marriage = Voter_Data %>%
  select(gender, GayMarriage) %>%
  rename(Gender = gender) %>%
  filter(GayMarriage %in% c("Favor", "Oppose", "Not sure"))
Gay_Marriage
## # A tibble: 7,971 x 2
## Gender GayMarriage
## <chr> <chr>
## 1 Female Favor
## 2 Female Oppose
## 3 Male Favor
## 4 Male Favor
## 5 Male Oppose
## 6 Female Favor
## 7 Female Oppose
## 8 Male Oppose
## 9 Male Favor
## 10 Female Favor
## # … with 7,961 more rows
We now have two variables. Next we look at how the responses are distributed for each of them.
Here is the distribution of responses for the independent variable, Gender.
table(Gay_Marriage$Gender) %>%
  prop.table() %>%
  round(2)
##
## Female Male
## 0.51 0.49
Here is the distribution of responses for the dependent variable, GayMarriage.
table(Gay_Marriage$GayMarriage) %>%
  prop.table() %>%
  round(2)
##
## Favor Not sure Oppose
## 0.45 0.12 0.43
This table shows the expected count for each combination of categories, that is, the counts we would see if Gender and GayMarriage were unrelated.
chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)[7]
## $expected
## Gay_Marriage$Gender
## Gay_Marriage$GayMarriage Female Male
## Favor 1825.1232 1767.8768
## Not sure 484.5999 469.4001
## Oppose 1739.2769 1684.7231
This table shows the observed count for each combination of categories.
chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)[6]
## $observed
## Gay_Marriage$Gender
## Gay_Marriage$GayMarriage Female Male
## Favor 2014 1579
## Not sure 548 406
## Oppose 1487 1937
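The observed counts differ noticeably from the expected counts. Among females, more respondents than expected favor gay marriage (2014 observed vs. about 1825 expected) or are not sure (548 vs. about 485), and fewer than expected oppose it (1487 vs. about 1739). Among males the pattern is reversed: fewer than expected favor it (1579 vs. about 1768) or are not sure (406 vs. about 469), and more than expected oppose it (1937 vs. about 1685).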
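To make the comparison above concrete, here is a minimal sketch of how the expected counts can be reproduced by hand. It assumes the observed counts are typed in from the table above rather than read from Voter_Data, and the names `observed`, `expected`, and `difference` are only illustrative.

```r
# Observed counts copied from the $observed table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender = c("Female", "Male"))
)

# Expected count for a cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 4)

# Positive values: more responses than expected; negative: fewer than expected
difference <- observed - expected
round(difference, 1)
```

The signs of `difference` should mirror the pattern described above: positive for females in the Favor and Not sure rows, negative for females in the Oppose row, and the opposite for males.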
Calculating column percentages to highlight the relationship of interest between the variables.
table(Gay_Marriage$GayMarriage, Gay_Marriage$Gender) %>%
  prop.table(2)
##
## Female Male
## Favor 0.4974068 0.4026007
## Not sure 0.1353421 0.1035186
## Oppose 0.3672512 0.4938807
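As a rough illustration of what `prop.table(2)` does here, the sketch below divides each cell by its column total, so each Gender column sums to 1. It assumes the observed counts are typed in from the chi-square output in this report; `observed` and `col_pct` are illustrative names.

```r
# Observed counts copied from the chi-square output in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender = c("Female", "Male"))
)

# Column percentage = cell count / total of that Gender column
col_pct <- sweep(observed, 2, colSums(observed), "/")
round(col_pct, 2)

# Each column should add up to 1
colSums(col_pct)
```

Because the percentages are computed within each gender, the Female and Male columns can be compared directly even though the two groups differ slightly in size.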
Visualizing the results of the column percentage table.
Gay_Marriage %>%
  group_by(Gender, GayMarriage) %>%
  summarize(n = n()) %>%
  mutate(Percent = n/sum(n)) %>%
  ggplot() +
  geom_col(aes(x = Gender, y = Percent, fill = GayMarriage))
## `summarise()` regrouping output by 'Gender' (override with `.groups` argument)
This analysis clearly shows that, comparing the two groups on favoring, about 50% of females favor gay marriage while only about 40% of males do. Comparing them on opposing, about 49% of males oppose it while only about 37% of females do.
Running a chi-square test to determine whether there is a statistically significant relationship between the variables.
chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)
##
## Pearson's Chi-squared test
##
## data: Gay_Marriage$GayMarriage and Gay_Marriage$Gender
## X-squared = 130.95, df = 2, p-value < 2.2e-16
This result indicates that there is a statistically significant relationship between Gender and GayMarriage (X-squared = 130.95, df = 2, p-value < 2.2e-16). In other words, female and male respondents answered the gay marriage question differently.
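For readers who want to see where the test statistic comes from, here is a minimal sketch that recomputes it by hand. It assumes the observed counts are typed in from the table earlier in this report; the names `observed`, `expected`, `chi_sq`, `deg_free`, and `p_value` are illustrative and not part of the original analysis.

```r
# Observed counts copied from the $observed table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender = c("Female", "Male"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

chi_sq
deg_free
p_value
```

The statistic and degrees of freedom computed this way should agree with the `chisq.test()` output above, since R does not apply a continuity correction to tables larger than 2 x 2.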