Introduction

I hypothesize that there is a relationship between Gender and GayMarriage; that is, that males and females respond differently when asked whether they favor or oppose gay marriage. I will follow the steps below to test this hypothesis. Here, Gender is the independent variable and GayMarriage is the dependent variable.

1. Load Packages

Loading the necessary packages.

library(readr)
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

2. Import Data

Importing the data into R and naming it Voter_Data.

Voter_Data = read_csv("/Users/sakif/Downloads/Abbreviated Voter Dataset Labeled.csv")

## 
## ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   NumChildren = col_double(),
##   Immigr_Economy_GiveTake = col_double(),
##   ft_fem_2017 = col_double(),
##   ft_immig_2017 = col_double(),
##   ft_police_2017 = col_double(),
##   ft_dem_2017 = col_double(),
##   ft_rep_2017 = col_double(),
##   ft_evang_2017 = col_double(),
##   ft_muslim_2017 = col_double(),
##   ft_jew_2017 = col_double(),
##   ft_christ_2017 = col_double(),
##   ft_gays_2017 = col_double(),
##   ft_unions_2017 = col_double(),
##   ft_altright_2017 = col_double(),
##   ft_black_2017 = col_double(),
##   ft_white_2017 = col_double(),
##   ft_hisp_2017 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.

head(Voter_Data)

## # A tibble: 6 x 53
##   gender race  education familyincome children region urbancity Vote2012
##   <chr>  <chr> <chr>     <chr>        <chr>    <chr>  <chr>     <chr>   
## 1 Female White 4-year    Prefer not … No       West   Suburb    Barack …
## 2 Female White Some Col… $60K-$69,999 No       West   Rural Ar… Mitt Ro…
## 3 Male   White High Sch… $50K-$59,999 No       Midwe… City      Mitt Ro…
## 4 Male   White Some Col… $70K-$79,999 No       South  City      Barack …
## 5 Male   White 4-year    $40K-$49,999 No       South  Suburb    Mitt Ro…
## 6 Female White 2-year    $30K-$39,999 No       West   Suburb    Barack …
## # … with 45 more variables: Vote2016 <chr>, TrumpSanders <chr>,
## #   PartyRegistration <chr>, PartyIdentification <chr>,
## #   PartyIdentification2 <chr>, PartyIdentification3 <chr>,
## #   NewsPublicAffairs <chr>, DemPrimary <chr>, RepPrimary <chr>,
## #   ImmigrantContributions <chr>, ImmigrantNaturalization <chr>,
## #   ImmigrationShouldBe <chr>, Abortion <chr>, GayMarriage <chr>,
## #   DeathPenalty <chr>, DeathPenaltyFreq <chr>, TaxWealthy <chr>,
## #   Healthcare <chr>, GlobWarmExist <chr>, GlobWarmingSerious <chr>,
## #   AffirmativeAction <chr>, Religion <chr>, ReligiousImportance <chr>,
## #   ChurchAttendance <chr>, PrayerFrequency <chr>, NumChildren <dbl>,
## #   areatype <chr>, GunOwnership <chr>, EconomyBetterWorse <chr>,
## #   Immigr_Economy_GiveTake <dbl>, ft_fem_2017 <dbl>, ft_immig_2017 <dbl>,
## #   ft_police_2017 <dbl>, ft_dem_2017 <dbl>, ft_rep_2017 <dbl>,
## #   ft_evang_2017 <dbl>, ft_muslim_2017 <dbl>, ft_jew_2017 <dbl>,
## #   ft_christ_2017 <dbl>, ft_gays_2017 <dbl>, ft_unions_2017 <dbl>,
## #   ft_altright_2017 <dbl>, ft_black_2017 <dbl>, ft_white_2017 <dbl>,
## #   ft_hisp_2017 <dbl>

3. Data Preparation

Identifying two categorical variables, Gender and GayMarriage. To examine the relationship between them, I select both variables, rename gender to Gender, keep only the rows where GayMarriage is "Favor", "Oppose", or "Not sure", and store the result as Gay_Marriage. It shows whether each male or female respondent favors gay marriage, opposes it, or is not sure.

Gay_Marriage = Voter_Data %>%
  select(gender, GayMarriage) %>%
  rename(Gender = gender) %>%
  filter(GayMarriage %in% c("Favor", "Oppose", "Not sure"))
  
Gay_Marriage

## # A tibble: 7,971 x 2
##    Gender GayMarriage
##    <chr>  <chr>      
##  1 Female Favor      
##  2 Female Oppose     
##  3 Male   Favor      
##  4 Male   Favor      
##  5 Male   Oppose     
##  6 Female Favor      
##  7 Female Oppose     
##  8 Male   Oppose     
##  9 Male   Favor      
## 10 Female Favor      
## # … with 7,961 more rows

4. Data Summary

We have two variables, one independent and one dependent. Here we see how respondents are distributed across the categories of each.

(a) IV Response Summary

Here is the distribution of responses for the independent variable, Gender.

table(Gay_Marriage$Gender) %>%
  prop.table() %>%
  round(2)

## 
## Female   Male 
##   0.51   0.49

(b) DV Response Summary

Here is the distribution of responses for the dependent variable, GayMarriage.

table(Gay_Marriage$GayMarriage) %>%
  prop.table() %>%
  round(2)

## 
##    Favor Not sure   Oppose 
##     0.45     0.12     0.43

5. Expected Values

This table shows the expected number of responses for each combination of categories, that is, the counts we would expect if Gender and GayMarriage were unrelated.

chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)[7]

## $expected
##                         Gay_Marriage$Gender
## Gay_Marriage$GayMarriage    Female      Male
##                 Favor    1825.1232 1767.8768
##                 Not sure  484.5999  469.4001
##                 Oppose   1739.2769 1684.7231
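
As a rough check on where these numbers come from, the sketch below recomputes the expected counts by hand: each expected count is the row total times the column total, divided by the grand total. The counts are copied from the observed-values table in this report; the object names observed and expected are my own choices for illustration.

```r
# Observed counts copied from the observed-values table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender      = c("Female", "Male"))
)

# Expected count for a cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Rounded for comparison with the $expected table printed by chisq.test()
round(expected, 4)
```

For example, the Favor/Female cell is 3593 * 4049 / 7971, which is about 1825.12.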

6. Observed Values

This table shows the observed number of responses for each combination of categories.

chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)[6]

## $observed
##                         Gay_Marriage$Gender
## Gay_Marriage$GayMarriage Female Male
##                 Favor      2014 1579
##                 Not sure    548  406
##                 Oppose     1487 1937

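The observed counts differ noticeably from the expected counts. Among females, more respondents than expected chose Favor (2014 vs. about 1825) and Not sure (548 vs. about 485), and fewer than expected chose Oppose (1487 vs. about 1739). Among males the pattern is reversed: fewer than expected chose Favor (1579 vs. about 1768) and Not sure (406 vs. about 469), and more than expected chose Oppose (1937 vs. about 1685).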
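
The positional indexes [6] and [7] used above can also be written by name, which is easier to read. The sketch below, built on a matrix of the observed counts from this report rather than on the full data set, pulls out the observed and expected tables by name and adds the Pearson residuals, which show the direction and relative size of each cell's departure from its expected count. The names observed and result are illustrative.

```r
# Observed counts copied from the observed-values table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender      = c("Female", "Male"))
)

result <- chisq.test(observed)

# Components of the test result, accessed by name instead of position
result$observed   # same table as result[6]
result$expected   # same table as result[7]

# Pearson residuals: (observed - expected) / sqrt(expected)
# Positive values mean more responses than expected, negative values fewer
round(result$residuals, 2)
```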

7. Data Analysis

Calculating column percentages (the share of each response within each gender) to highlight the relationship of interest between the variables.

table(Gay_Marriage$GayMarriage, Gay_Marriage$Gender) %>%
  prop.table(2)

##           
##               Female      Male
##   Favor    0.4974068 0.4026007
##   Not sure 0.1353421 0.1035186
##   Oppose   0.3672512 0.4938807
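
The second argument of prop.table() decides what the proportions are relative to. A small sketch using the observed counts from this report (the names observed, col_pct and row_pct are illustrative): margin = 2 divides each cell by its column total, so each gender sums to 1, while margin = 1 would divide by row totals instead.

```r
# Observed counts copied from the observed-values table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender      = c("Female", "Male"))
)

# Column proportions: share of each response within each gender
col_pct <- prop.table(observed, margin = 2)

# Row proportions: share of each gender within each response (not used above)
row_pct <- prop.table(observed, margin = 1)

# Each gender's column adds up to 1 when margin = 2
colSums(col_pct)

# Shown as percentages with one decimal place
round(100 * col_pct, 1)
```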

8. Data Visualization

Visualizing the results of the column-percentage table.

Gay_Marriage %>%
  group_by(Gender, GayMarriage) %>% 
  summarize(n = n()) %>%
  mutate(Percent = n/sum(n)) %>%
  ggplot() +
  geom_col(aes(x = Gender, y = Percent, fill = GayMarriage))

## `summarise()` regrouping output by 'Gender' (override with `.groups` argument)
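
One possible variation on this chart, offered only as a sketch: it places the bars side by side instead of stacking them and labels the y axis in percent, which can make the gender comparison easier to read. It is built from the observed counts in this report rather than from Voter_Data, and the names counts, plot_data and p are illustrative.

```r
library(dplyr)
library(ggplot2)

# Observed counts copied from the observed-values table in this report
counts <- data.frame(
  Gender      = rep(c("Female", "Male"), each = 3),
  GayMarriage = rep(c("Favor", "Not sure", "Oppose"), times = 2),
  n           = c(2014, 548, 1487, 1579, 406, 1937)
)

# Share of each response within each gender (the column percentages)
plot_data <- counts %>%
  group_by(Gender) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()

# Side-by-side bars (position = "dodge") with a percent-formatted y axis
p <- ggplot(plot_data) +
  geom_col(aes(x = Gender, y = Percent, fill = GayMarriage),
           position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of respondents within gender", fill = "Response")

p
```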

9. Data Interpretation

This analysis clearly shows that about 50% of females favor gay marriage, compared with only 40% of males. Conversely, 49% of males oppose it, compared with only 37% of females.

10. Chi-Square Statistical Test

Calculating a chi-square test to determine if there is a statistically significant relationship between the variables.

chisq.test(Gay_Marriage$GayMarriage, Gay_Marriage$Gender)

## 
##  Pearson's Chi-squared test
## 
## data:  Gay_Marriage$GayMarriage and Gay_Marriage$Gender
## X-squared = 130.95, df = 2, p-value < 2.2e-16
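
To make the reported statistic less of a black box, the sketch below recomputes it by hand from the observed counts in this report: the chi-square statistic is the sum over all cells of (observed - expected)^2 / expected, with (rows - 1) * (columns - 1) degrees of freedom. The object names are illustrative.

```r
# Observed counts copied from the observed-values table in this report
observed <- matrix(
  c(2014, 1579,
     548,  406,
    1487, 1937),
  nrow = 3, byrow = TRUE,
  dimnames = list(GayMarriage = c("Favor", "Not sure", "Oppose"),
                  Gender      = c("Female", "Male"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic: sum of (O - E)^2 / E over all six cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (3 - 1) * (2 - 1) = 2
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of a chi-square distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 2)
df
p_value
```

The hand-computed statistic agrees with the X-squared = 130.95 and df = 2 reported by chisq.test().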

11. Interpret the Chi-Square Test

This result indicates that there is a statistically significant relationship between Gender and GayMarriage (X-squared = 130.95, df = 2, p-value < 2.2e-16). In other words, female and male respondents answered the GayMarriage question differently.

Skills Drill 2

Sakif Shadman