Rebecca Gibble Homework 3

Variable Selection & Research Question

Categorical (Independent) Variable

The independent variable I will be analyzing is GayMarriage, which tells us whether the respondent favors or opposes gay marriage.

Continuous (Dependent) Variable

The dependent variable I will be analyzing is NumChildren which tells us how many children the respondent has.

Research Question/Hypothesis

I hypothesize that there is a relationship between number of children a respondent has and whether they support gay marriage or not.

Data Prep

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.6.2

voterdata <- read_csv("/Users/rebeccagibble/Downloads/(Data)Abbreviated Labeled Voter 2017 Dataset.csv")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   NumChildren = col_double(),
##   Immigr_Economy_GiveTake = col_double(),
##   ft_fem_2017 = col_double(),
##   ft_immig_2017 = col_double(),
##   ft_police_2017 = col_double(),
##   ft_dem_2017 = col_double(),
##   ft_rep_2017 = col_double(),
##   ft_evang_2017 = col_double(),
##   ft_muslim_2017 = col_double(),
##   ft_jew_2017 = col_double(),
##   ft_christ_2017 = col_double(),
##   ft_gays_2017 = col_double(),
##   ft_unions_2017 = col_double(),
##   ft_altright_2017 = col_double(),
##   ft_black_2017 = col_double(),
##   ft_white_2017 = col_double(),
##   ft_hisp_2017 = col_double()
## )

## See spec(...) for full column specifications.

data<-voterdata%>%
  select(NumChildren, GayMarriage)%>%
  filter(GayMarriage %in% c("Favor","Oppose"))

Comparison of Means

Table comparing mean of continuous variable between groups

data%>%
  select(GayMarriage, NumChildren)%>%
  filter(GayMarriage %in% c("Favor","Oppose"))%>%
  group_by(GayMarriage)%>%
  summarize(Number_Children=mean(NumChildren,na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 2 x 2
##   GayMarriage Number_Children
##   <chr>                 <dbl>
## 1 Favor                 0.326
## 2 Oppose                0.467

Visualization

data%>%
  ggplot()+
  geom_histogram(aes(x=NumChildren))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 99 rows containing non-finite values (stat_bin).

Interpretation

The table shows that respondents who favor gay marriage have on average 0.33 children, while those who oppose gay marriage have on average 0.47 children. Since no respondent can have a fraction of a child, these averages are best read comparatively: respondents who favor gay marriage tend to have fewer children than those who oppose it.

Comparison of Distributions

data%>%
  ggplot()+
  geom_histogram(aes(x=NumChildren))+
  facet_wrap(~GayMarriage)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 99 rows containing non-finite values (stat_bin).

Interpretation

By analyzing these histograms side by side, we see that their shapes are very similar. The biggest difference is in the bar for 0 children, which is taller for those who favor gay marriage than for those who oppose it. Because the shapes are so similar, the difference in bar heights may mostly reflect that more respondents favor gay marriage overall, and there may not be a real relationship between number of children and support for gay marriage. We will keep analyzing to understand the relationship better.
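One way to check whether a taller bar simply reflects a larger group is to compare within-group proportions instead of raw counts. The sketch below is a minimal base R illustration on a small made-up data frame (`toy` and all of its values are hypothetical, not the survey data). It assumes that the share of each group at each number of children is the quantity of interest.

```r
# Hypothetical toy data standing in for the filtered survey data;
# the values are made up for illustration only.
toy <- data.frame(
  GayMarriage = c(rep("Favor", 6), rep("Oppose", 4)),
  NumChildren = c(0, 0, 0, 0, 1, 2, 0, 0, 1, 3)
)

# Count respondents in each (group, number of children) cell.
counts <- table(toy$GayMarriage, toy$NumChildren)

# Convert counts to within-group proportions (each row sums to 1),
# so the groups can be compared even though their sizes differ.
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

If the rows of proportions look alike, the groups have similar distributions and differ mainly in size. If they differ, the relationship is worth testing formally.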

Sampling Distribution & T-test

Group 1

data1<-data%>%
  filter(GayMarriage=="Favor")

Group 2

data2<-data%>%
  filter(GayMarriage=="Oppose")

Drawing 10,000 samples of 40 respondents and calculating means of continuous variables for each group with respective sampling distributions

Group 1

favor_data<-replicate(10000,
sample(data1$NumChildren, 40)%>%mean(na.rm=TRUE)
)%>%
  data.frame()%>%
  rename("mean"=1)

favor_data%>%
  ggplot()+
  geom_histogram(aes(x=mean),fill="green")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Group 2

oppose_data<-replicate(10000,
sample(data2$NumChildren, 40)%>%mean(na.rm=TRUE)
)%>%
  data.frame()%>%
  rename("mean"=1)

oppose_data%>%
  ggplot()+
  geom_histogram(aes(x=mean),fill="red")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Interpretation of these histograms

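Because each histogram shows the means of 10,000 samples of 40 respondents rather than individual responses, these sampling distributions are far closer to a normal distribution than the population distribution is, as the central limit theorem predicts.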
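The behavior of these sampling distributions can also be checked numerically. The sketch below uses a simulated population (Poisson draws with an arbitrary rate, which is an assumption and not the survey data) and compares the spread of 10,000 sample means with the theoretical standard error, sd / sqrt(n).

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical "population" of child counts (simulated Poisson draws);
# this is not the survey data.
population <- rpois(5000, lambda = 0.4)

n <- 40  # sample size used in the text

# Means of 10,000 random samples of size n, drawn without replacement.
sample_means <- replicate(10000, mean(sample(population, n)))

# Spread of the sampling distribution vs. the theoretical standard error.
observed_se <- sd(sample_means)
theoretical_se <- sd(population) / sqrt(n)

print(c(observed = observed_se, theoretical = theoretical_se))
```

The two values should be close, and the sample means should center on the population mean. This is why a t-test on the group means is reasonable even though individual child counts are not normally distributed.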

T-Test Set Up

data%>%
  summarize(NumChildren=mean(NumChildren,na.rm = TRUE))

## # A tibble: 1 x 1
##   NumChildren
##         <dbl>
## 1       0.395

If the number of children a person has were unrelated to their support for gay marriage, there should be very little difference between the overall mean of 0.395 and the group-wise averages for both sides, favor and oppose.

data%>%
  group_by(GayMarriage)%>%
  summarize(NumChildren=mean(NumChildren,na.rm = TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 2 x 2
##   GayMarriage NumChildren
##   <chr>             <dbl>
## 1 Favor             0.326
## 2 Oppose            0.467

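These averages are approximately the same distance from 0.395, but in opposite directions: about 0.07 below it for those who favor and about 0.07 above it for those who oppose. So the group means do stray a bit from the overall mean.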

Hypotheses

Null Hypothesis: There is no difference in the mean number of children between those who favor and those who oppose gay marriage.

Alt Hypothesis: There is a difference in the mean number of children between those who favor and those who oppose gay marriage.
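To make the null hypothesis concrete, the sketch below shuffles group labels on simulated data and records the difference in group means each time. This is a permutation-style illustration, not the t-test used in this homework. The data, group sizes, and rate are all hypothetical.

```r
set.seed(3)  # arbitrary seed so the sketch is reproducible

# Hypothetical simulated data: everyone's child count comes from the
# same distribution, so the null hypothesis is true by construction.
group <- rep(c("Favor", "Oppose"), times = c(300, 250))
children <- rpois(length(group), lambda = 0.4)

# Shuffle the labels many times; each shuffle breaks any link between
# group and child count, mimicking a world with no relationship.
diffs <- replicate(2000, {
  shuffled <- sample(group)
  mean(children[shuffled == "Favor"]) - mean(children[shuffled == "Oppose"])
})

print(summary(diffs))
```

Under the null, the differences cluster around 0. An observed difference far out in the tails of such a distribution would be evidence against the null hypothesis.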

T-Test Execution

t.test(NumChildren~GayMarriage, data=data)

## 
##  Welch Two Sample t-test
## 
## data:  NumChildren by GayMarriage
## t = -6.4456, df = 6240.5, p-value = 1.237e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.18473685 -0.09857266
## sample estimates:
##  mean in group Favor mean in group Oppose 
##            0.3258237            0.4674785

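There is a statistically significant difference in the mean number of children between those who favor and those who oppose gay marriage. We know this because the p-value (1.237e-10) is far smaller than 0.05, so we reject the null hypothesis. The 95 percent confidence interval for the difference (favor minus oppose) runs from about -0.18 to -0.10, meaning those who favor gay marriage have on average about 0.10 to 0.18 fewer children than those who oppose it.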
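For reference, the numbers reported by a Welch two-sample t-test can be reproduced by hand. The sketch below does this on simulated groups (hypothetical data, with arbitrary sizes and rates) and compares the result with `t.test()`, which uses the Welch method by default.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

# Hypothetical simulated groups; not the survey data.
favor  <- rpois(300, lambda = 0.3)
oppose <- rpois(250, lambda = 0.5)

# Welch t statistic: difference in means divided by its standard error.
v1 <- var(favor) / length(favor)
v2 <- var(oppose) / length(oppose)
t_manual <- (mean(favor) - mean(oppose)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(favor) - 1) + v2^2 / (length(oppose) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with R's built-in Welch test.
built_in <- t.test(favor, oppose)
print(c(t = t_manual, df = df_manual, p = p_manual))
```

The manual values should match `built_in$statistic`, `built_in$parameter`, and `built_in$p.value`. The non-integer degrees of freedom in the output above (6240.5) come from this same approximation.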