Rebecca Gibble Final Project

Analysis Overview

I am interested in how respondents whose highest level of education is a high school diploma compare with those who graduated from college (variable: educ_2019, the independent variable) in their views on women and women's role in society. I will analyze the following dependent variables:

Gender equality issue importance (imiss_y_2019)
Gender roles (sexism1_2019)
- Women should return to their traditional roles in society
Modern sexism demands (sexism2_2019)
- Women often miss out on good jobs because of discrimination
Feeling towards women on a scale from 1-100 (Women_2019)

*Note: throughout this analysis, “college graduates” refers to respondents who graduated from a four-year institution.

Importing the data

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)

## Warning: package 'readr' was built under R version 3.6.2

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.6.2

data<-read_csv("/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   weight_18_24_2018 = col_logical(),
##   izip_2019 = col_character(),
##   housevote_other_2019 = col_character(),
##   senatevote_other_2019 = col_character(),
##   senatevote2_other_2019 = col_character(),
##   SenCand1Name_2019 = col_character(),
##   SenCand1Party_2019 = col_character(),
##   SenCand2Name_2019 = col_character(),
##   SenCand2Party_2019 = col_character(),
##   SenCand3Name_2019 = col_character(),
##   SenCand3Party_2019 = col_character(),
##   SenCand1Name2_2019 = col_character(),
##   SenCand1Party2_2019 = col_character(),
##   SenCand2Name2_2019 = col_character(),
##   SenCand2Party2_2019 = col_character(),
##   SenCand3Name2_2019 = col_character(),
##   SenCand3Party2_2019 = col_character(),
##   governorvote_other_2019 = col_character(),
##   GovCand1Name_2019 = col_character(),
##   GovCand1Party_2019 = col_character()
##   # ... with 108 more columns
## )
## ℹ Use `spec()` for the full column specifications.

## Warning: 800 parsing failures.
##  row               col           expected           actual                                                     file
## 2033 weight_18_24_2018 1/0/T/F/TRUE/FALSE .917710168467982 '/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv'
## 2828 weight_18_24_2018 1/0/T/F/TRUE/FALSE 1.41022291345592 '/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv'
## 4511 weight_18_24_2018 1/0/T/F/TRUE/FALSE 1.77501243840922 '/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv'
## 7264 weight_18_24_2018 1/0/T/F/TRUE/FALSE 1.29486870319614 '/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv'
## 7277 weight_18_24_2018 1/0/T/F/TRUE/FALSE 1.44972719707603 '/Users/rebeccagibble/Downloads/Voter Data 2019 (1).csv'
## .... ................. .................. ................ ........................................................
## See problems(...) for more details.
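The parsing failures above arise because readr guessed that weight_18_24_2018 was a logical column, so the decimal weights further down the file could not be parsed. One way to avoid this, assuming the column should be numeric, is to declare its type explicitly. The sketch below uses a hypothetical toy file rather than the survey CSV.

```r
library(readr)

# Hypothetical toy file: the weight column is blank in the first rows,
# mimicking the pattern that leads readr to guess a logical column.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,weight_18_24_2018",
             "1,",
             "2,",
             "3,0.917710168467982"), tmp)

# Declare the column types up front instead of relying on guessing.
toy <- read_csv(tmp,
                col_types = cols(id = col_double(),
                                 weight_18_24_2018 = col_double()))

toy$weight_18_24_2018
```

With the type declared, blank cells are read as NA and the decimal weights are kept as numbers.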

head(data)

## # A tibble: 6 x 1,282
##   weight_2016 weight_2017 weight_panel_20… weight_latino_2… weight_18_24_20…
##         <dbl>       <dbl>            <dbl>            <dbl> <lgl>           
## 1       0.358       0.438            0.503               NA NA              
## 2       0.563       0.366            0.389               NA NA              
## 3       0.552       0.550            0.684               NA NA              
## 4       0.208      NA               NA                   NA NA              
## 5       0.334       0.346            0.322               NA NA              
## 6       0.207       0.148            0.594               NA NA              
## # … with 1,277 more variables: weight_overall_2018 <dbl>, weight_2019 <dbl>,
## #   weight1_2018 <dbl>, weight1_2019 <dbl>, weight2_2019 <dbl>,
## #   weight3_2019 <dbl>, cassfullcd <dbl>, vote2020_2019 <dbl>,
## #   trumpapp_2019 <dbl>, fav_trump_2019 <dbl>, fav_obama_2019 <dbl>,
## #   fav_hrc_2019 <dbl>, fav_sanders_2019 <dbl>, fav_putin_2019 <dbl>,
## #   fav_schumer_2019 <dbl>, fav_pelosi_2019 <dbl>, fav_comey_2019 <dbl>,
## #   fav_mueller_2019 <dbl>, fav_mcconnell_2019 <dbl>, fav_kavanaugh_2019 <dbl>,
## #   fav_biden_2019 <dbl>, fav_warren_2019 <dbl>, fav_harris_2019 <dbl>,
## #   fav_gillibrand_2019 <dbl>, fav_patrick_2019 <dbl>, fav_booker_2019 <dbl>,
## #   fav_garcetti_2019 <dbl>, fav_klobuchar_2019 <dbl>, fav_gorsuch_2019 <dbl>,
## #   fav_kasich_2019 <dbl>, fav_haley_2019 <dbl>, fav_bloomberg_2019 <dbl>,
## #   fav_holder_2019 <dbl>, fav_avenatti_2019 <dbl>, fav_castro_2019 <dbl>,
## #   fav_landrieu_2019 <dbl>, fav_orourke_2019 <dbl>,
## #   fav_hickenlooper_2019 <dbl>, fav_pence_2019 <dbl>, add_confirm_2019 <dbl>,
## #   izip_2019 <chr>, votereg_2019 <dbl>, votereg_f_2019 <dbl>,
## #   regzip_2019 <dbl>, region_2019 <dbl>, turnout18post_2019 <dbl>,
## #   tsmart_G2018_2019 <dbl>, tsmart_G2018_vote_type_2019 <dbl>,
## #   tsmart_P2018_2019 <dbl>, tsmart_P2018_party_2019 <dbl>,
## #   tsmart_P2018_vote_type_2019 <dbl>, housevote_2019 <dbl>,
## #   housevote_other_2019 <chr>, senatevote_2019 <dbl>,
## #   senatevote_other_2019 <chr>, senatevote2_2019 <dbl>,
## #   senatevote2_other_2019 <chr>, SenCand1Name_2019 <chr>,
## #   SenCand1Party_2019 <chr>, SenCand2Name_2019 <chr>,
## #   SenCand2Party_2019 <chr>, SenCand3Name_2019 <chr>,
## #   SenCand3Party_2019 <chr>, SenCand1Name2_2019 <chr>,
## #   SenCand1Party2_2019 <chr>, SenCand2Name2_2019 <chr>,
## #   SenCand2Party2_2019 <chr>, SenCand3Name2_2019 <chr>,
## #   SenCand3Party2_2019 <chr>, governorvote_2019 <dbl>,
## #   governorvote_other_2019 <chr>, GovCand1Name_2019 <chr>,
## #   GovCand1Party_2019 <chr>, GovCand2Name_2019 <chr>,
## #   GovCand2Party_2019 <chr>, GovCand3Name_2019 <chr>,
## #   GovCand3Party_2019 <chr>, inst_court_2019 <dbl>, inst_media_2019 <dbl>,
## #   inst_congress_2019 <dbl>, inst_justice_2019 <dbl>, inst_FBI_2019 <dbl>,
## #   inst_military_2019 <dbl>, inst_church_2019 <dbl>, inst_business_2019 <dbl>,
## #   Democrats_2019 <dbl>, Republicans_2019 <dbl>, Men_2019 <dbl>,
## #   Women_2019 <dbl>, wm_2019 <dbl>, ww_2019 <dbl>, bm_2019 <dbl>,
## #   bw_2019 <dbl>, hm_2019 <dbl>, hw_2019 <dbl>, rwm_2019 <dbl>,
## #   rww_2019 <dbl>, rbm_2019 <dbl>, rbw_2019 <dbl>, pwm_2019 <dbl>, …

Recoding and selecting variables

Recoding variables from numeric form to their labeled form.

Selecting only necessary variables.

newdata<-data%>%
  mutate(GenderEquality = ifelse(imiss_y_2019==1,"Very Important",
                                 ifelse(imiss_y_2019==2,"Somewhat Important",
                                        ifelse(imiss_y_2019==3, "Not very Important",
                                               ifelse(imiss_y_2019==4, "Unimportant", NA)))),
         GenderRoles = ifelse(sexism1_2019==1, "Strongly Agree",
                              ifelse(sexism1_2019==2, "Somewhat Agree",
                                     ifelse(sexism1_2019==3, "Somewhat Disagree",
                                            ifelse(sexism1_2019==4, "Strongly Disagree",NA)))),
         ModernSexism = ifelse(sexism2_2019==1, "Strongly Agree",
                               ifelse(sexism2_2019==2, "Somewhat Agree",
                                      ifelse(sexism2_2019==3, "Somewhat Disagree",
                                             ifelse(sexism2_2019==4, "Strongly Disagree",NA)))),
         FeelingAboutWomen = ifelse(Women_2019>100,NA, Women_2019))%>%
  select(GenderEquality,GenderRoles,ModernSexism,FeelingAboutWomen,educ_2019)

head(newdata)

## # A tibble: 6 x 5
##   GenderEquality    GenderRoles      ModernSexism     FeelingAboutWom… educ_2019
##   <chr>             <chr>            <chr>                       <dbl>     <dbl>
## 1 Very Important    Strongly Disagr… Strongly Disagr…               80         5
## 2 <NA>              <NA>             <NA>                           NA        NA
## 3 Very Important    Strongly Disagr… Somewhat Disagr…               71         2
## 4 Not very Importa… Strongly Disagr… Strongly Agree                 10         3
## 5 Somewhat Importa… Strongly Disagr… Somewhat Disagr…               95         5
## 6 Very Important    Strongly Disagr… Strongly Disagr…              100         4
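The nested ifelse() recode above can also be sketched with factor(), which maps codes to labels in one step and keeps the response categories in scale order rather than alphabetical order. This is a minimal illustration on hypothetical toy responses, not a replacement for the code used in this report; the names responses and importance_levels are made up for the example.

```r
# Hypothetical toy responses using the same 1-4 coding as imiss_y_2019;
# 8 stands in for any code outside 1-4.
responses <- c(1, 3, 2, 4, 8, NA, 1)

importance_levels <- c("Very Important", "Somewhat Important",
                       "Not very Important", "Unimportant")

# Codes not listed in `levels` become NA, matching the final NA branch
# of the nested ifelse().
GenderEquality <- factor(responses, levels = 1:4, labels = importance_levels)

table(GenderEquality)
```

Because the factor stores its levels in the order given, a crosstab or bar chart built from it lists the categories from “Very Important” to “Unimportant”.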

Filtering Data

Filtering data to only select two groups needed for analysis.

Previewing data to confirm changes.

newdata2<-newdata%>%
  filter(educ_2019 %in% c(2,5))%>%
  mutate(EducationLevel = ifelse(educ_2019==2, "High school graduate",
                                 ifelse(educ_2019==5, "College graduate",NA)))%>%
  select(GenderEquality,GenderRoles,ModernSexism,FeelingAboutWomen,EducationLevel)

head(newdata2)

## # A tibble: 6 x 5
##   GenderEquality   GenderRoles   ModernSexism   FeelingAboutWom… EducationLevel 
##   <chr>            <chr>         <chr>                     <dbl> <chr>          
## 1 Very Important   Strongly Dis… Strongly Disa…               80 College gradua…
## 2 Very Important   Strongly Dis… Somewhat Disa…               71 High school gr…
## 3 Somewhat Import… Strongly Dis… Somewhat Disa…               95 College gradua…
## 4 Very Important   Strongly Dis… Strongly Disa…               99 College gradua…
## 5 Very Important   Strongly Dis… Strongly Disa…               82 College gradua…
## 6 Very Important   Strongly Dis… Strongly Disa…               99 College gradua…

Analyses for each dependent variable vs. the independent variable

Analysis 1

Analyzing relationship between education level (EducationLevel) and gender equality issue importance (GenderEquality).

Crosstab: Gender Equality

table(newdata2$GenderEquality,newdata2$EducationLevel)%>%
  prop.table(2)

##                     
##                      College graduate High school graduate
##   Not very Important        0.1607247            0.1713198
##   Somewhat Important        0.2916423            0.3362944
##   Unimportant               0.1455289            0.1472081
##   Very Important            0.4021040            0.3451777

From this crosstab we can see that college graduates are more likely than high school graduates to say gender equality is very important (40.2% versus 34.5%).
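The second argument to prop.table() controls which totals the proportions are based on. As a small sketch with hypothetical counts (not the survey data), margin = 2 divides each cell by its column total, which is what lets the two education groups be compared directly.

```r
# Hypothetical counts, not the survey data.
counts <- matrix(c(40, 30, 20, 10,
                   30, 35, 20, 15),
                 nrow = 4,
                 dimnames = list(
                   Importance = c("Very", "Somewhat", "Not very", "Unimportant"),
                   Education  = c("College graduate", "High school graduate")))

# margin = 2 divides each cell by its column total, so every
# education group sums to 1.
col_props <- prop.table(counts, margin = 2)

round(col_props, 2)
colSums(col_props)
```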

Stacked Bar Chart: Gender Equality

newdata2%>%
  filter(!is.na(GenderEquality))%>%
  group_by(EducationLevel,GenderEquality)%>%
  summarize(n=n())%>%
  mutate(percent=n/sum(n))%>%
  ggplot()+
  geom_col(aes(x=EducationLevel,y=percent,fill=GenderEquality))+
  theme_minimal()

## `summarise()` regrouping output by 'EducationLevel' (override with `.groups` argument)

The stacked bar chart shows the same proportions as the crosstab above and makes the categories easier to compare. College graduates are more likely to rate gender equality as very important: the purple segment is taller for college graduates than for high school graduates.

Chi-Square Test: Gender Equality

chisq.test(newdata2$GenderEquality,newdata2$EducationLevel)

## 
##  Pearson's Chi-squared test
## 
## data:  newdata2$GenderEquality and newdata2$EducationLevel
## X-squared = 12.889, df = 3, p-value = 0.004883

There is a statistically significant relationship between education level and the importance of gender equality. We know this because the p-value from the chi-squared test (0.004883) is less than 0.05.
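To make the test less of a black box, the sketch below reproduces Pearson's chi-squared statistic by hand on hypothetical counts (not the survey data) and compares it with chisq.test(). The statistic sums, over all cells, the squared gap between the observed count and the count expected if education and opinion were unrelated.

```r
# Hypothetical counts, not the survey data
# (rows = response categories, columns = education groups).
counts <- matrix(c(40, 30, 20, 10,
                   30, 35, 20, 15), nrow = 4)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)

result <- chisq.test(counts)
c(by_hand = x2, chisq_test = unname(result$statistic))
```

With four response categories and two education groups, the degrees of freedom are (4 - 1) * (2 - 1) = 3, which matches the df reported in the test output.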

Analysis 2

Analyzing relationship between education level (EducationLevel) and gender roles (women should return to their traditional roles in society) (variable: GenderRoles).

Crosstab: Gender Roles

table(newdata2$GenderRoles,newdata2$EducationLevel)%>%
  prop.table(2)

##                    
##                     College graduate High school graduate
##   Somewhat Agree          0.10599884           0.16379860
##   Somewhat Disagree       0.22947001           0.29063098
##   Strongly Agree          0.03669190           0.07329509
##   Strongly Disagree       0.62783925           0.47227533

College graduates are approximately 1.3 times as likely as high school graduates to strongly disagree that women should return to their traditional roles in society (62.8% versus 47.2%).
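The 1.3 figure is the ratio of the two “Strongly Disagree” column proportions in the crosstab. As a quick check of the arithmetic:

```r
# Column proportions for "Strongly Disagree" copied from the crosstab above
college     <- 0.62783925
high_school <- 0.47227533

# Ratio of the two proportions (how many times as likely)
ratio <- college / high_school

# Difference in percentage points
gap_points <- (college - high_school) * 100

round(ratio, 2)
round(gap_points, 1)
```

The ratio is about 1.33, and the gap between the two groups is about 15.6 percentage points.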

Stacked Bar Chart: Gender Roles

newdata2%>%
  filter(!is.na(GenderRoles))%>%
  group_by(EducationLevel,GenderRoles)%>%
  summarize(n=n())%>%
  mutate(percent=n/sum(n))%>%
  ggplot()+
  geom_col(aes(x=EducationLevel,y=percent,fill=GenderRoles))+
  theme_minimal()

## `summarise()` regrouping output by 'EducationLevel' (override with `.groups` argument)

The stacked bar chart shows that college graduates are, in general, more likely to reject the idea that women should return to their traditional roles in society. We see this in the “Strongly Disagree” category (purple bar), which is larger for college graduates than for high school graduates.

Chi-Square Test: Gender Roles

chisq.test(newdata2$GenderRoles,newdata2$EducationLevel)

## 
##  Pearson's Chi-squared test
## 
## data:  newdata2$GenderRoles and newdata2$EducationLevel
## X-squared = 88.475, df = 3, p-value < 2.2e-16

There is a statistically significant relationship between education level and views on whether women should return to their traditional roles in society. We know this because the p-value from this chi-squared test (less than 2.2e-16) is below 0.05. This p-value is also much smaller than the one in Analysis 1 (0.004883), so the evidence of a relationship is stronger here. Note that a smaller p-value indicates stronger evidence against independence; it does not by itself mean the relationship is larger.
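Because a p-value shrinks as the sample grows, comparing p-values across analyses says more about the strength of the evidence than about the size of the relationship. One common effect-size measure for a crosstab is Cramér's V, computed as sqrt(X^2 / (n * min(rows - 1, cols - 1))). The sketch below uses hypothetical counts (not the survey data), and the helper cramers_v_for() is a name made up for this example. It shows that multiplying every count by 10 shrinks the p-value while leaving V unchanged.

```r
# Hypothetical counts, not the survey data.
counts <- matrix(c(60, 25, 10, 5,
                   45, 30, 15, 10), nrow = 4)

# Hypothetical helper: p-value and Cramér's V for a table of counts.
cramers_v_for <- function(tab) {
  test <- chisq.test(tab)
  k <- min(nrow(tab) - 1, ncol(tab) - 1)
  c(p_value   = test$p.value,
    cramers_v = sqrt(unname(test$statistic) / (sum(tab) * k)))
}

small <- cramers_v_for(counts)       # original sample size
large <- cramers_v_for(counts * 10)  # same proportions, 10 times the sample

rbind(small, large)
```

Applying the same calculation to the tables in this report would require the raw counts, which table() returns before prop.table() is applied.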

Analysis 3

Analyzing relationship between education level (EducationLevel) and modern sexism demands (women often miss out on good jobs because of discrimination) (variable: ModernSexism).

Crosstab: Modern Sexism

table(newdata2$ModernSexism,newdata2$EducationLevel)%>%
  prop.table(2)

##                    
##                     College graduate High school graduate
##   Somewhat Agree           0.2216405            0.2305236
##   Somewhat Disagree        0.2082606            0.2688378
##   Strongly Agree           0.1436882            0.1819923
##   Strongly Disagree        0.4264107            0.3186462

High school graduates are somewhat more likely than college graduates to strongly agree that women often miss out on good jobs because of discrimination (18.2% versus 14.4%). The largest difference is in the “Strongly Disagree” category: 42.6% of college graduates strongly disagree, compared with 31.9% of high school graduates.

Stacked Bar Chart: Modern Sexism

newdata2%>%
  filter(!is.na(ModernSexism))%>%
  group_by(EducationLevel,ModernSexism)%>%
  summarize(n=n())%>%
  mutate(percent=n/sum(n))%>%
  ggplot()+
  geom_col(aes(x=EducationLevel,y=percent,fill=ModernSexism))+
  theme_minimal()

## `summarise()` regrouping output by 'EducationLevel' (override with `.groups` argument)

The stacked bar chart shows more clearly how college and high school graduates differ in their views on discrimination against women in the workplace. In general, college graduates are more likely to think that women are not discriminated against, which shows most clearly in the “Strongly Disagree” category.

Chi-Square Test: Modern Sexism

chisq.test(newdata2$ModernSexism,newdata2$EducationLevel)

## 
##  Pearson's Chi-squared test
## 
## data:  newdata2$ModernSexism and newdata2$EducationLevel
## X-squared = 45.766, df = 3, p-value = 6.359e-10

This chi-squared test shows a statistically significant relationship between education level and opinion on discrimination against women in the workforce. The p-value from this test is 6.359e-10, which is less than 0.05.

Analysis 4

Analyzing relationship between education level (EducationLevel) and feeling towards women on a scale from 1-100 (FeelingAboutWomen).

Table Comparing Means: Feeling Towards Women

newdata3<-newdata2%>%
  filter(EducationLevel %in% c("College graduate","High school graduate"))%>%
  group_by(EducationLevel)%>%
  summarize(Avg_FT_Women = mean(FeelingAboutWomen,na.rm=TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

newdata3

## # A tibble: 2 x 2
##   EducationLevel       Avg_FT_Women
##   <chr>                       <dbl>
## 1 College graduate             76.8
## 2 High school graduate         76.9

The table above shows the mean feeling towards women for college graduates and high school graduates. The means are almost identical, differing by only 0.11059 points.

Difference in means visualization

ggplot()+
geom_col(data=newdata3,aes(x=EducationLevel,y=Avg_FT_Women,fill=Avg_FT_Women))

The graph further shows how close the means are. The difference of only 0.11059 makes the columns look nearly identical. Because the means are so close, I hypothesize that there is no statistically significant difference between high school and college graduates in their average feeling towards women. This hypothesis is tested below.

Visualization: Histogram comparing population distributions

newdata2%>%
  ggplot()+
  geom_histogram(aes(x=FeelingAboutWomen))+
  facet_wrap(~EducationLevel)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 172 rows containing non-finite values (stat_bin).

The above graphs show the population distributions of feeling towards women for college graduates and high school graduates. The two distributions are similar, except that high school graduates show a bigger spike at 100. Neither histogram follows a normal distribution.

Visualization: Histogram comparing sampling distributions

Creating 2 different datasets to filter out the different education levels being analyzed. These datasets will then be used to create sampling distributions.

Data Set 1 (college graduates)

data1=college_data<-newdata2%>%
  filter(EducationLevel=="College graduate")

Data Set 2 (high school graduates)

data2=highschool_data<-newdata2%>%
  filter(EducationLevel=="High school graduate")

Drawing 10,000 random samples of 40 respondents for both data sets shown above then creating a sampling distribution to compare. The sampling distribution shows both datasets on one graph.

# Sample from each group's own ratings (data1 = college graduates,
# data2 = high school graduates) rather than from the pooled newdata2.
college_data<-replicate(10000,
sample(data1$FeelingAboutWomen, 40)%>%mean(na.rm=TRUE))%>%
  data.frame()%>%
  rename("mean"=1)

highschool_data<-replicate(10000,
sample(data2$FeelingAboutWomen, 40)%>%mean(na.rm=TRUE))%>%
  data.frame()%>%
  rename("mean"=1)

ggplot()+
  geom_histogram(data=college_data,aes(x=mean),fill="red",alpha=0.5)+
  geom_histogram(data=highschool_data,aes(x=mean),fill="green",alpha=0.5)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

From this graph we can see right away that these sampling distributions are far closer to a normal distribution than the population distributions were. The two sampling distributions also overlap greatly, with only slight differences between them.
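The near-normal shape is what the central limit theorem predicts for means of repeated samples, even when the underlying ratings are skewed. The sketch below illustrates this with hypothetical simulated ratings (not the survey data): the sample means centre on the mean of the ratings, and their spread is close to the standard deviation of the ratings divided by the square root of the sample size.

```r
set.seed(1)

# Hypothetical skewed 0-100 ratings (many values piled up near 100),
# standing in for one education group's FeelingAboutWomen scores.
ratings <- pmin(100, round(rbeta(2000, 4, 1) * 105))

# 10,000 samples of 40 ratings each, keeping the mean of every sample
n <- 40
sample_means <- replicate(10000, mean(sample(ratings, n)))

c(mean_of_ratings      = mean(ratings),
  mean_of_sample_means = mean(sample_means),
  sd_of_sample_means   = sd(sample_means),
  sd_over_sqrt_n       = sd(ratings) / sqrt(n))
```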

T-Test

Hypotheses

Null Hypothesis: There is no difference in the mean value between the two groups.

Alternative Hypothesis: There is a difference in the mean value between the two groups.

t.test(FeelingAboutWomen~EducationLevel, data=newdata2)

## 
##  Welch Two Sample t-test
## 
## data:  FeelingAboutWomen by EducationLevel
## t = -0.14775, df = 3105.1, p-value = 0.8826
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.578124  1.356953
## sample estimates:
##     mean in group College graduate mean in group High school graduate 
##                           76.78163                           76.89222

The t-test shown above generates a p-value of 0.8826 which is greater than 0.05, therefore signifying that there is not a statistically significant difference between high school and college graduates in their mean feeling towards women. We fail to reject the null hypothesis for this analysis.
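For reference, the Welch test reported by t.test() can be reproduced by hand. The sketch below uses hypothetical simulated ratings for two groups (not the survey data). It computes the t statistic as the difference in means divided by its standard error, derives the Welch-Satterthwaite degrees of freedom, and compares the resulting p-value with the one from t.test().

```r
set.seed(2)

# Hypothetical ratings for two groups, not the survey data.
college     <- rnorm(200, mean = 77, sd = 20)
high_school <- rnorm(150, mean = 77, sd = 24)

# Welch t statistic: difference in means over its standard error,
# with each group's variance kept separate.
v1 <- var(college) / length(college)
v2 <- var(high_school) / length(high_school)
t_stat <- (mean(college) - mean(high_school)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df <- (v1 + v2)^2 /
  (v1^2 / (length(college) - 1) + v2^2 / (length(high_school) - 1))

# Two-sided p-value
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

result <- t.test(college, high_school)
c(by_hand = p_value, t_test = result$p.value)
```

The non-integer degrees of freedom in the output above (3105.1) come from this Welch-Satterthwaite formula, which does not assume the two groups have equal variances.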

Conclusions

Overall, these analyses show that college graduates are more likely than high school graduates to rate gender equality as very important and to strongly disagree that women should return to their traditional roles in society. However, high school graduates are more likely to strongly agree that women miss out on good jobs because of discrimination, whereas college graduates are more likely to strongly disagree with that statement. This could be due to differences in the industries college and high school graduates work in and in their overall experiences in the workforce. Judged by p-values, the strongest evidence of a relationship was for gender roles (Analysis 2), followed by modern sexism demands (Analysis 3) and gender equality importance (Analysis 1). While all three of these analyses showed statistical significance, there was no statistically significant difference between college graduates and high school graduates in their average feeling towards women. The means of these two groups, as well as their population distributions, were almost the same.