Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)
library(mltools)
library(data.table)
library(descr)

Load data

load("gss.Rdata")

Part 1: Data

The General Social Survey (GSS) is part of a continuing study of American public opinion and values since 1972. The purpose is to study the trends in attitudes and behaviors of the American society based on the information gathered.

The dataset contains 57,061 rows and 114 variables.

glimpse(gss)
## Rows: 57,061
## Columns: 114
## $ caseid   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18~
## $ year     <int> 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1~
## $ age      <int> 23, 70, 48, 27, 61, 26, 28, 27, 21, 30, 30, 56, 54, 49, 41, 5~
## $ sex      <fct> Female, Male, Female, Female, Female, Male, Male, Male, Femal~
## $ race     <fct> White, White, White, White, White, White, White, White, Black~
## $ hispanic <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ uscitzn  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ educ     <int> 16, 10, 12, 17, 12, 14, 13, 16, 12, 12, 13, 6, 9, 8, 9, 14, 1~
## $ paeduc   <int> 10, 8, 8, 16, 8, 18, 16, 16, 12, 10, 12, NA, 5, NA, NA, NA, 8~
## $ maeduc   <int> NA, 8, 8, 12, 8, 19, 12, 14, 12, 7, NA, 8, 5, 10, 3, 0, 8, 8,~
## $ speduc   <int> NA, 12, 11, 20, 12, NA, NA, NA, NA, 11, 12, 9, 8, NA, 8, 14, ~
## $ degree   <fct> Bachelor, Lt High School, High School, Bachelor, High School,~
## $ vetyears <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ sei      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ wrkstat  <fct> "Working Fulltime", "Retired", "Working Parttime", "Working F~
## $ wrkslf   <fct> Someone Else, Someone Else, Someone Else, Someone Else, Someo~
## $ marital  <fct> Never Married, Married, Married, Married, Married, Never Marr~
## $ spwrksta <fct> NA, "Keeping House", "Working Fulltime", "Working Fulltime", ~
## $ sibs     <int> 3, 4, 5, 5, 2, 1, 7, 1, 2, 7, 7, 6, 2, 2, 0, 7, 0, 2, 2, 7, 2~
## $ childs   <int> 0, 5, 4, 0, 2, 0, 2, 0, 2, 4, 1, 5, 1, 2, 5, 2, 2, 3, 3, 0, 2~
## $ agekdbrn <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ incom16  <fct> Average, Above Average, Average, Average, Below Average, Aver~
## $ born     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ parborn  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ granborn <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ income06 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ coninc   <int> 25926, 33333, 33333, 41667, 69444, 60185, 50926, 18519, 3704,~
## $ region   <fct> E. Nor. Central, E. Nor. Central, E. Nor. Central, E. Nor. Ce~
## $ partyid  <fct> "Ind,Near Dem", "Not Str Democrat", "Independent", "Not Str D~
## $ polviews <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ relig    <fct> Jewish, Catholic, Protestant, Other, Protestant, Protestant, ~
## $ attend   <fct> Once A Year, Every Week, Once A Month, NA, NA, Once A Year, E~
## $ natspac  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natenvir <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natheal  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natcity  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natcrime <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natdrug  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ nateduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natrace  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natarms  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ nataid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natfare  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natroad  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natsoc   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natmass  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ natpark  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ confinan <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conbus   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conclerg <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ coneduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ confed   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conlabor <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conpress <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conmedic <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ contv    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conjudge <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ consci   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conlegis <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ conarmy  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ joblose  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobfind  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ satjob   <fct> A Little Dissat, NA, Mod. Satisfied, Very Satisfied, NA, Mod.~
## $ richwork <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobinc   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobsec   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobhour  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobpromo <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ jobmeans <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ class    <fct> Middle Class, Middle Class, Working Class, Middle Class, Work~
## $ rank     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ satfin   <fct> Not At All Sat, More Or Less, Satisfied, Not At All Sat, Sati~
## $ finalter <fct> Better, Stayed Same, Better, Stayed Same, Better, Better, Bet~
## $ finrela  <fct> Average, Above Average, Average, Average, Above Average, Abov~
## $ unemp    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ govaid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ getaid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ union    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ getahead <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ parsol   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ kidssol  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ abdefect <fct> Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes, No, No, Yes,~
## $ abnomore <fct> Yes, No, Yes, No, Yes, Yes, No, Yes, No, No, No, No, No, Yes,~
## $ abhlth   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, NA, No, No,~
## $ abpoor   <fct> Yes, No, Yes, Yes, Yes, Yes, No, Yes, No, Yes, NA, Yes, No, Y~
## $ abrape   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, NA, Yes, NA, No, NA, ~
## $ absingle <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, NA, Yes, No, ~
## $ abany    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ pillok   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ sexeduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ divlaw   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ premarsx <fct> Not Wrong At All, Always Wrong, Always Wrong, Always Wrong, S~
## $ teensex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ xmarsex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ homosex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ suicide1 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ suicide2 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ suicide3 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ suicide4 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ fear     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ owngun   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ pistol   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ shotgun  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ rifle    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ news     <fct> Everyday, Everyday, Everyday, Once A Week, Everyday, Everyday~
## $ tvhours  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ racdif1  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ racdif2  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ racdif3  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ racdif4  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ helppoor <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ helpnot  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ helpsick <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ helpblk  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
head(gss)
##   caseid year age    sex  race hispanic uscitzn educ paeduc maeduc speduc
## 1      1 1972  23 Female White     <NA>    <NA>   16     10     NA     NA
## 2      2 1972  70   Male White     <NA>    <NA>   10      8      8     12
## 3      3 1972  48 Female White     <NA>    <NA>   12      8      8     11
## 4      4 1972  27 Female White     <NA>    <NA>   17     16     12     20
## 5      5 1972  61 Female White     <NA>    <NA>   12      8      8     12
## 6      6 1972  26   Male White     <NA>    <NA>   14     18     19     NA
##           degree vetyears sei          wrkstat       wrkslf       marital
## 1       Bachelor     <NA>  NA Working Fulltime Someone Else Never Married
## 2 Lt High School     <NA>  NA          Retired Someone Else       Married
## 3    High School     <NA>  NA Working Parttime Someone Else       Married
## 4       Bachelor     <NA>  NA Working Fulltime Someone Else       Married
## 5    High School     <NA>  NA    Keeping House Someone Else       Married
## 6    High School     <NA>  NA Working Fulltime Someone Else Never Married
##           spwrksta sibs childs agekdbrn       incom16 born parborn granborn
## 1             <NA>    3      0       NA       Average <NA>    <NA>       NA
## 2    Keeping House    4      5       NA Above Average <NA>    <NA>       NA
## 3 Working Fulltime    5      4       NA       Average <NA>    <NA>       NA
## 4 Working Fulltime    5      0       NA       Average <NA>    <NA>       NA
## 5 Temp Not Working    2      2       NA Below Average <NA>    <NA>       NA
## 6             <NA>    1      0       NA       Average <NA>    <NA>       NA
##   income06 coninc          region          partyid polviews      relig
## 1     <NA>  25926 E. Nor. Central     Ind,Near Dem     <NA>     Jewish
## 2     <NA>  33333 E. Nor. Central Not Str Democrat     <NA>   Catholic
## 3     <NA>  33333 E. Nor. Central      Independent     <NA> Protestant
## 4     <NA>  41667 E. Nor. Central Not Str Democrat     <NA>      Other
## 5     <NA>  69444 E. Nor. Central  Strong Democrat     <NA> Protestant
## 6     <NA>  60185 E. Nor. Central     Ind,Near Dem     <NA> Protestant
##         attend natspac natenvir natheal natcity natcrime natdrug nateduc
## 1  Once A Year    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 2   Every Week    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 3 Once A Month    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 4         <NA>    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 5         <NA>    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 6  Once A Year    <NA>     <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
##   natrace natarms nataid natfare natroad natsoc natmass natpark confinan conbus
## 1    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
## 2    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
## 3    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
## 4    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
## 5    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
## 6    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>    <NA>     <NA>   <NA>
##   conclerg coneduc confed conlabor conpress conmedic contv conjudge consci
## 1     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
## 2     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
## 3     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
## 4     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
## 5     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
## 6     <NA>    <NA>   <NA>     <NA>     <NA>     <NA>  <NA>     <NA>   <NA>
##   conlegis conarmy joblose jobfind          satjob richwork jobinc jobsec
## 1     <NA>    <NA>    <NA>    <NA> A Little Dissat     <NA>   <NA>   <NA>
## 2     <NA>    <NA>    <NA>    <NA>            <NA>     <NA>   <NA>   <NA>
## 3     <NA>    <NA>    <NA>    <NA>  Mod. Satisfied     <NA>   <NA>   <NA>
## 4     <NA>    <NA>    <NA>    <NA>  Very Satisfied     <NA>   <NA>   <NA>
## 5     <NA>    <NA>    <NA>    <NA>            <NA>     <NA>   <NA>   <NA>
## 6     <NA>    <NA>    <NA>    <NA>  Mod. Satisfied     <NA>   <NA>   <NA>
##   jobhour jobpromo jobmeans         class rank         satfin    finalter
## 1    <NA>     <NA>     <NA>  Middle Class   NA Not At All Sat      Better
## 2    <NA>     <NA>     <NA>  Middle Class   NA   More Or Less Stayed Same
## 3    <NA>     <NA>     <NA> Working Class   NA      Satisfied      Better
## 4    <NA>     <NA>     <NA>  Middle Class   NA Not At All Sat Stayed Same
## 5    <NA>     <NA>     <NA> Working Class   NA      Satisfied      Better
## 6    <NA>     <NA>     <NA>  Middle Class   NA   More Or Less      Better
##         finrela unemp govaid getaid union getahead parsol kidssol abdefect
## 1       Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>      Yes
## 2 Above Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>      Yes
## 3       Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>      Yes
## 4       Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>       No
## 5 Above Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>      Yes
## 6 Above Average  <NA>   <NA>   <NA>  <NA>     <NA>   <NA>    <NA>      Yes
##   abnomore abhlth abpoor abrape absingle abany pillok sexeduc divlaw
## 1      Yes    Yes    Yes    Yes      Yes  <NA>   <NA>    <NA>   <NA>
## 2       No    Yes     No    Yes      Yes  <NA>   <NA>    <NA>   <NA>
## 3      Yes    Yes    Yes    Yes      Yes  <NA>   <NA>    <NA>   <NA>
## 4       No    Yes    Yes    Yes      Yes  <NA>   <NA>    <NA>   <NA>
## 5      Yes    Yes    Yes    Yes      Yes  <NA>   <NA>    <NA>   <NA>
## 6      Yes    Yes    Yes    Yes      Yes  <NA>   <NA>    <NA>   <NA>
##           premarsx teensex xmarsex homosex suicide1 suicide2 suicide3 suicide4
## 1 Not Wrong At All    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
## 2     Always Wrong    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
## 3     Always Wrong    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
## 4     Always Wrong    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
## 5  Sometimes Wrong    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
## 6  Sometimes Wrong    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>     <NA>
##   fear owngun pistol shotgun rifle        news tvhours racdif1 racdif2 racdif3
## 1 <NA>   <NA>   <NA>    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>
## 2 <NA>   <NA>   <NA>    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>
## 3 <NA>   <NA>   <NA>    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>
## 4 <NA>   <NA>   <NA>    <NA>  <NA> Once A Week      NA    <NA>    <NA>    <NA>
## 5 <NA>   <NA>   <NA>    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>
## 6 <NA>   <NA>   <NA>    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>
##   racdif4 helppoor helpnot helpsick helpblk
## 1    <NA>     <NA>    <NA>     <NA>    <NA>
## 2    <NA>     <NA>    <NA>     <NA>    <NA>
## 3    <NA>     <NA>    <NA>     <NA>    <NA>
## 4    <NA>     <NA>    <NA>     <NA>    <NA>
## 5    <NA>     <NA>    <NA>     <NA>    <NA>
## 6    <NA>     <NA>    <NA>     <NA>    <NA>

Participants were randomly selected from addresses across the United States, and within each selected household one adult was randomly chosen to complete the interview. Because random sampling was used, we can assume that everyone in the community had an equal chance of being selected, so the findings can be generalized to the U.S. population. However, this is a survey and not a randomized experiment: there was no random assignment, so no causal relationship between the variables can be inferred from these findings.


Part 2: Research question

In some countries, women and men do not have an equal chance to attend school, especially in higher education, and fewer women reach the higher levels of education. Women have often been expected to put family first rather than pursue their own achievements. How does the situation look in a developed country like the United States? It is interesting to explore whether there is an association between sex and degree level in the United States between 1972 and 2012.

degree_df <- select(gss, sex, degree)

# Count the missing (NA) cells across both columns

total_navalues <- sum(is.na(degree_df))
total_navalues
## [1] 1010
# Drop every row that has a missing value in either column
degree_df <- na.omit(degree_df)
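
A small aside on the missing-value count: `sum(is.na(df))` counts missing cells, while `na.omit()` drops rows. In this report the two agree (57,061 − 56,051 = 1,010), but they can differ when a row is missing both values. The sketch below uses a made-up toy data frame (not the GSS data) to show the difference.

```r
# Toy data frame with hypothetical values; row 5 is missing both columns
toy <- data.frame(
  sex    = c("Male", "Female", NA, "Female", NA),
  degree = c("Bachelor", NA, "Graduate", "High School", NA)
)

na_cells <- sum(is.na(toy))            # number of missing cells
na_rows  <- sum(!complete.cases(toy))  # number of rows with at least one NA
na_by_column <- colSums(is.na(toy))    # missing cells per column

toy_clean <- na.omit(toy)              # keeps only fully observed rows

na_cells
na_rows
na_by_column
nrow(toy_clean)
```

Checking `colSums(is.na(...))` before calling `na.omit()` shows which variable is responsible for the rows that get dropped.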

Part 3: EDA

Let’s look at the counts in a contingency table:

tab<- crosstab(degree_df$sex, degree_df$degree)

tab
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |-------------------------|
## 
## ================================================================================
##               degree_df$degree
## degr_df$sx    Lt Hgh Sch   High Schol   Junir Cllg   Bachelor   Graduate   Total
## --------------------------------------------------------------------------------
## Male                5153        12340         1272       3822       2091   24678
## --------------------------------------------------------------------------------
## Female              6669        16947         1798       4180       1779   31373
## --------------------------------------------------------------------------------
## Total              11822        29287         3070       8002       3870   56051
## ================================================================================

As shown above, High School is the most common highest degree for participants of both sexes, followed by Lt High School (less than high school). At the other end, Junior College and Graduate degrees are the two smallest groups for both sexes.

summary(degree_df)
##      sex                   degree     
##  Male  :24678   Lt High School:11822  
##  Female:31373   High School   :29287  
##                 Junior College: 3070  
##                 Bachelor      : 8002  
##                 Graduate      : 3870
tab2<- table(degree_df$sex, degree_df$degree)
prop.table(tab2)
##         
##          Lt High School High School Junior College   Bachelor   Graduate
##   Male       0.09193413  0.22015664     0.02269362 0.06818790 0.03730531
##   Female     0.11898093  0.30234965     0.03207793 0.07457494 0.03173895

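These are joint proportions: each cell is divided by the total of 56,051 respondents. Because more women than men took part in the survey (31,373 versus 24,678), the female cells are larger in most categories, so the two rows are easier to compare relative to each sex's own total. On that basis, the shares for Lt High School (about 21% for both sexes) and Junior College (about 5% to 6%) are similar, while women more often report High School as their highest degree (about 54% versus 50%). Men more often hold a Bachelor degree (about 15.5% versus 13.3%), and the gap is clearest at the Graduate level: about 8.5% of men (2,091 of 24,678) compared with about 5.7% of women (1,779 of 31,373).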
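
To compare the sexes on equal footing, the proportions can be conditioned on sex by passing `margin = 1` to `prop.table()`. The sketch below rebuilds the contingency table from the counts printed above, so it runs without `gss.Rdata`; the object names `counts` and `row_props` are only illustrative.

```r
# Counts copied from the contingency table above (rows: sex, columns: degree)
counts <- matrix(
  c(5153, 12340, 1272, 3822, 2091,
    6669, 16947, 1798, 4180, 1779),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex    = c("Male", "Female"),
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate")
  )
)

# margin = 1 divides each row by its own total, so every row sums to 1
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

With the original data frame, the equivalent call would be `prop.table(tab2, margin = 1)`.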


Part 4: Inference

H0: Sex and degree level are independent; there is no association between them.

HA: Sex and degree level are dependent; there is an association between them.

Since both sex and degree level are categorical variables (degree level has more than 2 levels), a chi-square test of independence will be used to test the hypotheses.

Check on conditions:

  1. The participants were randomly selected, so random sampling without replacement was used. The 57,061 respondents in this dataset are fewer than 10% of the population, and each case contributes to only one cell in the table. Thus, the independence condition is met.

  2. Each cell has at least 5 expected cases, so the sample size condition is also met.
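
The expected-count condition can be checked directly. Under independence, the expected count in a cell is (row total × column total) / grand total. The sketch below applies that formula to the counts from the contingency table above; the names `counts` and `expected` are illustrative.

```r
# Counts copied from the contingency table above (rows: sex, columns: degree)
counts <- matrix(
  c(5153, 12340, 1272, 3822, 2091,
    6669, 16947, 1798, 4180, 1779),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex    = c("Male", "Female"),
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

round(expected, 1)
min(expected)        # smallest expected cell count
all(expected >= 5)   # TRUE means the sample size condition holds
```

The same matrix is available from a fitted test as `chisq.test(counts)$expected`.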

Therefore, a chi-square test of independence will be performed.

chisq.test(x=degree_df$sex, y= degree_df$degree, correct=FALSE, simulate.p.value = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  degree_df$sex and degree_df$degree
## X-squared = 254.35, df = 4, p-value < 2.2e-16

Based on the chi-square test, the X-squared value is 254.35 with 4 degrees of freedom, and the p-value is far below 0.05 (p < 2.2e-16). We therefore reject the null hypothesis and conclude that the data provide strong evidence of an association between sex and degree level. In other words, sex and degree level are dependent.
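
As a cross-check, the statistic can be computed by hand as the sum over cells of (observed − expected)² / expected, with (rows − 1) × (columns − 1) degrees of freedom. The sketch below does this from the counts in the contingency table above; it is meant as an illustration of the formula, and the variable names are not from the original analysis.

```r
# Counts copied from the contingency table above (rows: sex, columns: degree)
counts <- matrix(
  c(5153, 12340, 1272, 3822, 2091,
    6669, 16947, 1798, 4180, 1779),
  nrow = 2, byrow = TRUE
)

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic and its degrees of freedom
x_squared <- sum((counts - expected)^2 / expected)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

# The built-in test on the same matrix should agree with the hand computation
builtin <- chisq.test(counts, correct = FALSE)

x_squared
dof
p_value
unname(builtin$statistic)
```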

However, the data contain missing (NA) values, and the numbers of male and female participants are not equal. The age of the participants also matters for the level of education they had reached at the time of the survey (some participants were under 20 years old), so the conclusion could differ because younger participants might still go on to earn a higher degree. For this specific question, it might be helpful to analyze only participants from an older age category.
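
A rough sketch of that follow-up is shown below. It runs on simulated toy data, not the GSS, so its output says nothing about the real survey. The cutoff `min_age <- 30` is an arbitrary assumption. With the real data, the same filter would be applied to the `age` column of `gss` before selecting `sex` and `degree`.

```r
# Simulated stand-in for gss (hypothetical values, for illustration only)
set.seed(1)
n <- 2000
degree_levels <- c("Lt High School", "High School", "Junior College",
                   "Bachelor", "Graduate")
toy <- data.frame(
  age    = sample(18:89, n, replace = TRUE),
  sex    = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  degree = factor(sample(degree_levels, n, replace = TRUE),
                  levels = degree_levels)
)

# Assumed cutoff: keep respondents old enough to have finished their education
min_age <- 30
older <- toy[!is.na(toy$age) & toy$age >= min_age, c("sex", "degree")]
older <- older[complete.cases(older), ]

# Same test of independence, restricted to the older subgroup
older_test <- chisq.test(table(older$sex, older$degree), correct = FALSE)
older_test
```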