Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)

Load data

load("gss.Rdata")

Part 1: Data

Background

About the GSS

For more than four decades, the General Social Survey (GSS) has studied the growing complexity of American society. It is the only full-probability, personal-interview survey designed to monitor changes in both social characteristics and attitudes currently being conducted in the United States.

The General Social Survey

The General Social Survey (GSS) is a nationally representative survey of adults in the United States conducted since 1972. The GSS collects data on contemporary American society in order to monitor and explain trends in opinions, attitudes and behaviors. The GSS has adopted questions from earlier surveys, which allows researchers to conduct comparisons for up to 80 years.
  • Describe how the observations in the sample are collected, and the implications of this data collection method on the scope of inference (generalizability / causality)

    • The data collection details, according to the GSS cumulative codebook, are as follows. Block quota sampling was used in the 1972, 1973, and 1974 surveys and for half of the 1975 and 1976 surveys. Full probability sampling was employed in half of the 1975 and 1976 surveys and in the 1977, 1978, 1980, 1982-1991, 1993-1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, and 2018 surveys. In addition, the 2004, 2006, 2008, 2010, 2012, 2014, 2016, and 2018 surveys sub-sampled non-respondents. Target population: each survey from 1972 to 2004 was an independently drawn sample of English-speaking persons 18 years of age or over, living in non-institutional arrangements within the United States. Starting in 2006, Spanish speakers were added to the target population.

    • Regarding the GSS sample design: the observations are collected largely through random (full probability) sampling, so the results can be generalized to the target population. However, the survey is observational. Respondents are not randomly assigned to any treatment, so the data can show associations but cannot establish causality.
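To illustrate why random sampling supports generalization, here is a minimal simulation sketch. The population below is entirely hypothetical (simulated values, not GSS data), and the sample size of 1,500 is an assumption chosen only to resemble the size of a single survey year.

```r
# Hypothetical population of 100,000 adults with simulated years of education.
# These numbers are made up for illustration and are NOT taken from the GSS.
set.seed(42)
population_educ <- pmin(pmax(round(rnorm(100000, mean = 13, sd = 3)), 0), 20)

# Draw a simple random sample, as a probability survey would (assumed size).
sample_educ <- sample(population_educ, size = 1500)

pop_mean    <- mean(population_educ)
sample_mean <- mean(sample_educ)

# The sample mean should land close to the population mean,
# which is what justifies generalizing from sample to population.
c(population = pop_mean, sample = sample_mean)
```

Nothing in this sketch involves assigning people to groups, which is why a design like this can support generalization but not causal claims.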


Part 2: Research question

The research question is whether Black people in the United States have had the same access to education as White people. Access to education is one indicator that can meaningfully reflect equality between races. In my opinion, everyone, regardless of nationality, should have equal access to education.


Part 3: Exploratory data analysis

# To see overall data structure
glimpse(gss)
## Rows: 57,061
## Columns: 114
## $ caseid   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
## $ year     <int> 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972...
## $ age      <int> 23, 70, 48, 27, 61, 26, 28, 27, 21, 30, 30, 56, 54, 49, 41...
## $ sex      <fct> Female, Male, Female, Female, Female, Male, Male, Male, Fe...
## $ race     <fct> White, White, White, White, White, White, White, White, Bl...
## $ hispanic <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ uscitzn  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ educ     <int> 16, 10, 12, 17, 12, 14, 13, 16, 12, 12, 13, 6, 9, 8, 9, 14...
## $ paeduc   <int> 10, 8, 8, 16, 8, 18, 16, 16, 12, 10, 12, NA, 5, NA, NA, NA...
## $ maeduc   <int> NA, 8, 8, 12, 8, 19, 12, 14, 12, 7, NA, 8, 5, 10, 3, 0, 8,...
## $ speduc   <int> NA, 12, 11, 20, 12, NA, NA, NA, NA, 11, 12, 9, 8, NA, 8, 1...
## $ degree   <fct> Bachelor, Lt High School, High School, Bachelor, High Scho...
## $ vetyears <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ sei      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ wrkstat  <fct> Working Fulltime, Retired, Working Parttime, Working Fullt...
## $ wrkslf   <fct> Someone Else, Someone Else, Someone Else, Someone Else, So...
## $ marital  <fct> Never Married, Married, Married, Married, Married, Never M...
## $ spwrksta <fct> NA, Keeping House, Working Fulltime, Working Fulltime, Tem...
## $ sibs     <int> 3, 4, 5, 5, 2, 1, 7, 1, 2, 7, 7, 6, 2, 2, 0, 7, 0, 2, 2, 7...
## $ childs   <int> 0, 5, 4, 0, 2, 0, 2, 0, 2, 4, 1, 5, 1, 2, 5, 2, 2, 3, 3, 0...
## $ agekdbrn <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ incom16  <fct> Average, Above Average, Average, Average, Below Average, A...
## $ born     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ parborn  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ granborn <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ income06 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ coninc   <int> 25926, 33333, 33333, 41667, 69444, 60185, 50926, 18519, 37...
## $ region   <fct> E. Nor. Central, E. Nor. Central, E. Nor. Central, E. Nor....
## $ partyid  <fct> "Ind,Near Dem", "Not Str Democrat", "Independent", "Not St...
## $ polviews <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ relig    <fct> Jewish, Catholic, Protestant, Other, Protestant, Protestan...
## $ attend   <fct> Once A Year, Every Week, Once A Month, NA, NA, Once A Year...
## $ natspac  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natenvir <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natheal  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natcity  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natcrime <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natdrug  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ nateduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natrace  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natarms  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ nataid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natfare  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natroad  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natsoc   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natmass  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ natpark  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ confinan <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conbus   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conclerg <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ coneduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ confed   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conlabor <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conpress <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conmedic <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ contv    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conjudge <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ consci   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conlegis <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ conarmy  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ joblose  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobfind  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ satjob   <fct> A Little Dissat, NA, Mod. Satisfied, Very Satisfied, NA, M...
## $ richwork <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobinc   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobsec   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobhour  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobpromo <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ jobmeans <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ class    <fct> Middle Class, Middle Class, Working Class, Middle Class, W...
## $ rank     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ satfin   <fct> Not At All Sat, More Or Less, Satisfied, Not At All Sat, S...
## $ finalter <fct> Better, Stayed Same, Better, Stayed Same, Better, Better, ...
## $ finrela  <fct> Average, Above Average, Average, Average, Above Average, A...
## $ unemp    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ govaid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ getaid   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ union    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ getahead <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ parsol   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ kidssol  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ abdefect <fct> Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes, No, No, Y...
## $ abnomore <fct> Yes, No, Yes, No, Yes, Yes, No, Yes, No, No, No, No, No, Y...
## $ abhlth   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, NA, No, ...
## $ abpoor   <fct> Yes, No, Yes, Yes, Yes, Yes, No, Yes, No, Yes, NA, Yes, No...
## $ abrape   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, NA, Yes, NA, No, N...
## $ absingle <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, NA, Yes, N...
## $ abany    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ pillok   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ sexeduc  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ divlaw   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ premarsx <fct> Not Wrong At All, Always Wrong, Always Wrong, Always Wrong...
## $ teensex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ xmarsex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ homosex  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ suicide1 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ suicide2 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ suicide3 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ suicide4 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ fear     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ owngun   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ pistol   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ shotgun  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ rifle    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ news     <fct> Everyday, Everyday, Everyday, Once A Week, Everyday, Every...
## $ tvhours  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ racdif1  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ racdif2  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ racdif3  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ racdif4  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ helppoor <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ helpnot  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ helpsick <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ helpblk  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
bw_race <- gss %>%
  filter(race == "White"| race == "Black", !is.na(educ)) %>%
  group_by(race, year) %>%
  summarise(avg_edu = mean(educ), counts = n())
## `summarise()` regrouping output by 'race' (override with `.groups` argument)
bw_race
## # A tibble: 58 x 4
## # Groups:   race [2]
##    race   year avg_edu counts
##    <fct> <int>   <dbl>  <int>
##  1 White  1972    11.6   1347
##  2 White  1973    11.8   1303
##  3 White  1974    12.0   1301
##  4 White  1975    11.9   1320
##  5 White  1976    11.8   1356
##  6 White  1977    11.8   1333
##  7 White  1978    12.0   1352
##  8 White  1980    12.1   1313
##  9 White  1982    12.1   1318
## 10 White  1983    12.4   1415
## # ... with 48 more rows
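The `summarise()` message above says the result stays grouped by `race`. The sketch below shows what that means and how the `.groups` argument changes it. It uses a tiny hypothetical table (`toy`), not the GSS data.

```r
library(dplyr)

# Hypothetical toy respondents; these values are NOT taken from the GSS.
toy <- tibble(
  race = factor(c("White", "White", "Black", "Black", "White", "Black")),
  year = c(1972L, 1972L, 1972L, 1972L, 1973L, 1973L),
  educ = c(12L, 16L, 10L, 12L, 14L, 11L)
)

# Default behavior: the last grouping variable (year) is dropped,
# so the result is still grouped by race.
by_default <- toy %>%
  group_by(race, year) %>%
  summarise(avg_edu = mean(educ), counts = n())

# Explicit alternative: ask for an ungrouped result.
ungrouped <- toy %>%
  group_by(race, year) %>%
  summarise(avg_edu = mean(educ), counts = n(), .groups = "drop")

group_vars(by_default)  # "race"
group_vars(ungrouped)   # character(0)
```

Keeping the default grouping matters for any later step that counts or summarises per race. With an ungrouped table, the race variable would have to be named explicitly, for example `count(race)`.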
# Compare the distribution of yearly average education level by race
ggplot(data = bw_race, aes(x = race, y = avg_edu, fill = race)) +
  geom_boxplot() +
  ggtitle("Education level comparison of White and Black people") +
  ylab("Average education level")  

The box plot shows the distribution of the yearly average education level (`educ`) for Black and White respondents.
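A box plot is easier to discuss with the underlying numbers next to it. The following is a minimal sketch that computes quartiles per race. The data frame `toy_bw` and its values are hypothetical stand-ins for the yearly averages, not GSS results.

```r
library(dplyr)

# Hypothetical yearly averages; these values are NOT taken from the GSS.
toy_bw <- tibble(
  race    = factor(rep(c("White", "Black"), each = 5), levels = c("White", "Black")),
  avg_edu = c(11.6, 12.0, 12.5, 13.0, 13.4,
              10.0, 10.8, 11.5, 12.2, 12.9)
)

# Quartiles and interquartile range per race: the quantities a box plot draws.
box_stats <- toy_bw %>%
  group_by(race) %>%
  summarise(
    q1     = quantile(avg_edu, 0.25),
    median = median(avg_edu),
    q3     = quantile(avg_edu, 0.75),
    iqr    = IQR(avg_edu)
  )

box_stats
```

Running the same `summarise()` call on the real summary table would give the medians and spreads needed to describe the plot in words.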

Next, let's look at the education trend by year.

ggplot(data = bw_race, aes(x = year, y = avg_edu, color = race)) +
  geom_line(size = 1) +
  ggtitle("Average Education lv. trend of Black and White people") +
  ylab("Average Education level")

The graph shows that the average education level of both Black and White respondents has increased over the years.
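Since both lines rise, one way to make the comparison more concrete is to compute the White minus Black gap for each year. This is a sketch on hypothetical values (`toy_bw` is made up and is not GSS output). It assumes exactly one row per race and year, as in the summary table built earlier.

```r
library(dplyr)

# Hypothetical yearly averages; these values are NOT taken from the GSS.
toy_bw <- tibble(
  race    = factor(rep(c("White", "Black"), each = 3), levels = c("White", "Black")),
  year    = rep(c(1972L, 1973L, 1974L), times = 2),
  avg_edu = c(11.6, 11.8, 12.0,
              10.1, 10.4, 10.7)
)

# Gap in average education (White minus Black) for each year.
# Assumes one row per race within each year.
gap_by_year <- toy_bw %>%
  group_by(year) %>%
  summarise(gap = avg_edu[race == "White"] - avg_edu[race == "Black"],
            .groups = "drop")

gap_by_year
```

A gap that shrinks over time would suggest the two groups are converging, while a roughly constant gap would suggest they are rising in parallel.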

Part 4: Inference

\(H_0: \mu_{edu-white} = \mu_{edu-black}\); \(H_A: \mu_{edu-white} \ne \mu_{edu-black}\)
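For reference, the two-sample t statistic for hypotheses of this kind is usually written as below. This is the standard textbook form. The symbols \(\bar{y}\), \(s\) and \(n\) (sample mean, sample standard deviation and sample size per group) are notation introduced here, and the conservative degrees of freedom are an assumption about the theoretical method rather than something this report states.

```latex
\[
T = \frac{\bar{y}_{white} - \bar{y}_{black}}
         {\sqrt{\dfrac{s_{white}^2}{n_{white}} + \dfrac{s_{black}^2}{n_{black}}}},
\qquad
df = \min(n_{white} - 1,\; n_{black} - 1)
\]
```

Under \(H_0\) the numerator is expected to be near 0, so a large value of \(|T|\) counts as evidence against equal means.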

bw_race %>%
  count()
## # A tibble: 2 x 2
## # Groups:   race [2]
##   race      n
##   <fct> <int>
## 1 White    29
## 2 Black    29
# filter() above already removed the "Other" rows; only the unused factor level remains
levels(bw_race$race)
## [1] "White" "Black" "Other"
table(bw_race$race)
## 
## White Black Other 
##    29    29     0
bw_race$race <- factor(bw_race$race)
levels(bw_race$race)
## [1] "White" "Black"
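The steps above show that filtering rows does not remove an unused factor level, and that re-creating the factor does. A minimal sketch of the same idea uses base R's `droplevels()` on a small hypothetical vector (not the GSS data):

```r
# Hypothetical factor with three levels; values are for illustration only.
race <- factor(c("White", "Black", "Other", "White", "Black"),
               levels = c("White", "Black", "Other"))

# Subsetting removes the "Other" observations but keeps the level itself.
kept <- race[race != "Other"]
levels(kept)   # "White" "Black" "Other"

# droplevels() removes levels that no longer occur in the data.
kept <- droplevels(kept)
levels(kept)   # "White" "Black"
```

Dropping the empty level matters here because the explanatory variable needs exactly two levels for a two-group comparison of means.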
inference(y = avg_edu, x = race, data = bw_race, statistic = "mean", type = "ht", null = 0, 
          alternative = "twosided", method = "theoretical")
## Response variable: numerical
## Explanatory variable: categorical (2 levels) 
## n_White = 29, y_bar_White = 12.7745, s_White = 0.7068
## n_Black = 29, y_bar_Black = 11.6176, s_Black = 0.9983
## H0: mu_White =  mu_Black
## HA: mu_White != mu_Black
## t = 5.0934, df = 28
## p_value = < 0.0001

inference(y = avg_edu, x = race, data = bw_race, statistic = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical (2 levels)
## n_White = 29, y_bar_White = 12.7745, s_White = 0.7068
## n_Black = 29, y_bar_Black = 11.6176, s_Black = 0.9983
## 95% CI (White - Black): (0.6916 , 1.6221)

As a result, the p-value is close to zero (< 0.0001), so the null hypothesis is rejected. We are 95% confident that the mean of the yearly average education level of White respondents is between 0.6916 and 1.6221 higher than that of Black respondents. This agrees with the hypothesis test, because the interval does not contain 0.
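As a sanity check, the reported test statistic and interval can be approximately reproduced by hand from the summary statistics printed above. This is a sketch that assumes the theoretical method uses the unpooled standard error and the conservative degrees of freedom, min(n1, n2) - 1. Because the inputs are the rounded values from the output, small differences in the last digit are expected.

```r
# Summary statistics copied from the inference() output above (rounded).
n_white <- 29; ybar_white <- 12.7745; s_white <- 0.7068
n_black <- 29; ybar_black <- 11.6176; s_black <- 0.9983

# Standard error of the difference in means (unpooled; assumed method).
se <- sqrt(s_white^2 / n_white + s_black^2 / n_black)

# Test statistic, conservative degrees of freedom, and two-sided p-value.
diff_means <- ybar_white - ybar_black
t_stat     <- diff_means / se
df         <- min(n_white, n_black) - 1
p_value    <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval for (White - Black).
t_crit <- qt(0.975, df = df)
ci     <- diff_means + c(-1, 1) * t_crit * se

round(c(t = t_stat, df = df, lower = ci[1], upper = ci[2]), 4)
```

The values should land very close to the reported t = 5.0934, df = 28 and the interval (0.6916, 1.6221).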