Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)

Load data

load("gss.Rdata")

Part 1: Data

Overview of GSS Data

The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events. Altogether the GSS is the single best source for sociological and attitudinal trend data covering the United States.

From 1972 to 1974, a modified probability sample was used. The 1975 and 1976 studies used a transitional sample design, one-half full probability and one-half block quota, to allow methodological comparison and to gauge the impact of the change on responses. The General Social Survey then switched to a full probability sample for the 1977 and later surveys. Because a full probability sample is a form of random sampling, we treat the GSS respondents as a random sample.

The probability sample is a stratified, multistage area probability sample of clusters of households in the continental United States. The selection of geographic areas at successive stages is in accordance with the method of probabilities proportional to size (p.p.s.). In other words, the Primary Sampling Units, which are based on Standard Metropolitan Statistical Areas (SMSAs), are assigned at random.
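As a rough illustration of selection with probabilities proportional to size, here is a minimal sketch in R. The six units and their household counts are made up for the example and are not taken from the GSS design; the real design is stratified and multistage, which this toy version does not attempt to reproduce.

```r
set.seed(42)  # reproducible draws

# Hypothetical primary sampling units with made-up household counts
psu <- data.frame(
  id         = paste0("PSU", 1:6),
  households = c(500, 1200, 300, 2500, 800, 700)
)

# Selection probability proportional to size
psu$prob <- psu$households / sum(psu$households)

# Draw many units with replacement so empirical shares can be compared
# with the target probabilities
draws <- sample(psu$id, size = 10000, replace = TRUE, prob = psu$prob)
empirical <- as.numeric(prop.table(table(factor(draws, levels = psu$id))))

round(cbind(target = psu$prob, empirical = empirical), 3)
```

Larger units are drawn more often, in proportion to their household counts, which is the idea behind p.p.s. selection.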

The data set is cumulative, covering 1972 to 2012, and is intended to monitor and explain trends and constants in attitudes, behaviors, and attributes. Missing values have been cleaned up for teaching purposes and appear as NA. The data set is a data.frame containing 57,061 observations and 114 variables.

dim(gss)
## [1] 57061   114
class(gss)
## [1] "data.frame"
head(gss)
##   caseid year age    sex  race hispanic uscitzn educ paeduc maeduc speduc
## 1      1 1972  23 Female White     <NA>    <NA>   16     10     NA     NA
## 2      2 1972  70   Male White     <NA>    <NA>   10      8      8     12
## 3      3 1972  48 Female White     <NA>    <NA>   12      8      8     11
## 4      4 1972  27 Female White     <NA>    <NA>   17     16     12     20
## 5      5 1972  61 Female White     <NA>    <NA>   12      8      8     12
## 6      6 1972  26   Male White     <NA>    <NA>   14     18     19     NA
##           degree vetyears sei          wrkstat       wrkslf       marital
## 1       Bachelor     <NA>  NA Working Fulltime Someone Else Never Married
## 2 Lt High School     <NA>  NA          Retired Someone Else       Married
## 3    High School     <NA>  NA Working Parttime Someone Else       Married
## 4       Bachelor     <NA>  NA Working Fulltime Someone Else       Married
## 5    High School     <NA>  NA    Keeping House Someone Else       Married
## 6    High School     <NA>  NA Working Fulltime Someone Else Never Married
##           spwrksta sibs childs agekdbrn       incom16 born parborn
## 1             <NA>    3      0       NA       Average <NA>    <NA>
## 2    Keeping House    4      5       NA Above Average <NA>    <NA>
## 3 Working Fulltime    5      4       NA       Average <NA>    <NA>
## 4 Working Fulltime    5      0       NA       Average <NA>    <NA>
## 5 Temp Not Working    2      2       NA Below Average <NA>    <NA>
## 6             <NA>    1      0       NA       Average <NA>    <NA>
##   granborn income06 coninc          region          partyid polviews
## 1       NA     <NA>  25926 E. Nor. Central     Ind,Near Dem     <NA>
## 2       NA     <NA>  33333 E. Nor. Central Not Str Democrat     <NA>
## 3       NA     <NA>  33333 E. Nor. Central      Independent     <NA>
## 4       NA     <NA>  41667 E. Nor. Central Not Str Democrat     <NA>
## 5       NA     <NA>  69444 E. Nor. Central  Strong Democrat     <NA>
## 6       NA     <NA>  60185 E. Nor. Central     Ind,Near Dem     <NA>
##        relig       attend natspac natenvir natheal natcity natcrime
## 1     Jewish  Once A Year    <NA>     <NA>    <NA>    <NA>     <NA>
## 2   Catholic   Every Week    <NA>     <NA>    <NA>    <NA>     <NA>
## 3 Protestant Once A Month    <NA>     <NA>    <NA>    <NA>     <NA>
## 4      Other         <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 5 Protestant         <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 6 Protestant  Once A Year    <NA>     <NA>    <NA>    <NA>     <NA>
##   natdrug nateduc natrace natarms nataid natfare natroad natsoc natmass
## 1    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>    <NA>   <NA>    <NA>
##   natpark confinan conbus conclerg coneduc confed conlabor conpress
## 1    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
## 2    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
## 3    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
## 4    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
## 5    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
## 6    <NA>     <NA>   <NA>     <NA>    <NA>   <NA>     <NA>     <NA>
##   conmedic contv conjudge consci conlegis conarmy joblose jobfind
## 1     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
## 2     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
## 3     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
## 4     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
## 5     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
## 6     <NA>  <NA>     <NA>   <NA>     <NA>    <NA>    <NA>    <NA>
##            satjob richwork jobinc jobsec jobhour jobpromo jobmeans
## 1 A Little Dissat     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 2            <NA>     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 3  Mod. Satisfied     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 4  Very Satisfied     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 5            <NA>     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 6  Mod. Satisfied     <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
##           class rank         satfin    finalter       finrela unemp govaid
## 1  Middle Class   NA Not At All Sat      Better       Average  <NA>   <NA>
## 2  Middle Class   NA   More Or Less Stayed Same Above Average  <NA>   <NA>
## 3 Working Class   NA      Satisfied      Better       Average  <NA>   <NA>
## 4  Middle Class   NA Not At All Sat Stayed Same       Average  <NA>   <NA>
## 5 Working Class   NA      Satisfied      Better Above Average  <NA>   <NA>
## 6  Middle Class   NA   More Or Less      Better Above Average  <NA>   <NA>
##   getaid union getahead parsol kidssol abdefect abnomore abhlth abpoor
## 1   <NA>  <NA>     <NA>   <NA>    <NA>      Yes      Yes    Yes    Yes
## 2   <NA>  <NA>     <NA>   <NA>    <NA>      Yes       No    Yes     No
## 3   <NA>  <NA>     <NA>   <NA>    <NA>      Yes      Yes    Yes    Yes
## 4   <NA>  <NA>     <NA>   <NA>    <NA>       No       No    Yes    Yes
## 5   <NA>  <NA>     <NA>   <NA>    <NA>      Yes      Yes    Yes    Yes
## 6   <NA>  <NA>     <NA>   <NA>    <NA>      Yes      Yes    Yes    Yes
##   abrape absingle abany pillok sexeduc divlaw         premarsx teensex
## 1    Yes      Yes  <NA>   <NA>    <NA>   <NA> Not Wrong At All    <NA>
## 2    Yes      Yes  <NA>   <NA>    <NA>   <NA>     Always Wrong    <NA>
## 3    Yes      Yes  <NA>   <NA>    <NA>   <NA>     Always Wrong    <NA>
## 4    Yes      Yes  <NA>   <NA>    <NA>   <NA>     Always Wrong    <NA>
## 5    Yes      Yes  <NA>   <NA>    <NA>   <NA>  Sometimes Wrong    <NA>
## 6    Yes      Yes  <NA>   <NA>    <NA>   <NA>  Sometimes Wrong    <NA>
##   xmarsex homosex suicide1 suicide2 suicide3 suicide4 fear owngun pistol
## 1    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
## 2    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
## 3    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
## 4    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
## 5    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
## 6    <NA>    <NA>     <NA>     <NA>     <NA>     <NA> <NA>   <NA>   <NA>
##   shotgun rifle        news tvhours racdif1 racdif2 racdif3 racdif4
## 1    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>    <NA>
## 2    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>    <NA>
## 3    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>    <NA>
## 4    <NA>  <NA> Once A Week      NA    <NA>    <NA>    <NA>    <NA>
## 5    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>    <NA>
## 6    <NA>  <NA>    Everyday      NA    <NA>    <NA>    <NA>    <NA>
##   helppoor helpnot helpsick helpblk
## 1     <NA>    <NA>     <NA>    <NA>
## 2     <NA>    <NA>     <NA>    <NA>
## 3     <NA>    <NA>     <NA>    <NA>
## 4     <NA>    <NA>     <NA>    <NA>
## 5     <NA>    <NA>     <NA>    <NA>
## 6     <NA>    <NA>     <NA>    <NA>

Part 2: Research question

Those holding a higher educational degree are more likely to feel that the government spends too little money on education. The quality of education shapes the next generation, and a more highly educated workforce sets a positive tone for the country’s future. Those with higher degrees have experienced the importance and benefits of higher education over the course of their lives. Better education may lay the foundation for better opportunities for the individual, a healthier environment for society, and further advancement for the country. In this sense, those with higher degrees are expected to place more importance on spending more money on the education system.


Part 3: Exploratory data analysis

library("RMySQL")
## Loading required package: DBI
library("DBI")
library("lubridate")
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library("lattice")
library("grid")

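As seen below, about 51% of the sample report high school as their highest degree, followed by less than high school (about 21%), bachelor (14%), graduate (7%), and junior college (5%). These percentages use all 57,061 rows as the denominator, so they do not sum to 100% because degree is missing for some rows.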

DG <- gss$degree
DG.freq <- table(DG)
DG.relfreq <- (DG.freq / nrow(gss))*100
cbind(DG.freq, DG.relfreq)
##                DG.freq DG.relfreq
## Lt High School   11822  20.718179
## High School      29287  51.325774
## Junior College    3070   5.380207
## Bachelor          8002  14.023589
## Graduate          3870   6.782216

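In general, many people think the government spends too little on national education: about 36% of all 57,061 rows, which is about 64% of the 32,255 respondents who answered the question. Only about 4% of all rows (about 7% of those who answered) feel that the government spends too much on national education.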

NED <- gss$nateduc
NED.freq <- table(NED)
NED.relfreq <- (NED.freq / nrow(gss))*100
cbind(NED.freq, NED.relfreq)
##             NED.freq NED.relfreq
## Too Little     20619   36.135013
## About Right     9374   16.428033
## Too Much        2262    3.964179
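The percentages above divide by nrow(gss), which includes rows where nateduc is missing. A minimal sketch of the alternative denominator follows, using only the counts printed above rather than the data file itself; the variable names are chosen for this example.

```r
# Counts copied from the nateduc frequency table above
ned_counts <- c("Too Little" = 20619, "About Right" = 9374, "Too Much" = 2262)
n_rows <- 57061                      # nrow(gss), from dim(gss)

n_valid   <- sum(ned_counts)         # rows with a nateduc answer
n_missing <- n_rows - n_valid        # rows where nateduc is NA

share_of_all   <- 100 * ned_counts / n_rows      # denominator: every row
share_of_valid <- 100 * prop.table(ned_counts)   # denominator: answered rows

round(cbind(share_of_all, share_of_valid), 1)
```

With the data loaded, prop.table(table(gss$nateduc)) should give the same valid-response shares, since table() drops NA values by default.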

The bar plots below show both distributions.

par(mfrow = c(1,2))
barplot(DG.freq, xlab = "Degree", ylab = "Frequency", main = "Distribution of Degree")
barplot(NED.freq, xlab = "Perception on educational budget", ylab = "Frequency", main = "Attitude on educational budget")

The research question predicts that those with a higher educational degree are more likely to feel that the government should spend more money on national education. The plot below shows the distribution of nateduc responses within each degree level.

# Reference the columns explicitly instead of calling attach(gss)
plot(gss$degree, gss$nateduc, xlab = "Educational degree", ylab = "Level of Govt budget on Education", main = "Degree by Attitude on Educational budget")


Part 4: Inference

Hypothesis:

H0: There is no significant difference in views on the national education budget across the different levels of educational degree (p = 0.2).

HA: Those with a higher educational degree are more likely to feel that the government spends too little on national education (p > 0.2).

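Conditions:

1. Independence: The sample of 57,061 is less than 10% of the American population, and it comes from a full probability sample, which is one type of random sample. The condition for independence is met.

2. Sample size / skew: 20% of the 57,061 observations is about 11,412, well above the minimum of 10 required for the sample size and skewness condition. For the chi-square test reported in the inference output, the matching check is that every expected count is at least 5; the smallest expected count there is 113.26.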
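The inference output further down reports a chi-square test of independence, for which the usual sample-size rule of thumb is an expected count of at least 5 in every cell. A minimal sketch of that check follows, assuming the observed counts printed in that output; the matrix is typed in by hand here so the example runs without the data file.

```r
# Observed counts copied from the inference() output (degree by nateduc)
observed <- matrix(
  c( 3868, 2593,  619,
    10784, 4735, 1062,
     1187,  363,   79,
     2993,  997,  313,
     1482,  453,  128),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree  = c("Lt High School", "High School", "Junior College",
                "Bachelor", "Graduate"),
    nateduc = c("Too Little", "About Right", "Too Much")
  )
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

min(expected)        # smallest expected cell count
all(expected >= 5)   # TRUE when the rule of thumb is met
```

The values in expected should match the "Expected" table printed by inference().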

Method of analysis: A hypothesis test for a proportion is planned, since the variables of interest, especially the dependent variable, are categorical and can be recoded as success or failure. The research question concerns how the proportion who think the government invests too little in national education differs by education level. “degree” is the independent variable and “nateduc” is the dependent variable, with the response “Too Little” treated as the success, and the hypothesis test is performed at the 5% significance level (95% confidence). Because nateduc has three levels and degree has five, the inference() call below carries out a chi-square test of independence, as its output shows.

inference(y = nateduc, x= degree, gss, statistic = "proportion", type = "ht", method = "theoretical", alternative = "greater")
## Response variable: categorical (3 levels) 
## Explanatory variable: categorical (5 levels) 
## Observed:
##                 y
## x                Too Little About Right Too Much
##   Lt High School       3868        2593      619
##   High School         10784        4735     1062
##   Junior College       1187         363       79
##   Bachelor             2993         997      313
##   Graduate             1482         453      128
## 
## Expected:
##                 y
## x                Too Little About Right  Too Much
##   Lt High School   4543.313   2044.4238  492.2631
##   High School     10640.208   4787.9366 1152.8551
##   Junior College   1045.347    470.3907  113.2622
##   Bachelor         2761.282   1242.5361  299.1819
##   Graduate         1323.850    595.7128  143.4377
## 
## H0: degree and nateduc are independent
## HA: degree and nateduc are dependent
## chi_sq = 467.3183, df = 8, p_value = 0

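We reject the null hypothesis, since the p-value is less than 0.05. Educational level and views on government spending on national education are therefore associated; because the GSS is observational, this does not show that education causes those views. The chi-square test is non-directional, so the direction has to be read from the observed table. High school graduates give the largest number of “Too Little” responses (10,784, against about 10,640 expected), but they are also the largest group. As a share of each group, “Too Little” is lowest among those with less than high school (3,868 of 7,080, about 55%), about 65% among high school graduates, and roughly 70% to 73% among junior college, bachelor, and graduate degree holders, which is broadly in line with the research question rather than opposite to it.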
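To read the direction of the association, one option is to reproduce the test with base R and convert the observed table into row percentages. The sketch below assumes the observed counts printed in the inference() output, typed in by hand; chisq.test() is used here as a stand-in for the statsr call, not as a description of its internals.

```r
# Observed counts copied from the inference() output (degree by nateduc)
observed <- matrix(
  c( 3868, 2593,  619,
    10784, 4735, 1062,
     1187,  363,   79,
     2993,  997,  313,
     1482,  453,  128),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree  = c("Lt High School", "High School", "Junior College",
                "Bachelor", "Graduate"),
    nateduc = c("Too Little", "About Right", "Too Much")
  )
)

# Chi-square test of independence on the 5 x 3 table
test <- chisq.test(observed)
test$statistic   # chi-square statistic
test$parameter   # degrees of freedom
test$p.value     # very small; the report prints it rounded to 0

# Share of each response within each degree level (rows sum to 100)
row_pct <- 100 * prop.table(observed, margin = 1)
round(row_pct, 1)
```

Comparing the "Too Little" column of row_pct across degree levels shows which groups favor more spending, independent of how large each group is.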