library(ggplot2)
library(dplyr)
library(statsr)
load("gss.Rdata")
The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events. Altogether the GSS is the single best source for sociological and attitudinal trend data covering the United States.
From 1972 to 1974, a modified probability sample was used. The 1975 and 1976 studies were conducted with a transitional sample design, one-half full probability and one-half block quota, to allow a methodological comparison of how the change affected responses. The General Social Survey then switched to a full probability sample for the surveys from 1977 onward. Because probability sampling is a form of random sampling, we treat the GSS respondents as a random sample.
The probability sample is a stratified, multistage area probability sample of clusters of households in the continental United States. The selection of geographic areas at successive stages is in accordance with the method of probabilities proportional to size (p.p.s.). In other words, the Primary Sampling Units are selected at random, based on Standard Metropolitan Statistical Areas (SMSAs).
The data are cumulative from 1972 to 2012 and are collected to monitor and explain trends and constants in attitudes, behaviors, and attributes. Missing values were removed for learning purposes. The data set is a data.frame containing 57,061 observations and 114 variables.
dim(gss)
## [1] 57061 114
class(gss)
## [1] "data.frame"
head(gss)
## caseid year age sex race hispanic uscitzn educ paeduc maeduc speduc
## 1 1 1972 23 Female White <NA> <NA> 16 10 NA NA
## 2 2 1972 70 Male White <NA> <NA> 10 8 8 12
## 3 3 1972 48 Female White <NA> <NA> 12 8 8 11
## 4 4 1972 27 Female White <NA> <NA> 17 16 12 20
## 5 5 1972 61 Female White <NA> <NA> 12 8 8 12
## 6 6 1972 26 Male White <NA> <NA> 14 18 19 NA
## degree vetyears sei wrkstat wrkslf marital
## 1 Bachelor <NA> NA Working Fulltime Someone Else Never Married
## 2 Lt High School <NA> NA Retired Someone Else Married
## 3 High School <NA> NA Working Parttime Someone Else Married
## 4 Bachelor <NA> NA Working Fulltime Someone Else Married
## 5 High School <NA> NA Keeping House Someone Else Married
## 6 High School <NA> NA Working Fulltime Someone Else Never Married
## spwrksta sibs childs agekdbrn incom16 born parborn
## 1 <NA> 3 0 NA Average <NA> <NA>
## 2 Keeping House 4 5 NA Above Average <NA> <NA>
## 3 Working Fulltime 5 4 NA Average <NA> <NA>
## 4 Working Fulltime 5 0 NA Average <NA> <NA>
## 5 Temp Not Working 2 2 NA Below Average <NA> <NA>
## 6 <NA> 1 0 NA Average <NA> <NA>
## granborn income06 coninc region partyid polviews
## 1 NA <NA> 25926 E. Nor. Central Ind,Near Dem <NA>
## 2 NA <NA> 33333 E. Nor. Central Not Str Democrat <NA>
## 3 NA <NA> 33333 E. Nor. Central Independent <NA>
## 4 NA <NA> 41667 E. Nor. Central Not Str Democrat <NA>
## 5 NA <NA> 69444 E. Nor. Central Strong Democrat <NA>
## 6 NA <NA> 60185 E. Nor. Central Ind,Near Dem <NA>
## relig attend natspac natenvir natheal natcity natcrime
## 1 Jewish Once A Year <NA> <NA> <NA> <NA> <NA>
## 2 Catholic Every Week <NA> <NA> <NA> <NA> <NA>
## 3 Protestant Once A Month <NA> <NA> <NA> <NA> <NA>
## 4 Other <NA> <NA> <NA> <NA> <NA> <NA>
## 5 Protestant <NA> <NA> <NA> <NA> <NA> <NA>
## 6 Protestant Once A Year <NA> <NA> <NA> <NA> <NA>
## natdrug nateduc natrace natarms nataid natfare natroad natsoc natmass
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## natpark confinan conbus conclerg coneduc confed conlabor conpress
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## conmedic contv conjudge consci conlegis conarmy joblose jobfind
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## satjob richwork jobinc jobsec jobhour jobpromo jobmeans
## 1 A Little Dissat <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 Mod. Satisfied <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Very Satisfied <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 Mod. Satisfied <NA> <NA> <NA> <NA> <NA> <NA>
## class rank satfin finalter finrela unemp govaid
## 1 Middle Class NA Not At All Sat Better Average <NA> <NA>
## 2 Middle Class NA More Or Less Stayed Same Above Average <NA> <NA>
## 3 Working Class NA Satisfied Better Average <NA> <NA>
## 4 Middle Class NA Not At All Sat Stayed Same Average <NA> <NA>
## 5 Working Class NA Satisfied Better Above Average <NA> <NA>
## 6 Middle Class NA More Or Less Better Above Average <NA> <NA>
## getaid union getahead parsol kidssol abdefect abnomore abhlth abpoor
## 1 <NA> <NA> <NA> <NA> <NA> Yes Yes Yes Yes
## 2 <NA> <NA> <NA> <NA> <NA> Yes No Yes No
## 3 <NA> <NA> <NA> <NA> <NA> Yes Yes Yes Yes
## 4 <NA> <NA> <NA> <NA> <NA> No No Yes Yes
## 5 <NA> <NA> <NA> <NA> <NA> Yes Yes Yes Yes
## 6 <NA> <NA> <NA> <NA> <NA> Yes Yes Yes Yes
## abrape absingle abany pillok sexeduc divlaw premarsx teensex
## 1 Yes Yes <NA> <NA> <NA> <NA> Not Wrong At All <NA>
## 2 Yes Yes <NA> <NA> <NA> <NA> Always Wrong <NA>
## 3 Yes Yes <NA> <NA> <NA> <NA> Always Wrong <NA>
## 4 Yes Yes <NA> <NA> <NA> <NA> Always Wrong <NA>
## 5 Yes Yes <NA> <NA> <NA> <NA> Sometimes Wrong <NA>
## 6 Yes Yes <NA> <NA> <NA> <NA> Sometimes Wrong <NA>
## xmarsex homosex suicide1 suicide2 suicide3 suicide4 fear owngun pistol
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## shotgun rifle news tvhours racdif1 racdif2 racdif3 racdif4
## 1 <NA> <NA> Everyday NA <NA> <NA> <NA> <NA>
## 2 <NA> <NA> Everyday NA <NA> <NA> <NA> <NA>
## 3 <NA> <NA> Everyday NA <NA> <NA> <NA> <NA>
## 4 <NA> <NA> Once A Week NA <NA> <NA> <NA> <NA>
## 5 <NA> <NA> Everyday NA <NA> <NA> <NA> <NA>
## 6 <NA> <NA> Everyday NA <NA> <NA> <NA> <NA>
## helppoor helpnot helpsick helpblk
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA>
Research question: are those holding a higher educational degree more likely to feel that the government spends too little money on education? The quality of education sets the course for the next generation, and a population with more higher education sets a positive tone for the country's future. Through their own life experience, those with a higher degree have felt the importance and the benefit of getting more education. Better education may be the foundation for better opportunities for the individual, a healthier environment for society, and greater progress for the country. In this sense, those with a higher degree are expected to place more importance on spending more money on the education system.
library("RMySQL")
## Loading required package: DBI
library("DBI")
library("lubridate")
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library("lattice")
library("grid")
As seen below, about 50% of the sample graduated from high school, followed by less than high school (Lt High School) at about 20%, and then bachelor's, graduate, and junior college degrees.
DG <- gss$degree
DG.freq <- table(DG)
DG.relfreq <- (DG.freq / nrow(gss))*100
cbind(DG.freq, DG.relfreq)
## DG.freq DG.relfreq
## Lt High School 11822 20.718179
## High School 29287 51.325774
## Junior College 3070 5.380207
## Bachelor 8002 14.023589
## Graduate 3870 6.782216
In general, many people also think the government spends too little on national education: about 36% of the full sample gave that answer. Only about 4% feel that the government spends too much money on national education.
NED <- gss$nateduc
NED.freq <- table(NED)
NED.relfreq <- (NED.freq / nrow(gss))*100
cbind(NED.freq, NED.relfreq)
## NED.freq NED.relfreq
## Too Little 20619 36.135013
## About Right 9374 16.428033
## Too Much 2262 3.964179
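The percentages above are taken over all 57,061 rows, including respondents with no recorded answer, which is why they do not sum to 100. As a minimal sketch, the shares among those who did answer can be worked out from the counts printed above. The names `ned_counts` and `ned_share` are introduced here for illustration only.

```r
# Counts copied from the nateduc table above (rows without an answer are excluded)
ned_counts <- c("Too Little" = 20619, "About Right" = 9374, "Too Much" = 2262)

# Shares among respondents who gave an answer, in percent
ned_share <- round(100 * prop.table(ned_counts), 1)
ned_share
```

On the loaded data, `prop.table(table(gss$nateduc))` should give the same shares, since `table()` drops NA values by default.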
The graphs below give a better picture of both distributions.
par(mfrow = c(1,2))
barplot(DG.freq, xlab = "Degree", ylab = "Frequency", main = "Distribution of Degree")
barplot(NED.freq, xlab = "Perception on educational budget", ylab = "Frequency", main = "Attitude on educational budget")
As stated in the research question, those with a higher educational degree are expected to be more likely to feel that the government should spend more money on national education.
attach(gss)
plot(degree, nateduc, xlab = "Educational degree", ylab = "Level of Govt budget on Education", main="Degree by Attitude on Educational budget")
Hypothesis:
H0: There is no significant difference in views on the national education budget across levels of educational degree (p = 0.2).
HA: Those with a higher educational degree are more likely to feel that the government spends too little on national education (p > 0.2).
Conditions:
1. Independence: the sample of 57,061 is less than 10% of the American population, and it is a full probability sample, which is one type of random sample. The condition for independence is met.
2. Sample size / skew: 20% of the 57,061 observations is about 11,412, which is far more than the minimum of 10 required by the sample size and skewness condition.
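The arithmetic behind the sample size condition can be sketched as follows. The sketch assumes the null proportion of 0.2 stated in the hypotheses and also checks the expected number of failures, which the text above does not mention. The names `n`, `p0` and `expected` are illustrative only.

```r
n  <- 57061  # sample size reported by dim(gss)
p0 <- 0.2    # null proportion stated in the hypotheses

# Success-failure condition: both expected counts should be at least 10
expected <- c(success = n * p0, failure = n * (1 - p0))
expected
all(expected >= 10)
```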
Method of analysis: A hypothesis test for a proportion will be used, since the variables of interest, especially the dependent variable, are categorical and can be recoded as success or failure. The research question also concerns the difference in the proportion of views on the government's investment in national education in relation to the respondents' education level. For the analysis, “Degree” is the independent variable and “Nateduc” is the dependent variable. Taking the response “Too little” as the success for the dependent variable, the hypothesis test will be performed at the 95% confidence level (5% significance level). Because both variables are passed with all of their levels (five for degree, three for nateduc), inference() carries out a chi-square test of independence, as its output shows.
inference(y = nateduc, x = degree, data = gss, statistic = "proportion", type = "ht", method = "theoretical", alternative = "greater")
## Response variable: categorical (3 levels)
## Explanatory variable: categorical (5 levels)
## Observed:
## y
## x Too Little About Right Too Much
## Lt High School 3868 2593 619
## High School 10784 4735 1062
## Junior College 1187 363 79
## Bachelor 2993 997 313
## Graduate 1482 453 128
##
## Expected:
## y
## x Too Little About Right Too Much
## Lt High School 4543.313 2044.4238 492.2631
## High School 10640.208 4787.9366 1152.8551
## Junior College 1045.347 470.3907 113.2622
## Bachelor 2761.282 1242.5361 299.1819
## Graduate 1323.850 595.7128 143.4377
##
## H0: degree and nateduc are independent
## HA: degree and nateduc are dependent
## chi_sq = 467.3183, df = 8, p_value = 0
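We reject the null hypothesis since the p-value is less than 0.05. This means that educational level is associated with views on government spending on national education. The chi-square test establishes only that the two variables are dependent, not the direction of the relationship. In raw counts, respondents with a high school diploma give by far the most “Too Little” responses (10,784), but this mainly reflects the size of that group. As a share of each degree group in the observed table, “Too Little” is lowest for Lt High School (about 55%), higher for High School (about 65%), and highest for Junior College, Bachelor, and Graduate (about 70% to 73%).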
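As a cross-check, the test can be rerun in base R from the observed counts printed in the inference() output, without loading gss. This is a sketch that assumes the counts were copied correctly. The names `observed` and `test` are introduced for illustration. `chisq.test()` is used here in place of `inference()`.

```r
# Observed counts copied from the inference() output above
observed <- matrix(
  c( 3868, 2593,  619,
    10784, 4735, 1062,
     1187,  363,   79,
     2993,  997,  313,
     1482,  453,  128),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    degree  = c("Lt High School", "High School", "Junior College",
                "Bachelor", "Graduate"),
    nateduc = c("Too Little", "About Right", "Too Much")
  )
)

# Chi-square test of independence on the 5 x 3 table
test <- chisq.test(observed)
test$statistic      # compare with chi_sq in the output above
test$parameter      # degrees of freedom
min(test$expected)  # expected-count condition: every cell should be at least 5

# Share of each response within each degree group, in percent
round(100 * prop.table(observed, margin = 1), 1)
```

The row percentages show the direction of the association, which the chi-square statistic alone does not.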