This extract of the General Social Survey (GSS) Cumulative File 1972-2012 provides a sample of selected indicators in the GSS with the goal of providing a convenient data resource for students learning statistical reasoning using the R language. Unlike the full General Social Survey Cumulative File, we have removed missing values from the responses and created factor variables when appropriate to facilitate analysis using R. Our hope is that this will allow students to focus on statistical concepts without having to (initially) be concerned about some of the data management and interpretation issues associated with missing data and factor variables in R. Other than the two modifications mentioned above, all data and coding come from the original dataset. Students and researchers seeking to conduct research or explore the full codebook behind the full General Social Survey Cumulative File are urged to consult the original dataset at the citation that follows:
In this extract, each year's respondents are treated as an independent sample.
Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11.
Persistent URL: http://doi.org/10.3886/ICPSR34802.v1
The data used in this project come from an observational study repeated over many years, with no random assignment. The results can therefore show association but not causation.
Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.
GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior.
Come up with a research question that you want to answer using these data. Phrase your research question in a way that matches the scope of inference your dataset allows for. You are welcome to create new variables based on existing ones. Along with your research question, include a brief discussion (1-2 sentences) of why this question is of interest to you and/or your audience.
I would like to find out whether there is an association between the degree a respondent holds and their employment status.
This question is of interest to me because I would like to know whether the relationship between these two variables is statistically significant.
Conditions for the chi-square test:
1). Independence: the data come from a reputable source, and respondents are reported to be randomly selected. ✓
2). Expected sample size: all expected counts are ≥ 5. ✓
The following tables use the tally() function to count how many respondents fall into each work status and degree category, first as counts and then as proportions.
## wrkstat
## Working Fulltime Working Parttime Temp Not Working Unempl, Laid Off
##            28207             5842             1213             1873
##          Retired           School    Keeping House            Other
##             7642             1751             9387             1132
##             <NA>
##               14
## degree
## Lt High School    High School Junior College       Bachelor       Graduate
##          11822          29287           3070           8002           3870
##           <NA>
##           1010
## wrkstat
## Working Fulltime Working Parttime Temp Not Working Unempl, Laid Off
##     0.4943306286     0.1023816617     0.0212579520     0.0328245211
##          Retired           School    Keeping House            Other
##     0.1339268502     0.0306864584     0.1645081579     0.0198384185
##             <NA>
##     0.0002453515
## degree
## Lt High School    High School Junior College       Bachelor       Graduate
##     0.20718179     0.51325774     0.05380207     0.14023589     0.06782216
##           <NA>
##     0.01770036
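Each proportion above is the category count divided by the total number of respondents, missing values included. The short sketch below redoes that arithmetic for work status, with the counts re-entered by hand from the output above. The names `wrkstat_counts`, `n_total`, and `wrkstat_props` are made up for illustration and are not part of the original analysis. With the `gss` data frame loaded, base R's `table(gss$wrkstat, useNA = "ifany")` followed by `prop.table()` should give the same figures.

```r
# Work status counts copied by hand from the tally output above
# (wrkstat_counts is an illustrative name, not from the original analysis).
wrkstat_counts <- c(
  "Working Fulltime" = 28207,
  "Working Parttime" = 5842,
  "Temp Not Working" = 1213,
  "Unempl, Laid Off" = 1873,
  "Retired"          = 7642,
  "School"           = 1751,
  "Keeping House"    = 9387,
  "Other"            = 1132,
  "<NA>"             = 14
)

# Total number of respondents, missing values included.
n_total <- sum(wrkstat_counts)

# Proportion in each category = count / total.
wrkstat_props <- wrkstat_counts / n_total
print(round(wrkstat_props, 4))
```

The total is 57,061 respondents, and the degree counts sum to the same figure, so both proportion tables share one denominator.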
Then I drew bar plots of the work status counts within each degree category.
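# Cross-tabulate work status (rows) by degree (columns);
# table() drops missing values by default
tab <- table(gss$wrkstat, gss$degree)
# Long format: one row per (work status, degree) pair, convenient for plotting
new_tab <- data.frame(tab)
# Wide format: work status as rows, degree categories as columns
df_wide <- as.data.frame.matrix(tab)
df_wide
##                  Lt High School High School Junior College Bachelor Graduate
## Working Fulltime           3438       14829           1911     5106     2642
## Working Parttime            997        3292            336      799      338
## Temp Not Working            227         602             85      199       87
## Unempl, Laid Off            510        1023             80      164       58
## Retired                    2596        3277            244      762      488
## School                      323        1063            104      192       59
## Keeping House              3322        4657            274      698      165
## Other                       407         539             36       75       33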
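One way to draw a bar plot of these counts is base R's `barplot()`. The sketch below assumes the observed counts from the table above are re-entered by hand. The names `observed` and `bar_mids` are illustrative, and the original plot may have been produced differently (for example with ggplot2 and the long-format `new_tab`).

```r
# Observed counts copied by hand from the cross-tabulation above
# (observed is an illustrative name; rows = work status, columns = degree).
observed <- matrix(
  c(3438, 14829, 1911, 5106, 2642,
     997,  3292,  336,  799,  338,
     227,   602,   85,  199,   87,
     510,  1023,   80,  164,   58,
    2596,  3277,  244,  762,  488,
     323,  1063,  104,  192,   59,
    3322,  4657,  274,  698,  165,
     407,   539,   36,   75,   33),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    wrkstat = c("Working Fulltime", "Working Parttime", "Temp Not Working",
                "Unempl, Laid Off", "Retired", "School", "Keeping House",
                "Other"),
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate")
  )
)

# Grouped bar plot: one cluster of bars per degree category,
# one bar per work status within each cluster.
bar_mids <- barplot(
  observed,
  beside = TRUE,
  legend.text = TRUE,
  args.legend = list(x = "topright", cex = 0.7),
  xlab = "Degree",
  ylab = "Number of respondents",
  main = "Work status by degree"
)
```

Because the degree groups differ a lot in size, plotting `prop.table(observed, margin = 2)` instead of the raw counts may make the groups easier to compare.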
Research question:
Is there an association between having a degree and the status of employment?
1).
H0: The work status and degree variables are independent.
Ha: The work status and degree variables are dependent.
2).
Expected counts under H0:
##                  Lt High School High School Junior College  Bachelor   Graduate
## Working Fulltime      5890.4888  14592.6643     1529.93237 3984.3027 1928.61181
## Working Parttime      1215.3905   3010.9193      315.67250  822.0852  397.93244
## Temp Not Working       253.1185    627.0571       65.74228  171.2083   82.87382
## Unempl, Laid Off       387.0603    958.8749      100.53090  261.8060  126.72788
## Retired               1553.9365   3849.6082      403.60280 1051.0763  508.77617
## School                 367.2327    909.7554       95.38109  248.3947  120.23609
## Keeping House         1922.8567   4763.5439      499.42217 1300.6125  629.56475
## Other                  229.9159    569.5769       59.71590  155.5142   75.27705
Pearson residuals, (observed - expected) / sqrt(expected):
##                  Lt High School High School Junior College    Bachelor
## Working Fulltime     -31.954451    1.956419      9.7423998  17.7704946
## Working Parttime      -6.264348    5.122494      1.1441041  -0.8051481
## Temp Not Working      -1.641670   -1.000640      2.3751036   2.1239904
## Unempl, Laid Off       6.248887    2.070844     -2.0476616  -6.0447152
## Retired               26.434892   -9.228886     -7.9444423  -8.9165207
## School                -2.308198    5.080693      0.8825135  -3.5782219
## Keeping House         31.907204   -1.543703    -10.0870161 -16.7095267
## Other                 11.678711   -1.281200     -3.0689842  -6.4563566
##                     Graduate
## Working Fulltime  16.2443936
## Working Parttime  -3.0043967
## Temp Not Working   0.4532523
## Unempl, Laid Off  -6.1051571
## Retired           -0.9210899
## School            -5.5845740
## Keeping House    -18.5150996
## Other             -4.8727415
##
## Pearson's Chi-squared test
##
## data: df_wide
## X-squared = 4871.8, df = 28, p-value < 2.2e-16
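The expected counts, residuals, and test output above can be reproduced from the observed table alone. Here is a minimal sketch, assuming the observed counts are re-entered by hand as a matrix. The names `observed` and `test` are illustrative; the original analysis ran the test on `df_wide`.

```r
# Observed counts copied by hand from the cross-tabulation
# (observed is an illustrative name; rows = work status, columns = degree).
observed <- matrix(
  c(3438, 14829, 1911, 5106, 2642,
     997,  3292,  336,  799,  338,
     227,   602,   85,  199,   87,
     510,  1023,   80,  164,   58,
    2596,  3277,  244,  762,  488,
     323,  1063,  104,  192,   59,
    3322,  4657,  274,  698,  165,
     407,   539,   36,   75,   33),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    wrkstat = c("Working Fulltime", "Working Parttime", "Temp Not Working",
                "Unempl, Laid Off", "Retired", "School", "Keeping House",
                "Other"),
    degree = c("Lt High School", "High School", "Junior College",
               "Bachelor", "Graduate")
  )
)

# Pearson's chi-squared test of independence.
test <- chisq.test(observed)

# Expected counts under H0: (row total * column total) / grand total.
print(test$expected)

# Sample size condition: every expected count should be at least 5.
print(min(test$expected) >= 5)

# Pearson residuals: (observed - expected) / sqrt(expected).
print(test$residuals)

# Test statistic, degrees of freedom, and p-value.
print(test)
```

The degrees of freedom are (8 - 1) × (5 - 1) = 28, one less than the number of work status categories times one less than the number of degree categories.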
Since the p-value (< 2.2e-16) is below 0.05, we reject H0. At the 5% significance level, the data provide evidence of an association between degree and work status.
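The decision can be double-checked from the reported statistic alone. This small sketch assumes only the values printed in the test output (X-squared = 4871.8, df = 28) and a 5% significance level. The variable names are illustrative.

```r
# Values taken from the printed test output; names are illustrative.
x_squared <- 4871.8
deg_free  <- 28
alpha     <- 0.05

# Critical value of the chi-square distribution at the 5% level:
# H0 is rejected when the statistic exceeds it.
critical_value <- qchisq(1 - alpha, df = deg_free)

# Upper-tail p-value for the observed statistic.
p_value <- pchisq(x_squared, df = deg_free, lower.tail = FALSE)

# Both criteria lead to the same decision.
reject_h0 <- x_squared > critical_value && p_value < alpha
print(c(critical_value = critical_value, p_value = p_value))
print(reject_h0)
```

The critical value is about 41.3, so a statistic of 4871.8 lies far into the rejection region.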