In this final project we will analyze the full data set from the GSS.
Please install the gssr package and read the instructions and examples here.
In this project, you will be required to finish the following tasks:
Requirements
You are required to submit a PDF report (text and plots) with the code hidden, along with the Rmd file that can be knitted to generate the PDF report.
Putting the following line in a code chunk at the top of the Rmd file hides the code in the knitted document: knitr::opts_chunk$set(echo = FALSE)
The final report should be at least 8 pages long and organized into three sections: Introduction, Questions and Findings, and Conclusion.
Introduction: the gss data set and how it can be used to answer questions based on the given materials and your own research.
Rubrics:
Install the gssr package, then read its main page to understand the basic structure of the data set. Obtain basic information about the data set from the GSS main page.
Give a brief description of the gss data set in the first paragraph of the Introduction section.
The full documentation of the data set is the main reference. Scan through its “Introduction” and “Index to Data Set” to get an idea of which variables are included.
The gssr package main page also has a tutorial on how to look up the meaning of each variable in R.
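The idea behind looking up a variable's meaning can be sketched in base R. This sketch assumes, as is typical for GSS data read through haven, that a column stores numeric codes together with a "label" attribute (the question text) and a "labels" attribute (the code-to-answer map). The vector below is built by hand and its codes and label text are hypothetical; the gssr tutorial remains the reference for the real lookup.

```r
# Hand-built stand-in for a labelled GSS column. The codes and label text
# are hypothetical and only mimic haven-style "label"/"labels" attributes.
harass5 <- structure(
  c(1, 2, 2, 3, NA),
  label  = "Harassed by superiors or co-workers in the past five years",
  labels = c("yes" = 1, "no" = 2, "does not apply" = 3)
)

# Description of the variable
print(attr(harass5, "label"))

# Code-to-meaning map
value_labels <- attr(harass5, "labels")
print(value_labels)

# Translate each numeric code into its text label (NA stays NA)
decoded <- names(value_labels)[match(harass5, value_labels)]
print(decoded)
```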
Before you start, you can use the online exploration tool to navigate the data, do preliminary analysis, and even extract a subset of the data. However, your final Rmd must be accompanied by the R code for your analysis.
For this part, you don’t have to do any explicit work in your report.
GSS data analyses are often quoted by the media; one example is this article. You can get hints from such studies, and you can also explore some of their findings in new directions or in greater depth.
The Key Trends section lists many interesting trends that should give you some hints about where to start.
For this part, you don’t have to do any explicit work in your report.
In 2016, the GSS added a new question on harassment at work. The question is phrased as follows:
Over the past five years, have you been harassed by your superiors or co-workers at your job, for example, have you experienced any bullying, physical or psychological abuse?
Answers to this question are stored in the harass5 variable in the 2016 data.
What are the possible responses to this question and how many respondents chose each of these answers?
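A minimal sketch of the tabulation, using a small hypothetical data frame in place of the real 2016 extract. The column name harass5 matches the GSS variable, but the values are made up and the response labels "Yes", "No" and "Does not apply" are assumed from the wording of these tasks; check the real labels in the documentation. Setting useNA = "ifany" keeps missing answers visible as their own count.

```r
# Hypothetical toy stand-in for the 2016 GSS extract
# (made-up values, not GSS results; response labels are assumed).
gss16 <- data.frame(
  harass5 = c("Yes", "No", "No", "Does not apply", NA, "Yes", "No", NA)
)

# Count respondents per response, keeping NA as its own category
response_counts <- table(gss16$harass5, useNA = "ifany")
print(response_counts)
```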
What percent of the respondents for whom this question is applicable (i.e., excluding NAs and “Does not apply” responses) have been harassed by their superiors or co-workers at their job?
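The key step is choosing the denominator: only respondents with a real yes/no answer count. A minimal sketch on a hypothetical toy vector (made-up values; the "Yes" and "Does not apply" labels are assumed), where the mean of a logical vector gives the share of TRUE values:

```r
# Hypothetical toy answers (made-up values, not GSS results; labels assumed)
harass5 <- c("Yes", "No", "No", "Does not apply", NA, "Yes", "No", NA)

# Keep only respondents for whom the question applies:
# drop missing values and "Does not apply"
applicable <- harass5[!is.na(harass5) & harass5 != "Does not apply"]

# Share of applicable respondents who answered "Yes", as a percentage
pct_harassed <- 100 * mean(applicable == "Yes")
print(round(pct_harassed, 1))
```

Here 2 of the 5 applicable toy answers are "Yes", so the toy result is 40 percent; dividing by all 8 rows instead would understate the share.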
Among those who answered “yes” to the question, what is the proportion of males and females, respectively?
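A minimal sketch of this conditional proportion: first restrict to the "Yes" group, then compute each sex's share within it with prop.table(). The data frame is a hypothetical toy (made-up values), and the sex labels "Male" and "Female" are assumed for illustration.

```r
# Hypothetical toy data (made-up values, not GSS results; labels assumed)
gss16 <- data.frame(
  harass5 = c("Yes", "No", "Yes", "Does not apply", NA, "Yes", "No", "Yes"),
  sex     = c("Female", "Male", "Female", "Male", "Female", "Male", "Female", "Female")
)

# Restrict to respondents who answered "Yes"; which() drops NA rows
yes_rows <- gss16[which(gss16$harass5 == "Yes"), ]

# Proportion of males and females within the "Yes" group
sex_share <- prop.table(table(yes_rows$sex))
print(sex_share)
```

Using which() rather than a bare logical index avoids rows of all NAs appearing when harass5 is missing.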
Pick at least three questions of interest to you that can be explored with the gss data set. The questions should meet the following requirements:
At least one question needs to explore data from the past 10 years (2012-2022).
At least one question should study a trend with respect to time (compare the same variable from different years).
The questions should be related to each other in some way such that you can draw an overall conclusion in the Conclusion section.
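The first two requirements can be combined: restrict the data to recent survey years, then compare the same variable across years. A minimal sketch on a hypothetical toy data frame (made-up years and answers, not GSS results), where tapply() computes the share of "Yes" answers within each year:

```r
# Hypothetical toy data (made-up values, not GSS results): one row per
# respondent with the survey year and a yes/no answer to some question.
toy <- data.frame(
  year   = c(2008, 2008, 2012, 2012, 2012, 2016, 2016, 2022, 2022, 2022),
  answer = c("Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No", NA)
)

# Restrict to 2012-2022 and drop missing answers
recent <- subset(toy, year >= 2012 & year <= 2022 & !is.na(answer))

# Share of "Yes" answers within each year: the same variable across years
trend <- tapply(recent$answer == "Yes", recent$year, mean)
print(round(trend, 2))
```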