Introduction


In this final project, we will analyze the full data set from the General Social Survey (GSS).

Please install the gssr package and read the instructions and examples here.

In this project, you will be required to finish the following tasks:

  1. Understand basic information and the data structure of the data set.
  2. Understand how to get information about each variable from the code book or the help documentation built into the R package.
  3. Read the recommended articles for hints about how the data set can be used to answer questions of interest.
  4. Finish the required tasks as a warm-up exercise in analyzing the GSS data set.
  5. (Main Task) Explore at least 3 new questions of your own interest and try to answer them with an analysis based on what you have learned in this course.

Requirements

  1. You are required to submit a PDF report (text + plots) with the code hidden, along with the Rmd file that can be knitted to generate the PDF report. To hide the code in the knitted document, put the following line in a code chunk at the top of the Rmd: knitr::opts_chunk$set(echo = FALSE)

  2. The final report should be at least 8 pages long and organized into three sections: Introduction, Questions and Findings, and Conclusion:

    • Introduction: this section should introduce the GSS data set and explain how it can be used to answer questions, based on the given materials and your own research.
    • Questions and Findings: this section should include both the required tasks and your own questions. For each question, clearly state the question, the visualization or analysis result related to it, and the actual answer, optionally followed by a meaningful discussion in words.
    • Conclusion: summarize what you learned from your own questions. Your questions should be related to each other so that your analysis gives comprehensive insight into the social issues under investigation.
  3. Rubrics:


Task List


1. Understand basic information and the data structure of the data set.

  • Install the gssr package, then read its main page to understand the basic structure of the data set. Obtain basic information about the data set from the GSS main page.

  • Give a brief description of the GSS data set in the first paragraph of the Introduction section.
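
As a rough sketch of this first step, the cumulative file can be loaded and narrowed to a single survey year as shown below. The sketch assumes that gssr is already installed (for example from the kjhealy/gssr GitHub repository), that the cumulative data object is named gss_all, and that the survey year is stored in a column called year. Check the package main page if any of these differ in your version.

```r
# One-time installation from GitHub (assumes the 'remotes' package is available):
# remotes::install_github("kjhealy/gssr")

if (requireNamespace("gssr", quietly = TRUE)) {
  library(gssr)
  data(gss_all)                 # cumulative file covering all survey years

  # Rows are respondents, columns are variables
  print(dim(gss_all))

  # Keep only the 2016 respondents; as.numeric() drops the value labels
  gss_2016 <- gss_all[as.numeric(gss_all$year) == 2016, ]
  print(nrow(gss_2016))
} else {
  message("gssr is not installed; see the installation comment above.")
}
```

The cumulative file is large, so working with a one-year subset such as gss_2016 usually keeps later steps faster.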


2. Understand how to get information about each variable from the code book or the help documentation built into the R package.

  • The full documentation of the data set serves as a reference. Skim the “Introduction” and “Index to Data Set” to get an idea of which variables are included.

  • The gssr package main page also has a tutorial on how to look up the meaning of each variable in R.

  • Before you start, you can use the online exploration tool to navigate the data, do preliminary analysis, and even extract a subset of the data. However, your final Rmd must contain the R code for your analysis.

  • For this part, you don’t have to include any explicit work in your report.
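
One way to inspect a variable directly in R is to read the label attributes attached to its column. The sketch below assumes that GSS columns follow the haven-style labelled format, where the question text sits in a "label" attribute and the answer codes in a "labels" attribute. The vector harass5_toy and all of its values are invented for illustration; they are not real GSS codes.

```r
# Toy stand-in for a labelled survey column (codes and labels are made up)
harass5_toy <- structure(
  c(1, 2, 2, NA, 1),
  label  = "Toy question text",
  labels = c(yes = 1, no = 2)
)

# Question wording attached to the column
question_text <- attr(harass5_toy, "label")

# Named vector that maps each answer text to its stored numeric code
value_labels <- attr(harass5_toy, "labels")

# Translate the stored codes into readable answers
answers <- names(value_labels)[match(harass5_toy, value_labels)]

print(question_text)
print(value_labels)
print(answers)
```

With a real column, replace harass5_toy with something like gss_2016$harass5 (assuming that object exists in your session) and compare the result with the code book.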


4. Required warm-up questions

In 2016, the GSS added a new question on harassment at work. The question is phrased as follows:

Over the past five years, have you been harassed by your superiors or co-workers at your job, for example, have you experienced any bullying, physical or psychological abuse?

Answers to this question are stored in the harass5 variable in the data set for the year 2016.

  • What are the possible responses to this question and how many respondents chose each of these answers?

  • What percent of the respondents for whom this question is applicable (i.e., excluding NAs and “Does not apply” responses) have been harassed by their superiors or co-workers at their job?

  • Among those who answered “yes” to the question, what is the proportion of males and females, respectively?
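
The three warm-up computations follow a common pattern: tabulate, filter, then take proportions. The sketch below illustrates that pattern in base R on a small invented data frame named toy. The column names mirror the GSS variables, but the rows and the response labels are made up, so the numbers are not GSS results and the real labels may be spelled differently.

```r
# Invented data with the same column names as the GSS variables
toy <- data.frame(
  harass5 = c("yes", "no", "no", NA, "yes", "does not apply", "no", "yes"),
  sex     = c("female", "male", "female", "male",
              "male", "female", "male", "female")
)

# 1. Count each response, keeping missing values visible
counts <- table(toy$harass5, useNA = "ifany")

# 2. Percent answering "yes" among applicable respondents only
applicable <- subset(toy, !is.na(harass5) & harass5 != "does not apply")
pct_yes <- 100 * mean(applicable$harass5 == "yes")

# 3. Proportion of each sex among those who answered "yes"
yes_rows  <- subset(toy, harass5 == "yes")
sex_share <- prop.table(table(yes_rows$sex))

print(counts)
print(pct_yes)
print(sex_share)
```

On the real data, the same steps would be applied to the 2016 subset after checking the actual response labels of harass5 and sex.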

5. Your own questions

Pick at least three questions of interest to you that can be explored with the GSS data set. The questions should meet the following requirements:

  • At least one question needs to explore data from the past 10 years (2012-2022).

  • At least one question should study a trend with respect to time (compare the same variable from different years).

  • The questions should be related to each other in some way such that you can draw an overall conclusion in the Conclusion section.
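
For the time-trend requirement, one possible approach is to compute the share of a given answer within each survey year and then plot that share against the year. The sketch below uses an invented data frame, toy_trend, with hypothetical columns year and answer; it shows the mechanics only and says nothing about actual GSS trends.

```r
# Invented responses to the same question in three survey years
toy_trend <- data.frame(
  year   = c(2012, 2012, 2014, 2014, 2016, 2016, 2016),
  answer = c("yes", "no", "yes", "yes", "no", "no", "yes")
)

# Flag the answer of interest, then average the flag within each year
toy_trend$is_yes <- toy_trend$answer == "yes"
share_by_year <- aggregate(is_yes ~ year, data = toy_trend, FUN = mean)
print(share_by_year)

# Simple line plot of the yearly share
plot(share_by_year$year, share_by_year$is_yes, type = "b",
     xlab = "Survey year", ylab = "Share answering yes")
```

Before comparing years on the real data, confirm that the variable was actually asked in each year you include, since not every GSS question appears in every survey.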