Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)

Load data

Make sure your data and R Markdown files are in the same directory. When loaded, your data file will be called gss.

load("gss.Rdata")

Part 1: Data

dim(gss)[1]

## [1] 57061

variable.names(gss)

##   [1] "caseid"   "year"     "age"      "sex"      "race"     "hispanic"
##   [7] "uscitzn"  "educ"     "paeduc"   "maeduc"   "speduc"   "degree"  
##  [13] "vetyears" "sei"      "wrkstat"  "wrkslf"   "marital"  "spwrksta"
##  [19] "sibs"     "childs"   "agekdbrn" "incom16"  "born"     "parborn" 
##  [25] "granborn" "income06" "coninc"   "region"   "partyid"  "polviews"
##  [31] "relig"    "attend"   "natspac"  "natenvir" "natheal"  "natcity" 
##  [37] "natcrime" "natdrug"  "nateduc"  "natrace"  "natarms"  "nataid"  
##  [43] "natfare"  "natroad"  "natsoc"   "natmass"  "natpark"  "confinan"
##  [49] "conbus"   "conclerg" "coneduc"  "confed"   "conlabor" "conpress"
##  [55] "conmedic" "contv"    "conjudge" "consci"   "conlegis" "conarmy" 
##  [61] "joblose"  "jobfind"  "satjob"   "richwork" "jobinc"   "jobsec"  
##  [67] "jobhour"  "jobpromo" "jobmeans" "class"    "rank"     "satfin"  
##  [73] "finalter" "finrela"  "unemp"    "govaid"   "getaid"   "union"   
##  [79] "getahead" "parsol"   "kidssol"  "abdefect" "abnomore" "abhlth"  
##  [85] "abpoor"   "abrape"   "absingle" "abany"    "pillok"   "sexeduc" 
##  [91] "divlaw"   "premarsx" "teensex"  "xmarsex"  "homosex"  "suicide1"
##  [97] "suicide2" "suicide3" "suicide4" "fear"     "owngun"   "pistol"  
## [103] "shotgun"  "rifle"    "news"     "tvhours"  "racdif1"  "racdif2" 
## [109] "racdif3"  "racdif4"  "helppoor" "helpnot"  "helpsick" "helpblk"

The General Social Survey (GSS) is a survey designed to collect information to monitor and explain trends in the attitudes, behaviors, and attributes of residents of the United States.

Most of the data were collected from randomly selected English- or Spanish-speaking persons 18 years of age or older who live in the United States and were willing to answer a questionnaire by computer-assisted personal interview (CAPI), face-to-face interview, or telephone interview. The survey has been conducted since 1972.

Participation is voluntary, so the sample may be biased toward people who were more willing or able to complete the survey. However, because respondents were randomly selected and the data set contains 57061 respondents and more than 100 variables, the results are believed to generalize to residents of the United States.

Since the data are observational, we cannot draw causal conclusions.

Part 2: Research question

Do people's opinions on whether the government should improve the standard of living of all poor Americans depend on their party affiliation?

Part 3: Exploratory data analysis

First I have to sort out which data to use, in this case partyid and helppoor, and create a table as a numerical summary.

table <- table(gss$partyid,gss$helppoor)
table

##                     
##                      Govt Action Agree With Both People Help Selves
##   Strong Democrat           1364            1758                329
##   Not Str Democrat          1079            2609                429
##   Ind,Near Dem               608            1600                230
##   Independent                777            1956                404
##   Ind,Near Rep               252            1085                368
##   Not Str Republican         423            2039                602
##   Strong Republican          217            1011                660
##   Other Party                 67             153                 55

As the table shows, more Democrats than Republicans think the government should help poor Americans improve their standard of living, but the raw counts do not show what proportion of each party holds each opinion. Next, we plot the distribution of opinions on improving poor Americans’ standard of living, and then a graph showing the proportion of each opinion within each party.
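Row proportions make the comparison across parties direct. A minimal sketch, assuming the counts are retyped by hand from the table above (the names `counts` and `row_props` are illustrative and not part of the original analysis); with the data loaded, `prop.table(table, margin = 1)` should give the same proportions:

```r
# Counts copied from the partyid x helppoor table above
counts <- matrix(
  c(1364, 1758, 329,
    1079, 2609, 429,
     608, 1600, 230,
     777, 1956, 404,
     252, 1085, 368,
     423, 2039, 602,
     217, 1011, 660,
      67,  153,  55),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party"),
    helppoor = c("Govt Action", "Agree With Both", "People Help Selves")
  )
)

# Share of each answer within each party (every row sums to 1)
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

Reading across a row shows how opinion is split within one party, which the raw counts cannot show because the parties differ in size.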

# Drop missing answers and the small 'Other Party' group before plotting
ggplot(gss %>% filter(!is.na(helppoor), !is.na(partyid), partyid != 'Other Party'),
       aes(partyid, fill = helppoor)) +
  geom_bar(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

mosaicplot(table, shade = TRUE)

These graphs give a basic idea of the distribution and proportion of opinions, but to get a clearer view of the data I want to group some opinions and some party affiliations.

# Collapse the seven party categories into Democrat / Independent / Republican
partyid2 <- recode(gss$partyid,
                   'Strong Democrat' = "Democrat", 'Not Str Democrat' = "Democrat", 'Ind,Near Dem' = "Democrat",
                   'Ind,Near Rep' = "Republican", 'Not Str Republican' = "Republican", 'Strong Republican' = "Republican")
# Count only "Govt Action" as "Yes"
helppoor2 <- recode(gss$helppoor, 'Govt Action' = "Yes", 'Agree With Both' = "No", 'People Help Selves' = "No")
newtable <- table(partyid2, helppoor2)
newtable

##              helppoor2
## partyid2       Yes   No
##   Democrat    3051 6955
##   Independent  777 2360
##   Republican   892 5765
##   Other Party   67  208

ggplot(gss , aes(partyid2, fill = helppoor2)) +
  geom_bar(position = "dodge")

mosaicplot(newtable, shade = TRUE)

We can see that the proportion of Democrats who think the government should improve poor Americans’ standard of living is larger than the proportion of Republicans who think so. What happens if we move “Agree With Both” to the “Yes” column?

# This time count "Agree With Both" as "Yes" as well
helppoor3 <- recode(gss$helppoor, 'Govt Action' = "Yes", 'Agree With Both' = "Yes", 'People Help Selves' = "No")
newtable2 <- table(partyid2, helppoor3)
newtable2

##              helppoor3
## partyid2       Yes   No
##   Democrat    9018  988
##   Independent 2733  404
##   Republican  5027 1630
##   Other Party  220   55

ggplot(gss , aes(partyid2, fill = helppoor3)) +
  geom_bar(position = "dodge")

mosaicplot(newtable2, shade = TRUE)

The results show the same pattern when we count “Agree With Both” as “Yes, the government should improve poor Americans’ standard of living”.
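The comparison between the two codings can be made concrete by putting the share of “Yes” answers side by side. A minimal sketch, assuming the counts are retyped from newtable and newtable2 above (the names `narrow`, `broad`, and `yes_share` are illustrative, not from the original analysis):

```r
party <- c("Democrat", "Independent", "Republican", "Other Party")

# "Yes" = Govt Action only (counts from newtable)
narrow <- matrix(c(3051, 6955,
                    777, 2360,
                    892, 5765,
                     67,  208),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(party, c("Yes", "No")))

# "Yes" = Govt Action or Agree With Both (counts from newtable2)
broad <- matrix(c(9018,  988,
                  2733,  404,
                  5027, 1630,
                   220,   55),
                ncol = 2, byrow = TRUE,
                dimnames = list(party, c("Yes", "No")))

# Proportion answering "Yes" within each party, under each coding
yes_share <- cbind(
  narrow = prop.table(narrow, margin = 1)[, "Yes"],
  broad  = prop.table(broad,  margin = 1)[, "Yes"]
)
round(yes_share, 3)
```

Under either coding, Democrats have the largest share of “Yes” answers among the three main groups and Republicans the smallest.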

Part 4: Inference

Hypotheses

Now we test whether there is a relationship between opinion on improving poor Americans’ standard of living and party affiliation.

\(H_0\): (Nothing going on) In the population, opinion on improving poor Americans’ standard of living and party affiliation are independent.

\(H_A\): (Something going on) In the population, opinion on improving poor Americans’ standard of living and party affiliation are dependent.

Conditions of independence

Respondents were selected by random sampling, and the sample size is smaller than 10% of the United States population. Each respondent’s answer corresponds to exactly one cell in the table.
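The chi-square test of independence also needs a sample-size condition: each cell should have an expected count of at least 5. A minimal sketch of that check for newtable, assuming the counts are retyped from the output above (the names `observed` and `expected` are illustrative):

```r
# Counts copied from newtable above
observed <- matrix(c(3051, 6955,
                      777, 2360,
                      892, 5765,
                       67,  208),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(
                     c("Democrat", "Independent", "Republican", "Other Party"),
                     c("Yes", "No")))

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)

# TRUE when the sample-size condition holds for every cell
all(expected >= 5)
```

With the data loaded, `chisq.test(newtable)$expected` returns the same matrix of expected counts.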

Results

chisq.test(newtable)

## 
##  Pearson's Chi-squared test
## 
## data:  newtable
## X-squared = 644.92, df = 3, p-value < 2.2e-16

chisq.test(newtable2)

## 
##  Pearson's Chi-squared test
## 
## data:  newtable2
## X-squared = 678.61, df = 3, p-value < 2.2e-16

Both tests have a p-value near zero, so we reject \(H_0\): the data provide convincing evidence that opinion on improving poor Americans’ standard of living and party affiliation are dependent.
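The reported statistics can be reproduced by hand from the counts, using \(\chi^2 = \sum (O - E)^2 / E\) with \((r - 1)(c - 1)\) degrees of freedom. A minimal sketch, assuming the counts are retyped from the two tables; `chisq_by_hand` is a hypothetical helper written for illustration and not a function from any package:

```r
# Pearson chi-square test of independence computed step by step
chisq_by_hand <- function(observed) {
  # Expected counts under independence
  expected  <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  statistic <- sum((observed - expected)^2 / expected)
  df        <- (nrow(observed) - 1) * (ncol(observed) - 1)
  # Upper-tail probability of the chi-square distribution
  p_value   <- pchisq(statistic, df = df, lower.tail = FALSE)
  list(statistic = statistic, df = df, p_value = p_value)
}

party <- c("Democrat", "Independent", "Republican", "Other Party")
narrow <- matrix(c(3051, 6955, 777, 2360, 892, 5765, 67, 208),
                 ncol = 2, byrow = TRUE, dimnames = list(party, c("Yes", "No")))
broad  <- matrix(c(9018, 988, 2733, 404, 5027, 1630, 220, 55),
                 ncol = 2, byrow = TRUE, dimnames = list(party, c("Yes", "No")))

res_narrow <- chisq_by_hand(narrow)  # counts from newtable
res_broad  <- chisq_by_hand(broad)   # counts from newtable2
round(c(res_narrow$statistic, res_broad$statistic), 2)
```

Both tables have 4 rows and 2 columns, which gives the 3 degrees of freedom shown in the output above. No continuity correction is involved, since `chisq.test` applies one only to 2 x 2 tables.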

Statistical inference with the GSS data